Fast Fourier transform Prof. Richard Vuduc Georgia Institute of Technology CSE/CS 8803 PNA, Spring 2008 [L.11] Tuesday, February 12, 2008 1
Sources for today’s material CS 267 (Yelick & Demmel, UCB) Applied Numerical Linear Algebra , by Demmel Xiaoye (Sherry) Li (LBNL) Van Loan (Cornell) “The ubiquitous Kronecker product” Computational frameworks for the FFT Heath (UIUC) 2
Review: Sparse direct solvers 3
Anatomy of a sparse direct solver P r · A · P T c = LU Order equations & variables to minimize fill in L, U [Combinatorial] Symbolic factorization [Allocate memory for factorization] Numerical factorization [Dominates run-time] Triangular solves [Small fraction, unless many RHS] 4
Sparse LU software See also: http://www.netlib.org/utk/people/JackDongarra/la-sw.html LU SuperLU [Xiaoye “Sherry” Li @ LBNL] MUMPS [Amestoy, et al ., INPT-ENSEEIHT-IRIT, France] UMFPACK [Davis @ U. Florida] Cholesky PSPASES / WSMP [Gupta @ IBM] TAUCS [Toledo @ Tel Aviv U.] Oblio [Dobrian, formerly @ Columbia / ODU] Also: MUMPS, UMFPACK 5
Source: Gould, Yu, Scott (2005); http://www.numerical.rl.ac.uk/reports/reports.shtml 6
SuperLU MUMPS (LU) Source: Amestoy, Duff, L’Excellent, & Li (2001). http://mumps.enseeiht.fr/doc.html 7
Source: Amestoy, Duff, L’Excellent, & Li (2001). http://mumps.enseeiht.fr/doc.html 8
Source: Amestoy, Duff, L’Excellent, & Li (2001). http://mumps.enseeiht.fr/doc.html 9
Algorithms for 2-D (3-D) Poisson, N=n 2 (=n 3 ) Algorithm Serial PRAM Memory # procs Dense LU N 3 N N 2 N 2 Band LU N 2 (N 7/3 ) N N 3/2 (N 5/3 ) N (N 4/3 ) Jacobi N 2 (N 5/3 ) N (N 2/3 ) N N Explicit inverse N 2 log N N 2 N 2 Sparse LU N 3/2 (N 2 ) N 1/2 N log N (N 4/3 ) N Conj. grad. N 3/2 (N 4/3 ) N 1/2(1/3) log N N N RB SOR N 3/2 (N 4/3 ) N 1/2 (N 1/3 ) N N FFT N log N log N N N Multigrid N log 2 N N N N log N N Lower bound PRAM = idealized parallel model with zero communication cost. Source: Demmel (1997) 10
Review: Poisson’s equation 11
Discretizing 1-D Poisson − d 2 u ( x ) = f ( x ) , 0 < x < 1 , u (0) = u (1) = 0 dx 2 1 h = n + 1 i = 0 n + 1 Discretize: u i = u ( i · h ) Approximate: − d 2 u ( x ) | x = x i ≈ 2 u i − u i − 1 − u i +1 dx 2 h 2 “Stencil”: -1 -1 2 12
Express stencil in matrix notation ≈ − u i − 1 + 2 u i − u i +1 f i � f ( ih ) Approximation: = h 2 u 1 f 1 2 − 1 u 2 f 2 − 1 2 − 1 u 3 f 3 h 2 − 1 2 − 1 = · . . . . . . · · · − 1 2 u n f n ⇓ h 2 f = T n · u 13
Discretizing 2-D Poisson Graph and stencil 2 4 5 6 8 4 − 1 − 1 − 1 4 − 1 − 1 1 4 7 − 1 4 − 1 − 1 4 − 1 − 1 2 5 8 − 1 − 1 4 − 1 − 1 − 1 − 1 4 − 1 3 6 9 − 1 4 − 1 − 1 − 1 4 − 1 − 1 − 1 4 14
Serial time complexity for RB SOR, CG, and sparse LU In 2-D: O ( n 3 ) O ( n 6 ) Na¨ ıve (dense) : O ( n 2 ) Optimal : 15
Exploiting structure to obtain fast algorithms for 2-D Poisson Dense LU : Assume no structure ⇒ O( n 6 ) Sparse LU : Sparsity ⇒ O( n 3 ), need extra memory CG : Symmetric positive definite ⇒ O( n 3 ), a little extra memory RB SOR : Fixed sparsity pattern ⇒ O( n 3 ), no extra memory Today — FFT : Exploit numerical structure ⇒ O( n 2 log n ) 16
Numerical properties of the Poisson stencil 17
Express stencil in matrix notation ≈ − u i − 1 + 2 u i − u i +1 Approximation: f i � f ( ih ) = h 2 u 1 f 1 2 − 1 u 2 f 2 − 1 2 − 1 u 3 f 3 h 2 − 1 2 − 1 = · . . . . . . · · · − 1 2 u n f n ⇓ h 2 f = T n · u 18
1-D Poisson: Eigendecomposition of T n Lemma : Eigenvalues and eigenvectors of T n are Z Λ Z T = T n T n z j λ j z j ⇒ = = ZZ T I = � � π j = 2 1 − cos λ j n + 1 � 2 n + 1 sin π jk z j ( k ) = n + 1 Exercise : Verify . Hint: sin a + sin b = 2 sin a + b cos a − b 2 2 19
Eigendecomposition for discretized 2-D Poisson stencil matrix? Graph and stencil 2 4 5 6 8 4 − 1 − 1 − 1 4 − 1 − 1 1 4 7 − 1 4 − 1 − 1 4 − 1 − 1 2 5 8 − 1 − 1 4 − 1 − 1 T n × n = − 1 − 1 4 − 1 3 6 9 − 1 4 − 1 − 1 − 1 4 − 1 − 1 − 1 4 20
2-D Poisson − ∂ 2 u ( x, y ) − ∂ 2 u ( x, y ) f ( x, y ) = ∂ x 2 ∂ y 2 ⇓ h 2 f ij = 4 u ij − u i +1 ,j − u i − 1 ,j − u i,j +1 − u i,j − 1 21
2-D Poisson − ∂ 2 u ( x, y ) − ∂ 2 u ( x, y ) f ( x, y ) = ∂ x 2 ∂ y 2 ⇓ h 2 f ij = 4 u ij − u i +1 ,j − u i − 1 ,j − u i,j +1 − u i,j − 1 ⇓ = ( f ij ) F = ( u ij ) U h 2 F = T n · U + U · T n 22
2-D Poisson: Eigendecomposition h 2 F = T n · U + U · T n ⇓ z i z T X ≡ j � z i z T � � z i z T � T n · X + X · T n = + T n · · T n j j λ i z i z T z i z T � � = j + λ j j = ( λ i + λ j ) · X ⇓ z i z T = “eigenvector” ∴ j λ i + λ j = eigenvalue 23
Relating T n and T nxn : “vec(X)” and Kronecker products Let vec(X) = stacks columns of matrix X into a single vector � � x 1 · · · x n X ≡ x 1 . . vec( X ) ≡ . x n Kronecker product a 1 , 1 B a 1 ,n B · · · A ⊗ B ≡ . . . . . . a m, 1 B a m,n B · · · 24
Facts about “vec( X )” and “ ⊗ ” vec( A + B ) = vec( A ) + vec( B ) vec( A · X ) = ( I ⊗ A ) · vec( X ) vec( X · B ) = ( B ⊗ I ) · vec( X ) A ⊗ ( B ⊗ C ) = ( A ⊗ B ) ⊗ C ( A · C ) ⊗ ( B · D ) = ( A ⊗ B ) · ( C ⊗ D ) A T ⊗ B T ( A ⊗ B ) T = A − 1 ⊗ B − 1 ( A ⊗ B ) − 1 = Note : In MATLAB, kron(A,B) computes A ⊗ B 25
Relating T n and T nxn h 2 F = T n · U + U · T n ⇓ h 2 vec( F ) = (( I ⊗ T n ) + ( T n ⊗ I )) · vec( U ) ⇓ ( I ⊗ T n ) + ( T n ⊗ I ) T n × n ≡ ( Z ⊗ Z ) · ( I ⊗ Λ + Λ ⊗ I ) · ( Z ⊗ Z ) T = 26
Extending to higher dimensions Z Λ Z T = T n ⇓ = ( I ⊗ T n ) + ( T n ⊗ I ) T n × n = ( Z ⊗ Z ) · ( I ⊗ Λ + Λ ⊗ I ) ⇓ = ( I ⊗ I ⊗ T n ) + ( I ⊗ T n ⊗ I ) + ( T n ⊗ I ⊗ I ) T n × n × n ( Z ⊗ Z ⊗ Z ) · ( I ⊗ I ⊗ Λ + ... ) · ( Z ⊗ Z ⊗ Z ) T = 27
Using the eigendecomposition to solve 1-D Poisson T − 1 · h 2 f = u n ( Z Λ Z T ) − 1 h 2 f = h 2 Z Λ − 1 Z T f = Need a fast Z T *f , Z*x 28
Using the eigendecomposition to solve 2-D Poisson h 2 · ( Z ⊗ Z )( I ⊗ Λ + Λ ⊗ I ) − 1 ( Z ⊗ Z ) T · vec( F ) vec( U ) = ⇓ F ′ = Z T FZ 1 . h 2 f ′ jk u ′ 2 . jk = ∀ j, k λ j + λ k U = ZU ′ Z T 3 . Need a fast Z T *f , Z*x 29
Administrivia 30
Two joint classes with CS 8803 SC Tues 2/19 : Floating-point issues in parallel computing by me Tues 2/26 : GPGPUs by Prof. Hyesoon Kim Both classes meet in Klaus 1116E 31
Administrative stuff Front-end login node: ccil.cc.gatech.edu (CoC Unix account) We “own” warp43—warp56 Some docs ( MPI ): http://www-static.cc.gatech.edu/projects/ihpcl/mpi.html Sign-up for mailing list: https://mailman.cc.gatech.edu/mailman/listinfo/ihpc-lab 32
Homework 1: Parallel conjugate gradients Implement a parallel solver for Ax = b (serial C version provided) Evaluate on three matrices: 27-pt stencil, and two application matrices “Simplified:” No preconditioning Bonus : Reorder, precondition Performance models to understand scalability of your implementation Make measurements Build predictive models Collaboration encouraged: Compare programming models or platforms 33
Fast Fourier transform 34
Discrete Fourier transform (DFT) Recall “Z” � n + 1 sin π ( j + 1)( k + 1) 2 Note: = z jk j, k 0-based n + 1 Multiplying by “Z” is almost the (m-point) DFT: Φ ( m ) x y = Φ ( m ) ω jk � � ≡ 0 ≤ j,k<m = cos 2 π m − i · sin 2 π √ − 2 π i = e i = − 1 m ω m 35
DFT and its inverse ω jk � � Φ ≡ 0 ≤ j,k<m Φ · x = y ⇓ m − 1 � ω jk x k = y j k =0 ⇓ 1 Φ − 1 m Φ ∗ = m − 1 1 � ω − jk y j = x k m j =0 36
DFT as polynomial evaluation Assume m = power of 2 m − 1 m − 1 ω j � k � � ω jk x k = � = y j x k · k =0 k =0 X ( ω j ) ≡ m − 1 � x k · w k X ( w ) ≡ k =0 x 0 + x 2 w 2 + x 4 w 4 + · · · ⇐ Even terms = x 1 + x 3 w 2 + x 5 w 4 + · · · � � ⇐ Odd terms + w · X even( w 2 ) + w · X odd( w 2 ) = 37
Recommend
More recommend