Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.1 - PowerPoint PPT Presentation

Discrete Fourier Transform Convolution Fast Fourier Transform Parallel FFT Parallel Numerical Algorithms Chapter 6 – Matrix Models Section 6.1 – Fast Fourier Transform Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 1 / 34

Discrete Fourier Transform Convolution Fast Fourier Transform Parallel FFT Outline Discrete Fourier Transform 1 Roots of Unity DFT Inverse DFT Convolution 2 Problem Fast Fourier Transform 3 Computing DFT FFT Algorithm Parallel FFT 4 Binary Exchange Parallel FFT Transpose Parallel FFT Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 2 / 34

Discrete Fourier Transform Roots of Unity Convolution DFT Fast Fourier Transform Inverse DFT Parallel FFT Roots of Unity For given integer n , we use notation ω n = cos(2 π/n ) − i sin(2 π/n ) = e − 2 πi/n for primitive n th root of unity, where i = √− 1 n th roots of unity, sometimes called twiddle factors in this context, are then given by ω k n or by ω − k n , k = 0 , . . . , n − 1 For convenience, we will assume that n is power of two, and all logarithms used will be base two We will also index sequences (components of vectors) starting from 0 rather than 1 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 3 / 34

Discrete Fourier Transform Roots of Unity Convolution DFT Fast Fourier Transform Inverse DFT Parallel FFT Discrete Fourier Transform Discrete Fourier Transform , or DFT , of sequence x = [ x 0 , . . . , x n − 1 ] T is sequence y = [ y 0 , . . . , y n − 1 ] T given by n − 1 � x k ω mk y m = n , m = 0 , 1 , . . . , n − 1 k =0 or y = F n x where entries of DFT matrix F n are given by { F n } mk = ω mk n Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 4 / 34

Discrete Fourier Transform Roots of Unity Convolution DFT Fast Fourier Transform Inverse DFT Parallel FFT Inverse DFT It is easily seen that F − 1 = (1 /n ) F H n n So inverse DFT is given by n − 1 x k = 1 � y m ω − mk k = 0 , 1 , . . . , n − 1 n n m =0 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 5 / 34

Discrete Fourier Transform Roots of Unity Convolution DFT Fast Fourier Transform Inverse DFT Parallel FFT Example     1 1 1 1 1 1 1 1 ω 1 ω 2 ω 3 1 1 − i − 1 i     F 4 =  =     ω 2 ω 4 ω 6 1 1 − 1 1 − 1    ω 3 ω 6 ω 9 1 1 i − 1 − i     1 1 1 1 1 1 1 1 ω − 1 ω − 2 ω − 3 1 1 i − 1 − i     4 F − 1 =  =     ω − 2 ω − 4 ω − 6 4 1 1 − 1 1 − 1    ω − 3 ω − 6 ω − 9 1 1 − i − 1 i Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 6 / 34

Discrete Fourier Transform Convolution Problem Fast Fourier Transform Parallel FFT Convolution Convolution takes input a and b and computes c k � ∀ k ∈ [0 , n − 1] c k = a j b k − j j =0 If a and b are coefficients of degree n/ 2 − 1 polynomials n/ 2 − 1 n/ 2 − 1 � � a k x k , b k x k p a ( x ) = p b ( x ) = k =0 k =0 the convolution computes the coefficients c of the product n − 1 � c k x k p c ( x ) = p a ( x ) p b ( x ) = k =0 naive evaluation costs O ( n 2 ) operations Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 7 / 34

Discrete Fourier Transform Convolution Problem Fast Fourier Transform Parallel FFT Convolution and Toeplitz Matrices Convolution can be interpreted as matrix-vector multiplication with a triangular Toeplitz matrix   b 0 b 1 b 2 b 3 0 b 0 b 1 b 2   [ c 0 c 1 c 2 c 3 ] = [ a 1 a 2 a 3 a 4 ]   0 0 b 0 b 1   0 0 0 b 0 Toeplitz and Hankel matrices (in the latter, each antidiagonal is defined by a single element) provide a general matrix representation for convolutional operators Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 8 / 34

Discrete Fourier Transform Convolution Problem Fast Fourier Transform Parallel FFT Convolution via Interpolation by DFT The DFT, F n a evaluates polynomial p a at each ω j The values of p c at each ω j are then easily obtained p c ( ω j ) = p a ( ω j ) p b ( ω j ) The inverse DFT, F − 1 n p c ( x ) interpolates the values of the polynomial p c at each ω j producing its coefficients c The overall procedure is described by c = F − 1 n [( F n a ) ⊙ ( F n b )] where ⊙ is an elementwise product ( a and b are padded with trailing zeros) Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 9 / 34

Discrete Fourier Transform Convolution Problem Fast Fourier Transform Parallel FFT Convolution via DFT Lets write out the full expression c k = 1 � � �� ω − ks ω sj ω st n a j n b t n n s j t Rearrange the order of the summations to see what happens to every product of a and b c k = 1 � � � ω ( j + t − k ) s a j b t n n s j t For any u = j + t − k � = 0 , we observe � n ) s = 0 s ( ω u When j + t − k = 0 the products ω ( s + t − j ) k = 1 , so there are n n nonzero terms a j b k − j in the summation Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 10 / 34

Discrete Fourier Transform Computing DFT Convolution Fast Fourier Transform FFT Algorithm Parallel FFT Computing DFT To illustrate, consider computing DFT for n = 4 , 3 � x k ω mk y m = n , m = 0 , . . . , 3 k =0 Writing out equations in full, x 0 ω 0 n + x 1 ω 0 n + x 2 ω 0 n + x 3 ω 0 y 0 = n x 0 ω 0 n + x 1 ω 1 n + x 2 ω 2 n + x 3 ω 3 y 1 = n x 0 ω 0 n + x 1 ω 2 n + x 2 ω 4 n + x 3 ω 6 y 2 = n x 0 ω 0 n + x 1 ω 3 n + x 2 ω 6 n + x 3 ω 9 y 3 = n Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 11 / 34

Discrete Fourier Transform Computing DFT Convolution Fast Fourier Transform FFT Algorithm Parallel FFT Computing DFT Noting that ω 0 n = ω 4 ω 2 n = ω 6 ω 9 n = ω 1 n = 1 , n = − 1 , n and regrouping, we obtain ( x 0 + ω 0 n x 2 ) + ω 0 n ( x 1 + ω 0 y 0 = n x 3 ) ( x 0 − ω 0 n x 2 ) + ω 1 n ( x 1 − ω 0 y 1 = n x 3 ) ( x 0 + ω 0 n x 2 ) + ω 2 n ( x 1 + ω 0 y 2 = n x 3 ) ( x 0 − ω 0 n x 2 ) + ω 3 n ( x 1 − ω 0 y 3 = n x 3 ) DFT can now be computed with only 8 additions and 6 multiplications, instead of expected (4 − 1) ∗ 4 = 12 additions and 4 2 = 16 multiplications Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 12 / 34

Discrete Fourier Transform Computing DFT Convolution Fast Fourier Transform FFT Algorithm Parallel FFT Computing DFT Actually, even fewer multiplications are required for this small case, since ω 0 n = 1 , but we have tried to illustrate how algorithm works in general Main point is that computing DFT of original 4 -point sequence has been reduced to computing DFT of its two 2 -point even and odd subsequences This property holds in general: DFT of n -point sequence can be computed by breaking it into two DFTs of half length, provided n is even Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 13 / 34

Discrete Fourier Transform Computing DFT Convolution Fast Fourier Transform FFT Algorithm Parallel FFT Computing DFT General pattern becomes clearer when viewed in terms of first few Fourier matrices   1 1 1 1 � 1 � 1 1 − i − 1 i   F 1 = 1 , F 2 = , F 4 =   1 − 1 1 − 1 1 − 1   1 i − 1 − i Let P 4 be permutation matrix   1 0 0 0 0 0 1 0   P 4 =   0 1 0 0   0 0 0 1 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 14 / 34

Discrete Fourier Transform Computing DFT Convolution Fast Fourier Transform FFT Algorithm Parallel FFT Computing DFT Let D 2 be diagonal matrix � 1 � 0 D 2 = diag(1 , ω 4 ) = 0 − i Then we have   1 1 1 1 � F 2 � 1 − 1 − i i D 2 F 2   F 4 P 4 =  =   1 1 − 1 − 1 − D 2 F 2 F 2  1 − 1 i − i Thus, F 4 can be rearranged so that each block is diagonally scaled version of F 2 Such hierarchical splitting can be carried out at each level, provided number of points is even Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 15 / 34

Discrete Fourier Transform Computing DFT Convolution Fast Fourier Transform FFT Algorithm Parallel FFT Computing DFT In general, P n is permutation that groups even-numbered columns of F n before odd-numbered columns, and � � 1 , ω n , . . . , ω ( n/ 2) − 1 D n/ 2 = diag n To apply F n to sequence of length n , we need merely apply F n/ 2 to its even and odd subsequences and scale results, where necessary, by ± D n/ 2 Resulting recursive divide-and-conquer algorithm for computing DFT is called Fast Fourier Transform , or FFT FFT is particular way of computing DFT efficiently Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 16 / 34

Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.1 - PowerPoint PPT Presentation

Discrete Fourier Transform Convolution Fast Fourier Transform Parallel FFT Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.1 Fast Fourier Transform Michael T. Heath and Edgar Solomonik Department of Computer Science

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.2 Parallel

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.1 Parallel Algorithm

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.3 Parallel

+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms

Parallel Algorithms Parallel Algorithms Examples Examples Concepts & Definitions

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources of

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms for Heterogeneous Parallel Computers Antonio M. Vidal Maci a

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Introduction to Parallel Computing George Karypis Principles of Parallel Algorithm Design

Overview Why Parallel Sorting? Parallel Quicksort Bitonic Sort Parallel Merge Sort

ECC minicourse Daniel J. Bernstein University of Illinois at Chicago Tanja Lange Technische

Rock On with Bank On Joanne Nguyen | Irene Fong A Special Thanks to Our Sponsors Who Are

supplementary slides 2 Revenue split NET INTEREST INCOME * = 60% NON-INTEREST REVENUE = 40%

Natural Language Processing with Deep Learning CS224N/Ling284 Matthew Lamm Lecture

Spooky Claims 2018 Stephen C. Reid, III October 18, 2018 The webinar will begin shortly. In

Peoples United Financial, Inc. (Exact name of registrant as specified in its charter) Delaware

Peoples United Financial, Inc. (Exact name of registrant as specified in its charter) Delaware

A Coupling Library for Partitioned Multi-Physics Simulations 1. What is preCICE? 2. How to get

Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.1 - PowerPoint PPT Presentation

Discrete Fourier Transform Convolution Fast Fourier Transform Parallel FFT Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.1 Fast Fourier Transform Michael T. Heath and Edgar Solomonik Department of Computer Science

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.2 Parallel

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.1 Parallel Algorithm

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.3 Parallel

+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms

Parallel Algorithms Parallel Algorithms Examples Examples Concepts &amp; Definitions

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources of

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms for Heterogeneous Parallel Computers Antonio M. Vidal Maci a

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Introduction to Parallel Computing George Karypis Principles of Parallel Algorithm Design

Overview Why Parallel Sorting? Parallel Quicksort Bitonic Sort Parallel Merge Sort

ECC minicourse Daniel J. Bernstein University of Illinois at Chicago Tanja Lange Technische

Rock On with Bank On Joanne Nguyen | Irene Fong A Special Thanks to Our Sponsors Who Are

supplementary slides 2 Revenue split NET INTEREST INCOME * = 60% NON-INTEREST REVENUE = 40%

Natural Language Processing with Deep Learning CS224N/Ling284 Matthew Lamm Lecture

Spooky Claims 2018 Stephen C. Reid, III October 18, 2018 The webinar will begin shortly. In

Peoples United Financial, Inc. (Exact name of registrant as specified in its charter) Delaware

Peoples United Financial, Inc. (Exact name of registrant as specified in its charter) Delaware

A Coupling Library for Partitioned Multi-Physics Simulations 1. What is preCICE? 2. How to get

Parallel Algorithms Parallel Algorithms Examples Examples Concepts & Definitions