Parallel Fast Fourier Transforms Gavin J. Pringle Joahcim Hein

Introduction � The Fourier Transform � What, who, why? � Mathematics and and its inherent properties � Discrete Fourier Transform � Fast Fourier Transform, or FFT � Parallel FFTs � FFT libraries � Fastest Fourier Transform in the West � Configuration, installation, compilation and runtime tuning � Execution times and other users experiences

Fourier Transforms � Jean Baptiste Joseph Fourier (1768-1830) first employed what we now call Fourier transforms whilst working on the theory of heat � �� Mathematical tool which alters the problem to one which is more easily solved � Linear transform which converts temporal or spatial information and converts into information which lies in the frequency domain � And visa versa � Frequency domain also known as Fourier space, Reciprocal space, or G-space

Pictures of Joseph Fourier

Who would use Fourier Transforms? � Physics � Cosmology (P 3 M N -body solvers) � Fluid mechanics � Quantum physics � Signal and image processing � Antenna studies � Optics � Numerical analysis � Linear systems analysis � Boundary value problems � Large integer multiplication (Prime finding) � Statistics � Random process modelling � Probability theory

Fourier Transforms in a nutshell � All periodic signals may be represented by an infinite sum of sines and cosines of different �� The cosines and sines are associated with the symmetrical and asymmetric information, respectively � �� each chunk may be considered periodic. � Fourier transforms encode this information via � � � � � e i cos i sin

The Top Hat function � The top hat function, along with the individual 1 st , 2 nd and 3 rd Fourier components and their sum.

The Top Hat function and its discrete Fourier components animation

The Fourier transform of a continuous Top Hat function

Mathematics of the Fourier Transform � The Fourier transform of a complex function f ( x ) is given as � � � i � � 2 xs F ( s ) f ( x ) e dx � � � The inverse Fourier transform is given as � � i � � 2 xs f ( x ) F ( s ) e ds � � � The Fourier pair is defined as �� f ( x ) F ( s )

Properties 1: Scaling � Time scaling � � 1 s � � � f ( at ) F a � a � � Frequency scaling � � 1 t � � � f F ( bs ) b b � �

Properties 2: Shifting � Time shifting � � � 2 ist f ( t t ) F ( s ) e 0 0 � Frequency shifting � � � � 2 is t f ( t ) e F ( s s ) 0 0

Properties 3: Convolution Theorem � Say we have two functions, g ( t ) and h ( t ), then the convolution of the two functions is defined as � � � � � � � � g h g ( ) h ( t ) d � � � The Fourier transform of the convolution is simply the product of the individual Fourier transforms � � g h G ( s ) H ( s )

Properties 4: Correlation � The correlation of the two functions is defined by � � � � � � � Corr ( g , h ) g ( t ) h ( ) d � � � The Fourier transform of the correlation is simply � � Corr ( g , h ) G ( s ) H ( s )

Discrete Fourier Transform � The discrete Fourier transform of N complex points f k is defined as � N 1 � � � 2 ikn / N F f e n k � k 0 � The discrete inverse Fourier transform, which recovers the set of f k s exactly from F n s is � 1 N 1 � � � � 2 ikn / N f F e k n N � n 0 � Both the input function and its Fourier transform are periodic

Discrete Fourier Transform II � The DFT can be rewritten as � � � � � � � N 1 n n � � � � � � � � � � � � F a a cos 2 k b i sin 2 k � � n 0 k k � N � � N � � � � k 1 � Thus, DFT routines are basically returning real number values for a k and b k , stored in a complex array � a k and b k are functions of f k � remaining trigonometric constants (twiddle factors) may be pre-computed for a given N � The scaling, shifting, convolution and correlation relationships, which hold for the continuous case, also hold for the discrete case.

Fast Fourier Transforms � What is the computational cost of the DFT? � Each of the N points of the DFT is calculated in terms of all the N points in the original function: O ( N 2 ) � In 1965, J.W. Cooley and J.W. Tukey published an DFT algorithm which is of O ( N log N ) � N is a power of 2 � FFTs are not limited to powers of 2, however, the order may resort to O ( N 2 ) � Details are beyond the scope of this talk � F(N) = F(N/2)+F(N/2) � Bit reversal � In hindsight, faster algorithms were previously, independently discovered � Gauss was probably first to use such an algorithm in 1805

Parallel 1D FFT � Parallelisations of a 1D FFT is hard � Typically N � 100 in many scientific codes � Algorithm is hard to decompose � Literature example: Franchetti, Voronenko, Püschel �� SC06, Tampa, FL http://sc06.supercomputing.org/schedule/pdf/pap169.pdf

FFTs in two dimensions � What needs calculating for a 2D FFT: � We may compute this in a 2 separate calculations � as each part is linearly independent

Parallel array transpose � Assignment of a 4x4 grid to 4 processors for an array transpose P1 P2 P3 P4 1 2 3 4 1 2 3 4 P1 P2 5 6 8 5 6 8 7 7 P3 9 10 11 12 9 10 11 12 P4 13 14 15 16 13 14 15 16

Algorithm for distributed 2D FFT on a 1D grid of processors � Calculate 1st FFT in first direction � Perform parallel transpose � MPI_Alltoall � Now, what used to be the columns of the original matrix is now processor local � Now we may perform the 2nd FFT in second direction � Finally, perform parallel transpose back � Sometimes this last expensive step can be avoided � Code performs calculations in Fourier space using this new processor grid

Fourier Transformation of a 3D array � Definition of the Fourier Transformation of a three dimensional array A x,y,z � Can be performed as three subsequent 1 dimensional Fourier Transformations

Parallel FFT of a 3D array � Traditionally: 1 dimensional processor grid � Each processor gets several �� Perform FFT in two of the three directions � Single All-to-all before performing FFT in third direction

Alternatively: 2D processor grid for 3D FFT � �� of the 3D array � Perform FFT in 1 st direction � Perform All-to-all transformation in the columns of the processor grid

2D processor grid for 3D FFT (cont.) � Perform FFT in the 2 nd direction � Perform All-to-all in the rows of the processor grid � Perform 3 rd FFT in the last direction

Performance comparison of 1D pencils vs 2D slabs: IBM BlueGene/L Number of nodes 1 10 100 1000 10000 10.0000 1024 ³ 1D 1024 ³ 2D 1.0000 512 ³ 1D 512 ³ 2D Execution times 256 ³ 1D 0.1000 256 ³ 2D 128 ³ 1D 0.0100 128 ³ 2D 64 ³ 1D 0.0010 64 ³ 2D 32 ³ 1D 0.0001 32 ³ 2D � Heike Jagode, MSc thesis, University of Edinburgh, 2006

Pencils vs Slabs � For 3D data points, users employ 1D or 2D processor grid � 1D processor grid: sticks/pencils � More communications � Requires less memory � In general, better scalability � 2D processor grid: slabs/slices � Less communications � Requires more memory � The optimum choice depends on both the problem and the target platform � Tip: let the physics be your guide and pick the decomposition that suits your problem � Try not to make your code platform-specific

Parallel Fast Fourier Transforms Gavin J. Pringle Joahcim Hein - PowerPoint PPT Presentation

Parallel Fast Fourier Transforms Gavin J. Pringle Joahcim Hein Introduction The Fourier Transform What, who, why? Mathematics and and its inherent properties Discrete Fourier Transform Fast Fourier Transform, or FFT

Fourier Series and Transform Overview Why Fourier transform? Trigonometric functions Who is

Sparse Fourier Transforms Eric Price UT Austin Eric Price Sparse Fourier Transforms 1 / 36

Chapter 4 Chapter 4 The Fourier Series and The Fourier Series and Fourier Transform Fourier

Chapter 4 Chapter 4 The Fourier Series and The Fourier Series and Fourier Transform Fourier

Signals and Systems Chapter 4: The Continuous Time Fourier Transform Derivation of the CT Fourier

Fourier Series Fourier Sine Series Fourier Cosine Series Fourier Series Convergence

Topic 5: Discrete-Time Fourier Transform (DTFT) o DT Fourier Transform o Overview of Fourier

Lecture 5: Statistical models of images Today Review Fourier transforms Image statistics

Topic 4: Continuous-Time Fourier Transform (CTFT) o Introduction to Fourier Transform o Fourier

Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.1 Fast Fourier Transform

Motion Estimation by Affine Transforms Motion Estimation by Affine Transforms Motion Estimation

F F Fast Transforms using the Cell/B.E. Processor Fast Transforms using the Cell/B.E. Processor

Fourier Transform for Partial Differential Equations Introduction: Fourier Transform

Lecture 3.7: Fourier transforms Matthew Macauley Department of Mathematical Sciences Clemson

Chapter 14: Fourier Transforms and Boundary Value Problems in an Unbounded Region Department of

Fourier Transforms and Spectral Analysis Chaiwoot Boonyasiriwat September 17, 2020

TYPES FOR EXACT REAL COMPUTATION (USING AERN2/HASKELL) Michal Kone n, Eike Neumann Aston

Data Management Planning: Get up to date with DMPTool and DMPonline IDCC14 San Francisco, CA

Recursive neural networks for semantic interpretation Sam Bowman Department of Linguistics and

Efficient Ring-LWE Encryption on 8-bit AVR Processors . Zhe Liu 1 Hwajeong Seo 2 Sujoy Sinha Roy

Cryptography for Embedded Devices Tobias Oder Ruhr-University Bochum Workshop on Cryptography

HW/SW Codesign Data Flow Software Implementation I ECE 522 Mapping DFGs to Software There are a

Multi-Dimensional Indexing: FFT Use a transform to generate an index pattern that still

EDUARDO GUENDELMAN, PHYSICS DEPARTMENT, BEN GURION UNIVERSITY, BEER SHEVA, ISRAEL. WITH EMIL