Outline
Problem Definition
Overview of FMM
Parallel FMM
Space Filling Curves and Compressed Octrees
Parallel Compressed Octrees
Computing Translations
Octree Textures on GPU
Problem Definition
To implement the Parallel Fast Multipole Method (FMM) on graphics hardware
Parallel FMM using multiprocessors (already done)
FMM using GPUs (to be done)
Fast Multipole Method
The FMM is concerned with evaluating the effect of a "set of sources" on a set of "evaluation points".
More formally, given sources x_1, ..., x_N with strengths q_1, ..., q_N and evaluation points y_1, ..., y_M, we wish to evaluate the sums
f(y_j) = \sum_{i=1}^{N} q_i K(y_j, x_i),  j = 1, ..., M
Total complexity of direct evaluation: O(NM)
FMM attempts to reduce this complexity to O(N + M)
The two main insights that make this possible are:
a factorization (degenerate, separable expansion) of the kernel into source and receiver terms
many application domains do not require the function to be calculated at high accuracy
FMM follows a hierarchical (quadtree/octree) decomposition of the domain; each node has associated multipole and local expansions
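As an illustration of the first insight, here is a sketch in generic notation (the symbols A_p, B_p and the number of terms P are mine, not the deck's):

```latex
% Degenerate (separable) expansion of the kernel: P terms split the dependence
% on the evaluation point y from the dependence on the source point x.
K(y, x) \approx \sum_{p=1}^{P} A_p(y)\, B_p(x)
\qquad\Longrightarrow\qquad
f(y_j) \approx \sum_{p=1}^{P} A_p(y_j) \sum_{i=1}^{N} q_i\, B_p(x_i)
```

The inner sums over the sources do not depend on the evaluation point, so they are computed once and reused, giving roughly O(P(N + M)) work instead of O(NM).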
Building Interaction Lists
Each node has two kinds of interaction lists: a Far Cell List and a Near Cell List
There is no far cell list at levels 0 and 1, since every cell is a near neighbor of every other cell
Transfer of energy from near neighbors happens only for leaves (a sketch of far-list construction follows below)
Next: Passes of FMM
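A minimal sketch of far-list construction for a uniform 2D quadtree cell, using the standard definition (children of the parent's near neighbors that are not themselves near neighbors); the types and names are mine, not the deck's:

```cpp
// Far-cell ("interaction") list of a quadtree cell, given by integer
// coordinates at its level (0 .. 2^level - 1 per dimension).
#include <cstdlib>
#include <vector>

struct Cell { int x, y, level; };

static bool isNear(const Cell& a, const Cell& b) {
    return std::abs(a.x - b.x) <= 1 && std::abs(a.y - b.y) <= 1;
}

std::vector<Cell> farList(const Cell& c) {
    std::vector<Cell> far;
    if (c.level < 2) return far;            // no far list at levels 0 and 1
    int n = 1 << c.level;                   // cells per side at this level
    Cell parent{c.x / 2, c.y / 2, c.level - 1};
    for (int dx = -1; dx <= 1; ++dx)        // parent's near neighbors (incl. itself)
        for (int dy = -1; dy <= 1; ++dy) {
            int px = parent.x + dx, py = parent.y + dy;
            if (px < 0 || py < 0 || px >= n / 2 || py >= n / 2) continue;
            for (int cx = 0; cx < 2; ++cx)  // their children at c's level
                for (int cy = 0; cy < 2; ++cy) {
                    Cell cand{2 * px + cx, 2 * py + cy, c.level};
                    if (!isNear(c, cand)) far.push_back(cand);
                }
        }
    return far;                             // at most 27 cells in 2D
}
```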
FMM Algorithm (sequence of figures illustrating the passes of the algorithm)
FMM Algorithm: the direct near-neighbor computation is performed only for leaves of the quadtree (see the skeleton below)
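A skeleton of the passes these figures illustrate, with the kernel-specific operators (P2M, M2M, M2L, L2L, L2P, P2P) left as placeholder comments; the Node layout is hypothetical, not the deck's implementation:

```cpp
#include <vector>

struct Node {
    std::vector<Node*> children;            // empty for a leaf
    std::vector<Node*> farList, nearList;   // interaction lists
    std::vector<int>   particles;           // indices of contained particles (leaves)
    /* multipole and local expansion coefficients would live here */
};

void upwardPass(Node* v) {                  // compute multipole expansions bottom-up
    for (Node* c : v->children) upwardPass(c);
    if (v->children.empty()) { /* P2M: particles -> multipole of v */ }
    else                     { /* M2M: shift children's multipoles into v */ }
}

void translate(Node* v) {                   // multipole-to-local for every node
    for (Node* u : v->farList) { /* M2L: u's multipole -> v's local expansion */ }
    for (Node* c : v->children) translate(c);
}

void downwardPass(Node* v) {                // propagate local expansions top-down
    for (Node* c : v->children) { /* L2L: shift v's local into c */ downwardPass(c); }
    if (v->children.empty()) {
        /* L2P: evaluate local expansion at v's particles          */
        /* P2P: direct interactions with nearList -- leaves only   */
    }
}
```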
Outline
Space Filling Curves
Parallel Compressed Octrees
Parallel FMM: building interaction lists and computing the various translations
Bitonic Sort
Parallel Prefix Sum
Space Filling Curves
Consider the recursive bisection of a 2D area into non-overlapping cells of equal size
A space filling curve (SFC) is a mapping of these cells to a one-dimensional linear ordering
Example: Z-SFC for k = 2
SFC Construction
A comparison-based sort to order the cells is expensive, since typically the number of cells is large
Instead, represent the integer coordinates of a cell using k bits per dimension, then interleave the bits, starting from the first dimension, to form a single integer
The index of the cell with coordinates (x, y) is this interleaved integer; it takes O(k) time to find the index (see the sketch below)
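A minimal sketch of the bit-interleaving step (the names and the choice of which dimension supplies the low bit of each pair are mine):

```cpp
#include <cstdint>

// Interleave the low k bits of x and y (x as the "first dimension"
// supplying the lower bit of each pair), giving a 2k-bit Z-SFC cell index.
uint64_t zIndex(uint32_t x, uint32_t y, unsigned k) {
    uint64_t index = 0;
    for (unsigned b = 0; b < k; ++b) {
        index |= (uint64_t)((x >> b) & 1u) << (2 * b);
        index |= (uint64_t)((y >> b) & 1u) << (2 * b + 1);
    }
    return index;   // O(k) time, matching the claim above
}
```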
Outline
Space Filling Curves
Parallel Compressed Octrees
Parallel FMM: building interaction lists and computing the various translations
Bitonic Sort
Parallel Prefix Sum
Octrees
(Figure: an example set of 10 numbered points and the octree built over them)
Compressed Octrees
Each node in a compressed octree is either a leaf or has at least 2 children
(Figure: the octree for the same 10 points and its compressed version)
Encapsulating spatial information lost in compression
Store 2 cells in each node of the compressed octree
Small cell: the smallest cell that encloses all the points the node represents
Large cell: the largest cell that encloses all the points the node represents and no other points
(Figure: small and large cells for the example tree)
Octrees and SFCs
Octrees can be viewed as multiple SFCs at various resolutions
To establish a total order on the cells of an octree, given 2 cells:
if one is contained in the other, the subcell is taken to precede the supercell
if they are disjoint, order them according to the order of the immediate subcells of the smallest supercell enclosing them
The resulting linearization is identical to a postorder traversal of the octree
(Figure: a 2 x 2 grid of cells labeled 00, 01, 10, 11 at two resolutions)
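A sketch of this total order as a comparison on (Z-index, level) pairs, in my own notation (quadtree case, 2 bits per level):

```cpp
#include <cstdint>

struct CellKey { uint64_t index; unsigned level; };

bool contains(const CellKey& a, const CellKey& b) {   // does a enclose b?
    return a.level <= b.level &&
           (b.index >> (2 * (b.level - a.level))) == a.index;
}

// true if a precedes b in the SFC linearization (postorder of the tree)
bool precedes(const CellKey& a, const CellKey& b) {
    if (contains(a, b)) return false;                  // supercell comes after
    if (contains(b, a)) return true;                   // subcell comes first
    unsigned L = a.level > b.level ? a.level : b.level;
    uint64_t ia = a.index << (2 * (L - a.level));      // align both to level L
    uint64_t ib = b.index << (2 * (L - b.level));
    return ia < ib;                                    // order of the enclosing subcells
}
```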
Parallel Compressed Octree Construction
Consider n points equally distributed across p processors; let k be the pre-specified maximum resolution
For each point, generate the index of the leaf cell containing it, i.e. the cell at the maximum resolution
Parallel sort the leaf indices to compute their SFC-linearization, i.e. the left-to-right order of leaves in the compressed octree
Each processor obtains the leftmost leaf cell of the next processor. Why? So that nodes spanning the boundary between two processors are still generated
On each processor, construct a local compressed octree for the leaf cells within it and the borrowed leaf cell
Send the out-of-order nodes to the appropriate processors
Insert the received out-of-order nodes into the already existing sorted order of nodes
(A sketch of the leaf-index generation and sort appears below)
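A minimal single-processor sketch of the first two steps (my names; std::sort stands in for the parallel, e.g. bitonic, sort used across processors):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Point { double x, y; };                     // assumed to lie in [0,1) x [0,1)

uint64_t leafIndex(const Point& p, unsigned k) {
    uint32_t cx = (uint32_t)(p.x * (1u << k));     // integer cell coordinates
    uint32_t cy = (uint32_t)(p.y * (1u << k));     // at the maximum resolution k
    uint64_t idx = 0;
    for (unsigned b = 0; b < k; ++b) {             // bit interleaving (Z-SFC)
        idx |= (uint64_t)((cx >> b) & 1u) << (2 * b);
        idx |= (uint64_t)((cy >> b) & 1u) << (2 * b + 1);
    }
    return idx;
}

std::vector<uint64_t> sortedLeafIndices(const std::vector<Point>& pts, unsigned k) {
    std::vector<uint64_t> idx;
    for (const Point& p : pts) idx.push_back(leafIndex(p, k));
    std::sort(idx.begin(), idx.end());             // SFC linearization of the leaves
    return idx;
}
```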
Outline
Space Filling Curves
Parallel Compressed Octrees
Parallel FMM: building interaction lists and computing the various translations
Bitonic Sort
Parallel Prefix Sum
Parallel FMM
The FMM computation consists of the following phases:
Building the compressed octree
Building interaction lists
Computing multipole expansions using a bottom-up traversal
Computing multipole-to-local translations for each cell using its interaction list
Computing the local expansions using a top-down traversal
Projecting the field at leaf cells back to the particles
Computing Multipole Expansions
Each processor scans its local array from left to right
If a leaf node is reached, compute its multipole expansion directly
If a node's multipole expansion is known, shift and add it to the parent's multipole expansion, provided the parent is local to the processor
Use of postorder? A node is visited only after all of its descendants, so its children's contributions are available when it is reached
If the multipole expansion of a cell is known but its parent lies on a different processor, the cell is labeled a residual node
If the multipole expansion at a node is not yet computed when it is visited, it is also labeled a residual node
Residual nodes form a tree (termed the residual tree)
The tree is present in its postorder traversal order, distributed across the processors
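A minimal sketch of this left-to-right scan, under an assumed node layout (postorder array with local parent indices; childrenPending is initialized to each node's total number of children, local and remote; P2M/M2M are placeholders):

```cpp
#include <vector>

struct OctNode {
    bool isLeaf;
    int  parent;            // index in the local array, or -1 if the parent is remote
    int  childrenPending;   // children whose contribution has not yet arrived
    bool residual = false;
    /* multipole coefficients would live here */
};

void localUpwardScan(std::vector<OctNode>& nodes) {
    for (OctNode& v : nodes) {                        // postorder: children precede parents
        bool known = v.isLeaf || v.childrenPending == 0;
        if (v.isLeaf) { /* P2M: particles -> multipole of v */ }
        if (!known) { v.residual = true; continue; }  // some child lives on another processor
        if (v.parent >= 0) {
            /* M2M: shift v's multipole into nodes[v.parent] */
            --nodes[v.parent].childrenPending;
        } else {
            v.residual = true;                        // multipole known, but parent is remote
        }
    }
}
```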
Multipole expansions on the residual tree can be computed using an efficient parallel upward tree accumulation algorithm
The residual tree can be accumulated in far fewer rounds than would be needed for the global compressed octree
Thus, the worst-case number of communication rounds is reduced from the height of the global compressed octree to the height of the residual tree, which is much smaller
Computing Multipole to Local Translations
An all-to-all communication is used to receive the fields of interaction-list nodes that reside on remote processors
Once all the information is available locally, the multipole-to-local translations are carried out within each processor in much the same way as in the sequential FMM
Computing Local Expansions
Similar to computing multipole expansions
Calculate local expansions for the residual tree
Compute local expansions for the local tree using a (right-to-left, i.e. reverse postorder) scan of the local array
The exact number of communication rounds required is the same as in computing multipole expansions
Octree Textures on GPU
The tree is stored in a texture; in the example, node A (the root) sits at grid position (0,0), B at (1,0), C at (2,0), and D at (3,0)
The content of the leaves is directly stored as an RGB value
The alpha channel is used to distinguish between an index to a child and the content of a leaf:
alpha = 1 : data
alpha = 0.5 : index
alpha = 0 : empty cell
Retrieve the value stored in the tree at a point M ∈ [0,1] × [0,1]
The tree lookup starts from the root and successively visits the nodes containing the point M until a leaf is reached
Depth 0, node A (root): I_0 = (0,0)
P_x = (I_0x + frac(M_x · 2^0)) / S_x,  P_y = (I_0y + frac(M_y · 2^0)) / S_y
frac(A) denotes the fractional part of A; S_x × S_y is the size of the node grid in the texture (4 × 1 in this example)
Let M = (0.7, 0.7)
Coordinates of M within grid A = frac(M · 2^0) = frac(0.7 × 1) = 0.7
x coordinate of the lookup point P in the texture: P_x = (I_0x + frac(M_x · 2^0)) / S_x = (0 + 0.7)/4 = 0.175
y coordinate of the lookup point P in the texture: P_y = (I_0y + frac(M_y · 2^0)) / S_y = (0 + 0.7)/1 = 0.7
Depth 1, node B: I_1 = (1,0)
P_x = (I_1x + frac(M_x · 2^1)) / S_x,  P_y = (I_1y + frac(M_y · 2^1)) / S_y
M = (0.7, 0.7)
Coordinates of M within grid B = frac(M · 2^1) = frac(0.7 × 2) = 0.4
x coordinate of the lookup point P in the texture: P_x = (I_1x + frac(M_x · 2^1)) / S_x = (1 + 0.4)/4 = 0.35
y coordinate of the lookup point P in the texture: P_y = (I_1y + frac(M_y · 2^1)) / S_y = (0 + 0.4)/1 = 0.4
Depth 2, node C: I_2 = (2,0)
P_x = (I_2x + frac(M_x · 2^2)) / S_x,  P_y = (I_2y + frac(M_y · 2^2)) / S_y
M = (0.7, 0.7)
Coordinates of M within grid C = frac(M · 2^2) = frac(0.7 × 4) = 0.8
x coordinate of the lookup point P in the texture: P_x = (I_2x + frac(M_x · 2^2)) / S_x = (2 + 0.8)/4 = 0.7
y coordinate of the lookup point P in the texture: P_y = (I_2y + frac(M_y · 2^2)) / S_y = (0 + 0.8)/1 = 0.8
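A minimal CPU-side sketch of this lookup loop; the texture layout (an S_x × S_y grid of 2 × 2 node blocks) and the convention that the (r, g) channels of an index texel hold the child node's grid position are assumptions for illustration, not GPU Gems' exact encoding:

```cpp
#include <cmath>

struct RGBA { float r, g, b, a; };   // a = 1: leaf data, a = 0.5: index, a = 0: empty

RGBA lookup(const RGBA* tex, int Sx, int Sy, float mx, float my) {
    float Ix = 0.0f, Iy = 0.0f;                       // start at the root node (0,0)
    float scale = 1.0f;                               // 2^depth
    for (int depth = 0; depth < 32; ++depth) {
        float fx = mx * scale - std::floor(mx * scale);   // frac(M * 2^depth)
        float fy = my * scale - std::floor(my * scale);
        // P = (I + frac(M * 2^depth)) / S, then scaled to texel coordinates
        int tx = (int)((Ix + fx) / Sx * (2 * Sx));
        int ty = (int)((Iy + fy) / Sy * (2 * Sy));
        RGBA c = tex[ty * (2 * Sx) + tx];
        if (c.a != 0.5f) return c;                    // leaf data or empty cell: done
        Ix = c.r; Iy = c.g;                           // descend: (r,g) holds the child node position
        scale *= 2.0f;
    }
    return RGBA{0, 0, 0, 0};                          // depth limit reached
}
```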
References
L. Greengard and V. Rokhlin. A Fast Algorithm for Particle Simulations. Journal of Computational Physics, 73:325–348, 1987.
J. Carrier, L. Greengard, and V. Rokhlin. A Fast Adaptive Multipole Algorithm for Particle Simulations. SIAM Journal on Scientific and Statistical Computing, 9:669–686, July 1988.
R. Beatson and L. Greengard. A Short Course on Fast Multipole Methods.
B. Hariharan and S. Aluru. Efficient Parallel Algorithms and Software for Compressed Octrees with Applications to Hierarchical Methods. Parallel Computing, 31:311–331, 2005.
B. Hariharan, S. Aluru, and B. Shanker. A Scalable Parallel Fast Multipole Method for Analysis of Scattering from Perfect Electrically Conducting Surfaces. Proc. Supercomputing, page 42, 2002.
References (contd.)
H. Sagan. Space-Filling Curves. Springer-Verlag, 1994.
M. Harris. Parallel Prefix Sum (Scan) with CUDA. http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.htm
S. Lefebvre, S. Hornus, and F. Neyret. Octree Textures on the GPU. In GPU Gems 2, pages 595–614. Addison-Wesley, 2005.
T. W. Christopher. Bitonic Sort Tutorial. http://www.tools-of-computing.com/tc/CS/Sorts/bitonic_sort.htm