Tsunami simulation on FPGA/GPU and its analysis based on Statistical Model Checking
Masahiro Fujita
VLSI Design and Education Center (VDEC), University of Tokyo
2 Outline
• What Tsunami simulation means in this talk
• Acceleration with FPGA/GPU
  – Based on stream processing (pipelining) with loop unrolling
  – Based on parallel processing for decomposed regions
• (Formal verification of those implementations)
  – (Equivalence checking between the FPGA/GPU implementation and the original program in C/Fortran)
  – Just show our strategy
• Statistical model checking
  – On software in Fortran
  – Acceleration with FPGA/GPU
3 Motivation
• Based on the values of many geographically distributed earthquake sensors (wired/wireless), compute how a Tsunami wave will propagate
• Goal: realize supercomputer-level performance in Tsunami simulation with FPGA/GPU — compute/predict the Tsunami as fast and as accurately as possible
(Flow: earthquake sensors → generate initial wave from sensor data → propagate the wave by numerically solving partial differential equations)
4 Tsunami simulation
• Tsunami simulation algorithm: find solutions of the fluid dynamics equations
  – Law of conservation of mass
  – Law of conservation of momentum, with and without bottom friction
• Solved with known boundary conditions and bathymetric input of the region
• Here the above is processed by numerically solving sets of partial differential equations with finite difference methods
5 Partial differential equations to be solved
(Slide shows the mass conservation equation and the momentum equations along the X- and Y-axes, without bottom friction; see the next slide for the equations themselves. Symbols:)
• η = vertical displacement of water above still water (wave height)
• D = total water depth = h + η
• g = acceleration due to gravity
• A = horizontal eddy viscosity
• τ = friction along the x or y direction
• M = water flux discharge along the X direction
• N = water flux discharge along the Y direction
Reference: Tsunami Modeling Manual by Prof. Nobuo Shuto
6 Here we use a simplified model: the linear one (valid if the sea depth is large enough)
• Shallow Water Theory (Long Wave Theory)

Mass conservation:
$$\frac{\partial \eta}{\partial t} + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0$$

Momentum conservation (X- and Y-axes, with Manning bottom friction):
$$\frac{\partial M}{\partial t} + \frac{\partial}{\partial x}\left(\frac{M^2}{D}\right) + \frac{\partial}{\partial y}\left(\frac{MN}{D}\right) + gD\,\frac{\partial \eta}{\partial x} + \frac{gn^2}{D^{7/3}}\,M\sqrt{M^2+N^2} = 0$$
$$\frac{\partial N}{\partial t} + \frac{\partial}{\partial x}\left(\frac{MN}{D}\right) + \frac{\partial}{\partial y}\left(\frac{N^2}{D}\right) + gD\,\frac{\partial \eta}{\partial y} + \frac{gn^2}{D^{7/3}}\,N\sqrt{M^2+N^2} = 0$$

where $\eta$: wave height, $D$: total water depth, $g$: gravity, $n$: Manning roughness coefficient, $M, N$: flux along $x$, $y$.

• Linear Long Wave Theory: dropping the advection and bottom-friction terms gives
$$\frac{\partial \eta}{\partial t} + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0,\qquad \frac{\partial M}{\partial t} + gh\,\frac{\partial \eta}{\partial x} = 0,\qquad \frac{\partial N}{\partial t} + gh\,\frac{\partial \eta}{\partial y} = 0$$
with the total depth approximated by the still water depth $h$.
7 Finite difference methods
• Solution of the mass conservation equation based on the finite difference method:

Z(i,j,t+1) = Z(i,j,t) - (dt/dx) * ( M(i,j,t) - M(i-1,j,t) + N(i,j,t) - N(i,j-1,t) )

which discretizes $\frac{\partial \eta}{\partial t} + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0$, where
• i, j = x, y grid coordinates
• Z(i,j,t) = water surface level at time t; Z(i,j,t+1) = water surface level at time t+dt
• H(i,j,t) = still water depth at time t
• dt = temporal step, dx = spatial step
• M(i,j,t), N(i,j,t) = water flux discharge along the x-/y-axis at time t
(In the code the two time levels t and t+dt are apparently stored as array indices 1 and 2, which is where the M(i,j,1) / Z(i,j,2) notation on the original slide comes from.)
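A minimal C sketch of this wave-height update follows. Assumptions: the array names, float type, and boundary handling are illustrative, not taken from the actual TUNAMI-N1 source; the grid dimensions are those given on slide 11.

    #define NX 1040   /* grid size from slide 11; layout is an assumption */
    #define NY 668

    /* One mass-conservation (wave height) update over the inner grid.
       Boundary cells are assumed to be handled by the open boundary step. */
    void update_wave_height(float Znew[NX][NY], const float Z[NX][NY],
                            const float M[NX][NY], const float N[NX][NY],
                            float dt, float dx)
    {
        const float r = dt / dx;
        for (int i = 1; i < NX; i++) {          /* raster scan */
            for (int j = 1; j < NY; j++) {
                /* Z(i,j,t+1) = Z(i,j,t) - (dt/dx) *
                   (M(i,j,t) - M(i-1,j,t) + N(i,j,t) - N(i,j-1,t)) */
                Znew[i][j] = Z[i][j]
                           - r * (M[i][j] - M[i-1][j] + N[i][j] - N[i][j-1]);
            }
        }
    }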
8 Wave Height Computation
(Timeline diagram: mass conservation (wave height update: H, Z) and momentum conservation (M, N update) alternate on a staggered time grid; each full step advances the simulation by 1 second, taking M, N at time T through the H, Z update to M, N at time T+1.)
9 Target Tsunami Simulator
• "TUNAMI N1" program in FORTRAN
  – Developed by Tohoku University
(Dataflow diagram: the standing water depth H and the earthquake source info give the initial wave height Z0; then, iterating with 1-second time steps, the mass conservation computation updates the wave height Z and the momentum conservation computation updates the momenta M, N; 1040 grids, 1 grid = 1 km x 1 km.)
10 C Implementation (base program)
• The mass and momentum functions are computed alternately
  – Each function raster-scans the grid
• Since there is no data dependency between the computations at different grid points, they can be parallelized, as sketched below
(Diagram: mass conservation (Z update) and momentum conservation (M, N update); no dependency within each phase → parallelizable.)
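As a sketch of this alternating structure (hedged: update_Z, update_MN, and apply_open_boundary are hypothetical names standing in for the real routines; the OpenMP pragma only illustrates the grid-level parallelism claimed above, and NX/NY are as defined in the earlier sketch):

    void update_Z(int i, int j);        /* hypothetical Z update          */
    void update_MN(int i, int j);       /* hypothetical M,N update        */
    void apply_open_boundary(void);     /* hypothetical boundary handling */

    void simulate(int steps)            /* steps = 7200 per slide 11 */
    {
        for (int t = 0; t < steps; t++) {
            /* mass conservation: grid points are independent,
               so the raster scan can run in parallel */
            #pragma omp parallel for
            for (int i = 0; i < NX; i++)
                for (int j = 0; j < NY; j++)
                    update_Z(i, j);

            apply_open_boundary();

            /* momentum conservation: likewise independent per grid point */
            #pragma omp parallel for
            for (int i = 0; i < NX; i++)
                for (int j = 0; j < NY; j++)
                    update_MN(i, j);
        }
    }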
11 Speed of the N1 simulation program
• The original Fortran program has been manually converted into C
  – C is 4 times faster than Fortran in our environment
  – The C version is the base simulator
• Size of simulation area
  – Grid width: 1 km
  – Number of grids: 1040 x 668
• Simulated time
  – 1 time step = 1 sec
  – 7,200 steps computed (2 hours)
• Tsunami simulation time on an Intel microprocessor (i7 @ 2.93 GHz, single core) → 78.7 sec
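For scale, these figures imply a baseline single-core throughput of roughly (arithmetic derived from the numbers above, not stated on the slide):
$$1040 \times 668 \times 7200 \approx 5.0 \times 10^{9}\ \text{cell updates},\qquad \frac{5.0 \times 10^{9}}{78.7\ \text{s}} \approx 6.4 \times 10^{7}\ \text{cell updates/s}$$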
12 Profiling of the computation time of the software
(Flowchart: Start → Input → Initial condition → main loop for t = 1..T { Mass Conservation → Open Boundary Condition → Momentum Conservation } → Stop.)
(Chart: "Simulation Cycles and Computation time of TUNAMI" — computation time [sec] versus simulation cycles t from 3600 to 21600, broken down into momentum conservation, mass conservation, open boundary, and initialization.)
13 Co-execution
• C on microprocessor:
  – Read earthquake information from files
  – Compute initial wave
  – Load initial wave to the DRAM memory of the FPGA board
  – Run the FPGA (CPU idle while the FPGA computes)
  – Read data from DRAM
  – Store results into files
• FPGA/GPU:
  – Main loop: Mass Conservation → Open Boundary Condition → Momentum Conservation, repeated
  – Store results to DRAM
A host-side sketch of this flow appears below.
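In this sketch, all fpga_* calls and the file-handling helpers are hypothetical placeholders for a vendor-specific driver API; no real library is implied, and NX/NY are as defined earlier.

    #include <stddef.h>

    /* hypothetical driver interface (assumed, not a real API) */
    void fpga_write_dram(const void *buf, size_t bytes);
    void fpga_run(int num_steps);
    void fpga_read_dram(void *buf, size_t bytes);

    /* hypothetical file I/O helpers */
    void read_earthquake_info(const char *path, float z[NX][NY]);
    void compute_initial_wave(float z[NX][NY]);
    void write_results(const char *path, const float z[NX][NY]);

    int main(void)
    {
        static float z0[NX][NY], result[NX][NY];

        read_earthquake_info("source.dat", z0); /* read sensor/source files  */
        compute_initial_wave(z0);               /* initial wave height Z0    */

        fpga_write_dram(z0, sizeof z0);         /* load wave to board DRAM   */
        fpga_run(7200);                         /* FPGA runs the main loop;  */
                                                /* the CPU is idle meanwhile */
        fpga_read_dram(result, sizeof result);  /* read results back         */

        write_results("out.dat", result);       /* store results into files  */
        return 0;
    }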
14 Pipeline processing for higher throughput
• Latency
  – After receiving an input, how many cycles are required to generate its output
• Throughput
  – How frequently input data can be processed
• Goal: faster throughput = larger number of pipeline stages
(Diagram: without pipelining, iterations 1..N run back-to-back, each paying the full latency; with pipelining, iterations overlap across the pipeline stages and a new iteration starts every initiation interval, which determines the throughput.)
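In standard pipeline arithmetic (implied by the diagram, not written on the slide), with latency $L$, initiation interval $II$, and $N$ iterations:
$$T_{\text{total}} \approx L + (N-1)\cdot II,\qquad \text{so for large } N:\ \frac{T_{\text{total}}}{N} \to II$$
Hence throughput is set by the initiation interval, not the latency: deeper pipelines can shrink $II$ even when they increase $L$.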
15 Typical way to build larger pipelines
• Usually each loop becomes one pipeline
  – Multiple loops should be merged as much as possible
• The number of pipeline stages depends on the length of each iteration
  – Better to have larger loops
• Various loop optimizations have been proposed
  – Formal analysis becomes possible with such transformations

Example (reconstructed from the slide; the transformation fuses the two loop nests):

    /* original */
    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        A[i][j] += u[i] * v[j];          /* S0(i,j)   */
    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        x[i] += A[j][i] * y[j];          /* S1(i',j') */

    /* transformed */
    for (t1 = 0; t1 < N; t1++)
      for (t2 = 0; t2 < N; t2++) {
        A[t2][t1] += u[t2] * v[t1];      /* S0 */
        x[t1] += A[t2][t1] * y[t2];      /* S1 */
      }

(Diagram: dependences between the S0(i,j) and S1(i',j') instances.)
[Pluto 08] U. Bondhugula, et al., "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer," ACM PLDI'08, 2008.
16 Transforming latency-based to throughput-based computation
• Stream-based programming
  – Communication/buffering becomes explicit
  – Easier for formal analysis as well
• Works for both FPGA and GPU
  – And also for many-cores
(Diagram: a microprocessor with instruction and data memories illustrates the "instruction, latency-based" style; a hardware pipeline of processing elements P1 → P2 → P3 with local memories illustrates the "data, throughput-based" style.)
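A minimal C sketch of the explicit communication/buffering this slide refers to: the ring-buffer FIFO is an illustrative stand-in for an on-chip stream channel between two pipeline stages, not the authors' implementation.

    #include <stddef.h>

    #define FIFO_DEPTH 64

    typedef struct {
        float  buf[FIFO_DEPTH];
        size_t head, tail;   /* producer writes head, consumer reads tail */
    } fifo_t;

    /* producer side: returns 0 (stall) when the channel is full */
    static int fifo_push(fifo_t *f, float v)
    {
        size_t next = (f->head + 1) % FIFO_DEPTH;
        if (next == f->tail) return 0;
        f->buf[f->head] = v;
        f->head = next;
        return 1;
    }

    /* consumer side: returns 0 (stall) when the channel is empty */
    static int fifo_pop(fifo_t *f, float *v)
    {
        if (f->tail == f->head) return 0;
        *v = f->buf[f->tail];
        f->tail = (f->tail + 1) % FIFO_DEPTH;
        return 1;
    }

Making the buffering explicit like this is what turns the instruction/latency view into the data/throughput view: each stage sees only its input and output channels.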
17 Introducing streams
• Stream = sequence of data, functions, or combined
  – One stream for each region
  – One stream for each time step
• Formal equivalence checking with manipulation of the transformation matrix
(Diagram: M, N, H, Z data objects flow from time t to t+1; the original program code itself can be viewed as a single stream.)
18 Strategy for GPU implementation
• As shown earlier, the stream is organized per region (see the sketch after this list)
  – Easier and more efficient for GPU
  – But depends on the memory access architecture of the target GPU system
• Essentially, the area where the Tsunami should be simulated is decomposed into a set of small regions
  – Each GPU core is in charge of one region
  – Straightforward parallelism
  – Pipelined computation inside each core
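A sketch of the region decomposition in C (RX, RY, and update_cell are illustrative assumptions; on a GPU each region would map to one core or thread block rather than iterations of a sequential loop):

    #define RX 8                    /* regions along x (assumption) */
    #define RY 4                    /* regions along y (assumption) */

    void update_cell(int i, int j); /* hypothetical per-cell update */

    /* one worker ("core") owns one region and scans it in a
       pipelined fashion */
    void step_region(int rx, int ry)
    {
        int x0 = rx * (NX / RX), x1 = x0 + NX / RX;
        int y0 = ry * (NY / RY), y1 = y0 + NY / RY;
        for (int i = x0; i < x1; i++)
            for (int j = y0; j < y1; j++)
                update_cell(i, j);
    }

    void step_all_regions(void)
    {
        /* regions are independent within a phase; on a GPU this
           double loop becomes a parallel launch, one region per core */
        for (int rx = 0; rx < RX; rx++)
            for (int ry = 0; ry < RY; ry++)
                step_region(rx, ry);
    }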