Driving Improvements to Algebraic Multigrid Through Performance Modeling
Hormozd Gahvari 1,3, William Gropp 1, Kirk E. Jordan 2, Martin Schulz 3, Ulrike Meier Yang 3
1 University of Illinois at Urbana-Champaign
2 IBM TJ Watson Research Center
3 Lawrence Livermore National Laboratory
July 6, 2014

Algebraic Multigrid
• Apply multigrid concept: solve original "fine" problem with information from smaller "coarse" problems
• And cycle: Level 0, Level 1, …
• To unstructured grid problems
• Requires two phases:
  1. Setup hierarchy of grids
  2. Solve problem
LLNL-PRES-528011

Performance Issues
• AMG scaled well on IBM Blue Gene/L and Blue Gene/P, but has struggled on other machines like Hera, an Opteron cluster at LLNL
• Poor performance on coarse grids hurts scalability
[Figure: AMG Solve Cycle on Hera; time (s), log scale, vs. level for 128, 1024, and 3456 cores]
[Figure: communication pattern on one of the coarse grids]
• Results are for a 3D 7-point Laplace model problem, 50 x 50 x 25 points/core
• Why was there such degradation here but not on Blue Gene machines?
• Motivation for developing performance model

Performance Model
• Approach: work level-by-level, with α-β model (T_send = α + nβ for message of length n) as baseline
  – Fundamental operations at each level shown in red
    [Diagram: smooth, form residual; restrict to level i+1; prolong to level i-1; smooth]
  – Treat each operation as MatVec with appropriate operator
• Machine parameters for network and computation rate measured using benchmarks
• Communication, computation counts are available from solver data structures
7/6/14 LLNL-PRES-656515
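The baseline model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `LevelCounts` container, its field names, and the way message costs are summed per operation are assumptions made here for the example. Only T_send = α + nβ comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class LevelCounts:
    """Hypothetical per-process counts for one MatVec-like operation on a level."""
    flops: float    # floating-point operations
    n_msgs: int     # number of messages sent
    n_words: float  # total message length summed over all messages


def t_send(alpha: float, beta: float, n: float) -> float:
    """Baseline alpha-beta cost of sending one message of length n."""
    return alpha + n * beta


def t_matvec(c: LevelCounts, alpha: float, beta: float, t_flop: float) -> float:
    """Modeled time of one operation: computation plus the sum of message costs.

    Summing t_send over all messages gives n_msgs * alpha + n_words * beta.
    """
    return c.flops * t_flop + c.n_msgs * alpha + c.n_words * beta
```

A level's modeled cycle time would then be the sum of `t_matvec` over the smooth, residual, restrict, and prolong operations, each with its own counts.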

Performance Model
• To the baseline models, we add penalties to take architecture into account:
  – Distance of communication: introduce time per hop γ
    • Measured from worst-case, best-case latencies and global network diameter
    • Distance of diam(P) charged to each message
  – Lower effective bandwidth: multiply β by (Hardware Bandwidth)/(MPI Bandwidth), or by (Hardware Bandwidth)/(MPI Bandwidth) + nmsgs/nlinks, depending on machine
  – Multicore penalties:
    • c = number of cores per node
    • P_i = number of "active" processes on level i
    • Multicore latency penalty: multiply α by cP_i/P
    • Multicore distance penalty: multiply γ by cP_i/P
  – Hybrid MPI/OpenMP: if using j threads, multiply time per flop by (Mem BW for 1 thread)/(Mem BW for j threads)
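The penalties can be combined into one per-message cost, sketched below. Several points are assumptions made for this example and not taken from the slides: the additive form α + γ·diam(P) + nβ, the use of the same factor cP_i/P for both multicore penalties, and every function and parameter name.

```python
def penalized_send_time(n, alpha, beta, gamma, diam,
                        hw_bw, mpi_bw,
                        n_msgs=0, n_links=1,
                        cores_per_node=1, active_procs=1, total_procs=1,
                        link_contention=False):
    """Modeled time for one message of length n with architecture penalties.

    Assumed form: alpha' + gamma' * diam + n * beta', where the primed
    parameters are the baseline ones multiplied by their penalties.
    """
    # Multicore penalty factor c * P_i / P, applied to both alpha and gamma.
    multicore = cores_per_node * active_procs / total_procs

    # Bandwidth penalty: hardware/MPI bandwidth ratio, plus messages per
    # link on machines where link contention is modeled.
    bw_factor = hw_bw / mpi_bw
    if link_contention:
        bw_factor += n_msgs / n_links

    return alpha * multicore + gamma * multicore * diam + n * beta * bw_factor


def hybrid_time_per_flop(t_flop, mem_bw_1_thread, mem_bw_j_threads):
    """Scale time per flop for j threads by the memory bandwidth ratio."""
    return t_flop * mem_bw_1_thread / mem_bw_j_threads
```

With all penalties switched off (one core per node, MPI bandwidth equal to hardware bandwidth, γ = 0), the function reduces to the baseline α + nβ.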

Performance Model
Intrepid (Blue Gene/P at Argonne) vs. Hera (Opteron Cluster at LLNL)
[Figure: Cycle Time by Level on Intrepid, 8192 Processes, and Cycle Time by Level on Hera, 1024 Processes; time (s), log scale, vs. level; curves for the α-β model, the α-β-γ model, and the penalty combinations (β; α, β; β, γ; α, β, γ)]
Intrepid:
• Fine grid dominates performance
• Only β penalty applies
• α-β model close to actual
Hera:
• Coarse grids dominate performance
• All penalties apply
• α-β model much different from actual

Observations
• The issue is not the communication itself but the ability of the interconnect to handle it
• Trend towards more on-node parallelism means we cannot rely on interconnects
  – Hera = worst case scenario
  – However, even something between current-generation machines and Hera would be very problematic
• Model gives us a way forward. We will show how to use it to
  1. Guide data redistribution that trades communication for computation
  2. Guide thread/task mix selection in hybrid MPI/OpenMP

Data Redistribution in AMG
• Idea that has gained traction:
  – Concentrate data on coarse grids so that fewer messages are sent
  – Has been done with or without redundant replication
• Illustration of redistribution strategy:
  – Split problem domain into chunks (blue boxes)
  – Processes within a chunk have same part of domain
  – Redundant version shown with 12 processes and 4 chunks
  – Nonredundant version would keep just one color group
• Performance model can be adjusted to model this:
  – At level where redistribution is performed, charge for needed collective operations
  – On this and coarser levels, communication is with at most C-1 partners. Adjust computation based on amount of data concentrated
  – Adjust time per flop based on "problem size classification" (parts of data that fit in cache)

Guiding Data Redistribution
• At each coarse grid in setup phase, use model to estimate:
  1. Time spent at that level in solve cycle when redistributing (T_switch)
  2. And when not redistributing (T_noswitch)
  3. If T_switch < T_noswitch, then redistribute
• Requires extra information:
  – Interpolation operator unavailable: substitute MatVec with solve operator
  – Time per flop unknown: measure with MatVecs using local portion of parallel data
• Other concerns:
  – Time per flop changes after redistribution: do not change it in model, but prevent redistribution if problem size classification increases
  – Hybrid MPI/OpenMP use? Requires essentially no change! Implicit in measurement of time per flop
  – Number of chunks to carve problem into? For quick setup, search powers of 2 <= max # sends
  – Possible overeager switching: keep track of running estimated cycle time, do not switch if overall modeled improvement is < 5%
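The decision procedure can be sketched as follows. This is an illustrative reading of the slide, not the solver's actual code: the function names, the `t_switch_for` callback that returns the modeled level time for a given chunk count, and the choice to measure the 5% improvement relative to the running estimated cycle time are all assumptions. The check on problem size classification is left out.

```python
def powers_of_two_up_to(limit):
    """Candidate chunk counts: 1, 2, 4, ... not exceeding limit."""
    candidates = []
    c = 1
    while c <= limit:
        candidates.append(c)
        c *= 2
    return candidates


def choose_redistribution(t_noswitch, t_switch_for, max_sends,
                          running_cycle_time, min_gain=0.05):
    """Return the chunk count to redistribute into, or None to keep the layout.

    t_noswitch:         modeled level time without redistribution
    t_switch_for:       hypothetical callable, chunk count -> modeled level time
    max_sends:          maximum number of sends on this level
    running_cycle_time: running estimate of the modeled cycle time
    """
    candidates = powers_of_two_up_to(max_sends)
    if not candidates:
        return None

    # Pick the chunk count with the smallest modeled time when switching.
    best_chunks = min(candidates, key=t_switch_for)
    t_switch = t_switch_for(best_chunks)

    # Redistribute only if the model predicts a faster level ...
    if t_switch >= t_noswitch:
        return None

    # ... and the overall modeled improvement reaches the threshold.
    if (t_noswitch - t_switch) / running_cycle_time < min_gain:
        return None
    return best_chunks
```

Restricting the search to powers of 2 keeps the number of model evaluations logarithmic in the number of sends, which matches the slide's goal of a quick setup phase.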

Redistribution Experiments
• Pair of test problems:
  1. 3D Laplace with 30 x 30 x 30 points/core
  2. Linear Elasticity with ~6,300 points/core
• Used nonredundant redistribution owing to issues with large numbers of MPI communicators at scale
• Ran on three machines:
  – Vulcan: IBM Blue Gene/Q at LLNL
  – Titan: Cray XK7 at ORNL
  – Eos: Cray XC30 at ORNL
