DUNE on Blue Gene/P
Markus Blatt (Markus.Blatt@iwr.uni-heidelberg.de)
joint work with: Olaf Ippisch and Felix Heimann
Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Universität Heidelberg
ScicomP 15, Barcelona, May 21, 2009
M. Blatt (IWR, Heidelberg) DUNE on BG/P ScicomP15, May 21, 2009
Outline
1 DUNE
2 Parallelization Approach
3 Porting to BG/P
4 Scalability
DUNE

Why another framework?
• Lots of good frameworks for PDEs out there.
• Using one, it might be
  • either impossible to have a particular feature,
  • or very inefficient in certain applications.
• Extension of the feature set is usually hard.

Distributed and Unified Numerics Environment
• Separation of data structures and algorithms by abstract interfaces.
• Efficient implementation of these interfaces using generic programming techniques in C++.
• Static polymorphism enables extensive optimization by the compiler.
• Algorithms are parametrized with data structures. The interface is removed at compile time.
• Open Source, available from http://www.dune-project.org
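The statement that algorithms are parametrized with data structures can be illustrated with a minimal sketch. The names `UniformGrid1D`, `ExplicitGrid1D` and `totalVolume` are hypothetical and are not part of the DUNE interfaces; they only show the general technique of static polymorphism through templates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical data structure: a uniform 1D grid on [0, 1).
struct UniformGrid1D {
    std::size_t cells;
    std::size_t size() const { return cells; }
    double volume(std::size_t) const { return 1.0 / static_cast<double>(cells); }
};

// Hypothetical data structure: a 1D grid with explicitly stored cell widths.
struct ExplicitGrid1D {
    std::vector<double> widths;
    std::size_t size() const { return widths.size(); }
    double volume(std::size_t i) const {
        assert(i < widths.size());
        return widths[i];
    }
};

// The algorithm is a template over the grid type. The "interface"
// (size() and volume()) is resolved at compile time, so the calls can be
// inlined and no virtual function tables are involved.
template <class Grid>
double totalVolume(const Grid& grid) {
    double sum = 0.0;
    for (std::size_t i = 0; i < grid.size(); ++i)
        sum += grid.volume(i);
    return sum;
}
```

Calling `totalVolume` with either grid type instantiates a separate, fully specialized version of the loop, which is what allows the compiler to optimize across the interface.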
DUNE is modular

[Figure: module dependency diagram showing dune-common, dune-grid, dune-istl, dune-localfunctions, dune-pdelab, dune-fem, dune-grid-howto and dune-pdelab-howto, together with the external packages Metis, NeuronGrid, SuperLU, UG, Alberta, ALU, VTK and Gmsh]

• Grid interface: (non-)conforming, hierarchically nested, multi-element-type parallel grids in arbitrary space dimensions.
• Iterative Solver Template Library: generic sparse and dense matrix and vector classes supporting recursive block structures. Corresponding (parallel) solvers, e.g. AMG.
• PDELab: discretization module that is closely related to the mathematical formulation of finite element methods.
Sample Simulations
• Flow and transport in porous media
• Neuron network simulation
• Density-driven flow
• Root uptake
Parallelization Approach
Index Based Communication

Goals
• Allow reuse of efficient sequential data structures for computations.
• Let the user initiate communication when needed.
• Support
  • Unstructuredness
  • Adaptivity
  • Communication of different data with the same decomposition.

Approach
• Keep decomposition and communication information outside of data structures.
• Use simple and portable index identification of items.
• Data structures need to be augmented to contain ghost items.
Index Sets

Index Set
• Distributed overlapping index set $I = \bigcup_{p=0}^{P-1} I_p$.
• Process $p$ stores and manages the mapping $I_p \to [0, n_p)$.
• Supports adaptivity.

Global Index
• Identifies a position (index) globally.
• Arbitrary and not consecutive (to support adaptivity).
• Persistent.

Local Index
• Addresses a position in the local container.
• Convertible to an integral type.
• Consecutive index starting from 0.
• Non-persistent.
• Provides an attribute to partition the set.
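The relation between global indices, local indices and attributes can be sketched in a few lines. This is a simplified stand-in, not the DUNE implementation: the class name `IndexSet`, the choice of `long` as global index type and the two attribute values are assumptions made for illustration, and removal or renumbering of indices (needed for adaptivity) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Attribute used to partition the local index set (hypothetical values).
enum class Attribute { owner, ghost };

// A local index: consecutive position in the local container plus attribute.
struct LocalIndex {
    std::size_t position;
    Attribute attribute;
};

// Sketch of the part I_p of the index set that is stored on process p.
// Global indices are arbitrary, non-consecutive, persistent integers.
class IndexSet {
public:
    using GlobalIndex = long;

    // Register a global index; it gets the next free local position.
    // Returns false (and changes nothing) if the index is already present.
    bool add(GlobalIndex g, Attribute a) {
        return map_.emplace(g, LocalIndex{map_.size(), a}).second;
    }

    // Look up the local index of a global index; nullptr if not stored here.
    const LocalIndex* find(GlobalIndex g) const {
        auto it = map_.find(g);
        return it == map_.end() ? nullptr : &it->second;
    }

    // n_p, the number of indices stored on this process.
    std::size_t size() const { return map_.size(); }

private:
    std::map<GlobalIndex, LocalIndex> map_;
};
```

The data itself can then live in an ordinary sequential container such as `std::vector<double>` of length `size()`, addressed through `find(g)->position`, which keeps the decomposition information outside of the data structure.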
Remote Index Information
• Communication between different distributions of the index set is possible, e.g.
  • Data agglomeration onto fewer processes.
  • Data redistribution for load balancing.
• For each remote process, one needs to store all global indices that are also stored on that process, together with the corresponding attribute there.
• The remote index information can either be set up by hand (better efficiency)
• or computed automatically using global communication.
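The core of the automatic setup can be sketched as an intersection: once a process has received the list of (global index, attribute) pairs published by another process, it keeps those entries it also stores itself. The types and the function `intersect` below are hypothetical and only illustrate this step; the message exchange itself is not shown.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

enum class Attribute { owner, ghost };
using GlobalIndex = long;  // assumption: any persistent, comparable id works

// One entry of the remote index information for a given remote process:
// a global index known on both processes and its attribute on the remote side.
struct RemoteIndex {
    GlobalIndex global;
    Attribute remoteAttribute;
};

// Intersect the local index set with the (global index, attribute) pairs
// published by one remote process. The order of the published list is kept.
std::vector<RemoteIndex> intersect(
    const std::map<GlobalIndex, Attribute>& local,
    const std::vector<std::pair<GlobalIndex, Attribute>>& published)
{
    std::vector<RemoteIndex> shared;
    for (const auto& entry : published)
        if (local.count(entry.first) != 0)
            shared.push_back({entry.first, entry.second});
    return shared;
}
```

If every process has to see the published list of every other process, the setup cost grows with the number of processes, which is why setting the information up by hand can be more efficient when the neighbors are known in advance.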
Communication Interface
• Contains information on a specific communication scheme.
• The source and target partitions of the index set are chosen using attribute flags, e.g. from ghost to owner and ghost.
• Still independent of the data to be communicated.
• For each process, a list of the corresponding local indices in the source and target index set is stored.
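How attribute flags select the index lists of an interface can be sketched as follows, for a single remote process. All names (`SharedIndex`, `InterfaceLists`, `buildInterface`) are hypothetical, and the input is assumed to be the already known set of indices shared with that process.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum class Attribute { owner, ghost };

// An index known on both processes: its local position and attribute on this
// process, and its attribute on the remote process.
struct SharedIndex {
    std::size_t localPosition;
    Attribute localAttribute;
    Attribute remoteAttribute;
};

// The interface towards one remote process: which local positions are read
// when sending and which are written when receiving. No data is involved.
struct InterfaceLists {
    std::vector<std::size_t> send;
    std::vector<std::size_t> receive;
};

InterfaceLists buildInterface(const std::vector<SharedIndex>& shared,
                              const std::set<Attribute>& sourceFlags,
                              const std::set<Attribute>& targetFlags)
{
    InterfaceLists lists;
    for (const auto& s : shared) {
        // Send if our copy is in the source partition and the remote copy
        // is in the target partition.
        if (sourceFlags.count(s.localAttribute) && targetFlags.count(s.remoteAttribute))
            lists.send.push_back(s.localPosition);
        // Receive in the mirrored situation.
        if (sourceFlags.count(s.remoteAttribute) && targetFlags.count(s.localAttribute))
            lists.receive.push_back(s.localPosition);
    }
    return lists;
}
```

Because the lists contain only local positions, the same interface can be reused for any data that shares the decomposition.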
Communication
• Communication occurs according to the set-up interfaces.
• Communication is possible in both directions (from source to target and vice versa).
• Data associated with indices can either
  • be of the same size for each index,
  • or of different size for each index.
• Data can be manipulated either at the source or at the target (customizable by the user).
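A minimal sketch of communication along an interface, assuming one value of fixed size per index: values are gathered from the listed positions into a buffer, and a user-supplied policy decides what happens when they are scattered at the receiver. The names `pack`, `unpack`, `CopyPolicy` and `AddPolicy` are hypothetical; the buffer stands in for the message, and the actual transfer (e.g. via MPI) is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// User-customizable treatment of received data (hypothetical policies).
struct CopyPolicy {
    static void scatter(double& target, double received) { target = received; }
};
struct AddPolicy {
    static void scatter(double& target, double received) { target += received; }
};

// Gather: read the values at the listed local positions into a buffer.
std::vector<double> pack(const std::vector<double>& data,
                         const std::vector<std::size_t>& positions)
{
    std::vector<double> buffer;
    buffer.reserve(positions.size());
    for (std::size_t i : positions) {
        assert(i < data.size());
        buffer.push_back(data[i]);
    }
    return buffer;
}

// Scatter: apply the policy to each received value at its local position.
template <class Policy>
void unpack(std::vector<double>& data,
            const std::vector<double>& buffer,
            const std::vector<std::size_t>& positions)
{
    assert(buffer.size() == positions.size());
    for (std::size_t k = 0; k < positions.size(); ++k)
        Policy::scatter(data[positions[k]], buffer[k]);
}
```

Forward communication packs with the source list and unpacks with the target list; backward communication uses the same two lists with their roles swapped. Data of varying size per index would additionally require communicating the sizes, which this sketch does not cover.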
Porting to BG/P
Porting, a piece of cake?

Naive Assumptions
• DUNE uses the autotools toolchain together with a custom script for managing the module dependencies.
• Autotools support cross compilation.
• Configure tests that need to run MPI programs can be switched off.
• DUNE uses standard C++ (but advanced template stuff).
• This should be really easy! Worked on other Linux clusters, too!

The real HPC World
• XLC lacks support for some standard template code (e.g. partial template specialization).
• Libtool gets confused somehow and tries to link shared libraries statically.
• Bottleneck ($O(P)$) in communication setup becomes apparent.
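For readers unfamiliar with the language feature named above, the following is a generic example of partial template specialization in standard C++. It is not the code that failed with XLC, which the slides do not show; the trait `IsVector` is a hypothetical illustration.

```cpp
#include <cassert>
#include <vector>

// Primary template: the generic case.
template <class T>
struct IsVector {
    static constexpr bool value = false;
};

// Partial specialization: still a template (over T and A), but it only
// matches types of the form std::vector<T, A>.
template <class T, class A>
struct IsVector<std::vector<T, A>> {
    static constexpr bool value = true;
};
```

Generic libraries rely on this mechanism to select implementations at compile time, so a compiler that handles it incompletely can fail on otherwise standard-conforming code.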
Problem Resolutions

Missing template support in XLC
• Thank goodness, the GNU C++ compiler is also available!

Libtool problem
• Use special option for Darwin (-dynamic).
• Thanks to Bernd Mohr (JSC) and Frank Ingram (IBM).

$O(P)$ bottleneck
• At the time of programming we were not thinking of more than 512 processors.
• Fortunately, we use a structured tensor product grid for our simulation.
• Therefore we do not need to send all indices around a ring.
• Switched to asynchronous communication with just the neighboring processors. Now $O(3^d)$ for dimension $d$.
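The $O(3^d)$ bound follows from the fact that, on a tensor product decomposition, a process only touches the processes whose coordinates differ by at most one in each direction. A sketch of enumerating these neighbors is given below; the function `neighbourRanks`, the lexicographic rank ordering and the non-periodic boundaries are assumptions for illustration and are not taken from the DUNE sources.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Ranks of the neighbouring processes on a non-periodic D-dimensional
// tensor product process grid with lexicographic rank ordering.
template <int D>
std::vector<int> neighbourRanks(int rank, const std::array<int, D>& procs)
{
    // Decompose the rank into process grid coordinates.
    std::array<int, D> coord{};
    int r = rank;
    for (int d = 0; d < D; ++d) {
        coord[d] = r % procs[d];
        r /= procs[d];
    }

    // There are 3^D offset vectors in {-1, 0, 1}^D.
    int combos = 1;
    for (int d = 0; d < D; ++d) combos *= 3;

    std::vector<int> result;
    for (int c = 0; c < combos; ++c) {
        int code = c, neighbour = 0, stride = 1;
        bool valid = true, self = true;
        for (int d = 0; d < D; ++d) {
            const int offset = code % 3 - 1;  // -1, 0 or +1 in direction d
            code /= 3;
            const int x = coord[d] + offset;
            if (offset != 0) self = false;
            if (x < 0 || x >= procs[d]) valid = false;  // outside the grid
            neighbour += x * stride;
            stride *= procs[d];
        }
        if (valid && !self) result.push_back(neighbour);
    }
    return result;
}
```

Each process therefore exchanges index information with at most $3^d - 1$ partners (8 in 2D, 26 in 3D), independent of the total number of processes $P$.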