Methodology for the Rapid Development of Scalable HPC Data Services
Matthieu Dorier, Philip Carns, Kevin Harms, Robert Latham, Robert Ross, Shane Snyder, Justin Wozniak, Samuel K. Gutiérrez, Bob Robey, Brad Settlemyer, Galen Shipman, Jerome Soumagne, James Kowalkowski, Marc Paterno, Saba Sehrish
PDSW-DISCS 2018, Dallas, TX
New Applications and Systems: Demand for New Services
Three "pillars": Simulation, Data, Learning
• Different application use cases have different data needs.
• "One size fits all" doesn't work: each use case needs customized data services to meet its mission goals.
• This poses a significant technical challenge: how to enable rapid development of such services (agility) while still preserving performance (efficiency) and production quality (maintainability)?
Key idea: address this challenge via composable data services.
Top image credit B. Helland (ASCR). Bottom left and right images credit ALCF. Bottom center image credit OLCF.
Towards reusable components for data services
Parallel File Systems
• Advantages: well established; standard interface
• Drawbacks: single consistency model; complex to maintain and tune; files are often an inappropriate abstraction
Hand-crafted Data Services
• Advantages: tuned for this application; appropriate consistency model; appropriate data model
• Drawbacks: difficult to maintain; not reusable; scary to users
Composable micro-services
• Advantages: reusable across services; easy to maintain; not so scary to users; adaptable and configurable; can use the latest technology
Common capabilities needed by data services
● Runtime substrate
  ○ RPC, RDMA
  ○ Threading/Tasking
● Core components
  ○ Bulk storage management
  ○ Key/Value storage
  ○ Group membership
  ○ Diagnostics and monitoring
● Programmability/Expressiveness
  ○ Embedded interpreters
  ○ Wrappers (Python, C++, etc.)
● Composed services
  ○ FlameStore
  ○ HEPnOS
  ○ SDSDKV
Challenges in Composing HPC Microservices
● Formalize composition
● Unify single-process, multi-process, single-node, and multi-node designs
● Maximize efficient use of resources (network, storage)
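One way to read "unify single-process, multi-process, single-node, and multi-node designs" is that client code should not change when a component moves between processes or nodes. The Python sketch below illustrates that idea under stated assumptions: `KVProvider`, `LocalTransport`, `SerializingTransport`, and `KVClient` are hypothetical names, not Mochi APIs, and `SerializingTransport` only imitates the serialization boundary an RPC introduces; it does no networking.

```python
import pickle
from typing import Any, Dict, Optional, Protocol


class KVProvider:
    """Hypothetical in-memory key/value provider (illustrative, not a Mochi API)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def handle(self, op: str, key: str, value: bytes = b"") -> Optional[bytes]:
        if op == "put":
            self._store[key] = value
            return None
        if op == "get":
            return self._store.get(key)
        raise ValueError(f"unknown operation: {op}")


class Transport(Protocol):
    def call(self, op: str, *args: Any) -> Any: ...


class LocalTransport:
    """Co-located case: the client calls the provider directly."""

    def __init__(self, provider: KVProvider) -> None:
        self.provider = provider

    def call(self, op: str, *args: Any) -> Any:
        return self.provider.handle(op, *args)


class SerializingTransport:
    """Stand-in for a remote RPC: request and response are serialized and
    deserialized as they would be over a network (no actual networking)."""

    def __init__(self, provider: KVProvider) -> None:
        self.provider = provider

    def call(self, op: str, *args: Any) -> Any:
        op_in, args_in = pickle.loads(pickle.dumps((op, args)))
        response = pickle.dumps(self.provider.handle(op_in, *args_in))
        return pickle.loads(response)


class KVClient:
    """The client API is identical whichever transport is plugged in."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def put(self, key: str, value: bytes) -> None:
        self.transport.call("put", key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self.transport.call("get", key)
```

Because the placement decision lives in the transport object, the same `KVClient` code runs unchanged whether the provider is local or (in a real system) remote.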
Mochi: http://www.mcs.anl.gov/research/projects/mochi/
Vision: Lowering the barriers to distributed services in computational science.
Approach
● Familiar models (key/value, object, file)
● Easy to build, adapt, and deploy
● Lightweight, user-space components
● Modern hardware support
Impact
● Better, more capable services for specific use cases on high-end platforms
● Significant code reuse
● Ecosystem for service development
[Figure labels: Cloud Computing, HPC, Scientific Data, Distributed Computing, Autonomics, Software Engineering; Fast Transports, Object Stores, Key-Value Stores, User-level Threads, Group Membership, Communication, Dist. Control, Adaptability, Composability]
Let’s dive into the methodology
Matching building blocks to user requirements
• User requirements: data model, access pattern, guarantees
• Service requirements: data organization, metadata organization, user interface
• Building blocks: runtime, service providers
• Composition and interfacing: composition glue code, API implementation
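To make the four stages concrete, the Python sketch below walks one hypothetical case through them: the user's data model is "named 1-D float arrays"; the service stores bulk data in one provider and metadata in another; the building blocks are two providers; and the composition glue is a small class implementing the user-facing API. `KeyValueProvider`, `BlobProvider`, and `ArrayService` are illustrative names assumed for this example, not components from the talk.

```python
import json
import struct
from typing import Dict, List


class KeyValueProvider:
    """Hypothetical building block: stores small metadata records."""

    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._kv[key] = value

    def get(self, key: str) -> str:
        return self._kv[key]


class BlobProvider:
    """Hypothetical building block: stores opaque bulk data, returns an id."""

    def __init__(self) -> None:
        self._blobs: List[bytes] = []

    def write(self, data: bytes) -> int:
        self._blobs.append(data)
        return len(self._blobs) - 1

    def read(self, blob_id: int) -> bytes:
        return self._blobs[blob_id]


class ArrayService:
    """Composition glue: a user-facing API for named 1-D float64 arrays,
    with metadata in the key/value provider and raw bytes in the blob provider."""

    def __init__(self, kv: KeyValueProvider, blobs: BlobProvider) -> None:
        self.kv = kv
        self.blobs = blobs

    def store(self, name: str, values: List[float]) -> None:
        payload = struct.pack(f"<{len(values)}d", *values)  # little-endian doubles
        blob_id = self.blobs.write(payload)
        meta = {"blob_id": blob_id, "length": len(values), "dtype": "float64"}
        self.kv.put(name, json.dumps(meta))

    def load(self, name: str) -> List[float]:
        meta = json.loads(self.kv.get(name))
        payload = self.blobs.read(meta["blob_id"])
        return list(struct.unpack(f"<{meta['length']}d", payload))
```

Swapping either provider for a different implementation only requires the glue class to keep its calls the same, which is the reuse argument made earlier for composable micro-services.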
Identifying application needs (User Requirements)
• Which data model?
  • Arrays, meshes, objects
  • Namespace, metadata
• Which access pattern?
  • Characteristics (e.g., access sizes)
  • Collective/individual accesses
• Which guarantees?
  • Consistency
  • Performance
  • Persistence
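These questions can be tracked as a structured checklist while interviewing an application team. The Python sketch below is a hypothetical record (the field names and example values are assumptions, not part of the methodology as presented) whose `unanswered()` method lists the questions still open.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class UserRequirements:
    """Hypothetical checklist mirroring the slide's three questions."""
    # Which data model?
    data_model: Optional[str] = None          # e.g. "arrays", "meshes", "objects"
    namespace: Optional[str] = None           # how data items and metadata are named
    # Which access pattern?
    access_sizes: Optional[str] = None        # typical request sizes
    collective_access: Optional[bool] = None  # collective (True) or individual (False)
    # Which guarantees?
    consistency: Optional[str] = None
    performance: Optional[str] = None
    persistence: Optional[str] = None

    def unanswered(self) -> List[str]:
        """Names of the fields still set to None, i.e. questions with no answer yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

For example, `UserRequirements(data_model="objects", consistency="strong").unanswered()` still lists the other five fields.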
Defining service requirements (Service Requirements)
Starting from the user requirements (data model, access pattern, guarantees), decide:
• How should data be organized?
  • Sharding, distribution, replication
• How should metadata be organized?
  • Distribution, content, indexing
• How do clients interface with the service?
  • Programming language, API
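As one concrete answer to "sharding, distribution, replication", the Python sketch below places each key on a primary provider chosen by hashing and stores replicas on the following providers in ring order. This is a common placement scheme assumed here for illustration, not necessarily what any service in this talk uses; `placement` and `stable_hash` are hypothetical names.

```python
import hashlib
from typing import List


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def placement(key: str, num_providers: int, replicas: int) -> List[int]:
    """Provider indices holding `key`: the primary first, then replicas
    on the next providers in ring order."""
    if not 1 <= replicas <= num_providers:
        raise ValueError("replicas must be between 1 and num_providers")
    primary = stable_hash(key) % num_providers
    return [(primary + i) % num_providers for i in range(replicas)]
```

A caveat on the design: with plain modulo placement, changing `num_providers` remaps most keys; consistent hashing is the usual alternative when the set of providers changes at run time.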
What do components look like?
Components: engineering challenges (Building Blocks)
• How do components share resources (CPU, network, memory) without interfering with one another?
  • Bad approach: each component has its own progress loop.
• How do we leverage massively multi-core nodes to, for instance, assign components to cores, let components efficiently share a core, and prevent components from interfering with network progress?
  • Bad approach: each component manages its own thread(s).
• How can we support a wide range of networks?
  • Bad approach: reimplementing the transport every time the code is ported to a new platform.
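The alternative to per-component progress loops and threads is a single runtime that every component submits work to. The Python sketch below shows that idea with one cooperative round-robin scheduler: tasks are generators that yield to give up the core, loosely analogous to the user-level threads mentioned in the Mochi overview. `SharedScheduler` and `handler` are hypothetical names; this is not the Mochi runtime's API.

```python
from collections import deque
from typing import Deque, Generator, List, Tuple

Task = Generator[None, None, None]


class SharedScheduler:
    """One progress loop shared by every component in the process."""

    def __init__(self) -> None:
        self.ready: Deque[Tuple[str, Task]] = deque()
        self.trace: List[str] = []  # which component ran at each step

    def submit(self, component: str, task: Task) -> None:
        self.ready.append((component, task))

    def run(self) -> None:
        while self.ready:
            component, task = self.ready.popleft()
            try:
                next(task)                             # run until the task yields
                self.trace.append(component)
                self.ready.append((component, task))   # requeue for its next step
            except StopIteration:
                pass                                   # task finished; drop it


def handler(steps: int) -> Task:
    """Hypothetical request handler doing `steps` units of work,
    yielding between units so other components can make progress."""
    for _ in range(steps):
        yield
```

With two components each submitting a two-step handler, the trace interleaves them (`kv`, `blob`, `kv`, `blob`): both share one loop instead of competing with separate ones.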
Anatomy of a Mochi component (Building Blocks)