Methodology for the Rapid Development of Scalable HPC Data Services

  1. Methodology for the Rapid Development of Scalable HPC Data Services
     Matthieu Dorier, Philip Carns, Kevin Harms, Robert Latham, Robert Ross, Shane Snyder, Justin Wozniak, Samuel K. Gutiérrez, Bob Robey, Brad Settlemyer, Galen Shipman, Jerome Soumagne, James Kowalkowski, Marc Paterno, Saba Sehrish
     PDSW-DISCS 2018, Dallas, TX

  3. New Applications and Systems: Demand for New Services
     “Pillars”: Simulation, Data, Learning
     • Different application use cases have different data needs.
     • “One size fits all” doesn’t work: each use case needs customized data services to meet its mission goals.
     • This poses a significant technical challenge: how to enable rapid development of such services (agility) while still preserving performance (efficiency) and production quality (maintainability).
     Key idea: address this challenge via composable data services.
     Top image credit B. Helland (ASCR). Bottom left and right images credit ALCF. Bottom center image credit OLCF.

  6. Towards reusable components for data services
     • Parallel file systems
       Advantages: well established; standard interface.
       Drawbacks: single consistency model; complex to maintain and tune; files often inappropriate.
     • Hand-crafted data services
       Advantages: tuned for this application; appropriate consistency model; appropriate data model.
       Drawbacks: difficult to maintain; not reusable; scare users.
     • Composable micro-services
       Advantages: reusable across services; easy to maintain; not so scary to users; adaptable, configurable; can use latest tech.

  10. Common capabilities needed by data services
     ● Runtime substrate
       ○ RPC, RDMA
       ○ Threading/tasking
     ● Core components
       ○ Bulk storage management
       ○ Key/value storage
       ○ Group membership
       ○ Diagnostics and monitoring
     ● Programmability/expressiveness
       ○ Embedded interpreters
       ○ Wrappers (Python, C++, etc.)
     ● Composed services
       ○ FlameStore
       ○ HEPnOS
       ○ SDSDKV
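
As a hedged illustration of what composing core components into a service can look like, the following Python sketch builds a toy object service from a key/value component (holding metadata) and a bulk storage component (holding data). The class and method names (`KeyValueStore`, `BulkStore`, `ObjectService`) are hypothetical, everything runs in memory, and none of it reflects the actual APIs of Mochi, FlameStore, HEPnOS, or SDSDKV.

```python
# Minimal in-memory sketch of a composed service. All names are hypothetical
# and are NOT Mochi APIs; real components would run behind RPC.
from dataclasses import dataclass, field


@dataclass
class KeyValueStore:
    """Core component: key/value storage (used here for metadata)."""
    _data: dict = field(default_factory=dict)

    def put(self, key: str, value) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data[key]


@dataclass
class BulkStore:
    """Core component: bulk storage management (append-only byte regions)."""
    _regions: list = field(default_factory=list)

    def write(self, data: bytes) -> int:
        # Returns a region ID that other components can reference.
        self._regions.append(bytes(data))
        return len(self._regions) - 1

    def read(self, region_id: int) -> bytes:
        return self._regions[region_id]


class ObjectService:
    """Composed service: object data in the bulk store, metadata in the KV store."""

    def __init__(self, kv: KeyValueStore, bulk: BulkStore) -> None:
        self.kv = kv
        self.bulk = bulk

    def store(self, name: str, data: bytes) -> None:
        region = self.bulk.write(data)
        self.kv.put(name, {"region": region, "size": len(data)})

    def load(self, name: str) -> bytes:
        meta = self.kv.get(name)
        return self.bulk.read(meta["region"])


service = ObjectService(KeyValueStore(), BulkStore())
service.store("run42/checkpoint", b"\x00\x01\x02")
print(service.load("run42/checkpoint"))  # b'\x00\x01\x02'
```

In this sketch `ObjectService` contains only glue code, so either component could be replaced by another implementation exposing the same methods.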

  11. Challenges in Composing HPC Microservices
     ● Formalize composition
     ● Unify single-process, multi-process, single-node, and multi-node designs
     ● Maximize efficient use of resources (network, storage)
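
One way to read "unify single-process, multi-process, single-node, and multi-node designs" is that client code should not change when a provider moves from the same process to a remote node. The sketch below assumes a hypothetical `Transport` interface with two implementations: a direct in-process call, and a stand-in for a remote call that serializes the request and response to bytes. No real networking is involved, and the names are illustrative only.

```python
# Sketch: identical client code for co-located and "remote" providers.
# Transport, LocalTransport, and SerializingTransport are hypothetical names.
import json
from typing import Any, Callable, Dict, Protocol


class Transport(Protocol):
    def call(self, rpc: str, args: Dict[str, Any]) -> Any: ...


class Provider:
    """Server-side component exposing named RPC handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[..., Any]] = {"sum": self.sum}

    def sum(self, values):
        return sum(values)


class LocalTransport:
    """Co-located provider: a direct function call, no serialization."""

    def __init__(self, provider: Provider) -> None:
        self.provider = provider

    def call(self, rpc: str, args: Dict[str, Any]) -> Any:
        return self.provider.handlers[rpc](**args)


class SerializingTransport:
    """Stand-in for a remote provider: request and response are encoded to
    bytes, as they would be when crossing a process or node boundary."""

    def __init__(self, provider: Provider) -> None:
        self.provider = provider

    def call(self, rpc: str, args: Dict[str, Any]) -> Any:
        request = json.dumps({"rpc": rpc, "args": args}).encode()
        # In a real deployment, decoding and dispatch would happen remotely.
        decoded = json.loads(request)
        result = self.provider.handlers[decoded["rpc"]](**decoded["args"])
        response = json.dumps(result).encode()
        # Back on the client side: decode the response.
        return json.loads(response)


class Client:
    """The client API is the same regardless of where the provider runs."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def sum(self, values):
        return self.transport.call("sum", {"values": list(values)})


provider = Provider()
for transport in (LocalTransport(provider), SerializingTransport(provider)):
    print(Client(transport).sum([1, 2, 3]))  # 6
```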

  12. Mochi: http://www.mcs.anl.gov/research/projects/mochi/
     [Diagram labels: Cloud Computing, HPC, Fast Transports, Object Stores, Scientific Data, Key-Value Stores, User-level Threads, Distributed Computing, Autonomics, Dist. Control, Group Membership, Adaptability, Communication, Software Engineering, Composability]
     Vision: Lowering the barriers to distributed services in computational science.
     Approach:
       ● Familiar models (key/value, object, file)
       ● Easy to build, adapt, and deploy
       ● Lightweight, user-space components
       ● Modern hardware support
     Impact:
       ● Better, more capable services for specific use cases on high-end platforms
       ● Significant code reuse
       ● Ecosystem for service development

  13. Let’s dive into the methodology

  14. Matching building blocks to user requirements
     User requirements → Service requirements → Building blocks → Composition and interfacing
     • User requirements: data model, access pattern, guarantees
     • Service requirements: data organization, metadata organization, user interface
     • Building blocks: runtime, service providers
     • Composition and interfacing: composition glue code, API implementation

  15. Identifying application needs [User Requirements]
     • Which data model?
       • Arrays, meshes, objects
       • Namespace, metadata
     • Which access pattern?
       • Characteristics (e.g., access sizes)
       • Collective/individual accesses
     • Which guarantees?
       • Consistency
       • Performance
       • Persistence

  18. Defining service requirements [Service Requirements]
     User requirements: which data model (arrays, meshes, objects; namespace, metadata), which access pattern (characteristics such as access sizes; collective/individual accesses), which guarantees (consistency, performance, persistence).
     Service requirements:
     • How should data be organized?
       • Sharding, distribution, replication
     • How should metadata be organized?
       • Distribution, content, indexing
     • How do clients interface with the service?
       • Programming language, API
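
For the "sharding, distribution, replication" question, a common starting point is hash-based placement. The sketch below assumes a fixed set of providers numbered `0..num_providers-1`, puts each key's primary copy at a hash-derived index, and places the remaining replicas on the following providers in ring order. The function name `place` and this particular scheme are assumptions chosen for illustration; the slides do not prescribe one.

```python
# Sketch: hash-based sharding with successor replication (an assumed scheme).
import hashlib
from typing import List


def place(key: str, num_providers: int, replicas: int = 2) -> List[int]:
    """Return the indices of the providers holding `key`, primary first."""
    if not 1 <= replicas <= num_providers:
        raise ValueError("need 1 <= replicas <= num_providers")
    # Use a stable hash: Python's built-in hash() of a str is randomized per
    # process, so clients and servers could disagree on placement.
    digest = hashlib.sha256(key.encode()).digest()
    primary = int.from_bytes(digest[:8], "big") % num_providers
    # Replicas go on the next providers in ring order, so all are distinct.
    return [(primary + i) % num_providers for i in range(replicas)]


for key in ("mesh/block-0", "mesh/block-1", "particles/step-10"):
    print(key, place(key, num_providers=8, replicas=3))
```

Because placement is a pure function of the key and the provider count, any client can locate data without consulting a central index; the trade-off is that changing `num_providers` remaps most keys, which schemes such as consistent hashing are designed to mitigate.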

  19. What do components look like?

  20. Components: engineering challenges [Building Blocks]
     • How do components share resources (CPU, network, memory) without interfering with one another?
       • Bad approach: each component has its own progress loop.
     • How do we leverage massively multi-core nodes, for instance to assign components to cores, let components share a core efficiently, and prevent components from interfering with network progress?
       • Bad approach: each component manages its own thread(s).
     • How can we support a wide range of networks?
       • Bad approach: reimplementing support for a new transport every time the code is ported to a new platform.
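
The first two "bad approaches" above share a remedy: components should not own their progress loops or threads. The single-threaded Python sketch below assumes a hypothetical `Runtime` that owns the only progress loop; components merely register handlers under a provider ID, and requests for both components are dispatched by that same loop. The names `Runtime`, `register`, `post`, and `progress` are illustrative, and a real runtime would also handle networking and concurrency, which this sketch omits.

```python
# Sketch: one shared progress loop, many components. All names hypothetical.
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Handler = Callable[[Any], None]


class Runtime:
    def __init__(self) -> None:
        self.handlers: Dict[Tuple[int, str], Handler] = {}
        self.queue: Deque[Tuple[int, str, Any]] = deque()

    def register(self, provider_id: int, rpc: str, handler: Handler) -> None:
        # Provider IDs let several components (or several instances of one
        # component) share the same runtime without name clashes.
        self.handlers[(provider_id, rpc)] = handler

    def post(self, provider_id: int, rpc: str, arg: Any) -> None:
        # Stand-in for an incoming network request.
        self.queue.append((provider_id, rpc, arg))

    def progress(self) -> int:
        """Drain pending requests; return how many were dispatched."""
        count = 0
        while self.queue:
            provider_id, rpc, arg = self.queue.popleft()
            self.handlers[(provider_id, rpc)](arg)
            count += 1
        return count


class KVComponent:
    def __init__(self, rt: Runtime, provider_id: int) -> None:
        self.data: Dict[str, Any] = {}
        # The argument is a (key, value) pair.
        rt.register(provider_id, "put", lambda kv: self.data.update([kv]))


class LogComponent:
    def __init__(self, rt: Runtime, provider_id: int) -> None:
        self.lines: List[str] = []
        rt.register(provider_id, "append", self.lines.append)


rt = Runtime()
kv, log = KVComponent(rt, provider_id=1), LogComponent(rt, provider_id=2)
rt.post(1, "put", ("x", 42))
rt.post(2, "append", "stored x")
print(rt.progress(), kv.data, log.lines)  # 2 {'x': 42} ['stored x']
```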

  23. Anatomy of a Mochi component [Building Blocks]
