Burst Buffer Simulation in Dragonfly Network



  1. Burst Buffer Simulation in Dragonfly Network. Jian Peng, Michael Lang. Illinois Institute of Technology, Los Alamos National Laboratory.

  2. Purpose: • Residing in the compute node network and using solid-state drives (SSDs), burst buffers bring a significant I/O performance boost compared with traditional external hard disk drive (HDD) storage systems. • The bottleneck in fully leveraging burst buffers remains unknown: • Shared network • Limited number of burst buffers • Because of the large system scale, it is usually too expensive to change the system setup and configuration.

  3. Trinity System Overview: • System level • Cabinet level • Chassis level

  4. Inside a Group:

  5. Trinity Phase II Network Configuration: • Global link: 37.6 GB/s • Local link: intra-chassis, 5.25 GB/s; inter-chassis, 15.75 GB/s (3 tiles) • Intra-blade link: PCIe 3.0, 16 GB/s • All links are bi-directional.

  6. General: • Routers are interconnected in an all-to-all pattern. • Groups are linked by optical cables. In Trinity, each group link is made of 2 cables, and each cable provides 4.7 GB/s of bandwidth. • At the cabinet level, each group has 2 cabinets, connected by backplane electrical links. Each cabinet contains 3 chassis. Each inter-chassis link has a bandwidth of 15.75 GB/s. • Inside a chassis, each link between routers has a bandwidth of 5.25 GB/s.
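As a rough illustration of how the link bandwidths on slides 5 and 6 bound a transfer, the Python sketch below treats the slowest hop on a path as the bottleneck. The hop names and helper functions are hypothetical, and the estimate ignores contention, latency, and adaptive routing, so it is only an uncontended lower bound, not the timing model used in the simulation.

```python
# Link bandwidths in GB/s, taken from the slides.
LINK_BW_GBPS = {
    "global": 37.6,          # inter-group link
    "inter_chassis": 15.75,  # 3 tiles x 5.25 GB/s
    "intra_chassis": 5.25,   # router to router inside a chassis
    "pcie": 16.0,            # node to blade, PCIe 3.0
}


def bottleneck_bandwidth(path):
    """Return the bandwidth (GB/s) of the slowest hop on a path.

    `path` is a list of hop names (hypothetical labels, keys of LINK_BW_GBPS).
    """
    if not path:
        raise ValueError("path must contain at least one hop")
    return min(LINK_BW_GBPS[hop] for hop in path)


def transfer_seconds(size_gb, path):
    """Uncontended lower bound on the time to move size_gb over a path."""
    return size_gb / bottleneck_bandwidth(path)


if __name__ == "__main__":
    # Hypothetical path: compute node -> router -> other group -> burst buffer.
    path = ["pcie", "intra_chassis", "global", "inter_chassis", "pcie"]
    print(bottleneck_bandwidth(path), transfer_seconds(2.0, path))
```

In this simplified view the intra-chassis link is the narrowest hop whenever a path crosses one, which is one reason the shared network is a candidate bottleneck.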

  7. Connections: Router • All nodes are connected by routers • 10 inter-group ports • 15 inter-chassis ports • 15 inter-blade ports

  8. Connections: Chassis • 16 blades • 40 connectors to other groups • 5 connectors to other chassis per blade • Backplane connections among blades • PCIe 3.0 x16 between a node and its blade

  9. Connections: Inter-group • Each connection joins 2 group ports on 2 routers in 2 different groups. • One link between each pair of groups. • Uses the absolute (direct) pattern.
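The slide does not define the absolute pattern. The Python sketch below assumes the usual meaning from the dragonfly literature: a group's k-th global link goes to group k, skipping the group itself. The function names are hypothetical, and the port index here is a logical, group-level index rather than a physical router port.

```python
def absolute_peer_group(group, port, num_groups):
    """Group reached by logical global port `port` of `group`.

    Assumed absolute arrangement: ports 0..num_groups-2 map to the other
    groups in increasing order, skipping `group` itself.
    """
    if not 0 <= port < num_groups - 1:
        raise ValueError("port out of range")
    return port if port < group else port + 1


def port_toward(group, peer, num_groups):
    """Logical global port of `group` that leads to `peer`."""
    if peer == group or not 0 <= peer < num_groups:
        raise ValueError("peer must be a different, valid group")
    return peer if peer < group else peer - 1


def inter_group_links(num_groups):
    """Set of unordered group pairs joined by exactly one link each."""
    links = set()
    for g in range(num_groups):
        for p in range(num_groups - 1):
            peer = absolute_peer_group(g, p, num_groups)
            links.add((min(g, peer), max(g, peer)))
    return links


if __name__ == "__main__":
    # 23 groups, as in the Trinity Phase II model.
    print(len(inter_group_links(23)))
```

With one link per pair of groups, 23 groups need 23 × 22 / 2 = 253 inter-group links under this arrangement.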

  10. Datawarp • Burst buffers are implemented as Datawarp nodes in the Cray XC40.

  11. Simulation Detail • Per group: 96 routers • 10 burst buffer nodes • 2 LNET nodes (224 LNET nodes in the final phase) • 360 compute nodes • 384 nodes in total, 372 in simulation • Trinity Phase II: 23 groups • 230 burst buffers • 8,280 compute nodes • Adaptive routing
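The counts on this slide can be cross-checked against the topology described earlier. The Python sketch below does that arithmetic; one router per blade is an assumption not stated in the slides, 4 nodes per router is inferred from 384 / 96, and all function names are hypothetical.

```python
CABINETS_PER_GROUP = 2     # slide 6
CHASSIS_PER_CABINET = 3    # slide 6
BLADES_PER_CHASSIS = 16    # slide 8
ROUTERS_PER_BLADE = 1      # assumption: not stated in the slides
NODES_PER_ROUTER = 4       # inferred: 384 nodes / 96 routers

BURST_BUFFER_NODES = 10    # per group, slide 11
LNET_NODES = 2             # per group, slide 11
COMPUTE_NODES = 360        # per group, slide 11
GROUPS = 23                # Trinity Phase II, slide 11


def routers_per_group():
    """Routers in one group under the one-router-per-blade assumption."""
    return (CABINETS_PER_GROUP * CHASSIS_PER_CABINET
            * BLADES_PER_CHASSIS * ROUTERS_PER_BLADE)


def node_slots_per_group():
    """Node slots attached to the routers of one group."""
    return routers_per_group() * NODES_PER_ROUTER


def simulated_nodes_per_group():
    """Nodes actually modeled in one group."""
    return BURST_BUFFER_NODES + LNET_NODES + COMPUTE_NODES


def system_totals(groups=GROUPS):
    """Whole-system counts obtained by scaling one group."""
    return {
        "routers": groups * routers_per_group(),
        "burst_buffers": groups * BURST_BUFFER_NODES,
        "compute_nodes": groups * COMPUTE_NODES,
    }


if __name__ == "__main__":
    print(routers_per_group(), node_slots_per_group(),
          simulated_nodes_per_group(), system_totals())
```

Under these assumptions the per-group figures reproduce the slide: 96 routers, 384 node slots, 372 simulated nodes, and 230 burst buffers with 8,280 compute nodes across 23 groups.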

  12. Simulation Framework • Application layer • IOR workload • Darshan 3.1 workload • Model-net layer • Burst buffer process • CODES 0.5.2

  13. Results • N-N Write • 4 processes per node • 8 MB stripe size on the burst buffers • <1024: 32 GB per process • >=1024: 2 GB per process
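To make the workload parameters concrete, the Python sketch below computes the data volume they imply. The slide does not say what the 1024 threshold counts; this sketch assumes it is the number of processes, and it assumes binary units (1 GB = 1024 MB). Function names are hypothetical.

```python
PROCS_PER_NODE = 4         # slide 13
STRIPE_MB = 8              # stripe size on the burst buffers, slide 13
THRESHOLD = 1024           # assumption: counted in processes
SMALL_SCALE_GB = 32        # per process below the threshold
LARGE_SCALE_GB = 2         # per process at or above the threshold


def per_proc_gb(num_procs):
    """Data written by each process at a given scale."""
    return SMALL_SCALE_GB if num_procs < THRESHOLD else LARGE_SCALE_GB


def total_gb(num_procs):
    """Aggregate data written by all processes."""
    return num_procs * per_proc_gb(num_procs)


def stripes_per_proc(num_procs):
    """Number of 8 MB stripes each process writes (1 GB = 1024 MB assumed)."""
    return per_proc_gb(num_procs) * 1024 // STRIPE_MB


def nodes_needed(num_procs):
    """Compute nodes required at 4 processes per node, rounded up."""
    return -(-num_procs // PROCS_PER_NODE)


if __name__ == "__main__":
    for procs in (512, 1024, 2048):
        print(procs, nodes_needed(procs), per_proc_gb(procs), total_gb(procs))
```

Under these assumptions the aggregate volume drops when crossing the threshold: 512 processes write 16,384 GB in total, while 1024 processes write 2,048 GB.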

  14. Results • N-1 Write

  15. Results • N-N Read

  16. Problems • Darshan traces of applications: • Mostly checkpointing at LANL • Lustre is fast enough. • Must use Datawarp APIs. • Datawarp software is still being updated. • Currently modeling Trinity Phase II. The final phase is in progress: • 576 burst buffer nodes • ~20,000 compute nodes • More modeling details need to be confirmed. • READ bug: • The simulation ends when compute nodes scale up to 2048 in N-N Read. • It ends sooner in the N-1 Read simulation. • Messages appear to have already been freed when a reverse event is received.
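The suspected READ bug can be pictured with a toy model of optimistic simulation, where a reverse handler undoes a forward handler using state saved in the message. The Python sketch below is not the ROSS or CODES API; every class and method name is hypothetical, and it only shows why a message freed too early breaks a rollback.

```python
class Message:
    """Toy event payload that also carries rollback state."""

    def __init__(self, size):
        self.size = size
        self.saved_bytes_read = None  # state saved by the forward handler
        self.freed = False            # set once the payload is released


class BurstBufferLP:
    """Toy logical process that counts bytes read."""

    def __init__(self):
        self.bytes_read = 0

    def forward(self, msg):
        # Save the old state in the message so it can be restored later.
        msg.saved_bytes_read = self.bytes_read
        self.bytes_read += msg.size

    def reverse(self, msg):
        # A rollback needs the message to still be alive.
        if msg.freed:
            raise RuntimeError("reverse event received for a freed message")
        self.bytes_read = msg.saved_bytes_read

    def commit(self, msg):
        # Only after commit is it safe to release the message.
        msg.freed = True


if __name__ == "__main__":
    lp = BurstBufferLP()
    msg = Message(size=8)
    lp.forward(msg)
    lp.reverse(msg)       # rollback succeeds: message still alive
    print(lp.bytes_read)
```

In this toy, releasing the message before the event can no longer be rolled back is what makes the reverse handler fail; whether the real simulation fails for the same reason is not confirmed by the slides.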
