Performance of HPC Middleware over Infiniband WAN


  1. Performance of HPC Middleware over Infiniband WAN • Designing Efficient FTP Mechanisms for High Performance Data Transfer over Infiniband • High Performance Data Transfer in Grid Environment Using GridFTP over Infiniband • Presented by: Ashish Kumar Singh

  2. Performance of HPC Middleware over Infiniband WAN S. Narravula, H. Subramoni, P. Lai, R. Noronha and D.K. Panda

  3. Motivation • Multi-cluster needs of organizations • Advent of long-haul Infiniband (IB WAN) – Infiniband range extenders like Intel Connects and Obsidian Longbows • IB applications and libraries such as MPI and NFS over RDMA were developed for intra-cluster environments

  4. Contributions • Analyzes the general communication performance of HPC middleware • Proposes basic design optimizations for enhancing communication performance over WAN • Demonstrates the potential benefits obtained by enhancing internal protocols of middleware

  5. IB Range Extension • Obsidian Longbows provide range extension for Infiniband fabrics over 10 Gigabits/s WAN

  6. Verbs-level Performance (UD) • UD does not involve any acknowledgements from the remote side • UD scales well as delay increases • Higher-level protocols must provide reliability and flow control

  7. Verbs-level Performance (RC) • RC guarantees in-order delivery using ACKs and NACKs, which limits the number of messages in flight to the maximum supported window size • Fewer large messages are needed to fill the pipeline, so large messages are less affected
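
The window effect described on this slide can be sketched with a simple model: if at most a fixed number of messages may be unacknowledged, throughput is capped by window × message size per round trip. This is a minimal sketch, not the evaluation from the talk; the function name, the 16-message window and the 2 ms round trip are illustrative assumptions.

```python
def rc_throughput_bps(msg_bytes, window_msgs, rtt_s, link_bps):
    """Upper bound on throughput (bits/s) when at most `window_msgs`
    messages may be in flight (unacknowledged) at once.

    Simplified model: ignores per-message overhead and retransmissions.
    """
    if rtt_s <= 0:
        return float(link_bps)
    # One full window can be delivered per round trip.
    window_limited = window_msgs * msg_bytes * 8 / rtt_s
    return min(float(link_bps), window_limited)


if __name__ == "__main__":
    LINK = 10e9      # 10 Gb/s WAN link, as on slide 5
    WINDOW = 16      # assumed window size, for illustration only
    RTT = 2e-3       # assumed 2 ms round trip
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        bps = rc_throughput_bps(size, WINDOW, RTT, LINK)
        print(f"{size:>8} B messages: {bps / 1e9:.3f} Gb/s")
```

Under these assumed numbers, 4 KB messages are capped at roughly 0.26 Gb/s, while 1 MB messages are limited only by the link, which matches the trend the slide reports.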

  8. IPoIB Performance (UD) • TCP needs larger window sizes to achieve good bandwidth • More streams mean more UD packets with independent flow control, so more outstanding packets can be pushed out from the source at any given time
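
Why parallel streams help can be sketched with the bandwidth-delay product: a single TCP stream can deliver at most one window per round trip, so several independently flow-controlled streams are needed to cover the link. The following is a simplified model (no loss, no congestion control dynamics); the function names and the 1 MiB window are assumptions for illustration.

```python
import math


def bdp_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


def aggregate_throughput_bps(streams, window_bytes, rtt_s, link_bps):
    """Throughput of `streams` parallel streams, each limited to one
    window of `window_bytes` per round trip, capped by the link rate."""
    per_stream = window_bytes * 8 / rtt_s
    return min(link_bps, streams * per_stream)


def streams_needed(window_bytes, rtt_s, link_bps):
    """Smallest number of streams whose combined windows cover the BDP."""
    return max(1, math.ceil(bdp_bytes(link_bps, rtt_s) / window_bytes))


if __name__ == "__main__":
    link, rtt, window = 10e9, 10e-3, 1024 * 1024  # assumed values
    print("BDP (bytes):", bdp_bytes(link, rtt))
    print("streams needed:", streams_needed(window, rtt, link))
    for n in (1, 4, 12):
        gbps = aggregate_throughput_bps(n, window, rtt, link) / 1e9
        print(f"{n:>2} streams: {gbps:.2f} Gb/s")
```

With these assumed values a single stream reaches about 0.84 Gb/s, and around 12 streams are needed to fill a 10 Gb/s link at a 10 ms round trip.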

  9. IPoIB Performance (RC) • The advantage of the RC transport mode for IPoIB is that RC can handle larger packet sizes • Larger packets achieve better bandwidth and reduce per-byte TCP processing

  10. MPI-level Performance (Delay) • Trends similar to basic verbs-level evaluation

  11. MPI-level Performance (Tuning) • The best protocol choice changes for medium-sized messages in high-delay scenarios • The rendezvous protocol involves an additional message exchange
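
The trade-off behind this tuning can be sketched with a simple latency model. Assume an eager send costs one network traversal plus a buffer copy, while a rendezvous send avoids the copy but needs a handshake first (request, reply, then data), so three traversals. The model, the function names and the copy bandwidth are assumptions for illustration, not measurements from the talk.

```python
def eager_time(msg_bytes, one_way_delay_s, link_Bps, copy_Bps):
    """Eager: data is sent immediately, then copied out of a bounce buffer."""
    return one_way_delay_s + msg_bytes / link_Bps + msg_bytes / copy_Bps


def rendezvous_time(msg_bytes, one_way_delay_s, link_Bps):
    """Rendezvous: handshake (two traversals) followed by a zero-copy transfer."""
    return 3 * one_way_delay_s + msg_bytes / link_Bps


def crossover_bytes(one_way_delay_s, copy_Bps):
    """Message size at which both protocols cost the same in this model:
    the copy cost equals the two extra network traversals."""
    return 2 * one_way_delay_s * copy_Bps


if __name__ == "__main__":
    link_Bps = 1.25e9   # 10 Gb/s expressed in bytes/s
    copy_Bps = 2e9      # assumed memory copy bandwidth
    for delay in (10e-6, 100e-6, 1e-3):
        kb = crossover_bytes(delay, copy_Bps) / 1024
        print(f"one-way delay {delay * 1e6:>6.0f} us: switch to rendezvous above ~{kb:.0f} KB")
```

In this model the crossover point grows linearly with delay, so a 64 KB message that is better sent with rendezvous at 10 us is better sent eagerly at 1 ms.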

  12. MPI-level Performance (Streams) [figure panels: (a) 100 us delay, (b) 1 ms delay, (c) 10 ms delay] • For small messages, the messaging rate increases proportionally with the number of communicating streams • For higher-delay networks, additional parallel streams improve overall network bandwidth utilization

  13. MPI-level Performance (Collective) [figure panels: (a) 10 us delay, (b) 100 us delay, (c) 1000 us delay] • A simple optimized broadcast performs the bcast operation hierarchically over the two connected clusters, minimizing traffic on the WAN • For small messages, the WAN link can handle all the traffic, so congestion is minor
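
The benefit of a hierarchical broadcast can be sketched by counting how many messages cross the WAN. The example below compares a flat binomial-tree broadcast over all ranks with a two-level scheme that sends one message to a leader in the remote cluster and then broadcasts locally. The flat binomial baseline and all function names are assumptions for illustration; the talk does not specify the algorithms at this level.

```python
def binomial_sends(ranks):
    """(src, dst) pairs of a binomial-tree broadcast rooted at ranks[0]."""
    sends = []
    have = 1  # number of ranks that already hold the data
    while have < len(ranks):
        # Every rank that holds the data sends to one rank that does not.
        for i in range(min(have, len(ranks) - have)):
            sends.append((ranks[i], ranks[i + have]))
        have *= 2
    return sends


def hierarchical_sends(cluster_a, cluster_b):
    """Root (cluster_a[0]) sends once over the WAN to the remote leader
    (cluster_b[0]); each cluster then broadcasts locally."""
    sends = [(cluster_a[0], cluster_b[0])]
    sends += binomial_sends(cluster_a)
    sends += binomial_sends(cluster_b)
    return sends


def wan_crossings(sends, cluster_of):
    """Number of sends whose endpoints are in different clusters."""
    return sum(1 for src, dst in sends if cluster_of[src] != cluster_of[dst])


if __name__ == "__main__":
    a, b = [0, 1, 2, 3], [4, 5, 6, 7]
    cluster_of = {r: "A" for r in a} | {r: "B" for r in b}
    print("flat WAN messages:        ", wan_crossings(binomial_sends(a + b), cluster_of))
    print("hierarchical WAN messages:", wan_crossings(hierarchical_sends(a, b), cluster_of))
```

For two clusters of four ranks, the flat tree in this sketch crosses the WAN four times while the hierarchical version crosses it once, with the same total number of sends.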

  14. Conclusions • Applications usually absorb small network delays fairly well • Many protocols are severely impacted in high-delay scenarios • Protocols can be optimized for high-delay scenarios to improve performance • With long-haul IB WAN technology, a cluster-of-clusters architecture for HPC systems is feasible

  15. Designing Efficient FTP Mechanisms for High Performance Data Transfer over Infiniband P. Lai, H. Subramoni, S. Narravula, A. Mamidala and D.K. Panda

  16. Motivation • FTP is the most popular method for transferring bulk data • Typically used in applications like data staging, content replication and remote site backup • The advent of long-haul Infiniband (IB WAN) made the cluster-of-clusters architecture possible • IPoIB and SDP give up a significant share of native IB performance

  17. Possible Approaches • Approaches #1, #2 and #3 (as labeled in the slide's figure): existing sockets-based FTP through intermediate drivers; IPoIB and SDP are the popular schemes for this choice • Approach #4: a new FTP mechanism using native IB features

  18. Performance of Communication Protocols • Native IB verbs achieve much higher bandwidth than the other protocols • Performance of FTP implementations, e.g., GridFTP, using IPoIB and SDP is even worse

  19. Contributions • Design an Advanced Data Transfer Service (ADTS) that leverages zero-copy capabilities • Leverage ADTS to design a high-performance zero-copy FTP library • Provide a robust and interoperable mechanism to support both zero-copy capable clients and traditional TCP/UDP clients • Performance study

  20. FTP-ADTS Architecture • Clients may be capable of zero-copy data transfer or may support only TCP/UDP-based communication • Once the transport protocol is negotiated, the Data Connection Management component initiates a connection

  21. Design of Zero-Copy Channel • Memory semantics using RDMA vs. channel semantics using Send-Recv • Drawbacks of memory semantics: – Pre-allocation, registration and communication of target RDMA buffers – Explicit flow control – Notification of completion – Latency benefits for small messages are negated by high network delay

  22. Design of Zero-Copy Channel • Advantages of Send-Recv Semantics: – Identical zero-copy benefits – Simpler flow control, with use of SRQ – Sender is not throttled down due to lack of buffers on remote node – Both RC and UD transports available

  23. Design Enhancements • Buffer/File Management component keeps a small set of pre-allocated and registered buffers • Memory Registration Cache and Persistent Sessions • Pipelined Data Transfers • Prefork Server to handle bursts of requests
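
The idea behind a memory registration cache can be sketched as follows: registering a buffer with the network adapter is expensive, so registrations are kept and reused instead of being repeated for every transfer. This is a minimal LRU sketch, not the ADTS implementation; `register` and `deregister` are hypothetical callables standing in for the real registration calls (for example `ibv_reg_mr` and `ibv_dereg_mr` in the verbs API), and the class name and capacity are assumptions.

```python
from collections import OrderedDict


class RegistrationCache:
    """LRU cache of registered buffers, keyed by (address, length)."""

    def __init__(self, register, deregister, capacity=4):
        self._register = register      # hypothetical: (addr, length) -> handle
        self._deregister = deregister  # hypothetical: handle -> None
        self._capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, addr, length):
        """Return a registration handle, registering only on a cache miss."""
        key = (addr, length)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        handle = self._register(addr, length)
        self._entries[key] = handle
        if len(self._entries) > self._capacity:
            # Evict the least recently used registration.
            _, old_handle = self._entries.popitem(last=False)
            self._deregister(old_handle)
        return handle


if __name__ == "__main__":
    cache = RegistrationCache(
        register=lambda addr, length: f"mr@{addr:#x}+{length}",
        deregister=lambda handle: print("deregistered", handle),
        capacity=2,
    )
    for addr in (0x1000, 0x1000, 0x2000, 0x3000):
        cache.get(addr, 4096)
    print("hits:", cache.hits, "misses:", cache.misses)
```

Keeping a small, fixed pool of pre-registered buffers, as the first bullet describes, makes the hit rate of such a cache high because the same buffers are reused across transfers.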

  24. Performance • Site replication over IB WAN using FTP • FTP-ADTS speeds up data transfer by up to 65% • Much lower CPU utilization

  25. Conclusions • Existing TCP-, UDP- or SCTP-based FTP implementations are not suitable for WAN-capable interconnects like IB WAN • FTP-ADTS transfers data efficiently by leveraging the zero-copy operations of modern interconnects

  26. High Performance Data Transfer in Grid Environment Using GridFTP over Infiniband H. Subramoni, P. Lai, R. Kettimuthu and D.K. Panda

  27. Overview • GridFTP is a high-performance, secure, reliable extension of standard FTP, optimized for WAN • The Globus XIO framework, used to design GridFTP, offers an easy-to-use interface • The framework hides the complications of the communication semantics of underlying devices (network or disk)

  28. Contribution • Combining the ease of use of Globus XIO framework and the high performance achieved through IB • Enhancing the disk I/O performance of the existing ADTS library – By decoupling the network processing from disk I/O operations • Evaluation of the design – micro-benchmark level – applications like Community Climate System Model and ultra scale visualization

  29. Design Issues • Most HPC applications require movement of huge amounts of data – The data must be stored on comparatively slow hard disks and RAID arrays – With the low bandwidth provided by TCP/UDP-based FTP, disk speed was not an issue – It becomes an issue for Globus ADTS XIO • Solution – decoupling network processing from disk I/O

  30. Design Changes in ADTS • Introduction of: – multiple threads (read, write and network threads) – a set of buffers to stage the data • The read thread prefetches data from a set of disk locations and keeps it ready for the network thread to send over the physical link • How to avoid frequent context switches? – Low and high water marks; the high-water mark is set to the maximum size of the circular buffer – The read thread refills only when the number of available buffers falls below the low-water mark
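
The watermark scheme can be sketched as a small state machine: the read side fills the ring until it reaches the high-water mark, then stays idle until the network side has drained the ring below the low-water mark. This is a single-threaded simulation under that reading of the slide, not the ADTS code; the class and method names are assumptions, and a real implementation would block the threads on a condition variable instead of returning `False`.

```python
from collections import deque


class StagingRing:
    """Staging buffers with low/high water marks (hysteresis)."""

    def __init__(self, capacity, low_water):
        if not 0 < low_water <= capacity:
            raise ValueError("need 0 < low_water <= capacity")
        self.capacity = capacity    # high-water mark = ring size
        self.low_water = low_water
        self.ready = deque()        # blocks staged for the network side
        self.reader_active = True
        self.wakeups = 0            # how often the reader had to be woken

    def reader_step(self, block):
        """Stage one block read from disk; returns False while the reader sleeps."""
        if not self.reader_active:
            return False
        self.ready.append(block)
        if len(self.ready) >= self.capacity:
            self.reader_active = False  # ring full: reader goes to sleep
        return True

    def network_step(self):
        """Take one staged block to send; returns None if nothing is ready."""
        block = self.ready.popleft() if self.ready else None
        if not self.reader_active and len(self.ready) < self.low_water:
            self.reader_active = True   # drained below low-water mark
            self.wakeups += 1
        return block


if __name__ == "__main__":
    ring = StagingRing(capacity=8, low_water=3)
    next_block, sent = 0, []
    while len(sent) < 32:
        if ring.reader_step(next_block):
            next_block += 1
        else:
            sent.append(ring.network_step())
    print("blocks sent:", len(sent), "reader wakeups:", ring.wakeups)
```

Without the low-water mark the reader would be woken after every block the network side consumes; with it, the reader is woken once per batch, which is the context-switch saving the slide refers to.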

  31. Application Level Improvements
