Robust Communication for Jungle Computing

  1. Robust Communication for Jungle Computing
     Jason Maassen
     Computer Systems Group, Department of Computer Science
     VU University, Amsterdam, The Netherlands

  2. Requirements (revisited)
     ● Resource independence
     ● Transparent / easy deployment
     ● Middleware independence & interoperability
     ● Jungle-aware middleware
     ● Jungle-aware communication
     ● Robust connectivity
     ● System-support for malleability and fault-tolerance
     ● Globally unique naming
     ● Transparent parallelism & application-level fault-tolerance
     ● Easy integration with external software
       ● MPI, OpenCL, CUDA, C, C++, scripts, …
     ComplexHPC Spring School 2011

  4. Low-level problems
     ● Many sites have connectivity issues
       ● Firewalls
       ● Network Address Translation (NAT)
       ● Non-routed networks
       ● Multi-homing
       ● Misconfigured machines
       ● ...
     ● This makes it very hard to use a combination of machines!

  5. High-level problems
     ● We need more advanced features:
       ● Malleability: machines come and go during the application lifetime
       ● Fault tolerance: machines may crash at any time
       ● Robust and globally unique naming
       ● Flexible communication primitives
         ● Multicast or many-to-one communication
       ● Efficient serialization of complex data structures
     ● Need to be robust!

  6. Existing libraries
     ● Sockets is too low-level for daily use
       ● Only point-to-point
       ● No resource management
     ● MPI is too inflexible
       ● Focus on SPMD model
       ● Little/no support for malleability or fault tolerance
     ● Neither can handle firewalls/NAT/etc.

  7. Ibis
     ● Ibis offers “Jungle proof” communication:
       ● SmartSockets
         ● Sockets library (on top of regular TCP/IP)
         ● Solves low-level connectivity problems
       ● Ibis Portability Layer (IPL)
         ● “MPI for Jungle computing”
         ● Offers high-level communication primitives

  8. Where are we?

  9. SmartSockets: What problems does it solve?
     ● Unreachable machines:
       ● Behind firewall / NAT or on private network
     ● Machine identification:
       ● Machines have multiple IPs
       ● Multiple machines have the same (private) IP

  10. Problem 1: Firewalls
      ● Block 'inappropriate' connections
      ● Usually only block incoming connections
      ● Some also block outgoing connections

  11. Problem 2: Network Address Translation
      ● Allows multiple machines to share an IP address
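To make the sharing of one IP address concrete, here is a minimal sketch of a NAT translation table. The class name `Nat`, its methods, and the port numbers are hypothetical and chosen only for illustration; real NAT devices differ in many details.

```python
class Nat:
    """Toy NAT: maps (private_ip, private_port) to a port on one shared public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip, private_port):
        """Create (or reuse) a mapping when an inside machine connects out."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def translate_incoming(self, public_port):
        """Return the inside endpoint, or None if no mapping exists yet."""
        return self.inbound.get(public_port)


nat = Nat("203.0.113.5")
first = nat.translate_outgoing("192.168.0.2", 5000)
second = nat.translate_outgoing("192.168.0.3", 5000)
unsolicited = nat.translate_incoming(12345)
```

Both inside machines appear to the outside world under the same public IP, and an incoming packet for a port without a mapping has nowhere to go. That is why a machine behind NAT is hard to reach from outside.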

  12. Problem 2: Network Address Translation

  13. Problem 2: Network Address Translation

  14. Problem 2: Network Address Translation

  15. Problem 2: Network Address Translation

  16. Problem 3: Multi-homing
      ● Some sites have multiple networks
      ● The target address depends on the source of the connection
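The dependence of the target address on the source can be sketched as follows. The function `pick_target_address`, the `FRONTEND` table, and all addresses are hypothetical examples, not taken from the slides.

```python
import ipaddress


def pick_target_address(source_ip, target_addresses, default=None):
    """Pick the address of a multi-homed target that matches the caller's network.

    target_addresses is a list of (network, address) pairs; default is the
    address used when the caller is on none of the listed networks.
    """
    source = ipaddress.ip_address(source_ip)
    for network, address in target_addresses:
        if source in ipaddress.ip_network(network):
            return address
    return default


# Hypothetical frontend with one address on the internal cluster network.
FRONTEND = [("10.0.0.0/8", "10.0.0.1")]

from_inside = pick_target_address("10.3.4.5", FRONTEND, default="203.0.113.1")
from_outside = pick_target_address("198.51.100.7", FRONTEND, default="203.0.113.1")
```

A caller on the cluster network gets the internal address, while any other caller gets the default, so a single fixed address per machine is not enough.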

  17. Problem 4: Non-routed Networks
      ● No route between local network and internet
      ● Only the frontend is reachable

  18. Problem 5: Machine Identification
      ● Private IPs (NAT/non-routed) lead to machine identification problems

  19. SmartSockets Solutions
      ● The SmartSockets library
        ● Detects connectivity problems
        ● Tries to solve them automatically using:
          ● Smart Addressing
          ● Side channel
        ● ... and various tricks:
          ● SSH tunneling (pass through firewalls)
          ● STUN (detect external IP of NAT)
          ● UPnP (automatic port forwarding)
          ● ...

  20. SmartSockets Library
      ● Integrates existing and new solutions into one library
        ● With as little help from the user as possible
        ● Mostly transparent to user!
      ● Offers a socket-like interface
        ● Addressing is different

  21. Smart Addressing
      ● Instead of using a single IP:port combination for each machine we use:
        ● All machine addresses
        ● Extra information:
          ● External address + port for NAT (STUN, UPnP)
          ● SSH contact information
          ● UUID (if entire address is private)
          ● …
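The kind of record described above might look like the following minimal sketch. SmartSockets itself is a Java library and this is not its API: the `SmartAddress` class, its field names, and `make_address` are assumptions made for illustration.

```python
import ipaddress
import uuid
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SmartAddress:
    addresses: Tuple[Tuple[str, int], ...]      # every (ip, port) of the machine
    external: Optional[Tuple[str, int]] = None  # NAT-external ip:port, if known
    ssh_user: Optional[str] = None              # SSH contact information, if any
    machine_id: Optional[str] = None            # UUID, set when all IPs are private


def make_address(addresses, external=None, ssh_user=None):
    """Build an address record, adding a UUID when every IP is private."""
    all_private = all(ipaddress.ip_address(ip).is_private for ip, _ in addresses)
    machine_id = str(uuid.uuid4()) if all_private else None
    return SmartAddress(tuple(addresses), external, ssh_user, machine_id)


# Two machines on different sites that happen to use the same private IP.
site_a = make_address([("192.168.1.10", 5000)])
site_b = make_address([("192.168.1.10", 5000)])
# A multi-homed machine with one private and one public address.
multi_homed = make_address([("10.0.0.5", 5000), ("8.8.8.8", 5000)])
```

The two private-only machines carry identical IP:port pairs but remain distinguishable through their UUIDs, while the multi-homed machine simply lists all of its addresses.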

  22. Addressing Examples

  23. Creating a Connection

  24. Using Smart Addresses
      ● This solves machine identification problems
        ● All addresses are known with multi-homing
        ● Each identity is unique with private IPs
        ● The identity is always checked
      ● Assumes anyone can create a connection
        ● This will not help when target is behind NAT/Firewall
        ● To solve this we need a side channel
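A minimal sketch of trying every known address and checking the identity of whoever answers. The function `connect` and the `reachable` dictionary, which stands in for the real network, are hypothetical; no actual sockets are opened.

```python
def connect(target_addresses, expected_id, reachable):
    """Try each address in turn; accept only a peer with the expected identity.

    reachable maps (ip, port) to the identity of whichever machine answers at
    that address from the caller's point of view.
    """
    for address in target_addresses:
        peer_id = reachable.get(address)
        if peer_id is None:
            continue  # nothing answers here (firewall, wrong network, ...)
        if peer_id == expected_id:
            return address  # identity confirmed, use this address
        # A different machine owns this private IP on our network: keep trying.
    return None


# From the caller's network, 192.168.1.10 is some other machine ("machine-B").
network = {
    ("192.168.1.10", 5000): "machine-B",
    ("203.0.113.9", 5000): "machine-A",
}
target = [("192.168.1.10", 5000), ("203.0.113.9", 5000)]

chosen = connect(target, "machine-A", network)
blocked = connect([("192.168.1.10", 5000)], "machine-A", network)
```

The second call returns `None`: when no address of the target can be reached directly, addressing alone is not enough and a side channel is needed.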

  25. Side channel
      ● Overlay network implemented using a set of hubs
        ● Support processes for the application
        ● Started in advance
      ● Hubs are run on machines with 'more connectivity'
        ● Such as cluster frontends, 'open' machines, etc.
      ● How / where you start them is a separate problem
        ● Solved by IbisDeploy

  26. Hubs
      ● Similar to a peer-to-peer overlay network
      ● Hubs connect to each other
        ● Gossip information about other hubs
        ● Automatically discover new hubs and routes
        ● Need to set up spanning tree (or better)
        ● Use direct connections and SSH tunnels
      ● Clients connect to a 'local' hub
        ● Use as side channel for connection setup
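How gossiping lets hubs discover each other can be illustrated with a small simulation. This is a generic gossip sketch under simplifying assumptions (synchronous rounds, every known hub reachable), not the protocol SmartSockets actually uses; the function names are hypothetical.

```python
def gossip_round(known):
    """One synchronous round: each hub swaps its view with every hub it knows."""
    updated = {hub: set(peers) for hub, peers in known.items()}
    for hub, peers in known.items():
        for peer in peers:
            # The exchange is symmetric: both sides learn what the other knew.
            updated[hub] |= known[peer] | {peer}
            updated[peer] |= known[hub] | {hub}
    for hub in updated:
        updated[hub].discard(hub)  # a hub does not list itself
    return updated


def gossip_until_stable(known):
    """Repeat rounds until no hub learns anything new."""
    rounds = 0
    while True:
        nxt = gossip_round(known)
        if nxt == known:
            return known, rounds
        known, rounds = nxt, rounds + 1


# Each hub starts out knowing at most one other hub.
initial = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A"}}
final, rounds = gossip_until_stable(initial)
```

Because the initial contacts connect all four hubs, every hub ends up knowing all the others after a few rounds.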

  27. Hub Overlay Network

  28. Advanced Connection Setup

  29. Advanced Connection Setup
      ● Reverse direction of connection setup
        ● Send message to target using hub and wait for incoming connection
        ● Results in direct connection
      ● Route via overlay
        ● Create virtual connection using hubs
        ● Forward all data over side channel
        ● Results in indirect connection
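The two fallbacks can be read as a decision procedure. The sketch below assumes a plain direct attempt is tried first and that reachability can be queried through two hypothetical predicates, `can_connect` and `has_hub_path`; the real library's logic is likely more involved.

```python
def setup_connection(src, dst, can_connect, has_hub_path):
    """Return the kind of connection obtained, or None if every option fails."""
    if can_connect(src, dst):
        return "direct"
    if has_hub_path(src, dst):
        # Reverse: ask the target via the hubs to connect back to us.
        if can_connect(dst, src):
            return "direct (reversed)"
        # Route: forward all data over the hub overlay.
        return "indirect (via hubs)"
    return None


# Toy model: a firewalled machine accepts no incoming connections.
firewalled = {"fw1", "fw2"}


def can_connect(a, b):
    return b not in firewalled


def always(a, b):
    return True


def never(a, b):
    return False


open_to_open = setup_connection("open1", "open2", can_connect, always)
open_to_fw = setup_connection("open1", "fw1", can_connect, always)
fw_to_fw = setup_connection("fw1", "fw2", can_connect, always)
no_hubs = setup_connection("open1", "fw1", can_connect, never)
```

Only the case where both ends are firewalled falls back to routing all data over the hubs; reversal still yields a direct connection.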

  30. SmartSockets: All problems solved
      ● Unreachable machines:
        ● SSH tunnels
        ● Reverse connection setup
        ● Routing over hubs
      ● Machine identification:
        ● Smart addressing
        ● Identity check at connection setup

  32. Hub Network

  33. Evaluation
      ● Regular TCP/IP only worked in 6 out of 30
      ● SmartSockets worked in 30 out of 30!

  34. Evaluation

  35. Summary
      ● In Jungle computing communication is hard!
        ● Many connectivity problems occur
        ● Takes a lot of work to find the problems and work around them
      ● SmartSockets reduces this to a single problem:
        ● How to set up a spanning tree of hubs
        ● The rest is done automatically!
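Whether a chosen set of hubs and links admits a spanning tree can be checked with a breadth-first search. This is a generic graph sketch; the function `spanning_tree` and the hub names are hypothetical and not part of SmartSockets.

```python
from collections import deque


def spanning_tree(hubs, links):
    """Return tree edges that reach every hub, or None if the hubs are not connected."""
    if not hubs:
        return []
    neighbours = {hub: set() for hub in hubs}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    root = hubs[0]
    seen = {root}
    tree = []
    queue = deque([root])
    while queue:
        hub = queue.popleft()
        for nxt in sorted(neighbours[hub]):
            if nxt not in seen:
                seen.add(nxt)
                tree.append((hub, nxt))
                queue.append(nxt)
    return tree if len(seen) == len(hubs) else None


hubs = ["A", "B", "C", "D"]
connected = spanning_tree(hubs, [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
partitioned = spanning_tree(hubs, [("A", "B")])
```

If the result is `None`, at least one hub cannot be reached from the others and an extra link (for example an SSH tunnel) would be needed.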

  36. However…
      ● Sockets is too low-level for daily use
      ● For Jungle computing we need support for
        ● Malleability
        ● Fault tolerance
        ● Robust and globally unique naming
        ● Flexible communication primitives
      ● Provided by the Ibis Portability Layer (IPL)

  37. Ibis Portability Layer (IPL)
      ● Simple API for Jungle communication
      ● Flexible communication model
        ● Connection-oriented messaging
        ● Abstract addressing scheme
      ● Resource tracking
        ● Notifications when machines join/leave/crash
      ● Efficient serialization
        ● Send bytes, doubles, objects, etc.
      ● Portable:
        ● SmartSockets, TCP, UDP, MPI, MX, Bluetooth, …
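Resource tracking with join/leave/crash notifications can be pictured as a small membership registry with callbacks. The IPL is a Java API and the sketch below does not reproduce it: the `Registry` class, its method names, and the event strings are assumptions for illustration only.

```python
class Registry:
    """Toy membership tracker that notifies listeners of every change."""

    def __init__(self):
        self.members = set()
        self._listeners = []

    def subscribe(self, callback):
        """Register callback(event, member) for all future membership changes."""
        self._listeners.append(callback)

    def _notify(self, event, member):
        for callback in self._listeners:
            callback(event, member)

    def join(self, member):
        if member in self.members:
            raise ValueError("member names must be unique: " + member)
        self.members.add(member)
        self._notify("joined", member)

    def leave(self, member):
        self.members.remove(member)
        self._notify("left", member)

    def report_crash(self, member):
        self.members.discard(member)
        self._notify("crashed", member)


events = []
registry = Registry()
registry.subscribe(lambda event, member: events.append((event, member)))
registry.join("node-1")
registry.join("node-2")
registry.leave("node-1")
registry.report_crash("node-2")
```

An application that subscribes this way can react to a changing set of machines instead of assuming a fixed set at startup.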

  38. Communication Model
      ● Simple communication model
        ● Unidirectional pipes
        ● Two end points (send and receive ports)
          [Figure: a send port connected to a receive port]
      ● Connection oriented
        ● Allows streaming (good with high latency)
      ● Portable model
        ● Easy to implement on Sockets/MPI/MX/…

  39. Communication Model
      ● Flexible model!
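The send-port and receive-port model can be sketched with in-memory queues. The class and method names below are hypothetical stand-ins, not the IPL API, and no network is involved.

```python
from collections import deque


class ReceivePort:
    """Receiving end of one or more unidirectional pipes."""

    def __init__(self, name):
        self.name = name
        self._messages = deque()

    def receive(self):
        """Return the oldest pending message, or None if nothing has arrived."""
        return self._messages.popleft() if self._messages else None


class SendPort:
    """Sending end; data only ever flows from send port to receive port."""

    def __init__(self):
        self._targets = []

    def connect(self, receive_port):
        self._targets.append(receive_port)

    def send(self, message):
        if not self._targets:
            raise RuntimeError("send port is not connected")
        for target in self._targets:
            target._messages.append(message)


# One-to-many: a single send port connected to two receive ports.
left, right = ReceivePort("left"), ReceivePort("right")
broadcaster = SendPort()
broadcaster.connect(left)
broadcaster.connect(right)
broadcaster.send("hello")

# Many-to-one: two send ports connected to the same receive port.
sink = ReceivePort("sink")
first, second = SendPort(), SendPort()
first.connect(sink)
second.connect(sink)
first.send(1)
second.send(2)
```

The same two building blocks cover unicast, one-to-many, and many-to-one patterns, which is one way to read the flexibility claimed for the model.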

  40. Port Types
      ● All ports have a type
        ● Defined at runtime
        ● Specify set of capabilities
      ● Types must match when connecting!

  41. Port Types
      ● Consists of a set of capabilities:
        ● Connection patterns:
          ● Unicast, many-to-one, one-to-many, many-to-many
        ● Communication properties:
          ● FIFO ordering, numbering, reliability
        ● Serialization properties:
          ● Bytes, primitive types, objects
        ● Message delivery:
          ● Explicit receipt, automatic upcalls, polling

  42. Port Types
      ● Forces programmer to specify how each communication channel is used
        ● Prevents bugs
          ● Exception when contract is breached
        ● Allows efficient implementation to be selected
          ● Unicast only?
          ● Transfer bytes only?
          ● Can save a lot of complexity!
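Treating a port type as a set of capabilities that acts as a contract could look like this minimal sketch. The capability strings, `ContractError`, and both port classes are hypothetical and do not mirror the IPL API.

```python
class ContractError(Exception):
    """Raised when a port is used in a way its type does not allow."""


class TypedReceivePort:
    def __init__(self, port_type):
        self.port_type = frozenset(port_type)
        self.inbox = []


class TypedSendPort:
    def __init__(self, port_type):
        self.port_type = frozenset(port_type)
        self.targets = []

    def connect(self, receive_port):
        # Types must match exactly when connecting.
        if self.port_type != receive_port.port_type:
            raise ContractError("port types do not match")
        if self.targets and "one_to_many" not in self.port_type:
            raise ContractError("this port type allows a single connection only")
        self.targets.append(receive_port)

    def send(self, payload):
        if not isinstance(payload, bytes) and "objects" not in self.port_type:
            raise ContractError("this port type transfers bytes only")
        for target in self.targets:
            target.inbox.append(payload)


def breaches_contract(action):
    """Run action and report whether it raised ContractError."""
    try:
        action()
    except ContractError:
        return True
    return False


BYTES_UNICAST = {"unicast", "bytes"}
sender = TypedSendPort(BYTES_UNICAST)
receiver = TypedReceivePort(BYTES_UNICAST)
sender.connect(receiver)
sender.send(b"hi")

sent_object = breaches_contract(lambda: sender.send({"a": 1}))
wrong_type = breaches_contract(
    lambda: sender.connect(TypedReceivePort({"unicast", "objects"})))
second_target = breaches_contract(
    lambda: sender.connect(TypedReceivePort(BYTES_UNICAST)))
```

Because this sender declares unicast and bytes only, an implementation would be free to skip object serialization and multicast bookkeeping altogether, which is where the saved complexity comes from.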
