Robust Communication for Jungle Computing
Jason Maassen
Computer Systems Group, Department of Computer Science
VU University, Amsterdam, The Netherlands
Requirements (revisited)
● Resource independence
● Transparent / easy deployment
● Middleware independence & interoperability
● Jungle-aware middleware
● Jungle-aware communication
  ● Robust connectivity
  ● System-support for malleability and fault-tolerance
  ● Globally unique naming
● Transparent parallelism & application-level fault-tolerance
● Easy integration with external software
  ● MPI, OpenCL, CUDA, C, C++, scripts, …
ComplexHPC Spring School 2011
Low-level problems
● Many sites have connectivity issues
  ● Firewalls
  ● Network Address Translation (NAT)
  ● Non-routed networks
  ● Multi-homing
  ● Misconfigured machines
  ● ...
● This makes it very hard to use a combination of machines!
High-level problems
● We need more advanced features:
  ● Malleability: machines come and go during the application lifetime
  ● Fault Tolerance: machines may crash at any time
  ● Robust and globally unique naming
  ● Flexible communication primitives
  ● Multicast or many-to-one communication
  ● Efficient serialization of complex data structures
● Need to be robust!
Existing libraries
● Sockets is too low-level for daily use
  ● Only point-to-point
  ● No resource management
● MPI is too inflexible
  ● Focus on SPMD model
  ● Little/no support for malleability or fault tolerance
● Neither can handle firewalls/NAT/etc.
Ibis
● Ibis offers “Jungle proof” communication:
● SmartSockets
  ● Sockets library (on top of regular TCP/IP)
  ● Solves low-level connectivity problems
● Ibis Portability Layer (IPL)
  ● “MPI for Jungle computing”
  ● Offers high-level communication primitives
Where are we?
SmartSockets: What problems does it solve?
● Unreachable machines:
  ● Behind firewall / NAT or on private network
● Machine identification:
  ● Machines have multiple IPs
  ● Multiple machines have the same (private) IP
Problem 1: Firewalls
● Block 'inappropriate' connections
● Usually only block incoming connections
● Some also block outgoing connections
Problem 2: Network Address Translation
● Allows multiple machines to share an IP address
Problem 3: Multi-homing
● Some sites have multiple networks
● The target address depends on the source of the connection
Problem 4: Non-routed Networks
● No route between local network and internet
● Only the frontend is reachable
Problem 5: Machine Identification
● Private IPs (NAT/non-routed) lead to machine identification problems
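To make the identification problem concrete, the sketch below flags addresses that can legitimately appear on many different networks at once. It uses only the standard `java.net.InetAddress` class; the class name `AddressScope` and the method `isAmbiguous` are made up for this illustration and are not part of SmartSockets.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative helper, not part of SmartSockets.
final class AddressScope {
    // True if the address is private (site-local), loopback, or link-local,
    // meaning other networks may reuse it and it cannot identify a machine globally.
    static boolean isAmbiguous(String ipLiteral) {
        try {
            // For an IP literal, getByName only checks the format; no DNS lookup is done.
            InetAddress a = InetAddress.getByName(ipLiteral);
            return a.isSiteLocalAddress() || a.isLoopbackAddress() || a.isLinkLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguous("192.168.1.10")); // private range
        System.out.println(isAmbiguous("203.0.113.5"));  // documentation range, not private
    }
}
```

Two cluster nodes at different sites can both report `192.168.1.10`, so an address that this check flags needs extra information before it can serve as an identity.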
SmartSockets Solutions
● The SmartSockets library
  ● Detects connectivity problems
  ● Tries to solve them automatically using:
    ● Smart Addressing
    ● Side channel
  ● ... and various tricks:
    ● SSH Tunneling (pass through firewalls)
    ● STUN (detect external IP of NAT)
    ● UPnP (automatic port forwarding)
    ● ...
SmartSockets Library
● Integrates existing and new solutions into one library
  ● With as little help from the user as possible
  ● Mostly transparent to user!
● Offers a socket-like interface
  ● Addressing is different
Smart Addressing
● Instead of using a single IP:port combination for each machine we use:
  ● All machine addresses
  ● Add extra information
    ● External address + port for NAT (STUN, UPnP)
    ● SSH contact information
    ● UUID (if entire address is private)
    ● …
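The address structure listed above can be pictured as a small value class. This is a hypothetical sketch: the class `SmartAddressSketch`, its fields, and `sameMachine` are invented for illustration and do not reproduce the real SmartSockets address format (SSH contact information is left out for brevity).

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

// Hypothetical sketch; field names are illustrative, not the SmartSockets format.
final class SmartAddressSketch {
    final Set<String> addresses;   // all addresses of the machine (multi-homing)
    final String externalAddress;  // external address of the NAT, or null if unknown
    final int port;
    final UUID uuid;               // set when every address is private, else null

    SmartAddressSketch(int port, String externalAddress, UUID uuid, String... addresses) {
        this.addresses = new TreeSet<>(Arrays.asList(addresses));
        this.externalAddress = externalAddress;
        this.port = port;
        this.uuid = uuid;
    }

    // Identity check: every component must agree, including the UUID.
    boolean sameMachine(SmartAddressSketch other) {
        return port == other.port
            && addresses.equals(other.addresses)
            && Objects.equals(externalAddress, other.externalAddress)
            && Objects.equals(uuid, other.uuid);
    }
}
```

With this shape, two machines that both sit at `192.168.1.5:5000` behind different NATs still compare as different, because their UUIDs differ.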
Addressing Examples
Creating a Connection
Using Smart Addresses
● This solves machine identification problems
  ● All addresses are known with multi-homing
  ● Each identity is unique with private IPs
  ● The identity is always checked
● Assumes anyone can create a connection
  ● This will not help when target is behind NAT/Firewall
  ● To solve this we need a side channel
Side channel
● Overlay network implemented using a set of hubs
  ● Support processes for the application
  ● Started in advance
● Hubs are run on machines with 'more connectivity'
  ● Such as cluster frontends, 'open' machines, etc.
● How / where you start them is a separate problem
  ● Solved by IbisDeploy
Hubs
● Similar to a peer-to-peer overlay network
● Hubs connect to each other
  ● Gossip information about other hubs
  ● Automatically discover new hubs and routes
  ● Need to set up spanning tree (or better)
  ● Use direct connections and SSH tunnels
● Clients connect to a 'local' hub
  ● Use as side channel for connection setup
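The gossiping step can be sketched as a set union between two hubs. This is a toy model under stated assumptions: `HubSketch` and `gossipWith` are hypothetical names, hubs are identified by plain strings, and no networking is involved. The actual SmartSockets hub protocol is not shown on the slides.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model of hub discovery; not the real SmartSockets hub implementation.
final class HubSketch {
    final String name;
    final Set<String> knownHubs = new TreeSet<>();

    HubSketch(String name, String... initialPeers) {
        this.name = name;
        knownHubs.add(name);
        knownHubs.addAll(Arrays.asList(initialPeers));
    }

    // One gossip exchange: afterwards both hubs know the union of their hub lists.
    void gossipWith(HubSketch peer) {
        knownHubs.addAll(peer.knownHubs);
        peer.knownHubs.addAll(knownHubs);
    }
}
```

If hub A initially knows only B, and B knows C, a single exchange between A and B is enough for A to learn about C. This is why a spanning tree of initial connections suffices for every hub to discover all others.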
Hub Overlay Network
Advanced Connection Setup
● Reverse direction of connection setup
  ● Send message to target using hub and wait for incoming connection
  ● Results in direct connection
● Route via overlay
  ● Create virtual connection using hubs
  ● Forward all data over side channel
  ● Results in indirect connection
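The two fallback mechanisms above suggest a simple decision order, sketched below. The ordering (direct first, then reverse, then routed) and the three boolean inputs are assumptions made for this illustration; the slides do not specify how SmartSockets actually decides, and `ConnectionSetupSketch` is a hypothetical name.

```java
// Hypothetical decision logic; the ordering is an assumption, not the SmartSockets algorithm.
final class ConnectionSetupSketch {
    enum Kind { DIRECT, REVERSE, ROUTED }

    static Kind choose(boolean targetAcceptsIncoming,
                       boolean sourceAcceptsIncoming,
                       boolean hubPathExists) {
        // A plain connection works when the target is reachable.
        if (targetAcceptsIncoming) return Kind.DIRECT;
        // Otherwise ask the target, via the hubs, to connect back to the source.
        if (sourceAcceptsIncoming && hubPathExists) return Kind.REVERSE;
        // As a last resort, forward all data over the hub overlay (indirect connection).
        if (hubPathExists) return Kind.ROUTED;
        throw new IllegalStateException("no way to reach the target");
    }

    public static void main(String[] args) {
        // Target behind a firewall, source open, hubs available.
        System.out.println(choose(false, true, true));
    }
}
```

Reverse setup is preferred over routing in this sketch because it still yields a direct connection, whereas a routed connection pushes all traffic through the hubs.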
SmartSockets: All problems solved
● Unreachable machines:
  ● SSH tunnels
  ● Reverse connection setup
  ● Routing over hubs
● Machine identification:
  ● Smart addressing
  ● Identity check at connection setup
Hub Network
Evaluation
● Regular TCP/IP only worked in 6 out of 30
● SmartSockets worked in 30 out of 30!
Summary
● In Jungle computing, communication is hard!
  ● Many connectivity problems occur
  ● Takes a lot of work to find the problems and work around them
● SmartSockets reduces this to a single problem:
  ● How to set up a spanning tree of hubs
  ● The rest is done automatically!
However …
● Sockets is too low level for daily use
● For Jungle computing we need support for
  ● Malleability
  ● Fault Tolerance
  ● Robust and globally unique naming
  ● Flexible communication primitives
● Provided by the Ibis Portability Layer (IPL)
Ibis Portability Layer (IPL)
● Simple API for Jungle Communication
● Flexible communication model
  ● Connection oriented messaging
  ● Abstract addressing scheme
● Resource tracking
  ● Notifications when machines join/leave/crash
● Efficient serialization
  ● Send bytes, doubles, objects, etc.
● Portable:
  ● SmartSockets, TCP, UDP, MPI, MX, Bluetooth, …
Communication Model
● Simple communication model
  ● Unidirectional pipes
  ● Two end points (send port and receive port)
● Connection oriented
  ● Allows streaming (good with high latency)
● Portable model
  ● Easy to implement on Sockets/MPI/MX/…
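The pipe model above can be mimicked in a few lines of plain Java. This sketch deliberately does not use the real IPL classes: `PipeSketch`, `SendPort`, and `ReceivePort` here are hypothetical in-process stand-ins that show only the shape of the model (connect once, then data flows in one direction).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// In-process stand-in for the send port / receive port model; not the IPL API.
final class PipeSketch {
    static final class ReceivePort {
        private final Queue<Object> inbox = new ConcurrentLinkedQueue<>();

        // Called by a connected send port to hand over a message.
        void deliver(Object message) { inbox.add(message); }

        // Returns the oldest pending message, or null if none is waiting.
        Object receive() { return inbox.poll(); }
    }

    static final class SendPort {
        private ReceivePort peer;

        // Connection oriented: a send port must be connected before use.
        void connect(ReceivePort target) { peer = target; }

        void send(Object message) {
            if (peer == null) throw new IllegalStateException("send port is not connected");
            peer.deliver(message);
        }
    }
}
```

The send port has no way to read and the receive port has no way to write, so replies need a second pipe in the opposite direction.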
Communication Model
● Flexible model!
Port Types
● All ports have a type
  ● Defined at runtime
  ● Specify set of capabilities
● Types must match when connecting!
Port Types
● Consists of a set of capabilities:
  ● Connection patterns:
    ● Unicast, many-to-one, one-to-many, many-to-many
  ● Communication properties:
    ● FIFO ordering, numbering, reliability
  ● Serialization properties:
    ● Bytes, primitive types, objects
  ● Message delivery:
    ● Explicit receipt, automatic upcalls, polling
Port Types
● Forces programmer to specify how each communication channel is used
  ● Prevents bugs
  ● Exception when contract is breached
● Allows efficient implementation to be selected
  ● Unicast only?
  ● Transfer bytes only?
  ● Can save a lot of complexity!
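The contract idea behind port types can be sketched as a set of capabilities that is checked both at connection time and at use time. The class `PortTypeSketch`, its `Capability` enum, and both check methods are hypothetical; the capability names only loosely follow the categories listed on the slides and are not the constants of the real IPL.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of a port type; not the IPL PortType class.
final class PortTypeSketch {
    enum Capability {
        UNICAST, MANY_TO_ONE, FIFO, RELIABLE,
        SERIALIZE_BYTES, SERIALIZE_OBJECTS, EXPLICIT_RECEIPT, UPCALLS
    }

    final Set<Capability> capabilities;

    PortTypeSketch(Capability first, Capability... rest) {
        this.capabilities = EnumSet.of(first, rest);
    }

    // Types must match when connecting.
    void checkConnect(PortTypeSketch other) {
        if (!capabilities.equals(other.capabilities)) {
            throw new IllegalArgumentException("port types do not match");
        }
    }

    // Using a feature that was not declared breaches the contract.
    void checkWriteObject() {
        if (!capabilities.contains(Capability.SERIALIZE_OBJECTS)) {
            throw new IllegalStateException("port type does not allow object serialization");
        }
    }
}
```

A type that declares only `UNICAST` and `SERIALIZE_BYTES` tells the implementation it never has to set up multicast or object serialization, which is where the complexity savings come from.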