Camdoop: Exploiting In-network Aggregation for Big Data Applications
Paolo Costa (costa@imperial.ac.uk)
Joint work with Austin Donnelly, Antony Rowstron, and Greg O’Shea (MSR Cambridge)

MapReduce Overview
[Figure: input file (Chunk 0, Chunk 1, Chunk 2) → map tasks → intermediate results → reduce tasks → final results]
• Map
− Processes input data and generates (key, value) pairs
• Shuffle
− Distributes the intermediate pairs to the reduce tasks
• Reduce
− Aggregates all values associated with each key

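The three phases above can be sketched in a few lines, using word count as a hypothetical job. The function names (map_task, shuffle, reduce_task) and the in-memory lists are illustrative assumptions, not part of any MapReduce API.

```python
from collections import defaultdict

def map_task(chunk):
    """Map: emit a (key, value) pair for every word in the chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs_per_map_task, num_reducers):
    """Shuffle: route every intermediate pair to the reduce task that owns its key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in pairs_per_map_task:
        for key, value in pairs:
            # Any deterministic partitioning works; hash() is stable within one run.
            partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_task(partition):
    """Reduce: aggregate all values associated with each key."""
    return {key: sum(values) for key, values in partition.items()}

chunks = ["a b a", "b c", "a c c"]          # hypothetical input chunks
intermediate = [map_task(c) for c in chunks]
partitions = shuffle(intermediate, num_reducers=3)

final = {}
for partition in partitions:
    final.update(reduce_task(partition))    # each key lives in exactly one partition
print(final)
```

Every map task may send pairs to every reduce task, which is what makes the shuffle an all-to-all exchange.
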
Problem
[Figure: input file (Split 0, Split 1, Split 2) → map tasks → intermediate results → reduce tasks → final results]
• Shuffle phase is challenging for data center networks
− All-to-all traffic pattern with O(N²) flows
− Led to proposals for full-bisection bandwidth

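To make the quadratic growth concrete, the sketch below counts shuffle flows. It assumes (this is not stated on the slide) that each of the N servers runs one map task and one reduce task, and that a map task needs a network flow to every reduce task on a different server.

```python
def shuffle_flows(num_servers):
    """Number of network flows in an all-to-all shuffle.

    Assumption: one map task and one reduce task per server; the
    map/reduce pair that shares a server needs no network flow.
    """
    return num_servers * (num_servers - 1)

for n in (10, 100, 1000):
    print(n, shuffle_flows(n))
```
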
Data Reduction
[Figure: input file (Split 0, Split 1, Split 2) → map tasks → intermediate results → reduce tasks → final results]
• The final results are typically much smaller than the intermediate results
• In most Facebook jobs the final size is 5.4% of the intermediate size
• In most Yahoo jobs the ratio is 8.2%
How can we exploit this to reduce the traffic and improve the performance of the shuffle phase?

Background: Combiners
[Figure: input file (Split 0, Split 1, Split 2) → map tasks → combiners → intermediate results → reduce tasks → final results]
• To reduce the data transferred in the shuffle, users can specify a combiner function
− Aggregates the local intermediate pairs
• Server-side only => limited aggregation

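A minimal sketch of a combiner, again using word count as a hypothetical job. The names map_task and combiner are illustrative. Summation is used because it is associative and commutative, so partial sums computed on each server do not change the final result.

```python
from collections import Counter

def map_task(split):
    """Emit one (word, 1) pair per word in the local split."""
    return [(word, 1) for word in split.split()]

def combiner(pairs):
    """Aggregate the pairs produced on one server before they are shuffled."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

split = "a a a b a b"            # hypothetical local split
raw = map_task(split)            # pairs that would be shuffled without a combiner
combined = combiner(raw)         # pairs that are shuffled with a combiner
print(len(raw), len(combined))
```

The combiner only sees pairs produced on its own server, which is why the slide calls the aggregation limited: pairs with the same key on different servers still cross the network separately.
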
Distributed Combiners
• It has been proposed to use aggregation trees in MapReduce to perform multiple steps of combiners
− e.g., rack-level aggregation [Yu et al., SOSP’09]

Logical and Physical Topology
What happens when we map the tree to a typical data center topology?
[Figure: logical topology (aggregation tree) next to physical topology (servers attached to a ToR switch)]
• Mismatch between physical and logical topology
− Two logical links are mapped onto the same physical link
• The server link is the bottleneck
− Only 500 Mbps per child
• Full-bisection bandwidth does not help here
Camdoop Goal
• Perform the combiner functions within the network as opposed to application-level solutions
• Reduce shuffle time by aggregating packets on path

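The 500 Mbps figure follows from simple arithmetic, sketched below. The sketch assumes a 1 Gbps server link that is shared equally by the parent's children; equal sharing is an assumption made here for illustration.

```python
def per_child_mbps(server_link_mbps, in_degree):
    """Bandwidth each child gets when all children share the parent's single server link.

    Assumption: the link is divided equally among the children.
    """
    return server_link_mbps / in_degree

# Two logical links mapped onto one 1 Gbps physical link.
print(per_child_mbps(1000, 2))
```

Adding bandwidth inside the switched network does not change this number, because the shared link is the one between the parent server and its switch.
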
How Can We Perform In-network Processing?
[Figure: 3D torus with x, y and z axes; example nodes (1,2,1) and (1,2,2)]
• We exploit CamCube
− Direct-connect topology
− 3D torus
− Uses no switches / routers for internal traffic
• Servers intercept, forward and process packets
• Nodes have (x,y,z) coordinates
− This defines a key-space (=> key-based routing)
− Coordinates are locally re-mapped in case of failures
Key property
• No distinction between network and computation devices
• Servers can perform arbitrary packet processing on-path

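As an illustration of the (x,y,z) coordinate space, the sketch below lists the one-hop neighbors of a node in a 3D torus. The function name torus_neighbors and the side length of 3 are assumptions chosen for the example; this is not the CamCube API.

```python
def torus_neighbors(coord, size):
    """Return the one-hop neighbors of `coord` in a size x size x size 3D torus.

    Assumption: size >= 3, so the +1 and -1 neighbors on each axis are distinct.
    """
    x, y, z = coord
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = [x, y, z]
            n[axis] = (n[axis] + step) % size  # wrap around at the edges
            neighbors.append(tuple(n))
    return neighbors

print(torus_neighbors((1, 2, 1), size=3))
```

With these assumptions every server has six direct links, and (1,2,2) is one of the neighbors of (1,2,1).
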
Mapping a tree…
… on a switched topology
• The 1 Gbps link becomes the bottleneck (1/in-degree Gbps per child)
… on CamCube
• Packets are aggregated on path (=> less traffic)
• 1:1 mapping between logical and physical topology (1 Gbps per link)

Camdoop Design Goals
1. No change in the programming model
2. Exploit network locality
3. Good server and link load distribution
4. Fault-tolerance

Design Goal #1: Programming Model
• Camdoop adopts the same MapReduce model
• GFS-like distributed file-system
− Each server runs map tasks on local chunks
• We use a spanning tree
− Combiners aggregate the results of the local map tasks and of the children (if any) and stream the aggregated results to the parent
− The root runs the reduce task and generates the final output

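The tree-based aggregation described above can be sketched as a recursive function. This is an in-memory illustration under stated assumptions, not Camdoop's implementation: the tree, the node names and the per-node data are hypothetical, summation stands in for the user's combiner, and results are returned in one piece rather than streamed.

```python
from collections import Counter

def aggregate(node, children, local_pairs):
    """Combine a node's local map output with the combined results of its children.

    Returns the (key -> value) result that `node` would send to its parent;
    at the root this is the input of the reduce task.
    """
    result = Counter(local_pairs.get(node, {}))
    for child in children.get(node, []):
        # Counter.update adds the child's counts to the running totals.
        result.update(aggregate(child, children, local_pairs))
    return result

# Hypothetical spanning tree: root has children a and b; a has child c.
children = {"root": ["a", "b"], "a": ["c"]}
local_pairs = {
    "root": {"x": 1},
    "a": {"x": 2, "y": 1},
    "b": {"y": 3},
    "c": {"x": 1, "z": 5},
}
at_root = aggregate("root", children, local_pairs)
print(dict(at_root))
```

Each node forwards at most one value per key to its parent, regardless of how many descendants produced that key.
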
Design Goal #2: Network locality
How to map the tree nodes to servers?
[Figure: tree nodes mapped to servers (1,2,1), (1,1,1) and (1,2,2)]
• Map task outputs are always read from the local disk
• The parent-children are mapped on physical neighbors
This ensures maximum locality and optimizes network transfer

Network Locality
[Figure: logical view of the tree next to the physical view (3D torus)]
One physical link is used by one and only one logical link

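The property above can be checked mechanically for a given tree. The sketch below assumes a torus of side length at least 3 and treats links as undirected; the helper names and the example edges are hypothetical, and the parent/child roles assigned to the coordinates are chosen only for illustration.

```python
def is_one_hop(a, b, size):
    """True if servers a and b are direct neighbors in a size x size x size torus."""
    diffs = [min((p - q) % size, (q - p) % size) for p, q in zip(a, b)]
    return sorted(diffs) == [0, 0, 1]

def uses_each_link_once(tree_edges, size):
    """Check that every (child, parent) edge is a physical link and no link is reused."""
    seen = set()
    for child, parent in tree_edges:
        link = frozenset((child, parent))  # undirected physical link
        if not is_one_hop(child, parent, size) or link in seen:
            return False
        seen.add(link)
    return True

# Hypothetical mapping: (1,2,1) is the parent of (1,1,1) and (1,2,2).
good = [((1, 1, 1), (1, 2, 1)), ((1, 2, 2), (1, 2, 1))]
# (1,1,1) and (1,2,2) differ in two coordinates, so they are not neighbors.
bad = [((1, 1, 1), (1, 2, 2))]
print(uses_each_link_once(good, size=3), uses_each_link_once(bad, size=3))
```
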
Design Goal #3: Load Distribution
• Only 1 Gbps (instead of 6)
• Different in-degree
Poor server load distribution

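One way to see the uneven load is to count, for each tree node, how many children stream results into it. The sketch below uses a hypothetical child-to-parent map; the node names are illustrative.

```python
from collections import Counter

def in_degrees(parent_of):
    """Count how many children stream results into each tree node."""
    degree = Counter({node: 0 for node in parent_of})
    for child, parent in parent_of.items():
        if parent is not None:
            degree[parent] += 1
    return degree

# Hypothetical tree: root has three children, one of which has a child of its own.
parent_of = {"root": None, "a": "root", "b": "root", "c": "root", "d": "a"}
print(dict(in_degrees(parent_of)))
```

Nodes with a higher in-degree have more incoming streams to combine, while leaves only process their local map output.
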