Transformations
• Transformations + Annotations allow Stubby to support different interfaces by being external to any interface
• Annotations ensure only valid transformations are considered
• Transformations can be combined (whole >> sum of parts!)
• Stubby considers 5 types of transformations (more to come)
[Figure: a transformation rewrites jobs J1 (M1, R1) and J2 (M2, R2), shown with datasets D0 1, D0 2, D1, and D2, into a single job J1-2]

Intra-Job Vertical Packing
• Transforms a MapReduce job into a Map-only job
• Group/Partition requirements of both jobs are now enforced at the same time
[Figure: two consecutive MapReduce jobs over records such as <51,2>, <51,1>, <50,1>. Before: the first job uses hash (O,Z) and sort (O,Z) with J.K2 = {O,Z}; the second uses hash (O) and sort (O) with J.K2 = {O}. After the transformation: the first job partitions with hash (O) while still sorting on (O,Z), so the second job can run as a Map-only job.]

Intra-job Vertical Packing (2)
• Can have a positive or negative effect on performance -> Need cost-based approach
+ Eliminates inter-task data transfer
+ Eliminates sorting overhead
+ Eliminates writing output to disk
- Forces dependencies of configurations (e.g., parallelism)
- Resource contention (more functions in a task)
[Figure: bar chart of Speedup (y-axis from 0 to 3) with regions labeled Performance Degradation and Performance Improvement]

Inter-job Vertical Packing
• Merges a map-only job with another job
• Combining intra-job + inter-job vertical packing turns 2 MapReduce jobs into 1 MapReduce job
• Again, not always a good thing
• + Eliminates writing to disk
• - Forces dependencies
[Figure: a MapReduce job (M, R) and a map-only job (M), before and after the transformation]

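One way to picture what merging a map-only job into another job means is function composition inside a single task. The sketch below is only an illustration under assumptions: `pack_functions`, `tokenize`, and `keep_long_words` are hypothetical names, and it does not show how Stubby itself implements the transformation. It shows why the intermediate dataset no longer needs to be written to disk.

```python
def pack_functions(first_fn, second_fn):
    """Return one function that feeds every record emitted by first_fn into second_fn."""
    def packed(key, value):
        for mid_key, mid_value in first_fn(key, value):
            # No intermediate dataset is written: records flow in memory.
            yield from second_fn(mid_key, mid_value)
    return packed


# Hypothetical map-only job: split a line into (word, 1) pairs.
def tokenize(_key, line):
    for word in line.split():
        yield word, 1


# Hypothetical map function of the job it is merged with:
# keep only words longer than 3 letters.
def keep_long_words(word, count):
    if len(word) > 3:
        yield word, count


packed_map = pack_functions(tokenize, keep_long_words)
result = list(packed_map(None, "a quick brown fox"))
print(result)
```

The packed function also illustrates the "forces dependencies" cost: both original functions now run in the same task and share its parallelism and memory.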
Horizontal Packing
• Combines concurrently running jobs into a single job
• + Read dataset once
• + Share overhead of launching jobs
• - Extra overhead of sorting/partitioning combined map output
• - Share limited memory resources per task (can spill more)
[Figure: three concurrent MapReduce jobs (M, R), before and after the transformation combines them]

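The trade-off listed above can be sketched with a toy in-memory runner. Everything here is an assumption made for illustration (the function `run_horizontally_packed`, the job tags, and the sample records); it is not Stubby's or Hadoop's API. Each record is read once and passed to every job's map function, and keys are tagged so that the combined map output can still be grouped per job.

```python
from collections import defaultdict


def run_horizontally_packed(dataset, jobs):
    """Run several jobs over `dataset` in one pass.

    `jobs` maps a job tag to a (map_fn, reduce_fn) pair.
    Returns (outputs per job tag, number of records read).
    """
    groups = defaultdict(list)
    records_read = 0
    for record in dataset:  # single scan shared by all jobs
        records_read += 1
        for tag, (map_fn, _reduce_fn) in jobs.items():
            for key, value in map_fn(record):
                # Tag each key so the jobs' outputs stay separate.
                groups[(tag, key)].append(value)

    outputs = {tag: {} for tag in jobs}
    for tag, key in sorted(groups):  # combined sort over all jobs' map output
        reduce_fn = jobs[tag][1]
        outputs[tag][key] = reduce_fn(key, groups[(tag, key)])
    return outputs, records_read


# Two hypothetical jobs that read the same dataset.
jobs = {
    "sum_by_key": (lambda rec: [(rec[0], rec[1])], lambda key, values: sum(values)),
    "count_all": (lambda rec: [("all", 1)], lambda key, values: sum(values)),
}
dataset = [("a", 3), ("b", 5), ("a", 4)]
outputs, records_read = run_horizontally_packed(dataset, jobs)
print(outputs, records_read)
```

The single `groups` dictionary stands in for the shared sort buffer: it holds the map output of all packed jobs at once, which is where the extra sorting cost and memory pressure come from.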
Partition Function
• Changes how map outputs are partitioned and sorted
• Enables partition pruning
• Enables vertical packing transformation
[Figure: before, the producer job partitions with hash (O) and the consumer job applies filter={0<=O<100}; after the transformation, the producer partitions with range (O) using split-points (100,200,…) and the consumer keeps the same filter]

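A small sketch can show why switching from hash to range partitioning enables partition pruning for the filter 0<=O<100. The assumptions are labeled: the slide elides the split points after 200, so the value 300 is made up; 4 partitions and integer keys 0 to 399 are also made up; and `key % num_partitions` is only a deterministic stand-in for a real hash partitioner.

```python
from bisect import bisect_right


def hash_partition(key, num_partitions):
    # Stand-in for a hash partitioner; modulo keeps the example deterministic.
    return key % num_partitions


def range_partition(key, split_points):
    # Partition i holds keys in [split_points[i-1], split_points[i]).
    return bisect_right(split_points, key)


def partitions_to_read(low, high, keys, partition_fn):
    """Partitions holding at least one key with low <= key < high."""
    return sorted({partition_fn(k) for k in keys if low <= k < high})


split_points = [100, 200, 300]  # 300 is an assumed continuation of (100,200,…)
keys = range(0, 400)            # assumed key domain for column O

hash_parts = partitions_to_read(0, 100, keys, lambda k: hash_partition(k, 4))
range_parts = partitions_to_read(0, 100, keys, lambda k: range_partition(k, split_points))
print(hash_parts)   # every partition may hold matching keys
print(range_parts)  # only the first range partition can match
```

With hash partitioning the consumer has to scan every partition, while with range partitioning it can skip all partitions whose key range cannot satisfy the filter.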
Configuration Transformation
• Changes the configuration of a MapReduce job
• Many configurations affect performance (e.g., sort buffer, compression, combiner, reduce tasks, etc.)
• Impact of a configuration depends on other transformations (interaction)
[Figure: the same job under two configurations: Memory Buffer 512MB vs. Memory Buffer 128MB, and 2 Reduce Tasks vs. 4 Reduce Tasks]

Next
[Figure: overview diagram of MapReduce Workflow Optimization Challenges with the labels Many Interfaces, Information Spectrum, Large Plan Space, Interactions, Transformations, and Annotations]

Optimization Process
• Optimization unit localizes interactions among plan space choices
• Top-Down because producer jobs affect the input datasets of consumer jobs
• Dynamically generated because previous optimization unit transforms workflow
[Figure: a workflow of jobs J1 to J7 over datasets D0 1, D0 2, and D1 to D7, traversed through optimization units U(1), U(2), and U(4); along the way J1 and J2 become J1-2, and J3 to J7 become J3-7]

Divide and Conquer
• Divide workflow into Optimization Units to have smaller plan spaces
• Issue: Interactions among plan space choices
• Insight: Divide based on Dataset and Resource dependencies
[Figure: jobs J5 (M5, R5) and J6 (M6, R6) with datasets D5 and D6, followed by job J7 (M7, R7) with dataset D7]
• Dataset dependencies: divide into producer-consumer relationships
• Transformations on producer jobs affect transformations on consumer jobs
• E.g., partition function on J5 -> vertical packing on J7; compressing D5 forces J7 to decompress
• Resource dependencies: concurrent jobs use the same cluster resources
• E.g., affect configuration and horizontal packing transformations

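The two kinds of dependencies can be derived mechanically from which datasets each job reads and writes. The sketch below is a hedged illustration, not Stubby's unit-generation algorithm: the function names are hypothetical, and the workflow fragment assumes that J5 and J6 both read D4 and that J7 reads D5 and D6, which the figure suggests but the text does not state.

```python
from itertools import combinations


def producer_consumer_pairs(jobs):
    """`jobs` maps a job name to (input datasets, output datasets)."""
    producer_of = {ds: name for name, (_ins, outs) in jobs.items() for ds in outs}
    pairs = {
        (producer_of[ds], name)
        for name, (ins, _outs) in jobs.items()
        for ds in ins
        if ds in producer_of
    }
    return sorted(pairs)


def downstream(pairs, start):
    """All jobs reachable from `start` by following producer -> consumer edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for producer, consumer in pairs:
            if producer == node and consumer not in seen:
                seen.add(consumer)
                stack.append(consumer)
    return seen


def concurrent_pairs(jobs):
    """Job pairs with no dataset dependency in either direction."""
    pairs = producer_consumer_pairs(jobs)
    return [
        (a, b)
        for a, b in combinations(sorted(jobs), 2)
        if b not in downstream(pairs, a) and a not in downstream(pairs, b)
    ]


# Assumed inputs and outputs for the J5, J6, J7 fragment.
workflow = {
    "J5": (["D4"], ["D5"]),
    "J6": (["D4"], ["D6"]),
    "J7": (["D5", "D6"], ["D7"]),
}
print(producer_consumer_pairs(workflow))  # dataset dependencies
print(concurrent_pairs(workflow))         # candidates for resource dependencies
```

Producer-consumer pairs are where partition-function and vertical-packing choices interact, while concurrent pairs are where configuration and horizontal-packing choices interact.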
Within an Optimization Unit
• Enumerate all valid combinations of packing transformations
• Use Starfish’s What-If Engine [Herodotou VLDB 2011] for costing
• Use Recursive Random Search [Ye SIGMETRICS 03] to find configurations with best cost for each combination
[Figure: an optimization unit with M3, R3, and M4 over datasets D1 to D4, and its four candidate plans p1 to p4, each packing M3, R3, and M4 differently]

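The search loop described above can be sketched as follows, with heavy simplifications that should be read as assumptions. The toy cost functions stand in for the What-If Engine, whose real interface is not shown in these slides. The search routine is a reduced variant of Recursive Random Search (random sampling followed by resampling in a shrinking box around the best point, without the restart and re-alignment steps of the published algorithm). The parameter names `buffer_mb` and `reducers` and all numbers are made up.

```python
import random


def recursive_random_search(cost, bounds, rng, samples=20, rounds=6, shrink=0.5):
    """Simplified RRS: sample the whole space, then resample a shrinking box
    centered on the best configuration found so far."""
    def sample(box):
        return {name: rng.uniform(lo, hi) for name, (lo, hi) in box.items()}

    best = min((sample(bounds) for _ in range(samples)), key=cost)
    box = dict(bounds)
    for _ in range(rounds):
        # Shrink the box around the current best, clipped to the global bounds.
        box = {
            name: (
                max(bounds[name][0], best[name] - shrink * (hi - lo) / 2),
                min(bounds[name][1], best[name] + shrink * (hi - lo) / 2),
            )
            for name, (lo, hi) in box.items()
        }
        candidate = min((sample(box) for _ in range(samples)), key=cost)
        if cost(candidate) < cost(best):
            best = candidate
    return best, cost(best)


def make_toy_cost(base, best_buffer_mb, best_reducers):
    """Toy stand-in for a cost model: a bowl with its minimum at the given settings."""
    def cost(config):
        return (
            base
            + ((config["buffer_mb"] - best_buffer_mb) / 256) ** 2
            + ((config["reducers"] - best_reducers) / 4) ** 2
        )
    return cost


bounds = {"buffer_mb": (64, 1024), "reducers": (1, 16)}
# One toy cost function per enumerated packing combination p1..p4.
plan_costs = {
    "p1": make_toy_cost(40, 512, 8),
    "p2": make_toy_cost(6, 256, 4),
    "p3": make_toy_cost(25, 128, 2),
    "p4": make_toy_cost(30, 512, 2),
}
rng = random.Random(0)
results = {plan: recursive_random_search(cost, bounds, rng) for plan, cost in plan_costs.items()}
best_plan = min(results, key=lambda plan: results[plan][1])
print(best_plan, results[best_plan])
```

The structure mirrors the slide: an outer enumeration over packing combinations, an inner configuration search per combination, and a final comparison of the best cost found for each. A real search would also round integer parameters such as the number of reduce tasks.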