Repetitive Patterns
• Advanced data analytics are iterative in nature.
  – Machine learning, graph processing, image recognition, etc.
• This results in repetitive patterns in the control plane.
  – Similar tasks execute with minor differences.

    while (error > threshold_e) {
      while (gradient > threshold_g) {
        // Optimization code block
        gradient = Gradient(tdata, coeff, param)
        coeff += gradient
      }
      // Estimation code block
      error = Estimate(edata, coeff, param)
      param = update_model(param, error)
    }

[Figure: two stages. Iterative Optimizer (Training Data, Coefficients) and Error Estimation (Estimation Data, Parameters).]
This talk
• Control Plane: the Emerging Bottleneck
• Design Scope of the Control Plane
• Execution Templates
• Nimbus: a Framework with Templates
• Evaluation
Execution Templates
• Tasks are cached as parameterizable blocks on nodes.
• Instead of assigning the tasks from scratch, templates are instantiated by filling in only the changing parameters.
[Figure: a cached block of three task records (Task id, Data list, Dep. list, Function, Parameter); "Load New" supplies only the new task ids (T1, T2, T3) and parameters (P1, P2, P3).]
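The instantiation step can be sketched in a few lines of Python. This is a minimal illustration, not the Nimbus implementation: the class names `TaskSlot`, `Task`, and `Template`, the `scale` function, and the choice to store dependencies as slot indices are all assumptions made for the example. Only the field names (task id, data list, dep. list, function, parameter) come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TaskSlot:
    """One cached task: the fields that stay fixed across instantiations."""
    function: Callable[[float], float]
    data_list: Tuple[str, ...]   # names of the data objects the task accesses
    dep_slots: Tuple[int, ...]   # dependencies, as indices of slots in this template


@dataclass(frozen=True)
class Task:
    """A concrete task, with the same fields as the slide's task record."""
    task_id: int
    function: Callable[[float], float]
    data_list: Tuple[str, ...]
    dep_list: Tuple[int, ...]    # concrete task ids
    parameter: float


class Template:
    def __init__(self, slots: List[TaskSlot]) -> None:
        self.slots = slots

    def instantiate(self, task_ids: List[int], params: List[float]) -> List[Task]:
        """Fill in only the changing fields: task ids and parameters."""
        if not (len(task_ids) == len(params) == len(self.slots)):
            raise ValueError("need one task id and one parameter per cached task")
        return [
            Task(
                task_id=task_ids[i],
                function=slot.function,
                data_list=slot.data_list,
                # Dependencies are rewritten to refer to the new task ids.
                dep_list=tuple(task_ids[d] for d in slot.dep_slots),
                parameter=params[i],
            )
            for i, slot in enumerate(self.slots)
        ]


def scale(x: float) -> float:  # hypothetical task function
    return 2 * x


# Two cached tasks; the second depends on the first.
template = Template([
    TaskSlot(scale, ("tdata",), ()),
    TaskSlot(scale, ("coeff",), (0,)),
])
first = template.instantiate([1, 2], [0.5, 0.1])
second = template.instantiate([3, 4], [0.4, 0.2])
```

Each call to `instantiate` sends only a list of ids and a list of parameters; the function, data list, and dependency structure are reused from the cached block.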
Execution Templates: Mechanisms Summary
• Instantiation: spawns a block of tasks without processing each task individually from scratch. It increases task throughput.
• Edits: modify the content of a template at the granularity of tasks. They enable fine-grained, dynamic scheduling.
• Patches: fix up the worker state when it does not match the preconditions of the template. They enable dynamic control flow.
Execution Model
[Figure, built up over several slides: Driver Program (data flow: Map, Reduce); Controller (Task Graph); two Workers holding Data Objects; a task command C carrying Task id, Data list, Dep. list, Function, Parameter; Data Exchange between the workers.]
Repetitive Patterns
[Figure, built up over several slides: the driver program runs the nested loops from the first Repetitive Patterns slide. On every iteration the controller sends the same task commands (Task id, Data list, Dep. list, Function, Parameter) to the workers, followed by the same Data Exchange.]
Execution Templates: Abstraction
[Figure, built up over several slides: Controller (Task Graph) and two Workers holding Data Objects. Each worker caches a Template; the controller then sends Instantiate<params> to each worker.]
Execution Templates: The devil is in the details
• Caching tasks implies static behavior:
  – Templates and dynamic scheduling?
    • Reactive scheduling changes for load balancing.
    • Scheduling changes at the task granularity.
  – Templates and dynamic control flow?
    • Need to support nested loops.
    • Need to support data-dependent branches.
Execution Templates: Edits
• If scheduling changes, even slightly, the templates become obsolete.
  – For example, when migrating tasks among workers.
• Instead of paying the substantial cost of installing new templates for every change, templates allow edits that change their structure.
• Edits add or remove tasks from a template and modify the template content in place.
• The controller has a global view of the task graph, so it can update the dependencies that the edits affect.
Execution Templates: Edits
[Figure, built up over several slides: to migrate one task between workers, the controller sends Edit<add> and Edit<remove> to the two workers' templates; later Instantiate<params> calls use the edited templates.]
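A task migration expressed as a pair of edits might look like the sketch below. It is an illustration under stated assumptions rather than the actual protocol: `WorkerTemplate`, `Controller`, `edit_add`, `edit_remove`, and `migrate` are hypothetical names, and a template is reduced to a map from task name to dependency names.

```python
from typing import Dict, Set


class WorkerTemplate:
    """Hypothetical per-worker template: task name -> names of tasks it depends on."""

    def __init__(self, tasks: Dict[str, Set[str]]) -> None:
        self.tasks = {name: set(deps) for name, deps in tasks.items()}

    def edit_add(self, name: str, deps: Set[str]) -> None:
        self.tasks[name] = set(deps)

    def edit_remove(self, name: str) -> None:
        del self.tasks[name]


class Controller:
    """Holds the global task graph, so it knows the dependencies an edit needs."""

    def __init__(
        self,
        graph: Dict[str, Set[str]],
        placement: Dict[str, str],
        templates: Dict[str, WorkerTemplate],
    ) -> None:
        self.graph = graph            # task -> dependencies
        self.placement = placement    # task -> worker name
        self.templates = templates    # worker name -> cached template

    def migrate(self, task: str, dst: str) -> None:
        """Move one task with two in-place edits; no template is reinstalled."""
        src = self.placement[task]
        if src == dst:
            return
        self.templates[src].edit_remove(task)
        self.templates[dst].edit_add(task, self.graph[task])
        self.placement[task] = dst


graph = {"g0": set(), "g1": set(), "sum": {"g0", "g1"}}
placement = {"g0": "w1", "g1": "w1", "sum": "w2"}
templates = {
    "w1": WorkerTemplate({"g0": graph["g0"], "g1": graph["g1"]}),
    "w2": WorkerTemplate({"sum": graph["sum"]}),
}
controller = Controller(graph, placement, templates)
controller.migrate("g1", "w2")   # load balancing: move one task from w1 to w2
```

The tasks that were not migrated are left untouched in both templates, which is the point of editing at task granularity.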
Execution Templates: Granularity
[Figure, built up over several slides: the Iterative Optimizer (Training Data, Coefficients) and Error Estimation (Estimation Data, Parameters) stages, with a single Template drawn around them.]
• The more tasks cached in a template, the better.
  – The cost of template instantiation is amortized over a greater number of tasks.
  – But loop unrolling only works for static control flow.
• Such a template cannot be reused when, for example, the inner loop runs for only two iterations.
Execution Templates: Granularity
• Templates cannot go beyond a branch in the driver program.
• Execution templates operate at the granularity of basic blocks:
  – A basic block is a code block with a single entry and no branches except at the end.
  – It is the biggest block that can be cached without sacrificing dynamic control flow.
Execution Templates: Granularity
[Figure, built up over several slides: Template 1 covers the Iterative Optimizer block and is instantiated repeatedly; Template 2 covers the Error Estimation block and is instantiated once.]
Execution Templates: Granularity
[Figure: the controller brackets the task commands of a basic block with StartTemplate and EndTemplate; each worker then holds the resulting Template next to its Data Objects.]
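The basic-block granularity can be sketched as follows. This is a toy model, not the real controller: the `Controller` class, `run_block`, the task names, and the loop trip counts are assumptions chosen for illustration. Only the StartTemplate/EndTemplate bracketing and the two basic blocks (optimization and estimation) come from the slides.

```python
from typing import Dict, List, Optional


class Controller:
    def __init__(self) -> None:
        self.templates: Dict[str, List[str]] = {}
        self.recording: Optional[str] = None
        self.messages = 0               # control messages sent to workers
        self.executed: List[str] = []   # tasks run, in order

    def start_template(self, name: str) -> None:
        self.recording = name
        self.templates[name] = []

    def end_template(self, name: str) -> None:
        assert self.recording == name
        self.recording = None

    def submit(self, function: str) -> None:
        """Schedule one task individually: one control message per task."""
        if self.recording is not None:
            self.templates[self.recording].append(function)
        self.messages += 1
        self.executed.append(function)

    def instantiate(self, name: str) -> None:
        """Spawn the whole cached basic block with a single control message."""
        self.messages += 1
        self.executed.extend(self.templates[name])


def run_block(ctrl: Controller, name: str, functions: List[str]) -> None:
    """Record the block the first time it runs; instantiate it afterwards."""
    if name in ctrl.templates:
        ctrl.instantiate(name)
    else:
        ctrl.start_template(name)
        for function in functions:
            ctrl.submit(function)
        ctrl.end_template(name)


OPTIMIZE = ["Gradient", "UpdateCoeff"]    # inner-loop basic block
ESTIMATE = ["Estimate", "UpdateModel"]    # outer-loop basic block

ctrl = Controller()
# The inner loop runs a different number of times on each outer iteration;
# the trip counts (3, then 2) stand in for a data-dependent loop condition.
for inner_iterations in (3, 2):
    for _ in range(inner_iterations):
        run_block(ctrl, "optimize", OPTIMIZE)
    run_block(ctrl, "estimate", ESTIMATE)
```

Because each template stops at the loop branch, the same two templates serve both outer iterations even though the inner loop's trip count changes. A template that unrolled three inner iterations could not have been reused for the second outer iteration.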
Execution Templates: Patching
• With dynamic control flow, a basic block can be entered along different paths.
• The execution state is therefore not the same every time the block is entered.
Execution Templates: Patching
[Figure, built up over several slides: Template 1 is instantiated on two different entries into the Iterative Optimizer block. Note on the last build: updated model parameters only on the reducer.]
Execution Templates: Patching
• Each template has a set of preconditions that must be satisfied before it can be instantiated.
  – For example, the set of data objects in memory that are accessed by the tasks cached in the template.
• The worker state might not match the preconditions of the template in all circumstances.
• The controller patches the worker state before template instantiation to satisfy the preconditions.
[Figure: each worker's Template carries a set of Preconditions.]
Execution Templates: Patching
[Figure, built up over several slides: the controller first sends Patch<load> to a worker whose state does not match its template's Preconditions, then sends Instantiate<params> to both workers.]
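The precondition check and patch can be sketched as below. It is a simplified model under stated assumptions: `Worker`, `instantiate_with_patch`, and the object names are hypothetical, and the only precondition modeled is the one the slides give as an example, namely which data objects are in worker memory.

```python
from typing import Dict, List, Set


class Worker:
    def __init__(self, in_memory: Set[str]) -> None:
        self.in_memory = set(in_memory)   # data objects currently loaded
        self.log: List[str] = []          # commands received, in order

    def load(self, obj: str) -> None:
        self.in_memory.add(obj)
        self.log.append(f"Patch<load {obj}>")

    def instantiate(self, template: str, params: Dict[str, float]) -> None:
        self.log.append(f"Instantiate<{template}>")


def instantiate_with_patch(
    worker: Worker,
    template: str,
    preconditions: Set[str],
    params: Dict[str, float],
) -> List[str]:
    """Controller side: patch the worker until the preconditions hold, then instantiate."""
    missing = sorted(preconditions - worker.in_memory)
    for obj in missing:
        worker.load(obj)
    worker.instantiate(template, params)
    return missing


# The optimization block needs these objects in memory on the worker.
preconditions = {"tdata", "coeff", "param"}

# After the estimation step the updated parameters exist only on the reducer,
# so this worker re-enters the inner loop without "param".
mapper = Worker({"tdata", "coeff"})
first_patch = instantiate_with_patch(mapper, "optimize", preconditions, {"step": 0.1})
second_patch = instantiate_with_patch(mapper, "optimize", preconditions, {"step": 0.1})
```

The first instantiation needs one patch; the second enters the block with the preconditions already satisfied and needs none, so the common path stays a single message.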