Fast, Distributed Computations in the Cloud

Fast, Distributed Computations in the Cloud
Omid Mashayekhi
Advisor: Philip Levis
April 7, 2017

Cloud Frameworks
[Figure: Machine Learning, Streaming, Graph, SQL, and other workloads on top of a Cloud Framework.]
Cloud frameworks abstract away the complexities


  1. Repetitive Patterns
     • Advanced data analytics are iterative in nature.
       – Machine learning, graph processing, image recognition, etc.
     • This results in repetitive patterns in the control plane.
       – Similar tasks execute with minor differences.

  2. Repetitive Patterns
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation).]

         while (error > threshold_e) {
           while (gradient > threshold_g) {
             // Optimization code block
             gradient = Gradient(tdata, coeff, param)
             coeff += gradient
           }
           // Estimation code block
           error = Estimate(edata, coeff, param)
           param = update_model(param, error)
         }
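
To make the "similar tasks with minor differences" point concrete, here is a minimal Python sketch. It assumes one pass through the optimization block is submitted as one `Gradient` task per data partition plus a reduction task; the partitioning, the `SumGradients` name, and the task dictionary layout are illustrative assumptions, not taken from the slides. Two consecutive passes produce blocks that differ only in their task ids.

```python
from itertools import count

def gradient_block(ids, partitions):
    """Tasks submitted for one pass through the optimization block."""
    tasks = [{"id": next(ids), "fn": "Gradient", "data": f"tdata_{p}", "deps": []}
             for p in range(partitions)]
    # Hypothetical reduction task that combines the per-partition gradients.
    tasks.append({"id": next(ids), "fn": "SumGradients", "data": "coeff",
                  "deps": [t["id"] for t in tasks]})
    return tasks

def shape(tasks):
    """The block with task ids made relative: the part that repeats."""
    base = tasks[0]["id"]
    return [(t["fn"], t["data"], [d - base for d in t["deps"]]) for t in tasks]

ids = count()
first = gradient_block(ids, partitions=4)
second = gradient_block(ids, partitions=4)

# Same functions, data, and dependency structure; only the ids are new.
same_shape = shape(first) == shape(second)
fresh_ids = {t["id"] for t in first}.isdisjoint(t["id"] for t in second)
print(same_shape, fresh_ids)
```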

  3. This talk
     • Control Plane: the Emerging Bottleneck
     • Design Scope of the Control Plane
     • Execution Templates
     • Nimbus: a Framework with Templates
     • Evaluation

  4. Execution Templates
     • Tasks are cached as parameterizable blocks on nodes.
     • Instead of assigning the tasks from scratch, templates are instantiated by filling in only the changing parameters.
     [Figure: three cached tasks, each with the fields Task id, Data list, Dep. list, Function, Parameter.]

  5. Execution Templates
     [Figure: the same three cached tasks; "Load New Task ids" fills in T1, T2, T3 and "Load New Parameters" fills in P1, P2, P3.]
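
The instantiation step can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the Nimbus API: `CachedTask`, `Template`, and `instantiate` are hypothetical names, and dependencies are stored as positions inside the template so that they can be rewritten with whatever fresh task ids are supplied. The function and data list stay cached; only the task ids and parameters are filled in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CachedTask:
    function: str
    data: tuple   # names of the data objects the task accesses
    deps: tuple   # positions of earlier tasks in the same template

class Template:
    """A cached block of tasks (hypothetical structure)."""

    def __init__(self, tasks):
        self.tasks = tuple(tasks)

    def instantiate(self, task_ids, params):
        """Fill in only what changes: new task ids and new parameters."""
        if not (len(task_ids) == len(params) == len(self.tasks)):
            raise ValueError("need one task id and one parameter per cached task")
        return [
            {"id": tid, "function": t.function, "data": t.data,
             "deps": [task_ids[i] for i in t.deps], "param": p}
            for t, tid, p in zip(self.tasks, task_ids, params)
        ]

template = Template([
    CachedTask("Gradient", ("tdata_0", "coeff"), ()),
    CachedTask("Gradient", ("tdata_1", "coeff"), ()),
    CachedTask("SumGradients", ("coeff",), (0, 1)),
])
run_1 = template.instantiate(task_ids=[10, 11, 12], params=[0.5, 0.5, None])
run_2 = template.instantiate(task_ids=[13, 14, 15], params=[0.25, 0.25, None])
```

Each call reuses the cached structure, so the per-iteration message only has to carry the id and parameter lists.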

  6. Execution Templates: Mechanisms Summary
     • Instantiation: spawns a block of tasks without processing each task individually from scratch. It helps increase the task throughput.
     • Edits: modify the content of each template at the granularity of tasks. They enable fine-grained, dynamic scheduling.
     • Patches: applied when the state of the worker does not match the preconditions of the template. They enable dynamic control flow.

  7. Execution Model
     [Figure: Driver Program (data flow: Data, Map, Reduce), Controller, and two Workers.]

  8. Execution Model
     [Figure: same as the previous slide, with a Task Graph added.]

  9. Execution Model
     [Figure: same as the previous slide, with Data Objects added for each worker.]

  10. Execution Model
     [Figure: same as the previous slide, with a marker labeled C added.]

  11. Execution Model
     [Figure: same as the previous slide, with a task description added (Task id, Data list, Dep. list, Function, Parameter).]

  12. Execution Model
     [Figure: same as the previous slide, with a Data Exchange added.]

  13. Repetitive Patterns
     [Figure: Controller (Task Graph), Driver Program, and two Workers with Data Objects. The driver program is:]

         while (error > threshold_e) {
           while (gradient > threshold_g) {
             // Optimization code block
             gradient = Gradient(tdata, coeff, param)
             coeff += gradient
           }
           // Estimation code block
           error = Estimate(edata, coeff, param)
           param = update_model(param, error)
         }

  14. Repetitive Patterns
     [Figure: same driver program and layout, with a task description (Task id, Data list, Dep. list, Function, Parameter).]

  15. Repetitive Patterns
     [Figure: same driver program and layout, with the task description and a Data Exchange.]

  18. Execution Templates: Abstraction
     [Figure: Controller (Task Graph) and two Workers with Data Objects.]

  19. Execution Templates: Abstraction
     [Figure: same as the previous slide, with a Template added for each worker.]

  20. Execution Templates: Abstraction
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects and a Template.]

  21. Execution Templates: Abstraction
     [Figure: same as the previous slide, with Instantiate<params> for each template.]

  24. Execution Templates: The Devil is in the details.
     • Caching tasks implies static behavior:
       – Templates and dynamic scheduling?
         • Reactive scheduling changes for load balancing.
         • Scheduling changes at the task granularity.
       – Templates and dynamic control flow?
         • Need to support nested loops.
         • Need to support data-dependent branches.

  26. Execution Templates: Edits
     • If scheduling changes, even slightly, the templates become obsolete.
       – For example, when tasks migrate among workers.
     • Instead of paying the substantial cost of installing templates for every change, templates allow edits that change their structure.
     • Edits enable adding or removing tasks from the template and modifying the template content in place.
     • The controller has the general view of the task graph, so it can properly update the dependencies that the edits require.
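
A minimal Python sketch of a task migration expressed as edits follows. It assumes the controller tracks which worker's template holds each task; the `Controller` class, its method names, and the task names are hypothetical and only illustrate the idea. One migration becomes a remove edit on the source template and an add edit on the destination template, and the controller uses its view of the dependencies to work out which inputs now live on another worker.

```python
class Controller:
    """Tracks task placement and per-worker templates (hypothetical sketch)."""

    def __init__(self, placement, deps):
        self.placement = dict(placement)   # task -> worker
        self.deps = deps                   # task -> tasks it depends on
        self.templates = {}                # worker -> tasks cached in its template
        for task, worker in self.placement.items():
            self.templates.setdefault(worker, set()).add(task)

    def migrate(self, task, dst):
        """Move one task by editing two templates in place, not reinstalling them."""
        src = self.placement[task]
        self.templates[src].discard(task)
        self.templates.setdefault(dst, set()).add(task)
        self.placement[task] = dst
        return [(src, "remove", task), (dst, "add", task)]

    def remote_deps(self, task):
        """Dependencies of `task` placed on a different worker."""
        here = self.placement[task]
        return [d for d in self.deps.get(task, []) if self.placement[d] != here]

controller = Controller(
    placement={"g0": "w1", "g1": "w1", "g2": "w2", "sum": "w2"},
    deps={"sum": ["g0", "g1", "g2"]},
)
before = controller.remote_deps("sum")
edits = controller.migrate("g1", "w2")
after = controller.remote_deps("sum")
```

Only the two affected templates are touched; every other cached task stays as it was.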

  27. Execution Templates: Edits
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects and a Template. Caption: migrate one task.]

  28. Execution Templates: Edits
     [Figure: same as the previous slide, with Edit<add> and Edit<remove> for the two templates.]

  29. Execution Templates: Edits
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects and a Template.]

  30. Execution Templates: Edits
     [Figure: same as the previous slide, with Instantiate<params> for each template.]

  31. Execution Templates: The Devil is in the details.
     • Caching tasks implies static behavior:
       – Templates and dynamic scheduling?
         • Reactive scheduling changes for load balancing.
         • Scheduling changes at the task granularity.
       – Templates and dynamic control flow?
         • Need to support nested loops.
         • Need to support data-dependent branches.

  32. Execution Templates: Granularity
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation).]

  33. Execution Templates: Granularity
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation).]
     • The more tasks cached in the template, the better.
       – The cost of template instantiation is amortized over a greater number of tasks.
       – But loop unrolling only works for static control flow.

  34. Execution Templates: Granularity
     [Figure: the same Training and Estimation stages, with a single Template marked.]

  36. Execution Templates: Granularity
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation), with a single Template marked.]
     • Cannot reuse the template (only two iterations of the inner loop).

  37. Execution Templates: Granularity
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation).]
     • Templates cannot go beyond a branch in the driver program.
     • Execution templates operate at the granularity of basic blocks:
       – A basic block is a code block with a single entry and no branches except at the end.
       – It is the biggest block that does not sacrifice dynamic control flow.
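
The basic-block granularity can be illustrated with a small Python sketch. It assumes one template per basic block of the nested-loop driver, installed the first time the block runs and instantiated on every later run; `TemplateCache`, `run_block`, the task names, and the stand-in gradient and error values are all hypothetical. Because the loop conditions are evaluated between blocks, the inner loop is free to run a different number of times in each outer iteration.

```python
class TemplateCache:
    """Installs a template on a block's first run, instantiates it afterwards."""

    def __init__(self):
        self.installed = {}
        self.log = []

    def run_block(self, name, build_tasks, params):
        if name not in self.installed:
            self.installed[name] = build_tasks()      # process tasks one by one, once
            self.log.append(("install", name))
        else:
            self.log.append(("instantiate", name))    # reuse the cached block
        return [(fn, params) for fn in self.installed[name]]

def optimization_block():
    return ["Gradient", "UpdateCoeff"]

def estimation_block():
    return ["Estimate", "UpdateModel"]

cache = TemplateCache()
# Stand-in values: the inner loop runs three times, then twice.
gradients = iter([0.9, 0.4, 0.05, 0.3, 0.02])
errors = iter([0.5, 0.01])
threshold_g = threshold_e = 0.1

error = 1.0
while error > threshold_e:                 # branch: evaluated outside any template
    gradient = 1.0
    while gradient > threshold_g:          # branch: evaluated outside any template
        cache.run_block("optimization", optimization_block, {"coeff": "latest"})
        gradient = next(gradients)
    cache.run_block("estimation", estimation_block, {"param": "latest"})
    error = next(errors)

installs = [entry for entry in cache.log if entry[0] == "install"]
```

Seven block executions cost two installations here; a template spanning the unrolled inner loop could not have been reused once the iteration count changed.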

  38. Execution Templates: Granularity
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation), with Template 1 marked.]

  39. Execution Templates: Granularity
     [Figure: same stages; Template 1 is instantiated repeatedly.]

  40. Execution Templates: Granularity
     [Figure: same stages, with Template 2 marked.]

  41. Execution Templates: Granularity
     [Figure: same stages; Template 2 is instantiated.]

  42. Execution Templates: Granularity
     [Figure: Controller (Task Graph) and two Workers with Data Objects; StartTemplate and EndTemplate markers for each worker.]

  43. Execution Templates: Granularity
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects and a Template.]

  44. Execution Templates: Patching
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation).]
     • With dynamic control flow, a basic block can have different entries.
     • The execution state is not the same in all circumstances.

  45. Execution Templates: Patching
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation); Template 1 is instantiated twice.]

  48. Execution Templates: Patching
     [Figure: Training (Data, Coefficients, Iterative Optimizer) and Estimation (Data, Parameters, Error Estimation); Template 1 is instantiated twice. Annotation: updated model parameters only on the reducer.]

  49. Execution Templates: Patching
     • Each template has a set of preconditions that need to be satisfied before it can be instantiated.
       – For example, the set of data objects in memory that are accessed by the tasks cached in the template.

  50. Execution Templates: Patching
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects and a Template.]

  51. Execution Templates: Patching
     [Figure: same as the previous slide, with Preconditions shown for each template.]

  52. Execution Templates: Patching
     • Each template has a set of preconditions that need to be satisfied before it can be instantiated.
       – For example, the set of data objects in memory that are accessed by the tasks cached in the template.
     • Worker state might not match the preconditions of the template in all circumstances.
     • The controller patches the worker state before template instantiation to satisfy the preconditions.
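
A minimal Python sketch of patching follows. It assumes preconditions are modeled as a set of data-object names that must be in worker memory; `instantiate_with_patch`, the command tuples, and the object names are hypothetical and not taken from Nimbus. The controller compares the preconditions with the worker state, emits a load for each missing object, and only then instantiates the template.

```python
def instantiate_with_patch(template, worker_state, params):
    """Return the commands the controller would send to one worker."""
    missing = sorted(template["preconditions"] - worker_state)
    commands = [("patch_load", obj) for obj in missing]
    worker_state.update(missing)       # after the patch, the preconditions hold
    commands.append(("instantiate", template["name"], params))
    return commands

template_1 = {"name": "template_1", "preconditions": {"tdata", "coeff", "param"}}

# Entering from the estimation block: the updated model parameters are
# assumed to be on the reducer only, so this worker does not hold "param" yet.
worker_state = {"tdata", "coeff"}

first = instantiate_with_patch(template_1, worker_state, {"iteration": 1})
second = instantiate_with_patch(template_1, worker_state, {"iteration": 2})
```

The first call needs a patch; the second enters the same block from a state that already matches, so it instantiates directly.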

  54. Execution Templates: Patching
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects, Preconditions, and a Template; a Patch<load> is issued.]

  56. Execution Templates: Patching
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects, Preconditions, and a Template; Instantiate<params> for each template.]

  57. Execution Templates: Patching
     [Figure: Controller (Task Graph) and two Workers, each with Data Objects, Preconditions, and a Template.]

  58. Execution Templates: Mechanisms Summary
     • Instantiation: spawns a block of tasks without processing each task individually from scratch. It helps increase the task throughput.
     • Edits: modify the content of each template at the granularity of tasks. They enable fine-grained, dynamic scheduling.
     • Patches: applied when the state of the worker does not match the preconditions of the template. They enable dynamic control flow.

  59. This talk
     • Control Plane: the Emerging Bottleneck
     • Design Scope of the Control Plane
     • Execution Templates
     • Nimbus: a Framework with Templates
     • Evaluation
