
Online Aggregation for Large MapReduce Jobs

Niketan Pansare¹, Vinayak Borkar², Chris Jermaine¹, Tyson Condie³
¹Rice University, ²UC Irvine, ³Yahoo! Research

Outline: Motivation, Implementation, Experiments, Conclusion


  1-3. OLA over a single machine
- Confidence interval found using classical sampling theory
- Tuples are bundled into blocks; blocks arrive in random order
- Example: find the SUM of the values in blocks {7, 4, 2}, {8, 3}, {5, 9}, {1, 10, 6} (block sums 13, 11, 14, 17)
  - After three blocks: sample = {13, 11, 14}, estimate = (13 + 11 + 14) × 4 / 3 = 50.67
  - After all four blocks: sample = {13, 11, 14, 17}, estimate = (13 + 11 + 14 + 17) × 4 / 4 = 55
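The scale-up estimate on these slides can be sketched in a few lines. The code below is an illustrative stand-in (the function name and the normal-theory interval are my additions, not from the talk):

```python
import math

def ola_sum_estimate(sampled_block_sums, total_blocks, z=1.96):
    """Scale a random sample of block sums up to an estimate of the
    total SUM, with a normal-theory confidence interval."""
    n = len(sampled_block_sums)
    mean = sum(sampled_block_sums) / n
    estimate = mean * total_blocks
    if n > 1:
        var = sum((x - mean) ** 2 for x in sampled_block_sums) / (n - 1)
        # Standard error of the scaled-up total (finite-population
        # correction omitted for simplicity).
        se = total_blocks * math.sqrt(var / n)
    else:
        se = float("inf")
    return estimate, (estimate - z * se, estimate + z * se)

# Block sums from the slides: {7,4,2}=13, {8,3}=11, {5,9}=14, {1,10,6}=17
est, ci = ola_sum_estimate([13, 11, 14], total_blocks=4)
# est = (13 + 11 + 14) * 4 / 3 = 50.67, matching the slide
```

With all four blocks sampled the estimate is exact: (13 + 11 + 14 + 17) × 4 / 4 = 55.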

  4. Extend existing approaches
- OLA over a single machine
  - Confidence interval found using classical sampling theory
  - Tuples are bundled into blocks; blocks arrive in random order
- OLA over multiple machines
  - Blocks are non-uniform in size, locality, machine, and network
  - Processing time for a block can be large and highly variable
- Why won't the single-machine approach work, and how do we deal with these issues?

  5-18. OLA over multiple machines (animated example)
- Blocks are non-uniform in size, locality, machine, and network
- Processing time for a block can be large and highly variable
- Example: find the SUM of the values in blocks {7, 4, 2}, {8, 3}, {5, 9}, {1, 10, 6}
- In the animation, each block's width on the x-axis is its processing time; blocks that take a long time to process are red, blocks that finish quickly are green
- Arrows mark random time instances at which the blocks are polled
- Notice that more arrows land on the red regions than on the green regions

  19. OLA over multiple machines: the inspection paradox
- More arrows land on the red (long-running) regions than on the green ones
- Inspection paradox: at any random time t, you will (stochastically) be processing the blocks that take a long time
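The inspection paradox is easy to reproduce in simulation. This sketch uses made-up processing times (1s and 9s blocks), not anything from the talk:

```python
import random

# Monte-Carlo sketch of the inspection paradox: a machine processes
# fast (1s) and slow (9s) blocks back-to-back in random order. Polling
# at a uniformly random time mostly catches a slow block in progress,
# even though fast and slow blocks are equally common.
random.seed(0)
times = [1, 9] * 200            # equal numbers of fast and slow blocks
random.shuffle(times)
total = sum(times)              # total wall-clock time

def block_at(t):
    """Processing time of the block in progress at wall-clock time t."""
    elapsed = 0
    for p in times:
        elapsed += p
        if t < elapsed:
            return p
    return times[-1]

polls = [block_at(random.uniform(0, total)) for _ in range(5000)]
slow_fraction = polls.count(9) / len(polls)
# Slow blocks occupy 9/10 of the timeline, so slow_fraction is near 0.9
# even though only half the blocks are slow.
```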

  20. Extend existing approaches (recap)
- OLA over a single machine: confidence interval via classical sampling theory; tuples bundled into blocks that arrive in random order
- OLA over multiple machines: blocks are non-uniform (size, locality, machine, network); processing time can be large and highly variable
- Why won't it work, and how do we deal with these issues?

  21-24. Why won't the previous approach work?
- Inspection paradox → at the time of estimation, the system is disproportionately processing the longer blocks
  - This effect was found experimentally in the paper "MapReduce Online"
- Processing time and value may be correlated (e.g. a count query)
- The result is biased estimates, so current techniques won't work
- Therefore, the inspection paradox must be handled in a principled fashion

  25. Extend existing approaches (recap)
- Single machine: classical sampling theory over randomly ordered blocks
- Multiple machines: non-uniform blocks with large, highly variable processing times
- Next: how do we deal with these issues?

  26. How do we deal with the inspection paradox?
- Capture timing information (i.e. the processing time of each block) along with the values
- Instead of classical sampling theory, output estimates using a Bayesian model that:
  - allows for correlation between processing time and values
  - takes into account the processing time of the current (in-flight) block

  27. Outline
- Motivation
- Implementation
- Experiments
- Conclusion

  28-30. Implementation overview
- Framework for distributed systems: MapReduce
  - Hadoop: staged processing → online
  - Hyracks (developed at UC Irvine): pipelining → "online"; architecture (and API) similar to Hadoop; http://code.google.com/p/hyracks/
- For "aggregation" estimates:
  - Two modifications to MapReduce (Hyracks)
  - A Bayesian estimator

  31. Modifications to MapReduce (Hyracks)
- Master
  - Maintains a random ordering of blocks (a logical, not physical, queue)
  - Assigns the block at the head of the queue
  - When a block reaches the head of the queue, a timer starts (its processing time)
- Two intermediate sets of files
  - Data file → values
  - Metadata file → timing information
- Shuffle phase of the reducer
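A minimal sketch of the master-side changes, assuming a single in-process queue (class and method names are illustrative, not Hyracks APIs; the timer here starts on assignment, a simplification of "block reaches the head of the queue"):

```python
import random
import time

class Master:
    """Sketch of the master's two changes: a randomized *logical* queue
    of blocks, and a timer started when a block leaves the head of the
    queue."""

    def __init__(self, block_ids):
        self.queue = list(block_ids)
        random.shuffle(self.queue)     # logical, not physical, reordering
        self.start_times = {}          # block id -> scheduling timestamp

    def assign_block(self, worker_id):
        """Hand the block at the head of the queue to a requesting worker."""
        if not self.queue:
            return None                # all blocks assigned
        blk = self.queue.pop(0)
        self.start_times[blk] = time.monotonic()  # timer starts here
        return blk

    def processing_time(self, blk):
        """Elapsed time since assignment; for an in-flight block this is
        the 't_process > x' lower bound used at estimation time."""
        return time.monotonic() - self.start_times[blk]

master = Master(["Blk%d" % i for i in range(1, 8)])
first = master.assign_block("Worker1")   # e.g. Blk6 in the slides
```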

  32-48. Modifications to MapReduce (Hyracks): walkthrough

Query (from the client): select sum(stock_price) from nasdaq_db group by company;

Input blocks: Blk1 = {MSFT 2, AAPL 4}, Blk2 = {ORCL 3}, Blk3 = {AAPL 4}, Blk4 = {MSFT 2}, Blk5 = {ORCL 3}, Blk6 = {MSFT 2}, Blk7 = {AAPL 4}

- t = 0: the client submits the query to the master
- t = 1: the master builds a logical queue of the blocks, then randomizes it: Blk6, Blk5, Blk3, Blk1, Blk4, Blk7, Blk2
- t = 2: the master forks the workers (Worker1, Worker2)
- t = 3: the workers request blocks
- t = 4: the master reads the head of the queue and assigns Blk6 to Worker1; Blk6's timer starts
- t = 5: Worker1 starts reading Blk6
- t = 6: the master assigns Blk5 to Worker2
- t = 7: Worker1 runs its map task on Blk6, emitting <MSFT, 2>
- t = 8: <MSFT, 2> enters the reducer's shuffle phase; Blk6 is complete with t_process = 4
- t = 9: <MSFT, 2> reaches Reducer-MSFT, and a random time instance triggers estimation
  - Blk5 is still in flight on Worker2, so only t_process > 3 is known for it
  - The estimation code receives Blk6: <MSFT, 2>, Blk6: t_process = 4, and Blk5: t_process > 3
  - It outputs a confidence interval, e.g. [5.8, 8]
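The estimation code's input at t = 9 could be assembled roughly as below. The record layout and field names are hypothetical, since the paper's file formats are not shown on the slides:

```python
# Hypothetical layout of the reducer's two intermediate file sets at
# t = 9 (field names invented for illustration): the data file holds
# values, the metadata file holds timing information, keyed by block.
data_file = {"Blk6": [("MSFT", 2)]}               # completed values
metadata_file = {
    "Blk6": {"t_process": 4, "done": True},       # finished block
    "Blk5": {"t_process": 3, "done": False},      # in flight: t_process > 3
}

def estimator_input():
    """Join values with timing info; an in-flight block contributes no
    value yet, only a lower bound on its processing time."""
    rows = []
    for blk, meta in metadata_file.items():
        rows.append({
            "block": blk,
            "values": data_file.get(blk, []),
            "t_process": meta["t_process"],
            "censored": not meta["done"],  # True -> t_process is a lower bound
        })
    return rows
```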

  49. Implementation overview (recap)
- Two modifications to MapReduce (Hyracks) — covered above
- Bayesian estimator — next

  50-52. Bayesian estimator
- Why? To deal with the inspection paradox
- How?
  - Allows for correlation between processing time and values
  - Also takes into account the processing time of the current block
- Implementation:
  - C++ code using the GNU Scientific Library and Minuit2
  - Input: the data file and metadata file from the reducer
  - Output: a confidence interval, e.g. [995, 1005] with 95% probability

  53-59. Bayesian estimator (model)
- Parameterized model
  - Timing information: T_process, T_scheduling
  - Value: X
- Underlying distribution
  - Classical sampling theory: f(X)
  - Our approach: f(X, T_process, T_scheduling)
    - Captures correlation between X, T_process, and T_scheduling
    - f(X | T_process > 100000000, T_scheduling = 22) ≠ f(X)
- Estimation using Bayesian machinery
  - Developed probability (or update) equations
  - Gibbs sampler
- Detailed discussion in the paper
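As a toy stand-in for the estimator's output — NOT the paper's model, which fits the joint density f(X, T_process, T_scheduling) with a Gibbs sampler — here is a conjugate-normal credible interval over the total, ignoring timing information entirely:

```python
import math
import random

def credible_interval(block_sums, total_blocks, sigma=2.0, draws=20000):
    """Posterior 95% interval for the total SUM under a flat prior on
    the mean block sum and a known per-block standard deviation sigma
    (both assumptions mine, for illustration only)."""
    random.seed(1)
    n = len(block_sums)
    post_mean = sum(block_sums) / n     # posterior mean of a block sum
    post_sd = sigma / math.sqrt(n)      # posterior sd under a flat prior
    samples = sorted(random.gauss(post_mean, post_sd) * total_blocks
                     for _ in range(draws))
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]

lo, hi = credible_interval([13, 11, 14], total_blocks=4)
# The interval brackets the point estimate 50.67 from the SUM example
```

The real estimator additionally conditions on the observed (or censored) processing times, which is what corrects the inspection-paradox bias.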

  60. Outline
- Motivation
- Implementation
- Experiments
- Conclusion

  61-64. Experiments
- Hypotheses:
  - A randomized queue is required
  - Correlation between processing time and value must be allowed for
  - Estimates converge
- Experiment 1 (real dataset):
  - select sum(page_count) from wikipedia_log group by language
  - 6 months of Wikipedia logs (220 GB compressed, 3960 blocks)
  - 11-node cluster (4 disks, 4 cores, 12 GB RAM per node)
  - Uniform configuration: machines, blocks
  - 80 mappers and 10 reducers
- Experiment 2 (simulated dataset):
  - Increased correlation (non-uniform configuration)
- Reading the figures: the x-axis shows the percentage of data processed
