

  1. ICDCS'15, Columbus, USA. FlowProphet: Generic and Accurate Traffic Prediction for Data-parallel Cluster Computing. Hao Wang (1,2), Li Chen (2), Kai Chen (2), Ziyang Li (2,3), Yiming Zhang (3), Haibing Guan (1), Zhengwei Qi (1), Dongsheng Li (3), Yanhui Geng (4). (1) Shanghai Jiao Tong University, (2) Hong Kong University of Science and Technology, (3) National University of Defense Technology, (4) Huawei Technologies Co. Ltd.

  2. Dryad …

  3. Flow-based optimization mechanisms: PDQ [SIGCOMM'12], pFabric [SIGCOMM'13], PASE [SIGCOMM'14], Varys [SIGCOMM'14], Baraat [SIGCOMM'14]. Architectural bandwidth provisioning: c-Through [SIGCOMM'10], Helios [SIGCOMM'11], Mordia [SIGCOMM'13], OSA [NSDI'12]. Traffic engineering: Hedera [NSDI'10], MicroTE [CoNEXT'11], D3 [SIGCOMM'11].

  4. Knowing the Flow Information Ahead of Time? Flow-based optimization mechanisms: PDQ [SIGCOMM'12], pFabric [SIGCOMM'13], PASE [SIGCOMM'14], Varys [SIGCOMM'14], Baraat [SIGCOMM'14]. Architectural bandwidth provisioning: c-Through [SIGCOMM'10], Helios [SIGCOMM'11], Mordia [SIGCOMM'13], OSA [NSDI'12]. Traffic engineering: Hedera [NSDI'10], MicroTE [CoNEXT'11], D3 [SIGCOMM'11].

  5. FlowProphet • Generic for DCFs • Accurate and fine-grained • Ahead-of-time • Scalable and low-overhead

  6. Toy Example: Word Count. The map side emits (key,1) pairs for words A-E; the final counts are (A,2), (B,3), (C,2), (D,3), (E,2).

  7. Logical View: input records containing words A-E; map() emits (A,1) x2, (B,1) x3, (C,1) x2, (D,1) x3, (E,1) x2; reduce() sums them to (A,2), (B,3), (C,2), (D,3), (E,2).
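The logical view above can be sketched as a minimal map/reduce word count. This is plain Python rather than a real Spark or Hadoop job; the input strings are illustrative, chosen to reproduce the slide's multiplicities:

```python
from collections import defaultdict

def map_phase(records):
    """map(): emit a (word, 1) pair for every word, as on the slide."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """reduce(): sum the counts per key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Illustrative input with the slide's multiplicities (A x2, B x3, C x2, D x3, E x2).
data = ["A B D E C", "B E C A D", "B D"]
result = reduce_phase(map_phase(data))
print(sorted(result.items()))  # [('A', 2), ('B', 3), ('C', 2), ('D', 3), ('E', 2)]
```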

  8. Physical View

  9-16. Physical View (animation): map() tasks on each worker emit their (key,1) pairs, e.g. (A,1), (B,1), (C,1), (D,1), (E,1); the Shuffle moves each key's pairs to its reducer; reduce() outputs (A,2), (B,3), (C,2), (D,3), (E,2).
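Which reducer each (key,1) pair, and hence each shuffle flow, goes to is fixed by the partition function, so flow destinations are knowable as soon as map outputs are. A minimal sketch with hypothetical mapper IDs and byte counts (crc32 stands in for whatever hash partitioner a real framework uses):

```python
from zlib import crc32

def partition(key, num_reducers):
    """Deterministic hash partitioning: the same key always lands on the same reducer."""
    return crc32(key.encode()) % num_reducers

def shuffle_flows(map_outputs, num_reducers):
    """Aggregate (mapper, key, bytes) map outputs into per-(mapper, reducer) flow volumes."""
    flows = {}
    for mapper, key, size in map_outputs:
        dst = partition(key, num_reducers)
        flows[(mapper, dst)] = flows.get((mapper, dst), 0) + size
    return flows

# Hypothetical map-side outputs: (mapper id, key, bytes to shuffle)
outputs = [("m0", "A", 10), ("m0", "B", 10), ("m1", "A", 10), ("m1", "C", 10)]
flows = shuffle_flows(outputs, 2)
# Every byte of map output appears in exactly one predicted flow.
assert sum(flows.values()) == 40
```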

  17-21. From the user's program, the distributed computing framework builds a logical view and then a physical view; FlowProphet uses the DAG behind these views to predict flow information.

  22. Directed Acyclic Graph (DAG)

  23. A generic stage DAG: input data feeds tasks grouped into stages (stage #3, stage #2, stage #1, stage #0), which produce the output data.
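A stage DAG like this can be held as a map from each stage to its parents; a topological sort then yields a valid execution order. A minimal sketch (Python 3.9+; the stage names are illustrative, not FlowProphet's):

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on (its parents in the DAG).
stages = {
    "stage_in": [],                       # reads the input data
    "stage_a": ["stage_in"],
    "stage_b": ["stage_in"],
    "stage_out": ["stage_a", "stage_b"],  # writes the output data
}
order = list(TopologicalSorter(stages).static_order())
print(order)  # e.g. ['stage_in', 'stage_a', 'stage_b', 'stage_out']
```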

  24. Dryad: n input data partitions are consumed by n computing vertices, which write n output files.

  25. Map/Reduce: input data feeds map tasks, whose outputs feed reduce tasks, which produce the output data.

  26. BSP model: in superstep(i), computing nodes run, then meet at a barrier synchronization before the next superstep's computing nodes proceed.

  27-28. Job submission: an application submits jobs (job#1, job#2, job#3, …, job#n) to the Master.

  29. Task Assignment: tasks are assigned to Worker#1, Worker#2, …, Worker#n.

  30. Life Cycle: jobs (job#1, job#2, …, job#n) are submitted to the Master, which assigns their tasks to Worker#1, Worker#2, …, Worker#n.

  31. Observation: the DAG contains the necessary time, data, and flow dependencies for accurate flow prediction.

  32. Architecture. Master Node: per-framework masters (Spark, Hadoop, Ciel, …) feed a DAG Builder; the Data Tracker, Flow Calculator, and Status Aggregator turn the DAG (task list, stage ID) plus data status into flow information. Worker Node: local data status (local disk, memory, network interface) is tracked, with write/fetch data status reported back to the masters.

  33. API Examples
  • Required APIs for the DCF master:
    - newStageEvent(stageID, childStageID): triggered when a new stage is created
    - stageStartEvent(List[task], stageID): triggered when a stage is beginning
    - stageFinishedEvent(stageID): triggered when a stage is finished
  • The DAG Builder event handlers:
    - newStageHandler(newStageEvent) ⇒ (currentStage, childStage)
    - stageStartHandler(stageStartEvent) ⇒ Event(List[task], List[stageID])
    - stageFinishedHandler(stageFinishedEvent) ⇒ Event(stageID)
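A minimal sketch of how a DAG Builder might consume these events. The handler names follow the table above; the internal state (children map, running set) is an assumption, not FlowProphet's actual implementation:

```python
class DAGBuilder:
    """Builds the stage DAG incrementally from DCF master events."""

    def __init__(self):
        self.children = {}   # stageID -> list of child stage IDs
        self.running = {}    # stageID -> task list for stages in flight
        self.finished = set()

    def new_stage_handler(self, stage_id, child_stage_id):
        # newStageEvent: record the (currentStage, childStage) dependency.
        self.children.setdefault(stage_id, []).append(child_stage_id)

    def stage_start_handler(self, tasks, stage_id):
        # stageStartEvent: a stage is beginning with its assigned tasks.
        self.running[stage_id] = list(tasks)

    def stage_finished_handler(self, stage_id):
        # stageFinishedEvent: the stage is done; its outputs are now fixed.
        self.running.pop(stage_id, None)
        self.finished.add(stage_id)

builder = DAGBuilder()
builder.new_stage_handler(0, 1)               # stage 0 feeds stage 1
builder.stage_start_handler(["t0", "t1"], 0)
builder.stage_finished_handler(0)
```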

  34. Flow Calculator example. Data Tracker block info: blockID#1 (120MB) on 192.168.1.11, blockID#2 (200MB) on 192.168.1.12, blockID#3 (200MB) on 192.168.1.13. Requests: 192.168.1.21 req blockID#2, 192.168.1.22 req blockID#3, 192.168.1.23 req blockID#1. The Flow Calculator combines the DAG Builder's stage info with the tracked block info to output flow info at time t.
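The slide's example in code: joining the Data Tracker's block info with each host's pending block request yields (src, dst, volume) flows before the transfers happen. A sketch using the slide's values (the join itself is an assumption about how the calculation works):

```python
# Data Tracker's view: blockID -> (host holding the block, size in MB)
blocks = {
    "blockID#1": ("192.168.1.11", 120),
    "blockID#2": ("192.168.1.12", 200),
    "blockID#3": ("192.168.1.13", 200),
}
# Requests from the fetching side: requesting host -> blockID it will fetch
requests = {
    "192.168.1.21": "blockID#2",
    "192.168.1.22": "blockID#3",
    "192.168.1.23": "blockID#1",
}

def predict_flows(blocks, requests):
    """One (src, dst, MB) flow per pending request, emitted ahead of time."""
    return sorted((blocks[b][0], dst, blocks[b][1]) for dst, b in requests.items())

for src, dst, mb in predict_flows(blocks, requests):
    print(f"{src} -> {dst}: {mb}MB")
```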

  35. FlowProphet • Generic • Accurate and fine-grained • Ahead-of-time • Scalable and low-overhead

  36. Testbed • 37x Dell PowerEdge R320 • Intel Xeon E5-1410 2.8GHz CPU • 24GB 1600MHz DDR3 • Broadcom Gigabit Ethernet NIC • Pronto-3295 Gigabit Ethernet switch

  37. Benchmarks • WikiPageRank • SparkPageRank • Spark K-means • Hadoop TeraSort • π (Pi) • WordCount. Metrics • Time advance • Prediction accuracy • Overhead • Scalability • Benefits

  38. Time Advance. WikipediaPageRank-13G (Spark): flows per shuffle over time for ShuffleID#6 and ShuffleID#7 (up to ~3000 flows); the predictions land at 16:18:22.365 and 16:18:29.547, ahead of the shuffles themselves (16:18:25 to 16:18:35).

  39. Lead Time. CDFs of prediction lead time: Spark WikiPR-13G avg 414.1ms, Spark WikiPR-26G avg 478ms; Hadoop TeraSort-10G avg 12.3123s, Hadoop WordCount-20G avg 7.7348s.

  40. Prediction Accuracy. Predicted vs. actual shuffle traffic: Spark WikiPR-26G (ShuffleID#3 to #6, up to ~1000MB), Hadoop TeraSort-10G (GB scale), Hadoop WordCount-10G (MB scale); predicted volumes closely match the actual traffic.

  41. Overhead. Job completion times, pure Spark vs. Spark with FlowProphet: WikipediaPageRank-13G/-26G, SparkPi-500M/-1000M, WordCount-20G/-40G, KMeans-20G.

  42. Overhead. Job completion times, pure Hadoop vs. Hadoop with FlowProphet vs. Hadoop with HadoopWatch: HadoopPi-100M/-500M, WordCount-20G/-40G, TeraSort-10G/-20G.

  43. Scalability. Overhead Ratio (OR) = (t_enabled - t_disabled) / t_disabled. Spark WikiPR-26G: job completion time and OR (testbed, plus projection) as worker nodes scale from 10 to 75 and beyond; OR stays within ~2%.
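The overhead-ratio definition in code, evaluated on hypothetical completion times (the 306s/300s figures are illustrative, not from the evaluation):

```python
def overhead_ratio(t_enabled, t_disabled):
    """OR = (t_enabled - t_disabled) / t_disabled, per the slide's definition."""
    return (t_enabled - t_disabled) / t_disabled

# Hypothetical: 306s with FlowProphet enabled vs. 300s with it disabled.
print(f"OR = {overhead_ratio(306.0, 300.0):.1%}")  # OR = 2.0%
```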

  44. Scalability. Hadoop TeraSort-10G: job completion time and OR (testbed, plus projection) from 10 to 75 and more worker nodes; OR stays within ~2%.

  45. Benefits. Hadoop TeraSort-25G: a simple network scheduler driven by FlowProphet's predictions cuts average job completion time by 12.52% and also reduces average coflow completion time.

  46. Related Work • Analyze past statistics: traffic engineering with estimated traffic matrices • Monitor buffers or counters in switches: c-Through, Hedera, Helios • Tracing and profiling toolkits: X-Trace • File system monitoring: HadoopWatch

  47. Summary • DCF execution pattern • DAG for predicting flows • Design and implementation • Evaluation on testbed

  48. Thank you Q&A ICDCS’15, Columbus, USA
