ICDCS'15, Columbus, USA

FlowProphet: Generic and Accurate Traffic Prediction for Data-parallel Cluster Computing

Hao Wang 1,2, Li Chen 2, Kai Chen 2, Ziyang Li 2,3, Yiming Zhang 3, Haibing Guan 1, Zhengwei Qi 1, Dongsheng Li 3, Yanhui Geng 4
1 Shanghai Jiao Tong University, 2 Hong Kong University of Science and Technology, 3 National University of Defense Technology, 4 Huawei Technologies Co. Ltd.
[Figure: data-parallel cluster computing frameworks, e.g., Dryad]
Knowing the Flow Information Ahead of Time?

Flow-based optimization mechanisms:
- PDQ [Sigcomm'12], pFabric [Sigcomm'13], PASE [Sigcomm'14], Varys [Sigcomm'14], Baraat [Sigcomm'14]

Architectural bandwidth provisioning:
- c-Through [Sigcomm'10], Helios [Sigcomm'11], Mordia [Sigcomm'13], OSA [NSDI'12]

Traffic engineering:
- Hedera [NSDI'10], MicroTE [CoNEXT'11], D3 [Sigcomm'11]
FlowProphet
• Generic across DCFs
• Accurate and fine-grained
• Ahead-of-time
• Scalable and low-overhead
Toy Example: Word Count
[Figure: input words are mapped to (word, 1) pairs and reduced to the counts (A,2) (B,3) (C,2) (D,3) (E,2)]
Logical View
map(): each input split emits (word, 1) pairs: (A,1) (A,1) (B,1) (B,1) (B,1) (C,1) (C,1) (D,1) (D,1) (D,1) (E,1) (E,1)
reduce(): pairs are aggregated per key: (A,2) (B,3) (C,2) (D,3) (E,2)
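The logical view can be sketched as a minimal word count in Python. This is a toy in-memory runtime, not any real DCF; the input lines are chosen to reproduce the slide's counts.

```python
from collections import defaultdict

def map_task(line):
    # map(): emit a (word, 1) pair for every word in the input split
    return [(word, 1) for word in line.split()]

def reduce_task(key, values):
    # reduce(): aggregate all intermediate counts for one key
    return (key, sum(values))

def word_count(lines):
    # "Shuffle": group intermediate pairs by key before reducing
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_task(line):
            groups[key].append(value)
    return dict(reduce_task(k, vs) for k, vs in groups.items())

word_count(["B D A E C", "B E C A", "D B D"])
# {'B': 3, 'D': 3, 'A': 2, 'E': 2, 'C': 2}
```

In a real framework the map and reduce tasks run on different machines, which is exactly what turns the shuffle grouping into network flows.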
Physical View
[Figure, animated: map() tasks emit intermediate (key, 1) pairs; the shuffle moves them across the network to reduce() tasks, which output (A,2) (B,3) (C,2) (D,3) (E,2)]
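The shuffle routes every intermediate key to a fixed reduce task. One common scheme is hash partitioning; the slides do not name the partitioner, so the sketch below is an assumption, not FlowProphet's code.

```python
def partition(key, num_reducers):
    # Hash partitioning: the same key always maps to the same reducer,
    # so each reduce task can aggregate its share of keys independently.
    return hash(key) % num_reducers

# Route the intermediate (word, 1) pairs from the map tasks to 2 reducers
pairs = [("B", 1), ("D", 1), ("A", 1), ("E", 1), ("C", 1), ("B", 1), ("E", 1)]
by_reducer = {}
for key, value in pairs:
    by_reducer.setdefault(partition(key, 2), []).append((key, value))
```

Because the mapping from key to reducer is deterministic, the bytes each mapper will send to each reducer are determined as soon as the map outputs are known, which is the basis for predicting shuffle flows.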
From the user's program, a distributed computing framework derives a logical view and a physical view; the logical view is captured as a DAG, from which flow information is predicted.
Directed Acyclic Graph (DAG)
[Figure: input data flows through stages of tasks (stage #0 → stage #1 → stage #2 → stage #3 → …) to output data]
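A stage DAG like this can be encoded as parent dependencies and executed in topological order. The representation below is hypothetical, for illustration only:

```python
# Hypothetical encoding of a stage DAG: stage -> list of parent stages
deps = {0: [], 1: [0], 2: [1], 3: [2]}

def execution_order(deps):
    # A stage becomes runnable once every parent stage has finished,
    # so repeatedly pick runnable stages until all are scheduled.
    # Assumes the input really is acyclic.
    order, done = [], set()
    while len(done) < len(deps):
        for stage in sorted(deps):
            if stage not in done and all(p in done for p in deps[stage]):
                order.append(stage)
                done.add(stage)
    return order

execution_order(deps)  # [0, 1, 2, 3]
```

Each edge in this DAG is also a data dependency: the data written by a parent stage is what the child stage's tasks will fetch over the network.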
[Figure: Dryad — input data is processed by n computing vertices, producing n output files]
[Figure: input data → map tasks → reduce tasks → output data]
[Figure: BSP model — in superstep(i), computing nodes run concurrently; a barrier synchronization separates each superstep from the next]
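The BSP pattern can be sketched in a few lines. Real BSP runs the nodes in parallel; this serial loop only models the barrier, and all names are illustrative:

```python
def run_bsp(states, compute, num_supersteps):
    # Each superstep applies compute to every node's state. The end of the
    # loop body models the barrier synchronization: no node begins
    # superstep i+1 until every state for superstep i has been produced.
    for i in range(num_supersteps):
        states = [compute(i, s) for s in states]
    return states

run_bsp([1, 2, 3], lambda i, s: s + 1, num_supersteps=2)  # [3, 4, 5]
```

The barrier is what makes BSP predictable: the communication between two supersteps is fixed once superstep i's outputs exist, before superstep i+1 starts.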
Applications submit jobs (job#1, job#2, job#3, …, job#n) to the Master.
Task assignment: the Master assigns tasks to Worker#1, Worker#2, …, Worker#n.
Life Cycle: jobs (job#1 … job#n) are submitted to the Master, which assigns their tasks to Worker#1 … Worker#n.
Observation — the DAG contains the necessary time, data, and flow dependencies for accurate flow prediction.
Architecture
[Figure: on the Master Node, the Spark/Hadoop/Ciel masters feed a DAG Builder (task list, stage IDs); the Data Tracker and Data Aggregator collect data status (write/fetch) from the Spark/Hadoop/Ciel workers' local disk, memory, and network interface on each Worker Node; the Flow Calculator combines the DAG with the tracked data to output the flow list]
API Examples
• Required APIs for the DCF master:
  - newStageEvent(stageID, childStageID) — triggered when a new stage is created
  - stageStartEvent(List[task], stageID) — triggered when a stage is beginning
  - stageFinishedEvent(stageID) — triggered when a stage is finished
• The DAG Builder event handlers:
  - newStageHandler(newStageEvent) ⇒ (currentStage, childStage)
  - stageStartHandler(stageStartEvent) ⇒ Event(List[task], List[stageID])
  - stageFinishedHandler(stageFinishedEvent) ⇒ Event(stageID)
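A DAG Builder driven by these three events might look like the sketch below. The event names follow the API table; the internal state and method names are illustrative, not FlowProphet's implementation:

```python
class DAGBuilder:
    """Sketch: incrementally build the stage DAG from master-side events."""

    def __init__(self):
        self.child_of = {}     # stageID -> child stageID (DAG edges)
        self.tasks = {}        # stageID -> tasks running in that stage
        self.finished = set()  # stages whose outputs are final

    def new_stage_handler(self, stage_id, child_stage_id):
        # newStageEvent: record the dependency edge when a stage is created
        self.child_of[stage_id] = child_stage_id

    def stage_start_handler(self, tasks, stage_id):
        # stageStartEvent: learn which tasks belong to the starting stage
        self.tasks[stage_id] = list(tasks)

    def stage_finished_handler(self, stage_id):
        # stageFinishedEvent: this stage's output data can now be tracked
        self.finished.add(stage_id)

builder = DAGBuilder()
builder.new_stage_handler(0, 1)
builder.stage_start_handler(["task-0", "task-1"], 0)
builder.stage_finished_handler(0)
```

Because the events fire when stages are created and started, not when their flows hit the wire, the builder knows the DAG edges ahead of the actual traffic.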
Flow Calculator
[Figure: (1) the DAG Builder supplies stage dependencies; (2) the Data Tracker supplies block info — blockID#1 (120MB) at 192.168.1.11, blockID#2 (200MB) at 192.168.1.12, blockID#3 (200MB) at 192.168.1.13 — and requests — 192.168.1.21 req blockID#2, 192.168.1.22 req blockID#3, 192.168.1.23 req blockID#1; (3) the Flow Calculator outputs flow info ahead of time]
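The figure's example amounts to a small join of block locations with block requests. The IPs and sizes below are read off the slide; the function name and data layout are illustrative:

```python
# Block info from the Data Tracker: blockID -> (location, size in MB)
blocks = {
    "blockID#1": ("192.168.1.11", 120),
    "blockID#2": ("192.168.1.12", 200),
    "blockID#3": ("192.168.1.13", 200),
}
# Requests derived from the DAG: (requesting host, blockID)
requests = [
    ("192.168.1.21", "blockID#2"),
    ("192.168.1.22", "blockID#3"),
    ("192.168.1.23", "blockID#1"),
]

def flow_info(blocks, requests):
    # One predicted flow per request: (source host, destination host, MB)
    return [(blocks[b][0], dst, blocks[b][1]) for dst, b in requests]

flow_info(blocks, requests)
# [('192.168.1.12', '192.168.1.21', 200),
#  ('192.168.1.13', '192.168.1.22', 200),
#  ('192.168.1.11', '192.168.1.23', 120)]
```

Since both the block sizes and the requesters are known before any transfer begins, the (src, dst, volume) triples can be emitted ahead of the flows themselves.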
FlowProphet
• Generic
• Accurate and fine-grained
• Ahead-of-time
• Scalable and low-overhead
Testbed
• 37 × Dell PowerEdge R320
• Intel Xeon E5-1410 2.8GHz CPU
• 24GB 1600MHz DDR3
• Broadcom Gigabit Ethernet NIC
• Pronto-3295 Gigabit Ethernet switch
Benchmarks
• WikiPageRank
• SparkPageRank
• Spark K-means
• Hadoop TeraSort
• π (Pi)
• WordCount

Metrics
• Time advance
• Prediction accuracy
• Overhead
• Scalability
• Benefits
Time Advance
• WikipediaPageRank-13G (Spark)
[Figure: number of flows in each shuffle over time (16:18:25 to 16:18:35); prediction times 16:18:22.365 and 16:18:29.547 fall before the flows of ShuffleID#6 and ShuffleID#7 appear]
Lead Time
[Figure: CDFs of lead time — Spark WikiPR-13G avg 414.1ms; Spark WikiPR-26G avg 478ms; Hadoop TeraSort-10G avg 12.3123s; Hadoop WordCount-20G avg 7.7348s]
Prediction Accuracy
[Figure: predicted vs. actual traffic volume — Spark WikiPR-26G per shuffle (ShuffleID#3 to #6, MB); Hadoop TeraSort-10G (GB); Hadoop WordCount-10G (MB)]
Overhead
[Figure: completion times, Pure Spark vs. Spark with FlowProphet — WikipediaPageRank-13G/-26G, SparkPi-500M/-1000M, WordCount-20G/-40G, KMeans-20G]
Overhead
[Figure: completion times, Pure Hadoop vs. Hadoop with FlowProphet vs. Hadoop with HadoopWatch — HadoopPi-100M/-500M, WordCount-20G/-40G, TeraSort-10G/-20G]
Scalability
• Overhead Ratio (OR): OR = (t_enabled - t_disabled) / t_disabled
[Figure: Spark WikiPR-26G — OR (%) on the testbed and by projection as worker nodes scale from 10 to 75 and beyond, with job completion times for Pure Spark vs. Spark with FlowProphet]
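Under this definition the overhead ratio is a one-line computation; the timings below are hypothetical, chosen only to show the arithmetic:

```python
def overhead_ratio(t_enabled, t_disabled):
    # OR = (t_enabled - t_disabled) / t_disabled, per the slide's definition:
    # the relative slowdown of a job when prediction is enabled.
    return (t_enabled - t_disabled) / t_disabled

# Hypothetical timings: a 300 s job that takes 303 s with prediction enabled
or_pct = overhead_ratio(303.0, 300.0) * 100  # about 1 percent
```

Measuring OR at several cluster sizes, then extrapolating ("OR by projection" in the figure), is how the scalability claim is evaluated beyond the 37-node testbed.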
Scalability
[Figure: Hadoop TeraSort-10G — OR (%) on the testbed and by projection as worker nodes scale from 10 to 75 and beyond, with job completion times for Pure Hadoop vs. Hadoop with FlowProphet]
Benefits
• Hadoop TeraSort-25G
• 12.52% JCT reduction by a simple network scheduler
[Figure: original vs. optimized average coflow completion time and average job completion time]
Related Work
• Analyze past statistics
  - Traffic Engineering with Estimated Traffic Matrices
• Monitor buffers or counters in switches
  - c-Through, Hedera, Helios
• Tracing and profiling toolkits
  - X-Trace
• File system monitoring
  - HadoopWatch
Summary
• DCF execution pattern
• DAG for predicting flows
• Design and implementation
• Evaluation on testbed
Thank you
Q&A