
Pfimbi: Accelerating Big Data Jobs Through Flow-Controlled Data Replication (PowerPoint PPT Presentation)



  1. Pfimbi: Accelerating Big Data Jobs Through Flow-Controlled Data Replication
     Simbarashe Dzinamarira*, Florin Dinu ▵, T. S. Eugene Ng*
     *Rice University, ▵ EPFL

  2. DFSs have a critical role in the Big-Data landscape
     [Diagram of the Hadoop ecosystem: Management & Monitoring (Ambari), Scripting (Pig), Machine Learning (Mahout), Query (Hive), Data Integration (Sqoop/REST/ODBC), NoSQL Database (HBase), Workflow & Scheduling (Oozie), Coordination (ZooKeeper), Distributed Processing (MapReduce), Distributed Storage (HDFS)]
     • Rich ecosystem of distributed systems around Hadoop and Spark
     • Predominantly use HDFS for persistent storage
     • A performant HDFS benefits all these systems
     Image reproduced from https://www.mssqltips.com/sqlservertip/3262/big-data-basics--part-6--related-apache-projects-in-hadoop-ecosystem/

  3. Synchronous data replication in HDFS and its shortcomings
     [Diagram: HDFS write pipeline across DataNodes; DATA packets flow down the pipeline and ACKNOWLEDGEMENTS flow back to the client]
     • Bottlenecks affect the whole pipeline
     • Contention between primary writes and replication

  4. Synchronous replication seldom helps boost application performance
     • In a study by Fetterly et al., only about 2% of data was read within 5 minutes of being written [TidyFS: USENIX ATC 2011]
     • Fast networks reduce the cost of non-local reads
     • There can be data locality without replication

  5. Synchronous replication impedes industry efforts to improve HDFS
     • Heterogeneous storage: HDD, HDD, and SSD storage devices
     • Memory as a storage medium: RAM, HDD, SSD
     SSD image from http://www.storagereview.com/intel_ssd_525_msata_review

  6. Asynchronous replication relieves the effects of pipeline bottlenecks
     [Diagram: write pipeline in which the primary write completes immediately and DATA for replicas is transferred to the remaining DataNodes asynchronously]
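To make the contrast concrete, here is a minimal Java sketch for a single-block write. It is illustrative only, not Pfimbi's or HDFS's actual code, and all names (ReplicationSketch, sendBlock) are hypothetical: synchronous replication completes the client's write only after every node in the pipeline has the block, while asynchronous replication returns after the primary write and lets the remaining copies trail in the background.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Illustrative sketch only: real HDFS forwards blocks node-to-node in a
    // pipeline; the serial loop below just stands in for "the client's write
    // completes only after all replicas exist".
    class ReplicationSketch {
        private final ExecutorService background = Executors.newCachedThreadPool();

        void writeSynchronous(byte[] block, List<String> pipeline) {
            for (String node : pipeline) {
                sendBlock(node, block);          // client blocks on every replica
            }
        }

        void writeAsynchronous(byte[] block, List<String> pipeline) {
            sendBlock(pipeline.get(0), block);   // primary write: the only blocking step
            for (String node : pipeline.subList(1, pipeline.size())) {
                background.submit(() -> sendBlock(node, block)); // replicas trail behind
            }
        }

        private void sendBlock(String node, byte[] block) { /* network + disk I/O */ }
    }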

  7. Besides asynchronous replication, we need flow control to manage contention
     [Charts: bandwidth share over time across three DataNodes, without flow control vs. with flow control]

  8. Pfimbi effectively supports flow-controlled asynchronous replication
     • Allows diverse flow control policies
     • Cleanly separates mechanisms from policies
     • Isolates primary writes from replication
     • Avoids IO underutilization

  9. Pfimbi Overview
     • Inter-node flow control
     • Intra-node flow control
     [Diagram: DataNodes exchanging replicas, with a magnified view inside one DataNode]
     SSD image from http://www.storagereview.com/intel_ssd_525_msata_review; magnifier image from https://commons.wikimedia.org/wiki/File:Magnifying_glass_icon.svg

  10. Inter-node flow control
     • Client API: (# of replicas, # of synchronous replicas)
     • Timely transfer of replicas to ensure high utilization
     • Flexible policies for sharing bandwidth
     [Diagram: the client sends a block to a Pfimbi DataNode; block buffers on each node hold data for asynchronous transfer, triggered by block notifications; synchronous and asynchronous paths pass through kernel space]
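The API named on this slide takes a total replica count and a synchronous replica count. A hypothetical Java signature capturing that shape might look as follows; the interface and method names are illustrative, not Pfimbi's actual client interface.

    import java.io.IOException;
    import java.io.OutputStream;

    // Hypothetical client-facing interface mirroring the API on this slide.
    interface PfimbiClient {
        // numReplicas:  total number of replicas, as in HDFS
        // syncReplicas: how many replicas must be durable before the write
        //               returns; the rest are flow-controlled asynchronous
        //               transfers (syncReplicas = numReplicas reproduces
        //               stock HDFS behavior)
        OutputStream create(String path, int numReplicas, int syncReplicas)
                throws IOException;
    }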

  11. Hierarchical flow control allows Pfimbi to implement many IO policies
     • Example 1: prioritize replicas earlier in the pipeline
     • Example 2: fair sharing of bandwidth between jobs
     [Diagram: weight hierarchy over replication traffic; Position 1 weighted 100 vs. Position 2 weighted 1, with Jobs 1-3 weighted equally (1 each) under each position]
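A minimal sketch of the weighted sharing behind both examples, assuming a two-level hierarchy (positions, then jobs); this is illustrative arithmetic, not Pfimbi's implementation. Each level splits its parent's bandwidth share in proportion to its children's weights, so Example 1 falls out of weighting Position 1 at 100 and Position 2 at 1, and Example 2 falls out of giving every job weight 1.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Split a bandwidth share among children in proportion to their weights.
    class HierarchicalShare {
        static Map<String, Double> split(double share, Map<String, Double> weights) {
            double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
            Map<String, Double> shares = new LinkedHashMap<>();
            weights.forEach((name, w) -> shares.put(name, share * w / total));
            return shares;
        }

        public static void main(String[] args) {
            // Example 1: replicas earlier in the pipeline get priority.
            Map<String, Double> byPosition =
                    split(1.0, Map.of("position1", 100.0, "position2", 1.0));
            // Example 2: each position's share is split fairly among its jobs.
            Map<String, Double> jobs = Map.of("job1", 1.0, "job2", 1.0, "job3", 1.0);
            byPosition.forEach((pos, s) -> System.out.println(pos + " -> " + split(s, jobs)));
        }
    }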

  12. Intra-node flow control
     • Isolate synchronous data from asynchronous data
     • Avoid IO underutilization
     [Diagram: incoming synchronous data flows straight to the buffer cache; asynchronous data is held in the block buffer and released based on activity monitoring]
     Tap image from https://image.freepik.com/free-icon/bathroom-tap-silhouette_318-63404.png

  13. Intra-node flow control: Pfimbi's strategy
     • Keep the disk fully utilized
     • Limit the amount of replication data in the buffer cache
     • Threshold for asynchronous replication: T + δ
     • OS threshold for flushing buffered data: T
     Typical values: T = 10% of RAM (~13 GB); δ = 500 MB; buffer cache = 20% of RAM (~26 GB)
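A minimal sketch of the admission rule this slide describes, using the slide's typical values; the class and method names are hypothetical. Asynchronous replica data is released from the block buffer only while the dirty data in the buffer cache stays below T + δ, which keeps the disk busy without pushing the OS past its flush threshold into synchronous writeback.

    // Thresholds taken from the slide's typical values.
    class BufferCacheGate {
        static final long T     = 13L * 1024 * 1024 * 1024; // OS flush threshold, ~10% of RAM
        static final long DELTA = 500L * 1024 * 1024;       // slack for replication data

        // Admit more asynchronous replica data only while the buffer cache
        // holds less than T + delta of dirty data.
        static boolean admitAsyncWrite(long dirtyBytes) {
            return dirtyBytes < T + DELTA;
        }
    }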

  14. Additional topics that are discussed in detail in the paper
     • Other activity metrics and their shortcomings
     • Consistency: we maintain read and write consistency
     • Failure handling: same mechanism as in HDFS to recover from failures
     • Scalability: Pfimbi's flow control is distributed

  15. Evaluation
     • 30 worker nodes: NodeManagers collocated with DataNodes
     • 1 master node: ResourceManager collocated with NameNode
     • Storage: 2 TB HDD, 200 GB SSD, 128 GB DRAM

  16. Pfimbi improves job runtime and exploits SSDs well
     [Charts: completion time of replicas (0-1000 s) for DFSIO on HDFS vs. DFSIO on Pfimbi, under HDD->HDD->HDD and SSD->HDD->HDD configurations; bars break down primary write, syncing dirty data, 1st replica, and 2nd replica]

  17. Necessity of flow control when doing asynchronous replication
     [Chart: completion times (0-1000 s) of two DFSIO jobs, without vs. with flow control; bars show Job 1, Job 2, and remaining replication]

  18. Pfimbi performs well for a mix of different jobs: SWIM workload
     • 18% improvement in average job runtime

  19. Policy example: Pfimbi can flexibly divide bandwidth between replica positions
     [Charts: number of block completions over a 1000 s timeline of DFSIO writes for 1st, 2nd, and 3rd replicas; weights in ratio 100:10:1 vs. equal weights]

  20. Related Work
     • Sinbad [SIGCOMM 2013]: flexible endpoint to reduce network congestion; does not eliminate contention within nodes
     • TidyFS [USENIX ATC 2011]: asynchronous replication; no flow control leads to arbitrary contention
     • Retro [NSDI 2015]: fairness and prioritization using rate control; synchronous replication

  21. Conclusion
     • Pfimbi effectively supports flow-controlled asynchronous replication
     • Successfully balances managing contention and maintaining high utilization
     • Expressive and backward compatible with HDFS
