
Scalability! But at what COST?
Frank McSherry, Michael Isard, Derek G. Murray
Alex Gubbay

What's Wrong With Distributed Systems Reporting?
• Scalability is often touted as the most important feature
• Evaluations fail to report absolute performance
• This steers distributed system design towards scalability and away from better systems
Figure: Naiad computation before (system A) and after (system B) optimisation [1]

COST – Configuration that Outperforms a Single Thread
• The COST of a system is the distributed hardware configuration it needs to outperform a single-threaded implementation
• Investigate the published performance of distributed systems and compare it with a reasonable implementation on a single core
• Consider total run time
• Some systems have unbounded COST: no configuration outperforms the single thread
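The definition above can be sketched in a few lines. This is only an illustration: the function name `cost`, the dictionary layout, and all of the measurements are hypothetical and do not come from the paper.

```python
from typing import Dict, Optional


def cost(scaling_curve: Dict[int, float], single_thread_secs: float) -> Optional[int]:
    """Smallest measured core count whose run time beats the single-threaded
    baseline, or None when no measured configuration does (unbounded COST)."""
    winners = [cores for cores, secs in scaling_curve.items()
               if secs < single_thread_secs]
    return min(winners) if winners else None


# Hypothetical measurements (cores -> total run time in seconds),
# invented purely for illustration.
system_a = {1: 900.0, 4: 400.0, 16: 180.0, 64: 120.0}
system_b = {1: 2000.0, 16: 700.0, 128: 350.0}
baseline_secs = 300.0

cost_a = cost(system_a, baseline_secs)  # first configuration under 300 s
cost_b = cost(system_b, baseline_secs)  # never gets under 300 s
print(cost_a, cost_b)
```

In this made-up data, system A scales well and overtakes the baseline at a modest core count, while system B never overtakes it even though it also scales.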

Comparisons Against Existing Systems
• PageRank
• Connected Components – Label Propagation
• Implemented in C# on a high-end 2014 laptop
Two implementations:
1. Basic
2. Optimised
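For orientation, a basic single-threaded label propagation for connected components might look like the sketch below. It is written in Python rather than the C# used for the baseline, and the edge-list representation and function name are assumptions made for illustration, not the authors' code.

```python
def label_propagation(num_vertices, edges):
    """Connected components by label propagation: every vertex starts with
    its own id as label, and labels are lowered across edges until a full
    pass over the edge list changes nothing."""
    labels = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            low = min(labels[src], labels[dst])
            if labels[src] != low:
                labels[src] = low
                changed = True
            if labels[dst] != low:
                labels[dst] = low
                changed = True
    return labels


# Small hypothetical graph: components {0, 1, 2}, {3, 4} and the isolated vertex 5.
edges = [(0, 1), (1, 2), (3, 4)]
labels = label_propagation(6, edges)
print(labels)  # [0, 0, 0, 3, 3, 5]
```

The number of passes grows with the length of the longest path along which a minimum label has to travel, which is why the algorithm is a weak baseline for graphs with large diameter.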

Optimisations of the Baseline
• Better Graph Layout
  • Naïve implementation processes edges in vertex order
  • GraphLab and GraphX partition the graph to reduce communication between workers [3,4]
  • Ordering on the single thread impacts cache performance
  • Edges ordered along a Hilbert curve
• Better Algorithm
  • Label Propagation is not an optimal algorithm [5]
  • Union-Find runs in O(m log n)
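Both optimisations can be sketched together. The slides do not say how edges are mapped onto the curve, so treating an edge (src, dst) as the grid cell (x, y) is an assumption here; `hilbert_index` is the standard coordinate-to-distance conversion, and `union_find_components` uses weighted union without path compression. All names are illustrative.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def union_find_components(num_vertices, edges):
    """Connected components via union-find with weighted union:
    the smaller tree is always attached beneath the larger one."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
    return [find(v) for v in range(num_vertices)]


order = 3  # an 8 x 8 grid covers vertex ids 0..7
edges = [(3, 4), (0, 1), (1, 2), (6, 7)]  # hypothetical graph
hilbert_edges = sorted(edges, key=lambda e: hilbert_index(order, e[0], e[1]))
roots = union_find_components(8, hilbert_edges)
```

Weighted union keeps every tree at depth O(log n), so each of the m edges costs at most O(log n), and each edge is visited exactly once regardless of graph diameter. The Hilbert sort only changes the order in which edges are visited, so edges that are close on the curve touch nearby source and destination ids.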

Results and COST Evaluation - PageRank [1,2,3,4]

Scalable System        Cores   Twitter (secs)   UK Internet 2007 (secs)
GraphChi                   2             3160                      6972
Stratosphere              16             2250                         -
X-Stream                  16             1488                         -
Spark                    128              857                      1759
Giraph                   128              596                      1235
GraphLab                 128              249                       833
GraphX                   128              419                       462
Single Thread (SSD)        1              300                       651
Single Thread (RAM)        1              275                         -
Hilbert Order (SSD)        1              242                       256
Hilbert Order (RAM)        1              110                         -
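As a worked reading of the PageRank figures, the sketch below lists the systems that are still slower than the basic single thread (SSD) at their reported core counts. The run times are copied from the table; the tuple layout and the helper name are illustrative assumptions.

```python
# (system, cores, Twitter secs, UK Internet 2007 secs); None marks "-" in the table.
pagerank_rows = [
    ("GraphChi", 2, 3160, 6972),
    ("Stratosphere", 16, 2250, None),
    ("X-Stream", 16, 1488, None),
    ("Spark", 128, 857, 1759),
    ("Giraph", 128, 596, 1235),
    ("GraphLab", 128, 249, 833),
    ("GraphX", 128, 419, 462),
]
single_thread_ssd = (300, 651)  # basic single-threaded run times (SSD)


def slower_than_baseline(rows, baseline, column):
    """Names of systems whose reported time in the given dataset column
    (0 = Twitter, 1 = UK Internet 2007) exceeds the single-thread time."""
    return [name for name, _cores, *times in rows
            if times[column] is not None and times[column] > baseline[column]]


twitter_slower = slower_than_baseline(pagerank_rows, single_thread_ssd, 0)
uk_slower = slower_than_baseline(pagerank_rows, single_thread_ssd, 1)
print(twitter_slower)
print(uk_slower)
```

On the Twitter graph only GraphLab beats the basic single thread, and on the UK graph only GraphX does, each using 128 cores.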


Results and COST Evaluation – Connected Components

Scalable System        Cores   Twitter (secs)   UK Internet 2007 (secs)
GraphLab                 128              242                       714
GraphX                   128              251                       800
Single Thread (SSD)        1              153                       417
Hilbert Order (SSD)        1               15                        30

Figure: Two Naiad implementations for Connected Components

Conclusions
• Clearly need to consider absolute performance
• Distributed systems have a surprisingly high overhead
• “Important to distinguish scalability from efficient use of resources” [1]
But
• More to consider than computation time
• Hardware environment – cluster hardware vs laptop
• Systems described are prototypes
• Qualitative advantages of distributed systems
  • High availability, security, ecosystem integration

Questions?

References
1. Frank McSherry, Michael Isard, and Derek G. Murray. Scalability! But at what COST? HotOS 2015.
2. Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A Timely Dataflow System. SOSP 2013.
3. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. OSDI 2012.
4. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. OSDI 2014.
5. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. PEGASUS: Mining Peta-Scale Graphs. ICDM 2009.
