Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing Ben Blamey, Andreas Hellander and Salman Toor Uppsala University, Sweden Ben.Blamey@it.uu.se Bench’19, Denver, USA, November 2019. http://www.benchcouncil.org/bench19/index.html
Summary • Performance Benchmark for Streaming Frameworks – Apache Spark (under various integrations…) – HarmonicIO • Large Message Size (and higher processing cost) – Scientific use cases: microscopy • Key finding: ‘islands’ of good performance over the 2D domain of message size and CPU cost, varying utility w.r.t. theoretical bounds.
Background • Apache Spark – Enterprise grade (resilient, great features, etc.) – Proven performance for typical enterprise use cases. • HASTE Project: – Microscopy use cases – Message Size 1-10MB, >1 second per message. • How well do enterprise tools adapt to sci. computing?
The Parameter Space • 2D Parameter Space • Theoretical Bounds – Network – CPU • How does performance generalize across this domain? [Figure: 2D parameter space. Axes: Message Size, CPU Cost (Map Function). Regions: (A) CPU Bound, (B) Network Bound, (C) ‘Framework’ Bound.]
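The two theoretical bounds named on this slide can be illustrated with a small calculation. This is a minimal sketch, not the authors' model: it assumes the network bound is the link bandwidth divided by the message size, and the CPU bound is the number of worker cores divided by the per-message CPU cost. The function names and the example values (a 10 Gbit/s link, 40 cores) are hypothetical.

```python
def max_frequency_hz(message_size_bytes, cpu_cost_s, bandwidth_bytes_per_s, total_cores):
    """Upper bound on the sustainable message frequency (messages per second).

    Assumption: the network can carry at most bandwidth / message_size messages
    per second, and the cluster can process at most cores / cpu_cost per second.
    """
    network_bound = bandwidth_bytes_per_s / message_size_bytes
    # With zero CPU cost the CPU imposes no limit in this simplified model.
    cpu_bound = float("inf") if cpu_cost_s == 0 else total_cores / cpu_cost_s
    return min(network_bound, cpu_bound)


def limiting_resource(message_size_bytes, cpu_cost_s, bandwidth_bytes_per_s, total_cores):
    """Return 'network' or 'cpu', whichever bound is tighter at this point."""
    network_bound = bandwidth_bytes_per_s / message_size_bytes
    cpu_bound = float("inf") if cpu_cost_s == 0 else total_cores / cpu_cost_s
    return "network" if network_bound <= cpu_bound else "cpu"


# Hypothetical cluster: 10 Gbit/s link (1.25e9 bytes/s) and 40 worker cores.
BANDWIDTH = 1.25e9
CORES = 40

# A 10 MB message costing 1 s of CPU falls in the CPU-bound region.
print(max_frequency_hz(10_000_000, 1.0, BANDWIDTH, CORES))
print(limiting_resource(10_000_000, 1.0, BANDWIDTH, CORES))
```

A measured frequency well below both bounds would correspond to the ‘framework’ bound region, which this sketch does not attempt to model.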
HarmonicIO • Favors P2P message transfer. • Fallback to Master Queue. • Processing runs inside Docker containers. • Intended for scientific computing applications. Image source: Torruangwatthana et al., HarmonicIO: Scalable Data Stream Processing for Scientific Datasets, IEEE Services 2018
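The "P2P first, master queue as fallback" behaviour can be pictured with a toy model. This is a sketch under stated assumptions, not HarmonicIO's actual code or API: the class `ToyMaster` and its methods are hypothetical, and Docker containers, networking and failure handling are all left out.

```python
from collections import deque


class ToyMaster:
    """Toy model of dispatch: direct (P2P) transfer if a worker is idle, else queue."""

    def __init__(self, worker_names):
        self.idle = deque(worker_names)  # workers currently free to accept a message
        self.queue = deque()             # backlog held at the master

    def send(self, message):
        """Route one message; returns ('p2p', worker) or ('queued', None)."""
        if self.idle:
            worker = self.idle.popleft()
            return ("p2p", worker)       # source would send directly to this worker
        self.queue.append(message)       # no idle worker: fall back to the queue
        return ("queued", None)

    def worker_finished(self, worker):
        """Called when a worker becomes free; returns its next message or None."""
        if self.queue:
            return self.queue.popleft()  # drain the backlog first
        self.idle.append(worker)
        return None


master = ToyMaster(["worker-1"])
print(master.send(b"first"))    # goes P2P to the only idle worker
print(master.send(b"second"))   # worker busy, so the message is queued
```

In this model the queue only absorbs bursts; once a worker reports back, it takes queued messages before becoming idle again.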
Methodology [Figure: architecture of the four benchmarked configurations, each with a Stream Source, a Master and Workers 1 to N. Panels: HarmonicIO (message transfer in P2P mode and in queue mode); Apache Spark Streaming w. File Streaming (file transfer and file listing via an NFS share); Apache Spark Streaming w. TCP (message transfer); Apache Spark Streaming w. Kafka (message transfer, with a Kafka Server).]
Experimental Setup [Figure: experimental setup. Recoverable labels: Streaming Source; Spark Streaming Benchmarking Application; Throttling; Monitoring. Message content: CPU pause, padded to length.]
The Workload StreamingBenchmark.scala
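The contents of StreamingBenchmark.scala are not reproduced in this transcript. The following is only a sketch of a synthetic workload of the kind the setup slide describes ("CPU pause, padded to length"), written in Python rather than Scala. The function names, the ASCII encoding of the pause and the space padding are assumptions made for illustration.

```python
import time


def make_message(cpu_pause_s: float, size_bytes: int) -> bytes:
    """Build a message that carries its own CPU cost, padded to the target size."""
    header = f"{cpu_pause_s:.6f}".encode("ascii")
    if len(header) > size_bytes:
        raise ValueError("size_bytes is too small to hold the CPU pause value")
    return header + b" " * (size_bytes - len(header))


def process_message(message: bytes) -> float:
    """Map function: read the pause from the message and keep one core busy for it."""
    pause_s = float(message.rstrip(b" ").decode("ascii"))
    deadline = time.perf_counter() + pause_s
    while time.perf_counter() < deadline:
        pass  # busy-wait so that the pause consumes CPU rather than sleeping
    return pause_s


msg = make_message(cpu_pause_s=0.01, size_bytes=1024)
print(len(msg))
print(process_message(msg))
```

Encoding the cost inside the message lets message size and CPU cost be varied independently, which is what a sweep over the 2D parameter space needs.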
Legend: Dark = High Freq; Black = Best; Light = Low Freq
- Excellent Performance near origin. - 300 kHz - Relatively weaker for high CPU Load - Cores used for message forwarding - Crashes for Large Messages.
- Excellent Performance near origin. - At origin, beaten by Spark+TCP - Weaker for high CPU load - Overhead of Kafka server - Weaker for larger messages. - Not intended use case.
- Great Performance at low frequencies. - Spark’s filesystem polling struggles at high frequency.
- Good overall performance. - Able to match performance of Spark+FS, and Spark+Kafka in their regions of good performance - …and in between. - Struggles at higher frequencies near origin.
Results – Theoretical Bounds
Performance for nil CPU Load
Discussion • ‘islands’ of good performance.
Conclusions • Choice of Spark Integration matters – depends on the parameters, frequency. • 2D Parameter Sweep is a nice way to viz. performance. • Various phenomena visible only in some regions: – Bottlenecks, overhead costs. – Varying utility (w.r.t. theoretical bounds). • ‘Middle Region’ – 1-10 MB, >1 second cost – Neglected in streaming benchmark studies? – A region where HarmonicIO does well.
Funding The HASTE Project (http://haste.research.it.uu.se/) is funded by the Swedish Foundation for Strategic Research (SSF) under award no. BD15-0008, and the eSSENCE strategic collaboration for eScience.
Questions? http://haste.research.it.uu.se/ https://github.com/HASTE-project/HarmonicIO https://github.com/HASTE-project
Results