Experiences with the Model-based Generation of Big Data Pipelines
Holger Eichelberger, Cui Qin, Klaus Schmid
{eichelberger, qin, schmid}@sse.uni-hildesheim.de
Software Systems Engineering, University of Hildesheim
www.sse.uni-hildesheim.de
Motivation
• Background: FP7 QualiMaster
  – Configurable and adaptive data processing infrastructure
  – Real-time financial risk analysis
• Programming applications for Big Data frameworks is complex
• Ideal: focus on data processing, ignore technical complexity
• Goal: a model-based approach to stream processing that
  – hides complexity,
  – eases development,
  – generates the complex parts of the code, and
  – supports self-adaptation
• This talk: experiences and lessons learned
Model-based design
• Basis: concept analysis of stream processing approaches
  – Fixed stream operators (e.g., Borealis, PIPES)
  – User-defined operators / algorithms (e.g., Storm, Heron)
  – Combinations (e.g., Spark, Flink)
• Common concept: data flow graph
  [Figure: data flow graph with a data source, data processors P1, P2, P3, and a data sink]
• Typically represented as a program
• Recent trend: DSLs (see the sketch below)
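To make the data-flow idea concrete, here is a minimal, hypothetical fluent DSL in Java describing the graph from the figure (Source → P1 → P2 → P3 → Sink). All class and method names are illustrative assumptions, not the QualiMaster modeling frontend.

import java.util.ArrayList;
import java.util.List;

// Hypothetical fluent DSL for a data flow graph; all names are
// illustrative, not the QualiMaster modeling frontend.
public class PipelineDsl {

    private final List<String> nodes = new ArrayList<>();

    // Every pipeline starts with a data source.
    public static PipelineDsl source(String name) {
        PipelineDsl p = new PipelineDsl();
        p.nodes.add("source:" + name);
        return p;
    }

    // Adds a data processor node to the flow.
    public PipelineDsl process(String name) {
        nodes.add("processor:" + name);
        return this;
    }

    // Terminates the flow in a data sink.
    public PipelineDsl sink(String name) {
        nodes.add("sink:" + name);
        return this;
    }

    public static void main(String[] args) {
        // The graph from the figure: Source -> P1 -> P2 -> P3 -> Sink
        PipelineDsl pipeline = PipelineDsl.source("financialFeed")
                .process("P1")
                .process("P2")
                .process("P3")
                .sink("riskStore");
        System.out.println(pipeline.nodes);
    }
}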
Specific modeling concepts
[Figure: a data processing pipeline with a source, processors P1, P2, P3, and a sink; P2 is an algorithm family whose interchangeable members (P2.1, …) can be a simple algorithm, a sub-pipeline, or a hardware co-processor]
• Domain restrictions
  – Must be a valid data flow graph
  – If P_s → P_e, then P_s must provide types that P_e can process (see the sketch below)
  – Interface compatibility between families and algorithms
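A minimal sketch of the edge-typing restriction, assuming illustrative type names; the NodeType record and the edgeValid method are assumptions for illustration, not part of the actual tooling.

import java.util.Set;

// Minimal sketch of the edge-typing rule: for an edge P_s -> P_e, the
// output type of P_s must be among the input types P_e can process.
public class TypeCheck {

    record NodeType(String name, Set<String> inputs, String output) {}

    // True iff the edge from 'from' to 'to' is type-compatible.
    static boolean edgeValid(NodeType from, NodeType to) {
        return to.inputs().contains(from.output());
    }

    public static void main(String[] args) {
        NodeType p1 = new NodeType("P1", Set.of("RawTick"), "PricePoint");
        NodeType p2 = new NodeType("P2", Set.of("PricePoint"), "RiskValue");
        System.out.println(edgeValid(p1, p2)); // true: P2 can consume P1's output
    }
}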
Modeling support
• Domain-specific modeling frontend
• Underlying: own model-management framework
Code generation
• Architecture (layered, bottom to top):
  – Reconfigurable hardware (heterogeneous resource pool)
  – Stream processing framework (Apache Storm)
  – Intermediary layer extending Storm
  – Management stack for runtime adaptation
  – Generated pipelines / applications on top
• Generation steps:
  – Family interfaces
  – Data serialization support
  – Integration of hardware co-processors
  – Pipelines / sub-pipelines, algorithm switching (see the wiring sketch below)
  – Compile, integrate dependencies, package
• Scale: 16 pipelines, ×7 code produced, ~880 MB deployable components
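To illustrate the kind of Storm code such a generator emits, here is a compact, hand-written sketch wiring the pipeline nodes as bolts via Storm's TopologyBuilder. The FamilyBolt and the component names are assumptions for illustration; the actual generated code, the spouts, and the intermediary-layer extensions are omitted.

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Sketch of the kind of Storm wiring the generator emits; class and
// component names are illustrative, not the actual generated code.
public class GeneratedPipelineSketch {

    // A trivial bolt standing in for a generated family node.
    public static class FamilyBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Delegate to the currently selected family member (omitted here).
            collector.emit(new Values(input.getValue(0)));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // "source" would be a generated spout reading the financial stream;
        // it is omitted here, so this wiring is illustrative only.
        builder.setBolt("p1", new FamilyBolt(), 2).shuffleGrouping("source");
        builder.setBolt("p2", new FamilyBolt(), 2).shuffleGrouping("p1");
        builder.setBolt("p3", new FamilyBolt(), 1).shuffleGrouping("p2");
    }
}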
Experiences and Lessons learned (1)
• 7 data engineers from 3 groups, 6 large pipelines
• Beginning of the project:
  – Sceptical about the model-based approach
  – Initial version after some months
  – Hands-on workshops
  – Feedback:
    • Puzzled about type safety
    • First own generated pipelines helped
    • Change of focus: more on algorithms
    • Requests for new features, reports on buggy features
  – Confidence increased with improved versions (~1 year)
Experiences and Lessons learned (2)
• Later phases
  – Interfaces help to structure the work
  – Typing helps to avoid runtime errors
  – "Magic" of generated code: serialization, parameters, algorithm switching (see the sketch below)
  – Complex structures due to additional nodes and communication
  – For sub-pipelines: manual and generated code perform the same
  – Shields developers from complex coding
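One way to picture the algorithm-switching "magic": every family member implements a common generated interface, and the enclosing node swaps the active implementation at runtime. A minimal sketch under that assumption; the interface and class names are hypothetical, not the generated QualiMaster interfaces.

import java.util.concurrent.atomic.AtomicReference;

// Sketch of runtime algorithm switching within a family node; the
// interface and class names are hypothetical, not generated code.
public class FamilySwitchSketch {

    // All members of a family implement the same (generated) interface.
    interface CorrelationAlgorithm {
        double process(double[] window);
    }

    static class SoftwareCorrelation implements CorrelationAlgorithm {
        public double process(double[] window) { return 0.0; } // CPU variant (stub)
    }

    static class HardwareCorrelation implements CorrelationAlgorithm {
        public double process(double[] window) { return 0.0; } // co-processor offload (stub)
    }

    // The node holds the active member and can swap it between tuples
    // without stopping the pipeline.
    private final AtomicReference<CorrelationAlgorithm> active =
            new AtomicReference<>(new SoftwareCorrelation());

    void switchTo(CorrelationAlgorithm algorithm) { active.set(algorithm); }

    double onTuple(double[] window) { return active.get().process(window); }

    public static void main(String[] args) {
        FamilySwitchSketch node = new FamilySwitchSketch();
        node.onTuple(new double[] {1.0, 2.0});     // served by SoftwareCorrelation
        node.switchTo(new HardwareCorrelation());  // runtime switch, e.g., by adaptation
        node.onTuple(new double[] {1.0, 2.0});     // now served by HardwareCorrelation
    }
}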
Experiences and Lessons learned (3)
• Center of integration → higher workload
• Supports evolution:
  – Consistent deployment of changes
  – Algorithms must be evolved manually
  – But errors are also deployed easily
• Continuous integration:
  – Generation and algorithms
  – Up-to-date pipelines are always available
  – Intensive tests increase overall build time → local debugging first
• Effects:
  – Focus of work shifts to the algorithms
  – Allows realization and evolution of complex structures
  – Avoids runtime issues
  – Stability increases confidence, but requires higher quality assurance
Conclusions
• Model-based approach for streaming Big Data applications:
  – Type-safe
  – Heterogeneous data processing (hardware co-processors)
  – Flexible exchange of algorithms
• Code generation for Apache Storm
• The approach pays off:
  – Positive feedback
  – But requires training, modeling effort, effort for realizing the transformation, and its maintenance and evolution
• Future: optimized code generation for self-adaptation
  – Switching efficiency (optimized resource usage is already reality)
  – Multiple target platforms