
Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines
Jingjing Wang, Magdalena Balazinska, Daniel Halperin
University of Washington

Modern Analytics Requires Iteration
• Graph applications
  – Graph reachability
  – Connected components
  – Shortest path
• Machine learning
  – Clustering algorithms
  – Logistic regression
• Scientific analytics
  – N-body simulation
• …
Jingjing Wang - University of Washington

Galaxy Evolution: An Iterative Example
[Figure: a simulation of the universe, showing galaxies along a timeline labeled present day, millions of years ago, and Big Bang. Picture from D. H. Stalder et al., arXiv:1208.3444 [astro-ph.CO]]

Galaxy Evolution: Iterative Lineage Tracing
[Figure: particles and galaxies shown at successive points in time, from millions of years ago to the present day]

Galaxy Evolution: Why It Is Not Easy
• Large-scale data sizes
  – Scalability
• Iteration is the core
  – Support efficient iterative constructs
• Users are data scientists
  – Provide an easy-to-use query interface
• Shared datasets and resources
  – Within a data management system

Iterative Analytics: Where to Do It
• SQL Server
  – Single-node, cannot handle huge scale
• MapReduce
  – Rigid programming model
  – Writes to disk, expensive iteration
• In-memory systems such as Spark
  – Synchronous operations
• Graph engines such as GraphLab
  – Think like a vertex

No Existing System Meets All Requirements
• Synchronous iterations only
  – AsterixDB, HaLoop, Pregel, REX, Spark, PrIter, Glog, …
• Single-node
  – LogicBlox, DatalogFS, …
• No declarative language
  – Stratosphere, Naiad, Grace, GraphLab, …
• Specialized for graphs
  – GraphLab, Grace, …
• Not a data management system
  – SociaLite, …
• Theory on recursive queries
  – DatalogFS, …

Outline and Contributions
• Full-stack solution for iterative processing
  – Declarative relational query language
    • A subset of Datalog-with-Aggregation
  – Scalable and easily implementable
    • Small extensions to existing shared-nothing systems
  – Efficient iterative computation
    • Execution models and optimizations
• Implementation and empirical evaluation using

Outline and Contributions
• Full-stack solution for iterative processing
  – Declarative relational query language
    • A subset of Datalog-with-Aggregation
  – Scalable and easily implementable
    • Small extensions to existing shared-nothing systems
  – Efficient iterative computation
    • Execution models and optimizations
• Implementation and empirical evaluation using

From Datalog Programs to Asynchronous Query Plans
• Datalog: a relational query language
  – Nicely expresses recursions
• Two special operators
  – IDBController
    • Maintains state of “nonconstant” relations
  – TerminationController
  – Easy extensions to an existing engine
• Automatic compilation

Datalog (connected components):
    CC(x,x) :- Edges(x,_)
    CC(y,$Min(v)) :- CC(x,v), Edges(x,y)
    :- CC(y,v)

SQL (recursive common table expression):
    DECLARE @id AS INT, @lvl AS INT
    SET @id = 3
    SET @lvl = 2
    ;WITH cte (id, parent, child, lvl) AS (
        SELECT id, parent, child, 0
        FROM t
        WHERE id = 1
        UNION ALL
        SELECT E.id, E.parent, E.child, M.lvl+1
        FROM t AS E
        JOIN CTE AS M ON E.parent = M.child
        WHERE lvl < @lvl
    )
    SELECT * FROM CTE
    --where lvl=@lvl
    --OPTION (MAXRECURSION 10)
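To make the meaning of the two CC rules concrete, here is a minimal single-machine sketch in Python that evaluates them to a fixpoint. It is only an illustration of the rule semantics, not the engine's query plan: the function name `connected_components` and the list-of-pairs encoding of `Edges` are assumptions made for this example, and an undirected graph is assumed to be stored with both directions of every edge.

```python
def connected_components(edges):
    """Evaluate the two CC rules until no fact changes.

    edges: iterable of (x, y) pairs, one per tuple in Edges (hypothetical encoding).
    Returns a dict mapping each vertex with a CC fact to its minimum label.
    """
    edges = list(edges)
    # Rule 1: CC(x, x) :- Edges(x, _)
    cc = {x: x for x, _ in edges}
    changed = True
    while changed:
        changed = False
        # Rule 2: CC(y, $Min(v)) :- CC(x, v), Edges(x, y)
        for x, y in edges:
            v = cc[x]  # x is the source of an edge, so rule 1 gave it a label
            if y not in cc or v < cc[y]:
                cc[y] = v  # keep only the minimum label seen for y
                changed = True
    return cc


if __name__ == "__main__":
    sample = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
    print(connected_components(sample))
```

Because `$Min` only ever lowers a vertex's label, the loop stops once a full pass over `Edges` changes nothing.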

Outline and Contributions
• Full-stack solution for iterative processing
  – Declarative relational query language
    • A subset of Datalog-with-Aggregation
  – Scalable and easily implementable
    • Small extensions to existing shared-nothing systems
  – Efficient iterative computation
    • Execution models and optimizations
• Implementation and empirical evaluation using

Iterative Computation: How Can We Do Better
• Performance impact: # of intermediate tuples
  – More tuples, more work, more resources
• Optimization: recursive execution models
  – Synchronous vs. asynchronous
• Optimization: prioritizing tuples
  – For asynchronous model, favor new tuples vs. base tuples

Optimization: Recursive Execution Models
• Synchronous
  – Stop at the end of each iteration
• Asynchronous
  – No barrier, propagate updates when ready
• Galaxy Evolution
  – Synchronous
    • Find all galaxies at timestep 1, then 2, …
  – Asynchronous
    • Galaxy A is a part of the evolution history
    • A shares particles with galaxy B
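The difference between the two models can be sketched on a single machine with connected components as the running query. The Python below is a toy illustration under stated assumptions, not the engine's implementation: `cc_synchronous` applies all updates of an iteration behind a barrier, `cc_asynchronous` propagates each update as soon as it is dequeued, and both count the intermediate tuples the join produces. The function names and the adjacency-list encoding are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Group Edges(x, y) tuples by source vertex x."""
    adj = defaultdict(list)
    for x, y in edges:
        adj[x].append(y)
    return dict(adj)


def cc_synchronous(edges):
    """Iterate in rounds; updates become visible only after the barrier."""
    adj = build_adjacency(edges)
    cc = {x: x for x in adj}      # CC(x, x) :- Edges(x, _)
    delta = dict(cc)              # tuples that changed in the last round
    produced = 0                  # intermediate tuples emitted by the join
    iterations = 0
    while delta:
        iterations += 1
        candidates = {}
        for x, v in delta.items():
            for y in adj.get(x, ()):
                produced += 1
                if y not in candidates or v < candidates[y]:
                    candidates[y] = v
        # Barrier: apply this round's results all at once.
        delta = {}
        for y, v in candidates.items():
            if y not in cc or v < cc[y]:
                cc[y] = v
                delta[y] = v
    return cc, produced, iterations


def cc_asynchronous(edges):
    """No barrier; each improved tuple is propagated when it is dequeued."""
    adj = build_adjacency(edges)
    cc = {x: x for x in adj}
    queue = deque(cc.items())
    produced = 0
    while queue:
        x, v = queue.popleft()
        for y in adj.get(x, ()):
            produced += 1
            if y not in cc or v < cc[y]:
                cc[y] = v
                queue.append((y, v))
    return cc, produced
```

Both functions reach the same fixpoint because taking a minimum is insensitive to the order in which updates arrive; only the number of intermediate tuples and the presence of per-iteration barriers differ.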

Galaxy Evolution: Execution Model Does Not Matter Much
[Chart: run time in seconds (0 to 600) against number of workers (8, 16, 32, 64)]
80GB, 27 snapshots, 16 machines

Another Application: Least Common Ancestor
[Figure: citation graph with papers 1 to 5 as nodes and citations as edges, labeled dist:1, dist:2, and dist:3]

LCA: Asynchronous Can Be Much Slower Than Synchronous
[Chart: run time in seconds (0 to 160) against number of workers (8, 16, 32, 64)]
2 million papers, 8 million citations

Optimization: Prioritizing Tuples
• For asynchronous processing
  – Choice: favor new tuples vs. base tuples
• Example: connected components
[Figure: example graph with vertices 1, 2, 3, and 4, shown twice]
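One way to picture the choice is a join operator with two inputs: unread base tuples from Edges, and new CC tuples fed back from the recursion. The Python sketch below is a toy single-machine model under several assumptions that do not come from the slides: the function name `cc_with_pull_order`, the `new_first` flag, and the decision to keep only the minimum label per vertex on the CC side of the join are all hypothetical. It shows that the pull order changes how many intermediate tuples are produced without changing the final answer.

```python
from collections import defaultdict, deque


def cc_with_pull_order(edges, new_first):
    """Connected components with a pipelined join and a configurable pull order.

    new_first=True  : pull a new CC tuple whenever one is available.
    new_first=False : read all base Edges tuples before any new CC tuple.
    Returns (labels, number of intermediate tuples produced by the join).
    """
    base = deque(edges)                # unread base tuples
    new = deque()                      # CC tuples waiting to enter the join
    edges_by_src = defaultdict(list)   # join state: base side
    cc_in_join = {}                    # join state: CC side (min label per vertex)
    cc = {}                            # recursion state: best label per vertex
    produced = 0

    def offer(y, v):
        # Keep a CC tuple only if it improves the current minimum for y.
        if y not in cc or v < cc[y]:
            cc[y] = v
            new.append((y, v))

    while base or new:
        if new and (new_first or not base):
            x, v = new.popleft()
            cc_in_join[x] = v
            # Join the new CC tuple with the edges read so far.
            for y in edges_by_src.get(x, ()):
                produced += 1
                offer(y, v)
        else:
            x, y = base.popleft()
            edges_by_src[x].append(y)
            offer(x, x)                # CC(x, x) :- Edges(x, _)
            # Join the new edge with the CC tuple already in the join, if any.
            if x in cc_in_join:
                produced += 1
                offer(y, cc_in_join[x])
    return cc, produced


if __name__ == "__main__":
    sample = [(1, 2), (2, 1), (2, 3), (3, 2), (4, 5), (5, 4)]
    for flag in (True, False):
        labels, count = cc_with_pull_order(sample, new_first=flag)
        print("new_first =", flag, "intermediate tuples:", count)
```

The counts from this toy depend on the input and on the arrival order of the base tuples, so they should not be read as a prediction of run times on a real cluster.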

Connected Components: Pull Order Impacts Run Time
[Chart: run time in seconds (0 to 2000) against number of workers (8, 16, 32, 64); series: Sync; Async, new tuples first; Async, base tuples first]
21 million vertices, 776 million edges

Conclusion
• Full-stack solution for iterative big-data analytics
  – A declarative language
  – Small extensions to existing shared-nothing engines
  – Efficient iterative execution
  – Failure handling methods
  – More details in the paper
• Empirical evaluation of various models
  – No single method outperforms others
  – Future work: an adaptive cost-based optimizer
