GraphX: Graph Processing in a Distributed Dataflow Framework (OSDI 2014)


GraphX: Graph Processing in a Distributed Dataflow Framework
OSDI 2014
Bidyut Hota

Agenda
• Analytics space background
• Motivation
• Goal
• Approach
• Optimizations
• Results
• Flaws/Limitations
• Questions

Real-life Analytics Pipeline
Raw data → Link table → PageRank → Desired results
• Data representations involved: tables and graphs
• E.g., Google Knowledge Graph: 570M vertices, 18B edges (as of mid-2017)

Systems landscape

Motivation
• Separate systems currently exist to compute on each of these data representations.
• Ability to combine data sources.
• Enhance dataflow frameworks to leverage their inherent strengths.

Current drawbacks of dataflow frameworks
• Implementing iterative algorithms requires multiple stages of complex joins.
• They do not cover common patterns in graph algorithms, which leaves room for optimization.
• Unlike Spark, no fine-grained control of data partitioning.

Current drawbacks of specialized systems
• They lack the ability to combine graphs with unstructured or tabular data.
• They favor snapshot recovery rather than fault tolerance as provided in Spark.

What can we leverage?
• Immutability of RDDs
• Reusing indices across graph and collection views over iterations
• Increase in performance

Goal
• A general-purpose distributed framework for graph computations
• Performance comparable to specialized graph processing systems

Approach
• Unify the tabular view and the graph view
• Adopt the best ideas from specialized systems
• Graph representation on dataflow frameworks
• Optimizations
• Develop the GraphX API on top of Spark

Graph approach: PageRank example
• E.g., the PageRank algorithm
• Graph-parallel abstraction
• Define a vertex program
• Terminate when vertex programs vote to halt
Figure: PageRank in Pregel

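The slide's figure is not preserved, so the following is only a minimal single-machine sketch of what a Pregel-style PageRank vertex program could look like. The names `vertex_program` and `pregel_pagerank`, the damping factor 0.85, and the fixed budget of 30 supersteps are assumptions for illustration, not taken from the slides. Each vertex votes to halt once the superstep budget is reached, and a halted vertex is reactivated when it receives a message.

```python
from collections import defaultdict

DAMPING = 0.85        # assumed damping factor
MAX_SUPERSTEPS = 30   # assumed iteration budget


def vertex_program(rank, messages, out_degree, num_vertices, superstep):
    """Hypothetical vertex program.

    Returns (new_rank, message_for_each_out_neighbor_or_None, vote_to_halt).
    """
    if superstep > 0:
        # Combine the contributions received from in-neighbors.
        rank = (1.0 - DAMPING) / num_vertices + DAMPING * sum(messages)
    if superstep < MAX_SUPERSTEPS and out_degree > 0:
        return rank, rank / out_degree, False
    return rank, None, True  # nothing left to send: vote to halt


def pregel_pagerank(edges, num_vertices):
    """Run supersteps until every vertex has halted and no messages remain."""
    out_neighbors = defaultdict(list)
    for src, dst in edges:
        out_neighbors[src].append(dst)

    ranks = [1.0 / num_vertices] * num_vertices
    active = set(range(num_vertices))  # vertices that have not voted to halt
    inbox = {}                         # vertex id -> list of incoming messages
    superstep = 0
    while active or inbox:
        outbox = defaultdict(list)
        # A pending message reactivates a halted vertex.
        for v in active | set(inbox):
            nbrs = out_neighbors.get(v, [])
            rank, msg, halt = vertex_program(
                ranks[v], inbox.get(v, []), len(nbrs), num_vertices, superstep
            )
            ranks[v] = rank
            if msg is not None:
                for dst in nbrs:
                    outbox[dst].append(msg)
            if halt:
                active.discard(v)
            else:
                active.add(v)
        inbox = dict(outbox)
        superstep += 1
    return ranks
```

For example, `pregel_pagerank([(0, 1), (1, 2), (2, 0)], 3)` runs the vertex program on a three-vertex cycle. Ranks lost through vertices without outgoing edges are not redistributed in this sketch.
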
Approach
• GAS (Gather, Apply, Scatter)
How can this be applied in dataflow frameworks?
• Map, group-by, and join dataflow operators

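As a rough illustration of how one Gather-Apply-Scatter round might be expressed with join, map, and group-by, here is a plain-Python sketch over in-memory tables. The function `gas_superstep`, its `send`/`merge`/`apply` parameters, and the helper `pagerank_round` are hypothetical names chosen for this example; they are not the GraphX API.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def gas_superstep(vertices, edges, send, merge, apply):
    """One Gather-Apply-Scatter round built from join, map, and group-by.

    vertices: dict vertex_id -> value      (vertex table)
    edges:    list of (src_id, dst_id)     (edge table)
    send, merge, apply: user-defined functions (hypothetical signatures)
    """
    # join: attach the source vertex value to every edge
    joined = [(src, dst, vertices[src]) for src, dst in edges]
    # map: produce one (destination, message) pair per edge
    messages = [(dst, send(src, value)) for src, dst, value in joined]
    # group-by: combine all messages addressed to the same vertex
    messages.sort(key=itemgetter(0))
    gathered = {
        dst: reduce(merge, (m for _, m in group))
        for dst, group in groupby(messages, key=itemgetter(0))
    }
    # join + map: apply the combined message (or None) to each vertex value
    return {v: apply(value, gathered.get(v)) for v, value in vertices.items()}


def pagerank_round(ranks, edges, damping=0.85):
    """One PageRank iteration expressed through gas_superstep."""
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    n = len(ranks)
    return gas_superstep(
        ranks,
        edges,
        send=lambda src, rank: rank / out_degree[src],
        merge=lambda a, b: a + b,
        apply=lambda old, total: (1 - damping) / n + damping * (total or 0.0),
    )
```

Calling `pagerank_round` repeatedly on its own output iterates the algorithm. A distributed dataflow engine would run the same three operators over partitioned tables rather than Python lists.
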
Representing Property Graphs as Tables
• Never transfer edges

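One way to picture "never transfer edges" is the sketch below: edges are assigned to partitions once, a routing table records which edge partitions reference each vertex, and only vertex values are shipped to those partitions. The function names and the round-robin edge placement are assumptions made for this example; the actual partitioning strategy is not given on the slide.

```python
def partition_edges(edges, num_partitions):
    """Assign each (src, dst, attr) edge to exactly one partition.

    Round-robin placement is a placeholder chosen for determinism.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, edge in enumerate(edges):
        partitions[i % num_partitions].append(edge)
    return partitions


def build_routing_table(edge_partitions):
    """Map each vertex id to the set of edge partitions that reference it."""
    routing = {}
    for pid, part in enumerate(edge_partitions):
        for src, dst, _attr in part:
            routing.setdefault(src, set()).add(pid)
            routing.setdefault(dst, set()).add(pid)
    return routing


def ship_vertices(vertices, routing, num_partitions):
    """Send vertex values to the edge partitions that need them.

    Edges stay where they are; only vertex values move.
    """
    shipped = [dict() for _ in range(num_partitions)]
    for vid, attr in vertices.items():
        for pid in routing.get(vid, ()):
            shipped[pid][vid] = attr
    return shipped
```

With three edges `(1, 2)`, `(2, 3)`, `(3, 1)` and two partitions, vertices 2 and 3 are referenced from both partitions, so their values are replicated, while vertex 1 is shipped to one partition only.
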
GraphX API

Using the dataflow operators
• Logical representation
• Join of the vertices table on the edges table

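The join of the vertex table onto the edge table can be sketched as a hash join that looks up both endpoints of every edge. The names `build_index` and `triplets` are hypothetical helpers for this illustration, and the row layouts are assumptions.

```python
def build_index(vertex_rows):
    """Build a hash index over the vertex table: vertex_id -> attribute."""
    return {vid: attr for vid, attr in vertex_rows}


def triplets(vertex_index, edge_rows):
    """Join vertices onto edges, yielding (source, destination, edge attribute).

    Each endpoint is returned as (vertex_id, vertex_attribute).
    """
    for src, dst, edge_attr in edge_rows:
        yield (src, vertex_index[src]), (dst, vertex_index[dst]), edge_attr
```

Because the index is built once and passed in, it can be reused across iterations as long as the vertex ids do not change.
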
Using the dataflow operators on the vertex program
User-defined

Optimizations
• Remote caching
• Specialized data structure
• Vertex-cut partitioning
• Active set tracking

Implementing Optimizations
• Reusable hash index
• Sequential scan or clustered scan, chosen dynamically based on the active set
• Incremental updates
• Automatic join elimination
Additional optimizations:
• Memory-based shuffle
• Batching and columnar structure
• Variable integer encoding

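To make "variable integer encoding" concrete, here is a sketch of one common scheme: 7 payload bits per byte, with the high bit marking that more bytes follow. The slides do not say which encoding is used, so this particular layout and the names `encode_varint` and `decode_varint` are assumptions for illustration.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data, offset=0):
    """Decode one integer from data; return (value, offset_after_value)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Small values such as nearby vertex ids or id deltas fit in a single byte under this scheme, which is why such encodings help reduce the size of shuffled data.
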
Results

Results
• Scaling for PageRank on the Twitter dataset
• Effect of partitioning on communication

Current Flaws
• Not optimized for dynamic graphs.
• Requires incremental updates to the routing table.
• Not designed for streaming applications.
• Asynchronous graph computation is not available. This is where Naiad will outperform.

Questions
