Apache Giraph: Large-scale Graph Processing on Hadoop

Aug 26, 2023

Apache Giraph: Large-scale Graph Processing on Hadoop
Claudio Martella <claudio@apache.org> @claudiomartella
Graphs are simple
A computer network
A social network
A semantic network
A map
Predicting break-ups: graph approach, aggregation approach
Graphs are nasty.
Each vertex depends on its neighbours, recursively.
Recursive problems are nicely solved iteratively.
PageRank in MapReduce
• Record: < v_i, pr, [ v_j, ..., v_k ] >
• Mapper: emits < v_j, pr / #neighbours >
• Reducer: sums the partial values
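
The record, mapper and reducer above can be sketched as a single in-process job. This is a minimal simulation in plain Python, not Hadoop code: the function names, the `"structure"`/`"share"` tags and the three-vertex graph are hypothetical, and the shuffle is modelled with a dictionary. As on the slide, the reducer only sums the partial values, so no damping factor is applied.

```python
from collections import defaultdict

def mapper(record):
    """Emit the adjacency list plus one PageRank share per neighbour."""
    vertex, pr, neighbours = record
    # The graph structure has to travel with the values so the next job can use it.
    yield vertex, ("structure", neighbours)
    for target in neighbours:
        yield target, ("share", pr / len(neighbours))

def reducer(vertex, values):
    """Sum the partial values and rebuild the record for the next job."""
    neighbours, total = [], 0.0
    for kind, payload in values:
        if kind == "structure":
            neighbours = payload
        else:
            total += payload
    return (vertex, total, neighbours)

def run_iteration(records):
    """One MapReduce job: map, shuffle (group by key), reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return [reducer(vertex, values) for vertex, values in sorted(shuffled.items())]

# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
initial = [("a", 1 / 3, ["b", "c"]), ("b", 1 / 3, ["c"]), ("c", 1 / 3, ["a"])]

records = initial
for _ in range(20):  # each pass stands for one full job
    records = run_iteration(records)

for vertex, pr, _ in records:
    print(vertex, round(pr, 4))
```

Every pass through `run_iteration` re-emits the adjacency lists alongside the rank shares, so both the values and the structure go through the shuffle on each iteration.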
MapReduce dataflow
Drawbacks
• Each job is executed N times
• Job bootstrap
• Mappers send PR values and structure
• Extensive IO at input, shuffle & sort, output
Timeline
• Inspired by Google Pregel (2010)
• Donated to ASF by Yahoo! in 2011
• Top-level project in 2012
• 1.0 release in January 2013
• 1.1 release in November 2014
Plays well with Hadoop
Vertex-centric API
Shortest Paths
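Code
    def compute(vertex, messages):
        # Smallest distance proposed by any incoming message.
        minValue = float('inf')
        for m in messages:
            minValue = min(minValue, m)
        # Only a strictly shorter path changes the vertex and is propagated.
        if minValue < vertex.getValue():
            vertex.setValue(minValue)
            for edge in vertex.getEdges():
                message = minValue + edge.getValue()
                # sendMessage is provided by the framework.
                sendMessage(edge.getTargetId(), message)
        # Go idle until a new message arrives.
        vertex.voteToHalt()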
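
To see the compute function run end to end, here is a minimal single-process simulation of the superstep loop. It is a sketch in plain Python, not the Giraph API: the `Vertex` class, `run_supersteps`, the explicit `send_message` argument and the four-vertex graph are all hypothetical. It also assumes the source vertex is seeded with a message carrying distance 0, a step the slide code does not show.

```python
import math
from collections import defaultdict

class Vertex:
    """Hypothetical stand-in for a framework vertex."""
    def __init__(self, vertex_id, edges):
        self.id = vertex_id
        self.edges = edges        # {target_id: edge weight}
        self.value = math.inf     # best known distance from the source
        self.active = True

def compute(vertex, messages, send_message):
    """Same logic as the slide: keep the minimum and propagate improvements."""
    min_value = min(messages, default=math.inf)
    if min_value < vertex.value:
        vertex.value = min_value
        for target_id, weight in vertex.edges.items():
            send_message(target_id, min_value + weight)
    vertex.active = False  # vote to halt; a new message reactivates the vertex

def run_supersteps(vertices, source_id):
    """Run supersteps until every vertex has halted and no message is in flight."""
    inbox = {source_id: [0.0]}  # assumption: the source starts at distance 0
    supersteps = 0
    while True:
        outbox = defaultdict(list)

        def send_message(target_id, message):
            outbox[target_id].append(message)

        for vertex in vertices.values():
            messages = inbox.get(vertex.id, [])
            if vertex.active or messages:
                compute(vertex, messages, send_message)
        supersteps += 1
        if not outbox:
            return supersteps
        inbox = outbox  # global barrier: messages are delivered in the next superstep

# Hypothetical toy graph with non-negative edge weights
graph = {
    "a": Vertex("a", {"b": 1.0, "c": 4.0}),
    "b": Vertex("b", {"c": 2.0}),
    "c": Vertex("c", {"d": 1.0}),
    "d": Vertex("d", {}),
}
run_supersteps(graph, source_id="a")
distances = {vertex_id: vertex.value for vertex_id, vertex in graph.items()}
print(distances)
```

Messages produced in one superstep only become visible in the next one, which is why the loop swaps `inbox` and `outbox` at the barrier instead of delivering messages immediately.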
BSP & Giraph
Advantages
• No locks: message-based communication
• No semaphores: global synchronization
• Iteration isolation: massively parallelizable
Designed for iterations
• Stateful (in-memory)
• Only intermediate values (messages) sent
• Hits the disk at input, output, checkpoint
• Can go out-of-core
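
The first two points can be illustrated with PageRank written as a superstep loop: ranks and adjacency lists stay in memory for the whole run, and only rank shares travel as messages. This is a sketch in plain Python, not the Giraph API; `pagerank_bsp` and the toy graph are hypothetical, and, like the MapReduce formulation earlier in the deck, it sums the shares without a damping factor.

```python
from collections import defaultdict

def pagerank_bsp(graph, supersteps):
    """graph maps each vertex to its list of out-neighbours."""
    # State lives in memory across supersteps; it is never re-read or re-sent.
    ranks = {vertex: 1.0 / len(graph) for vertex in graph}
    inbox = {}
    for step in range(supersteps):
        outbox = defaultdict(list)
        for vertex, neighbours in graph.items():
            if step > 0:
                # New rank is the sum of the shares received in the last superstep.
                ranks[vertex] = sum(inbox.get(vertex, []))
            for target in neighbours:
                # Only the intermediate value is sent, not the adjacency list.
                outbox[target].append(ranks[vertex] / len(neighbours))
        inbox = outbox  # barrier between supersteps
    return ranks

# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(toy_graph, supersteps=21)
print({vertex: round(rank, 4) for vertex, rank in ranks.items()})
```

Nothing is written out between supersteps in this loop; a real deployment would touch the disk only at input, output and checkpoints, as the slide states.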
Giraph job lifetime
Architecture
Composable API
Checkpointing
No SPoFs
Giraph scales
ref: https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920
Giraph is fast
• 100x over MR (PageRank)
• jobs run within minutes
• given you have resources ;-)
Serialised objects
Primitive types
• Autoboxing is expensive
• Object overhead (JVM)
• Use primitive types on your own
• Use primitive type-based libs (e.g. fastutil)
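
The slide is about the JVM, but the same effect can be shown with a rough Python analogue: a list of floats stores a pointer per element plus a separate float object, while `array('d')` packs raw 8-byte doubles. This is an illustration under the assumption of CPython, where `sys.getsizeof` is available. It is not a JVM measurement, and the exact byte counts vary by platform and version.

```python
import sys
from array import array

values = [float(i) for i in range(10_000)]

# "Boxed": the list holds references, and every element is its own object.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# "Primitive": one contiguous buffer of C doubles, no per-element object.
packed = array("d", values)
packed_bytes = sys.getsizeof(packed)

print("boxed :", boxed_bytes, "bytes")
print("packed:", packed_bytes, "bytes")
```

The packed container holds the same values in a fraction of the memory. Primitive-based collections aim for the same kind of saving on the JVM.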
Sharded aggregators
Okapi
• Apache Mahout for graphs
• Graph-based recommenders: ALS, SGD, SVD++, etc.
• Graph analytics: Graph partitioning, Community Detection, K-Core, etc.
Thank you
http://giraph.apache.org
<claudio@apache.org> @claudiomartella
