CS 744: PowerGraph
Shivaram Venkataraman
Fall 2020
ADMINISTRIVIA
- Midterm update: tonight!
- Course project reminders
  - Discussion groups: you can join the corresponding Piazza group
  - Group number: email with your IDs
- OH: extra slot, starting next week!
Applications: Machine Learning, SQL, Streaming (e.g., Spark Streaming, Naiad), Graph ← today
Computational Engines
Scalable Storage Systems
Resource Management
Datacenter Architecture
GRAPH DATA
Datasets → Application
1. Social network: "friend" pairs → friend recommendation
2. Web pages and the Internet: links between pages, connected hosts → PageRank
3. Citations: Paper 1 cites Paper 2, etc.
4. ...
5. Software dependencies: e.g., Spark depends on Akka (an actor framework)
GRAPH ANALYTICS
Perform computations on graph-structured data (vs. queries on tabular data)
Examples: PageRank, shortest path, connected components, …
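As a minimal single-machine sketch of one of these examples, connected components can be computed in the same vertex-centric style used for the rest of this lecture: every vertex repeatedly adopts the smallest label among itself and its neighbors until nothing changes. The function name and the edge-list input format are assumptions for illustration.

```python
from collections import defaultdict

def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component.

    Vertex-centric label propagation; edges are treated as undirected.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    label = {v: v for v in nbrs}          # start: every vertex is its own component
    changed = True
    while changed:                        # iterate until a fixed point
        changed = False
        for v in nbrs:
            best = min([label[v]] + [label[n] for n in nbrs[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label

print(connected_components([(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```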
PREGEL: PROGRAMMING MODEL
Each vertex holds some state and, until convergence:
1. Receives messages from its neighbors
2. A combiner coalesces the messages into one combined message
3. Updates its state (computation) using the combined message
4. Sends out messages to its neighbors

Message combiner(Message m1, Message m2):
    return Message(m1.value() + m2.value());

void PregelPageRank(Message msg):
    float total = msg.value();
    vertex.val = 0.15 + 0.85*total;
    foreach(nbr in out_neighbors):
        SendMsg(nbr, vertex.val/num_out_nbrs);
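To make the superstep structure concrete, here is a small Python simulation of the PageRank program above. It assumes the graph is given as a dict from every vertex to its list of out-neighbors, that every vertex starts with value 1.0, and that a fixed number of supersteps stands in for a real convergence test. The summing `inbox` plays the role of the combiner. `pregel_pagerank` is a hypothetical name, not Pregel's API.

```python
from collections import defaultdict

def pregel_pagerank(out_nbrs, supersteps=100):
    """Simulate synchronous Pregel supersteps for the PageRank program.

    out_nbrs maps every vertex to its list of out-neighbors (assumed format);
    every vertex must appear as a key.
    """
    val = {v: 1.0 for v in out_nbrs}            # initial vertex state (assumed)
    for _ in range(supersteps):
        # Message passing + combiner: messages addressed to the same vertex
        # are summed, so each vertex receives one combined message.
        inbox = defaultdict(float)
        for v, nbrs in out_nbrs.items():
            for nbr in nbrs:                     # SendMsg(nbr, val / num_out_nbrs)
                inbox[nbr] += val[v] / len(nbrs)
        # Barrier, then every vertex runs its compute step on the combined message.
        val = {v: 0.15 + 0.85 * inbox[v] for v in out_nbrs}
    return val

ranks = pregel_pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
print(ranks)
```

Vertex `c` has no in-neighbors, so it never receives a message and settles at 0.15.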
NATURAL GRAPHS
Degree distribution is skewed!
a) Most vertices have very small degree
b) Some vertices have very high degree
High-degree vertices lead to skew in:
1) Computation
2) Communication
3) Memory (state)
Hard to partition such graphs
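A quick way to see this skew is to measure how much of the graph touches the highest-degree vertices. The sketch below uses a hypothetical helper `degree_skew` and an extreme star graph as input; it returns each vertex's degree and the fraction of all edge endpoints that sit on the top-k vertices.

```python
from collections import Counter

def degree_skew(edges, top_k=1):
    """Return (degree Counter, fraction of edge endpoints on the top_k vertices)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    top = sum(d for _, d in deg.most_common(top_k))
    return deg, top / sum(deg.values())

# A star: one hub connected to 99 leaves (an extreme "natural graph" shape).
star = [("hub", f"leaf{i}") for i in range(99)]
deg, share = degree_skew(star)
print(deg["hub"], share)  # 99 0.5
```

Half of all edge endpoints belong to a single vertex, so whichever machine owns the hub carries that vertex's computation, communication, and state.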
POWERGRAPH
- Programming model: Gather-Apply-Scatter
- Better graph partitioning with vertex cuts
- Execution: distributed execution (sync, async)
GATHER-APPLY-SCATTER
Gather: Accumulate info from nbrs
Apply: Accumulated value to vertex
Scatter: Update adjacent edges, vertices

// gather_nbrs: IN_NBRS
gather(Du, D(u,v), Dv):
    return Dv.rank / #outNbrs(v)

sum(a, b): return a+b

apply(Du, acc):
    rnew = 0.15 + 0.85 * acc
    Du.delta = (rnew - Du.rank)/#outNbrs(u)
    Du.rank = rnew

// scatter_nbrs: OUT_NBRS
scatter(Du,D(u,v),Dv):
    if(|Du.delta|> ε) Activate(v)
    return delta

- Gather returns a value; sum combines these values into an accumulator (similar to a reduction in Spark)
- Scatter activates a neighboring vertex only when necessary, so only those vertices are processed in the next iteration
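Below is a runnable single-machine sketch of the vertex program above, driven by a simple synchronous engine (gather for all active vertices, then apply, then scatter). Assumptions not in the slide: ranks start at 1.0, ε = 1e-6, edge data is unused and passed as `None`, out-degrees are passed in explicitly, and `max(1, ...)` guards vertices with no out-edges. `VertexData` and `run_sync` are hypothetical names.

```python
from collections import defaultdict
from functools import reduce

EPS = 1e-6  # activation threshold ε (value assumed for illustration)

class VertexData:
    def __init__(self):
        self.rank = 1.0
        self.delta = 0.0

# --- vertex program, mirroring the slide ---
def gather(du, d_edge, dv, out_deg_v):      # gather_nbrs: IN_NBRS
    return dv.rank / out_deg_v

def gas_sum(a, b):                           # commutative and associative
    return a + b

def apply(du, acc, out_deg_u):
    rnew = 0.15 + 0.85 * acc
    du.delta = (rnew - du.rank) / out_deg_u
    du.rank = rnew

def scatter(du, d_edge, dv):                 # scatter_nbrs: OUT_NBRS
    return abs(du.delta) > EPS               # True means Activate(v)

# --- a minimal synchronous engine (hypothetical) ---
def run_sync(edges, max_iters=200):
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    data = {v: VertexData() for v in set(out_nbrs) | set(in_nbrs)}
    active = set(data)
    for _ in range(max_iters):
        if not active:
            break
        # Gather for all active vertices (reads ranks from the previous step)
        accs = {u: reduce(gas_sum,
                          (gather(data[u], None, data[v], len(out_nbrs[v]))
                           for v in in_nbrs[u]),
                          0.0)
                for u in active}
        # Barrier, then Apply
        for u, acc in accs.items():
            apply(data[u], acc, max(1, len(out_nbrs[u])))
        # Barrier, then Scatter: activate out-neighbors only if u changed enough
        active = {v for u in accs for v in out_nbrs[u]
                  if scatter(data[u], None, data[v])}
    return {v: d.rank for v, d in data.items()}

print(run_sync([("a", "b"), ("b", "a"), ("c", "a")]))
```

Once every recent delta is below ε, no vertex is activated and the loop ends early, which is the "no wasted computation" effect of `Activate`.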
EXECUTION MODEL, CACHING
Single machine: an active queue of vertices; each vertex taken from the queue runs gather, apply, and scatter over its vertex and adjacent edge state.
Concurrent updates to neighboring vertices could run into race conditions.
Delta caching
- Cache the accumulator value for each vertex
- Optionally, scatter returns a delta → accumulate deltas into the cached value for future operations
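The sketch below illustrates delta caching with a hypothetical `DeltaCache` class. It assumes the PageRank gather from the previous slide and that a neighbor's delta is posted to the cache when that neighbor scatters. Because the slide's apply divides the rank change by the out-degree, the posted delta equals the change in that neighbor's gather term, so the cached accumulator stays equal to a full gather and the gather can be skipped.

```python
class DeltaCache:
    """Per-vertex cache of the gather accumulator (hypothetical helper)."""

    def __init__(self, gather_fn):
        self.gather_fn = gather_fn   # full gather over in-neighbors
        self.acc = {}

    def get(self, u):
        if u not in self.acc:        # cache miss: run the full gather once
            self.acc[u] = self.gather_fn(u)
        return self.acc[u]

    def post_delta(self, u, delta):
        # A neighbor's scatter returned `delta`: fold it into u's cached
        # accumulator instead of re-gathering over all in-neighbors.
        if u in self.acc:
            self.acc[u] += delta

    def invalidate(self, u):
        # If scatter returns no delta, force a full gather next time.
        self.acc.pop(u, None)

# Toy PageRank state: u has in-neighbors x and y.
rank = {"x": 1.0, "y": 2.0}
out_deg = {"x": 2, "y": 1}
in_nbrs = {"u": ["x", "y"]}

def gather_pagerank(u):
    return sum(rank[v] / out_deg[v] for v in in_nbrs[u])

cache = DeltaCache(gather_pagerank)
print(cache.get("u"))                          # 2.5  (1.0/2 + 2.0/1)

# x's rank changes 1.0 -> 1.5; as in apply, delta = change / out-degree of x.
delta_x = (1.5 - rank["x"]) / out_deg["x"]     # 0.25
rank["x"] = 1.5
cache.post_delta("u", delta_x)
print(cache.get("u"), gather_pagerank("u"))    # 2.75 2.75
```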
SYNC VS ASYNC
Sync Execution
- Gather for all active vertices, followed by Apply, Scatter
- Barrier after each minor-step
- Updates to vertex state (and its mirrors) become visible to neighbors only after the barrier, in the next step
Async Execution
- Execute active vertices from a queue of operations as cores become available
- No barriers! Updates are visible as soon as they are made
- Optionally serializable: ensures neighboring vertices do not update simultaneously
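A single-threaded sketch of the async style, assuming PageRank as the vertex program and ε = 1e-9: vertices are popped from an active queue and updated immediately with no barrier, so every gather reads the latest neighbor values. A real engine runs this loop on many cores at once, which is where race conditions and the serializable option come in. `async_pagerank` is a hypothetical name, and vertex ids are assumed to be sortable.

```python
from collections import defaultdict, deque

def async_pagerank(edges, eps=1e-9):
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    verts = sorted(set(out_nbrs) | set(in_nbrs))
    rank = {v: 1.0 for v in verts}
    queue, queued = deque(verts), set(verts)   # active queue, no duplicates
    while queue:
        u = queue.popleft()
        queued.discard(u)
        # Gather: reads whatever values neighbors hold right now
        acc = sum(rank[v] / len(out_nbrs[v]) for v in in_nbrs[u])
        rnew = 0.15 + 0.85 * acc               # Apply
        delta = rnew - rank[u]
        rank[u] = rnew                         # visible to the next vertex at once
        if abs(delta) > eps:                   # Scatter: activate out-neighbors
            for v in out_nbrs[u]:
                if v not in queued:
                    queue.append(v)
                    queued.add(v)
    return rank

print(async_pagerank([("a", "b"), ("b", "a"), ("c", "a")]))
```

Because there is no barrier, a vertex updated early in a pass immediately feeds the vertices after it, which often lets async execution converge with fewer vertex updates than the synchronous version.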
DISTRIBUTED EXECUTION
- Symmetric system, no coordinator
- Load a graph partition into each machine
- Communicate across machines to spread updates and read state
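Because the GAS sum is commutative and associative, a gather can be split across machines. The sketch below assumes a vertex `u` whose in-edges are spread over two machines: each machine computes a partial accumulator over its local edges, and only those partials are combined at u's master before apply (the new value would then be sent back to the mirrors). The function names and machine layout are hypothetical.

```python
def local_partial_gather(u, local_edges, rank, out_deg):
    """Partial gather on one machine: sum over u's in-edges stored locally."""
    return sum(rank[v] / out_deg[v] for v, w in local_edges if w == u)

def distributed_apply(u, machines, rank, out_deg):
    # Each machine holding a replica of u gathers locally; only one partial
    # accumulator per machine has to cross the network.
    partials = [local_partial_gather(u, edges, rank, out_deg)
                for edges in machines if any(w == u for _, w in edges)]
    acc = sum(partials)                  # master combines with the GAS sum
    return 0.15 + 0.85 * acc             # master applies the update

# u's three in-edges are split across two machines (a vertex cut on u).
machines = [[("x", "u"), ("y", "u")], [("z", "u")]]
rank = {"x": 1.0, "y": 2.0, "z": 4.0}
out_deg = {"x": 1, "y": 1, "z": 1}
print(distributed_apply("u", machines, rank, out_deg))
```

The result matches a single-machine gather over all three edges, since regrouping the sum does not change it.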
GRAPH PARTITIONING
Edge cut
- Every vertex is placed on a machine
- Edges might span across machines
- Natural graphs → lots of edges across machines!
Vertex cut
- Every edge is placed on a machine
- Vertices might span across machines (mirrors)
- Better balance for natural graphs
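One way to compare the two strategies is to count what each one pays. The sketch below (hypothetical helpers) counts cut edges for an edge cut and computes the replication factor, the average number of machines a vertex spans, for a vertex cut. The input is a star with 8 leaves placed on 4 machines round-robin.

```python
from collections import defaultdict

def edge_cut_cost(edges, vertex_machine):
    """Edge cut: vertices are placed on machines; count edges whose two
    endpoints land on different machines (each needs cross-machine traffic)."""
    return sum(1 for u, v in edges if vertex_machine[u] != vertex_machine[v])

def vertex_cut_replication(edges, edge_machine):
    """Vertex cut: each edge is placed on one machine; a vertex has a replica
    (master or mirror) on every machine holding one of its edges. Returns the
    average number of machines each vertex spans (replication factor)."""
    spans = defaultdict(set)
    for (u, v), m in zip(edges, edge_machine):
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)

# A star: one hub with 8 leaves, on 4 machines.
edges = [("hub", f"leaf{i}") for i in range(8)]
vertex_machine = {"hub": 0, **{f"leaf{i}": i % 4 for i in range(8)}}
edge_machine = [i % 4 for i in range(8)]             # 2 edges per machine

print(edge_cut_cost(edges, vertex_machine))          # 6
print(vertex_cut_replication(edges, edge_machine))   # hub on 4 machines, leaves on 1
```

With the edge cut, 6 of 8 edges cross machines and the hub's whole gather runs on machine 0; with the vertex cut, each machine holds 2 edges and only the hub is replicated.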
RANDOM, GREEDY, OBLIVIOUS
Three distributed approaches:
- Random placement: stream through the edges and send each edge to a random machine
- Coordinated greedy placement: send an edge to a machine that already has one of its vertices, using shared, perfect knowledge of which vertices are on which machine
- Oblivious greedy placement: run greedy in parallel on each machine, so you don't have perfect knowledge of vertex → machine placement
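The three placements can be sketched in a single process. This is a simplified sketch, not PowerGraph's exact heuristic: greedy prefers a machine holding both endpoints, then one holding either endpoint, else any machine, with ties going to the least-loaded machine; the load cap (`slack`) is an assumption added here to keep partitions balanced. "Coordinated" is modeled as one greedy pass with shared tables; "oblivious" gives each of `loaders` hypothetical loading machines its own share of the edge stream and private tables. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def random_placement(edges, k, seed=0):
    """Random: stream through edges, send each edge to a random machine."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in edges]

def greedy_placement(edges, k, slack=1.0):
    """Greedy with shared knowledge of vertex -> machines (simplified)."""
    cap = math.ceil(slack * len(edges) / k)   # assumed per-machine load cap
    placed = defaultdict(set)                  # vertex -> machines with a replica
    load = [0] * k                             # edges assigned to each machine
    out = []
    for u, v in edges:
        prefs = (placed[u] & placed[v]) or (placed[u] | placed[v]) or set(range(k))
        ok = [m for m in prefs if load[m] < cap] or range(k)
        m = min(ok, key=lambda i: (load[i], i))
        placed[u].add(m)
        placed[v].add(m)
        load[m] += 1
        out.append(m)
    return out

def oblivious_placement(edges, k, loaders=2):
    """Each loader runs greedy on its own slice with private tables."""
    out = [None] * len(edges)
    for l in range(loaders):
        idx = list(range(l, len(edges), loaders))
        for i, m in zip(idx, greedy_placement([edges[i] for i in idx], k)):
            out[i] = m
    return out

def replication_factor(edges, edge_machine):
    spans = defaultdict(set)
    for (u, v), m in zip(edges, edge_machine):
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)

# Two disjoint triangles on 2 machines.
tri = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("f", "d")]
for name, fn in [("random", random_placement), ("greedy", greedy_placement),
                 ("oblivious", oblivious_placement)]:
    place = fn(tri, 2)
    print(name, place, replication_factor(tri, place))
```

Here greedy keeps each triangle on one machine (replication factor 1.0). The oblivious loaders happen to reach the same placement on this tiny input, but their private load tables each believe machine 0 is less full than it really is, which is the cost of giving up coordination.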
OTHER FEATURES
Async serializable engine
- Prevents adjacent vertices from running simultaneously
- Acquire locks for all adjacent vertices
Fault tolerance
- Checkpoint at the end of each super-step (sync engine)
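Acquiring locks on a vertex and all of its neighbors can deadlock if two updates grab overlapping locks in different orders. A standard fix, shown here as a single-machine sketch with Python threads (not necessarily how PowerGraph implements its distributed locking), is to acquire the locks in one global order. `run_serializable` and the demo graph are hypothetical; the demo records a violation if two adjacent vertices ever run at the same time.

```python
import threading
import time
from contextlib import ExitStack

def run_serializable(u, nbrs, locks, update):
    """Run update(u) while holding locks on u and all of its neighbors.

    Locks are taken in sorted order, so overlapping lock sets cannot deadlock.
    """
    with ExitStack() as stack:
        for v in sorted({u, *nbrs[u]}):
            stack.enter_context(locks[v])
        update(u)

# Demo: a 4-cycle; check that no two adjacent vertices ever run concurrently.
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
locks = {v: threading.Lock() for v in nbrs}
running, violations = set(), []
guard = threading.Lock()

def update(u):
    with guard:
        if any(v in running for v in nbrs[u]):
            violations.append(u)
        running.add(u)
    time.sleep(0.001)          # simulate gather/apply/scatter work
    with guard:
        running.discard(u)

threads = [threading.Thread(target=run_serializable, args=(u, nbrs, locks, update))
           for _ in range(5) for u in nbrs]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(violations)              # []
```

Non-adjacent vertices (0 and 2, or 1 and 3) may still run in parallel, so serializability costs some, but not all, concurrency.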
SUMMARY
- Gather-Apply-Scatter programming model
- Vertex cuts to handle power-law graphs
- Balance computation, minimize communication
DISCUSSION https://forms.gle/rKB5hcJgT4NQsFgq8
Consider the PageRank implementation in Spark vs. synchronous PageRank in PowerGraph. What are some reasons why PowerGraph might be faster?
- Activate ensures no wasteful computation (only vertices whose inputs changed are processed)
- Fine-grained communication in PowerGraph
- Better graph partitioning
- Delta caching avoids recomputing the gather
NEXT STEPS
Next class: GraphX
- Partitioning in Spark: co-partitioning across iterations
- PowerGraph has methods to pick which vertices go in a partition