Janus: Transactional Processing of Navigational and Analytical Graph Queries on Many-core Servers




SLIDE 1

Janus: Transactional Processing of Navigational and Analytical Graph Queries on Many-core Servers

Alkis Simitsis Hideaki Kimura

Kevin Wilkinson

(speaker)

Hewlett Packard Labs

SLIDE 2

Take-away

◉Graph Engine on modern servers for both navigational and analytic queries.
◉Leverages Transaction Processing.

SLIDE 3

Navigational vs Analytic Graph Queries

Navigational

◉High-throughput
◉Accesses few vertices/edges
◉e.g., Pair-wise shortest path, “Can he see my LinkedIn profile?”

Analytic

◉Resource-intensive
◉Accesses a large fraction of the graph
◉e.g., PageRank, Graph clustering

SLIDE 4

Existing Graph Engines

◉Optimized either for navigational queries (e.g., Neo4j) or for analytic queries (e.g., GraphLab)
◉Limited scalability on many-core servers
◉Poor use of large NUMA memory
◉No fast, concurrent updates

SLIDE 5

Janus

◉Runs both types of queries as well as concurrent updates
◉Exploits emerging server hardware: many cores, large DRAM/NVM.
◉Built on a Transaction Processing engine (FOEDUS)

Reason 1: Concurrent/serializable updates. Obvious.
Reason 2: Scalability. To parallelize a query.

SLIDE 6

Parallelizing a Graph Query as Transactions

◉Serializability is a must; otherwise the query may loop forever.
◉Scalability is a must; many cores, large NUMA.

Single-Source Shortest-Path (SSSP)

“Distributed GraphLab” [Low et al, VLDB’12]

Parallel workers issue millions of concurrent transactions.
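To make the idea of "a graph query as many small transactions" concrete, here is a minimal Python sketch in which every edge relaxation of SSSP runs as its own optimistic transaction: read two distances with their versions, validate them at commit, retry on conflict. It is not the Janus or FOEDUS implementation. `VersionedStore`, `relax`, and `parallel_sssp` are hypothetical names, the round-based driver is an assumption made to keep termination simple, and the edge C→A:1 is made up for the example.

```python
import threading

INF = float("inf")

class VersionedStore:
    """Toy key-value store with per-key versions (illustrative assumption)."""
    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._lock = threading.Lock()   # serializes validate-and-write

    def read(self, key, default):
        return self._data.get(key, (0, default))

    def commit(self, read_set, write_set):
        """Apply write_set only if every key in read_set is still at the version read."""
        with self._lock:
            for key, version in read_set.items():
                if self._data.get(key, (0, None))[0] != version:
                    return False        # conflict: caller must retry
            for key, value in write_set.items():
                version = self._data.get(key, (0, None))[0]
                self._data[key] = (version + 1, value)
            return True

    def snapshot(self):
        with self._lock:
            return {key: value for key, (_, value) in self._data.items()}

def relax(store, u, v, weight):
    """One edge relaxation as a transaction. Returns True if dist[v] improved."""
    while True:
        u_ver, u_dist = store.read(u, INF)
        v_ver, v_dist = store.read(v, INF)
        candidate = u_dist + weight
        writes = {v: candidate} if candidate < v_dist else {}
        if store.commit({u: u_ver, v: v_ver}, writes):
            return bool(writes)

def parallel_sssp(edges, source, num_workers=4):
    """Workers relax disjoint edge slices concurrently until a round changes nothing."""
    store = VersionedStore()
    store.commit({}, {source: 0})
    slices = [edges[i::num_workers] for i in range(num_workers)]
    while True:
        changed = [False] * num_workers

        def work(i):
            for u, v, w in slices[i]:
                if relax(store, u, v, w):
                    changed[i] = True

        threads = [threading.Thread(target=work, args=(i,)) for i in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not any(changed):
            return store.snapshot()

# S->A:5, S->C:3, A->T:6 follow the deck's example; C->A:1 is hypothetical.
edges = [("S", "A", 5), ("S", "C", 3), ("A", "T", 6), ("C", "A", 1)]
distances = parallel_sssp(edges, "S")
```

Because each relaxation validates the versions it read before writing, two workers cannot overwrite a shorter distance with a longer one, which is the kind of anomaly serializability rules out here. The sketch assumes non-negative edge weights.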

SLIDE 7

Janus Architecture

[Architecture diagram. Recoverable labels: Insert/Delete, Ingestion Xcts, Reads, Writes]

SLIDE 8

Partitioning Graph and Workers

◉NUMA-aware partitioning for the permanent graph, intermediate data, and workers.
◉Locality matters. Co-locate data with workers.
◉Needs a database that supports flexible partitioning and data-worker co-location.
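As a rough illustration of data-worker co-location, the Python sketch below groups adjacency lists by the partition of their source vertex, so that a worker assigned to partition `p` would only scan vertices stored in `p`. The modulo rule in `partition_of` is a stand-in assumption; the deck does not say how Janus assigns vertices to NUMA nodes, and Python itself cannot pin memory to a socket, so this only models the logical layout. All function names are hypothetical.

```python
from collections import defaultdict

def partition_of(vertex_id, num_partitions):
    """Hypothetical placement rule: vertex id modulo the number of partitions."""
    return vertex_id % num_partitions

def partition_graph(edges, num_partitions):
    """Store each vertex's out-edges in the partition that owns the vertex."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for src, dst, weight in edges:
        partitions[partition_of(src, num_partitions)][src].append((dst, weight))
    return partitions

def local_fraction(partitions):
    """Fraction of edges whose destination lives in the same partition as the source."""
    num_partitions = len(partitions)
    total = local = 0
    for pid, adjacency in enumerate(partitions):
        for out_edges in adjacency.values():
            for dst, _ in out_edges:
                total += 1
                if partition_of(dst, num_partitions) == pid:
                    local += 1
    return local / total if total else 1.0

edges = [(0, 2, 1), (2, 4, 1), (1, 3, 1), (3, 0, 1)]
parts = partition_graph(edges, num_partitions=2)
```

`local_fraction(parts)` gives a crude locality measure: the higher it is, the fewer relaxations have to touch data owned by another partition.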

SLIDE 9

Pair-wise shortest-path Impl. in Janus

◉Good-old Dijkstra.
◉A NUMA-aware worker.
◉Serializable Reads on Graph.

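[Figure: a Navigational Worker and the data it touches]

Intermediate Data on worker-local memory
  Distance hashtable (Node: Dist.): A: 5, B: 13, C: 3
  Relaxation min-heap: A:5 … …

Graph Data on Global Memory, accessed with Serializable Reads
  From S, Edges: A:5, C:3..
  From A, Edges: T:6
  … …

Example graph labels: nodes S, A, T, B, C; edge weights 5, 3, 10, 7, 6

Navigational Worker (Snapshot Reads from NVM as of the same epoch), “FOEDUS” [SIGMOD’15]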
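The two worker-local structures named on this slide, a distance hashtable and a relaxation min-heap, are exactly what a textbook Dijkstra search needs. The Python sketch below shows that textbook version for a single source/target pair. It is not Janus code: reads here are plain dictionary lookups rather than serializable reads against a transactional store, and the graph contains only the three edges that are legible on the slide (the elided entries are left out).

```python
import heapq

def pairwise_shortest_path(graph, source, target):
    """Dijkstra from source, stopping as soon as target is settled.

    graph maps a node to a list of (neighbor, weight) pairs with weight >= 0.
    Returns the shortest distance, or None if target is unreachable.
    """
    dist = {source: 0}        # distance hashtable
    heap = [(0, source)]      # relaxation min-heap of (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue          # stale heap entry, a shorter path was found already
        if node == target:
            return d
        for neighbor, weight in graph.get(node, ()):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None

# Edges legible on the slide: S -> A:5, S -> C:3, A -> T:6.
graph = {"S": [("A", 5), ("C", 3)], "A": [("T", 6)]}
print(pairwise_shortest_path(graph, "S", "T"))  # prints 11
```

Stopping at the target is what keeps this a navigational query: on a large graph the search typically touches only a small neighborhood around the source.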

SLIDE 10

SSSP Impl. in Janus

◉Distributed Bellman-Ford
◉Analytic workers cooperatively maintain global memory.
◉Processes billions of highly contended Xcts on Intermediate Data

[Figure: Analytic Workers sharing Intermediate Data on global memory]

Intermediate Data on global memory
  Distance hashtable (Node: Dist.): A: 5, B: 13, C: 3
  Activation bitmap

Analytic Workers, “Mostly Optimistic Concurrency Control” [VLDB’17]

SLIDE 11

Experiments

◉Shortest-Path
  • Navigational: Pair-wise
  • Analytic: SSSP
◉Compared with Neo4j (navigational) and Distributed GraphLab
◉H/W: HP DragonHawk, 240 cores and 12 TB DRAM (not yet NVRAM) on 16 sockets

          # Nodes    # Edges
SMALL     2 M        37 M
MEDIUM    97 M       1600 M
LARGE     403 M      6500 M

SLIDE 12

Loading and Navigational Throughput

[Chart: Navigational Query Throughput [TPS], log scale. Series: Neo4j, Janus]

[Chart: Data Loading Time [msec], log scale, for the small, medium, and large graphs. Series: Janus, GraphLab, Neo4j. Two entries are marked "Did Not Finish".]

SLIDE 13

Analytic Query Runtime

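[Chart: Analytics-Only Workload, runtime in msec, log scale, for the small, medium, and large graphs. Series: Janus, GraphLab. One entry is marked "Did Not Finish".]

[Chart: Mixed Workload, for the small, medium, and large graphs. Series: analytics-only, mixed]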

SLIDE 14

Conclusions

◉Janus: graph engine on future servers for navigational/analytic queries.
◉Transactions are the key, bleeding-edge way to massively parallelize big-data analytics.

SLIDE 15

Open Questions

◉Not a panacea! e.g., Topic Modeling. Where is it a good fit?
◉Autonomous Partition/Query Optimization, e.g., when to activate/propagate nodes and in what order
◉Fast resume/failover with NVM