Janus: Transactional Processing of Navigational and Analytical Graph Queries on Many-core Servers
Kevin Wilkinson, Hideaki Kimura, Alkis Simitsis (speaker)
Hewlett Packard Labs
1/15
Take-away
◉ Graph Engine on modern servers for both navigational and analytic queries.
◉ Leverages Transaction Processing.
2/15
Navigational vs Analytic Graph Queries
◉ Analytic: resource-intensive; accesses a large fraction of the graph; e.g., PageRank, graph clustering.
◉ Navigational: high-throughput; accesses few vertices/edges; e.g., pair-wise shortest path, “Can he see my LinkedIn prof.”
3/15
Existing Graph Engines
◉ Optimized either for navigational queries (e.g., Neo4j) or for analytic queries (e.g., GraphLab)
◉ Limited scalability on many-core servers
◉ Poor use of large NUMA memory
◉ No fast, concurrent updates
4/15
Janus
◉ Runs both types of queries as well as concurrent updates
◉ Exploits emerging server hardware: many cores, large DRAM/NVM.
◉ Built on a Transaction Processing engine (FOEDUS)
Reason 1: Concurrent/serializable updates. Obvious.
Reason 2: Scalability. To parallelize a query.
5/15
Parallelizing a Graph Query as Transactions
Single-Source Shortest-Path (SSSP)
“Distributed GraphLab” [Low et al., VLDB’12]
Parallel workers issue millions of concurrent transactions.
◉ Serializability is a must; otherwise the query can loop forever.
◉ Scalability is a must: many cores, large NUMA.
6/15
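To make the idea of running SSSP as many small transactions concrete, here is a minimal Python sketch. It is not the Janus or FOEDUS implementation: `DistanceTable`, `relax`, and `parallel_sssp` are hypothetical names, the version check under a single commit lock is only a toy stand-in for a real optimistic commit protocol, and the example graph is made up. Each edge relaxation is one tiny transaction that reads a distance, validates that it has not changed, and retries on conflict.

```python
import queue
import threading

INF = float("inf")


class DistanceTable:
    """Toy transactional store (hypothetical): one (distance, version) record per vertex."""

    def __init__(self, vertices):
        self._records = {v: (INF, 0) for v in vertices}
        self._commit_lock = threading.Lock()  # stand-in for a real commit protocol

    def read(self, v):
        return self._records[v]

    def commit_if_unchanged(self, v, seen_version, new_value):
        # Validate-then-write: the write succeeds only if nobody committed since our read.
        with self._commit_lock:
            _, version = self._records[v]
            if version != seen_version:
                return False
            self._records[v] = (new_value, version + 1)
            return True


def relax(table, v, candidate):
    """One transaction: lower dist[v] to candidate, retrying on conflict."""
    while True:
        current, version = table.read(v)
        if candidate >= current:
            return False  # nothing to improve
        if table.commit_if_unchanged(v, version, candidate):
            return True


def parallel_sssp(graph, source, num_workers=4):
    """graph maps every vertex to a list of (neighbor, non-negative weight) pairs."""
    table = DistanceTable(graph)
    table.commit_if_unchanged(source, 0, 0)
    work = queue.Queue()
    work.put(source)

    def worker():
        while True:
            u = work.get()
            if u is None:  # shutdown sentinel
                work.task_done()
                return
            dist_u, _ = table.read(u)
            for v, w in graph[u]:
                if relax(table, v, dist_u + w):
                    work.put(v)  # v improved, so its out-edges must be revisited
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.join()  # returns once every queued vertex has been processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return {v: table.read(v)[0] for v in graph}
```

A worker may read a distance for `u` that another worker lowers a moment later; that is harmless in this sketch because whoever lowers it also re-enqueues `u`. Without the validation step, two workers could overwrite each other's updates, which is the kind of anomaly the slide's serializability requirement rules out.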
Janus Architecture
[Figure: architecture diagram. Labels: Reads, Writes, Insert/Delete, Ingestion, Xcts.]
7/15
Partitioning Graph and Workers
◉ NUMA-aware partitioning for the permanent graph, intermediate data, and workers.
◉ Locality matters. Co-locate data with workers.
◉ Needs a database that supports flexible partitioning and data-worker co-location.
8/15
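The co-location idea can be illustrated with a small Python sketch. Python cannot pin memory to NUMA nodes, so this only shows the bookkeeping: every vertex is owned by one partition, its out-edges are stored with it, and a worker would be bound to each partition. The modulo placement rule and the names `partition_of`, `partition_graph`, and `local_edge_fraction` are assumptions for illustration, not how Janus partitions data.

```python
from collections import defaultdict


def partition_of(vertex_id, num_partitions):
    # Hypothetical placement rule: integer vertex ids are spread round-robin.
    return vertex_id % num_partitions


def partition_graph(edges, num_partitions):
    """Store each (src, dst, weight) edge in the partition that owns src."""
    parts = [defaultdict(list) for _ in range(num_partitions)]
    for src, dst, weight in edges:
        parts[partition_of(src, num_partitions)][src].append((dst, weight))
    return parts


def local_edge_fraction(edges, num_partitions):
    """Fraction of edges whose endpoints live in the same partition."""
    if not edges:
        return 1.0
    local = sum(
        1
        for src, dst, _ in edges
        if partition_of(src, num_partitions) == partition_of(dst, num_partitions)
    )
    return local / len(edges)
```

A measure such as `local_edge_fraction` gives a rough sense of how often a worker can follow an edge without leaving its own partition; a better placement rule than modulo would aim to raise it.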
Pair-wise shortest-path Impl. in Janus
◉ Good-old Dijkstra.
◉ A NUMA-aware worker.
◉ Serializable Reads on Graph.
[Figure: Graph Data in Global Memory (example graph with vertices S, A, B, C, T and weighted edges; From/Edges table: S → A:5, C:3..; A → T:6). Intermediate Data in worker-local memory: Distance hashtable (Node/Dist.: A 5, B 13, C 3) and Relaxation min-heap (e.g., A:5). The Navigational Worker issues Serializable Reads (Snapshot Reads from NVM as of same epoch).]
“FOEDUS”, [SIGMOD’15]
9/15
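The slide's two worker-local structures, a distance hashtable and a relaxation min-heap, map directly onto textbook Dijkstra. The sketch below is plain single-threaded Python with no transactions or NUMA placement; the function name and the adjacency-dict graph format are assumptions for illustration, and it assumes non-negative edge weights.

```python
import heapq


def pairwise_shortest_path(graph, source, target):
    """Return the shortest distance from source to target, or None if unreachable.

    graph maps a vertex to a list of (neighbor, non-negative weight) pairs.
    """
    dist = {source: 0}    # distance hashtable
    heap = [(0, source)]  # relaxation min-heap of (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        if u == target:
            return d  # pair-wise query: stop as soon as the target is settled
        for v, w in graph.get(u, ()):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return None
```

Stopping at the target is what keeps a navigational query cheap: only the vertices closer than the target are ever touched.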
SSSP Impl. in Janus
◉ Distributed Bellman-Ford
◉ Analytic workers cooperatively maintain global memory.
◉ Processes billions of highly contended Xcts on Intermediate Data: “Mostly Optimistic Concurrency Control” [VLDB’17]
[Figure: Intermediate Data on global memory, shared by the Analytic Workers: Distance hashtable (Node/Dist.: A 5, B 13, C 3) and Activation bitmap.]
10/15
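A sequential sketch of Bellman-Ford driven by an activation bitmap shows how the two shared structures on this slide interact. It leaves out the parts that make Janus interesting (parallel analytic workers and contended transactions on the shared table); the function name and the adjacency-list format are assumptions, and non-negative weights are assumed so the loop terminates.

```python
INF = float("inf")


def sssp_bellman_ford(adj, source):
    """adj[u] is a list of (v, weight) pairs; vertices are the integers 0..len(adj)-1."""
    n = len(adj)
    dist = [INF] * n      # distance table, one slot per vertex
    dist[source] = 0
    active = [False] * n  # activation bitmap: vertices whose distance just changed
    active[source] = True
    while any(active):
        next_active = [False] * n
        for u in range(n):
            if not active[u]:
                continue
            for v, w in adj[u]:
                candidate = dist[u] + w
                if candidate < dist[v]:
                    dist[v] = candidate    # relax the edge
                    next_active[v] = True  # v must propagate in the next round
        active = next_active
    return dist
```

In a parallel version, each relaxation would be a transaction on the shared distance table, and many workers would race to lower the same hot entries.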
Experiments
◉ Shortest-Path
Navigational: Pair-wise
Analytic: SSSP
Dataset   # Nodes   # Edges
SMALL     2 M       37 M
MEDIUM    97 M      1600 M
LARGE     403 M     6500 M
◉ Compared with Neo4J (navigational) and Distributed GraphLab
◉ H/W: HP DragonHawk, 240-Cores and 12 TB DRAM (not yet NVRAM) on 16-Sockets
11/15
Loading and Navigational Throughput
[Figure: two panels, log-scale axes. Data Loading Time (msec) and Navigational Query Throughput [TPS]. Legend: Janus, GraphLab, Neo4j. Dataset sizes: small, medium, large. Two entries are marked “Did Not Finish”.]
12/15
Analytic Query Runtime
[Figure: two panels, log-scale runtime in msec over the small, medium, and large datasets. Panels: Mixed Workload and Analytics-Only Workload. Legends: analytics-only vs. mixed; Janus vs. GraphLab. One entry is marked “Did Not Finish”.]
13/15
Conclusions
◉ Janus: graph engine on future servers for navigational/analytic queries.
◉ Transactions are the key: a bleeding-edge way to massively parallelize big-data analytics.
14/15
Open Questions
◉ Not a panacea! e.g., Topic Modeling. Where is it a good fit?
◉ Autonomous Partition/Query Optimization, e.g., when to activate/propagate nodes, and in what order
◉ Fast resume/failover with NVM
15/15