
Brie: A Specialized Trie for Concurrent Datalog



  1. Brie: A Specialized Trie for Concurrent Datalog. Herbert Jordan (1), Pavle Subotić (3), David Zhao (2), and Bernhard Scholz (2). PMAM 2019, 17 February 2019, Washington, DC. 1) University of Innsbruck 2) University of Sydney 3) Amazon

  2. Datalog (by Example): a graph and its edge relation (columns: from, to). Are there cycles?

  3. Datalog (by Example): a graph and its edge relation (columns: from, to). Is the graph connected?

  4. Datalog (by Example): a graph and its edge relation (columns: from, to). Which nodes are connected?

  5. Datalog (by Example): the graph, its edge relation (columns: from, to), and a Datalog query:
     path(X,Y) :- edge(X,Y).
     path(X,Z) :- path(X,Y), edge(Y,Z).

  6. Datalog
     › Benefits:
       – a concise formalism for powerful data analysis
       – lately, major performance improvements and tool support
     › Applications:
       – database queries
       – program analysis
       – security vulnerability analysis
       – network analysis
     › Scale: 100s of relations and rules, billions of tuples, all in-memory

  7. Query Processing
     › relations → sets of integer tuples
     › sequence of rules → relational algebra operations on sets

  8. Example: path(X,Z) :- path(X,Y), edge(Y,Z).
     delta ← path
     while (delta ≠ ∅) {
       new ← π(delta ⋈ edge) ∖ path    (computationally expensive and dominating part)
       path ← path ∪ new
       delta ← new
     }
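The loop on this slide can be sketched in C++ as follows. This is a minimal illustration, not Soufflé's implementation: it assumes binary relations stored in `std::set`, and the names `Tuple`, `Relation`, `transitive_closure`, and `fresh` (standing in for `new`, which is a reserved word in C++) are hypothetical.

```cpp
#include <limits>
#include <set>
#include <utility>

using Tuple = std::pair<int, int>;
using Relation = std::set<Tuple>;

// Computes path from edge by iterating on a delta relation until no
// new tuples are found.
Relation transitive_closure(const Relation& edge) {
    Relation path = edge;   // path(X,Y) :- edge(X,Y).
    Relation delta = path;  // delta <- path
    while (!delta.empty()) {
        Relation fresh;
        for (const Tuple& p : delta) {
            // Range query: all edge tuples whose first component is p.second.
            auto it = edge.lower_bound(
                Tuple{p.second, std::numeric_limits<int>::min()});
            for (; it != edge.end() && it->first == p.second; ++it) {
                Tuple candidate{p.first, it->second};  // projection onto (X,Z)
                if (path.count(candidate) == 0) {      // set difference with path
                    fresh.insert(candidate);
                }
            }
        }
        path.insert(fresh.begin(), fresh.end());  // path <- path ∪ new
        delta = std::move(fresh);                 // delta <- new
    }
    return path;
}
```

The inner loops exercise the operations the following slides care about: range queries on `edge`, membership tests on `path`, insertions, and an emptiness check on `delta`.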

  9. Needed
     › efficient data structure for relations
       – maintain set of n-dimensional tuples
       – efficient support for insertion, scans, range queries, membership tests, and emptiness checks (well supported by B-trees)
       – efficient synchronization of concurrent inserts (challenging)
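To make the required operation set concrete, here is a small sequential sketch built on `std::set`. It is only an illustration of the interface, assuming 2-dimensional tuples; the class name `OrderedRelation` and its member names are hypothetical and it offers no synchronization for concurrent inserts.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <set>
#include <vector>

using Tuple = std::array<int, 2>;  // compared lexicographically

class OrderedRelation {
    std::set<Tuple> data;

public:
    // Insertion: returns true if the tuple was not present before.
    bool insert(const Tuple& t) { return data.insert(t).second; }

    // Membership test.
    bool contains(const Tuple& t) const { return data.count(t) != 0; }

    // Emptiness check.
    bool empty() const { return data.empty(); }

    std::size_t size() const { return data.size(); }

    // Scan: iterate over all tuples in sorted order.
    std::set<Tuple>::const_iterator begin() const { return data.begin(); }
    std::set<Tuple>::const_iterator end() const { return data.end(); }

    // Range query: all tuples whose first component equals x.
    std::vector<Tuple> with_first(int x) const {
        std::vector<Tuple> out;
        auto it = data.lower_bound(Tuple{x, std::numeric_limits<int>::min()});
        for (; it != data.end() && (*it)[0] == x; ++it) {
            out.push_back(*it);
        }
        return out;
    }
};
```

A caller would use `with_first(y)` to find all tuples matching a bound first column, which is the access pattern a join such as `path(X,Y), edge(Y,Z)` needs.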

  10. B-tree Issues
      Figure: example B-tree with keys (5,3) (8,2) (1,1) (1,2) (3,2) (4,7) (6,9) (7,4) (8,7) (9,2) (9,4)
      › Concurrent inserts:
        – require sophisticated locking scheme
        – while holding locks, costly operations are performed
          › binary search operations, and inserts in sorted arrays

  11. Brie

  12. Brie – Inner Node

  13. Brie – Leaf Node

  14. Synchronizing Inserts
      › Insertion
        1. navigate down the tree
           › insert sub-trees on demand using CAS
        2. if the inner node tree needs to grow
           › introduce new root node using CAS
        3. add 1-bit to leaf level mask
           › using atomic bitwise or
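Steps 1 and 3 of this insertion scheme can be sketched with `std::atomic`. This is a simplified, hypothetical two-level structure, not the code in Brie.h: it assumes a fixed root with 64 slots and 64-bit leaf masks (values 0 to 4095), so step 2 (growing the tree with a new root) is deliberately left out. The names `Leaf` and `TinyBitTrie` are made up for this illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

struct Leaf {
    std::atomic<std::uint64_t> mask{0};  // one bit per value in a 64-value block
};

class TinyBitTrie {
    static constexpr std::size_t kFanout = 64;
    std::array<std::atomic<Leaf*>, kFanout> children{};  // all null initially

public:
    TinyBitTrie() = default;
    TinyBitTrie(const TinyBitTrie&) = delete;
    TinyBitTrie& operator=(const TinyBitTrie&) = delete;
    ~TinyBitTrie() {
        for (auto& c : children) delete c.load();
    }

    // Returns true if v was newly inserted. Safe to call from several threads.
    bool insert(std::uint32_t v) {
        if (v >= kFanout * 64) throw std::out_of_range("value too large");
        // Step 1: navigate down, creating the leaf on demand with CAS.
        std::atomic<Leaf*>& slot = children[v >> 6];
        Leaf* leaf = slot.load(std::memory_order_acquire);
        if (leaf == nullptr) {
            Leaf* fresh = new Leaf();
            if (slot.compare_exchange_strong(leaf, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
                leaf = fresh;
            } else {
                delete fresh;  // another thread won; leaf now points to its node
            }
        }
        // Step 3: set the bit in the leaf mask with an atomic bitwise or.
        const std::uint64_t bit = std::uint64_t{1} << (v & 63);
        const std::uint64_t old = leaf->mask.fetch_or(bit, std::memory_order_relaxed);
        return (old & bit) == 0;
    }

    bool contains(std::uint32_t v) const {
        if (v >= kFanout * 64) return false;
        const Leaf* leaf = children[v >> 6].load(std::memory_order_acquire);
        if (leaf == nullptr) return false;
        const std::uint64_t bit = std::uint64_t{1} << (v & 63);
        return (leaf->mask.load(std::memory_order_relaxed) & bit) != 0;
    }
};
```

No lock is held at any point: a thread that loses the CAS race simply discards its freshly allocated leaf and continues with the winner's node.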

  15. Data Density
      Performance is density dependent.
      Density: ratio of included points vs. spanned interval
      Figure: a high-density and a low-density example; tuples shown: (7,2) (3,1) (3,3) (3,4) (0,3) (3,1)
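As a worked illustration of the density definition, the sketch below computes it for a one-dimensional set of distinct integers, taking the spanned interval to be `max - min + 1` values. Both the one-dimensional reading and the function name `density` are assumptions made for this example; the slide does not spell out how the ratio is computed for multi-dimensional tuples.

```cpp
#include <algorithm>
#include <vector>

// Density of a set of distinct integers: number of included points divided by
// the number of integers in the interval [min, max]. Returns 0 for an empty set.
double density(const std::vector<long long>& values) {
    if (values.empty()) return 0.0;
    auto mm = std::minmax_element(values.begin(), values.end());
    const long long span = *mm.second - *mm.first + 1;
    return static_cast<double>(values.size()) / static_cast<double>(span);
}
```

Under this reading, the values 1, 3, 4 span four integers and have density 0.75, while 0, 3, 7 span eight integers and have density 0.375.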

  16. Memory Usage (plot): memory [GB] vs. elements in million (0 to 100); series: btree, brie 100%, brie 10%, brie 5%, brie 2%, brie 1%, brie 0.5%, brie 0.1%

  17. Sequential Performance (plots): million insertions/s vs. total elements inserted (1000², 2000², 5000², 10000²); panels: ordered insertion, random order insertion; series: std::set, std::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%

  18. Sequential Performance (2) (plots): membership test (random order), million queries/s vs. elements in set and number of queries; full range scan, million entries/s vs. elements in set (1000², 2000², 5000², 10000²); series: std::set, std::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%

  19. Parallel Performance (plots, logarithmic axis): million insertions/s vs. number of threads (1 to 31); panels: ordered insertion, random order insertion; series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%; system: 4x8 core Intel Xeon E5-4650

  20. Parallel Performance (plots, linear axis): million insertions/s vs. number of threads (1 to 31); panels: ordered insertion, random order insertion; series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%; annotations: up to 11x faster than B-trees, up to 15x faster than B-trees; system: 4x8 core Intel Xeon E5-4650

  21. Datalog Query Processing (plots): total query time [s] and memory usage [GB] vs. number of threads (1 to 31); series: btree, brie, mixed; annotations: ~4x faster, -50% memory; benchmark: context sensitive var-points-to analysis

  22. Conclusion
      › Developed concurrent set for Datalog relations:
        – trie-derived structure + blocked nodes
          › enables fast relational operations
        – low-overhead synchronization
          › atomic-operation-based synchronization sufficient
      › Results:
        – up to 5-17x faster for sequential insert and query operations
        – up to 15x faster for parallel insertion operations
        – up to 4x faster and 50% less memory for real-world query processing
      › Future work:
        – investigate other data structures for specialized use cases

  23. Thank you! Visit us at https://souffle-lang.github.io; sources: https://github.com/souffle-lang/souffle/blob/master/src/Brie.h

  24. Parallel Performance (plots, logarithmic axis): million insertions/s vs. number of threads (1 to 31); panels: ordered insertion, random order insertion; series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%, reduction btree; system: 4x8 core Intel Xeon E5-4650

  25. Parallel Performance (plots, linear axis): million insertions/s vs. number of threads (1 to 31); panels: ordered insertion, random order insertion; series: tbb::hash_set, concurrent btree, brie 0.1%, brie 1%, brie 100%, reduction btree; annotations: up to 11x faster than B-trees, up to 15x faster than B-trees; system: 4x8 core Intel Xeon E5-4650

  26. Example: path(X,Z) :- path(X,Y), edge(Y,Z).
      delta ← path
      while (delta ≠ ∅) {
        new ← π(delta ⋈ edge) ∖ path
        path ← path ∪ new
        delta ← new
      }
