Triples Compression and Indexing
Antonio Fariña, Javier D. Fernández and Miguel A. Martinez-Prieto
3rd KEYSTONE Training School: Keyword search in Big Linked Data
23rd August 2017
Agenda
  RDF management overview
  K²-tree data structure
  K²-triples
  Compressed Suffix Array (CSA)
  RDF-CSA
RDF management overview
"Recall that we can map the strings of an RDF dataset into a dictionary and then handle a set of RDF triples as a set of ID-based triples."
BIG (LINKED) SEMANTIC DATA COMPRESSION
RDF management overview: dictionary encoding
[Figure: the example RDF graph, its dictionary, and the resulting ID-based triples]

Dictionary
  SO (terms used both as subject and as object): 1 London, 2 SPIRE
  S  (subject only): 3 A.Gionis, 4 M.Lalmas, 5 R.Raman
  O  (object only):  3 Finland, 4 inv-speaker, 5 UK
  P  (predicates):   1 attends, 2 capital of, 3 held on, 4 lives in, 5 position, 6 works in

Original triples                     ID-based triples
  (SPIRE, held on, London)             (2,3,1)
  (London, capital of, UK)             (1,2,5)
  (A.Gionis, attends, SPIRE)           (3,1,2)
  (R.Raman, attends, SPIRE)            (5,1,2)
  (M.Lalmas, attends, SPIRE)           (4,1,2)
  (M.Lalmas, lives in, UK)             (4,4,5)
  (M.Lalmas, works in, London)         (4,6,1)
  (A.Gionis, lives in, Finland)        (3,4,3)
  (R.Raman, lives in, UK)              (5,4,5)
  (R.Raman, position, inv-speaker)     (5,5,4)
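The dictionary encoding on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `build_dictionary` and `encode` are hypothetical, and the case-insensitive alphabetical ordering of terms is an assumption chosen because it yields the same IDs as the slide.

```python
# Example triples from the slide.
TRIPLES = [
    ("SPIRE", "held on", "London"),
    ("London", "capital of", "UK"),
    ("A.Gionis", "attends", "SPIRE"),
    ("R.Raman", "attends", "SPIRE"),
    ("M.Lalmas", "attends", "SPIRE"),
    ("M.Lalmas", "lives in", "UK"),
    ("M.Lalmas", "works in", "London"),
    ("A.Gionis", "lives in", "Finland"),
    ("R.Raman", "lives in", "UK"),
    ("R.Raman", "position", "inv-speaker"),
]


def build_dictionary(triples):
    """Return (subject_ids, predicate_ids, object_ids) with 1-based IDs."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}

    def order(terms):
        # Assumption: case-insensitive alphabetical order, which matches the slide.
        return sorted(terms, key=str.lower)

    # Terms used as both subject and object (the SO section) get the lowest IDs,
    # so they share the same ID in the subject and object ranges.
    shared = order(subjects & objects)
    subject_ids = {t: i for i, t in enumerate(shared + order(subjects - objects), start=1)}
    object_ids = {t: i for i, t in enumerate(shared + order(objects - subjects), start=1)}
    predicate_ids = {t: i for i, t in enumerate(order({p for _, p, _ in triples}), start=1)}
    return subject_ids, predicate_ids, object_ids


def encode(triples):
    """Replace every term of every triple by its dictionary ID."""
    s_ids, p_ids, o_ids = build_dictionary(triples)
    return [(s_ids[s], p_ids[p], o_ids[o]) for s, p, o in triples]


ID_TRIPLES = encode(TRIPLES)
```

Because shared terms come first in both ranges, a subject ID and an object ID refer to the same term exactly when they are equal and fall inside the SO range.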
K²-tree data structure
"A k²-tree permits a compact representation of an adjacency matrix."
K²-tree: motivation
  Structure for representing an adjacency matrix
  Originally designed for web graphs
  Simple directed graph
[Figure: a simple directed graph on nodes 1 to 11 and its adjacency matrix]
K²-tree: construction process
Example with k = 2. The matrix is padded with zeros to 16×16 and split recursively into k² submatrices; each submatrix is represented by a 1 if it contains at least one 1 and by a 0 otherwise. Only submatrices marked 1 are split further.

Adjacency matrix (16×16):
  row 0:       0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  row 1:       0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
  rows 2-6:    all zeros
  row 7:       0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
  row 8:       0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
  row 9:       0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0
  row 10:      0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
  rows 11-15:  all zeros

Tree, level by level:
  level 1:          1011
  level 2:          1101 0100 1000
  level 3:          1100 1000 0001 0101 1110
  level 4 (leaves): 0100 0011 0010 0010 1010 1000 0110 0010 0100

Bitmaps:
  T = 101111010100100011001000000101011110   (levels 1 to 3)
  L = 010000110010001010101000011000100100   (leaves)
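The construction can be sketched as a breadth-first subdivision of the matrix. The code below is an illustrative sketch, not the authors' implementation: `build_k2tree` is a hypothetical name, the matrix is given as a list of (row, column) cells holding a 1, and the cell list is the one read off the example matrix.

```python
from collections import deque

# Cells (row, column) holding a 1 in the 16x16 example matrix, numbered from 0.
CELLS = [(0, 1), (1, 2), (1, 3), (1, 4), (7, 6), (8, 6), (8, 9),
         (9, 6), (9, 8), (9, 10), (10, 6), (10, 9)]


def build_k2tree(cells, n, k=2):
    """Return the bitmaps (T, L) as strings for an n x n matrix; n must be a power of k."""
    cells = set(cells)
    t_bits, l_bits = [], []
    queue = deque([(0, 0, n)])  # (first row, first column, side) of each submatrix to split
    while queue:
        row0, col0, side = queue.popleft()
        sub = side // k
        for i in range(k):          # children are emitted row by row, left to right
            for j in range(k):
                r, c = row0 + i * sub, col0 + j * sub
                nonempty = any(r <= x < r + sub and c <= y < c + sub for x, y in cells)
                bit = "1" if nonempty else "0"
                if sub == 1:
                    l_bits.append(bit)      # last level: individual cells go to L
                else:
                    t_bits.append(bit)      # internal levels go to T
                    if nonempty:
                        queue.append((r, c, sub))
    return "".join(t_bits), "".join(l_bits)


T, L = build_k2tree(CELLS, 16)
```

The queue guarantees that bits are written level by level, which is the layout the navigation by rank relies on. Scanning all cells for every submatrix is quadratic and only acceptable for a toy example.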
K²-tree: direct neighbor operation
[Figure: the 16×16 matrix with rows and columns numbered 0 to 15, and the tree with its nodes numbered by bitmap position]

Positions 0 to 35 index T; positions 36 onwards index L:
  T = 101111010100100011001000000101011110
  L = 010000110010001010101000011000100100

The children of the node at position x of T start at position rank1(T, x) · k², where rank1(T, x) counts the 1s in T[0..x]:
  children(2)  = rank1(T, 2)  · k² =  2 · 4 = 8
  children(9)  = rank1(T, 9)  · k² =  7 · 4 = 28
  children(31) = rank1(T, 31) · k² = 14 · 4 = 56
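The traversal can be sketched in Python directly on the two bitmaps from the slide. This is a hedged illustration: `rank1` is implemented by naive counting rather than with a succinct rank structure, and `direct_neighbors` is a hypothetical name. Decoding the bitmaps suggests that positions 2, 9 and 31 lie on the path taken when asking for the neighbors of row 10.

```python
T = "101111010100100011001000000101011110"
L = "010000110010001010101000011000100100"
K = 2
N = 16  # side of the (padded) matrix


def rank1(bits, i):
    """Number of 1s in bits[0..i], inclusive (the convention used on the slide)."""
    return bits[: i + 1].count("1")


def direct_neighbors(row, t=T, l=L, n=N, k=K):
    """Return the columns holding a 1 in the given row (0-based)."""
    tl = t + l          # L positions continue after those of T
    result = []

    def visit(first_child, side, local_row, col0):
        sub = side // k
        i = local_row // sub                 # which row of children contains the target row
        for j in range(k):
            pos = first_child + i * k + j
            if tl[pos] == "1":
                col = col0 + j * sub
                if sub == 1:
                    result.append(col)       # leaf level: a real cell of the matrix
                else:
                    visit(rank1(t, pos) * k * k, sub, local_row % sub, col)

    visit(0, n, row, 0)
    return result


NEIGHBORS_OF_10 = direct_neighbors(10)
```

Reverse neighbors follow the same scheme with the roles of rows and columns swapped: the column selects `j` and the loop runs over `i`.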
K²-triples
"k²-triples applies vertical partitioning of an RDF dataset by predicate. The dataset is then represented by |P| k²-trees, each one holding all the triples that involve a given predicate."
K²-triples: data structure
  Dictionary encoding
  Triples as a set of identifiers
[Figure: RDF triples, the dictionary, and the mapped (ID-based) triples]
K²-triples: data structure
  Vertical partitioning (by predicates)
  One k²-tree per predicate
Triples (S,P,O): (8,5,4), (4,2,3), (4,4,6), (4,1,7), (7,2,3), (3,3,5), (5,2,1), (1,3,5), (6,2,2), (2,3,5)
[Figure: one adjacency matrix per predicate, P1 to P5, with subjects as rows and objects as columns; for example, triple (4,1,7) sets cell (S = 4, O = 7) of P1]
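Vertical partitioning itself is simple to sketch. In the following Python fragment, which is an illustration only, each predicate is mapped to the list of (subject, object) cells that would be set in its adjacency matrix; `vertical_partition` and `matrix_side` are hypothetical names, and building an actual k²-tree from each cell list is left out.

```python
from collections import defaultdict

# ID-based triples (S, P, O) from the slide.
TRIPLES = [(8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
           (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5)]


def vertical_partition(triples):
    """Group triples by predicate: predicate -> sorted list of (subject, object) cells."""
    cells = defaultdict(list)
    for s, p, o in triples:
        cells[p].append((s, o))
    return {p: sorted(pairs) for p, pairs in cells.items()}


def matrix_side(triples, k=2):
    """Smallest power of k that can hold every subject and object ID (IDs start at 1)."""
    largest = max(max(s, o) for s, _, o in triples)
    side = 1
    while side < largest:
        side *= k
    return side


PARTITIONS = vertical_partition(TRIPLES)
SIDE = matrix_side(TRIPLES)
```

Every cell list would then be compressed as one k²-tree over a `SIDE` × `SIDE` matrix, so the dataset is held in |P| = 5 trees.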
K²-triples operations: solving triple patterns
  Query: (4,2,3)
  Pattern SPO: checking a cell (in the k²-tree of P2)
  Result: (4,2,3)
K²-triples operations: solving triple patterns
  Query: (4,2,?)
  Pattern SP?: direct neighbours (in the k²-tree of P2)
  Result: (4,2,3)
K²-triples operations: solving triple patterns
  Query: (?,2,3)
  Pattern ?PO: reverse neighbours (in the k²-tree of P2)
  Result: (4,2,3), (7,2,3)
K²-triples operations: solving triple patterns
  Query: (4,?,6)
  Pattern S?O: checking |P| cells (one in each k²-tree, P1 to P5)
  Result: (4,4,6)
K²-triples operations: solving triple patterns
  Query: (4,?,?)
  Pattern S??: |P| direct neighbours (one operation in each k²-tree, P1 to P5)
  Result: (4,1,7), (4,2,3), (4,4,6)
K²-triples operations: solving triple patterns
  Query: (?,?,4)
  Pattern ??O: |P| reverse neighbours (one operation in each k²-tree, P1 to P5)
  Result: (8,5,4)
K²-triples operations: solving triple patterns
  Query: (?,2,?)
  Result: (4,2,3), (5,2,1), (6,2,2), (7,2,3)

  Pattern   Operation
  SPO       checking a cell
  SP?       direct neighbours
  ?PO       reverse neighbours
  S?O       checking |P| cells
  S??       |P| direct neighbours
  ??O       |P| reverse neighbours
  ?P?       full adjacency matrix
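The seven patterns can be exercised on the example with a small Python sketch. Note the simplifying assumption: each predicate is stored as a plain set of (subject, object) pairs instead of a k²-tree, so the cell check, direct-neighbour and reverse-neighbour operations all collapse into a filter over that set. The names `partition` and `solve` are hypothetical.

```python
from collections import defaultdict

TRIPLES = [(8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
           (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5)]


def partition(triples):
    """Vertical partitioning: predicate -> set of (subject, object) pairs."""
    parts = defaultdict(set)
    for s, p, o in triples:
        parts[p].add((s, o))
    return dict(parts)


def solve(parts, s=None, p=None, o=None):
    """Resolve a triple pattern; None stands for an unbound position ('?')."""
    # Bound predicate: one structure is visited. Unbound predicate: all |P| are visited.
    predicates = [p] if p is not None else sorted(parts)
    results = []
    for pred in predicates:
        for subj, obj in sorted(parts.get(pred, ())):
            # With a k2-tree this filter would be a cell check (s and o bound),
            # a row traversal (s bound), a column traversal (o bound),
            # or a full traversal (both unbound).
            if (s is None or subj == s) and (o is None or obj == o):
                results.append((subj, pred, obj))
    return results


PARTS = partition(TRIPLES)
```

For instance, `solve(PARTS, s=4)` touches all five predicates, which is the cost that the unbound-predicate patterns S?O, S?? and ??O share.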
K²-triples: SP and OP indexes
Weakness of vertical partitioning: patterns with an unbound predicate, (S,?,?), (?,?,O) and (S,?,O), require checking all |P| k²-trees!
They proposed two indexes, SP and OP.

Triples (S,P,O): (8,5,4), (4,2,3), (4,4,6), (4,1,7), (7,2,3), (3,3,5), (5,2,1), (1,3,5), (6,2,2), (2,3,5)

SP index:
  S   Predicates
  1   3
  2   3
  3   3
  4   1, 2, 4
  5   2
  6   2
  7   2
  8   5

  Statistically compressed
  Direct access with DAC (Directly Addressable Codes)
K²-triples: SP and OP indexes
  Query: (4,?,?)
  Subject 4? The SP index returns the predicate list 1, 2, 4.
  Only the k²-trees of P1, P2 and P4 are queried; P3 and P5 are skipped.
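A minimal sketch of the two indexes, assuming plain Python dictionaries in place of the compressed representation (the statistical compression and DAC access mentioned on the slide are not modelled). The names `build_indexes` and `solve_subject` are hypothetical.

```python
from collections import defaultdict

TRIPLES = [(8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
           (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5)]


def build_indexes(triples):
    """Return (SP, OP): subject -> sorted predicates, object -> sorted predicates."""
    sp, op = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        sp[s].add(p)
        op[o].add(p)
    return ({s: sorted(ps) for s, ps in sp.items()},
            {o: sorted(ps) for o, ps in op.items()})


def solve_subject(triples, sp_index, subject):
    """Resolve (subject, ?, ?) visiting only the predicates listed in the SP index."""
    by_predicate = defaultdict(set)
    for s, p, o in triples:
        by_predicate[p].add((s, o))
    visited = sp_index.get(subject, [])
    results = [(subject, p, o)
               for p in visited
               for s, o in sorted(by_predicate[p])
               if s == subject]
    return results, visited


SP, OP = build_indexes(TRIPLES)
RESULTS, VISITED = solve_subject(TRIPLES, SP, 4)
```

In this example the query for subject 4 visits three predicate structures instead of five; the saving grows with |P| when subjects are related to few predicates.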
K²-triples: joins
They implemented three join strategies.
  Query: (8,5,?X) (?X,2,?)
  • Independent join (merge-join)
  • Chain join (index-join)
  • Interactive join (taking advantage of the k²-triples structure)
The best strategy depends on the dataset and the type of join.