Linked Data Indexing Methods: A Survey
Martin Svoboda, Irena Mlýnková
Charles University in Prague, The Czech Republic
21st October 2011
SWWS@OTM, Crete, Greece
Outline
• Introduction
• Dimensions
• Approaches
• Observations
• Challenges
• Conclusion
Introduction
• Motivation
  ‒ From the Web of Documents to the Web of Data
• Linked Data principles
  ‒ Unique identifiers (URIs)
  ‒ Useful descriptions (HTTP, RDF)
  ‒ Links to other URIs
Introduction
• RDF (Resource Description Framework)
  Triples
  ‒ Subject, predicate, object
  Graph
  ‒ Directed labeled multigraph
  ‒ Vertices for subjects and objects
  ‒ Edges for individual triples, labeled with their predicates
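The triple-to-graph view can be sketched in Python, assuming triples are plain strings with a hypothetical `ex:` prefix; `Triple` and `build_multigraph` are illustrative names, not part of any RDF library. Each triple becomes one edge from subject to object labeled by its predicate, so two different predicates between the same pair of vertices yield parallel edges.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical plain-string representation)."""
    subject: str
    predicate: str
    object: str


def build_multigraph(triples: Iterable[Triple]):
    """Build a directed labeled multigraph from RDF triples.

    Vertices are all subjects and objects; each distinct triple becomes
    one edge subject -> object labeled with its predicate. Parallel edges
    between the same two vertices are allowed (hence a multigraph).
    """
    vertices = set()
    edges = defaultdict(list)  # subject -> [(predicate, object), ...]
    # An RDF graph is a set of triples, so exact duplicates are dropped
    # while the input order is preserved.
    for t in dict.fromkeys(triples):
        vertices.update((t.subject, t.object))
        edges[t.subject].append((t.predicate, t.object))
    return vertices, dict(edges)


data = [
    Triple("ex:John", "ex:knows", "ex:Peter"),
    Triple("ex:John", "ex:worksWith", "ex:Peter"),  # parallel edge John -> Peter
    Triple("ex:John", "ex:livesIn", "ex:Prague"),
    Triple("ex:John", "ex:knows", "ex:Peter"),      # exact duplicate, ignored
]
vertices, edges = build_multigraph(data)
print(sorted(vertices))
print(edges["ex:John"])
```

Keeping edges in per-subject adjacency lists is just one convenient layout; the indexing approaches surveyed here choose very different physical structures.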
Intent
• Querying framework
  Architecture
  ‒ Compromise between local and distributed approaches
  Issues
  ‒ Physical storage
  ‒ Index structures
  ‒ Query processor
  Problems
  ‒ Data scalability, distribution and dynamicity
Intent
• Architecture
  Local
  ‒ Efficient processing
  ‒ Independent data
  ‒ Storage requirements
  Distributed
  ‒ Runtime requests
  ‒ Up-to-date data
  ‒ Network throughput
Dimensions
• Aspects
  ‒ Data, index, querying
• Dimensions
  ‒ Not all combinations make sense
Dimensions
• Data distribution
  ‒ Local, distributed or global data
• Data units
  ‒ Triples, quads, documents or other sources
• Data dynamicity
  ‒ Durable, changeable or volatile data
• Index organization
  ‒ Local or distributed model
Dimensions
• Index items
  ‒ Keywords, triples, quads, trees, paths or areas
• Index content
  ‒ Pure data, statistics or summaries about data
• Index dynamicity
  ‒ Dynamic or static structures
• Access patterns
  ‒ Universal or limited approaches
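One way to read the access-pattern dimension: an index is universal if any triple pattern, with any combination of bound and unbound positions, can be answered by a prefix lookup. A minimal sketch, assuming an in-memory store with three hypothetical permutation indexes (SPO, POS, OSP) that together cover all eight combinations; `TripleIndex` and `match` are illustrative names. A limited approach would keep only some orderings and answer other patterns by scanning, or not at all.

```python
from collections import defaultdict


def _nested():
    """Two-level map: first key -> second key -> set of third values."""
    return defaultdict(lambda: defaultdict(set))


class TripleIndex:
    """Hypothetical in-memory triple index with SPO, POS and OSP permutations."""

    def __init__(self):
        self.spo = _nested()
        self.pos = _nested()
        self.osp = _nested()

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching a pattern; None acts as a wildcard."""
        if s is not None and p is not None:
            # (s, p, ?) and (s, p, o): prefix lookup on SPO
            for oo in self.spo.get(s, {}).get(p, ()):
                if o is None or oo == o:
                    yield (s, p, oo)
        elif p is not None:
            # (?, p, ?) and (?, p, o): prefix lookup on POS
            by_o = self.pos.get(p, {})
            for oo in ([o] if o is not None else list(by_o)):
                for ss in by_o.get(oo, ()):
                    yield (ss, p, oo)
        elif o is not None:
            # (?, ?, o) and (s, ?, o): prefix lookup on OSP
            by_s = self.osp.get(o, {})
            for ss in ([s] if s is not None else list(by_s)):
                for pp in by_s.get(ss, ()):
                    yield (ss, pp, o)
        elif s is not None:
            # (s, ?, ?): prefix lookup on SPO
            for pp, objs in self.spo.get(s, {}).items():
                for oo in objs:
                    yield (s, pp, oo)
        else:
            # (?, ?, ?): full scan
            for ss, by_p in self.spo.items():
                for pp, objs in by_p.items():
                    for oo in objs:
                        yield (ss, pp, oo)


idx = TripleIndex()
idx.add("ex:John", "ex:knows", "ex:Peter")
idx.add("ex:John", "ex:livesIn", "ex:Prague")
idx.add("ex:Peter", "ex:livesIn", "ex:Prague")
print(sorted(idx.match(p="ex:livesIn", o="ex:Prague")))
print(sorted(idx.match(s="ex:John", o="ex:Peter")))
```

The price of universality is storing every triple three times, which ties this dimension to the storage requirements discussed for local architectures.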
Dimensions
• Querying layer
  ‒ Syntactic, structural or semantic querying
• Query models
  ‒ Full-text querying or graph patterns
• Query evaluation
  ‒ Local or distributed processing
• Query results
  ‒ Complete or incomplete results
Categories
• Main approach types
  Querying systems
  ‒ Local or distributed data
  ‒ Structural queries
  ‒ Complete results
  Search engines
  ‒ Global data cloud
  ‒ Full-text queries
  ‒ Imprecise results
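The search-engine category can be illustrated with a keyword index: a full-text query is answered from tokens of literal values, and results are ranked rather than exact. A minimal sketch, assuming literals are marked by surrounding double quotes (a hypothetical convention for this example), a naive `\w+` tokenizer, and a score equal to the number of distinct query keywords matched; the `ex:` resources and all function names are hypothetical, and real search engines use far richer ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; a deliberately naive tokenizer."""
    return re.findall(r"\w+", text.lower())


def build_keyword_index(triples):
    """Map each keyword occurring in a literal object to the subjects it describes."""
    index = defaultdict(set)
    for s, _p, o in triples:
        if o.startswith('"'):  # hypothetical convention: literals are quoted
            for token in tokenize(o):
                index[token].add(s)
    return index


def search(index, query):
    """Rank subjects by the number of distinct query keywords they match."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for s in index.get(token, ()):
            scores[s] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


triples = [
    ("ex:John", "rdfs:label", '"John Smith"'),
    ("ex:Prague", "rdfs:label", '"Prague, Czech Republic"'),
    ("ex:CUNI", "rdfs:label", '"Charles University in Prague"'),
    ("ex:John", "ex:knows", "ex:Peter"),  # URI object, not indexed
]
index = build_keyword_index(triples)
print(search(index, "Charles University Prague"))
```

Partial matches such as `ex:Prague` are still returned with a lower score, which is the kind of imprecise result that distinguishes search engines from querying systems.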
Approaches
• Source selection
  ‒ Andreas Harth et al.: Data Summaries for On-Demand Queries over Linked Data
  Data transformation
  ‒ 3-dimensional space
  ‒ Hash functions
  Q-trees based on R-trees
  ‒ Overlapping bounding boxes
  ‒ Buckets with summaries
  [Figure: example bucket with points (5, 10, 5) and (15, 20, 25) and summary Dataset A: 87, Dataset B: 14]
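The data-summary idea can be sketched as follows. Each triple is hashed into a point in 3-dimensional space, points are grouped into buckets, and each bucket keeps its bounding box plus a per-source triple count; a triple pattern becomes a query region, and only sources counted in overlapping buckets are selected. This is a simplified stand-in, not the QTree of Harth et al.: buckets come from a flat grid rather than a hierarchical tree, and `SIZE`, `CELL`, the CRC32-based hash and all function names are hypothetical choices. Hash collisions and box overlaps may select extra sources, but a source that actually holds a matching triple is never missed.

```python
import zlib
from collections import defaultdict

SIZE = 100  # hypothetical coordinate range per dimension
CELL = 25   # hypothetical grid cell width (flat stand-in for QTree splits)


def coord(term):
    """Deterministically hash an RDF term to a coordinate in [0, SIZE)."""
    return zlib.crc32(term.encode("utf-8")) % SIZE


class Bucket:
    """Bounding box of the points it holds plus a triple count per source."""

    def __init__(self):
        self.low = [SIZE] * 3
        self.high = [-1] * 3
        self.counts = defaultdict(int)  # source name -> number of triples

    def add(self, point, source):
        for i, c in enumerate(point):
            self.low[i] = min(self.low[i], c)
            self.high[i] = max(self.high[i], c)
        self.counts[source] += 1

    def overlaps(self, low, high):
        return all(self.low[i] <= high[i] and low[i] <= self.high[i] for i in range(3))


def build_summary(sources):
    """sources: mapping source name -> iterable of (s, p, o) triples."""
    buckets = defaultdict(Bucket)
    for name, triples in sources.items():
        for s, p, o in triples:
            point = (coord(s), coord(p), coord(o))
            buckets[tuple(c // CELL for c in point)].add(point, name)
    return list(buckets.values())


def select_sources(summary, s=None, p=None, o=None):
    """Return sources that may contain matches for a triple pattern (None = wildcard)."""
    low, high = [], []
    for term in (s, p, o):
        if term is None:
            low.append(0)
            high.append(SIZE - 1)
        else:
            c = coord(term)
            low.append(c)
            high.append(c)
    selected = set()
    for bucket in summary:
        if bucket.overlaps(low, high):
            selected.update(bucket.counts)
    return selected


sources = {
    "A": [("ex:John", "ex:knows", "ex:Peter"), ("ex:John", "ex:livesIn", "ex:Prague")],
    "B": [("ex:Peter", "ex:livesIn", "ex:Brno")],
}
summary = build_summary(sources)
print(select_sources(summary, s="ex:John"))
```

Only the summary, not the data itself, has to be stored locally; the selected sources are then fetched at query time.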
Approaches
• BitMat index
  ‒ Medha Atre et al.: Matrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data
  3-dimensional matrix
  ‒ Bit values 0 or 1
  2-dimensional slices
  ‒ S-O, O-S, P-O, P-S slices
  Implementation
  ‒ Compressed bit runs
  [Figure: 3-dimensional bit matrix with axes subjects, predicates and objects; example terms John, Peter, lives in, knows]
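To make the slice idea concrete, here is a minimal sketch of per-predicate S-O slices whose rows are compressed as bit runs, assuming subjects and objects have already been mapped to fixed positions. The function names and the `ex:` terms are hypothetical, the other orientations (O-S, P-O, P-S) would be built analogously, and the exact run encoding used by Atre et al. may differ. For clarity the sketch builds dense rows first, which a real implementation would avoid.

```python
def rle_encode(bits):
    """Compress a bit row as (first bit, [lengths of alternating runs])."""
    if not bits:
        return (0, [])
    runs = []
    current, length = bits[0], 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)
            current, length = b, 1
    runs.append(length)
    return (bits[0], runs)


def rle_decode(first, runs):
    """Expand (first bit, runs) back into the original bit row."""
    bits, value = [], first
    for n in runs:
        bits.extend([value] * n)
        value ^= 1  # runs alternate between 0 and 1
    return bits


def build_so_slices(triples, subjects, objects):
    """For each predicate, build an S-O bit matrix with run-compressed rows.

    subjects / objects: lists fixing the integer position of each term.
    """
    s_id = {s: i for i, s in enumerate(subjects)}
    o_id = {o: i for i, o in enumerate(objects)}
    dense = {}
    for s, p, o in triples:
        matrix = dense.setdefault(p, [[0] * len(objects) for _ in subjects])
        matrix[s_id[s]][o_id[o]] = 1
    return {p: [rle_encode(row) for row in m] for p, m in dense.items()}


subjects = ["ex:John", "ex:Peter"]
objects = ["ex:John", "ex:Peter", "ex:Prague"]
triples = [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:Peter", "ex:knows", "ex:John"),
    ("ex:John", "ex:livesIn", "ex:Prague"),
    ("ex:Peter", "ex:livesIn", "ex:Prague"),
]
slices = build_so_slices(triples, subjects, objects)
print(slices["ex:knows"])
```

Sparse rows collapse into a few run lengths, which is why the full 3-dimensional bit matrix stays compact in practice.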
Observations
• String compression
  Repeating string values
  ‒ URIs and literals
  Unique integer identifiers
  ‒ Efficient processing
  ‒ Lower space requirements
  Translation maps
  ‒ Both directions
  ‒ Based on B-trees
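A minimal sketch of the translation maps, assuming ids are assigned in order of first appearance; a Python dict (term to id) and a list (id to term) stand in for the B-tree based maps, and `Dictionary`, `encode_triples` and the `ex:` terms are hypothetical names. Each distinct URI or literal is stored once, and triples become tuples of small integers that are cheaper to compare and join.

```python
class Dictionary:
    """Bidirectional mapping between RDF terms and integer identifiers.

    A dict (term -> id) and a list (id -> term) stand in for the
    B-tree based translation maps.
    """

    def __init__(self):
        self._ids = {}
        self._terms = []

    def encode(self, term):
        """Return the id of term, assigning the next free id if it is new."""
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, ident):
        """Translate an id back to its original term."""
        return self._terms[ident]


def encode_triples(dictionary, triples):
    """Replace every term of every triple by its integer id."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]


d = Dictionary()
encoded = encode_triples(d, [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:Peter", "ex:knows", "ex:John"),
])
print(encoded)
print([d.decode(i) for i in encoded[1]])
```

Queries are then evaluated entirely on integers, and only the final results are translated back through the reverse map.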
Observations
• Data pruning
  Idea
  ‒ Query optimization
  ‒ Relevant data
  Methods
  ‒ Filtering selections
  ‒ Join ordering
  Problem
  ‒ Partial knowledge
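Both pruning methods can be combined in a minimal sketch: constants in triple patterns act as filtering selections, each pattern's cardinality is estimated, patterns are evaluated starting from the most selective one, and evaluation stops as soon as no partial result survives. Assumptions: variables start with `?`, the "statistics" are exact counts computed on the fly rather than precomputed summaries, and the ordering is a simple static greedy sort that ignores shared variables; all names and `ex:` terms are hypothetical. With only partial knowledge of remote data, such estimates can be wrong.

```python
def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(triples, pattern, binding):
    """Yield copies of binding extended by every triple that matches pattern."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # constant position acts as a selection
                ok = False
                break
        if ok:
            yield new


def estimate(triples, pattern):
    """Cardinality of a single pattern (stand-in for stored statistics)."""
    return sum(1 for _ in match_pattern(triples, pattern, {}))


def evaluate(triples, patterns):
    """Evaluate a conjunctive graph pattern, most selective pattern first."""
    ordered = sorted(patterns, key=lambda pat: estimate(triples, pat))
    bindings = [{}]
    for pattern in ordered:
        bindings = [b2 for b in bindings for b2 in match_pattern(triples, pattern, b)]
        if not bindings:  # prune: nothing can be completed any more
            break
    return ordered, bindings


data = [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:Peter", "ex:knows", "ex:Anna"),
    ("ex:Anna", "ex:knows", "ex:John"),
    ("ex:John", "ex:livesIn", "ex:Prague"),
]
query = [("?x", "ex:knows", "?y"), ("?x", "ex:livesIn", "ex:Prague")]
ordered, results = evaluate(data, query)
print(ordered[0])
print(results)
```

Starting from the single `ex:livesIn` match keeps the intermediate result to one binding instead of three, which is the effect join ordering aims for at a much larger scale.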
Challenges
• Data distribution
  Motivation
  ‒ Datasets are distributed
  ‒ Appropriate compromise between local and distributed approaches
  Problems
  ‒ Network drawbacks
  ‒ Space requirements
  ‒ Independent datasets
Challenges
• Data scalability
  Motivation
  ‒ Explosive growth of the Web of Data
    • September 2011: 295 datasets, 31 billion triples, 504 million links
  Problems
  ‒ Scalable storage and indices
  ‒ Efficient query evaluation
  ‒ Quality, provenance and trust
Challenges
• Data dynamicity
  Motivation
  ‒ Data age over time
  Problems
  ‒ Continuous updates
  ‒ Dynamic structures
Conclusion
• Problem
  ‒ Linked Data indexing methods
• Contributions
  Comparison of approaches
  ‒ Dimensions
  ‒ Observations
  ‒ Challenges
Thank you for your attention…
Faculty of Mathematics and Physics
Charles University in Prague