

  1. Linked Data Indexing Methods: A Survey Martin Svoboda, Irena Mlýnková Charles University in Prague The Czech Republic 21st October 2011 SWWS@OTM, Crete, Greece

  2. Outline • Introduction • Dimensions • Approaches • Observations • Challenges • Conclusion

  3. Introduction • Motivation  Web of Documents  Web of Data • Linked Data  Principles ‒ Unique identifiers (URIs) ‒ Useful description (HTTP, RDF) ‒ Links

  4. Introduction • RDF (Resource Description Framework)  Triples ‒ Subject Predicate Object.  Graph ‒ Directed labeled multigraph ‒ Vertices for subjects and objects ‒ Edges for particular triples
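
The triple and graph views on the slide above can be sketched in a few lines of Python. This is only an illustration: the `ex:` terms and the helper names `build_graph` and `vertices` are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical example triples (subject, predicate, object).
TRIPLES = [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:John", "ex:livesIn", "ex:Prague"),
    ("ex:Peter", "ex:knows", "ex:John"),
    ("ex:John", "ex:worksWith", "ex:Peter"),
]

def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))  # one labeled, directed edge per triple
    return graph

def vertices(triples):
    """Vertices are all terms that occur as a subject or as an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

graph = build_graph(TRIPLES)
```

Two different edges lead from `ex:John` to `ex:Peter` here, which is why the structure is a multigraph rather than a simple graph.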

  5. Intent • Querying framework  Architecture ‒ Compromise between local and distributed approaches  Issues ‒ Physical storage ‒ Index structures ‒ Query processor  Problems ‒ Data scalability, distribution and dynamicity

  6. Intent • Architecture  Local ‒ Efficient processing ‒ Independent data ‒ Storage requirements  Distributed ‒ Runtime requests ‒ Up-to-date data ‒ Network throughput

  7. Dimensions • Aspects  Data  Index  Querying • Dimensions  Not all combinations make sense

  8. Dimensions • Data distribution  Local, distributed or global data • Data units  Triples, quads, documents or other sources • Data dynamicity  Durable, changeable or volatile data • Index organization  Local or distributed model

  9. Dimensions • Index items  Keywords, triples, quads, trees, paths or areas • Index content  Pure data, statistics or summaries about data • Index dynamicity  Dynamic or static structures • Access patterns  Universal or limited approaches

  10. Dimensions • Querying layer  Syntactic, structural or semantic querying • Query models  Full text querying or graph patterns • Query evaluation  Local or distributed processing • Query results  Complete or incomplete results

  11. Categories • Main approach types  Querying systems ‒ Local or distributed data ‒ Structural queries ‒ Complete results  Search engines ‒ Global data cloud ‒ Full text queries ‒ Imprecise results

  12. Approaches • Source selection ‒ Andreas Harth et al.: Data Summaries for On-Demand Queries over Linked Data  Data transformation ‒ 3-dimensional space ‒ Hash functions  Q-trees based on R-trees ‒ Overlapping bounding boxes ‒ Buckets with summaries [Figure: Q-tree bucket illustration with labels (5, 10, 5), (15, 20, 25), Dataset A 87, Dataset B 14]
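
The source-selection idea on the slide above (hash each triple into a 3-dimensional space, summarize each source by bounding boxes, then keep only the sources whose boxes a query can touch) might look roughly like the following. This is a deliberately simplified sketch, not the structure used by Harth et al.: it keeps a single bounding box per source instead of a Q-tree with buckets, and `SPACE`, `coord`, `SOURCES` and the `ex:` terms are all hypothetical.

```python
import zlib

SPACE = 1000  # hypothetical size of each coordinate axis

def coord(term):
    """Hash a term to an integer coordinate (stable across runs)."""
    return zlib.crc32(term.encode("utf-8")) % SPACE

def point(triple):
    """Map a (subject, predicate, object) triple to a 3-dimensional point."""
    return tuple(coord(t) for t in triple)

def bounding_box(triples):
    """Smallest axis-aligned box that contains all points of a source."""
    pts = [point(t) for t in triples]
    lo = tuple(min(p[i] for p in pts) for i in range(3))
    hi = tuple(max(p[i] for p in pts) for i in range(3))
    return lo, hi

def query_box(pattern):
    """Turn a triple pattern into a box; None is a wildcard over a whole axis."""
    lo = tuple(0 if t is None else coord(t) for t in pattern)
    hi = tuple(SPACE - 1 if t is None else coord(t) for t in pattern)
    return lo, hi

def intersects(a, b):
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))

def select_sources(summaries, pattern):
    """Return the sources that may hold matches (false positives are possible)."""
    q = query_box(pattern)
    return [name for name, box in summaries.items() if intersects(box, q)]

# Hypothetical sources and their summaries.
SOURCES = {
    "A": [("ex:John", "ex:knows", "ex:Peter"), ("ex:John", "ex:livesIn", "ex:Prague")],
    "B": [("ex:Anna", "ex:knows", "ex:Eva")],
}
SUMMARIES = {name: bounding_box(triples) for name, triples in SOURCES.items()}
```

A summary of this kind can only rule sources out: a source that really contains a matching triple is always selected, while a selected source may still turn out to be irrelevant.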

  13. Approaches • BitMat index ‒ Medha Atre et al.: Matrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data  3-dimensional matrix ‒ Bit values 0 or 1  2-dimensional slices ‒ S-O, O-S, P-O, P-S slices  Implementation ‒ Compressed bit runs [Figure: bit cube with axes subjects, predicates and objects; example labels John, lives in, Peter, knows]
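
To make the bit-matrix idea above concrete, here is a small sketch: a 3-dimensional cube of bits indexed by subject, predicate and object, a 2-dimensional subject-object slice obtained by fixing a predicate, and a run-length encoding of one row. The term lists, the example triples and the function names are hypothetical, and the encoding is a generic run-length scheme rather than the exact compressed format BitMat uses.

```python
from itertools import groupby

# Hypothetical term lists; a term's position in its list is its identifier.
SUBJECTS = ["John", "Peter"]
PREDICATES = ["knows", "lives in"]
OBJECTS = ["John", "Peter", "Prague"]

def build_bitcube(triples):
    """Set cube[s][p][o] = 1 for every triple that is present."""
    cube = [[[0] * len(OBJECTS) for _ in PREDICATES] for _ in SUBJECTS]
    for s, p, o in triples:
        cube[SUBJECTS.index(s)][PREDICATES.index(p)][OBJECTS.index(o)] = 1
    return cube

def so_slice(cube, predicate):
    """2-dimensional subject-object slice for one fixed predicate."""
    p = PREDICATES.index(predicate)
    return [[cube[s][p][o] for o in range(len(OBJECTS))]
            for s in range(len(SUBJECTS))]

def run_length_encode(bits):
    """Collapse a row of bits into (bit value, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

cube = build_bitcube([
    ("John", "knows", "Peter"),
    ("John", "lives in", "Prague"),
    ("Peter", "knows", "John"),
])
knows = so_slice(cube, "knows")
```

Real RDF data makes such a cube extremely sparse, so long runs of zeros dominate each row; that is what makes storing runs instead of individual bits attractive.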

  14. Observations • String compression  Repeating string values ‒ URIs and literals  Unique integer identifiers ‒ Efficient processing ‒ Space requirements  Translation maps ‒ Both directions ‒ Based on B-trees
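
The string-compression observation above amounts to dictionary encoding: every distinct URI or literal receives an integer identifier, and two maps translate in both directions. The sketch below uses an in-memory `dict` and `list` as stand-ins for the B-tree based translation maps mentioned on the slide; the class name `TermDictionary` and the sample terms are hypothetical.

```python
class TermDictionary:
    """Two-way mapping between RDF terms and integer identifiers."""

    def __init__(self):
        self._id_of = {}    # term -> identifier
        self._term_of = []  # identifier -> term

    def encode(self, term):
        """Return the identifier of a term, assigning a new one if needed."""
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def decode(self, ident):
        """Return the term that belongs to an identifier."""
        return self._term_of[ident]

def encode_triples(dictionary, triples):
    """Replace every term of every triple by its integer identifier."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]

dictionary = TermDictionary()
encoded = encode_triples(dictionary, [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:John", "ex:knows", "ex:Anna"),
])
```

Repeated terms such as `ex:John` are stored as a string only once, and comparisons during joins operate on integers instead of strings.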

  15. Observations • Data pruning  Idea ‒ Query optimization ‒ Relevant data  Methods ‒ Filtering selections ‒ Join ordering  Problem ‒ Partial knowledge
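
One common way to realize the join-ordering method named above is a greedy heuristic that evaluates the most selective triple pattern first. The sketch below assumes a hypothetical table `COUNTS` of term occurrence counts and estimates a pattern's cardinality as the smallest count among its bound terms; it illustrates the general idea and is not a method attributed to any surveyed system.

```python
def estimate(pattern, counts):
    """Estimate a pattern's result size from the counts of its bound terms.

    Terms starting with '?' are variables. A bound term missing from the
    statistics is treated as occurring 0 times, which is an assumption.
    """
    known = [counts.get(t, 0) for t in pattern if not t.startswith("?")]
    return min(known) if known else float("inf")

def order_patterns(patterns, counts):
    """Sort triple patterns so that the most selective one comes first."""
    return sorted(patterns, key=lambda p: estimate(p, counts))

# Hypothetical statistics and query patterns.
COUNTS = {"ex:knows": 1000, "ex:livesIn": 500, "ex:Prague": 20}
PATTERNS = [
    ("?x", "ex:knows", "?y"),
    ("?x", "ex:livesIn", "ex:Prague"),
]
ordered = order_patterns(PATTERNS, COUNTS)
```

The quality of the order depends entirely on the statistics, which is the partial-knowledge problem the slide points out: with missing or stale counts the heuristic can pick a poor starting pattern.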

  16. Challenges • Data distribution  Motivation ‒ Datasets are distributed ‒ Appropriate compromise  Problems ‒ Network drawbacks ‒ Space requirements ‒ Independent datasets

  17. Challenges • Data scalability  Motivation ‒ Web of Data size explosion ‒ September 2011: 295 datasets, 31 billion triples, 504 million links  Problems ‒ Scalable storage and indices ‒ Efficient query evaluation ‒ Quality, provenance and trust

  18. Challenges • Data dynamicity  Motivation ‒ Data tend to age  Problems ‒ Continuous updates ‒ Dynamic structures

  19. Conclusion • Problem  Linked Data indexing methods • Contributions  Approaches comparison ‒ Dimensions ‒ Observations ‒ Challenges

  20. Thank you for your attention… Faculty of Mathematics and Physics Charles University in Prague
