Compressed RDF: Practical Uses & Hands-on


  1. Compressed RDF: Practical Uses & Hands-on. Antonio Fariña, Javier D. Fernández and Miguel A. Martínez-Prieto. 3rd KEYSTONE Training School: Keyword search in Big Linked Data, 23rd August 2017

  2. General agenda
     - Session I (09:00 - 10:30): "Basics of Compression for Big Linked Data Management"
       - Big (Linked) Semantic Data Compression: motivation & challenges
       - Compact Data Structures
     - Session II (13:30 - 15:00): "RDF Compression"
       - RDF Compression. HDT
       - RDF Dictionaries
       - RDF Triples
     - Session III (15:30 - 17:00): "Compressed RDF: Practical Uses & Hands-on"
       - Practical Uses (LOD-a-lot, RDF Archiving, etc.)
       - Hands on

  3. Agenda of this session
     - Practical uses
       - LOD-a-lot: Web-scale queries in your pocket
       - RDF archiving
       - Linked Data markets (Linked Close Data)
     - Hands on
       - HDT-it
       - Command line tools
       - HDT and Fuseki
       - HDT and Linked Data Fragments
       - HDT and C++/Java
       - HDT and Jena

  4. Use case 1: LOD-a-lot

  5. Still… what about Web-scale queries?
     E.g. retrieve all entities in LOD with the label "Axel Polleres":
       select distinct ?x { ?x rdfs:label "Axel Polleres" }
     Options:
     - Crawl and index LOD locally (-no-)
     - Follow-your-nose (where should I start?)
     - Federated querying (as good as the endpoints you query)
     - Use LOD Laundromat as a "good approximation" (still querying 650K datasets)
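To make the label query concrete, here is a minimal Python sketch of the same triple-pattern lookup over an in-memory list of triples. It is not how LOD Laundromat or HDT answer the query; the toy data, the `example.org` URIs and the helper names `match` and `entities_with_label` are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples matching a pattern; None acts as a wildcard (?x)."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


# Hypothetical toy data standing in for the LOD cloud.
TRIPLES = [
    ("http://example.org/person/1", RDFS_LABEL, '"Axel Polleres"'),
    ("http://example.org/person/2", RDFS_LABEL, '"Someone Else"'),
    ("http://example.org/author/9", RDFS_LABEL, '"Axel Polleres"'),
    ("http://example.org/person/1", RDFS_LABEL, '"Axel Polleres"'),  # repeated statement
]


def entities_with_label(triples: Iterable[Triple], label: str) -> list:
    """Equivalent of: select distinct ?x { ?x rdfs:label "<label>" }"""
    literal = f'"{label}"'
    # The set comprehension plays the role of DISTINCT.
    return sorted({s for s, _, _ in match(triples, p=RDFS_LABEL, o=literal)})


print(entities_with_label(TRIPLES, "Axel Polleres"))
```

The lookup itself is trivial; the difficulty the slide points at is having all the triples in one queryable place.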

  6. LOD Laundromat [diagram labels: Linked Open Data; LOD Laundromat; SPARQL endpoint (metadata); Dataset 1 … Dataset 650K, each as N-Triples (zip)]

  7. But what about Web-scale queries? LOD-a-lot - flashback -

  8. The real motivation consume

  9. The real motivation: "Oh man, I'm hungry and I don't even know if I will like whatever you are cooking" (image: http://www.kunsan.af.mil/News/Article/413995/serving-the-masses/) consume

  11. But what about Web-scale queries? But one could be really hungry: LOD-a-lot (image: https://hwy55burgers.wordpress.com/tag/food-challenge/)

  12. LOD-a-lot [diagram labels: Linked Open Data; LOD Laundromat; SPARQL endpoint (metadata); Dataset 1 … Dataset 650K, each as N-Triples (zip); LOD-a-lot: 28B triples]. Kudos: Javier D. Fernández, Wouter Beek, Miguel A. Martínez-Prieto, and Mario Arias

  13. LOD-a-lot (some numbers)
      Disk size:
      - HDT: 304 GB
      - HDT-FoQ (additional indexes): 133 GB
      - 305 €
      Memory footprint (to query):
      - 15.7 GB of RAM (3% of the size)
      - 144 seconds loading time
      - 8 cores (2.6 GHz), RAM 32 GB, SATA HDD on Ubuntu 14.04.5 LTS
      LDF page resolution in milliseconds.
      (LOD-a-lot creation took 64 h & 170 GB RAM. HDT-FoQ took 8 h & 250 GB RAM.)

  14. LOD-a-lot: http://purl.org/HDT/lod-a-lot and https://datahub.io/dataset/lod-a-lot

  15. LOD-a-lot (some use cases)
      - Query resolution at Web scale
      - Evaluation and Benchmarking
        - No excuse
      - RDF metrics and analytics (subjects, predicates, objects)

  16. ACKs: LOD-a-lot

  17. Use case 2: Archiving

  18. So far so good... But RDF is evolving [chart: update rate (second, minute, hour, day, week, month, year) vs. number of sources (10^0 to 10^6); labelled points: Virtual/Augmented Reality, Internet of Things, Dyldo, LOD-a-lot, DBpedia, BTC; annotation: "versions?". Source: Andreas Harth, Stream Reasoning in Mixed Reality Applications, Stream Reasoning Workshop 2015]

  19. Linked Data Archives: The missing link in the RDF evolution
      Most Semantic Web/Linked Data tools are focused on this "static view" but do not consider versioning/evolution.
      Sindice, SWSE, Swoogle, LOD Cache, LOD-Laundromat … so far, no versions!

  20. Preservation matters. Web archives: Common Crawl, Internet Memory, Internet Archive, …

  21. …in the last few years: RDF evolution at scale, one of the fundamental problems in the Web of Data
      - Research projects: Managing the Evolution and Preservation of the Data Web (FP7); Preserving Linked Data (FP7)
      - Archives
      - Tools: v-RDFCSA
      - Benchmarking: BEnchmark of RDF ARchives

  23. RDF Archiving. Archiving policies (each accessed through a retrieval mediator)
      a) Independent Copies/Snapshots (IC)
         V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
         V2: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .
         V3: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .
      b) Change-based approach (CB)
         V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
         V1 to V2: added ex:S3 ex:study ex:C1 . deleted ex:S2 ex:study ex:C1 .
         V2 to V3: added ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . deleted ex:C1 ex:hasProfessor ex:P1 .
      c) Timestamp-based approach (TB)
         ex:C1 ex:hasProfessor ex:P1 [V1,V2].
         ex:C1 ex:hasProfessor ex:P2 [V3].
         ex:C1 ex:hasProfessor ex:S2 [V3].
         ex:S1 ex:study ex:C1 [V1,V2,V3].
         ex:S2 ex:study ex:C1 [V1].
         ex:S3 ex:study ex:C1 [V2,V3].
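The three policies store the same history in different shapes. The Python sketch below starts from the three snapshots of the slide's example (IC) and derives the change-based and timestamp-based forms from them. The function names and the use of plain Python sets are assumptions for illustration, not the representation used by any particular archiving system.

```python
# Triples from the slide's example, abbreviated with the ex: prefix.
P1 = ("ex:C1", "ex:hasProfessor", "ex:P1")
P2 = ("ex:C1", "ex:hasProfessor", "ex:P2")
S2_PROF = ("ex:C1", "ex:hasProfessor", "ex:S2")
S1 = ("ex:S1", "ex:study", "ex:C1")
S2 = ("ex:S2", "ex:study", "ex:C1")
S3 = ("ex:S3", "ex:study", "ex:C1")

# a) Independent copies (IC): one full snapshot per version.
IC = [{P1, S1, S2}, {P1, S1, S3}, {P2, S2_PROF, S1, S3}]


def to_change_based(snapshots):
    """b) CB: keep the first snapshot plus (added, deleted) sets per step."""
    base = set(snapshots[0])
    deltas = [(cur - prev, prev - cur)
              for prev, cur in zip(snapshots, snapshots[1:])]
    return base, deltas


def to_timestamp_based(snapshots):
    """c) TB: annotate each triple with the versions in which it holds."""
    annotated = {}
    for version, snapshot in enumerate(snapshots, start=1):
        for triple in snapshot:
            annotated.setdefault(triple, []).append(version)
    return annotated


def rebuild(base, deltas, version):
    """Materialize version N from CB by replaying the first N-1 deltas."""
    current = set(base)
    for added, deleted in deltas[:version - 1]:
        current = (current - deleted) | added
    return current


base, deltas = to_change_based(IC)
tb = to_timestamp_based(IC)
print(deltas[0])           # step V1 -> V2
print(tb[P1], tb[S1])      # version annotations of two triples
print(rebuild(base, deltas, 3) == IC[2])
```

The sketch also shows the trade-off in miniature: IC answers a single version directly, CB must replay deltas, and TB must filter annotations.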

  24. BEAR: https://aic.ai.wu.ac.at/qadlod/bear.html

  25. BEAR: Benchmarking the Efficiency of RDF Archiving
      Queries and systems
      - We implemented and evaluated archiving systems on Jena-TDB and HDT, based on the IC, CB and TB policies.
      - They serve as an initial baseline to compare archiving systems.
      - More info: https://aic.ai.wu.ac.at/qadlod/bear.html

  27. Benchmarking: Define the queries
      Instantiation of archive queries in AnQL [1]
      - Mat(Q,V1), version materialization: SELECT * WHERE { Q :[v1] }
      - Diff(Q,V1,V2)
      - Ver(Q)
      - join(Q1,vi,Q2,vj)
      - Change(Q)
      [1] Antoine Zimmermann, Nuno Lopes, Axel Polleres, and Umberto Straccia. A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics (JWS), 12:72-95, March 2012.
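As a rough illustration of version materialization, the following Python sketch evaluates a single triple pattern against one version of a timestamp-based store. The store reuses the example triples from the archiving-policies slide; restricting Q to one triple pattern and the names `TB`, `matches` and `mat` are simplifying assumptions, not part of AnQL or BEAR.

```python
# Timestamp-based store: triple -> set of versions in which it holds.
TB = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}


def matches(triple, pattern):
    """True if every bound position of the pattern equals the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def mat(tb, pattern, version):
    """Mat(Q, v): results of pattern Q restricted to a single version."""
    return sorted(t for t, versions in tb.items()
                  if version in versions and matches(t, pattern))


# Who teaches ex:C1 in version 1 and in version 3?
print(mat(TB, ("ex:C1", "ex:hasProfessor", None), 1))
print(mat(TB, ("ex:C1", "ex:hasProfessor", None), 3))
```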

  28. Benchmarking: Define the queries
      Instantiation of archive queries in AnQL
      - Mat(Q,V1)
      - Diff(Q,V1,V2), delta materialization:
          SELECT * WHERE {
            { { {Q :[v1]} MINUS {Q :[v2]} } BIND (v1 AS ?V) }
            UNION
            { { {Q :[v2]} MINUS {Q :[v1]} } BIND (v2 AS ?V) }
          }
      - Ver(Q)
      - join(Q1,vi,Q2,vj)
      - Change(Q)
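The delta materialization can be read as two set differences, each tagged with the version its results come from. Below is a small Python sketch of that reading over a timestamp-based store built from the archiving-policies example; `TB`, `mat` and `diff` are hypothetical names, and Q is again limited to one triple pattern.

```python
# Timestamp-based store: triple -> set of versions in which it holds.
TB = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}


def mat(tb, pattern, version):
    """Results of a triple pattern (None = wildcard) in one version."""
    return {t for t, versions in tb.items()
            if version in versions
            and all(p is None or p == x for p, x in zip(pattern, t))}


def diff(tb, pattern, v1, v2):
    """Diff(Q, v1, v2): results present in exactly one of the two versions.

    Each result is paired with the version it belongs to, mirroring the
    BIND (v AS ?V) in the AnQL formulation.
    """
    r1, r2 = mat(tb, pattern, v1), mat(tb, pattern, v2)
    only_v1 = [(t, v1) for t in sorted(r1 - r2)]   # {Q:[v1]} MINUS {Q:[v2]}
    only_v2 = [(t, v2) for t in sorted(r2 - r1)]   # {Q:[v2]} MINUS {Q:[v1]}
    return only_v1 + only_v2                       # UNION


for triple, version in diff(TB, ("ex:C1", "ex:hasProfessor", None), 2, 3):
    print(version, triple)
```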

  29. Benchmarking: Define the queries
      Instantiation of archive queries in AnQL
      - Mat(Q,V1)
      - Diff(Q,V1,V2)
      - Ver(Q), results of Q annotated with the version: SELECT * WHERE { Q :?V }
      - join(Q1,vi,Q2,vj)
      - Change(Q)
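A version query leaves the version unbound and returns it with each result. A minimal Python sketch over a timestamp-based store (the example triples from the archiving-policies slide) could look as follows; `TB` and `ver` are assumed names and Q is a single triple pattern.

```python
# Timestamp-based store: triple -> set of versions in which it holds.
TB = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}


def ver(tb, pattern):
    """Ver(Q): every result of pattern Q paired with each version it holds in."""
    results = []
    for triple, versions in tb.items():
        if all(p is None or p == x for p, x in zip(pattern, triple)):
            results.extend((triple, v) for v in versions)
    return sorted(results)


# In which versions did each student study ex:C1?
for triple, version in ver(TB, (None, "ex:study", "ex:C1")):
    print(triple[0], "V%d" % version)
```

With a timestamp-based layout this query is a direct read of the annotations, which is one reason TB suits it well.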

  30. Benchmarking: Define the queries
      Instantiation of archive queries in AnQL
      - Mat(Q,V1)
      - Diff(Q,V1,V2)
      - Ver(Q)
      - join(Q1,v1,Q2,v2): SELECT * WHERE { {Q :[v1]} {Q :[v2]} }
      - Change(Q)
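A cross-version join evaluates one pattern in one version, another pattern in a second version, and combines the results on their shared variables. The Python sketch below assumes that reading, with variables written as strings starting with `?`; the store reuses the archiving-policies example and the names `bind`, `evaluate` and `join` are hypothetical.

```python
# Timestamp-based store: triple -> set of versions in which it holds.
TB = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}


def bind(pattern, triple):
    """Return the variable bindings if the triple matches, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.get(term, value) != value:
                return None  # same variable bound to two different values
            bindings[term] = value
        elif term != value:
            return None
    return bindings


def evaluate(tb, pattern, version):
    """All bindings of a pattern within a single version."""
    found = []
    for triple, versions in tb.items():
        if version in versions:
            b = bind(pattern, triple)
            if b is not None:
                found.append(b)
    return found


def join(tb, q1, v1, q2, v2):
    """join(Q1, v1, Q2, v2): nested-loop join on the shared variables."""
    results = []
    for b1 in evaluate(tb, q1, v1):
        for b2 in evaluate(tb, q2, v2):
            if all(b1[k] == b2[k] for k in b1.keys() & b2.keys()):
                results.append({**b1, **b2})
    return results


# Students of ex:C1 in V1 who appear as professors in V3.
print(join(TB, ("?x", "ex:study", "ex:C1"), 1, ("?c", "ex:hasProfessor", "?x"), 3))
```

The nested loop is only meant to show the semantics; a real system would use its indexes instead.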

  31. Benchmarking: Define the queries
      Instantiation of archive queries in AnQL
      - Mat(Q,V1)
      - Diff(Q,V1,V2)
      - Ver(Q)
      - join(Q1,vi,Q2,vj)
      - Change(Q), returns consecutive versions in which Diff of a query is not null:
          SELECT ?V1 ?V2 WHERE {
            { {Q :?V1} MINUS {Q :?V2} }
            UNION
            { {Q :?V2} MINUS {Q :?V1} }
            FILTER( abs(?V1-?V2) = 1 )
          }
      Open question remains: What is the right query syntax for archive queries?
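One way to read the change query is as a scan over consecutive version pairs that keeps those where the results of Q differ. The Python sketch below follows that reading over the archiving-policies example. It assumes integer version numbers, limits Q to one triple pattern, and reports each pair once in ascending order, which simplifies the AnQL formulation; `TB`, `mat` and `change` are hypothetical names.

```python
# Timestamp-based store: triple -> set of versions in which it holds.
TB = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}


def mat(tb, pattern, version):
    """Results of a triple pattern (None = wildcard) in one version."""
    return {t for t, versions in tb.items()
            if version in versions
            and all(p is None or p == x for p, x in zip(pattern, t))}


def change(tb, pattern):
    """Change(Q): consecutive version pairs whose results for Q differ."""
    versions = sorted(set().union(*tb.values()))
    return [(v1, v2) for v1, v2 in zip(versions, versions[1:])
            if mat(tb, pattern, v1) != mat(tb, pattern, v2)]


print(change(TB, ("ex:C1", "ex:hasProfessor", None)))  # professors of ex:C1
print(change(TB, (None, "ex:study", "ex:C1")))         # students of ex:C1
```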

  32. Time-based access. Queries: Materialize (s,?,? ; version)

  33. Time-based access. Queries: diff(?,?,o ; version 0 ; version t)
