SPARQLytics: Multidimensional Analytics for RDF, by Michael Rudolf


  1. SPARQLytics: Multidimensional Analytics for RDF
     Michael Rudolf
     Database Technology Group, Technische Universität Dresden
     March 8, 2017

  2. Agenda
     Motivation
     RDF and SPARQL
     Multidimensional Analytics for RDF

  3. Motivation

  4. Focus of Interest
     Focus moved from single entity (OLTP)
         Bookkeeping
         Where is what?
     To aggregations over sets of entities of the same kind (OLAP)
         Reporting
         What are the sales figures?
     To connections between entities
         Who likes what and why?
         What do the friends of your customers buy?

  5. Business Use Cases
     Supply Chain Management
         Transportation & logistics: routing, tendering, tracking, auditing, payment
     Source: http://787updates.newairplane.com/787-Suppliers/World-Class-Supplier-Quality

  6. Business Use Cases
     Supply Chain Management
         Transportation & logistics: routing, tendering, tracking, auditing, payment
     Track & Trace
         Pinpoint product recalls
         Mandated by law for certain industries (e.g. pharmaceuticals, food, waste)
         EU Commission’s Rapid Alert System
             Year    non-food (RAPEX)    food & feed (RASFF)
             2013    2364                3137
             2014    2435                3157
     Source: http://787updates.newairplane.com/787-Suppliers/World-Class-Supplier-Quality

  7. RDF and SPARQL

  8. Resource Description Framework (RDF) [WLC14]
     Subjects name an entity
     Predicates describe the relationship
     Objects can be literals or name an entity
     No built-in schema
     Can re-use vocabularies and ontologies
     Suitable for inferencing facts

     [Figure: example RDF graph connecting customers, orders, products, categories, and ratings through edges such as "ordered", "contains", "in", "part of", "rates", "likes", "authors", and "records"]

         @prefix amazon: <http://www.amazon.com/#> .
         @prefix customer: <http://www.amazon.com/customer#> .
         @prefix product: <http://www.amazon.com/product#> .
         @prefix category: <http://www.amazon.com/category#> .

         product:1 amazon:capacity "64 GB" .
         product:1 amazon:color "black" .
         product:1 amazon:in category:7 .
         category:7 amazon:name "Tablets" .
         category:7 amazon:partOf category:6 .
         category:6 amazon:name "Computers & Accessories" .
         user:8 amazon:country "FR" .
         user:8 amazon:rates product:1 .
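The subject/predicate/object structure described on this slide can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the triples are copied from the slide as plain strings, and the `match` helper is a hypothetical name introduced here, with `None` acting as a wildcard.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The eight triples from the slide, written as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("product:1", "amazon:capacity", "64 GB"),
    ("product:1", "amazon:color", "black"),
    ("product:1", "amazon:in", "category:7"),
    ("category:7", "amazon:name", "Tablets"),
    ("category:7", "amazon:partOf", "category:6"),
    ("category:6", "amazon:name", "Computers & Accessories"),
    ("user:8", "amazon:country", "FR"),
    ("user:8", "amazon:rates", "product:1"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every given position (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Everything stated about product:1.
about_product = match(TRIPLES, s="product:1")

# Which subjects rate product:1?
raters = [s for s, _, _ in match(TRIPLES, p="amazon:rates", o="product:1")]
```

Because there is no built-in schema, nothing stops a new predicate from being added for a single subject; the matcher treats every triple the same way.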

  9. SPARQL Protocol and RDF Query Language [HS13]
     Built around pattern matching, produces pattern variable bindings
     Grouping and aggregation, CRUD operations
     No multidimensional concepts ➔ complex and error-prone queries

         PREFIX amazon: <http://www.amazon.com/#>
         PREFIX category: <http://www.amazon.com/category#>
         SELECT (AVG(?capacity) AS ?avgCap) (?name AS ?categoryName)
         WHERE {
           ?product amazon:in ?category .
           ?category amazon:name ?name .
           ?category amazon:partOf+ category:6 .
           ?product amazon:capacity ?capacity
         }
         GROUP BY ?name
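To make the moving parts of this query explicit, here is a hand-written Python evaluation of the same idea: follow `amazon:partOf` transitively (the `+` path), join products to categories, group by category name, and average. It is a sketch under stated assumptions: `product:2`, `product:3`, `category:5`, and `category:4` are hypothetical data added for illustration, and capacities such as "64 GB" are reduced to their leading number, since an average needs numeric values.

```python
from collections import defaultdict
from statistics import mean

TRIPLES = [
    # Taken from the RDF slide.
    ("product:1", "amazon:in", "category:7"),
    ("product:1", "amazon:capacity", "64 GB"),
    ("category:7", "amazon:name", "Tablets"),
    ("category:7", "amazon:partOf", "category:6"),
    ("category:6", "amazon:name", "Computers & Accessories"),
    # Hypothetical additions, for illustration only.
    ("product:2", "amazon:in", "category:7"),
    ("product:2", "amazon:capacity", "32 GB"),
    ("product:3", "amazon:in", "category:5"),
    ("product:3", "amazon:capacity", "16 GB"),
    ("category:5", "amazon:name", "Phones"),
    ("category:5", "amazon:partOf", "category:4"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def ancestors(category):
    """Categories reachable via one or more amazon:partOf steps."""
    seen, stack = set(), objects(category, "amazon:partOf")
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(objects(current, "amazon:partOf"))
    return seen


def avg_capacity_by_category(root):
    """Average capacity per category name, restricted to categories below root."""
    groups = defaultdict(list)
    for product, predicate, category in TRIPLES:
        if predicate != "amazon:in" or root not in ancestors(category):
            continue
        for name in objects(category, "amazon:name"):
            for capacity in objects(product, "amazon:capacity"):
                # Assumption: the unit is always GB, so keep only the number.
                groups[name].append(float(capacity.split()[0]))
    return {name: mean(values) for name, values in groups.items()}


result = avg_capacity_by_category("category:6")
```

Even this small example needs a traversal, a join, and a grouping step, which is the kind of bookkeeping the slide calls complex and error-prone when written by hand.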

  10. Multidimensional Analytics for RDF

  11. Multidimensional Data Model [KR13]
      (Base) Facts
          Describe events and measurements
          Mostly numeric and continuous
      Dimensions
          Provide context for facts
          If numeric, then often discrete
          Can embody structure
      Measures
          Are computed from grouped facts
          Are “arranged” in (hyper-)cubes

  12. Multidimensional Data Model [KR13]
      (Base) Facts
          Describe events and measurements
          Mostly numeric and continuous
      Dimensions
          Provide context for facts
          If numeric, then often discrete
          Can embody structure
      Measures
          Are computed from grouped facts
          Are “arranged” in (hyper-)cubes
      [Figure: cube operations Slice, Dice, Drill-down, Roll-up]

  13. Multidimensional Data Model [KR13]
      (Base) Facts
          Describe events and measurements
          Mostly numeric and continuous
      Dimensions
          Provide context for facts
          If numeric, then often discrete
          Can embody structure
      Measures
          Are computed from grouped facts
          Are “arranged” in (hyper-)cubes
      [Figure: cube operations Slice, Dice, Drill-down, Roll-up]
      [Figure: Star schema and Snowflake schema]
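The vocabulary on this slide (facts, dimensions, measures, and the four cube operations) can be illustrated with a small Python sketch. The sales facts, the field names, and the helper functions below are hypothetical and chosen only to show how a measure is computed from grouped facts.

```python
from collections import defaultdict

# Hypothetical base facts: one sale each, with a Location dimension
# (city -> country), a Time dimension (year), and a numeric amount.
FACTS = [
    {"city": "Dresden", "country": "DE", "year": 2013, "amount": 10.0},
    {"city": "Berlin", "country": "DE", "year": 2013, "amount": 20.0},
    {"city": "Berlin", "country": "DE", "year": 2014, "amount": 30.0},
    {"city": "Paris", "country": "FR", "year": 2014, "amount": 40.0},
]


def aggregate(facts, levels):
    """Measure: total amount per combination of the given dimension levels."""
    cube = defaultdict(float)
    for fact in facts:
        cube[tuple(fact[level] for level in levels)] += fact["amount"]
    return dict(cube)


def slice_(facts, level, value):
    """Slice: fix one dimension level to a single member."""
    return [f for f in facts if f[level] == value]


def dice(facts, **allowed):
    """Dice: restrict several dimension levels to sets of members."""
    return [f for f in facts
            if all(f[level] in members for level, members in allowed.items())]


by_city = aggregate(FACTS, ["city", "year"])        # fine granularity (drill-down view)
by_country = aggregate(FACTS, ["country", "year"])  # roll-up from city to country
de_only = aggregate(slice_(FACTS, "country", "DE"), ["year"])
diced = aggregate(dice(FACTS, country={"DE"}, year={2014}), ["city"])
```

Roll-up and drill-down differ only in which level of the same dimension is used for grouping, which is why a dimension that "can embody structure" matters.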

  14. From Intensional to Extensional Analytics
      Data Transformation
          Intension fixed by domain expert or metadata
          Import data using ETL process
      [Figure labels: User, Intension, MD Query, ETL, Data Warehouse]

  15. From Intensional to Extensional Analytics
      Data Transformation
          Intension fixed by domain expert or metadata
          Import data using ETL process
      Query Generation
          Intension fixed by metadata
          Generate SPARQL queries from model
      [Figure labels: User, Intension, MD Query, ETL, Data Warehouse, Graph Query, MD Model, ...]

  16. From Intensional to Extensional Analytics
      Data Transformation
          Intension fixed by domain expert or metadata
          Import data using ETL process
      Query Generation
          Intension fixed by metadata
          Generate SPARQL queries from model
      Extensional
          Intension not fixed up-front
          Generate graph queries from user-specified intension
      [Figure labels: User, Intension, Intension & MD Query, MD Query, ETL, Data Warehouse, Graph Query, MD Model, ..., Time]

  17. SPARQLytics for the Data Enthusiast
      SPARQLytics Workflow
      [Figure labels: User, DSL Commands, Result, Artifacts Repository (Fact Message, Dimension Time, Dimension Location, Cube Postings, ...), Query Generator, Query, SPARQL endpoint]

  18. SPARQLytics for the Data Enthusiast
      SPARQLytics Workflow
      [Figure labels: User, DSL Commands, Result, Artifacts Repository (Fact Message, Dimension Time, Dimension Location, Cube Postings, ...), Query Generator, Query, SPARQL endpoint]
      1. Create artifacts in repository
      Example

          USING REPOSITORY "myrepo";
          SELECT FACTS {
            ?person rdf:type snvoc:Person ;
                    snvoc:birthday ?birthday .
            FILTER (YEAR(NOW()) - YEAR(?birthday) >= 18)
          };
          DEFINE DIMENSION "Location" FROM (
            ?person snvoc:isLocatedIn ?city .
            ?city snvoc:isPartOf ?country .
            ?country snvoc:isPartOf ?continent .
          ) WITH (
            LEVEL "City" AS ?city,
            LEVEL "Country" AS ?country,
            LEVEL "Continent" AS ?continent
          );
          DEFINE MEASURE "Avg. No. Languages" AS COUNT(DISTINCT ?language) WHERE (
            ?person snvoc:speaks ?language
          ) WITH "AVG";
          CREATE CUBE "QB" FROM "Location", ... WITH "Avg. No. Languages", ...;
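How artifacts like these could be turned into a SPARQL query can be sketched as follows. This is not the SPARQLytics query generator, whose output the slides do not show; it is a minimal illustration assuming that the measure expression is evaluated once per fact in an inner query and that the "AVG" aggregate is applied per level member in an outer query. The `Dimension` and `Measure` classes, `generate_query`, and the `?m` and `?value` variables are hypothetical, and PREFIX declarations are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Dimension:
    name: str
    pattern: str            # graph pattern from DEFINE DIMENSION ... FROM ( ... )
    levels: Dict[str, str]  # level name -> variable, finest level first


@dataclass
class Measure:
    name: str
    expression: str  # per-fact expression, e.g. COUNT(DISTINCT ?language)
    pattern: str     # graph pattern from the WHERE ( ... ) clause
    aggregate: str   # aggregate applied across facts, e.g. AVG


def generate_query(fact_var: str, fact_pattern: str,
                   dim: Dimension, level: str, measure: Measure) -> str:
    """Inner SELECT: one measure value per fact. Outer SELECT: aggregate per level member."""
    level_var = dim.levels[level]
    inner = (
        f"    SELECT {fact_var} {level_var} ({measure.expression} AS ?m)\n"
        f"    WHERE {{\n"
        f"      {fact_pattern}\n"
        f"      {dim.pattern}\n"
        f"      {measure.pattern}\n"
        f"    }}\n"
        f"    GROUP BY {fact_var} {level_var}"
    )
    return (
        f"SELECT {level_var} ({measure.aggregate}(?m) AS ?value)\n"
        f"WHERE {{\n  {{\n{inner}\n  }}\n}}\n"
        f"GROUP BY {level_var}"
    )


location = Dimension(
    name="Location",
    pattern=("?person snvoc:isLocatedIn ?city . "
             "?city snvoc:isPartOf ?country . "
             "?country snvoc:isPartOf ?continent ."),
    levels={"City": "?city", "Country": "?country", "Continent": "?continent"},
)
languages = Measure(
    name="Avg. No. Languages",
    expression="COUNT(DISTINCT ?language)",
    pattern="?person snvoc:speaks ?language .",
    aggregate="AVG",
)
facts = ("?person rdf:type snvoc:Person ; snvoc:birthday ?birthday . "
         "FILTER (YEAR(NOW()) - YEAR(?birthday) >= 18)")

query = generate_query("?person", facts, location, "Country", languages)
print(query)
```

One consequence of this particular design is that persons without any `snvoc:speaks` triple drop out of the average; a real generator might handle that case differently.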

  19. SPARQLytics for the Data Enthusiast
      SPARQLytics Workflow
      [Figure labels: User, DSL Commands, Result, Artifacts Repository (Fact Message, Dimension Time, Dimension Location, Cube Postings, ...), Query Generator, Query, SPARQL endpoint]
      1. Create artifacts in repository
      2. Start session re-using artifacts
      Example

          USING CUBE "QB" OVER <http://localhost:3030/ds/sparql>;
          SLICE("Location", "Country", dbpedia:Italy);
          COMPUTE ("Avg. No. Languages");

  20. SPARQLytics for the Data Enthusiast
      SPARQLytics Workflow
      [Figure labels: User, DSL Commands, Result, Artifacts Repository (Fact Message, Dimension Time, Dimension Location, Cube Postings, ...), Query Generator, Query, SPARQL endpoint]
      1. Create artifacts in repository
      2. Start session re-using artifacts
      3. Iteratively explore data, optionally create additional artifacts
      Example

          USING CUBE "QB" OVER <http://localhost:3030/ds/sparql>;
          SLICE("Location", "Country", dbpedia:Italy);
          COMPUTE ("Avg. No. Languages");
          RESET FILTER("Location", "Country");
          ROLLUP("Location", 1);
          COMPUTE ("Avg. No. Languages");
          ...
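The session on this slide is stateful: each COMPUTE depends on the SLICE, RESET FILTER, and ROLLUP commands issued before it. The Python sketch below models that state only. It rests on two assumptions that the slides do not confirm: a session starts at the finest level of each dimension, and `ROLLUP("Location", 1)` moves one level coarser. The `Session` class and its methods are hypothetical names.

```python
from typing import Dict, List, Tuple


class Session:
    """Tracks granularity and filters between COMPUTE commands."""

    def __init__(self, levels: Dict[str, List[str]]) -> None:
        self.levels = levels  # dimension name -> level names, finest first
        # Assumption: every dimension starts at its finest level.
        self.granularity = {dim: 0 for dim in levels}
        self.filters: Dict[Tuple[str, str], str] = {}

    def slice(self, dim: str, level: str, member: str) -> None:
        self.filters[(dim, level)] = member

    def reset_filter(self, dim: str, level: str) -> None:
        self.filters.pop((dim, level), None)

    def rollup(self, dim: str, steps: int) -> None:
        # Assumption: rolling up moves towards coarser levels and stops at the top.
        top = len(self.levels[dim]) - 1
        self.granularity[dim] = min(self.granularity[dim] + steps, top)

    def compute(self, measure: str) -> dict:
        """Describe what a generated query would have to group and filter by."""
        group_by = [self.levels[dim][i] for dim, i in self.granularity.items()]
        return {"measure": measure, "group_by": group_by, "filters": dict(self.filters)}


# Replaying the commands from the slide.
session = Session({"Location": ["City", "Country", "Continent"]})
session.slice("Location", "Country", "dbpedia:Italy")
first = session.compute("Avg. No. Languages")
session.reset_filter("Location", "Country")
session.rollup("Location", 1)
second = session.compute("Avg. No. Languages")
```

Under these assumptions the first COMPUTE groups by city within Italy, and the second groups by country with no filter.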

  21. Summary
      Big Graph Data
          Not just social networks, also business scenarios
          Not enough data scientists, enable data enthusiasts
      RDF and SPARQL
          Linked Open Data is a rich source of information
          SPARQL does not expose multidimensional concepts
      SPARQLytics
          Re-use core SPARQL elements for defining multidimensional model
          Generate complex SPARQL queries from analytical session
          Stateful approach integrates well with data enthusiasts' workflow

  22. Additional Material & References

  23. References I
      Charu C. Aggarwal and Haixun Wang. A Survey of Clustering Algorithms for Graph Data. In Charu C. Aggarwal and Haixun Wang, editors, Managing and Mining Graph Data, volume 40 of Advances in Database Systems, chapter 9, pages 275–301. Springer US, 2010.
      Seyed-Mehdi-Reza Beheshti, Boualem Benatallah, Hamid Reza Motahari-Nezhad, and Mohammad Allahbakhsh. A framework and a language for on-line analytical processing on graphs. In Proceedings of the 13th International Conference on Web Information Systems Engineering (WISE), volume 7651 of Lecture Notes in Computer Science, pages 213–227. Springer, 2012.
      Peter Boncz. LDBC: Benchmarks for Graph and RDF Data Management. In Proc. IDEAS, pages 1–2. ACM, 2013.
      Fabio Crestani. Application of spreading activation techniques in information retrieval. Artificial Intelligence Review, 11(6):453–482, December 1997.
      Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, and Philip S. Yu. Graph OLAP: Towards Online Analytical Processing on Graphs. In Proceedings of the 8th International Conference on Data Mining, pages 103–112. IEEE, December 2008.
      Hartmut Ehrig, Gregor Engels, Hans-Jörg Kreowski, and Grzegorz Rozenberg, editors. Handbook of Graph Grammars and Computing by Graph Transformation: Applications, Languages and Tools, volume 2. World Scientific, 1997.
