COMP60411: Modelling Data on the Web
Graphs, RDF, RDFS, SPARQL
Week 5
Bijan Parsia & Uli Sattler, University of Manchester

Feedback on SE3
In 200-300 words, explain [ … ] In particular, explain which style of query is the "most robust" in the face of such format changes. (As usual, if you are unsure whether you understand the exact meaning of a term, e.g., 'robust', you should look it up.)
Wikipedia: In computer science, robustness is the ability of a computer system to cope with errors during execution. …
• only a few discussed robustness!
  – many mentioned which style requires which changes
  – but few discussed how that affects
    • likelihood of errors
    • which kind of errors (silent/breaking totally)
• many confused format with schema
  – but they are different concepts!

Feedback on SE3
• mostly better :)
• I see clear improvements in most students!
• an XPath expression is an XQuery query
• some still make things up:
  – “X is mostly used for Y”
  – “X is better for efficiency than Y”
  – “Using X makes processing faster”
  – …
  statements like this require evidence/reference: “According to [3], X is mostly used for Y”.
• consider your situations carefully:
  – do we need to update schema?
    • if yes, …
    • if no, …

Formats for ExtRep of data (SE4)
• a format (e.g., for occupancy of houses) consists of
  1. a data structure formalism (csv, table, XML, JSON, …)
  2. a conceptual model, independent of [1]
  3. schema(s) formalising/describing the format
     • documents describing (some aspects of our) design
     • e.g., occupancy.rnc, occupancy.sch, …
  4. the set of (XML) documents conforming to a format
     • concrete embodiments of our design
     • e.g., an XML document describing Smiths, HighBrow, …
• [2&3] the CM & schema can be
  • explicit/tangible or implicit
    • written down in a note versus ‘in our head’ or by example
  • formalised or unformalised
    • ER-Diagram, XSD versus drawing, description in English
• [4] the documents are implicit

Formats for ExtRep of data (SE4)
[Figure, e.g., for an XML-based format, with the labels: all XML docs; docs conforming to our schema S; docs in your format]

Formats for ExtRep of data (SE4)
• Consider 2 formats
  F1 = <DS1, CM1, S1, D1>
  F2 = <DS2, CM2, S2, D2>
• it may be that
  • S1 only captures some aspects of D1
  • S1 is only a description in English
  • D1 = D2 but S1 ≠ S2
  • DS1 = DS2 and CM1 = CM2 but S1 ≠ S2 and D1 ≠ D2
  • … and that F1 makes better use of DS1’s features than DS2
• When you design a format, you design each of its aspects and
  – how much you make explicit
  – how you formalise CM, S

Today
• General concepts: recap of
  – data models
  – pain points
  – formats
  – error handling
  – schemas, …
• New data model & technologies: graph-based DM
  – RDF
  – RDFS, a schema language for RDF
    • but quite different from all other schema languages
  – SPARQL, a data manipulation mechanism for RDF
• Retrospective session

Recap of Data Models

Recall: core concepts
• We look at data models,
  • shape: none, tables, trees, graphs, …
• and data structure formalisms for the above
  – [tables] csv files, SQL tables
  – [trees] sets of feature-value pairs, XML, JSON
  – [graphs] RDF
• and schema languages for the above
  – [SQL tables] SQL
  – [XML] RelaxNG, XSD, Schematron, …
  – [JSON] JSON Schema
• and manipulation mechanisms
  – [SQL tables] SQL
  – [XML] DOM, SAX, XQuery, …
  – [JSON] JSON API, …
[Figure: levels of data, e.g., tree, token, character, bit]

Recall: core concepts
• Each Data Model was motivated by
  – representational needs of some domain and
  – pain points
• Fundamental Pain Points
  – Mismatch between the domain and the data structure
• Tech-specific Pain Points
  – XPath Limitations
• Alleviating pain
  – Try to squish it in
    • E.g., encoding trees in SQL
    • E.g., layering
  – Polyglot persistence
    • Use multiple data models
It’s important to understand the
  – pain points &
  – trade offs

Domains/applications discussed so far
• People, addresses, personal data
  – with(out) management structure
• SwissProt protein data
• Cartoons
• Arithmetic expressions
  – [CW1] easy, binary expressions with students, attempts, etc.
  – [CW2, CW3] nested expressions of varying parity
• Horse sharing
  – as an example for ‘sharing’ applications
  – e.g., AirBnB, MoBike, ride shares

1st DM: Flat File
• Domain: People, addresses, personal data
  • in 1 (flat) csv file
• Pain Points:
  • variable numbers of the "same" attribute
    • phone number
    • email address
    • …
  • inserting columns is painful
  • partial columns/NULL values aren’t great
  • companies have addresses
    – more than one!
    – and phone numbers, etc.
No data integrity guarantee!
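The "variable numbers of the same attribute" pain point can be made concrete with a small sketch. The file contents, column names (`phone1`, `phone2`) and helper names below are hypothetical, chosen only for illustration; they are not from the course material.

```python
import csv
import io

# Hypothetical flat file: one row per person, a fixed number of phone columns.
FLAT_CSV = """name,city,phone1,phone2
Ann Example,Manchester,0161 000 0001,0161 000 0002
Bob Builder,London,020 0000 0003,
"""

def load_people(text):
    """Parse the flat file into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def phones(row):
    """Collect the non-empty phoneN cells of a row."""
    return [v for k, v in row.items() if k.startswith("phone") and v]

people = load_people(FLAT_CSV)

# Bob has only one phone, so his phone2 cell is empty (a stand-in for NULL).
print(phones(people[1]))

# A third phone number for Ann has nowhere to go unless a phone3 column is
# added to the header and to every other row.
print(phones(people[0]))
```

Every consumer of the file has to know the `phoneN` naming convention and has to treat empty cells specially, which is one way the flat layout leaks into application code.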

From Flat File towards 2nd DM: Relational
• Better Format
  • 2 (flat) csv files
• Pain Points:
  • sorting destroys the relationship
    • we used row numbers to connect the 2 files
    • sorting changes the row number!
  • hard to see the record
  • no longer a flat file
  • CSV format makes assumptions
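A minimal sketch of how sorting breaks a row-number link between two files. The two "files" are modelled as in-memory lists, and the names and postcode placeholders (`PC-A`, `PC-B`) are made up for illustration.

```python
# Two "files" as lists of rows; addresses point at people by row number.
people = [["Smith", "Zoe"], ["Jones", "Adam"]]   # row 0, row 1
addresses = [[0, "PC-A"], [1, "PC-B"]]           # [person_row, postcode]

def postcode_of(people_rows, address_rows, last_name):
    """Look up a postcode by following the row-number link."""
    row = next(i for i, p in enumerate(people_rows) if p[0] == last_name)
    return next(a[1] for a in address_rows if a[0] == row)

before = postcode_of(people, addresses, "Smith")

# Sorting by last name moves Jones to row 0 and Smith to row 1,
# but the addresses file still refers to the old row numbers.
people_sorted = sorted(people)
after = postcode_of(people_sorted, addresses, "Smith")

print(before, after)
```

Nothing fails here: the lookup silently returns the wrong postcode after the sort, which is why a stable key (rather than a position) is needed to connect the two files.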

2nd DM: Relational Model for Addresses
• M1
  1. Design a conceptual model for this domain
  2. normalise it
  3. create different tables for suitable aspects of this domain
  4. linked via “foreign keys” offered by relational formalism
➡ no more pain points:
• this domain fits our “table” relational data model (RDM) nicely
• RDM also comes with a suitable
  • data manipulation language (SQL) for
    • querying
    • sorting
    • inserting tuples
  • schema language for
    • constraining values
    • expressing functional/key constraints
And with data integrity guarantee!
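As a rough illustration of the "data integrity guarantee", the sketch below uses SQLite through Python's standard `sqlite3` module. The `Person` and `Phone` tables and their contents are assumptions for this example, not the schema used in the course. Note that SQLite only checks foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical normalised design: phone numbers live in their own table.
conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Phone (
        PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
        Number   TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Person VALUES (1, 'Ann Example')")
conn.executemany(
    "INSERT INTO Phone VALUES (?, ?)",
    [(1, "0161 000 0001"), (1, "0161 000 0002")],  # any number of phones per person
)

def try_insert_orphan(connection):
    """Try to add a phone number for a person that does not exist."""
    try:
        connection.execute("INSERT INTO Phone VALUES (99, '020 0000 0003')")
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = try_insert_orphan(conn)
phone_count = conn.execute(
    "SELECT COUNT(*) FROM Phone WHERE PersonID = 1"
).fetchone()[0]
print(orphan_accepted, phone_count)
```

The key replaces the row number as the link between tables, and the database, not the application, rejects a phone number that refers to nobody.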

From Relational to XML (1)
• Domain: People, addresses, management structure
  • in relational/SQL tables
• 2 Pain points:
  1. (cumbersome) querying - it requires (too) many joins!
     Complicated to write/maintain queries
  2. (nigh impossible) ensuring integrity - unbounded ‘manages’ paths require recursive queries/joins to avoid cyclic management structure

Employees
  Employee ID   Postcode   City         …
  1234123       M16 0P2    Manchester   …
  1234124       M2 3OZ     Manchester   …
  1234567       SW1 A      London       …
  ...           ...        ...          ...

Management
  Manager ID   ManageeID
  1234124      1234123
  1234567      1234124
  1234123      1234567
  ...          ...
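To show what "recursive queries" means here, the following sketch loads the three Management rows from the slide into SQLite and uses a recursive common table expression to find employees who, directly or indirectly, manage themselves. The CTE name `Reach` and the variable names are assumptions for this example; using `UNION` rather than `UNION ALL` discards duplicate pairs, so the recursion terminates even on cyclic data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Management (ManagerID INTEGER, ManageeID INTEGER)")
conn.executemany(
    "INSERT INTO Management VALUES (?, ?)",
    [(1234124, 1234123), (1234567, 1234124), (1234123, 1234567)],
)

# Reach holds every (manager, managee) pair connected by a 'manages' path
# of any length; a pair (x, x) means x sits on a management cycle.
CYCLE_QUERY = """
WITH RECURSIVE Reach(ManagerID, ManageeID) AS (
    SELECT ManagerID, ManageeID FROM Management
    UNION
    SELECT r.ManagerID, m.ManageeID
    FROM Reach r JOIN Management m ON r.ManageeID = m.ManagerID
)
SELECT DISTINCT ManagerID
FROM Reach
WHERE ManagerID = ManageeID
ORDER BY ManagerID
"""

in_cycle = [row[0] for row in conn.execute(CYCLE_QUERY)]
print(in_cycle)
```

This only detects a cycle after the rows are already stored; preventing one at insert time would need extra machinery such as triggers, which is part of why the slide calls integrity "nigh impossible" in this setting.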

From Relational to XML (2)
• Domain: Proteins
• Pain points:
  – cumbersome:
    • querying: too many joins!

Protein
  Protein ID   Full Name                Short Name   Organism                  ...
  1234123      Fanconi anemia group J   FACJ         Halorubrum phage          ...
  1234567      ATP-dependent ...        N/A          Gallus gallus / Chicken   ...
  ...          ...                      ...          ...                       ...

Alternative Name
  Protein ID   Alternative Name
  1234123      ATP-dependent RNA helicase BRIP1
  1234123      BRCA1-interacting protein C-terminal helicase 1
  1234123      BRCA1-interacting protein 1
  ...          ...

Genes
  Protein ID   Gene
  1234123      BRIP1
  1234123      BACH1
  1234567      helicase
  ...          ...

Graph-based Data Models

New Domains
• with new requirements:
• Sociality
  – friend-of/knows/likes/acquainted-with/trusts/ …
  – works-with/colleague-of/ …
  – interacts-with/reacts-with/binds-to/activates/ …
  – student-of/fan-of/ …
  – cites
  – …
  – such relationships form social/professional/bio-chemical/academic networks
  – we focus on social here: knows
• How are they different to “manages”?
• How do we capture these?

Draw an ER diagram of social networks involving
• people
• knows

“Knows” in SQL - ER Diagram
simple: [Figure: ER diagram of people and knows]

“Knows” in SQL tables

CREATE TABLE Persons
(
  PersonID int,
  LastName varchar(255),
  FirstName varchar(255),
  Address varchar(255),
  City varchar(255)
);

CREATE TABLE knows
(
  Who int,
  Whom int,
  FOREIGN KEY (Who) REFERENCES Persons(PersonID),
  FOREIGN KEY (Whom) REFERENCES Persons(PersonID)
);

not optimal - remember W1

“Knows” in SQL - Queries (1)

CREATE TABLE Persons
(
  PersonID int,
  LastName varchar(255),
  FirstName varchar(255),
  Address varchar(255),
  City varchar(255)
);

CREATE TABLE knows
(
  Who int,
  Whom int,
  FOREIGN KEY (Who) REFERENCES Persons(PersonID),
  FOREIGN KEY (Whom) REFERENCES Persons(PersonID)
);

“friends of Bob Builder”: How many people does Bob Builder know?

SELECT COUNT(DISTINCT k.Whom)
FROM Persons P, knows k
WHERE ( P.PersonID = k.Who AND
        P.FirstName = 'Bob' AND
        P.LastName = 'Builder' );
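The count query can be tried out with a small, hedged sketch using Python's standard `sqlite3` module. The sample people and `knows` rows are invented for illustration, the foreign keys are written against `PersonID` (the column that actually exists in `Persons`), and the function name `count_known` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (PersonID int, LastName varchar(255),
                          FirstName varchar(255), Address varchar(255),
                          City varchar(255));
    CREATE TABLE knows (Who int, Whom int,
                        FOREIGN KEY (Who) REFERENCES Persons(PersonID),
                        FOREIGN KEY (Whom) REFERENCES Persons(PersonID));
    -- Made-up sample data; note the repeated (1, 3) row.
    INSERT INTO Persons VALUES (1, 'Builder', 'Bob', NULL, NULL),
                               (2, 'Example', 'Ann', NULL, NULL),
                               (3, 'Sample', 'Cat', NULL, NULL);
    INSERT INTO knows VALUES (1, 2), (1, 3), (1, 3), (2, 1);
""")

def count_known(connection, first_name, last_name):
    """Count the distinct people known by the named person."""
    query = """
        SELECT COUNT(DISTINCT k.Whom)
        FROM Persons P, knows k
        WHERE P.PersonID = k.Who
          AND P.FirstName = ?
          AND P.LastName = ?
    """
    return connection.execute(query, (first_name, last_name)).fetchone()[0]

bob_count = count_known(conn, "Bob", "Builder")
print(bob_count)
```

`COUNT(DISTINCT ...)` matters here because nothing in the `knows` table prevents the same pair from being stored twice.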

“Knows” in SQL - Queries (2)

CREATE TABLE Persons
(
  PersonID int,
  LastName varchar(255),
  FirstName varchar(255),
  Address varchar(255),
  City varchar(255)
);

CREATE TABLE knows
(
  Who int,
  Whom int,
  FOREIGN KEY (Who) REFERENCES Persons(PersonID),
  FOREIGN KEY (Whom) REFERENCES Persons(PersonID)
);

Give me the names of Bob Builder’s friends.

SELECT P2.FirstName, P2.LastName
FROM knows k, Persons P1, Persons P2
WHERE ( P1.FirstName = 'Bob' AND
        P1.LastName = 'Builder' AND
        P1.PersonID = k.Who AND
        P2.PersonID = k.Whom );
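A runnable sketch of the names query, again with Python's standard `sqlite3` module and invented sample data. The `ORDER BY` clause is an addition for a predictable result order and is not part of the slide's query; `friend_names` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (PersonID int, LastName varchar(255),
                          FirstName varchar(255), Address varchar(255),
                          City varchar(255));
    CREATE TABLE knows (Who int, Whom int,
                        FOREIGN KEY (Who) REFERENCES Persons(PersonID),
                        FOREIGN KEY (Whom) REFERENCES Persons(PersonID));
    -- Made-up sample data.
    INSERT INTO Persons VALUES (1, 'Builder', 'Bob', NULL, NULL),
                               (2, 'Example', 'Ann', NULL, NULL),
                               (3, 'Sample', 'Cat', NULL, NULL);
    INSERT INTO knows VALUES (1, 2), (1, 3), (2, 1);
""")

def friend_names(connection, first_name, last_name):
    """Return (FirstName, LastName) of everyone the named person knows."""
    query = """
        SELECT P2.FirstName, P2.LastName
        FROM knows k, Persons P1, Persons P2
        WHERE P1.FirstName = ?
          AND P1.LastName = ?
          AND P1.PersonID = k.Who
          AND P2.PersonID = k.Whom
        ORDER BY P2.LastName, P2.FirstName
    """
    return connection.execute(query, (first_name, last_name)).fetchall()

bobs_friends = friend_names(conn, "Bob", "Builder")
print(bobs_friends)
```

The query joins `Persons` to itself through `knows`: one copy (`P1`) finds Bob, the other (`P2`) supplies the names. Each extra hop along `knows` (friends of friends, and so on) would add another pair of joins.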
