Enabling Completeness-aware Querying in SPARQL
Luis Galárraga, Katja Hose, Simon Razniewski
May 14th, 2017, WebDB, Chicago
Outline
● Completeness in RDF knowledge bases
● Completeness oracles
● Our vision
  – Representations for completeness oracles
  – Reasoning with completeness oracles
  – Enabling completeness in SPARQL
● Summary & conclusions
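RDF Knowledge Bases (KBs)
● Collection of structured knowledge
[Figure: example RDF graph with the edges Switzerland officialLanguage Français, Switzerland officialLanguage Italiano, Français family Romance, Italiano family Romance, and Leonhard Euler citizenOf Switzerland]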
Plenty of KBs out there!
KBs in action
Completeness in RDF KBs
● KBs are highly incomplete
  – In YAGO, only 1% of people have a citizenship
● We do not know where the incompleteness lies
  – A person with no spouse in the KB could actually be single, or the KB may be incomplete
● Problems for data producers and consumers
  – Consumers: no completeness guarantees for queries
  – Producers: which parts of the KB need to be populated?
Completeness
● Defined with respect to a query q via a hypothetical complete KB K*.
  – A query q is complete in K if q(K*) ⊆ q(K).
SELECT ?x WHERE { Switzerland officialLang ?x }
[Figure: the KB contains Switzerland officialLanguage Français and Switzerland officialLanguage Italiano. Are these all the official languages of Switzerland?]
[Incomplete query]
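The definition above can be sketched in Python, assuming a KB is modeled as a set of (subject, relation, object) triples and the query is the single-pattern lookup shown on the slide. The ideal KB K* is hypothetical by definition; the contents of `K_star` below, like the helper names, are assumptions made only for illustration.

```python
# Minimal sketch: a KB is a set of (subject, relation, object) triples.
# Helper names and the contents of K_star are illustrative assumptions.

def query_objects(kb, subject, relation):
    """Answer SELECT ?x WHERE { subject relation ?x } over a set of triples."""
    return {o for (s, r, o) in kb if s == subject and r == relation}


def is_complete(kb, kb_star, subject, relation):
    """q is complete in K if q(K*) is a subset of q(K)."""
    ideal_answers = query_objects(kb_star, subject, relation)
    known_answers = query_objects(kb, subject, relation)
    return ideal_answers <= known_answers


# The KB from the slide.
K = {
    ("Switzerland", "officialLanguage", "Français"),
    ("Switzerland", "officialLanguage", "Italiano"),
}

# Hypothetical ideal KB: assumed here to list two further languages.
K_star = K | {
    ("Switzerland", "officialLanguage", "Deutsch"),
    ("Switzerland", "officialLanguage", "Rumantsch"),
}

print(is_complete(K, K_star, "Switzerland", "officialLanguage"))
```

In practice K* is never available, which is why the following slides replace this check with oracles that only guess its outcome.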
Completeness in RDF data
● Wikidata provides "no value" annotations
SELECT ?x WHERE { USA officialLang ?x }
[Figure: USA officialLanguage, annotated as having no value]
[Complete query]
● Not applicable if we know some official language
[Figure: Switzerland officialLanguage Français; Switzerland officialLanguage Italiano]
Completeness oracle
● Boolean function ɷ(q, K) that guesses the completeness of a query q in a KB K.
SR completeness oracle
● Function ɷ that guesses the completeness of queries of the following form [Galárraga et al., 2017]:
SELECT ?x WHERE { subject relation ?x }
● We use the notation ɷ(subject, relation)
● ɷ = pca(s, r) = partial completeness assumption
  – Query is complete in KB if at least one answer is known
Evaluating SR oracles
● ɷ = pca(s, r) = partial completeness assumption
● ɷ = american-country-oracle(s, r)
● Gold standard: complete instances in the domain of officialLanguage
[Figure: entities labeled with their known official languages (Italiano, Français, Dansk), grouped into the gold standard, the PCA oracle, and the American country oracle]
PCA oracle: Precision = 3/5, Recall = 3/4
American country oracle: Precision = 1/2, Recall = 1/4
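The precision and recall figures above follow from comparing the set of subjects an oracle declares complete with the gold standard. The sketch below assumes both are plain Python sets; the entity identifiers `e1` to `e7` are hypothetical placeholders chosen only so that the set sizes match the slide (4 gold instances, 5 PCA predictions with 3 correct, 2 American country predictions with 1 correct).

```python
from fractions import Fraction


def precision_recall(predicted, gold):
    """Precision and recall of the subjects an oracle predicts as complete."""
    true_positives = len(predicted & gold)
    precision = Fraction(true_positives, len(predicted)) if predicted else None
    recall = Fraction(true_positives, len(gold)) if gold else None
    return precision, recall


# Hypothetical entity ids; only the set sizes are taken from the slide.
gold = {"e1", "e2", "e3", "e4"}
pca_predicted = {"e1", "e2", "e3", "e5", "e6"}
american_country_predicted = {"e4", "e7"}

for name, predicted in [
    ("PCA oracle", pca_predicted),
    ("American country oracle", american_country_predicted),
]:
    precision, recall = precision_recall(predicted, gold)
    print(f"{name}: precision = {precision}, recall = {recall}")
```

Exact fractions are used instead of floats so the results can be compared directly with the values on the slide.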
SR completeness oracles
● Closed World Assumption: cwa(s, r) = true
● PCA: pca(s, r) = ∃ o : r(s, o)
● Cardinality: card(s, r) = #(o : r(s, o)) ≥ k
● Popular entities: popularity_pop(s, r) = pop(s)
● No change over time: nochange_chg(s, r) = ∼chg(s, r)
● Star: star_{r_1,..,r_n}(s, r) = ∀ i ∊ {1,..,n} : ∃ o : r_i(s, o)
● Class: class_c(s, r) = type(s, c)
● Rule mining oracle
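Several of the oracles listed above depend only on the KB itself and can be sketched directly, assuming the KB is a set of (subject, relation, object) triples and that class membership is stored under a relation named `type`. The function signatures and the example triples are illustrative assumptions. The popularity and no-change oracles are left out because they rely on external signals (pop and chg) that the slides do not define.

```python
# Minimal sketch of KB-only SR oracles over a set of (s, r, o) triples.
# Signatures and example data are assumptions, not the authors' code.

def cwa(kb, s, r):
    """Closed World Assumption: every query is considered complete."""
    return True


def pca(kb, s, r):
    """Partial completeness assumption: complete if some object is known."""
    return any(s2 == s and r2 == r for (s2, r2, _) in kb)


def card(kb, s, r, k):
    """Cardinality oracle: complete if at least k objects are known."""
    return sum(1 for (s2, r2, _) in kb if s2 == s and r2 == r) >= k


def star(kb, s, r, relations):
    """Star oracle: complete if s has some object for every listed relation."""
    return all(pca(kb, s, ri) for ri in relations)


def class_oracle(kb, s, r, c):
    """Class oracle: complete if s is an instance of class c."""
    return (s, "type", c) in kb


kb = {
    ("Switzerland", "officialLanguage", "Français"),
    ("Switzerland", "officialLanguage", "Italiano"),
    ("Switzerland", "type", "Country"),
    ("USA", "type", "Country"),
}

print(pca(kb, "Switzerland", "officialLanguage"))
print(pca(kb, "USA", "officialLanguage"))
print(card(kb, "Switzerland", "officialLanguage", k=3))
```

Note that the PCA oracle declares the Switzerland query complete even though the earlier slide marks it as incomplete, which is the kind of error the precision measure captures.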
Rule mining SR oracle
● Based on completeness rules
  notype(x, Adult), type(x, Person) ⇒ complete(x, hasChild)
  dateOfDeath(x, y), lessThan_1(x, placeOfDeath) ⇒ incomplete(x, placeOfDeath)
● Learned using the AMIE [Galárraga et al., 2013] rule mining system
  – On a gold standard built via crowdsourcing
  – 100% F1-measure for functional relations, quite good for the relations hasChild and graduatedFrom
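The two example rules can be applied, though not mined, with a few lines of Python. This sketch hard-codes the rules rather than learning them with AMIE, models the KB as a set of (subject, relation, object) triples, and reads the cardinality atom lessThan 1 (x, placeOfDeath) as "x has fewer than one known placeOfDeath"; that reading, the function names, and the example entities are assumptions. Each rule returns True (predicted complete), False (predicted incomplete), or None when it does not fire.

```python
# Minimal sketch: applying two hand-written completeness rules.
# The rules are hard-coded here; no rule mining takes place.

def objects(kb, s, r):
    """All known objects o such that r(s, o) is in the KB."""
    return {o for (s2, r2, o) in kb if s2 == s and r2 == r}


def rule_has_child(kb, x):
    """notype(x, Adult), type(x, Person) => complete(x, hasChild)"""
    types = objects(kb, x, "type")
    if "Person" in types and "Adult" not in types:
        return True  # predicted complete
    return None  # rule does not fire


def rule_place_of_death(kb, x):
    """dateOfDeath(x, y), lessThan_1(x, placeOfDeath) => incomplete(x, placeOfDeath)"""
    has_death_date = bool(objects(kb, x, "dateOfDeath"))
    if has_death_date and len(objects(kb, x, "placeOfDeath")) < 1:
        return False  # predicted incomplete
    return None  # rule does not fire


# Hypothetical example entities.
kb = {
    ("Alice", "type", "Person"),
    ("Bob", "type", "Person"),
    ("Bob", "type", "Adult"),
    ("Bob", "dateOfDeath", "1990-01-01"),
}

print(rule_has_child(kb, "Alice"))
print(rule_place_of_death(kb, "Bob"))
```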
Representing completeness oracles
● Extensional approach [Darari et al., 2013]
  – An oracle is a collection of completeness statements about queries
SELECT DISTINCT ?y WHERE { ?x hasOfficialLang ?y } is complete in the KB
[Figure: RDF representation of the completeness statement. The statement has a pattern (hasPattern) with subject ?x, predicate hasOfficialLang, and object ?y; ?x and ?y are each of type Variable; the statement has the projection variable ?y (hasProjectionVariable) and distinct = true]
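As a rough illustration of the extensional approach, an oracle can be held as a set of completeness statements and consulted by lookup. The sketch below assumes each statement covers a single triple pattern with one projection variable, as in the example above; the class and field names are hypothetical, and the exact-match lookup is a deliberate simplification that performs no reasoning over statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessStatement:
    """One single-pattern query that is declared complete in the KB."""
    subject: str
    predicate: str
    obj: str
    projection_variable: str
    distinct: bool


def is_declared_complete(statements, query):
    """Exact-match lookup only; no reasoning over statements is attempted."""
    return query in statements


# SELECT DISTINCT ?y WHERE { ?x hasOfficialLang ?y } is complete in the KB
oracle = {
    CompletenessStatement("?x", "hasOfficialLang", "?y", "?y", True),
}

query = CompletenessStatement("?x", "hasOfficialLang", "?y", "?y", True)
print(is_declared_complete(oracle, query))
```

Freezing the dataclass makes statements hashable, so the oracle can be an ordinary set.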