Applications of Rule Mining in Knowledge Bases
Luis Galárraga
November 3rd, 2014
PIKM, Shanghai
Knowledge Bases (KBs)
[Figure: a small KB graph in which Barack Obama is bornOn Aug 4, 1961 and marriedTo Michelle, and both Barack Obama and Michelle are linked to Malia and Sasha by hasChild edges]
KBs in action
Some popular KBs
Rule Mining in KBs
[Figure: the example KB with Barack Obama, Michelle and one of the children replaced by the variables y, z and x, turning the recurring pattern of facts into a rule]
hasChild(y, x), marriedTo(y, z) => hasChild(z, x)
Read: if y has child x and y is married to z, then z also has child x.
KBs are often incomplete: the KB states hasChild(Elvis Presley, Lisa Marie) and marriedTo(Elvis Presley, Priscilla), but not that Priscilla is Lisa Marie's parent.
Rules can be used to make predictions: applied to these facts, hasChild(y, x), marriedTo(y, z) => hasChild(z, x) predicts the missing fact hasChild(Priscilla, Lisa Marie).
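The prediction step can be sketched in Python. This is a minimal illustration, assuming the KB is held as a set of (relation, subject, object) triples; KB and predict_has_child are hypothetical names, and the nested scan stands in for the indexed joins a real system would use.

```python
# Hypothetical toy KB: a set of (relation, subject, object) triples.
KB = {
    ("hasChild", "Elvis Presley", "Lisa Marie"),
    ("marriedTo", "Elvis Presley", "Priscilla"),
}

def predict_has_child(kb):
    """Apply hasChild(y, x), marriedTo(y, z) => hasChild(z, x) and return
    the predicted facts that are not already in the KB."""
    predictions = set()
    for rel1, y, x in kb:
        if rel1 != "hasChild":
            continue
        for rel2, y2, z in kb:
            if rel2 == "marriedTo" and y2 == y:
                fact = ("hasChild", z, x)
                if fact not in kb:
                    predictions.add(fact)
    return predictions

print(predict_has_child(KB))  # {('hasChild', 'Priscilla', 'Lisa Marie')}
```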
Under the Closed World Assumption (CWA), missing information is counter-evidence: because hasChild(Priscilla, Lisa Marie) is absent from the KB, the prediction would count as false.
KBs, however, operate under the Open World Assumption (OWA): a fact that is absent from the KB is unknown, not false.
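Partial Completeness Assumption (PCA)
If the KB knows at least one object for a given subject and relation, the PCA assumes that it knows all of them. The KB lists Malia and Sasha as Michelle's children, so under the PCA any other hasChild(Michelle, ·) fact is assumed to be false.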
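A minimal sketch of how a single candidate fact could be labeled under the PCA, assuming the same hypothetical (relation, subject, object) triple representation; pca_label and the placeholder entity "someone_else" are illustrative names, not part of AMIE.

```python
# Hypothetical toy KB: Michelle's known children.
KB = {
    ("hasChild", "Michelle", "Malia"),
    ("hasChild", "Michelle", "Sasha"),
}

def pca_label(kb, relation, subject, obj):
    """Label the candidate fact relation(subject, obj) as true, false or unknown under the PCA."""
    if (relation, subject, obj) in kb:
        return "true"
    # The KB knows at least one object for (subject, relation):
    # the PCA assumes that list is complete, so the candidate is false.
    if any(r == relation and s == subject for r, s, _ in kb):
        return "false"
    # Nothing is known for (subject, relation): the fact stays unknown
    # (under the CWA it would be false).
    return "unknown"

print(pca_label(KB, "hasChild", "Michelle", "Malia"))         # true
print(pca_label(KB, "hasChild", "Michelle", "someone_else"))  # false
print(pca_label(KB, "hasChild", "Priscilla", "Lisa Marie"))   # unknown
```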
PCA for Rule Mining
Rule: hasChild(y, x), marriedTo(y, z) => hasChild(z, x)
● Barack Obama is marriedTo Michelle and hasChild Malia and Sasha. The rule predicts hasChild(Michelle, Malia) and hasChild(Michelle, Sasha); both are in the KB. Hits: 2, misses: 0.
● Prince Charles is marriedTo Camilla and hasChild Prince William; Camilla hasChild Tom and Laura. The rule predicts hasChild(Camilla, Prince William), which is not in the KB. Since the KB already knows some of Camilla's children, the PCA counts this prediction as a miss. Hits: 2, misses: 1.
● Elvis Presley is marriedTo Priscilla and hasChild Lisa Marie. The rule predicts hasChild(Priscilla, Lisa Marie). The KB knows no children of Priscilla, so the PCA does not count this prediction as a miss, whereas standard confidence counts it as one.
Standard confidence: 2/4 = 50% (2 hits among all 4 predictions)
PCA confidence: 2/3 = 66.67% (2 hits among the 3 predictions that are either hits or PCA misses)
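The arithmetic above can be reproduced with a short Python sketch. It assumes the toy KB contains exactly the facts shown on these slides, with marriedTo stored in one direction only; the helper names are hypothetical. Standard confidence divides hits by all predictions, while PCA confidence divides only by predictions whose subject already has some known child.

```python
# Hypothetical toy KB with the three families from the slides,
# stored as (relation, subject, object) triples.
KB = {
    ("hasChild", "Barack Obama", "Malia"),
    ("hasChild", "Barack Obama", "Sasha"),
    ("hasChild", "Michelle", "Malia"),
    ("hasChild", "Michelle", "Sasha"),
    ("marriedTo", "Barack Obama", "Michelle"),
    ("hasChild", "Prince Charles", "Prince William"),
    ("hasChild", "Camilla", "Tom"),
    ("hasChild", "Camilla", "Laura"),
    ("marriedTo", "Prince Charles", "Camilla"),
    ("hasChild", "Elvis Presley", "Lisa Marie"),
    ("marriedTo", "Elvis Presley", "Priscilla"),
}

def predictions(kb):
    """Head instances (z, x) produced by the body hasChild(y, x), marriedTo(y, z)."""
    return {(z, x)
            for r1, y, x in kb if r1 == "hasChild"
            for r2, y2, z in kb if r2 == "marriedTo" and y2 == y}

def confidences(kb):
    """Return (standard confidence, PCA confidence) of the rule."""
    preds = predictions(kb)
    hits = {(z, x) for z, x in preds if ("hasChild", z, x) in kb}
    # PCA: only predictions whose subject z has at least one known child can be misses.
    has_known_child = {s for r, s, _ in kb if r == "hasChild"}
    pca_preds = {(z, x) for z, x in preds if z in has_known_child}
    return len(hits) / len(preds), len(hits) / len(pca_preds)

standard, pca = confidences(KB)
print(f"standard = {standard:.2%}, PCA = {pca:.2%}")
```

Running it prints standard = 50.00%, PCA = 66.67%, matching the figures above.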
AMIE: Association Rule Mining Under Incomplete Evidence
● AMIE is a system that learns closed Horn rules, such as hasChild(y, x), marriedTo(y, z) => hasChild(z, x)
● It performs an exhaustive search based on:
– a minimum support threshold
– mining operators
– monotonicity of support, used for pruning
– an optimized in-memory database
● Confidence gain is used to prune the output.
Luis Galárraga, Christina Teflioudi, Katja Hose, Fabian Suchanek. AMIE: Association Rule Mining Under Incomplete Evidence in Ontological Knowledge Bases. In WWW, 2013. Best student paper award.
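To illustrate support and why it is monotonic, the sketch below evaluates rules with a naive backtracking join over a toy KB. It assumes support is the number of distinct head instantiations (z, x) that satisfy both body and head, that every atom argument is a variable, and that the helper names bindings and support are hypothetical; AMIE's in-memory database uses indexes rather than this linear scan.

```python
# Hypothetical toy KB of (relation, subject, object) triples.
KB = {
    ("hasChild", "Barack Obama", "Malia"), ("hasChild", "Barack Obama", "Sasha"),
    ("hasChild", "Michelle", "Malia"), ("hasChild", "Michelle", "Sasha"),
    ("marriedTo", "Barack Obama", "Michelle"),
    ("hasChild", "Prince Charles", "Prince William"),
    ("hasChild", "Camilla", "Tom"), ("hasChild", "Camilla", "Laura"),
    ("marriedTo", "Prince Charles", "Camilla"),
    ("hasChild", "Elvis Presley", "Lisa Marie"),
    ("marriedTo", "Elvis Presley", "Priscilla"),
}

def bindings(atoms, kb, binding=None):
    """Yield every variable binding satisfying all atoms (naive backtracking join).
    Atoms are (relation, var1, var2) tuples whose arguments are all variables."""
    binding = {} if binding is None else binding
    if not atoms:
        yield binding
        return
    rel, a, b = atoms[0]
    for r, s, o in kb:
        if r != rel:
            continue
        new = dict(binding)
        # setdefault binds a fresh variable or returns its existing value.
        if new.setdefault(a, s) != s or new.setdefault(b, o) != o:
            continue
        yield from bindings(atoms[1:], kb, new)

def support(body, head, kb):
    """Number of distinct head instantiations that satisfy body and head."""
    _, h1, h2 = head
    return len({(b[h1], b[h2]) for b in bindings(body + [head], kb)})

head = ("hasChild", "z", "x")
print(support([], head, KB))                                                  # 8
print(support([("marriedTo", "y", "z")], head, KB))                          # 4
print(support([("hasChild", "y", "x"), ("marriedTo", "y", "z")], head, KB))  # 2
```

Each added atom can only remove bindings, so support never grows as a rule is refined; a rule below the minimum support threshold can therefore be pruned together with all of its refinements.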
Mining operators
AMIE builds rules by refining them one atom at a time, starting from the head atom: => hasChild(z, x)
● Add dangling atom (O_D): add an atom ?r that links an existing variable to a fresh one; candidates for ?r include marriedTo, influences, hasChild, …
  Choosing marriedTo gives: marriedTo(y, z) => hasChild(z, x)
● Add closing atom (O_C): add an atom ?r between two existing variables; candidates include hasChild, supervises, …
  Choosing hasChild(y, x) gives: hasChild(y, x), marriedTo(y, z) => hasChild(z, x)
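The two operators can be sketched as candidate generators. This assumes a rule is a list of (relation, var1, var2) atoms whose first element is the head, that the fresh variable name does not already occur in the rule, and that RELATIONS is a hypothetical toy vocabulary; AMIE also filters candidates by support, which this sketch omits.

```python
from itertools import permutations

def variables(atoms):
    """Variables of a rule, in order of first appearance."""
    return list(dict.fromkeys(v for _, a, b in atoms for v in (a, b)))

def add_dangling(atoms, relations, fresh):
    """O_D: add r(fresh, v) or r(v, fresh) for an existing variable v."""
    for r in relations:
        for v in variables(atoms):
            yield atoms + [(r, fresh, v)]
            yield atoms + [(r, v, fresh)]

def add_closing(atoms, relations):
    """O_C: add r(v1, v2) between two distinct existing variables."""
    for r in relations:
        for v1, v2 in permutations(variables(atoms), 2):
            if (r, v1, v2) not in atoms:
                yield atoms + [(r, v1, v2)]

RELATIONS = ["hasChild", "marriedTo", "influences"]  # hypothetical vocabulary
head = [("hasChild", "z", "x")]
step1 = list(add_dangling(head, RELATIONS, fresh="y"))
print(len(step1))  # 12
rule = [("hasChild", "z", "x"), ("marriedTo", "y", "z")]
step2 = list(add_closing(rule, RELATIONS))
print(len(step2))  # 16
print(rule + [("hasChild", "y", "x")] in step2)  # True
```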
[Figure: AMIE architecture: an RDF KB is loaded into a tailored in-memory database; a concurrent mining implementation, driven by a minimum support threshold, mines rules from it; PCA confidence is used to rank the output rules]
Some rules mined by AMIE on YAGO:
isMarriedTo(x, y), livesIn(x, z) => livesIn(y, z)
isCitizenOf(x, y) => livesIn(x, y)
hasAdvisor(x, y), graduatedFrom(x, z) => worksAt(y, z)
hasWonPrize(x, Gottfried Wilhelm Leibniz Prize) => livesIn(x, Germany)
AMIE finds rules in medium-size ontologies in a few minutes:
Dataset             Facts   Runtime     Rules
YAGO2               1M      3.62 min    138
YAGO2 (const)       1M      17.76 min   18K
DBpedia (2 atoms)   6.7M    2.89 min    6.9K
PCA confidence is a better basis for prediction than standard confidence: rules ranked by PCA confidence produce more precise predictions than rules ranked by standard confidence.
ROSA: Rules for Ontology Schema Alignment
Rule mining can be used for data integration. KB 1 states hasChild(Barack Obama, Malia) and hasChild(Barack Obama, Sasha); KB 2 describes the same family with parent(Malia, President Obama), parent(Sasha, President Obama) and a sibling fact linking Malia and Sasha.
Instance alignments (sameAs links: Barack Obama sameAs President Obama, Malia sameAs Malia, Sasha sameAs Sasha) can be used to align the schemas, yielding rules such as:
hasChild(x, y) <=> parent(y, x)
hasChild(y, x), hasChild(y, z) => sibling(x, z)
To find them, run AMIE on the coalescence of the two KBs, in which entities linked by sameAs are merged; AMIE then reports hasChild <=> parent⁻¹ and hasChild(y, x), hasChild(y, z) => sibling(x, z).
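A minimal sketch of the coalescing step, assuming both KBs are sets of (relation, subject, object) triples and that the sameAs links map KB 2 entities to KB 1 entities (Malia and Sasha map to themselves); coalesce and inverse_confidence are hypothetical helpers, and checking standard confidence in both directions stands in for AMIE's full search.

```python
# Hypothetical KBs from the slide, as (relation, subject, object) triples.
KB1 = {
    ("hasChild", "Barack Obama", "Malia"),
    ("hasChild", "Barack Obama", "Sasha"),
}
KB2 = {
    ("parent", "Malia", "President Obama"),
    ("parent", "Sasha", "President Obama"),
    ("sibling", "Malia", "Sasha"),
}
# sameAs links from KB 2 entities to KB 1 entities; unlisted entities map to themselves.
SAME_AS = {"President Obama": "Barack Obama"}

def coalesce(kb1, kb2, same_as):
    """Union of both KBs with KB 2 entities renamed to their KB 1 counterparts."""
    return kb1 | {(r, same_as.get(s, s), same_as.get(o, o)) for r, s, o in kb2}

def inverse_confidence(kb, body_rel, head_rel):
    """Standard confidence of body_rel(x, y) => head_rel(y, x)."""
    body = [(s, o) for r, s, o in kb if r == body_rel]
    if not body:
        return 0.0
    return sum((head_rel, o, s) in kb for s, o in body) / len(body)

merged = coalesce(KB1, KB2, SAME_AS)
print(inverse_confidence(merged, "hasChild", "parent"))  # 1.0
print(inverse_confidence(merged, "parent", "hasChild"))  # 1.0
```

Both directions reach confidence 1.0 on this toy data, which is the evidence for hasChild <=> parent⁻¹.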
ROSA rules
ROSA rules are a class of cross-ontology alignments:
– R-subsumption: r(x, y) => r'(x, y)
– R-equivalence: r(x, y) <=> r'(x, y)
– C-subsumption: type(x, C) => type(x, C')
– 2-hops translation: r1(x, y), r2(y, z) => r'(x, z)
– Triangle alignment: r(x, z), r(y, z) => r'(x, y)
– Specific R-subsumption: r1(x, y), r2(x, V) => r'(x, y)
– Attribute-Value translation: r(x, V) => r'(x, V')
– 2-values translation: r1(x, V1), r2(x, V2) => r'(x, V')
Luis Galárraga, Nicoleta Preda, Fabian Suchanek. Mining Rules to Align Knowledge Bases. In Automated Knowledge Base Construction Workshop (AKBC), 2013.
Rule Mining for canonicalization of relations
Open KBs express the same relation in multiple ways: Barack Obama "is a graduate of" Harvard Law School, "earned degree from" Harvard Law School, and "earned degree from" Columbia University.
This is a problem for query answering: the query "Barack Obama is a graduate of ?" returns only Harvard Law School and misses Columbia University.
Rule mining (AMIE) can find equivalent relations, such as: is a graduate of <=> earned degree from
Luis Galárraga, Geremy Heitz, Kevin Murphy, Fabian Suchanek. Canonicalizing Open Knowledge Bases. In CIKM, 2014.
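A simplified sketch of the idea, not the pipeline of the CIKM 2014 paper: it treats two relation phrases as equivalent when the rule r1(x, y) => r2(x, y) holds in both directions with standard confidence above a threshold, then rewrites one phrase into the other. The Michelle Obama triples and the 0.6 threshold are hypothetical values chosen for illustration.

```python
# Hypothetical open-KB triples, stored as (relation, subject, object).
OKB = {
    ("is a graduate of", "Barack Obama", "Harvard Law School"),
    ("earned degree from", "Barack Obama", "Harvard Law School"),
    ("earned degree from", "Barack Obama", "Columbia University"),
    ("is a graduate of", "Michelle Obama", "Princeton University"),
    ("earned degree from", "Michelle Obama", "Princeton University"),
}

def confidence(kb, r1, r2):
    """Standard confidence of r1(x, y) => r2(x, y)."""
    body = [(s, o) for r, s, o in kb if r == r1]
    return sum((r2, s, o) in kb for s, o in body) / len(body) if body else 0.0

def equivalent(kb, r1, r2, threshold=0.6):
    """Treat r1 and r2 as equivalent if the rule holds in both directions."""
    return confidence(kb, r1, r2) >= threshold and confidence(kb, r2, r1) >= threshold

def canonicalize(kb, alias, canonical):
    """Rewrite every fact using the alias phrase with the canonical phrase."""
    return {(canonical if r == alias else r, s, o) for r, s, o in kb}

def query(kb, relation, subject):
    return {o for r, s, o in kb if r == relation and s == subject}

print(query(OKB, "is a graduate of", "Barack Obama"))  # {'Harvard Law School'}
print(equivalent(OKB, "is a graduate of", "earned degree from"))  # True
canon = canonicalize(OKB, "earned degree from", "is a graduate of")
print(sorted(query(canon, "is a graduate of", "Barack Obama")))
# ['Columbia University', 'Harvard Law School']
```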
Research outlook
● Numerical correlations, e.g. export(x, y), import(x, z) => cad(x, 1.2 * (z - y))
● A probabilistic model to learn the confidence of predictions:
– multiple rules can predict the same fact
– integrate soft and hard constraints