Partitioning [Figure: the rule table M, holding rules of the form H(x, y) ← b₁(x, z), b₂(y, z), and the fact table Γ with columns (p, x, y) are split into partitions (M₁, Γ₁) and (M₂, Γ₂); the frames annotate |ΔΓ₁| = 2 and |ΔΓ₂| = 0.]
Example facts in Γ: exports(United States, computer); exports(Canada, Aluminum); imports(United States, Aluminum); imports(United States, Clothing); dealsWith(Canada, United States); isLocatedIn(Washington, D.C., United States); isLocatedIn(Ottawa, Canada); isLocatedIn(Stanford University, Stanford, CA); hasCapital(Canada, Ottawa); hasCapital(United States, Washington, D.C.); wasBornIn(Donald Knuth, Milwaukee, Wisconsin); isCitizenOf(Donald Knuth, United States); worksAt(Donald Knuth, Stanford University).
Recursive Partitioning [Figure: the rule table M, with rules of the form H(x, y) ← b₁(x, z), b₂(y, z), and the fact table Γ; the frames show partition (M₁, Γ₁) followed by (M₃,₄, Γ₃,₄), with the labels "Overlapping" and "Independent".]
Partitioning Partitioned joins: M₁ ⨝ Γ₁ ⨝ Γ₁ and M₂ ⨝ Γ₂ ⨝ Γ₂, etc. Worst-case complexity: |M||γ|², where |γ| is the maximum partition size.
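To make the cost argument concrete, the sketch below compares the |M||γ|² bound with the same bound computed over the unpartitioned fact table. It assumes a deliberately naive scheme (one partition per rule, holding only the facts of that rule's body predicates), which is an illustration and not the partitioning algorithm from the talk; `partition_by_rule` and `join_bound` are hypothetical helper names.

```python
from collections import defaultdict

# Facts are (predicate, subject, object) triples taken from the example table.
FACTS = [
    ("exports", "United States", "computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("hasCapital", "Canada", "Ottawa"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
]

# Rules H(x, y) <- b1(x, z), b2(y, z), stored as (H, b1, b2).
RULES = [
    ("dealsWith", "exports", "imports"),
    ("isCitizenOf", "wasBornIn", "hasCapital"),
]


def partition_by_rule(rules, facts):
    """Assumed naive scheme: one (rules, facts) partition per rule."""
    by_pred = defaultdict(list)
    for fact in facts:
        by_pred[fact[0]].append(fact)
    partitions = []
    for rule in rules:
        _, b1, b2 = rule
        # Keep only the facts whose predicate appears in the rule body.
        needed = by_pred[b1] + (by_pred[b2] if b2 != b1 else [])
        partitions.append(([rule], needed))
    return partitions


def join_bound(num_rules, num_facts):
    """Worst-case bound |M| * |gamma|^2 for the two-way self-join."""
    return num_rules * num_facts ** 2


partitions = partition_by_rule(RULES, FACTS)
gamma = max(len(facts) for _, facts in partitions)
partitioned_bound = join_bound(len(RULES), gamma)
unpartitioned_bound = join_bound(len(RULES), len(FACTS))
print(gamma, partitioned_bound, unpartitioned_bound)
```

Because the bound is quadratic in the fact count, shrinking the largest partition lowers it faster than the partition count grows.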
Knowledge Expansion: Partitioning, Parallel Inference, Relational Joins, Cross Validation
Parallel Inference H(x, y) ← b₁(x, z), b₂(y, z)
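A minimal sketch of how a rule of this shape can be applied as a join on the shared variable z, with rules evaluated concurrently. The thread pool is only a stand-in for the parallel database execution discussed in the talk; `apply_rule` and `infer_parallel` are hypothetical names, and the facts and rules are taken from the partitioning example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Facts are (predicate, subject, object) triples.
FACTS = {
    ("exports", "United States", "computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("hasCapital", "Canada", "Ottawa"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
}

# Rules H(x, y) <- b1(x, z), b2(y, z), stored as (H, b1, b2).
RULES = [
    ("dealsWith", "exports", "imports"),
    ("isCitizenOf", "wasBornIn", "hasCapital"),
]


def apply_rule(rule, facts):
    """Hash join of the b1 facts with the b2 facts on the shared object z."""
    head, b1, b2 = rule
    # Index b2 facts by z so each b1 fact needs only a dictionary lookup.
    subjects_by_z = defaultdict(set)
    for p, y, z in facts:
        if p == b2:
            subjects_by_z[z].add(y)
    inferred = set()
    for p, x, z in facts:
        if p == b1:
            for y in subjects_by_z.get(z, ()):
                inferred.add((head, x, y))
    return inferred


def infer_parallel(rules, facts, workers=2):
    """Apply every rule concurrently and return only the new facts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: apply_rule(r, facts), rules))
    inferred = set().union(*results) if results else set()
    return inferred - set(facts)


new_facts = infer_parallel(RULES, FACTS)
print(new_facts)
```

Here Canada exports Aluminum and the United States imports it, so the first rule yields dealsWith(Canada, United States); the second rule finds no matching z and infers nothing.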
Parallel Inference [Figure: animation frames showing facts F₁ to F₅ and rules R₁ to R₃ grouped for parallel inference.]
Performance 927M new facts in 19 hours. First inference engine on Freebase.
Efficiency Improvement (runtime in seconds, ProbKB vs. Tuffy): 100K rules: 25 vs. 2,196; 200K rules: 30 vs. 4,271; 500K rules: 65 vs. 9,045; 1M rules: 210 vs. 16,507.
Effect of Partitioning (runtime in hours for each setting shown on the chart): 5M: 6.374; 10M: 9.668; 20M: 11.842; 50M: 16.499; 100M: 24.693; 200M: 41.915.
Knowledge Expansion: Partitioning, Parallel Inference, Relational Joins, Cross Validation
AMIE+ Validation [Figure: animation frames of a diagram with the labels YAGO2, Rules, Facts, and YAGO2s.]
Limitations KB availability. Inference bias.
Cross Validation [Figure: animation frames of a pipeline in which OP mines Rules from the Train split, the rules Infer Facts, and the facts are checked against the Test split (Verify).]
Cross Validation Γ⁺: inferred facts sorted by descending confidence. Recall: Recall(Γ⁺ | Γ) = |Γ⁺ ∩ Γ| / |Γ|. Precision: Precision(Γ⁺ | Γ) = |Γ⁺ ∩ Γ| / |Γ⁺|.
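As an illustration of these metrics, the sketch below walks down a confidence-ranked list of inferred facts and records precision and recall at each cutoff, counting a fact as correct when it appears in the held-out set. The function name `pr_curve` and the toy facts are assumptions made for the example.

```python
def pr_curve(ranked_facts, held_out):
    """Return (precision, recall) after each prefix of the ranked facts.

    ranked_facts: list of (fact, confidence) pairs, in any order.
    held_out: collection of facts treated as ground truth.
    """
    truth = set(held_out)
    if not truth:
        raise ValueError("held-out set must not be empty")
    # Sort by descending confidence, as on the slide.
    ordered = sorted(ranked_facts, key=lambda fc: fc[1], reverse=True)
    points = []
    hits = 0
    for k, (fact, _confidence) in enumerate(ordered, start=1):
        if fact in truth:
            hits += 1
        points.append((hits / k, hits / len(truth)))
    return points


# Toy data: four held-out facts, three inferred facts (one of them wrong).
held_out = {
    ("bornIn", "A", "X"),
    ("bornIn", "B", "Y"),
    ("bornIn", "C", "Z"),
    ("bornIn", "D", "W"),
}
inferred = [
    (("bornIn", "B", "Y"), 0.7),
    (("bornIn", "A", "X"), 0.9),
    (("bornIn", "E", "V"), 0.8),
]
points = pr_curve(inferred, held_out)
print(points)
```

Plotting the second coordinate against the first gives a precision-recall curve for one knowledge base.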
Cross Validation [Figure: precision-recall curves for Freebase and YAGO2s; both axes range from 0 to 1.]
Inferred Facts (number of correct inferred facts by domain): Music: 21,463,725; Book: 1,384,209; Film: 1,361,939; People: 1,354,632; Location: 916,438.
Examples music/album/artist(Live Era ’87-’93, Guns N’ Roses); book/series_editor/book_edition_series_edited(Janet Morris, Heroes in Hell by Baen Books); film/film/production_companies(Butt Spanking, Bacchus); user/anjackson/default_domain/bitstream_encoding/format(PDF 1.4, Portable Document Format)
Inferred Errors bornIn(Mandel, Berlin) isLocatedIn(Baltimore, Berlin) bornIn(Mandel, Baltimore)
Inferred Errors isLocatedIn(Baltimore, Berlin) bornIn(Freud, Berlin) bornIn(Freud, Baltimore)
Functional Constraints wasBornIn isCitizenOf isMarriedTo isCapitalOf isLocatedIn headquaterIn
Functional Constraints bornIn(Mandel, Berlin) bornIn(Mandel, Baltimore)
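A minimal sketch of how such a violation can be detected: for a predicate declared functional, any subject with more than one distinct object is flagged. It assumes functionality runs from subject to object and uses the bornIn spelling from the example; `find_violations` is a hypothetical helper, not the talk's implementation.

```python
from collections import defaultdict

# Predicates assumed to allow at most one object per subject.
FUNCTIONAL = {"bornIn"}

FACTS = [
    ("bornIn", "Mandel", "Berlin"),
    ("bornIn", "Mandel", "Baltimore"),
    ("bornIn", "Freud", "Berlin"),
    ("isLocatedIn", "Ottawa", "Canada"),
]


def find_violations(facts, functional):
    """Map (predicate, subject) to its objects when there is more than one."""
    objects = defaultdict(set)
    for p, x, y in facts:
        if p in functional:
            objects[(p, x)].add(y)
    return {key: sorted(vals) for key, vals in objects.items() if len(vals) > 1}


violations = find_violations(FACTS, FUNCTIONAL)
print(violations)
```

A flagged subject such as Mandel may be an ambiguous name covering several entities, so facts involving it are candidates for exclusion from joins.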
Functional Constraints [Figure: precision (axis 0 to 0.9) against the estimated number of correct facts (axis 0 to 30,000), without constraints and with constraints.]
Error Analysis Ambiguities (detected): 34%; Incorrect rules: 33%; Ambiguous join keys: 24%; Incorrect facts: 6%; Others: 3%.
Knowledge Activation
Knowledge Activation [Figure: animation frames of activation spreading over a graph with the nodes Francis Bacon, Aristotle, Plato, Cicero, John Locke, Philosophy, and Metaphysics, connected by influencedBy, influence, mainInterest, and coreSubject edges; Francis Bacon starts with activation 1.00, and later frames show values such as 0.42, 0.29, and 0.00 on other nodes.]
Q (x, w): (Francis, 1.0).
Γ (p, x, y): influencedBy(Francis, Aristotle); influencedBy(Francis, Plato); influencedBy(Francis, Cicero); influence(Francis, John Locke); mainInterest(Aristotle, Philosophy); mainInterest(Plato, Philosophy); coreSubject(Cicero, Philosophy); mainInterest(John Locke, Metaphysics).
H (x, w): (Francis, 1); (Francis, 3); (Plato, 3); (Aristotle, 6).
Query Optimization Build materialized views; the query Q is assumed to be small; Q ⨝ Γ is efficient with indexes; use (Q ⨝ Γ) ⨝ H to reduce the result size.
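The join order can be sketched in SQLite as follows. The table layouts follow the Q, Γ, and H tables of the example (Γ is named G here), while the join keys (Γ.x = Q.x, then H.x = Γ.y) and the index names are assumptions made for illustration. SQLite's planner is free to reorder joins, so the common table expression only expresses the intended order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Q (x TEXT, w REAL);            -- small query table
    CREATE TABLE G (p TEXT, x TEXT, y TEXT);    -- fact table (Gamma)
    CREATE TABLE H (x TEXT, w INTEGER);         -- H table from the example
    CREATE INDEX g_x ON G (x);                  -- makes Q JOIN G an index lookup
    CREATE INDEX h_x ON H (x);
    """
)
conn.execute("INSERT INTO Q VALUES ('Francis', 1.0)")
conn.executemany(
    "INSERT INTO G VALUES (?, ?, ?)",
    [
        ("influencedBy", "Francis", "Aristotle"),
        ("influencedBy", "Francis", "Plato"),
        ("influencedBy", "Francis", "Cicero"),
        ("influence", "Francis", "John Locke"),
        ("mainInterest", "Aristotle", "Philosophy"),
        ("mainInterest", "Plato", "Philosophy"),
        ("coreSubject", "Cicero", "Philosophy"),
        ("mainInterest", "John Locke", "Metaphysics"),
    ],
)
conn.executemany(
    "INSERT INTO H VALUES (?, ?)",
    [("Francis", 1), ("Francis", 3), ("Plato", 3), ("Aristotle", 6)],
)

# Join the small query table with G first, then join that result with H.
rows = conn.execute(
    """
    WITH qg AS (
        SELECT G.p, G.x, G.y FROM Q JOIN G ON G.x = Q.x
    )
    SELECT qg.y, H.w FROM qg JOIN H ON H.x = qg.y
    ORDER BY qg.y, H.w
    """
).fetchall()
print(rows)
conn.close()
```

Starting from the one-row Q keeps the intermediate result at four rows, so only those rows are matched against H.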
Experiments [Figure: runtime in ms (axis 0 to 70) for Query 1, Query 2, and Query 3 at Iter. 1, Iter. 2, and Iter. 3.]
Experiments More than 500× speedup. Runtime in ms (log scale): SemMemDB: 11.29; Douglass: 10,900.
Conclusion
Conclusion We tackle the knowledge expansion problem. We propose the Ontological Pathfinding (OP) algorithm for scalable rule mining. We extend the OP algorithm for knowledge expansion to infer missing facts in existing knowledge bases. We develop the first mining and inference engine for Freebase.
Future Work Extending rule mining to constraint mining. Online and incremental learning over dynamic knowledge bases. Abductive reasoning for query processing.
Publications (In Submission)
1. Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases. Yang Chen, Xiaofeng Zhou, Kun Li, Daisy Zhe Wang. The SIGMOD Record, 2017.
2. Quality Control in Uncertain Knowledge Bases. Daisy Zhe Wang, Yang Chen, Sean Goldberg, Miguel Rodríguez, Yang Peng. 20th International Conference on Extending Database Technology, 2017.
Publications
3. ScaLeKB: Scalable Learning and Inference over Large Knowledge Bases. Yang Chen, Daisy Zhe Wang, Sean Goldberg. The VLDB Journal, 2016.
4. ArchimedesOne: Query Processing over Probabilistic Knowledge Bases. Xiaofeng Zhou, Yang Chen, Daisy Zhe Wang. Proceedings of the VLDB Endowment, 2016.