
Scalable Learning and Inference in Large Knowledge Bases, Yang Chen

Yang Chen, Data Science Research Lab, University of Florida. Agenda: Introduction, Ontological Pathfinding, Knowledge Expansion, Conclusion.


  1. Partitioning. Rules have the form H(x, y) ← b1(x, z), b2(y, z). Partition labels on this slide: M1 (rules), Γ1 (facts).
     Rule table (H | b1 | b2):
       dealsWith | exports | imports
       dealsWith | isLocatedIn | isLocatedIn
       isCitizenOf | wasBornIn | hasCapital
       worksAt | wasBornIn | isLocatedIn
     Fact table (p | x | y):
       exports | United States | computer
       exports | Canada | Aluminum
       imports | United States | Aluminum
       imports | United States | Clothing
       dealsWith | Canada | United States
       isLocatedIn | Washington, D.C. | United States
       isLocatedIn | Ottawa | Canada
       isLocatedIn | Stanford University | Stanford, CA
       hasCapital | Canada | Ottawa
       hasCapital | United States | Washington, D.C.
       wasBornIn | Donald Knuth | Milwaukee, Wisconsin
       isCitizenOf | Donald Knuth | United States
       worksAt | Donald Knuth | Stanford University

  2. Partitioning. Same rule and fact tables as slide 1, with the partition labels M2 (rules) and Γ2 (facts).

  3. Partitioning. Same rule and fact tables as slide 1, with the partition labels M1 and Γ1 and the annotation |ΔΓ1| = 2.

  4. Partitioning. Same rule and fact tables as slide 1, with the partition labels M2 and Γ2 and the annotations |ΔΓ1| = 2 and |ΔΓ2| = 0.

  5. Recursive Partitioning. Same rule and fact tables as slide 1, with the partition labels M1 and Γ1.

  6. Recursive Partitioning. Same rule and fact tables as slide 1, with the partition labels M3,4 and Γ3,4 and the annotations "Overlapping" and "Independent".

  7. Partitioning. Partitioned joins: M1 ⨝ Γ1 ⨝ Γ1, M2 ⨝ Γ2 ⨝ Γ2, etc. Worst-case complexity: |M||γ|², where |γ| is the maximum partition size.
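
The partitioned joins on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine described in the talk: the function names `apply_rules` and `partitioned_inference` are hypothetical, the joins are done with in-memory hash indexes rather than a relational database, and the assignment of rules and facts to partitions in the example is chosen for illustration.

```python
from collections import defaultdict

def apply_rules(rules, facts):
    """Evaluate M ⨝ Γ ⨝ Γ for rules of the form H(x, y) <- b1(x, z), b2(y, z).

    rules: iterable of (H, b1, b2) predicate names.
    facts: iterable of (p, x, y) triples.
    Returns the inferred facts that are not already in `facts`.
    """
    facts = set(facts)
    # Index facts by (predicate, second argument); the second argument plays
    # the role of the join variable z in both body atoms.
    subjects = defaultdict(set)   # (p, z) -> {first arguments}
    objects = defaultdict(set)    # p -> {second arguments}
    for p, first, second in facts:
        subjects[(p, second)].add(first)
        objects[p].add(second)

    inferred = set()
    for head, b1, b2 in rules:
        # Only values of z shared by both body predicates can join.
        for z in objects[b1] & objects[b2]:
            for x in subjects[(b1, z)]:
                for y in subjects[(b2, z)]:
                    inferred.add((head, x, y))
    return inferred - facts

def partitioned_inference(partitions):
    """Run the join independently on each (M_i, Γ_i) pair and merge results."""
    new_facts = set()
    for rules, facts in partitions:
        new_facts |= apply_rules(rules, facts)
    return new_facts

# Hypothetical partitions built from the example tables (assignment assumed).
M1 = [("dealsWith", "exports", "imports")]
G1 = [
    ("exports", "United States", "computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
]
M2 = [("isCitizenOf", "wasBornIn", "hasCapital")]
G2 = [
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    ("hasCapital", "United States", "Washington, D.C."),
]
print(partitioned_inference([(M1, G1), (M2, G2)]))
```

Each partition only joins its own rules with its own facts, which is where the |M||γ|² bound comes from: every rule touches at most |γ| facts per body atom.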

  8. Knowledge Expansion: Partitioning, Parallel Inference, Relational Joins, Cross Validation

  9. Parallel Inference. Rule form: H(x, y) ← b1(x, z), b2(y, z)

  10. Parallel Inference. [Figure: facts F1, F4, and F5 shown with rules R1, R2, and R3.]

  11. Parallel Inference. [Figure: facts F1 to F5 shown with rules R1, R2, and R3.]

  12. Parallel Inference. [Figure: facts F2 and F3.]

  13. Performance 927M new facts in 19 hours. First inference engine on Freebase.

  14. Efficiency Improvement. Runtime in seconds (ProbKB vs. Tuffy): 100K rules: 25 vs. 2,196; 200K rules: 30 vs. 4,271; 500K rules: 65 vs. 9,045; 1M rules: 210 vs. 16,507.

  15. Effect of Partitioning. Runtime in hours by bar label: 5M: 6.374; 10M: 9.668; 20M: 11.842; 50M: 16.499; 100M: 24.693; 200M: 41.915.

  16. Knowledge Expansion: Partitioning, Parallel Inference, Relational Joins, Cross Validation

  17. AMIE+ Validation. [Diagram labels: YAGO2, Rules, YAGO2s.]

  18. AMIE+ Validation. [Diagram labels: YAGO2, Rules, Facts, YAGO2s.]

  19. AMIE+ Validation. [Diagram labels: YAGO2, Rules, Facts, YAGO2s.]

  21. Limitations: KB availability. Inference bias.

  22. Cross Validation. [Diagram labels: Train, OP, Rules, Test.]

  23. Cross Validation. [Diagram labels: Train, OP, Rules, Infer, Facts, Test.]

  24. Cross Validation. [Diagram labels: Train, OP, Rules, Infer, Facts, Verify, Test.]

  25. Cross Validation. Γ⁺: inferred facts sorted by descending confidence. Recall(Γ⁺ | Γ) = |Γ⁺ ∩ Γ| / |Γ|. Precision(Γ⁺ | Γ) = |Γ⁺ ∩ Γ| / |Γ⁺|.
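
The precision and recall on this slide can be computed as below. This is a minimal sketch, assuming Γ⁺ is a list of distinct (fact, confidence) pairs and Γ is a non-empty set of held-out facts; the names `precision_recall` and `pr_curve` are illustrative and do not come from the talk.

```python
def precision_recall(inferred, truth):
    """Precision and recall of an inferred fact set against a reference set."""
    inferred, truth = set(inferred), set(truth)
    hits = len(inferred & truth)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

def pr_curve(ranked, truth):
    """(precision, recall) after each prefix of the confidence-sorted facts.

    ranked: list of (fact, confidence) pairs with distinct facts.
    truth: non-empty collection of reference facts.
    """
    truth = set(truth)
    ordered = sorted(ranked, key=lambda fc: fc[1], reverse=True)
    points, hits = [], 0
    for k, (fact, _) in enumerate(ordered, start=1):
        if fact in truth:
            hits += 1
        points.append((hits / k, hits / len(truth)))
    return points

# Toy data, made up for illustration only.
ranked = [("b", 0.8), ("a", 0.9), ("c", 0.4)]
truth = {"a", "c", "d"}
print(precision_recall([f for f, _ in ranked], truth))
print(pr_curve(ranked, truth))
```

Sweeping the confidence cutoff over the sorted list is what produces a precision-recall curve rather than a single point.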

  26. Cross Validation. [Figure: precision against recall for Freebase and YAGO2s; both axes range from 0 to 1.]

  27. Inferred Facts. Number of correct inferred facts by domain: Music: 21,463,725; Book: 1,384,209; Film: 1,361,939; People: 1,354,632; Location: 916,438.

  28. Examples: music/album/artist(Live Era ’87-’93, Guns N’ Roses); book/series_editor/book_edition_series_edited(Janet Morris, Heroes in Hell by Baen Books); film/film/production_companies(Butt Spanking, Bacchus); user/anjackson/default_domain/bitstream_encoding/format(PDF 1.4, Portable Document Format)

  29. Inferred Errors: bornIn(Mandel, Berlin); isLocatedIn(Baltimore, Berlin); bornIn(Mandel, Baltimore)

  30. Inferred Errors: isLocatedIn(Baltimore, Berlin); bornIn(Freud, Berlin); bornIn(Freud, Baltimore)

  31. Functional Constraints: wasBornIn, isCitizenOf, isMarriedTo, isCapitalOf, isLocatedIn, headquaterIn

  32. Functional Constraints: bornIn(Mandel, Berlin); bornIn(Mandel, Baltimore)
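
A functional predicate should map each subject to at most one object, so the two bornIn facts on this slide conflict. The check can be sketched as follows. This is a hedged illustration: the helper name `functional_violations` is hypothetical, and the slides do not say how the system resolves a violation, so the sketch only flags the conflicting subjects.

```python
from collections import defaultdict

def functional_violations(facts, functional):
    """Return {(p, x): {objects}} for functional predicates with > 1 object.

    facts: iterable of (p, x, y) triples.
    functional: set of predicate names assumed to be functional.
    """
    objects = defaultdict(set)
    for p, x, y in facts:
        if p in functional:
            objects[(p, x)].add(y)
    # A violation is any (predicate, subject) pair with several objects.
    return {key: ys for key, ys in objects.items() if len(ys) > 1}

facts = [
    ("bornIn", "Mandel", "Berlin"),
    ("bornIn", "Mandel", "Baltimore"),
    ("bornIn", "Freud", "Berlin"),
]
print(functional_violations(facts, {"bornIn"}))
```

Flagging such subjects before inference keeps a conflicting pair of facts from feeding further rules.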

  33. Functional Constraints. [Figure: precision (axis 0 to 0.9) against estimated number of correct facts (axis 0 to 30,000), with and without constraints.]

  34. Error Analysis. Ambiguities (detected): 34%; incorrect rules: 33%; ambiguous join keys: 24%; incorrect facts: 6%; others: 3%.

  36. Knowledge Activation

  37. Knowledge Activation. [Figure: graph with nodes Aristotle, Plato, Cicero, Francis Bacon, John Locke, Philosophy, and Metaphysics.]

  38. Knowledge Activation. [Figure: the graph from slide 37 with an edge labeled influencedBy.]

  39. Knowledge Activation. [Figure: the graph from slide 37 with an edge labeled mainInterest.]

  40. Knowledge Activation. [Figure: the graph from slide 37 with the value 1.00 shown.]

  41. Knowledge Activation. [Figure: the graph from slide 37 with edges labeled influencedBy and influence; values shown: 0.42, 0.29, 1.00, 0.00, 0.00.]

  42. Knowledge Activation. [Figure: the graph from slide 37 with edges labeled mainInterest and coreSubject; values shown: 0.42, 0.29, 0.00, 1.00, 0.00, 0.00, 0.00.]

  43. Table Q (x | w):
       Francis | 1.0
     Table Γ (p | x | y):
       influencedBy | Francis | Aristotle
       influencedBy | Francis | Plato
       influencedBy | Francis | Cicero
       influence | Francis | John Locke
       mainInterest | Aristotle | Philosophy
       mainInterest | Plato | Philosophy
       coreSubject | Cicero | Philosophy
       mainInterest | John Locke | Metaphysics

  44. Tables Q and Γ as on slide 43, plus table H (x | w):
       Francis | 1
       Francis | 3
       Plato | 3
       Aristotle | 6

  46. Query Optimization: Build materialized views; Query assumed to be small; Q ⨝ Γ is efficient with indexes; Use (Q ⨝ Γ) ⨝ H to reduce result size.
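
The join order (Q ⨝ Γ) ⨝ H can be illustrated with Python's built-in sqlite3 module. This is a sketch under assumptions, not the system's actual query: the join conditions (Q.x = Γ.x, then Γ.y = H.x), the table name G standing in for Γ, and the function name `neighbor_history` are all hypothetical, and materialized views are not shown. The common table expression only expresses the intended order; the SQLite planner is free to choose its own.

```python
import sqlite3

def neighbor_history(query_nodes, facts, history):
    """Join the small query table with the facts first, then with H."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Q (x TEXT, w REAL);
        CREATE TABLE G (p TEXT, x TEXT, y TEXT);
        CREATE TABLE H (x TEXT, w INTEGER);
        CREATE INDEX g_x ON G (x);  -- makes Q JOIN G an index lookup
        CREATE INDEX h_x ON H (x);
    """)
    con.executemany("INSERT INTO Q VALUES (?, ?)", query_nodes)
    con.executemany("INSERT INTO G VALUES (?, ?, ?)", facts)
    con.executemany("INSERT INTO H VALUES (?, ?)", history)
    rows = con.execute("""
        WITH qg AS (SELECT G.y AS y FROM Q JOIN G ON G.x = Q.x)
        SELECT qg.y, H.w FROM qg JOIN H ON H.x = qg.y
        ORDER BY qg.y, H.w
    """).fetchall()
    con.close()
    return rows

Q = [("Francis", 1.0)]
G = [
    ("influencedBy", "Francis", "Aristotle"),
    ("influencedBy", "Francis", "Plato"),
    ("influencedBy", "Francis", "Cicero"),
    ("influence", "Francis", "John Locke"),
    ("mainInterest", "Aristotle", "Philosophy"),
    ("mainInterest", "Plato", "Philosophy"),
    ("coreSubject", "Cicero", "Philosophy"),
    ("mainInterest", "John Locke", "Metaphysics"),
]
H = [("Francis", 1), ("Francis", 3), ("Plato", 3), ("Aristotle", 6)]
print(neighbor_history(Q, G, H))
```

Because Q is small, joining it with Γ first leaves only the neighbors of the query nodes, so the second join touches a handful of rows in H instead of the whole table.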

  47. Experiments. [Figure: runtime in ms (axis 0 to 70) for Queries 1 to 3 at iterations 1, 2, and 3.]

  48. Experiments: more than 500 times speedup. [Figure: runtime in ms on a log scale for SemMemDB and Douglass; bar values 10,900 and 11.29.]

  49. Conclusion

  50. Conclusion: We tackle the knowledge expansion problem. We propose the Ontological Pathfinding (OP) algorithm for scalable rule mining. We extend the OP algorithm for knowledge expansion to infer missing facts in existing knowledge bases. We develop the first mining and inference engine for Freebase.

  51. Future Work Extending rule mining to constraint mining. Online and incremental learning over dynamic knowledge bases. Abductive reasoning for query processing.

  52. Publications (In Submission)
     1. Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases. Yang Chen, Xiaofeng Zhou, Kun Li, Daisy Zhe Wang. The SIGMOD Record, 2017.
     2. Quality Control in Uncertain Knowledge Bases. Daisy Zhe Wang, Yang Chen, Sean Goldberg, Miguel Rodríguez, Yang Peng. 20th International Conference on Extending Database Technology, 2017.

  53. Publications
     3. ScaLeKB: Scalable Learning and Inference over Large Knowledge Bases. Yang Chen, Daisy Zhe Wang, Sean Goldberg. The VLDB Journal, 2016.
     4. ArchimedesOne: Query Processing over Probabilistic Knowledge Bases. Xiaofeng Zhou, Yang Chen, Daisy Zhe Wang. Proceedings of the VLDB Endowment, 2016.
