entity linking to knowledge graphs to infer column types


  1. Entity Linking to Knowledge Graphs to Infer Column Types and Properties Avijit Thawani , Minda Hu, Erdong Hu, Husain Zafar, Naren Teja Divvala, Amandeep Singh, Ehsan Qasemi, Pedro Szekely, and Jay Pujara

  2. About Us Team ISI: ● Information Sciences Institute ● University of Southern California Me: ● PhD student, USC

  3. Outline 1. CEA 2. tf-idf 3. CTA and CPA 4. Shortcomings 5. Analysis 6. Appendix: PSL

  4. 1. CEA

  5. Objective: CEA
     Mark Knopfler -> dbp.org/resource/Mark_Knopfler
     Super Furry Animals -> dbp.org/resource/Super_Furry_Animals
     The Killers -> dbp.org/resource/The_Killers
     Brian Wilson -> dbp.org/resource/Brian_Wilson
     AlunaGeorge -> dbp.org/resource/AlunaGeorge

  6. Approach: CEA

  7. Lots of Cues

  8. Lots of Cues ● Class

  9. Lots of Cues ● Class ● Properties

  10. Lots of Cues ● Class ● Properties ● Values

  12. Lots of Cues ● Class ● Properties ● Values instanceOf: Human

  14. Lots of Cues ● Class ● Properties ● Values occupation: Singer

  16. Lots of Cues ● Class ● Properties ● Values Record Label: ...

  18. Lots of Cues Features ● Class ● Properties ● Values

  19. What to do with all those Features?

  20. What to do with all those Features? If labelled data -> Machine Learning

  21. What to do with all those Features? If labelled data -> Machine Learning
      Feature:  Human?  occ:Singer?  Record Label?  ...  Chef?
      Value:    1       1            1              ...  0
      Weights:  20      30           10             ...  0.5
      Confidence = 60
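The confidence on this slide is a weighted sum of binary features. A minimal sketch of that arithmetic follows. It assumes the elided "..." columns contribute nothing, and the function name `confidence` is ours rather than the authors'.

```python
# Feature values and weights as printed on the slide; the "..." columns are omitted.
features = {"Human?": 1, "occ:Singer?": 1, "Record Label?": 1, "Chef?": 0}
weights = {"Human?": 20, "occ:Singer?": 30, "Record Label?": 10, "Chef?": 0.5}


def confidence(features, weights):
    """Weighted sum of binary features (a simple linear scoring model)."""
    return sum(weights[name] * value for name, value in features.items())


score = confidence(features, weights)
print(score)  # 20*1 + 30*1 + 10*1 + 0.5*0
```

With labelled data the weights would be learned. The slides that follow cover the case where no labels are available.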

  23. What to do with all those Features? If labelled data -> Machine Learning If not -> Image Source: icon-library.net

  24. What to do with all those Features? If labelled data -> Machine Learning If not -> Heuristics!

  25. 2. tf-idf

  26. Image Source: becominghuman.ai blog

  27. entities                                   | properties: genre | family name | record label | discography | Dbo:MusicalArtist | TF/IDF | Levenshtein
      Q313013 (Brian Wilson, musician)           | 1                 | 1           | 1            | 1           | 1                 | 0.98   | 1.0
      Q913269 (Brian Wilson, baseball player)    | 0                 | 1           | 0            | 0           | 0                 | 0.64   | 1.0
      Q1135582 (Super Furry Animals, band)       | 1                 | 0           | 1            | 1           | 1                 | 0.23   | 1.0
      Q7642367 (Super Furry Animals Discography) | 0                 | 0           | 0            | 0           | 0                 | 0.0    | 0.61
      Q185343 (Mark Knopfler, musician)          | 1                 | 1           | 1            | 1           | 1                 | 0.99   | 1.0
      DF = document frequency                    | 52                | 31          | 36           | 15          | 49                |        |
      IDF = log                                  | 3.20              | 1.85        | 1.65         | 3.46        | 2.11              |        |
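To illustrate how IDF weights can separate candidates that share a label, here is a minimal sketch that sums the IDF of each feature a candidate has. The scoring rule (a plain sum) is an assumption. The slide does not say how its TF/IDF column is normalised, so this does not reproduce those values, only the ranking. The feature names are read off the slide's header.

```python
# Feature order follows the slide's table: four properties plus one class.
FEATURES = ["genre", "family name", "record label", "discography", "Dbo:MusicalArtist"]
IDF = [3.20, 1.85, 1.65, 3.46, 2.11]  # IDF row as printed on the slide

# Binary feature rows for the two "Brian Wilson" candidates on the slide.
CANDIDATES = {
    "Q313013": [1, 1, 1, 1, 1],  # Brian Wilson, musician
    "Q913269": [0, 1, 0, 0, 0],  # Brian Wilson, baseball player
}


def idf_score(row, idf=IDF):
    """Sum the IDF weights of the features a candidate has (assumed rule)."""
    return sum(present * weight for present, weight in zip(row, idf))


scores = {qid: idf_score(row) for qid, row in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Both candidates match the cell text exactly, so only the feature-based score tells them apart.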

  28. 3. CTA and CPA

  29. Objective: CTA
      Column cells: Auckland, Los Angeles, California, ..., Waikato District
      Column type: dbp.org/ontology/Settlement

  30. Approach: CTA

  31. CPA

  32. Results: CEA
      Round   | f1    | precision
      Round 1 | 0.884 | 0.908
      Round 2 | 0.826 | 0.852
      Round 3 | 0.857 | 0.866
      Round 4 | 0.804 | 0.814

  33. 4. Shortcomings

  34. Shortcomings

  35. Shortcomings Another pass needed

  36. Shortcomings Another pass needed Custom handling of data types

  37. Shortcomings Another pass needed Custom handling of data types Intra-row information

  38. 5. Analysis

  39. Analysis: # Rows

  42. Analysis: Custom Handling

  43. Analysis: Embeddings Levenshtein Similarity tf-idf

  44. Analysis: Embeddings ● Levenshtein Similarity ● tf-idf on Property feature ● tf-idf on Class feature
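For reference, a minimal sketch of a Levenshtein similarity feature. Normalising the edit distance by the longer string's length is an assumption, not something the slides state. It does give 0.61 for "Super Furry Animals" against "Super Furry Animals Discography", which agrees with the Levenshtein column on slide 27.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / longer length (assumed normalisation), in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(round(similarity("Super Furry Animals", "Super Furry Animals Discography"), 2))
print(similarity("Brian Wilson", "Brian Wilson"))
```

An exact label match scores 1.0 for both the musician and the baseball player, so string similarity alone cannot tell the two apart.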

  45. Takeaways

  46. Takeaways ● Lots of Semantic Cues (not just classes)

  47. Takeaways ● Lots of Semantic Cues (not just classes) ● When no data -> TF-IDF

  48. Takeaways ● Lots of Semantic Cues (not just classes) ● When no data -> TF-IDF ● Revising always good

  49. Takeaways ● Lots of Semantic Cues (not just classes) ● When no data -> TF-IDF ● Revising always good ● Over-revising is an overkill (PSL)

  50. Takeaways ● Lots of Semantic Cues (not just classes) ● When no data -> TF-IDF ● Revising always good ● Over-revising is an overkill (PSL) ● String Similarity ⊥ Semantic Similarity

  51. Fin. Thank You (kia mihi) Avijit Thawani, PhD student with Pedro Szekely and Jay Pujara, thawani@isi.edu

  52. Appendix

  53. PSL Graphical Model = Several passes!

  54. Probabilistic Soft Logic: PSL is a probabilistic programming language for easily defining hinge-loss Markov random fields, using a syntax like first-order logic.

  55. PSL in one slide

  56. PSL in one slide
      Define closed predicates:
      - instance(madonna, Singer) instance(st_madonna, Saint) …
      - candidate(R3C1, madonna) candidate(R3C1, st_madonna) …

  57. PSL in one slide
      Define closed predicates:
      - instance(madonna, Singer) instance(st_madonna, Saint) …
      - candidate(R3C1, madonna) candidate(R3C1, st_madonna) …
      Define open predicates:
      - type(C1, Singer)? type(C1, Saint)?
      - entity(R3C1, madonna)? entity(R3C1, st_madonna)?

  58. PSL in one slide
      Define closed predicates:
      - instance(madonna, Singer) instance(st_madonna, Saint) …
      - candidate(R3C1, madonna) candidate(R3C1, st_madonna) …
      Define open predicates:
      - type(C1, Singer)? type(C1, Saint)?
      - entity(R3C1, madonna)? entity(R3C1, st_madonna)?
      Restrict with PSL rules:
      - 10: candidate(RxCy, Qz) -> entity(RxCy, Qz)
      - 20: candidate(RxCy, Qz) & type(Cy, Tw) & instance(Qz, Tw) -> entity(RxCy, Qz)
      - entity(RxCy, Q1) & Q1 != Q2 -> !entity(RxCy, Q2) .
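To make the weighted rules concrete, here is a minimal sketch of how a hinge-loss Markov random field scores one grounding of a rule. PSL relaxes logical AND with the Łukasiewicz t-norm and penalises a rule by its distance to satisfaction. The truth values below are hypothetical, the helper names are ours, and the linear (not squared) hinge is an assumption. A real PSL program would leave this computation to the PSL engine.

```python
def soft_and(*truths):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def rule_penalty(weight, body_truths, head_truth):
    """Weighted distance to satisfaction of the rule body -> head (linear hinge)."""
    body = soft_and(*body_truths)
    return weight * max(0.0, body - head_truth)


# Hypothetical soft truth values for one grounding (cell R3C1, entity st_madonna).
candidate, type_saint, instance_saint, entity = 1.0, 0.9, 1.0, 0.7

# Rule "10: candidate -> entity"
penalty_rule_10 = rule_penalty(10, [candidate], entity)
# Rule "20: candidate & type & instance -> entity"
penalty_rule_20 = rule_penalty(20, [candidate, type_saint, instance_saint], entity)
print(penalty_rule_10, penalty_rule_20)
```

Inference picks values for the open predicates that minimise the total penalty over all groundings, so the type and entity estimates are adjusted jointly.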

  59. PSL output
      class(C1, Singer): 0.12
      class(C1, Saint): 0.89
      entity(R3C1, madonna): 0.23
      entity(R3C1, st_madonna): 0.68

  60. 1st result baseline F1: 0.865 Precision: 0.871 Recall: 0.858 (7 datasets annotated by us)

  61. PSL results F1: 0.903 Precision: 0.910 Recall: 0.896 (7 datasets annotated by us)

  62. PSL without ranked priors F1: 0.777 Precision: 0.783 Recall: 0.771 (7 datasets annotated by us)
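As a quick sanity check on the three appendix results, F1 can be recomputed from the reported precision and recall as their harmonic mean. The sketch below uses only the numbers on the slides. The dictionary keys are our labels, and the tolerance of 0.001 allows for the inputs themselves being rounded to three decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as printed on slides 60-62.
reported = {
    "1st result baseline": (0.871, 0.858, 0.865),
    "PSL": (0.910, 0.896, 0.903),
    "PSL without ranked priors": (0.783, 0.771, 0.777),
}

for name, (p, r, reported_f1) in reported.items():
    recomputed = f1(p, r)
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported_f1}")
    assert abs(recomputed - reported_f1) <= 0.001
```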
