  1. Algorithms and Applications for Web-Scale Knowledge Graphs
     Marco Ponza. Supervisor: Prof. Paolo Ferragina

  2. Menu
     1. Entity Annotation
        ○ The Modeling of Knowledge
        ○ Terminology
        ○ The Annotation Pipeline
        ○ Applications
        ○ A New Text Representation
     2. Work done in the first year
        ○ Entity Relatedness
        ○ Document Aboutness
     3. Future Work

  3. 1. Entity Annotation

  4. The Modeling of Knowledge
     ▷ Classical approaches
       ○ Document Knowledge = Words
       ○ Bag-of-words (aka BoW)
       ○ Vector Space Model (aka VSM) (Salton, 1971)
     Pipeline: Document → (stop-word removal, stemming, ...) → Document’s Words → (counting, scaling, normalization, ...) → Vector Space Model, e.g. the vector [2, 1, 0, 2, 0]; a sketch follows below.
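A minimal Python sketch of this classical pipeline (stemming omitted for brevity; the stop-word list, function names, and example documents are illustrative, not from the slides):

```python
# Toy BoW/VSM pipeline: tokenize, drop stop-words, count, normalize.
from collections import Counter
import math

STOP_WORDS = {"the", "a", "an", "of", "and", "is"}  # toy stop-word list

def bow_vector(document: str) -> dict:
    """Map a document to an L2-normalized bag-of-words vector."""
    tokens = [t for t in document.lower().split() if t not in STOP_WORDS]
    counts = Counter(tokens)                                   # counting
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}      # normalization

def cosine(u: dict, v: dict) -> float:
    """VSM similarity: dot product of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

print(cosine(bow_vector("the jaguar is a feline"),
             bow_vector("the jaguar car is fast")))  # ~0.41
```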

  5. The Modeling of Knowledge
     ▷ Well-known issues (Jurafsky, ’00)
       ○ Ambiguity (Polysemy and Synonymy): does “Jaguar” refer to Jaguar (the feline) or to Jaguar_Cars?

  6. The Modeling of Knowledge
     ▷ Well-known issues (Jurafsky, ’00)
       ○ Ambiguity (Polysemy and Synonymy)
       ○ Semantic Connections: e.g. Barack_Obama ↔ United_States

  7. The Modeling of Knowledge
     ▷ Well-known issues (Jurafsky, ’00)
       ○ Ambiguity (Polysemy and Synonymy)
       ○ Semantic Connections
     ▷ Algorithmic solutions
       ○ Latent Approaches (e.g. LSI/LSA, Word2Vec)
         ■ Unintelligible for humans (Gabrilovich, IJCAI ’07)
       ○ “Knowledge is Power” Hypothesis (Lenat, ’91; Gabrilovich, SIGIR ’16)
         ■ Semantic and unambiguous concepts
         ■ Depend on the design of Entity Annotators

  8. Entity Annotation: Terminology
     ▷ Wikipedia Knowledge Graph
     ▷ Node?

  9. Entity Annotation: Terminology
     ▷ Wikipedia Knowledge Graph
     ▷ Node: Wikipedia Page (Entity)
     ▷ Link?

  10. Entity Annotation: Terminology
      ▷ Wikipedia Knowledge Graph
      ▷ Node: Wikipedia Page (Entity)
      ▷ Link: Wikipedia Hyperlink
      Goal: enrich a text T with proper annotations, where Annotation = (mention, entity)

  11. Entity Annotation: The Annotation Pipeline
      Input Text → Entity Annotator [Spotting → Disambiguation → Pruning] → Annotated Text
      ▷ Spotting: 1. identify mentions (spots); 2. retrieve candidate entities
      ▷ Disambiguation: assign the most pertinent entity to each spot
      ▷ Pruning: remove not-pertinent annotations
      (a toy end-to-end sketch follows below)
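To make the three stages concrete, here is a toy end-to-end annotator in Python. The hard-coded gazetteer, scores, and function names are hypothetical stand-ins, not the pipeline of any specific system:

```python
# Toy three-stage annotator: Spotting -> Disambiguation -> Pruning.
# All data below is illustrative.
GAZETTEER = {  # mention -> {candidate entity: commonness score}
    "maradona": {"Diego_Maradona": 0.8, "Diego_Sinagra": 0.2},
    "mexico": {"Mexico": 0.6, "Mexico_national_football_team": 0.4},
}

def spotting(text):
    """1. Identify mentions (spots) and retrieve their candidate entities."""
    return [(tok, GAZETTEER[tok.lower().strip(".,")])
            for tok in text.split() if tok.lower().strip(".,") in GAZETTEER]

def disambiguate(spots):
    """2. Assign the most pertinent entity to each spot (here: top commonness)."""
    return [(mention, max(cands, key=cands.get), max(cands.values()))
            for mention, cands in spots]

def prune(annotations, threshold=0.5):
    """3. Remove annotations whose coherence score falls below a threshold."""
    return [a for a in annotations if a[2] >= threshold]

print(prune(disambiguate(spotting("Yesterday Maradona won against Mexico."))))
# -> [('Maradona', 'Diego_Maradona', 0.8), ('Mexico', 'Mexico', 0.6)]
```

Real disambiguators replace the simple top-commonness choice with the collective methods shown in the next slides.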

  12. Entity Annotation: The Annotation Pipeline (Spotting)
      Spotting = Mention Detection + Candidate Generation, e.g. for “Yesterday Maradona won against Mexico.”:
        ○ Yesterday → { Yesterday_(Time), Yesterday_(Beatles_song), Yesterday_(Guns_N_Roses_song), ... }
        ○ Maradona → { Diego_Maradona, Diego_Sinagra, Maradona_by_Kusturica, ... }
        ○ Mexico → { Mexico, Mexico,_New_York, Mexico_national_football_team, ... }

  13. Entity Annotation: The Annotation Pipeline (Spotting)
      1. Mention Detection
         ○ Named Entity Recognition (aka NER)
         ○ N-gram generation
      2. Candidate Generation
         ○ Gazetteer: { mention → entities }
           ■ How?

  14. Entity Annotation: The Annotation Pipeline (Spotting)
      1. Mention Detection
         ○ Named Entity Recognition (aka NER)
         ○ N-gram generation
      2. Candidate Generation
         ○ Gazetteer: { mention → entities }
           ■ How? Wikipedia anchor texts!
           ■ Ranking (+ Thresholding)
             ● Commonness (Ferragina, CIKM ’10; Guo, CIKM ’14)
             ● Entity-context Similarity (Zwicklbauer, SIGIR ’16)
             ● ...
      (a gazetteer-construction sketch follows below)
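A sketch of how such a gazetteer could be built and queried, ranking candidates by commonness, i.e. the fraction of times an anchor text links to each entity. The anchor statistics below are toy data, not real Wikipedia counts:

```python
from collections import Counter, defaultdict

# Toy (anchor text, target entity) pairs, as harvested from Wikipedia links.
anchors = [
    ("maradona", "Diego_Maradona"), ("maradona", "Diego_Maradona"),
    ("maradona", "Diego_Sinagra"),
    ("mexico", "Mexico"), ("mexico", "Mexico_national_football_team"),
]

gazetteer = defaultdict(Counter)   # mention -> Counter of target entities
for mention, entity in anchors:
    gazetteer[mention][entity] += 1

def candidates(mention, threshold=0.1):
    """Rank candidates by commonness = P(entity | mention), then threshold."""
    counts = gazetteer[mention.lower()]
    total = sum(counts.values())
    return [(e, c / total) for e, c in counts.most_common()
            if c / total >= threshold]

print(candidates("Maradona"))
# -> [('Diego_Maradona', 0.67), ('Diego_Sinagra', 0.33)] (approx.)
```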

  15. Entity Annotation: The Annotation Pipeline (Disambiguation)
      Input from Spotting, for “Yesterday Maradona won against Mexico.”:
        ○ Yesterday → { Yesterday_(Time), Yesterday_(Beatles_song), Yesterday_(Guns_N_Roses_song), ... }
        ○ Maradona → { Diego_Maradona, Diego_Sinagra, Maradona_by_Kusturica, ... }
        ○ Mexico → { Mexico, Mexico,_New_York, Mexico_national_football_team, ... }

  16. Entity Annotation: The Annotation Pipeline (Disambiguation)
      “Yesterday Maradona won against Mexico.” → Yesterday_(Beatles_song) (0.1), Diego_Maradona (0.8), Mexico_national_football_team (0.7)
      ▷ Spots have been disambiguated
        ○ Ambiguous lexical elements (words) are now labeled with unambiguous concepts
      ▷ Finally, coherence scores are assigned

  17. Entity Annotation: The Annotation Pipeline
      A sample of existing annotators (Spotting + Disambiguation): NED (Cucerzan, ACL ’07); (Mihalcea, CIKM ’07); (Scaiella, CIKM ’10); (Mendes, SemSys ’11); AIDA (Nguyen, LDOW ’14); (Moro, ACL ’14); (Piccinno, SIGIR ’14); PBoH (Ganea, WWW ’16); DoSeR (Zwicklbauer, SIGIR ’16); ...

  18. Entity Annotation: The Annotation Pipeline (Disambiguation)
      Algorithm (Scaiella, CIKM ’10; Piccinno, SIGIR ’14), on “[...] Maradona won against Mexico.”:
        ● Voting Scheme
        ● M&W / Jaccard Relatedness
      (a sketch of the M&W measure follows below)
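The M&W measure is the Milne & Witten link-based relatedness. A minimal sketch, assuming we already have the sets of Wikipedia pages linking to each entity and the total number of pages:

```python
import math

def mw_relatedness(in_a, in_b, n_wikipedia):
    """Milne & Witten relatedness from the in-link sets of entities a and b
    and the total number of Wikipedia pages."""
    overlap = len(in_a & in_b)
    if overlap == 0:
        return 0.0
    dist = ((math.log(max(len(in_a), len(in_b))) - math.log(overlap)) /
            (math.log(n_wikipedia) - math.log(min(len(in_a), len(in_b)))))
    return max(0.0, 1.0 - dist)

# Toy in-link sets (page ids are made up):
print(mw_relatedness({1, 2, 3, 4}, {3, 4, 5}, n_wikipedia=1000))  # ~0.88
```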

  19. Entity Annotation: The Annotation Pipeline (Disambiguation)
      Algorithm: DoSeR (Zwicklbauer, SIGIR ’16), on “[...] Maradona won against Mexico.”:
        ● Graph of candidates

  20. Entity Annotation: The Annotation Pipeline (Disambiguation)
      Algorithm: DoSeR (Zwicklbauer, SIGIR ’16), on “[...] Maradona won against Mexico.”:
        ● Graph of candidates
        ● Entity2Vec Relatedness

  21. Entity Annotation: The Annotation Pipeline (Disambiguation)
      Algorithm: DoSeR (Zwicklbauer, SIGIR ’16), on “[...] Maradona won against Mexico.”:
        ● Graph of candidates
        ● Entity2Vec Relatedness
        ● PageRank
      (a simplified PageRank-over-candidates sketch follows below)
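A simplified sketch of the PageRank step over the candidate graph, with edges weighted by entity-embedding (Entity2Vec-style) similarity. This is not DoSeR's exact formulation; the similarity matrix and parameters below are illustrative:

```python
import numpy as np

def pagerank_disambiguate(sim, alpha=0.85, iters=50):
    """PageRank over a candidate graph with similarity-weighted edges;
    returns one importance score per candidate node."""
    n = sim.shape[0]
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalize; rows with no outgoing weight fall back to uniform.
    P = np.divide(sim, row_sums,
                  out=np.full_like(sim, 1.0 / n, dtype=float),
                  where=row_sums > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - alpha) / n + alpha * (P.T @ r)   # one PageRank update
    return r

# Toy graph over 3 candidates: 0 and 1 are mutually similar, 2 is weakly tied.
sim = np.array([[0.0, 0.9, 0.2],
                [0.9, 0.0, 0.1],
                [0.2, 0.1, 0.0]])
print(pagerank_disambiguate(sim))  # candidates 0 and 1 rank highest
```

For each mention, the candidate with the highest score would then be selected.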

  22. Entity Annotation: The Annotation Pipeline (Pruning)
      “Yesterday Maradona won against Mexico.” → Yesterday_(Beatles_song) (0.1), Diego_Maradona (0.8), Mexico_national_football_team (0.7)
      ▷ Remove not-pertinent annotations
      ▷ Clear the text of erroneous annotations
      ▷ Coherence thresholding (here, the low-scoring Yesterday_(Beatles_song) is dropped)

  23. Applications: Web Search Results (Gabrilovich, SIGIR ’16)

  24. Applications: Web Search Results (Gabrilovich, SIGIR ’16)

  25. Applications: Question Answering (Gabrilovich, SIGIR ’16)

  26. Applications: Implicit Questions (Gabrilovich, SIGIR ’16)
      Condition → What does it mean? Symptoms → What do they indicate?

  27. A New Text Representation
      ▷ Originally introduced by (Scaiella, WSDM ’12)
        ○ Widely deployed (Dunietz, EACL ’14; Schuhmacher, WSDM ’14; Ni, WSDM ’15), ...
      ▷ Text = Graph of Entities
      ▷ What about...
      Text → Entity Annotator → Graph of Entities

  28. A New Text Representation
      ▷ Originally introduced by (Scaiella, WSDM ’12)
        ○ Widely deployed (Dunietz, EACL ’14; Schuhmacher, WSDM ’14; Ni, WSDM ’15), ...
      ▷ Text = Graph of Entities
      ▷ What about...
        ○ ...edge weights?
        ○ ...node weights?
      → Work done in the first year
      Text → Entity Annotator → Graph of Entities
      (a small sketch of this representation follows below)
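A small sketch of the graph-of-entities representation, with a stand-in relatedness function; any measure discussed later (e.g. M&W or CoSimRank) could be plugged in, and the entity names are illustrative:

```python
import itertools

def entity_graph(entities, relatedness):
    """Nodes are the entities annotated in the text; each edge carries a
    relatedness weight (the 'edge weights' question above)."""
    return {(a, b): relatedness(a, b)
            for a, b in itertools.combinations(entities, 2)}

toy = entity_graph(
    ["Diego_Maradona", "Mexico_national_football_team", "Argentina"],
    lambda a, b: 0.5)   # constant stand-in for a real relatedness measure
print(toy)
```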

  29. 2. Work done in the first year: Entity Relatedness & Document Aboutness

  30. Entity Relatedness

  31. Entity Relatedness
      Goal: compute how much two entities are related
      Relatedness : Entities × Entities → Real
      ▷ How related are...
        ○ ...Bank and Money?
        ○ ...Wood and Book?
      ▷ Semantic Reasoning:
        ○ Humans: Background Knowledge
        ○ Machines: Knowledge Graph

  32. Entity Relatedness: (A brief list of) Algorithms and Applications
      ▷ Document/Word Similarity
        ○ WikiRelate (Strube, AAAI ’06)
        ○ Explicit Semantic Analysis (Gabrilovich, IJCAI ’07)
          ■ WikiWalk (Yeh, ACL ’09)
          ■ Temporal Semantic Analysis (Radinsky, WWW ’11)
          ■ Concept Graph Representation (Ni, WSDM ’16)
        ○ Milne & Witten (Witten, AAAI ’08)
        ○ Salient Semantic Analysis (Hassan, AAAI ’11)
      ▷ Machine Translation (Agirre, NAACL ’09; Rothe, ACL ’14)
      ▷ Document Classification (Perozzi, WWW ’14; Tang, WWW ’15)
      ▷ ...

  33. Entity Relatedness
      ▷ Two entities are related if...
        ○ ...they are described by related texts (Corpus-based)
          ■ Example: ESA (Gabrilovich, IJCAI ’07)
            ● Concepts grounded in human cognition
            ● As opposed to latent concepts

  34. Entity Relatedness
      ▷ Two entities are related if...
        ○ ...they are described by related texts (Corpus-based)
          ■ Example: ESA (Gabrilovich, IJCAI ’07)
            ● Concepts grounded in human cognition
            ● As opposed to latent concepts
        ○ ...they are referenced by related entities (Graph-based)
          ■ Example: CoSimRank (Rothe, ACL ’14)
      (an ESA-style sketch follows below)
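An ESA-style sketch: a term is represented by its weight profile over Wikipedia concepts, and relatedness is the cosine of two such profiles. The concept "descriptions" below are toy stand-ins for article texts, and idf weighting is omitted for brevity:

```python
import math
from collections import Counter

# Toy stand-ins for Wikipedia article texts (the "concepts").
CONCEPTS = {
    "Bank": "bank money deposit loan money",
    "Money": "money currency bank cash",
    "Wood": "wood tree forest timber",
}

def concept_vector(term):
    """Weight of each concept for `term` (term frequency only)."""
    return {c: Counter(text.split())[term] for c, text in CONCEPTS.items()}

def esa_relatedness(t1, t2):
    """Cosine similarity of the two concept profiles."""
    v1, v2 = concept_vector(t1), concept_vector(t2)
    dot = sum(v1[c] * v2[c] for c in CONCEPTS)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

print(esa_relatedness("bank", "money"))  # high: overlapping concept profiles
print(esa_relatedness("wood", "money"))  # zero: disjoint profiles
```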

  35. Entity Relatedness: CoSimRank (Rothe, ACL ’14)
      ▷ Graph-based approach
      ▷ Relatedness algorithm for nodes in a graph
      ▷ Exploits Random Walks
      ▷ Algorithm (in brief), for e1, e2 ∈ Entities:
        1. Set the damping (start) vectors for e1 and e2
        2. Run an iteration of PageRank
        3. Update the relatedness score
      (a simplified implementation follows below)
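A simplified CoSimRank sketch following the three steps above: start one random walk at each entity, advance both one PageRank-style step per iteration, and accumulate damped dot products of the two distributions. This compresses the original formulation; the graph and parameter values are illustrative:

```python
import numpy as np

def cosimrank(adj, e1, e2, iterations=5, damping=0.8):
    """Relatedness of nodes e1, e2: damped overlap of their walk distributions."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Random-walk transition matrix (row-normalized adjacency).
    P = np.divide(adj, out_deg, out=np.zeros_like(adj, dtype=float),
                  where=out_deg > 0)
    p1 = np.zeros(n); p1[e1] = 1.0   # step 1: damping vector for e1
    p2 = np.zeros(n); p2[e2] = 1.0   # step 1: damping vector for e2
    score, decay = 0.0, 1.0
    for _ in range(iterations):
        p1, p2 = P.T @ p1, P.T @ p2  # step 2: one PageRank-style iteration
        decay *= damping
        score += decay * float(p1 @ p2)   # step 3: update relatedness
    return score

# Toy chain graph 0-1-2-3 (undirected, as a symmetric matrix):
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(cosimrank(A, 0, 2))  # close nodes: higher score
print(cosimrank(A, 0, 3))  # farther nodes: lower score (here 0.0)
```

As in the slides' worked example, the score grows with the number of iterations as the two walks increasingly overlap.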

  36. Entity Relatedness: CoSimRank (Rothe, ACL ’14), worked example. Iteration 0: p_0(e1) and p_0(e2) put all mass on e1 and e2 respectively; Relatedness_0(e1, e2) = 0.0

  37. Entity Relatedness: CoSimRank (Rothe, ACL ’14), worked example. Iteration 1: the mass has spread to the neighbors of e1 and e2; Relatedness_1(e1, e2) = 0.16

  38. Entity Relatedness: CoSimRank (Rothe, ACL ’14), worked example. Iteration 2: Relatedness_2(e1, e2) = 0.33

  39. Entity Relatedness: CoSimRank (Rothe, ACL ’14), worked example. Iteration 3: Relatedness_3(e1, e2) = 0.47

  40. Entity Relatedness: CoSimRank (Rothe, ACL ’14), a less-related pair. Iteration 0: p_0(e1) and p_0(e3) put all mass on e1 and e3; Relatedness_0(e1, e3) = 0.0

  41. Entity Relatedness: CoSimRank (Rothe, ACL ’14), a less-related pair. Iteration 3, with p_3(e1) and p_3(e3): Relatedness_3(e1, e3) = 0.13, notably lower than the 0.47 reached by the e1, e2 pair
