Algorithms and Applications for Web-Scale Knowledge Graphs
Marco Ponza
Supervisor: Prof. Paolo Ferragina
Menu
1. Entity Annotation
   ○ The Modeling of Knowledge
   ○ Terminology
   ○ The Annotation Pipeline
   ○ Applications
   ○ A New Text Representation
2. Work done in the first year
   ○ Entity Relatedness
   ○ Document Aboutness
3. Future Work
1. Entity Annotation
The Modeling of Knowledge
▷ Classical approaches
  ○ Document Knowledge = Words
  ○ Bag-of-words (aka BoW)
  ○ Vector Space Model (aka VSM) (Salton, 1971)
[Figure: a document goes through stop-word removal and stemming, then counting, scaling and normalization, yielding a Vector Space Model over the document’s words]
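A minimal sketch of this classical modeling, assuming whitespace tokenization and a toy stop-word list (all names and data below are illustrative):

```python
# Minimal sketch of the classical BoW/VSM pipeline (illustrative only):
# tokenize, drop stop-words, count terms, and place documents in a shared
# vector space where each dimension is a word.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}  # toy stop-word list

def bow(text):
    """Bag-of-words: lowercase tokens with stop-words removed, then counted."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return Counter(t for t in tokens if t and t not in STOP_WORDS)

def to_vector(counts, vocabulary):
    """Project a bag-of-words onto a fixed vocabulary (the VSM dimensions)."""
    return [counts.get(word, 0) for word in vocabulary]

docs = ["The jaguar is a feline of the Americas.",
        "Jaguar unveiled a new car in the UK."]
bags = [bow(d) for d in docs]
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]
print(vocabulary)
print(vectors)  # note: both documents share the ambiguous dimension "jaguar"
```

Note how both documents end up sharing the single dimension "jaguar", which is exactly the ambiguity issue discussed next.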
The Modeling of Knowledge
▷ Well-known issues (Jurafsky, ‘00)
  ○ Ambiguity (Polysemy and Synonymy)
    Example: does the mention “Jaguar” refer to Jaguar (the feline) or to Jaguar_Cars?
The Modeling of Knowledge
▷ Well-known issues (Jurafsky, ‘00)
  ○ Ambiguity (Polysemy and Synonymy)
  ○ Semantic Connections
    Example: Barack_Obama and United_States are distinct words, yet they are semantically connected entities.
The Modeling of Knowledge
▷ Well-known issues (Jurafsky, ‘00)
  ○ Ambiguity (Polysemy and Synonymy)
  ○ Semantic Connections
▷ Algorithmic solutions
  ○ Latent Approaches (e.g. LSI/LSA, Word2Vec)
    ■ Unintelligible for humans (Gabrilovich, IJCAI ‘07)
  ○ “Knowledge is Power” Hypothesis (Lenat, ‘91; Gabrilovich, SIGIR ‘16)
    ■ Semantic and unambiguous concepts
    ■ Depend on the design of Entity Annotators
Entity Annotation Terminology ▷ Wikipedia Knowledge Graph ▷ Node?
Entity Annotation Terminology ▷ Wikipedia Knowledge Graph ▷ Node: Wikipedia Page (Entity) ▷ Link?
Entity Annotation Terminology
▷ Wikipedia Knowledge Graph
▷ Node: Wikipedia Page (Entity)
▷ Link: Wikipedia Hyperlink
Goal: enrich a text T with proper annotations, where Annotation = (mention, entity)
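As a reference for the following slides, here is a tiny, hypothetical data model for such annotations (field names are illustrative, not taken from any specific annotator):

```python
# Tiny sketch of the annotation data model assumed throughout: an annotation
# pairs a mention (a span of the input text) with a Wikipedia entity.
from dataclasses import dataclass

@dataclass
class Annotation:
    mention: str            # the surface form spotted in the text, e.g. "Maradona"
    entity: str             # the Wikipedia page it refers to, e.g. "Diego_Maradona"
    start: int              # character offset of the mention in the input text
    end: int
    coherence: float = 0.0  # score assigned later by the disambiguation step

text = "Yesterday Maradona won against Mexico."
ann = Annotation("Maradona", "Diego_Maradona", 10, 18, 0.8)
```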
Entity Annotation: The Annotation Pipeline
Input Text → Entity Annotator (Spotting → Disambiguation → Pruning) → Annotated Text
  ● Spotting: 1. identify mentions (spots); 2. retrieve candidate entities
  ● Disambiguation: assign the most pertinent entity to each spot
  ● Pruning: remove non-pertinent annotations
Entity Annotation: The Annotation Pipeline (Spotting)
Mention Detection and Candidate Generation on “Yesterday Maradona won against Mexico.”
  ● Yesterday → Yesterday_(Time), Yesterday_(Beatles_song), Yesterday_(Guns_N_Roses_song), ...
  ● Maradona → Diego_Maradona, Diego_Sinagra, Maradona_by_Kusturica, ...
  ● Mexico → Mexico, Mexico,_New_York, Mexico_national_football_team, ...
Entity Annotation: The Annotation Pipeline (Spotting)
1. Mention Detection
   ○ Named Entity Recognition (aka NER)
   ○ N-gram generation
2. Candidate Generation
   ○ Gazetteer: { mention → entities }
   ○ How?
Entity Annotation: The Annotation Pipeline (Spotting)
1. Mention Detection
   ○ Named Entity Recognition (aka NER)
   ○ N-gram generation
2. Candidate Generation
   ○ Gazetteer: { mention → entities }
   ○ How? Wikipedia anchor texts!
   ○ Ranking (+ Thresholding)
     ● Commonness (Ferragina, CIKM ’10; Guo, CIKM ’14)
     ● Entity-context Similarity (Zwicklbauer, SIGIR ’16)
     ● ...
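A minimal sketch of candidate generation with commonness-based ranking and thresholding, assuming a toy gazetteer built from Wikipedia anchor texts (the counts and the threshold are invented, not real Wikipedia statistics):

```python
# Sketch of candidate generation with a gazetteer of Wikipedia anchor texts.
# Commonness(e | m) = fraction of times the anchor text m links to entity e.
from collections import defaultdict

# mention -> {entity: number of Wikipedia anchors with that text linking to it}
ANCHOR_COUNTS = {
    "maradona": {"Diego_Maradona": 950, "Diego_Sinagra": 30, "Maradona_by_Kusturica": 20},
    "mexico":   {"Mexico": 700, "Mexico_national_football_team": 250, "Mexico,_New_York": 50},
}

def candidates(mention, threshold=0.05):
    """Return candidate entities ranked by commonness, pruning rare ones."""
    counts = ANCHOR_COUNTS.get(mention.lower(), {})
    total = sum(counts.values())
    scored = {e: c / total for e, c in counts.items()} if total else {}
    return sorted(((e, s) for e, s in scored.items() if s >= threshold),
                  key=lambda x: -x[1])

print(candidates("Maradona"))   # only ('Diego_Maradona', 0.95) survives the threshold
print(candidates("Mexico"))     # all three candidates are kept, ranked by commonness
```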
Entity Annotation: The Annotation Pipeline (Disambiguation)
The candidates produced by Spotting for “Yesterday Maradona won against Mexico.” now enter the Disambiguation step:
  ● Yesterday → Yesterday_(Time), Yesterday_(Beatles_song), Yesterday_(Guns_N_Roses_song), ...
  ● Maradona → Diego_Maradona, Diego_Sinagra, Maradona_by_Kusturica, ...
  ● Mexico → Mexico, Mexico,_New_York, Mexico_national_football_team, ...
Entity Annotation: The Annotation Pipeline (Disambiguation)
Example: “Yesterday Maradona won against Mexico.”
  ● Yesterday → Yesterday_(Beatles_song), score 0.1
  ● Maradona → Diego_Maradona, score 0.8
  ● Mexico → Mexico_national_football_team, score 0.7
▷ Spots have been disambiguated
  ○ Ambiguous lexical elements (words) are now labeled with unambiguous concepts
▷ Finally, coherence scores are assigned
Entity Annotation: The Annotation Pipeline (Disambiguation)
A sample of disambiguation systems:
  ● (Cucerzan, ACL ‘07)
  ● (Mihalcea, CIKM ‘07)
  ● (Scaiella, CIKM ‘10)
  ● (Mendes, SemSys ‘11)
  ● AIDA (Nguyen, LDOW ‘14)
  ● (Moro, ACL ‘14)
  ● (Piccinno, SIGIR ‘14)
  ● DoSeR (Zwicklbauer, SIGIR ’16)
  ● PBoH (Ganea, WWW ‘16)
  ● ...
Entity Annotation: The Annotation Pipeline (Disambiguation)
Algorithm: (Scaiella, CIKM ‘10; Piccinno, SIGIR ‘14) on “[...] Maradona won against Mexico.”
  ● Voting Scheme
  ● M&W / Jaccard Relatedness
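A simplified sketch in the spirit of this voting scheme: every candidate of a spot is voted by the candidates of the other spots, with votes weighted by the Milne & Witten in-link relatedness (the in-link sets and the Wikipedia-size constant below are toy values):

```python
# Simplified voting-based disambiguation sketch (in the spirit of
# Scaiella, CIKM '10; Piccinno, SIGIR '14), using Milne & Witten relatedness.
import math

W = 5_000_000  # assumed total number of Wikipedia pages (toy constant)

INLINKS = {  # entity -> set of pages linking to it (toy data)
    "Diego_Maradona": {"p1", "p2", "p3", "p4"},
    "Mexico_national_football_team": {"p2", "p3", "p5"},
    "Mexico": {"p5", "p6"},
    "Diego_Sinagra": {"p7"},
}

def mw_relatedness(a, b):
    """Milne & Witten relatedness based on shared in-links."""
    A, B = INLINKS[a], INLINKS[b]
    common = len(A & B)
    if common == 0:
        return 0.0
    num = math.log(max(len(A), len(B))) - math.log(common)
    den = math.log(W) - math.log(min(len(A), len(B)))
    return max(0.0, 1.0 - num / den)

def disambiguate(spots):
    """spots: {mention: [candidate entities]} -> {mention: best entity}."""
    best = {}
    for m, cands in spots.items():
        others = [c for m2, cs in spots.items() if m2 != m for c in cs]
        scores = {c: sum(mw_relatedness(c, o) for o in others) / max(len(others), 1)
                  for c in cands}
        best[m] = max(scores, key=scores.get)
    return best

spots = {"Maradona": ["Diego_Maradona", "Diego_Sinagra"],
         "Mexico": ["Mexico", "Mexico_national_football_team"]}
print(disambiguate(spots))  # Diego_Maradona and Mexico_national_football_team win
```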
Entity Annotation: The Annotation Pipeline (Disambiguation)
Algorithm: DoSeR (Zwicklbauer, SIGIR ’16) on “[...] Maradona won against Mexico.”
  ● Graph of candidates
Entity Annotation: The Annotation Pipeline (Disambiguation)
Algorithm: DoSeR (Zwicklbauer, SIGIR ’16) on “[...] Maradona won against Mexico.”
  ● Graph of candidates
  ● Entity2Vec Relatedness
Entity Annotation: The Annotation Pipeline (Disambiguation)
Algorithm: DoSeR (Zwicklbauer, SIGIR ’16) on “[...] Maradona won against Mexico.”
  ● Graph of candidates
  ● Entity2Vec Relatedness
  ● PageRank
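A rough sketch of this graph-based strategy: candidates become nodes, edges are weighted by embedding similarity, and PageRank picks the most central candidate per spot (the embeddings are toy vectors standing in for entity2vec, and the whole setup is illustrative rather than the actual DoSeR implementation):

```python
# DoSeR-like sketch: PageRank over a weighted graph of candidate entities.
import numpy as np
import networkx as nx

EMB = {  # toy "entity2vec" embeddings
    "Diego_Maradona":                np.array([0.9, 0.1, 0.0]),
    "Diego_Sinagra":                 np.array([0.5, 0.0, 0.8]),
    "Mexico":                        np.array([0.1, 0.9, 0.1]),
    "Mexico_national_football_team": np.array([0.8, 0.3, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def doser_like(spots):
    """spots: {mention: [candidates]} -> {mention: best candidate}."""
    G = nx.Graph()
    for m, cands in spots.items():
        for c in cands:
            G.add_node(c, mention=m)
    nodes = list(G.nodes)
    # connect candidates of different mentions, weighted by embedding similarity
    for u in nodes:
        for v in nodes:
            if u < v and G.nodes[u]["mention"] != G.nodes[v]["mention"]:
                G.add_edge(u, v, weight=max(cosine(EMB[u], EMB[v]), 0.0))
    pr = nx.pagerank(G, weight="weight")
    return {m: max(cands, key=pr.get) for m, cands in spots.items()}

spots = {"Maradona": ["Diego_Maradona", "Diego_Sinagra"],
         "Mexico": ["Mexico", "Mexico_national_football_team"]}
print(doser_like(spots))  # the coherent pair of candidates gets the highest PageRank
```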
Entity Annotation: The Annotation Pipeline (Pruning)
Example: “Yesterday Maradona won against Mexico.”
  ● Yesterday → Yesterday_(Beatles_song), score 0.1
  ● Maradona → Diego_Maradona, score 0.8
  ● Mexico → Mexico_national_football_team, score 0.7
▷ Remove non-pertinent annotations
▷ Clear the text from erroneous annotations
▷ Coherence thresholding
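A minimal sketch of the pruning step, assuming a fixed coherence threshold (the threshold value is illustrative; the scores mirror the toy example above):

```python
# Minimal pruning sketch: drop annotations whose coherence score falls
# below a threshold.
def prune(annotations, threshold=0.3):
    """annotations: list of (mention, entity, coherence) triples."""
    return [a for a in annotations if a[2] >= threshold]

annotated = [("Yesterday", "Yesterday_(Beatles_song)", 0.1),
             ("Maradona", "Diego_Maradona", 0.8),
             ("Mexico", "Mexico_national_football_team", 0.7)]
print(prune(annotated))  # the low-coherence "Yesterday" annotation is removed
```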
Applications Web Search Results (Gabrilovich, SIGIR ’16)
Applications Question Answering (Gabrilovich, SIGIR ’16)
Applications Implicit Questions (Gabrilovich, SIGIR ’16) Condition → What does it mean? Symptoms → What do they indicate?
A New Text Representation
▷ Originally introduced by (Scaiella, WSDM ‘12)
  ○ Widely deployed (Dunietz, EACL ‘14; Schuhmacher, WSDM '14; Ni, WSDM ‘15), ...
▷ Text = Graph of Entities
▷ What about…
[Figure: Text → Entity Annotator → Graph of Entities]
A New Text Representation
▷ Originally introduced by (Scaiella, WSDM ‘12)
  ○ Widely deployed (Dunietz, EACL ‘14; Schuhmacher, WSDM '14; Ni, WSDM ‘15), ...
▷ Text = Graph of Entities
▷ What about…
  ○ ...edge weights?
  ○ ...node weights?
  → Work done in the first year
[Figure: Text → Entity Annotator → Graph of Entities]
2. Work done in the first year Entity Relatedness & Document Aboutness
Entity Relatedness
Entity Relatedness
Goal: compute how much two entities are related
  Relatedness : Entities × Entities → Real
▷ How related are...
  ○ ...Bank and Money?
  ○ ...Wood and Book?
▷ Semantic Reasoning:
  ○ Humans: Background Knowledge
  ○ Machines: Knowledge Graph
Entity Relatedness: (A brief list of) Algorithms and Applications
▷ Document/Word Similarity
  ○ WikiRelate (Strube, AAAI ‘06)
  ○ Explicit Semantic Analysis (Gabrilovich, IJCAI ‘07)
    ■ WikiWalk (Yeh, ACL ‘09)
    ■ Temporal Semantic Analysis (Radinsky, WWW ‘11)
    ■ Concept Graph Representation (Ni, WSDM ‘16)
  ○ Milne & Witten (Witten, AAAI ‘08)
  ○ Salient Semantic Analysis (Hassan, AAAI ‘11)
▷ Machine Translation (Agirre, NAACL ‘09; Rothe, ACL ‘14)
▷ Document Classification (Perozzi, WWW ‘14; Tang, WWW ‘15)
▷ ...
Entity Relatedness
▷ Two entities are related if…
  ○ ...they are described by related texts (Corpus-based)
    ■ Example: ESA (Gabrilovich, IJCAI ‘07)
      ● Concepts grounded in human cognition
      ● Opposite to latent concepts
Entity Relatedness
▷ Two entities are related if…
  ○ ...they are described by related texts (Corpus-based)
    ■ Example: ESA (Gabrilovich, IJCAI ‘07)
      ● Concepts grounded in human cognition
      ● Opposite to latent concepts
  ○ ...they are referenced by related entities (Graph-based)
    ■ Example: CoSimRank (Rothe, ACL ‘14)
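A toy sketch of the corpus-based idea behind ESA: terms (or entities) are represented as weighted vectors over Wikipedia concepts and compared with cosine similarity (the concept space and weights below are invented for illustration):

```python
# Corpus-based relatedness sketch in the spirit of ESA (Gabrilovich, IJCAI '07).
import math

# term -> {Wikipedia concept: TF-IDF-like weight}  (toy "interpretation vectors")
CONCEPT_VECTORS = {
    "bank":  {"Bank": 0.9, "Money": 0.6, "River": 0.3},
    "money": {"Money": 0.9, "Bank": 0.5, "Currency": 0.8},
    "wood":  {"Wood": 0.9, "Forest": 0.7, "Tree": 0.6},
    "book":  {"Book": 0.9, "Paper": 0.4, "Library": 0.5},
}

def esa_relatedness(t1, t2):
    """Cosine similarity between the two concept vectors."""
    v1, v2 = CONCEPT_VECTORS[t1], CONCEPT_VECTORS[t2]
    dot = sum(w * v2.get(c, 0.0) for c, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2)

print(esa_relatedness("bank", "money"))  # relatively high: shared concepts
print(esa_relatedness("wood", "book"))   # zero here: no shared concepts in the toy data
```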
Entity Relatedness: CoSimRank (Rothe, ACL ‘14)
▷ Graph-based approach
▷ Relatedness algorithm for nodes in a graph
▷ Exploits Random Walks
▷ Algorithm (in brief), for e1, e2 ∈ Entities:
  1. Set the damping vectors for e1 and e2
  2. Run an iteration of PageRank
  3. Update the relatedness score
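A minimal sketch of a CoSimRank-style computation of these three steps, assuming an undirected toy graph, a damping factor c = 0.8 and three iterations (the graph and all values are illustrative and unrelated to the walkthrough that follows):

```python
# CoSimRank-style relatedness sketch: compare the random-walk distributions
# started at two nodes, iteration by iteration, with geometric damping.
import numpy as np

def cosimrank(adj, e1, e2, c=0.8, iterations=3):
    """adj: adjacency matrix; e1, e2: node indices. Returns the relatedness score."""
    n = adj.shape[0]
    # column-stochastic transition matrix: one walk step spreads mass over neighbours
    T = adj / adj.sum(axis=0, keepdims=True)
    p, q = np.zeros(n), np.zeros(n)
    p[e1], q[e2] = 1.0, 1.0          # step 1: set the starting (damping) vectors
    score = p @ q                     # iteration 0: inner product of the two vectors
    for k in range(1, iterations + 1):
        p, q = T @ p, T @ q           # step 2: one PageRank-style walk iteration
        score += (c ** k) * (p @ q)   # step 3: update the relatedness score
    return score

# toy graph: edges 0-1, 0-2, 1-2, 2-3
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(cosimrank(adj, 0, 1))  # relatedness of nodes 0 and 1
print(cosimrank(adj, 0, 3))  # relatedness of nodes 0 and 3
```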
Entity Relatedness: CoSimRank (Rothe, ACL ‘14)
Walkthrough on an example graph (the slides show the probability vectors p_k(e1), p_k(e2), p_k(e3) over the graph nodes at each iteration):
  ● Relatedness_0(e1, e2) = 0.0, Relatedness_1(e1, e2) = 0.16, Relatedness_2(e1, e2) = 0.33, Relatedness_3(e1, e2) = 0.47
  ● For a less related pair: Relatedness_0(e1, e3) = 0.0 and Relatedness_3(e1, e3) = 0.13