
Natural Language Understanding using Knowledge Bases and Random Walks



  1. Natural Language Understanding using Knowledge Bases and Random Walks Eneko Agirre ixa2.si.ehu.eus/eneko IXA NLP Group University of the Basque Country Darmstadt, 2015 Agirre (UBC) NLU using KBs and Random Walks Feb. 2015 1 / 43

  2. Algorithms on Large Graphs: WWW, random walks, PageRank and Google (source: http://opte.org)


  4. Algorithms on Large Graphs: Linked Data

  5. Algorithms on Large Graphs: Wikipedia (DBpedia)

  6. Algorithms on Large Graphs: WordNet

  7. Algorithms on Large Graphs: Unified Medical Language System

  8. Algorithms on Large Graphs, sources: http://sixdegrees.hu/ http://www2.research.att.com/~yifanhu/ http://www.cise.ufl.edu/research/sparse/matrices/Gleich/ http://www.ebremer.com/

  9. Text Understanding
      - Understanding of broad language, what's behind the surface strings
      - Example: "Barcelona boss says that Jose Mourinho is 'the best coach in the world'"


  12. Text Understanding: Knowledge Bases and Graph algorithms
      - How far can we go with current KBs and graph-based algorithms?
      - Ground words in context to KB concepts and instances
        - Word Sense Disambiguation
        - Named Entity Disambiguation, Entity Linking, Wikification
      - Similarity between concepts, instances and words
      - Improve ad-hoc information retrieval
      - Applied to WordNet(s), UMLS, Wikipedia
      - Excellent results
      - Open source software and data: http://ixa2.si.ehu.eus/ukb/


  15. Outline
      1. WordNet, PageRank and Personalized PageRank
      2. Random walks for WSD
      3. Random walks for WSD (biomedical domain)
      4. Random walks for NED
      5. Random walks for similarity
      6. Similarity and Information Retrieval
      7. Conclusions

  16. WordNet, PageRank and Personalized PageRank

  17. WordNet, PageRank and Personalized PageRank
      - WordNet is the most widely used hierarchically organized lexical database for English (Fellbaum, 1998)
      - Broad coverage of nouns, verbs, adjectives, adverbs
      - Main unit: synset (concept)
        - Example: coach#1, manager#3, handler#2: someone in charge of training an athlete or a team
      - Relations between concepts: synonymy (built-in), hyperonymy, antonymy, meronymy, entailment, derivation, gloss
      - Closely linked versions in several languages

  18. WordNet: representing WordNet as a graph
      - Nodes represent concepts
      - Edges represent relations (undirected)
      - In addition, directed edges from words to corresponding concepts (senses)
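The graph representation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the UKB implementation: the function name `build_graph` and the toy edges are assumptions, and the sense labels merely reuse the `coach#n1` style from the slides.

```python
from collections import defaultdict


def build_graph(relations, word_senses):
    """Build adjacency lists for a WordNet-style graph (hypothetical helper)."""
    out_edges = defaultdict(set)
    # Relations between concepts are undirected: store both directions.
    for concept_a, concept_b in relations:
        out_edges[concept_a].add(concept_b)
        out_edges[concept_b].add(concept_a)
    # Word-to-sense edges are directed: the word points at the concept only.
    for word, senses in word_senses.items():
        for sense in senses:
            out_edges[word].add(sense)
            out_edges[sense]  # touch the key so the concept exists as a node
    return {node: sorted(neighbours) for node, neighbours in out_edges.items()}


# Toy data chosen for illustration; these edges are assumed, not read from WordNet.
relations = [
    ("coach#n1", "trainer#n1"),
    ("coach#n2", "teacher#n1"),
    ("coach#n5", "public_transport#n1"),
]
word_senses = {"coach": ["coach#n1", "coach#n2", "coach#n5"]}

graph = build_graph(relations, word_senses)
print(graph["coach"])     # senses reachable from the word node
print(graph["coach#n1"])  # concept neighbours; no edge back to the word
```

Keeping the word-to-sense edges one-directional means a word node can inject probability mass into its senses without receiving any back from them.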

  19. WordNet
      [Figure: fragment of the WordNet graph around the word "coach". Nodes: coach#n1, coach#n2, coach#n5, managership#n3, handle#v6, trainer#n1, sport#n1, teacher#n1, tutorial#n1, fleet#n2, public_transport#n1, seat#n1. Edge labels: derivation, hyperonym, domain, holonym.]

  20. Random Walks: PageRank
      - Given a graph, ranks nodes according to their relative structural importance
      - If an edge from $n_i$ to $n_j$ exists, a vote from $n_i$ to $n_j$ is produced
        - Its strength depends on the rank of $n_i$
        - The more important $n_i$ is, the more strength its votes will have
      - PageRank is more commonly viewed as the result of a random walk process
        - The rank of $n_i$ represents the probability of a random walk over the graph ending on $n_i$, at a sufficiently large time
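The random-walk reading of PageRank can be made concrete with a small simulation that counts how often a walker visits each node. This is only a sketch under stated assumptions: the function name `simulate_surfer`, the toy graph, the step count, and the value `c=0.85` are illustrative choices, and the occasional uniform jump anticipates the teleport term introduced with the PageRank equation.

```python
import random


def simulate_surfer(out_edges, steps=20000, c=0.85, seed=0):
    """Estimate node ranks as visit frequencies of a random walker."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    nodes = sorted(out_edges)
    visits = {node: 0 for node in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        targets = out_edges[current]
        if targets and rng.random() < c:
            # With probability c, follow one outgoing edge chosen at random.
            current = rng.choice(targets)
        else:
            # Otherwise (or when stuck), jump to a uniformly random node.
            current = rng.choice(nodes)
        visits[current] += 1
    return {node: count / steps for node, count in visits.items()}


# Toy graph: b, c and d all "vote" for a; a votes for b.
toy = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a"]}
for node, freq in sorted(simulate_surfer(toy).items()):
    print(node, round(freq, 3))
```

In this toy graph, nodes `c` and `d` have no incoming edges, so the walker only reaches them through random jumps, while `a` collects the votes of three nodes.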

  21. Random Walks: PageRank
      - $G$: graph with $N$ nodes $n_1, \ldots, n_N$
      - $d_i$: outdegree of node $i$
      - $M$: $N \times N$ matrix

        $$M_{ji} = \begin{cases} \dfrac{1}{d_i} & \text{if an edge from } i \text{ to } j \text{ exists} \\ 0 & \text{otherwise} \end{cases}$$

      - PageRank equation:

        $$\mathbf{Pr} = c\,M\,\mathbf{Pr} + (1 - c)\,\mathbf{v}$$

        - First term, $c\,M\,\mathbf{Pr}$: the surfer follows edges
        - Second term, $(1 - c)\,\mathbf{v}$: the surfer randomly jumps to any node (teleport)
      - $c$: damping factor, the way in which these two terms are combined
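The PageRank equation above can be solved by repeatedly applying its right-hand side until the vector stops changing (power iteration). The sketch below is a minimal pure-Python version, not the implementation used in the talk: the name `pagerank`, the default `c=0.85`, the fixed iteration count, and the treatment of nodes without outgoing edges (their mass is redistributed according to v) are all assumptions.

```python
def pagerank(out_edges, v=None, c=0.85, iterations=100):
    """Iterate Pr = c * M * Pr + (1 - c) * v on an adjacency-list graph."""
    nodes = sorted(out_edges)
    if v is None:
        # Plain PageRank: uniform teleport vector with elements 1/N.
        v = {node: 1.0 / len(nodes) for node in nodes}
    pr = dict(v)
    for _ in range(iterations):
        # Teleport term: (1 - c) * v.
        nxt = {node: (1.0 - c) * v[node] for node in nodes}
        for i in nodes:
            targets = out_edges[i]
            if targets:
                # Edge term: node i gives c * Pr_i / d_i to each successor j.
                share = c * pr[i] / len(targets)
                for j in targets:
                    nxt[j] += share
            else:
                # Assumption: a node with no out-edges spreads its mass via v.
                for j in nodes:
                    nxt[j] += c * pr[i] * v[j]
        pr = nxt
    return pr


toy = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a"]}
for node, rank in sorted(pagerank(toy).items()):
    print(node, round(rank, 4))
```

Each pass keeps the ranks summing to 1, because every node hands on exactly its own share `c * Pr_i` and the teleport term adds the remaining `1 - c`.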


  26. Random Walks: Personalized PageRank

      $$\mathbf{Pr} = c\,M\,\mathbf{Pr} + (1 - c)\,\mathbf{v}$$

      - PageRank: $\mathbf{v}$ is a stochastic normalized vector with all elements equal to $1/N$
        - Equal probabilities to all nodes in case of random jumps
      - Personalized PageRank: non-uniform $\mathbf{v}$ (Haveliwala 2002)
        - Assign stronger probabilities to certain kinds of nodes
        - Bias PageRank to prefer these nodes
      - For example, if we concentrate all the mass of $\mathbf{v}$ on node $n_i$:
        - All random jumps return to $n_i$
        - The rank of $n_i$ will be high
        - The high rank of $n_i$ will make all the nodes in its vicinity also receive a high rank
        - The importance of $n_i$ given by the initial $\mathbf{v}$ spreads along the graph
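The effect of a non-uniform v can be checked on a small example by concentrating all teleport mass on one node and comparing with the uniform case. The sketch below is illustrative only: `personalized_pagerank`, its `seeds` parameter, the default `c=0.85`, and the five-node path graph are assumptions made for the example, not part of the original slides or of UKB.

```python
def personalized_pagerank(out_edges, seeds, c=0.85, iterations=100):
    """Iterate Pr = c * M * Pr + (1 - c) * v, with v concentrated on `seeds`."""
    nodes = sorted(out_edges)
    seed_set = set(seeds)
    # Non-uniform teleport vector: equal mass on each seed, zero elsewhere.
    v = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    pr = dict(v)
    for _ in range(iterations):
        nxt = {n: (1.0 - c) * v[n] for n in nodes}
        for i in nodes:
            targets = out_edges[i]
            if targets:
                share = c * pr[i] / len(targets)
                for j in targets:
                    nxt[j] += share
            else:
                # Assumption: nodes without out-edges return their mass via v.
                for j in nodes:
                    nxt[j] += c * pr[i] * v[j]
        pr = nxt
    return pr


# Undirected path a - b - c - d - e, stored as edges in both directions.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
biased = personalized_pagerank(path, seeds=["a"])
uniform = personalized_pagerank(path, seeds=list(path))
for node in sorted(path):
    print(node, round(uniform[node], 3), round(biased[node], 3))
```

With all mass on `a`, every random jump returns to `a`, so `a` and its close neighbours gain rank at the expense of the far end of the path. Passing every node as a seed recovers the uniform vector with elements 1/N, which is plain PageRank.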
