Understanding Text with Knowledge-Bases and Random Walks Eneko Agirre ixa2.si.ehu.es/eneko IXA NLP Group University of the Basque Country MAVIR, 2011 Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 1 / 54
Random Walks on Large Graphs
  WWW, PageRank and Google
  source: http://opte.org
Random Walks on Large Graphs
  Linked Data
  Wikipedia (DBpedia)
  WordNet
  Unified Medical Language System
  sources: http://sixdegrees.hu/ http://www2.research.att.com/~yifanhu/ http://www.cise.ufl.edu/research/sparse/matrices/Gleich/ http://www.ebremer.com/
Text Understanding
  Understanding of broad language, what's behind the surface strings
    Barcelona boss says that Jose Mourinho is 'the best coach in the world'
  End systems that we would like to build:
    natural dialogue, speech recognition, machine translation
    improving parsing, semantic role labeling, information retrieval, question answering
Text Understanding
  From string to semantic representation (First Order Logic)
    Barcelona coach praises Jose Mourinho.
    There exist e1, x1, x2, x3 such that FC Barcelona(x1) and coach:n:1(x2) and praise:v:2(e1, x2, x3) and José Mourinho(x3)
  Disambiguation: concepts, entities and semantic roles
  Quantifiers, modality, negation, etc.
  Inference and reasoning
    Barcelona coach praises Mourinho ∼ Guardiola honors Mourinho
  ... with respect to some Knowledge Base
Text Understanding: Knowledge Bases and Random Walks
  Focus on the following tasks on inference and disambiguation:
    Map words in context to KB concepts (Word Sense Disambiguation)
    Similarity between concepts and words
    Similarity to improve ad-hoc information retrieval
  Applied to WordNet(s), UMLS, Wikipedia
  Excellent results
  Open source software and data: http://ixa2.si.ehu.es/ukb/
Outline
  1 WordNet, PageRank and Personalized PageRank
  2 Random walks for WSD
  3 Adapting WSD to domains
  4 WSD on the biomedical domain
  5 Random walks for similarity
  6 Similarity and Information Retrieval
  7 Conclusions
WordNet, PageRank and Personalized PageRank (with Aitor Soroa)
  WordNet is the most widely used hierarchically organized lexical database for English (Fellbaum, 1998)
  Broad coverage of nouns, verbs, adjectives, adverbs
  Main unit: synset (concept)
    coach#1, manager#3, handler#2: someone in charge of training an athlete or a team.
  Relations between concepts: synonymy (built-in), hyperonymy, antonymy, meronymy, entailment, derivation, gloss
  Closely linked versions in several languages
WordNet
  Example of hypernym relations: coach#1 → trainer → leader → person → organism → ... → entity
  synonyms: manager, handler
  gloss words (and synsets): charge, train (verb), athlete, team
  hyponyms: baseball coach, basketball coach, conditioner, football coach
  instance: John McGraw
  domain: sport, athletics
  derivation: coach (verb), managership, manage (verb), handle (verb)
WordNet
  Representing WordNet as a graph:
    Nodes represent concepts
    Edges represent relations (undirected)
    In addition, directed edges from words to corresponding concepts (senses)
WordNet
  [Figure: fragment of the WordNet graph around the word "coach". Nodes: coach (word), coach#n1, coach#n2, coach#n5, trainer#n1, teacher#n1, sport#n1, managership#n3, handle#v6, tutorial#n1, fleet#n2, public_transport#n1, seat#n1. Edge labels: hyperonym, holonym, derivation, domain.]
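The graph representation described on the previous slides can be sketched in a few lines of Python. This is only an illustrative sketch: the relation triples are an assumed reading of the "coach" fragment (the exact edges of the figure are not recoverable from the text), and `build_graph`, `RELATIONS` and `WORD_SENSES` are hypothetical names, not part of the UKB software.

```python
# Illustrative relation triples (assumed, not taken verbatim from WordNet).
RELATIONS = [
    ("coach#n1", "hyperonym", "trainer#n1"),
    ("coach#n1", "domain", "sport#n1"),
    ("coach#n1", "derivation", "managership#n3"),
    ("coach#n2", "hyperonym", "teacher#n1"),
    ("coach#n5", "hyperonym", "public_transport#n1"),
]

# Words and the concepts (senses) they can refer to.
WORD_SENSES = {"coach": ["coach#n1", "coach#n2", "coach#n5"]}


def build_graph(relations, word_senses):
    """Return an adjacency map {node: set of neighbours}."""
    graph = {}
    # Relations between concepts are undirected: add both directions.
    for source, _label, target in relations:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    # Word-to-sense edges are directed: only word -> concept is added.
    for word, senses in word_senses.items():
        for sense in senses:
            graph.setdefault(word, set()).add(sense)
            graph.setdefault(sense, set())
    return graph


graph = build_graph(RELATIONS, WORD_SENSES)
```

Relation labels are dropped here because the slides only state that edges represent relations; a weighted or typed variant would keep them.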
Random Walks: PageRank
  Given a graph, ranks nodes according to their relative structural importance
  If an edge from n_i to n_j exists, a vote from n_i to n_j is produced
    Strength depends on the rank of n_i
    The more important n_i is, the more strength its votes will have.
  PageRank is more commonly viewed as the result of a random walk process
    Rank of n_i represents the probability of a random walk over the graph ending on n_i, at a sufficiently large time.
Random Walks: PageRank
  G: graph with N nodes n_1, ..., n_N
  d_i: outdegree of node i
  M: N × N matrix

    $$M_{ji} = \begin{cases} \dfrac{1}{d_i} & \text{if an edge from } i \text{ to } j \text{ exists} \\ 0 & \text{otherwise} \end{cases}$$

  PageRank equation:

    $$\mathbf{Pr} = c\,M\,\mathbf{Pr} + (1 - c)\,\mathbf{v}$$

    first term, c M Pr: surfer follows edges
    second term, (1 − c) v: surfer randomly jumps to any node (teleport)
  c: damping factor: the way in which these two terms are combined
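The PageRank equation can be solved by repeatedly applying its right-hand side until the ranks stop changing (power iteration). The sketch below is a minimal pure-Python version under stated assumptions: v is taken to be uniform (1/N per node), c = 0.85 is a commonly used value rather than one given on the slide, a fixed number of iterations replaces a convergence test, and nodes without outgoing edges redistribute their rank according to v. The function name `pagerank` and the toy graph are hypothetical.

```python
def pagerank(out_edges, c=0.85, iterations=50):
    """Power iteration for Pr = c*M*Pr + (1 - c)*v.

    out_edges maps every node to the list of nodes it links to;
    every link target must itself be a key of out_edges.
    """
    nodes = sorted(out_edges)
    n = len(nodes)
    v = {node: 1.0 / n for node in nodes}  # assumed uniform teleport vector
    pr = dict(v)                           # start from the uniform distribution
    for _ in range(iterations):
        # Teleport term: (1 - c) * v
        new_pr = {node: (1.0 - c) * v[node] for node in nodes}
        # Edge-following term: c * M * Pr, with M_ji = 1/d_i
        for i in nodes:
            targets = out_edges[i]
            if targets:
                share = c * pr[i] / len(targets)
                for j in targets:
                    new_pr[j] += share
            else:
                # Assumption: a node with no outgoing edges jumps according to v.
                for j in nodes:
                    new_pr[j] += c * pr[i] * v[j]
        pr = new_pr
    return pr


# Toy graph: a -> b, a -> c, b -> c, c -> a
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, node c receives votes from both a and b, so it ends up ranked above b, which only receives half of a's vote.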