
Web News Sentence Searching Using Linguistic Graph Similarity - PowerPoint PPT Presentation



  1. Web News Sentence Searching Using Linguistic Graph Similarity Kim Schouten & Flavius Frasincar schouten@ese.eur.nl frasincar@ese.eur.nl

  2. Problem • Most text search methods are word-based • Often, the context is lost for the sake of simplicity • However, the meaning of a word is defined by both word and context. • How can we include context information of words into the search algorithm?

  3. Graph-based Approach • Grammatically parsing a sentence yields a graph • Words are the nodes • Grammatical relations between words are the edges • Set of relations of a word can then be used as context. • NLP pipeline transforms both query and news sentences into graphs.
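The graph described on this slide can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class names `Node` and `SentenceGraph`, the `context` helper, and the hand-typed annotations are all assumptions, and the parsing step that would normally produce the graph is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str   # fully inflected word
    pos: str    # part-of-speech label
    lemma: str
    stem: str


@dataclass
class SentenceGraph:
    nodes: list = field(default_factory=list)
    # Each edge is (head index, relation label, dependent index).
    edges: list = field(default_factory=list)

    def add_node(self, word, pos, lemma, stem):
        self.nodes.append(Node(word, pos, lemma, stem))
        return len(self.nodes) - 1

    def add_edge(self, head, relation, dependent):
        self.edges.append((head, relation, dependent))

    def context(self, idx):
        """Return the relations touching node idx, i.e. its context."""
        out = []
        for head, rel, dep in self.edges:
            if head == idx:
                out.append((rel, "out", self.nodes[dep].lemma))
            elif dep == idx:
                out.append((rel, "in", self.nodes[head].lemma))
        return out


# Hand-annotated graph for "Lenovo is the top PC maker"
# (illustrative values; a real pipeline would get them from a parser).
g = SentenceGraph()
maker = g.add_node("maker", "noun", "maker", "make")
lenovo = g.add_node("Lenovo", "proper noun", "Lenovo", "lenovo")
is_ = g.add_node("is", "verb", "be", "be")
the = g.add_node("the", "determiner", "the", "the")
top = g.add_node("top", "adjective", "top", "top")
pc = g.add_node("PC", "noun", "PC", "pc")
g.add_edge(maker, "nominal subject", lenovo)
g.add_edge(maker, "copula", is_)
g.add_edge(maker, "determiner", the)
g.add_edge(maker, "adjectival modifier", top)
g.add_edge(maker, "noun compound modifier", pc)

for relation in g.context(maker):
    print(relation)
```

Storing edges as plain tuples keeps the sketch short; the context of a word is then just the list of labeled edges that touch its node.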

  4. Pipeline

  5. Graph representation of sentence

  6. Graph comparison • Problem is similar to graph isomorphism • But partial similarity makes it much harder • Nodes may be missing on either side • Nodes may be only partially similar (pc <> workstation) • Relation labels may be different for similar nodes • Hence, output is not binary but a real-valued similarity score

  7. Graph comparison • Nodes are compared on: • Basic and full part-of-speech (POS) label • Stem, lemma, and fully inflected word • If the POS is the same but the word is not, then check for: • Synonymy • Hypernymy (1 / steps in hypernym tree) • Correct for word frequency
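The node comparison listed on this slide might look roughly like the sketch below. All weights (`W_*`), the toy `SYNONYMS` and `HYPERNYM_OF` tables, and the function names are assumptions made for illustration; the slides give neither the actual weights nor the lexical resource used. The word-frequency correction is left out because the slides do not specify its form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    word: str
    pos: str        # full POS label, e.g. "NNS"
    basic_pos: str  # basic POS label, e.g. "noun"
    lemma: str
    stem: str


# Hypothetical weights; the real values are not given in the slides.
W_BASIC_POS = W_FULL_POS = W_STEM = W_LEMMA = W_WORD = 1.0
W_SYNONYM = 2.0
W_HYPERNYM = 1.0

# Toy lexicon standing in for a real lexical resource.
SYNONYMS = [{"maker", "manufacturer"}]
HYPERNYM_OF = {"workstation": "pc"}  # child -> parent, illustrative only


def hypernym_steps(a, b):
    """Steps needed to climb from one lemma to the other, or None."""
    for start, target in ((a, b), (b, a)):
        steps, cur = 0, start
        while cur in HYPERNYM_OF:
            cur = HYPERNYM_OF[cur]
            steps += 1
            if cur == target:
                return steps
    return None


def node_similarity(a, b):
    score = 0.0
    same_basic = a.basic_pos == b.basic_pos
    if same_basic:
        score += W_BASIC_POS
    if a.pos == b.pos:
        score += W_FULL_POS
    if a.stem == b.stem:
        score += W_STEM
    if a.lemma == b.lemma:
        score += W_LEMMA
    if a.word == b.word:
        score += W_WORD
    # Same POS but a different word: fall back to synonymy, then hypernymy.
    la, lb = a.lemma.lower(), b.lemma.lower()
    if same_basic and la != lb:
        if any(la in s and lb in s for s in SYNONYMS):
            score += W_SYNONYM
        else:
            steps = hypernym_steps(la, lb)
            if steps:
                score += W_HYPERNYM / steps  # 1 / steps in hypernym tree
    return score


maker = Node("maker", "NN", "noun", "maker", "make")
manufacturer = Node("manufacturer", "NN", "noun", "manufacturer", "manufactur")
pc = Node("PC", "NN", "noun", "PC", "pc")
workstation = Node("workstation", "NN", "noun", "workstation", "workstat")
rankings = Node("rankings", "NNS", "noun", "ranking", "rank")
ranking = Node("ranking", "NN", "noun", "ranking", "rank")

print(node_similarity(maker, manufacturer))
print(node_similarity(pc, workstation))
print(node_similarity(rankings, ranking))
```

With these made-up weights a synonym pair outscores a hypernym pair, and a singular/plural pair loses only the full-POS and inflected-word components.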

  8. Graph comparison • We can recursively go through both graphs • Compare nodes and edges to assign score • However, a starting position within both graphs is needed: • Using all possibilities is inefficient • Always starting at root is inaccurate • Use index of stemmed words (nouns/verbs) • Only the best scoring starting position is kept
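The recursive comparison and the stem index for starting positions can be sketched as follows. This is a simplified illustration under several assumptions: the node score is reduced to stem and POS equality, the edge bonus of 0.5 is arbitrary, neighbors are matched greedily, and the names `Graph`, `compare`, `build_index`, and `search` are hypothetical rather than taken from the authors' system.

```python
class Graph:
    def __init__(self, nodes, edges):
        self.nodes = nodes  # list of (stem, basic POS)
        self.adj = {}       # node index -> [((relation, direction), neighbor)]
        for head, rel, dep in edges:
            self.adj.setdefault(head, []).append(((rel, "out"), dep))
            self.adj.setdefault(dep, []).append(((rel, "in"), head))


def node_score(a, b):
    if a[0] == b[0]:
        return 1.0                      # same stem
    return 0.5 if a[1] == b[1] else 0.0  # same POS only


def compare(q, qi, s, si, seen_q=None, seen_s=None):
    """Recursively score the query graph q against sentence graph s."""
    seen_q = (seen_q or set()) | {qi}
    seen_s = (seen_s or set()) | {si}
    score = node_score(q.nodes[qi], s.nodes[si])
    for q_rel, q_next in q.adj.get(qi, []):
        if q_next in seen_q:
            continue
        best, best_next = 0.0, None
        for s_rel, s_next in s.adj.get(si, []):
            if s_next in seen_s:
                continue
            cand = compare(q, q_next, s, s_next, seen_q, seen_s)
            if q_rel == s_rel:
                cand += 0.5  # bonus for an identical relation label
            if cand > best:
                best, best_next = cand, s_next
        if best_next is not None:
            seen_s = seen_s | {best_next}  # each neighbor is matched once
            score += best
    return score


CONTENT_POS = {"noun", "verb"}


def build_index(sentences):
    """Map each noun/verb stem to the (sentence, node) pairs containing it."""
    index = {}
    for sid, g in enumerate(sentences):
        for i, (stem, pos) in enumerate(g.nodes):
            if pos in CONTENT_POS:
                index.setdefault(stem, []).append((sid, i))
    return index


def search(query, sentences, index):
    best = {}
    for qi, (stem, pos) in enumerate(query.nodes):
        if pos not in CONTENT_POS:
            continue
        for sid, si in index.get(stem, []):
            sc = compare(query, qi, sentences[sid], si)
            best[sid] = max(best.get(sid, 0.0), sc)  # keep best start only
    return sorted(best.items(), key=lambda kv: -kv[1])


# "Lenovo is the top PC maker"
query = Graph(
    [("make", "noun"), ("lenovo", "propn"), ("be", "verb"),
     ("the", "det"), ("top", "adj"), ("pc", "noun")],
    [(0, "nsubj", 1), (0, "cop", 2), (0, "det", 3), (0, "amod", 4), (0, "nn", 5)],
)
# "Hewlett-Packard is still top workstation manufacturer"
sent_a = Graph(
    [("manufactur", "noun"), ("hewlett-packard", "propn"), ("be", "verb"),
     ("still", "adv"), ("top", "adj"), ("workstat", "noun")],
    [(0, "nsubj", 1), (0, "cop", 2), (0, "advmod", 3), (0, "amod", 4), (0, "nn", 5)],
)
# "Stocks fell" (unrelated sentence, never reached through the index)
sent_b = Graph([("fall", "verb"), ("stock", "noun")], [(0, "nsubj", 1)])

sentences = [sent_a, sent_b]
results = search(query, sentences, build_index(sentences))
print(results)
```

Because only stems shared with the query are looked up in the index, the unrelated sentence is never compared at all, which is the efficiency argument made on the slide.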

  9. Search algorithm • [Figure: dependency graphs of two sentences compared side by side: “In Gartner’s rankings, Lenovo is the top PC maker.” and “Hewlett-Packard is still top workstation manufacturer according to new ranking by IDC.”] • Legend: each node shows word, part-of-speech, (lemma), and (stem); edges carry grammatical relation labels such as nominal subject, copula, adjectival modifier, noun compound modifier, prepositional modifier, and possession modifier • Highlighted: rankings <> ranking, case difference (plural vs. singular) and different relation type

  10. Search algorithm (same figure) • Added highlight: Part-of-Speech identical but different relation type

  11. Search algorithm (same figure) • Added highlight: maker <> manufacturer, synonym and different relation type

  12. Search algorithm (same figure) • Added highlight: is <> is, identical words and relation type

  13. Search algorithm (same figure) • Added highlight: Part-of-Speech and relation type is identical

  14. Search algorithm (same figure) • Added highlight: PC <> workstation, hypernym
