Delexicalized Parsing
Daniel Zeman, Rudolf Rosa
April 3, 2020
NPFL120 Multilingual Natural Language Processing
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics



  2. Delexicalized Parsing (1/22)
     • What if we feed the parser with tags instead of words?
     • Danish: Ændringer i listen i bilaget offentliggøres og meddeles på samme måde.
       ("Changes to the list in the annex are published and announced in the same way.")
       NNS IN NN IN NN VB CC VB IN DT NN
     • Swedish: Förändringar i förteckningen skall offentliggöras och meddelas på samma sätt.
       NNS IN NN MD VB CC VB IN DT NN
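The step described above, replacing every word with its POS tag before training and parsing, can be sketched in a few lines of Python. The tagged Danish sentence is the slide's example; the helper name is illustrative:

```python
# Delexicalization: the parser sees only POS tags, so a model trained on
# Danish tag sequences can be applied to Swedish tag sequences directly.
def delexicalize(tagged_sentence):
    """Replace each (word, tag) pair with just its tag."""
    return [tag for _, tag in tagged_sentence]

# The Danish sentence from the slide, with its tags:
danish = [("Ændringer", "NNS"), ("i", "IN"), ("listen", "NN"),
          ("i", "IN"), ("bilaget", "NN"), ("offentliggøres", "VB"),
          ("og", "CC"), ("meddeles", "VB"), ("på", "IN"),
          ("samme", "DT"), ("måde", "NN")]
print(" ".join(delexicalize(danish)))
# NNS IN NN IN NN VB CC VB IN DT NN
```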

  3. Delexicalized Parsing (2/22)
     • What if we feed the parser with tags instead of words?
     • Danish: ((NNS (IN NN (IN NN))) ((VB CC VB) (IN (DT NN))))
     • Swedish: ((NNS (IN NN)) ((MD (VB CC VB)) (IN (DT NN))))

  4. Setup (3/22)
     • Daniel Zeman, Philip Resnik (2008): Cross-Language Parser Adaptation between Related Languages.
       In IJCNLP 2008 Workshop on NLP for Less Privileged Languages, pp. 35–42, Hyderabad, India.
     • Danish – Swedish
     • CoNLL 2006 treebanks (dependencies):
       • Danish Dependency Treebank
       • Swedish Talbanken05
     • Two constituency parsers:
       • "Charniak"
       • "Brown" (Charniak N-best parser + Johnson reranker)
     • Other resources:
       • JRC-Acquis parallel corpus
       • Hajič tagger for Swedish (PAROLE tagset)


  7. Treebank Normalization (4/22)
     Differences between the two annotation styles:
     Danish                                  Swedish
     • DET governs ADJ, ADJ governs NOUN     • NOUN governs both DET and ADJ
     • NUM governs NOUN                      • NOUN governs NUM
     • GEN governs NOM                       • NOM governs GEN
       (Ruslands vej "Russia's way")           (års inkomster "year's income")
     • COORD: last member on conjunction,    • COORD: member on previous member,
       everything else on first member         commas and conjs on next member
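One such normalization step can be sketched as a small tree transformation. Tokens are (form, pos, head) triples with 1-based heads and 0 = root; the function and the toy example are assumptions for illustration, not code from the paper:

```python
# An illustrative normalization step: if the two treebanks disagree on
# who governs whom in a numeral-noun pair, flip the edge so both use
# the "NOUN governs NUM" convention.
def flip_num_noun(tree):
    """Make NOUN govern NUM wherever a NUM currently governs a NOUN."""
    tree = [list(tok) for tok in tree]
    for i, (_, pos, head) in enumerate(tree, start=1):
        if pos == "NOUN" and head != 0 and tree[head - 1][1] == "NUM":
            num = head
            tree[i - 1][2] = tree[num - 1][2]  # noun inherits numeral's head
            tree[num - 1][2] = i               # numeral attaches to the noun
    return [tuple(tok) for tok in tree]

# "tre biler" (three cars), annotated NUM-over-NOUN:
print(flip_num_noun([("tre", "NUM", 0), ("biler", "NOUN", 1)]))
# [('tre', 'NUM', 2), ('biler', 'NOUN', 0)]
```

The same pattern (find the offending edge, reattach head and dependent) covers the other head-direction differences in the table; coordination restructuring needs a slightly larger rewrite of the conjunct chain.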


  11. Treebank Preparation (5/22)
     • Transform Danish to Swedish tree style
       • a few heuristics
       • only for evaluation! Not needed in the real world.
     • Convert dependencies to constituents
       • flattest possible structure
     • DA/SV tagset converted to Penn Treebank tags
     • Nonterminal labels:
       • derived from POS tags
       • then translated to the Penn set of nonterminals
     • Make the parser feel it works with the Penn Treebank
       • (although it could have been configured to use other sets of labels)
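The dependency-to-constituency conversion with the "flattest possible structure" can be sketched as follows: every head becomes one flat bracket containing its own preterminal and its dependents' subtrees, in surface order. The POS2PHRASE mapping is a made-up stand-in for the POS-derived, Penn-translated nonterminal labels described on the slide:

```python
# Flattest-possible dependency-to-constituency conversion (a sketch).
POS2PHRASE = {"NN": "NP", "VB": "VP", "IN": "PP"}  # illustrative only

def to_constituents(tree, root):
    """tree: list of (form, pos, head) with 1-based heads, 0 = root."""
    form, pos, _ = tree[root - 1]
    deps = [i for i, (_, _, h) in enumerate(tree, start=1) if h == root]
    if not deps:
        return f"({pos} {form})"              # leaf stays a bare preterminal
    items = sorted(deps + [root])             # keep surface word order
    parts = [f"({pos} {form})" if i == root else to_constituents(tree, i)
             for i in items]
    return f"({POS2PHRASE.get(pos, 'X')} {' '.join(parts)})"

# Simplified from the slide's Danish example:
toks = [("ændringer", "NN", 2), ("offentliggøres", "VB", 0),
        ("på", "IN", 2), ("måde", "NN", 3)]
top = next(i for i, (_, _, h) in enumerate(toks, start=1) if h == 0)
print(to_constituents(toks, top))
# (VP (NN ændringer) (VB offentliggøres) (PP (IN på) (NN måde)))
```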


  15. Unlabeled F Scores (6/22)
     Train-test setting          Charniak   Brown
     • da-da lexicalized:        78.16      78.24   (CoNLL train 94K words, test 5852 words)
     • sv-sv lexicalized:        77.81      78.74   (CoNLL train 191K words, test 5656 words)
     • da-sv lexicalized:        43.28      41.84   (no morphology tweaking)
     • da-da delexicalized:      79.62      80.20   (!)
       (hybrid sv-da Hajič-like tagset = "words", Penn POS = "tags")
     • sv-sv delexicalized:      76.07      77.01
     • da-sv delexicalized:      65.50      66.40
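The unlabeled F-scores above compare predicted and gold brackets by their spans only, ignoring the labels. A minimal sketch of the metric (the span sets below are invented toy data, not from the experiments):

```python
# Unlabeled bracketing F1: harmonic mean of precision and recall over
# the (start, end) spans of predicted vs. gold constituents.
def unlabeled_f1(gold_spans, pred_spans):
    gold, pred = set(gold_spans), set(pred_spans)
    match = len(gold & pred)
    if match == 0:
        return 0.0
    p, r = match / len(pred), match / len(gold)
    return 2 * p * r / (p + r)

gold = {(0, 11), (0, 5), (5, 11), (8, 11)}
pred = {(0, 11), (0, 5), (5, 8), (8, 11)}
print(unlabeled_f1(gold, pred))  # 0.75
```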


  20. How Big a Swedish Treebank Yields Similar Results? (7/22)
     (chart: unlabeled F1-score vs. training-data size)
