Inspecting the Structural Biases of Dependency Parsing Algorithms
  1. Inspecting the Structural Biases of Dependency Parsing Algorithms
     Yoav Goldberg and Michael Elhadad, Ben Gurion University
     CoNLL 2010, Sweden

  2. There are many ways to parse a sentence

  3. There are many ways to parse a sentence

     Transition-Based Parsers:
     - Covington
     - Multiple Passes
     - Arc-Eager
     - Arc-Standard
     - With Swap Operator
     - First-Best
     - With a Beam
     - With Dynamic Programming
     - With Tree Revision
     - Left-to-right
     - Right-to-left
     - Easy-First Parsing (check out our NAACL 2010 paper)

     Graph-Based Parsers:
     - First Order
     - Second Order (two children / with grandparent)
     - Third Order
     - MST Algorithm / Matrix Tree Theorem
     - DAG Parsing
     - Eisner Algorithm
     - Belief Propagation
     - With global constraints (ILP / Gibbs sampling)

     Combinations:
     - Voted Ensembles (Sagae’s way, Attardi’s way)
     - Stacked Learning

  7. We can build many reasonably accurate parsers
     Parser combinations work
     ⇒ every parser has its strong points
     Different parsers behave differently

  10. Open questions
      WHY do they behave as they do?
      WHAT are the differences between them?

  13. More open questions
      Which linguistic phenomena are hard for parser X?
      What kinds of errors are common for parser Y?
      Which parsing approach is most suitable for language Z?

  17. Previously
      McDonald and Nivre 2007: “Characterizing the Errors of Data-Driven Dependency Parsing Models”
      ◮ Focus on single-edge errors
      ◮ MST better for long edges, MALT better for short
      ◮ MST better near root, MALT better away from root
      ◮ MALT better at nouns and pronouns, MST better at others
      ◮ ... but all these differences are very small

  18. We do something a bit different

  19. Assumptions
      ◮ Parsers fail in predictable ways
      ◮ those failures can be analyzed
      ◮ the analysis should be done by inspecting trends rather than individual decisions

  23. Note: We do not do error analysis
      ◮ Error analysis is complicated
        ◮ one error can yield another / hide another
      ◮ Error analysis is local to one tree
        ◮ many factors may be involved in that single error
      We are aiming at more global trends.

  24. Structural Preferences

  26. Structural preferences for a given language + syntactic theory
      ◮ Some structures are more common than others
        (think right-branching for English)
      ◮ Some structures are very rare
        (think non-projectivity or OSV constituent order)

  30. Structural preferences
      Parsers also exhibit structural preferences
      ◮ Some are explicit / by design (e.g. projectivity)
      ◮ Some are implicit, stemming from features, modeling, data, interactions, and other factors
      These trends are interesting!

  31. Structural Bias

  33. Structural bias
      “The difference between the structural preferences of two languages”
      For us: which structures tend to occur more in the language than in the parser’s output?

  36. Bias vs. Error: related, but not the same
      “Parser X makes many PP attachment errors”
      ◮ a claim about error patterns
      “Parser X tends to attach PPs low, while language Y tends to attach them high”
      ◮ a claim about structural bias (and also about errors)
      “Parser X can never produce structure Y”
      ◮ a claim about structural bias

  37. Formulating Structural Bias
      “given a tree, can we say where it came from?”

  38. Formulating Structural Bias
      “given two trees of the same sentence, can we tell which parser produced each parse?”

  41. Formulating Structural Bias
      “which parser produced which tree?”
      Any predictor that can help us answer this question is an indicator of structural bias.
      Uncovering structural bias = searching for good predictors.

  42. Method
      ◮ Start with two sets of parses for the same set of sentences
      ◮ Look for predictors that allow us to distinguish between the trees in each group

  46. Our Predictors
      ◮ all possible subtrees
      ◮ always encode: parts of speech, relations, direction
      ◮ can also encode: lexical items, distance to parent
      [Figure: an example subtree over the POS tags VB, IN, NN, JJ, with the optional lexical item “with” and distances to parent (2 and 4) marked on the nodes]
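To make the predictor structure concrete, here is a minimal sketch of how one such subtree predictor could be represented; the class and field names are illustrative assumptions, not the authors’ implementation:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SubtreeNode:
    """One node of a subtree predictor (illustrative, hypothetical names)."""
    pos: str                            # part of speech: always encoded
    direction: str                      # 'L' or 'R' attachment: always encoded
    word: Optional[str] = None          # lexical item: optional refinement
    dist: Optional[int] = None          # distance to parent: optional refinement
    children: Tuple["SubtreeNode", ...] = ()   # dependents: encode the relations

# e.g. a preposition attached to the right of a verb, two words away:
pred = SubtreeNode(pos="VB", direction="R",
                   children=(SubtreeNode(pos="IN", direction="R",
                                         word="with", dist=2),))
```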

  50. Search Procedure
      Boosting with subtree features: the algorithm of Kudo and Matsumoto 2004.
      Very briefly:
      ◮ input: two sets of constituency trees
      ◮ while not done:
        ◮ choose a subtree that classifies most trees correctly
        ◮ re-weight the trees based on the errors
      ◮ output: weighted subtrees (= a linear classifier)
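A schematic sketch of that loop in the style of AdaBoost; representing each tree as a set of subtree-encoding strings, and searching only over subtrees that actually occur, are assumptions made to keep the example self-contained, and this is not the exact Kudo and Matsumoto 2004 procedure:

```python
import math

def boost_subtrees(tree_feats, labels, rounds=50):
    """AdaBoost-style search for discriminative subtrees (a sketch).

    tree_feats -- one set of subtree encodings per tree
    labels     -- +1 / -1: which of the two tree collections each tree is from
    """
    n = len(tree_feats)
    weights = [1.0 / n] * n
    model = []                             # (subtree, alpha) pairs
    candidates = set().union(*tree_feats)  # every subtree seen anywhere

    for _ in range(rounds):
        # Weak learner: the subtree whose presence best separates the two sets.
        best, best_err = None, 0.5         # must beat chance
        for s in candidates:
            err = sum(w for f, y, w in zip(tree_feats, labels, weights)
                      if (1 if s in f else -1) != y)
            if err < best_err:
                best, best_err = s, err
        if best is None:                   # nothing beats chance: stop
            break
        alpha = 0.5 * math.log((1.0 - best_err) / max(best_err, 1e-12))
        model.append((best, alpha))
        # Re-weight: trees the chosen subtree gets wrong gain weight.
        weights = [w * math.exp(-alpha * y * (1 if best in f else -1))
                   for f, y, w in zip(tree_feats, labels, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return model                           # weighted subtrees = linear classifier
```

The weighted subtrees double as a ranked list of biased structures, which is what makes the classifier useful as an analysis tool rather than only as a discriminator.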

  53. Conversion to constituency
      [Figure: an example dependency subtree over JJ, NN, IN, VB and its constituency encoding; the arc direction is folded into the node label (e.g. “IN ←”), while the optional word (w:with) and distances (d:2, d:3) become leaf nodes]
      Mandatory information goes into the node label; optional information becomes leaves.
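A rough sketch of such an encoding, reusing the hypothetical SubtreeNode type from the earlier sketch; the exact label format is guessed from the slide, so treat it as an assumption:

```python
def encode(node):
    """Render a SubtreeNode as a bracketed constituency tree:
    POS plus arc direction in the node label (mandatory information),
    optional word and distance as leaf nodes."""
    parts = [node.pos + (" →" if node.direction == "R" else " ←")]
    if node.word is not None:
        parts.append(f"(w:{node.word})")
    if node.dist is not None:
        parts.append(f"(d:{node.dist})")
    parts.extend(encode(c) for c in node.children)
    return "(" + " ".join(parts) + ")"

# encode(pred) from the earlier sketch gives:
#   '(VB → (IN → (w:with) (d:2)))'
```

Putting the optional word and distance in leaves rather than in the label presumably lets the subtree learner match a structure either with or without those refinements.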

  55. Experiments
      Analyzed parsers:
      ◮ MALT Eager
      ◮ MALT Standard
      ◮ MST 1
      ◮ MST 2
      Data:
      ◮ WSJ (converted using Johansson and Nugues)
      ◮ splits: parse-train (sections 15-18), boost-train (10-11), boost-val (4-7)
      ◮ gold POS tags

  58. Quantitative Results
      Q: Are the parsers biased with respect to English?
      A: Yes

      Parser   Train Accuracy   Val Accuracy
      MST 1        65.4             57.8
      MST 2        62.8             56.6
      MALT E       69.2             65.3
      MALT S       65.1             60.1

      Table: Distinguishing parser output from gold trees based on structural information. Accuracies well above the 50% chance level mean the trees are distinguishable, i.e. each parser is structurally biased.
