Data-Driven Parsing with Discontinuous Structures
Wolfgang Maier
Heinrich-Heine-Universität Düsseldorf
GF Summer School 2013

Overview

1 Introduction
2 Data-Driven Parsing with Discontinuous Structures
  - The Data
  - Parsing
  - Making it Faster
3 Going Further
  - Related work
  - Future work
  - Extract a grammar yourself

Constituency Parsing

- Determine whether a sentence is admissible given a specific grammar, and find the corresponding structure
- Different strategies: top-down/bottom-up, directional/non-directional, ...
- Non-directional bottom-up (CYK)

Grammar:
  S → NP VP
  VP → V NP
  VP → VP PP
  NP → Det N
  NP → John
  NP → Sandy
  NP → Mary
  V → sees
  ...

Structure built bottom-up for "John sees Sandy":
  [S [NP John] [VP [V sees] [NP Sandy]]]
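
The bottom-up chart construction can be sketched in a few lines of Python. This is a minimal CYK recognizer, not the implementation used in the talk. It assumes the grammar is in Chomsky normal form and covers only the rules listed on the slide (the rules hidden behind "..." are left out). The names `BINARY`, `LEXICAL`, `cyk_chart` and `recognize` are made up for this illustration.

```python
from itertools import product

# Binary rules from the slide, indexed by right-hand side.
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("Det", "N"): {"NP"},
}

# Lexical rules from the slide, indexed by word.
LEXICAL = {
    "John": {"NP"},
    "Sandy": {"NP"},
    "Mary": {"NP"},
    "sees": {"V"},
}


def cyk_chart(words):
    """Fill a chart mapping each span (i, j) to the set of labels covering it."""
    n = len(words)
    chart = {}
    # Spans of width 1: look up the word in the lexicon.
    for i, word in enumerate(words):
        chart[i, i + 1] = set(LEXICAL.get(word, ()))
    # Wider spans: combine two adjacent, already completed spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for left, right in product(chart[i, k], chart[k, j]):
                    cell |= BINARY.get((left, right), set())
            chart[i, j] = cell
    return chart


def recognize(words, start="S"):
    """Return True if the start symbol covers the whole input."""
    if not words:
        return False
    return start in cyk_chart(words)[0, len(words)]
```

For "John sees Sandy", the chart first receives NP, V, NP for the three words, then VP for the span "sees Sandy", and finally S for the whole sentence, which mirrors the order in which the slide builds the tree.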

Data-Driven Constituency Parsing

To make parsing data-driven, instead of writing a grammar by hand:
- use a collection of structures which can be interpreted as parse trees of the grammar formalism we are using
- use an algorithm on it which infers the grammar rules which have been used to create a given parse tree
- equip the rules with probabilities (conditional probabilities from rule counts)
- use probabilities for disambiguation
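
The step "conditional probabilities from rule counts" is commonly realized as a relative-frequency estimate. As a sketch, with count(·) denoting the number of times a rule occurs in the treebank (this notation is an assumption and does not appear on the slides):

```latex
P(A \to \alpha \mid A) \;=\; \frac{\mathrm{count}(A \to \alpha)}{\sum_{\beta} \mathrm{count}(A \to \beta)}
```

For instance, if NP is expanded three times in the treebank and exactly one of these expansions is NP → John, the estimate for that rule is 1/3.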

Data

- Treebanks are corpora in which sentences are annotated with syntactic information
  - very small ones contain a few thousand sentences, large ones up to 100k
  - typically created from easily accessible text such as news text
- Treebank annotation
  - mostly aims at neutrality concerning linguistic theories, but does not always succeed
  - often has an easily accessible context-free annotation backbone

Grammar Extraction Example

Tree:
  [S [NP John] [VP [VP [V sees] [NP Sandy]] [PP [P with] [NP [Det the] [N telescope]]]]]

Extracted rules with probabilities:
  S → NP VP     1.0
  NP → John     0.333
  VP → VP PP    0.5
  VP → V NP     0.5
  PP → P NP     1.0
  V → sees      1.0
  NP → Sandy    0.333
  P → with      1.0
  NP → Det N    0.333
  ...
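
The example can be reproduced with a short Python sketch. It reads one rule off each internal node of the tree and divides each rule count by the count of its left-hand side. The nested-tuple tree encoding and the names `TREE`, `rules` and `estimate_pcfg` are assumptions made for this illustration; they are not taken from the talk or from any particular toolkit.

```python
from collections import Counter

# The example tree as nested tuples: (label, child, ...); a leaf is a plain string.
TREE = (
    "S",
    ("NP", "John"),
    ("VP",
     ("VP", ("V", "sees"), ("NP", "Sandy")),
     ("PP", ("P", "with"),
      ("NP", ("Det", "the"), ("N", "telescope")))),
)


def rules(tree):
    """Yield one (lhs, rhs) rule for every internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}


if __name__ == "__main__":
    for (lhs, rhs), prob in estimate_pcfg([TREE]).items():
        print(f"{lhs} -> {' '.join(rhs)}  {prob:.3f}")
```

With this single tree as the treebank, the three NP rules each receive probability 1/3 and the two VP rules each receive 0.5, matching the values on the slide.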

Discontinuous Structure in Natural Language

A sequence of words which is discontinuous but forms a linguistically meaningful unit.

Discontinuity Examples: German

Extraposed relative clauses
(1) wieder treffen alle Attribute  zu,   die   auch sonst     immer  passen
    again  match   all  attributes Vpart which also otherwise always fit
    ‘Again, the same attributes as always apply.’

Topicalization
(2) Der CD wird bald ein Buch folgen
    The CD will soon a   book follow
    ‘The CD will soon be followed by a book.’

Discontinuity

Discontinuity is frequent in natural language, not only in languages with a relatively free word order.

Examples: English

Relative clause
(3) They sow a row of male-fertile plants nearby, which then pollinate the male-sterile plants.

Long extraction
(4) Those chains include Bloomingdale’s, which Campeau recently said it will sell.

Annotation in the Penn Treebank

“Movement”: indirect annotation with trace nodes and coindexation

[Figure: Penn Treebank tree for "which Campeau recently said 0 it will sell *T*". POS tags: WDT NNP RB VBD -NONE- PRP MD VB -NONE-. Node labels: SBAR, S, VP, WHNP, NP, ADVP. Function tags: SBJ, TMP. Trace: *T*.]

Annotation in the German NeGra/TIGER Treebanks

Direct annotation using crossing branches

[Figure: NeGra/TIGER tree with crossing branches for "Der CD wird bald ein Buch folgen". POS tags: ART NN VAFIN ADV ART NN VVINF. Node labels: S, VP, NP, NP. Edge labels: HD, OC, SB, DA, MO, HD, NK, NK, NK, NK.]
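
One way to make "crossing branches" concrete is to look at the token positions a node covers: a node is discontinuous if these positions do not form a single contiguous run. The Python sketch below does this for the sentence above. The tree structure in it is an assumption, read off the edge labels of the figure (S with head "wird", a VP as OC and the NP "ein Buch" as SB; VP with the NP "Der CD" as DA, "bald" as MO and head "folgen"); POS tags are left out. The encoding and the function names are made up for this illustration.

```python
TOKENS = ["Der", "CD", "wird", "bald", "ein", "Buch", "folgen"]

# A node is (label, children); a leaf is the integer position of a token.
# Assumed structure, reconstructed from the edge labels of the figure.
VP = ("VP", [("NP", [0, 1]), 3, 6])   # Der CD ... bald ... folgen
S = ("S", [2, VP, ("NP", [4, 5])])    # wird, VP, ein Buch


def yield_positions(node):
    """Return the sorted token positions covered by a node."""
    _, children = node
    positions = []
    for child in children:
        if isinstance(child, int):
            positions.append(child)
        else:
            positions.extend(yield_positions(child))
    return sorted(positions)


def blocks(positions):
    """Split sorted positions into maximal runs of consecutive integers."""
    runs = []
    for pos in positions:
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    return runs


def is_discontinuous(node):
    """A node is discontinuous if its yield has more than one block."""
    return len(blocks(yield_positions(node))) > 1
```

Under this reading, the VP covers three separate blocks ("Der CD", "bald", "folgen"), while S covers the whole sentence in one block.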