Modeling Linguistic Theory on a Computer: From GB to Minimalism
Sandiway Fong
Dept. of Linguistics, Dept. of Computer Science
MIT IAP Computational Linguistics Fest, 1/14/2005

Outline
• Mature system: PAPPI
  – parser in the principles-and-parameters framework
  – principles are formalized and declaratively stated in Prolog (logic)
  – principles are mapped onto general computational mechanisms
  – recovers all possible parses
  – (free software, recently ported to MacOS X and Linux)
  – (see http://dingo.sbs.arizona.edu/~sandiway/ )
• Current work
  – introduce a left-to-right parser based on the probe-goal model from the Minimalist Program (MP)
  – take a look at modeling some data from SOV languages
    • relativization in Turkish and Japanese
    • psycholinguistics (parsing preferences)
  – (software yet to be released...)

PAPPI: Overview
• user's viewpoint
[Figure: a sentence is passed through parser operations corresponding to linguistic principles (= theory), yielding syntactic representations]

PAPPI: Overview
• parser operations can be
  – turned on or off
  – metered
• syntactic representations can be
  – displayed
  – examined
    • in the context of a parser operation
  – dissected
    • features displayed

PAPPI: Coverage
• supplied with a basic set of principles
  – X'-based phrase structure, Case, Binding, ECP, Theta, head movement, phrasal movement, LF movement, QR, operator-variable, WCO
  – handles a couple hundred English examples from Lasnik and Uriagereka's (1988) A Course in GB Syntax
• more modules and principles can be added or borrowed
  – VP-internal subjects, NPIs, double objects, Zero Syntax (Pesetsky, 1995)
  – Japanese (some Korean): head-final, pro-drop, scrambling
  – Dutch (some German): V2, verb raising
  – French (some Spanish): verb movement, pronominal clitics
  – Turkish, Hungarian: complex morphology
  – Arabic: VSO, SVO word orders

PAPPI: Architecture
• software layers
  – GUI
  – parser
  – prolog
  – os
[Figure: architecture diagram. Labels: Lexicon, Parameters (Word Order, pro-drop, Wh-in-Syntax, Scrambling), Periphery, PS Rules, Principles, Programming Language, Compilation Stage (LR(1), Type Inf., Chain, Tree)]
  – competing parses can be run in parallel across multiple machines

PAPPI: Machinery
• morphology
  – simple morpheme concatenation
  – morphemes may project or be rendered as features
• (example from the Hungarian implementation)
EXAMPLE:
a szerző-k megnéz-et-het-∅-né-nek-∅ két cikk-et a munkatárs-a-ik-kal
the author-Agr3Pl look_at-Caus-Possib-tns(prs)-Cond-Agr3Pl-Obj(indef) two article-Acc the colleague-Poss3Sg-Agr3Pl+Poss3Pl-LengdFC+Com
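
The hyphenated example above pairs each morpheme with one gloss label. A minimal Python sketch of that pairing follows; it is not PAPPI's Prolog implementation, and the function names `gloss_pairs` and `analyze`, as well as the choice to treat every non-initial morpheme as a feature, are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def gloss_pairs(word: str, gloss: str, sep: str = "-") -> List[Tuple[str, str]]:
    """Pair each morpheme of a segmented word with its gloss label."""
    morphemes = word.split(sep)
    labels = gloss.split(sep)
    if len(morphemes) != len(labels):
        raise ValueError("word and gloss have different numbers of morphemes")
    return list(zip(morphemes, labels))


def analyze(word: str, gloss: str) -> Dict[str, object]:
    """Hypothetical analysis: first morpheme is the stem, the rest become features."""
    pairs = gloss_pairs(word, gloss)
    stem, stem_gloss = pairs[0]
    return {"stem": stem, "gloss": stem_gloss,
            "features": [label for _, label in pairs[1:]]}


# Two words taken from the slide's Hungarian example.
article = analyze("cikk-et", "article-Acc")
colleague = analyze("munkatárs-a-ik-kal",
                    "colleague-Poss3Sg-Agr3Pl+Poss3Pl-LengdFC+Com")
```

Whether a given morpheme projects its own head or is only recorded as a feature is a grammar decision that this sketch does not model.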

PAPPI: LR Machinery
• phrase structure
  – parameterized X'-rules
  – head movement rules
  – rules are not used directly during parsing for computational efficiency
  – mapped at compile-time onto LR machinery
• specification
  – rule XP -> [XB|spec(XB)] ordered specFinal st max(XP), proj(XB,XP).
  – rule XB -> [X|compl(X)] ordered headInitial(X) st bar(XB), proj(X,XB), head(X).
  – rule v(V) moves_to i provided agr(strong), finite(V).
  – rule v(V) moves_to i provided agr(weak), V has_feature aux.
• implementation
  – bottom-up, shift-reduce parser
  – push-down automaton (PDA)
  – stack-based merge
    • shift
    • reduce
  – canonical LR(1)
    • disambiguate through one word lookahead
[Figure: fragment of the LR automaton]
  State 0: S -> . NP VP   NP -> . D N   NP -> . N   NP -> . NP PP
  State 1: NP -> D . N
  State 2: NP -> N .
  State 3: NP -> D N .
  State 4: S -> NP . VP   NP -> NP . PP   VP -> . V NP   VP -> . V   VP -> . VP PP   PP -> . P NP
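
The item sets shown for States 0 to 4 can be reproduced with the textbook closure and goto operations. The Python sketch below assumes the toy grammar read off the slide's state diagram and computes plain LR(0) items (no lookahead sets), so it illustrates the compile-time mapping rather than PAPPI's canonical LR(1) construction; all function names are illustrative.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Toy grammar read off the slide's state diagram: (LHS, RHS) pairs.
GRAMMAR: List[Tuple[str, Tuple[str, ...]]] = [
    ("S", ("NP", "VP")),
    ("NP", ("D", "N")),
    ("NP", ("N",)),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("VP", ("VP", "PP")),
    ("PP", ("P", "NP")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

Item = Tuple[str, Tuple[str, ...], int]  # (LHS, RHS, dot position)


def closure(items: Iterable[Item]) -> FrozenSet[Item]:
    """Add X -> . rhs for every nonterminal X that appears right after a dot."""
    result = set(items)
    agenda = list(result)
    while agenda:
        _, rhs, dot = agenda.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                new_item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    agenda.append(new_item)
    return frozenset(result)


def goto(state: FrozenSet[Item], symbol: str) -> FrozenSet[Item]:
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def show(state: FrozenSet[Item]) -> List[str]:
    """Render items in the slide's dotted-rule notation, sorted for stability."""
    return sorted(f"{lhs} -> {' '.join(list(rhs[:dot]) + ['.'] + list(rhs[dot:]))}"
                  for lhs, rhs, dot in state)


state0 = closure({("S", ("NP", "VP"), 0)})
state1 = goto(state0, "D")   # shift D
state2 = goto(state0, "N")   # shift N
state3 = goto(state1, "N")   # shift N after D
state4 = goto(state0, "NP")  # after reducing to NP
```

A completed item such as `NP -> D N .` in State 3 is where a shift-reduce parser performs a reduce; the one-word lookahead of LR(1) is what would let it choose between competing actions.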

PAPPI: Machine Parameters
• selected parser operations may be integrated with phrase structure recovery or chain formation
  – machine parameter
  – however, not always efficient to do so
• specification
  – coindexSubjAndINFL in_all_configurations CF where specIP(CF,Subject) then coindexSI(Subject,CF).
  – subjacency in_all_configurations CF where isTrace(CF), upPath(CF,Path) then lessThan2BoundingNodes(Path)
• implementation
  – use type inferencing defined over category labels
    • figure out which LR reduce actions should place an outcall to a parser operation
  – subjacency can be called during chain aggregation

PAPPI: Chain Formation
• recovery of chains
  – compute all possible combinations
    • each empty category optionally participates in a chain
    • each overt constituent optionally heads a chain
• specification
  – assignment of a chain feature to constituents
• combinatorics
  – exponential growth
• implementation
  – possible chains compositionally defined
  – incrementally computed
  – bottom-up
  – allows parser operation merge
• merge constraints on chain paths
  – loweringFilter in_all_configurations CF where isTrace(CF), downPath(CF,Path) then Path=[].
  – subjacency in_all_configurations CF where isTrace(CF), upPath(CF,Path) then lessThan2BoundingNodes(Path)
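
To make the exponential growth concrete, the Python sketch below enumerates every way of optionally marking constituents with a chain feature and adds a path check in the spirit of lessThan2BoundingNodes. It is an illustration only: representing a path as a list of category labels and taking `np` and `i2` as the bounding nodes are assumptions, not details given on the slide.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence


def chain_feature_assignments(constituents: Sequence[str]) -> Iterator[Dict[str, bool]]:
    """Yield every optional marking of constituents: 2**n assignments for n items."""
    for flags in product([False, True], repeat=len(constituents)):
        yield dict(zip(constituents, flags))


def less_than_2_bounding_nodes(path: List[str],
                               bounding: Sequence[str] = ("np", "i2")) -> bool:
    """Assumed reading of the constraint: fewer than two bounding nodes on the path."""
    return sum(1 for cat in path if cat in bounding) < 2


# Three hypothetical constituents: one overt phrase and two empty categories.
assignments = list(chain_feature_assignments(["who", "ec1", "ec2"]))

short_path_ok = less_than_2_bounding_nodes(["vp", "i2", "c2"])
long_path_ok = less_than_2_bounding_nodes(["vp", "i2", "c2", "np", "vp", "i2"])
```

Applying the path check while chains are being built bottom-up, instead of after full enumeration, is what lets a constraint such as subjacency prune combinations early.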

PAPPI: Domain Computation
• minimal domain
  – incremental
  – bottom-up
• specification
  – gc(X) smallest_configuration CF st cat(CF,C), member(C,[np,i2])
    with_components
    X,
    G given_by governs(G,X,CF),
    S given_by accSubj(S,X,CF).
• implementing
  – Governing Category (GC):
  – GC(α) is the smallest NP or IP containing:
    (A) α, and
    (B) a governor of α, and
    (C) an accessible SUBJECT for α.
• used in
  – Binding Condition A
    • An anaphor must be A-bound in its GC
  – conditionA in_all_configurations CF where anaphor(CF) then gc(CF,GC), aBound(CF,GC).
  – anaphor(NP) :- NP has_feature apos, NP has_feature a(+).
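
The "smallest NP or IP containing" part of the definition amounts to an upward search from α. The Python sketch below shows that search under simplifying assumptions: the governor and the accessible SUBJECT are passed in by the caller rather than computed by governs and accSubj, and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    cat: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link children back to this node so we can walk upward later.
        for child in self.children:
            child.parent = self


def dominates(ancestor: Node, node: Optional[Node]) -> bool:
    """True if `ancestor` is `node` itself or lies on its path to the root."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def governing_category(x: Node, governor: Node, subject: Node) -> Optional[Node]:
    """Smallest np/i2 above x that also contains the given governor and subject."""
    cf = x.parent
    while cf is not None:
        if cf.cat in ("np", "i2") and dominates(cf, governor) and dominates(cf, subject):
            return cf
        cf = cf.parent
    return None


# Hypothetical tree for "John thinks Mary likes herself".
herself = Node("np", "herself")
likes = Node("v", "likes")
mary = Node("np", "Mary")
inner = Node("i2", "embedded", [mary, Node("vp", children=[likes, herself])])
john = Node("np", "John")
outer = Node("i2", "matrix", [john, Node("vp", children=[Node("v", "thinks"), inner])])

gc_with_local_subject = governing_category(herself, likes, mary)
gc_with_matrix_subject = governing_category(herself, likes, john)
```

With the local subject the search stops at the embedded clause; if only the matrix subject counted as accessible, the same walk would continue up to the matrix clause.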

Probe-Goal Parser: Overview
• strictly incremental
  – left-to-right
  – uses elementary tree (eT) composition
    • guided by selection
    • open positions filled from input
  – epp
  – no bottom-up merge/move
• probe-goal agreement
  – uninterpretable/interpretable feature system

Probe-Goal Parser: Selection
• select drives derivation
  – left-to-right
• memory elements
  – MoveBox (M)
    • emptied in accordance with theta theory
    • filled from input
  – ProbeBox (P)
    • current probe
• recipe
  – start(c)
  – pick eT headed by c from input (or M)
  – fill Spec, run agree(P,M)
  – fill Head, update P
  – fill Comp (c select c', recurse)
• example
[Figure: elementary tree with Spec, head C, and Comp positions numbered 1 to 3, alongside the Move box (M) and the Probe box (P)]
• agree
  – φ-features → probe
  – case → goal
• note
  – extends derivation to the right
    • similar to Phillips (1995)
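
The control flow of the recipe can be sketched as a recursion that follows selection into Comp. The Python below only records the order of the steps; it does not model the MoveBox, the ProbeBox, or agree, and the selection chain c, T, v, V in `SELECTS` is an assumption chosen for illustration, not taken from the slide.

```python
from typing import Dict, List, Optional

# Hypothetical selection chain: each head category names the category it selects.
SELECTS: Dict[str, Optional[str]] = {"c": "T", "T": "v", "v": "V", "V": None}


def derive(c: Optional[str], trace: List[str]) -> List[str]:
    """Record the recipe's steps for the eT headed by c, then recurse into Comp."""
    if c is None:
        return trace
    trace.append(f"pick eT headed by {c}")
    trace.append(f"fill Spec of {c}, run agree(P, M)")
    trace.append(f"fill Head of {c}, update P")
    selected = SELECTS[c]
    if selected is not None:
        trace.append(f"fill Comp of {c}: {c} selects {selected}")
    return derive(selected, trace)


# start(c): begin the derivation at the topmost category.
steps = derive("c", [])
```

Because each elementary tree is attached in the Comp position of the previous one, the trace grows strictly to the right, which is the property the note on the slide highlights.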