
Formal Models of Language, Paula Buttery, Dept of Computer Science



  1. Formal Models of Language
     Paula Buttery, Dept of Computer Science & Technology, University of Cambridge
     Paula Buttery (Computer Lab), Formal Models of Language, 1 / 31

  2. Regular grammars give us linear trees
     G = (N, Σ, S, P) where P = { A → aB, A → a | A, B ∈ N, a ∈ Σ }
     Example:
     - N = {S, A, B, C, D, q5}
     - Σ = {the, girl, guard, ...}
     - S = S
     - P = { S → the A, A → guard B | girl B, B → chases C, C → the D, D → girl | rabbit }
     [Figure: finite-state automaton with start state S and states A, B, C, D, q5, arcs labelled the, guard, girl, chases, rabbit; beside it the linear derivation tree S → the A → girl B → chases C → the D → rabbit]
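
The example grammar on this slide can be run as a recogniser. The following is a minimal sketch, assuming the rules are stored in a dictionary called `RULES` and that `None` marks a rule of the form A → a with no following non-terminal; both are illustrative choices, not part of the slides.

```python
# Right-linear rules from the slide: nonterminal -> list of (terminal, next nonterminal).
# A next nonterminal of None stands for a rule of the form A -> a.
RULES = {
    "S": [("the", "A")],
    "A": [("guard", "B"), ("girl", "B")],
    "B": [("chases", "C")],
    "C": [("the", "D")],
    "D": [("girl", None), ("rabbit", None)],
}

def accepts(words, start="S"):
    """Return True if the word sequence is derivable from `start`."""
    current = {start}  # nonterminals still to be expanded; None means "derivation finished"
    for word in words:
        following = set()
        for nonterminal in current:
            if nonterminal is None:
                continue  # derivation already ended, but input remains
            for terminal, target in RULES.get(nonterminal, []):
                if terminal == word:
                    following.add(target)
        current = following
    return None in current

print(accepts("the girl chases the rabbit".split()))
print(accepts("the girl chases".split()))
```

Because every rule consumes exactly one word and leaves at most one non-terminal, the recogniser only ever tracks a set of non-terminals, which is why the derivation tree is linear.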

  3. Context-free grammars
     Context-free grammars capture phrase structure
     G = (N, Σ, S, P) where P = { A → α | A ∈ N, α ∈ (N ∪ Σ)* }
     A brief excursion into linguistic terminology...
     [Figure: parse tree of "alice plays croquet with pink flamingos": [S [NP [N alice]] [VP [VP [V plays] [NP [N croquet]]] [PP [P with] [NP [N [A pink] [N flamingos]]]]]]]

  4. Context-free grammars
     Context-free grammars capture phrase structure
     When modelling natural language, linguists label the non-terminal symbols with names that encode the most influential word in the phrase. They call this influential word the head.
     - noun phrases, NP, have a head noun
     [Figure: parse tree of "alice plays croquet with pink flamingos"]

  5. Context-free grammars
     Context-free grammars capture phrase structure
     - verb phrases, VP, have a head verb
     [Figure: parse tree of "alice plays croquet with pink flamingos"]

  6. Context-free grammars
     Context-free grammars capture phrase structure
     - prepositional phrases, PP, have a head preposition
     [Figure: parse tree of "alice plays croquet with pink flamingos"]

  7. Context-free grammars
     Context-free grammars capture phrase structure
     - the head of the whole string, S, is always the main verb
     [Figure: parse tree of "alice plays croquet with pink flamingos"]

  8. Context-free grammars
     Context-free grammars capture phrase structure
     Trees below nodes of the same type are interchangeable to yield another string in the language:
     - NP → N
     - N → A N
     - N → alice | croquet | ...
     [Figure: parse tree of "alice plays croquet with pink flamingos"]

  9. Context-free grammars
     Context-free grammars capture phrase structure
     Trees below nodes of the same type are interchangeable to yield another string in the language:
     - NP → N
     - N → A N
     - N → alice | croquet | ...
     [Figure: the same parse tree with the subtrees for "alice" and "croquet" exchanged, yielding "croquet plays alice with pink flamingos"]
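
The interchange of same-typed subtrees can be sketched in code. The nested-tuple encoding of the parse tree and the helper names (`leaves`, `get`, `put`, `swap`) are assumptions made for illustration. The tree itself follows the one drawn on the slides.

```python
# Parse tree as nested tuples: (label, child, ...); leaves are plain strings.
TREE = ("S",
        ("NP", ("N", "alice")),
        ("VP",
         ("VP", ("V", "plays"), ("NP", ("N", "croquet"))),
         ("PP", ("P", "with"),
          ("NP", ("N", ("A", "pink"), ("N", "flamingos"))))))

def leaves(tree):
    """Read the words off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def get(tree, path):
    """Follow a path of child indices down to a subtree."""
    for index in path:
        tree = tree[index]
    return tree

def put(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]

def swap(tree, path_a, path_b):
    """Exchange two non-overlapping subtrees whose root labels match."""
    a, b = get(tree, path_a), get(tree, path_b)
    if a[0] != b[0]:
        raise ValueError("only subtrees of the same type are interchangeable")
    return put(put(tree, path_a, b), path_b, a)

# Swap the subject NP with the object NP of "plays".
print(" ".join(leaves(swap(TREE, (1,), (2, 1, 2)))))
```

The label check in `swap` is the whole point: the grammar guarantees a string of the language only when the exchanged subtrees hang below nodes of the same type.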

  10. Context-free grammars
      CFGs are often written in Chomsky Normal Form
      Chomsky normal form: every production rule has the form A → BC or A → a, where A, B, C ∈ N and a ∈ Σ.
      Conversion to Chomsky Normal Form
      For every CFG there is a weakly equivalent CNF alternative.
      A → BCD may be rewritten as the two rules A → BX and X → CD.
      [Figure: the ternary tree for A → B C D beside its binarised equivalent, A → B X with X → C D]
      CNF is a requirement for some parsing algorithms.
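
The rewriting of A → BCD into A → BX and X → CD generalises to any right-hand side longer than two symbols. The sketch below covers only that binarisation step, not a full CNF conversion. The function name `binarize` and the fresh symbol names `X1`, `X2`, ... are assumptions, and the fresh names are assumed not to clash with existing non-terminals.

```python
def binarize(rules):
    """Split every rule whose right-hand side has more than two symbols.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Fresh nonterminals are named X1, X2, ... (assumed unused elsewhere).
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"
            # Keep the first symbol and push the rest under a fresh nonterminal.
            result.append((lhs, (rhs[0], fresh)))
            lhs, rhs = fresh, rhs[1:]
        result.append((lhs, rhs))
    return result

print(binarize([("A", ("B", "C", "D"))]))
```

Each pass of the loop removes one symbol from the long right-hand side, so a rule with k symbols on the right becomes k - 1 binary rules.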

  11. Push down automata
      Context-free languages are accepted by push down automata
      A PDA is defined as M = (Q, Σ, Γ, ∆, s, ⊥, F) where:
      - Q = {q0, q1, q2, ...} is a finite set of states.
      - Σ is the input alphabet.
      - Γ is the stack alphabet.
      - ∆ ⊆ (Q × (Σ ∪ {ε}) × Γ) × (Q × Γ*) is a relation (Q × (Σ ∪ {ε}) × Γ) → (Q × Γ*), which we write as δ. Given q ∈ Q, i ∈ Σ ∪ {ε} and A ∈ Γ, δ(q, i, A) returns (q′, α): a new state q′ ∈ Q, with A at the top of the stack replaced by α ∈ Γ*.
      - s ∈ Q is the starting state.
      - ⊥ ∈ Γ is the initial stack symbol.
      - F ⊆ Q is the set of all end states.

  12. Push down automata
      Moving from one state to the next we may push or pop
      - a : A/B  In state qx, on encountering transition symbol a, transition to state qy, popping A from the top of the stack and pushing B onto the stack.
        Stack before: A z0. Stack after: B z0.
      - ε : ε/A  In state qx, transition to state qy, pushing A onto the stack.
        Stack before: z0. Stack after: A z0.
      - ε : A/ε  In state qx, transition to state qy, popping A from the stack.
        Stack before: A z0. Stack after: z0.
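
The three label shapes differ only in what is popped and what is pushed, so a single helper covers them. This is a minimal sketch, assuming the stack is a Python list with the top at index 0 and the empty string standing in for ε. The name `step` is hypothetical.

```python
EPS = ""  # stands in for ε: "pop nothing" or "push nothing"

def step(stack, pop, push):
    """Apply the stack part of a transition labelled `a : pop/push`.

    `stack` is a list with the top element first. Returns the new stack,
    or None if `pop` does not match the current top of the stack.
    """
    if pop != EPS:
        if not stack or stack[0] != pop:
            return None  # transition is not applicable
        stack = stack[1:]
    if push != EPS:
        stack = [push] + stack
    return stack

print(step(["A", "z0"], "A", "B"))   # a : A/B
print(step(["z0"], EPS, "A"))        # ε : ε/A
print(step(["A", "z0"], "A", EPS))   # ε : A/ε
```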

  13. Push down automata
      A toy context-free grammar
      - S → NP VP
      - NP → Pron
      - NP → Det N
      - VP → V
      - VP → V NP
      - Det → { a, the }
      - N → { maw, noggin, ... }
      - Pron → { he, she, him, her }
      - V → { eats, sings }
      [Figure: parse tree of "the maw eats him": [S [NP [Det the] [N maw]] [VP [V eats] [NP [Pron him]]]]]

  14. Push down automata
      Recognising a string with a push down automaton
      - S → NP VP
      - NP → Pron
      - NP → Det N
      - VP → V
      - VP → V NP
      - Det → { a, the }
      - N → { maw, noggin, ... }
      - Pron → { he, him, her }
      - V → { eats, sings }
      [Figure: state diagram of the PDA with states q0 to q11 and start state q0. Transition labels in the diagram: ε : ε/VP, ε : ε/NP, ε : NP/N, ε : NP/Pron, ε : ε/Det, a, the : Det/ε, maw, noggin : N/ε, he, she : Pron/ε, ε : VP/NP, ε : VP/V, ε : ε/V, eats, sings : V/ε, ε : NP/NP, ε : z0/z0 (twice)]

  15. Push down automata
      Is ‘the maw eats him’ a string in the language?
      Grammar: S → NP VP, NP → Pron, NP → Det N, VP → V, VP → V NP, Det → { a, the }, N → { maw, noggin, ... }, Pron → { he, him, her }, V → { eats, sings }

      Next input   Transition   Stack (top first)
      the          q0           z0
      the          q0 - q1      VP z0
      the          q1 - q2      NP VP z0
      the          q2 - q3      N VP z0
      the          q3 - q5      Det N VP z0
      maw          q5 - q6      N VP z0
      eats         q6 - q7      VP z0
      eats         q7 - q8      NP z0
      eats         q8 - q9      V NP z0
      him          q9 - q10     NP z0
      him          q10 - q2     NP z0
      him          q2 - q4      Pron z0
      him          q4 - q7      z0
      ε            q7 - q11     z0

      [Figure: the PDA state diagram from the previous slide, here with the pronoun transition labelled he, him : Pron/ε, and the input "the maw eats him"]
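
The trace above can be reproduced with a small nondeterministic simulator. This is a sketch under stated assumptions, not the lecture's own implementation. The first thirteen transitions are read off the trace. The endpoints of the last two (`ε : VP/V` and the second `ε : z0/z0`) are not recoverable from this transcript and are guessed. The name `recognise` and the tuple encoding are also illustrative.

```python
EPS = None  # stands in for ε

# (state, input words or EPS, pop, push, next state); top of stack is index 0.
TRANSITIONS = [
    ("q0", EPS, EPS, "VP", "q1"),
    ("q1", EPS, EPS, "NP", "q2"),
    ("q2", EPS, "NP", "N", "q3"),
    ("q2", EPS, "NP", "Pron", "q4"),
    ("q3", EPS, EPS, "Det", "q5"),
    ("q5", {"a", "the"}, "Det", EPS, "q6"),
    ("q6", {"maw", "noggin"}, "N", EPS, "q7"),
    ("q4", {"he", "him"}, "Pron", EPS, "q7"),
    ("q7", EPS, "VP", "NP", "q8"),
    ("q8", EPS, EPS, "V", "q9"),
    ("q9", {"eats", "sings"}, "V", EPS, "q10"),
    ("q10", EPS, "NP", "NP", "q2"),
    ("q7", EPS, "z0", "z0", "q11"),
    # ASSUMPTION: endpoints of these two transitions are guessed, not taken from the slides.
    ("q7", EPS, "VP", "V", "q9"),
    ("q10", EPS, "z0", "z0", "q11"),
]

def recognise(words, start="q0", final="q11", bottom="z0"):
    """Depth-first search over configurations (state, input position, stack).

    Returns the accepting sequence of configurations, or None if there is none.
    """
    words = tuple(words)
    agenda = [[(start, 0, (bottom,))]]
    while agenda:
        path = agenda.pop()
        state, pos, stack = path[-1]
        if state == final and pos == len(words):
            return path
        for src, read, pop, push, dst in TRANSITIONS:
            if src != state:
                continue
            new_pos = pos
            if read is not EPS:
                if pos >= len(words) or words[pos] not in read:
                    continue
                new_pos = pos + 1
            new_stack = stack
            if pop is not EPS:
                if not new_stack or new_stack[0] != pop:
                    continue
                new_stack = new_stack[1:]
            if push is not EPS:
                new_stack = (push,) + new_stack
            agenda.append(path + [(dst, new_pos, new_stack)])
    return None

for state, pos, stack in recognise("the maw eats him".split()):
    print(state, pos, " ".join(stack))
```

The search terminates because every cycle in this machine consumes at least one input word. Acceptance here means reaching q11 with all input read, which matches the final row of the trace.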

  16. Push down automata
      Can context-free grammars model natural language?
      Cross Serial Dependencies
      A small number of languages exhibit strings of the form
        noun_1 noun_2 ... noun_n verb_1 verb_2 ... verb_n
      Zurich dialect of Swiss German:
        mer d’chind em Hans es huus haend wele laa hälfe aastriiche.
        we the children Hans the house have wanted to let help paint.
        "we have wanted to let the children help Hans paint the house"
      Such expressions, i.e. of the form /a^n b^m c^n d^m/, cannot be derived by a context-free grammar.
        mer d’chind^n em Hans^m es huus haend wele laa^n hälfe^m aastriiche. → /w a^n b^m x c^n d^m y/
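
The pattern a^n b^m c^n d^m pairs the a's with the c's and the b's with the d's, so the two dependencies cross rather than nest. A membership check is simple to write once counting is allowed. The sketch below is only an illustration of the pattern, and the function name `is_cross_serial` is an assumption.

```python
import re

def is_cross_serial(s):
    """Check membership in { a^n b^m c^n d^m | n, m >= 0 }."""
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False  # symbols out of order, or symbols other than a, b, c, d
    a, b, c, d = (len(group) for group in match.groups())
    return a == c and b == d  # a's pair with c's, b's pair with d's

print(is_cross_serial("aabccd"))
print(is_cross_serial("aabbccd"))
```

The regular expression only checks the order of the blocks. The two count comparisons are the part that a single stack cannot perform, since matching the a's against the c's requires reading past the b's first.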

  17. Push down automata
      Use the pumping lemma to prove not context-free
      The pumping lemma for context-free languages (CFLs) is used to show that a language is not context-free.
      The pumping lemma property for CFLs is: there is a constant k ≥ 1 such that all w ∈ L with |w| ≥ k can be expressed as a concatenation of five strings, w = u1 y u2 z u3, where u1, y, u2, z and u3 satisfy:
      - |yz| ≥ 1 (i.e. we cannot have both y = ε and z = ε)
      - |y u2 z| ≤ k
      - for all n ≥ 0, u1 y^n u2 z^n u3 ∈ L (i.e. u1 u2 u3 ∈ L, u1 y u2 z u3 ∈ L, u1 yy u2 zz u3 ∈ L etc.)
      To prove that Swiss German is not context-free, use a similar proof as for centre embeddings (last lecture), except that you need to remember that the intersection of a regular language with a context-free language is context-free:
        L_reg1 ∩ L_cfg1 = L_cfg2
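
The argument hinted at on this slide can be sketched as follows. This is an outline under the usual textbook assumptions, not the lecture's own proof: it shows that the abstract pattern is not context-free. The closure fact on the slide then carries the result over to Swiss German, since intersecting the language with a regular language of the form w a* b* x c* d* y would otherwise yield a context-free language.

```latex
\begin{align*}
&\text{Assume } L = \{a^n b^m c^n d^m \mid n, m \ge 0\} \text{ is context-free, with pumping constant } k.\\
&\text{Choose } w = a^k b^k c^k d^k \in L \text{ and any split } w = u_1 y u_2 z u_3 \text{ with } |yz| \ge 1,\ |y u_2 z| \le k.\\
&\text{Since } |y u_2 z| \le k,\ y u_2 z \text{ lies within two adjacent blocks: } a^k b^k,\ b^k c^k \text{ or } c^k d^k.\\
&\text{Pump down } (n = 0):\ w' = u_1 u_2 u_3 \text{ loses at least one symbol, all from those two blocks.}\\
&\text{If an } a \text{ or a } c \text{ is lost, then } \#_a(w') \ne \#_c(w'), \text{ because no such window covers both the } a \text{ and } c \text{ blocks.}\\
&\text{Otherwise a } b \text{ or a } d \text{ is lost, and } \#_b(w') \ne \#_d(w') \text{ for the same reason.}\\
&\text{Either way } w' \notin L, \text{ contradicting the pumping property, so } L \text{ is not context-free.}
\end{align*}
```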

  18. Mildly context-sensitive languages
      Are CSGs required to model natural languages?
      Remember the complexity of a language class was defined in terms of the recognition problem.

      Type   Language class            Complexity    Machine
      3      regular                   O(n)          DFA
      2      context-free              O(n^c)        PDA
      1      context-sensitive         O(c^n)        LBA
      0      recursively enumerable    undecidable   Turing

      - Modelling natural languages using context-sensitive grammars is very expensive. In practice we don’t have to, because only very limited constructions are not captured by context-free grammars.
      - However, it is still fun to place a limit on the complexity of natural languages: we are not limited to discussing language classes only in terms of the Chomsky hierarchy.

  19. Mildly context-sensitive languages
      We are not limited to the Chomsky hierarchy
      [Figure: nested language classes: Regular Languages, Context Free Languages, Context Sensitive Languages, Recursively Enumerable Languages]

  20. Mildly context-sensitive languages
      We are not limited to the Chomsky hierarchy
      [Figure: the same nested language classes (Regular, Context Free, Context Sensitive, Recursively Enumerable) with a region labelled Natural Languages added]
