Distributional Learning of Context-Free Grammars. Alexander Clark - PowerPoint PPT Presentation

  1. Distributional Learning of Context-Free Grammars. Alexander Clark Department of Philosophy King’s College London alexander.clark@kcl.ac.uk 14 November 2018 UCL

  2. Outline Introduction Weak Learning Strong Learning An Algebraic Theory of CFGs

  3. Outline Introduction Weak Learning Strong Learning An Algebraic Theory of CFGs

  4. Machine learning. Standard machine learning problem: we learn a function f : X → Y from a sequence of input–output pairs ⟨(x₁, y₁), …, (xₙ, yₙ)⟩. Convergence: as n → ∞ we want our hypothesis f̂ to tend to f; ideally f̂ = f.

  5. Vector spaces. Two standard assumptions: 1. the sets have some algebraic structure: X is ℝⁿ and Y is ℝ; 2. f satisfies some smoothness assumption: f is linear, or satisfies a Lipschitz condition |f(xᵢ) − f(xⱼ)| ≤ c |xᵢ − xⱼ|.

  6. Our setting differs: • the input examples are strings; • there is no output (unsupervised learning!); • our representations are context-free grammars.

  7. Context-Free Grammars. A context-free grammar is G = ⟨Σ, V, S, P⟩, and L(G, A) = { w ∈ Σ* | A ⇒*_G w }. Example: Σ = {a, b}, V = {S}, P = {S → ab, S → aSb, S → ε}, so L(G, S) = { aⁿbⁿ | n ≥ 0 }.
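The example grammar can be explored with a short sketch: a breadth-first expansion of sentential forms that enumerates L(G, S) up to a length bound. The grammar encoding and function name are illustrative, not from the talk.

```python
from collections import deque

# The example grammar S -> ab | aSb | epsilon, encoded as a dict from
# nonterminals to tuples of right-hand-side symbols (illustrative encoding).
productions = {"S": [("a", "b"), ("a", "S", "b"), ()]}

def enumerate_language(start="S", max_len=6):
    """Enumerate the strings of L(G, start) of length <= max_len."""
    seen, out = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Terminals only ever accumulate, so prune over-long forms early.
        if sum(1 for s in form if s not in productions) > max_len:
            continue
        i = next((k for k, s in enumerate(form) if s in productions), None)
        if i is None:                      # no nonterminal left: a string of L
            out.add("".join(form))
            continue
        for rhs in productions[form[i]]:   # expand the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(out)
```

With max_len = 6 this yields exactly ε, ab, aabb, aaabbb, i.e. L(G, S) = { aⁿbⁿ | n ≥ 0 } truncated at length 6.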

  8. Least fixed point semantics [Ginsburg and Rice (1962)]. Interpret the grammar as a set of equations in P(Σ*): S = (a ∘ b) ∨ (a ∘ S ∘ b) ∨ ε.

  9. Least fixed point semantics [Ginsburg and Rice (1962)]. Interpret the grammar as a set of equations in P(Σ*): S = (a ∘ b) ∨ (a ∘ S ∘ b) ∨ ε. • Ξ is the set of functions V → P(Σ*). • Φ_G : Ξ → Ξ is defined by Φ_G(ξ)[S] = (a ∘ b) ∨ (a ∘ ξ(S) ∘ b) ∨ ε. The least fixed point is ξ_G = ⋁ₙ Φ_Gⁿ(ξ_⊥) = { S ↦ L(G, S) }.
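The Kleene-style iteration on this slide can be made concrete: start from the bottom assignment ξ_⊥(S) = ∅ and apply Φ_G until nothing changes. A hedged sketch; the restriction of P(Σ*) to strings of length ≤ 6 is my own device to make the lattice finite so the iteration terminates.

```python
def phi(xi, max_len=6):
    """One application of Phi_G for S = (a . b) v (a . S . b) v eps,
    with P(Sigma*) truncated to strings of length <= max_len."""
    expanded = {"ab", ""} | {"a" + w + "b" for w in xi["S"]}
    return {"S": {w for w in expanded if len(w) <= max_len}}

xi = {"S": set()}          # the bottom assignment xi_bot
while True:                # Kleene iteration: xi, Phi(xi), Phi^2(xi), ...
    nxt = phi(xi)
    if nxt == xi:          # fixed point reached
        break
    xi = nxt
```

At the fixed point, xi["S"] is { ε, ab, aabb, aaabbb }: the truncation of L(G, S), matching the slide's ξ_G = { S ↦ L(G, S) }.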

  10. What Algebra? Monoid: ⟨S, ∘, 1⟩, e.g. Σ*.

  11. What Algebra? Monoid: ⟨S, ∘, 1⟩, e.g. Σ*. Complete idempotent semiring: ⟨S, ∘, 1, ∨, ⊥⟩, e.g. P(Σ*).
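P(Σ*) as a complete idempotent semiring can be sketched directly: the product is elementwise concatenation, the sum is set union, 1 is {ε}, and ⊥ is the empty language. The names below are illustrative.

```python
def concat(X, Y):
    """Semiring product on P(Sigma*): elementwise concatenation."""
    return {x + y for x in X for y in Y}

ONE = {""}     # multiplicative unit: the language containing only epsilon
BOT = set()    # bottom element: the empty language, absorbing for concat
```

Union is the idempotent sum (A ∨ A = A), concat(X, ONE) = X, and concat(X, BOT) = BOT.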

  12. Outline Introduction Weak Learning Strong Learning An Algebraic Theory of CFGs

  13. Running example: propositional logic. Alphabet: rain, snow, hot, cold, danger (atomic propositions A₁, A₂, …); and, or, implies, iff (the connectives ∧, ∨, →, ↔); not (¬); open, close (the brackets ( and )).

  14. Running example: propositional logic. Alphabet: rain, snow, hot, cold, danger (A₁, A₂, …); and, or, implies, iff (∧, ∨, →, ↔); not (¬); open, close (( and )). Example strings: • rain • open snow implies cold close • open snow implies open not hot close close

  15. Distributional Learning [Harris (1964)] • Look at the dog • Look at the cat

  16. Distributional Learning [Harris (1964)] • Look at the dog • Look at the cat • That cat is crazy

  17. Distributional Learning [Harris (1964)] • Look at the dog • Look at the cat • That cat is crazy • That dog is crazy

  18. English counterexample • I can swim • I may swim • I want a can of beer

  19. English counterexample • I can swim • I may swim • I want a can of beer • *I want a may of beer

  20. English counterexample • She is Italian • She is a philosopher • She is an Italian philosopher

  21. English counterexample • She is Italian • She is a philosopher • She is an Italian philosopher • *She is an a philosopher philosopher

  22. Logic example. Propositional logic is substitutable: • open rain and cold close • open rain implies cold close • open snow implies open not hot close

  23. Logic example. Propositional logic is substitutable: • open rain and cold close • open rain implies cold close • open snow implies open not hot close • open snow and open not hot close

  24. Formally. The syntactic congruence (a monoid congruence): two nonempty strings u, v are congruent (u ≡_L v) if for all l, r ∈ Σ*: lur ∈ L ⇔ lvr ∈ L. We write [u] for the congruence class of u. Definition: L is substitutable if lur ∈ L and lvr ∈ L imply u ≡_L v.
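On a finite sample the congruence can only be approximated through observed contexts. A sketch using the dog/cat example from the earlier slides, with the caveat that a single shared context licenses merging only under the substitutability assumption; the function name and sample are illustrative.

```python
def contexts(u, sample):
    """All contexts (l, r) with l + u + r a string of the sample."""
    ctx = set()
    for w in sample:
        i = w.find(u)
        while i != -1:                       # every occurrence of u in w
            ctx.add((w[:i], w[i + len(u):]))
            i = w.find(u, i + 1)
    return ctx

sample = {"look at the dog", "look at the cat", "that cat is crazy"}
shared = contexts("cat", sample) & contexts("dog", sample)
# "cat" and "dog" share the context ("look at the ", ""); if L is
# substitutable, this licenses the conclusion cat ≡_L dog.
```

From that conjectured congruence a learner would predict "that dog is crazy" ∈ L as well.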

  25. Example. Input data D ⊆ L: • hot • cold • open hot or cold close • open not hot close • open hot and cold close • open hot implies cold close • open hot iff cold close • danger • rain • snow

  26. One production for each example: • S → hot • S → cold • S → open hot or cold close • S → open not hot close • S → open hot and cold close • S → open hot implies cold close • S → open hot iff cold close • S → danger • S → rain • S → snow

  27. A trivial grammar. Input data D = { w₁, w₂, …, wₙ }, a set of nonempty strings. Starting grammar: S → w₁, S → w₂, …, S → wₙ. Then L(G) = D.

  28. A trivial grammar. Input data D = { w₁, w₂, …, wₙ }, a set of nonempty strings; starting grammar S → w₁, …, S → wₙ, so L(G) = D. Binarise this every way: one nonterminal [[w]] for every substring w. • [[a]] → a for each a ∈ Σ • S → [[w]] for each w ∈ D • [[w]] → [[u]][[v]] whenever w = u · v. Then L(G, [[w]]) = { w }.
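The binarised starting grammar can be sketched in a few lines, here over characters rather than the talk's word tokens; the encoding (pairs of left-hand side and right-hand-side tuple) is illustrative.

```python
def trivial_grammar(D):
    """Build the binarised grammar: a nonterminal [[w]] per substring w,
    lexical rules [[a]] -> a, start rules S -> [[w]] for w in D, and
    [[w]] -> [[u]][[v]] for every split w = u . v."""
    subs = {w[i:j] for w in D
            for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
    prods = set()
    for w in subs:
        if len(w) == 1:
            prods.add((w, (w,)))            # lexical: [[a]] -> a
        for k in range(1, len(w)):
            prods.add((w, (w[:k], w[k:])))  # binary: [[w]] -> [[u]][[v]]
    prods |= {("S", (w,)) for w in D}       # start: S -> [[w]], w in D
    return subs, prods
```

For D = {"ab"} this yields nonterminals for a, b, ab and the productions [[a]] → a, [[b]] → b, [[ab]] → [[a]][[b]], S → [[ab]], so indeed L(G, [[w]]) = { w }.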

  29. [Parse tree for one binarisation: S → [[open not hot close]] → [[open not]] [[hot close]] → ([[open]] [[not]]) ([[hot]] [[close]]) → open not hot close]

  30. Nonterminal for each substring. [Figure: every observed substring of the data — hot, cold, not hot, iff cold, open not hot, hot and cold close, open hot or cold close, … — each with its own nonterminal.]

  31. Nonterminal for each cluster. [Figure: the same substrings grouped into clusters of distributionally congruent strings, one nonterminal per cluster.]

  32. Productions. Observation: if w = u · v then [w] ⊇ [u] · [v].

  33. Productions. Observation: if w = u · v then [w] ⊇ [u] · [v]. Add the production [[w]] → [[u]][[v]].

  34. Productions. Observation: if w = u · v then [w] ⊇ [u] · [v]. Add the production [[w]] → [[u]][[v]]. Consequence: if L is substitutable, then L(G, [[w]]) ⊆ [w] and L(G) ⊆ L.

  35. Theorem [Clark and Eyraud (2007)] • If the language is a substitutable context-free language, then the hypothesis grammar converges to a correct grammar. • Efficient; provably correct.

  36. Theorem [Clark and Eyraud (2007)] • If the language is a substitutable context-free language, then the hypothesis grammar converges to a correct grammar. • Efficient; provably correct. But the grammar may be different for each input data set!
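The merging step behind the theorem can be sketched as follows: compute the observed contexts of every substring of the data and merge (union-find style) any two substrings that share a context, which substitutability guarantees is safe. This is a considerable simplification of the published algorithm; the names and toy data are illustrative.

```python
def congruence_classes(D):
    """Map each substring of the data to a representative of its
    conjectured congruence class, merging on any shared context."""
    subs = {w[i:j] for w in D
            for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
    ctx = {u: {(w[:i], w[i + len(u):])
               for w in D for i in range(len(w)) if w[i:i + len(u)] == u}
           for u in subs}
    parent = {u: u for u in subs}           # union-find forest
    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u
    for u in subs:
        for v in subs:
            if u < v and ctx[u] & ctx[v]:   # shared context => merge
                parent[find(u)] = find(v)
    return {u: find(u) for u in subs}
```

On D = {"ab", "cb"}, the substrings a and c share the context ("", "b") and are merged, as are ab and cb, while b stays in its own class.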

  37. [Figure: derivation trees for "open not hot close" under grammars learned from different samples, with arbitrarily numbered nonterminals NT 0, NT 2, NT 5, NT 8, …, NT 15 — the learned grammar differs from sample to sample.]
