Distributional Learning of Context-Free Grammars
Alexander Clark, Department of Philosophy, King's College London
alexander.clark@kcl.ac.uk
14 November 2018, UCL
Outline
- Introduction
- Weak Learning
- Strong Learning
- An Algebraic Theory of CFGs
Machine learning
The standard machine learning problem: we learn a function f : X → Y from a sequence of input-output pairs ⟨(x₁, y₁), ..., (xₙ, yₙ)⟩.
Convergence: as n → ∞ we want our hypothesis f̂ to tend to f; ideally f̂ = f.
Vector spaces
Two standard assumptions:
1. The sets have some algebraic structure: X is ℝⁿ, Y is ℝ.
2. f satisfies some smoothness assumption: f is linear, or satisfies a Lipschitz condition |f(xᵢ) − f(xⱼ)| ≤ c |xᵢ − xⱼ|.
Here, instead:
- The input examples are strings.
- There is no output (unsupervised learning!).
- Our representations are context-free grammars.
Context-Free Grammars
A context-free grammar is G = ⟨Σ, V, S, P⟩, with L(G, A) = { w ∈ Σ* | A ⇒*_G w }.
Example: Σ = {a, b}, V = {S}, P = {S → ab, S → aSb, S → ε}.
Then L(G, S) = { aⁿbⁿ | n ≥ 0 }.
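The example grammar is simple enough that membership can be tested by following the productions directly. A minimal Python sketch (not from the slides; the function name `in_L` is my own):

```python
def in_L(w: str) -> bool:
    """Membership test for L(G, S) = { a^n b^n : n >= 0 }, reading the
    productions of the example grammar directly."""
    if w == "":
        return True                # S -> epsilon
    if w.startswith("a") and w.endswith("b"):
        return in_L(w[1:-1])       # S -> aSb (S -> ab is the base case n = 1)
    return False
```

For instance, `in_L("aabb")` is true while `in_L("aab")` is false.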
Least fixed point semantics [Ginsburg and Rice (1962)]
Interpret the grammar as a set of equations in P(Σ*):
S = (a ∘ b) ∨ (a ∘ S ∘ b) ∨ ε
- Ξ is the set of functions V → P(Σ*).
- Φ_G : Ξ → Ξ, with Φ_G(ξ)[S] = (a ∘ b) ∨ (a ∘ ξ(S) ∘ b) ∨ ε.
- Least fixed point: ξ_G = ⋁ₙ Φ_Gⁿ(ξ_⊥) = { S ↦ L(G, S) }.
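The Kleene iteration above can be run concretely. A sketch (my own illustration, not from the slides) that iterates Φ_G from the bottom element, truncating at a length bound so the iteration reaches a fixed point in finitely many steps:

```python
from itertools import product

MAX_LEN = 6  # only keep strings up to this length, so iteration terminates

def concat(X, Y):
    """Pointwise concatenation of two string sets, truncated at MAX_LEN."""
    return {x + y for x, y in product(X, Y) if len(x + y) <= MAX_LEN}

def phi(S):
    """One application of Phi_G for S -> ab | aSb | epsilon."""
    return {'ab'} | concat({'a'}, concat(S, {'b'})) | {''}

# Kleene iteration from the bottom element (the empty set)
S = set()
while phi(S) != S:
    S = phi(S)

print(sorted(S, key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

The fixed point is exactly L(G, S) restricted to strings of length at most 6.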
What Algebra?
- Monoid ⟨S, ∘, 1⟩: Σ*.
- Complete idempotent semiring ⟨S, ∘, 1, ∨, ⊥⟩: P(Σ*).
Outline
- Introduction
- Weak Learning
- Strong Learning
- An Algebraic Theory of CFGs
Running example: propositional logic
Alphabet:
- rain, snow, hot, cold, danger — atoms A₁, A₂, ...
- and, or, implies, iff — ∧, ∨, →, ↔
- not — ¬
- open, close — ( , )
Example strings:
- rain
- open snow implies cold close
- open snow implies open not hot close close
Distributional Learning [Harris (1964)]
- Look at the dog
- Look at the cat
- That cat is crazy
- That dog is crazy
English counterexample
- I can swim
- I may swim
- I want a can of beer
- *I want a may of beer
English counterexample
- She is Italian
- She is a philosopher
- She is an Italian philosopher
- *She is an a philosopher philosopher
Logic example
Propositional logic is substitutable:
- open rain and cold close
- open rain implies cold close
- open snow implies open not hot close
- open snow and open not hot close
Formally
The syntactic congruence (a monoid congruence): two nonempty strings u, v are congruent (u ≡_L v) if for all l, r ∈ Σ*: lur ∈ L ⇔ lvr ∈ L. We write [u] for the congruence class of u.
Definition: L is substitutable if lur ∈ L and lvr ∈ L imply u ≡_L v.
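On a finite sample we can only observe the contexts that actually occur, which is what a substitutable learner exploits: one shared context licenses the conjecture u ≡_L v. A minimal sketch (my own illustration; `contexts` is a hypothetical helper, and strings of letters stand in for strings of tokens):

```python
def contexts(u, D):
    """All contexts (l, r) such that l + u + r is a string in the sample D."""
    return {(w[:i], w[i + len(u):])
            for w in D
            for i in range(len(w) - len(u) + 1)
            if w[i:i + len(u)] == u}

D = {"ab", "aabb", "aaabbb"}
# 'ab' and 'aabb' share the context ('a', 'b') (and the empty context,
# since both are themselves in D), so a substitutable learner conjectures
# that they are congruent.
shared = contexts("ab", D) & contexts("aabb", D)
print(sorted(shared))  # [('', ''), ('a', 'b')]
```

For a substitutable language this conjecture is always safe; for English it is not, as the can/may example shows.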
Example
Input data D ⊆ L:
- hot
- cold
- open hot or cold close
- open not hot close
- open hot and cold close
- open hot implies cold close
- open hot iff cold close
- danger
- rain
- snow
One production for each example:
- S → hot
- S → cold
- S → open hot or cold close
- S → open not hot close
- S → open hot and cold close
- S → open hot implies cold close
- S → open hot iff cold close
- S → danger
- S → rain
- S → snow
A trivial grammar
Input data: D = { w₁, w₂, ..., wₙ }, a set of nonempty strings.
Starting grammar: S → w₁, S → w₂, ..., S → wₙ, so that L(G) = D.
Binarise this every way: one nonterminal [[w]] for every substring w.
- [[a]] → a for each a ∈ Σ
- S → [[w]] for each w ∈ D
- [[w]] → [[u]][[v]] whenever w = u · v
Then L(G, [[w]]) = { w }.
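Constructing this grammar is mechanical. A sketch in Python (my own illustration, not from the slides; examples are tuples of tokens, and the tuple w itself serves as the nonterminal [[w]]):

```python
def binarised_grammar(D):
    """Build the trivial grammar, binarised every way:
    one nonterminal [[w]] per substring w of the data."""
    subs = {w[i:j] for w in D
                   for i in range(len(w))
                   for j in range(i + 1, len(w) + 1)}
    prods = set()
    for w in subs:
        if len(w) == 1:
            prods.add((w, w[0]))                # [[a]] -> a
        for k in range(1, len(w)):
            prods.add((w, (w[:k], w[k:])))      # [[w]] -> [[u]][[v]]
    start = {("S", w) for w in D}               # S -> [[w]] for each w in D
    return subs, prods, start

subs, prods, start = binarised_grammar({("open", "not", "hot", "close")})
```

A 4-token example contributes 10 nonterminals (one per nonempty substring) and three binary productions for the whole string, one per split point.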
S
└─ [[open not hot close]]
   ├─ [[open not]]
   │  ├─ [[open]] → open
   │  └─ [[not]] → not
   └─ [[hot close]]
      ├─ [[hot]] → hot
      └─ [[close]] → close
Nonterminal for each substring
[Figure: the substrings of the data spread out individually, one nonterminal each — e.g. hot, cold, not hot, hot or cold, hot implies cold close, open not hot close, open hot iff cold close, ...]
Nonterminal for each cluster
[Figure: the same substrings, now grouped into clusters of distributionally interchangeable strings, with one nonterminal per cluster.]
Productions
Observation: if w = u · v then [w] ⊇ [u] · [v].
So add the production [[w]] → [[u]][[v]].
Consequence: if L is substitutable, then L(G, [[w]]) ⊆ [w], and so L(G) ⊆ L.
Theorem [Clark and Eyraud (2007)]
If the language is a substitutable context-free language, then the hypothesis grammar converges to a correct grammar. The algorithm is efficient and provably correct.
But the grammar may be different for each input data set!
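The whole pipeline — take all substrings, merge those that share a context, read off the productions over the resulting classes — can be sketched end to end. This is my own illustration of the idea behind the Clark and Eyraud (2007) learner, not their implementation; `learn_substitutable` and its union-find helpers are hypothetical names:

```python
def learn_substitutable(D):
    """Sketch of a substitutable learner. Examples are tuples of tokens;
    nonterminals are classes of substrings, merged on any shared context."""
    subs = {w[i:j] for w in D
                   for i in range(len(w))
                   for j in range(i + 1, len(w) + 1)}

    # Union-find over substrings: one shared context (l, r) => same class.
    parent = {u: u for u in subs}
    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u
    def union(u, v):
        parent[find(u)] = find(v)

    by_context = {}
    for w in D:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                by_context.setdefault((w[:i], w[j:]), []).append(w[i:j])
    for us in by_context.values():
        for u in us[1:]:
            union(us[0], u)

    # Productions over classes: [u.v] -> [u][v], plus lexical and start rules.
    prods = {(find(u + v), (find(u), find(v)))
             for u in subs for v in subs if u + v in subs}
    lex = {(find(u), u) for u in subs if len(u) == 1}
    start = {find(w) for w in D}
    return prods, lex, start
```

On D = {ab, aabb} the strings ab and aabb share the empty context, so their classes merge and a single start class emerges — the recursion of aⁿbⁿ appears from two examples.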
[Figure: a parse tree for "open not hot close" produced by the learned grammar, with opaque nonterminal labels NT0, ..., NT15 instead of readable category names.]