Distributional Learning of Context-Free Grammars
Alexander Clark, Department of Philosophy, King's College London
alexander.clark@kcl.ac.uk
14 November 2018, UCL
Outline
- Introduction
- Weak Learning
- Strong Learning
- An Algebraic Theory of CFGs
Machine learning
The standard machine learning problem: we learn a function f : X → Y from a sequence of input-output pairs ⟨(x₁, y₁), ..., (xₙ, yₙ)⟩.
Convergence: as n → ∞ we want our hypothesis f̂ to tend to f; ideally f̂ = f.
Vector spaces
Two standard assumptions:
1. The sets have some algebraic structure: X is ℝⁿ, Y is ℝ.
2. f satisfies some smoothness assumption: f is linear, or satisfies a Lipschitz condition |f(xᵢ) − f(xⱼ)| ≤ c |xᵢ − xⱼ|.
Here, instead:
- The input examples are strings.
- There is no output (unsupervised learning!).
- Our representations are context-free grammars.
Context-Free Grammars
A context-free grammar is G = ⟨Σ, V, S, P⟩, with L(G, A) = { w ∈ Σ* | A ⇒*_G w }.
Example: Σ = {a, b}, V = {S}, P = {S → ab, S → aSb, S → ε}.
Then L(G, S) = { aⁿbⁿ | n ≥ 0 }.
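The example grammar is simple enough that membership can be tested by following the productions directly. A minimal Python sketch (not from the slides; the function name `in_L` is my own):

```python
def in_L(w: str) -> bool:
    """Membership test for L(G, S) = { a^n b^n : n >= 0 }, reading the
    productions of the example grammar directly."""
    if w == "":
        return True                # S -> epsilon
    if w.startswith("a") and w.endswith("b"):
        return in_L(w[1:-1])       # S -> aSb (S -> ab is the base case n = 1)
    return False
```

For instance, `in_L("aabb")` is true while `in_L("aab")` is false.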
Least fixed point semantics [Ginsburg and Rice (1962)]
Interpret the grammar as a set of equations in P(Σ*):
S = (a ∘ b) ∨ (a ∘ S ∘ b) ∨ ε
- Ξ is the set of functions V → P(Σ*).
- Φ_G : Ξ → Ξ, with Φ_G(ξ)[S] = (a ∘ b) ∨ (a ∘ ξ(S) ∘ b) ∨ ε.
- Least fixed point: ξ_G = ⋁ₙ Φ_Gⁿ(ξ_⊥) = { S ↦ L(G, S) }.
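The Kleene iteration above can be run concretely. A sketch (my own illustration, not from the slides) that iterates Φ_G from the bottom element, truncating at a length bound so the iteration reaches a fixed point in finitely many steps:

```python
from itertools import product

MAX_LEN = 6  # only keep strings up to this length, so iteration terminates

def concat(X, Y):
    """Pointwise concatenation of two string sets, truncated at MAX_LEN."""
    return {x + y for x, y in product(X, Y) if len(x + y) <= MAX_LEN}

def phi(S):
    """One application of Phi_G for S -> ab | aSb | epsilon."""
    return {'ab'} | concat({'a'}, concat(S, {'b'})) | {''}

# Kleene iteration from the bottom element (the empty set)
S = set()
while phi(S) != S:
    S = phi(S)

print(sorted(S, key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

The fixed point is exactly L(G, S) restricted to strings of length at most 6.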
What Algebra?
- Monoid ⟨S, ∘, 1⟩: Σ*.
- Complete idempotent semiring ⟨S, ∘, 1, ∨, ⊥⟩: P(Σ*).
Outline
- Introduction
- Weak Learning
- Strong Learning
- An Algebraic Theory of CFGs
Running example: propositional logic
Alphabet:
- rain, snow, hot, cold, danger — atoms A₁, A₂, ...
- and, or, implies, iff — ∧, ∨, →, ↔
- not — ¬
- open, close — ( , )
Example strings:
- rain
- open snow implies cold close
- open snow implies open not hot close close
Distributional Learning [Harris (1964)]
- Look at the dog
- Look at the cat
- That cat is crazy
- That dog is crazy
English counterexample
- I can swim
- I may swim
- I want a can of beer
- *I want a may of beer
English counterexample
- She is Italian
- She is a philosopher
- She is an Italian philosopher
- *She is an a philosopher philosopher
Logic example
Propositional logic is substitutable:
- open rain and cold close
- open rain implies cold close
- open snow implies open not hot close
- open snow and open not hot close
Formally
The syntactic congruence (a monoid congruence): two nonempty strings u, v are congruent (u ≡_L v) if for all l, r ∈ Σ*: lur ∈ L ⇔ lvr ∈ L. We write [u] for the congruence class of u.
Definition: L is substitutable if lur ∈ L and lvr ∈ L imply u ≡_L v.
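On a finite sample we can only observe the contexts that actually occur, which is what a substitutable learner exploits: one shared context licenses the conjecture u ≡_L v. A minimal sketch (my own illustration; `contexts` is a hypothetical helper, and strings of letters stand in for strings of tokens):

```python
def contexts(u, D):
    """All contexts (l, r) such that l + u + r is a string in the sample D."""
    return {(w[:i], w[i + len(u):])
            for w in D
            for i in range(len(w) - len(u) + 1)
            if w[i:i + len(u)] == u}

D = {"ab", "aabb", "aaabbb"}
# 'ab' and 'aabb' share the context ('a', 'b') (and the empty context,
# since both are themselves in D), so a substitutable learner conjectures
# that they are congruent.
shared = contexts("ab", D) & contexts("aabb", D)
print(sorted(shared))  # [('', ''), ('a', 'b')]
```

For a substitutable language this conjecture is always safe; for English it is not, as the can/may example shows.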
Example
Input data D ⊆ L:
- hot
- cold
- open hot or cold close
- open not hot close
- open hot and cold close
- open hot implies cold close
- open hot iff cold close
- danger
- rain
- snow
One production for each example:
- S → hot
- S → cold
- S → open hot or cold close
- S → open not hot close
- S → open hot and cold close
- S → open hot implies cold close
- S → open hot iff cold close
- S → danger
- S → rain
- S → snow
A trivial grammar
Input data: D = { w₁, w₂, ..., wₙ }, a set of nonempty strings.
Starting grammar: S → w₁, S → w₂, ..., S → wₙ, so that L(G) = D.
Binarise this every way: one nonterminal [[w]] for every substring w.
- [[a]] → a for each a ∈ Σ
- S → [[w]] for each w ∈ D
- [[w]] → [[u]][[v]] whenever w = u · v
Then L(G, [[w]]) = { w }.
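Constructing this grammar is mechanical. A sketch in Python (my own illustration, not from the slides; examples are tuples of tokens, and the tuple w itself serves as the nonterminal [[w]]):

```python
def binarised_grammar(D):
    """Build the trivial grammar, binarised every way:
    one nonterminal [[w]] per substring w of the data."""
    subs = {w[i:j] for w in D
                   for i in range(len(w))
                   for j in range(i + 1, len(w) + 1)}
    prods = set()
    for w in subs:
        if len(w) == 1:
            prods.add((w, w[0]))                # [[a]] -> a
        for k in range(1, len(w)):
            prods.add((w, (w[:k], w[k:])))      # [[w]] -> [[u]][[v]]
    start = {("S", w) for w in D}               # S -> [[w]] for each w in D
    return subs, prods, start

subs, prods, start = binarised_grammar({("open", "not", "hot", "close")})
```

A 4-token example contributes 10 nonterminals (one per nonempty substring) and three binary productions for the whole string, one per split point.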
S
└─ [[open not hot close]]
   ├─ [[open not]]
   │  ├─ [[open]] → open
   │  └─ [[not]] → not
   └─ [[hot close]]
      ├─ [[hot]] → hot
      └─ [[close]] → close
Nonterminal for each substring
[Figure: the substrings of the data spread out individually, one nonterminal each — e.g. hot, cold, not hot, hot or cold, hot implies cold close, open not hot close, open hot iff cold close, ...]
Nonterminal for each cluster
[Figure: the same substrings, now grouped into clusters of distributionally interchangeable strings, with one nonterminal per cluster.]
Productions
Observation: if w = u · v then [w] ⊇ [u] · [v].
So add the production [[w]] → [[u]][[v]].
Consequence: if L is substitutable, then L(G, [[w]]) ⊆ [w], and so L(G) ⊆ L.
Theorem [Clark and Eyraud (2007)]
If the language is a substitutable context-free language, then the hypothesis grammar converges to a correct grammar. The algorithm is efficient and provably correct.
But the grammar may be different for each input data set!
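The whole pipeline — take all substrings, merge those that share a context, read off the productions over the resulting classes — can be sketched end to end. This is my own illustration of the idea behind the Clark and Eyraud (2007) learner, not their implementation; `learn_substitutable` and its union-find helpers are hypothetical names:

```python
def learn_substitutable(D):
    """Sketch of a substitutable learner. Examples are tuples of tokens;
    nonterminals are classes of substrings, merged on any shared context."""
    subs = {w[i:j] for w in D
                   for i in range(len(w))
                   for j in range(i + 1, len(w) + 1)}

    # Union-find over substrings: one shared context (l, r) => same class.
    parent = {u: u for u in subs}
    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u
    def union(u, v):
        parent[find(u)] = find(v)

    by_context = {}
    for w in D:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                by_context.setdefault((w[:i], w[j:]), []).append(w[i:j])
    for us in by_context.values():
        for u in us[1:]:
            union(us[0], u)

    # Productions over classes: [u.v] -> [u][v], plus lexical and start rules.
    prods = {(find(u + v), (find(u), find(v)))
             for u in subs for v in subs if u + v in subs}
    lex = {(find(u), u) for u in subs if len(u) == 1}
    start = {find(w) for w in D}
    return prods, lex, start
```

On D = {ab, aabb} the strings ab and aabb share the empty context, so their classes merge and a single start class emerges — the recursion of aⁿbⁿ appears from two examples.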
[Figure: a parse tree for "open not hot close" produced by the learned grammar, with opaque nonterminal labels NT0, ..., NT15 instead of readable category names.]