A Bayesian Approach to Learning the Structure of Human Languages



  1. A Bayesian Approach to Learning the Structure of Human Languages. Phil Blunsom, University of Oxford.

  2. Grammar Induction. [Example sentence with unknown structure: 'The proposal would undermine effectiveness managers contend'] Grammar Induction research pursues two main aims:
     • to produce testable models of human language acquisition,
     • to implement unsupervised parsers capable of reducing the reliance on annotated treebanks in Natural Language Processing.

  3. Language Acquisition. [Example sentence with unknown structure: 'The proposal would undermine effectiveness managers contend']
     • The empirical success or otherwise of weak-bias models of grammar induction bears on the viability of the Argument from the Poverty of the Stimulus.
     • This contrasts with the strong-bias hypothesis of Universal Grammar.

  4. Machine Translation. Grammar Induction for Machine Translation. [Example sentence: 'I wanted to read this book']

  5. Machine Translation. Learn the syntactic part-of-speech categories of words. [Example: I/PRP wanted/Verb to/TO read/Verb this/DT book/Noun]

  6. Machine Translation. Learn the grammatical structure of the sentences. [Example: I/PRP wanted/Verb to/TO read/Verb this/DT book/Noun]

  7. Machine Translation. Learn syntactic reorderings from Subject-Verb-Object to Subject-Object-Verb. [Example: I/PRP wanted/Verb this/DT book/Noun to/TO read/Verb]

  8. Machine Translation. Learn to translate. [Example: I/PRP wanted/Verb this/DT book/Noun to/TO read/Verb → Ich wollte dieses Buch lesen]

  9. Dependency Grammar Induction. [Figure: dependency parse, with part-of-speech tags, of 'Of course the health of the economy will be threatened if the market continues to dive this week'] The Dependency Grammar formalism has provided one of the most promising avenues for this research.

  10. Dependency Grammar Induction. [Figure: the same dependency parse] We induce two probabilistic models:
     1. a model of the syntactic part-of-speech categories of the tokens (Noun, Verb, etc.),
     2. a model of the dependency derivations of the text given these syntactic categories.
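
As an aside not in the original slides, here is a minimal Python sketch of the kind of data these two models describe: a tag sequence and a head-index sequence for the earlier MT example sentence. The tags follow the slides; the head assignments are an assumed parse, for illustration only.

    # Hypothetical representation of a dependency derivation: (token, tag, head),
    # where head is the 1-based index of the parent token and 0 marks the root.
    # The head indices below are an assumed parse, used only for illustration.
    sentence = [
        ("I",      "PRP",  2),  # depends on "wanted"
        ("wanted", "Verb", 0),  # root of the sentence
        ("to",     "TO",   4),  # depends on "read"
        ("read",   "Verb", 2),  # depends on "wanted"
        ("this",   "DT",   6),  # depends on "book"
        ("book",   "Noun", 4),  # depends on "read"
    ]

    # Model 1 scores the tag sequence; model 2 scores the heads given the tags.
    tags  = [tag  for _, tag,  _ in sentence]
    heads = [head for _, _, head in sentence]
    print(tags, heads)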

  11. Weak Bias: Power Laws.

  12. Weak Bias: Pitman-Yor Process Priors. In a Pitman-Yor Process (PYP) unigram language model, words (w_1 … w_n) are generated as follows:
     G | a, b, P_0 ∼ PYP(a, b, P_0)
     w_i | G ∼ G
     • G is a distribution over an infinite set of words,
     • P_0 is the probability that a word will be in the support of G,
     • a and b control the power-law behavior of the PYP.
     One way of understanding the predictions made by the PYP model is through the Chinese restaurant process (CRP) …
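
To make this generative story concrete, here is a small Python sketch (not from the talk) that draws words from a PYP(a, b, P_0) via its Chinese restaurant representation; the toy vocabulary, the uniform base distribution and the values of a and b are illustrative assumptions.

    import random

    def sample_pyp_words(n, a, b, base_sample):
        """Draw n words from a PYP(a, b, P0) unigram model via the CRP:
        each customer joins table k with weight (n_k - a) or opens a new
        table with weight (K*a + b), whose dish is drawn from P0."""
        tables = []   # one entry per table: [word served, customer count]
        words = []
        for _ in range(n):
            K = len(tables)
            weights = [count - a for _, count in tables] + [K * a + b]
            k = random.choices(range(K + 1), weights=weights)[0]
            if k == K:                          # open a new table
                tables.append([base_sample(), 1])
            else:                               # join existing table k
                tables[k][1] += 1
            words.append(tables[k][0])
        return words

    # Illustrative base distribution: uniform over a toy vocabulary.
    vocab = ["the", "cats", "meow"]
    print(sample_pyp_words(10, a=0.5, b=1.0,
                           base_sample=lambda: random.choice(vocab)))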

  13. The Chinese Restaurant Process. [Seating: the first customer, 'the', arrives; table 0 ('the') is opened, n_0 = 0] Customers (words) enter a restaurant and choose a table, from the K occupied tables plus one new table, according to the distribution:
     P(z_i = k | w_i = w, z_−) ∝ 1[w_k = w] · (n_k^− − a)   for an existing table 0 ≤ k < K,
     P(z_i = k | w_i = w, z_−) ∝ (K·a + b) · P_0(w)   for the new table,
     where n_k^− is the number of customers already seated at table k and 1[w_k = w] indicates that table k serves word w.

  14. The Chinese Restaurant Process. [Seating: 'the' (n_0 = 1); customer 'cats' arrives and table 1 ('cats') is opened, n_1 = 0] Tables are chosen according to the same distribution on this and the following slides.

  15. The Chinese Restaurant Process. [Seating: 'the' (n_0 = 1), 'cats' (n_1 = 1); another 'cats' customer arrives]

  16. The Chinese Restaurant Process. [Seating: 'the' (n_0 = 1), 'cats' (n_1 = 2); another 'the' customer arrives]

  17. The Chinese Restaurant Process. [Seating: 'the' (n_0 = 2), 'cats' (n_1 = 2); customer 'the' arrives and table 2 ('the') is opened, n_2 = 0]

  18. The Chinese Restaurant Process. [Seating: 'the' (n_0 = 2), 'cats' (n_1 = 2), 'the' (n_2 = 1); customer 'meow' arrives and table 3 ('meow') is opened, n_3 = 0]

  19. The Chinese Restaurant Process. [Seating: 'the' (n_0 = 2), 'cats' (n_1 = 2), 'the' (n_2 = 1), 'meow' (n_3 = 1)] The 7th customer, 'the', enters the restaurant and chooses a table from those already seating 'the', or opens a new table:
     P(z_6 = 0 | w_6 = the, z_{−6}) ∝ 2 − a

  20. The Chinese Restaurant Process. [Seating as above]
     P(z_6 = 2 | w_6 = the, z_{−6}) ∝ 1 − a

  21. The Chinese Restaurant Process. [Seating as above; table 4 would be newly opened]
     P(z_6 = 4 | w_6 = the, z_{−6}) ∝ (4a + b) · P_0(the)
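
The three unnormalised scores on slides 19-21 can be checked with a short Python snippet; the discount a, strength b and the uniform base probability P_0(the) below are illustrative values, not ones given in the talk.

    # Seating after six customers: the(2), cats(2), the(1), meow(1).
    a, b = 0.5, 1.0          # assumed PYP hyperparameters
    P0_the = 1.0 / 3.0       # assumed uniform base probability of "the"
    tables = [("the", 2), ("cats", 2), ("the", 1), ("meow", 1)]
    K = len(tables)          # 4 occupied tables

    # Unnormalised table-choice scores for the 7th customer, w_6 = "the":
    join_table_0 = tables[0][1] - a       # 2 - a                (slide 19)
    join_table_2 = tables[2][1] - a       # 1 - a                (slide 20)
    open_table_4 = (K * a + b) * P0_the   # (4a + b) * P0(the)   (slide 21)
    print(join_table_0, join_table_2, open_table_4)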

  22. Outline: (1) Inducing the syntactic categories of words; (2) Inducing the syntactic structure of sentences. The first part follows.

  23. Unsupervised PoS Tagging. [Example: 'A simple example' tagged DT JJ NN, or equivalently with induced classes 5 6 1] Unsupervised part-of-speech tagging aims to learn a partitioning of tokens corresponding to syntactic equivalence classes.

  24. Unsupervised PoS Tagging. Previous research has followed two paradigms:
     • word class induction, popular for language modelling and Machine Translation, in which all tokens of a type must have the same class;
     • syntactic models, generally based on HMMs, which allow multiple tags per type and are evaluated against an annotated treebank.
     For both paradigms most models optimise the likelihood of the training corpus, though more recently Bayesian approaches have become popular.

  25. A Hierarchical Pitman-Yor HMM. [Graphical model: a trigram HMM with transition distributions Tri_ij over the tag sequence t_1, t_2, t_3, … and emission distributions Em_j over the words w_1, w_2, w_3, …]
     t_l | t_{l−1}, t_{l−2}, Tri ∼ Tri_{(t_{l−1}, t_{l−2})}
     w_l | t_l, Em ∼ Em_{t_l}

  26. A Hierarchical Pitman-Yor HMM. The transition and emission distributions are given PYP priors; the trigram transitions back off to bigram distributions Bi_j:
     Tri_ij | a_Tri, b_Tri, Bi_j ∼ PYP(a_Tri, b_Tri, Bi_j)
     Em_j | a_Em, b_Em, C ∼ PYP(a_Em, b_Em, Uniform)

  27. A Hierarchical Pitman-Yor HMM. The bigram distributions in turn back off to a unigram distribution over tags, which backs off to a uniform base:
     Bi_j | a_Bi, b_Bi, Uni ∼ PYP(a_Bi, b_Bi, Uni)
     Uni | a_Uni, b_Uni ∼ PYP(a_Uni, b_Uni, Uniform)
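
The back-off structure of these priors can be summarised in a short Python sketch; the class name, the tag-set size and the hyperparameter values below are illustrative assumptions rather than the authors' implementation.

    from dataclasses import dataclass, field

    @dataclass
    class PYP:
        a: float                  # discount parameter
        b: float                  # strength / concentration parameter
        base: object              # parent distribution to back off to (None = Uniform)
        seating: dict = field(default_factory=dict)  # CRP state, filled during inference

    NUM_TAGS = 5                  # illustrative tag-set size
    uni = PYP(0.5, 1.0, None)                                # Uni    ~ PYP(., ., Uniform)
    bi  = {j: PYP(0.5, 1.0, uni) for j in range(NUM_TAGS)}   # Bi_j   ~ PYP(., ., Uni)
    tri = {(i, j): PYP(0.5, 1.0, bi[j])                      # Tri_ij ~ PYP(., ., Bi_j)
           for i in range(NUM_TAGS) for j in range(NUM_TAGS)}
    em  = {j: PYP(0.5, 1.0, None) for j in range(NUM_TAGS)}  # Em_j   ~ PYP(., ., Uniform)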

  28. Unsupervised PoS Tagging. We perform inference in this model using Gibbs sampling, an MCMC technique:
     • the tagging of one token, conditioned on all others, is considered at each sampling step;
     • we employ a hierarchical Chinese Restaurant analogy in which trigrams are considered as customers sitting at restaurant tables.
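
A schematic Gibbs sweep might look as follows; `remove`, `add` and `score` are hypothetical callbacks standing in for the hierarchical CRP bookkeeping and the conditional probability of a candidate tag, and are not defined in the talk.

    import random

    def gibbs_sweep(tags, num_tags, remove, add, score):
        """One sweep of the collapsed Gibbs sampler: resample each token's
        tag conditioned on the taggings (and seatings) of all other tokens."""
        for l in range(len(tags)):
            remove(l, tags[l])                 # unseat the customers for position l
            weights = [score(l, t) for t in range(num_tags)]
            tags[l] = random.choices(range(num_tags), weights=weights)[0]
            add(l, tags[l])                    # reseat with the newly sampled tag
        return tags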

  29. A Hierarchical Pitman-Yor HMM. [Figure: the sentence 'A simple example' tagged DT JJ ?, and the restaurant for the trigram context Tri(DT,JJ) with five tables serving JJ, NNP, NNS, JJ, NN] The following slides show how the probability of the tag NN following the context (DT, JJ) is computed from the restaurant seatings.

  30. A Hierarchical Pitman-Yor HMM. Probability of tag NN together with a seating at an existing table in the Tri(DT,JJ) restaurant:
     P_Tri(t_l = NN, z_l ≤ tables^− | z_{−l}, t_{−l}) ∝ (count^−_(DT,JJ,NN) − a_Tri · tables^−_(DT,JJ,NN)) / (count^−_(DT,JJ) + b_Tri)

  31. A Hierarchical Pitman-Yor HMM. Probability of tag NN together with the opening of a new table, which backs off to the bigram restaurant Bi(JJ):
     P_Tri(t_l = NN, z_l = tables^− + 1 | z_{−l}, t_{−l}) ∝ (a_Tri · tables^−_(DT,JJ) + b_Tri) · P_Bi(NN | z_{−l}, t_{−l}) / (count^−_(DT,JJ) + b_Tri)

  32. A Hierarchical Pitman-Yor HMM. In the Bi(JJ) restaurant, the probability of tag NN with a seating at an existing table:
     P_Bi(t_l = NN, z_l ≤ tables^− | z_{−l}, t_{−l}) ∝ (count^−_(JJ,NN) − a_Bi · tables^−_(JJ,NN)) / (count^−_(JJ) + b_Bi)

  33. A Hierarchical Pitman-Yor HMM. The probability of opening a new table in Bi(JJ), which backs off to the unigram restaurant:
     P_Bi(t_l = NN, z_l = tables^− + 1 | z_{−l}, t_{−l}) ∝ (a_Bi · tables^−_(JJ) + b_Bi) · P_Uni(NN | z_{−l}, t_{−l}) / (count^−_(JJ) + b_Bi)

  34. A Hierarchical Pitman-Yor HMM. [Figure: the restaurants for Tri(DT,JJ), Bi(JJ) and Uni] In the unigram restaurant, the probability of tag NN with a seating at an existing table:
     P_Uni(t_l = NN, z_l ≤ tables^− | z_{−l}, t_{−l}) ∝ (count^−_NN − a_Uni · tables^−_NN) / (count^− + b_Uni)
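
Slides 30-34 combine into a single back-off computation; the sketch below is an assumed, simplified version of that predictive probability (the seatings, hyperparameters and tag-set size are illustrative, and table assignments are marginalised rather than sampled).

    def crp_prob(restaurant, dish, base_prob, a, b):
        """PYP predictive probability of `dish` given the restaurant's seating.
        `restaurant` maps each dish to a list of per-table customer counts."""
        customers = sum(sum(c) for c in restaurant.values())   # count^- in restaurant
        all_tables = sum(len(c) for c in restaurant.values())  # tables^- in restaurant
        counts = restaurant.get(dish, [])
        existing = max(sum(counts) - a * len(counts), 0.0)     # join a table serving `dish`
        new = (a * all_tables + b) * base_prob                 # open a new table, back off
        return (existing + new) / (customers + b)

    # Illustrative seatings, loosely following the figure (one customer per table).
    a, b, num_tags = 0.5, 1.0, 45
    uni   = {"JJ": [1], "NNP": [1], "NNS": [1], "NN": [1]}
    bi_jj = {"JJ": [1], "NNP": [1], "NNS": [1], "NN": [1]}
    tri   = {"JJ": [1, 1], "NNP": [1], "NNS": [1], "NN": [1]}  # Tri(DT,JJ)

    p_uni = crp_prob(uni,   "NN", 1.0 / num_tags, a, b)
    p_bi  = crp_prob(bi_jj, "NN", p_uni,          a, b)
    p_tri = crp_prob(tri,   "NN", p_bi,           a, b)       # P_Tri(NN | DT, JJ)
    print(p_tri)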
