Probabilistic Grammars and Hierarchical Dirichlet Processes (Liang et al. 2009)
Sean Massung & Gourab Kundu
CS 598jhm, April 9th, 2013
Background

This paper (a chapter of a book) describes a Bayesian approach to the problem of syntactic parsing and the underlying problems of grammar induction and grammar refinement.

Grammar induction: estimating grammars from raw sentences alone, without any other type of supervision. Original approaches had poor performance due to the coarse-grained nature of the syntactic categories.

Grammar refinement: "splitting" coarse-grained syntactic categories into finer, more accurate and descriptive labels, e.g. parent annotation (syntactic) or lexicalization (semantic).
PCFG Example

Rules φ_s(γ):

S  → NP VP      0.9
S  → S CONJ S   0.1
NP → JJ JJ NNS  0.5
NP → PRP        0.5
VP → VP NP      0.4
VP → VBP NP     0.3
VP → VBG NP     0.3

Example parse of "They have many theoretical ideas":

(S (NP (PRP They))
   (VP (VBP have)
       (NP (JJ many) (JJ theoretical) (NNS ideas))))
Mathematical Definition

Formally, a PCFG is specified by the following:

Σ, a set of terminal symbols (the words in the sentence)
S, a set of nonterminal symbols (the syntactic categories)
Root ∈ S, a designated nonterminal starting symbol
φ, rule probabilities: φ = (φ_s(γ) : s ∈ S, γ ∈ Σ ∪ (S × S)), such that φ_s(γ) ≥ 0 and ∑_γ φ_s(γ) = 1

Note the restriction on γ: either γ ∈ Σ or γ ∈ (S × S). Such rules put the PCFG in Chomsky normal form.
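To make the tuple concrete, here is a minimal sketch in Python (our own illustration, not from the paper; the class, dictionary layout, and toy grammar are all invented), representing a CNF PCFG and checking that each φ_s is a proper distribution:

```python
# A minimal sketch of the PCFG tuple (Sigma, S, Root, phi); the class and
# dict layout are invented for illustration, not taken from the paper.
# Rules are kept in Chomsky normal form: each right-hand side gamma is
# either a terminal string or a pair of nonterminal symbols.

class PCFG:
    def __init__(self, root, rules):
        # rules: nonterminal s -> {gamma: phi_s(gamma)}, where gamma is a
        # str (terminal) or a (str, str) tuple (two nonterminals)
        self.root = root
        self.rules = rules
        for s, dist in rules.items():
            total = sum(dist.values())
            assert abs(total - 1.0) < 1e-9, f"phi_{s} sums to {total}"

# A toy CNF grammar (invented; note the slide's example grammar has
# ternary rules like NP -> JJ JJ NNS, so it is not in CNF as written).
grammar = PCFG("S", {
    "S":   {("NP", "VP"): 1.0},
    "NP":  {"They": 0.5, ("JJ", "NNS"): 0.5},
    "VP":  {("VBP", "NP"): 1.0},
    "VBP": {"have": 1.0},
    "JJ":  {"theoretical": 1.0},
    "NNS": {"ideas": 1.0},
})
```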
Mathematical Definition II

A parse tree has a set of nonterminal nodes N along with the corresponding symbols s = (s_i ∈ S : i ∈ N). Let N_E denote the nodes having one terminal child and N_B the nodes having two nonterminal children. The tree structure is represented by

c = (c_j(i) : i ∈ N_B, j = 1, 2) for nonterminal nodes
x = (x_i : i ∈ N_E) for terminal nodes (the "yield")

The joint probability of a parse tree z = (N, s, c) and yield x is then

p(x, z | φ) = ∏_{i ∈ N_B} φ_{s_i}(s_{c_1(i)}, s_{c_2(i)}) · ∏_{i ∈ N_E} φ_{s_i}(x_i)
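Because the product decomposes over nodes, the joint probability can be computed by a single recursion over the tree. Below is a minimal sketch; the nested-tuple tree encoding and the toy grammar are our own invention, not the paper's:

```python
# A minimal sketch of p(x, z | phi): a product of rule probabilities over
# binary nodes (N_B) and emission nodes (N_E). The tree encoding and toy
# rules are invented for illustration.

rules = {
    "S":   {("NP", "VP"): 1.0},
    "NP":  {"They": 0.5, ("JJ", "NNS"): 0.5},
    "VP":  {("VBP", "NP"): 1.0},
    "VBP": {"have": 1.0},
    "JJ":  {"theoretical": 1.0},
    "NNS": {"ideas": 1.0},
}

def tree_prob(tree):
    """tree = (symbol, word) for a node in N_E, or
    (symbol, left_subtree, right_subtree) for a node in N_B."""
    if isinstance(tree[1], str):
        return rules[tree[0]][tree[1]]           # phi_{s_i}(x_i)
    left, right = tree[1], tree[2]
    return (rules[tree[0]][(left[0], right[0])]  # phi_{s_i}(s_c1, s_c2)
            * tree_prob(left) * tree_prob(right))

tree = ("S", ("NP", "They"),
             ("VP", ("VBP", "have"),
                    ("NP", ("JJ", "theoretical"), ("NNS", "ideas"))))
print(tree_prob(tree))  # 1.0 * 0.5 * 1.0 * 1.0 * 0.5 * 1.0 * 1.0 = 0.25
```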
HDP-PCFG: Generating the parse tree and its yield

So, given rule probabilities φ — where each syntactic category z has φ_z^T (rule-type parameters), φ_z^E (emission parameters), and φ_z^B (binary-production parameters) — we can generate a tree and its yield in the following way.

For each node i in the parse tree:
  t_i ~ Mult(φ_{z_i}^T)
  if t_i = Emission: x_i ~ Mult(φ_{z_i}^E)
  if t_i = BinaryProduction: (z_{c_1(i)}, z_{c_2(i)}) ~ Mult(φ_{z_i}^B)
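This generative loop is easy to simulate once φ is fixed. A minimal sketch follows; the sampling helper, dictionary layout, and toy parameter values are all invented (and chosen so the recursion terminates), while a real run would use parameters drawn as in the next section:

```python
# A minimal simulation of the tree-generation process, assuming fixed
# parameters phi; toy values and symbol names are invented, chosen so
# that the recursion terminates with probability 1.
import random

phi_T = {"S": {"BinaryProduction": 1.0},
         "NP": {"Emission": 1.0}, "VP": {"Emission": 1.0}}
phi_E = {"NP": {"They": 1.0}, "VP": {"sleep": 0.5, "run": 0.5}}
phi_B = {"S": {("NP", "VP"): 1.0}}

def sample(dist):
    """Draw one outcome from a {outcome: probability} multinomial."""
    r = random.random()
    for outcome, p in dist.items():
        r -= p
        if r < 0:
            return outcome
    return outcome  # guard against floating-point rounding

def generate(z):
    t = sample(phi_T[z])              # t_i ~ Mult(phi_T_z)
    if t == "Emission":
        return (z, sample(phi_E[z]))  # x_i ~ Mult(phi_E_z)
    z1, z2 = sample(phi_B[z])         # children ~ Mult(phi_B_z)
    return (z, generate(z1), generate(z2))

print(generate("S"))  # e.g. ('S', ('NP', 'They'), ('VP', 'run'))
```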
This Paper's Focus

Traditionally, PCFGs are defined with a fixed, finite S, and the parameters φ are fit using smoothed maximum likelihood.
This paper develops a nonparametric version of the PCFG that allows S to be countably infinite.
The model then performs posterior inference over S and the set of parse trees to find φ.
This model is called the Hierarchical Dirichlet Process PCFG (HDP-PCFG), and it is described in the next section.
HDP-PCFG: Generating the grammar

β ~ GEM(α)
For each grammar symbol z ∈ {1, 2, ...}:
  φ_z^T ~ Dir(α^T)
  φ_z^E ~ Dir(α^E)
  φ_z^B ~ DP(α^B, ββ^⊤)

What do β, φ_z^{T,E,B}, and ββ^⊤ look like?
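In practice, β ~ GEM(α) is commonly simulated with a truncated stick-breaking construction, and the DP draw for φ_z^B can be approximated by a finite Dirichlet whose mean is the base measure ββ^⊤. A minimal sketch (the truncation level, hyperparameter values, and the finite-Dirichlet stand-in are our choices for illustration, not the paper's inference scheme):

```python
# A minimal sketch of the grammar's top level: a truncated stick-breaking
# draw for beta ~ GEM(alpha), the base measure beta beta^T over child
# pairs, and a finite-Dirichlet stand-in for phi_B_z ~ DP(alpha_B, beta beta^T).
# Truncation level and hyperparameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def gem(alpha, K):
    """Truncated stick-breaking: beta_k = v_k * prod_{j<k} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # assign all leftover stick mass to the last component
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

K, alpha, alpha_B = 10, 1.0, 1.0
beta = gem(alpha, K)          # symbol weights; beta.sum() == 1
base = np.outer(beta, beta)   # beta beta^T: distribution over (z1, z2)

# Finite approximation: a Dirichlet whose mean equals the base measure.
phi_B_z = rng.dirichlet(alpha_B * base.ravel()).reshape(K, K)
print(beta.round(3), phi_B_z.sum())
```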
HDP-PCFG: The whole process

β ~ GEM(α)
For each grammar symbol z ∈ {1, 2, ...}:
  φ_z^T ~ Dir(α^T)
  φ_z^E ~ Dir(α^E)
  φ_z^B ~ DP(α^B, ββ^⊤)

For each node i in the parse tree:
  t_i ~ Mult(φ_{z_i}^T)
  if t_i = Emission: x_i ~ Mult(φ_{z_i}^E)
  if t_i = BinaryProduction: (z_{c_1(i)}, z_{c_2(i)}) ~ Mult(φ_{z_i}^B)
Why is an HDP model advantageous?

It allows the complexity of the grammar to grow as more training data becomes available; a DP prior penalizes the use of more symbols than are supported in the training data...
...which in turn means the level of sophistication of the grammar can adequately match the corpus.

Can you think of any disadvantages?
Hierarchical Dirichlet Process

How is this a hierarchical DP?
How is this related to the HDP-HMM from Thursday?
Why not a simpler model: for each symbol z, draw a distribution separately over left children l_z ~ DP(β) and right children r_z ~ DP(β)?
Bayesian Inference for HDP-PCFG

The authors chose to use a structured mean-field approximation (variational inference with KL divergence as the dissimilarity function).
The random variables of interest are the parameters θ = (β, φ), the parse tree z, and the observed yield x.
Thus the goal is to approximate the posterior p(θ, z | x): we seek

q* = argmin_{q ∈ Q} KL(q(θ, z) || p(θ, z | x))

where Q is a tractable subset of distributions.
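As a reminder of the objective's ingredients, here is a minimal sketch of KL divergence between two discrete distributions, with toy numbers of our own; in practice this KL cannot be evaluated directly because p(θ, z | x) is intractable, so one equivalently maximizes the evidence lower bound:

```python
# A minimal sketch of KL(q || p) for discrete distributions (toy numbers);
# KL is asymmetric, which is why minimizing KL(q || p) versus KL(p || q)
# yields different approximations.
import numpy as np

def kl(q, p):
    """KL(q || p) = sum_i q_i * log(q_i / p_i)."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0  # terms with q_i = 0 contribute 0 by convention
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(kl([0.5, 0.5], [0.9, 0.1]))  # ~0.511
print(kl([0.9, 0.1], [0.5, 0.5]))  # ~0.368
```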
Bayesian Inference for HDP-PCFG

The set of approximate distributions Q is defined to be those that factor as follows:

Q = { q : q = q(β) · ( ∏_{z=1}^{K} q(φ_z^T) q(φ_z^E) q(φ_z^B) ) · q(z) }

Additionally, other constraints are introduced:
  q(β) is degenerate and truncated
  q(φ_z^{T,E,B}) are Dirichlet distributions
  q(z) is any multinomial distribution

Note that we have a fixed K. How does this affect the approximation?
Coordinate Ascent

The optimization problem of finding the best q is non-convex, so the authors use a coordinate ascent algorithm to find a local optimum. Iteratively:

1. Optimize q(z), keeping q(φ) and q(β) fixed
2. Optimize q(φ), keeping q(z) and q(β) fixed
3. Optimize q(β), keeping q(z) and q(φ) fixed
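The alternating pattern can be illustrated on a toy objective with closed-form per-block updates. The objective below is invented purely for illustration; the real updates are the paper's mean-field equations for q(z), q(φ), and q(β):

```python
# A toy illustration of coordinate ascent (invented objective, not the
# paper's): maximize f(a, b) by alternating exact updates in each block,
# just as q(z), q(phi), and q(beta) are optimized one at a time.

def f(a, b):
    return -(a - b) ** 2 - (a - 3) ** 2

a, b = 0.0, 10.0
for step in range(15):
    a = (b + 3) / 2   # argmax over a with b fixed (set df/da = 0)
    b = a             # argmax over b with a fixed
    print(step, round(f(a, b), 8))
# Each block update can only increase f, so the iterates converge to a
# local optimum (here the global maximum at a = b = 3, where f = 0).
```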
Prediction

We want to parse a new sentence with the induced grammar. The prediction is given by

z*_new = argmax_{z_new} E_{p(θ, z | x)} [ p(z_new | θ, x_new) ]
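A common practical stand-in for this argmax (not necessarily the paper's exact procedure) is to plug point estimates of the rule probabilities into a Viterbi CYK parser. A minimal CNF sketch, with an invented toy grammar and sentence:

```python
# A minimal Viterbi CYK sketch over a CNF grammar, as one practical
# stand-in for the argmax over z_new; the grammar, probabilities, and
# sentence are invented for illustration.
binary = {("NP", "VP"): [("S", 1.0)],
          ("VBP", "NP"): [("VP", 1.0)],
          ("JJ", "NNS"): [("NP", 0.5)]}
emit = {"They": [("NP", 0.5)], "have": [("VBP", 1.0)],
        "theoretical": [("JJ", 1.0)], "ideas": [("NNS", 1.0)]}

def viterbi_cyk(words):
    n = len(words)
    # chart[i][j][sym] = best probability of sym spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for sym, p in emit.get(w, []):
            chart[i][i + 1][sym] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for ls, lp in chart[i][k].items():
                    for rs, rp in chart[k][j].items():
                        for sym, p in binary.get((ls, rs), []):
                            score = p * lp * rp
                            if score > chart[i][j].get(sym, 0.0):
                                chart[i][j][sym] = score
    return chart

chart = viterbi_cyk("They have theoretical ideas".split())
print(chart[0][4].get("S"))  # 0.25: probability of the best S parse
```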