Cube Summing, Approximate Inference with Non-Local Features, and Dynamic Programming without Semirings
Kevin Gimpel and Noah A. Smith
Overview
• We introduce cube summing, which extends dynamic programming algorithms for summing with non-local features
• Inspired by cube pruning (Chiang, 2007; Huang & Chiang, 2007)
• We relate cube summing to semiring-weighted logic programming
  - Without non-local features, cube summing is a novel semiring
  - Non-local features break some of the semiring properties
• We propose an implementation based on arithmetic circuits
Outline
• Background
• Cube Pruning
• Cube Summing
• Semirings
• Implementation
• Conclusion
Fundamental Problems
• Consider an exponential probabilistic model:
  $p(y \mid x) \propto \prod_j \lambda_j^{h_j(x, y)}$
  - Example (HMM): $x$ is a sentence, $y$ is a tag sequence
  - Example (PCFG): $x$ is a sentence, $y$ is a parse tree
• Two fundamental problems we often need to solve:
  - Decoding: $\hat{y}(x) = \operatorname{argmax}_{y \in \mathcal{Y}(x)} \prod_j \lambda_j^{h_j(x, y)}$
    (Viterbi algorithm for HMMs; probabilistic CKY for PCFGs)
    supervised uses: perceptron, MIRA, MERT; unsupervised uses: self-training, Viterbi EM
  - Summing: $s(x) = \sum_{y \in \mathcal{Y}(x)} \prod_j \lambda_j^{h_j(x, y)}$
    (forward and backward algorithms for HMMs; inside algorithm for PCFGs)
    supervised uses: log-linear models; unsupervised uses: EM, hidden-variable models
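To make decoding and summing concrete, here is a minimal sketch (not from the talk) that evaluates both by brute-force enumeration of $\mathcal{Y}(x)$; the feature functions, weights, and candidate set are hypothetical stand-ins, and real systems replace the enumeration with the dynamic programs named above.

```python
# Minimal sketch (assumed, not from the talk): decoding and summing for
# p(y | x) proportional to  prod_j lambda_j ** h_j(x, y),
# by brute-force enumeration of the candidate set Y(x).
from itertools import product

def score(x, y, lambdas, features):
    """Unnormalized score: product over j of lambda_j ** h_j(x, y)."""
    s = 1.0
    for lam, h in zip(lambdas, features):
        s *= lam ** h(x, y)
    return s

def decode(x, candidates, lambdas, features):
    """y*(x): the highest-scoring y in Y(x) (what Viterbi/CKY compute efficiently)."""
    return max(candidates, key=lambda y: score(x, y, lambdas, features))

def summing(x, candidates, lambdas, features):
    """s(x): the sum of scores over Y(x) (what forward/inside compute efficiently)."""
    return sum(score(x, y, lambdas, features) for y in candidates)

# Toy example: x is a two-word sentence, Y(x) is every tag sequence over {N, V}.
x = ("they", "fish")
candidates = list(product(["N", "V"], repeat=len(x)))
features = [lambda x, y: y.count("N"),                  # hypothetical feature h_1
            lambda x, y: 1.0 if y[-1] == "V" else 0.0]  # hypothetical feature h_2
lambdas = [0.7, 2.0]                                    # hypothetical nonnegative weights
print(decode(x, candidates, lambdas, features))
print(summing(x, candidates, lambdas, features))
```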
Dynamic Programming
• Consider the probabilistic CKY algorithm:
  $C_{X,i-1,i} = \lambda_{X \rightarrow w_i}$
  $C_{X,i,k} = \max_{Y,Z \in \mathcal{N},\; j \in \{i+1,\ldots,k-1\}} \lambda_{X \rightarrow Y\,Z} \times C_{Y,i,j} \times C_{Z,j,k}$
  goal: $C_{S,0,n}$
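A compact, assumed sketch of this recurrence for a grammar in Chomsky normal form; the data layout (dictionaries keyed by rules) and function names are illustrative, not an implementation from the talk.

```python
# Sketch of probabilistic CKY (Viterbi decoding) for a CNF grammar; illustrative only.
def cky_decode(words, unary, binary, start="S"):
    """unary[(X, w)] = p(X -> w); binary[(X, Y, Z)] = p(X -> Y Z).
    Chart C[(X, i, k)] holds the best score of an X spanning words[i:k]."""
    n = len(words)
    C = {}
    for i, w in enumerate(words):                      # C[X, i, i+1] = lambda_{X -> w_i}
        for (X, word), p in unary.items():
            if word == w:
                C[(X, i, i + 1)] = max(C.get((X, i, i + 1), 0.0), p)
    for width in range(2, n + 1):                      # build wider spans from smaller ones
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):                  # split point
                for (X, Y, Z), p in binary.items():
                    score = p * C.get((Y, i, j), 0.0) * C.get((Z, j, k), 0.0)
                    if score > C.get((X, i, k), 0.0):
                        C[(X, i, k)] = score
    return C.get((start, 0, n), 0.0)                   # goal item C[S, 0, n]
```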
Weighted Logic Programs
• Correspondence with the probabilistic CKY example:
  - theorem ↔ chart item $C_{X,i,k}$
  - axiom ↔ rule probability $\lambda_{X \rightarrow Y\,Z}$
  - proof ↔ derivation (e.g., a PP subtree over "of the list")
• In semiring-weighted logic programming, theorem and axiom values come from a semiring
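The point of the semiring view is that the same logic program yields decoding or summing depending only on the chosen operations. A minimal sketch of that interface, with hypothetical names:

```python
# Minimal semiring sketch (illustrative): a semiring bundles (plus, times, zero, one).
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # aggregates alternative proofs of a theorem
    times: Callable[[float, float], float]  # combines antecedents within one proof
    zero: float                             # identity for plus
    one: float                              # identity for times

VITERBI = Semiring(plus=max, times=lambda a, b: a * b, zero=0.0, one=1.0)          # decoding
INSIDE  = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0, one=1.0)  # summing

def combine(semiring, rule_weight, left, right):
    """Value of one instantiation of the binary CKY equation."""
    return semiring.times(rule_weight, semiring.times(left, right))

# Running the CKY logic program with VITERBI gives probabilistic CKY;
# running it with INSIDE gives the inside algorithm.
```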
Features
• Recall our model: $p(y \mid x) \propto \prod_j \lambda_j^{h_j(x, y)}$
• The $h_j(x, y)$ are feature functions and the $\lambda_j$ are nonnegative weights
• Local features depend only on the theorems used in an equation (or on any of the axioms), not on the proofs of those theorems:
  $C_{X,i,k} = \max_{Y,Z \in \mathcal{N},\; j \in \{i+1,\ldots,k-1\}} \lambda_{X \rightarrow Y\,Z} \times C_{Y,i,j} \times C_{Z,j,k}$
[Figure, shown on two consecutive slides: parse tree for "There near the top of the list is quarterback Troy Aikman".]
Features (continued)
• Non-local features, in contrast, depend on the proofs of theorems
"NGramTree" feature (Charniak & Johnson, 2005)
[Figure: the same parse tree, illustrating an NGramTree feature.]
• Non-local features break dynamic programming!
Other Algorithms for Approximate Inference
• Beam search (Lowerre, 1979)
• Reranking (Collins, 2000)
• Algorithms for graphical models:
  - Variational methods (MacKay, 1997; Beal, 2003; Kurihara & Sato, 2006)
  - Belief propagation (Sutton & McCallum, 2004; Smith & Eisner, 2008)
  - MCMC (Finkel et al., 2005; Johnson et al., 2007)
  - Particle filtering (Levy et al., 2009)
• Integer linear programming (Roth & Yih, 2004)
• Stacked learning (Cohen & Carvalho, 2005; Martins et al., 2008)
• Cube pruning (Chiang, 2007; Huang & Chiang, 2007)
• Why add one more?
  - Cube pruning extends existing, widely understood dynamic programming algorithms for decoding
  - We want this for summing too
Outline
• Background
• Cube Pruning
• Cube Summing
• Semirings
• Implementation
• Conclusion
Cube Pruning (Chiang, 2007; Huang & Chiang, 2007)
• A modification to dynamic programming algorithms for decoding so that they use non-local features approximately
• Keeps a k-best list of proofs for each theorem
• Applies non-local feature functions to these proofs when proving new theorems
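A hedged sketch of the core combination step for one binary rule: pair up the k-best proofs of the two antecedent theorems, multiply in the rule weight and a non-local feature weight computed from the child proofs, and keep the k best results. For clarity this enumerates the whole k x k grid rather than using the lazy best-first enumeration of Chiang (2007) and Huang & Chiang (2007); names and signatures are illustrative.

```python
import heapq

def cube_prune_binary(left_kbest, right_kbest, rule_weight, nonlocal_weight, k=3):
    """left_kbest / right_kbest: lists of (score, proof) pairs for the antecedent theorems.
    nonlocal_weight(proof_l, proof_r): nonnegative factor contributed by non-local features.
    Returns the k best (score, proof) pairs for the consequent theorem."""
    candidates = []
    for score_l, proof_l in left_kbest:
        for score_r, proof_r in right_kbest:
            score = rule_weight * score_l * score_r * nonlocal_weight(proof_l, proof_r)
            candidates.append((score, (proof_l, proof_r)))
    return heapq.nlargest(k, candidates, key=lambda item: item[0])
```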
Cube Pruning Example
• Deduction being made: $C_{NP,0,7} = C_{NP,0,1} \times C_{PP,1,7} \times \lambda_{NP \rightarrow NP\ PP}$
[Figure: partial parse of "There near the top of the list is quarterback Troy Aikman", with an NP over words 0-1 and a PP over words 1-7.]
• k-best (k = 3) proof lists for the antecedents:
  - $C_{NP,0,1} = \langle 0.4, 0.3, 0.02 \rangle$ (NP over "There" as EX, RB, or NNP)
  - $C_{PP,1,7} = \langle 0.2, 0.1, 0.05 \rangle$ (three analyses of "near the top of the list")
• Pairwise products of antecedent scores form a 3 x 3 grid:

                      C_PP,1,7 = 0.2   0.1     0.05
      C_NP,0,1 = 0.4        0.08      0.04    0.02
      C_NP,0,1 = 0.3        0.06      0.03    0.015
      C_NP,0,1 = 0.02       0.004     0.002   0.001

• Each cell is then multiplied by the rule weight $\lambda_{NP \rightarrow NP\ PP} = 0.5$:

                      C_PP,1,7 = 0.2   0.1     0.05
      C_NP,0,1 = 0.4        0.04      0.02    0.01
      C_NP,0,1 = 0.3        0.03      0.015   0.0075
      C_NP,0,1 = 0.02       0.002     0.001   0.0005

• As cells are enumerated, non-local feature weights computed from the pair of child proofs are also multiplied in; here an NGramTree feature over NP(EX "There") and PP(IN "near") has weight 0.2, so cells built from that NP proof are scaled by 0.2 (e.g., 0.04 x 0.2 and 0.02 x 0.2). The k best resulting proofs are kept for $C_{NP,0,7}$.
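The grid values above are easy to reproduce; below is a small, self-contained check. The proof labels and the assumption that the 0.2 NGramTree factor applies exactly to the cells whose left child is the NP -> EX "There" analysis are illustrative, not from the talk.

```python
import heapq

# Worked example from the slides: k-best lists, rule weight, and a non-local factor.
left_kbest = [(0.4, "NP -> EX There"), (0.3, "NP -> RB There"), (0.02, "NP -> NNP There")]
right_kbest = [(0.2, "PP proof 1"), (0.1, "PP proof 2"), (0.05, "PP proof 3")]
rule_weight = 0.5                                   # lambda_{NP -> NP PP}

def ngramtree_factor(proof_l, proof_r):
    # Illustrative stand-in for the NGramTree feature weight of 0.2 shown on the slide,
    # assumed here to fire whenever the left child is the NP -> EX "There" analysis.
    return 0.2 if proof_l == "NP -> EX There" else 1.0

grid = [(rule_weight * sl * sr * ngramtree_factor(pl, pr), (pl, pr))
        for sl, pl in left_kbest for sr, pr in right_kbest]
for score, proof in heapq.nlargest(3, grid, key=lambda item: item[0]):
    print(round(score, 4), proof)   # 0.03, 0.015, 0.008 (the 0.04 cell scaled by 0.2)
```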