

  1. Cube Summing, Approximate Inference with Non-Local Features, and Dynamic Programming without Semirings
     Kevin Gimpel and Noah A. Smith

  2. Overview
     • We introduce cube summing, which extends dynamic programming algorithms for summing with non-local features
     • Inspired by cube pruning (Chiang, 2007; Huang & Chiang, 2007)
     • We relate cube summing to semiring-weighted logic programming
       • Without non-local features, cube summing is a novel semiring
       • Non-local features break some of the semiring properties
     • We propose an implementation based on arithmetic circuits

  3. Outline
     • Background
     • Cube Pruning
     • Cube Summing
     • Semirings
     • Implementation
     • Conclusion

  4. Fundamental Problems
     • Consider an exponential probabilistic model:
         p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
     • Two fundamental problems we often need to solve:
       • Decoding:  ŷ(x) = argmax_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}
       • Summing:   s(x) = Σ_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}

  5. Fundamental Problems
     • Consider an exponential probabilistic model:
         p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
       (example: HMM, where x is a sentence and y is a tag sequence)
     • Two fundamental problems we often need to solve:
       • Decoding:  ŷ(x) = argmax_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}   (Viterbi algorithm)
       • Summing:   s(x) = Σ_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}   (forward and backward algorithms)

  6. Fundamental Problems
     • Consider an exponential probabilistic model:
         p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
       (example: PCFG, where x is a sentence and y is a parse tree)
     • Two fundamental problems we often need to solve:
       • Decoding:  ŷ(x) = argmax_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}   (probabilistic CKY)
       • Summing:   s(x) = Σ_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}   (inside algorithm)

  7. Fundamental Problems
     • Consider an exponential probabilistic model:
         p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
     • Two fundamental problems we often need to solve:
       • Decoding:  ŷ(x) = argmax_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}
         supervised: perceptron, MIRA, MERT;  unsupervised: self-training, Viterbi EM
       • Summing:   s(x) = Σ_{y ∈ Y(x)} ∏_j λ_j^{h_j(x, y)}
         supervised: log-linear models;  unsupervised: EM, hidden-variable models
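As a concrete (if naive) reference point for the two problems above, the sketch below enumerates a tiny output space Y(x) explicitly and computes both the decode and the sum by brute force. The tag set, feature functions, and weights are invented for illustration and are not from the slides.

```python
import itertools
from math import prod

# Toy exponential model p(y | x) ∝ ∏_j λ_j^{h_j(x, y)} over tag sequences.
# Tags, features, and weights below are hypothetical, for illustration only.
TAGS = ["DT", "NN", "VB"]

def features(x, y):
    """h(x, y): counts of (tag, word) and (tag, tag) events in a tagged sentence."""
    h = {}
    for word, tag in zip(x, y):
        h[(tag, word)] = h.get((tag, word), 0) + 1
    for t1, t2 in zip(y, y[1:]):
        h[(t1, t2)] = h.get((t1, t2), 0) + 1
    return h

def score(x, y, weights):
    """Unnormalized score ∏_j λ_j^{h_j(x, y)}; unseen features default to λ = 1."""
    return prod(weights.get(f, 1.0) ** c for f, c in features(x, y).items())

def decode_and_sum(x, weights):
    """Brute force over all |TAGS|^n sequences: argmax for decoding, total for summing."""
    best, total = None, 0.0
    for y in itertools.product(TAGS, repeat=len(x)):
        s = score(x, y, weights)
        total += s
        if best is None or s > best[1]:
            best = (y, s)
    return best[0], total

weights = {("DT", "the"): 3.0, ("NN", "dog"): 2.0, ("DT", "NN"): 4.0}  # hypothetical λ's
y_hat, s_x = decode_and_sum(["the", "dog"], weights)
print(y_hat, s_x)  # Viterbi-style decode and the partition-function-style sum
```

Dynamic programming (next slides) computes exactly these two quantities without enumerating Y(x).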

  8. Dynamic Programming
     • Consider the probabilistic CKY algorithm:
         C_{X, i−1, i} = λ_{X → w_i}
         C_{X, i, k} = max_{Y, Z ∈ N;  j ∈ {i+1, ..., k−1}}  λ_{X → Y Z} × C_{Y, i, j} × C_{Z, j, k}
         goal:  C_{S, 0, n}
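A minimal sketch of the recurrence on this slide, filling a chart bottom-up over spans; the toy grammar format and example grammar are assumptions for illustration, not the paper's notation.

```python
from collections import defaultdict

def prob_cky(words, binary_rules, lexical_rules, start="S"):
    """Probabilistic CKY: C[X,i,k] = max λ(X→YZ) · C[Y,i,j] · C[Z,j,k].
    binary_rules maps (X, Y, Z) -> λ; lexical_rules maps (X, word) -> λ."""
    n = len(words)
    C = defaultdict(float)                       # chart item values, default 0
    for i, w in enumerate(words):                # base case: C[X, i, i+1] = λ(X → w_i)
        for (X, word), lam in lexical_rules.items():
            if word == w:
                C[(X, i, i + 1)] = max(C[(X, i, i + 1)], lam)
    for span in range(2, n + 1):                 # wider spans built from smaller ones
        for i in range(0, n - span + 1):
            k = i + span
            for j in range(i + 1, k):            # split point
                for (X, Y, Z), lam in binary_rules.items():
                    v = lam * C[(Y, i, j)] * C[(Z, j, k)]
                    if v > C[(X, i, k)]:
                        C[(X, i, k)] = v
    return C[(start, 0, n)]                      # goal item C[S, 0, n]

# Tiny hypothetical grammar
lex = {("NP", "There"): 0.4, ("V", "is"): 1.0, ("NP", "Troy"): 0.5}
bin_ = {("VP", "V", "NP"): 0.8, ("S", "NP", "VP"): 1.0}
print(prob_cky(["There", "is", "Troy"], bin_, lex))  # 0.4 * 1.0 * 0.5 * 0.8 * 1.0 = 0.16
```

Replacing the max with a sum turns this decoder into the inside algorithm, which is exactly the semiring view on the next slides.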

  9. Weighted Logic Programs
     Probabilistic CKY example:
       theorem  ↔  chart item  C_{X, i, k}
       axiom    ↔  rule probability  λ_{X → Y Z}
       proof    ↔  derivation (e.g., a PP subtree over "of the list")

  10. Weighted Logic Programs
     Probabilistic CKY example:
       theorem  ↔  chart item  C_{X, i, k}
       axiom    ↔  rule probability  λ_{X → Y Z}
       proof    ↔  derivation (e.g., a PP subtree over "of the list")
     • In semiring-weighted logic programming, theorem and axiom values come from a semiring
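One way to read this correspondence in code: the same chart recurrence can be parameterized by a semiring's ⊕ and ⊗, with (max, ×) giving decoding and (+, ×) giving summing. The sketch below is a minimal illustration under assumed names, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Semiring:
    plus: Callable[[float, float], float]   # ⊕: aggregates alternative proofs of a theorem
    times: Callable[[float, float], float]  # ⊗: combines antecedent theorems with an axiom
    zero: float                             # identity of ⊕
    one: float                              # identity of ⊗

VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                 # decoding
INSIDE  = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # summing

def combine(sr, rule_weight, left, right):
    """Value of one instantiation of C[X,i,k]: λ(X→YZ) ⊗ C[Y,i,j] ⊗ C[Z,j,k]."""
    return sr.times(rule_weight, sr.times(left, right))

def aggregate(sr, values):
    """Fold alternative proofs of the same theorem with ⊕."""
    total = sr.zero
    for v in values:
        total = sr.plus(total, v)
    return total

proofs = [(0.5, 0.4, 0.2), (0.5, 0.3, 0.1)]   # hypothetical (λ, left, right) triples
print(aggregate(VITERBI, [combine(VITERBI, *p) for p in proofs]))  # 0.04  (best proof)
print(aggregate(INSIDE,  [combine(INSIDE,  *p) for p in proofs]))  # 0.055 (sum of proofs)
```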

  11. Features
     • Recall our model:  p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
     • The h_j(x, y) are feature functions and the λ_j are nonnegative weights

  12. Features
     • Recall our model:  p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
     • The h_j(x, y) are feature functions and the λ_j are nonnegative weights
     • Local features depend only on the theorems used in an equation (or on the axioms), not on the proofs of those theorems:
         C_{X, i, k} = max_{Y, Z ∈ N;  j ∈ {i+1, ..., k−1}}  λ_{X → Y Z} × C_{Y, i, j} × C_{Z, j, k}

  13.–14. [Figure: full parse tree of the sentence "There near the top of the list is quarterback Troy Aikman"]

  15. Features
     • Recall our model:  p(y | x) ∝ ∏_j λ_j^{h_j(x, y)}
     • The h_j(x, y) are feature functions and the λ_j are nonnegative weights
     • Local features depend only on the theorems used in an equation (or on the axioms), not on the proofs of those theorems:
         C_{X, i, k} = max_{Y, Z ∈ N;  j ∈ {i+1, ..., k−1}}  λ_{X → Y Z} × C_{Y, i, j} × C_{Z, j, k}
     • Non-local features depend on theorem proofs

  16. “NGramTree” feature (Charniak & Johnson, 2005)
      [Figure: parse tree of "There near the top of the list is quarterback Troy Aikman", illustrating an NGramTree feature]

  17. “NGramTree” feature (Charniak & Johnson, 2005)
      • Non-local features break dynamic programming!
      [Figure: the same parse tree, illustrating the NGramTree feature]
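To make the local/non-local distinction concrete in code: the first function below can be evaluated from the rule (theorem and axiom labels) alone, while the second, an NGramTree-style feature in spirit, has to look inside the child derivations. Derivations are encoded as nested tuples here; both features are simplified stand-ins, not the exact Charniak & Johnson definitions.

```python
# Derivations as nested tuples: (label, child, ...) with strings as leaves.

def local_rule_feature(label, child_labels):
    """Local feature: fires on the rule used at a single equation/step
    (here: is the rule NP -> NP PP?), without inspecting child derivations."""
    return 1 if label == "NP" and child_labels == ("NP", "PP") else 0

def nonlocal_bigram_tag_feature(tree):
    """Non-local (NGramTree-style) feature: pairs the tag of the last word of
    the left child with the tag of the first word of the right child.
    This requires the full subderivations, not just their labels."""
    def leftmost_preterminal(t):
        return t[0] if isinstance(t[1], str) else leftmost_preterminal(t[1])
    def rightmost_preterminal(t):
        return t[0] if isinstance(t[-1], str) else rightmost_preterminal(t[-1])
    _, left, right = tree
    return (rightmost_preterminal(left), leftmost_preterminal(right))

np_there = ("NP", ("EX", "There"))
pp_near = ("PP", ("IN", "near"), ("NP", ("DT", "the"), ("NN", "top")))
combined = ("NP", np_there, pp_near)

print(local_rule_feature("NP", ("NP", "PP")))   # 1: computable from labels alone
print(nonlocal_bigram_tag_feature(combined))    # ('EX', 'IN'): needs the proofs
```

Because the second feature's value differs across proofs of the same chart item, a single chart value per item no longer suffices, which is why such features break standard dynamic programming.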

  18. Other Algorithms for Approximate Inference
      • Beam search (Lowerre, 1979)
      • Reranking (Collins, 2000)
      • Algorithms for graphical models
        • Variational methods (MacKay, 1997; Beal, 2003; Kurihara & Sato, 2006)
        • Belief propagation (Sutton & McCallum, 2004; Smith & Eisner, 2008)
        • MCMC (Finkel et al., 2005; Johnson et al., 2007)
        • Particle filtering (Levy et al., 2009)
      • Integer linear programming (Roth & Yih, 2004)
      • Stacked learning (Cohen & Carvalho, 2005; Martins et al., 2008)
      • Cube pruning (Chiang, 2007; Huang & Chiang, 2007)

  19. Other Algorithms for Approximate Inference
      • (same list as the previous slide)
      • Why add one more?
        • Cube pruning extends existing, widely understood dynamic programming algorithms for decoding
        • We want this for summing too

  20. Outline
      • Background
      • Cube Pruning
      • Cube Summing
      • Semirings
      • Implementation
      • Conclusion

  21. Cube Pruning (Chiang, 2007; Huang & Chiang, 2007)
      • A modification of dynamic programming algorithms for decoding that incorporates non-local features approximately
      • Keeps a k-best list of proofs for each theorem
      • Applies non-local feature functions to these proofs when proving new theorems
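A minimal sketch of the core step this slide describes: each theorem carries a k-best list of (score, proof) pairs, and when two antecedents are combined under a rule, a non-local feature function rescoring each candidate is applied before the new list is truncated to k. The exhaustive enumeration below stands in for the lazy, best-first exploration that real cube pruning uses; the names and numbers are hypothetical.

```python
import heapq

def combine_kbest(left, right, rule_weight, nonlocal_weight, k):
    """Combine two k-best lists of (score, proof) pairs under a binary rule.
    nonlocal_weight(proof_l, proof_r) returns a multiplicative factor for any
    non-local features that fire on the combined proof."""
    candidates = []
    for score_l, proof_l in left:
        for score_r, proof_r in right:
            score = score_l * score_r * rule_weight * nonlocal_weight(proof_l, proof_r)
            candidates.append((score, (proof_l, proof_r)))
    return heapq.nlargest(k, candidates, key=lambda c: c[0])  # truncate to the k best

# Hypothetical antecedent k-best lists and rule weight:
left  = [(0.6, "proof_a"), (0.3, "proof_b")]
right = [(0.5, "proof_c"), (0.2, "proof_d")]
no_nonlocal = lambda l, r: 1.0   # no non-local features fire in this toy call
print(combine_kbest(left, right, rule_weight=0.5, nonlocal_weight=no_nonlocal, k=2))
```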

  22. C_{NP,0,7} = C_{NP,0,1} × C_{PP,1,7} × λ_{NP → NP PP}
      [Figure: the parse tree, with the NP over words 0–1 ("There") and the PP over words 1–7 ("near the top of the list") combining into an NP over words 0–7]

  23. C_{NP,0,7} = C_{NP,0,1} × C_{PP,1,7} × λ_{NP → NP PP}
      • k-best proofs of C_{NP,0,1} ("There" tagged EX, RB, or NNP):  0.4, 0.3, 0.02
      • k-best proofs of C_{PP,1,7} (three parses of "near the top of the list"):  0.2, 0.1, 0.05

  24. C_{NP,0,7} = C_{NP,0,1} × C_{PP,1,7} × λ_{NP → NP PP}
      Grid of pairwise products of proof scores:

                                C_{PP,1,7}:  0.2      0.1      0.05
        C_{NP,0,1} (EX There)   0.4          0.08     0.04     0.02
        C_{NP,0,1} (RB There)   0.3          0.06     0.03     0.015
        C_{NP,0,1} (NNP There)  0.02         0.004    0.002    0.001

  25. C_{NP,0,7} = C_{NP,0,1} × C_{PP,1,7} × λ_{NP → NP PP},  with λ_{NP → NP PP} = 0.5
      Each entry is multiplied by the rule weight:

                                C_{PP,1,7}:  0.2           0.1           0.05
        C_{NP,0,1} (EX There)   0.4          0.08 × 0.5    0.04 × 0.5    0.02 × 0.5
        C_{NP,0,1} (RB There)   0.3          0.06 × 0.5    0.03 × 0.5    0.015 × 0.5
        C_{NP,0,1} (NNP There)  0.02         0.004 × 0.5   0.002 × 0.5   0.001 × 0.5

  26. C_{NP,0,7} = C_{NP,0,1} × C_{PP,1,7} × λ_{NP → NP PP}
      Resulting scores:

                                C_{PP,1,7}:  0.2      0.1      0.05
        C_{NP,0,1} (EX There)   0.4          0.04     0.02     0.01
        C_{NP,0,1} (RB There)   0.3          0.03     0.015    0.0075
        C_{NP,0,1} (NNP There)  0.02         0.002    0.001    0.0005

  27. Applying a non-local feature: the NGramTree fragment NP(NP(EX There), PP(IN near, ...)) has weight λ = 0.2
      The factor applies only to the cells whose combined proofs contain that fragment:

                                C_{PP,1,7}:  0.2          0.1          0.05
        C_{NP,0,1} (EX There)   0.4          0.04 × 0.2   0.02 × 0.2   0.01
        C_{NP,0,1} (RB There)   0.3          0.03         0.015        0.0075
        C_{NP,0,1} (NNP There)  0.02         0.002        0.001        0.0005
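For reference, the short sketch below recomputes the grid from slides 24–27. The k-best scores, the rule weight, and the non-local feature weight are taken from the slides; the condition under which the feature fires (EX-tagged "There" combined with an IN-tagged "near") is inferred from the cells marked on slide 27.

```python
# Recomputing the cube-pruning grid from the preceding slides.
np_proofs = [("EX", 0.4), ("RB", 0.3), ("NNP", 0.02)]   # k-best scores of C[NP,0,1]
pp_proofs = [("IN", 0.2), ("IN", 0.1), ("RB", 0.05)]    # k-best scores of C[PP,1,7]
rule_weight = 0.5                                       # λ(NP → NP PP)
ngramtree_weight = 0.2                                  # non-local feature weight

grid = {}
for np_tag, a in np_proofs:
    for pp_tag, b in pp_proofs:
        v = a * b * rule_weight                         # slide 26 values
        if np_tag == "EX" and pp_tag == "IN":           # slide 27: feature fires here
            v *= ngramtree_weight
        grid[(np_tag, pp_tag, b)] = v
print(grid)   # e.g. ('EX', 'IN', 0.2) -> 0.008, ('RB', 'IN', 0.2) -> 0.03, ...
```

Cube pruning would now keep only the k highest-scoring of these nine candidate proofs as the k-best list for C_{NP,0,7}; cube summing, introduced next, keeps a residual sum as well.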
