
Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions

Kevin Gimpel and Noah A. Smith


  1. Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions. Kevin Gimpel and Noah A. Smith (LTI)

  2. [Figure: diagram grouping training methods by three properties: is convex, based on probabilistic inference, uses a cost function. Methods shown: Perceptron, Boosting, Max-Margin, Conditional Likelihood, MIRA, Minimum Error Rate Training, Latent Variable Conditional Likelihood, Risk.]

  3. [Figure: the same diagram with Softmax-Margin added.]

  4. [Figure: the same diagram with Jensen Risk Bound added.]

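  5. Linear Models for Structured Prediction
      - Decoding maps an input $x$ to an output using weights $\theta$ and features $f$:
        $$\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \theta^\top f(x, y)$$
      - For a probabilistic interpretation, exponentiate and normalize:
        $$p_\theta(y \mid x) = \frac{\exp\{\theta^\top f(x, y)\}}{\sum_{y' \in \mathcal{Y}(x)} \exp\{\theta^\top f(x, y')\}}$$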
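As a rough illustration of the two formulas on this slide, the sketch below enumerates a tiny output space explicitly. The weights, candidate labels, and feature vectors are hypothetical values chosen for the example; real structured models would use dynamic programming instead of enumeration.

```python
import math

def score(theta, feats):
    """Linear score theta^T f(x, y) for one candidate output."""
    return sum(t * f for t, f in zip(theta, feats))

def decode(theta, candidates):
    """Return the candidate output with the highest linear score."""
    return max(candidates, key=lambda y: score(theta, candidates[y]))

def conditional_probs(theta, candidates):
    """Exponentiate and normalize the scores to obtain p_theta(y | x)."""
    scores = {y: score(theta, f) for y, f in candidates.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

# Hypothetical toy input: three candidate outputs with 2-dimensional features.
theta = [1.0, -0.5]
candidates = {"A": [2.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 2.0]}

best = decode(theta, candidates)
probs = conditional_probs(theta, candidates)
```

Here `best` is the argmax of the linear score and `probs` is the normalized distribution over the same candidates.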

  6. Training
      - Standard approach is to maximize conditional likelihood:
        $$\min_\theta \sum_{i=1}^{n} \left( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\{\theta^\top f(x^{(i)}, y)\} \right)$$
      - Another approach maximizes margin (Taskar et al., 2003):
        $$\min_\theta \sum_{i=1}^{n} \left( -\theta^\top f(x^{(i)}, y^{(i)}) + \max_{y \in \mathcal{Y}(x^{(i)})} \left( \theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y) \right) \right)$$
        where $\mathrm{cost}$ is a task-specific cost function.

  7. Training (continued)
      - The inner max over outputs in the max-margin objective is cost-augmented decoding.

  8. Training (continued)
      - Softmax-margin: replace "max" with "softmax":
        $$\min_\theta \sum_{i=1}^{n} \left( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\{\theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)\} \right)$$
      - The log-sum-exp term is "cost-augmented summing".

  9. Training (continued)
      - Softmax-margin: Sha and Saul (2006), Povey et al. (2008)
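To make the three objectives concrete, here is a minimal sketch that evaluates the per-example conditional likelihood, max-margin, and softmax-margin losses on an explicitly enumerated output set. The score and cost values are hypothetical, and the helper names (`logsumexp`, `losses`) are introduced only for this example.

```python
import math

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def losses(scores, costs, gold):
    """Per-example losses.

    scores[y] plays the role of theta^T f(x, y);
    costs[y] plays the role of cost(gold, y).
    """
    ys = list(scores)
    # Conditional likelihood: plain log-sum-exp over outputs.
    cl = -scores[gold] + logsumexp([scores[y] for y in ys])
    # Max-margin: cost-augmented decoding (max over outputs).
    mm = -scores[gold] + max(scores[y] + costs[y] for y in ys)
    # Softmax-margin: cost-augmented summing (log-sum-exp over outputs).
    sm = -scores[gold] + logsumexp([scores[y] + costs[y] for y in ys])
    return cl, mm, sm

# Hypothetical toy example with gold output "A" and non-negative costs.
scores = {"A": 2.0, "B": 0.5, "C": -1.0}
costs = {"A": 0.0, "B": 1.0, "C": 2.0}
cl, mm, sm = losses(scores, costs, gold="A")
```

With non-negative costs, the softmax-margin value is at least as large as both of the other two on this example, which matches the bounding relationship the slides describe.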

  10. Properties of Softmax-Margin
      - Has a probabilistic interpretation in the minimum divergence framework (Jelinek, 1997)
        - Details in technical report
      - Is a bound on:
        - Max-margin
        - Conditional likelihood
        - Risk

  11. Properties of Softmax-Margin (continued)
      - It bounds max-margin because "softmax" bounds "max".
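The claim that "softmax" bounds "max" can be spelled out with a standard inequality. The derivation below is a sketch in generic notation: $a_1, \dots, a_m$ are arbitrary real numbers introduced here for illustration, standing in for the cost-augmented scores $\theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)$ of the $m$ candidate outputs.

```latex
\max_j a_j
  = \log \exp\{\max_j a_j\}
  \le \log \sum_{j=1}^{m} \exp\{a_j\}
  \le \log \left( m \exp\{\max_j a_j\} \right)
  = \max_j a_j + \log m
```

The first inequality holds because the sum contains the largest term plus other positive terms, so the log-sum-exp used in softmax-margin is never smaller than the max used in max-margin.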

  12. Risk?
      - Risk is the expected value of the cost function (Smith and Eisner, 2006; Li and Eisner, 2009):
        $$\min_\theta \sum_{i=1}^{n} \mathbb{E}_{p_\theta(\cdot \mid x^{(i)})}\left[ \mathrm{cost}(y^{(i)}, \cdot) \right]$$
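A small sketch of the risk for one training example, assuming the output space is small enough to enumerate. The function name `risk` and the toy scores and costs are hypothetical, used only to show the expectation being computed.

```python
import math

def risk(scores, costs):
    """Expected cost under p_theta(y | x), where p is the softmax of the scores."""
    m = max(scores.values())  # subtract the max for numerical stability
    weights = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(weights.values())
    return sum(weights[y] / z * costs[y] for y in scores)

# Hypothetical toy example: costs are measured against gold output "A".
costs = {"A": 0.0, "B": 1.0, "C": 2.0}
uniform_risk = risk({"A": 0.0, "B": 0.0, "C": 0.0}, costs)
peaked_risk = risk({"A": 5.0, "B": 0.0, "C": 0.0}, costs)
```

When the model puts more probability on the gold output, the expected cost drops, which is what minimizing risk encourages.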

  13. Bounding Conditional Likelihood and Risk
      - Softmax-margin:
        $$\sum_{i=1}^{n} \left( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\{\theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)\} \right)$$
        $$= \sum_{i=1}^{n} \left( -\theta^\top f(x^{(i)}, y^{(i)}) + \log Z(x^{(i)}) + \log \mathbb{E}_{p_\theta(\cdot \mid x^{(i)})}\left[ \exp\{\mathrm{cost}(y^{(i)}, \cdot)\} \right] \right)$$
        where $Z(x^{(i)}) = \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\{\theta^\top f(x^{(i)}, y)\}$.
      - The first two terms are the conditional likelihood objective.
      - The last term is a bound on risk via Jensen's inequality.

  14. Bounding Conditional Likelihood and Risk (continued)
      - Softmax-margin is a convex bound on max-margin, conditional likelihood, and risk.

  15. Bounding Conditional Likelihood and Risk (continued)
      - Jensen Risk Bound:
        $$\sum_{i=1}^{n} \left( -\theta^\top f(x^{(i)}, y^{(i)}) + \log Z(x^{(i)}) + \mathbb{E}_{p_\theta(\cdot \mid x^{(i)})}\left[ \mathrm{cost}(y^{(i)}, \cdot) \right] \right)$$
      - Easier to optimize than risk (cf. Li and Eisner, 2009)
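The decomposition can be checked numerically on a toy example. The sketch below assumes an enumerable output set with hypothetical scores and costs; it computes softmax-margin directly, then as conditional likelihood plus the log-expected-exponentiated cost, and compares both with the Jensen Risk Bound (conditional likelihood plus expected cost).

```python
import math

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

# Hypothetical toy example with gold output "A".
scores = {"A": 2.0, "B": 0.5, "C": -1.0}   # theta^T f(x, y)
costs = {"A": 0.0, "B": 1.0, "C": 2.0}     # cost(gold, y)
gold = "A"

log_z = logsumexp(list(scores.values()))
probs = {y: math.exp(s - log_z) for y, s in scores.items()}

# Conditional likelihood term: -theta^T f(x, gold) + log Z(x).
cl = -scores[gold] + log_z

# log E[exp(cost)] and E[cost] under p_theta(. | x).
log_exp_cost = math.log(sum(probs[y] * math.exp(costs[y]) for y in scores))
exp_cost = sum(probs[y] * costs[y] for y in scores)

sm_direct = -scores[gold] + logsumexp([scores[y] + costs[y] for y in scores])
sm_decomposed = cl + log_exp_cost
jrb = cl + exp_cost  # Jensen Risk Bound for this example
```

The direct and decomposed softmax-margin values agree up to floating-point error, and Jensen's inequality ($\log \mathbb{E}[e^{c}] \ge \mathbb{E}[c]$) places the Jensen Risk Bound between conditional likelihood and softmax-margin.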

  16. Implementation
      - Conditional likelihood → Softmax-margin
      - If the cost function factors the same way as the features, it's easy:
        - Add additional features for the cost function
        - Keep their weights fixed
      - If not, use a simpler cost function or use approximate inference
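One way to picture the "easy" case is a linear-chain model with Hamming cost, which factors per position just like emission features. The sketch below is an assumption-laden illustration, not the authors' implementation: `log_partition`, `brute_force`, `cost_weight`, and the toy `emit`/`trans` scores are all hypothetical names and values. The cost is folded into the local scores as an extra feature with a fixed weight, and the usual forward algorithm then performs cost-augmented summing.

```python
import math
from itertools import product

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(emit, trans, gold=None, cost_weight=1.0):
    """Forward algorithm over a linear chain.

    emit[t][k]: local score of label k at position t.
    trans[j][k]: score of moving from label j to label k.
    If gold is given, Hamming cost is added as an extra per-position
    feature whose weight (cost_weight) is held fixed.
    """
    n_labels = len(trans)

    def local(t, k):
        bonus = cost_weight if gold is not None and k != gold[t] else 0.0
        return emit[t][k] + bonus

    alpha = [local(0, k) for k in range(n_labels)]
    for t in range(1, len(emit)):
        alpha = [
            local(t, k) + logsumexp([alpha[j] + trans[j][k] for j in range(n_labels)])
            for k in range(n_labels)
        ]
    return logsumexp(alpha)

def brute_force(emit, trans, gold):
    """Cost-augmented log-sum-exp by enumerating every label sequence."""
    totals = []
    for y in product(range(len(trans)), repeat=len(emit)):
        s = sum(emit[t][y[t]] for t in range(len(y)))
        s += sum(trans[y[t - 1]][y[t]] for t in range(1, len(y)))
        s += sum(1.0 for t in range(len(y)) if y[t] != gold[t])  # Hamming cost
        totals.append(s)
    return logsumexp(totals)

# Hypothetical toy chain: 3 positions, 2 labels.
emit = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
trans = [[0.2, -0.1], [0.0, 0.4]]
gold = [0, 1, 1]

plain = log_partition(emit, trans)            # used by conditional likelihood
augmented = log_partition(emit, trans, gold)  # used by softmax-margin
```

The only change relative to the conditional likelihood computation is the `bonus` term in the local score, which is why the slide describes the conversion as adding cost features with fixed weights.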

  17. Experiments
      - English named-entity recognition (CoNLL 2003)
      - Compared softmax-margin and Jensen risk bound with five baselines:
        - Perceptron (Collins, 2002)
        - 1-best MIRA with cost-augmented decoding (Crammer et al., 2006)
        - Max-margin via subgradient descent (Ratliff et al., 2006)
        - Conditional likelihood (Lafferty et al., 2001)
        - Risk (Xiong et al., 2009)
      - For risk and Jensen risk bound, initialized using output of conditional likelihood training
      - Used Hamming cost for cost function

  18. Results

      | Method | Test F1 |
      |---|---|
      | Perceptron | 83.98* |
      | MIRA | 85.72* |
      | Max-Margin | 85.28* |
      | Conditional Likelihood | 85.46* |
      | Risk | 85.59* |
      | Jensen Risk Bound | 85.65* |
      | Softmax-Margin | 85.84* |

      \* Indicates significance (compared with softmax-margin)

  19. Results (continued)
      - Callout: significant improvement with equal training time and implementation difficulty.

  20. Results (continued)
      - Callout: comparable performance with half the training time.

  21. [Figure: diagram grouping training methods by three properties: is convex, based on probabilistic inference, uses a cost function. Methods shown: Perceptron, Max-Margin, Conditional Likelihood, MIRA, Softmax-Margin, Jensen Risk Bound, Risk.]

  22. [Figure: plot of performance against time for Softmax-Margin, MIRA, Jensen Risk Bound, Risk, Conditional Likelihood, Max-Margin, and Perceptron.]

  23. [Figure: the same performance-against-time plot, with methods labeled by the inference routine they require: (cost-augmented) decoding, (cost-augmented) summing, or expectations of products.]
