Discriminative Models

Joakim Nivre
Uppsala University
Department of Linguistics and Philology
joakim.nivre@lingfil.uu.se
1. Generative and Discriminative Models
2. Log-Linear Models
3. Local Discriminative Models
4. Global Discriminative Models
5. Reranking
Generative and Discriminative Models

A generative statistical model defines the joint probability P(x, y) of input x and output y.

◮ Pros:
  ◮ Learning problems have closed-form solutions
  ◮ Related probabilities can be derived:
    ◮ Conditionalization: P(y | x) = P(x, y) / P(x)
    ◮ Marginalization: P(x) = Σ_y P(x, y)
◮ Cons:
  ◮ Rigid independence assumptions (or intractable parsing)
  ◮ Indirect modeling of the parsing problem
Generative and Discriminative Models

A discriminative statistical model defines the conditional probability P(y | x) of output y given input x.

◮ Pros:
  ◮ No rigid independence assumptions
  ◮ More direct modeling of the parsing problem
◮ Cons:
  ◮ Learning problems require numerical approximation
  ◮ Related probabilities cannot be derived:
    ◮ No way to compute P(x, y) from P(y | x)
    ◮ No way to compute P(x) or P(y) from P(y | x)
Generative and Discriminative Models

Two classes of discriminative models:

◮ Conditional models:
  ◮ Explicitly model the conditional probability P(y | x)
  ◮ Used in the mapping X → Y: argmax_y P(y | x)
◮ Purely discriminative models:
  ◮ Directly optimize the mapping X → Y
  ◮ No explicit model of the conditional probability P(y | x)
Log-Linear Models

P(y | x) = exp(Σ_{i=1..k} f_i(x, y) · w_i) / Σ_{y′ ∈ GEN(x)} exp(Σ_{i=1..k} f_i(x, y′) · w_i)

◮ f_i(x, y) = feature function
◮ w_i = feature weight
◮ exp(Σ_{i=1..k} f_i(x, y) · w_i) > 0
◮ exp(Σ_{i=1..k} f_i(x, y) · w_i) ≤ Σ_{y′ ∈ GEN(x)} exp(Σ_{i=1..k} f_i(x, y′) · w_i)
◮ 0 ≤ P(y | x) ≤ 1
◮ Σ_{y′} P(y′ | x) = 1
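As an illustration, the conditional probability above can be computed directly as a softmax over feature scores. The following is a minimal sketch; the function names and the callable interface for features and GEN are my own, not from the slides:

```python
import math

def log_linear_prob(x, y, gen, features, weights):
    """P(y | x) under a log-linear model.

    features(x, y) returns the list of k feature values f_i(x, y);
    weights is the corresponding list of k weights w_i;
    gen(x) enumerates the candidate set GEN(x).
    (Illustrative interface, assumed here.)
    """
    def score(cand):
        # linear score Σ_i f_i(x, cand) · w_i
        return sum(f * w for f, w in zip(features(x, cand), weights))

    # normalizer: sum of exponentiated scores over GEN(x)
    z = sum(math.exp(score(c)) for c in gen(x))
    return math.exp(score(y)) / z
```

Because the normalizer sums over all of GEN(x), the probabilities of the candidates sum to one by construction.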
Log-Linear Models

y* = argmax_y P(y | x)
   = argmax_y exp(Σ_{i=1..k} f_i(x, y) · w_i) / Σ_{y′ ∈ GEN(x)} exp(Σ_{i=1..k} f_i(x, y′) · w_i)
   = argmax_y exp(Σ_{i=1..k} f_i(x, y) · w_i)
   = argmax_y Σ_{i=1..k} f_i(x, y) · w_i
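The derivation shows that decoding needs only the linear score: the normalizer is constant over GEN(x) and exp is monotone, so both can be dropped. A minimal sketch (hypothetical interface, same conventions as above):

```python
import math

def score(x, y, features, weights):
    # linear score Σ_i f_i(x, y) · w_i
    return sum(f * w for f, w in zip(features(x, y), weights))

def predict(x, gen, features, weights):
    # argmax over the raw linear score — no exp, no normalization needed,
    # since both are monotone/constant over GEN(x)
    return max(gen(x), key=lambda y: score(x, y, features, weights))
```

The argmax of the unnormalized score is guaranteed to coincide with the argmax of the full conditional probability.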
Local Discriminative Models

P(y | x) = Π_{i=1..m} P(d_i | Φ(d_1, …, d_{i−1}, x))

P(d_i | Φ(d_1, …, d_{i−1}, x)) = exp(Σ_{j=1..k} f_j(Φ(d_1, …, d_{i−1}, x), d_i) · w_j) / Σ_{d′ ∈ GEN(x)} exp(Σ_{j=1..k} f_j(Φ(d_1, …, d_{i−1}, x), d′) · w_j)

◮ Conditional model over local decisions
◮ Pros: unconstrained features, efficient learning/decoding
◮ Cons: approximate search (beam search or similar)
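Because the model conditions each decision on the full history, exact search is intractable and decoding typically uses a beam. The following sketch keeps the highest-scoring partial decision histories; the interface (next_decisions, apply, local_score) is assumed for illustration, not taken from the slides:

```python
def beam_search(x, init, next_decisions, apply, is_final, local_score, beam=4):
    """Approximate decoding for a locally normalized model.

    local_score(history, d, x) is the log score of decision d given the
    history of earlier decisions (hypothetical interface).
    """
    agenda = [(0.0, init)]  # (cumulative log score, state)
    while not all(is_final(state) for _, state in agenda):
        expanded = []
        for logp, state in agenda:
            if is_final(state):
                expanded.append((logp, state))
                continue
            for d in next_decisions(state, x):
                expanded.append((logp + local_score(state, d, x), apply(state, d)))
        # keep only the `beam` highest-scoring partial histories
        agenda = sorted(expanded, key=lambda t: -t[0])[:beam]
    return max(agenda, key=lambda t: t[0])[1]
```

With beam=1 this degenerates to greedy decoding; wider beams trade speed for a better approximation of the argmax.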
Global Discriminative Models

P(y | x) = exp(Σ_{i=1..k} f_i(x, y) · w_i) / Σ_{y′ ∈ GEN(x)} exp(Σ_{i=1..k} f_i(x, y′) · w_i)

◮ Conditional model over global structure
◮ Factorization for efficient inference (dynamic programming)
◮ Pros: exact learning/decoding
◮ Cons: only local features, less efficient
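For sequence-shaped outputs where the features factor over adjacent decisions, exact decoding by dynamic programming is the classic Viterbi recursion. A sketch with an assumed score interface (not from the slides): score(i, prev, t, x) is the local log-score contribution at position i, which is exactly the factorization that makes dynamic programming possible.

```python
def viterbi(x, tags, score):
    """Exact decoding when features factor over adjacent decisions."""
    n = len(x)
    # best[i][t] = best score of any tag sequence for x[:i+1] ending in t
    best = [{t: score(0, None, t, x) for t in tags}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + score(i, p, t, x))
            best[i][t] = best[i - 1][prev] + score(i, prev, t, x)
            back[i][t] = prev
    # follow back-pointers from the best final tag
    last = max(tags, key=lambda t: best[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

Restricting features to adjacent decisions is what buys exactness here — the "only local features" limitation the slide lists as a con.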
Reranking

P(y | x) = exp(Σ_{i=1..k} f_i(x, y) · w_i) / Σ_{y′ ∈ GEN_n(x)} exp(Σ_{i=1..k} f_i(x, y′) · w_i)

◮ Conditional model over global structure
◮ GEN_n(x) = n-best list for efficient inference
◮ Pros: unconstrained features, (almost) exact learning/decoding
◮ Cons: can be inefficient
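A reranker scores each candidate in the n-best list with the rich feature set; one common setup (assumed here, not stated on the slide) also includes the base model's log probability as an extra term. Since the normalizer is shared over GEN_n(x), the argmax again reduces to the raw score:

```python
def rerank(x, nbest, features, weights, base_weight=1.0):
    """Pick the best candidate from an n-best list GEN_n(x).

    nbest is a list of (y, base_log_prob) pairs from a base parser;
    base_weight scales the base model's contribution (assumed setup).
    """
    def score(y, base_lp):
        return base_weight * base_lp + sum(
            f * w for f, w in zip(features(x, y), weights))

    return max(nbest, key=lambda pair: score(*pair))[0]
```

Because features see whole candidate structures, they are unconstrained — but the result is only as good as the best candidate the base model put on the list.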
On Exact and Approximate Methods

What if the objective function we want to maximize is not efficiently computable in our favorite model?

1. Use a simpler model (e.g., restrict feature scope)
2. Use approximate inference (e.g., beam search or reranking)
3. Use another objective function (e.g., labeled recall)

Which strategy works best is (usually) an empirical question!