Discriminative Learning over Constrained Latent Representations
Ming-Wei Chang, Dan Goldwasser, Dan Roth and Vivek Srikumar
Computer Science Department, University of Illinois at Urbana-Champaign
Page. 1/27
A one-minute version of the talk

What we did: provide a general recipe for many important NLP problems. Our algorithm: Learning over Constrained Latent Representations.

Example NLP problems: Transliteration (Klementiev and Roth 2008), Textual entailment (RTE) (Dagan, Glickman, and Magnini 2006), Paraphrase identification (Dolan, Quirk, and Brockett 2004), Question Answering, and many more!

Problems of interest: binary classification tasks that require an intermediate representation.
Page. 2/27
Example task: Paraphrase Identification

Q: Are sentence 1 and sentence 2 paraphrases of each other? Yes/No
Yes, but why? They carry the same information!
Justifying the decision requires an intermediate representation.
Just an example; the real intermediate representation is more complicated.

[Figure: word alignment between sentence 1, "Alan will face murder charges, Bob said," and sentence 2, "Alan will be charged with murder, said Bob."]

Problem of interest
Binary output problem: y ∈ {−1, 1}
Intermediate representation: h, some structure that justifies the positive label
The intermediate representation is latent (not present in the data)
Page. 3/27
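The setup above can be sketched in code. This is a minimal illustration, not the paper's implementation: `phi`, `score`, and `predict` are hypothetical names, with the latent h taken to be a word alignment and the prediction rule assumed to be "positive iff some h scores positively".

```python
def phi(x, h):
    """Toy feature map: counts word pairs linked by alignment h.
    x is a pair of tokenized sentences; h is a list of (i, j) links."""
    feats = {}
    for i, j in h:
        pair = (x[0][i], x[1][j])
        feats[pair] = feats.get(pair, 0) + 1
    return feats

def score(w, x, h):
    """Linear score of one candidate latent representation."""
    return sum(w.get(f, 0.0) * v for f, v in phi(x, h).items())

def predict(w, x, candidate_hs):
    """Predict +1 iff the best-scoring latent representation is positive."""
    best = max(score(w, x, h) for h in candidate_hs)
    return 1 if best > 0 else -1
```

For example, with weights that reward identical-word links, a pair of sentences sharing words gets a positive alignment and hence the label +1.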
Limitations of existing approaches: the two-stage approach

Most systems use a two-stage approach.
Stage 1: Generate the intermediate representation. Obtain the intermediate representation, then fix it (ignoring the second stage): X → H
Stage 2: Classify based on the intermediate representation. Extract features using the fixed representation and learn: Φ(X, H) → Y
Problem: the intermediate representation ignores the binary task.
Page. 4/27
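The two-stage pipeline can be sketched as follows. This is an illustrative skeleton, not any particular system; `build_representation` and `phi` are hypothetical stand-ins for a stage-1 aligner and a feature extractor.

```python
def two_stage_dataset(inputs, labels, build_representation, phi):
    """Stage 1: commit to one representation h per input, never consulting
    the binary label. Stage 2 would then train any off-the-shelf binary
    classifier on the resulting feature dicts."""
    data = []
    for x, y in zip(inputs, labels):
        h = build_representation(x)   # fixed once, never revisited
        data.append((phi(x, h), y))   # features over the frozen (x, h)
    return data
```

The weakness the slide points out is visible in the code: `build_representation` never sees `y`, so the chosen h may be useless for the classification task.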
Limitations of existing approaches: inference

Observation: decisions on the intermediate representation are interdependent.

[Figure: word alignment between "Alan will face murder charges, Bob said," and "Alan will be charged with murder, said Bob."]

Many frameworks use custom-designed inference procedures:
Difficult to add linguistic intuition/constraints on the intermediate representation
Difficult to generalize to other tasks
Page. 5/27
Learning over Constrained Latent Representations (LCLR)

Property 1: Jointly learn intermediate representations and labels.

input X → intermediate representation H → features Φ(X, H) → binary label Y, with feedback from Y back to H

Find an intermediate representation that helps the binary task.

Property 2: Constraint-based inference for the intermediate representation.
Uses integer linear programming on latent variables
Easy to inject constraints on latent variables
Easy to generalize to other tasks
Page. 6/27
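To make the constrained latent space concrete, here is a sketch of one typical constraint family for alignments: each source word links to at most one target word, and no target word is reused. The paper solves such inference with integer linear programming; this brute-force enumeration over the same constrained space is only an illustration, feasible for tiny inputs.

```python
from itertools import product

def constrained_alignments(n_src, n_tgt):
    """Yield all alignments (lists of (i, j) links) in which each source
    index i picks at most one target index j, and targets are not reused.
    Illustrates the constrained latent space; the paper uses ILP instead
    of enumeration to search it."""
    for choice in product([None] + list(range(n_tgt)), repeat=n_src):
        used = [j for j in choice if j is not None]
        if len(used) == len(set(used)):  # one-to-one constraint on targets
            yield [(i, j) for i, j in enumerate(choice) if j is not None]
```

Adding a new linguistic constraint here means adding one more filter condition, which mirrors why declarative (ILP-based) inference is easy to extend.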
Outline
1 Motivation and Contribution
2 Property 1: Jointly learn intermediate representations and labels
3 Property 2: Constraint-based inference for the intermediate representation
4 LCLR: Putting Everything Together
5 Experiments
Page. 7/27
The intuition behind the joint approach

[Figure: Yes/No decision over the word alignment between "Alan will face murder charges, Bob said," and "Alan will be charged with murder, said Bob."]

intermediate representation ⇔ {1, −1}
Only positive examples have good intermediate representations.
No negative example has a good intermediate representation.
Page. 9/27
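This intuition can be sketched as a perceptron-style update over the latent space. This is not the paper's exact algorithm (LCLR optimizes a hinge-loss objective); it is a simplified illustration in which, on a mistake, the weights are moved toward (y = +1) or away from (y = −1) the features of the current best latent representation.

```python
def latent_perceptron_update(w, x, y, candidate_hs, phi, lr=1.0):
    """One illustrative latent update: find the best-scoring h under the
    current weights; if the induced prediction disagrees with y, promote
    (y = +1) or demote (y = -1) that representation's features."""
    def score(h):
        return sum(w.get(f, 0.0) * v for f, v in phi(x, h).items())
    best_h = max(candidate_hs, key=score)
    if (1 if score(best_h) > 0 else -1) != y:
        for f, v in phi(x, best_h).items():
            w[f] = w.get(f, 0.0) + lr * y * v
    return w
```

The update captures both halves of the intuition: positive examples push some h to score well, while negative examples push even their best h below zero.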