Weakly-Supervised Learning with Cost-Augmented Contrastive Estimation - PowerPoint PPT Presentation

Weakly-Supervised Learning with Cost-Augmented Contrastive Estimation
Kevin Gimpel, Mohit Bansal

n New ¡objec;ve ¡for ¡weakly-‑supervised ¡NLP, ¡generalizes ¡ contras;ve ¡es;ma;on ¡(Smith ¡& ¡Eisner, ¡2005) ¡ n Adds ¡two ¡cost ¡func;ons: ¡inputs ¡and ¡outputs ¡ n Improved ¡system ¡combina;on ¡for ¡POS ¡tagging ¡ ¡ many-‑to-‑1 ¡ ¡ 1-‑to-‑1 ¡ ¡ accuracy ¡ accuracy ¡ 61.8 ¡ 47.2 ¡ Contras;ve ¡Es;ma;on ¡ 64.3 ¡ 51.7 ¡ Cost-‑Augmented ¡Contras;ve ¡Es;ma;on ¡ avg. ¡across ¡5 ¡languages, ¡ PASCAL ¡2012 ¡POS ¡shared ¡task ¡ 2 ¡

n New ¡objec;ve ¡for ¡weakly-‑supervised ¡NLP, ¡generalizes ¡ contras;ve ¡es;ma;on ¡(Smith ¡& ¡Eisner, ¡2005) ¡ n Adds ¡two ¡cost ¡func;ons: ¡inputs ¡and ¡outputs ¡ n Improved ¡system ¡combina;on ¡for ¡POS ¡tagging ¡ ¡ many-‑to-‑1 ¡ ¡ 1-‑to-‑1 ¡ ¡ accuracy ¡ accuracy ¡ 61.8 ¡ 47.2 ¡ Contras;ve ¡Es;ma;on ¡ 64.3 ¡ 51.7 ¡ Cost-‑Augmented ¡Contras;ve ¡Es;ma;on ¡ 60.9 ¡ 50.1 ¡ Posterior ¡Regulariza;on ¡(Graça ¡et ¡al., ¡2011) ¡ avg. ¡across ¡5 ¡languages, ¡ PASCAL ¡2012 ¡POS ¡shared ¡task ¡ 3 ¡

EM and Contrastive Estimation
Modification 1: Input Cost
Modification 2: Output Cost

Generative Log-Linear Models
A joint model over a word sequence x and its part-of-speech tag sequence y, scored with a parameter vector and a feature vector.
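The slide's equation did not survive extraction. A standard way to write a generative log-linear model is shown below, assuming θ for the parameter vector, f(x, y) for the feature vector, 𝒳 for the set of all word sequences and 𝒴(x) for the tag sequences of x (this notation is chosen here and may differ from the talk). The normalizer sums over every sentence, which is why EM must penalize all y's for all x's.

```latex
p_\theta(x, y) = \frac{\exp\big(\theta^\top f(x, y)\big)}
                      {\sum_{x' \in \mathcal{X}} \sum_{y' \in \mathcal{Y}(x')} \exp\big(\theta^\top f(x', y')\big)}
```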

Unsupervised Learning for Log-Linear Models

EM
- reward all y's for observed x
- penalize all y's for ALL x's
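The reward/penalty reading of EM can be made concrete on a toy problem. This is a minimal sketch, assuming a hypothetical two-tag set, hypothetical transition and emission features, and a small finite list of sentences standing in for the (normally infinite) space of all word sequences; it computes the log marginal likelihood log p(x) that EM increases, by brute-force enumeration.

```python
import itertools
import math

TAGS = ("N", "V")  # hypothetical two-tag set for a toy example


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(theta, words, tags):
    """theta . f(x, y) using hypothetical transition and emission features."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        total += theta.get(("trans", prev, tag), 0.0)
        total += theta.get(("emit", tag, word), 0.0)
        prev = tag
    return total


def log_sum_over_tags(theta, words):
    """log sum_y exp(theta . f(x, y)), enumerating every tag sequence y (toy sizes only)."""
    return logsumexp([score(theta, words, tags)
                      for tags in itertools.product(TAGS, repeat=len(words))])


def em_objective(theta, observed, all_sentences):
    """log p(x): reward all y's for the observed x, penalize all y's for ALL x's.
    all_sentences is a hypothetical finite stand-in for the space of word sequences."""
    reward = log_sum_over_tags(theta, observed)
    penalty = logsumexp([log_sum_over_tags(theta, s) for s in all_sentences])
    return reward - penalty
```

With all weights at zero every (x, y) pair scores equally, so the objective is just minus the log of the number of sentences in the toy universe.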

Contrastive Estimation (CE) (Smith & Eisner, 2005)
- reward all y's for observed x (same as EM)
- penalize all y's for x's in the "corruption neighborhood" of x
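Relative to EM, only the penalty term changes: it sums over the corruption neighborhood N(x) instead of over all sentences. A minimal sketch, again assuming hypothetical toy features and a two-tag set, with the neighborhood passed in as an explicit list that includes the observed sentence:

```python
import itertools
import math

TAGS = ("N", "V")  # hypothetical two-tag set for a toy example


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(theta, words, tags):
    """theta . f(x, y) using hypothetical transition and emission features."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        total += theta.get(("trans", prev, tag), 0.0)
        total += theta.get(("emit", tag, word), 0.0)
        prev = tag
    return total


def log_sum_over_tags(theta, words):
    """log sum_y exp(theta . f(x, y)) by brute-force enumeration of tag sequences."""
    return logsumexp([score(theta, words, tags)
                      for tags in itertools.product(TAGS, repeat=len(words))])


def ce_objective(theta, observed, neighborhood):
    """CE: reward all y's for the observed x (as in EM), but penalize all y's
    only for the sentences x' in the corruption neighborhood N(x)."""
    reward = log_sum_over_tags(theta, observed)
    penalty = logsumexp([log_sum_over_tags(theta, s) for s in neighborhood])
    return reward - penalty
```

Because N(x) is finite and small, the penalty is cheap to compute, which is the practical motivation for CE over full generative normalization.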

With a well-designed neighborhood, CE has been shown effective for:
- part-of-speech tagging (Smith & Eisner, 2005a)
- dependency parsing (Smith & Eisner, 2005b)
- morphological segmentation (Poon et al., 2009)
- bilingual part-of-speech induction (Chen et al., 2011)
- machine translation (Xiao et al., 2011)

"Transpose1" Neighborhood (Smith & Eisner, 2005)
Sentence: red leaves don't hide blue jays
Neighborhood: the sentence itself plus every variant obtained by swapping one pair of adjacent words (e.g., red don't leaves hide blue jays)
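The Transpose1 neighborhood can be generated directly. This sketch assumes whitespace tokenization with "don't" kept as one token (the slide's tokenization is not recoverable) and drops duplicate variants, which arise when two adjacent words are identical.

```python
def transpose1_neighborhood(words):
    """Return the sentence itself plus every distinct sentence obtained by
    swapping one pair of adjacent words."""
    original = tuple(words)
    neighborhood, seen = [original], {original}
    for i in range(len(original) - 1):
        corrupted = list(original)
        corrupted[i], corrupted[i + 1] = corrupted[i + 1], corrupted[i]
        corrupted = tuple(corrupted)
        if corrupted not in seen:  # swapping identical neighbors gives no new sentence
            seen.add(corrupted)
            neighborhood.append(corrupted)
    return neighborhood
```

For an n-word sentence with no repeated adjacent words this yields n sentences: the original plus n - 1 transpositions.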


Contrastive Estimation: all x's in the corruption neighborhood are treated equally!

Transpose1 Neighborhood (Smith & Eisner, 2005)
Sentence: red leaves don't hide blue jays
- the neighborhood always contains the original sentence
- some corruptions are not as bad as others

First modification: add an input cost function, which measures the difference between the observed and corrupted sentences; a scalar weight controls its influence.
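One way to realize this, by analogy with softmax-margin, is to add the weighted input cost inside the penalty term, so that each corrupted sentence x' contributes exp(θ·f(x', y) + α·cost(x, x')). The sketch below assumes that form; α, `cost`, and `hamming_cost` are hypothetical names introduced here, not taken from the talk. Since the cost does not depend on y, it can be added after summing over tag sequences.

```python
import itertools
import math

TAGS = ("N", "V")  # hypothetical two-tag set for a toy example


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(theta, words, tags):
    """theta . f(x, y) using hypothetical transition and emission features."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        total += theta.get(("trans", prev, tag), 0.0)
        total += theta.get(("emit", tag, word), 0.0)
        prev = tag
    return total


def log_sum_over_tags(theta, words):
    """log sum_y exp(theta . f(x, y)) by brute-force enumeration of tag sequences."""
    return logsumexp([score(theta, words, tags)
                      for tags in itertools.product(TAGS, repeat=len(words))])


def hamming_cost(x, x_prime):
    """Hypothetical input cost: number of positions where the two sentences differ."""
    return sum(a != b for a, b in zip(x, x_prime))


def cost_augmented_ce_objective(theta, observed, neighborhood, cost, alpha):
    """log sum_y exp(s(x, y)) - log sum_{x' in N(x)} sum_y exp(s(x', y) + alpha * cost(x, x')).
    The cost is constant in y, so it is added once per neighbor after the tag sum."""
    reward = log_sum_over_tags(theta, observed)
    penalty = logsumexp([log_sum_over_tags(theta, s) + alpha * cost(observed, s)
                         for s in neighborhood])
    return reward - penalty
```

With alpha = 0 this reduces to plain CE; larger alpha makes badly corrupted neighbors weigh more in the penalty, so the model is pushed harder to score them below the observed sentence.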

Inspiration: Structured Large-Margin Learning
- margin-rescaled structured hinge (Taskar et al., 2003)
- softmax-margin (Povey et al., 2008; Gimpel & Smith, 2010)
- (soft)max-margin: the cost compares two outputs
- this talk: the cost compares two inputs
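For a gold output y, the margin-rescaled hinge is -θ·f(x, y) + max over y' of (θ·f(x, y') + cost(y, y')), and softmax-margin replaces the max with a log-sum-exp. A minimal sketch over a small enumerated output set, where outputs are list indices and the scores and costs are made-up inputs (cost of the gold output assumed to be 0):

```python
import math


def hinge_loss(scores, costs, gold):
    """Margin-rescaled structured hinge: -s(gold) + max_y' (s(y') + cost(gold, y'))."""
    return -scores[gold] + max(s + c for s, c in zip(scores, costs))


def softmax_margin_loss(scores, costs, gold):
    """Softmax-margin: -s(gold) + log sum_y' exp(s(y') + cost(gold, y'))."""
    augmented = [s + c for s, c in zip(scores, costs)]
    m = max(augmented)
    return -scores[gold] + m + math.log(sum(math.exp(a - m) for a in augmented))
```

In both losses the cost compares the gold output with a competing output; cost-augmented CE keeps the same shape but compares the observed input with a corrupted input. Because log-sum-exp upper-bounds max, softmax-margin is never smaller than the hinge.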

Input Cost Functions
- Match: count unmatched bigrams in the corrupted sentence
- Match LM: weight by language model (negative) log-probability
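The two input costs can be sketched as follows, under stated assumptions: "unmatched" is read as "a bigram of the corrupted sentence that does not occur in the observed sentence", no sentence-boundary bigrams are used, and Match LM sums the negative log-probability of each unmatched bigram under a language model supplied as a hypothetical callable `bigram_logprob(w1, w2)`.

```python
import math


def bigrams(words):
    """Adjacent word pairs of a sentence."""
    return list(zip(words, words[1:]))


def match_cost(observed, corrupted):
    """Match: number of bigrams in the corrupted sentence absent from the observed one."""
    observed_bigrams = set(bigrams(observed))
    return sum(1 for bg in bigrams(corrupted) if bg not in observed_bigrams)


def match_lm_cost(observed, corrupted, bigram_logprob):
    """Match LM (sketch): weight each unmatched bigram by its negative LM log-probability.
    bigram_logprob is a hypothetical callable returning log p(w2 | w1)."""
    observed_bigrams = set(bigrams(observed))
    return sum(-bigram_logprob(w1, w2)
               for (w1, w2) in bigrams(corrupted) if (w1, w2) not in observed_bigrams)
```

Under Match, a Transpose1 swap at either end of the sentence breaks two bigrams while a swap in the middle breaks three, so the cost already separates milder from harsher corruptions; Match LM further discounts unmatched bigrams that the language model finds plausible.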

Experiments
- Unsupervised part-of-speech tagging, 12 tags, no tag dictionaries
- Evaluation: many-to-1 and 1-to-1 accuracy
- 5 languages from the PASCAL 2012 shared task (Gelling et al., 2012): Danish, Dutch, Portuguese, Slovene, Swedish
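The two evaluation metrics differ in how induced tags are mapped to gold tags: many-to-1 maps each induced tag to its most frequent gold tag (several induced tags may share one), while 1-to-1 requires an injective mapping chosen to maximize accuracy. This sketch finds the 1-to-1 mapping by brute force, which is only feasible for tiny tag sets.

```python
from collections import Counter
from itertools import permutations


def many_to_one_accuracy(pred, gold):
    """Map each induced tag to the gold tag it co-occurs with most often."""
    counts = Counter(zip(pred, gold))
    best = Counter()
    for (p, _g), c in counts.items():
        best[p] = max(best[p], c)
    return sum(best.values()) / len(gold)


def one_to_one_accuracy(pred, gold):
    """Best injective induced-to-gold tag mapping, by brute force (tiny tag sets only)."""
    counts = Counter(zip(pred, gold))
    p_tags, g_tags = sorted(set(pred)), sorted(set(gold))
    best = 0
    if len(p_tags) <= len(g_tags):
        for perm in permutations(g_tags, len(p_tags)):
            best = max(best, sum(counts[(p, g)] for p, g in zip(p_tags, perm)))
    else:
        for perm in permutations(p_tags, len(g_tags)):
            best = max(best, sum(counts[(p, g)] for p, g in zip(perm, g_tags)))
    return best / len(gold)
```

With 12 tags, brute force would enumerate 12! mappings. A practical evaluation would solve the same assignment problem with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.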

Neighborhoods
- Transpose1 (Smith & Eisner, 2005)
- Shuffle10: original sentence + 10 random permutations
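The Shuffle10 neighborhood can be sketched as below. How the talk handles repeated permutations, or a permutation that happens to reproduce the original, is not stated; this version simply keeps whatever the shuffles produce, and the `seed` parameter is added here only for reproducibility.

```python
import random


def shuffle_neighborhood(words, k=10, seed=None):
    """Original sentence plus k uniformly random permutations of its words."""
    rng = random.Random(seed)
    words = list(words)
    neighborhood = [tuple(words)]
    for _ in range(k):
        permuted = words.copy()
        rng.shuffle(permuted)
        neighborhood.append(tuple(permuted))
    return neighborhood
```

Unlike Transpose1, whose corruptions change only a few local bigrams, a random permutation usually scrambles most of the sentence, so the contrast it provides is much coarser.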

Setup
- Features:
  - tag-tag transitions
  - tag-word emissions
  - spelling features (Smith & Eisner, 2005)
  - tag-cluster emissions (from Brown clustering with {12, 40} clusters)
- L-BFGS for 100 iterations, random initialization
- L2 regularization with (untuned) coefficient 0.0001
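A feature function f(x, y) covering the four feature families might look like the sketch below. The exact spelling features are not listed in the talk, so the capitalization, digit, hyphen and 3-character-suffix indicators here are a hypothetical subset; `clusters` is a hypothetical word-to-Brown-cluster dictionary (for example from a 12- or 40-cluster run).

```python
from collections import Counter


def extract_features(words, tags, clusters):
    """Sparse feature counts f(x, y) for one tagged sentence (sketch).
    clusters: hypothetical dict mapping a word to its Brown cluster id."""
    feats = Counter()
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("trans", prev, tag)] += 1               # tag-tag transition
        feats[("emit", tag, word)] += 1                # tag-word emission
        if word[:1].isupper():                         # hypothetical spelling features
            feats[("cap", tag)] += 1
        if any(ch.isdigit() for ch in word):
            feats[("digit", tag)] += 1
        if "-" in word:
            feats[("hyphen", tag)] += 1
        feats[("suffix3", tag, word[-3:])] += 1
        if word in clusters:                           # tag-cluster emission
            feats[("cluster", tag, clusters[word])] += 1
        prev = tag
    feats[("trans", prev, "</s>")] += 1                # end-of-sentence transition
    return feats
```

The score θ·f(x, y) is then the sum of weights over these features; training would maximize the (cost-augmented) CE objective minus the L2 penalty with L-BFGS.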

Input cost results (avg. across 5 languages: Danish, Dutch, Portuguese, Slovene, Swedish)

neighborhood   input cost            many-to-1 accuracy   1-to-1 accuracy
Shuffle10      None (CE baseline)    51.3 (+1.3)          39.7 (+0.4)
               Match                 53.3 (+2.0)          40.5 (+0.8)
               Match LM              53.9 (+2.6)          41.6 (+1.9)
Transpose1     None (CE baseline)    61.8 (-1.2)          47.2 (+4.3)
               Match                 63.1 (+1.3)          47.6 (+0.4)
               Match LM              62.8 (+1.0)          49.9 (+2.7)

For the Match and Match LM rows, the parenthesized values are the gains over the CE baseline with the same neighborhood.

