Tsybakov noise adaptive margin-based active learning, Aarti Singh - PowerPoint PPT Presentation

Tsybakov noise adaptive margin-based active learning
Aarti Singh, A. Nico Habermann Associate Professor
NIPS workshop on Learning Faster from Easy Data II, Dec 11, 2015

Passive Learning

Active Learning
[Figure: unlabeled points $(X_i, ?)$, $(X_j, ?)$ and the corresponding labeled pairs $(X_i, Y_i)$, $(X_j, Y_j)$.]

Streaming setting
• The algorithm obtains $X_t$ sampled i.i.d. from the marginal distribution $P_X$.
• Based on the previous labeled and unlabeled data, the algorithm decides whether or not to accept $X_t$ and query its label.
• If the label is queried, the algorithm receives $Y_t$ sampled from the conditional distribution $P(Y \mid X = X_t)$.
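The protocol above can be sketched as a short simulation loop. This is only an illustration: the function name `run_stream`, the one-dimensional toy distribution, the noiseless threshold labels, and the "query if near zero" rule are all assumptions made for the example, not part of the slides.

```python
import random

def run_stream(draw_x, draw_label, should_query, n_points):
    """Simulate the streaming protocol; returns (labeled, unlabeled)."""
    labeled, unlabeled = [], []
    for _ in range(n_points):
        x = draw_x()                              # X_t drawn from P_X
        if should_query(x, labeled, unlabeled):   # decision uses past data only
            y = draw_label(x)                     # Y_t drawn from P(Y | X = X_t)
            labeled.append((x, y))
        else:
            unlabeled.append(x)
    return labeled, unlabeled

# Toy 1-d instance (every choice below is an illustrative assumption).
rng = random.Random(0)
draw_x = lambda: rng.uniform(-1.0, 1.0)
draw_label = lambda x: 1 if x >= 0 else -1        # noiseless threshold at 0
near_zero = lambda x, labeled, unlabeled: abs(x) <= 0.2

labeled, unlabeled = run_stream(draw_x, draw_label, near_zero, 1000)
```

Only the points that pass the query rule cost a label; the rest are observed for free, which is what distinguishes this setting from passive learning.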

Problem setup
• $X$ is $d$-dimensional; $P_X$ is uniform (or log-concave and isotropic).
• Binary classification: labels $Y \in \{+1, -1\}$.
• Homogeneous linear classifiers $\mathrm{sign}(w \cdot X)$ with $\|w\|_2 = 1$.
• $\mathrm{err}(w) = P(\mathrm{sign}(w \cdot X) \neq Y)$.
• The Bayes optimal classifier is linear: $\arg\max_Y P(Y \mid X) = \mathrm{sign}(w^* \cdot X)$.
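To make the setup concrete, here is a minimal sketch that samples $X$ uniformly from the unit ball and estimates $\mathrm{err}(w)$ empirically. The helper names (`sample_unit_ball`, `predict`, `empirical_error`), the choice $d = 3$, and the noiseless labels are assumptions of this example.

```python
import math
import random

def sample_unit_ball(d, rng):
    """Uniform draw from the d-dimensional unit ball."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / d)            # radial CDF is rho^d
    return [radius * v / norm for v in g]

def predict(w, x):
    """Homogeneous linear classifier sign(w . x), with sign(0) taken as +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def empirical_error(w, data):
    """Fraction of (x, y) pairs that w misclassifies."""
    return sum(predict(w, x) != y for x, y in data) / len(data)

# Noiseless labels from an assumed Bayes optimal direction w_star.
rng = random.Random(0)
w_star = [1.0, 0.0, 0.0]
data = [(x, predict(w_star, x))
        for x in (sample_unit_ball(3, rng) for _ in range(2000))]

err_star = empirical_error(w_star, data)              # zero by construction
err_orthogonal = empirical_error([0.0, 1.0, 0.0], data)
```

A classifier orthogonal to `w_star` disagrees with it on about half of the ball, so `err_orthogonal` should come out near 0.5.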

Tsybakov Noise Condition
For all linear classifiers $w$ with $\|w\|_2 = 1$,
$$\mu \, \theta(w, w^*)^{\kappa} \le \mathrm{err}(w) - \mathrm{err}(w^*),$$
where $\kappa \in [1, \infty)$ is the TNC exponent and $0 < \mu < \infty$ is a constant.
[Figure: classifiers $w$ and $w^*$ separated by the angle $\theta(w, w^*)$; a plot over $X$ with vertical axis ticks 0, 0.5, 1.]
• $\kappa$ characterizes the noise in the label distribution.
• $\kappa$ makes the problem easy or hard: a small $\kappa$ implies an easier problem.
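As a worked instance of the condition (an illustrative special case, not taken from the slides), suppose the labels are noiseless, $Y = \mathrm{sign}(w^* \cdot X)$, and $P_X$ is uniform on the unit ball. The classifiers $w$ and $w^*$ then disagree exactly on a double wedge of total angle $2\theta(w, w^*)$ out of $2\pi$, so by rotational symmetry:

```latex
\mathrm{err}(w^*) = 0,
\qquad
\mathrm{err}(w) - \mathrm{err}(w^*)
  = P\big(\mathrm{sign}(w \cdot X) \neq \mathrm{sign}(w^* \cdot X)\big)
  = \frac{\theta(w, w^*)}{\pi}.
```

In this case the condition holds with equality for $\kappa = 1$ and $\mu = 1/\pi$, which is the easiest end of the range.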

Minimax active learning rates
If the Tsybakov Noise Condition (TNC) holds, then the minimax optimal active learning rate is
$$E[\mathrm{err}(w_T) - \mathrm{err}(w^*)] = \tilde{O}\big((d/T)^{\kappa/(2\kappa-2)}\big).$$
• $\kappa = \infty$: passive rate $1/\sqrt{T}$.
• $\kappa = 1$: exponential rate $e^{-T}$.
Lower bound: Castro-Nowak'06 ($d = 1$), Hanneke-Yang'14 ($d$, $P_X$), Singh-Wang'14 ($d$, lower-bounded/uniform $P_X$).
Upper bound (margin-based active learning): Balcan-Broder-Zhang'07 (uniform $P_X$), Balcan-Long'13 (log-concave + isotropic $P_X$).
Algorithms need to know $\kappa$!
Model selection for active learning: can we adapt to easy cases while being robust to the worst case?
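The exponent $\kappa/(2\kappa-2)$ is easy to tabulate. The sketch below evaluates it, and the resulting rate $(d/T)^{\kappa/(2\kappa-2)}$ with constants and log factors dropped; the function names and the treatment of the endpoints $\kappa = 1$ and $\kappa = \infty$ as limits are assumptions of this example.

```python
import math

def rate_exponent(kappa):
    """Exponent kappa / (2*kappa - 2) of the minimax active learning rate."""
    if kappa < 1:
        raise ValueError("the TNC exponent must satisfy kappa >= 1")
    if kappa == 1:
        return math.inf        # limit kappa -> 1: faster than any polynomial
    if math.isinf(kappa):
        return 0.5             # limit kappa -> infinity: passive rate
    return kappa / (2.0 * kappa - 2.0)

def rate(d, T, kappa):
    """(d/T)^exponent, ignoring constants and log factors."""
    return (d / T) ** rate_exponent(kappa)

for kappa in (1.5, 2.0, 3.0, math.inf):
    print(kappa, rate_exponent(kappa), rate(10, 1000, kappa))
```

The exponent decreases from infinity toward 1/2 as $\kappa$ grows, which is the sense in which a smaller $\kappa$ is an easier problem.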

Balcan-Broder-Zhang'07: Margin-based active learning
• Input: desired accuracy $\epsilon$, failure probability $\delta$.
• Initialize: $E$; for $e = 1, \dots, E$: epoch budgets $T_e$, search radii $R_e$, acceptance regions $b_e$, precision values $\epsilon_e$; random classifier $w_0$. (All of these depend on $\kappa$.)
• For $e = 1, \dots, E$:
    While the number of labeled examples is less than $T_e$:
        Obtain a sample $X_t$ from $P_X$.
        If $|w_{e-1} \cdot X_t| \le b_{e-1}$, query the label $Y_t$.
    Find $w_e$ that (approximately) minimizes the training error, up to precision $\epsilon_e$, on the $T_e$ labeled examples among all $w$ such that $\theta(w, w_{e-1}) \le R_{e-1}$.
• Output: $w_T = w_E$.
[Figure: the acceptance region of width $b_{e-1}$ around $w_{e-1}$, and the search radius $R_{e-1}$ within which $w_e$ is chosen.]

Adaptive margin-based active learning
• Input: query budget $T$, failure probability $\delta$, shrink rate $r$.
• Initialize: $E = \log \sqrt{T}$; for $e = 1, \dots, E$: epoch budgets $T_e = T/E$; search radius $R_0 = \pi$; acceptance region $b_0 = \infty$; random classifier $w_0$. (No knowledge of $\kappa$ is needed.)
• For $e = 1, \dots, E$:
    While the number of labeled examples is less than $T_e$:
        Obtain a sample $X_t$ from $P_X$.
        If $|w_{e-1} \cdot X_t| \le b_{e-1}$, query the label $Y_t$.
    Find $w_e$ that (approximately) minimizes the training error on the $T_e$ labeled examples among all $w$ such that $\theta(w, w_{e-1}) \le R_{e-1}$.
    $R_e = r R_{e-1}$; $b_e = 2 R_e \sqrt{E (1 + \log(1/r)) / d}$.
• Output: $w_T = w_E$.
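The adaptive procedure can be sketched in plain Python as follows. This is a rough sketch under stated assumptions rather than the author's implementation: logarithms are taken base 2, $E$ is rounded to an integer, and the constrained empirical risk minimization is replaced by a random search over `n_candidates` directions within angle $R_{e-1}$ of $w_{e-1}$. All function and parameter names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def sample_unit_ball(d, rng):
    """Uniform draw from the d-dimensional unit ball."""
    g = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    radius = rng.random() ** (1.0 / d)
    return [radius * x for x in g]

def candidate_near(w_prev, max_angle, rng):
    """Random unit vector whose angle to w_prev is at most max_angle."""
    g = [rng.gauss(0.0, 1.0) for _ in w_prev]
    proj = dot(g, w_prev)
    u = normalize([gi - proj * wi for gi, wi in zip(g, w_prev)])  # u is orthogonal to w_prev
    phi = rng.uniform(0.0, min(max_angle, math.pi))
    return [math.cos(phi) * wi + math.sin(phi) * ui for wi, ui in zip(w_prev, u)]

def training_error(w, data):
    return sum((1 if dot(w, x) >= 0 else -1) != y for x, y in data) / len(data)

def adaptive_margin_active_learning(draw_x, query_label, d, T, r=0.5,
                                    n_candidates=200, seed=0):
    rng = random.Random(seed)
    E = max(1, round(math.log2(math.sqrt(T))))   # E = log sqrt(T); base 2 is assumed
    T_e = T // E                                  # equal label budget per epoch
    R, b = math.pi, math.inf                      # R_0 = pi, b_0 = infinity
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])  # random w_0
    for _ in range(E):
        labeled = []
        while len(labeled) < T_e:
            x = draw_x()
            if abs(dot(w, x)) <= b:               # x falls in the acceptance region
                labeled.append((x, query_label(x)))
        # Stand-in for constrained ERM: random search within angle R of w.
        candidates = [w] + [candidate_near(w, R, rng) for _ in range(n_candidates)]
        w = min(candidates, key=lambda c: training_error(c, labeled))
        R = r * R
        b = 2.0 * R * math.sqrt(E * (1.0 + math.log2(1.0 / r)) / d)
    return w

# Toy run with noiseless labels from an assumed w_star.
data_rng = random.Random(1)
w_star = [1.0, 0.0, 0.0, 0.0]
draw_x = lambda: sample_unit_ball(4, data_rng)
noiseless = lambda x: 1 if dot(w_star, x) >= 0 else -1
w_hat = adaptive_margin_active_learning(draw_x, noiseless, d=4, T=400)
angle = math.acos(max(-1.0, min(1.0, dot(w_hat, w_star))))
```

Note that nothing in the schedule refers to $\kappa$ or $\mu$: the radii and acceptance regions are fixed by $T$, $d$, and $r$ alone. The random search is a crude substitute for the minimization step and is used here only to keep the sketch self-contained.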

Adaptive margin-based active learning
Let $T \ge 4$, $d \ge 4$, $r \in (0, 1/2)$, let $P_X$ be uniform on the $d$-dimensional unit ball, and let $P_{Y|X}$ satisfy TNC($\mu$, $\kappa$). Then the streaming adaptive active learning algorithm achieves, with probability at least $1 - \delta$,
$$\mathrm{err}(w_T) - \mathrm{err}(w^*) = \tilde{O}\Big(\big((d + \log(1/\delta))/T\big)^{\kappa/(2\kappa-2)}\Big)$$
for all $1 + 1/\log(1/r) \le \kappa < \infty$.
Minimax optimal rate, up to log factors, without knowing $\mu$ or $\kappa$!
Adapts to easy cases while being robust to the worst case!
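The lower end of the $\kappa$ range depends only on the shrink rate $r$. A small sketch of that threshold, assuming the logarithm is base 2 (an inference from the proof sketch, where $r = 1/2$ corresponds to $\kappa \ge 2$); the function name is hypothetical:

```python
import math

def min_adaptive_kappa(r):
    """Smallest kappa covered by the guarantee: 1 + 1 / log2(1/r)."""
    if not 0.0 < r < 1.0:
        raise ValueError("shrink rate r must lie in (0, 1)")
    return 1.0 + 1.0 / math.log2(1.0 / r)

for r in (0.5, 0.25, 0.125):
    print(r, min_adaptive_kappa(r))
```

Shrinking faster (smaller $r$) extends the guarantee to smaller $\kappa$, that is, to easier problems.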

Why does it work? (proof sketch)
Consider shrink rate $r = 1/2$. We will argue adaptivity to $\kappa \in [2, \infty)$.
Let $w_e^*$ denote the best linear classifier among all $w$ such that $\theta(w, w_{e-1}) \le R_{e-1}$, in the acceptance region $b_{e-1}$.
For all $e$, with high probability,
$$\mathrm{err}(w_e) - \mathrm{err}(w_e^*) = \tilde{O}\big(R_{e-1} (d/T)^{1/2}\big) \quad \text{(passive rate)}.$$
• For $e = 1$, we have $(d/T)^{1/2}$.
• For $e = E$, we have $d/T$, since $R_E = R_0 / 2^E = R_0 / \sqrt{T}$ (but $w_E^* \neq w^*$).
Therefore, there exists an epoch $e'$ such that, with high probability,
$$\mathrm{err}(w_{e'}) - \mathrm{err}(w_{e'}^*) = \tilde{O}\big((d/T)^{\kappa/(2\kappa-2)}\big).$$
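The radius schedule used in the sketch can be checked with a small worked example. The budget $T = 1024$ is an arbitrary illustrative choice, and the logarithm is taken base 2, as the identity $2^E = \sqrt{T}$ requires:

```latex
R_e = r^{e} R_0 = \frac{\pi}{2^{e}}, \qquad
E = \log_2 \sqrt{T} = \log_2 32 = 5, \qquad
R_E = \frac{\pi}{2^{5}} = \frac{\pi}{\sqrt{1024}} = \frac{\pi}{32} \approx 0.098 .
```

So over five epochs the search radius sweeps from $\pi$ down to about $0.1$ radians, and the epoch $e'$ in the argument is one of the scales in between.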
