Optimal and Adaptive Algorithms for Online Boosting

Alina Beygelzimer^1, Satyen Kale^1, Haipeng Luo^2
^1 Yahoo! Labs, NYC
^2 Computer Science Department, Princeton University

December 11, 2015
Boosting: An Example

Idea: combine weak "rules of thumb" to form a highly accurate predictor.

Example: email spam detection.

Given: a set of training examples.
  ◮ ("Attn: Beneficiary Contractor Foreign Money Transfer ...", spam)
  ◮ ("Let's meet to discuss QPR –Edo", not spam)

Obtain a classifier by asking a "weak learning algorithm":
  ◮ e.g. contains the word "money" ⇒ spam.

Reweight the examples so that "difficult" ones get more attention:
  ◮ e.g. spam that doesn't contain "money".

Obtain another classifier:
  ◮ e.g. empty "to address" ⇒ spam.

......

At the end, predict by taking a (weighted) majority vote.
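The reweight-and-vote loop above can be made concrete in a few lines of code. Below is a minimal batch AdaBoost-style sketch (not code from the talk); the `weak_learn` interface and its behavior are illustrative assumptions.

```python
import numpy as np

def boost(X, y, weak_learn, num_rounds):
    """Batch boosting sketch: X has shape (T, d), y has entries in {-1, +1}.

    weak_learn(X, y, w) is assumed to return a classifier h (a callable
    mapping X to {-1, +1} labels) whose weighted error is below 1/2.
    """
    T = len(y)
    w = np.ones(T) / T                      # start with uniform weights
    hypotheses, alphas = [], []
    for _ in range(num_rounds):
        h = weak_learn(X, y, w)             # ask for a weak "rule of thumb"
        err = w[h(X) != y].sum()            # weighted error, assumed in (0, 1/2)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight for this rule
        w *= np.exp(-alpha * y * h(X))      # "difficult" examples gain weight
        w /= w.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    # Final predictor: a weighted majority vote over all weak rules.
    return lambda X_new: np.sign(sum(a * h(X_new) for a, h in zip(alphas, hypotheses)))
```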
Online Boosting: Motivation

Boosting is well studied in the batch setting, but becomes infeasible when the amount of data is huge.

Online learning has proven extremely useful:
  ◮ one pass over the data, making predictions on the fly.
  ◮ works even in an adversarial environment (e.g. spam detection).

A natural question: how do we extend boosting to the online setting?
Related Work

Several algorithms exist (Oza and Russell, 2001; Grabner and Bischof, 2006; Liu and Yu, 2007; Grabner et al., 2008):
  ◮ mimic their offline counterparts.
  ◮ achieve great success in many real-world applications.
  ◮ but have no theoretical guarantees.

Chen et al. (2012): the first online boosting algorithms with theoretical guarantees:
  ◮ an online analogue of the weak learning assumption.
  ◮ a connection between online boosting and smooth batch boosting.
Batch Boosting

Given a batch of T examples (x_t, y_t) ∈ X × {−1, +1}, for t = 1, ..., T.
Learner A predicts A(x_t) ∈ {−1, +1} for example x_t.

Weak learner A (with edge γ):
    $\sum_{t=1}^{T} \mathbf{1}\{A(x_t) \neq y_t\} \leq \left(\tfrac{1}{2} - \gamma\right) T$

        ⇓  Boosting (Schapire, 1990; Freund, 1995)

Strong learner A′ (with any target error rate ǫ):
    $\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\} \leq \epsilon T$
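For intuition on the rates appearing later, recall the standard batch AdaBoost training-error bound (a classical fact, not a result of this talk): if every weak learner has edge at least γ, then after N rounds

```latex
\frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\}
  \;\le\; \exp(-2\gamma^2 N)
  \;\le\; \epsilon
  \quad\text{once}\quad
  N \;\ge\; \frac{1}{2\gamma^2}\ln\frac{1}{\epsilon},
```

which is the N = O((1/γ²) ln(1/ǫ)) dependence that the optimal online algorithm below matches.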
Online Boosting

Examples (x_t, y_t) ∈ X × {−1, +1} arrive online, for t = 1, ..., T.
Learner A observes x_t and predicts A(x_t) ∈ {−1, +1} before seeing y_t.

Weak online learner A (with edge γ and excess loss S):
    $\sum_{t=1}^{T} \mathbf{1}\{A(x_t) \neq y_t\} \leq \left(\tfrac{1}{2} - \gamma\right) T + S$

        ⇓  Online Boosting (our result)

Strong online learner A′ (with any target error rate ǫ and excess loss S′):
    $\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\} \leq \epsilon T + S'$

This talk: S = √T/γ (corresponds to √T regret).
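A minimal sketch of the online protocol behind these definitions (the interface names here are mine, not the paper's): the learner must commit to its prediction before the label is revealed, and the excess loss S absorbs whatever the learner loses before its edge kicks in.

```python
from typing import Iterable, Protocol, Tuple

class OnlineLearner(Protocol):
    def predict(self, x) -> int: ...          # returns -1 or +1
    def update(self, x, y: int) -> None: ...  # called after y is revealed

def count_mistakes(learner: OnlineLearner, stream: Iterable[Tuple[object, int]]) -> int:
    """Run the online protocol and count mistakes.

    A weak online learner keeps this below (1/2 - gamma) * T + S;
    a strong one keeps it below epsilon * T + S'.
    """
    mistakes = 0
    for x, y in stream:
        y_hat = learner.predict(x)   # commit before seeing the label
        mistakes += int(y_hat != y)
        learner.update(x, y)         # then learn from (x, y)
    return mistakes
```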
Main Results

Parameters of interest:
  ◮ N = number of weak learners (of edge γ) needed to achieve error rate ǫ.
  ◮ T_ǫ = minimal number of examples such that the error rate is ǫ.

  Algorithm           | N                  | T_ǫ           | Optimal? | Adaptive?
  --------------------|--------------------|---------------|----------|----------
  Online BBM          | O((1/γ²) ln(1/ǫ))  | Õ(1/(ǫγ²))    | √        | ×
  AdaBoost.OL         | Õ(1/(ǫγ²))         | O(1/(ǫ²γ⁴))   | ×        | √
  Chen et al. (2012)  | O(1/(ǫγ²))         | Õ(1/(ǫγ²))    | ×        | ×
Structure of Online Boosting

[Schematic] In each round (shown for t = 1): the booster receives x_1 and forwards it to the weak learners WL^1, ..., WL^N; each WL^i makes a prediction ŷ_1^i. The booster combines these into its final prediction ŷ_1, then observes the true label y_1. Finally, each WL^i is given (x_1, y_1) as a training example with probability p_1^i.
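The schematic translates into a short loop. The sketch below is a generic booster with this structure; `vote_weight` and `sample_prob` are placeholders of my own, since Online BBM and AdaBoost.OL differ precisely in how the vote weights and the update probabilities p_t^i are chosen.

```python
import random

class OnlineBooster:
    def __init__(self, weak_learners):
        self.wls = weak_learners          # N weak online learners WL^1..WL^N

    def predict(self, x):
        # Weighted majority vote over the weak predictions.
        preds = [wl.predict(x) for wl in self.wls]
        total = sum(self.vote_weight(i) * p for i, p in enumerate(preds))
        return 1 if total >= 0 else -1

    def update(self, x, y):
        # Pass (x, y) to each weak learner with some probability --
        # the online analogue of reweighting "difficult" examples.
        for i, wl in enumerate(self.wls):
            if random.random() <= self.sample_prob(i, x, y):
                wl.update(x, y)

    def vote_weight(self, i):
        return 1.0                        # placeholder: unweighted vote

    def sample_prob(self, i, x, y):
        return 1.0                        # placeholder: always update
```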