Following the Flattened Leader
Wojciech Kotłowski (1), Peter Grünwald (1), Steven de Rooij (2)
(1) National Research Institute for Mathematics and Computer Science (CWI), The Netherlands
(2) University of Cambridge
COLT 2010
Outline

1. Sequential prediction with log-loss; the set of experts is an exponential family.
2. Prediction strategies:
   - Bayes strategy: achieves optimal regret, but is usually hard to calculate.
   - "Follow the leader" strategy: simple to compute/update, but suboptimal.
3. Our contribution, the "follow the flattened leader" strategy: a slight modification of "follow the leader" that achieves the performance of Bayes while retaining the simplicity of ML.
4. Applications: prediction, coding, model selection.
Sequential Prediction

- Family of distributions (model) M = {P_µ | µ ∈ Θ}.
- Sequence of outcomes x_1, x_2, ... ∈ X^∞, revealed one by one.
- In each iteration, after observing x^n = x_1, x_2, ..., x_n, predict x_{n+1} by assigning a distribution P(· | x^n).
- After x_{n+1} is revealed, incur log-loss −log P(x_{n+1} | x^n).
- Regret w.r.t. the best "expert" from M:
    R(P, x^n) = ∑_{i=1}^n −log P(x_i | x^{i−1}) − inf_{µ ∈ Θ} ∑_{i=1}^n −log P_µ(x_i | x^{i−1}).
- Process generating the outcomes is either:
  - adversarial: only boundedness assumptions on x^n, or
  - stochastic: X_1, X_2, ... i.i.d. ∼ P*, possibly P* ∉ M; then R(P, X^n) is a random variable.
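To make the protocol concrete, here is a minimal sketch (mine, not from the talk) of the prediction loop and the regret computation, specialized to the Bernoulli case so that the hindsight infimum has a closed form; the function names are my own.

```python
import math

def cumulative_log_loss(predict, xs):
    """Sequential protocol: at step i, predict(xs[:i]) returns the predicted
    probability that the next outcome is 1; we incur -log of the probability
    the strategy assigned to the outcome that actually occurred."""
    loss = 0.0
    for i, x in enumerate(xs):
        p = predict(xs[:i])
        loss += -math.log(p if x == 1 else 1 - p)
    return loss

def bernoulli_regret(predict, xs):
    """Regret w.r.t. the best Bernoulli expert in hindsight, whose mean is
    the ML estimate mu_hat = (#1s)/n (convention: 0 * log 0 = 0)."""
    n, k = len(xs), sum(xs)
    best_loss = sum(-c * math.log(c / n) for c in (k, n - k) if c > 0)
    return cumulative_log_loss(predict, xs) - best_loss
```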
Sequential Prediction: Example

- M = {P_µ | µ ∈ [0, 1]}, P_µ Bernoulli. x^n = 1010110110.
- Best expert in M: P_µ̂_n, with µ̂_n = #1s/n (here = 3/5).
- "Follow the leader" prediction strategy: P(· | x^i) = P_µ̂_i(·). Problem: µ̂_0 is undefined, and P(x_2 | x_1) = 0 gives infinite loss ...
- Fix: P(· | x^i) = P_µ̂°_i(·), with µ̂°_i = (#1s + 1)/(i + 2) (Laplace's rule of succession).
  µ̂°_i: 1/2, 2/3, 1/2, 3/5, 1/2, 4/7, 5/8, 5/9, 3/5, 7/11, 7/12.
- If x^∞ is such that for large n, µ̂_n stays bounded away from {0, 1}:
    R(P, x^n) = (1/2) log n + O(1).
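The example is easy to check numerically. The sketch below (mine, reusing bernoulli_regret from the previous sketch) reproduces the sequence of smoothed estimates and evaluates the regret on x^n = 1010110110; it comes out to about 1.01 nats, consistent with (1/2) log n + O(1).

```python
from fractions import Fraction

def laplace(prefix):
    """Smoothed ML estimate ('follow the smoothed leader'):
    mu_i = (#1s + 1) / (i + 2), Laplace's rule of succession."""
    return Fraction(sum(prefix) + 1, len(prefix) + 2)

xs = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0]   # the sequence 1010110110

print([str(laplace(xs[:i])) for i in range(len(xs) + 1)])
# ['1/2', '2/3', '1/2', '3/5', '1/2', '4/7', '5/8', '5/9', '3/5', '7/11', '7/12']

print(bernoulli_regret(laplace, xs))  # ~1.01 nats; compare (1/2) ln 10 ~ 1.15
```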
Problem Statement

- M = {P_µ | µ ∈ Θ} is a k-parameter exponential family: Bernoulli, Gaussian, Poisson, gamma, beta, geometric, χ², ...
- Mean-value parametrization: µ = E[X].
- Bayes strategy:
    P_bayes(x_{n+1} | x^n) = ∫_Θ P_µ(x_{n+1}) dπ(µ | x^n)
  - P_bayes(· | x^n) ∉ M (strategy outside the model).
  - R(P_bayes, x^n) = (k/2) log n + O(1) (asymptotically optimal).
- Plug-in strategy:
    P_plug-in(x_{n+1} | x^n) = P_µ̄(x^n)(x_{n+1}),  with µ̄ : X^∞ → Θ
  - P_plug-in(· | x^n) ∈ M (in-model strategy).
  - ML plug-in strategy ("follow the leader") if µ̄(x^n) = µ̂°_n:
      µ̂°_n = (n_0 x_0 + ∑_{i=1}^n x_i) / (n_0 + n)   (smoothed ML estimator)
  - R(P_plug-in, x^n) ≥ c (k/2) log n + O(1); in the worst case c ≫ 1.
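For a one-parameter instance of both strategy types, take the Bernoulli family: with a conjugate Beta(a, b) prior, the Bayes integral has the closed form (#1s + a)/(n + a + b), and the plug-in uses exactly the smoothed ML formula above. A minimal sketch under these assumptions (the Beta prior and the default parameter values are my choices for illustration):

```python
def bayes_predict(prefix, a=1.0, b=1.0):
    """Bayes strategy for the Bernoulli family with a Beta(a, b) prior:
    the mixture integral over Theta reduces to the posterior mean
    (#1s + a) / (n + a + b)."""
    return (sum(prefix) + a) / (len(prefix) + a + b)

def ml_plugin_predict(prefix, n0=2.0, x0=0.5):
    """'Follow the leader' with the smoothed ML estimator:
    mu_n = (n0 * x0 + sum_i x_i) / (n0 + n)."""
    return (n0 * x0 + sum(prefix)) / (n0 + len(prefix))
```

Note that for Bernoulli the two coincide (a = b = 1 matches n_0 = 2, x_0 = 1/2): every distribution on {0, 1} is itself Bernoulli, so here the Bayes strategy happens to stay inside M. The gap between the two strategy types shows up in families such as the Gaussian in the motivating example below, where the Bayes predictive has inflated variance and falls outside M.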
Contribution

Bayes strategy (strategy outside the model): asymptotically optimal regret, (k/2) log n + O(1), but usually hard to calculate.
Plug-in strategy, incl. ML (strategy in the model): suboptimal regret, c (k/2) log n + O(1), but simple to compute/update.

"Follow the Flattened Leader": a slight modification ("flattening") of the ML plug-in strategy, "almost" in the model, achieving optimal regret. It achieves the performance of Bayes while retaining the simplicity of ML.
Motivating Example: Why Is Bayes Better than ML?

M = {N(µ, 1) : µ ∈ R}.

ML strategy prediction: N(µ̂°_n, 1).
Bayes strategy prediction: N(µ̂°_n, 1 + 1/(n + 1)).
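The inflated variance 1/(n + 1) is what protects Bayes when the data are more spread out than any single model element predicts. A minimal simulation sketch (mine, not from the talk; the choice σ = 2 and the seed are arbitrary) comparing the cumulative log-loss of the two predictions on data from N(0, σ²) with σ² > 1:

```python
import math
import random

def nll(x, mu, var):
    """Log-loss (negative log-density) of N(mu, var) at x."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)

random.seed(0)
sigma_true = 2.0        # data spread exceeds the model's fixed sigma = 1
loss_ml = loss_bayes = 0.0
s = 0.0                 # running sum of past outcomes
for n in range(1000):
    x = random.gauss(0.0, sigma_true)
    mu = s / (n + 1)    # smoothed ML estimate (n0 = 1, x0 = 0)
    loss_ml += nll(x, mu, 1.0)                     # ML predicts N(mu, 1)
    loss_bayes += nll(x, mu, 1.0 + 1.0 / (n + 1))  # Bayes: N(mu, 1 + 1/(n+1))
    s += x

# Positive and growing like log n: the flatter Bayes prediction loses less,
# which is the c >> 1 phenomenon for the in-model ML strategy.
print(loss_ml - loss_bayes)
```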