Follow the Leader if You Can, Hedge if You Must
Tim van Erven, NIPS 2013
Joint work with: Steven de Rooij, Peter Grünwald, Wouter Koolen
Outline
● Follow-the-Leader:
  – works well for `easy' data: few leader changes, i.i.d.
  – but not robust to worst-case data
● Exponential weights with simple tuning:
  – robust, but does not exploit easy data
● Second-order bounds:
  – robust against the worst case + can exploit i.i.d. data
  – but do not exploit few leader changes in general
● FlipFlop: robust + as good as FTL
Sequential Prediction with Expert Advice
● experts sequentially predict data
● Goal: predict (almost) as well as the best expert on average
● Applications:
  – online convex optimization
  – predicting electricity consumption
  – predicting air pollution levels
  – spam detection
  – ...
Set-up: Repeated Game
● Every round t = 1, 2, ...:
  1. Predict a probability distribution w_t = (w_t^1, ..., w_t^K) on K experts
  2. Observe expert losses ℓ_t = (ℓ_t^1, ..., ℓ_t^K) ∈ [0,1]^K
  3. Our loss is h_t = w_t · ℓ_t
● Goal: minimize regret R_T = Σ_{t=1}^T h_t − L*_T, where L*_T = min_k Σ_{t=1}^T ℓ_t^k is the loss of the best expert
Follow-the-Leader
● Deterministically choose the expert that has predicted best in the past:
  w_t = point mass on argmin_k L_{t-1}^k, where L_{t-1}^k = Σ_{s<t} ℓ_s^k
● Equivalently: w_t = argmin_w w · L_{t-1} over probability distributions w
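As a concrete illustration (not from the slides), the FTL prediction rule fits in a few lines; `ftl_weights` is a hypothetical helper name:

```python
import numpy as np

def ftl_weights(cum_losses):
    """Follow-the-Leader: all probability mass on the expert(s) with
    the smallest cumulative past loss (ties split uniformly)."""
    leaders = (cum_losses == cum_losses.min()).astype(float)
    return leaders / leaders.sum()

cum = np.array([2.0, 1.5, 3.0])   # cumulative losses of 3 experts
print(ftl_weights(cum))            # all mass on expert 1 (0-indexed)
```

Note that the output is a point mass except on ties, which is exactly what makes FTL deterministic and hence fragile against an adversary.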
FTL: the Good News
● Regret bounded by the number of leader changes
● Proof sketch:
  – If the leader does not change, our loss is the same as the loss of the leader, so the regret stays the same
  – If the leader does change, our regret increases by at most 1 (the range of the losses)
● Works well for i.i.d. losses, because the leader changes only finitely many times w.h.p.
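The "finitely many leader changes" claim can be checked numerically; the following small simulation (illustrative, with the Bernoulli parameters from the next slide) counts leader changes on i.i.d. losses:

```python
import numpy as np

rng = np.random.default_rng(0)
T, means = 10_000, np.array([0.1, 0.2, 0.3, 0.4])

# i.i.d. Bernoulli losses for 4 experts
losses = (rng.random((T, len(means))) < means).astype(float)
cum = np.cumsum(losses, axis=0)

leaders = np.argmin(cum, axis=1)          # leader after each round
changes = int(np.sum(leaders[1:] != leaders[:-1]))
print(changes)  # typically a small constant; leader changes stop w.h.p.
```

After an initial burst, the leader settles on the expert with the smallest mean and stops changing, so FTL's regret stays bounded.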
FTL on IID Losses ● 4 experts with Bernoulli 0.1, 0.2, 0.3, 0.4 losses
FTL Worst-case Losses
Exponential Weights
● Follow-the-Leader: w_t = argmin_w w · L_{t-1}
● Exponential weights: add KL divergence from the uniform distribution u as a regularizer:
  w_t = argmin_w { w · L_{t-1} + (1/η) KL(w ‖ u) },  i.e.  w_t^k ∝ exp(−η L_{t-1}^k)
● η = ∞: recover FTL (aggressive learning)
● As η gets closer to 0: closer to the uniform distribution (more conservative learning)
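A minimal sketch of the exponential-weights rule (the function name `hedge_weights` is my own), showing the two extremes of the learning rate:

```python
import numpy as np

def hedge_weights(cum_losses, eta):
    """Exponential weights: w_k proportional to exp(-eta * L_k).
    Large eta approaches FTL; small eta approaches uniform."""
    v = np.exp(-eta * (cum_losses - cum_losses.min()))  # shift for numerical stability
    return v / v.sum()

cum = np.array([2.0, 1.5, 3.0])
print(hedge_weights(cum, 0.01))   # nearly uniform: conservative learning
print(hedge_weights(cum, 100.0))  # nearly all mass on the leader: ~FTL
```

Subtracting the minimum before exponentiating changes nothing mathematically (it cancels in the normalization) but avoids overflow for large η.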
Simple Tuning: the Good News
● Worst-case optimal for η = √(8 ln K / T): Regret ≤ √((T/2) ln K)
● Proof idea:
  – approximate our loss: h_t = w_t · ℓ_t
  – by the mix loss: m_t = −(1/η) ln Σ_k w_t^k e^{−η ℓ_t^k}
  – and bound the approximation error: δ_t = h_t − m_t
Simple Tuning: the Good News
our loss = mix loss + approx. error: h_t = m_t + δ_t
● Cumulative mix loss is close to L*_T: Σ_t m_t ≤ L*_T + (ln K)/η
● Hoeffding's bound: δ_t ≤ η/8
● Together: Regret ≤ (ln K)/η + ηT/8; the choice η = √(8 ln K / T) balances the two terms, giving Regret ≤ √((T/2) ln K)
Lost Advantages of FTL ● Simple tuning does much worse than FTL on i.i.d. losses
Simple Tuning: the Bad News
● The bad news:
  – η → 0 as T → ∞ = conservative learning
  – In practice, better when the learning rate does not go to 0 with T! [DGGS, 2013]
  – Lost advantages of FTL!
● We want to exploit luckiness:
  – robust against worst-case losses; but
  – if the data are `easy', we should learn faster!
Luckiness: Exploiting Easy Data
● Improvement for small losses: Regret = O(√(L*_T ln K))
● Second-order bounds:
  – [CBMS, 2007] and AdaHedge: Regret = O(√(V_T ln K)), where V_T = Σ_t Var_{k∼w_t}[ℓ_t^k] is the cumulative variance of our loss
  – Related bound by [HK, 2008]
2nd-order Bounds: I.I.D. Data
● Regret bound: O(√(V_T ln K)), where V_T is the cumulative variance of our loss
● For i.i.d. data, w_t concentrates fast on the best expert, so V_T stays bounded: Regret = O(1)
2 nd -order Bounds: I.I.D. Data Recover FTL benefits for i.i.d. data
CBMS: Proof Idea
our loss = mix loss + approx. error
● Cumulative mix loss is close to L*_T: Σ_t m_t ≤ L*_T + (ln K)/η
● Bernstein's bound: δ_t ≲ η v_t, with v_t = Var_{k∼w_t}[ℓ_t^k] (for η not too large)
● Together: Regret ≲ (ln K)/η + η V_T; balancing with η ≈ √(ln K / V_T) gives Regret = O(√(V_T ln K))
AdaHedge: Proof Idea
our loss = mix loss + approx. error
● Cumulative mix loss is close to L*_T: Σ_t m_t ≤ L*_T + (ln K)/η
● No bound on δ_t: instead, measure the approximation errors directly and tune η_t = ln K / Δ_{t-1}, where Δ_{t-1} = Σ_{s<t} δ_s
  NB Bernstein's bound is pretty sharp, so in practice CBMS ≈ AdaHedge up to constants.
● Together: the balancing happens automatically, and Regret = O(√(V_T ln K))
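A minimal sketch of this scheme (my own simplified rendering, not the authors' exact pseudocode): track the cumulative approximation error Δ and retune η_t = ln(K)/Δ before every round.

```python
import numpy as np

def adahedge(loss_matrix):
    """Simplified AdaHedge sketch: eta_t = ln(K) / Delta_{t-1}, where
    Delta is the measured cumulative approximation (mixability) gap.
    loss_matrix: T x K array of expert losses in [0, 1]."""
    T, K = loss_matrix.shape
    L = np.zeros(K)       # cumulative expert losses
    Delta = 0.0           # cumulative gap: our loss minus mix loss
    total = 0.0
    for t in range(T):
        ell = loss_matrix[t]
        if Delta == 0.0:  # eta = infinity: play FTL
            w = (L == L.min()).astype(float)
            w /= w.sum()
            m = float(ell[w > 0].min())   # mix loss in the eta -> inf limit
        else:
            eta = np.log(K) / Delta
            w = np.exp(-eta * (L - L.min()))
            w /= w.sum()
            # mix loss, shifted by ell.min() for numerical stability
            m = float(ell.min() - np.log(w @ np.exp(-eta * (ell - ell.min()))) / eta)
        h = float(w @ ell)                # our (dot) loss this round
        Delta += h - m                    # h >= m always, so Delta never shrinks
        total += h
        L += ell
    return total, float(L.min())          # our loss, best expert's loss

rng = np.random.default_rng(1)
total, best = adahedge(rng.random((200, 4)))
print(total - best)  # small regret
```

Because Δ only grows, η_t is nonincreasing, which is exactly the condition under which the mix-loss lemma of [KV, 2005] (next slide) still applies.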
Tuning Online
● The balancing η in CBMS and AdaHedge depends on unknown quantities (e.g. V_T)
● Solve this by changing η_t with t
● Problem: a changing η_t breaks the cumulative mix-loss bound in general
● Lemma [KV, 2005]: If η_1 ≥ η_2 ≥ ... ≥ η_T, then Σ_t m_t ≤ L*_T + (ln K)/η_T
2nd-order Bounds: the Bad News ● Do not recover FTL benefits for other `easy' data with a small number of leader changes
Luckiness: Exploiting Easy Data
● Improvement for small losses: Regret = O(√(L*_T ln K))
● Second-order bounds:
  – [CBMS, 2007] and AdaHedge: Regret = O(√(V_T ln K))
  – Related bound by [HK, 2008]
● FlipFlop:
  – “Follow the leader if you can, Hedge if you must”
  – Regret ≤ constant × best of the AdaHedge bound and the FTL regret
FlipFlop
● FlipFlop bound: Regret ≤ constant × min(FTL regret, AdaHedge regret bound)
● Alternate Flip and Flop regimes:
  – Flip: tune η like FTL (η = ∞)
  – Flop: tune η like AdaHedge
  (No restarts of the algorithm, as in the `doubling trick'!)
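A schematic sketch of the regime alternation (my own simplification: the switching constants `phi` and `alpha` below are illustrative placeholders, not the tuned values from the paper, and the initial budget is an ad-hoc choice):

```python
import numpy as np

def flipflop(loss_matrix, phi=2.0, alpha=1.0):
    """Schematic FlipFlop sketch: alternate an FTL regime (eta = inf)
    with an AdaHedge-style regime, switching when the current regime's
    accumulated approximation error overtakes the other regime's.
    loss_matrix: T x K array of expert losses in [0, 1]."""
    T, K = loss_matrix.shape
    L = np.zeros(K)
    gap = {"flip": 0.0, "flop": 0.0}  # approximation error spent per regime
    regime = "flip"                    # start by following the leader
    total = 0.0
    for t in range(T):
        ell = loss_matrix[t]
        if regime == "flip" or gap["flop"] == 0.0:
            # eta = infinity: FTL weights; mix loss in the eta -> inf limit
            w = (L == L.min()).astype(float)
            w /= w.sum()
            m = float(ell[w > 0].min())
        else:
            # eta tuned from the Flop regime's own gap, as in AdaHedge
            eta = np.log(K) / gap["flop"]
            w = np.exp(-eta * (L - L.min()))
            w /= w.sum()
            m = float(ell.min() - np.log(w @ np.exp(-eta * (ell - ell.min()))) / eta)
        h = float(w @ ell)
        gap[regime] += h - m
        total += h
        L += ell
        # flip the regime once its error budget overtakes the other's
        if regime == "flip" and gap["flip"] > phi * gap["flop"] + 1.0:
            regime = "flop"
        elif regime == "flop" and gap["flop"] > alpha * gap["flip"]:
            regime = "flip"
    return total, float(L.min())
```

The point of the budgeting is that the two gap accumulators stay within constant factors of each other, so the total approximation error is at most a constant times whichever regime's error is smaller.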
FlipFlop: Proof Ideas ● Alternate Flip and Flop regimes – Flip: Tune like FTL – Flop: Tune like AdaHedge ● Analysing two regimes: 1. Relate mix loss for Flip to mix loss for Flop 2. Keep approximation errors balanced between regimes
1. Relating Mix Losses
● We violate the condition of the KV lemma: η_t is not decreasing (it jumps back to ∞ in every Flip regime)
● But: the mix losses of the two regimes can still be related, at the cost of a constant factor
2. Balance Approximation Errors
● Alternate regimes to keep the approximation errors of the two regimes balanced:
  Regret ≤ constant × min(FTL bound, AdaHedge bound)
Small Number of Leader Changes Again
● FlipFlop exploits easy data, AdaHedge does not
FTL Worst-case Again
Summary
● Follow-the-Leader:
  – works well for `easy' data: i.i.d., few leader changes
  – but not robust to worst-case data
● Second-order bounds (e.g. CBMS, AdaHedge):
  – robust against the worst case + can exploit i.i.d. data
  – but do not exploit few leader changes in general
● FlipFlop: best of both worlds
Luckiness: What's Missing?
● FlipFlop:
  – “Follow the leader if you can, Hedge if you must”
  – Regret ≤ best of AdaHedge and FTL (up to constants)
● But what if the optimal η is in between AdaHedge's and FTL's?
● Can we compete with the best possible η chosen in hindsight?
References
● Cesa-Bianchi and Lugosi. Prediction, Learning, and Games. 2006.
● Cesa-Bianchi, Mansour, Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3):321–352, 2007.
● Devaine, Gaillard, Goude, Stoltz. Forecasting electricity consumption by aggregating specialized experts. Machine Learning, 90(2):231–260, 2013.
● Van Erven, Grünwald, Koolen and De Rooij. Adaptive Hedge. NIPS 2011.
● Hazan, Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. COLT 2008.
● Kalai, Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
● De Rooij, Van Erven, Grünwald, Koolen. Follow the Leader If You Can, Hedge If You Must. Accepted by the Journal of Machine Learning Research, 2013.
EXTRA SLIDES
No Need to Pre-process Losses
● The common assumption ℓ_t^k ∈ [0,1] requires translating and rescaling the losses
● CBMS:
  – Extension so this is not necessary. Important when the range of the losses is unknown!
● AdaHedge and FlipFlop:
  – Invariant under rescaling and translation of the losses, so get this for free.
2nd-order Bounds: I.I.D. Data
● Regret bound: O(√(V_T ln K)), where V_T is the cumulative variance of our loss
● If w_t concentrates fast on the best expert, then V_T stays bounded: Regret = O(1)
● I.I.D. data:
  1. The balancing η is large for all t
  2. So w_t concentrates fast
  3. And then 1. also holds for the online tuning η_t
FlipFlop on I.I.D. Data
Example: Spam Detection
Example: Spam Detection
● Data: (x_1, y_1), (x_2, y_2), ... with y_t ∈ {0, 1} (spam or not spam)
● Predictions: probability p_t that y_t = 1
● Loss (probability of predicting the wrong label): ℓ_t = |p_t − y_t|
● Experts: spam detection algorithms
● If expert k predicts p_t^k, then ℓ_t^k = |p_t^k − y_t|
● Regret: our expected number of mistakes minus the expected number of mistakes of the best algorithm
FTL: the Bad News
● Consider two trivial spam detectors (experts), e.g. one that always says `spam' and one that always says `not spam'
● If we deterministically choose an expert (like FTL), then an adversary choosing y_t can make us wrong all the time: our loss = T
● Let T_1 denote the number of times expert 1 has loss 1. Then the loss of the best expert is min(T_1, T − T_1) ≤ T/2
● Linear regret = T − min(T_1, T − T_1) ≥ T/2
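This construction can be checked numerically; the following is an illustrative simulation using the standard worst-case sequence for two experts (the initial half-loss is what forces a leader change every round):

```python
import numpy as np

def ftl_pick(L):
    """Deterministic FTL: index of the expert with the smallest
    cumulative loss (ties broken by lowest index)."""
    return int(np.argmin(L))

# Worst-case sequence for 2 experts: an initial half-loss for expert 0,
# then alternating losses so the current leader is always wrong next round.
T = 100
seq = [np.array([0.5, 0.0])] + [
    np.array([0.0, 1.0]) if t % 2 == 1 else np.array([1.0, 0.0])
    for t in range(1, T)
]
L = np.zeros(2)
our_loss = 0.0
for ell in seq:
    our_loss += ell[ftl_pick(L)]
    L += ell
print(our_loss, L.min())  # our loss ~ T, best expert ~ T/2: linear regret
```

Every round after the first, FTL sits on the current leader and the sequence hands that leader a loss of 1, so FTL pays full price while each constant expert pays only about half the rounds.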