Following the Flattened Leader
Wojciech Kotłowski (1), Peter Grünwald (1), Steven de Rooij (2)
(1) National Research Institute for Mathematics and Computer Science (CWI), The Netherlands
(2) University of Cambridge
COLT 2010
Outline

1. Sequential prediction with log-loss; the set of experts is an exponential family.
2. Prediction strategies:
   - Bayes strategy: achieves optimal regret, but is usually hard to calculate.
   - "Follow the leader" strategy: simple to compute/update, but suboptimal.
3. Our contribution, the "follow the flattened leader" strategy: a slight modification of "follow the leader" that achieves the performance of Bayes while retaining the simplicity of ML.
4. Applications: prediction, coding, model selection.
Sequential Prediction

- Family of distributions (model) M = {P_µ | µ ∈ Θ}.
- Sequence of outcomes x_1, x_2, ... ∈ X^∞, revealed one by one.
- In each iteration, after observing x^n = x_1, x_2, ..., x_n, predict x_{n+1} by assigning a distribution P(· | x^n).
- After x_{n+1} is revealed, incur log-loss −log P(x_{n+1} | x^n).
- Regret w.r.t. the best "expert" from M:
    R(P, x^n) = ∑_{i=1}^n −log P(x_i | x^{i−1}) − inf_{µ ∈ Θ} ∑_{i=1}^n −log P_µ(x_i | x^{i−1}).
- Process generating the outcomes is either:
  - adversarial: only boundedness assumptions on x^n, or
  - stochastic: X_1, X_2, ... i.i.d. ∼ P*, possibly P* ∉ M; then R(P, X^n) is a random variable.
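To make the protocol concrete, here is a minimal sketch (mine, not from the talk) of the prediction loop and the regret computation, specialized to the Bernoulli case so that the hindsight infimum has a closed form; the function names are my own.

```python
import math

def cumulative_log_loss(predict, xs):
    """Sequential protocol: at step i, predict(xs[:i]) returns the predicted
    probability that the next outcome is 1; we incur -log of the probability
    the strategy assigned to the outcome that actually occurred."""
    loss = 0.0
    for i, x in enumerate(xs):
        p = predict(xs[:i])
        loss += -math.log(p if x == 1 else 1 - p)
    return loss

def bernoulli_regret(predict, xs):
    """Regret w.r.t. the best Bernoulli expert in hindsight, whose mean is
    the ML estimate mu_hat = (#1s)/n (convention: 0 * log 0 = 0)."""
    n, k = len(xs), sum(xs)
    best_loss = sum(-c * math.log(c / n) for c in (k, n - k) if c > 0)
    return cumulative_log_loss(predict, xs) - best_loss
```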
Sequential Prediction: Example

- M = {P_µ | µ ∈ [0, 1]}, P_µ Bernoulli. x^n = 1010110110.
- Best expert in M: P_µ̂_n, with µ̂_n = #1s/n (here = 3/5).
- "Follow the leader" prediction strategy: P(· | x^i) = P_µ̂_i(·). Problem: µ̂_0 is undefined, and P(x_2 | x_1) = 0 gives infinite loss ...
- Fix: P(· | x^i) = P_µ̂°_i(·), with µ̂°_i = (#1s + 1)/(i + 2) (Laplace's rule of succession).
  µ̂°_i: 1/2, 2/3, 1/2, 3/5, 1/2, 4/7, 5/8, 5/9, 3/5, 7/11, 7/12.
- If x^∞ is such that for large n, µ̂_n stays bounded away from {0, 1}:
    R(P, x^n) = (1/2) log n + O(1).
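The example is easy to check numerically. The sketch below (mine, reusing bernoulli_regret from the previous sketch) reproduces the sequence of smoothed estimates and evaluates the regret on x^n = 1010110110; it comes out to about 1.01 nats, consistent with (1/2) log n + O(1).

```python
from fractions import Fraction

def laplace(prefix):
    """Smoothed ML estimate ('follow the smoothed leader'):
    mu_i = (#1s + 1) / (i + 2), Laplace's rule of succession."""
    return Fraction(sum(prefix) + 1, len(prefix) + 2)

xs = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0]   # the sequence 1010110110

print([str(laplace(xs[:i])) for i in range(len(xs) + 1)])
# ['1/2', '2/3', '1/2', '3/5', '1/2', '4/7', '5/8', '5/9', '3/5', '7/11', '7/12']

print(bernoulli_regret(laplace, xs))  # ~1.01 nats; compare (1/2) ln 10 ~ 1.15
```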
Problem Statement

- M = {P_µ | µ ∈ Θ} is a k-parameter exponential family: Bernoulli, Gaussian, Poisson, gamma, beta, geometric, χ², ...
- Mean-value parametrization: µ = E[X].
- Bayes strategy:
    P_bayes(x_{n+1} | x^n) = ∫_Θ P_µ(x_{n+1}) dπ(µ | x^n)
  - P_bayes(· | x^n) ∉ M (strategy outside the model).
  - R(P_bayes, x^n) = (k/2) log n + O(1) (asymptotically optimal).
- Plug-in strategy:
    P_plug-in(x_{n+1} | x^n) = P_µ̄(x^n)(x_{n+1}),  with µ̄ : X^∞ → Θ
  - P_plug-in(· | x^n) ∈ M (in-model strategy).
  - ML plug-in strategy ("follow the leader") if µ̄(x^n) = µ̂°_n:
      µ̂°_n = (n_0 x_0 + ∑_{i=1}^n x_i) / (n_0 + n)   (smoothed ML estimator)
  - R(P_plug-in, x^n) ≥ c (k/2) log n + O(1); in the worst case c ≫ 1.
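For a one-parameter instance of both strategy types, take the Bernoulli family: with a conjugate Beta(a, b) prior, the Bayes integral has the closed form (#1s + a)/(n + a + b), and the plug-in uses exactly the smoothed ML formula above. A minimal sketch under these assumptions (the Beta prior and the default parameter values are my choices for illustration):

```python
def bayes_predict(prefix, a=1.0, b=1.0):
    """Bayes strategy for the Bernoulli family with a Beta(a, b) prior:
    the mixture integral over Theta reduces to the posterior mean
    (#1s + a) / (n + a + b)."""
    return (sum(prefix) + a) / (len(prefix) + a + b)

def ml_plugin_predict(prefix, n0=2.0, x0=0.5):
    """'Follow the leader' with the smoothed ML estimator:
    mu_n = (n0 * x0 + sum_i x_i) / (n0 + n)."""
    return (n0 * x0 + sum(prefix)) / (n0 + len(prefix))
```

Note that for Bernoulli the two coincide (a = b = 1 matches n_0 = 2, x_0 = 1/2): every distribution on {0, 1} is itself Bernoulli, so here the Bayes strategy happens to stay inside M. The gap between the two strategy types shows up in families such as the Gaussian in the motivating example below, where the Bayes predictive has inflated variance and falls outside M.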
Contribution

Bayes strategy (strategy outside the model): asymptotically optimal regret, (k/2) log n + O(1), but usually hard to calculate.
Plug-in strategy, incl. ML (strategy in the model): suboptimal regret, c (k/2) log n + O(1), but simple to compute/update.

"Follow the Flattened Leader": a slight modification ("flattening") of the ML plug-in strategy, "almost" in the model, achieving optimal regret. It achieves the performance of Bayes while retaining the simplicity of ML.
Motivating Example: Why Is Bayes Better than ML?

M = {N(µ, 1) : µ ∈ R}.

ML strategy prediction: N(µ̂°_n, 1).
Bayes strategy prediction: N(µ̂°_n, 1 + 1/(n + 1)).
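The inflated variance 1/(n + 1) is what protects Bayes when the data are more spread out than any single model element predicts. A minimal simulation sketch (mine, not from the talk; the choice σ = 2 and the seed are arbitrary) comparing the cumulative log-loss of the two predictions on data from N(0, σ²) with σ² > 1:

```python
import math
import random

def nll(x, mu, var):
    """Log-loss (negative log-density) of N(mu, var) at x."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)

random.seed(0)
sigma_true = 2.0        # data spread exceeds the model's fixed sigma = 1
loss_ml = loss_bayes = 0.0
s = 0.0                 # running sum of past outcomes
for n in range(1000):
    x = random.gauss(0.0, sigma_true)
    mu = s / (n + 1)    # smoothed ML estimate (n0 = 1, x0 = 0)
    loss_ml += nll(x, mu, 1.0)                     # ML predicts N(mu, 1)
    loss_bayes += nll(x, mu, 1.0 + 1.0 / (n + 1))  # Bayes: N(mu, 1 + 1/(n+1))
    s += x

# Positive and growing like log n: the flatter Bayes prediction loses less,
# which is the c >> 1 phenomenon for the in-model ML strategy.
print(loss_ml - loss_bayes)
```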