Efficient tracking of a growing number of experts
Jaouad Mourtada & Odalric-Ambrym Maillard
CMAP, École Polytechnique & Sequel, INRIA Lille – Nord Europe
ALT 2017, Kyoto University
Outline
1. Setting
2. Growing experts in the specialist setting
3. Growing experts and sequences of experts
Prediction with expert advice
- Well-studied, standard framework for online learning (see [Cesa-Bianchi and Lugosi, 2006])
- Aim: combine the forecasts of several experts ⇒ predict almost as well as the best of them
- Adversarial/worst-case setting (no stochasticity assumption on the signal)
Formal setting
$\mathcal{X}$ prediction space, $\mathcal{Y}$ signal space, $\ell : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ loss function. Experts $i = 1, \dots, M$.

Prediction with expert advice. At each time step $t = 1, 2, \dots$
1. Experts $i = 1, \dots, M$ output predictions $x_{i,t} \in \mathcal{X}$
2. Forecaster predicts $x_t \in \mathcal{X}$
3. Environment chooses signal value $y_t \in \mathcal{Y}$
4. Experts $i = 1, \dots, M$ incur loss $\ell_{i,t} := \ell(x_{i,t}, y_t)$, forecaster gets loss $\ell_t := \ell(x_t, y_t)$
Goal: a strategy for the Forecaster with controlled worst-case regret
$$R_{i,T} = L_T - L_{i,T} = \sum_{t=1}^{T} \ell_t - \sum_{t=1}^{T} \ell_{i,t}$$
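As a rough illustration of the regret definition, the sketch below computes $R_{i,T}$ for every expert from arrays of losses. The function name `regrets` and the toy data (two constant experts, a forecaster that averages them, square loss) are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def regrets(forecaster_losses, expert_losses):
    """Return R_{i,T} = L_T - L_{i,T} for every expert i.

    forecaster_losses: shape (T,), the losses l_t of the forecaster.
    expert_losses: shape (T, M), entry [t, i] is l_{i,t}.
    """
    forecaster_losses = np.asarray(forecaster_losses, dtype=float)
    expert_losses = np.asarray(expert_losses, dtype=float)
    L_T = forecaster_losses.sum()        # cumulative loss of the forecaster
    L_iT = expert_losses.sum(axis=0)     # cumulative loss of each expert
    return L_T - L_iT

# Toy example (hypothetical data): two constant experts predicting 0 and 1,
# and a forecaster that plays the plain average of their predictions.
y = np.array([0.0, 1.0, 1.0])
expert_preds = np.array([[0.0, 1.0]] * 3)
forecaster_preds = expert_preds.mean(axis=1)
R = regrets((forecaster_preds - y) ** 2, (expert_preds - y[:, None]) ** 2)
```

A negative entry of `R` means the forecaster did better than that expert over the $T$ rounds.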
Assumption on the loss function
Assumption ($\eta$-exp-concavity). The loss function $\ell$ is $\eta$-exp-concave for some $\eta > 0$, i.e. for every $y \in \mathcal{Y}$, the function $\exp(-\eta\, \ell(\cdot, y)) : \mathcal{X} \to \mathbb{R}_+$ is concave.

Important examples:
- Logarithmic, or self-information, loss: $\mathcal{X} = \mathcal{P}(\mathcal{Y})$, $\ell(p, y) = -\log p(\{y\})$
- Square loss on a bounded domain: $\mathcal{X} = \mathcal{Y} = [a, b]$, $\ell(x, y) = (x - y)^2$, $\eta = \frac{1}{2(b-a)^2}$
- NOT the absolute loss $\ell(x, y) = |x - y|$ on $[0, 1]^2$
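The examples above can be sanity-checked numerically. The sketch below tests concavity of $x \mapsto \exp(-\eta\,\ell(x, y))$ on a uniform grid through second differences; the helper `is_exp_concave_on_grid`, the grid sizes and the tolerance are choices made for this illustration. A grid check is only evidence, not a proof, and a failure for one value of $\eta$ does not by itself rule out other values.

```python
import numpy as np

def is_exp_concave_on_grid(loss, eta, a, b, y, n=201, tol=1e-9):
    """Check concavity of x -> exp(-eta * loss(x, y)) on a uniform grid of [a, b]."""
    xs = np.linspace(a, b, n)
    f = np.exp(-eta * loss(xs, y))
    # A concave function has non-positive second differences on a uniform grid.
    second_diff = f[:-2] - 2.0 * f[1:-1] + f[2:]
    return bool(np.all(second_diff <= tol))

square = lambda x, y: (x - y) ** 2
absolute = lambda x, y: np.abs(x - y)

a, b = 0.0, 1.0
eta = 1.0 / (2.0 * (b - a) ** 2)      # the value given for the square loss
ys = np.linspace(a, b, 11)
square_ok = all(is_exp_concave_on_grid(square, eta, a, b, y) for y in ys)
absolute_ok = all(is_exp_concave_on_grid(absolute, eta, a, b, y) for y in ys)
```

With these settings the square loss passes the check, while the absolute loss fails because $\exp(-\eta |x - y|)$ is convex on each side of $y$.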
The exponential weights algorithm
$x_{i,t}$: prediction of expert $i$ at time $t$.

Exponential weights/Hedge algorithm:
$$x_t = \sum_{i=1}^{M} v_{i,t}\, x_{i,t}, \qquad v_{i,t} = \frac{\pi_i\, e^{-\eta L_{i,t-1}}}{\sum_{j=1}^{M} \pi_j\, e^{-\eta L_{j,t-1}}}$$
with $\pi = (\pi_i)_{1 \le i \le M}$ a prior probability distribution on the experts.

Start with $v_1 = \pi$. At the end of round $t \ge 1$, after predicting and seeing the losses $\ell_{i,t}$, update $v_{t+1}$ by setting it to the posterior distribution $v^m_t$:
$$v_{i,t+1} = v^m_{i,t} = \frac{v_{i,t}\, e^{-\eta \ell_{i,t}}}{\sum_{j=1}^{M} v_{j,t}\, e^{-\eta \ell_{j,t}}}$$
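A minimal sketch of these updates for scalar predictions is given below. The function name `hedge`, its signature and the toy data are assumptions for illustration; the loop keeps the weights normalized at every round, which matches the posterior update above.

```python
import numpy as np

def hedge(expert_preds, outcomes, loss, eta, prior=None):
    """Run exponential weights; return forecaster predictions and final weights.

    expert_preds: shape (T, M); outcomes: shape (T,).
    """
    expert_preds = np.asarray(expert_preds, dtype=float)
    T, M = expert_preds.shape
    # v_1 = pi (uniform prior when none is given)
    v = np.full(M, 1.0 / M) if prior is None else np.asarray(prior, dtype=float)
    preds = np.empty(T)
    for t in range(T):
        preds[t] = v @ expert_preds[t]               # x_t = sum_i v_{i,t} x_{i,t}
        losses = loss(expert_preds[t], outcomes[t])  # l_{i,t} for every expert
        v = v * np.exp(-eta * losses)                # posterior update ...
        v = v / v.sum()                              # ... followed by normalization
    return preds, v

# Toy run (hypothetical data): square loss on [0, 1], so eta = 1/2.
square = lambda x, y: (x - y) ** 2
expert_preds = np.tile([0.0, 1.0], (5, 1))   # expert 1 always says 0, expert 2 always says 1
outcomes = np.ones(5)
eta = 0.5
preds, v = hedge(expert_preds, outcomes, square, eta)
best = np.min(np.sum(square(expert_preds, outcomes[:, None]), axis=0))
regret = np.sum(square(preds, outcomes)) - best
```

On this run the weight moves towards the expert that always predicts 1, and `regret` stays below $\frac{1}{\eta} \log 2$.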
Regret of the Hedge algorithm
Proposition (Vovk, Littlestone & Warmuth). If $\ell$ is $\eta$-exp-concave, the Exponential Weights algorithm with prior $\pi$ achieves the regret bound
$$\forall i = 1, \dots, M, \qquad L_T - L_{i,T} \le \frac{1}{\eta} \log \frac{1}{\pi_i}. \tag{1}$$
In particular, if $\pi = \frac{1}{M} \mathbf{1}$ is uniform,
$$L_T \le \min_{1 \le i \le M} L_{i,T} + \frac{1}{\eta} \log M. \tag{2}$$
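For readers who want the reasoning behind bound (1), here is a sketch of the standard argument. The slides state only the result, so the steps and the auxiliary notation $W_t$ are introduced here for illustration.

```latex
% W_t := \sum_j \pi_j e^{-\eta L_{j,t}} is notation introduced for this sketch; W_0 = 1.
\begin{align*}
e^{-\eta \ell_t}
  &= \exp\Big(-\eta\, \ell\Big(\sum_{i=1}^{M} v_{i,t}\, x_{i,t},\, y_t\Big)\Big)
   \;\ge\; \sum_{i=1}^{M} v_{i,t}\, e^{-\eta \ell_{i,t}}
   && \text{(exp-concavity and Jensen)} \\
  &= \frac{\sum_{i=1}^{M} \pi_i\, e^{-\eta L_{i,t}}}{\sum_{j=1}^{M} \pi_j\, e^{-\eta L_{j,t-1}}}
   \;=\; \frac{W_t}{W_{t-1}}
   && \text{(definition of } v_{i,t}\text{)} \\
e^{-\eta L_T}
  &= \prod_{t=1}^{T} e^{-\eta \ell_t}
   \;\ge\; \frac{W_T}{W_0} \;=\; W_T \;\ge\; \pi_i\, e^{-\eta L_{i,T}}
   && \text{(telescoping product)} \\
L_T - L_{i,T}
  &\le \frac{1}{\eta} \log \frac{1}{\pi_i}
   && \text{(take logarithms)}
\end{align*}
```

Bound (2) follows by taking $\pi_i = 1/M$ and minimizing over $i$.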
Sequentially incoming forecasters
- What if new experts (algorithms, methods, new data/variables...) become available over time?
- How to incorporate them, with formal regret guarantees?

Proposed setting: growing set of experts. $M_t$ increases over time and is unknown in advance; at time $t$, new experts $i = M_{t-1} + 1, \dots, M_t$ start issuing predictions.

[Figure: experts (rows) against rounds $t$ (columns); a dot marks each round in which an expert issues a prediction. Entry times: $\tau_1 = \tau_2 = 1$, $\tau_3 = 2$, $\tau_4 = \tau_5 = 4$.]
Objective
Design forecasting strategies for the "Growing number of experts" setting, with emphasis on:
- computationally inexpensive strategies: ideal complexity $O(M_t)$ at step $t$
- anytime strategies: no fixed time horizon $T$
- no a priori knowledge of $M_t$
- no free parameters to tune
- regret bounds against several classes of competitors that are adaptive to the parameters of the comparison class
Growing experts in the specialist setting
- Regret against constant experts
- The specialist setting
- From SpecialistHedge to GrowingHedge
Growing experts
Recall the framework:
- At time $t$, experts $i = 1, \dots, M_t$ issue predictions; i.e. at time $t$, $m_t := M_t - M_{t-1}$ new experts $i = M_{t-1} + 1, \dots, M_t$ enter
- $\tau_i = \inf\{t \ge 1 \mid i \le M_t\}$ is the entry time of expert $i$

First notion of regret = constant experts: for each $i$,
$$R_{i,T} = \sum_{t=\tau_i}^{T} (\ell_t - \ell_{i,t})$$
→ "specialist trick"
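To make the entry times and this notion of regret concrete, the sketch below derives $\tau_i$ from a sequence $(M_t)$ and sums the loss differences from $\tau_i$ onward. The helper names and the example sequence `M = [2, 3, 3, 5]` are assumptions; the sequence was chosen so that the entry times are $\tau_1 = \tau_2 = 1$, $\tau_3 = 2$, $\tau_4 = \tau_5 = 4$.

```python
def entry_times(M):
    """Given M[t-1] = M_t for t = 1..T (assumed nondecreasing), return {i: tau_i}.

    tau_i = min{t >= 1 : i <= M_t}; experts and rounds are 1-indexed.
    """
    taus = {}
    prev = 0
    for t, M_t in enumerate(M, start=1):
        for i in range(prev + 1, M_t + 1):   # experts M_{t-1}+1, ..., M_t enter at time t
            taus[i] = t
        prev = M_t
    return taus

def regret_since_entry(forecaster_losses, expert_losses_i, tau_i):
    """R_{i,T} = sum over t = tau_i..T of (l_t - l_{i,t}).

    Both loss sequences have length T; entries of expert_losses_i before
    tau_i are placeholders and are ignored.
    """
    pairs = zip(forecaster_losses[tau_i - 1:], expert_losses_i[tau_i - 1:])
    return sum(l - li for l, li in pairs)

# Hypothetical example: five experts entering at rounds 1, 1, 2, 4, 4.
M = [2, 3, 3, 5]
taus = entry_times(M)
# Expert 4 enters at round 4, so only the last round counts.
r4 = regret_since_entry([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.25], taus[4])
```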
[Figure: experts (rows) against rounds $t$ (columns), with entry times $\tau_1 = \tau_2 = 1$, $\tau_3 = 2$, $\tau_4 = \tau_5 = 4$. Legend: highlighted dots mark the comparison expert.]
The specialist setting
- Introduced by [Freund et al., 1997]
- Specialists $i = 1, \dots, M$; at each time step $t$, only a subset $A_t \subset \{1, \dots, M\}$ of active specialists output a prediction $x_{i,t}$
- Goal: minimize "regret" with respect to each specialist $i$:
$$R_{i,T} = \sum_{t \le T \,:\, i \in A_t} (\ell_t - \ell_{i,t})$$
The "specialist trick" [Chernov and Vovk, 2009]
- General method to turn an "expert" algorithm into a "specialist" algorithm
- Idea: "complete" the specialists' predictions by making inactive specialists $i \notin A_t$ predict the same as the forecaster:
$$x_{i,t} := x_t$$
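One way to read the trick, for scalar predictions combined by a weighted average, is sketched below. Setting $x_{i,t} := x_t$ for inactive specialists in $x_t = \sum_i v_{i,t} x_{i,t}$ and solving for $x_t$ gives the average of the active predictions under renormalized weights. The completed predictions are then fed to an ordinary exponential weights update. This is a literal transcription of the idea on the slide under those assumptions, with a hypothetical function name; it is not necessarily how the authors' SpecialistHedge is implemented.

```python
import numpy as np

def specialist_hedge_step(v, preds, active, y, loss, eta):
    """One round of exponential weights with the specialist trick.

    v: weights (M,), summing to 1; preds: (M,), ignored where inactive;
    active: boolean mask (M,) with at least one True entry.
    """
    v = np.asarray(v, dtype=float)
    preds = np.asarray(preds, dtype=float)
    active = np.asarray(active, dtype=bool)
    # Fixed point of x_t = sum_i v_i x_{i,t} when inactive specialists predict x_t.
    x_t = v[active] @ preds[active] / v[active].sum()
    completed = np.where(active, preds, x_t)        # inactive specialists predict x_t
    new_v = v * np.exp(-eta * loss(completed, y))   # usual exponential weights update
    return x_t, new_v / new_v.sum()

# Hypothetical round: the third specialist is inactive, so its entry in preds is a placeholder.
square = lambda x, y: (x - y) ** 2
v0 = np.array([0.5, 0.25, 0.25])
preds = np.array([0.0, 1.0, 0.0])
active = np.array([True, True, False])
x_t, v1 = specialist_hedge_step(v0, preds, active, y=1.0, loss=square, eta=0.5)
```

In this round the forecaster predicts $1/3$, and the weight of the second specialist, which predicted the outcome exactly, increases.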