Adversarial Online Learning with noise
Alon Resler, Yishay Mansour
Tel Aviv University (TAU)
Jun 13, 2019
Adversarial bandits
A $T$-round game between a learner and an adversary
Set of $K$ actions $A = \{1, \dots, K\}$
On round $t$:
◮ The adversary selects a loss vector $\ell_t \in \{0,1\}^K$, where $\ell_{i,t}$ is the loss associated with action $i$ at round $t$
◮ The learner chooses an action $I_t$ (usually at random)
◮ The learner incurs the loss $\ell_{I_t,t}$
◮ Finally, the learner observes feedback
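The round structure above can be simulated directly. This is a minimal sketch, assuming an oblivious adversary whose binary loss vectors are fixed in advance, and a hypothetical placeholder learner that picks $I_t$ uniformly at random (it is not the authors' algorithm). The function name `play_protocol` and its parameters are illustrative only; the sketch records what the learner observes under each feedback type.

```python
import random
from typing import List, Tuple, Union

Observation = Union[int, List[int]]

def play_protocol(losses: List[List[int]], bandit: bool,
                  seed: int = 0) -> Tuple[List[int], int, List[Observation]]:
    """Simulate the T-round game against an oblivious adversary.

    losses[t][i] is the binary loss of action i at round t (hypothetical input).
    The learner is a placeholder that picks I_t uniformly at random.
    """
    rng = random.Random(seed)
    num_actions = len(losses[0])
    actions: List[int] = []
    observations: List[Observation] = []
    total_loss = 0
    for loss_t in losses:                    # round t: adversary's vector l_t in {0,1}^K
        i_t = rng.randrange(num_actions)     # learner chooses I_t at random
        total_loss += loss_t[i_t]            # learner incurs l_{I_t,t}
        actions.append(i_t)
        # Feedback: only l_{I_t,t} under bandit feedback, the whole l_t otherwise.
        observations.append(loss_t[i_t] if bandit else list(loss_t))
    return actions, total_loss, observations

if __name__ == "__main__":
    losses = [[0, 1], [1, 0], [1, 1]]        # T = 3 rounds, K = 2 actions
    print(play_protocol(losses, bandit=True))
    print(play_protocol(losses, bandit=False))
```

With the same seed, both runs choose the same actions and incur the same loss; they differ only in how much of $\ell_t$ the learner gets to see.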
Feedback Types and Regret
Full-information feedback: the learner observes the entire loss vector $\ell_t$
Bandit feedback: the learner observes only its own loss $\ell_{I_t,t}$
The learner's goal is to minimize the expected regret:
$$\mathrm{Regret}(T) = \mathbb{E}\left[\sum_{t=1}^{T} \ell_{I_t,t} - \min_{i \in A} \sum_{t=1}^{T} \ell_{i,t}\right]$$
We say that an algorithm has vanishing regret if $\mathrm{Regret}(T) = o(T)$, i.e., its average regret per round tends to zero
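To make the regret definition concrete, here is a minimal sketch that computes the expected regret of Hedge (exponential weights), a standard full-information algorithm, on a fixed loss sequence. It assumes an oblivious adversary, so the expectation over $I_t$ can be computed exactly from the learner's distribution rather than by sampling. Hedge is used only to illustrate the definition; it is not the noise-robust method from this work, and the function name and step size `eta` are illustrative choices.

```python
import math
from typing import List

def hedge_expected_regret(losses: List[List[float]], eta: float) -> float:
    """Expected regret of Hedge on a fixed (oblivious) loss sequence.

    losses[t][i] in [0, 1] is the loss of action i at round t.
    """
    num_actions = len(losses[0])
    weights = [1.0] * num_actions
    expected_loss = 0.0
    for loss_t in losses:
        total_w = sum(weights)
        probs = [w / total_w for w in weights]                      # distribution of I_t
        expected_loss += sum(p * l for p, l in zip(probs, loss_t))  # E[l_{I_t,t}]
        # Full information: every action's loss is observed and used in the update.
        # Starting from the normalized probs keeps the weights from underflowing.
        weights = [p * math.exp(-eta * l) for p, l in zip(probs, loss_t)]
    best_fixed = min(sum(loss_t[i] for loss_t in losses) for i in range(num_actions))
    return expected_loss - best_fixed

if __name__ == "__main__":
    T, K = 100, 2
    eta = math.sqrt(8 * math.log(K) / T)   # standard tuning for losses in [0, 1]
    print(hedge_expected_regret([[0.0, 1.0]] * T, eta))
```

With this tuning, Hedge's expected regret is at most $\sqrt{(T \ln K)/2}$ for losses in $[0,1]$ (without noise), which is $o(T)$, so it has vanishing regret in the sense defined above.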
Our work
We study online learning settings in which the feedback is corrupted by random noise
We consider binary losses XORed with the noise, which is a Bernoulli random variable: an observed loss is flipped exactly when the noise bit is 1
We consider both settings: bandit feedback and full-information feedback
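The XOR noise model can be sketched in a few lines. This sketch assumes a known, constant noise rate, written here as the hypothetical symbol `p` (the slides only say the noise is a Bernoulli random variable). It also shows a standard debiasing trick: since $\mathbb{E}[\text{observed}] = p + \ell\,(1 - 2p)$, the quantity $(\text{observed} - p)/(1 - 2p)$ is an unbiased estimate of the true binary loss $\ell$. This is a generic illustration, not a claim about how the authors' algorithms work; the function names are illustrative.

```python
import random
from typing import List

def noisy_observation(loss_vector: List[int], p: float, rng: random.Random) -> List[int]:
    """XOR each binary loss with an independent Bernoulli(p) noise bit."""
    return [loss ^ (1 if rng.random() < p else 0) for loss in loss_vector]

def debiased_loss(observed: int, p: float) -> float:
    """Unbiased estimate of a binary loss from one noisy bit, for a known p < 1/2.

    E[observed] = p + loss * (1 - 2p), so (observed - p) / (1 - 2p) has mean loss.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("needs a known noise rate 0 <= p < 1/2")
    return (observed - p) / (1.0 - 2.0 * p)

if __name__ == "__main__":
    rng = random.Random(0)
    p, true_losses, n = 0.3, [0, 1, 1], 20000
    sums = [0.0] * len(true_losses)
    for _ in range(n):
        for i, y in enumerate(noisy_observation(true_losses, p, rng)):
            sums[i] += debiased_loss(y, p)
    print([s / n for s in sums])   # averages approach true_losses as n grows
```

The division by 1 - 2p is why this only works when p is known and strictly below 1/2: at p = 1/2 the observed bit is independent of the loss, and as p approaches 1/2 the estimate's magnitude, and hence its variance, grows without bound.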
Results Summary
Feedback type \ Noise model        | Constant noise                                          | Variable noise (Uniform)
Full information (known noise)     | $\Theta\left(\frac{1}{\epsilon}\sqrt{T \ln K}\right)$         | $\Theta\left(T^{2/3} \ln^{1/3} K\right)$
Full information (unknown noise)   | $\Theta\left(\frac{1}{\epsilon}\sqrt{T \ln K}\right)$         | $\Theta(T)$
Bandit (known noise)               | $\tilde{\Theta}\left(\frac{1}{\epsilon}\sqrt{TK}\right)$      | $\tilde{\Theta}\left(T^{2/3} K^{1/3}\right)$
Bandit (unknown noise)             | $\tilde{\Theta}\left(\frac{1}{\epsilon}\sqrt{TK}\right)$      | $\Theta(T)$
Poster @ Pacific Ballroom #156