1/7 Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs
Han Shao∗, Xiaotian Yu∗, Irwin King and Michael R. Lyu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
NeurIPS, Dec. 2018
2/7 Linear Stochastic Bandits (LSB)
[Figure: exploration vs. exploitation trade-off, previous setting vs. learning setting, with arms x_{1,t}, x_{4,t} ∈ R^d]
▶ 1. Given a set of arms represented by D ⊆ R^d
▶ 2. At time t, select an arm x_t ∈ D, and observe y_t(x_t) = ⟨x_t, θ∗⟩ + η_t
▶ 3. The goal is to maximize ∑_{t=1}^T E[y_t(x_t)]
▶ 4. η_t follows a sub-Gaussian distribution (E[η_t²] < ∞)
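The interaction protocol in steps 1–3 can be sketched as a simple loop. Everything numeric here (dimension, arm count, horizon, the uniform arm-selection policy) is an illustrative placeholder, not one of the paper's algorithms:

```python
import numpy as np

# Sketch of the LSB protocol from the slide; d, K, T and the uniform
# policy are illustrative placeholders, not the paper's method.
rng = np.random.default_rng(0)
d, K, T = 5, 10, 1000

theta_star = rng.normal(size=d)        # unknown parameter theta*
D = rng.normal(size=(K, d))            # arm set D, here K points in R^d

payoff = 0.0
for t in range(T):
    x_t = D[rng.integers(K)]           # placeholder policy: uniform over arms
    eta_t = rng.normal()               # sub-Gaussian noise (classical setting)
    y_t = x_t @ theta_star + eta_t     # observed payoff y_t(x_t)
    payoff += y_t

oracle = T * float(np.max(D @ theta_star))   # best fixed arm in hindsight
print(payoff, oracle)
```

A bandit algorithm replaces the uniform policy with one that trades off exploring arms against exploiting the current estimate of θ∗.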
3/7 What Is A Heavy-Tailed Distribution?
[Figure: Gaussian density vs. empirical NASDAQ returns]
▶ High-probability extreme returns in financial markets
▶ Many other real cases
Practical scenarios:
1. Delays in communication networks (Liebeherr et al., 2012)
2. Analysis of biological data (Burnecki et al., 2015)
3. ...
4/7 LSB with Heavy-Tailed Payoffs
Problem definition:
▶ Multi-armed bandits (MAB) with heavy-tailed payoffs (Bubeck et al., 2013):
  E[|η_t|^{1+ϵ}] < +∞, where ϵ ∈ (0, 1]   (1)
▶ Our setting: LSB with η_t satisfying Eq. (1)
▶ Weaker assumption than sub-Gaussian
▶ Medina and Yang (2016) studied LSB with heavy-tailed payoffs

              sub-Gaussian   heavy-tailed (ϵ = 1)
  MAB         O(T^{1/2})     Õ(T^{1/2}) by Bubeck et al. (2013)
  LSB         O(T^{1/2})     Õ(T^{3/4}) by Medina and Yang (2016)

▶ Can we achieve Õ(T^{1/2})?
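A hedged illustration of assumption (1): noise drawn from a Student-t distribution with 1.5 degrees of freedom (a hypothetical choice, not from the slides) has finite moments of every order below 1.5, so it satisfies E[|η|^{1+ϵ}] < ∞ for ϵ = 0.4, while its variance is infinite. Such noise is heavy-tailed and not sub-Gaussian:

```python
import numpy as np

# Student-t with df = 1.5: moments of order < 1.5 exist, so for eps = 0.4
# the (1+eps)-th absolute moment is finite, but E[eta^2] diverges.
# The df and eps values are illustrative, not from the paper.
rng = np.random.default_rng(1)
eps = 0.4
eta = rng.standard_t(df=1.5, size=200_000)

m_low = np.mean(np.abs(eta) ** (1 + eps))  # stabilises as the sample grows
m_two = np.mean(eta ** 2)                  # keeps growing with the sample
print(m_low, m_two)
```

Re-running with larger samples shows `m_low` settling near a constant while `m_two` drifts upward, which is exactly the regime Eq. (1) allows and the sub-Gaussian assumption rules out.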
5/7 Algorithm: Median of means under OFU (MENU)
[Figure: framework comparison with MoM by Medina and Yang (2016)]
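MENU builds on the median-of-means estimator: split the samples into groups, average each group, and take the median of the group averages, which is robust to heavy-tailed outliers. A minimal scalar sketch (the sample size and group count are illustrative; the paper applies the idea to regression estimates inside OFU confidence sets):

```python
import numpy as np

# Median of means for a scalar mean: split n samples into k groups,
# average each group, return the median of the k group averages.
def median_of_means(samples, k):
    groups = np.array_split(samples, k)
    return float(np.median([g.mean() for g in groups]))

# Heavy-tailed samples (Student-t, df = 1.5) shifted to true mean 10.
rng = np.random.default_rng(2)
x = rng.standard_t(df=1.5, size=9_000) + 10.0

print(median_of_means(x, k=9), x.mean())
```

A few extreme draws can drag the plain sample mean far from 10, but they corrupt at most a few of the 9 group averages, and the median ignores those.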
6/7 Regret Bounds
▶ Upper bounds:
  algorithm                     regret
  MoM (Medina and Yang, 2016)   Õ(T^{(2+ϵ)/(2(1+ϵ))})
  MENU (ours)                   Õ(T^{1/(1+ϵ)})
  CRT (Medina and Yang, 2016)   Õ(T^{(1+2ϵ)/(1+3ϵ)})
  TOFU (ours)                   Õ(T^{1/(1+ϵ)})
▶ Lower bound: Ω(T^{1/(1+ϵ)})
▶ When ϵ = 1, our algorithms achieve Õ(T^{1/2})
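A quick numeric check of the T-exponents at ϵ = 1 (finite variance), assuming the exponents (2+ϵ)/(2(1+ϵ)) for MoM, (1+2ϵ)/(1+3ϵ) for CRT, and 1/(1+ϵ) for MENU and TOFU; at ϵ = 1 these reduce to the T^{3/4} and T^{1/2} rates quoted earlier in the deck:

```python
# Exponent of T in each regret bound, as a function of eps in (0, 1].
def exponents(eps):
    return {
        "MoM":  (2 + eps) / (2 * (1 + eps)),
        "CRT":  (1 + 2 * eps) / (1 + 3 * eps),
        "MENU": 1 / (1 + eps),
        "TOFU": 1 / (1 + eps),
    }

print(exponents(1.0))  # {'MoM': 0.75, 'CRT': 0.75, 'MENU': 0.5, 'TOFU': 0.5}
```

At ϵ = 1 the new upper bounds meet the Ω(T^{1/(1+ϵ)}) lower bound's exponent of 1/2, which is why the algorithms are "almost optimal".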
7/7 See You at the Poster Session
Time: Dec. 5th, 10:45 AM – 12:45 PM
Location: Room 210 & 230 AB, #158