Bayesian Optimization under Heavy-tailed Payoffs

Sayak Ray Chowdhury
Joint work with Aditya Gopalan
Department of ECE, Indian Institute of Science

NeurIPS, Dec. 2019
Black-box optimization

Problem: Maximize an unknown utility function $f : D \to \mathbb{R}$ by

- Sequentially querying $f$ at inputs $x_1, x_2, \ldots, x_T$, and
- Observing noisy function evaluations: $y_t = f(x_t) + \epsilon_t$

[Figure: an example utility function $f(x)$ plotted over $x \in [0, 1]$]

Want: Low cumulative regret: $\sum_{t=1}^{T} \big( f(x^\star) - f(x_t) \big)$
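To make the setup concrete, here is a minimal sketch of the query-observe-regret loop in Python. The toy utility function, the placeholder (uniformly random) query rule, and the Student's-$t$ noise standing in for heavy-tailed payoffs are all illustrative assumptions, not from the paper.

```python
# A minimal sketch of the problem setup: query a toy utility function under
# heavy-tailed (Student's-t) noise and track cumulative regret.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) * (1 - x)      # toy utility on D = [0, 1]
grid = np.linspace(0, 1, 200)              # discretized domain D
f_star = f(grid).max()                     # best achievable value f(x*)

T, regret = 50, 0.0
for t in range(T):
    x_t = rng.choice(grid)                 # placeholder query rule
    y_t = f(x_t) + rng.standard_t(df=2)    # heavy-tailed observation
    regret += f_star - f(x_t)              # instantaneous regret
print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

A real algorithm replaces the random query rule with one that uses past observations, which is exactly what the two algorithms below do.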
Heavy-tailed noise

Motivation: Significant chance of very high/low values
- Corrupted measurements
- Bursty traffic flow distributions
- Price fluctuations in financial and insurance data
e.g., Student's-$t$, Pareto, Cauchy, etc.

Existing works assume light-tailed noise (e.g., Srinivas et al. '11, Hernández-Lobato et al. '14, ...)

Question: Bayesian optimization algorithms with guarantees under heavy-tailed noise?
Algorithm 1: Truncated GP-UCB (TGP-UCB)

Unknown function $f$ modeled by a Gaussian process: $f \sim GP(0, k)$

At round $t$:
1. Choose the query point $x_t$ using the current GP posterior and a suitable parameter $\beta_t$:
   $x_t = \operatorname{argmax}_{x \in D}\ \mu_{t-1}(x) + \beta_t \sigma_{t-1}(x)$
2. Truncate the observed payoff $y_t$ using a suitable threshold $b_t$:
   $\hat{y}_t = y_t \mathbb{1}_{\{|y_t| \le b_t\}}$
3. Update the GP posterior $(\mu_t, \sigma_t)$ with the new observation $(x_t, \hat{y}_t)$:
   $\mu_t(x) = k_t(x)^T (K_t + \lambda I)^{-1} [\hat{y}_1, \ldots, \hat{y}_t]^T$
   $\sigma_t^2(x) = k(x, x) - k_t(x)^T (K_t + \lambda I)^{-1} k_t(x)$
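A minimal sketch of one TGP-UCB round under the model above, assuming an RBF kernel, a finite candidate set D, and a non-empty history; $\beta_t$, $b_t$, the lengthscale, and $\lambda$ are left as free parameters, since the paper sets them through its analysis.

```python
# One round of TGP-UCB: UCB query (step 1), truncation (step 2),
# posterior update via the new (x_t, y_hat_t) pair (step 3).
import numpy as np

def rbf(A, B, ls=0.1):
    """RBF kernel matrix: k(a, b) = exp(-|a - b|^2 / (2 ls^2))."""
    d = A[:, None] - B[None, :]
    return np.exp(-d**2 / (2 * ls**2))

def tgp_ucb_round(X, y_hat, D, f_noisy, beta_t, b_t, lam=1.0):
    """X: past queries, y_hat: past truncated payoffs, D: candidate grid."""
    K = rbf(X, X) + lam * np.eye(len(X))              # K_t + lambda I
    k_D = rbf(D, X)                                   # rows are k_t(x)^T
    alpha = np.linalg.solve(K, y_hat)
    mu = k_D @ alpha                                  # posterior mean
    var = 1.0 - np.sum(k_D * np.linalg.solve(K, k_D.T).T, axis=1)
    x_t = D[np.argmax(mu + beta_t * np.sqrt(np.maximum(var, 0)))]  # step 1
    y_t = f_noisy(x_t)
    y_hat_t = y_t if abs(y_t) <= b_t else 0.0         # step 2: truncation
    return np.append(X, x_t), np.append(y_hat, y_hat_t)            # step 3
```

The truncation is the only change relative to plain GP-UCB: a single heavy-tailed outlier can no longer drag the posterior mean arbitrarily far, at the cost of a small, controlled bias.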
Regret bounds

Assumption on heavy-tailed payoffs: $E\big[|y_t|^{1+\alpha}\big] < +\infty$ for $\alpha \in (0, 1]$

Algorithm                 | Payoff       | Regret
GP-UCB (Srinivas et al.)  | sub-Gaussian | $O\big(\gamma_T \sqrt{T}\big)$
TGP-UCB (this paper)      | heavy-tailed | $O\big(\gamma_T\, T^{\frac{2+\alpha}{2(1+\alpha)}}\big)$

$\alpha = 1 \Rightarrow$ regret $\tilde{O}\big(T^{3/4}\big)$

We also give an $\Omega\big(T^{\frac{1}{1+\alpha}}\big)$ regret lower bound for any algorithm.

Question: Can we achieve $\tilde{O}\big(T^{\frac{1}{1+\alpha}}\big)$ regret scaling? Answer: YES
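As a quick sanity check on the exponents in the table, substituting $\alpha = 1$ (finite variance, the lightest tail the assumption covers):

```latex
% alpha = 1 in the TGP-UCB upper bound and in the lower bound:
\[
  \left.\frac{2+\alpha}{2(1+\alpha)}\right|_{\alpha=1} = \frac{3}{4}
  \;\Rightarrow\; \tilde{O}\!\big(\gamma_T\, T^{3/4}\big),
  \qquad
  \left.\frac{1}{1+\alpha}\right|_{\alpha=1} = \frac{1}{2}
  \;\Rightarrow\; \Omega\!\big(T^{1/2}\big),
\]
```

so at $\alpha = 1$ there is a $T^{1/4}$ gap between TGP-UCB and the lower bound, which Algorithm 2 closes.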
Algorithm 2: Adaptively Truncated Approximate GP-UCB (ATA-GP-UCB)

Idea: UCB with kernel approximation + feature-adaptive truncation:
$x_t = \operatorname{argmax}_{x \in D}\ \tilde{\mu}_{t-1}(x) + \beta_t \tilde{\sigma}_{t-1}(x)$

Kernel approximation: Compute
$V_t = \sum_{s=1}^{t} \phi_t(x_s) \phi_t(x_s)^T + \lambda I$  ($m_t$ rows and $m_t$ columns)
$U_t = V_t^{-1/2} [\phi_t(x_1), \ldots, \phi_t(x_t)]$  ($m_t$ rows and $t$ columns)

Feature-adaptive truncation: Take the entrywise (Hadamard) product of $U_t$ with the payoffs $[y_1, \ldots, y_t]$ repeated across its rows, truncate each entry, and find the row sums $r_1, r_2, \ldots, r_{m_t}$:
$r_i = \sum_{s=1}^{t} u_{is} y_s \mathbb{1}_{\{|u_{is} y_s| \le b_t\}}$  ($u_i$ is the $i$-th row of $U_t$)

Approximate posterior GP:
$\tilde{\mu}_t(x) = \phi_t(x)^T V_t^{-1/2} [r_1, \ldots, r_{m_t}]^T$
$\tilde{\sigma}_t^2(x) = k(x, x) - \phi_t(x)^T \phi_t(x) + \lambda \phi_t(x)^T V_t^{-1} \phi_t(x)$
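A minimal sketch of the approximate posterior with feature-adaptive truncation, using random Fourier features as a stand-in for the paper's feature map $\phi_t$ (the paper constructs $\phi_t$ from its own kernel approximation); the feature count, $b_t$, $\lambda$, and the feature parameters W, c are illustrative assumptions.

```python
# Approximate GP posterior with feature-adaptive truncation (sketch).
import numpy as np

def fourier_features(x, W, c):
    """Random Fourier feature map phi(x), approximating an RBF kernel."""
    return np.sqrt(2.0 / len(W)) * np.cos(np.outer(x, W) + c)   # (n, m)

def ata_posterior(X, y, x_query, W, c, b_t, lam=1.0):
    Phi = fourier_features(X, W, c)                  # rows: phi_t(x_s)^T
    m = Phi.shape[1]
    V = Phi.T @ Phi + lam * np.eye(m)                # V_t, m x m
    # U_t = V_t^{-1/2} Phi^T, via an eigendecomposition of V_t
    eigval, eigvec = np.linalg.eigh(V)
    V_inv_sqrt = eigvec @ np.diag(eigval**-0.5) @ eigvec.T
    U = V_inv_sqrt @ Phi.T                           # m x t
    E = U * y[None, :]                               # entrywise u_is * y_s
    r = np.sum(E * (np.abs(E) <= b_t), axis=1)       # truncated row sums r_i
    phi_q = fourier_features(np.atleast_1d(x_query), W, c)[0]
    mu = phi_q @ (V_inv_sqrt @ r)                    # approximate mean
    var = 1.0 - phi_q @ phi_q + lam * phi_q @ np.linalg.solve(V, phi_q)
    return mu, var
```

The key design choice is that truncation is applied per feature coordinate $u_{is} y_s$ rather than per raw payoff $y_t$, which adapts the threshold to how much each observation actually influences the posterior and is what yields the improved $\tilde{O}\big(T^{\frac{1}{1+\alpha}}\big)$-type scaling.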
See you at the poster session

Bayesian Optimization under Heavy-tailed Payoffs
Poster #11, Tue Dec 10th, 05:30 – 07:30 PM @ East Exhibition Hall B + C

Acknowledgements:
1. Tata Trusts travel grant
2. Google India PhD fellowship grant
3. DST Inspire research grant