Algorithm 1: Upper Confidence Bounds in GP Bandits

Model f ∼ GP(0, κ). Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010).

Construct the upper confidence bound: φ_t(x) = µ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x).
Maximise the upper confidence bound.

(Figure: posterior mean and confidence band over f(x); x_t is the maximiser of φ_t.)
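A minimal sketch of the GP posterior computation behind µ_{t−1} and σ_{t−1}, assuming a zero-mean prior with a squared-exponential kernel κ; the bandwidth, jitter value, and function names (se_kernel, gp_posterior) are illustrative choices, not from the slides:

```python
import numpy as np

def se_kernel(A, B, bandwidth=0.2):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    sq_dists = (A[:, None] - B[None, :]) ** 2
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-4):
    """Posterior mean mu and std-dev sigma of f ~ GP(0, kappa) at X_query."""
    K = se_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = se_kernel(X_query, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)
    # Posterior variance: kappa(x, x) - kappa(x, X) K^{-1} kappa(X, x).
    v = np.linalg.solve(K, K_star.T)
    var = 1.0 - np.sum(K_star * v.T, axis=1)   # kappa(x, x) = 1 for the SE kernel
    return mu, np.sqrt(np.maximum(var, 0.0))
```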
GP-UCB

   x_t = argmax_x  µ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x)

◮ µ_{t−1}: Exploitation
◮ σ_{t−1}: Exploration
◮ β_t controls the tradeoff. β_t ≍ log t.

GP-UCB, κ is an SE kernel (Srinivas et al. 2010):

   S_n = f(x⋆) − max_{t=1,…,n} f(x_t) ≲ √( vol(X) log(n)^d / n )   w.h.p.
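A hedged sketch of the full GP-UCB loop over a discretised 1-D domain, reusing se_kernel and gp_posterior from the sketch above; the grid size and the constant in the β_t ≍ log t schedule are illustrative assumptions:

```python
def gp_ucb(f, n_rounds=25, grid_size=200):
    """Run GP-UCB on [0, 1]: query the maximiser of the upper confidence bound."""
    X_grid = np.linspace(0.0, 1.0, grid_size)
    X_obs, y_obs = [], []
    for t in range(1, n_rounds + 1):
        beta_t = 2.0 * np.log(t + 1)                             # beta_t ~ log t
        if X_obs:
            mu, sigma = gp_posterior(np.array(X_obs), np.array(y_obs), X_grid)
        else:
            mu, sigma = np.zeros(grid_size), np.ones(grid_size)  # prior GP(0, kappa)
        phi = mu + np.sqrt(beta_t) * sigma                       # phi_t(x): exploit + explore
        x_t = X_grid[np.argmax(phi)]                             # maximise the UCB
        X_obs.append(x_t)
        y_obs.append(f(x_t))
    return max(y_obs)
```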
GP-UCB (Srinivas et al. 2010)

(Figures: GP-UCB run on a 1-D example f(x), showing the points queried at t = 1, 2, 3, 4, 5, 6, 7, 11, 25.)
Algorithm 2: Thompson Sampling in GP Bandits

Model f ∼ GP(0, κ). Thompson Sampling (TS) (Thompson, 1933).

Draw a sample g from the posterior. Choose x_t = argmax_x g(x).

(Figure: a posterior draw g over f(x); x_t maximises g.)
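A matching Thompson-sampling sketch under the same illustrative assumptions (1-D grid, SE kernel, se_kernel from the earlier sketch); the posterior draw needs the full covariance over the grid, so it is recomputed each round:

```python
def gp_thompson(f, n_rounds=25, grid_size=200, noise_var=1e-4):
    """Thompson sampling: query the maximiser of one GP posterior draw."""
    X_grid = np.linspace(0.0, 1.0, grid_size)
    X_obs, y_obs = [], []
    rng = np.random.default_rng(0)
    for t in range(n_rounds):
        if X_obs:
            X_a, y_a = np.array(X_obs), np.array(y_obs)
            K = se_kernel(X_a, X_a) + noise_var * np.eye(len(X_a))
            K_star = se_kernel(X_grid, X_a)
            mu = K_star @ np.linalg.solve(K, y_a)
            cov = se_kernel(X_grid, X_grid) - K_star @ np.linalg.solve(K, K_star.T)
        else:
            mu, cov = np.zeros(grid_size), se_kernel(X_grid, X_grid)        # prior
        g = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(grid_size))     # sample g from posterior
        x_t = X_grid[np.argmax(g)]                                          # x_t = argmax_x g(x)
        X_obs.append(x_t)
        y_obs.append(f(x_t))
    return max(y_obs)
```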
Thompson Sampling (TS) in GPs (Thompson, 1933)

(Figures: TS run on a 1-D example f(x), showing the points queried at t = 1, 2, 3, 4, 5, 6, 7, 11, 25.)
Outline

◮ Part I: Stochastic bandits (cont'd)
  1. Gaussian processes for smooth bandits
  2. Algorithms: Upper Confidence Bound (UCB) & Thompson Sampling (TS)
◮ Digression: SL2College Research Collaboration Program
◮ Part II: My research
  1. Multi-fidelity bandits: cheap approximations to an expensive experiment
  2. Parallelising arm pulls
SL2College
www.sl2college.org
SL2College Research Collaboration Program
- Ashwin de Silva
www.sl2college.org/research-collab
research-collab@sl2college.org
SL2College Research Collaboration Program

How it works
We have a pool of doctoral/post-doctoral/professorial mentors (all Sri Lankan). We connect Sri Lankan undergrads to mentors, who will guide the students on a research project.
Aim: Publish a paper (at a good venue) within a 9-15 month time frame.
Application Process

◮ Fill out the application form on our webpage: www.sl2college.org/research-collab
  - mention your areas of interest and preferred mentors.
◮ ... and email your CV to research-collab@sl2college.org.
◮ If we decide to proceed, we ask you to submit a ∼1 page research statement covering
  - your research interests & future plans,
  - why you are interested in working with the chosen mentor.
◮ We send your CV & statement to the mentor. If he/she is interested, we initiate a collaboration.
◮ You report to us once every 3 months.
SL2College Research Collaboration Team
Ashwin   Nuwan   Rajitha   Umashanthi   Kirthevasan
www.sl2college.org/research-collab
research-collab@sl2college.org
Outline

◮ Part I: Stochastic bandits (cont'd)
  1. Gaussian processes for smooth bandits
  2. Algorithms: Upper Confidence Bound (UCB) & Thompson Sampling (TS)
◮ Digression: SL2College Research Collaboration Program
◮ Part II: My research
  1. Multi-fidelity bandits: cheap approximations to an expensive experiment
  2. Parallelising arm pulls
Part 2.1: Multi-fidelity Bandits

Motivating question: What if we have cheap approximations to f?

1. Computational astrophysics and other scientific experiments: simulations and numerical computations with less granularity.
   (Diagram: a cosmological simulator maps the Hubble constant and baryonic density to a likelihood score against the observations.)
2. Hyper-parameter tuning: train & validate with a subset of the data.
3. Robotics & autonomous driving: computer simulation vs real-world experiment.
Multi-fidelity Methods

For specific applications,
◮ Industrial design (Forrester et al. 2007)
◮ Hyper-parameter tuning (Agarwal et al. 2011, Klein et al. 2015, Li et al. 2016)
◮ Active learning (Zhang & Chaudhuri 2015)
◮ Robotics (Cutler et al. 2014)

Multi-fidelity bandits & optimisation (Huang et al. 2006, Forrester et al. 2007, March & Wilcox 2012, Poloczek et al. 2016)

. . . with theoretical guarantees (Kandasamy et al. NIPS 2016a&b, Kandasamy et al. ICML 2017)
Multi-fidelity Bandits (Kandasamy et al. ICML 2017)

A fidelity space Z and domain X.
  Z ← all granularity values
  X ← space of cosmological parameters

g: Z × X → R.
  g(z, x) ← likelihood score when performing integrations on a grid of size z at cosmological parameters x.

Denote f(x) = g(z•, x) where z• ∈ Z. z• = highest grid size.

End Goal: Find x⋆ = argmax_x f(x).

A cost function, λ: Z → R+. λ(z) = O(z^p) (say).

(Figures: the surface g(z, x) over Z × X, with the top-fidelity slice f(x) = g(z•, x) and its maximiser x⋆; the cost curve λ(z), increasing up to z•.)
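A toy instantiation of this setup, purely for illustration; the particular form of g, the bias term, and the cost exponent p = 3 are invented for the example, not from the paper:

```python
import numpy as np

Z_STAR = 1.0    # z_bullet: the highest fidelity
P = 3           # cost exponent in lambda(z) = O(z^p)

def g(z, x):
    """Toy two-argument objective: lower fidelities are biased versions of f."""
    f_x = np.exp(-(x - 0.6) ** 2 / 0.05)          # f(x) = g(z_bullet, x)
    bias = (Z_STAR - z) * np.cos(8 * np.pi * x)   # approximation error, vanishes at z = z_bullet
    return f_x + 0.5 * bias

def cost(z):
    """Cost lambda(z) of a single evaluation at fidelity z."""
    return z ** P
```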
Multi-fidelity Simple Regret (Kandasamy et al. ICML 2017)

End Goal: Find x⋆ = argmax_x f(x).

Simple regret after capital Λ:
   S(Λ) = f(x⋆) − max_{t : z_t = z•} f(x_t).

Λ ← amount of a resource spent, e.g. computation time or money.

No reward for pulling an arm at low fidelities, but use cheap evaluations at z ≠ z• to speed up the search for x⋆.
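A sketch of how S(Λ) could be computed from a query history on a synthetic benchmark, assuming the queries are recorded as (z_t, x_t, y_t) triples and the optimum f(x⋆) is known:

```python
def simple_regret(history, f_opt, z_star, cost, capital):
    """S(Lambda): gap between f(x*) and the best top-fidelity query
    made before the spent cost exceeds the capital Lambda.
    history: list of (z_t, x_t, y_t) triples in query order."""
    spent, best = 0.0, float("-inf")
    for z_t, x_t, y_t in history:
        spent += cost(z_t)
        if spent > capital:
            break
        if z_t == z_star:          # only queries at z = z_bullet earn reward
            best = max(best, y_t)
    return f_opt - best
```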
Algorithm: BOCA (Kandasamy et al. ICML 2017)

Model g ∼ GP(0, κ) and compute the posterior GP:
  mean µ_{t−1}: Z × X → R
  std-dev σ_{t−1}: Z × X → R+

(1) x_t ← maximise the upper confidence bound for f(x) = g(z•, x):
    x_t = argmax_{x∈X}  µ_{t−1}(z•, x) + β_t^{1/2} σ_{t−1}(z•, x)

(2) Z_t ≈ {z•} ∪ { z : σ_{t−1}(z, x_t) ≥ γ(z) },  where γ(z) = ξ(z) (λ(z)/λ(z•))^q

(3) z_t = argmin_{z∈Z_t} λ(z)   (cheapest z in Z_t)
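A hedged sketch of BOCA's fidelity-selection steps (2)-(3), treating the posterior std-dev σ_{t−1}, the function ξ, the exponent q, and the candidate fidelities as given inputs; all names here are illustrative:

```python
def select_fidelity(Z_candidates, z_star, x_t, posterior_std, xi, cost, q=0.5):
    """Steps (2)-(3) of BOCA: keep fidelities whose posterior uncertainty at x_t
    exceeds the threshold gamma(z), then query the cheapest of them."""
    Z_t = [z_star]                                       # z_bullet is always admissible
    for z in Z_candidates:
        gamma_z = xi(z) * (cost(z) / cost(z_star)) ** q  # gamma(z) = xi(z) (lambda(z)/lambda(z_bullet))^q
        if posterior_std(z, x_t) >= gamma_z:             # still too uncertain at fidelity z
            Z_t.append(z)
    return min(Z_t, key=cost)                            # z_t = argmin over Z_t of lambda(z)
```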
Theoretical Results for BOCA

(Figures: two surfaces g(z, x) with the top-fidelity slice f(x) = g(z•, x) and maximiser x⋆; "good": lower fidelities closely track f, "bad": they do not.)

E.g.: For SE kernels, the bandwidth h_Z along Z controls smoothness: large h_Z gives "good" approximations, small h_Z gives "bad" ones.
Theoretical Results for BOCA

SE kernel, GP-UCB (Srinivas et al. 2010):
   S(Λ) ≲ √( vol(X) / Λ )   w.h.p.

SE kernel, BOCA (Kandasamy et al. ICML 2017):
   ∀ α > 0,   S(Λ) ≲ √( vol(X_α) / Λ ) + √( vol(X) / Λ^{2−α} )   w.h.p.,

   where X_α = { x : f(x⋆) − f(x) ≲ C_α h_Z }.

If h_Z is large (good approximations), vol(X_α) ≪ vol(X), and BOCA is much better than GP-UCB.

N.B: Dropping constants and polylog terms.
Experiment: Cosmological inference on Type-1a supernovae data

Estimate the Hubble constant, dark matter fraction & dark energy fraction by maximising the likelihood of N• = 192 data points. Requires numerical integration on a grid of size G• = 10^6.

Approximate with N ∈ [50, 192] or G ∈ [10^2, 10^6] (2D fidelity space).

(Plot: simple regret vs. capital spent.)