Applications of Constrained BayesOpt in Robotics and Rethinking Priors & Hyperparameters
Marc Toussaint
Machine Learning & Robotics Lab – University of Stuttgart
marc.toussaint@informatik.uni-stuttgart.de
NIPS BayesOpt, Dec 2016
1/20
(1) Learning Manipulation Skills
◦ Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS'16
2/20
Combined Black-Box and Analytical Optimization
Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS'16
• CORL (Combined Optimization and RL):
– Policy parameters $w$
– Analytically known cost function $J(w) = \mathbb{E}\{\sum_{t=0}^{T} c_t(x_t, u_t) \mid w\}$
– Projection, implicitly given by a constraint $h(w, \theta) = 0$
– Unknown black-box return function $R(\theta) \in \mathbb{R}$
– Unknown black-box success constraint $S(\theta) \in \{0, 1\}$
– Problem: $\min_{w,\theta}\; J(w) - R(\theta)$ s.t. $h(w, \theta) = 0,\; S(\theta) = 1$
• Alternate between path optimization $\min_w J(w)$ s.t. $h(w, \theta) = 0$ and Bayesian optimization $\max_\theta R(\theta)$ s.t. $S(\theta) = 1$
3/20
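As a rough illustration of this alternation (not the authors' implementation), here is a minimal Python sketch; `optimize_path`, `propose_theta`, and `evaluate_rollout` are hypothetical placeholders for the analytical path optimizer, the constrained BayesOpt proposal, and a robot rollout:

```python
# Hedged sketch of the CORL alternation; all helper names are hypothetical.
def corl(optimize_path, propose_theta, evaluate_rollout, theta0, iters=20):
    """Alternate analytical path optimization and Bayesian optimization.

    optimize_path(theta)  -> w minimizing J(w) s.t. h(w, theta) = 0
    propose_theta(data)   -> next theta from constrained BayesOpt on (R, S) data
    evaluate_rollout(w)   -> (R, S): black-box return and success of executing w
    """
    theta, data = theta0, []
    for _ in range(iters):
        w = optimize_path(theta)        # inner, model-based step
        R, S = evaluate_rollout(w)      # black-box evaluation on the system
        data.append((theta, R, S))
        theta = propose_theta(data)     # constrained BayesOpt step
    # return the best successful parameters seen so far (if any)
    return max((d for d in data if d[2]), key=lambda d: d[1], default=None)
```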
Heuristic to handle constraints
• Prior mean $\mu = 2$ for $g$
• Sample only points s.t. $g(x) \le 0$
• Acquisition function combines PI with boundary uncertainty:
$\alpha_{\text{PIBU}}(x) = [g(x) \ge 0]\,\text{PI}_f(x) + [g(x) = 0]\,\beta\sigma_g^2(x)$
4/20
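A minimal sketch of how such a PI-plus-boundary-uncertainty acquisition could be evaluated, assuming the GP posteriors for $f$ and $g$ are available as mean/std callables; the indicators on $g$ are taken on the posterior mean, and the boundary set $\{g(x)=0\}$ is approximated by a small tolerance band (both are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

def alpha_pibu(x, mu_f, sigma_f, mu_g, sigma_g, f_best, beta=1.0, tol=1e-2):
    """Sketch of a PI + boundary-uncertainty acquisition (not the paper's code).

    mu_f, sigma_f: posterior mean/std of the objective GP at x
    mu_g, sigma_g: posterior mean/std of the constraint GP at x
    f_best: best objective value observed so far (minimization)
    """
    pi = norm.cdf((f_best - mu_f(x)) / max(sigma_f(x), 1e-9))  # probability of improvement
    indicator_g = float(mu_g(x) >= 0.0)        # [g(x) >= 0], as written on the slide
    on_boundary = float(abs(mu_g(x)) < tol)    # tolerance band standing in for {g(x) = 0}
    return indicator_g * pi + on_boundary * beta * sigma_g(x) ** 2
```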
(2) Optimizing Controller Parameters
◦ Drieß, Englert & Toussaint: Constrained Bayesian Optimization of Combined Interaction Force/Task Space Controllers for Manipulations. IROS Workshop'16
5/20
Controller Details
• Non-switching controller for smoothly establishing contacts
– In (each) task space: $\ddot y^* = \ddot y^{\text{ref}} + K_p (y^{\text{ref}} - y) + K_d (\dot y^{\text{ref}} - \dot y)$
– Operational space controller (linearized): $\ddot q^* = \bar K_p q + \bar K_d \dot q + \bar k$ with
$\bar K_p = (H + J^\top C J)^{-1} [H K^q_p + J^\top C K_p J]$
$\bar K_d = (H + J^\top C J)^{-1} [H K^q_d + J^\top C K_d J]$
$\bar k = (H + J^\top C J)^{-1} [H k^q + J^\top C k]$
– Contact force limit control: $e \leftarrow \gamma e + [\,|f| > |f^{\text{ref}}|\,](f^{\text{ref}} - f)$, $\quad u = J^\top \alpha e$
• Many parameters! Esp. $\alpha$, $\dot y^{\text{ref}}$, $K_d$
6/20
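For concreteness, a small numpy sketch of the gain combination above; the formulas follow the slide, while the interpretation of the symbols (H as joint-space metric, J as task Jacobian, C as task weighting) is an assumption:

```python
import numpy as np

def combined_gains(H, J, C, Kp_q, Kd_q, k_q, Kp, Kd, k):
    """Combine joint-space gains (Kp_q, Kd_q, k_q) and task-space gains
    (Kp, Kd, k) into the linearized operational-space controller gains.
    Illustrative sketch only; symbols follow the slide."""
    A = np.linalg.inv(H + J.T @ C @ J)
    Kp_bar = A @ (H @ Kp_q + J.T @ C @ Kp @ J)
    Kd_bar = A @ (H @ Kd_q + J.T @ C @ Kd @ J)
    k_bar = A @ (H @ k_q + J.T @ C @ k)
    return Kp_bar, Kd_bar, k_bar
```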
Optimizing Controller Parameters
• Optimization objectives:
– Low compliance: $\text{tr}(\bar K_p)$ and $\text{tr}(\bar K_d)$
– Contact force error: $\int (f^{\text{ref}} - f)^2 \, dt$
– Peak force on onset: $|f_{\text{os}}|$
– Smooth force profile: $\int |\tfrac{d}{dt} f(t)| + |\tfrac{d^2}{dt^2} f(t)| + |\tfrac{d^3}{dt^3} f(t)| \, dt$
– Boolean success: contact and staying in contact
• Establishing contact
• Sliding
7/20
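The force-related objectives can be evaluated numerically from a sampled force trace; the following is a hedged sketch (the onset peak, for instance, is approximated here by the overall peak magnitude, which is an assumption):

```python
import numpy as np

def force_objectives(f, f_ref, dt):
    """Numerically evaluate the force-related cost terms from a sampled
    1-D force trace f(t) with timestep dt. Sketch, not the exact implementation."""
    err = dt * np.sum((f_ref - f) ** 2)          # contact force error
    d1 = np.gradient(f, dt)                      # first time derivative
    d2 = np.gradient(d1, dt)                     # second time derivative
    d3 = np.gradient(d2, dt)                     # third time derivative
    smooth = dt * np.sum(np.abs(d1) + np.abs(d2) + np.abs(d3))  # smoothness penalty
    peak = np.max(np.abs(f))                     # stand-in for the onset peak force
    return err, smooth, peak
```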
(3) Safe Active Learning & BayesOpt
• SafeOpt: safety threshold on the objective $f(x) \ge h$
◦ Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML'15
• Guarantee to never step outside an unknown constraint $g(x) \le 0$...
– Impossible when no failure data $g(x) > 0$ exists...
– Unless you assume observation of discriminative values near the boundary
◦ Schreiter et al.: Safe Exploration for Active Learning with Gaussian Processes. ECML'15
8/20
Probabilistic guarantees on non-failure
• Acquisition function: $\alpha(x) = \sigma_f^2(x)$ s.t. $\mu_g(x) + \nu\sigma_g(x) \ge 0$
• Specify the probability of failure $\delta$ after $n$ points with $m_0$ initializations $\mapsto \nu$
• Application on cart-pole
9/20
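A sketch of one safe selection step, assuming the GP posteriors are given as callables; the safety condition is written exactly as on the slide, so the sign convention is taken from there:

```python
def safe_active_learning_step(candidates, var_f, mu_g, sigma_g, nu):
    """Pick the most informative candidate among those deemed safe (sketch).

    candidates: iterable of candidate inputs
    var_f(x):   posterior variance of the objective/model GP at x
    mu_g, sigma_g: posterior mean/std of the safety GP at x
    nu:         confidence multiplier derived from the failure probability delta
    """
    safe = [x for x in candidates if mu_g(x) + nu * sigma_g(x) >= 0.0]
    if not safe:
        return None                      # no candidate certified safe
    return max(safe, key=var_f)          # maximize alpha(x) = sigma_f^2(x)
```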
So, what are the issues?
– Choice of hyperparameters!
– Stationary covariance functions!
– Isotropic stationary covariance functions!
10/20
• Actually, I'm a fan of Newton methods
• Two messages of classical (convex) optimization:
– Step size (line search, trust region, Wolfe conditions)
– Step direction (Newton, quasi-Newton, BFGS, conjugate, covariant)
• Newton methods are perfect for going downhill to a local optimum
11/20
Model-based Optimization
• If the model is not given: classical model-based optimization (Nocedal et al., "Derivative-free optimization")
1: Initialize $D$ with at least $\tfrac{1}{2}(n+1)(n+2)$ data points
2: repeat
3:   Compute a regression $\hat f(x) = \phi_2(x)^\top \beta$ on $D$
4:   Compute $x^+ = \text{argmin}_x \hat f(x)$ s.t. $|x - \hat x| < \alpha$
5:   Compute the improvement ratio $\varrho = \dfrac{f(\hat x) - f(x^+)}{\hat f(\hat x) - \hat f(x^+)}$
6:   if $\varrho > \epsilon$ then
7:     Increase the stepsize $\alpha$
8:     Accept $\hat x \leftarrow x^+$
9:     Add to data, $D \leftarrow D \cup \{(x^+, f(x^+))\}$
10:  else
11:    if $\det(D)$ is too small then // data improvement
12:      Compute $x^+ = \text{argmax}_x \det(D \cup \{x\})$ s.t. $|x - \hat x| < \alpha$
13:      Add to data, $D \leftarrow D \cup \{(x^+, f(x^+))\}$
14:    else
15:      Decrease the stepsize $\alpha$
16:    end if
17:  end if
18:  Prune the data, e.g., remove $\text{argmax}_{x \in D} \det(D \setminus \{x\})$
19: until $\hat x$ converges
12/20
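A compact Python sketch of this loop (a simplification under stated assumptions, not the textbook algorithm: the trust-region argmin is replaced by random sampling, and the data-improvement and pruning steps are omitted):

```python
import numpy as np

def quadratic_features(x):
    """Monomials up to degree 2: the phi_2(x) features for the local model."""
    x = np.atleast_1d(x)
    return np.concatenate(([1.0], x, np.outer(x, x)[np.triu_indices(len(x))]))

def model_based_minimize(f, x0, alpha=0.5, iters=50, eps=0.1, rng=None):
    """Sketch of the derivative-free, model-based (trust-region style) loop."""
    rng = rng or np.random.default_rng(0)
    n = len(x0)
    x_hat = np.asarray(x0, float)
    # 1: initialize D with ~ (n+1)(n+2)/2 points around x0
    X = [x_hat + alpha * rng.standard_normal(n) for _ in range((n + 1) * (n + 2) // 2)]
    y = [f(x) for x in X]
    for _ in range(iters):
        # 3: fit a quadratic model by least squares
        Phi = np.array([quadratic_features(x) for x in X])
        beta, *_ = np.linalg.lstsq(Phi, np.array(y), rcond=None)
        f_hat = lambda x: quadratic_features(x) @ beta
        # 4: minimize the model inside the trust region (crude sampling stand-in)
        cand = x_hat + alpha * rng.uniform(-1, 1, size=(200, n))
        x_plus = min(cand, key=f_hat)
        # 5: improvement ratio
        y_plus = f(x_plus)
        denom = f_hat(x_hat) - f_hat(x_plus)
        rho = (f(x_hat) - y_plus) / denom if abs(denom) > 1e-12 else 0.0
        X.append(x_plus); y.append(y_plus)
        if rho > eps:
            alpha *= 1.5          # 7: increase stepsize
            x_hat = x_plus        # 8: accept
        else:
            alpha *= 0.5          # 15: decrease stepsize (data improvement omitted)
    return x_hat
```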
This is similar to BayesOpt with a polynomial kernel!
13/20
A prior about local polynomial optima
• Assume that the objective has multiple local optima
– Local optimum: locally convex
– Each local optimum might be differently conditioned
→ we need a highly non-stationary, non-isotropic covariance function
• "Between" the local optima, the function is smooth
→ standard squared-exponential kernel
• The mixed-global-local kernel
$k_{\text{MGL}}(x, x') = \begin{cases} k_q(x, x') & x, x' \in U_i \\ k_s(x, x') & x \notin U_i,\; x' \notin U_j \text{ for any } i, j \\ 0 & \text{else} \end{cases}$
with $k_q(x, x') = (x^\top x' + 1)^2$
14/20
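A minimal sketch of $k_{\text{MGL}}$, assuming the local regions $U_i$ are represented as balls with given centers and radii (an assumption made here purely for illustration):

```python
import numpy as np

def k_squared_exp(x, xp, length_scale=1.0):
    """Squared-exponential kernel k_s."""
    return np.exp(-0.5 * np.sum((x - xp) ** 2) / length_scale ** 2)

def k_quadratic(x, xp):
    """Quadratic (degree-2 polynomial) kernel k_q."""
    return (x @ xp + 1.0) ** 2

def region_index(x, regions):
    """Return the index of the local region U_i containing x, or None."""
    for i, (center, radius) in enumerate(regions):
        if np.linalg.norm(x - center) <= radius:
            return i
    return None

def k_mgl(x, xp, regions, length_scale=1.0):
    """Mixed-global-local kernel: quadratic inside a shared local region,
    squared-exponential when both points lie outside all regions, zero otherwise."""
    i, j = region_index(x, regions), region_index(xp, regions)
    if i is not None and i == j:
        return k_quadratic(x, xp)
    if i is None and j is None:
        return k_squared_exp(x, xp, length_scale)
    return 0.0
```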
Finding convex neighborhoods
• Data set $D = \{(x_i, y_i)\}$
• $U \subset D$ is a convex neighborhood if
$\{\beta_0^*, \beta^*, B^*\} = \text{argmin}_{\beta_0, \beta, B} \sum_{k: x_k \in U} \Big( (\beta_0 + \beta^\top x_k + \tfrac{1}{2} x_k^\top B x_k) - y_k \Big)^2$
has a positive definite Hessian $B^*$
15/20
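A least-squares sketch of this test: fit the local quadratic model on a candidate neighborhood and check whether the estimated Hessian $B^*$ is positive definite (illustrative only; how neighborhoods are proposed is not shown here):

```python
import numpy as np

def is_convex_neighborhood(X, y):
    """Fit beta_0 + beta^T x + 0.5 x^T B x by least squares on the points (X, y)
    of a candidate neighborhood and test whether B is positive definite."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    m, n = X.shape
    iu, ju = np.triu_indices(n)
    # Features for the upper-triangular entries of the symmetric Hessian B:
    # coefficient of B_ii is 0.5*x_i^2, coefficient of B_ij (i<j) is x_i*x_j.
    quad = np.array([[0.5 * x[i] * x[j] if i == j else x[i] * x[j]
                      for i, j in zip(iu, ju)] for x in X])
    Phi = np.hstack([np.ones((m, 1)), X, quad])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    beta0, beta, b_upper = theta[0], theta[1:1 + n], theta[1 + n:]
    B = np.zeros((n, n)); B[iu, ju] = b_upper
    B = B + B.T - np.diag(np.diag(B))          # symmetrize
    return bool(np.all(np.linalg.eigvalsh(B) > 0)), (beta0, beta, B)
```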
A heuristic to decrease length-scale
• The SE part still has a length-scale hyperparameter $l$
• In each iteration we consider decreasing to $\tilde l_t < l_{t-1}$ and compute
$\alpha_{r,t} := \dfrac{\alpha^*(\tilde l_t)}{\alpha^*(l_{t-1})}, \qquad \alpha^*(l) = \min_x \alpha(x; l)$
for any acquisition function $\alpha(x; l)$
• Accept the smaller length-scale only if $\alpha_{r,t} \ge h$ (e.g., $h \approx 2$)
• Robust to non-stationary objectives
[Figure: counter-example function and correlation adaptation; median log10 immediate regret over iterations for LOO-CV, Alpha Ratio, and the optimal length-scale]
16/20
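The test itself reduces to a ratio check; a minimal sketch, with `alpha_star` standing in for the (assumed given) routine that computes $\alpha^*(l)$ for the current GP posterior and `shrink` a hypothetical shrink factor:

```python
def adapt_length_scale(alpha_star, l_prev, shrink=0.7, h=2.0):
    """Sketch of the alpha-ratio heuristic: tentatively shrink the SE length
    scale and keep it only if the ratio of optimal acquisition values >= h.

    alpha_star(l): optimal acquisition value alpha*(l) under the current posterior
    """
    l_try = shrink * l_prev
    ratio = alpha_star(l_try) / max(alpha_star(l_prev), 1e-12)
    return l_try if ratio >= h else l_prev
```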
Mixed-global-local kernel + alpha ratio
[Figure: median log10 immediate regret over iterations on Quadratic 2D, Rosenbrock, Branin-Hoo, Hartmann 3D, Hartmann 6D, and Exponential 3D/4D/5D benchmarks, comparing PES, IMGPO, EI, and EI with AR+MGL]
• PES: Bayesian integration over hyperparameters
• IMGPO: Bayesian update of hyperparameters in each iteration
17/20
...work with Kim Wabersich 18/20
Conclusions
• Solid optimization methods are the savior of robotics!
• Rethink the priors we use for BayesOpt
– Local optima with varying conditioning
• Rethink the objective for choosing hyperparameters
– Maximize optimization progress ($\sim$ expected acquisition) rather than data likelihood
19/20
Thanks
• for your attention!
• to the students:
– Peter Englert (BayesOpt for Manipulation)
– Jens Schreiter (Safe Active Learning)
– Danny Drieß (BayesOpt for Controller Optimization)
– Kim Wabersich (Mixed-global-local kernel & alpha ratio)
• and my lab:
20/20