Learning convex bounds for linear quadratic control policy synthesis
Jack Umenberger, Thomas B. Schön
Learning to control: from data (observations of the dynamical system), learn a controller (e.g., stabilize the upright equilibrium position).

NeurIPS 2018, Thu Dec 6th, 05:00 -- 07:00 PM @ Room 210 & 230, Poster #166
Problem set-up

System dynamics: x_{t+1} = A x_t + B u_t + w_t, with w_t ∼ N(0, Π).

Goal: find a static state-feedback controller, u = Kx, to minimize

lim_{T→∞} (1/T) ∑_{t=0}^{T} E[x_t′ Q x_t + u_t′ R u_t].

Challenge: we don't know the system parameters θ = {A, B, Π}.
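For a fixed model θ = {A, B, Π} and a stabilizing gain K, the infinite-horizon average cost has a closed form: it equals tr(ΠP), where P solves the discrete Lyapunov equation P = Q + K′RK + (A+BK)′P(A+BK). A minimal sketch of this computation (the function name is illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost(A, B, K, Q, R, Pi):
    """Infinite-horizon average LQR cost of u = Kx for x+ = Ax + Bu + w, w ~ N(0, Pi).

    Solves P = (A+BK)' P (A+BK) + Q + K'RK; the cost is tr(Pi P).
    Assumes A + BK is stable (spectral radius < 1).
    """
    Acl = A + B @ K
    Qbar = Q + K.T @ R @ K
    # solve_discrete_lyapunov(a, q) returns X satisfying X = a X a' + q
    P = solve_discrete_lyapunov(Acl.T, Qbar)
    return np.trace(Pi @ P)
```

For example, with scalar A = 0.5, B = 1, K = -0.2, Q = R = Π = 1, the closed loop is 0.3 and the cost works out to 1.04/0.91.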
Learning from data

Apply inputs u_{0:T} to the system x_{t+1} = A x_t + B u_t + w_t, w_t ∼ N(0, Π), and record the resulting states x_{0:T}, giving the dataset D := {u_{0:T}, x_{0:T}}.

From this data we can form the posterior belief over model parameters: posterior(θ | D).

Instead of optimizing the cost for fixed parameters, cost(K | θ), we can optimize the expected cost over the posterior:

cost_avg(K) = ∫ cost(K | θ) posterior(θ | D) dθ.
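Under Gaussian noise with known covariance Π and a flat prior, the posterior over the parameter matrix Θ = [A B] given D is matrix normal around the least-squares estimate, so posterior samples are cheap to draw. A sketch under those assumptions (the data-generating system and helper names here are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from a "true" system (illustrative values)
n, m, T = 2, 1, 200
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
Pi = 0.1 * np.eye(n)

x = np.zeros((n, T + 1))
u = rng.standard_normal((m, T))      # exciting input
Lw = np.linalg.cholesky(Pi)
for t in range(T):
    x[:, t + 1] = A_true @ x[:, t] + B_true @ u[:, t] + Lw @ rng.standard_normal(n)

# Least-squares estimate of Theta = [A B] from regressors z_t = [x_t; u_t]
Z = np.vstack([x[:, :T], u])         # (n+m) x T
Y = x[:, 1:]                         # n x T
S = np.linalg.inv(Z @ Z.T)           # posterior column covariance
Theta_hat = Y @ Z.T @ S

def sample_posterior(rng):
    """Draw Theta_i = [A_i B_i] from the matrix-normal posterior
    MN(Theta_hat, Pi, S) (flat prior, known noise covariance Pi)."""
    E = rng.standard_normal((n, n + m))
    Theta = Theta_hat + np.linalg.cholesky(Pi) @ E @ np.linalg.cholesky(S).T
    return Theta[:, :n], Theta[:, n:]

A_i, B_i = sample_posterior(rng)
```

Each call to `sample_posterior` yields one model θ_i = {A_i, B_i, Π}, which is exactly what the Monte Carlo approximation on the next slide consumes.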
Convex upper bounds

Approximate the expected cost by a Monte Carlo average over posterior samples θ_i ∼ posterior(θ | D):

cost_avg(K) ≈ cost_mc(K) := (1/M) ∑_{i=1}^{M} cost(K | θ_i).

At each iterate K^{(k)}, construct a convex upper bound cost_bound(K | K^{(k)}) on cost_mc(K) that is tight at K^{(k)}; minimizing the bound yields the next iterate K^{(k+1)}, and repeating gives a sequence of policies with decreasing surrogate cost.
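The Monte Carlo surrogate is just an average of M closed-form LQR costs, one per posterior sample. A self-contained sketch (the sampled models below are made up for illustration; function names are ours):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost(A, B, K, Q, R, Pi):
    """cost(K | theta): tr(Pi P) with P = (A+BK)' P (A+BK) + Q + K'RK."""
    Acl = A + B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    return np.trace(Pi @ P)

def cost_mc(K, samples, Q, R, Pi):
    """Monte Carlo estimate of the posterior-averaged cost:
    (1/M) * sum_i cost(K | theta_i), with theta_i ~ posterior(theta | D)."""
    return np.mean([lqr_cost(A_i, B_i, K, Q, R, Pi) for A_i, B_i in samples])

# Illustrative: two posterior samples of a scalar system
samples = [(np.array([[0.5]]), np.array([[1.0]])),
           (np.array([[0.6]]), np.array([[1.0]]))]
Q = R = Pi = np.eye(1)
K = np.array([[-0.3]])
print(cost_mc(K, samples, Q, R, Pi))  # ≈ 1.1666
```

Note that `cost_mc` is not convex in K, which is why the paper replaces it with convex upper bounds rather than minimizing it directly.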
Convexification

The crux of the problem is the matrix inequality

[ X_i − Q        (A_i + B_i K)′   K′     ]
[ A_i + B_i K    X_i^{-1}         0      ]  ⪰  0,
[ K              0                R^{-1} ]

where Q, R, A_i, B_i are known quantities and K, X_i are the decision variables.

• Replace the 'problematic' term X_i^{-1} with a (linear) Taylor series approximation.
• This leads to a new linear matrix inequality with a smaller feasible set.
• Hence: a convex upper bound.
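The Taylor-series step can be made explicit. Since X ↦ X^{-1} is matrix convex on positive definite matrices, its first-order expansion about the current iterate X^{(k)} is a global lower bound (this is the standard linearization of the matrix inverse; notation follows the slide):

```latex
\[
  X^{-1} \;\succeq\; 2\,\bigl(X^{(k)}\bigr)^{-1}
  \;-\; \bigl(X^{(k)}\bigr)^{-1}\, X \,\bigl(X^{(k)}\bigr)^{-1},
  \qquad \text{with equality at } X = X^{(k)}.
\]
```

Substituting the right-hand side, which is linear in X, for X_i^{-1} tightens the matrix inequality: any (K, X_i) feasible for the linearized LMI is also feasible for the original inequality, so the feasible set shrinks and the optimal value can only increase. This is what yields a convex upper bound that is tight at the current iterate.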
Performance

[Plot omitted: performance improves as more data is available for learning.]
Poster presentation

Poster #166, today 05:00 -- 07:00 PM @ Room 210 & 230.