Learning convex bounds for linear quadratic control policy synthesis


  1. Learning convex bounds for linear quadratic control policy synthesis. Jack Umenberger, Thomas B. Schön.

  2. Learning to control: data (observations of the dynamical system) → learning → control (stabilize the upright equilibrium position). NeurIPS 2018, Thu Dec 6th, 05:00 -- 07:00 PM @ Room 210 & 230, Poster #166.

  14. Problem set-up. The system evolves as

    x_{t+1} = A x_t + B u_t + w_t,    w_t ~ N(0, Π).

Goal: find a static state-feedback controller, u = Kx, to minimize the average cost

    lim_{T→∞} (1/T) Σ_{t=0}^{T} E[ x_t' Q x_t + u_t' R u_t ].

Challenge: we don't know the system parameters θ = { A, B, Π }.
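For a fixed, known model θ = {A, B, Π}, the average cost of a stabilizing gain K can be evaluated in closed form via the stationary state covariance. A minimal sketch in Python; the toy system and gain below are invented for illustration:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost(K, A, B, Pi, Q, R):
    """Average LQ cost of u = Kx: trace((Q + K'RK) S), where the
    stationary covariance S solves S = (A+BK) S (A+BK)' + Pi."""
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf  # K does not stabilize this model
    S = solve_discrete_lyapunov(Acl, Pi)
    return float(np.trace((Q + K.T @ R @ K) @ S))

# toy 2-state, 1-input system (illustrative values only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
Q, R = np.eye(2), np.eye(1)
K = np.array([[-5.0, -6.0]])  # a stabilizing gain for this system
print(lqr_cost(K, A, B, Pi, Q, R))
```

Returning an infinite cost for destabilizing gains mirrors the fact that the limit in the objective diverges when A + BK has spectral radius at or above one.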

  17. Learning from data. Excite the system x_{t+1} = A x_t + B u_t + w_t, w_t ~ N(0, Π), with inputs u_{0:T} and record the resulting states x_{0:T}, giving the dataset

    D := { u_{0:T}, x_{0:T} }.

From this data we can form the posterior belief over model parameters: posterior(θ | D). Instead of optimizing the cost for fixed parameters, cost(K | θ), we can optimize the expected cost over the posterior:

    cost_avg(K) = ∫ cost(K | θ) posterior(θ | D) dθ.
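As one concrete instance of forming a belief over θ from D: if Π is known and a flat prior is used, the posterior over Θ = [A B] is Gaussian, centered at the least-squares estimate. A sketch, where the "true" system generating D is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a trajectory to play the role of D = {u_{0:T}, x_{0:T}}
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
Pi = 0.01 * np.eye(2)
T = 200
x = np.zeros((T + 1, 2))
u = rng.standard_normal((T, 1))
for t in range(T):
    w = rng.multivariate_normal(np.zeros(2), Pi)
    x[t + 1] = A_true @ x[t] + B_true @ u[t] + w

# with known Pi and a flat prior, the posterior over Theta = [A B]
# is Gaussian, centered at the least-squares estimate
Z = np.hstack([x[:-1], u])                              # regressors [x_t, u_t]
Theta_hat = np.linalg.lstsq(Z, x[1:], rcond=None)[0].T  # posterior mean [A_hat B_hat]
A_hat, B_hat = Theta_hat[:, :2], Theta_hat[:, 2:]
```

The posterior covariance (not computed here) would scale with Π and the inverse Gram matrix of the regressors, which is why more data concentrates the belief around the true parameters.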

  23. Convex upper bounds. Approximate the posterior average by Monte Carlo, with samples θ_i ~ posterior(θ | D):

    cost_avg(K) ≈ cost_mc(K) := (1/M) Σ_{i=1}^{M} cost(K | θ_i).

[Plot: cost vs. policy K, showing cost_avg(K), its Monte Carlo approximation cost_mc(K), and convex upper bounds cost_bound(K | K^(k)), cost_bound(K | K^(k+1)) constructed at successive iterates K^(k), K^(k+1): minimizing the bound built at K^(k) yields the next iterate K^(k+1).]
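The Monte Carlo surrogate cost_mc(K) can be sketched as below. Since the true posterior(θ | D) depends on the data, the samples θ_i here are stand-ins: small Gaussian perturbations of a nominal model, with all numerical values invented for illustration:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)

def lqr_cost(K, A, B, Pi, Q, R):
    """Average LQ cost of u = Kx under the model (A, B, Pi)."""
    Acl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf  # unstable under this sampled model
    S = solve_discrete_lyapunov(Acl, Pi)
    return float(np.trace((Q + K.T @ R @ K) @ S))

A0 = np.array([[1.0, 0.1], [0.0, 1.0]])   # nominal model
B0 = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
Q, R = np.eye(2), np.eye(1)
K = np.array([[-5.0, -6.0]])

# cost_mc(K) = (1/M) * sum_i cost(K | theta_i), with theta_i playing
# the role of posterior samples
M = 200
costs = [lqr_cost(K,
                  A0 + 0.002 * rng.standard_normal(A0.shape),
                  B0 + 0.002 * rng.standard_normal(B0.shape),
                  Pi, Q, R) for _ in range(M)]
cost_mc = float(np.mean(costs))
```

Note that a single K must stabilize every sampled model for cost_mc to be finite, which is what makes minimizing it over K a robust design problem rather than M independent LQR problems.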

  28. Convexification. The crux of the problem is the matrix inequality

    [ X_i - Q        (A_i + B_i K)'   K'     ]
    [ A_i + B_i K    X_i^{-1}         0      ]  ⪰ 0,
    [ K              0                R^{-1} ]

where Q, R, A_i, B_i are known quantities and X_i, K are decision variables.

• Replace the 'problematic' term X_i^{-1} with a first-order Taylor series (linear) approximation.
• This leads to a new linear matrix inequality with a smaller feasible set.
• Hence: a convex upper bound.
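The key fact behind the bound: X ↦ X^{-1} is matrix convex on positive definite matrices, so its first-order Taylor expansion at any point X^(k) is a global under-estimate, X^{-1} ⪰ 2(X^(k))^{-1} - (X^(k))^{-1} X (X^(k))^{-1}. Substituting this linear under-estimate for X_i^{-1} on the diagonal shrinks the feasible set, which is why the resulting LMI gives a conservative (upper) bound. A quick numerical check on random positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pd(n):
    """A well-conditioned random positive definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

n = 4
X0 = random_pd(n)   # linearization point, playing the role of X^(k)
X = random_pd(n)    # any other positive definite candidate

X0inv = np.linalg.inv(X0)
lin = 2 * X0inv - X0inv @ X @ X0inv   # first-order Taylor of X -> X^{-1} at X0

# X^{-1} - lin = (X0^{-1} - X^{-1}) X (X0^{-1} - X^{-1}) is positive
# semidefinite, so all its eigenvalues are >= 0 (up to rounding)
gap_eigs = np.linalg.eigvalsh(np.linalg.inv(X) - lin)
print(gap_eigs.min())
```

The identity in the comment also shows the bound is tight exactly at X = X0, which is why rebuilding the bound at each iterate K^(k) and re-minimizing makes monotone progress.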

  33. Performance. [Plot: controller performance improves as more data becomes available for learning.]

  34. Poster presentation: Poster #166, today 05:00 -- 07:00 PM @ Room 210 & 230.
