Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes
Recall

| | Stage decision problems | Discrete-time optimization problems | Continuous-time control problems |
|---|---|---|---|
| Formulation | Transition diagram | Discrete-time system & additive cost function | Differential equations & additive cost function |
| DP equation | Graphical DP algorithm | DP algorithm & DP equation | Hamilton-Jacobi-Bellman equation |
| Partial information | Bayesian inference & decisions based on prob. distribution | Kalman filter and separation principle | Continuous-time Kalman filter and separation principle |
| Alternative algorithms | Dijkstra's algorithm | Static optimization | Pontryagin's maximum principle (PMP) |

Today: continuous-time Kalman filter and separation principle. And a new topic: frequency domain properties of LQR.
Outline

• Linear quadratic control, Kalman filter, separation principle
• Frequency domain properties of LQR
Linear quadratic control

The analogous problem to linear quadratic control for continuous-time systems would be

$$\min_{u(t)=\mu(t,x(t))} E\left[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\, dt + x(T)^\top Q_T x(T)\right]$$

$$\dot{x}(t) = Ax(t) + Bu(t) + w(t)$$

However, how should disturbances be defined for continuous-time systems? It is quite challenging! White noise disturbances are one of the few ways to define disturbances without "memory" for continuous-time systems.
White noise

Let us start with a scalar white noise process $\omega(t) \in \mathbb{R}$.

[figure: sample path of $\omega(t)$ versus time]

Very interesting (or strange!) properties:

• its sample paths are nowhere differentiable
• the integral of its square over a finite interval is infinite
• it does not exist in nature
• even for a small time interval $\delta$, $\omega(t)$ and $\omega(t+\delta)$ are uncorrelated; that is, the autocorrelation is zero, $R(\tau) = E[\omega(t)\omega(t+\tau)] = 0$ for $\tau \neq 0$
• when $\delta = 0$, $E[\omega(t)\omega(t)] = E[\omega(t)^2] = \infty$ (infinite power)
• the autocorrelation is thus a Dirac delta function, $R(\tau) = a\,\delta(\tau)$, and the scalar white noise process is characterized by the amplitude $a$
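As a rough illustration (the amplitude a, step dt, and horizon T below are assumptions, not values from the lecture), white noise can be approximated in discrete time by independent Gaussian samples whose variance grows as 1/dt, so that the sampled autocorrelation approaches a δ(τ):

```matlab
% Minimal sketch: discrete-time approximation of scalar white noise.
a  = 1;  dt = 1e-3;  T = 1;          % assumed parameters
t  = 0:dt:T;
w  = sqrt(a/dt)*randn(size(t));      % each sample uncorrelated with the next
plot(t, w), xlabel('time'), ylabel('\omega(t)')
```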
Random walk

The integral of white noise is called a random walk or the Wiener process,

$$\dot{x}(t) = w(t),$$

and it is more intuitive and easier to handle mathematically.

[block diagram: w(t) passed through an integrator 1/s yields x(t)]

• $x(t)$ now has finite power
• $x(t)$ and $x(t+\tau)$ are correlated
• we shall assume that $w(t)$ is Gaussian for each fixed time, and this implies that $x(t)$ is also Gaussian for each fixed time
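A minimal sketch of a Wiener process sample path, obtained by accumulating independent Gaussian increments (the step and horizon are assumptions):

```matlab
% Minimal sketch: Wiener process as a cumulative sum of Gaussian increments.
dt = 1e-3;  T = 1;                   % assumed parameters
t  = 0:dt:T;
dw = sqrt(dt)*randn(size(t));        % increments with variance dt
x  = cumsum(dw);                     % random walk / Wiener process
plot(t, x), xlabel('time'), ylabel('x(t)')
```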
Discussion

• In a similar way to the Wiener process, the solution to the stochastic differential equation

$$\dot{x}(t) = Ax(t) + Bu(t) + w(t)$$

is more intuitive than white noise itself.

• If $x(t) \in \mathbb{R}^n$, $w(t) \in \mathbb{R}^n$, we assume that

$$w(t) = N\bar{w}(t), \qquad \bar{w}(t) = \begin{bmatrix} \bar{w}_1(t) & \bar{w}_2(t) & \dots & \bar{w}_p(t) \end{bmatrix}^\top$$

where the $\bar{w}_i(t)$ are uncorrelated scalar Gaussian white noise processes. Thus $E[\bar{w}(t)\bar{w}(t+\tau)^\top] = I\,\delta(\tau)$ and

$$E[w(t)w(t+\tau)^\top] = NN^\top \delta(\tau) := W\delta(\tau)$$
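A minimal Euler-Maruyama simulation sketch of this stochastic differential equation; the system matrices, noise matrix N, input, and step size below are illustrative assumptions, not values from the lecture:

```matlab
% Minimal sketch: Euler-Maruyama simulation of xdot = Ax + Bu + w,
% with w = N*wbar and E[w(t)w(t+tau)'] = N*N'*delta(tau).
A = [0 1; -1 -1];  B = [0; 1];  N = [0; 0.1];   % assumed example data
dt = 1e-3;  T = 5;  t = 0:dt:T;
n = size(A,1);  p = size(N,2);
u = @(t) 0;                                     % assumed zero input
x = zeros(n, numel(t));
for k = 1:numel(t)-1
    % white-noise sample: variance 1/dt so that E[w w'] -> N*N'*delta
    wk = N*randn(p,1)/sqrt(dt);
    x(:,k+1) = x(:,k) + dt*(A*x(:,k) + B*u(t(k)) + wk);
end
plot(t, x(1,:)), xlabel('time'), ylabel('x_1(t)')
```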
Discussion

• It is possible to prove that the discretized system, with $x_k := x(t_k)$, $t_k = k\tau$, and $u(t) = u_k$ for $t \in [t_k, t_{k+1})$, takes the form

$$x_{k+1} = A_d x_k + B_d u_k + w_k$$

where, as before, $A_d = e^{A\tau}$ and $B_d = \int_0^\tau e^{As} B\, ds$, and the $w_k$ are zero-mean independent Gaussian random variables with covariance

$$E[w_k w_k^\top] = \int_0^\tau e^{As} W e^{A^\top s}\, ds$$

• The cost can also be written in terms of the discrete-time variables, and it is again a quadratic function.

• Since the optimal control policy for such a system is the same linear state-feedback control law as for the deterministic version of the problem, the next results come as no surprise.
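A sketch of this discretization in MATLAB, computing $A_d$, $B_d$, and the noise covariance integral via Van Loan's matrix-exponential method (the example matrices and τ below are assumptions):

```matlab
% Minimal sketch: exact discretization of system and noise covariance.
A = [0 1; -1 -1];  B = [0; 1];  W = 0.01*eye(2);  tau = 0.01;  % assumptions
n = size(A,1);
% Ad and Bd from a single matrix exponential
M  = expm([A B; zeros(size(B,2), n+size(B,2))]*tau);
Ad = M(1:n, 1:n);
Bd = M(1:n, n+1:end);
% Wd = int_0^tau e^{As} W e^{A's} ds  (Van Loan's method)
F  = expm([-A W; zeros(n) A']*tau);
Wd = F(n+1:2*n, n+1:2*n)' * F(1:n, n+1:2*n);
```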
Finite horizon linear quadratic control

The optimal control law for the problem

$$\min_{u(t)=\mu(t,x(t))} E\left[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\, dt + x(T)^\top Q_T x(T)\right], \qquad Q > 0,\ R > 0$$

$$\dot{x}(t) = Ax(t) + Bu(t) + w(t)$$

where $w(t)$ is zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$, is $u(t) = K(t)x(t)$ with $K(t) = -R^{-1}B^\top P(t)$, where

$$\dot{P}(t) = -\left(A^\top P(t) + P(t)A - P(t)BR^{-1}B^\top P(t) + Q\right), \qquad P(T) = Q_T, \quad t \in [0, T)$$
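A minimal sketch of how $P(t)$ could be computed numerically, integrating the Riccati ODE backwards via the substitution $s = T - t$ (all matrices below are assumed example data):

```matlab
% Minimal sketch: backward integration of the Riccati differential equation.
A = [0 1; -1 -1];  B = [0; 1];  Q = eye(2);  R = 1;   % assumed example data
QT = eye(2);  T = 5;  n = size(A,1);
% under s = T - t the ODE becomes dP/ds = A'P + PA - P*B*R^{-1}*B'*P + Q
rhs = @(s, p) reshape( ...
    A'*reshape(p,n,n) + reshape(p,n,n)*A ...
    - reshape(p,n,n)*B*(R\(B'*reshape(p,n,n))) + Q, [], 1);
[s, Pvec] = ode45(rhs, [0 T], reshape(QT, [], 1));
P0 = reshape(Pvec(end,:), n, n);   % P at t = 0
K0 = -R\(B'*P0);                   % gain K(0) = -R^{-1} B' P(0)
```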
Infinite horizon linear quadratic control

The optimal control law for the problem

$$\min_{u(t)=\mu(x(t))} \lim_{T\to\infty} \frac{1}{T}\, E\left[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\, dt\right], \qquad Q > 0,\ R > 0$$

$$\dot{x}(t) = Ax(t) + Bu(t) + w(t), \qquad (A,B)\ \text{controllable}$$

where $w(t)$ is zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$, is $u(t) = Kx(t)$ with $K = -R^{-1}B^\top P$, where $P$ is the unique positive definite solution to the (continuous-time) algebraic Riccati equation

$$A^\top P + PA - PBR^{-1}B^\top P + Q = 0$$
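With the Control System Toolbox, this steady-state gain can be obtained directly from the algebraic Riccati equation; a minimal sketch with assumed example data:

```matlab
% Minimal sketch: infinite-horizon LQR gain via the algebraic Riccati equation.
A = [0 1; -1 -1];  B = [0; 1];  Q = eye(2);  R = 1;  % assumed example data
P = care(A, B, Q, R);       % solves A'P + PA - P*B*R^{-1}*B'*P + Q = 0
K = -R\(B'*P);              % u(t) = K x(t)
% equivalently: K = -lqr(A, B, Q, R)
```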
Output feedback linear quadratic control

Problem formulation

$$\min_{u(t)=\mu(t,I(t))} E\left[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\, dt + x(T)^\top Q_T x(T)\right]$$

$$\dot{x}(t) = Ax(t) + Bu(t) + w(t)$$
$$y(t) = Cx(t) + n(t)$$

- $I(t) = \{y(s), u(s) \mid s \in [0,t)\}$: information set
- $w(t)$: zero-mean Gaussian white noise with $E[w(t)w(t+\tau)^\top] = W\delta(\tau)$
- $n(t)$: zero-mean Gaussian white noise with $E[n(t)n(t+\tau)^\top] = V\delta(\tau)$
- $x(0)$: Gaussian random vector with mean $\bar{x}_0$ and covariance $\bar{\Phi}_0$
Discussion

• The solution to this problem is analogous to the discrete-time case.
• In particular, there is also a separation principle: the optimal controller consists of an optimal estimator (Kalman filter) combined with an optimal state-feedback controller (LQR).
• The derivation of the Kalman filter (in continuous time known as the Kalman-Bucy filter) and of this result is mathematically quite involved.
• We simply state the results next, without further justification.
Kalman-Bucy filter

Consider the problem of finding an estimator $\hat{x}$ for the state of

$$\dot{x}(t) = Ax(t) + Bu(t) + w(t), \qquad E[w(t)w(t+\tau)^\top] = W\delta(\tau)$$

as a function of the information set, which includes the measurements

$$y(t) = Cx(t) + n(t), \qquad E[n(t)n(t+\tau)^\top] = V\delta(\tau)$$

where $w(t)$ and $n(t)$ are zero-mean Gaussian white noise and the initial state is a Gaussian random vector with mean $\bar{x}_0$ and covariance $\bar{\Phi}_0$. The optimal estimator, in the sense that it minimizes

$$c^\top E[(\hat{x}(t) - x(t))(\hat{x}(t) - x(t))^\top \mid I(t)]\, c$$

for any constant vector $c$, is the Kalman-Bucy filter

$$\dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L(t)(y(t) - C\hat{x}(t)), \qquad \hat{x}(0) = \bar{x}_0$$

$$\dot{\Phi}(t) = A\Phi(t) + \Phi(t)A^\top + W - \Phi(t)C^\top V^{-1} C \Phi(t), \qquad \Phi(0) = E[(x(0) - \bar{x}_0)(x(0) - \bar{x}_0)^\top] = \bar{\Phi}_0$$

$$L(t) = \Phi(t)\, C^\top V^{-1}, \qquad t \geq 0$$
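For the time-invariant case, the constant limit of $L(t)$ can be computed from the filter algebraic Riccati equation; a sketch using lqe from the Control System Toolbox, with assumed example data (note this gives the steady-state gain, not the time-varying $L(t)$ above):

```matlab
% Minimal sketch: steady-state Kalman-Bucy gain via lqe.
A = [0 1; -1 -1];  C = [1 0];  W = 0.01*eye(2);  V = 0.1;  % assumed data
[L, Phi] = lqe(A, eye(size(A,1)), C, W, V);  % L = Phi*C'/V at steady state
```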
LQG - Separation principle

The optimal control input for the output feedback linear quadratic optimal control problem is

$$u(t) = K(t)\hat{x}(t)$$
$$\dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L(t)(y(t) - C\hat{x}(t))$$

where

$$K(t) = -R^{-1}B^\top P(t), \qquad \dot{P}(t) = -\left(A^\top P(t) + P(t)A - P(t)BR^{-1}B^\top P(t) + Q\right), \qquad P(T) = Q_T, \quad t \in [0,T)$$

$$L(t) = \Phi(t)C^\top V^{-1}, \qquad \dot{\Phi}(t) = A\Phi(t) + \Phi(t)A^\top + W - \Phi(t)C^\top V^{-1}C\Phi(t), \qquad \Phi(0) = E[(x(0)-\bar{x}_0)(x(0)-\bar{x}_0)^\top] = \bar{\Phi}_0, \quad t \in [0,T)$$
LQG - Separation principle

If instead of the finite-horizon cost we consider

$$\min_{u(t)=\mu(t,I(t))} \lim_{T\to\infty} \frac{1}{T}\, E\left[\int_0^T x(t)^\top Q x(t) + u(t)^\top R u(t)\, dt\right] \qquad (1)$$

then the optimal control input for the output feedback linear quadratic optimal control problem with cost (1) is

$$u(t) = K\hat{x}(t), \qquad \dot{\hat{x}}(t) = A\hat{x}(t) + Bu(t) + L(y(t) - C\hat{x}(t))$$

where

$$K = -R^{-1}B^\top P, \qquad A^\top P + PA - PBR^{-1}B^\top P + Q = 0$$
$$L = \Phi C^\top V^{-1}, \qquad A\Phi + \Phi A^\top + W - \Phi C^\top V^{-1} C \Phi = 0$$
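A minimal sketch assembling this steady-state LQG controller in MATLAB (the system and weight matrices are assumed example data):

```matlab
% Minimal sketch: infinite-horizon LQG controller from the two AREs.
A = [0 1; -1 -1];  B = [0; 1];  C = [1 0];                 % assumed data
Q = eye(2);  R = 1;  W = 0.01*eye(2);  V = 0.1;
K = -lqr(A, B, Q, R);                  % control gain (sign as in the slides)
L = lqe(A, eye(size(A,1)), C, W, V);   % steady-state Kalman-Bucy gain
% closing the loop with u = K*xhat gives the observer-based controller
% xhat_dot = (A + B*K - L*C)*xhat + L*y,   u = K*xhat
ctrl = ss(A + B*K - L*C, L, K, 0);
```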
Inverted pendulum example

For the model provided in Lecture II_1, slide 32 (state feedback, for simplicity), let us compare discrete-time and continuous-time gains:

```matlab
clear all, close all, clc

% definition of the continuous-time model
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m) - m^2*l^2;
Ac = [0 1               0              0;
      0 -(I+m*l^2)*b/p  (m^2*g*l^2)/p  0;
      0 0               0              1;
      0 -(m*l*b)/p      m*g*l*(M+m)/p  0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];

% cost weights
Q = diag([1 1 1 1]);
R = 1;
S = zeros(4,1);

% discretization
n = 4;
tau = 0.01;
sysd = c2d(ss(Ac,Bc,zeros(1,n),0), tau);
A = sysd.a; B = sysd.b;

% LQR control discrete time
K = dlqr(A,B,Q,R,S); K = -K;

% continuous-time
Kc = lqr(Ac,Bc,Q,R,S); Kc = -Kc;
```
Inverted pendulum example

Continuous-time gains (policy $u(t) = K_c x(t)$):

$$K_c = \begin{bmatrix} 1.0000 & 2.3674 & -7.8509 & -33.1623 \end{bmatrix}$$

Discrete-time gains (policy $u_k = K x_k$):

$$\tau = 0.1: \quad K = \begin{bmatrix} 0.5955 & 1.4650 & -5.9529 & -25.3322 \end{bmatrix}$$
$$\tau = 0.01: \quad K = \begin{bmatrix} 0.9495 & 2.2551 & -7.6156 & -32.1930 \end{bmatrix}$$
$$\tau = 0.001: \quad K = \begin{bmatrix} 0.9948 & 2.3559 & -7.8269 & -33.0632 \end{bmatrix}$$

(converging to the continuous-time gains as expected)