Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning
Stefan Schaal
Max Planck Institute for Intelligent Systems, Tübingen, Germany
& Computer Science, Neuroscience, & Biomedical Engineering, University of Southern California, Los Angeles
sschaal@is.mpg.de
http://www-amd.is.tuebingen.mpg.de
Where Did We Stop ...
Outline • A Bit of Robotics History • Foundations of Control • Adaptive Control • Learning Control - Model-based Robot Learning - Reinforcement Learning
What Needs to Be Learned in Learning Control?
• Internal Models
• Coordinate Transformations
• Control Policies
• Value Functions
The majority of the learning problems involve function approximation, unsupervised learning, & classification.
Learning Internal Models
• Forward Models
- model the causal functional relationship $y = f(x)$
- for example: $\ddot{q} = B(q)^{-1}\left(\tau - C(q,\dot{q})\,\dot{q} - G(q)\right)$
• Inverse Models
- model the inverse of the causal functional relationship $x = f^{-1}(y)$
- for example: $\tau = B(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q)$
- NOTE: inverse models are not necessarily functions any more!
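The forward/inverse dynamics pair above can be sketched for a single pendulum joint; the parameters m, l, g below are assumed for illustration and are not from the slides (for one joint the Coriolis term vanishes, so C = 0):

```python
import numpy as np

# Assumed parameters of a 1-DOF pendulum (illustration only)
m, l, g = 1.0, 0.5, 9.81

def forward_model(q, qd, tau):
    """Forward dynamics: qdd = B(q)^-1 (tau - C(q,qd) qd - G(q))."""
    B = m * l**2                 # inertia term B(q)
    C = 0.0                      # Coriolis/centripetal term vanishes for 1 DOF
    G = m * g * l * np.sin(q)    # gravity term G(q)
    return (tau - C * qd - G) / B

def inverse_model(q, qd, qdd):
    """Inverse dynamics: tau = B(q) qdd + C(q,qd) qd + G(q)."""
    return m * l**2 * qdd + 0.0 * qd + m * g * l * np.sin(q)
```

For this 1-DOF case the inverse model happens to be a proper function: applying it to the output of the forward model recovers the original torque.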
Inverse Models May Not Be Trivially Learnable
Inverse Models May Not Be Trivially Learnable
$t = f(\theta_1^1, \theta_2^1)$
$t = f(\theta_1^2, \theta_2^2)$
What is $f^{-1}(t)$?
Two distinct joint configurations reach the same endpoint $t$, so the inverse mapping is not unique.
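The ambiguity can be made concrete with planar two-link forward kinematics (link lengths l1, l2 are assumed for illustration): the "elbow-up" and "elbow-down" configurations map to the same endpoint, so $f^{-1}(t)$ is not a function.

```python
import numpy as np

l1, l2 = 1.0, 0.8  # assumed link lengths

def fkin(th1, th2):
    """Planar 2-link forward kinematics t = f(theta1, theta2)."""
    return np.array([l1 * np.cos(th1) + l2 * np.cos(th1 + th2),
                     l1 * np.sin(th1) + l2 * np.sin(th1 + th2)])

# one configuration, and its "elbow-flipped" counterpart
th1, th2 = 0.3, 0.8
gamma = np.arctan2(l2 * np.sin(th2), l1 + l2 * np.cos(th2))
th1b, th2b = th1 + 2 * gamma, -th2   # different joints, same endpoint
```

Both configurations yield the same task-space point, which is exactly why direct supervised learning of the inverse can average incompatible solutions.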
Characteristics of Function Approximation in Robotics
• Incremental Learning
- large amounts of data
- continual learning
- functions to be approximated have growing and unknown complexity
• Fast Learning
- data efficient
- computationally efficient
- real-time
• Robust Learning
- minimal interference
- hundreds of inputs
Linear Regression: One of the Simplest Function Approximation Methods
Recall the simple adaptive control model with $f(x) = \theta x$:
- find the line through all data points
- imagine a spring attached between the line and each data point
- all springs have the same spring constant
- points far away generate more "force" (danger of outliers)
- springs are vertical
- the solution is the minimum-energy solution achieved by the springs
Linear Regression: One of the Simplest Function Approximation Methods
• The data-generating model:
$y = \tilde{w}^T \tilde{x} + w_0 + \varepsilon = w^T x + \varepsilon$, where $x = \left[\tilde{x}^T, 1\right]^T$, $w = \left[\tilde{w}^T, w_0\right]^T$, $E\{\varepsilon\} = 0$
• The least-squares cost function:
$J = \frac{1}{2}(t - y)^T(t - y) = \frac{1}{2}(t - Xw)^T(t - Xw)$
where $t = \left[t_1, t_2, \ldots, t_n\right]^T$ and $X$ has rows $x_1^T, x_2^T, \ldots, x_n^T$
• Minimizing the cost gives the least-squares solution:
$\frac{\partial J}{\partial w} = \frac{\partial}{\partial w}\left(\frac{1}{2}(t - Xw)^T(t - Xw)\right) = -(t - Xw)^T X = -t^T X + w^T X^T X = 0$
thus: $t^T X = w^T X^T X$, or $X^T t = X^T X w$
result: $w = \left(X^T X\right)^{-1} X^T t$
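The closed-form solution can be checked numerically; the data below are synthetic, with assumed true weights [2, −1] and bias 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: t = w~^T x~ + w0 + noise (assumed example values)
n = 200
x_tilde = rng.uniform(-1, 1, (n, 2))
t = x_tilde @ np.array([2.0, -1.0]) + 0.5 + 0.01 * rng.standard_normal(n)

# Augment inputs with a bias column: x = [x~^T, 1]^T
X = np.hstack([x_tilde, np.ones((n, 1))])

# Least-squares solution w = (X^T X)^-1 X^T t
# (solving the normal equations; lstsq would be the numerically safer route)
w = np.linalg.solve(X.T @ X, X.T @ t)
```

The last component of w estimates the bias w0, mirroring the augmented-input convention on the slide.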
Recursive Least Squares: An Incremental Version of Linear Regression
• Based on the matrix inversion theorem:
$(A - BC)^{-1} = A^{-1} + A^{-1}B\left(I - CA^{-1}B\right)^{-1}CA^{-1}$
• Incremental updating of a linear regression model:
Initialize: $P^{0} = \frac{1}{\gamma} I$ where $\gamma \ll 1$ (note $P \equiv \left(X^T X\right)^{-1}$)
For every new data point $(x, t)$ (note that $x$ includes the bias):
$P^{n+1} = \frac{1}{\lambda}\left(P^{n} - \frac{P^{n} x x^T P^{n}}{\lambda + x^T P^{n} x}\right)$, where $\lambda = 1$ if no forgetting, $\lambda < 1$ if forgetting
$w^{n+1} = w^{n} + P^{n+1} x \left(t - w^{nT} x\right)$
- NOTE: RLS gives exactly the same solution as linear regression if there is no forgetting
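A sketch of the RLS recursion above in NumPy (the values of γ, λ, and the synthetic data are assumptions for illustration); with λ = 1 and a small γ, it matches the batch least-squares solution up to the tiny bias introduced by the initialization of P:

```python
import numpy as np

def rls_update(P, w, x, t, lam=1.0):
    """One recursive least-squares step (lam = 1: no forgetting)."""
    P = (P - (P @ np.outer(x, x) @ P) / (lam + x @ P @ x)) / lam
    w = w + P @ x * (t - w @ x)
    return P, w

rng = np.random.default_rng(1)
X = np.hstack([rng.uniform(-1, 1, (100, 2)), np.ones((100, 1))])
t = X @ np.array([1.5, -0.5, 0.2])

gamma = 1e-6
P = np.eye(3) / gamma        # P approximates (X^T X)^-1
w = np.zeros(3)
for x_i, t_i in zip(X, t):   # one sweep through the data, one point at a time
    P, w = rls_update(P, w, x_i, t_i)
```

No matrix inversion is ever performed: the matrix inversion lemma turns the d×d inverse into a scalar division per update.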
Making Linear Regression Nonlinear: Locally Weighted Regression
[Figure: a receptive field with its region of validity, kernel activation w, and local linear model]
Each local model minimizes a weighted cost:
$J = \sum_{i=1}^{N} w_i \left(y_i - x_i^T \beta\right)^2$
Note: GPs, SVR, mixture models, etc., are other ways to obtain nonlinear regression.
Locally Weighted Regression
• Piecewise linear function approximation
• Each local model is learned from only local data
• No over-fitting due to too many local models (unlike RBFs or mixtures of experts)
Locally Weighted Regression
Linear model: $y = \beta^T \tilde{x} + \beta_0 = \beta^T x$, where $x = \left[\tilde{x}^T, 1\right]^T$
learned with recursive weighted least squares:
$\beta_k^{n+1} = \beta_k^{n} + w\, P_k^{n+1} x \left(y - x^T \beta_k^{n}\right)$
$P_k^{n+1} = \frac{1}{\lambda}\left(P_k^{n} - \frac{P_k^{n} x x^T P_k^{n}}{\frac{\lambda}{w} + x^T P_k^{n} x}\right)$
Weighting kernel: $w = \exp\left(-\frac{1}{2}(x - c)^T D\, (x - c)\right)$, where $D = M^T M$
learned with gradient descent in a penalized leave-one-out cross-validation (PRESS) cost function:
$M^{n+1} = M^{n} - \alpha \frac{\partial J}{\partial M}$
$J = \frac{1}{\sum_{i=1}^{N} w_{k,i}} \sum_{i=1}^{N} w_{k,i} \left(y_i - \hat{y}_{k,i,-i}\right)^2 + \gamma \sum_{i,j=1}^{n} D_{k,ij}^2$
Combined prediction: $y = \frac{\sum_{k=1}^{K} w_k y_k}{\sum_{k=1}^{K} w_k}$
Add a model when $\min_k w_k < w_{gen}$: create a new RF at $c_{K+1} = x$
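A toy batch sketch of the ingredients above for scalar inputs: Gaussian receptive fields, locally weighted linear fits, and the normalized combined prediction. The incremental updates and the distance-metric learning are omitted, and the kernel width D is an assumed constant rather than learned:

```python
import numpy as np

def lwr_predict(x_query, X, y, centers, D=50.0):
    """Toy batch locally weighted regression for scalar inputs.

    Each receptive field k fits a weighted linear model beta_k around
    its center c_k; the predictions y_k are blended by the normalized
    kernel activations w_k, as in the combined prediction above.
    """
    Xb = np.stack([X, np.ones_like(X)], axis=1)        # augmented inputs [x, 1]
    preds, acts = [], []
    for c in centers:
        w = np.exp(-0.5 * D * (X - c) ** 2)            # kernel weights w_k,i
        WXb = Xb * w[:, None]
        beta = np.linalg.solve(Xb.T @ WXb, WXb.T @ y)  # weighted least squares
        preds.append(beta[0] * x_query + beta[1])      # local prediction y_k
        acts.append(np.exp(-0.5 * D * (x_query - c) ** 2))
    acts = np.array(acts)
    return float(np.dot(acts, preds) / acts.sum())     # normalized blend
```

Because each β_k is fit only from data near c_k, adding more receptive fields refines the fit locally without destabilizing distant regions.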
Locally Weighted Regression
[Figure: training data and the learned fit of the "cross" function $z = \max\left(\exp(-10x^2),\ \exp(-50y^2),\ 1.25\exp\left(-5(x^2 + y^2)\right)\right)$]
Locally Weighted Regression Inserted into Adaptive Control
Locally Weighted Regression
Learn a forward model of the task dynamics, then compute the controller
Criticism of Locally Weighted Learning
• Breaks down in high-dimensional spaces
• Computationally expensive and numerically brittle due to (incremental) d×d matrix inversion
• Not compatible with modern probabilistic statistical learning algorithms
• Too many "manual tuning parameters"
The Curse of Dimensionality
• The power of local learning comes from exploiting the discriminative power of local neighborhood relations.
• But the notion of "local" breaks down in high-dimensional spaces!
The Curse of Dimensionality
Movement data is locally low dimensional
[Figure: probability distribution over local dimensionality (1 to 105), derived with Bayesian factor analysis]
Thus, locally weighted learning can work if used with local dimensionality reduction!
A Bayesian Approach to Locally Weighted Learning
• Linear regression as a graphical model:
$y_i = x_i^T \beta + \varepsilon$, $\varepsilon \sim N\left(0, \psi_y\right)$
$\beta = \left(X^T X\right)^{-1} X^T y$
A Bayesian Approach to Locally Weighted Learning
• Inserting a partial-least-squares-like projection as a set of hidden variables:
$z_{i,m} = x_{i,m} \beta_m + \eta_m$, $\quad y_i = \sum_{m=1}^{d} z_{i,m} + \varepsilon$
$\varepsilon \sim N\left(0, \psi_y\right)$, $\quad \eta_m \sim N\left(0, \psi_{z,m}\right)$
A Bayesian Approach to Locally Weighted Learning
• Robust linear regression with automatic relevance determination (ARD, sparsification):
$z_{i,m} = x_{i,m} \beta_m + \eta_m$, $\quad y_i = \sum_{m=1}^{d} z_{i,m} + \varepsilon$
$\varepsilon \sim N\left(0, \psi_y\right)$, $\quad \eta_m \sim N\left(0, \psi_{z,m}\right)$
$\beta_m \sim N\left(0, \frac{1}{\alpha_m}\right)$, $\quad \alpha_m \sim \mathrm{Gamma}\left(a_\alpha, b_\alpha\right)$
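The pruning effect of the ARD prior can be sketched with the evidence-approximation fixed-point updates for plain linear regression; this is a simplified stand-in for the full variational model on the slide, and all variable names and data are illustrative assumptions:

```python
import numpy as np

def ard_regression(X, y, n_iter=50):
    """Evidence-approximation ARD for linear regression.

    Each weight has prior N(0, 1/alpha_m); alpha_m grows large for
    irrelevant inputs, shrinking their posterior mean weights to zero.
    """
    n, d = X.shape
    alpha = np.ones(d)            # precision of each weight prior
    psi = np.var(y)               # output noise variance (initial guess)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(X.T @ X / psi + np.diag(alpha))
        mu = Sigma @ X.T @ y / psi
        gamma = 1.0 - alpha * np.diag(Sigma)   # effective d.o.f. per weight
        alpha = np.minimum(gamma / np.maximum(mu**2, 1e-12), 1e10)
        psi = np.sum((y - X @ mu) ** 2) / max(n - gamma.sum(), 1e-12)
    return mu, alpha
```

On data where only some inputs matter, the posterior means of the irrelevant weights are driven toward zero, which is the sparsification the slide refers to.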
A Full Bayesian Treatment of Locally Weighted Learning
• The final model for full Bayesian parameter adaptation for regression and locality
[Graphical model: for i = 1,…,N, the output y_i (noise ψ_y) depends on hidden variables z_i1,…,z_id (noises ψ_z1,…,ψ_zd) with regression parameters b_1,…,b_d; the inputs x_i1,…,x_id enter through locality weights w_i1,…,w_id governed by kernel parameters h_1,…,h_d]
Locally Weighted Learning In High Dimensional Spaces
• Learning the "cross" function in 20-dimensional space
[Figure: true function z(x, y) and the learned approximation]
Locally Weighted Learning In High Dimensional Spaces
• Learning the "cross" function in 20-dimensional space
[Figure: nMSE on test set vs. number of training data points (1,000 to 100,000) for the 2D-, 10D-, and 20D-cross; second axis shows #receptive fields / average #projections]
Locally Weighted Learning In High Dimensional Spaces • Learning inverse kinematics in 60 dimensional space
Locally Weighted Learning In High Dimensional Spaces • Skill learning
Outline • A Bit of Robotics History • Foundations of Control • Adaptive Control • Learning Control - Model-based Robot Learning - Reinforcement Learning
Given: A Parameterized Policy and a Controller
Note: we are now starting to address planning, i.e., where do desired trajectories come from?
Trial & Error Learning: Reinforcement Learning from Trajectories
• Problem:
- How can a motor system learn a novel motor skill?
- Reinforcement learning is a general approach to this problem, but little work has been done on scaling it to the high-dimensional continuous state-action domains of humans
• Approach:
- Teach the initial skill with imitation learning, using a parameterized control policy
- Provide an objective function for the skill
- Perform trial-and-error learning from exploratory trajectories