Slide from Pieter Abbeel
• Gaussian with mean (µ) and standard deviation (σ)
$$X \sim N(\mu, \sigma^2),\quad Y = aX + b \;\;\Rightarrow\;\; Y \sim N(a\mu + b,\; a^2\sigma^2)$$
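As a quick illustrative check (numbers are my own, not from the slide): if $X \sim N(1, 4)$ and $Y = 3X + 2$, then $Y \sim N(3\cdot 1 + 2,\; 3^2\cdot 4) = N(5, 36)$.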
$$\left.\begin{aligned} X_1 &\sim N(\mu_1, \sigma_1^2)\\ X_2 &\sim N(\mu_2, \sigma_2^2)\end{aligned}\right\}
\;\Rightarrow\; p(X_1)\cdot p(X_2) \sim N\!\left(\frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}\,\mu_1 + \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\,\mu_2,\;\; \frac{1}{\sigma_1^{-2}+\sigma_2^{-2}}\right)$$
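For instance (again with illustrative numbers), fusing $X_1 \sim N(0, 1)$ with $X_2 \sim N(2, 1)$ gives mean $\tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 2 = 1$ and variance $\tfrac{1}{1+1} = \tfrac{1}{2}$: the product is more certain than either factor alone.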
Picture from [Bishop: Pattern Recognition and Machine Learning, 2006]
$$p(\mathbf{x}) = N(\boldsymbol{\mu}, \Sigma), \qquad
\mathbf{x} = \begin{pmatrix} x_a \\ x_b \end{pmatrix},\;\;
\boldsymbol{\mu} = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix},\;\;
\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}$$
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\; e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})}$$
Slide from Pieter Abbeel " µ = [1; 0] " µ = [-.5; 0] " µ = [-1; -1.5] " Σ = [1 0; 0 1] " Σ = [1 0; 0 1] " Σ = [1 0; 0 1] 10/6/16 CSE-571: Robotics 6
Slide from Pieter Abbeel
• µ = [0; 0], Σ = [1 0; 0 1]
• µ = [0; 0], Σ = [.6 0; 0 .6]
• µ = [0; 0], Σ = [2 0; 0 2]
Slide from Pieter Abbeel " µ = [0; 0] " µ = [0; 0] " µ = [0; 0] " Σ = [1 0; 0 1] " Σ = [1 0.5; 0.5 1] " Σ = [1 0.8; 0.8 1] 10/6/16 CSE-571: Robotics 8
Slide from Pieter Abbeel " µ = [0; 0] " µ = [0; 0] " µ = [0; 0] " Σ = [1 -0.5 ; -0.5 1] " Σ = [1 -0.8 ; -0.8 1] " Σ = [3 0.8 ; 0.8 1] 1 3 10/6/16 CSE-571: Robotics 9
Pictures from [Bishop: PRML, 2006]
• Marginalizing the joint distribution results in a Gaussian:
$$p(x_a) = \int p(x_a, x_b)\, dx_b, \qquad
p\!\left(\begin{bmatrix} x_a \\ x_b \end{bmatrix}\right) = N\!\left(\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right)
\;\;\Rightarrow\;\; p(x_a) = N(\mu_a, \Sigma_{aa})$$
• Conditioning also leads to a Gaussian:
$$p(x_a \mid x_b) = N(\mu_{a|b}, \Sigma_{a|b})$$
$$\mu_{a|b} = \underbrace{\mu_a}_{\text{prior mean}} + \underbrace{\Sigma_{ab}}_{\text{cross-covariance}}\, \underbrace{\Sigma_{bb}^{-1}}_{\text{prior variance (b)}}\, (\underbrace{x_b}_{\text{observed value}} - \mu_b)$$
$$\Sigma_{a|b} = \underbrace{\Sigma_{aa}}_{\text{prior variance (a)}} - \underbrace{\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}}_{\text{shrink term } (\geq 0)}$$
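A minimal NumPy sketch of these two operations, following the block formulas above; the joint covariance and the observed value of x_b are illustrative numbers, not from the slides.

```python
import numpy as np

# Joint Gaussian over (x_a, x_b), each 1-D here for simplicity.
mu = np.array([1.0, 2.0])                  # [mu_a, mu_b]
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])             # [[S_aa, S_ab], [S_ba, S_bb]]

mu_a, mu_b = mu[:1], mu[1:]
S_aa, S_ab = Sigma[:1, :1], Sigma[:1, 1:]
S_ba, S_bb = Sigma[1:, :1], Sigma[1:, 1:]

# Marginal: p(x_a) = N(mu_a, S_aa) -- simply drop the other block.
# Conditional: p(x_a | x_b) = N(mu_a|b, S_a|b).
x_b_obs = np.array([3.0])                  # observed value of x_b
mu_a_given_b = mu_a + S_ab @ np.linalg.solve(S_bb, x_b_obs - mu_b)
S_a_given_b = S_aa - S_ab @ np.linalg.solve(S_bb, S_ba)

print(mu_a_given_b, S_a_given_b)           # conditioning shrinks the variance
```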
• Modeling the relationship between real-valued variables in data
  – Sensor models, dynamics models, stock market, etc.
• Two broad classes of models:
  – Parametric:
    – Learn a model of the data, use the model to make new predictions
    – E.g., linear, non-linear, neural networks, etc.
  – Non-parametric:
    – Keep the data around and use it to make new predictions
    – E.g., nearest-neighbor methods, locally weighted regression, Gaussian processes, etc.
• Idea: Summarize the data using a learned model
  – Linear, polynomial
  – Neural networks, etc.
• Computationally efficient; trade off complexity vs. generalization
[Figure: "Parametric models" — training set fit by linear and polynomial models]
• Idea: Use the nearest neighbor's prediction (with some interpolation)
  – Non-parametric; keeps all the data
  – E.g., 1-NN, NN with linear interpolation
• Easy, but needs a lot of data
  – Best you can do in the limit of infinite data
• Computationally expensive in high dimensions
[Figure: "Non-parametric models" — training set with 1-NN and NN-linear predictions]
• Idea: Interpolate based on "close" training data
  – Closeness defined using a "kernel" function
  – Test output is a weighted interpolation of training outputs
  – Locally weighted regression, Gaussian processes
• Can model arbitrary (smooth) functions
  – Need to keep around some (maybe all) training data
[Figure: "Smooth non-parametric models" — training set with LWR-NN, GP, and GP-variance predictions]
• Non-parametric regression model
• Distribution over functions
• Fully specified by the training data, a mean function, and a covariance function
• Covariance given by a "kernel", which measures the distance between inputs in kernel space
• Given inputs (x) and targets (y):
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = (X, \mathbf{y})$$
• GPs model the targets as a noisy function of the inputs:
$$y_i = f(x_i) + \varepsilon; \qquad \varepsilon \sim N(0, \sigma_n^2)$$
• Formally, a GP is a collection of random variables, any finite number of which have a joint Gaussian distribution:
$$f(x) \sim GP(m(x), k(x, x'))$$
$$m(x) = E[f(x)], \qquad k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))]$$
• Given a (finite) set of inputs X, a GP models the outputs y as jointly Gaussian (with noise variance σₙ²):
$$P(\mathbf{y} \mid X) = N\!\left(m(X),\; K(X, X) + \sigma_n^2 I\right)$$
$$m = \begin{pmatrix} m(x_1) \\ m(x_2) \\ \vdots \\ m(x_n) \end{pmatrix}, \qquad
K = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$$
• Usually, we assume a zero-mean prior
  – Can define other mean functions (constant, polynomials, etc.)
• The covariance matrix K is defined through the "kernel" function:
  – Specifies the covariance of the outputs as a function of the inputs
• Example: squared-exponential (SE) kernel
  – Covariance decreases with distance in input space
  – Similar input points will have similar outputs
$$k(x, x') = \sigma_f^2\; e^{-\frac{1}{2}(x - x')^T W (x - x')}$$
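A minimal sketch of the SE kernel and the resulting covariance matrix K(X, X). It assumes a scalar length scale (i.e., W = I / ℓ²); the function and variable names are my own, not from the slides.

```python
import numpy as np

def se_kernel(x1, x2, sigma_f=1.0, length_scale=1.0):
    """k(x, x') = sigma_f^2 * exp(-0.5 * ||x - x'||^2 / length_scale^2)."""
    d2 = np.sum((x1 - x2) ** 2)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / length_scale ** 2)

def kernel_matrix(X1, X2, **kw):
    """Pairwise kernel evaluations between two sets of inputs."""
    return np.array([[se_kernel(a, b, **kw) for b in X2] for a in X1])

X = np.linspace(0, 10, 5).reshape(-1, 1)   # 5 one-dimensional inputs
K = kernel_matrix(X, X)                    # 5x5 covariance of the outputs
```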
Pictures from [Bishop: PRML, 2006]
• GP prior: outputs are jointly zero-mean Gaussian:
$$P(\mathbf{y} \mid X) = N(\mathbf{0},\; K + \sigma_n^2 I)$$
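A short sketch of what this prior means in practice: each draw from the zero-mean joint Gaussian is one "function" evaluated at the inputs. It reuses kernel_matrix from the earlier sketch; the noise level is illustrative.

```python
import numpy as np

X = np.linspace(0, 10, 100).reshape(-1, 1)
sigma_n = 0.1
K = kernel_matrix(X, X) + sigma_n ** 2 * np.eye(len(X))

# Three sample functions from the GP prior, evaluated at the inputs X.
samples = np.random.multivariate_normal(np.zeros(len(X)), K, size=3)
```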
• Training data:
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} = (X, \mathbf{y})$$
• Test pair (y* unknown): $\{x_*, y_*\}$
• GP outputs are jointly Gaussian:
$$P(\mathbf{y}, y_* \mid X, x_*) = N(\mu, \Sigma); \qquad P(\mathbf{y} \mid X) = N(\mathbf{0},\; K + \sigma_n^2 I)$$
• Conditioning on y (recall the Gaussian conditional: $p(x_a \mid x_b) = N(\mu_{a|b}, \Sigma_{a|b})$ with $\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and $\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$):
$$P(y_* \mid x_*, \mathbf{y}, X) = N(\mu_*, \sigma_*^2)$$
$$\mu_* = k_*^T \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y}$$
$$\sigma_*^2 = k_{**} - k_*^T \left(K + \sigma_n^2 I\right)^{-1} k_*$$
$$k_*[i] = k(x_*, x_i); \qquad k_{**} = k(x_*, x_*)$$
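A minimal sketch of GP prediction by conditioning, implementing the equations above with a zero-mean prior; it reuses se_kernel and kernel_matrix from the earlier sketch, and the toy data at the end is illustrative.

```python
import numpy as np

def gp_predict(X, y, x_star, sigma_n=0.1, **kernel_kw):
    """Return predictive mean and variance at a single test input x_star."""
    K = kernel_matrix(X, X, **kernel_kw) + sigma_n ** 2 * np.eye(len(X))
    k_star = kernel_matrix(X, np.atleast_2d(x_star), **kernel_kw)[:, 0]  # k(x*, x_i)
    k_ss = se_kernel(x_star, x_star, **kernel_kw)                        # k(x*, x*)

    alpha = np.linalg.solve(K, y)                       # (K + sigma_n^2 I)^{-1} y
    mu_star = k_star @ alpha                            # predictive mean
    var_star = k_ss - k_star @ np.linalg.solve(K, k_star)  # predictive variance
    return mu_star, var_star

# Example usage with toy data:
X_train = np.linspace(0, 10, 20).reshape(-1, 1)
y_train = np.sin(X_train[:, 0]) + 0.1 * np.random.randn(20)
mu, var = gp_predict(X_train, y_train, np.array([5.0]))
```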
• Noise standard deviation (σₙ)
  – Affects how strongly a new observation changes the predictions (and the covariance)
• Kernel (choose based on the data)
  – SE, exponential, Matérn, etc.
• Kernel hyperparameters, e.g., for the SE kernel $k(x, x') = \sigma_f^2\, e^{-\frac{1}{2}(x - x')^T W (x - x')}$:
  – Length scale (how fast the function changes)
  – Scale factor (how large the function variance is)
Pictures from [Bishop: PRML, 2006]
$$k(x, x') = \theta_0 \exp\!\left(-\frac{\theta_1}{2}\, \|x - x'\|^2\right) + \theta_2 + \theta_3\, x^T x'$$
• Maximize the data log likelihood:
$$\theta^* = \arg\max_{\theta}\; p(\mathbf{y} \mid X, \theta)$$
$$\log p(\mathbf{y} \mid X, \theta) = -\frac{1}{2}\mathbf{y}^T \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y} \;-\; \frac{1}{2}\log\left|K + \sigma_n^2 I\right| \;-\; \frac{n}{2}\log 2\pi$$
• Compute derivatives w.r.t. the parameters $\theta = \langle \sigma_n^2, \sigma_f^2, \ell \rangle$
• Optimize using conjugate gradient descent
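A minimal sketch of the log marginal likelihood as a function of the hyperparameters, reusing kernel_matrix from earlier. In practice the gradient is derived analytically and fed to a conjugate-gradient optimizer; here the negative of this value could simply be handed to a generic optimizer.

```python
import numpy as np

def log_marginal_likelihood(X, y, sigma_n, sigma_f, length_scale):
    n = len(X)
    K = kernel_matrix(X, X, sigma_f=sigma_f, length_scale=length_scale)
    K_y = K + sigma_n ** 2 * np.eye(n)
    L = np.linalg.cholesky(K_y)                         # stable solve and log-det
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K_y)^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                # = -0.5 * log |K_y|
            - 0.5 * n * np.log(2 * np.pi))
```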
• Learn hyperparameters via numerical methods
• Learn the noise model at the same time
• System:
  – Commercial blimp envelope with custom gondola
  – XScale-based computer with Bluetooth connectivity
  – Two main motors plus a tail motor (3D control)
  – Ground truth obtained via a VICON motion capture system
$$\dot{s} = \frac{d}{dt}\begin{bmatrix} \xi \\ v \\ \omega \end{bmatrix}
= \begin{bmatrix} H(\xi)\, v \\ M^{-1}\left(\sum \text{Forces} - \omega \times Mv\right) \\ J^{-1}\left(\sum \text{Torques} - \omega \times J\omega\right) \end{bmatrix}$$
• 12-D state = [pos, rot, transvel, rotvel]
• Describes the evolution of the state as an ODE
• Forces / torques considered: buoyancy, gravity, drag, thrust
• 16 parameters are learned by optimization on ground-truth motion capture data
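A schematic, runnable sketch of the rigid-body ODE structure above. The net body-frame force and torque (buoyancy, gravity, drag, thrust, computed from the 16 learned parameters) are assumed to be supplied by the parametric model and passed in; the pose-rate map H(ξ) is only stubbed with the identity here, so the names and that simplification are mine, not the slides'.

```python
import numpy as np

def blimp_state_derivative(xi, v, omega, forces, torques, M, J):
    """d/dt [xi, v, omega] for a 12-D state [pos, rot, transvel, rotvel]."""
    H = np.eye(6)                                   # placeholder for the true H(xi)
    xi_dot = H @ np.concatenate([v, omega])         # pose kinematics
    v_dot = np.linalg.solve(M, forces - np.cross(omega, M @ v))       # translational dynamics
    omega_dot = np.linalg.solve(J, torques - np.cross(omega, J @ omega))  # rotational dynamics
    return np.concatenate([xi_dot, v_dot, omega_dot])
```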
[Diagram: trajectory of states s₁, s₂, s₃, … with controls c₁, c₂, … and state changes Δs]
• Use the ground-truth state to extract dynamics data:
$$D_S = \left\langle [s_1, c_1], \Delta s_1 \right\rangle, \left\langle [s_2, c_2], \Delta s_2 \right\rangle, \ldots$$
• Learn the model using Gaussian process regression
  – Learns the process noise inherent in the system
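A minimal sketch of turning a ground-truth trajectory into this training set: inputs are (state, control) pairs and targets are the observed state changes. One GP is typically trained per output dimension (e.g., by calling gp_predict from earlier on each column of the targets); the array shapes are my assumption.

```python
import numpy as np

def make_dynamics_dataset(states, controls):
    """states: (T, d_s), controls: (T-1, d_c) -> inputs (T-1, d_s+d_c), targets (T-1, d_s)."""
    inputs = np.hstack([states[:-1], controls])   # [s_t, c_t]
    targets = states[1:] - states[:-1]            # Delta s_t = s_{t+1} - s_t
    return inputs, targets
```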
[Diagram: parametric prediction f([s₁, c₁]) vs. observed state change Δs₁]
• Combine the GP model with the parametric model: the GP is trained on the residuals
$$D_X = \left\langle [s_1, c_1],\; \Delta s_1 - f([s_1, c_1]) \right\rangle, \ldots$$
• Advantages
  – Captures aspects of the system not considered by the parametric model
  – Learns a noise model in the same way as GP-only models
  – Higher accuracy for the same amount of training data
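A minimal sketch of building the Enhanced-GP training targets: the GP only has to capture what the parametric physics model misses. Here `parametric_delta` is a hypothetical stand-in for the learned ODE model's one-step prediction, not a name from the slides.

```python
import numpy as np

def make_residual_dataset(states, controls, parametric_delta):
    """Targets are observed state changes minus the parametric model's prediction."""
    inputs = np.hstack([states[:-1], controls])
    predicted = np.array([parametric_delta(s, c) for s, c in zip(states[:-1], controls)])
    targets = (states[1:] - states[:-1]) - predicted   # Delta s - f([s, c])
    return inputs, targets
```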