Hope you had a FANTASTIC spring break!
CS 188: Artificial Intelligence Neural Nets (ctd) and IRL Instructor: Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Reminder: Linear Classifiers
§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
§ If the activation is:
    § Positive, output +1
    § Negative, output -1
[Diagram: inputs f_1, f_2, f_3 weighted by w_1, w_2, w_3, summed, then thresholded at >0]
Multiclass Logistic Regression
§ Multi-class linear classification
§ A weight vector for each class: w_y
§ Score (activation) of a class y: w_y · f(x)
§ Prediction w/ highest score wins: y = argmax_y w_y · f(x)
§ How to make the scores into probabilities?
z_1, z_2, z_3 → e^{z_1} / (e^{z_1} + e^{z_2} + e^{z_3}), e^{z_2} / (e^{z_1} + e^{z_2} + e^{z_3}), e^{z_3} / (e^{z_1} + e^{z_2} + e^{z_3})
original activations → softmax activations
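A minimal sketch of the softmax transformation above, in Python (the activation values are made up for illustration; subtracting the max is a standard numerical-stability trick):

    import numpy as np

    def softmax(z):
        # Subtract the max activation for numerical stability (result unchanged).
        z = np.asarray(z, dtype=float)
        e = np.exp(z - z.max())
        return e / e.sum()

    # Example with three made-up class activations z_1, z_2, z_3
    print(softmax([2.0, 1.0, 0.1]))   # -> roughly [0.66, 0.24, 0.10]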
Best w?
§ Maximum likelihood estimation:
max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
with: P(y^(i) | x^(i); w) = e^{w_{y^(i)} · f(x^(i))} / Σ_y e^{w_y · f(x^(i))}
= Multi-Class Logistic Regression
Gradient in n dimensions
∇g = [ ∂g/∂w_1, ∂g/∂w_2, …, ∂g/∂w_n ]
Optimization Procedure: Gradient Ascent
§ init w
§ for iter = 1, 2, …
    w ← w + α · ∇g(w)
§ α: learning rate --- tweaking parameter that needs to be chosen carefully
§ How? Try multiple choices
§ Crude rule of thumb: update changes w about 0.1 – 1%
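A minimal sketch of this procedure applied to the multi-class logistic regression objective from the "Best w?" slide, assuming the features are already stacked in a matrix F (the learning rate and iteration count are illustrative):

    import numpy as np

    def log_likelihood_grad(W, F, y):
        # Gradient of sum_i log P(y_i | x_i; W) for multi-class logistic regression.
        # W: (num_classes, num_features), F: (n, num_features), y: (n,) integer labels.
        Z = F @ W.T                                            # activations z_y = w_y . f(x)
        Z -= Z.max(axis=1, keepdims=True)                      # numerical stability
        P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)   # softmax probabilities
        Y = np.eye(W.shape[0])[y]                              # one-hot true labels
        return (Y - P).T @ F                                   # same shape as W

    def gradient_ascent(F, y, num_classes, alpha=0.1, iters=1000):
        W = np.zeros((num_classes, F.shape[1]))                # init w
        for _ in range(iters):                                 # for iter = 1, 2, ...
            W = W + alpha * log_likelihood_grad(W, F, y)       # w <- w + alpha * grad g(w)
        return W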
Neural Networks
Multi-class Logistic Regression
§ = special case of neural network
[Diagram: features f_1(x), f_2(x), f_3(x), …, f_K(x) feed linear activations z_1, z_2, z_3, which go through a softmax:]
P(y_1 | x; w) = e^{z_1} / (e^{z_1} + e^{z_2} + e^{z_3})
P(y_2 | x; w) = e^{z_2} / (e^{z_1} + e^{z_2} + e^{z_3})
P(y_3 | x; w) = e^{z_3} / (e^{z_1} + e^{z_2} + e^{z_3})
Deep Neural Network = Also learn the features!
[Diagram: inputs x_1, x_2, x_3, …, x_L feed hidden layers z^(1), z^(2), …, z^(n-1), z^(n) of widths K^(1), …, K^(n); the output activations z^(OUT) go through a softmax to give P(y_1 | x; w), P(y_2 | x; w), P(y_3 | x; w)]
z^(k)_i = g( Σ_j W^(k-1,k)_{i,j} · z^(k-1)_j )
g = nonlinear activation function
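A minimal sketch of the forward pass this recursion defines, assuming tanh for the nonlinearity g and made-up layer sizes and random weights:

    import numpy as np

    def forward(x, weights, g=np.tanh):
        # weights: list of matrices W^(k-1,k); the last one produces the output activations.
        z = np.asarray(x, dtype=float)
        for W in weights[:-1]:
            z = g(W @ z)               # z^(k) = g(W^(k-1,k) z^(k-1))
        out = weights[-1] @ z          # z^(OUT)
        out -= out.max()               # numerical stability
        e = np.exp(out)
        return e / e.sum()             # softmax -> P(y | x; w)

    # Illustrative sizes: 3 inputs, two hidden layers of width 4, 3 classes
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(3, 4))]
    print(forward([1.0, 0.5, -0.2], weights))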
Deep Neural Network: Also Learn the Features!
§ Training the deep neural network is just like logistic regression:
max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
just w tends to be a much, much larger vector ☺
→ just run gradient ascent + stop when log likelihood of hold-out data starts to decrease
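A sketch of that recipe, with ll and grad_ll assumed to be supplied (e.g. the log-likelihood and its gradient computed by backpropagation); the learning rate and iteration cap are illustrative:

    def train(w, ll, grad_ll, train_data, holdout_data, alpha=0.01, max_iters=10000):
        # Gradient ascent with early stopping on a hold-out set.
        # ll(w, data) and grad_ll(w, data) are assumed to return the log-likelihood
        # and its gradient with respect to the (large) parameter vector w.
        best_w, best_holdout = w, ll(w, holdout_data)
        for _ in range(max_iters):
            w = w + alpha * grad_ll(w, train_data)
            holdout = ll(w, holdout_data)
            if holdout < best_holdout:      # hold-out likelihood started to decrease
                return best_w               # stop, keep the best parameters seen so far
            best_w, best_holdout = w, holdout
        return best_w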
How well does it work?
Computer Vision
Object Detection
Manual Feature Design
Features and Generalization [HoG: Dalal and Triggs, 2005]
Features and Generalization Image HoG
Performance graph credit Matt Zeiler, Clarifai
Performance AlexNet graph credit Matt Zeiler, Clarifai
MS COCO Image Captioning Challenge Karpathy & Fei-Fei, 2015; Donahue et al., 2015; Xu et al., 2015; many more
Visual QA Challenge Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
Speech Recognition graph credit Matt Zeiler, Clarifai
Machine Translation Google Neural Machine Translation (in production)
What’s still missing? – correlation ≠ causation [Ribeiro et al.]
What’s still missing? – covariate shift [Carroll et al.]
What’s still missing? – knowing what loss to optimize
CS 188: Artificial Intelligence Neural Nets (ctd) and IRL Instructor: Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Reminder: Optimal Policies
[Gridworld figures: optimal policies under different living rewards R(s) = -0.01, R(s) = -0.03, R(s) = -0.4, R(s) = -2.0]
Utility? Clear utility function Not so clear utility function
Planning/RL: R → π*
Inverse Planning/RL: π* → R
Inverse Planning/RL: ξ → R
Inverse Planning/RL
IRL is relevant to all 3 types of people: its end-user, its designer, and the person in its environment.
Inverse Planning/RL
given: ξ_D
find: R(s, a)
s.t. R(ξ_D) ≥ R(ξ) ∀ξ
(here R(ξ) denotes the cumulative reward along trajectory ξ)
Inverse Planning/RL
given: ξ_D
find: R(s, a) = θᵀ φ(s, a)
s.t. R(ξ_D) ≥ R(ξ) ∀ξ
Inverse Planning/RL
given: ξ_D
find: R(s, a) = θᵀ φ(s, a)
s.t. R(ξ_D) ≥ max_ξ R(ξ)
Problem
given: ξ_D
find: R(s, a) = θᵀ φ(s, a)
s.t. R(ξ_D) ≥ max_ξ R(ξ)
→ a zero/constant reward is a solution
Revised formulation
given: ξ_D
find: R(s, a) = θᵀ φ(s, a)
s.t. R(ξ_D) ≥ max_ξ [ R(ξ) + m(ξ, ξ_D) ]
m(ξ, ξ_D): a margin that is small when ξ is close to the demonstration
Optimization
max_θ [ R(ξ_D) − max_ξ [ R(ξ) + m(ξ, ξ_D) ] ]
Optimization
max_θ [ θᵀ φ(ξ_D) − max_ξ [ θᵀ φ(ξ) + m(ξ, ξ_D) ] ]
(φ(ξ) = feature counts accumulated along trajectory ξ)
Optimization
max_θ [ θᵀ φ(ξ_D) − max_ξ [ θᵀ φ(ξ) + m(ξ, ξ_D) ] ]
ξ* = arg max_ξ [ θᵀ φ(ξ) + m(ξ, ξ_D) ]
Optimization
max_θ [ θᵀ φ(ξ_D) − max_ξ [ θᵀ φ(ξ) + m(ξ, ξ_D) ] ]
subgradient: ∇_θ = φ(ξ_D) − φ(ξ*)
Optimization
max_θ [ θᵀ φ(ξ_D) − max_ξ [ θᵀ φ(ξ) + m(ξ, ξ_D) ] ]
subgradient: ∇_θ = φ(ξ_D) − φ(ξ*)
θ_{k+1} = θ_k + β ( φ(ξ_D) − φ(ξ*) )
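A sketch of one update of this procedure, assuming a plan routine that returns the best trajectory under a (loss-augmented) reward; plan, features, margin, and the step size are illustrative placeholders, not part of the slides:

    import numpy as np

    def max_margin_irl_step(theta, phi_demo, plan, features, margin, step=0.1):
        # One subgradient step: theta <- theta + step * (phi(xi_D) - phi(xi*)).
        # plan(reward_fn) is assumed to return the trajectory maximizing reward_fn;
        # features(xi) returns the feature counts phi(xi); margin(xi) plays the role of m(xi, xi_D).
        xi_star = plan(lambda xi: theta @ features(xi) + margin(xi))   # inner max: xi*
        return theta + step * (phi_demo - features(xi_star))

    # Toy usage: two candidate trajectories, described only by their feature counts [rocks, grass]
    trajs = {"rocks": np.array([1.0, 0.0]), "grass": np.array([0.0, 1.0])}
    theta = max_margin_irl_step(
        theta=np.zeros(2),
        phi_demo=trajs["grass"],                    # the demonstration stays on grass
        plan=lambda r: max(trajs.values(), key=r),  # brute-force "planner" over the two options
        features=lambda xi: xi,                     # each trajectory is already its feature vector
        margin=lambda xi: 0.0,                      # margin omitted in this toy
    )
    print(theta)   # rocks weight goes down, grass weight goes up: [-0.1  0.1]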
Interpretation
φ(ξ*): goes on rocks → [1, 0]
φ(ξ_D): goes on grass → [0, 1]
θ_{k+1} = θ_k + β ( φ(ξ_D) − φ(ξ*) )
Interpretation
φ(ξ*): goes on rocks → [1, 0]
φ(ξ_D): goes on grass → [0, 1]
θ_{k+1} = θ_k + β ( φ(ξ_D) − φ(ξ*) ) = θ_k + β [-1, 1]
Interpretation
φ(ξ*): goes on rocks → [1, 0]
φ(ξ_D): goes on grass → [0, 1]
θ_{k+1} = θ_k + β [-1, 1]
rocks weight goes down, grass weight goes up
Interpretation
φ(ξ*): goes on rocks → [1, 0]
φ(ξ_D): goes on grass → [0, 1]
rocks weight goes down, grass weight goes up
The new reward likes grass more and rocks less.
Inverse Planning/RL
Is the demonstrator really optimal?
R(ξ_D) ≥ R(ξ) ∀ξ
The Bayesian view
P(ξ_D | θ)
ξ_D: evidence, θ: hidden
The Bayesian view
P(ξ_D | θ) ∝ e^{γ θᵀ φ(ξ_D)}
The Bayesian view
P(ξ_D | θ) = e^{γ θᵀ φ(ξ_D)} / Σ_ξ e^{γ θᵀ φ(ξ)}
The Bayesian view
P(ξ_D | θ) = e^{γ θᵀ φ(ξ_D)} / Σ_ξ e^{γ θᵀ φ(ξ)}
P(θ | ξ_D) ∝ P(θ) P(ξ_D | θ)
The Bayesian view
P(ξ_D | θ) = e^{γ θᵀ φ(ξ_D)} / Σ_ξ e^{γ θᵀ φ(ξ)}
max_θ P(ξ_D | θ)
The Bayesian view
max_θ log [ e^{γ θᵀ φ(ξ_D)} / Σ_ξ e^{γ θᵀ φ(ξ)} ]
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
∇_θ = γ φ(ξ_D) − ( 1 / Σ_ξ e^{γ θᵀ φ(ξ)} ) · ∇_θ ( Σ_ξ e^{γ θᵀ φ(ξ)} )
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
∇_θ = γ φ(ξ_D) − ( 1 / Σ_ξ e^{γ θᵀ φ(ξ)} ) · Σ_ξ e^{γ θᵀ φ(ξ)} γ φ(ξ)
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
∇_θ = γ φ(ξ_D) − Σ_ξ [ e^{γ θᵀ φ(ξ)} / Σ_{ξ'} e^{γ θᵀ φ(ξ')} ] γ φ(ξ)
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
∇_θ = γ φ(ξ_D) − Σ_ξ P(ξ | θ) γ φ(ξ)
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
∇_θ = γ ( φ(ξ_D) − E_{ξ ~ P(·|θ)}[ φ(ξ) ] )
The Bayesian view
max_θ [ γ θᵀ φ(ξ_D) − log Σ_ξ e^{γ θᵀ φ(ξ)} ]
∇_θ = γ ( φ(ξ_D) − E_{ξ ~ P(·|θ)}[ φ(ξ) ] )
second term: expected feature values produced by the current reward
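A minimal sketch of this gradient, assuming (purely for illustration) a small enumerable set of candidate trajectories whose feature counts are stacked in a matrix Phi; in a real MDP the expectation would be computed by dynamic programming or sampling instead:

    import numpy as np

    def boltzmann_grad(theta, phi_demo, Phi, gamma=1.0):
        # Gradient of log P(xi_D | theta) under P(xi | theta) proportional to exp(gamma * theta . phi(xi)).
        # Phi: (num_trajectories, num_features) feature counts of all candidate trajectories.
        scores = gamma * (Phi @ theta)
        scores -= scores.max()                        # numerical stability
        P = np.exp(scores) / np.exp(scores).sum()     # P(xi | theta)
        expected_phi = P @ Phi                        # E_{xi ~ P(.|theta)}[ phi(xi) ]
        return gamma * (phi_demo - expected_phi)

    def fit_reward(phi_demo, Phi, alpha=0.1, iters=500):
        theta = np.zeros(Phi.shape[1])
        for _ in range(iters):                        # gradient ascent on log P(xi_D | theta)
            theta = theta + alpha * boltzmann_grad(theta, phi_demo, Phi)
        return theta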
The Bayesian view
P(ξ_D | θ) = e^{γ θᵀ φ(ξ_D)} / Σ_ξ e^{γ θᵀ φ(ξ)}
P(θ | ξ_D) ∝ P(θ) P(ξ_D | θ)
The Bayesian view (actions)
P(a_D | s, θ) = e^{γ Q(s, a_D; θ)} / Σ_a e^{γ Q(s, a; θ)}
P(θ | a_D) ∝ P(θ) P(a_D | θ)
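A minimal sketch of this action-level model, assuming the Q-values Q(s, a; θ) for the current state are already available (how they are computed from θ is left abstract here):

    import numpy as np

    def boltzmann_action_probs(q_values, gamma=1.0):
        # P(a | s, theta) proportional to exp(gamma * Q(s, a; theta)) over a discrete action set.
        q = gamma * np.asarray(q_values, dtype=float)
        q -= q.max()                                  # numerical stability
        e = np.exp(q)
        return e / e.sum()

    # Example: made-up Q-values for three actions in some state
    print(boltzmann_action_probs([1.0, 0.5, -2.0]))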
[Ratliff et al., Maximum Margin Planning]