CS287 Advanced Robotics, Lecture 4 (Fall 2019): Function Approximation. Pieter Abbeel, UC Berkeley EECS
Value Iteration: Impractical for Large State Spaces
Algorithm: Start with V*_0(s) = 0 for all s. For i = 1, …, H: for all states s in S:
  V*_i(s) ← max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*_{i-1}(s') ]
  π*_i(s) ← argmax_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*_{i-1}(s') ]
This is called a value update or Bellman update/back-up.
- V*_i(s) = expected sum of rewards accumulated starting from state s, acting optimally for i steps
- π*_i(s) = optimal action when in state s and getting to act for i steps
The table-based update requires a sweep over every state, which is impractical when the state space is large. Similar issue for policy iteration and linear programming.
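To make the update concrete, here is a minimal tabular value iteration sketch. The interface (T[s][a] as a list of (next state, probability) pairs, R(s, a, s') as a function) is an assumption made for illustration, not the course's code.

```python
def value_iteration(states, actions, T, R, gamma=0.9, H=100):
    """Tabular value iteration. After i sweeps, V[s] is the expected sum of
    (discounted) rewards when acting optimally for i steps from state s.
    T[s][a]: list of (s_next, prob); R(s, a, s_next): scalar reward."""
    V = {s: 0.0 for s in states}                      # V*_0(s) = 0 for all s
    for i in range(H):
        V_new = {}
        for s in states:                              # sweep over the entire state space
            # Bellman back-up: best expected one-step reward plus value-to-go
            V_new[s] = max(
                sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a])
                for a in actions
            )
        V = V_new
    # Greedy policy with respect to the final value function
    pi = {s: max(actions, key=lambda a: sum(p * (R(s, a, s2) + gamma * V[s2])
                                            for s2, p in T[s][a]))
          for s in states}
    return V, pi
```

Each sweep touches every state and action, which is exactly what becomes infeasible once |S| is astronomically large (e.g. Tetris below).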
Outline
- Function approximation
- Value iteration with function approximation
- Policy iteration with function approximation
- Linear programming with function approximation
Function Approximation Example 1: Tetris
- state: board configuration + shape of the falling piece; roughly 2^200 states!
- action: rotation and translation applied to the falling piece
- 22 features φ_i, aka basis functions:
  - Ten basis functions, φ_0, …, φ_9, mapping the state to the height h[k] of each column.
  - Nine basis functions, φ_10, …, φ_18, each mapping the state to the absolute difference between heights of successive columns: |h[k+1] − h[k]|, k = 1, …, 9.
  - One basis function, φ_19, that maps the state to the maximum column height: max_k h[k].
  - One basis function, φ_20, that maps the state to the number of 'holes' in the board.
  - One basis function, φ_21, that is equal to 1 in every state.
- V̂_θ(s) = Σ_{i=0}^{21} θ_i φ_i(s) = θ^⊤ φ(s)
[Bertsekas & Ioffe, 1996 (TD); Bertsekas & Tsitsiklis, 1996 (TD); Kakade, 2002 (policy gradient); Farias & Van Roy, 2006 (approximate LP)]
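A minimal sketch of these Tetris features and the resulting linear value estimate. The board encoding (boolean array, row 0 at the top) and the helper names are assumptions made for illustration.

```python
import numpy as np

def tetris_features(board):
    """phi(s) for a Tetris board: column heights, adjacent height differences,
    max height, number of holes, and a constant 1.
    board: boolean array (n_rows, n_cols), True where a cell is filled, row 0 at the top."""
    n_rows, n_cols = board.shape
    heights = np.array([
        (n_rows - np.argmax(board[:, k])) if board[:, k].any() else 0
        for k in range(n_cols)
    ])
    diffs = np.abs(np.diff(heights))                # |h[k+1] - h[k]|
    # A 'hole' is an empty cell with a filled cell somewhere above it in its column
    holes = sum(
        int(np.sum(~board[np.argmax(board[:, k]):, k])) if board[:, k].any() else 0
        for k in range(n_cols)
    )
    return np.concatenate([heights, diffs, [heights.max(), holes, 1.0]])

def value_estimate(board, theta):
    """Linear architecture: V_hat_theta(s) = theta^T phi(s)."""
    return float(theta @ tetris_features(board))
```

For a standard 10-column board this gives 10 + 9 + 1 + 1 + 1 = 22 features, matching the list above.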
Function Approximation Example 2: Pacman
V(s) = θ_0 + θ_1 · "distance to closest ghost"
           + θ_2 · "distance to closest power pellet"
           + θ_3 · "in dead-end"
           + θ_4 · "closer to power pellet than ghost"
           + …
     = Σ_{i=0}^{n} θ_i φ_i(s) = θ^⊤ φ(s)
Function Approximation Example 3: Nearest Neighbor
- 0'th order approximation (1-nearest neighbor): V̂(s) = V̂(x4) = θ_4
- [figure: a grid of anchor states x1, …, x12 covering the state space; the query state s falls nearest to x4, so φ(s) is the one-hot vector with a 1 in entry 4 and 0 elsewhere]
- V̂(s) = θ^⊤ φ(s)
- Only store values for x1, x2, …, x12; call these values θ_1, θ_2, …, θ_12
- Assign other states the value of the nearest "x" state
Function Approximation Example 4: k-Nearest Neighbor
- 1'st order approximation (k-nearest-neighbor interpolation): V̂(s) = φ_1(s) θ_1 + φ_2(s) θ_2 + φ_5(s) θ_5 + φ_6(s) θ_6
- [figure: the query state s lies between anchor states x1, x2, x5, x6; φ(s) holds the interpolation weights of the four nearest anchors (here 0.2, 0.6, 0.05, 0.15, summing to 1) and is 0 elsewhere]
- V̂(s) = θ^⊤ φ(s)
- Only store values for x1, x2, …, x12; call these values θ_1, θ_2, …, θ_12
- Assign other states the interpolated value of the nearest 4 "x" states
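A small sketch of both approximators above. The slide does not specify how the interpolation weights are computed, so the inverse-distance weighting below is just one common, assumed choice; the anchor layout is likewise made up.

```python
import numpy as np

def knn_features(s, anchors, k=4, eps=1e-8):
    """phi(s) for k-nearest-neighbor interpolation: normalized inverse-distance weights
    on the k nearest anchor states, 0 everywhere else.
    With k=1 this reduces to the one-hot phi(s) of plain nearest neighbor."""
    dists = np.linalg.norm(anchors - s, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)
    phi = np.zeros(len(anchors))
    phi[nearest] = weights / weights.sum()
    return phi

# Usage: one stored value theta_j per anchor state x_j; other states are interpolated.
anchors = np.array([[x, y] for y in range(3) for x in range(4)], dtype=float)  # x1..x12
theta = np.random.randn(len(anchors))                                          # stored values
s = np.array([0.7, 0.3])
V_hat = theta @ knn_features(s, anchors, k=4)   # V_hat(s) = theta^T phi(s)
```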
More Function Approximation Examples
- S = ℝ, V̂(s) = θ_1 + θ_2 s
- S = ℝ, V̂(s) = θ_1 + θ_2 s + θ_3 s²
- S = ℝ, V̂(s) = Σ_{i=0}^{n} θ_i s^i
- V̂_θ(s) = f_θ(s) for some richer parametric function class (e.g. neural net)
Function Approximation: Main Idea
- Use V̂_θ, an approximation of the true value function; θ is a free parameter to be chosen from its domain Θ
- Representation size: down from |S| to |Θ|
  - +: fewer parameters to estimate
  - −: less expressiveness; typically there exist many V* for which there is no θ such that V̂_θ = V*
Supervised Learning
- Given: a set of examples (s^(1), V(s^(1))), (s^(2), V(s^(2))), …, (s^(m), V(s^(m)))
- Asked for: "best" V̂_θ
- Representative approach: find θ through least squares:
  min_{θ ∈ Θ} Σ_{i=1}^{m} ( V̂_θ(s^(i)) − V(s^(i)) )²
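For a linear architecture V̂_θ(s) = θ^⊤ φ(s), this least-squares problem has a closed-form solution; a minimal sketch, with the feature map phi and the data passed in as placeholders:

```python
import numpy as np

def fit_least_squares(states, targets, phi):
    """Find theta minimizing sum_i (theta^T phi(s_i) - V(s_i))^2."""
    Phi = np.stack([phi(s) for s in states])          # m x d feature matrix
    V = np.asarray(targets, dtype=float)              # m target values V(s_i)
    theta, *_ = np.linalg.lstsq(Phi, V, rcond=None)   # least-squares solution
    return theta
```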
Supervised Learning Example: Linear Regression
- min_{θ_0, θ_1} Σ_i ( θ_0 + θ_1 x^(i) − y^(i) )²
- [figure: scatter of observations with the fitted line; the gap between an observation and its prediction is the error, or "residual"]
Supervised Learning Example: Neural Nets
Single (Biological) Neuron [image source: cs231n.stanford.edu]
Single (Artificial) Neuron [image source: cs231n.stanford.edu]
Common Activation Functions [source: MIT 6.S191 introtodeeplearning.com]
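The figure itself is not recoverable from the text; as a stand-in, here are activation functions such slides typically show (this particular selection is an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes inputs to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes inputs to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity otherwise
```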
Neural Network
- Notation: input x, hidden layers z^(1), z^(2), z^(3), output y = f(x, w)
- Choice of w determines the function from x --> y
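A minimal forward pass matching this notation (three hidden layers z^(1), z^(2), z^(3)); the layer sizes and the ReLU nonlinearity are choices made here for illustration:

```python
import numpy as np

def forward(x, weights, biases):
    """y = f(x, w): feed-forward pass through the hidden layers z^(1..3).
    weights/biases are per-layer parameter arrays (together they form 'w')."""
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = np.maximum(0.0, W @ z + b)       # hidden layer with ReLU activation
    return weights[-1] @ z + biases[-1]      # linear output layer

# Usage: random weights, just to show the shapes (x in R^4, three hidden layers of 8 units)
sizes = [4, 8, 8, 8, 1]
weights = [np.random.randn(m, n) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.random.randn(m) for m in sizes[1:]]
y = forward(np.random.randn(4), weights, biases)   # a different w gives a different x --> y
```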
What Functions Can a Neural Net Represent? Does there exist a choice for w to make this work? [images source: neuralnetworksanddeeplearning.com]
Universal Function Approximation Theorem
- In words: given any continuous function f(x), if a 2-layer neural network has enough hidden units, then there is a choice of weights that allows it to closely approximate f(x).
- Cybenko (1989), "Approximation by Superpositions of a Sigmoidal Function"
- Hornik (1991), "Approximation Capabilities of Multilayer Feedforward Networks"
- Leshno and Schocken (1991), "Multilayer Feedforward Networks with Non-Polynomial Activation Functions Can Approximate Any Function"
Overfitting [figure: a degree-15 polynomial fit to the training points]
Avoiding Overfitting
- Reduce the number of features or the size of the network
- Regularize θ
- Early stopping: stop training updates once loss increases on hold-out data (see the sketch below)
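A minimal sketch of the early-stopping idea from the last bullet; train_step and holdout_loss stand in for whatever supervised learner and hold-out set are being used:

```python
def train_with_early_stopping(theta, train_step, holdout_loss, max_iters=10_000, patience=1):
    """Run training updates, but stop once the loss on held-out data stops improving
    (here: fails to improve for `patience` consecutive checks)."""
    best_theta, best_loss, bad_checks = theta, holdout_loss(theta), 0
    for _ in range(max_iters):
        theta = train_step(theta)        # one update on the training set
        loss = holdout_loss(theta)       # evaluate on the hold-out data
        if loss < best_loss:
            best_theta, best_loss, bad_checks = theta, loss, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break                    # hold-out loss is no longer improving: stop
    return best_theta
```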
Status
- Function approximation through supervised learning
- BUT: where do the supervised examples come from?
Value Iteration with Function Approximation
- Initialize by choosing some setting for θ^(0)
- Iterate for i = 0, 1, 2, …, H:
  - Step 0: Pick some S' ⊆ S (typically |S'| << |S|)
  - Step 1: Bellman back-ups: ∀ s ∈ S':  V̄_{i+1}(s) ← max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V̂_{θ^(i)}(s') ]
  - Step 2: Supervised learning: find θ^(i+1) as the solution of  min_θ Σ_{s ∈ S'} ( V̂_θ(s) − V̄_{i+1}(s) )²
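Putting the pieces together, a minimal sketch of this loop with a linear approximator V̂_θ = θ^⊤ φ. The way S' is sampled, the feature map, and the model interface (T, R as in the earlier tabular sketch) are all assumptions for illustration.

```python
import numpy as np

def approx_value_iteration(all_states, actions, T, R, phi, gamma=0.9,
                           n_iters=50, n_sample=200, seed=0):
    """Each iteration: (0) pick a small S' from S, (1) Bellman back-ups using the current
    V_hat(s) = theta^T phi(s), (2) refit theta to the backed-up values by least squares."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(phi(all_states[0])))                 # theta^(0)
    V_hat = lambda s: theta @ phi(s)                          # current approximation
    for _ in range(n_iters):
        idx = rng.choice(len(all_states), size=min(n_sample, len(all_states)), replace=False)
        S_prime = [all_states[j] for j in idx]                # Step 0: S' subset of S
        features, targets = [], []
        for s in S_prime:
            # Step 1: Bellman back-up at s, bootstrapping from V_hat_{theta^(i)}
            backup = max(
                sum(p * (R(s, a, s2) + gamma * V_hat(s2)) for s2, p in T[s][a])
                for a in actions
            )
            features.append(phi(s))
            targets.append(backup)
        # Step 2: supervised learning (least squares) gives theta^(i+1)
        theta, *_ = np.linalg.lstsq(np.stack(features), np.array(targets), rcond=None)
    return theta
```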
Value Iteration w/ Function Approximation --- Example
- Mini-tetris: two types of blocks, can only choose translation (not rotation)
- Example state: [figure]
- Reward = 1 for placing a block
- Sink state / game over is reached when a block is placed such that part of it extends above the red rectangle
- If you have a complete row, it gets cleared
Value Iteration w/ Function Approximation --- Example
S' = { [four sample mini-tetris board states, shown as figures] }
Value Iteration w/ Function Approximation --- Example
S' = { [the same four board states] }
- 10 features φ_i (also called basis functions):
  - Four basis functions, φ_0, …, φ_3, mapping the state to the height h[k] of each of the four columns.
  - Three basis functions, φ_4, …, φ_6, each mapping the state to the absolute difference between heights of successive columns: |h[k+1] − h[k]|, k = 1, …, 3.
  - One basis function, φ_7, that maps the state to the maximum column height: max_k h[k].
  - One basis function, φ_8, that maps the state to the number of 'holes' in the board.
  - One basis function, φ_9, that is equal to 1 in every state.
- Init with θ^(0) = (−1, −1, −1, −1, −2, −2, −2, −3, −2, 10)
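As a quick numerical check, here is V̂_{θ^(0)} evaluated on a made-up 4-column board, reusing the tetris_features sketch from the earlier Tetris example (it adapts to any board width, so here it yields exactly the 10 features listed above):

```python
import numpy as np

theta0 = np.array([-1, -1, -1, -1, -2, -2, -2, -3, -2, 10], dtype=float)

# A made-up 6-row x 4-column mini-tetris board (True = filled, row 0 at the top)
board = np.zeros((6, 4), dtype=bool)
board[5] = [True, True, False, True]   # bottom row, three cells filled
board[4, 0] = True                     # one extra block in the first column

# 4 heights + 3 height differences + max height + holes + constant 1 = 10 features
phi = tetris_features(board)           # [2, 1, 0, 1, 1, 1, 1, 2, 0, 1]
print(theta0 @ phi)                    # V_hat_{theta^(0)}(s) = -6.0
```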
Value Iteration w/ Function Approximation --- Example
- Bellman back-ups for the states in S':
  V( [state] ) = max { 0.5·(1 + γ V( [successor] )) + 0.5·(1 + γ V( [successor] )),
                       0.5·(1 + γ V( [successor] )) + 0.5·(1 + γ V( [successor] )),
                       0.5·(1 + γ V( [successor] )) + 0.5·(1 + γ V( [successor] )),
                       0.5·(1 + γ V( [successor] )) + 0.5·(1 + γ V( [successor] )) }
  (the max ranges over the possible translations of the current block; within each term, the two 0.5 probabilities correspond to which of the two block types arrives next, and the reward of 1 is for placing a block)