1. Value Function Approximation on Non-linear Manifolds for Robot Motor Control
Masashi Sugiyama 1)2), Hirotaka Hachiya 1)2), Christopher Towell 2), Sethu Vijayakumar 2)
1) Computer Science, Tokyo Institute of Technology; 2) School of Informatics, University of Edinburgh

2. Maze Problem: Guide Robot to Goal
[Figure: maze with the robot's position (x, y), the possible actions (up, down, left, right), and a reward at the goal.]
- The robot knows its position but does not know which direction to go.
- We do not teach the best action at each position; we only give a reward at the goal.
- Task: make the robot select the optimal action.

3. Markov Decision Process (MDP)
- An MDP consists of {S, A, P, R}:
  S = {s_i}: set of states
  A = {up, down, left, right}: set of actions
  P(s, a, s'): transition probability
  R(s, a): reward
- The action a the robot takes at state s is specified by a policy π: a = π(s).
- Goal: make the robot learn the optimal policy π*.

4. Definition of Optimal Policy
- Action-value function:
  Q^π(s, a) = E[ Σ_{t=0}^∞ γ^t r_t | s_0 = s, a_0 = a ],
  the discounted sum of future rewards when taking action a in state s and following π thereafter.
- Optimal value: Q*(s, a) = max_π Q^π(s, a)
- Optimal policy: π*(s) = argmax_a Q*(s, a)
- π* is obtained immediately once Q* is given.
- Question: how do we compute Q*?
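
To make the definition concrete, here is a minimal Python sketch (an editorial illustration, not code from the presentation) of the discounted return of a single trajectory; Q^π(s, a) is the expectation of this quantity over trajectories that start with s_0 = s, a_0 = a and then follow π.

```python
# Discounted return of one trajectory: sum_t gamma^t * r_t.
# Q^pi(s, a) is the expectation of this value over trajectories
# starting with s_0 = s, a_0 = a and following pi afterwards.
def discounted_return(rewards, gamma=0.95):
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Example: a reward of +1 received only at the goal, reached after 3 steps.
print(discounted_return([0.0, 0.0, 0.0, 1.0], gamma=0.9))  # 0.9**3 = 0.729
```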

5. Policy Iteration (Sutton & Barto, 1998)
- Starting from some initial policy π, iterate Steps 1 and 2 until convergence:
  Step 1. Compute Q^π(s, a) for the current policy π.
  Step 2. Update the policy by π(s) = argmax_a Q^π(s, a).
- Policy iteration always converges to the optimal policy π* if Q^π(s, a) in Step 1 can be computed.
- Question: how do we compute Q^π(s, a) = E[ Σ_{t=0}^∞ γ^t r_t | s_0 = s, a_0 = a ]?
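
A minimal tabular policy-iteration sketch, assuming the MDP is given as explicit arrays P[s, a, s'] and R[s, a] (the names, shapes, and iteration count are my own choices, not from the paper):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95, n_eval_iters=200):
    """Tabular policy iteration for an MDP with |S| states and |A| actions.

    P: transition probabilities, shape (S, A, S)
    R: rewards, shape (S, A)
    """
    S, A = R.shape
    policy = np.zeros(S, dtype=int)           # start from an arbitrary policy
    while True:
        # Step 1: policy evaluation (repeated Bellman backups for the current policy)
        Q = np.zeros((S, A))
        for _ in range(n_eval_iters):
            V = Q[np.arange(S), policy]       # Q^pi(s', pi(s'))
            Q = R + gamma * P @ V             # backup for every (s, a) at once
        # Step 2: policy improvement
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, Q                  # converged
        policy = new_policy
```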

6. Bellman Equation
- Q^π(s, a) can be expressed recursively: for all s and a,
  Q^π(s, a) = R(s, a) + γ Σ_{s'} P(s, a, s') Q^π(s', π(s')).
- Q^π(s, a) can therefore be computed by solving the Bellman equation.
- Drawback: the dimensionality of the Bellman equation is |S| × |A|, which becomes huge in large state and action spaces, leading to high computational cost.
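
For a fixed policy π the Bellman equation is linear in the |S|·|A| unknowns, so it can also be solved exactly. The sketch below (using the same hypothetical P and R arrays as above) makes the |S| × |A| size of that linear system, and hence the computational cost mentioned on the slide, explicit.

```python
import numpy as np

def evaluate_policy_exactly(P, R, policy, gamma=0.95):
    """Solve the Bellman equation Q = R + gamma * P_pi Q as a linear system.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A);
    policy: chosen action per state, shape (S,).
    """
    S, A = R.shape
    # M maps the stacked vector Q to gamma * sum_s' P(s,a,s') Q(s', pi(s')).
    M = np.zeros((S * A, S * A))
    for s in range(S):
        for a in range(A):
            row = s * A + a
            for s_next in range(S):
                col = s_next * A + policy[s_next]
                M[row, col] += gamma * P[s, a, s_next]
    # (I - M) q = r, a system with S*A unknowns -- huge for large S and A.
    q = np.linalg.solve(np.eye(S * A) - M, R.reshape(S * A))
    return q.reshape(S, A)
```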

7. Least-Squares Policy Iteration (Lagoudakis & Parr, 2003)
- Linear architecture:
  Q̂^π(s, a) = Σ_{i=1}^K w_i φ_i(s, a),
  where {φ_i(s, a)} are fixed basis functions, {w_i} are parameters, and K is the number of basis functions.
- The weights {w_i} are learned so as to optimally approximate the Bellman equation in the least-squares sense.
- The number of parameters is only K, with K << |S| × |A|.
- LSPI works well if we choose appropriate basis functions {φ_i}.
- Question: how should we choose {φ_i}?
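
A rough sketch of the least-squares weight estimate from sampled transitions, following the LSTD-Q construction used in LSPI; `phi` and `policy` are user-supplied callables, and the small regularizer is an assumption I added for numerical stability rather than part of the original method.

```python
import numpy as np

def lstdq_weights(samples, phi, policy, gamma=0.95, reg=1e-6):
    """Estimate w in Q^pi(s, a) ~ sum_i w_i * phi_i(s, a) from transitions.

    samples: list of (s, a, r, s_next) tuples
    phi(s, a): returns a length-K feature vector
    policy(s): returns the action chosen by the current policy
    """
    K = len(phi(*samples[0][:2]))
    A_mat = reg * np.eye(K)
    b = np.zeros(K)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A_mat += np.outer(f, f - gamma * f_next)   # accumulate A
        b += r * f                                 # accumulate b
    return np.linalg.solve(A_mat, b)               # w = A^{-1} b
```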

8. Popular Choice: Gaussian Kernel (GK)
  k(s) = exp( -ED(s, s_c)^2 / (2σ^2) ),
  where ED is the Euclidean distance and s_c is the centre state.
- Smooth.
- But the Gaussian tail goes over partitions (walls).
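
A one-function sketch of this basis function (parameter names are illustrative):

```python
import numpy as np

def ordinary_gaussian_kernel(s, center, sigma=1.0):
    """k(s) = exp(-ED(s, center)^2 / (2 sigma^2)), ED = Euclidean distance."""
    ed = np.linalg.norm(np.asarray(s, float) - np.asarray(center, float))
    return np.exp(-ed**2 / (2.0 * sigma**2))
```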

9. Approximated Value Function by GK
[Figure: optimal value function vs. approximation with 20 randomly located Gaussian kernels, shown on a log scale.]
- Values around the partitions are not approximated well.

10. Policy Obtained by GK
[Figure: optimal policy vs. GK-based policy.]
- GK yields an undesirable policy around the partition.

11. Aim of This Research
- Gaussian tails go over the partition, so ordinary Gaussian kernels are not suited to approximating discontinuous value functions.
- We propose a new Gaussian kernel to overcome this problem.

12. State Space as a Graph
- The ordinary Gaussian kernel uses the Euclidean distance:
  k(s) = exp( -ED(s, s_c)^2 / (2σ^2) ).
- The Euclidean distance does not incorporate the state-space structure, so the tail problem occurs.
- We represent the state-space structure by a graph and use it to define Gaussian kernels (Mahadevan, ICML 2005).
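
A sketch of how such a graph could be built, assuming the maze is given as a 2-D occupancy grid (1 = free cell, 0 = wall); this is an editorial illustration, not the authors' code.

```python
import numpy as np
from scipy.sparse import lil_matrix

def maze_to_graph(maze):
    """Build a sparse adjacency matrix over the free cells of a 2-D grid.

    maze: 2-D array, 1 = free cell, 0 = wall. Unit-weight edges connect
    4-neighbouring free cells, so walls/partitions break connectivity.
    """
    rows, cols = maze.shape
    index = -np.ones(maze.shape, dtype=int)
    free = np.argwhere(maze == 1)
    for i, (r, c) in enumerate(free):
        index[r, c] = i
    adj = lil_matrix((len(free), len(free)))
    for r, c in free:
        for dr, dc in ((1, 0), (0, 1)):            # down and right neighbours
            rr, cc = r + dr, c + dc
            if rr < rows and cc < cols and maze[rr, cc] == 1:
                adj[index[r, c], index[rr, cc]] = 1
                adj[index[rr, cc], index[r, c]] = 1
    return adj.tocsr(), free                       # graph + state coordinates
```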

13. Geodesic Gaussian Kernels
- The natural distance on a graph is the shortest path, not the Euclidean distance.
- We use the shortest-path distance SP inside the Gaussian function:
  k(s) = exp( -SP(s, s_c)^2 / (2σ^2) ).
- We call this kernel the geodesic Gaussian kernel.
- SP can be computed efficiently by Dijkstra's algorithm.
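
Given a state-space graph like the one above, the shortest-path distances and the resulting geodesic Gaussian basis functions can be computed with SciPy's Dijkstra routine; a minimal sketch, where the centre indices and σ are placeholders:

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def geodesic_gaussian_kernels(adj, center_indices, sigma=2.0):
    """k_i(s) = exp(-SP(s, c_i)^2 / (2 sigma^2)), SP = shortest-path distance.

    adj: sparse adjacency matrix of the state-space graph
    center_indices: graph node indices of the kernel centres
    Returns an (n_states, n_centers) matrix of basis-function values.
    """
    # Shortest-path distances from every centre to every state (Dijkstra).
    sp = dijkstra(adj, directed=False, indices=center_indices)  # (n_centers, n_states)
    return np.exp(-sp.T**2 / (2.0 * sigma**2))
```

States that are unreachable from a centre get infinite shortest-path distance and hence a kernel value of zero, which is exactly why the tail of a geodesic Gaussian does not leak across a wall.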

14. Example of Kernels
[Figure: an ordinary Gaussian kernel vs. a geodesic Gaussian kernel centred at the same state s_c.]
- The geodesic kernel's tail does not go across the partition.
- Its values decrease smoothly along the maze.

15. Comparison of Value Functions
[Figure: optimal value function vs. approximations with ordinary and geodesic Gaussian kernels.]
- With geodesic Gaussian kernels, values near the partition are well approximated.
- The discontinuity across the partition is preserved.

16. Comparison of Policies
[Figure: policies obtained with ordinary vs. geodesic Gaussian kernels.]
- GGKs provide good policies near the partition.

17. Experimental Result
[Figure: fraction of optimal states vs. number of kernels, averaged over 100 runs, for ordinary and geodesic Gaussian kernels.]
- Ordinary Gaussian: suffers from the tail problem.
- Geodesic Gaussian: no tail problem.

18. Robot Arm Reaching
- Task: move the end effector of a 2-DOF robot arm to reach the object.
[Figure: the arm with joints 1 and 2, the obstacle, the object, and the end effector; the state space is the pair of joint angles in degrees, with the obstacle forming a partition.]
- Reward: +1 when the end effector reaches the object, 0 otherwise.

19. Robot Arm Reaching
- Ordinary Gaussian: the arm moves directly towards the object without avoiding the obstacle.
- Geodesic Gaussian: the arm successfully avoids the obstacle and reaches the object.

20. Khepera Robot Navigation
- Khepera has 8 infrared sensors measuring the distance to obstacles (sensor values: 0-1030).
- Task: explore an unknown maze without collision.
- Reward: +1 (moving forward), -2 (collision), 0 (otherwise).

21. State Space and Graph
- The 8-D sensor state space is discretized by a self-organizing map; the resulting nodes form the state-space graph.
[Figure: 2-D visualization of the discretized state space; partitions are visible where obstacles break connectivity.]
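
For reference, a from-scratch sketch of a self-organizing map in NumPy; the grid size and annealing schedules are placeholders, and this is not the discretization code used in the experiments.

```python
import numpy as np

def train_som(data, grid_w=10, grid_h=10, n_iters=5000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal self-organizing map: maps D-dim samples onto a grid_w x grid_h grid.

    data: array of shape (n_samples, D), e.g. normalized 8-D sensor readings.
    Returns the grid of prototype vectors, shape (grid_h, grid_w, D).
    """
    rng = np.random.default_rng(seed)
    D = data.shape[1]
    weights = rng.random((grid_h, grid_w, D))
    grid = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                indexing="ij"), axis=-1)
    for t in range(n_iters):
        lr = lr0 * (1.0 - t / n_iters)                 # decaying learning rate
        sigma = sigma0 * (1.0 - t / n_iters) + 0.5     # decaying neighbourhood width
        x = data[rng.integers(len(data))]
        # Best-matching unit: grid node whose prototype is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # Pull prototypes towards x, weighted by grid distance to the BMU.
        g = np.exp(-np.sum((grid - np.array(bmu))**2, axis=-1) / (2.0 * sigma**2))
        weights += lr * g[..., None] * (x - weights)
    return weights
```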

22. Khepera Robot Navigation
- Ordinary Gaussian: when facing an obstacle, the robot goes backward (and then forward again).
- Geodesic Gaussian: when facing an obstacle, the robot makes a turn (and goes forward).

23. Experimental Results
[Figure: learning performance averaged over 30 runs for ordinary and geodesic Gaussian kernels.]
- The geodesic Gaussian kernel outperforms the ordinary Gaussian kernel.

24. Conclusion
- Value function approximation requires good basis functions.
- The ordinary Gaussian kernel's tail goes over discontinuities.
- The geodesic Gaussian kernel is smooth along the state space.
- The experiments show that the geodesic Gaussian kernel is promising for high-dimensional continuous problems.
