Stochastic Optimal Control – part 4: research issues, robotics applications
Marc Toussaint
Machine Learning & Robotics Group – TU Berlin
mtoussai@cs.tu-berlin.de
ICML 2008, Helsinki
• challenges in stochastic optimal control
• probabilistic inference approaches to control
• robotics
• model learning
1/14
challenges in stochastic optimal control
• often said: “scale up”
• Efficient Application in Real Systems!
→ try to extract the fundamental problems
2/14
research issues 1/3: structured state
• notion of state (i.e., having one big state space)
  – curse of dimensionality
  – real systems are typically decomposed/modular/hierarchical/structured → exploit this!
• interesting lines of work
  – Carlos Guestrin (PhD thesis)
  – probabilistic inference methods! (in graphical models: belief propagation, etc.; a minimal sketch follows this slide)
  – probabilistic inference for computing optimal policies
3/14
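To make the "probabilistic inference methods" bullet concrete, here is a minimal sum-product (belief propagation) pass over a chain of discrete variables; the potentials and sizes are invented for illustration, not taken from the talk.

```python
# Minimal sketch: sum-product belief propagation on a chain of discrete
# variables. Potentials below are illustrative placeholders.
import numpy as np

def chain_marginals(unary, pairwise):
    """unary: list of (K,) potentials; pairwise: list of (K,K) potentials."""
    n = len(unary)
    fwd = [None] * n                       # forward messages (include unary)
    bwd = [None] * n                       # backward messages (exclude unary)
    fwd[0] = unary[0]
    for t in range(1, n):
        fwd[t] = unary[t] * (pairwise[t - 1].T @ fwd[t - 1])
        fwd[t] /= fwd[t].sum()             # normalize for numerical stability
    bwd[n - 1] = np.ones_like(unary[-1])
    for t in range(n - 2, -1, -1):
        bwd[t] = pairwise[t] @ (unary[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    marg = [f * b for f, b in zip(fwd, bwd)]
    return [m / m.sum() for m in marg]     # exact node marginals

# toy 3-node chain with K = 2 states per variable
unary = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
pairwise = [np.array([[0.8, 0.2], [0.2, 0.8]])] * 2
print(chain_marginals(unary, pairwise))
```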
research issues 2/3: learning
• learning: we want to learn models from experience
• interesting lines of work
  – ML for model learning in robotics
4/14
research issues 3/3: integration
• integration
  – complex systems (e.g., robots) collect state information from many different modalities (sensors)
  – many subsystems (e.g., vision, position, haptics)
  – delayed/partial information
  – integration is hard (a minimal fusion example follows this slide)
5/14
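As a minimal illustration of integration in the simplest possible case, the sketch below fuses two independent Gaussian estimates of the same scalar state (say, a position from vision and from odometry) by precision weighting; all numbers are invented.

```python
# Hedged sketch: fusing two noisy Gaussian estimates of one scalar state.
def fuse_gaussian(mu1, var1, mu2, var2):
    """Precision-weighted fusion of two independent Gaussian estimates."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    var = 1.0 / (w1 + w2)                  # fused variance always shrinks
    mu = var * (w1 * mu1 + w2 * mu2)       # precision-weighted mean
    return mu, var

# vision says 1.0 m (noisy), odometry says 1.2 m (more precise)
print(fuse_gaussian(1.0, 0.04, 1.2, 0.01))   # -> (1.16, 0.008)
```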
probabilistic inference approach
• general idea: decision making, motion control and planning can be viewed as a problem of inferring a posterior over unknown variables (actions, control signals, whole trajectories) conditioned on available information (targets, goals, constraints)
6/14
probabilistic inference approach
• given some model of the future:
  [figure: dynamic Bayesian network of an MDP — actions a_0, a_1, a_2, states x_0, x_1, x_2, rewards r_0, r_1, r_2, policy π]
  (here a Markov Decision Process with P(x_0), P(x'|a,x), P(r|a,x) given, and the policy π_{ax} = P(a|x) unknown)
• condition it on something you want to see in the future
• compute the posterior over actions/decisions to get there (toy example below)
• Toussaint & Storkey (ICML 2006): proof that maximization of expected future rewards reduces to a likelihood maximization problem (EM algorithm) [fwd-bwd video]
7/14
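A toy instance of the "condition on the future" idea, not from the slides: in a hypothetical 3-state MDP with a uniform policy prior, we condition on reaching a goal state at a fixed horizon and read off the posterior over the first action.

```python
# Invented toy MDP: 3 states, 2 actions; condition on x_T = goal and infer a_0.
import numpy as np

P = np.zeros((2, 3, 3))                     # P[a, x, x'] transition tables
P[0] = [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]   # "advance"
P[1] = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]   # "dawdle"
prior_a = np.array([0.5, 0.5])              # uniform policy prior P(a)
T, goal, x0 = 4, 2, 0

def p_goal_given(x, t):
    """Backward message: P(x_T = goal | x_t = x) under the prior policy."""
    if t == T:
        return 1.0 if x == goal else 0.0
    return sum(prior_a[a] * sum(P[a, x, y] * p_goal_given(y, t + 1)
                                for y in range(3)) for a in range(2))

# posterior over the first action: P(a_0 | x_0, x_T = goal)
post = np.array([prior_a[a] * sum(P[a, x0, y] * p_goal_given(y, 1)
                                  for y in range(3)) for a in range(2)])
print(post / post.sum())                    # "advance" dominates
```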
probabilistic inference approach [details: Toussaint & Storkey, ICML 2006]
• problem: find a policy π that maximizes V^π = E{ Σ_{t=0}^∞ γ^t r_t ; π } with discount factor γ ∈ [0, 1)
• Theorem: maximizing the likelihood L^π = P(r̂ = 1; π) in the mixture of finite-time MDPs (with horizon prior P(T) = γ^T (1 − γ)) is equivalent to maximizing V^π = E{ Σ_{t=0}^∞ γ^t r_t ; π } in the original MDP
• problem of optimal policy → problem of likelihood maximization (EM algorithm) [demo]
8/14
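The theorem can be seen in a few lines, assuming the per-step reward is scaled to [0, 1] so that E{r_t} can be read as P(r̂ = 1 at time t); this is a compressed sketch of the ICML 2006 construction, not the full proof.

```latex
\begin{align*}
L^\pi &= P(\hat r = 1;\pi)
       = \sum_{T=0}^{\infty} P(T)\, P(\hat r = 1 \mid T;\pi)
       && \text{mixture over horizons } T\\
      &= \sum_{T=0}^{\infty} \gamma^T (1-\gamma)\, E\{r_T;\pi\}
       && \text{the finite-time MDP emits reward only at } t=T\\
      &= (1-\gamma)\, E\Big\{\textstyle\sum_{t=0}^{\infty} \gamma^t r_t;\pi\Big\}
       = (1-\gamma)\, V^\pi .
\end{align*}
```

Since (1 − γ) is a constant, any π that maximizes L^π also maximizes V^π, and EM on the mixture model becomes a policy optimizer.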
POMDP application
• in POMDPs the agent needs some kind of memory (a belief-update sketch follows this slide)
  [figure: DBN with memory/belief nodes b_0, b_1, b_2, observations y_0, y_1, y_2, actions a_0, a_1, a_2, hidden states x_0, x_1, x_2, rewards r_0, r_1, r_2]
• mazes: T-junctions, halls & corridors (379 locations, 1516 states)
(Toussaint, Harmeling & Storkey, 2006)
9/14
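The "memory" that the figure's b-nodes represent can, in the simplest reading, be a belief state maintained by a Bayes filter; the sketch below shows the standard update, with invented transition and observation matrices.

```python
# Minimal sketch of POMDP memory as a Bayes filter over the hidden state.
import numpy as np

def belief_update(b, a, y, T, O):
    """b'(x') ∝ O[y, x'] * sum_x T[a, x, x'] * b[x]."""
    b_pred = T[a].T @ b          # predict through the transition model
    b_new = O[y] * b_pred        # weight by the observation likelihood
    return b_new / b_new.sum()   # renormalize

# toy: 2 hidden states, 1 action, 2 observations (all values invented)
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # T[a, x, x']
O = np.array([[0.7, 0.2], [0.3, 0.8]])     # O[y, x']
b = np.array([0.5, 0.5])
print(belief_update(b, 0, 1, T, O))        # y = 1 shifts belief to state 1
```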
POMDP application
• UAI paper presented on Friday: Marc Toussaint, Laurent Charlin & Pascal Poupart: Hierarchical POMDP Controller Optimization by Likelihood Maximization
  [figure: DBN of the hierarchical finite-state controller, with controller-node layers N_0, N_1, N_2, states S, observations O and actions A]
  [table: benchmark results comparing HSVI2, the best results from the literature, and the ML approach (averaged over 10 runs) in controller size (nodes), runtime t(s) and value V, on the problems paint (|S|,|A|,|O| = 4, 4, 2), shuttle (8, 3, 5), 4x4 maze (16, 4, 2), chain-of-chains (10, 4, 1), handwashing (84, 7, 12) and cheese-taxi (33, 7, 10)]
10/14
robotic motion inference application
• four task variables (a toy composite cost is sketched after this slide):
  – position of right finger
  – collision with objects
  – balance
  – comfortableness
  [figure: cost (log scale) vs. computation time (sec) for bayes (repeats), bayes (fwd-bwd), gradient (direct), gradient (spline) and MAP]
(Toussaint & Goerick, IROS 2007)
11/14
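As a hedged illustration of what "task variables" mean (not the paper's actual implementation), the sketch below combines several task maps into one weighted cost over a joint configuration; the arm kinematics, targets and weights are invented.

```python
# Illustrative sketch: each task map phi_i sends the joint configuration q
# into a task space; deviations from targets are weighted and summed.
import numpy as np

def motion_cost(q, task_maps, targets, weights):
    """Sum of weighted squared task-space errors for one configuration."""
    return sum(w * np.sum((phi(q) - y) ** 2)
               for phi, y, w in zip(task_maps, targets, weights))

# toy 2-DOF planar arm: fingertip position task + "comfort" (stay near rest)
finger_pos = lambda q: np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                                 np.sin(q[0]) + np.sin(q[0] + q[1])])
comfort    = lambda q: q                     # penalize leaving q = 0
q = np.array([0.3, -0.2])
print(motion_cost(q, [finger_pos, comfort],
                  [np.array([1.5, 0.5]), np.zeros(2)], [1.0, 0.1]))
```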
on Asimo
• Toussaint, Gienger & Goerick (Humanoids 2007): Optimization of sequential attractor-based movement for compact behavior generation (a technique other than inference)
  [figure: three example movements — 3 s, 8 control points, both hands' position and attitude controlled; 3 s, 4 control points, left hand's position and attitude controlled; 4 s, 10 control points, both hands' position and attitude controlled]
12/14
model learning
• control of a dynamic robot system
  – dynamics: f : (x, ẋ, u) ↦ ẍ
  – learning the inverse model φ : (x, ẋ, ẍ*) ↦ u (a minimal regression sketch follows this slide)
  [learn] [pole]
  (methods: A. Moore, C. Atkeson, S. Schaal, S. Vijayakumar, et al.)
13/14
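A minimal sketch of learning the inverse model from logged transitions, with a plain linear least-squares fit standing in for the locally weighted and nonparametric regressors used in the cited work; the data is synthetic.

```python
# Hedged sketch: regress the control u on (x, xdot, desired xddot) from
# logged transitions; a linear model is a stand-in, the data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                 # features: (x, xdot, xddot*)
true_W = np.array([1.5, -0.4, 2.0])         # invented ground truth
u = X @ true_W + 0.01 * rng.normal(size=n)  # noisy logged controls

W, *_ = np.linalg.lstsq(X, u, rcond=None)   # fit phi: (x, xdot, xddot*) -> u

x_query = np.array([0.1, 0.0, 1.0])         # ask for acceleration 1.0
print("predicted control:", x_query @ W)
```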
conclusions
  [concept map: the core of optimal control (DP, Bellman, LQG, HJB) linking to RL (Value Iteration, TD, Q-learning, Bayesian RL, E^3) and to inference (MDPs as graphical models, path integrals, likelihood maximization, posterior trajectories/control, state estimation, sensor processing)]
• exciting potential for Machine Learning methods
  – structured state, abstraction, learning, integration
• an integrative view from the ML perspective is possible
14/14