Recurrent Predictive State Policy (RPSP) Networks, ICML 2018

Recurrent Predictive State Policy (RPSP) Networks
ICML 2018, Stockholm, Sweden, July 12, 2018
Zita Marinho (zmarinho@cmu.edu)
Co-authors: Ahmed Hefny, CMU (equal contribution); Wen Sun, CMU; Siddhartha S. Srinivasa, UW/CMU; Geoffrey J. Gordon, CMU/Microsoft
Affiliations: Instituto de Telecomunicações; Instituto de Sistemas e Robótica, Instituto Superior Técnico; Robotics Institute, Carnegie Mellon University

Policy learning and model learning
• partial observations o_1, o_2, …, o_t: robot joint angles
• actions a_1, a_2, …, a_t: robot joint torques
• policy π: maps partial observations to actions
Recurrent Predictive State Policy nets | zmarinho@cmu.edu | ICML 2018 - poster #200

Recurrent Predictive State Policy Nets
[Figure: observations o_t enter a recurrent filter that maintains predictive states q_t; actions a_t are sampled from the policy; the filter is trained with a prediction loss ℓ_pred.]

Predictive State Representations
• history: h_t = [o_{t-n:t}, a_{t-n:t}]
• future: observations o_{t:t+k} and actions a_{t:t+k}
• predictive state: q_t → E[o_{t:t+k} | h_t; a_{t:t+k}], a sufficient statistic of conditional future observations
• references: Boots et al. 2009; TPSRs, Rosencrantz et al. 2004; Littman et al. 2001; Jaeger et al. 1998

Predictive State Representations: Prediction
• W_pred maps the predictive state q_t to a prediction of future observations
• linear transformation in feature space (RKHS)

Predictive State Representations: Filtering
• W_ext maps the predictive state q_t to an extended state, from which q_{t+1} is obtained
• PSR filter state update: q_{t+1} = f_cond(W_ext q_t, a_t, o_t)
• in RKHS this is kernel Bayes' rule (Fukumizu et al. 2013)
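The update q_{t+1} = f_cond(W_ext q_t, a_t, o_t) can be illustrated on a small discrete system. The sketch below is not the RKHS filter from the talk: it assumes a finite-state controlled hidden Markov model whose belief vector plays the role of q_t, so that conditioning reduces to selecting a slice of the extended state and normalizing. The names make_w_ext, f_cond, predict_obs and psr_filter, and the matrices T and Z, are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_w_ext(T, Z):
    """Stack the operators B[a, o] = diag(Z[o]) @ T[a] into one matrix.

    T: (A, S, S) with T[a, s_next, s] = P(s_next | s, a)   (assumed toy model)
    Z: (O, S)    with Z[o, s_next]    = P(o | s_next)
    Returns W_ext of shape (A * O * S, S) and the tuple (A, O, S).
    """
    A, S, _ = T.shape
    O = Z.shape[0]
    B = np.einsum("os,ast->aost", Z, T)  # B[a, o, s, t] = Z[o, s] * T[a, s, t]
    return B.reshape(A * O * S, S), (A, O, S)

def f_cond(p_ext, a, o, shape):
    """Condition the extended state on (a, o): pick the slice, then normalize."""
    unnormalized = p_ext.reshape(shape)[a, o]
    return unnormalized / unnormalized.sum()

def predict_obs(q, W_ext, a, shape):
    """P(o | q, a) for every o, read off the extended state."""
    return (W_ext @ q).reshape(shape)[a].sum(axis=1)

def psr_filter(q0, W_ext, actions, observations, shape):
    """Run q_{t+1} = f_cond(W_ext q_t, a_t, o_t) along one trajectory."""
    q = q0
    states = [q]
    for a, o in zip(actions, observations):
        q = f_cond(W_ext @ q, a, o, shape)
        states.append(q)
    return np.stack(states)
```

In this toy setting the filter reproduces the usual Bayes belief update; in the RKHS setting of the slides, f_cond is kernel Bayes' rule, which this sketch does not implement.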

Predictive State Representations
• how do we learn a PSR (W_pred, W_ext, q_0)?
• Boots et al. 2011, Hefny et al. 2015, Sun et al. 2015
• no reward signal …
• reduction to supervised learning
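One way to picture the reduction to supervised learning is two-stage regression with ridge regression at each stage. The sketch below is a simplification under stated assumptions: it ignores actions, uses plain linear features instead of kernel features, and the names ridge and two_stage_regression, along with the feature matrices hist, fut and ext, are hypothetical. It should not be read as the exact procedure used for RPSP networks.

```python
import numpy as np

def ridge(X, Y, lam=1e-3):
    """Return W minimizing ||X W - Y||^2 + lam ||W||^2 (rows are samples)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def two_stage_regression(hist, fut, ext, lam=1e-3):
    """Estimate W_ext from history, future and extended-future features.

    hist: (N, d_h) history features h_t
    fut:  (N, d_q) future features (noisy stand-ins for q_t)
    ext:  (N, d_p) extended-future features (noisy stand-ins for W_ext q_t)
    """
    # Stage 1: denoise both targets by regressing them on the history.
    q_hat = hist @ ridge(hist, fut, lam)
    p_hat = hist @ ridge(hist, ext, lam)
    # Stage 2: regress extended-state estimates on state estimates.
    return ridge(q_hat, p_hat, lam).T  # shape (d_p, d_q), so p ≈ W_ext @ q
```

No reward appears anywhere in this sketch: every step is an ordinary regression problem on logged observations.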

Recurrent Predictive State Policy Nets
Why use PSRs as filter?
• Consistent initialization: Predictive State + Method of moments
• Non-linear dynamics: Kernel-based representation
• Scalable learning algorithm: Random projections
• Robustness and sample efficiency: Local refinement by BPTT
[Figure: PSR filter mapping observations o_t to PSR states q_t.]

Recurrent Predictive State Policy Nets
[Figure: the policy π_θ combines a PSR filter (parameters θ_PSR), which maps observations o_t and actions a_t to PSR states q_t, with a reactive policy (parameters θ_re) from which actions are sampled.]
Z. Marinho, A. Hefny, W. Sun, G. Gordon, S. Srinivasa, ICML 2018 (under review)

RPSP Initialization
PSR initialization with Method of Moments
• efficient and consistent
• does not require interaction (reward signal)
• differentiable, can be trained end-to-end
References: Boots et al. 2011, Hefny et al. 2015, Downey et al. 2017
[Figure: policy π_θ with reactive policy (θ_re) and PSR filter (θ_PSR).]

RPSP Optimization
• Cumulative reward (actions a_t, rewards r_t): accomplish the task
• Prediction error ℓ_pred (predicted observation ô_t vs. observation o_t, from PSR states q_t): keep model accurate
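The two terms can be combined into a single scalar to minimize. The sketch below assumes a weighted sum with hypothetical weights alpha_reward and alpha_pred and a squared-error prediction loss; the slides do not state the exact weighting or loss, so these are illustrative choices.

```python
import numpy as np

def rpsp_objective(rewards, obs_pred, obs_true, alpha_reward=1.0, alpha_pred=1.0):
    """Combine the task term and the model-accuracy term (assumed weighting).

    rewards:  (N, T)     rewards r_t for N trajectories of length T
    obs_pred: (N, T, d)  predicted observations
    obs_true: (N, T, d)  actual observations
    Returns (loss, J, l_pred), where loss is the quantity to minimize.
    """
    # Cumulative reward, averaged over the batch of trajectories.
    J = rewards.sum(axis=1).mean()
    # Squared prediction error, averaged over trajectories and time steps.
    l_pred = ((obs_pred - obs_true) ** 2).sum(axis=-1).mean()
    loss = -alpha_reward * J + alpha_pred * l_pred
    return loss, J, l_pred
```

Note that J depends on the policy only through sampled actions, so its gradient has to come from a policy gradient estimator rather than from differentiating this expression directly.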

Algorithm
1. Initialize PSR
[Figure: initialization of the PSR filter (θ_PSR) and the reactive policy (θ_re).]

Algorithm
1. Initialize PSR
2. Optimize on a batch of trajectories (o_1, a_1, r_1), …, (o_{t+k}, a_{t+k}, r_{t+k}), using the prediction loss ℓ_pred and the policy objective J(π_θ)

RPSP Optimization
learning via policy gradient: alternate optimization or joint optimization
• REINFORCE, "Vanilla" Policy Gradient (Williams et al. 1992): - higher variance; + faster, simpler
• Natural Gradient (Schulman et al. 2015): - requires Hessian vector mult.; + smoother policy changes
• direct policy estimation
• applicable to any robust gradient optimizer
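As a rough illustration of the REINFORCE option, the sketch below estimates the policy gradient for a linear-Gaussian reactive policy a ~ N(W q, σ² I) acting on filter states q. The linear-Gaussian form, the use of the full trajectory return without a baseline, and the name reinforce_gradient are assumptions made for this example; the slides do not specify the policy parameterization.

```python
import numpy as np

def reinforce_gradient(W, sigma, states, actions, rewards):
    """REINFORCE estimate of dJ/dW for a ~ N(W q, sigma^2 I) (assumed policy).

    W:       (m, d)     policy weights
    states:  (N, T, d)  filter states q_t for N trajectories
    actions: (N, T, m)  sampled actions a_t
    rewards: (N, T)     rewards r_t
    """
    returns = rewards.sum(axis=1)                # (N,) cumulative reward
    residual = actions - states @ W.T            # (N, T, m), a_t - W q_t
    # Sum over t of grad_W log pi(a_t | q_t) = (a_t - W q_t) q_t^T / sigma^2.
    score = np.einsum("ntm,ntd->nmd", residual, states) / sigma ** 2
    # Weight each trajectory's score by its return and average over the batch.
    return (score * returns[:, None, None]).mean(axis=0)
```

A gradient ascent step would then be W + step_size * reinforce_gradient(...). The natural gradient variant additionally preconditions this direction, which is where the Hessian vector products mentioned above come in; that part is not shown here.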

Experiments
OpenAI Gym MuJoCo environments
• partial observations (joints / no vel.)
• continuous observations
• continuous actions

Environment   Joints   DoFs
Swimmer       3        2
CartPole      2        1
Walker2d      8        6
Hopper        5        3

Experiments: Cross-environment performance
[Figure: cross-environment performance; plot not recoverable from the transcript.]

Conclusions
• combine PSR filter + reactive network for partially observable environments
• make use of consistent initialization methods for the filter
• make use of prediction loss to improve policy
• end-to-end policy learning algorithm

Thank you! Questions?
zmarinho@cmu.edu
Come see us at poster #200.
This research was supported by the Portuguese Foundation of Science and Technology under grant SFRH/BD/52015/2012.
