reinforcement learning through the optimization lens

Benjamin Recht


  1. reinforcement learning through the optimization lens. Benjamin Recht (University of California, Berkeley)

  2. trustable, scalable, predictable

  3. Control Theory → Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system.

  4. Disciplinary Biases
     Control Theory (AE/CE/EE/ME): continuous; model → action; IEEE Transactions
     Reinforcement Learning (CS): discrete; data → action; Science Magazine

  5. Disciplinary Biases (same comparison as slide 4): Today's talk will try to unify these camps and point out how to merge their perspectives.

  6. Main research challenge: What are the fundamental limits of learning systems that interact with the physical environment? How well must we understand a system in order to control it?
     Theoretical foundations:
     • statistical learning theory
     • robust control theory
     • core optimization

  7. Control theory is the study of dynamical systems with inputs (diagram: system $G$ with input $u$, output $y$, and internal state $x_t$). The simplest case of such systems is linear systems:
     $$x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t + D u_t$$
     $x_t$ is called the state, and the dimension of the state is called the degree, $d$. $u_t$ is called the input, and its dimension is $p$. $y_t$ is called the output, and its dimension is $q$. For today, we will only consider $C = I$, $D = 0$, so that the state $x_t$ is observed directly.
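A minimal sketch of the linear model on this slide, assuming $C = I$ and $D = 0$ so that $y_t = x_t$. It rolls the recursion $x_{t+1} = A x_t + B u_t$ forward in plain Python. The helper names `matvec` and `simulate_linear` and the example matrices are hypothetical illustrations, not taken from the talk. The example is a discrete double integrator with $d = 2$ and $p = 1$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def simulate_linear(A: Matrix, B: Matrix, x0: Vector, inputs: List[Vector]) -> List[Vector]:
    """Roll out x_{t+1} = A x_t + B u_t (hypothetical helper).

    A is d x d, B is d x p, and each input u_t has length p.
    With C = I and D = 0 the output y_t is just x_t, so we return states only:
    the trajectory [x_0, x_1, ..., x_T] for T = len(inputs).
    """
    states = [list(x0)]
    for u in inputs:
        x = states[-1]
        Ax = matvec(A, x)
        Bu = matvec(B, u)
        states.append([a + b for a, b in zip(Ax, Bu)])
    return states


# Hypothetical example: double integrator with state (position, velocity).
# Position accumulates velocity; the input u_t pushes on velocity.
A = [[1.0, 1.0],
     [0.0, 1.0]]
B = [[0.0],
     [1.0]]
traj = simulate_linear(A, B, x0=[0.0, 0.0], inputs=[[1.0], [0.0], [0.0]])
print(traj)  # → [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
```

Plain lists keep the sketch dependency-free. For larger $d$ and $p$, a linear-algebra library such as NumPy would be the more natural choice.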

  8. Control theory → Reinforcement Learning is the study of discrete dynamical systems with inputs (diagram: system $G$ with input $u$, output $y$, and state $x_t$):
     $$p(x_{t+1} \mid \text{past}) = p(x_{t+1} \mid x_t, u_t), \qquad p(y_t \mid \text{past}) = p(y_t \mid x_t, u_t)$$
     The simplest example is the Partially Observed Markov Decision Process (POMDP). $x_t$ is called the state, and it takes values in $[d] = \{1, \dots, d\}$. $u_t$ is called the input, and it takes values in $[p]$. $y_t$ is called the output, and it takes values in $[q]$. For today, we will only consider the case where $x_t$ is observed, i.e., a Markov Decision Process (MDP).
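The Markov property on this slide can be illustrated with a short sampler for a fully observed MDP. This is a sketch that rests on three assumptions, none of which come from the talk:

* Transitions are stored as a hypothetical nested table `P[x][u][x_next]`.
* States and inputs are 0-indexed, so $[d]$ becomes `0..d-1`.
* The input is chosen by a user-supplied `policy` function that sees only the current state.

The names `rollout_mdp` and `always_flip` are also hypothetical.

```python
import random
from typing import Callable, List, Sequence

# P[x][u][x_next] = p(x_{t+1} = x_next | x_t = x, u_t = u)
# States are 0..d-1 and inputs 0..p-1 (the slide's [d] and [p], shifted to 0-based).
Transition = Sequence[Sequence[Sequence[float]]]


def rollout_mdp(P: Transition, policy: Callable[[int], int], x0: int, T: int,
                seed: int = 0) -> List[int]:
    """Sample states x_0, ..., x_T of a fully observed MDP (hypothetical helper).

    The next state is drawn using only the current pair (x_t, u_t), which is the
    Markov property p(x_{t+1} | past) = p(x_{t+1} | x_t, u_t).
    """
    rng = random.Random(seed)          # seeded for reproducible rollouts
    d = len(P)
    states = [x0]
    for _ in range(T):
        x = states[-1]
        u = policy(x)                  # input u_t chosen from the observed state
        probs = P[x][u]                # distribution over next states
        states.append(rng.choices(range(d), weights=probs, k=1)[0])
    return states


# Hypothetical 2-state, 2-input MDP: input 0 keeps the state, input 1 flips it.
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # from state 0: u=0 -> stay at 0, u=1 -> go to 1
    [[0.0, 1.0], [1.0, 0.0]],   # from state 1: u=0 -> stay at 1, u=1 -> go to 0
]


def always_flip(x: int) -> int:
    """A trivial policy that always applies input 1."""
    return 1


print(rollout_mdp(P, always_flip, x0=0, T=4))  # → [0, 1, 0, 1, 0]
```

The example transitions are deterministic so that the output is predictable. Replacing a row such as `[0.0, 1.0]` with `[0.1, 0.9]` makes the rollout genuinely random, while the seed keeps it reproducible.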
