Approximate Information State for Partially Observed Systems
Jayakumar Subramanian and Aditya Mahajan, McGill University
Thanks to Amit Sinha and Raihan Seraj for simulation results.
IEEE Conference on Decision and Control, 11 December 2019
Many successes of RL in recent years: AlphaGo, arcade games, robotics. These algorithms are based on a comprehensive theory, but one restricted almost exclusively to systems with perfect state observations.

Applications with a partially observed state: healthcare, autonomous driving, finance (portfolio management), retail and marketing.

Goal: develop a comprehensive theory of approximate DP and RL for partially observed systems.
Notion of information state for partially observed systems
Notion of state in partially observed stochastic dynamical systems

A stochastic system has a controlled input $U_t$, a stochastic input $W_t$, and an output
$$Y_t = f_t(U_{1:t}, W_{1:t}).$$
The stochastic input is not observed. Let $H_t = (Y_{1:t-1}, U_{1:t-1})$ denote the history of inputs and outputs until time $t$.

Traditional solution: belief states.
Step 1: Identify a state $\{S_t\}_{t \ge 0}$ for predicting the output, assuming that the stochastic inputs are observed.
Step 2: Define a belief state $B_t \in \Delta(\mathcal{S})$: $B_t(s) = \mathbb{P}(S_t = s \mid H_t = h_t)$, $s \in \mathcal{S}$.

[Stratonovich, "Conditional Markov processes," 1960. Astrom, "Optimal control of Markov decision processes with incomplete state information," 1965. Striebel, "Sufficient statistics in the optimal control of stochastic systems," 1965. Baum and Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," 1966.]
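To make Step 2 concrete, here is a minimal sketch of the standard Bayes-filter recursion that propagates the belief state. The kernels P and Q, the observation convention, and all the numbers are hypothetical placeholders for illustration, not taken from the talk.

```python
import numpy as np

def belief_update(b, u, y, P, Q):
    """One step of the standard Bayes filter for a finite-state model.

    b : current belief, shape (S,), b[s] = P(S_t = s | h_t)
    u : control input applied at time t (index into P)
    y : observation received after applying u (index into Q)
    P : transition kernels, P[u][s, s'] = P(S_{t+1} = s' | S_t = s, U_t = u)
    Q : observation kernel, Q[s', y] = P(Y_{t+1} = y | S_{t+1} = s')
        (one common convention; others differ in timing)
    """
    predicted = b @ P[u]                # predict: P(S_{t+1} | h_t, u)
    unnormalized = predicted * Q[:, y]  # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Toy example with made-up numbers: 2 states, 2 actions, 2 observations.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # kernels for action 0
              [[0.5, 0.5], [0.5, 0.5]]])   # kernels for action 1
Q = np.array([[0.8, 0.2], [0.3, 0.7]])
b = np.array([0.5, 0.5])
b = belief_update(b, u=0, y=1, P=P, Q=Q)
print(b)  # updated belief over the two states
```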
Partially observed Markov decision processes (POMDPs): pros and cons of the belief-state representation

Pro: The value function is piecewise linear and convex in the belief, a structure exploited by various efficient algorithms.
Con: When the state-space model is not known analytically (as is the case for black-box models and simulators, as well as some real-world applications such as healthcare), belief states are difficult to construct and difficult to approximate from data.

[Smallwood and Sondik, "The optimal control of partially observable Markov processes over a finite horizon," 1973. Cheng, "Algorithms for partially observable Markov decision processes," 1988. Kaelbling, Littman, Cassandra, "Planning and acting in partially observable stochastic domains," 1998. Pineau, Gordon, Thrun, "Point-based value iteration: an anytime algorithm for POMDPs," 2003.]
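As an illustration of the "pro" above (not from the talk itself): point-based methods such as those cited represent the value function as a maximum over a finite set of alpha-vectors, which is exactly a piecewise linear and convex function of the belief. The vectors below are made-up numbers.

```python
import numpy as np

# Hypothetical alpha-vectors for a 2-state POMDP: each row is one linear
# piece, so V(b) = max_i <alpha_i, b> is piecewise linear and convex in b.
alphas = np.array([[10.0, 0.0],   # e.g., value of committing to action A
                   [0.0, 10.0],   # value of committing to action B
                   [6.0, 6.0]])   # value of an information-gathering action

def value(b):
    """Evaluate the piecewise-linear-convex value function at belief b."""
    return np.max(alphas @ b)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    b = np.array([p, 1.0 - p])
    print(p, value(b))
```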
Is there another way to model partially observed systems that is more amenable to approximation? Let's go back to first principles.
Notion of state in partially observed stochastic dynamical systems (when the stochastic input is not observed)

Recall $Y_t = f_t(U_{1:t}, W_{1:t})$ and the history $H_t = (Y_{1:t-1}, U_{1:t-1})$ of inputs and outputs until time $t$.

Predicting outputs almost surely: $H_t^{(1)} \sim H_t^{(2)}$ if for all future inputs $(U_{t:T}, W_{t:T})$, $Y_{t:T}^{(1)} = Y_{t:T}^{(2)}$ a.s. Too restrictive...

Forecasting outputs in distribution: $H_t^{(1)} \sim H_t^{(2)}$ if for all future control inputs $U_{t:T}$,
$$\mathbb{P}(Y_{t:T}^{(1)} \mid H_t^{(1)}, U_{t:T}) = \mathbb{P}(Y_{t:T}^{(2)} \mid H_t^{(2)}, U_{t:T}).$$

[Grassberger, "Complexity and forecasting in dynamical systems," 1988. Crutchfield and Young, "Inferring statistical complexity," 1989.]
Now let's construct the state space.

Properties of the information state: the info state $Z_t$ at time $t$ is a "compression" of the history of past inputs and outputs that satisfies the following.

Sufficient to predict the output: $\mathbb{P}(Y_t \mid H_t, U_t) = \mathbb{P}(Y_t \mid Z_t, U_t)$.
Sufficient to predict itself: $\mathbb{P}(Z_{t+1} \mid H_t, U_t) = \mathbb{P}(Z_{t+1} \mid Z_t, U_t)$.

Identifying such a compression has the same complexity as identifying a state sufficient for forecasting outputs in the case of perfect observations (which was Step 1 of the belief-state formulation).
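To see what the first property asks for in practice, here is a minimal Monte Carlo sketch on a made-up, uncontrolled two-state hidden Markov chain: it compares the empirical $\mathbb{P}(Y_t \mid H_t)$ against $\mathbb{P}(Y_t \mid Z_t)$ for the naive compression $Z_t = Y_{t-1}$. For this chain the last observation is not an exact information state, so the two conditional distributions agree only approximately; that gap is precisely the kind of error an approximate information state is meant to quantify. The model and numbers are hypothetical.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Hypothetical two-state hidden Markov chain (uncontrolled, for brevity).
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # P[s, s'] = P(S_{t+1} = s' | S_t = s)
Q = np.array([[0.8, 0.2], [0.3, 0.7]])   # Q[s, y]  = P(Y_t = y | S_t = s)

def simulate(T):
    """Roll out T observations from the chain."""
    s, ys = 0, []
    for _ in range(T):
        ys.append(rng.choice(2, p=Q[s]))
        s = rng.choice(2, p=P[s])
    return ys

# Empirical P(Y_4 | Y_{1:3}) vs P(Y_4 | Z = Y_3): if the candidate
# compression satisfied the first info-state property exactly, the two
# conditional distributions would coincide for every history.
counts_h = defaultdict(lambda: np.zeros(2))   # keyed by the full prefix
counts_z = defaultdict(lambda: np.zeros(2))   # keyed by the last observation
for _ in range(20000):
    ys = simulate(4)
    counts_h[tuple(ys[:3])][ys[3]] += 1
    counts_z[ys[2]][ys[3]] += 1

for h, c in sorted(counts_h.items()):
    p_h = c / c.sum()
    p_z = counts_z[h[-1]] / counts_z[h[-1]].sum()
    print(h, np.round(p_h, 3), np.round(p_z, 3))
```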