SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Bo Liu
Auburn University, Auburn, AL, USA
Collaborators

Daoming Lyu, Auburn University, Auburn, AL, USA
Fangkai Yang, NVIDIA Corporation, Redmond, WA, USA
Steven Gustafson, Maana Inc., Bellevue, WA, USA
Sequential Decision-Making

Sequential decision-making (SDM) concerns an agent choosing a sequence of actions based on its interaction with the environment.
Deep reinforcement learning (DRL) has achieved tremendous success on sequential decision-making problems using deep neural networks (Mnih et al., 2015).
Challenge: Montezuma’s Revenge

The avatar climbs down the ladder, jumps over a rotating skull, picks up the key (+100), then goes back and uses the key to open the right door (+300).
Vanilla DQN achieves a score of 0 on this game (Mnih et al., 2015).
Challenge: Montezuma’s Revenge

Problems:
- Long-horizon sequences of actions with sparse and delayed rewards.
- Poor data efficiency.
- Lack of interpretability.
Our Solution

Solution: task decomposition.
- Symbolic planning: subtask scheduling (high-level plan).
- DRL: subtask learning (low-level control).
- Meta-learner: subtask evaluation.

Goal:
- Symbolic planning drives learning, improving task-level interpretability.
- DRL learns feasible subtasks, improving data efficiency.
Background: Symbolic Planning with Action Language

An action language (Gelfond & Lifschitz, 1998) is a formal, declarative, logic-based language for describing dynamic domains.
A dynamic domain can be represented as a transition system.
Action Language BC

Action language BC (Lee et al., 2013) describes a transition system using a set of causal laws.
- Dynamic laws describe transitions between states:
    move(x, y1, y2) causes on(x, y2) if on(x, y1).
- Static laws describe the values of fluents within a state:
    intower(x, y2) if intower(x, y1), on(y1, y2).
Background: Reinforcement Learning

Reinforcement learning is defined on a Markov Decision Process (S, A, P^a_ss', r, γ). To achieve optimal behavior, a policy π : S × A → [0, 1] is learned.
An option is a tuple (I, π, β) that gives decision-making a hierarchical structure:
- initiation set I ⊆ S,
- policy π : S × A → [0, 1],
- probabilistic termination condition β : S → [0, 1].
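As a concrete illustration of the option tuple (I, π, β), here is a minimal Python sketch; the Option class and its field names are illustrative assumptions, not code from the paper.

    from dataclasses import dataclass
    from typing import Any, Callable

    State = Any
    Action = Any

    @dataclass
    class Option:
        # I ⊆ S: states in which the option may be invoked.
        initiation: Callable[[State], bool]
        # π: the option's internal policy mapping states to actions.
        policy: Callable[[State], Action]
        # β: probability of terminating the option in a given state.
        termination: Callable[[State], float]

        def can_start(self, state: State) -> bool:
            return self.initiation(state)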
SDRL: Symbolic Deep Reinforcement Learning

- Symbolic planner: orchestrates the sequence of subtasks using a high-level symbolic plan.
- Controller: uses DRL to learn a subpolicy for each subtask with intrinsic rewards.
- Meta-controller: measures the learning performance of subtasks and updates the intrinsic goal to enable reward-driven plan improvement.

A sketch of how these three components interact is given below.
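The following Python pseudocode is a high-level sketch of that interaction, assuming hypothetical interfaces (generate_plan, learn_subtask, evaluate, update_intrinsic_goal) that are not the authors' actual code.

    def sdrl_episode(planner, controllers, meta_controller, env):
        # Symbolic planner: propose a sequence of subtasks subject to the intrinsic goal.
        plan = planner.generate_plan(meta_controller.intrinsic_goal)
        for subtask in plan:
            # Controller: train/execute the DRL subpolicy for this subtask with intrinsic rewards.
            trajectory = controllers[subtask].learn_subtask(env)
            # Meta-controller: score the subtask using extrinsic reward and learning progress.
            meta_controller.evaluate(subtask, trajectory)
        # Feedback: updated subtask values tighten the intrinsic goal for the next plan.
        meta_controller.update_intrinsic_goal()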
Symbolic Planner

[figure]
Symbolic Planner: Planning with Intrinsic Goal

Intrinsic goal: a linear constraint on plan quality, quality(Π) ≥ quality(Π_t), where Π_t is the plan at episode t.
Plan quality is a utility function

    quality(Π_t) = Σ_{⟨s_{i-1}, g_{i-1}, s_i⟩ ∈ Π_t} ρ_{g_{i-1}}(s_{i-1})

where ρ_{g_i} is the gain reward for subtask g_i.
The symbolic planner generates a new plan that explores new subtasks and exploits more rewarding subtasks.
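A minimal Python sketch of the plan-quality utility and the intrinsic-goal check, assuming a plan is stored as a list of ⟨state, subtask, next state⟩ transitions and gain rewards are kept in a lookup table; these data structures are illustrative, not the authors' implementation.

    def plan_quality(plan, gain_reward):
        # quality(Π) = sum of gain rewards ρ_g(s) over transitions ⟨s, g, s'⟩ in the plan.
        return sum(gain_reward[(g, s)] for (s, g, s_next) in plan)

    def satisfies_intrinsic_goal(new_plan, current_plan, gain_reward):
        # Intrinsic goal: the new plan must be at least as good as the current plan Π_t.
        return plan_quality(new_plan, gain_reward) >= plan_quality(current_plan, gain_reward)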
From Symbolic Transition to Subtask

Assumption: given the set S of symbolic states and the set S̃ of sensory inputs, there is an oracle for symbol grounding, F : S × S̃ → {t, f}.
Given F and a pair of symbolic states s, s' ∈ S:
- initiation set I = { s̃ ∈ S̃ : F(s, s̃) = t },
- π : S̃ → Ã is the subpolicy for the corresponding subtask,
- β is the termination condition such that

    β(s̃') = 1 if F(s', s̃') = t, for s̃' ∈ S̃,
    β(s̃') = 0 otherwise.
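Reusing the Option sketch above, constructing a subtask option from a symbolic transition ⟨s, g, s'⟩ might look as follows; the function name and the callable form of the grounding oracle F are assumptions for illustration.

    def option_from_transition(s, s_prime, F, subpolicy):
        # I = {s̃ ∈ S̃ : F(s, s̃) = t}: start wherever the current symbolic state s is grounded.
        initiation = lambda obs: F(s, obs)
        # β(s̃') = 1 iff F(s', s̃') = t: terminate once the target symbolic state s' is reached.
        termination = lambda obs: 1.0 if F(s_prime, obs) else 0.0
        return Option(initiation=initiation, policy=subpolicy, termination=termination)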
Controller

[figure]
Controllers: DRL with Intrinsic Reward

Intrinsic reward: a pseudo-reward crafted by a human.
Given a subtask defined on (I, π, β), the intrinsic reward is

    r_i(s̃') = φ if β(s̃') = 1,
    r_i(s̃') = r otherwise,

where φ is a positive constant encouraging the completion of subtasks and r is the reward from the environment at state s̃'.
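A minimal sketch of this reward shaping in Python; the argument names and the default value of φ are illustrative assumptions.

    def intrinsic_reward(obs_next, beta, env_reward, phi=1.0):
        # Pay the bonus φ when the subtask's termination condition fires; otherwise pass
        # through the environment reward r observed at the next state.
        return phi if beta(obs_next) == 1 else env_reward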
Meta-Controller

[figure]
Meta-Controller: Evaluation with Extrinsic Reward

Extrinsic reward: r_e(s, g) = f(ε), where ε measures the competence of the learned subpolicy for each subtask.
For example, let ε be the success ratio; then f can be defined as

    f(ε) = -ψ if ε < threshold,
    f(ε) = r(s, g) if ε ≥ threshold,

where ψ is a positive constant that penalizes selecting unlearnable subtasks and r(s, g) is the cumulative environmental reward obtained by following subtask g.
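A sketch of this evaluation in Python, with the success ratio as ε; the particular threshold and default ψ are placeholder values, not the paper's settings.

    def extrinsic_reward(success_ratio, cumulative_env_reward, psi=1.0, threshold=0.9):
        # Penalize subtasks the controller fails to learn (ε below the threshold); otherwise
        # credit the subtask with the cumulative environmental reward r(s, g).
        return -psi if success_ratio < threshold else cumulative_env_reward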
Experimental Results I

[figure]
Experimental Results II

[figure]

Baseline: Kulkarni et al., Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, NIPS 2016.
Conclusion

We present an SDRL framework that features:
- High-level symbolic planning based on an intrinsic goal.
- Low-level policy control with DRL.
- Subtask learning evaluation by a meta-learner.

This is the first work integrating symbolic planning with DRL that achieves both task-level interpretability and data efficiency for decision-making.

Future work.