SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Bo Liu
Auburn University, Auburn, AL, USA
Collaborators

Daoming Lyu, Auburn University, Auburn, AL, USA
Fangkai Yang, NVIDIA Corporation, Redmond, WA, USA
Steven Gustafson, Maana Inc., Bellevue, WA, USA
Sequential Decision-Making

Sequential decision-making (SDM) concerns an agent choosing a sequence of actions based on its interaction with the environment.
Deep reinforcement learning (DRL) has achieved tremendous success on sequential decision-making problems using deep neural networks (Mnih et al., 2015).
Challenge: Montezuma’s Revenge

The avatar climbs down the ladder, jumps over a rotating skull, picks up the key (+100), then goes back and uses the key to open the right door (+300).
Vanilla DQN achieves a score of 0 on this game (Mnih et al., 2015).
Challenge: Montezuma’s Revenge

Problems:
- Long-horizon sequences of actions with sparse and delayed rewards.
- Poor data efficiency.
- Lack of interpretability.
Our Solution

Solution: task decomposition.
- Symbolic planning: subtask scheduling (high-level plan).
- DRL: subtask learning (low-level control).
- Meta-learner: subtask evaluation.

Goal:
- Symbolic planning drives learning, improving task-level interpretability.
- DRL learns feasible subtasks, improving data efficiency.
Background: Symbolic Planning with Action Language

An action language (Gelfond & Lifschitz, 1998) is a formal, declarative, logic-based language for describing dynamic domains.
A dynamic domain can be represented as a transition system.
Action Language BC

Action language BC (Lee et al., 2013) describes a transition system using a set of causal laws.
- Dynamic laws describe transitions between states:
    move(x, y1, y2) causes on(x, y2) if on(x, y1).
- Static laws describe the values of fluents within a state:
    intower(x, y2) if intower(x, y1), on(y1, y2).
Background: Reinforcement Learning

Reinforcement learning is defined on a Markov Decision Process (S, A, P^a_ss', r, γ). To achieve optimal behavior, a policy π : S × A → [0, 1] is learned.
An option is a tuple (I, π, β) that gives decision-making a hierarchical structure:
- initiation set I ⊆ S,
- policy π : S × A → [0, 1],
- probabilistic termination condition β : S → [0, 1].
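As a concrete illustration of the option tuple (I, π, β), here is a minimal Python sketch; the Option class and its field names are illustrative assumptions, not code from the paper.

    from dataclasses import dataclass
    from typing import Any, Callable

    State = Any
    Action = Any

    @dataclass
    class Option:
        # I ⊆ S: states in which the option may be invoked.
        initiation: Callable[[State], bool]
        # π: the option's internal policy mapping states to actions.
        policy: Callable[[State], Action]
        # β: probability of terminating the option in a given state.
        termination: Callable[[State], float]

        def can_start(self, state: State) -> bool:
            return self.initiation(state)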
SDRL: Symbolic Deep Reinforcement Learning

- Symbolic planner: orchestrates the sequence of subtasks using a high-level symbolic plan.
- Controller: uses DRL to learn a subpolicy for each subtask with intrinsic rewards.
- Meta-controller: measures the learning performance of subtasks and updates the intrinsic goal to enable reward-driven plan improvement.

A sketch of how these three components interact is given below.
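The following Python pseudocode is a high-level sketch of that interaction, assuming hypothetical interfaces (generate_plan, learn_subtask, evaluate, update_intrinsic_goal) that are not the authors' actual code.

    def sdrl_episode(planner, controllers, meta_controller, env):
        # Symbolic planner: propose a sequence of subtasks subject to the intrinsic goal.
        plan = planner.generate_plan(meta_controller.intrinsic_goal)
        for subtask in plan:
            # Controller: train/execute the DRL subpolicy for this subtask with intrinsic rewards.
            trajectory = controllers[subtask].learn_subtask(env)
            # Meta-controller: score the subtask using extrinsic reward and learning progress.
            meta_controller.evaluate(subtask, trajectory)
        # Feedback: updated subtask values tighten the intrinsic goal for the next plan.
        meta_controller.update_intrinsic_goal()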
Symbolic Planner

[figure]
Symbolic Planner: Planning with Intrinsic Goal

Intrinsic goal: a linear constraint on plan quality, quality(Π) ≥ quality(Π_t), where Π_t is the plan at episode t.
Plan quality is a utility function

    quality(Π_t) = Σ_{⟨s_{i-1}, g_{i-1}, s_i⟩ ∈ Π_t} ρ_{g_{i-1}}(s_{i-1})

where ρ_{g_i} is the gain reward for subtask g_i.
The symbolic planner generates a new plan that explores new subtasks and exploits more rewarding subtasks.
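A minimal Python sketch of the plan-quality utility and the intrinsic-goal check, assuming a plan is stored as a list of ⟨state, subtask, next state⟩ transitions and gain rewards are kept in a lookup table; these data structures are illustrative, not the authors' implementation.

    def plan_quality(plan, gain_reward):
        # quality(Π) = sum of gain rewards ρ_g(s) over transitions ⟨s, g, s'⟩ in the plan.
        return sum(gain_reward[(g, s)] for (s, g, s_next) in plan)

    def satisfies_intrinsic_goal(new_plan, current_plan, gain_reward):
        # Intrinsic goal: the new plan must be at least as good as the current plan Π_t.
        return plan_quality(new_plan, gain_reward) >= plan_quality(current_plan, gain_reward)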
From Symbolic Transition to Subtask

Assumption: given the set S of symbolic states and the set S̃ of sensory inputs, there is an oracle for symbol grounding, F : S × S̃ → {t, f}.
Given F and a pair of symbolic states s, s' ∈ S:
- initiation set I = { s̃ ∈ S̃ : F(s, s̃) = t },
- π : S̃ → Ã is the subpolicy for the corresponding subtask,
- β is the termination condition such that

    β(s̃') = 1 if F(s', s̃') = t, for s̃' ∈ S̃,
    β(s̃') = 0 otherwise.
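Reusing the Option sketch above, constructing a subtask option from a symbolic transition ⟨s, g, s'⟩ might look as follows; the function name and the callable form of the grounding oracle F are assumptions for illustration.

    def option_from_transition(s, s_prime, F, subpolicy):
        # I = {s̃ ∈ S̃ : F(s, s̃) = t}: start wherever the current symbolic state s is grounded.
        initiation = lambda obs: F(s, obs)
        # β(s̃') = 1 iff F(s', s̃') = t: terminate once the target symbolic state s' is reached.
        termination = lambda obs: 1.0 if F(s_prime, obs) else 0.0
        return Option(initiation=initiation, policy=subpolicy, termination=termination)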
Controller

[figure]
Controllers: DRL with Intrinsic Reward

Intrinsic reward: a pseudo-reward crafted by a human.
Given a subtask defined on (I, π, β), the intrinsic reward is

    r_i(s̃') = φ if β(s̃') = 1,
    r_i(s̃') = r otherwise,

where φ is a positive constant encouraging the completion of subtasks and r is the reward from the environment at state s̃'.
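A minimal sketch of this reward shaping in Python; the argument names and the default value of φ are illustrative assumptions.

    def intrinsic_reward(obs_next, beta, env_reward, phi=1.0):
        # Pay the bonus φ when the subtask's termination condition fires; otherwise pass
        # through the environment reward r observed at the next state.
        return phi if beta(obs_next) == 1 else env_reward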
Meta-Controller

[figure]
Meta-Controller: Evaluation with Extrinsic Reward

Extrinsic reward: r_e(s, g) = f(ε), where ε measures the competence of the learned subpolicy for each subtask.
For example, let ε be the success ratio; then f can be defined as

    f(ε) = -ψ if ε < threshold,
    f(ε) = r(s, g) if ε ≥ threshold,

where ψ is a positive constant that penalizes selecting unlearnable subtasks and r(s, g) is the cumulative environmental reward obtained by following subtask g.
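A sketch of this evaluation in Python, with the success ratio as ε; the particular threshold and default ψ are placeholder values, not the paper's settings.

    def extrinsic_reward(success_ratio, cumulative_env_reward, psi=1.0, threshold=0.9):
        # Penalize subtasks the controller fails to learn (ε below the threshold); otherwise
        # credit the subtask with the cumulative environmental reward r(s, g).
        return -psi if success_ratio < threshold else cumulative_env_reward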
Experimental Results I

[figure]
Experimental Results II

[figure]

Baseline: Kulkarni et al., Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, NIPS 2016.
Conclusion

We present an SDRL framework that features:
- High-level symbolic planning based on an intrinsic goal.
- Low-level policy control with DRL.
- Subtask learning evaluation by a meta-learner.

This is the first work integrating symbolic planning with DRL that achieves both task-level interpretability and data efficiency for decision-making.

Future work.