Learning Transferable Graph Exploration


  1. Learning Transferable Graph Exploration
     Hanjun Dai, Yujia Li, Chenglong Wang, Rishabh Singh, Po-Sen Huang, Pushmeet Kohli
     33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada. November 15, 2019

  2. State-space Coverage Problem
     Goal: given an environment, efficiently reach as many distinct states as possible.
     Examples:
     • model checking: design test inputs that expose as many potential errors as possible
     • active map building: construct a map of an unknown environment efficiently
     • exploration in reinforcement learning in general

  3. Common Approaches: Undirected Exploration
     High-level idea: randomly choose states to visit / actions to take.
     Examples:
     1. Random walk on a graph [2] (a small simulation sketch follows below):
        • the cover time (expected number of steps to reach every node) depends on the graph structure
        • the cover time is lower-bounded by Ω(n log n) and upper-bounded by O(n³)
     2. ε-greedy exploration:
        • select a random action with probability ε
        • prevents (to some extent) being locked onto suboptimal actions
     3. Learning to prune: more on this later!
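To make the undirected baseline concrete, here is a minimal sketch that simulates a uniform random walk on a small graph and counts the steps needed to cover every node. The adjacency-dict representation, `random_walk_cover`, and the cycle example are illustrative choices, not code from the paper.

```python
import random

def random_walk_cover(adj, start, max_steps=100_000):
    """Steps a uniform random walk needs before every node has been visited.

    adj: dict mapping each node to a list of its neighbours (graph assumed
    connected). Returns the step count, or None if max_steps is exhausted.
    """
    visited = {start}
    node = start
    for step in range(1, max_steps + 1):
        node = random.choice(adj[node])   # undirected exploration: uniform random neighbour
        visited.add(node)
        if len(visited) == len(adj):
            return step
    return None

# Example: average cover time of a 16-node cycle graph
cycle = {i: [(i - 1) % 16, (i + 1) % 16] for i in range(16)}
trials = [random_walk_cover(cycle, start=0) for _ in range(100)]
print(sum(trials) / len(trials))
```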

  4. Common Approaches: Directed Exploration
     High-level idea: optimize an objective that encourages exploration / coverage (usually some kind of "quantified uncertainty").
     Examples:
     1. UCB for bandit problems (a UCB1 sketch follows below):
        • in addition to maximizing the reward, encourage exploration of rarely selected actions via the bonus term √(ln t / N_t(a))
     2. Intrinsic motivation in RL:
        • pseudo-count (similar to UCB): rewards changes in state-density estimates
        • information gain: take actions from which you learn about the environment (i.e. that reduce entropy)
        • predictive error: encourage actions that lead to unpredictable outcomes (for instance, unseen states)
     Reference: Sergey Levine's Deep Reinforcement Learning Course 2017, Lecture 13
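To make the UCB bonus concrete, below is a minimal UCB1 sketch for a stochastic bandit (the standard algorithm the slide alludes to, not the paper's method); `pull`, the exploration constant `c`, and the Bernoulli arms are illustrative assumptions.

```python
import math
import random

def ucb1(pull, n_arms, horizon, c=1.0):
    """Pick the arm maximizing (empirical mean) + c * sqrt(ln t / N_t(a))."""
    counts = [0] * n_arms           # N_t(a): how often each arm was selected
    sums = [0.0] * n_arms           # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:             # initialise by playing each arm once
            arm = t - 1
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Example: three Bernoulli arms; the best arm (p = 0.8) should dominate the counts
probs = [0.2, 0.5, 0.8]
print(ucb1(lambda a: float(random.random() < probs[a]), n_arms=3, horizon=2000))
```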

  5. Exploration on Graphs
     • the goal is to efficiently reach as many vertices as possible
     • the effectiveness of a random walk greatly depends on the graph structure
     Motivation: given the distribution of graphs seen at training time, can the algorithm learn an efficient covering strategy [1]?

  6. Problem Setup
     Environment: graph-structured state space (a toy environment with this interface is sketched below)
     • at time t, the agent observes a graph G_{t−1} = {V_{t−1}, E_{t−1}} and a coverage mask c_{t−1}: V_{t−1} → {0, 1} indicating which nodes have been explored so far
     • the agent takes an action a_t and receives a new graph G_t
     • the number of steps / actions can be seen as the exploration budget (to be minimized)
     Goal of learning:
     • learn an exploration strategy such that, given an unseen environment drawn from the same distribution as the training environments, the agent can efficiently visit as many unique states as possible
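A toy rendering of this setup, assuming a fixed, fully known graph and a simple reset/step interface (both are assumptions made for illustration; the paper's environments can also reveal the graph incrementally):

```python
class GraphExploreEnv:
    """Toy graph-exploration environment with a coverage mask.

    The agent sits on a node, actions index the neighbours of the current node,
    and the reward is the fraction of nodes newly covered at each step.
    """

    def __init__(self, adj, start, budget):
        self.adj, self.start, self.budget = adj, start, budget

    def reset(self):
        self.node, self.t = self.start, 0
        self.cover = {v: 0 for v in self.adj}      # coverage mask c_0
        self.cover[self.start] = 1
        return self._obs()

    def step(self, action):
        self.node = self.adj[self.node][action]    # move along the chosen edge
        newly_covered = 1 - self.cover[self.node]
        self.cover[self.node] = 1
        self.t += 1
        reward = newly_covered / len(self.adj)     # r_t: coverage gain / |V|
        done = self.t >= self.budget               # budget = number of actions
        return self._obs(), reward, done

    def _obs(self):
        # observation: current node plus a copy of the coverage mask
        return self.node, dict(self.cover)
```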

  7. Defining the Reward
     Maximize the fraction of visited nodes:
     max_{a_1, a_2, …, a_T}  (1/|V|) Σ_{v ∈ V_T} c_T(v);
     equivalently, the per-step reward is
     r_t = ( Σ_{v ∈ V_t} c_t(v) − Σ_{v ∈ V_{t−1}} c_{t−1}(v) ) / |V|.
     Objective:
     max_{θ_1, θ_2, …, θ_T}  E_{G ∼ D} E_{a_t^G ∼ π(a | h_t^G, θ_t)} [ Σ_{t=1}^T r_t^G ]
     • h_t = {(a_i, G_i, c_i)}_{i=1}^t is the exploration history
     • π(a | h_t, θ_t) is the action policy at time t, parameterized by θ_t
     • D is the distribution of environments
     The agent is trained with the advantage actor-critic algorithm (A2C) [3].
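Since the slides name A2C [3], here is a minimal sketch of the standard advantage actor-critic loss over one rollout of coverage rewards, written with PyTorch. The tensor shapes, `gamma`, and `value_coef` are assumptions for illustration, not the paper's exact training setup.

```python
import torch
import torch.nn.functional as F

def a2c_loss(log_probs, values, rewards, gamma=1.0, value_coef=0.5):
    """Standard A2C loss for a single rollout.

    log_probs: [T] tensor of log pi(a_t | h_t); values: [T] tensor of V(h_t);
    rewards: list of T floats (the per-step coverage rewards r_t).
    """
    returns, running = [], 0.0
    for r in reversed(rewards):                  # discounted return-to-go
        running = r + gamma * running
        returns.append(running)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    advantage = returns - values.detach()        # advantage estimate A_t
    policy_loss = -(log_probs * advantage).mean()
    value_loss = F.mse_loss(values, returns)     # critic regression target
    return policy_loss + value_coef * value_loss
```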

  8. Representing the Exploration History
     Representing the graph:
     • use a graph neural network to learn a representation g: (G, c) → R^d (node features are concatenated with the one-bit coverage information c_t)
     • starting from initial node embeddings μ_v^(0), update representations via message passing:
       μ_v^(l+1) = f( μ_v^(l), { (e_uv, μ_u^(l)) }_{u ∈ N(v)} ),
       where N(v) is the set of neighbors of v and f(·) is parameterized by an MLP
     • apply an attention-weighted sum to aggregate the node embeddings into a graph embedding
     • the graph representation is learned via unsupervised link prediction
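A compact sketch of one message-passing round and the attention-weighted readout, using PyTorch. Edge features e_uv are dropped for brevity, and the sum aggregation and layer sizes are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One update mu_v^(l+1) = f(mu_v^(l), aggregated neighbour messages)."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())  # the MLP f

    def forward(self, mu, edges):
        # mu: [num_nodes, dim] node embeddings; edges: list of (u, v) pairs
        src = torch.tensor([u for u, v in edges])
        dst = torch.tensor([v for u, v in edges])
        agg = torch.zeros_like(mu).index_add(0, dst, mu[src])  # sum neighbour messages
        return self.f(torch.cat([mu, agg], dim=-1))

class AttentionReadout(nn.Module):
    """Attention-weighted sum of node embeddings into one graph embedding."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, mu):
        weights = torch.softmax(self.score(mu), dim=0)  # [num_nodes, 1]
        return (weights * mu).sum(dim=0)                # graph embedding in R^d
```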

  9. Representing the Exploration History (continued)
     Representing the history (graph external memory):
     • summarize the representation up to the current step via auto-regressive aggregation, parameterized as F(h_t) = LSTM( F(h_{t−1}), g(G_t, c_t) )
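A sketch of this history aggregator, assuming an LSTM cell with matching input and hidden sizes (the exact cell configuration is an assumption):

```python
import torch
import torch.nn as nn

class HistoryAggregator(nn.Module):
    """Auto-regressive summary F(h_t) = LSTM(F(h_{t-1}), g(G_t, c_t))."""

    def __init__(self, dim):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=dim, hidden_size=dim)

    def forward(self, graph_embeddings):
        # graph_embeddings: [T, dim] sequence of g(G_t, c_t) vectors
        h = torch.zeros(1, graph_embeddings.size(1))
        c = torch.zeros(1, graph_embeddings.size(1))
        for g_t in graph_embeddings:                   # one LSTM step per time step
            h, c = self.cell(g_t.unsqueeze(0), (h, c))
        return h.squeeze(0)                            # summary F(h_T) of the history
```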

  10. Toy Problem: Erdős-Rényi Random Graphs
      • the blue node indicates the starting point; darker colors indicate higher visit counts
      • the proposed algorithm explores the graph more efficiently

  11. Toy Problem: 2D Maze
      • given a fixed budget (T = 36), the agent is trained to cover as much of a 6×6 maze as possible
      • tested on held-out mazes from the same distribution

  12. Program Checking
      • data generated by a program synthesizer
      • the learned exploration strategy is comparable to or better than expert-designed heuristic algorithms

  13. Limitations and Future Directions
      Limitations:
      • cannot scale to large programs
      • requires a reasonably large amount of training data
      Possible extensions:
      • reuse computation for more efficient representations
      • RL-based approximations for other NP-complete problems

  14. References
      [1] H. Dai, Y. Li, C. Wang, R. Singh, P.-S. Huang, and P. Kohli. Learning transferable graph exploration. arXiv preprint arXiv:1910.12980, 2019.
      [2] L. Lovász. Random walks on graphs: A survey. 1993.
      [3] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.
