Learning to Win by Reading Manuals in a Monte-Carlo Framework


  1. Learning to Win by Reading Manuals in a Monte-Carlo Framework S.R.K. Branavan, David Silver, Regina Barzilay MIT

  2. Semantic Interpretation. Traditional view: map text into an abstract representation. Alternative view: map text into a representation which helps performance in a control application.

  3. Semantic Interpretation for Control Applications. Setting: a complex strategy game, where each action sequence ends in a win or a loss (e.g., action 1: lost, action 2: won, action 3: lost). Traditional approach: learn an action-selection policy from game feedback. Our contribution: use textual advice to guide the action-selection policy.

  4. Leveraging Textual Advice: Challenges. 1. Find the sentences relevant to a given game state. [Game state showing a city and a settler unit, alongside a strategy document: "You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …"]

  5. (Build of the previous slide, picking out the document sentences relevant to the current state.)

  6. (Build of the previous slide, linking the relevant sentence to the settler unit in the game state.)

  7. Leveraging Textual Advice: Challenges. 2. Label sentences with predicate structure. E.g., "Move the settler to a site suitable for building a city, onto grassland with a river if possible." Candidate predicates: move_settlers_to()? settlers_build_city()? Here the sentence maps to move_settlers_to(); this is decided by labeling each word as action, state, or background.

  8. Leveraging Textual Advice: Challenges. 3. Guide action selection using the relevant text. From state S, candidate actions are a1: move_settlers_to(7,3), a2: settlers_build_city(), a3: settlers_irrigate_land(); the relevant advice is "Build the city on plains or grassland with a river running through it if possible."

  9. Learning from Game Feedback. Goal: learn with game feedback as the only source of supervision. Key idea: better parameter settings will lead to more victories. [Diagram: two copies of the model read the game manual and select actions a1, a2, a3 from state S; with parameters θ1 the end result is a win, with parameters θ2 it is a loss.]

  10. Model Overview. Monte-Carlo search framework: learns an action-selection policy from simulations; very successful in complex games like Go and Poker. Our algorithm: learns text interpretation from simulation feedback and biases the action-selection policy using the text.

  11. Monte-Carlo Search. Select actions via simulations; the game and the opponent can be stochastic. [Diagram: the actual game state (State 1) is copied into a simulation, a candidate action (e.g., Irrigate) is played out, and the simulated outcome (e.g., game lost) is observed.]

  12. Monte-Carlo Search. Try many candidate actions from the current state and see how well they perform. [Diagram: rollouts of fixed depth from the current game state, each ending in a game score, e.g., 0.1, 0.4, 1.2, 3.5.]

  13. Monte-Carlo Search. Try many candidate actions from the current state and see how well they perform, and learn feature weights from the simulation outcomes. [Diagram: each rollout yields a feature vector and a game score (e.g., 0.1, 0.4, 1.2), giving training pairs for the feature function and the model parameters.]
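
As an illustration of the rollout idea on slides 11-13, here is a minimal Python sketch of Monte-Carlo action selection with an online linear value estimate. The `simulator` interface (`copy`, `apply`, `legal_actions`, `score`), the `candidate_actions` method, and `feature_fn` are hypothetical stand-ins, not the game API used in the paper.

```python
import random
from collections import defaultdict

def rollout_score(simulator, state, action, depth=20):
    """Copy the game, play `action`, then follow a simple rollout
    policy for `depth` steps and return the resulting game score."""
    sim = simulator.copy(state)
    sim.apply(action)
    for _ in range(depth):
        sim.apply(random.choice(sim.legal_actions()))
    return sim.score()

def monte_carlo_select(simulator, state, weights, feature_fn,
                       n_rollouts=50, lr=0.01):
    """Run rollouts for candidate actions, update a linear value
    estimate Q(s, a) = w . f(s, a) from the observed scores, and
    return the action with the best average rollout score."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_rollouts):
        action = random.choice(state.candidate_actions())
        score = rollout_score(simulator, state, action)
        totals[action] += score
        counts[action] += 1
        # Online update of the linear estimate toward the rollout score.
        feats = feature_fn(state, action)
        q = sum(weights.get(k, 0.0) * v for k, v in feats.items())
        for k, v in feats.items():
            weights[k] = weights.get(k, 0.0) + lr * (score - q) * v
    return max(counts, key=lambda a: totals[a] / counts[a])
```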

  14. Model Overview. Monte-Carlo search framework: learns an action-selection policy from simulations. Our algorithm: biases the action-selection policy using text and learns text interpretation from simulation feedback.

  15. Modeling Requirements. (1) Identify the sentence relevant to the game state, e.g., "Build cities near rivers or ocean." (2) Label that sentence with its predicate structure. (3) Estimate the value of candidate actions, e.g., Irrigate: -10, Fortify: -5, ..., Build city: 25.

  16. Sentence Relevance (stage 1 of 3). Identify the sentence relevant to the game state and action. Given state s_t, candidate action a_t, and document d, sentence y_i is selected as relevant under a log-linear model: p(y = i | s_t, a_t, d) ∝ exp( u · φ(y_i, s_t, a_t, d) ), where u is the weight vector and φ the feature function.
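
A small sketch (not the authors' code) of this relevance layer as a softmax over the document's sentences; `phi` and the weight dictionary `u` are hypothetical placeholders for the slide's feature function and weight vector.

```python
import math

def sentence_relevance_probs(sentences, state, action, phi, u):
    """Softmax (log-linear) distribution over which sentence of the
    document is relevant to the current (state, action) pair:
        p(y = i | s, a, d)  proportional to  exp(u . phi(y_i, s, a, d)).
    `phi` returns a sparse feature dict; `u` is a weight dict."""
    scores = []
    for sent in sentences:
        feats = phi(sent, state, action)
        scores.append(sum(u.get(k, 0.0) * val for k, val in feats.items()))
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```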

  17. Predicate Structure (stage 2 of 3). Select word labels based on the sentence plus dependency information, e.g., "Build cities near rivers or ocean." Given word index j, sentence y_i, and dependency information q, each word receives a predicate label e_j ∈ {action, state, background} under a log-linear model: p(e_j = l | j, y_i, q) ∝ exp( v · ψ(l, j, y_i, q) ), where v is the weight vector and ψ the feature function.
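
The predicate-structure layer can be sketched the same way, as a softmax over the three labels for each word; `psi` and `v` are again hypothetical placeholders for the feature function and weight vector.

```python
import math

LABELS = ("action", "state", "background")

def word_label_probs(word_index, sentence, dep_info, psi, v):
    """Softmax (log-linear) distribution over the predicate label of the
    word at `word_index`:
        p(e_j = l | j, y_i, q)  proportional to  exp(v . psi(l, j, y_i, q)).
    `psi` returns a sparse feature dict; `v` is a weight dict."""
    scores = {}
    for label in LABELS:
        feats = psi(label, word_index, sentence, dep_info)
        scores[label] = sum(v.get(k, 0.0) * val for k, val in feats.items())
    m = max(scores.values())             # subtract max for numerical stability
    exps = {l: math.exp(s - m) for l, s in scores.items()}
    z = sum(exps.values())
    return {l: e / z for l, e in exps.items()}
```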

  18. Final Q-Function Approximation (stage 3 of 3). Predict the expected value of a candidate action. Given state s_t, candidate action a_t, document d, relevant sentence y_i, and predicate labeling e_i, a linear model gives Q(s_t, a_t) ≈ w · χ(s_t, a_t, d, y_i, e_i), where w is the weight vector and χ the feature function.
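
The final layer is then just a dot product between the weight vector and features that can see the chosen sentence and its labels; a sketch with hypothetical `chi` and `w`:

```python
def q_value(state, action, relevant_sentence, labels, chi, w):
    """Linear value estimate for the candidate action,
        Q(s, a) ~ w . chi(s, a, d, y_i, e_i),
    where the features `chi` may condition on the selected sentence and
    its predicate labels. `chi` and `w` are hypothetical placeholders."""
    feats = chi(state, action, relevant_sentence, labels)
    return sum(w.get(k, 0.0) * val for k, val in feats.items())
```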

  19. Model Representation. A multi-layer neural network in which each layer represents a different stage of analysis. Input: game state, candidate action, and document text. The first layer selects the most relevant sentence, the second predicts its predicate structure, and the final layer (the Q-function approximation) outputs the predicted action value.
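
To make the layered view concrete, here is a sketch of one forward pass that chains the hypothetical helpers from the sketches above, using greedy (argmax) choices at each stage for simplicity; `params` and `features` are assumed containers for the weight dicts and feature functions.

```python
def predict_action_value(state, action, document, params, features):
    """One forward pass through the three stages: pick the most likely
    relevant sentence, label its words, then score the action.
    `params` bundles the weight dicts (u, v, w) and `features` the
    feature functions (phi, psi, chi); both are hypothetical."""
    # Stage 1: most relevant sentence under the relevance model.
    rel = sentence_relevance_probs(document, state, action,
                                   features["phi"], params["u"])
    sentence = document[max(range(len(document)), key=lambda i: rel[i])]

    # Stage 2: predicate label (action/state/background) for each word.
    # A dependency parse would normally be passed in place of None.
    labels = []
    for j, _ in enumerate(sentence.split()):
        probs = word_label_probs(j, sentence, dep_info=None,
                                 psi=features["psi"], v=params["v"])
        labels.append(max(probs, key=probs.get))

    # Stage 3: linear value estimate conditioned on the chosen sentence
    # and its predicate labels.
    return q_value(state, action, sentence, labels,
                   features["chi"], params["w"])
```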

  20. Parameter Estimation. Objective: minimize the mean squared error between the predicted utility Q(s_t, a_t) and the observed utility R of the game rollout, i.e., minimize (Q(s_t, a_t) - R)^2.

  21. Parameter Estimation. Method: gradient descent, i.e., backpropagation. Parameter updates: each weight vector is moved against the gradient of the squared error, e.g., w ← w - α (Q - R) ∂Q/∂w for the output weights, with the updates for u and v obtained by the chain rule through the log-linear layers.
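
A sketch of the corresponding update for the output weights, assuming the squared-error objective from the previous slide; updates for the lower-layer weights u and v would be obtained by backpropagating through the softmax layers.

```python
def update_output_weights(w, feats, q_pred, utility, alpha=1e-4):
    """One gradient step on 1/2 * (Q - R)^2 for the linear output layer.
    With Q = w . chi, the gradient w.r.t. w is (Q - R) * chi, so
        w <- w - alpha * (Q - R) * chi.
    `feats` is the sparse feature dict chi(s, a, d, y_i, e_i)."""
    error = q_pred - utility
    for k, val in feats.items():
        w[k] = w.get(k, 0.0) - alpha * error * val
    return w
```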

  22. Features. State features: amount of gold in the treasury, government type, terrain surrounding the current unit. Action features: unit type (settler, worker, archer, etc.), unit action type. Text features: the word, its parent word in the dependency tree, and whether the word matches the text label of the unit.

  23. Experimental Domain. Game: Civilization II, a complex, stochastic, turn-based strategy game with a branching factor of roughly 10^20. Document: the official game manual of Civilization II. Text statistics: 2,083 sentences, 16.7 words per sentence on average, vocabulary of 3,638 words.

  24. Experimental Setup. Game opponent: the game's built-in AI, a domain-knowledge-rich AI built to challenge human players. Primary evaluation: games won within the first 100 game steps, averaged over 200 independent experiments (average runtime 1.5 hours per experiment). Secondary evaluation: full games won, averaged over 50 independent experiments (average runtime 4 hours per experiment).

  25. Results. [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%) and the full model.]

  26. Does Text Help? Compares "Game only" (linear Q-function approximation, no text) against the full model. [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%), Game only, and the full model.]

  27. Text vs. Representational Capacity. Adds a "Latent variable" baseline (non-linear Q-function approximation, no text). [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%), Game only, Latent variable, and the full model.]

  28. Linguistic Complexity vs. Performance Gain. Adds a "Sentence relevance" model. [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%), Game only, Latent variable, Sentence relevance, and the full model.]

  29. Results: Sentence Relevance. Problem: sentence relevance depends on the game state, and the states are game-specific and not known a priori, so relevance cannot be annotated in advance. Solution: add known non-relevant sentences to the text, e.g., sentences from the Wall Street Journal corpus. Result: 71.8% sentence relevance accuracy, which is surprisingly low given the game win rate.

  30. Results: Sentence Relevance. [Plots: sentence relevance accuracy, and the importance of text features relative to game features.]

  31. Results: Full Games. [Bar chart: percentage of full games won, averaged over 50 runs; bars for Game only, Latent variable, and the full model.]
