Learning to Win by Reading Manuals in a Monte-Carlo Framework


  1. Learning to Win by Reading Manuals in a Monte-Carlo Framework S.R.K. Branavan, David Silver, Regina Barzilay MIT

  2. Semantic Interpretation. Traditional view: map text into an abstract representation. Alternative view: map text into a representation which helps performance in a control application.

  3. Semantic Interpretation for Control Applications. Setting: a complex strategy game, where each action sequence ends in a win or a loss (e.g., action 1: lost, action 2: won, action 3: lost). Traditional approach: learn an action-selection policy from game feedback. Our contribution: use textual advice to guide the action-selection policy.

  4. Leveraging Textual Advice: Challenges. 1. Find the sentences relevant to a given game state. [Game state showing a city and a settler unit, alongside a strategy document: "You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …"]

  5. (Build of the previous slide, picking out the document sentences relevant to the current state.)

  6. (Build of the previous slide, linking the relevant sentence to the settler unit in the game state.)

  7. Leveraging Textual Advice: Challenges. 2. Label sentences with predicate structure. E.g., "Move the settler to a site suitable for building a city, onto grassland with a river if possible." Candidate predicates: move_settlers_to()? settlers_build_city()? Here the sentence maps to move_settlers_to(); this is decided by labeling each word as action, state, or background.

  8. Leveraging Textual Advice: Challenges. 3. Guide action selection using the relevant text. From state S, candidate actions are a1: move_settlers_to(7,3), a2: settlers_build_city(), a3: settlers_irrigate_land(); the relevant advice is "Build the city on plains or grassland with a river running through it if possible."

  9. Learning from Game Feedback. Goal: learn with game feedback as the only source of supervision. Key idea: better parameter settings will lead to more victories. [Diagram: two copies of the model read the game manual and select actions a1, a2, a3 from state S; with parameters θ1 the end result is a win, with parameters θ2 it is a loss.]

  10. Model Overview. Monte-Carlo search framework: learns an action-selection policy from simulations; very successful in complex games like Go and Poker. Our algorithm: learns text interpretation from simulation feedback and biases the action-selection policy using the text.

  11. Monte-Carlo Search. Select actions via simulations; the game and the opponent can be stochastic. [Diagram: the actual game state (State 1) is copied into a simulation, a candidate action (e.g., Irrigate) is played out, and the simulated outcome (e.g., game lost) is observed.]

  12. Monte-Carlo Search. Try many candidate actions from the current state and see how well they perform. [Diagram: rollouts of fixed depth from the current game state, each ending in a game score, e.g., 0.1, 0.4, 1.2, 3.5.]

  13. Monte-Carlo Search. Try many candidate actions from the current state and see how well they perform, and learn feature weights from the simulation outcomes. [Diagram: each rollout yields a feature vector and a game score (e.g., 0.1, 0.4, 1.2), giving training pairs for the feature function and the model parameters.]
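
As an illustration of the rollout idea on slides 11-13, here is a minimal Python sketch of Monte-Carlo action selection with an online linear value estimate. The `simulator` interface (`copy`, `apply`, `legal_actions`, `score`), the `candidate_actions` method, and `feature_fn` are hypothetical stand-ins, not the game API used in the paper.

```python
import random
from collections import defaultdict

def rollout_score(simulator, state, action, depth=20):
    """Copy the game, play `action`, then follow a simple rollout
    policy for `depth` steps and return the resulting game score."""
    sim = simulator.copy(state)
    sim.apply(action)
    for _ in range(depth):
        sim.apply(random.choice(sim.legal_actions()))
    return sim.score()

def monte_carlo_select(simulator, state, weights, feature_fn,
                       n_rollouts=50, lr=0.01):
    """Run rollouts for candidate actions, update a linear value
    estimate Q(s, a) = w . f(s, a) from the observed scores, and
    return the action with the best average rollout score."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_rollouts):
        action = random.choice(state.candidate_actions())
        score = rollout_score(simulator, state, action)
        totals[action] += score
        counts[action] += 1
        # Online update of the linear estimate toward the rollout score.
        feats = feature_fn(state, action)
        q = sum(weights.get(k, 0.0) * v for k, v in feats.items())
        for k, v in feats.items():
            weights[k] = weights.get(k, 0.0) + lr * (score - q) * v
    return max(counts, key=lambda a: totals[a] / counts[a])
```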

  14. Model Overview. Monte-Carlo search framework: learns an action-selection policy from simulations. Our algorithm: biases the action-selection policy using text and learns text interpretation from simulation feedback.

  15. Modeling Requirements. (1) Identify the sentence relevant to the game state, e.g., "Build cities near rivers or ocean." (2) Label that sentence with its predicate structure. (3) Estimate the value of candidate actions, e.g., Irrigate: -10, Fortify: -5, ..., Build city: 25.

  16. Sentence Relevance (stage 1 of 3). Identify the sentence relevant to the game state and action. Given state s_t, candidate action a_t, and document d, sentence y_i is selected as relevant under a log-linear model: p(y = i | s_t, a_t, d) ∝ exp( u · φ(y_i, s_t, a_t, d) ), where u is the weight vector and φ the feature function.
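
A small sketch (not the authors' code) of this relevance layer as a softmax over the document's sentences; `phi` and the weight dictionary `u` are hypothetical placeholders for the slide's feature function and weight vector.

```python
import math

def sentence_relevance_probs(sentences, state, action, phi, u):
    """Softmax (log-linear) distribution over which sentence of the
    document is relevant to the current (state, action) pair:
        p(y = i | s, a, d)  proportional to  exp(u . phi(y_i, s, a, d)).
    `phi` returns a sparse feature dict; `u` is a weight dict."""
    scores = []
    for sent in sentences:
        feats = phi(sent, state, action)
        scores.append(sum(u.get(k, 0.0) * val for k, val in feats.items()))
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```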

  17. Predicate Structure (stage 2 of 3). Select word labels based on the sentence plus dependency information, e.g., "Build cities near rivers or ocean." Given word index j, sentence y_i, and dependency information q, each word receives a predicate label e_j ∈ {action, state, background} under a log-linear model: p(e_j = l | j, y_i, q) ∝ exp( v · ψ(l, j, y_i, q) ), where v is the weight vector and ψ the feature function.
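
The predicate-structure layer can be sketched the same way, as a softmax over the three labels for each word; `psi` and `v` are again hypothetical placeholders for the feature function and weight vector.

```python
import math

LABELS = ("action", "state", "background")

def word_label_probs(word_index, sentence, dep_info, psi, v):
    """Softmax (log-linear) distribution over the predicate label of the
    word at `word_index`:
        p(e_j = l | j, y_i, q)  proportional to  exp(v . psi(l, j, y_i, q)).
    `psi` returns a sparse feature dict; `v` is a weight dict."""
    scores = {}
    for label in LABELS:
        feats = psi(label, word_index, sentence, dep_info)
        scores[label] = sum(v.get(k, 0.0) * val for k, val in feats.items())
    m = max(scores.values())             # subtract max for numerical stability
    exps = {l: math.exp(s - m) for l, s in scores.items()}
    z = sum(exps.values())
    return {l: e / z for l, e in exps.items()}
```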

  18. Final Q-Function Approximation (stage 3 of 3). Predict the expected value of a candidate action. Given state s_t, candidate action a_t, document d, relevant sentence y_i, and predicate labeling e_i, a linear model gives Q(s_t, a_t) ≈ w · χ(s_t, a_t, d, y_i, e_i), where w is the weight vector and χ the feature function.
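
The final layer is then just a dot product between the weight vector and features that can see the chosen sentence and its labels; a sketch with hypothetical `chi` and `w`:

```python
def q_value(state, action, relevant_sentence, labels, chi, w):
    """Linear value estimate for the candidate action,
        Q(s, a) ~ w . chi(s, a, d, y_i, e_i),
    where the features `chi` may condition on the selected sentence and
    its predicate labels. `chi` and `w` are hypothetical placeholders."""
    feats = chi(state, action, relevant_sentence, labels)
    return sum(w.get(k, 0.0) * val for k, val in feats.items())
```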

  19. Model Representation. A multi-layer neural network in which each layer represents a different stage of analysis. Input: game state, candidate action, and document text. The first layer selects the most relevant sentence, the second predicts its predicate structure, and the final layer (the Q-function approximation) outputs the predicted action value.
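
To make the layered view concrete, here is a sketch of one forward pass that chains the hypothetical helpers from the sketches above, using greedy (argmax) choices at each stage for simplicity; `params` and `features` are assumed containers for the weight dicts and feature functions.

```python
def predict_action_value(state, action, document, params, features):
    """One forward pass through the three stages: pick the most likely
    relevant sentence, label its words, then score the action.
    `params` bundles the weight dicts (u, v, w) and `features` the
    feature functions (phi, psi, chi); both are hypothetical."""
    # Stage 1: most relevant sentence under the relevance model.
    rel = sentence_relevance_probs(document, state, action,
                                   features["phi"], params["u"])
    sentence = document[max(range(len(document)), key=lambda i: rel[i])]

    # Stage 2: predicate label (action/state/background) for each word.
    # A dependency parse would normally be passed in place of None.
    labels = []
    for j, _ in enumerate(sentence.split()):
        probs = word_label_probs(j, sentence, dep_info=None,
                                 psi=features["psi"], v=params["v"])
        labels.append(max(probs, key=probs.get))

    # Stage 3: linear value estimate conditioned on the chosen sentence
    # and its predicate labels.
    return q_value(state, action, sentence, labels,
                   features["chi"], params["w"])
```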

  20. Parameter Estimation. Objective: minimize the mean squared error between the predicted utility Q(s_t, a_t) and the observed utility R of the game rollout, i.e., minimize (Q(s_t, a_t) - R)^2.

  21. Parameter Estimation. Method: gradient descent, i.e., backpropagation. Parameter updates: each weight vector is moved against the gradient of the squared error, e.g., w ← w - α (Q - R) ∂Q/∂w for the output weights, with the updates for u and v obtained by the chain rule through the log-linear layers.
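
A sketch of the corresponding update for the output weights, assuming the squared-error objective from the previous slide; updates for the lower-layer weights u and v would be obtained by backpropagating through the softmax layers.

```python
def update_output_weights(w, feats, q_pred, utility, alpha=1e-4):
    """One gradient step on 1/2 * (Q - R)^2 for the linear output layer.
    With Q = w . chi, the gradient w.r.t. w is (Q - R) * chi, so
        w <- w - alpha * (Q - R) * chi.
    `feats` is the sparse feature dict chi(s, a, d, y_i, e_i)."""
    error = q_pred - utility
    for k, val in feats.items():
        w[k] = w.get(k, 0.0) - alpha * error * val
    return w
```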

  22. Features. State features: amount of gold in the treasury, government type, terrain surrounding the current unit. Action features: unit type (settler, worker, archer, etc.), unit action type. Text features: the word, its parent word in the dependency tree, and whether the word matches the text label of the unit.

  23. Experimental Domain. Game: Civilization II, a complex, stochastic, turn-based strategy game with a branching factor of roughly 10^20. Document: the official game manual of Civilization II. Text statistics: 2,083 sentences, 16.7 words per sentence on average, vocabulary of 3,638 words.

  24. Experimental Setup. Game opponent: the game's built-in AI, a domain-knowledge-rich AI built to challenge human players. Primary evaluation: games won within the first 100 game steps, averaged over 200 independent experiments (average runtime 1.5 hours per experiment). Secondary evaluation: full games won, averaged over 50 independent experiments (average runtime 4 hours per experiment).

  25. Results. [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%) and the full model.]

  26. Does Text Help? Compares "Game only" (linear Q-function approximation, no text) against the full model. [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%), Game only, and the full model.]

  27. Text vs. Representational Capacity. Adds a "Latent variable" baseline (non-linear Q-function approximation, no text). [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%), Game only, Latent variable, and the full model.]

  28. Linguistic Complexity vs. Performance Gain. Adds a "Sentence relevance" model. [Bar chart: % of games won within 100 turns, averaged over 200 runs; bars for the built-in AI (0%), Game only, Latent variable, Sentence relevance, and the full model.]

  29. Results: Sentence Relevance. Problem: sentence relevance depends on the game state, and the states are game-specific and not known a priori, so relevance cannot be annotated in advance. Solution: add known non-relevant sentences to the text, e.g., sentences from the Wall Street Journal corpus. Result: 71.8% sentence relevance accuracy, which is surprisingly low given the game win rate.

  30. Results: Sentence Relevance. [Plots: sentence relevance accuracy, and the importance of text features relative to game features.]

  31. Results: Full Games. [Bar chart: percentage of full games won, averaged over 50 runs; bars for Game only, Latent variable, and the full model.]
