Language Understanding for Text-based Games Using Deep Reinforcement Learning


  1. Language Understanding for Text-based Games Using Deep Reinforcement Learning Karthik Narasimhan, Tejas Kulkarni, Regina Barzilay MIT

  2. Text-based games (State 1: The old bridge) You are standing very close to the bridge's eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind. >> go east (State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out to a small open area surrounded by the remains of the castle. ... MUDs: predecessors to modern graphical games

  3. Why are they challenging? (State 1: The old bridge) "You are standing very close to the bridge's eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind." Earlier work (Branavan et al., 2011) could rely on a symbolic state such as Location: Bridge 1, Wind level: 3, Time: 8pm. Here, no symbolic representation is available.

  4. Can a computer understand language well enough to play these games? Understanding ≈ Actionable intelligence

  5. Can a computer understand language well enough to play these games? Inspiration: playing graphical games directly from raw pixels (DeepMind)

  6. Our Approach: Reinforcement Learning utilizing in-game feedback to: ✦ Learn control policies for gameplay. ✦ Learn good representations for text descriptions of the game state.

  7. Traditional RL framework: a sequence of states s_1, s_2, s_3, ..., s_t connected by actions a_1, a_2, a_3, ... and a reward signal. The state is symbolic, e.g. s = {Location: Bridge 1, Wind level: 3, Time: 8pm}, and Q(s, a), the Q-value, is the agent's notion of discounted future reward.
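As a reference point (standard RL notation, not spelled out on the slide), the discounted future reward behind the Q-value can be written as

    Q(s, a) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s,\, a_0 = a \right]

with discount factor γ in [0, 1), so Q(s, a) estimates the total reward the agent expects to collect after taking action a in state s.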

  8. Text-based games: the same sequence of states s_1, s_2, s_3, ..., s_t, actions a_1, a_2, a_3 and rewards, but instead of a symbolic state s = {Location: Bridge 1, Wind level: 3, Time: 8pm} the agent only sees the description (State 1: The old bridge) "You are standing very close to the bridge's eastern foundation. If you go east you will be back on solid ground ..."

  9. Text-based games: BOW representation. Same sequence of states s_1, s_2, s_3, ..., s_t, actions a_1, a_2, a_3 and rewards, but the state description, e.g. (State 1: The old bridge) "You are standing very close to the bridge's eastern foundation. If you go east you will be back on solid ground ...", is encoded as a count vector s = (0, 1, 0, ..., 0)^T. Bag of words?

  10. Input text → bag-of-words vector (0, 1, 0, ..., 0)^T → control policy → Q. Can we do better?
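For concreteness, a minimal Python sketch of the bag-of-words state encoding used here; the vocabulary and tokenisation are illustrative choices, not those used in the paper.

    import re
    from collections import Counter

    def bag_of_words(text, vocab):
        # Tokenise crudely and count occurrences of each vocabulary word in the description.
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        return [counts[w] for w in vocab]

    # Illustrative vocabulary; the Home world uses 84 words in total.
    vocab = ["bridge", "east", "ground", "wind", "gatehouse", "wall"]
    state = "You are standing very close to the bridge's eastern foundation."
    print(bag_of_words(state, vocab))   # -> [1, 0, 0, 0, 0, 0]

Such a vector discards word order entirely, which is the limitation the next slides address.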

  11. Model: input text → recurrent NN maps the text to a vector representation v → Q-values for all commands

  12. Model: input text → recurrent NN maps the text to a vector representation v → NN for control policy → Q-values for all commands

  13. LSTM-DQN: the representation generator φ_R feeds the input words w_1, w_2, w_3, ..., w_n through an LSTM and mean-pools the outputs into a state vector v_s; the action scorer φ_A applies a linear layer and ReLU, then two separate linear heads (the action-object scorer) producing Q(s, a) for actions and Q(s, o) for objects.
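A minimal PyTorch sketch of this two-module architecture, with placeholder layer sizes and command counts (the actual hyperparameters are in the paper): φ_R embeds the words, runs an LSTM and mean-pools the outputs into v_s; φ_A maps v_s to Q(s, a) and Q(s, o).

    import torch
    import torch.nn as nn

    class LSTMDQN(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, hidden_dim=64,
                     n_actions=4, n_objects=4):
            super().__init__()
            # Representation generator phi_R: embed words, run an LSTM over them.
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            # Action scorer phi_A: shared linear + ReLU, then heads for Q(s, a) and Q(s, o).
            self.shared = nn.Linear(hidden_dim, hidden_dim)
            self.action_head = nn.Linear(hidden_dim, n_actions)
            self.object_head = nn.Linear(hidden_dim, n_objects)

        def forward(self, word_ids):
            # word_ids: (batch, seq_len) tensor of token indices for the state description.
            outputs, _ = self.lstm(self.embed(word_ids))
            v_s = outputs.mean(dim=1)            # mean pooling over time -> state vector v_s
            h = torch.relu(self.shared(v_s))
            return self.action_head(h), self.object_head(h)   # Q(s, a), Q(s, o)

    # Tiny smoke test on a random 10-word description.
    q_a, q_o = LSTMDQN(vocab_size=84)(torch.randint(0, 84, (1, 10)))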

  14. Algorithm (1): given the current state description (State 1: The old bridge) "You are standing very close to the bridge's eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.", run the network to obtain the Q-values Q(s, a).

  15. Algorithm (2): in State 1 (The old bridge), take an action a* using an ε-greedy policy over the Q-values.
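A minimal sketch of the ε-greedy step, assuming q_values is the list of Q-values for the commands available in the current state:

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        # With probability epsilon explore a random command, otherwise exploit the best one.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda i: q_values[i])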

  16. Algorithm (3): executing a* moves the agent from State 1 (The old bridge) to State 2 (Ruined gatehouse), "The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out ...", and yields a reward.

  17. Algorithm (4): store the transition (State 1, a, reward, State 2) in an experience memory and sample transitions from it for parameter updates.
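A minimal sketch of such an experience memory, using uniform sampling for simplicity (capacity and batch size are placeholders):

    import random
    from collections import deque

    class ReplayMemory:
        # Stores (state_text, action, reward, next_state_text) transitions, samples minibatches.
        def __init__(self, capacity=10000):
            self.memory = deque(maxlen=capacity)

        def push(self, transition):
            self.memory.append(transition)

        def sample(self, batch_size=32):
            return random.sample(list(self.memory), batch_size)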

  18. Parameter update: the transition from State 1 (The old bridge) to State 2 (Ruined gatehouse), together with its reward, is used to update the network parameters by stochastic gradient descent, with loss gradient

      \nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{\hat{s},\hat{a}}\big[\, 2\,(y_i - Q(\hat{s},\hat{a};\theta_i))\, \nabla_{\theta_i} Q(\hat{s},\hat{a};\theta_i) \,\big]

      where  y_i = \mathbb{E}_{\hat{s},\hat{a}}\big[\, r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \,\big|\, \hat{s},\hat{a} \,\big]
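A sketch of the corresponding update step in PyTorch, assuming the LSTMDQN sketch above, a frozen copy target_model standing in for θ_{i-1}, and batch tensors of token ids, action indices and rewards; minimising the squared error below gives the gradient on this slide.

    import torch
    import torch.nn.functional as F

    def dqn_update(model, target_model, optimizer, batch, gamma=0.5):
        states, actions, rewards, next_states = batch
        # Q(s, a; theta_i) for the actions actually taken.
        q_sa = model(states)[0].gather(1, actions.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Target y_i = r + gamma * max_a' Q(s', a'; theta_{i-1}).
            y = rewards + gamma * target_model(next_states)[0].max(dim=1).values
        loss = F.mse_loss(q_sa, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()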

  19. Game Environment: Evennia, a highly extensible Python framework for MUD games. Two worlds: ✦ a small game to demonstrate the task and analyze learnt representations. ✦ a pre-existing Fantasy world.

  20. Home World • Number of different quests: 16 • Vocabulary: 84 words • Words per description (avg.): 10.5 • Multiple descriptions per room/object.

  21. Home World This room has two sofas, chairs and a chandelier. You are not sleepy now but you are hungry now. > go east

  22. Home World This area has plants, grass and rabbits. You are not sleepy now but you are hungry now. > go south

  23. Home World Reward: +1 You have arrived in the kitchen. You can find food and drinks here. You are not sleepy now but you are hungry now. > eat apple

  24. Fantasy World • Number of rooms: > 56 • Vocabulary: 1340 words • Avg. no. of words/description: 65.21 • Max descriptions per room: 100 • Considerably more complex • Varying descriptions per state, created by game developers. Example (State 1: The old bridge): "You are standing very close to the bridge's eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind."

  25. Evaluation. Two metrics: ✦ Quest completion ✦ Cumulative reward per episode • Positive rewards for quest fulfillment • Negative rewards for bad actions. Epoch: training for n episodes followed by evaluation on n episodes.
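A sketch of this epoch structure, assuming a hypothetical play_episode(env, train) helper that returns the cumulative reward and whether the quest was completed:

    def run_epoch(agent, env, n_episodes=50):
        # Train for n episodes, then evaluate on n episodes and report the two metrics.
        for _ in range(n_episodes):
            agent.play_episode(env, train=True)
        rewards, completions = [], []
        for _ in range(n_episodes):
            reward, completed = agent.play_episode(env, train=False)
            rewards.append(reward)
            completions.append(completed)
        return sum(rewards) / n_episodes, sum(completions) / n_episodes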

  26. Baselines • Randomly select actions • Bag of words (unigrams and bigrams): input text → BOW vector (0, 1, 0, ..., 0)^T → Q-values

  27. Agent Performance (Home) Random agent performs poorly

  28. Agent Performance (Home) LSTM-DQN has delayed performance jump

  29. Agent Performance (Fantasy) Good representation is essential for successful gameplay

  30. Visualizing Learnt Representations “Kitchen” “Bedroom” “Living room” “Garden” t-SNE visualization of vectors learnt by agent on Home world

  31. Visualizing Learnt Representations “Kitchen” “Bedroom” “Garden” “Living room” “Garden” t-SNE visualization of vectors learnt by agent on Home world

  32. Nearby states: Similar representations
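A minimal sketch of how such a t-SNE plot can be produced with scikit-learn, assuming state_vectors holds the learnt v_s representations collected while the agent plays (random data here as a stand-in):

    import numpy as np
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    state_vectors = np.random.rand(200, 64)      # stand-in for the learnt v_s vectors
    points = TSNE(n_components=2, perplexity=30).fit_transform(state_vectors)
    plt.scatter(points[:, 0], points[:, 1], s=10)
    plt.show()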

  33. Transfer Learning (Home): play on a world with the same vocabulary but a different physical configuration

  34. Conclusions ‣ Addressed the task of end-to-end learning of control policies for textual games. ‣ Learning good representations for text is essential for gameplay. Code and game framework are available at: http://people.csail.mit.edu/karthikn/mud-play/
