Language Understanding for Text-based Games Using Deep Reinforcement Learning Karthik Narasimhan, Tejas Kulkarni, Regina Barzilay MIT
Text-based games
(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.
>> go east
(State 2: Ruined gatehouse) The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out to a small open area surrounded by the remains of the castle. …
MUDs: predecessors to modern graphical games
Why are they challenging?
(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.
Prior work (Branavan et al., 2011) assumed a symbolic state such as {Location: Bridge 1, Wind level: 3, Time: 8pm}. Here, no symbolic representation is available, only the text.
Can a computer understand language well enough to play these games? Understanding ≈ Actionable intelligence
Can a computer understand language well enough to play these games? Inspiration: playing graphical games directly from raw pixels (DeepMind)
Our Approach: Reinforcement Learning, utilizing in-game feedback to: ✦ Learn control policies for gameplay. ✦ Learn good representations for text descriptions of the game state.
Traditional RL framework: states s_1, s_2, s_3, ..., s_t; actions a_1, a_2, a_3, ...; reward.
The state is symbolic, e.g. s = {Location: Bridge 1, Wind level: 3, Time: 8pm}.
Q(s, a): the Q-value is the agent’s notion of discounted future reward.
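For reference, the Q-value the slide alludes to can be written as the expected discounted return. This is the standard textbook definition, not spelled out on the slide; γ is the discount factor and π the policy being followed:

```latex
% Expected discounted future reward from taking action a in state s,
% then following policy \pi; \gamma is the discount factor.
Q^{\pi}(s, a) = \mathbb{E}\left[\, \sum_{t \ge 0} \gamma^{t} r_{t} \;\middle|\; s_{0} = s,\ a_{0} = a,\ \pi \right]
```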
Text-based games: states s_1, s_2, s_3, ..., s_t; actions a_1, a_2, a_3, ...; reward.
Instead of a symbolic state s = {Location: Bridge 1, Wind level: 3, Time: 8pm}, the agent only sees a text description:
(State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ...
Text-based games: BOW representation. States s_1, s_2, s_3, ..., s_t; actions a_1, a_2, a_3, ...; reward.
The text description ((State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ...) is mapped to a sparse count vector s = [0, 1, 0, ..., 0].
Bag of words?
Pipeline: input text → bag-of-words vector [0, 1, 0, ..., 0] → Q (control policy). Can we do better?
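A minimal sketch of the bag-of-words encoding shown above; the toy vocabulary and the helper `bow_vector` are illustrative, not part of the released code:

```python
# Minimal bag-of-words encoding of a state description (illustrative sketch).
vocab = ["bridge", "east", "gatehouse", "wind", "ground"]  # toy vocabulary
word_to_idx = {w: i for i, w in enumerate(vocab)}

def bow_vector(description):
    """Map a text description to a fixed-size bag-of-words count vector."""
    vec = [0] * len(vocab)
    for word in description.lower().split():
        if word in word_to_idx:
            vec[word_to_idx[word]] += 1
    return vec

print(bow_vector("You are standing very close to the bridge"))  # [1, 0, 0, 0, 0]
```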
Model: input text → recurrent NN maps the text to a vector representation v → Q-values for all commands.
Model: input text → recurrent NN maps the text to a vector representation v → NN for control policy → Q-values for all commands.
LSTM-DQN
Representation generator φ_R: words w_1, w_2, w_3, ..., w_n → LSTM → mean pooling → state vector v_s.
Action-object scorer φ_A: v_s → Linear → ReLU → two Linear heads producing Q(s, a) and Q(s, o).
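A rough PyTorch sketch of this architecture; the class name, layer sizes, and the use of PyTorch are assumptions for illustration and need not match the authors’ implementation:

```python
import torch
import torch.nn as nn

class LSTMDQN(nn.Module):
    """Sketch of the LSTM-DQN: an LSTM representation generator (phi_R)
    followed by an action-object scorer (phi_A) with two linear heads."""
    def __init__(self, vocab_size, embed_dim=20, hidden_dim=50,
                 num_actions=5, num_objects=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.shared = nn.Linear(hidden_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, num_actions)  # Q(s, a)
        self.object_head = nn.Linear(hidden_dim, num_objects)  # Q(s, o)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) tensor of token indices
        out, _ = self.lstm(self.embed(word_ids))
        v_s = out.mean(dim=1)                      # mean pooling over time steps
        h = torch.relu(self.shared(v_s))           # Linear + ReLU
        return self.action_head(h), self.object_head(h)
```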
Algorithm (1): Feed the current state description ((State 1: The old bridge) You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.) through the network Q to obtain the Q-values Q(s, a).
Algorithm (2): In State 1 (The old bridge), take action a* using ε-greedy exploration.
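A minimal sketch of the ε-greedy selection step, assuming the Q-values are available as a plain list; the function name and the default epsilon are illustrative:

```python
import random

def epsilon_greedy(q_values, epsilon=0.2):
    """Pick a random action with probability epsilon, otherwise the argmax
    of the Q-values (illustrative sketch of the exploration step)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```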
Algorithm (3): Taking action a* in State 1 (The old bridge) leads to State 2 (Ruined gatehouse: The old gatehouse is near collapse. Part of its northern wall has already fallen down ... East of the gatehouse leads out …) and yields a reward.
Algorithm (4): Store the transition (State 1, a, reward, State 2) in experience memory, then sample transitions from the memory for parameter updates.
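A minimal sketch of the experience replay memory described here; the class name, capacity, and batch size are illustrative choices:

```python
import random
from collections import deque

class ReplayMemory:
    """Illustrative experience replay buffer: store (s, a, r, s') transitions
    and sample random minibatches for parameter updates."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```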
Parameter update. Using a sampled transition (State 1: The old bridge, a*, reward, State 2: Ruined gatehouse), take a gradient step:

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{\hat{s},\hat{a}}\!\left[ 2\left( y_i - Q(\hat{s},\hat{a};\theta_i) \right) \nabla_{\theta_i} Q(\hat{s},\hat{a};\theta_i) \right]$$

where

$$y_i = \mathbb{E}_{\hat{s},\hat{a}}\!\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \;\middle|\; \hat{s},\hat{a} \right]$$
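A sketch of how the target y_i and the squared TD error could be computed for a minibatch, mirroring the equation above. It assumes the LSTMDQN sketch from earlier, uses only the action head for brevity (the full model also scores objects), and represents the previous-iteration parameters θ_{i−1} with a frozen copy of the network; all names are illustrative:

```python
import torch

def dqn_loss(model, target_model, batch, gamma):
    """Compute mean squared TD error for a minibatch of transitions.
    batch = (states, actions, rewards, next_states) as pre-batched tensors."""
    states, actions, rewards, next_states = batch
    # Q(s, a; theta_i) for the actions actually taken
    q_sa = model(states)[0].gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # target y_i = r + gamma * max_a' Q(s', a'; theta_{i-1})
        next_q = target_model(next_states)[0].max(dim=1).values
    y = rewards + gamma * next_q
    # Squared error whose gradient matches the update rule above
    return ((y - q_sa) ** 2).mean()
```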
Game Environment: Evennia, a highly extensible Python framework for MUD games. Two worlds: ✦ a small game to demonstrate the task and analyze learnt representations. ✦ a pre-existing Fantasy world.
Home World
• Number of different quests: 16
• Vocabulary: 84 words
• Words per description (avg.): 10.5
• Multiple descriptions per room/object.
Home World This room has two sofas, chairs and a chandelier. You are not sleepy now but you are hungry now. > go east
Home World This area has plants, grass and rabbits. You are not sleepy now but you are hungry now. > go south
Home World Reward: +1 You have arrived in the kitchen. You can find food and drinks here. You are not sleepy now but you are hungry now. > eat apple
Fantasy World
• Number of rooms: > 56
• Vocabulary: 1340 words
• Avg. no. of words/description: 65.21
• Max descriptions per room: 100
• Considerably more complex
• Varying descriptions per state, created by game developers
Example (State 1: The old bridge): You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground ... The bridge sways in the wind.
Evaluation. Two metrics:
✦ Quest completion
✦ Cumulative reward per episode (positive rewards for quest fulfillment, negative rewards for bad actions)
Epoch: training for n episodes followed by evaluation on n episodes.
Baselines
• Randomly select actions
• Bag of words: unigrams and bigrams (input text → BOW vector → Q-values)
Agent Performance (Home) Random agent performs poorly
Agent Performance (Home) LSTM-DQN has delayed performance jump
Agent Performance (Fantasy) Good representation is essential for successful gameplay
Visualizing Learnt Representations: “Kitchen”, “Bedroom”, “Living room”, “Garden”. t-SNE visualization of vectors learnt by the agent on the Home world.
Nearby states: Similar representations
Transfer Learning (Home): play on a world with the same vocabulary but a different physical configuration.
Conclusions ‣ Addressed the task of end-to-end learning of control policies for textual games. ‣ Learning good representations for text is essential for gameplay. Code and game framework are available at: http://people.csail.mit.edu/karthikn/mud-play/