Modeling Task Effects in Human Reading with Neural Attention

Michael Hahn, Stanford University, mhahn2@stanford.edu
Frank Keller, University of Edinburgh, keller@inf.ed.ac.uk

CUNY Conference 2017
Introduction
  ◮ Eye Movements in Reading
  ◮ Computational Models
The NEAT Reading Model
  ◮ Tradeoff Hypothesis
  ◮ Architecture
  ◮ Implementation
  ◮ Evaluation
Task Effects in Reading
  ◮ Question Answering
  ◮ Experimental Results
  ◮ Task Differences in NEAT
  ◮ Evaluation
Eye Movements in Reading

"The two young sea-lions took not the slightest interest in our arrival. They were playing on the jetty, rolling over and tumbling into the water together, entirely ignoring the human beings edging awkwardly round" (adapted from the Dundee corpus [Kennedy and Pynte, 2005])

◮ Fixations: the eyes remain static on a word
◮ Saccades take 20–40 ms; no information is obtained from the text during them
◮ Fixation times vary from ≈ 100 ms to ≈ 300 ms
◮ ≈ 40% of words are skipped
Computational Models

1. Models of saccade generation in cognitive psychology:
   ◮ EZ-Reader [Reichle et al., 1998, 2003, 2009]
   ◮ SWIFT [Engbert et al., 2002, 2005]
2. Machine learning models trained on eye-tracking data [Nilsson and Nivre, 2009, 2010, Hara et al., 2012, Matthies and Søgaard, 2013]

These models:
◮ require selection of relevant eye-movement features, and
◮ estimate parameters from eye-tracking corpora

3. Bayesian inference [Bicknell and Levy, 2010]:
   ◮ maximizes reading speed while reliably identifying the text
   ◮ replicates predictability and frequency effects
   ◮ not evaluated on wide-coverage reading data
   ◮ assumes the fixed task of word identification
Computational Models: Surprisal

Surprisal measures the predictability of word w_i in its context w_1 w_2 ... w_{i-1}:

$\text{Surprisal}(w_i) = -\log P(w_i \mid w_{1 \ldots i-1})$  (1)

◮ predicts word-by-word reading times [Hale, 2001, McDonald and Shillcock, 2003a,b, Levy, 2008]
◮ designed as a model of processing effort, hence cannot explain:
   ◮ regressions
   ◮ re-fixations
   ◮ spillover
   ◮ skipping (≈ 40% of words are skipped)
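To make equation (1) concrete, here is a minimal Python sketch that turns a language model's conditional probability into a surprisal value. The probabilities are made up for illustration, and the choice of base 2 (giving bits) is a convention; the slide leaves the base unspecified:

```python
import math

def surprisal(prob_next_word: float) -> float:
    # Surprisal of a word given its context: -log2 P(w_i | w_1 ... w_{i-1}),
    # measured in bits. The lower the probability, the higher the surprisal.
    return -math.log2(prob_next_word)

# Hypothetical conditional probabilities from some language model:
print(surprisal(0.25))  # predictable word: 2.0 bits
print(surprisal(0.01))  # unexpected word: ~6.64 bits
```

Under surprisal theory, the second word should take longer to read than the first.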
Tradeoff Hypothesis

Goal: Build an unsupervised model that accounts for reading times and skipping.

Hypothesis: Human reading optimizes a tradeoff between:
◮ Precision of language understanding: perform a language-related task as well as possible
◮ Economy of attention: fixate as few words as possible
Tradeoff Hypothesis

We assume that the default task in reading is to memorize the text, i.e., to reconstruct the input as accurately as possible.

Approach: NEAT (NEural Attention Tradeoff)
1. Develop a generic reading architecture integrating:
   ◮ neural language modeling
   ◮ an attention mechanism
2. Train end-to-end to optimize the tradeoff between precision and economy
3. Evaluate on a human eye-tracking corpus
Architecture

[Figure: the NEAT architecture processing words w_1, w_2, w_3. At each word, the attention module A decides whether the reader sees the word itself (here w_1 and w_3) or a SKIP token (here w_2); the reader states R_0 ... R_3 each feed a prediction module P, and a Decoder reconstructs w_1 w_2 w_3 from the reader's output.]

◮ Attention module A shows each word to the reader R or skips it
◮ R receives a special SKIPPED representation when a word is skipped
◮ The Decoder tries to reconstruct the full text
◮ Reader, Attention, and Decoder are implemented as neural networks (LSTMs)
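As a rough illustration of how these components fit together, the following PyTorch sketch implements the word-by-word fixate-or-skip loop. The module names, dimensions, and sigmoid gating are assumptions for this sketch, not the authors' code; the prediction module P and the Decoder are omitted for brevity:

```python
import torch
import torch.nn as nn

class NEATSketch(nn.Module):
    """Sketch of the NEAT reading loop: an attention module A decides,
    word by word, whether the reader LSTM R sees the true word or a
    learned SKIPPED vector. (Prediction module P and Decoder omitted.)"""

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.skipped = nn.Parameter(torch.zeros(embed_dim))    # SKIPPED representation
        self.reader = nn.LSTMCell(embed_dim, hidden_dim)       # reader R
        self.attention = nn.Linear(hidden_dim + embed_dim, 1)  # attention A

    def forward(self, word_ids: torch.Tensor):
        h = torch.zeros(1, self.reader.hidden_size)
        c = torch.zeros(1, self.reader.hidden_size)
        fixated = []
        for w in word_ids:                       # one decision per word
            x = self.embed(w).unsqueeze(0)       # (1, embed_dim)
            # A decides based on the current word and the reader's state
            p_fix = torch.sigmoid(self.attention(torch.cat([h, x], dim=1)))
            fixate = torch.bernoulli(p_fix)      # stochastic fixate/skip decision
            # R sees the word if fixated, the SKIPPED vector otherwise
            inp = torch.where(fixate.bool(), x, self.skipped.unsqueeze(0))
            h, c = self.reader(inp, (h, c))
            fixated.append(fixate.squeeze())
        return torch.stack(fixated)              # omega: fixation indicator per word
```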
Implementing the Tradeoff Hypothesis

Training objective: solve prediction and reconstruction with minimal attention:

$\arg\min_\theta \, \mathbb{E}_{w,\omega} \left[ L(\omega \mid w, \theta) + \alpha \cdot \|\omega\|_{\ell_1} \right]$

where $L(\omega \mid w, \theta)$ is the loss for prediction and reconstruction, and $\|\omega\|_{\ell_1}$ is the number of fixated words.

◮ neural network components trained on newstext (≈ 200 million words)
◮ training is unsupervised: no lexicon, grammar, eye-tracking data, etc. required
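A minimal sketch of how this objective might be computed for one text, assuming binary fixation decisions omega and a scalar tradeoff weight alpha (the names and the default value are mine; in the full model the expectation over stochastic fixation decisions requires a score-function/REINFORCE-style gradient estimator, which is omitted here):

```python
import torch

def neat_objective(prediction_loss: torch.Tensor,
                   reconstruction_loss: torch.Tensor,
                   omega: torch.Tensor,
                   alpha: float = 0.1) -> torch.Tensor:
    # L(omega | w, theta): loss for predicting and reconstructing
    # the text given the fixation sequence omega.
    L = prediction_loss + reconstruction_loss
    # ||omega||_1: for binary decisions this is simply the number of
    # fixated words, so alpha trades precision against economy.
    return L + alpha * omega.abs().sum()
```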
Evaluation

Setup:
◮ English section of the Dundee corpus [Kennedy and Pynte, 2005]
◮ 20 texts from The Independent
◮ eye-movement data from ten readers
◮ 360,000 words
◮ fixation rate: 61.3%

Results:
◮ NEAT predicts human fixations with an accuracy of 63.7% (random baseline: 52.6%; supervised models: 69.9%)
◮ surprisal derived from NEAT predicts reading times
◮ NEAT predicts:
   ◮ effects of frequency, length, and predictability
   ◮ correlations between successive fixations
   ◮ differential skipping rates across part-of-speech categories
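For reference, per-word fixation accuracy of this kind reduces to agreement counting between the model's and a reader's fixate/skip decisions; a sketch with hypothetical 0/1 sequences:

```python
def fixation_accuracy(model_fix, human_fix):
    # Fraction of words where the model's fixate/skip decision (1/0)
    # matches the human reader's.
    assert len(model_fix) == len(human_fix)
    return sum(m == h for m, h in zip(model_fix, human_fix)) / len(model_fix)

# Hypothetical example: model vs. one reader over six words
print(fixation_accuracy([1, 0, 1, 1, 0, 1], [1, 1, 1, 0, 0, 1]))  # ~0.667
```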
Fixation Rates by POS Categories

[Figure: bar chart of fixation rates (%, y-axis 20–80) by part-of-speech category (ADJ, ADP, ADV, CONJ, DET, NOUN, NUM, PRON, PRT, VERB, X), comparing human readers with NEAT.]