From heuristic to optimal models in naturalistic visual search
Angela Radulescu*, Bas van Opheusden*, Fred Callaway, Thomas Griffiths & James Hillis
Bridging AI and Cognitive Science workshop, ICLR, April 24th, 2020
An everyday problem… …where are the keys?
Resource allocation in visual search
• Main contribution: frame visual search as a reinforcement learning problem
‣ Fixations as information-gathering actions
‣ Do people employ optimal strategies?
• Challenges:
‣ Representing the state space: the world is high-dimensional; what features does the visual system have access to?
‣ Finding the optimal policy: the reward function is sparse; how to balance the cost of sampling against performance?
Naturalistic visual search in VR
• VR + gaze tracking, fixed camera location
• Cluttered room, 1 target among many distractors
• “Find the target within 8 seconds”
• 6 different rooms × 5 locations per room × 10 trials per location = 300 unique scenes
• Some trials assisted
[Figure: example trial, labeled from Start to End]
Meta-level Markov Decision Process (Callaway & Griffiths, 2018)
• Latent: {F_true, i_true}
‣ Scene features and target identity, unknown to the agent
• States: {F, J, f_target}
‣ Mean and precision of each feature for each object
• Actions: {o, ⊥}
‣ Fixate on object o, or terminate
• Transitions: measure X ~ N(F_true, J_meas)
‣ J_meas decreases with distance from o
‣ Integrate X into F and J with Bayesian cue combination
• Rewards: if fixating o, then R = -c; if ⊥, then R = 1 if argmax P(target | F, J) = i_true and 0 otherwise
‣ Reward the agent when the most probable target given the state matches the true target
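To make the transition and reward structure concrete, below is a minimal NumPy sketch of one step of these dynamics. It assumes a single Gaussian feature per object, a toy 1-D layout of objects, and a hypothetical precision falloff with distance from the fixated object; all function names and constants are illustrative, not the authors' implementation.

```python
import numpy as np

def fixate(F, J, F_true, fixated, positions, rng):
    """Meta-MDP transition for a fixation: sample a noisy measurement of every
    object's feature and fold it into the belief (F, J) by precision-weighted
    (Bayesian) cue combination. Measurement precision J_meas falls off with
    distance from the fixated object."""
    J_meas = 4.0 / (1.0 + (positions - positions[fixated]) ** 2)
    X = F_true + rng.standard_normal(F_true.shape) / np.sqrt(J_meas)  # X ~ N(F_true, 1/J_meas)
    J_new = J + J_meas
    F_new = (J * F + J_meas * X) / J_new
    return F_new, J_new

def posterior_over_target(F, J, f_target):
    """P(object i is the target | belief): Gaussian match between each object's
    believed feature and the known target feature."""
    log_lik = 0.5 * np.log(J) - 0.5 * J * (F - f_target) ** 2
    p = np.exp(log_lik - log_lik.max())
    return p / p.sum()

def terminal_reward(F, J, f_target, i_true):
    """R = 1 on terminating (⊥) if the most probable target under the belief
    is the true target, 0 otherwise."""
    return float(np.argmax(posterior_over_target(F, J, f_target)) == i_true)

# Illustrative episode fragment: 8 objects on a line, diffuse prior, one fixation
# (which would earn reward -c), then the posterior the agent would terminate on.
rng = np.random.default_rng(0)
n = 8
F_true, positions = rng.standard_normal(n), np.arange(n, dtype=float)
i_true = 3
F, J = np.zeros(n), np.full(n, 1e-3)
F, J = fixate(F, J, F_true, fixated=i_true, positions=positions, rng=rng)
print(posterior_over_target(F, J, F_true[i_true]))
```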
Challenge I: representing the belief space Challenge II: finding the optimal policy
Which features to include?
[Diagram: objects and target, described by attributes such as color and shape]
(Treisman & Gelade, 1980; Horowitz & Wolfe, 2017)
Which features to include?
• Shape: 3D mesh → D2 distribution → PCA
• Color: 2D texture → CIELAB → PCA
[Figure: similarity structure of objects A and B under the full vs. partial (3 PCs) representations]
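As a rough illustration of this pipeline, the sketch below computes a D2 shape distribution from a mesh and a CIELAB color histogram from a texture, then reduces a set of descriptors with PCA. It assumes the trimesh, scikit-image, and scikit-learn libraries; the sampling counts and bin sizes are illustrative, not the settings used in the slides.

```python
import numpy as np
import trimesh                                    # assumed dependency for mesh handling
from scipy.spatial.distance import pdist
from skimage.color import rgb2lab                 # assumed dependency for CIELAB conversion
from sklearn.decomposition import PCA

def d2_shape_descriptor(mesh_path, n_points=1024, n_bins=64):
    """D2 shape distribution (Osada et al.): histogram of pairwise distances
    between random points sampled on the object's surface."""
    mesh = trimesh.load(mesh_path)                # assumes the file loads as a single mesh
    points, _ = trimesh.sample.sample_surface(mesh, n_points)
    dists = pdist(points)
    hist, _ = np.histogram(dists / dists.max(), bins=n_bins, range=(0, 1), density=True)
    return hist

def color_descriptor(texture_rgb, n_bins=8):
    """CIELAB color histogram of an object's 2D texture (H x W x 3 RGB array)."""
    lab = rgb2lab(texture_rgb)
    hist, _ = np.histogramdd(lab.reshape(-1, 3), bins=n_bins)
    return hist.ravel() / hist.sum()

def reduce_descriptors(descriptors, n_components=3):
    """Project a stack of descriptors onto their first few principal components
    (the 'partial (3 PCs)' representation in the slide)."""
    return PCA(n_components=n_components).fit_transform(np.stack(descriptors))
```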
Shape and color predict gaze
[Figure: gaze on objects]
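The slides do not show the underlying analysis, but one simple way to test such a claim is a softmax choice model in which the probability of fixating each object depends on its shape and color similarity to the target. The sketch below, with synthetic data and illustrative variable names (not the authors' analysis), fits that model by maximum likelihood; positive fitted weights would indicate that the corresponding feature predicts gaze.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(weights, shape_sim, color_sim, fixated):
    """Softmax choice model: P(fixate object i) ∝ exp(w_shape * shape_sim_i + w_color * color_sim_i).
    shape_sim, color_sim: (n_fixations, n_objects) similarity-to-target matrices.
    fixated: (n_fixations,) index of the object fixated at each step."""
    w_shape, w_color = weights
    logits = w_shape * shape_sim + w_color * color_sim
    logits = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(fixated)), fixated].sum()

# Illustrative synthetic data: 200 fixations over 8 candidate objects.
rng = np.random.default_rng(0)
shape_sim = rng.random((200, 8))
color_sim = rng.random((200, 8))
fixated = rng.integers(0, 8, size=200)

fit = minimize(neg_log_likelihood, x0=np.zeros(2), args=(shape_sim, color_sim, fixated))
print("fitted shape/color weights:", fit.x)   # positive weight => that feature predicts gaze
```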
Challenge I: representing the belief space Challenge II: finding the optimal policy
“Ideal observer” model of visual search (Najemnik & Geisler, 2005; Yang, Lengyel & Wolpert, 2017)
[Loop:] calculate/update posterior probabilities → if the maximum exceeds a criterion, STOP; otherwise move eyes to the object most likely to be the target → sample information at the fixated location → repeat
• Can be expressed as a policy in the meta-MDP, but not necessarily optimal
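Expressed over the meta-MDP's belief state, this heuristic reduces to a one-line decision rule per step: terminate once the posterior over targets is confident enough, otherwise fixate its current mode. A minimal sketch, with an arbitrarily chosen criterion value:

```python
import numpy as np

TERMINATE = -1   # stands in for the ⊥ action of the meta-MDP

def ideal_observer_policy(posterior, criterion=0.9):
    """One decision of the heuristic policy: stop once the posterior over
    targets exceeds the criterion, otherwise fixate its current mode."""
    if posterior.max() > criterion:
        return TERMINATE
    return int(np.argmax(posterior))

print(ideal_observer_policy(np.array([0.10, 0.70, 0.20])))   # -> 1 (keep looking)
print(ideal_observer_policy(np.array([0.02, 0.95, 0.03])))   # -> -1 (TERMINATE)
```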
Optimizing meta-level return with deep reinforcement learning
• Proximal Policy Optimization (PPO; Schulman et al., 2017), implemented with tf-agents
• 10 replications, manually tuned hyper-parameters
• Manual tweaking of input representation & initialization
[Architecture diagram: object locations, object features, target features, and the posterior feed into dense layers that output the policy π and value V]
[Learning curve: reward vs. simulated episodes (millions); the NG (Najemnik & Geisler) ideal-observer baseline shown for comparison]
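A minimal sketch of how such an agent might be set up with tf-agents is shown below. The observation layout, network sizes, learning rate, and number of epochs are illustrative guesses rather than the authors' settings, and the meta-MDP environment needed for the collection/training loop is omitted.

```python
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.networks import actor_distribution_network, value_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

n_objects = 8
# Observation: a flat encoding of the belief state (in the slides: object locations,
# object features, target features, and the posterior). The dimensionality is a guess.
observation_spec = tf.TensorSpec(shape=(4 * n_objects,), dtype=tf.float32, name='belief')
# Actions 0..n_objects-1 fixate an object; action n_objects stands for terminate (⊥).
action_spec = tensor_spec.BoundedTensorSpec(shape=(), dtype=tf.int32,
                                            minimum=0, maximum=n_objects)
time_step_spec = ts.time_step_spec(observation_spec)

actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_spec, action_spec, fc_layer_params=(128, 128))
value_net = value_network.ValueNetwork(observation_spec, fc_layer_params=(128, 128))

agent = ppo_agent.PPOAgent(
    time_step_spec, action_spec,
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
    actor_net=actor_net, value_net=value_net,
    num_epochs=10)
agent.initialize()
# Training then alternates between collecting simulated meta-MDP episodes with
# agent.collect_policy (via a driver and replay buffer) and calling agent.train()
# on the collected trajectories.
```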
Does the optimal policy match humans?
[Figures: Model vs. Human example gaze traces and fixated objects, labeled from Start to End]
Which features drive human search?
[Figure: Model vs. Human gaze traces, from Start to End]
Ongoing work
• Alternative schemes for extracting low-dimensional feature representations of objects
‣ Deep convolutional neural network models of the human ventral visual stream (Yamins et al., 2014; Fan et al., 2019)
‣ MeshNet model of 3D shape representation (Feng et al., 2018)
• Investigating the learned policy
‣ Is it optimal?
Thank you!