From heuristic to optimal models in naturalistic visual search




  1. From heuristic to optimal models in naturalistic visual search
  Angela Radulescu 1,2*, Bas van Opheusden 1,2*, Fred Callaway 2, Thomas Griffiths 2 & James Hillis 1
  Bridging AI and Cognitive Science workshop, ICLR, April 24th, 2020

  2. An everyday problem… where are the keys?

  3. Resource allocation in visual search
  • Main contribution: frame visual search as a reinforcement learning problem
    ‣ Fixations as information-gathering actions
    ‣ Do people employ optimal strategies?

  4. Resource allocation in visual search
  • Main contribution: frame visual search as a reinforcement learning problem
    ‣ Fixations as information-gathering actions
    ‣ Do people employ optimal strategies?
  • Challenges:
    ‣ Representing the state space: the world is high-dimensional; what features does the visual system have access to?
    ‣ Finding the optimal policy: the reward function is sparse; how to balance the cost of sampling against performance?

  5. Naturalistic visual search in VR
  • VR + gaze tracking, fixed camera location
  • Cluttered room, 1 target among many distractors
  • “Find the target within 8 seconds”
  • 6 different rooms x 5 locations per room x 10 trials per location = 300 unique scenes
  • Some trials assisted

  8. [Figure: example gaze path, from Start to End]

  9. Meta-level Markov Decision Process (Callaway & Griffiths, 2018)

  10. Meta-level Markov Decision Process
  • Latent: {F_true, i_true}
    ‣ Scene features and target identity, unknown to the agent
  • States: {F, J, f_target}
    ‣ Mean and precision of each feature for each object
  • Actions: {o, ⊥}
    ‣ Fixate on object o, or terminate
  • Transitions: measure X ~ N(F_true, J_meas)
    ‣ J_meas decreases with distance from o
    ‣ Integrate X into F and J with Bayesian cue combination
  • Rewards: R = -c if fixating o; if ⊥, R = 1 if argmax P(target | F, J) = i_true and 0 otherwise
    ‣ Reward the agent when the most probable target given the state matches the true target
  Callaway & Griffiths, 2018
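
A minimal sketch of one meta-level transition under these definitions, assuming independent Gaussian features per object. The names, the distance-based precision falloff, and the cost value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

C_FIXATION = 0.01  # assumed per-fixation cost c (illustrative value)

def fixate(F, J, F_true, locations, o, j_peak=4.0, falloff=1.0):
    """One meta-level transition for the fixation action.

    F, J      : (n_objects, n_features) belief mean and precision
    F_true    : (n_objects, n_features) true scene features (latent)
    locations : (n_objects, d) object positions; o : index of the fixated object
    """
    # Measurement precision J_meas decays with distance from the fixated object
    dist = np.linalg.norm(locations - locations[o], axis=1)
    J_meas = (j_peak / (1.0 + falloff * dist))[:, None]
    # Noisy measurement X ~ N(F_true, 1 / J_meas)
    X = F_true + np.random.randn(*F_true.shape) / np.sqrt(J_meas)
    # Bayesian cue combination: precision-weighted average of belief and measurement
    F_new = (J * F + J_meas * X) / (J + J_meas)
    J_new = J + J_meas
    return F_new, J_new, -C_FIXATION

def target_posterior(F, J, f_target):
    """P(target = i | F, J): consistency of each object's believed features with the target's."""
    loglik = 0.5 * np.sum(np.log(J) - J * (F - f_target) ** 2, axis=1)
    p = np.exp(loglik - loglik.max())
    return p / p.sum()

def terminate(F, J, f_target, i_true):
    """Terminal reward: 1 if the most probable target under the belief is the true target."""
    return 1.0 if np.argmax(target_posterior(F, J, f_target)) == i_true else 0.0
```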

  15. Challenge I: representing the belief space
      Challenge II: finding the optimal policy

  16. Which features to include?

  17. Which features to include?
  [Diagram: objects and the target, described by object attributes such as color and shape]
  Treisman & Gelade, 1980; Horowitz & Wolfe, 2017

  19. Which features to include?
  • Shape: 3D mesh → D2 distribution

  20. Which features to include?
  • Shape: 3D mesh → D2 distribution
  • Color: 2D texture → CIELAB

  21. Which features to include?
  • Shape: 3D mesh → D2 distribution → PCA
  • Color: 2D texture → CIELAB → PCA

  22. Which features to include?
  • Shape: 3D mesh → D2 distribution → PCA
  • Color: 2D texture → CIELAB → PCA
  • Similarity structure: full vs. partial (3 PCs) representations, for both shape and color
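
A sketch of the kind of feature pipeline outlined above, assuming shape descriptors are D2 distance histograms computed from mesh vertices and color descriptors are CIELAB histograms of the object texture, each reduced with PCA. The library choices (NumPy, scikit-image, scikit-learn) and all parameter values are assumptions, not the authors' code:

```python
import numpy as np
from skimage.color import rgb2lab          # RGB -> CIELAB conversion
from sklearn.decomposition import PCA      # reduction to a few principal components

def d2_histogram(vertices, n_pairs=10_000, n_bins=32, seed=0):
    """D2 shape descriptor: histogram of distances between random point pairs.
    For brevity we sample mesh vertices; the original D2 samples the surface."""
    rng = np.random.default_rng(seed)
    i = rng.integers(len(vertices), size=n_pairs)
    j = rng.integers(len(vertices), size=n_pairs)
    d = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    hist, _ = np.histogram(d, bins=n_bins, density=True)
    return hist

def lab_histogram(texture_rgb, n_bins=8):
    """Color descriptor: joint histogram of texture pixels in CIELAB space
    (texture_rgb is an (H, W, 3) float array with values in [0, 1])."""
    lab = rgb2lab(texture_rgb).reshape(-1, 3)
    hist, _ = np.histogramdd(lab, bins=n_bins,
                             range=[(0, 100), (-128, 127), (-128, 127)],
                             density=True)
    return hist.ravel()

def reduce_features(descriptors, n_components=3):
    """Partial representation: keep the first few PCs of the stacked per-object descriptors."""
    return PCA(n_components=n_components).fit_transform(np.asarray(descriptors))
```

The “full vs. partial (3 PCs)” comparison on the slide can then be read as computing object-similarity structure from the raw descriptors versus from their first three principal components.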

  23. Shape and color predict gaze
  [Plots: gaze on objects]

  26. Challenge I: representing the belief space
      Challenge II: finding the optimal policy

  27. “Ideal observer” model of visual search
  • Loop: calculate/update posterior probabilities → if the maximum exceeds a criterion, STOP → otherwise move eyes to the object most likely to be the target → sample information at the fixated location → repeat
  • Can be expressed as a policy in the meta-MDP, but not necessarily optimal
  Najemnik & Geisler, 2005; Yang, Lengyel & Wolpert, 2017
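
A minimal sketch of that loop's decision rule, written against the meta-MDP belief state (F, J); the posterior form mirrors the earlier sketch and the stopping criterion value is an illustrative assumption:

```python
import numpy as np

def ideal_observer_step(F, J, f_target, criterion=0.95):
    """One step of the heuristic policy: stop if the posterior over targets is
    peaked enough, otherwise fixate the object currently most likely to be the target."""
    # Posterior over which object is the target (same form as in the meta-MDP sketch above)
    loglik = 0.5 * np.sum(np.log(J) - J * (F - f_target) ** 2, axis=1)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    if post.max() >= criterion:
        return None                # terminate (the ⊥ action)
    return int(np.argmax(post))    # fixate the MAP candidate
```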

  28. Optimizing meta-level return with deep reinforcement learning
  • Proximal Policy Optimization (PPO; Schulman et al., 2017), implemented with tf-agents
  • 10 replications, manually tuned hyper-parameters
  • Manual tweaking of input representation & initialization
  [Network diagram: inputs (object locations, object features, target features, posterior) → dense layers → policy π and value V]
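
The slide specifies PPO implemented with tf-agents; below is a rough sketch of how such an actor-critic could be wired up. The observation layout, layer sizes, optimizer, and hyper-parameters are assumptions rather than the authors' configuration, and exact constructor arguments may vary across tf-agents versions:

```python
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.networks import actor_distribution_network, value_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

N_OBJECTS, OBS_DIM = 20, 128  # assumed sizes for the flattened belief-state observation

# Observation: concatenated object locations, object features, target features, posterior
obs_spec = tensor_spec.TensorSpec([OBS_DIM], tf.float32, name="belief")
# Actions 0..N_OBJECTS-1 fixate an object; action N_OBJECTS terminates (⊥)
action_spec = tensor_spec.BoundedTensorSpec([], tf.int32, minimum=0, maximum=N_OBJECTS)
time_step_spec = ts.time_step_spec(obs_spec)

# Dense policy and value heads on the shared observation (cf. the diagram on this slide)
actor_net = actor_distribution_network.ActorDistributionNetwork(
    obs_spec, action_spec, fc_layer_params=(64, 64))
value_net = value_network.ValueNetwork(obs_spec, fc_layer_params=(64, 64))

agent = ppo_agent.PPOAgent(
    time_step_spec, action_spec,
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
    actor_net=actor_net,
    value_net=value_net,
    num_epochs=10)
agent.initialize()
```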

  29. Optimizing meta-level return with deep reinforcement learning
  • Proximal Policy Optimization (PPO; Schulman et al., 2017), implemented with tf-agents
  • 10 replications, manually tuned hyper-parameters
  • Manual tweaking of input representation & initialization
  [Learning curve: reward vs. simulated episodes (millions)]

  30. Optimizing meta-level return with deep reinforcement learning
  [Same bullets and learning curve as slide 29, with an NG baseline added for comparison]

  31. Does optimal policy match humans?
  [Gaze paths from Start to End: Model vs. Human]

  32. Does optimal policy match humans?
  [Gaze paths from Start to End, with fixated objects marked: Model vs. Human]

  33. Which features drive human search?
  [Gaze paths from Start to End: Model vs. Human]

  34. Ongoing work
  • Alternative schemes for extracting low-dimensional feature representations of objects
    ‣ Deep convolutional neural network models of the human ventral visual stream (Yamins et al., 2014; Fan et al., 2019)
    ‣ MeshNet model of 3D shape representation (Feng et al., 2018)

  35. Ongoing work
  • Alternative schemes for extracting low-dimensional feature representations of objects
    ‣ Deep convolutional neural network models of the human ventral visual stream (Yamins et al., 2014; Fan et al., 2019)
    ‣ MeshNet model of 3D shape representation (Feng et al., 2018)
  • Investigating the learned policy
    ‣ Is it optimal?

  36. Thank you!
