Action and Adaptation: Lessons from Neurobiology and Challenges for Robot Cognitive Architectures
INSTITUTO DE SISTEMAS E ROBÓTICA
Rodrigo Ventura
Institute for Systems and Robotics
Instituto Superior Técnico
Lisbon, PORTUGAL
yoda@isr.ist.utl.pt

Motivation
• Design constraints for robot cognitive architectures
  - embodied agents
  - situated in a physical environment
  - receive raw sensory input
  - actions constrained by their physical structure
[Figure: iCub humanoid robot (robotcub.org)]

Neurobiology
[Figure: brain illustration; label: cerebellum]

Model of cerebellum function
[Figure: block diagram (Kawato 1999). Blocks: inverse model, controller, world, forward model. Signals: desired trajectory, motor command, sensory feedback. Labelled paths: open loop (inverse model), fast feedback loop (forward model), slow feedback loop (sensory feedback).]

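The three paths in the diagram can be compared in a small simulation. This is a minimal sketch, not Kawato's model: the first-order plant, its coefficients `A` and `B`, the gain `KP` and the sensory delay `DELAY` are all hypothetical, and the forward model is assumed to be a perfect copy of the plant.

```python
from collections import deque

A, B = 0.9, 0.5   # hypothetical plant: x[t+1] = A*x[t] + B*u[t]
DELAY = 5         # assumed sensory delay, in time steps
KP = 0.8          # assumed feedback gain of the controller


def plant(x, u):
    """The 'world' block."""
    return A * x + B * u


def forward_model(x_hat, u):
    """Predicts the next state from a copy of the motor command.
    Assumed here to be a perfect copy of the plant."""
    return A * x_hat + B * u


def inverse_model(x_desired, x_next_desired):
    """Feedforward command that moves a perfect plant along the desired trajectory."""
    return (x_next_desired - A * x_desired) / B


def mean_tracking_error(use_inverse, use_forward, steps=60, target=1.0):
    x = x_hat = 0.0
    delayed = deque([0.0] * DELAY, maxlen=DELAY)  # slow loop: state from DELAY steps ago
    total = 0.0
    for _ in range(steps):
        # Fast loop feeds back the prediction, slow loop feeds back delayed sensing.
        feedback = x_hat if use_forward else delayed[0]
        u = KP * (target - feedback)
        if use_inverse:
            u += inverse_model(target, target)   # open-loop (feedforward) term
        delayed.append(x)
        x, x_hat = plant(x, u), forward_model(x_hat, u)
        total += abs(target - x)
    return total / steps


full = mean_tracking_error(use_inverse=True, use_forward=True)
no_forward = mean_tracking_error(use_inverse=True, use_forward=False)
no_inverse = mean_tracking_error(use_inverse=False, use_forward=True)
print(full, no_forward, no_inverse)
```

With these assumed numbers, relying only on the delayed sensory loop makes the state oscillate around the target, and dropping the inverse model leaves a steady-state error. The combination of feedforward command and predicted feedback tracks best.
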
Motor skill development
[Figure: block diagram with inverse model, controller, world and forward model; signals: desired trajectory, motor command, sensory feedback]
1. Learn the controller: feedback, sensorimotor loop
2. Learn the forward model: prediction
   • similar function to the Smith regulator, 1958
3. Learn the inverse model: feedforward, open loop

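Steps 2 and 3 can be illustrated with a toy linear plant. This is a sketch under assumptions that are not in the slides: the plant coefficients are hypothetical, the forward model is fitted with a delta rule (LMS) on randomly sampled states and commands, and the inverse model is obtained by inverting the learned forward model algebraically rather than learned separately.

```python
import random

A_TRUE, B_TRUE = 0.9, 0.5   # hypothetical plant, unknown to the learner


def plant(x, u):
    return A_TRUE * x + B_TRUE * u


def learn_forward_model(samples=2000, lr=0.1, seed=0):
    """Delta-rule fit of x_next = a_hat*x + b_hat*u from random exploration."""
    rng = random.Random(seed)
    a_hat = b_hat = 0.0
    for _ in range(samples):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        error = plant(x, u) - (a_hat * x + b_hat * u)   # prediction error
        a_hat += lr * error * x
        b_hat += lr * error * u
    return a_hat, b_hat


def make_inverse_model(a_hat, b_hat):
    """Invert the learned forward model to obtain a feedforward command."""
    def inverse(x, x_next_desired):
        return (x_next_desired - a_hat * x) / b_hat
    return inverse


a_hat, b_hat = learn_forward_model()
inverse = make_inverse_model(a_hat, b_hat)
u = inverse(0.2, 0.7)            # command intended to move the state from 0.2 to 0.7
print(a_hat, b_hat, plant(0.2, u))
```

Once the prediction error has driven `a_hat` and `b_hat` close to the plant's coefficients, the inverted model produces commands that reach the desired next state without waiting for feedback.
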
Multiple models in the cerebellum
[Figure: block diagram with inverse model, controller, world and forward model; signals: desired trajectory, motor command, sensory feedback (Wolpert et al. 1998)]
• Context → responsibility estimation, which sets each model's weight and plasticity

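One way to read responsibility estimation is as a softmax over how well each forward model predicted the last observation. The sketch below makes that assumption; the Gaussian error model, `sigma`, the two mass contexts and the learning-rate gating are illustrative choices, not details taken from Wolpert et al.

```python
import math


def responsibilities(prediction_errors, sigma=0.2):
    """Softmax over Gaussian log-likelihoods of each model's prediction error."""
    log_lik = [-(e * e) / (2 * sigma * sigma) for e in prediction_errors]
    top = max(log_lik)                        # subtract the max for numerical stability
    unnorm = [math.exp(l - top) for l in log_lik]
    total = sum(unnorm)
    return [v / total for v in unnorm]


# Two hypothetical contexts: moving a light (mass 1) or a heavy (mass 4) object.
masses = [1.0, 4.0]
forward_models = [lambda x, u, m=m: x + u / m for m in masses]
inverse_models = [lambda x, x_next, m=m: m * (x_next - x) for m in masses]

# Observe one step in the heavy context.
x, u = 0.0, 1.0
x_next_observed = x + u / 4.0
errors = [x_next_observed - f(x, u) for f in forward_models]
weights = responsibilities(errors)

# Weight: blend the inverse models' commands by responsibility.
command = sum(w * inv(x_next_observed, 1.0) for w, inv in zip(weights, inverse_models))
# Plasticity: scale each model's learning rate by the same responsibility.
base_lr = 0.1
learning_rates = [base_lr * w for w in weights]
print(weights, command, learning_rates)
```

The model whose prediction matched the observation takes almost all of the responsibility, so it dominates the blended command and is the one that keeps learning.
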
Decision making: BG-DA
[Figure (Frank and Claus 2006): the premotor cortex (Hebbian learning) considers an action option; in the basal ganglia (BG), the go pathway facilitates the response and the no-go pathway suppresses it; dopamine signals the match / mismatch with reward expectancy.]
• unexpected reward → dopamine release → promotes go
• reward missing → dopamine drop → promotes no-go

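The two bullet rules can be written as a pair of weight updates driven by a reward prediction error. This is a simplified sketch, not the network model of Frank and Claus: the class `GoNoGo`, the bounded update rule, the learning rate and the single shared reward expectancy are all assumptions made for illustration.

```python
class GoNoGo:
    """Per-action go / no-go weights in [0, 1], updated by a dopamine-like signal."""

    def __init__(self, n_actions, lr=0.1):
        self.go = [0.5] * n_actions
        self.nogo = [0.5] * n_actions
        self.expected = 0.0          # reward expectancy shared by all actions
        self.lr = lr

    def update(self, action, reward):
        dopamine = reward - self.expected        # mismatch with expectancy
        self.expected += self.lr * dopamine
        if dopamine > 0:
            # Unexpected reward: dopamine release promotes go.
            self.go[action] += self.lr * dopamine * (1 - self.go[action])
            self.nogo[action] -= self.lr * dopamine * self.nogo[action]
        else:
            # Missing reward: dopamine drop promotes no-go.
            drop = -dopamine
            self.nogo[action] += self.lr * drop * (1 - self.nogo[action])
            self.go[action] -= self.lr * drop * self.go[action]
        return dopamine

    def net(self, action):
        """Positive values facilitate the response, negative values suppress it."""
        return self.go[action] - self.nogo[action]


bg = GoNoGo(n_actions=2)
for _ in range(100):
    bg.update(action=0, reward=1.0)   # hypothetical action that is always rewarded
    bg.update(action=1, reward=0.0)   # hypothetical action that is never rewarded
print(bg.net(0), bg.net(1))
```

After these forced-choice trials the rewarded action ends with a positive net drive and the unrewarded one with a negative net drive.
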
Decision making: BG-DA
• Requires several trials
  - underlying estimation of reward probability
• Propagation of rewards backwards in time
  - reward expectancy gets transferred to the cause
• Promotes instrumental learning
  - the BG no longer plays an active role then

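The backward propagation of reward expectancy can be sketched with a temporal-difference update. The slides do not name a learning rule, so TD(0) is an assumption here, as are the three-state chain (cue, delay, outcome), the learning rate and the absence of discounting.

```python
def run_episode(values, lr=0.1, reward=1.0):
    """One pass through the chain; reward is delivered only after the last state."""
    errors = []
    last = len(values) - 1
    for s in range(len(values)):
        r = reward if s == last else 0.0
        next_value = 0.0 if s == last else values[s + 1]
        delta = r + next_value - values[s]      # reward prediction error
        values[s] += lr * delta
        errors.append(delta)
    return errors


values = [0.0, 0.0, 0.0]           # expectancy at cue, delay, outcome
first_errors = run_episode(values)
after_one = list(values)
for _ in range(499):
    last_errors = run_episode(values)
print(after_one, values, first_errors, last_errors)
```

After one episode only the state next to the reward carries any expectancy. Over repeated trials the expectancy reaches the cue, and the prediction error at reward delivery shrinks towards zero, which is why several trials are required.
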
Decision making: OFC
[Figure (Frank and Claus 2006): the BG-DA circuit (premotor cortex, basal ganglia go / no-go pathways, dopamine, reward expectancy match / mismatch) extended with the OFC and the amygdala.]
• Orbitofrontal cortex (OFC): short-term memory of gain-loss information, coping with non-stationary environments (e.g., reversal learning)
• Amygdala: provides valuation of possible outcomes (e.g., desirable)

Two learning paradigms
• Based on probability of future rewards: slow adaptation performed by the BG
• Based on past events: quick adaptation of the OFC

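The contrast between the two paradigms shows up clearly in a reversal task. In this sketch the BG-like learner is an incremental estimate of reward probability and the OFC-like learner is a short memory of recent outcomes. Both classes, the learning rate, the memory size and the reward sequence are hypothetical choices for illustration.

```python
from collections import deque


class SlowEstimator:
    """Incremental estimate of reward probability (BG-like, by assumption)."""

    def __init__(self, lr=0.05):
        self.value, self.lr = 0.0, lr

    def update(self, reward):
        self.value += self.lr * (reward - self.value)


class RecentMemory:
    """Mean of the last few outcomes (OFC-like, by assumption)."""

    def __init__(self, size=3):
        self.outcomes = deque(maxlen=size)

    def update(self, reward):
        self.outcomes.append(reward)

    @property
    def value(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


slow, fast = SlowEstimator(), RecentMemory()
rewards = [1.0] * 100 + [0.0] * 5      # the contingency reverses after trial 100
for r in rewards:
    slow.update(r)
    fast.update(r)
print(slow.value, fast.value)
```

Five trials after the reversal, the short memory already reports no reward, while the slow estimate still predicts reward on most trials.
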
Challenges
• Binding problem
  • how does the brain integrate information processed in different brain regions? (multi-modal, different time scales)
• Hypothesis: event files (Hommel 2004): associate the neural coding of perception (features integrated in object files) with related actions (action files)

Challenges
• Integration of continuous time motor control with discrete time events
• Hypothesis: segmentation of perception in events (Kurby and Zacks 2008)
  • local predictors of perception
  • prediction error triggers segmentation
  • predictors require internal models
  • models acquired by experience

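The segmentation hypothesis can be sketched as a predictor that tracks the current event and declares a boundary when its prediction error exceeds a threshold. The running-mean predictor, the threshold and the one-dimensional signal below are assumptions for illustration. A real system would use a learned internal model in place of the running mean.

```python
def segment(signal, threshold=1.0):
    """Return the indices at which a new event starts."""
    boundaries = []
    mean, count = signal[0], 1           # local predictor: running mean of the current event
    for i in range(1, len(signal)):
        error = abs(signal[i] - mean)    # prediction error
        if error > threshold:
            boundaries.append(i)         # prediction failed: segment here
            mean, count = signal[i], 1   # start a fresh predictor for the new event
        else:
            count += 1
            mean += (signal[i] - mean) / count
    return boundaries


signal = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 1.0, 1.1, 0.9]
print(segment(signal))   # → [4, 7]
```

The continuous signal is thereby reduced to a short list of discrete event boundaries.
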
Future directions
• Perception
  • perception of object function is more basic than object qualities (Merleau-Ponty 1945)
  • affordances (Gibson 1979)
• Non-utilitarian approaches to decision
  • case of regret (Coricelli et al. 2007)
  • insensitivity to probability of negative events (Loewenstein et al. 2001)
  • neuroeconomics (Glimcher and Rustichini 2004)

Q & A
Thank you!

References
Coricelli, G.; Dolan, R. J.; and Sirigu, A. 2007. Brain, emotion and decision making: the paradigmatic example of regret. Trends in Cognitive Sciences 11(6):258–265.
Frank, M. J., and Claus, E. D. 2006. Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychological Review 113(2):300–326.
Glimcher, P. W., and Rustichini, A. 2004. Neuroeconomics: The consilience of brain and decision. Science 306(5695):447–452.
Hommel, B. 2004. Event files: feature binding in and across perception and action. Trends in Cognitive Sciences 8(11):494–500.
Kawato, M. 1999. Internal models for motor control and trajectory planning. Current Opinion in Neurobiology 9(6):718–727.
Kurby, C. A., and Zacks, J. M. 2008. Segmentation in the perception and memory of events. Trends in Cognitive Sciences 12(2):72–79.
LeDoux, J. 1996. The Emotional Brain. Simon & Schuster.
Loewenstein, G. F.; Weber, E. U.; Hsee, C. K.; and Welch, N. 2001. Risk as feelings. Psychological Bulletin 127(2):267–286.
Wolpert, D. M.; Miall, R. C.; and Kawato, M. 1998. Internal models in the cerebellum. Trends in Cognitive Sciences 2(9):338–347.