Reinforcement Learning of Affordance Cues: Final Status of Work
Lucas Paletta & Gerald Fritz
Computational Perception Group, Institute of Digital Image Processing
JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria
Lucas PALETTA - Computational Perception Group (CAPE) MACS YEAR 3 REVIEW MEETING, Sankt Augustin, February 15, 2008
Perception Module Architecture
Learning of affordance cues
Feature detectors
Perception and Learning: Supervised Learning of Affordance Cues
[Figure: decision tree with the tests top = T, circular = T, size > 1426 and size < 1410 (Y/N branches), the leaves unknown, liftable and non liftable, and a pruning step]
Perception and Learning: Affordance Hypotheses
tb = bottom: unknown (1086.0)
tb = top:
|   structure = circ: non liftable (552.0)
|   structure = rect:
|   |   size > 1426: liftable (402.0)
|   |   size <= 1426:
|   |   |   size <= 1410: liftable (72.0)
|   |   |   size > 1410: non liftable (6.0)
P(A liftable | circ) ≈ 0.00, P(A nonliftable | circ) ≈ 1.00
P(A liftable | rect) ≈ 0.99, P(A nonliftable | rect) ≈ 0.01
Fritz et al., SAB 2006, IROS 2006
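The affordance hypotheses on this slide can be read as a small rule set. The sketch below hard-codes the printed tree and recomputes the conditional probabilities from the leaf counts shown in parentheses. The function names, the dictionary layout and the error handling are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Hand-coded copy of the tree on the slide. Function and variable names are
# illustrative assumptions, not identifiers from the authors' software.

def classify_liftable(tb: str, structure: str, size: float) -> str:
    """Return 'liftable', 'non liftable' or 'unknown' for one image region."""
    if tb == "bottom":
        return "unknown"
    if tb != "top":
        raise ValueError(f"unexpected tb value: {tb!r}")
    if structure == "circ":
        return "non liftable"
    if structure != "rect":
        raise ValueError(f"unexpected structure value: {structure!r}")
    if size > 1426:
        return "liftable"
    # Remaining case: size <= 1426, split once more at 1410.
    return "liftable" if size <= 1410 else "non liftable"


# Leaf counts from the slide, grouped by structure for 'top' regions.
LEAF_COUNTS = {
    "circ": {"liftable": 0.0, "non liftable": 552.0},
    "rect": {"liftable": 402.0 + 72.0, "non liftable": 6.0},
}


def outcome_probability(structure: str, outcome: str) -> float:
    """Relative frequency of an outcome among the samples of one structure."""
    counts = LEAF_COUNTS[structure]
    return counts[outcome] / sum(counts.values())
```

With these counts, 474 of the 480 rectangular samples fall into liftable leaves, which agrees with the value of about 0.99 quoted on the slide.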
Integration of Perception Module and Learning Module
SUPERVISED (DECISION TREE) LEARNING
[Diagram labels: PM, Feature Detectors, EM, Entity Memory, BM, Affordance Cue Classifier, Outcome Classifier, LM, DT Estimator, 'Interaction', outcome]
Decision Processes, Rewards & Affordance Cues
[Diagram labels: perceptual state (spatiotemp), actions, reward, trigger, cue state, affordance recognition, outcome]
Closed-Loop Learning
ATTENTION, FEATURE RECOGNITION, LEARNING & CONTROL
[Diagram labels: Image Analysis, Entities, Attributes, State Estimation (s_t), Curiosity Drive, Final State Recognition (R_t), MDP Decision Maker (a_t)]
Paletta & Fritz, ICDL 2007
Paletta & Fritz, KI 2007
Reinforcement (Q-)Learning
States: multi-modal & proprioceptive features
Actions: motor, crane, camera $(a_1, \ldots, a_A)$
Rewards: the reward function is specific to the affordance (outcome-driven reward):
$R(t+1) = 1$ if the outcome occurs, $R(t+1) = 0$ otherwise
Predicted reward: $Q(s,a) \equiv U(S,a) \equiv E\left[\sum_{n=0}^{\infty} \gamma^{n} R_{t+n}(s_{t+n}, a_{t+n})\right]$
Update rule: $Q(s,a) = (1-\alpha)\,Q(s,a) + \alpha\left[R + \gamma \max_{a'} Q(s',a')\right]$
[Figure: gripper moving down and up over an object; states s, s', s'' and the final state, labelled with the gripper positions HIGH, LOW and MID; rewards R = 0 and R = +1; predicted reward of the early trigger state]
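A minimal tabular sketch of the Q-learning update with an outcome-driven reward is given below. The state names, the two actions, the deterministic three-state episode and the values of the learning rate `alpha` and discount `gamma` are all assumptions chosen for illustration; the slide does not specify them.

```python
from collections import defaultdict

# Hypothetical action set and episode; not taken from the slide.
ACTIONS = ("down", "up")


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for the transition (s, a, r, s_next)."""
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]


# One deterministic episode: lower the gripper, then lift the object.
# The reward is 1 only when the outcome (object lifted) occurs.
EPISODE = [
    ("gripper_high", "down", 0.0, "gripper_low"),
    ("gripper_low", "up", 1.0, "object_lifted"),  # final state
]

q = defaultdict(float)  # unseen (state, action) pairs start at 0
for _ in range(60):
    for s, a, r, s_next in EPISODE:
        q_update(q, s, a, r, s_next)
```

In this toy setting the value of lifting from the low position approaches 1, and the value of the earlier lowering action approaches `gamma` times that value, so predicted reward propagates back to states that precede the outcome.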
Generalized Trigger Features and Execution of Interaction
[Diagram: agent and environment exchanging perceptual state s_t, reward R_t and action a_t; inside the agent, the full state description feeds affordance classifiers 1 to 4 (b1, b2, b3, b4) and a classification tree (generalization on predictive features)]
Integration of Perception Module and Learning Module
EXPLORATORY (REINFORCEMENT) LEARNING
[Diagram labels: PM, Feature Detectors, EM, BM, Affordance Cue Classifier, Outcome Classifier, LM, Parameter Estimator, Reward Estimator, parameters, reward estimate, 'Interaction', outcome (actual reward)]
Predicted Reward & Perceptual States
[Demonstration with annotated rows: est. future cumulative reward, affordance cue classification, proprioceptive features, observation of own actions, observation of environment]
Perception and Learning: Reinforcement Learning of Affordance based Cues
prospective states and affordance cues
Perception and Learning: Learning on Real World Imagery
Real World Images with Image Analysis
Simulation of Magnet Signals and Crane's Rope Tension
Perception and Learning: Video on Real-World Data
Perception and Learning: Key Contributions
Reinforcement Learning Methodology for Affordance Cueing
⇒ Exploratory Learning without Supervision
⇒ Reward Signal Determines Outcome Events
⇒ Backtracking Determines Perceptual 'Cue' State
⇒ Enables Largely Autonomous Learning of Cueing
Perception-Action Framework for Affordance Recognition
⇒ Implicit Learning of Affordance Relations
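One possible reading of "backtracking determines the perceptual cue state" is sketched below: starting from the rewarded outcome, walk backwards along the observed state sequence and keep the earliest state whose predicted reward still exceeds a threshold. The threshold, the state names and the predicted-reward values are hypothetical, and the slides do not confirm that the authors' procedure works exactly this way.

```python
def earliest_cue_state(trajectory, predicted_reward, threshold=0.5):
    """Return the earliest state of the final run of states whose predicted
    reward is at or above the threshold, or None if the last state fails."""
    cue = None
    for state in reversed(trajectory):
        if predicted_reward.get(state, 0.0) < threshold:
            break  # prediction is no longer reliable this far back
        cue = state
    return cue


# Hypothetical trajectory and predicted rewards, for illustration only.
trajectory = ["approach", "gripper_high", "gripper_low", "object_lifted"]
predicted_reward = {
    "approach": 0.2,
    "gripper_high": 0.81,
    "gripper_low": 0.9,
    "object_lifted": 1.0,
}
cue_state = earliest_cue_state(trajectory, predicted_reward)
```

Here the walk stops at `approach`, whose predicted reward is below the threshold, so `gripper_high` is returned as the candidate cue state.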
Perception and Learning: Directions of Future Work
Generalisation Towards Higher Order Features
Unsupervised Segmentation of Affordance Processing Stages (C, B, O)
Integration into an Affordance Selection Framework