Reinforcement Learning of Affordance Cues

Reinforcement Learning of Affordance Cues: Final Status of Work
Lucas Paletta & Gerald Fritz
Computational Perception Group, Institute of Digital Image Processing
JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria
Lucas PALETTA - Computational Perception Group (CAPE) MACS YEAR 3 REVIEW MEETING, Sankt Augustin, February 15, 2008

Perception Module Architecture
- Learning of affordance cues
- Feature detectors

Perception and Learning: Supervised Learning of Affordance Cues
[Figure: decision tree with the tests top = T, circular = T, size > 1426 and size < 1410 (Y/N branches), the leaves unknown, liftable and non liftable, and a pruning step.]

Perception and Learning: Affordance Hypotheses

tb = bottom: unknown (1086.0)
tb = top:
|   structure = circ: non liftable (552.0)
|   structure = rect:
|   |   size > 1426: liftable (402.0)
|   |   size <= 1426:
|   |   |   size <= 1410: liftable (72.0)
|   |   |   size > 1410: non liftable (6.0)

P(A liftable | circ) ≈ 0.00
P(A nonliftable | circ) ≈ 1.00
P(A liftable | rect) ≈ 0.99
P(A nonliftable | rect) ≈ 0.01

Fritz et al., SAB 2006, IROS 2006
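
The tree on this slide can be read as a small rule set. The following is a minimal sketch, not the authors' implementation. It assumes that the numbers in parentheses are counts of training samples reaching each leaf and that every leaf is pure; under that assumption the counts reproduce the probabilities shown on the slide. The names `classify`, `LEAF_COUNTS` and `p_affordance` are hypothetical.

```python
def classify(tb, structure, size):
    """Follow the decision tree from the slide for one feature vector."""
    if tb == "bottom":
        return "unknown"
    if structure == "circ":
        return "non liftable"
    # structure == "rect"
    if size > 1426:
        return "liftable"
    return "liftable" if size <= 1410 else "non liftable"


# Leaf counts copied from the slide, grouped by the 'structure' test.
# Assumption: each count is the number of samples at a pure leaf.
LEAF_COUNTS = {
    "circ": {"liftable": 0.0, "non liftable": 552.0},
    "rect": {"liftable": 402.0 + 72.0, "non liftable": 6.0},
}


def p_affordance(label, structure):
    """Relative frequency of an affordance label given the structure cue."""
    counts = LEAF_COUNTS[structure]
    return counts[label] / sum(counts.values())


if __name__ == "__main__":
    for structure in ("circ", "rect"):
        for label in ("liftable", "non liftable"):
            print(structure, label, round(p_affordance(label, structure), 2))
```

With these counts, 474 of the 480 rectangular samples fall into liftable leaves, which is where the value of about 0.99 comes from.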

Integration of Perception Module and Learning Module
SUPERVISED (DECISION TREE) LEARNING
[Figure: architecture diagram. Labels: PM, Feature Detectors, EM, Entity Memory, BM, Affordance Cue Classifier, LM, DT Estimator, Outcome Classifier, 'Interaction' outcome.]

Decision Processes, Rewards & Affordance Cues
[Figure: decision process diagram. Labels: perceptual state (spatiotemp), cue state, trigger, actions, affordance recognition, outcome, reward.]

Closed-Loop Learning
[Figure: closed loop across three stages: ATTENTION, FEATURE RECOGNITION, and LEARNING & CONTROL. Block labels: Image Analysis; Entities, Attributes; State Estimation; Curiosity Drive; Final State Recognition; MDP Decision Maker. Signals: state s_t, reward R_t, action a_t.]
Paletta & Fritz, ICDL 2007; Paletta & Fritz, KI 2007

Reinforcement (Q-)Learning
- States: multi-modal & proprioceptive features
- Actions: motor, crane, camera (a_1, ..., a_A)
- Rewards: the reward function is specific to the affordance; outcome-driven reward:
  R(t+1) = 1 if outcome occurs
  R(t+1) = 0 otherwise
[Figure: state transition diagram with states s, s', s'' labelled by object and gripper height (HIGH, LOW, MID), the actions gripper down and gripper up, transition rewards R = 0 and R = +1, and a final state.]
Predicted reward (predicted reward of early trigger state):
  Q(s, a) \equiv U(s, a) = E\left[ \sum_{n=0}^{\infty} \gamma^{n} R_{t+n}(s_{t+n}, a_{t+n}) \right]
Update rule:
  Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') \right]
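
As an illustration of the Q-learning update on this slide, here is a minimal tabular sketch on a toy gripper task. Everything about the toy is an assumption made for the example: the three height states, the two actions, the deterministic transitions, the values of alpha and gamma, and the systematic sweep over all state-action pairs (used instead of exploration so the result is reproducible). Only the reward convention (1 when the outcome occurs, 0 otherwise) and the form of the update come from the slide.

```python
from collections import defaultdict

# Hypothetical toy task: a gripper moves between three heights.
ACTIONS = ("gripper_up", "gripper_down")
HEIGHTS = ("LOW", "MID", "HIGH")
FINAL = "HIGH"  # reaching this state counts as the affordance outcome


def step(state, action):
    """Deterministic toy transition; reward is 1 only when the outcome occurs."""
    i = HEIGHTS.index(state)
    if action == "gripper_up":
        i = min(i + 1, len(HEIGHTS) - 1)
    else:
        i = max(i - 1, 0)
    next_state = HEIGHTS[i]
    reward = 1.0 if next_state == FINAL else 0.0
    return next_state, reward


def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]."""
    # The final state has no successor, so its future value is taken as 0.
    best_next = 0.0 if s_next == FINAL else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)


def train(sweeps=200):
    """Sweep all non-final state-action pairs repeatedly (no exploration)."""
    Q = defaultdict(float)
    for _ in range(sweeps):
        for s in HEIGHTS:
            if s == FINAL:
                continue
            for a in ACTIONS:
                s_next, r = step(s, a)
                q_update(Q, s, a, r, s_next)
    return Q


Q = train()

if __name__ == "__main__":
    for key in sorted(Q):
        print(key, round(Q[key], 3))
```

In this toy, the state one step before the outcome ends up with a higher predicted reward than the state two steps before it, which is the sense in which earlier states carry a discounted prediction of the outcome.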

Generalized Trigger Features and Execution of Interaction
[Figure: agent-environment loop. The agent holds a full state description and affordance classifiers 1 to 4 (b1, b2, b3, b4), each a classification tree (generalization on predictive features). Signals between agent and environment: perceptual state s_t, reward R_t, action a_t.]

Integration of Perception Module and Learning Module
EXPLORATORY (REINFORCEMENT) LEARNING
[Figure: architecture diagram. Labels: PM, Feature Detectors, EM, BM, Affordance Cue Classifier, LM, Outcome Classifier, Parameter Estimator, Reward Estimator, 'Interaction' outcome (actual reward), reward estimate, parameters.]

Predicted Reward & Perceptual States
Demonstration
[Figure: annotated rows, top to bottom: est. future cumulative reward; classification of affordance cue; proprioceptive features; observation of own actions; observation of environment.]

Perception and Learning: Reinforcement Learning of Affordance-Based Cues
[Figure: prospective states and affordance cues]

Perception and Learning: Learning on Real World Imagery
- Real World Images with Image Analysis
- Simulation of Magnet Signals and Crane's Rope Tension

Perception and Learning: Video on Real-World Data

Perception and Learning: Key Contributions
- Reinforcement Learning Methodology for Affordance Cueing
  ⇒ Exploratory Learning without Supervision
  ⇒ Reward Signal Determines Outcome Events
  ⇒ Backtracking Determines Perceptual 'Cue' State
  ⇒ Enables Largely Autonomous Learning of Cueing
- Perception-Action Framework for Affordance Recognition
  ⇒ Implicit Learning of Affordance Relations
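
The slide does not say how the backtracking to a perceptual 'cue' state is carried out. The sketch below shows one possible reading, offered only as an assumption: walk backwards from the outcome state along a recorded episode and keep the earliest state whose predicted reward still exceeds a threshold. The function name, the threshold and the example values are all hypothetical.

```python
def backtrack_cue_state(trajectory, predicted_reward, threshold=0.5):
    """Return the earliest state, counted back from the end of the episode,
    whose predicted reward is still at or above the threshold.

    trajectory: states in the order they were visited, outcome state last.
    predicted_reward: mapping from state to predicted cumulative reward.
    Returns None if even the last state is below the threshold.
    """
    cue = None
    for state in reversed(trajectory):
        if predicted_reward.get(state, 0.0) < threshold:
            break  # prediction no longer reliable further back
        cue = state
    return cue


if __name__ == "__main__":
    # Made-up episode and values, for illustration only.
    episode = ["s0", "s1", "s2", "s3"]
    values = {"s0": 0.1, "s1": 0.7, "s2": 0.9, "s3": 1.0}
    print(backtrack_cue_state(episode, values))
```

Lowering the threshold pushes the selected cue state earlier in the episode, at the cost of a less certain prediction of the outcome.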

Perception and Learning: Directions of Future Work
- Generalisation Towards Higher Order Features
- Unsupervised Segmentation of Affordance Processing Stages (C, B, O)
- Integration into an Affordance Selection Framework
