Robot coaching of manipulation tasks using haptics and vision
PhD thesis proposal
Presented by: Leonel D. Rozo C.
Advisors: Carme Torras, Pablo Jiménez
Barcelona, Spain. September 29th, 2008
Outline
1. Objectives
2. State of the art
3. Expected contributions
4. Work planning
5. Resources
6. Conclusions
Objectives
Main objective: to provide robots with manipulation skills acquired from demonstrated examples given by a human who acts as a coach.
Objectives
Specific objectives:
- To analyze (and adapt) different learning-by-demonstration algorithms, with the aim of finding those that best suit the features of manipulation tasks: incremental, fast, and robust learning.
- To identify the relevant features of the manipulation tasks from sensory information, with the aim of including them as input to the learning stage (what to imitate?).
Objectives
- To develop a set-up where robot learning of manipulation tasks by demonstration will take place, composed of a robot (the learner) teleoperated through a haptic device driven by a human user (the coach).
- To fuse haptic and visual information to improve and speed up the learning stage.
State of the art
Introduction
Why should robots learn? Two main approaches exist for endowing robots with learning capabilities:
- Self-learning
- Learning from examples
State of the art
LbD – History and concepts
Learning by demonstration: symbolic approaches
- Exact reproduction of the demonstrated task (playback) (A. Billard et al., 2008)
- State-action-state representation; if-then rules
- Unsuitable approach when uncertainty appears
State of the art
LbD – History and concepts
Machine learning inclusion in programming by demonstration
Supervised methods (a minimal example is sketched below):
- A training dataset composed of labelled inputs and desired outputs is given.
- Goal: given a new input, predict its corresponding output.
- Some methods are: artificial neural networks, decision trees, Bayesian statistics, Gaussian process regression, nearest neighbour, support vector machines.
Unsupervised methods:
- An input dataset is presented, but no feedback about it is given.
- Goal: finding a representation of particular input patterns that reflects the statistical structure of the overall collection of input patterns.
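As a concrete illustration of the supervised setting, here is a minimal sketch of 1-nearest-neighbour regression, one of the methods listed above. All data are synthetic and purely illustrative:

```python
import numpy as np

# Labelled training set: inputs paired with desired outputs (synthetic).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 0.8, 0.9, 0.1])

def predict_1nn(x_query):
    """Predict the output of a new input as the label of the
    closest training input (1-nearest-neighbour regression)."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(distances)]

print(predict_1nn(np.array([1.2])))  # -> 0.8, label of the nearest point
```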
State of the art
LbD – History and concepts
Imitation learning: what is imitation?
- Biological inspiration: "From an act witnessed learn to do an act" (Thorndike).
- Robotics: imitation takes place when an agent learns a behaviour from observing the execution of that behaviour by a teacher (Bakker and Kuniyoshi, 1996).
- Current challenges (P. Bakker & Y. Kuniyoshi, 1996)
State of the art
LbD – History and concepts
Movement primitives (MPs): inductive approach
MPs are sequences of actions that accomplish a complete goal-directed behaviour and provide a compact state-action representation (S. Schaal, 1999).
State of the art
LbD – History and concepts
Movement primitives (MPs): biological inspiration
A behaviour-based control approach (Mataric):
- How to interpret and understand observed behaviours?
- How to integrate the perception and motion control system to reconstruct what was observed?
[Figure: Computational Neuroscience and Humanoid Robotics Department, ATR Laboratories]
The idea is to use a control system based on a set of behaviours (MPs), which are real-time processes that take inputs from sensors or other behaviours and send output commands to effectors or other system behaviours.
State of the art
LbD – History and concepts
Control policies
The motor control problem can be conceived as finding a task-specific control policy: a parameterized mapping from states to motor commands (see the sketch below).
Imitation learning can be defined as the problem of how control policies can be learned by observing a demonstration:
- Imitation by direct policy learning
- Imitation by learning policies from demonstrated trajectories
- Imitation by model-based policy learning
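The slide's diagram (states fed to a policy, which outputs motor commands, modulated by algorithm parameters) can be written as u = pi(x; theta). Below is a minimal sketch assuming, purely for illustration, a linear feedback policy toward a goal state; the policy class, gain matrix and goal are hypothetical choices, not the thesis's method:

```python
import numpy as np

def policy(x, theta):
    """Task-specific control policy u = pi(x; theta).
    theta bundles the algorithm parameters: here a gain matrix
    and a goal state (an assumed, illustrative parameterization)."""
    K, x_goal = theta
    return K @ (x_goal - x)  # motor command from the current state

x = np.array([0.2, -0.1])                 # current state
theta = (2.0 * np.eye(2), np.zeros(2))    # gains and goal (placeholders)
u = policy(x, theta)                      # -> array([-0.4,  0.2])
```

Direct policy learning would then fit theta from demonstrated (state, command) pairs, while model-based approaches learn a dynamics model first.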
State of the art
LbD – History and concepts
What to imitate? – Learning invariances over demonstrations
Finding those features of the task that are relevant to the reproduction: those that appear most repeatedly across different demonstrations of the task, i.e., the invariants in time (Billard et al., 2004). A simple variance-based sketch of this idea follows.
[Diagram: observation process – imitation task – execution process]
Categorization of the human actions (Dillmann, 2004): performative, commenting, commanding.
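A common heuristic for "what to imitate" is that features with low variance across time-aligned demonstrations are invariants the reproduction must respect. The sketch below uses synthetic demonstration data and an assumed inverse-variance relevance score:

```python
import numpy as np

# 5 time-aligned demonstrations, 100 timesteps, 3 candidate features
# (synthetic data for illustration only).
demos = np.random.randn(5, 100, 3)
demos[:, :, 0] = np.linspace(0, 1, 100)   # feature 0 is identical in all demos

# Variance across demonstrations, averaged over time, per feature:
variance = demos.var(axis=0).mean(axis=0)
relevance = 1.0 / (variance + 1e-6)       # low variance -> high relevance
print(relevance.argmax())                 # -> 0, the invariant (relevant) feature
```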
State of the art
LbD – History and concepts
Improving imitation learning
A task learned from imitation can be improved, corrected or refined in two ways:
1. By using reinforcement learning. The given demonstrations restrict the search in the state-action space to a smaller subspace, so RL focuses on those areas where the demonstration data lie. This approach is based on a self-improvement process, where the robot improves the learned skill by interacting with its environment (A. Billard et al., 2008).
State of the art
LbD – History and concepts
2. By using active teaching. The action learned from imitation is corrected or refined through the teacher's support. Whereas in plain imitation the information goes only from the teacher to the robot, here the information flow is bidirectional, since a social activity is being carried out.
S. Calinon and A. Billard. What is the teacher's role in robot programming by demonstration? Toward benchmarks for improved learning. Interaction Studies, 8(3):441-464, 2007.
State of the art
LbD – History and concepts
Incremental learning
Whenever new data are generated (new demonstrations, corrections, refinements), they should be included in the learning framework. It is necessary to work with learning algorithms that fulfil at least the following requirements:
- Online learning
- Inexpensive computations
- Robustness against the interference problem
- Fast learning in high-dimensional state-action spaces
State of the art
LbD – History and concepts
Locally weighted learning
LWL methods approximate nonlinear functions by means of piecewise linear models.
Memory-based methods (a sketch of LWR follows):
- Locally weighted regression (LWR)
- Locally weighted partial least squares (LWPLS)
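A minimal sketch of memory-based LWR: at prediction time, a linear model is fit around the query point using Gaussian weights on the stored training data. The bandwidth value below is an arbitrary illustrative choice (in practice it is one of the open parameters discussed later):

```python
import numpy as np

def lwr_predict(x_query, X, y, bandwidth=0.5):
    """Locally weighted regression: weighted least squares fit
    of a local linear model centred on the query point."""
    # Gaussian kernel weights of all stored datapoints
    w = np.exp(-0.5 * np.sum((X - x_query) ** 2, axis=1) / bandwidth ** 2)
    # Augment inputs with a bias term and solve weighted least squares
    Xa = np.hstack([X, np.ones((len(X), 1))])
    W = np.diag(w)
    beta = np.linalg.pinv(Xa.T @ W @ Xa) @ Xa.T @ W @ y
    return np.append(x_query, 1.0) @ beta

X = np.linspace(0, 2 * np.pi, 50)[:, None]
y = np.sin(X).ravel()
print(lwr_predict(np.array([1.0]), X, y))  # close to sin(1.0) ~ 0.84
```

Note the memory-based character: all training data are kept and re-weighted for every query, which motivates the non-memory-based variants on the next slide.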
State of the art
LbD – History and concepts
Non-memory-based methods:
- Receptive field weighted regression (RFWR) (S. Schaal & C. Atkeson, 1998)
- Locally weighted projection regression (LWPR) (S. Vijayakumar & S. Schaal, 2000)
LWPR is an incremental learning algorithm able to deal with high-dimensional data streams. In addition, it is computationally cheap and numerically robust. A sketch of the shared prediction scheme follows.
Shortcoming: too many open parameters must be manually tuned.
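A hedged sketch of the prediction scheme shared by RFWR and LWPR: a fixed set of local linear models, each with a Gaussian receptive field, is blended by normalized activations. The incremental update of the models and of the receptive-field shapes (the part with the many open parameters) is deliberately omitted; centres, distance metric and local models below are hand-picked placeholders:

```python
import numpy as np

centers = np.array([[0.0], [1.0], [2.0]])  # receptive-field centres (placeholders)
D = 4.0                                    # distance metric (scalar in 1-D)
# Local linear models as [slope, offset], expressed relative to each centre:
betas = np.array([[1.0, 0.0], [0.5, 0.5], [-1.0, 2.5]])

def rf_predict(x):
    """Blend the local linear predictions with normalized
    Gaussian receptive-field activations."""
    w = np.exp(-0.5 * D * (centers.ravel() - x) ** 2)          # activations
    local = betas[:, 0] * (x - centers.ravel()) + betas[:, 1]  # local predictions
    return np.sum(w * local) / np.sum(w)

print(rf_predict(0.9))  # dominated by the model centred at 1.0
```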
State of the art
LbD – History and concepts
LWL-based Bayesian learning
These methods deal with the problem of manually tuning the open parameters of LWL algorithms:
- Bayesian locally weighted regression (BLWR): treats all open parameters probabilistically and learns the appropriate local regime for each linearization problem, following the LWR approach. It is a Bayesian formulation of spatially local adaptive kernels for LWR.
- Randomly varying coefficient (RVC) model: a probabilistic method based on the paradigm of Bayesian online learning; it treats each open parameter of LWPR as a probability distribution.
Other incremental approaches:
- Incremental GMM:
  - Direct update method: based on the temporal coherence of data streams; the data are assumed to vary smoothly in time, so the GMM parameters are adjusted as new data are observed. The problem can also be reformulated for a generic observation of multiple datapoints. A sketch of such a direct update follows.
  - Generative method: uses Expectation-Maximization performed on data generated by GMR.
- Gaussian processes: sparse online Gaussian processes (SOGP)
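To make the "direct update" idea concrete, here is a loose sketch, assuming a single Gaussian component in 1-D: each new datapoint updates the running sufficient statistics instead of re-running EM on the whole dataset. This is an illustrative simplification, not the specific algorithm cited:

```python
import numpy as np

# Running statistics of one Gaussian component: sample count, mean,
# and (biased) variance. A full incremental GMM would keep one such
# set per component, weighted by responsibilities.
n, mean, var = 0, 0.0, 0.0

def direct_update(x):
    """Fold one new observation into the component's statistics
    (Welford-style incremental mean and variance)."""
    global n, mean, var
    n += 1
    delta = x - mean
    mean += delta / n                       # incremental mean
    var += (delta * (x - mean) - var) / n   # incremental biased variance

for x in [0.9, 1.1, 1.0, 0.8]:              # a smoothly varying data stream
    direct_update(x)
print(mean, var)                             # -> 0.95 0.0125
```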
State of the art
LbD – History and concepts
Coaching
It can be divided into two processes (A. Billard et al., 2008):
- Imitation learning: observation and execution
- Active teaching: observation and evaluation; corrections and refinements
It allows...
- to acquire new knowledge
- to focus attention on relevant task features
- to give a strategy for correction
- to help to iteratively define the characteristics of a successful outcome
State of the art
LbD – Entire systems
Systems based on vision
- Application domains: manipulation tasks, playing air hockey, gestures, human motion
- Techniques: optimization criteria, Bayesian methods, Gaussian processes, HMM, PCA
State of the art
LbD – Entire systems
Learning basketball officials' signals (S. Calinon and A. Billard. Incremental learning of gestures by imitation in a humanoid robot. 2007):
- Motion sensors
- Preprocessing stage using PCA
- Actions are encoded probabilistically using GMM
- GMR is applied to reconstruct a general form of the signals (a GMR sketch follows)
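Conceptually, the GMR reconstruction step conditions a GMM over joint (time, position) data on a query time and returns the expected position as a responsibility-weighted sum of the components' conditional means. The parameters below are hand-picked placeholders, not values learned from real gesture data:

```python
import numpy as np

# A 2-component GMM over joint [time, position] data (placeholders).
priors = np.array([0.5, 0.5])
means = np.array([[0.25, 0.0], [0.75, 1.0]])
covs = np.array([[[0.02, 0.01], [0.01, 0.02]],
                 [[0.02, -0.01], [-0.01, 0.02]]])

def gmr(t):
    """Gaussian mixture regression: expected position at time t."""
    # Responsibilities of each component for the query time
    h = priors * np.array([np.exp(-0.5 * (t - m[0])**2 / c[0, 0]) / np.sqrt(c[0, 0])
                           for m, c in zip(means, covs)])
    h /= h.sum()
    # Conditional mean of position given time, per component
    cond = [m[1] + c[1, 0] / c[0, 0] * (t - m[0]) for m, c in zip(means, covs)]
    return np.dot(h, cond)

print(gmr(0.5))  # -> 0.625, midway between the two local trends
```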
State of the art
LbD – Entire systems
Systems based on haptics
- Application domains: assembly tasks, virtual environments
- Techniques: neural networks, optimization criteria, fuzzy logic, HMM, LWR