Synthesis of skilled robotic behaviour through human sensorimotor adaptation. Jan Babič, Jožef Stefan Institute, Slovenia
Well-studied arm reaching [Figure: arm-reaching trajectories with the force field OFF and ON, before and after training; data from Shadmehr and Wise (2005); illustration adapted from Milner and Franklin (2005)] • Catch trials: the force field is suddenly turned off to reveal the effect of training • Result: the central nervous system forms an internal model that nullifies the effect of the force field
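The catch-trial logic can be illustrated with a standard single-rate state-space model of trial-by-trial adaptation. This is an assumption for illustration only: the slide does not specify a model, and the retention factor `A`, learning rate `B` and the function name are arbitrary.

```python
def simulate_force_field_adaptation(n_trials, field, catch_trials, A=0.99, B=0.2):
    """Single-rate state-space model of trial-by-trial adaptation (illustrative).

    x is the internal-model estimate of the force field. On each trial the
    experienced error is e = f - x (f = 0 on catch trials), and the estimate
    is updated as x <- A * x + B * e. Returns the per-trial errors.
    """
    x = 0.0
    errors = []
    for n in range(n_trials):
        f = 0.0 if n in catch_trials else field  # field switched off on catch trials
        e = f - x                                 # deviation caused by the unmodelled force
        errors.append(e)
        x = A * x + B * e                         # retention plus error-driven learning
    return errors


# 60 trials in a unit force field, with one catch trial at index 50
errors = simulate_force_field_adaptation(60, field=1.0, catch_trials={50})
```

In this toy model the error shrinks over the perturbed trials, and the catch trial produces an error of the opposite sign: the aftereffect that reveals the learned internal model.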
Computational theories • Kinematic minimum-jerk model (Flash & Hogan 1985) • Minimum torque-change model (Uno et al. 1989) • Minimum motor-command-change model (Nakano et al. 1999) • Combination of control minimization ($\int v^2\,dt$) and best performance under signal-dependent noise (Harris & Wolpert 1998) • Stochastic optimal control theory (Todorov & Jordan 2002)
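For reference, the minimum-jerk model of Flash & Hogan (1985) has a closed-form solution for point-to-point movements that start and end at rest (zero velocity and acceleration). A short sketch; the function names are illustrative:

```python
def minimum_jerk(x0, xf, T, t):
    """Minimum-jerk position for a point-to-point movement of duration T.

    x(t) = x0 + (xf - x0) * (10 tau^3 - 15 tau^4 + 6 tau^5), tau = t / T.
    This profile minimises the integral of squared jerk with zero velocity
    and acceleration at both ends of the movement.
    """
    tau = min(max(t / T, 0.0), 1.0)  # clamp t to the movement interval
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)


def minimum_jerk_velocity(x0, xf, T, t):
    """Time derivative of minimum_jerk: a bell-shaped velocity profile."""
    tau = min(max(t / T, 0.0), 1.0)
    return (xf - x0) / T * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
```

The velocity peaks at mid-movement at 1.875 times the average velocity, the symmetric bell shape typical of unperturbed reaching.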
What is missing? • The arm-reaching paradigm is too constrained • From an ecological viewpoint, any optimality principle for a functional modality of the brain should be a suboptimal goal that serves to increase fitness • Sensorimotor adaptation should be viewed from the wider scope of reinforcement learning with subgoals • Fitness maximization, injury avoidance, neural energy, memory dependence, cheap and approximate sub-goal solutions, …
Our approach • Expose sensorimotor control mechanisms and their adaptation to the danger of falling and injury • Unconstrained whole-body motion: squat-to-stand movements • Non-trivial perturbations • Whole-body equivalent of the well-studied arm-reaching motion • Same level of complexity • But it inherently involves the danger of falling and injury
Babic, J., Oztop, E., & Kawato, M. Human motor adaptation in whole body motion. Nature PG: Scientific Reports, 6, 32868, 2016.
Rueckert, E., Čamernik, J., Peters, J., & Babič, J. Probabilistic Movement Models Show that Postural Control Precedes and Predicts Volitional Motor Control. Nature PG: Scientific Reports, 6, 28455, 2016.
Experimental setup [Figure: motion capture; visual feedback display showing the COM position and the target COM position; 6-DOF parallel platform (base); perturbation generation, with signals labelled marker velocity and platform position]
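The COM position shown on the display has to be estimated from the motion-capture markers. One common approach, assumed here because the slide does not give the method, is the segmental method: the whole-body COM is the mass-weighted mean of the segment COMs, with segment masses typically taken from anthropometric tables. The function name and input format are hypothetical.

```python
def whole_body_com(segments):
    """Whole-body centre of mass as the mass-weighted mean of segment COMs.

    segments: list of (mass_kg, (x, y, z)) tuples, one per body segment,
    where (x, y, z) is that segment's COM estimated from the markers.
    """
    total = sum(m for m, _ in segments)
    # Weighted average separately for each of the three coordinates
    return tuple(sum(m * p[i] for m, p in segments) / total for i in range(3))
```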
Experiment
Adaptation to perturbations [Figure: (A) trajectory area (m²) and (B) average number of failed trials versus block number; r = .738] • Very fast adaptation to perturbations • Perturbed trajectories remained different from unperturbed trajectories • Failures correspond to adaptation
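The slides do not define the trajectory area measure. A minimal sketch, assuming it is the signed area enclosed between the COM path in the horizontal-vertical plane and the straight line joining the path's end point back to its start point, computed with the shoelace formula (the function name is hypothetical):

```python
def trajectory_area(xs, ys):
    """Signed area enclosed by a planar path and the straight chord
    joining its last point back to its first point (shoelace formula).

    The sign indicates on which side of the chord the path bulges.
    """
    n = len(xs)
    area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around: the last edge is the closing chord
        area += xs[i] * ys[j] - xs[j] * ys[i]
    return 0.5 * area
```

A perfectly straight path gives zero area, so the measure grows with how far the trajectory deviates from a straight line in either direction.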
Adaptation to perturbations [Figure: normalized vertical displacement versus horizontal displacement (m), subjects 1–8] • Inter-subject consistency in re-optimized trajectories
Adaptation mechanism [Figure: normalized vertical displacement versus horizontal displacement (m), subjects 1–8] • Catch trials after adaptation stabilized • Inter-subject variability in aftereffects • Active compensation of perturbations
Predictive component measure • Focus on the feedforward mechanisms governing the motion • How can the motion of the COM be quantified before the feedback mechanisms could alter it? • Introduction of the predictive-response measure: $\mathrm{PC} = \dfrac{\dot{X}_{\mathrm{COM}}(t)}{F(t)}, \quad t = 20\ \mathrm{ms}$ [Figure: timeline of motion control processes from start of motion to end of motion; feedforward mechanisms act from the start, feedback starts acting at 20 ms; the predictive component measure and the trajectory area measure are marked on this timeline]
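A minimal sketch of the predictive component computation. Assumptions: time stamps are relative to motion onset, $\dot{X}_{\mathrm{COM}}$ is the horizontal COM velocity, $F$ is the perturbation force, and both are linearly interpolated at t = 20 ms; function names are hypothetical.

```python
def interp(t, ts, vs):
    """Linear interpolation of samples (ts ascending, vs) at time t."""
    for k in range(len(ts) - 1):
        if ts[k] <= t <= ts[k + 1]:
            w = (t - ts[k]) / (ts[k + 1] - ts[k])
            return vs[k] + w * (vs[k + 1] - vs[k])
    raise ValueError("t is outside the sampled time range")


def predictive_component(ts, com_velocity, force, t_pc=0.020):
    """PC = COM velocity / perturbation force at t_pc after motion onset.

    Units: (m/s) / N = m s^-1 N^-1. t_pc = 20 ms is taken as the moment
    before feedback mechanisms start to act.
    """
    f = interp(t_pc, ts, force)
    if f == 0.0:
        raise ValueError("perturbation force is zero at t_pc")
    return interp(t_pc, ts, com_velocity) / f
```

Because the ratio is taken before feedback can act, it reflects only the feedforward part of the response.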
Inter-subject variability [Figure: trajectory area (m²) versus predictive component (m·s⁻¹·N⁻¹), one point per subject 1–8] • The predictive-response measure is a strong predictor of aftereffects • Subjects used little exploration during the adaptation process
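One way to quantify how well the predictive component predicts the aftereffects across subjects is an ordinary least-squares line together with the Pearson correlation. This is an assumption for illustration; the slide does not state which statistic was used, and the function name is hypothetical.

```python
import math


def fit_and_correlate(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

Here `xs` would hold each subject's predictive component and `ys` the corresponding aftereffect measure (e.g. trajectory area).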
Catch-trial simulations [Figure: block diagram with adapted COM motion, feedforward controller, feedback controller with gain parameters, dynamic model of the movement system, joint angles, and a perturbation switch; two sets of panels of normalized vertical displacement versus horizontal displacement (m), subjects 1–8]
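The simulation scheme can be illustrated with a deliberately simplified, hypothetical model: a 1-D point mass standing in for the horizontal COM, a feedforward command assumed to have fully learned the perturbation, a PD feedback controller whose gains play the role of the gain parameters, and a perturbation switch. Mass, force and gain values are arbitrary; this is not the whole-body model used in the study.

```python
import math


def simulate_catch_trial(perturbation_on, m=70.0, F0=50.0, T=1.0,
                         kp=500.0, kd=100.0, dt=0.001):
    """Toy simulation of the horizontal COM deviation during one movement.

    - feedforward: assumed fully adapted, it cancels the learned perturbation;
    - feedback: PD controller pulling the COM back to the reference x = 0;
    - perturbation switch: the perturbation force is applied only if
      perturbation_on is True (False corresponds to a catch trial).
    Returns the peak absolute deviation in metres.
    """
    x, v, peak = 0.0, 0.0, 0.0
    for k in range(int(T / dt)):
        t = k * dt
        f_learned = F0 * math.sin(math.pi * t / T)  # perturbation the feedforward expects
        u_ff = -f_learned                            # adapted feedforward command
        f_pert = f_learned if perturbation_on else 0.0
        u_fb = -kp * x - kd * v                      # feedback correction
        a = ((u_ff + f_pert) + u_fb) / m
        v += a * dt                                  # semi-implicit Euler integration
        x += v * dt
        peak = max(peak, abs(x))
    return peak
```

With the perturbation switched on, the adapted feedforward command cancels it and the mass stays on the reference; switching it off (a catch trial) leaves the feedforward command uncompensated, so the feedback controller has to correct the resulting deviation.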
Summary • Very fast adaptation to perturbations • Perturbed trajectories remained different from unperturbed trajectories • Inter-subject variability in aftereffects • The predictive-response measure is a strong predictor of aftereffects • Subjects used little exploration during the adaptation process • Combining sensorimotor adaptation and reinforcement learning
Skill synthesis for autonomy • For autonomous operation, the key issue is transferring the control policy learnt by the human to the robot [Figure: human-in-the-loop scheme. The robot state (s) reaches the human sensory system as feedback (f) through the feedback interface; human motion (m) is mapped to robot motor commands (u) through the feedforward interface; a learning block (~ adaptive controller) learns π: s → u]
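The learning block can be read as supervised learning of π: s → u from the (state, command) pairs generated while the human controls the robot. A minimal sketch under loud assumptions: the robot is a toy unstable 1-D plant, the "human" is a hypothetical linear stand-in, and π is fitted as a linear least-squares model. Function names are hypothetical.

```python
def collect_demonstration(s0=1.0, steps=200, dt=0.01, k_human=3.0):
    """Simulate a human-in-the-loop demonstration on a toy plant ds/dt = s + u.

    The hypothetical 'human' sees the robot state s (feedback interface) and
    issues u = -k_human * s (feedforward interface). Returns (s, u) pairs.
    """
    s, pairs = s0, []
    for _ in range(steps):
        u = -k_human * s          # human motor command mapped to the robot
        pairs.append((s, u))
        s += dt * (s + u)         # robot state evolves and is fed back
    return pairs


def fit_policy(pairs):
    """Least-squares fit of a linear policy pi(s) = w * s + b."""
    n = len(pairs)
    ms = sum(s for s, _ in pairs) / n
    mu = sum(u for _, u in pairs) / n
    w = (sum((s - ms) * (u - mu) for s, u in pairs)
         / sum((s - ms) ** 2 for s, _ in pairs))
    return w, mu - w * ms
```

Once fitted, the robot can run `u = w * s + b` without the human, which is the transfer of the control policy the slide describes.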
Robot skill synthesis • Machine learning techniques are more efficient for supervised learning than for unsupervised learning and optimal control problems [Illustration adapted from Milner and Franklin (2005)] • Human brain + supervised learning >> robot skill generation
Body schema is flexible [Figure from Maravita & Iriki (2004)] • Representation of the body schema: VIP neurons integrate somatosensory and visual information, with visual receptive fields anchored to the monkey's hand/arm • Tool use modifies the body schema (Iriki et al. 1996)
Shared control for human-robot interaction tasks [Figure: block diagram with the human, the robot, the robot control policy, shared control and human-robot interface blocks, and a feedback interface; signals labelled human motor commands, machine motor commands, actual system commands, and feedback] • The method is based on locally weighted regression (LWR) • The shared control algorithm delegates control responsibility between the human demonstrator and the current robotic skill (control policy)
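The slide states that the method is based on locally weighted regression. A generic batch LWR prediction with a Gaussian kernel is sketched below. The bandwidth `h`, the 1-D input and the function name are assumptions, and the implementation in the cited work may differ (for example, by updating local models incrementally instead of refitting from all data).

```python
import math


def lwr_predict(xq, xs, ys, h=0.5):
    """Locally weighted linear regression prediction at query point xq.

    Each training point gets a Gaussian weight w_i = exp(-(x_i - xq)^2 / (2 h^2));
    a weighted least-squares line is fitted and evaluated at xq.
    """
    ws = [math.exp(-((x - xq) ** 2) / (2 * h * h)) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean input
    my = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean output
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0      # fall back to local mean
    return my + slope * (xq - mx)
```

Because each prediction only trusts nearby data, the learned skill can be refined locally wherever the human keeps demonstrating, without disturbing regions that are already learned.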
Force Interaction Task Peternel, L., Oztop, E., & Babic, J. A Shared Control Method for Online Human-in-the-Loop Robot Learning Based on Locally Weighted Regression. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 2016.
Evolution of robot adaptation
Force Interaction Task Peternel, L., Petric, T., & Babic, J. Human-in-the-loop approach for teaching robot assembly tasks using impedance control interface. IEEE International Conference on Robotics and Automation (ICRA), Seattle, USA, 2015, pp. 1497–1502.
Reactive postural control Peternel, L., & Babic, J. Humanoid robot posture-control learning in real-time based on human sensorimotor learning ability. IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 2013, pp. 5309–5314.
Responsibility transfer • The influence weighting algorithm calculates the mean square error (MSE) between the human reaction and the predicted reaction over a period T during the demonstration • The maximum MSE is set as the reference for the weighting criterion: $C = \dfrac{\mathrm{MSE}_{\mathrm{total}}}{\mathrm{MSE}_{\max}}$ • The criterion weights the influence of the human against the influence of the autonomous controller; the output controlling the robot is $y = C\,y_{\mathrm{human}} + (1 - C)\,y_{\mathrm{predicted}}$ • When the MSE does not improve over N periods, the algorithm disconnects the human from the control loop; at that point the robot is considered trained
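The responsibility-transfer rule can be sketched as follows. Assumptions not stated on the slide: MSE_max is the largest per-period MSE observed so far, the weight C computed at the end of one period is applied during the next, and the human starts with full responsibility (C = 1). The function name is hypothetical.

```python
def responsibility_transfer(y_human, y_pred, period, patience):
    """Blend human and learned-policy outputs with a per-period weight C.

    y = C * y_human + (1 - C) * y_pred, with C = MSE / MSE_max computed over
    each period. If the MSE has not improved for `patience` consecutive
    periods (N on the slide), the human is disconnected (C = 0).
    Returns (commands, weight used in each period, trained flag).
    """
    commands, weights = [], []
    c = 1.0                       # human has full responsibility at the start
    mse_max, best_mse = 0.0, float("inf")
    stale = 0                     # periods without MSE improvement
    trained = False
    for start in range(0, len(y_human), period):
        h = y_human[start:start + period]
        p = y_pred[start:start + period]
        # Command sent to the robot during this period uses the current weight
        commands.extend(c * yh + (1.0 - c) * yp for yh, yp in zip(h, p))
        weights.append(c)
        if trained:
            continue              # human already disconnected
        mse = sum((yh - yp) ** 2 for yh, yp in zip(h, p)) / len(h)
        mse_max = max(mse_max, mse)
        c = mse / mse_max if mse_max > 0 else 0.0
        if mse < best_mse:
            best_mse, stale = mse, 0
        else:
            stale += 1
            if stale >= patience:
                trained, c = True, 0.0   # robot is considered trained
    return commands, weights, trained
```

As the learned policy's predictions approach the human's reactions, C falls and control shifts smoothly to the robot before the human is removed entirely.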
Responsibility transfer
Human-Robot Physical Collaboration Peternel, L., Petric, T., Oztop, E., & Babic, J. Teaching robots to cooperate with humans in dynamic manipulation tasks based on multi-modal human-in-the-loop approach. Autonomous Robots, 36, 123–136, 2014.
Autonomy • Two-layered imitation system – First layer extracts the frequency (canonical dynamical system) – Second layer learns the waveform (output dynamical system) • The waveform is learned in real time • Adaptations: – Frequency – Phase – Amplitude
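The slide does not name the exact oscillator or learning rule. A minimal sketch, assuming an adaptive-frequency phase oscillator as the canonical system (estimating phase and frequency) and a truncated Fourier series with online-learned coefficients as the output system; the gains `K`, `eta`, the number of harmonics `M` and the function name are illustrative.

```python
import math


def two_layer_imitation(y, dt, omega0, K=20.0, eta=2.0, M=10, adapt_frequency=True):
    """Two-layer periodic imitation over input samples y (illustrative sketch).

    Layer 1 (canonical system): adaptive-frequency phase oscillator that
    estimates the phase phi and angular frequency omega of the input.
    Layer 2 (output system): truncated Fourier series in phi,
        y_hat = sum_c a[c] cos(c phi) + b[c] sin(c phi),  c = 0..M,
    whose coefficients are learned online from the error e = y - y_hat.
    Returns (omega history, error history).
    """
    phi, omega = 0.0, omega0
    a = [0.0] * (M + 1)  # cosine coefficients; a[0] is the offset
    b = [0.0] * (M + 1)  # sine coefficients; b[0] multiplies sin(0) and stays 0
    omegas, errors = [], []
    for yk in y:
        cos_terms = [math.cos(c * phi) for c in range(M + 1)]
        sin_terms = [math.sin(c * phi) for c in range(M + 1)]
        # Layer 2: current waveform estimate and its error
        y_hat = sum(ac * cc + bc * sc
                    for ac, bc, cc, sc in zip(a, b, cos_terms, sin_terms))
        e = yk - y_hat
        # Online gradient (LMS-style) update of the Fourier coefficients
        for c in range(M + 1):
            a[c] += dt * eta * cos_terms[c] * e
            b[c] += dt * eta * sin_terms[c] * e
        # Layer 1: error-driven phase and frequency adaptation
        coupling = -K * e * math.sin(phi) if adapt_frequency else 0.0
        phi += dt * (omega + coupling)
        omega += dt * coupling
        omegas.append(omega)
        errors.append(e)
    return omegas, errors
```

Setting `adapt_frequency=False` freezes the first layer, which is useful for checking the waveform-learning layer on its own; amplitude adaptation comes for free because the Fourier coefficients track the signal's scale.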
Co-adaptive control of exoskeletons • Tight interconnection between human and exoskeleton • The human adapts muscle activation to the exoskeleton assistance • The exoskeleton adapts to human motion
Peternel, L., Noda, T., Petric, T., Ude, A., Morimoto, J., & Babic, J. Adaptive control of exoskeleton robots for periodic assistive behaviours based on EMG feedback minimisation. PLoS ONE, 11(2), 2016.
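The co-adaptation loop can be illustrated with a toy iterative update, loosely modelled on the idea of EMG feedback minimisation in the cited title (an assumption, not the paper's controller): over repeated movement cycles, the assistive torque at each phase of the cycle is increased in proportion to the EMG measured there. Here the "EMG" is simply the torque the human still has to supply, a strong simplification; the function name and gain are hypothetical.

```python
def emg_minimising_assistance(tau_required, gain=0.5, cycles=30):
    """Iteratively adapt a periodic assistive torque profile to reduce EMG.

    tau_required: torque the task needs at each phase bin of the cycle.
    The human supplies whatever the exoskeleton does not, and that muscle
    effort is used as the EMG signal (simplifying assumption).
    Returns (final assistance profile, peak EMG in each cycle).
    """
    assist = [0.0] * len(tau_required)
    peaks = []
    for _ in range(cycles):
        # Muscle effort (EMG proxy) still needed at each phase bin
        emg = [max(req - a, 0.0) for req, a in zip(tau_required, assist)]
        peaks.append(max(emg))
        # Increase assistance where the human is still active
        assist = [a + gain * e for a, e in zip(assist, emg)]
    return assist, peaks
```

In this toy model, with a gain of 0.5 the remaining muscle effort halves every cycle; a real controller would also have to cope with noisy EMG and with the human adapting in parallel.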
Evolution of trajectories
In collaboration with: Erhan Oztop, OZU, Turkey • Mitsuo Kawato & Jun Morimoto, ATR, Japan • Luka Peternel, IIT, Italy • Tadej Petric, JSI, Slovenia
Funding: FP7 CoDyCo • Horizon 2020 SPEXOR