Contextual Awareness for Robot Autonomy (FA2386-10-1-4138)
PI: Reid Simmons (Carnegie Mellon University)
AFOSR Joint Program Review: Cognition and Decision Program; Human-System Interaction and Robust Decision Making Program; Robust Computational Intelligence Program
(Jan 28 - Feb 1, 2013, Washington DC)
Contextual Awareness (Simmons)
Objective: Develop approaches that provide robots with contextual awareness: awareness of surroundings, capabilities, and intent.
Technical Approach: Anticipate possible failures and respond appropriately by planning and reasoning about uncertainty explicitly. Current work switches between policies at run time to maximize the probability of exceeding a reward threshold, and finds anomalies in execution data using deviation from a nominal model.
Budget (Actual / Planned, $K, as of 12/31/12): FY11: 179,479 / 166,995; FY12: 155,232 / 173,353; FY13: 74,670 / 186,394
Annual Progress Report Submitted? FY11: Y; FY12: Y; FY13: NA
Project End Date: 8/23/2013
DoD Benefit: Robots that are more robust to uncertainty in the environment; robots that are capable of understanding their limitations and responding intelligently.
List of Project Goals
1. Develop algorithms to reliably detect, diagnose and recover from exceptional (and uncertain) situations
2. Develop approaches to determine robot's own limitations and ask for assistance
3. Develop algorithms to explain actions to people
4. Develop approaches to learn from people
Progress Towards Goals
1. Develop algorithms to reliably detect, diagnose and recover from exceptional (and uncertain) situations (ongoing work by Breelyn Kane, Robotics PhD)
2. Develop approaches to determine robot's own limitations and ask for assistance (ongoing work by Juan Mendoza, Robotics PhD)
3. Develop algorithms to explain actions to people (postponed)
4. Develop approaches to learn from people (PhD thesis "Graph-based Trajectory Planning through Programming by Demonstration," Nik Melchior, defended December 2010, thesis completed August 2012)
Policy Switching to Exceed Reward Thresholds (Breelyn Kane, PhD student, Robotics)
"Risk-Variant Policy Switching to Exceed Reward Thresholds," B. Kane and R. Simmons, In Proceedings of the International Conference on Automated Planning and Scheduling, Sao Paulo, Brazil, June 2012.
Acting to Exceed Reward Thresholds
• In competitive domains, second is no better than last
  – "The person that said winning isn't everything, never won anything" (Mia Hamm)
  – "If you're not first, you're last!" (Ricky Bobby, Talladega Nights)
  – Arcade games: not just beating a level, going for the top score
Straightforward Approach
• Add time and cumulative reward to the state space
• Generate optimal plan and execute
• Significantly increases the state space
  – Planning probably infeasible for real-world domains
Our Approach
Offline
• Generate different policies: policies of varying risk attitude
• Estimate the reward distributions
Online
• Switch between policies: calculate the maximum probability of being over a threshold at each time step, based on the current cumulative (discounted) reward
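A minimal sketch of the offline policy-generation step for a small tabular MDP. Standard value iteration yields the risk-neutral policy; the risk-variant policies described in the paper would substitute a risk-weighted backup, which is not reproduced here. The `P`/`R` array layouts are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=1000, tol=1e-6):
    """P[a][s, s'] = transition probabilities, R[s, a] = expected rewards.
    Returns a greedy (risk-neutral) policy and its value function."""
    n_actions, n_states = len(P), R.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a][s, s'] * V[s']
        Q = np.stack([R[:, a] + gamma * (P[a] @ V) for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)   # one action per state
    return policy, V
```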
Distribution of Rewards
• Our work reasons about the complete, non-parametric reward distribution, including the distribution tails
  – Estimate the reward distribution by running the policy and gathering statistics
[Figure: reward distribution P(V(s)) plotted against x]
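A hedged sketch of estimating that non-parametric distribution by Monte Carlo rollouts, keeping every sampled return so the tails are preserved. The environment-sampling function `step(s, a, rng)` is a hypothetical stand-in for the pizza-domain simulator.

```python
import numpy as np

def rollout_returns(step, policy, s0, gamma=0.95, horizon=200, n_runs=5000, rng=None):
    """Run the policy many times from s0 and collect discounted returns."""
    rng = rng or np.random.default_rng(0)
    returns = np.empty(n_runs)
    for i in range(n_runs):
        s, g, discount = s0, 0.0, 1.0
        for _ in range(horizon):
            s, r, done = step(s, policy[s], rng)   # sample one transition
            g += discount * r
            discount *= gamma
            if done:
                break
        returns[i] = g
    return np.sort(returns)        # sorted samples = empirical distribution

def prob_at_least(samples, x):
    """Empirical P(V >= x) from sorted return samples."""
    return 1.0 - np.searchsorted(samples, x, side="left") / len(samples)
```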
Switching Decision Criterion
Overall objective: $\max \; P\big(R(s_0) \geq \mathrm{thresh}\big)$
Policy selected at each time step $t$: $\pi_t = \arg\max_{\pi} \; P\big(V_{\pi}(s_t) \geq \mathrm{thresh} - R_{0:t-1}\big)$
where $R_{0:t-1}$ is the cumulative (discounted) reward accrued so far.
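A sketch of the online switching rule using the empirical distributions gathered offline. `return_samples[p][s]` (an illustrative data layout, not the paper's) holds sorted rollout returns for policy `p` started in state `s`; `prob_at_least` is from the sketch above.

```python
def choose_policy(return_samples, state, cumulative_reward, threshold):
    """Pick the policy with the highest estimated probability of pushing the
    total reward over the threshold from the current state."""
    needed = threshold - cumulative_reward   # reward still required
    # Note: with discounting, `needed` would also be rescaled by gamma**-t
    # to stay in the value function's units (assumption made for brevity).
    best_p, best_prob = None, -1.0
    for p, per_state in return_samples.items():
        prob = prob_at_least(per_state[state], needed)
        if prob > best_prob:
            best_p, best_prob = p, prob
    return best_p, best_prob
```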
Pizza Delivery Domain: "30 Minutes or It's Free"
[Figures: Risk-Neutral policy vs. Risk = 1.2 policy]
Pizza Domain Results
• Execute 10,000 runs in the original MDP
• Same start state every time
• Risk-neutral vs. switching (with risky policy, risk = 1.2)
Failures to exceed the threshold:
  Threshold = -100: Risk-Neutral fails 3120; Switching fails 2166. Fails 9.5% less using the switching strategy; reduces losses by 30.6%.
  Threshold = -70: Risk-Neutral fails 8026; Switching fails 5790. Fails 22.4% less using the switching strategy; reduces losses by 27.9%.
Augmented State Approach
• Augment the state space with cumulative reward
  – Integer-valued, no discounting
  – Reward capped to [-150, 0]
  – Action rewards based on location and current cumulative reward
  – State space increases by two orders of magnitude
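An illustrative sketch of the augmented-state construction: folding the capped, integer cumulative reward into the state multiplies the state count by 151, roughly two orders of magnitude.

```python
def augment(state, cumulative_reward, cap=-150):
    """Augmented state = (original state, cumulative reward capped to [cap, 0])."""
    cr = int(max(cap, min(0, cumulative_reward)))
    return (state, cr)

def augmented_index(state, cr, cap=-150):
    """Flatten (state, capped cumulative reward) into a single integer index."""
    return state * (abs(cap) + 1) + (-cr)
```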
Comparison of Approaches • Execute 10,000 runs in original MDP • Same start state every time • No discounting Augmented space vs switching (with risky policy d =1.2) • Fails to Exceed the Threshold (-70) Risk Neutral Fails: 7946 Switching Fails: 6945 Augmented State Fails: 6256 Augmented state fails 16.9% less Augmented state fails 6.8% less than risk-neutral, original-space; than switching strategy; Reduces losses by 21.2% Reduces losses by 9.9%
Comparison of Approaches
Planning (Offline) Time:
  Augmented State: Solve policy: 18 hours. Total: 18 hours
  Risk-Variant Switching: Solve policy: < 1 min; Generate reward distribution: 5-10 min; Construct CDF: 1 min. Total: 12 min x 2 policies = 24 min
Execution (Online) Time:
  Augmented State: 0.015 s
  Risk-Variant Switching: Eval + Switch: 0.02 s
+ Augmented state approach performs close to optimal
- Very large planning time
- Must re-generate the policy when the threshold changes
- State space is enormous if discounting is needed
Robust Execution Monitoring (Juan Pablo Mendoza, PhD student, Robotics)
"Motion Interference Detection in Mobile Robots," J.P. Mendoza, M. Veloso and R. Simmons, In Proceedings of the International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, October 2012.
"Mobile Robot Fault Detection based on Redundant Information Statistics," J.P. Mendoza, M. Veloso and R. Simmons, In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, October 2012.
Learning to Detect Motion Interference
• Learn HMM from robot data
  – Includes nominal and Motion Interference (MI) states
  – Hand-labeled training data
  – Learn transition probabilities to nominal states
  – Learn observation probabilities of all states
  – Transition probability to the MI state is a tunable parameter
[HMM state diagram: Accel, Decel, Constant, Stop, MI]
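A hedged sketch of online MI detection with such an HMM: forward filtering over the nominal states plus MI, where the transition probability into MI is the tunable parameter. Gaussian observation models stand in for the ones learned from labeled data; all parameter layouts are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def forward_filter(obs_seq, trans, means, covs, init):
    """obs_seq: (T, d) observations; trans: (K, K) transition matrix;
    means/covs: per-state Gaussian mean vectors and covariance matrices;
    init: (K,) prior over states.  Returns (T, K) filtered state probabilities."""
    K = trans.shape[0]
    beliefs = np.empty((len(obs_seq), K))
    b = np.asarray(init, dtype=float)
    for t, o in enumerate(obs_seq):
        lik = np.array([multivariate_normal.pdf(o, means[k], covs[k]) for k in range(K)])
        b = lik * (b @ trans)     # predict with transition model, then correct
        b /= b.sum()
        beliefs[t] = b
    return beliefs

# Declare motion interference whenever the MI state's filtered probability is high,
# e.g.: mi_detected = beliefs[:, MI_INDEX] > 0.5
```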
Learning Behavior Model
• Observations
  – Commanded Velocity
  – Velocity Difference: difference between commanded and perceived velocity, from encoders
  – Acceleration: linear regression of the last N measures of velocity
  – Jerk: linear regression of the last M measures of acceleration
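A sketch of computing those observation features, assuming buffers of recent commanded and perceived (encoder) velocities sampled at a fixed period `dt`; buffer names and window sizes are illustrative.

```python
import numpy as np

def slope(y, dt):
    """Least-squares slope of evenly spaced samples y with spacing dt."""
    t = np.arange(len(y)) * dt
    return np.polyfit(t, y, 1)[0]

def observation(cmd_vel, enc_vel, accel_hist, dt, N=5, M=5):
    """cmd_vel, enc_vel: recent velocity samples (newest last);
    accel_hist: list of recent acceleration estimates (appended to here).
    Returns [commanded velocity, velocity difference, acceleration, jerk]."""
    accel = slope(enc_vel[-N:], dt)                  # regression over last N velocities
    accel_hist.append(accel)
    jerk = slope(accel_hist[-M:], dt) if len(accel_hist) >= 2 else 0.0
    vel_diff = cmd_vel[-1] - enc_vel[-1]             # commanded minus perceived
    return np.array([cmd_vel[-1], vel_diff, accel, jerk])
```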
Example Runs
Overall Performance
• Precision/recall as the MI transition probability varies
• With precision = 1 and recall = 0.93, median detection time was 0.36 s (mean = 0.647 s)
Detecting Unexpected Anomalies
• Basic idea
  – Model nominal behavior
  – Detect significant deviation from nominal
  – Determine extent of anomaly
• Make execution monitoring efficient, effective, and informative
Modeling Nominal Behavior
• Define a residual function that is (close to) zero during nominal behavior
  – For instance, velocity difference or the difference between estimates from redundant sensors
  – Future work: make the residual function dependent on the current state (e.g., using an HMM)
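Two toy residual functions of the kind described above, assuming synchronized sensor streams; each should stay near zero while the robot behaves nominally. The signal names are illustrative.

```python
import numpy as np

def velocity_residual(cmd_vel, enc_vel):
    """Commanded minus perceived velocity (near zero when tracking well)."""
    return cmd_vel - enc_vel

def redundancy_residual(odom_pose, visual_pose):
    """Disagreement between two redundant pose estimates."""
    return np.linalg.norm(np.asarray(odom_pose) - np.asarray(visual_pose))
```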
Detecting Deviation from Nominal
• Compute the sample mean of the residual function from observed data; under nominal behavior $f_r \sim \mathcal{N}(0, \sigma^2)$, so the mean of $N$ residuals satisfies $\bar{f}_r \sim \mathcal{N}(0, \sigma^2/N)$
• Compute the probability $a(D)$ that the mean residual over the data $D$ is not within $\epsilon$ of zero, using the normal approximation $z' = \dfrac{|\bar{f}_r| - \epsilon}{\sigma/\sqrt{N}}$
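A hedged sketch of that deviation test: given residual samples that are N(0, sigma^2) under nominal behavior, approximate the probability that the mean residual is not within epsilon of zero. This is a generic normal-approximation version; the exact statistic used in the paper may differ.

```python
import numpy as np
from scipy.stats import norm

def anomaly_score(residuals, sigma, epsilon):
    """Approximate P(|true mean| > epsilon) given the observed residuals."""
    residuals = np.asarray(residuals, dtype=float)
    n = len(residuals)
    mean = residuals.mean()
    se = sigma / np.sqrt(n)                       # std. error of the sample mean
    # Two-sided probability under mean ~ N(sample mean, se^2)
    return norm.cdf(-(epsilon - mean) / se) + norm.cdf(-(epsilon + mean) / se)
```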
Estimate Extent of Anomaly
• Define a region around the current state
  – Currently: grid the state space
  – Future: maintain a continuous state space
• Extend the region in the direction that increases a(d(R)) the most
  – Currently: extend to form axis-aligned hyper-rectangles
  – Future: general convex-shaped regions
• Stop when a locally maximal anomaly region is found
  – Current: continue while a(d(R)) is non-decreasing
  – Future: skip over "gaps"
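A greedy sketch of that region-growing step over a gridded state space: starting from the cell containing the current state, repeatedly apply the axis-aligned extension that raises the anomaly score the most, and stop once no extension is non-decreasing. The `score(region)` callback is assumed to wrap anomaly_score() over the residuals whose states fall inside the region bounds.

```python
def grow_region(seed_cell, score, grid_shape):
    """Grow an axis-aligned hyper-rectangle (inclusive cell bounds) around seed_cell."""
    lo, hi = list(seed_cell), list(seed_cell)
    best = score((tuple(lo), tuple(hi)))
    while True:
        candidates = []
        for d in range(len(grid_shape)):
            if lo[d] > 0:
                candidates.append((d, "lo", -1))
            if hi[d] < grid_shape[d] - 1:
                candidates.append((d, "hi", +1))
        best_step, best_s = None, best
        for d, side, step in candidates:
            corner = lo if side == "lo" else hi
            corner[d] += step                     # tentative one-cell extension
            s = score((tuple(lo), tuple(hi)))
            corner[d] -= step                     # revert
            if s >= best_s:                       # keep while non-decreasing
                best_step, best_s = (d, side, step), s
        if best_step is None:                     # locally maximal region found
            return (tuple(lo), tuple(hi)), best
        d, side, step = best_step
        (lo if side == "lo" else hi)[d] += step   # commit the best extension
        best = best_s
```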
Example Runs
[Figures: Pull from Stop, Collision, Push from Stop]
Discovering a Global Anomaly
• The region grows and becomes more certain as the robot travels
• Ideally, keep upper and lower bounds on the anomaly region
Interaction with Other Groups and Organizations
• Received Infinite Mario software and support from John Laird's group at the University of Michigan; interaction with student Shiwali Mohan
• Interacted with Sven Koenig (USC/NSF) and former student Yaxin Liu regarding generation of risk-sensitive policies
• Interaction with Manuela Veloso (CMU CSD/RI): co-advising Juan Pablo Mendoza
List of Publications Attributed to the Grant
• "Risk-Variant Policy Switching to Exceed Reward Thresholds," B. Kane and R. Simmons, ICAPS, Sao Paulo, Brazil, June 2012.
• "Graph-based Trajectory Planning through Programming by Demonstration," Nik Melchior, PhD Thesis, CMU-RI-TR-11-40, August 2012.
• "Motion Interference Detection in Mobile Robots," J.P. Mendoza, M. Veloso and R. Simmons, In Proceedings of the International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, October 2012.
• "Mobile Robot Fault Detection based on Redundant Information Statistics," J.P. Mendoza, M. Veloso and R. Simmons, In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, October 2012.