  1. Contextual Awareness for Robot Autonomy (FA2386-10-1-4138). PI: Reid Simmons (Carnegie Mellon University). AFOSR Joint Program Review: Cognition and Decision Program; Human-System Interaction and Robust Decision Making Program; Robust Computational Intelligence Program (Jan 28 - Feb 1, 2013, Washington, DC)

  2. Contextual Awareness (Simmons)
  Objective: Develop approaches that provide robots with contextual awareness: awareness of surroundings, capabilities, and intent.
  Technical Approach: Anticipate possible failures and respond appropriately by planning and reasoning about uncertainty explicitly. Current work switches between policies at run time to maximize the probability of exceeding a reward threshold, and finds anomalies in execution data using deviation from a nominal model.
  DoD Benefit: Robots that are more robust to uncertainty in the environment; robots that are capable of understanding their limitations and responding intelligently.
  Budget (Actual / Planned, $K, as of 12/31/12): FY11: 179,479 / 166,995; FY12: 155,232 / 173,353; FY13: 74,670 / 186,394
  Annual Progress Report Submitted? FY11: Y; FY12: Y; FY13: NA
  Project End Date: 8/23/2013

  3. List of Project Goals
  1. Develop algorithms to reliably detect, diagnose, and recover from exceptional (and uncertain) situations
  2. Develop approaches to determine the robot's own limitations and ask for assistance
  3. Develop algorithms to explain actions to people
  4. Develop approaches to learn from people

  4. Progress Towards Goals
  1. Develop algorithms to reliably detect, diagnose, and recover from exceptional (and uncertain) situations (ongoing work by Breelyn Kane, Robotics PhD)
  2. Develop approaches to determine the robot's own limitations and ask for assistance (ongoing work by Juan Mendoza, Robotics PhD)
  3. Develop algorithms to explain actions to people (postponed)
  4. Develop approaches to learn from people (PhD thesis "Graph-based Trajectory Planning through Programming by Demonstration," Nik Melchior, defended December 2010, thesis completed August 2012)

  5. Policy Switching to Exceed Reward Thresholds (Breelyn Kane, PhD student, Robotics) "Risk-Variant Policy Switching to Exceed Reward Thresholds," B. Kane and R. Simmons, In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), Sao Paulo, Brazil, June 2012.

  6. Acting to Exceed Reward Thresholds • In Competitive Domains, Second is no Better than Last – “The person that said winning isn’t everything, never won anything” (Mia Hamm) – “If you’re not first, you’re last!” (Ricky Bobby, Talladega Nights) – Arcade games: Not just beating a level, going for top score

  7. Straightforward Approach • Add time and cumulative reward to state space • Generate optimal plan and execute • Significantly increases state space – Planning probably infeasible for real-world domains

  8. Our Approach • Offline: generate policies of varying risk attitude and estimate their reward distributions • Online: switch between policies, calculating at each time step the maximum probability of being over the threshold, based on the current cumulative (discounted) reward

  9. Distribution of Rewards • Our work reasons about the complete, non-parametric reward distribution, including the distribution tails – Estimate the reward distribution P(V(s) ≤ x) by running the policy and gathering statistics
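The slides do not spell out the estimation procedure, so the following is a minimal Monte Carlo sketch of building an empirical return distribution for one policy. The mdp.step and policy interfaces are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def sample_returns(mdp, policy, start_state, n_runs=10000, gamma=1.0):
    """Roll out the policy repeatedly and record each run's (discounted) return.

    Assumes mdp.step(state, action) -> (next_state, reward, done) and
    policy(state) -> action; both are placeholders for illustration.
    """
    returns = np.empty(n_runs)
    for i in range(n_runs):
        state, total, discount, done = start_state, 0.0, 1.0, False
        while not done:
            state, reward, done = mdp.step(state, policy(state))
            total += discount * reward
            discount *= gamma
        returns[i] = total
    return np.sort(returns)  # sorted samples define the empirical CDF

def empirical_cdf(sorted_returns, x):
    """Empirical estimate of P(V(s0) <= x) from the sorted return samples."""
    return np.searchsorted(sorted_returns, x, side="right") / len(sorted_returns)
```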

  10. Switching Decision Criterion • Overall goal: maximize P(R(s_0) ≥ thresh) • At each time step t, switch to the policy π_{t+1} = argmin_π P(V_π(s_t) ≤ thresh - R_{0:t}), where R_{0:t} is the cumulative (discounted) reward accrued so far, i.e., the policy least likely to fall short of the reward still needed to exceed the threshold
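As a rough illustration of the online rule (reusing the empirical CDFs sketched above; the paper's exact formulation may differ), the controller can simply pick, at each step, the policy whose return distribution from the current state gives the smallest probability of falling short of the reward still needed:

```python
def choose_policy(policies, return_cdf, state, cumulative_reward, threshold):
    """Pick the policy most likely to make up the reward still needed to
    exceed the threshold.

    return_cdf[pi](state, x) is assumed to estimate P(V_pi(state) <= x),
    e.g. backed by the Monte Carlo samples sketched above.
    """
    remaining = threshold - cumulative_reward
    # argmin over policies of the probability of falling at or below the gap
    return min(policies, key=lambda pi: return_cdf[pi](state, remaining))
```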

  11. Pizza Delivery Domain: "30 Minutes or It's Free" (figure: policies for the risk-neutral case and risk = 1.2)

  12. Pizza Domain Results • Execute 10,000 runs in the original MDP, same start state every time • Risk-neutral vs. switching (with risky policy, risk = 1.2)
  – Threshold = -100: risk-neutral fails to exceed the threshold 3120 times, switching fails 2166 times; switching fails 9.5% less and reduces losses by 30.6%
  – Threshold = -70: risk-neutral fails to exceed the threshold 8026 times, switching fails 5790 times; switching fails 22.4% less and reduces losses by 27.9%

  13. Augmented State Approach • Augment the state space with cumulative reward – Integer-valued, no discounting – Reward capped to [-150, 0] – Action rewards based on location and current cumulative reward – State space increases by two orders of magnitude
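For comparison, the augmented-state baseline plans over (state, cumulative reward) pairs. A minimal sketch of the augmentation, with the cap taken from the slide and the helper itself purely illustrative:

```python
def augment_state(state, cumulative_reward, cap=(-150, 0)):
    """Pair the original MDP state with the integer, capped cumulative reward
    so the planner can condition its policy on the reward earned so far."""
    lo, hi = cap
    return (state, int(max(lo, min(hi, round(cumulative_reward)))))
```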

  14. Comparison of Approaches • Execute 10,000 runs in the original MDP, same start state every time, no discounting • Augmented space vs. switching (with risky policy, risk = 1.2), threshold = -70
  – Fails to exceed the threshold: risk-neutral 7946 runs, switching 6945 runs, augmented state 6256 runs
  – Augmented state fails 16.9% less than risk-neutral in the original space, reducing losses by 21.2%
  – Augmented state fails 6.8% less than the switching strategy, reducing losses by 9.9%

  15. Comparison of Approaches
                          Augmented State          Risk-Variant Switching
  Planning (offline)      Solve policy: 18 hours   Solve policy: < 1 min; generate reward distribution: 5-10 min; construct CDF: 1 min
                          Total: 18 hours          Total: 12 min × 2 policies = 24 min
  Execution (online)      0.015 s                  Evaluate switch: 0.02 s
  + Augmented state approach performs close to optimal
  - Very large planning time
  - Must re-generate the policy when the threshold changes
  - State space is enormous if discounting is needed

  16. Robust Execution Monitoring (Juan Pablo Mendoza, PhD student, Robotics) "Motion Interference Detection in Mobile Robots," J.P. Mendoza, M. Veloso and R. Simmons, In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, October 2012. "Mobile Robot Fault Detection based on Redundant Information Statistics," J.P. Mendoza, M. Veloso and R. Simmons, In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, October 2012.

  17. Learning to Detect Motion Interference • Learn an HMM from robot data – Includes nominal and Motion Interference (MI) states – Hand-labeled training data – Learn transition probabilities to nominal states – Learn observation probabilities of all states – Transition probability to the MI state is a tunable parameter (figure: HMM state diagram with states Accel, Stop, Constant, Decel, and MI)
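The slide describes the model but not the runtime inference; below is a minimal forward-filtering sketch for tracking the HMM belief online. The transition matrix and observation model are assumed to have been learned from the hand-labeled data, and the entries leading into the MI state carry the tunable transition probability. All names are illustrative.

```python
import numpy as np

def hmm_filter_step(belief, trans, obs_likelihood):
    """One forward-filtering step for the interference HMM.

    belief         : P(state | observations so far), shape (S,)
    trans          : trans[i, j] = P(next state j | current state i);
                     the column for the MI state holds the tunable parameter
    obs_likelihood : P(current observation | state) for each state, shape (S,)

    Returns the updated belief; interference can be flagged when the MI entry
    exceeds a chosen probability threshold.
    """
    predicted = trans.T @ belief           # predict the next hidden state
    updated = predicted * obs_likelihood   # weight by the observation model
    return updated / updated.sum()         # renormalize to a distribution
```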

  18. Learning Behavior Model • Observations – Commanded velocity – Velocity difference: difference between commanded and perceived velocity (from encoders) – Acceleration: linear regression of the last N velocity measurements – Jerk: linear regression of the last M acceleration measurements
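A rough sketch of assembling this observation vector, assuming a fixed sample period dt and illustrative window lengths n and m; the regression-based acceleration and jerk follow the slide's description rather than any published implementation.

```python
import numpy as np

def regression_slope(values, dt):
    """Slope of a least-squares line fit through equally spaced samples."""
    t = np.arange(len(values)) * dt
    return np.polyfit(t, values, 1)[0]

def observation_vector(cmd_vel, perceived_vels, dt, n=5, m=5):
    """Build one HMM observation: commanded velocity, velocity difference,
    and regression-based acceleration and jerk (window sizes n, m illustrative).
    Assumes enough velocity history is available for the sliding windows.
    """
    vel_diff = cmd_vel - perceived_vels[-1]
    accel = regression_slope(perceived_vels[-n:], dt)
    # Jerk: regress over a sliding window of acceleration estimates.
    accels = [regression_slope(perceived_vels[i - n:i], dt)
              for i in range(n, len(perceived_vels) + 1)]
    jerk = regression_slope(accels[-m:], dt)
    return np.array([cmd_vel, vel_diff, accel, jerk])
```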

  19. Example Runs

  20. Overall Performance • Precision/recall measured as the MI transition probability varies • With precision = 1 and recall = 0.93, the median detection time was 0.36 s (mean = 0.647 s)

  21. Detecting Unexpected Anomalies • Basic Idea – Model nominal behavior – Detect significant deviation from nominal – Determine the extent of the anomaly • Make execution monitoring efficient, effective, and informative

  22. Modeling Nominal Behavior • Define a residual function that is (close to) zero during nominal behavior – For instance, velocity difference or difference between estimates of redundant sensors – Future work: Make residual function dependent on current state (e.g., using HMM)

  23. Detecting Deviation from Nominal • Compute the sample mean of the residual function from observed data; under nominal behavior, the sample mean over N observations is approximately Gaussian: r̄ ~ N(0, σ²/N) • Compute the probability that the sample mean is not within ε of zero: a(D) = 2Φ(z) - 1, where z = (|r̄| - ε)·√N / σ and Φ is the standard normal CDF
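A numerical sketch of this test under the reconstruction above, assuming the nominal residual standard deviation sigma and the tolerance eps are given; this is illustrative and may differ from the authors' exact statistic.

```python
import numpy as np
from math import erf, sqrt

def anomaly_score(residuals, sigma, eps):
    """Approximate probability that the residual mean is NOT within eps of zero.

    Under nominal behavior the residual is modeled as zero-mean noise with
    std sigma, so the sample mean over N observations has std sigma / sqrt(N).
    """
    n = len(residuals)
    r_bar = float(np.mean(residuals))
    z = (abs(r_bar) - eps) * sqrt(n) / sigma
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at z
    return max(0.0, 2.0 * phi - 1.0)         # clamp at 0 when well within eps
```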

  24. Estimate Extent of Anomaly • Define a region around the current state – Currently: grid the state space – Future: maintain a continuous state space • Extend the region in the direction that increases a(d(R)) the most – Currently: extend to form axis-aligned hyper-rectangles – Future: general convex-shaped regions • Stop when a locally maximal anomaly region is found – Currently: continue while a(d(R)) is non-decreasing – Future: skip over "gaps"
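A greedy sketch of the current (gridded, axis-aligned) growth procedure. The grid shape and the score function standing in for a(d(R)) are assumed inputs; all names are illustrative.

```python
def grow_anomaly_region(grid_shape, seed_cell, score):
    """Grow an axis-aligned hyper-rectangle of grid cells around seed_cell.

    score(lo, hi) is assumed to return the anomaly measure a(d(R)) of the
    rectangle spanning cells lo[d]..hi[d] in each dimension d.  Extensions
    are kept while the score is non-decreasing, per the slide.
    """
    lo, hi = list(seed_cell), list(seed_cell)
    best = score(lo, hi)
    improved = True
    while improved:
        improved = False
        for d in range(len(grid_shape)):
            for bound, delta in ((lo, -1), (hi, +1)):
                if 0 <= bound[d] + delta < grid_shape[d]:
                    bound[d] += delta                 # tentatively extend this face
                    candidate = score(lo, hi)
                    if candidate >= best:
                        best, improved = candidate, True
                    else:
                        bound[d] -= delta             # revert an unhelpful extension
    return lo, hi, best
```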

  25. Example Runs (figure panels: Pull from Stop, Collision, Push from Stop)

  26. Discovering a Global Anomaly • The region grows and becomes more certain as the robot travels • Ideally, keep upper and lower bounds on the anomaly region

  27. Interaction with Other Groups and Organizations • Received Infinite Mario software and support from John Laird's group at the University of Michigan; interaction with student Shiwali Mohan • Interacted with Sven Koenig (USC/NSF) and former student Yaxin Liu regarding generation of risk-sensitive policies • Interaction with Manuela Veloso (CMU CSD/RI) – co-advising Juan Pablo Mendoza

  28. List of Publications Attributed to the Grant
  • "Risk-Variant Policy Switching to Exceed Reward Thresholds," B. Kane and R. Simmons, In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), Sao Paulo, Brazil, June 2012
  • "Graph-based Trajectory Planning through Programming by Demonstration," Nik Melchior, PhD Thesis, CMU-RI-TR-11-40, August 2012
  • "Motion Interference Detection in Mobile Robots," J.P. Mendoza, M. Veloso and R. Simmons, In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, October 2012
  • "Mobile Robot Fault Detection based on Redundant Information Statistics," J.P. Mendoza, M. Veloso and R. Simmons, In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, October 2012
