

  1. Evolving Societies of Learning Autonomous Systems (ESLAS)
     Organic Computing Status Colloquium / Sept 2010
     Franz Rammig, Bernd Kleinjohann, Alexander Jungmann, University of Paderborn

  2. The ESLAS project phase III
     • Goal: Organic coordination and cooperation
     • How to coordinate multiple (contradicting?) goals of one robot?
       → dynamic goal prioritization based on Multi-SMDPs, dependent on the motivation system
     • How to enable non-obtrusive cooperation?
       → use behavior recognition (imitation) to model teammates
       → use those models to "meta-learn" team strategies (cartesian product of the state spaces), dependent on vicinity
     [Figure: one controller/observer architecture per robot, each with LTM (long-term memory), BC (behavior construction), EM (episode memory), EXPL (exploration), DEC (decision), EV (evaluation), and ACT (action capabilities) modules between input and output]
     September 21-22, 2009 · DFG 1183 ORGANIC COMPUTING

  3. Part I: Organic goal coordination
     • Up to now:
     [Figure: single-robot controller/observer architecture with LTM, BC, EM, EXPL, DEC, EV, and ACT modules]

  4. Organic goal coordination
     • Recap:
       – Intrinsic high-level state of the robot
         • Robot's goals defined by means of drives
         • Dependent on perception and time-dependent functions
         • Threshold defines a state of "well-being" or satisfaction
       – Drive examples
         • Battery level
         • Collect items
         • Transport items to base
       – Generation of motivation
         • Vector to the "well-being region"
         • Dynamic drive state → dynamic motivation
     • ESLAS Phase II: pursue the goal most in need, i.e. greedy goal selection
       – Problem: does not pay attention to the dynamics of the drive state
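The drive/motivation mechanism recapped above can be sketched roughly as follows (all names and numbers are illustrative, not taken from the ESLAS implementation): a drive is satisfied inside its well-being region, its motivation grows with the distance to that region, and Phase II's greedy selection simply pursues the most motivated drive.

```python
from dataclasses import dataclass

@dataclass
class Drive:
    """One intrinsic drive, e.g. battery level or items collected."""
    name: str
    level: float       # current perceived value
    target: float      # centre of the "well-being" region
    threshold: float   # half-width: |level - target| <= threshold means satisfied

    def satisfied(self) -> bool:
        return abs(self.level - self.target) <= self.threshold

    def motivation(self) -> float:
        """Distance to the well-being region; 0 when satisfied."""
        return max(0.0, abs(self.level - self.target) - self.threshold)

def greedy_goal(drives):
    """Phase II behaviour: pursue the goal most in need."""
    return max(drives, key=lambda d: d.motivation()).name

drives = [
    Drive("battery", level=20.0, target=100.0, threshold=15.0),  # motivation 65
    Drive("collect", level=3.0, target=10.0, threshold=2.0),     # motivation 5
]
print(greedy_goal(drives))  # battery: the most urgent drive wins
```

The greedy rule ignores how cheaply a less urgent drive could be satisfied on the way, which is exactly what Phase III addresses.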

  5. Organic goal coordination
     • ESLAS Phase III
       – While pursuing one goal, also try to fulfill other goals that are "on your way"
       – Example: if the battery level is low, but the robot can transport an object to its base "with a minor detour" while driving to the battery charging station, then it shall do so
     • Goal coordination (COORD)
       – Keeps track of all the state spaces for the different drives' strategies
       – Maintains a prioritization of those SMDPs dependent on their motivation
       – Switches between the different strategies at runtime
     [Figure: controller/observer architecture extended by a COORD (goal coordination) module that manages one SMDP per drive on top of LTM, EM, EXPL, DEC/EV, and ACT]
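A minimal sketch of the runtime-switching idea (the strategy table and motivation values are hypothetical stand-ins for the learned per-drive SMDP policies): the coordinator re-evaluates the prioritization at every step, so control can move to a different strategy as motivations change.

```python
def coord_step(strategies, motivations, state):
    """Execute one action of the currently highest-prioritized strategy.
    Because the choice is re-evaluated at every step, COORD can switch
    between the per-drive strategies at runtime."""
    active = max(strategies, key=lambda drive: motivations[drive])
    return active, strategies[active](state)

# hypothetical stand-ins for learned per-drive SMDP policies
strategies = {
    "battery":   lambda s: "drive_to_charging_station",
    "transport": lambda s: "carry_item_to_base",
}
motivations = {"battery": 65.0, "transport": 5.0}

active, action = coord_step(strategies, motivations, state=None)
print(active, action)  # battery drive_to_charging_station
```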

  6. Challenge 1: When is a detour worthwhile?
     • Heuristic: a detour is the more worthwhile,
       – the shorter it is → measured by means of state values
       – the more beneficial it is → measured by means of the RL reward
     • BUT: the state value as a measure of "detour length" is problematic
       – ESLAS splits and merges "raw states" into abstract states in order to keep RL tractable
       – Different strategies will have different state space abstractions, which leads us to Challenge 2

  7. Challenge 2: Different states in different strategies
     • How to intelligently handle state changes in different strategies?
     [Figure: drives a, b, and c, each with its own SMDP (SMDP_a, SMDP_b, SMDP_c) and motivation (m_a, m_b, m_c); the same raw situations A and B map to different abstract states (e.g. 78, 65, 81, 67) in each SMDP]

  8. Solving the challenges by three approaches
     • Mv: Priority weighted state value
     • EG: Expected Gain
     • EG-PS: Expected Gain – Primary Secondary

  9. Mv: Priority weighted state value
     • M_V(i) = m_i · v_i
       – m_i: priority of drive i (its motivation)
       – v_i: refined state value of the current abstract state in SMDP_i (higher value = lower timely effort)
     • Chooses drives that are urgent and easy to satisfy
     • Example: with priorities m_a = 3.4, m_b = 2.6, m_c = 2.6 and current state values v_a = 59, v_b = 90:
       M_V(a) = m_a · v_a = 3.4 · 59 ≈ 200 < M_V(b) = m_b · v_b = 2.6 · 90 ≈ 234
     [Figure: value grids of SMDP_a and SMDP_b (state values between 53 and 100) with the robot's current position; drive b is selected]
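The slide's example can be reproduced directly (the helper name is ours; the priorities and state values are the slide's, with decimal commas written as points):

```python
def priority_weighted_value(m, v):
    """M_V(i) = m_i * v_i: weight the drive's priority (motivation m_i)
    by the refined state value v_i of the robot's current abstract state
    in that drive's SMDP (higher value = lower remaining effort)."""
    return m * v

# numbers from the slide's example
m_a, v_a = 3.4, 59   # drive a: higher priority, but far from its goal
m_b, v_b = 2.6, 90   # drive b: lower priority, but almost reached

Mv_a = priority_weighted_value(m_a, v_a)   # ~200
Mv_b = priority_weighted_value(m_b, v_b)   # 234
best = "a" if Mv_a > Mv_b else "b"
print(best)  # b: urgent-and-easy beats urgent-but-hard
```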

  10. EG: Expected Gain

  11. EG-PS: Expected Gain – Primary Secondary
     • Similar to EG, but considers only sequences of two drives: primary and secondary
     • State assignment by EG
     • Considers mainly the primary drive (similar to greedy), but also allows for a detour for another one (secondary)
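One way to read the EG vs. EG-PS trade-off in code, as a sketch under the assumption that some expected-gain estimate `gain(sequence)` is available (the toy gain function below is entirely hypothetical): EG scores every ordering of all drives, while EG-PS scores only ordered (primary, secondary) pairs, shrinking a factorial search to a quadratic one.

```python
from itertools import permutations

def eg_choice(drives, gain):
    """EG: score every ordering of all drives (factorially many)."""
    return max(permutations(drives), key=gain)

def eg_ps_choice(drives, gain):
    """EG-PS: score only ordered (primary, secondary) pairs."""
    return max(permutations(drives, 2), key=gain)

# toy expected-gain stand-in: drives pursued earlier count more
urgency = {"battery": 65, "transport": 5, "collect": 12}
def gain(seq):
    return sum(urgency[d] / (pos + 1) for pos, d in enumerate(seq))

drives = list(urgency)
print(eg_ps_choice(drives, gain))  # ('battery', 'collect')
print(eg_choice(drives, gain))     # ('battery', 'collect', 'transport')
```

With three drives both variants agree on the primary drive; the point of EG-PS is that for larger drive sets the pairwise scoring stays cheap while keeping most of EG's precision.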

  12. Evaluation example scenario 1
     [Figure: average cumulative degree of "unsatisfaction" (y-axis roughly 100–200) over 10 trials for Greedy, Mv, EG, EG-PS, and Fixed; annotated improvements: 39%, 39%, 39%, and 27%]

  13. Evaluation example scenario 2
     [Figure: average cumulative degree of "unsatisfaction" (y-axis roughly 10–14) over 10 trials for Greedy, Fixed, Mv, EG, and EG-PS; annotated improvements: 22%, 22%, 22%, and 11%]

  14. Organic coordination: Results
     • Organic coordination solves the problem of efficiently selecting a robot's actions based on SMDPs in the presence of dynamically prioritized goals
     • Mv weights the current priority (drive) of a goal with the value of the active state in the corresponding SMDP
       – Solely doing so is suboptimal: it does not handle the consequences that emerge when reaching a goal state of a selected strategy
     • The refined "EG" considers all possible sequences of the goals
       – Higher precision, but slower runtime
     • The approximation EG-PS achieves EG accuracy, but with acceptable runtime

  15. Part II: Organic cooperation (ongoing work)
     • Cooperation methods (MRS taxonomy: Farinelli et al., 2004)
       – Communication
         • Robots exchange action plans prior to execution
         • Locally choose the joint action with the highest expected utility
       – Social laws
         • Conventions specified at design time
         • Restrict robots' decisions in coordinated actions (stigmergy)
       – Learning methods
         • Deriving knowledge from the experience of repeated interactions
         • Typically need to "look into the teammates' brains"
         → Here, we can do better!

  16. Organic cooperation
     • What we already can do: "understanding" of another robot's behavior in terms of our own capabilities
     • What we need:
       1. Applying this continuously to all robots, all the time, to build models
       2. Combining these models on-demand with our own strategy

  17. Organic cooperation
     • How to model the behavior of teammates?
       – Assumption "my teammate is similar to me" → model deviations from the expected behavior
       – Assumption "my teammate is different from me (knows nothing)" → model the complete behavior
       – Important not because of the model size, but because of the "approximation speed"
     • Mixing individual and teammate strategy
       – If a teammate is exploring, it should not be modeled
       – Determine whether a teammate is in exploration or exploitation mode dependent on its "strategy variance"
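The "strategy variance" test can be sketched like this (a hypothetical realization, not the project's algorithm: we treat a teammate as exploiting in a state when one action clearly dominates its recent choices there, and only then update the model):

```python
from collections import Counter, defaultdict

class TeammateModel:
    """Sketch of 'model only when exploiting': an observed (state, action)
    pair updates the model only while the teammate's recent action choice
    in that state is consistent enough (low 'strategy variance')."""

    def __init__(self, window=10, consistency=0.7):
        self.history = defaultdict(list)   # state -> recent observed actions
        self.model = {}                    # state -> assumed policy action
        self.window = window
        self.consistency = consistency     # dominant-action share threshold

    def is_exploiting(self, state) -> bool:
        recent = self.history[state][-self.window:]
        if len(recent) < 3:
            return False                   # too little evidence yet
        _, count = Counter(recent).most_common(1)[0]
        return count / len(recent) >= self.consistency

    def observe(self, state, action):
        self.history[state].append(action)
        if self.is_exploiting(state):
            recent = self.history[state][-self.window:]
            self.model[state] = Counter(recent).most_common(1)[0][0]

m = TeammateModel()
for a in ["push", "push", "pull", "push", "push"]:
    m.observe("at_box", a)
print(m.model.get("at_box"))  # push: consistent enough to be modeled
```

A teammate that keeps changing actions in the same state never crosses the consistency threshold, so its (exploratory) behavior never pollutes the model.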

  18. Organic cooperation – the algorithm 1/2

  19. Organic cooperation – the algorithm 2/2

  20. Organic cooperation
     • Non-obtrusive coordination
     • Benefits
       – Robustness: no predefined communication needed
       – Adaptability: a change of goal/behavior will be reflected in a change of the teammate model
     • Approach
       – Use behavior recognition from imitation to model teammates (possible with minor algorithm changes)
       – Same assumption: shared goals

  21. Real world evaluation
     [Figure: live video stream stitched from 8 GigE video cameras, the robot's local video (live), and a drag-to-move robot control]

  22. Conclusion
     • ESLAS Phase III moves on to support robustness in addition to the learning speed of robot groups
       – Organic coordination intelligently handles multiple (possibly contradicting) goals
       – Organic cooperation enables robots to cooperate with each other (even if the teammate was not designed to do so)
