Extending Semantic and Episodic Memory to Support Robust Decision Making (PowerPoint presentation)


1. Extending Semantic and Episodic Memory to Support Robust Decision Making (FA2386-10-1-4127)
PI: John E. Laird (University of Michigan)
Graduate Students: Nate Derbinsky, Mitchell Bloch, Mazin Assanie
AFOSR Program Review: Mathematical and Computational Cognition Program; Computational and Machine Intelligence Program; Robust Decision Making in Human-System Interface Program (Jan 28 – Feb 1, 2013, Washington, DC)

2. Extending Semantic and Episodic Memory (John Laird)
Objective: Develop algorithms that support effective, general, and scalable long-term memory:
1. Effective: retrieves useful knowledge
2. General: effective across a variety of tasks
3. Scalable: supports large amounts of knowledge and long agent lifetimes, with manageable growth in memory and computational requirements
Technical Approach:
1. Analyze multiple tasks and domains to determine exploitable regularities.
2. Develop algorithms that exploit those regularities.
3. Embed within a general cognitive architecture.
4. Perform formal analyses and empirical evaluations across multiple domains.
Budget ($K, actual/planned): FY11: $99 / $158; FY12: $195 / $165; FY13: $205 / $176
Annual Progress Report Submitted? FY11: Y; FY12: Y; FY13: N
Project End Date: June 29, 2013
DoD Benefit: Develop science and technology to support:
• Intelligent knowledge-rich autonomous systems that have long-term existence, such as autonomous vehicles (ONR, DARPA: ACTUV).
• Large-scale, long-term cognitive models (AFRL)

3. List of Project Goals
1. Episodic memory (experiential & contextualized)
– Expand functionality
– Improve efficiency of storage (memory) and retrieval (time)
2. Semantic memory (context independent)
– Enhance retrieval
– Automatic generalization
3. Cognitive capabilities that leverage episodic and semantic memory functionality
– Reusing prior experience, noticing familiar situations, …
4. Evaluate on real-world domains
5. Extended goal
– Competence-preserving selective retention across multiple memories

4. Progress Towards Goals
1. Episodic memory
– Expand functionality (recognition) [AAAI 2012b]
– Improve efficiency of storage (memory) and retrieval (time)
• Exploits temporal contiguity, structural regularity, high cue structural selectivity, high temporal selectivity, and low cue-feature co-occurrence
• For many different cues and many different tasks, no significant slowdown with experience: runs for days of real time (tens of millions of episodes), faster than real time [ICCBR 2009; BRIMS 2011; AAMAS 2012]
2. Semantic memory
– Enhance retrieval
• Evaluated multiple bias functions; conclude base-level (exponential) activation works best
• Developed an efficient approximate algorithm that maintains high (>90%) validity: 30-100x faster than prior (non-base-level-activation) retrieval algorithms, on a 3x larger data set, with sublinear slowdown as memory size increases
• Exploits small node outdegree and high selectivity, but not low co-occurrence of cue features [ICCM 2010; AISB 2011; AAAI 2011]
• Current research: how to use context; collaboration with Braden Phillips (University of Adelaide) on special-purpose hardware to support spreading activation in semantic memory
– Automatic generalization
• Current research: leverage data maintained for episodic memory
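The base-level (exponential) activation bias mentioned above can be sketched as follows. This is the standard ACT-R-style formula (the log of summed power-law decay over past access times); the function name and the decay value of 0.5 are illustrative assumptions, not details taken from the slides.

```python
import math

def base_level_activation(access_times, now, decay=0.5):
    """ACT-R-style base-level activation: ln(sum over past accesses t_j
    of (now - t_j)^-decay). decay=0.5 is a conventional default
    (assumption, not from the slides)."""
    return math.log(sum((now - t) ** -decay for t in access_times))

# A memory accessed recently and often outranks one touched long ago:
recent = base_level_activation([5.0, 8.0, 9.0], now=10.0)
stale = base_level_activation([1.0], now=10.0)
```

The efficiency result above concerns computing retrievals under such a bias approximately, without exactly rescoring every candidate; that approximation algorithm is not reproduced here.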

5. Progress Towards Goals
3. Cognitive capabilities that leverage episodic and semantic memory functionality
– Episodic memory
• Seven distinct capabilities: recognition, prospective memory, virtual sensing, action modeling, … [BRIMS 2011; ACS 2011b; AAAI 2012a]
– Semantic memory
• Support reconstruction of forgotten working memory [ACS 2011; ICCM 2012a]
4. Evaluate on real-world domains
– Episodic memory
• Multiple domains including mobile robotics, games, planning problems, linguistics [BRIMS 2011; AAAI 2012a]
– Semantic memory
• Word sense disambiguation, mobile robotics [ICCM 2010; BRIMS 2011; AAAI 2011]
5. Competence-preserving retention/forgetting
– Working memory
• Automatic management of working memory to improve the scalability of episodic memory, utilizing semantic memory [ACS 2011; ICCM 2012b; Cog Sys 2013]
– Procedural memory
• Automatic management of procedural memory using the same algorithms as in working-memory management [ICCM 2012b; Cog Sys 2013]

6. New Goals
• Dynamic determination of value functions for reinforcement learning to support robust decision making [ACS 2012; AAAI submitted]

7. Overview
• Goal:
– Online learning and decision making in novel domains with very large state spaces
– No a priori knowledge of which features are most important
• Approach:
– Reinforcement learning with adaptive value-function determination using hierarchical tile coding
– Only online, incremental methods need apply!
• Hypothesis:
– Will lead to more robust decision making and learning over small changes to the environment and task

8. Reinforcement Learning for Action Selection
• Choose an action based on the expected (Q) value stored in a value function
– The value function maps from situation-action pairs to expected values.
• The value function is updated based on the reward received and the expected future reward (Q-learning: off-policy)
[Diagram: a value function mapping (s_i, a_j) → q_ij; the agent in state S_1 selects among actions a_1-a_4 (e.g., (s_2, a_4)), receives a reward, and perception & internal structures yield state S_2 with actions a_3 and a_5]
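The off-policy Q-learning update described above can be sketched as a minimal tabular version; the state/action names and the learning and discount rates are illustrative, not values from the slides:

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy Q-learning: move Q(s, a) toward the received reward
    plus the discounted best Q-value available from the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)  # (situation, action) -> expected value q_ij
q_update(Q, s="S1", a="a4", reward=1.0, s_next="S2", actions=["a3", "a5"])
```

Because the update bootstraps from the best next-state action rather than the action actually taken, learning is off-policy, as the slide notes.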

9. Value Function for Large State Spaces
• (s_i, a_j) → q_ij
• s_i = (f_1, f_2, f_3, f_4, f_5, f_6, …, f_n)
• Usually only a subset of the features are relevant
• Including irrelevant features slows learning
• Excluding relevant features yields suboptimal asymptotic performance
• How to get the best of both?
• First step: hierarchical tile coding (Sutton & Barto, 1998)
• Initial results for propositional representations in Puddle World and Mountain Car
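Hierarchical tile coding can be sketched as follows for a 2-D continuous state such as Puddle World's (x, y) position. The doubling resolutions mirror the 2x2 through 64x64 tilings used in later slides; the function names are my own:

```python
def tile_of(x, y, grid):
    """Index of the tile containing (x, y) in a grid x grid tiling
    of the unit square."""
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return (grid, row, col)

def active_tiles(x, y, grids=(2, 4, 8, 16, 32, 64)):
    """One active tile per level of the hierarchy, coarse to fine."""
    return [tile_of(x, y, g) for g in grids]
```

Each state thus activates one tile per resolution: the coarse tiles generalize aggressively across the state space, while the fine tiles can capture detail where it matters.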

10. Puddle World
[Figure: the Puddle World domain]

11. Puddle World
[Figure: 2x2, 4x4, and 8x8 tilings of Puddle World]
• Q-value for (s_i, a_j) = Σ_t q(s_it, a_j), summed over the tiles containing s_i (as opposed to averaged)
• More abstract tilings (2x2) get more updates, which form the baseline for subtilings
• The update is distributed across all tiles that contribute to the Q-value
– Explored a variety of distributions: 1/sqrt(updates), even, 1/updates, …
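The summed Q-value and distributed update on this slide can be sketched as follows. The 1/sqrt(updates) credit weighting is one of the distributions the slide lists; the bookkeeping structure is an illustrative assumption:

```python
import math
from collections import defaultdict

weights = defaultdict(float)  # (tile, action) -> contribution to Q
updates = defaultdict(int)    # (tile, action) -> update count

def q_value(tiles, a):
    # Q is the SUM of contributions from every active tile (not the average)
    return sum(weights[(t, a)] for t in tiles)

def distribute(tiles, a, delta):
    """Split a TD update across contributing tiles, giving more credit
    to tiles with fewer prior updates (1/sqrt(updates) weighting)."""
    credit = [1.0 / math.sqrt(1 + updates[(t, a)]) for t in tiles]
    total = sum(credit)
    for t, c in zip(tiles, credit):
        weights[(t, a)] += delta * c / total
        updates[(t, a)] += 1
```

Summing rather than averaging lets the heavily updated coarse tiles serve as a baseline that the finer subtilings only need to correct.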

12. Puddle World: Single-Level Tilings
[Chart: cumulative reward per episode (0 to -100,000) vs. actions (thousands, 0-200) for single-level tilings 4x4, 8x8, 16x16, 32x32, 64x64]

13. Puddle World: Single-Level Tilings, Expanded
[Chart: cumulative reward per episode (0 to -7,000) vs. actions (thousands, 0-50) for tilings 4x4, 8x8, 16x16]

14. Puddle World: Includes Static Hierarchical Tiling 1-64
[Chart: cumulative reward per episode (0 to -7,000) vs. actions (thousands, 0-50) for 4x4, 8x8, 16x16, and the static hierarchical 1-64 tiling]

15. Mountain Car
[Figure: the Mountain Car domain]

16. Mountain Car: Static Tilings
[Chart: cumulative reward per episode (0 to -7,000) vs. actions (thousands, 0-1,000) for tilings 16x16, 32x32, 64x64, 128x128, 256x256]

17. Mountain Car: Static Tilings, Expanded
[Chart: cumulative reward per episode (0 to -7,000) vs. actions (thousands, 0-100) for tilings 16x16, 32x32, 64x64, 128x128, 256x256]

18. Mountain Car: Includes Static Hierarchical Tiling
[Chart: cumulative reward per episode (0 to -7,000) vs. actions (thousands, 0-100) for 16x16 through 256x256 and the static hierarchical 1-256 tiling]

19. Why Does Hierarchical Tiling Work?
• Abstract Q-values serve as a starting point for learning more specific Q-values, so the specific values require less learning
• Exploits a locality assumption
– There is continuity in the mapping from feature space to Q-values at multiple levels of refinement

20. For Large State Spaces, How to Avoid Huge Memory Costs?
• Hypothesis: non-uniform tiling is sufficient
• How to do this incrementally and online?
• Split a tile if its mean Cumulative Absolute Bellman Error (CABE) is half a standard deviation above the overall mean
– CABE is accumulated in proportion to the credit assignment and the learning rate.
– The mean and standard deviation of CABE are tracked 100% incrementally at low computational cost
• The result is an incremental, online algorithm
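The split test above needs a running mean and standard deviation of CABE across tiles; Welford's online algorithm tracks both incrementally in O(1) per update. The half-sigma threshold comes from the slide, but the class structure and exact accumulation details here are assumptions:

```python
import math

class CabeStats:
    """Incremental mean/std of per-tile Cumulative Absolute Bellman
    Error, via Welford's online algorithm (O(1) per update)."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, cabe):
        self.n += 1
        d = cabe - self.mean
        self.mean += d / self.n
        self.m2 += d * (cabe - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def should_split(self, tile_cabe):
        # split a tile whose CABE is half a standard deviation above the mean
        return self.n > 1 and tile_cabe > self.mean + 0.5 * self.std()
```

Because no pass over stored samples is ever needed, the split decision stays compatible with the strictly online, incremental setting the slides require.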

21. Puddle World
[Figure: non-uniform adaptive tilings of Puddle World, refining from 1x1 through 2x2, 4x4, and 8x8]

22. Analysis and Expected Results
• Might lose performance because it takes time to "grow" the tiling.
• Might gain performance because updates are not wasted on useless details.
• Expect many fewer "active" Q-values

23. Puddle World: Static Hierarchical Tiling, Reward and Memory Usage
[Chart: cumulative reward per episode (left axis, 0 to -2,000) and number of Q-values (right axis, 0 to 60,000) vs. actions (thousands, 0-20) for the static 1-64 tiling]

24. Puddle World: Static and Dynamic Hierarchical Tiling, Reward and Memory Usage
[Chart: cumulative reward per episode (left axis, 0 to -2,000) and number of Q-values (right axis, 0 to 60,000) vs. actions (thousands, 0-20), comparing the static and dynamic 1-64 tilings]
