Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty


  1. Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty
     Youngjin Kim (1, 4), Wontae Nam (3), Hyunwoo Kim (1), Jihoon Kim (2), and Gunhee Kim (1)
     Code available at: http://vision.snu.ac.kr/projects/cb

  2. Motivation: Exploration under Distraction
     Example, navigating a city: (a) a known place; (b) a known place with strangers.
     1. Distractive environments are widespread: real-world observations often contain novel but task-irrelevant information.

  3. Motivation: Exploration under Distraction
     Example, navigating a city: (a) a known place (not novel); (b) a known place with strangers (novel).
     2. Degeneration of prior novelty-based exploration strategies
        - Caused by task-agnostic intrinsic reward
        - Need mechanisms to prioritize task-related novelty

  4. Approach: Curiosity-Bottleneck
     [Diagram components: Environment, Policy, Compressor, Value Predictor, Intrinsic Reward, External Reward]
     Quantify the "degree of compression" using a compressive value network.

  5. Approach: Curiosity-Bottleneck
     Compressor
     - Encode rare observations $x$ to a lengthy code and common observations $x$ to a shorter code
     - Discard information about $x$ during compression
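
The "rare gets a lengthy code, common gets a shorter code" idea can be illustrated with ideal Shannon code lengths, $-\log_2 p(x)$. The sketch below is only an analogy under that assumption: it is not the Curiosity-Bottleneck compressor itself, and the function name `code_lengths` and the toy observation labels are hypothetical.

```python
import math
from collections import Counter


def code_lengths(observations):
    """Ideal Shannon code length in bits, -log2 p(x), per distinct observation."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {x: -math.log2(c / total) for x, c in counts.items()}


# Toy data (hypothetical): one observation is common, the other is rare.
obs = ["hallway"] * 7 + ["new_room"]
lengths = code_lengths(obs)

# The rare observation is assigned the longer code.
print(lengths["new_room"] > lengths["hallway"])
```

Under this analogy, a longer code marks an observation as more novel, which is the quantity the slides propose to use as an exploration signal.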

  6. Approach: Curiosity-Bottleneck
     Value Predictor
     - Prevent the Compressor from discarding task-related information

  7. Approach: Curiosity-Bottleneck
     1. Objective function
        - Minimize the average code length of the representation $Z$
        - Discard information about the observation $X$
          $\min \; H(Z) - H(Z \mid X)$
        - Preserve information related to the value estimate $Y$
          $\max \; I(Z; Y)$
        $\mathcal{L} = -I(Z; Y) + \beta \, I(X; Z)$
     2. Intrinsic reward: per-instance mutual information
        $r^i(x) = \int_z p(z \mid x) \log \frac{p(x, z)}{p(x)\, p(z)} \, dz$
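
As a short worked step, the per-instance mutual information can be rewritten as a KL divergence, and averaging it over observations gives back the compression term $I(X; Z)$ of the objective. The derivation below assumes the symbol reading $X$ for the observation and $Z$ for the representation, and uses only the identity $p(x, z) = p(z \mid x)\, p(x)$.

```latex
\begin{align*}
r^i(x) &= \int p(z \mid x) \log \frac{p(x, z)}{p(x)\, p(z)} \, dz \\
       &= \int p(z \mid x) \log \frac{p(z \mid x)}{p(z)} \, dz
        = \mathrm{KL}\left[ p(z \mid x) \,\|\, p(z) \right], \\
\mathbb{E}_{x \sim p(x)}\left[ r^i(x) \right]
       &= \iint p(x, z) \log \frac{p(x, z)}{p(x)\, p(z)} \, dz \, dx
        = I(X; Z).
\end{align*}
```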

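  8. Approach: Curiosity-Bottleneck
     3. Approximation: variational information bottleneck with Gaussian assumptions
        $\mathcal{L}_{\theta,\phi} = \mathbb{E}_{x,y}\left[ -\log q_\phi(y \mid z) + \beta \, \mathrm{KL}[p_\theta(z \mid x) \,\|\, q(z)] \right]$
        $r^i(x) = \mathrm{KL}[p_\theta(z \mid x) \,\|\, q(z)]$
     [Diagram: for a sample $x_n$ with $z_n \sim p_\theta(z \mid x_n)$, the Compressor outputs $\mu_\theta, \sigma_\theta$ and contributes the term $\mathrm{KL}[p_\theta(z \mid x_n) \,\|\, q(z)]$; the Value Predictor outputs $\mu_\phi, \sigma_\phi$ and contributes the term $-\log q_\phi(y_n \mid z_n)$]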
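
Under Gaussian assumptions both terms have closed forms. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a diagonal-Gaussian compressor output, a standard-normal prior $q(z) = \mathcal{N}(0, I)$, and a scalar Gaussian value predictor. All function names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, sigma):
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over the last axis.

    Assumption: the prior q(z) is a standard normal.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)


def gaussian_nll(y, mu_y, sigma_y):
    """-log q(y | z) for a scalar Gaussian value predictor."""
    return 0.5 * np.log(2.0 * np.pi * sigma_y**2) + (y - mu_y) ** 2 / (2.0 * sigma_y**2)


def per_sample_loss(y, mu_y, sigma_y, mu_z, sigma_z, beta):
    """Value prediction loss plus beta times the compression (KL) term."""
    return gaussian_nll(y, mu_y, sigma_y) + beta * kl_to_standard_normal(mu_z, sigma_z)


# Intrinsic reward is the KL term alone.
familiar = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])  # posterior equals prior
unusual = kl_to_standard_normal([2.0, -1.0], [0.5, 0.5])  # posterior far from prior
print(familiar < unusual)
```

In this sketch an observation whose encoding stays close to the prior earns almost no intrinsic reward, while one that the compressor must encode far from the prior earns more.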

  9. Experiments: Static Environment
     Detects novelty while being robust to distraction.
     [Figure: rows are distraction types (Random Box, Object, Pixel Noise); columns are (a) Input, (b) Ideal, (c) CB, (d) CB-noKL, (e) RND, (f) SimHash; each panel spans a novelty axis and a distraction axis, both from 0.1 to 0.9]

  11. Experiments: Treasure-Hunt
      Grad-CAM visualization: the adaptive exploration strategy
      [Figure panels: (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash]
      The compression loss term $\mathrm{KL}[p_\theta(z \mid x) \,\|\, q(z)]$ induces task-agnostic exploration in early stages.

  12. Experiments: Treasure-Hunt
      Grad-CAM visualization: the adaptive exploration strategy
      [Figure panels: (a) Input, (b) CB-Early, (c) CB, (d) CB-noKL, (e) RND, (f) Dynamics, (g) SimHash]
      The value prediction loss term $-\log q_\phi(y \mid z)$ induces task-specific exploration after collecting external rewards.

  13. Experiments: Treasure-Hunt
      CB consistently outperforms baselines across different distraction settings.
      [Figure: mean episodic reward curves (axis scale 1e6) for SimHash, Dynamics, CB-noKL, RND, and CB; panels (a) Movement Condition, (b) Location Condition]

  14. Experiments: Atari Hard-Exploration Games
      [Figure: SimHash, Dynamics, CB-noKL, RND, and CB on Gravitar, Montezuma, and Solaris, with and without distraction]

  15. Curiosity-Bottleneck: Exploration by Distilling Task-Specific Novelty
      Thank You!
      Poster @ Pacific Ballroom #48
      Code available at: http://vision.snu.ac.kr/projects/cb
