
Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features. Lin F. Yang, Mengdi Wang.


  1. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features. Lin F. Yang, Mengdi Wang

  2. A Basic RL Model: Markov Decision Process • States and actions • Reward • State transition • Policy • Effective horizon • Optimal policy & value • $\epsilon$-optimal policy [symbol definitions and a figure label "random" not preserved]

  3. Curse of Dimensionality • Optimal sample complexity: [formula not preserved] • $|S| = 3^{361}$ • $|S| \ge 256^{256 \times 240}$ • Too many states for most cases … • How to optimally reduce dimensions? Exploiting structures!

  4. Parametric Q-Learning On Feature-Based MDP • Transition is decomposable: $P = \Phi \Psi$ with $P \in \mathbb{R}^{(S \times A) \times S}$, $\Phi$ known, $\Psi$ unknown

  5. Parametric Q-Learning On Feature-Based MDP • Transition is decomposable

  6. Parametric Q-Learning On Feature-Based MDP [figure with example values 0.2, 0.11, 0.3, 0.5, 0.01]

  7. A Simple Regression Based Algorithm • Generative model: we can draw samples from any $(s, a)$ • Represent the Q-function with a parameter $w \in \mathbb{R}^K$: $Q_w(s, a) := r(s, a) + \gamma \phi(s, a)^\top w$, $V_w(s) := \max_{a \in A} Q_w(s, a)$, $\pi_w(s) := \arg\max_{a \in A} Q_w(s, a)$ • Learn $w$ with modified Q-learning • Sample complexity ($K$: feature dimension): $\tilde{O}\left(\frac{K}{\epsilon^2 (1 - \gamma)^7}\right)$

  8. Sample Optimality? • Anchor condition [diagram: $P(\cdot \mid s, a)$ shown together with the anchor distributions $P(\cdot \mid s_1, a_1), \dots, P(\cdot \mid s_6, a_6)$] • Sample complexity: $\tilde{\Theta}\left(\frac{K}{\epsilon^2 (1 - \gamma)^3}\right)$ • arXiv: 1902.04779. Poster: 117
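The parameterization on slides 4 and 7 can be illustrated with a small numerical sketch. It reads the slides as saying that the transition factors as P = ΦΨ and that the Q-function is Q_w(s, a) = r(s, a) + γ φ(s, a)ᵀ w, with V_w(s) = max_a Q_w(s, a) and π_w the greedy policy. Everything else is an assumption made for illustration: the random toy MDP, the helper names (`make_feature_mdp`, `solve_w`, etc.), and above all the use of the exact Ψ, which the slides mark as unknown. The loop below is therefore a model-based fixed-point iteration w ← Ψ V_w, not the sample-based modified Q-learning of the talk.

```python
import numpy as np


def make_feature_mdp(n_states=6, n_actions=3, n_features=2, seed=0):
    """Hypothetical toy MDP whose transition factors as P = Phi @ Psi."""
    rng = np.random.default_rng(seed)
    # phi[s, a] is a probability vector over the K features (assumption).
    phi = rng.dirichlet(np.ones(n_features), size=(n_states, n_actions))
    # psi[k] is a distribution over next states (assumption).
    psi = rng.dirichlet(np.ones(n_states), size=n_features)
    reward = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return phi, psi, reward


def q_values(w, phi, reward, gamma):
    # Q_w(s, a) = r(s, a) + gamma * phi(s, a)^T w, shape (S, A).
    return reward + gamma * (phi @ w)


def value(w, phi, reward, gamma):
    # V_w(s) = max_a Q_w(s, a), shape (S,).
    return q_values(w, phi, reward, gamma).max(axis=1)


def greedy_policy(w, phi, reward, gamma):
    # pi_w(s) = argmax_a Q_w(s, a), shape (S,).
    return q_values(w, phi, reward, gamma).argmax(axis=1)


def solve_w(phi, psi, reward, gamma, n_iters=500):
    """Iterate w <- Psi @ V_w using the exact Psi (illustration only)."""
    w = np.zeros(psi.shape[0])
    for _ in range(n_iters):
        w = psi @ value(w, phi, reward, gamma)
    return w


phi, psi, reward = make_feature_mdp()
w = solve_w(phi, psi, reward, gamma=0.9)
print("w =", w)
print("greedy policy =", greedy_policy(w, phi, reward, 0.9))
```

In this sketch the unknown is only the K-dimensional vector w rather than a table with one entry per state-action pair, which is the dimension reduction that the K-dependent sample complexities on slides 7 and 8 refer to.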
