Experiments in Value Function Approximation with Sparse Support Vector Regression
Tobias Jung and Thomas Uthmann
{tjung,uthmann}@informatik.uni-mainz.de
Fachbereich Mathematik & Informatik
Johannes Gutenberg-Universität Mainz, Germany
Value Function Approximation with Sparse SVR – ECML 2004 – p. 1/17
Why SVR?
Contents
[Diagram labels: Sparse regressor; SVR; A very small training set; Reduce; (states, values); A very big list of update; add; RL]
Reinforcement Learning I
[Diagram: Agent and Environment exchanging $s_t$, $a_t$, $r_t$, $s_{t+1}$ for $t = 0, 1, 2, \ldots$; the Environment is labelled $P_a(s, s')$, $R_a(s, s')$]
States: $S = \{s_1, \ldots, s_N\}$
Actions: $A = \{a_1, \ldots, a_M\}$
Transition probabilities: $P_a(s, s')$
Rewards: $R_a(s, s')$
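The agent-environment loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state, two-action MDP, all numbers in `P` and `R`, the uniformly random placeholder agent, and the names `env_step` and `rollout` are hypothetical and do not come from the talk.

```python
import random

# Hypothetical MDP with N = 2 states and M = 2 actions; all numbers are made up.
N_STATES, N_ACTIONS = 2, 2
# P[a][s][s2] plays the role of P_a(s, s'); R[a][s][s2] plays the role of R_a(s, s').
P = [[[0.8, 0.2], [0.5, 0.5]],
     [[0.1, 0.9], [0.0, 1.0]]]
R = [[[0.0, 1.0], [1.0, 0.0]],
     [[0.0, 2.0], [0.0, 0.5]]]

def env_step(rng, s, a):
    """Environment: sample s' from P_a(s, .) and return (s', R_a(s, s'))."""
    s2 = rng.choices(range(N_STATES), weights=P[a][s])[0]
    return s2, R[a][s][s2]

def rollout(steps, seed=0):
    """Run the interaction loop for t = 0, 1, 2, ... and record each transition."""
    rng = random.Random(seed)
    s = 0
    trajectory = []
    for t in range(steps):
        a = rng.randrange(N_ACTIONS)  # placeholder agent: uniformly random action
        s2, r = env_step(rng, s, a)
        trajectory.append((t, s, a, r, s2))
        s = s2
    return trajectory
```

Calling `rollout(5)` returns five tuples `(t, s_t, a_t, r_t, s_{t+1})`; each transition's next state is the following transition's current state.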
Reinforcement Learning II
Policy $\pi : S \to A$, discount factor $\gamma$
$V^\pi(s) = E_\pi\Big\{ \sum_{k=0}^{\infty} \gamma^k r_k \,\Big|\, s_t = s, \pi \Big\}, \quad \forall s$
$V^\pi(s) = \sum_{s'} P_{\pi(s)}(s, s') \big[ R_{\pi(s)}(s, s') + \gamma V^\pi(s') \big], \quad \forall s$
$V^*(s) = \max_\pi V^\pi(s), \quad \forall s$
$\pi^* = \operatorname{argmax}_\pi V^\pi$
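As a hedged illustration of the definitions on this slide, the sketch below evaluates $V^\pi$ by repeatedly applying the Bellman equation and then obtains $V^*$ by taking the maximum over all deterministic policies. The tiny MDP, its numbers, the sweep count, and the function names are assumptions made for the example. Brute-force enumeration is only feasible for toy problems and is not the method of the talk.

```python
from itertools import product

GAMMA = 0.9            # discount factor (illustrative value)
STATES = (0, 1)
ACTIONS = (0, 1)
# Hypothetical model: P[a][s][s2] ~ P_a(s, s'), R[a][s][s2] ~ R_a(s, s').
P = {0: [[0.8, 0.2], [0.5, 0.5]], 1: [[0.1, 0.9], [0.0, 1.0]]}
R = {0: [[0.0, 1.0], [1.0, 0.0]], 1: [[0.0, 2.0], [0.0, 0.5]]}

def backup(V, s, a):
    """Right-hand side of the Bellman equation for state s under action a."""
    return sum(P[a][s][s2] * (R[a][s][s2] + GAMMA * V[s2]) for s2 in STATES)

def evaluate_policy(policy, sweeps=2000):
    """Approximate V^pi by iterating the Bellman equation from V = 0."""
    V = [0.0] * len(STATES)
    for _ in range(sweeps):
        V = [backup(V, s, policy[s]) for s in STATES]
    return V

def optimal_values():
    """V*(s) = max over all deterministic policies of V^pi(s)."""
    all_values = [evaluate_policy(pi)
                  for pi in product(ACTIONS, repeat=len(STATES))]
    return [max(V[s] for V in all_values) for s in STATES]
```

For example, `evaluate_policy((0, 1))` returns a vector that satisfies the Bellman equation up to floating-point error, and `optimal_values()` is at least as large in every state.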
Reinforcement Learning III
$V_{t+1}(s) = V_t(s) + \Big[ \underbrace{\sum_{s'} P_{\pi(s)}(s, s') \big[ R_{\pi(s)}(s, s') + \gamma V_t(s') \big]}_{\text{target}} - V_t(s) \Big]$
$V_{t+1}(s) = V_t(s) + \alpha \Big[ \underbrace{r_t + \gamma V_t(s')}_{\text{target (unbiased estimate)}} - V_t(s) \Big]$
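The two updates on this slide differ only in their target: the first uses the full expectation over successor states, the second a single sampled transition. A minimal Python sketch under stated assumptions follows. The two-state model under a fixed policy, the values of `GAMMA` and `ALPHA`, and all function names are hypothetical.

```python
import random

GAMMA = 0.9   # discount factor (illustrative value)
ALPHA = 0.1   # step size alpha (illustrative value)

# Hypothetical model under a fixed policy pi:
# P[s][s2] ~ P_{pi(s)}(s, s'), R[s][s2] ~ R_{pi(s)}(s, s').
P = [[0.8, 0.2], [0.5, 0.5]]
R = [[0.0, 1.0], [1.0, 0.0]]

def full_backup_target(V, s):
    """Model-based target: expectation over all successor states s'."""
    return sum(P[s][s2] * (R[s][s2] + GAMMA * V[s2]) for s2 in range(len(V)))

def td_target(V, r, s2):
    """Sample-based target r_t + gamma * V_t(s')."""
    return r + GAMMA * V[s2]

def td_update(V, s, r, s2, alpha=ALPHA):
    """In-place update V(s) <- V(s) + alpha * (target - V(s))."""
    V[s] += alpha * (td_target(V, r, s2) - V[s])

def run_td(steps, seed=0):
    """Follow sampled transitions and apply the sample-based update at each step."""
    rng = random.Random(seed)
    V = [0.0, 0.0]
    s = 0
    for _ in range(steps):
        s2 = rng.choices([0, 1], weights=P[s])[0]
        td_update(V, s, R[s][s2], s2)
        s = s2
    return V
```

Averaging `td_target` over successor states with weights `P[s]` reproduces `full_backup_target`, which is the sense in which the sampled target is an unbiased estimate.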