High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: a Case Study in SZ-Tetris
Wojciech Jaśkowski, Marcin Szubert, Paweł Liskowski, Krzysztof Krawiec
Institute of Computing Science
July 14, 2015
Introduction: RL Perspective
1. Direct policy search (e.g., evolutionary algorithms), good for Tetris and Othello.
2. Value function-based methods (e.g., TD learning), good for Backgammon.
Comparison is difficult: many factors are involved (randomness, environment observability, problem structure, etc.).

Here: policy representation. For high-dimensional representations, are value function-based methods the only option? Modern EAs are capable of searching high-dimensional spaces, e.g., VD-CMA-ES and R1-NES.

Research Question: How do these modern EAs compare to value function-based methods for high-dimensional policy representations?
SZ-Tetris: Domain
SZ-Tetris is a single-player stochastic game, a constrained variant of Tetris (only the S and Z tetrominoes), and a popular yardstick in RL, devised for studying the 'key problems of reinforcement learning'.
- 10 × 20 board
- 17 actions: position + rotation
- 1 point for clearing a line
SZ-Tetris: Motivation
Hard for value function-based methods:
'There are many RL algorithms for approximating the value functions. None of them really work on (SZ-)Tetris, they do not even come close to the performance of the evolutionary approaches.' [1]

Not easy for direct search methods:
Cross-Entropy Method (ca. 117) < hand-coded policy (ca. 183.6).

Need for a better function approximator:
'Challenge #1: Find a sufficiently good feature set (...). A feature set is sufficiently good if CEM (or CMA-ES, or genetic algorithms, etc.) is able to learn a weight vector such that the resulting preference function reaches at least as good results as the hand-coded solution.' [1]

[1] I. Szita and C. Szepesvári. SZ-Tetris as a benchmark for studying key problems of reinforcement learning. In Proceedings of the ICML 2010 Workshop on Machine Learning and Games, 2010.
Preliminaries: State-Evaluation Function and Action Selection
The model is known, so we use a state-evaluation function $V\colon S \to \mathbb{R}$.
Greedy policy w.r.t. $V$:
$$ \pi(s) = \operatorname*{argmax}_{a \in A} V(T(s, a)), $$
where $T$ is the transition model.
Two kinds of evaluation functions:
1. State-value function (estimates the expected future scores from a given state),
2. State-preference function (no interpretation; larger is better).
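For illustration, a minimal sketch of this greedy selection in Python; the names `transition` and `V` are placeholders for a transition model and an evaluation function, not the paper's API:

```python
def greedy_action(state, actions, transition, V):
    """pi(s) = argmax_a V(T(s, a)): evaluate the state reached by each
    legal action and pick the action whose successor evaluates best."""
    return max(actions, key=lambda a: V(transition(state, a)))
```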
Function Approximation
There are $2^{20 \times 10} \approx 10^{60}$ states (an upper bound), so we need a function approximator $V_\theta\colon S \to \mathbb{R}$.
Task: learn the best set of parameters $\theta$.
Weighted Sum of Hand-Designed Features $\phi$
Bertsekas & Ioffe (B&I) features:
1. Height $h_k$ of the $k$-th column of the board, $k = 1, \ldots, 10$.
2. Absolute differences between the heights of consecutive columns.
3. Maximum column height, $\max_k h_k$.
4. Number of 'holes' on the board.
Linear evaluation function of the 21 features:
$$ V_\theta(s) = \sum_{i=1}^{21} \theta_i \phi_i(s). $$
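As a sketch (not the paper's implementation), the B&I features and the linear evaluation can be computed as below; `board` is assumed to be a 2-D 0/1 NumPy array with row 0 at the top, and 'holes' are counted as empty cells below the topmost filled cell of their column:

```python
import numpy as np

def bi_features(board):
    """21 Bertsekas & Ioffe features: 10 column heights, 9 absolute height
    differences, the maximum height, and the number of holes."""
    height, width = board.shape
    # row index of the topmost filled cell in each column (height if the column is empty)
    tops = [next((r for r in range(height) if board[r, c]), height) for c in range(width)]
    heights = [height - t for t in tops]                          # h_1 .. h_10
    diffs = [abs(heights[c + 1] - heights[c]) for c in range(width - 1)]
    holes = sum(int(board[r, c] == 0)
                for c in range(width) for r in range(tops[c] + 1, height))
    return np.array(heights + diffs + [max(heights), holes], dtype=float)

def linear_value(theta, board):
    """V_theta(s) = theta . phi(s)."""
    return float(theta @ bi_features(board))
```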
Systematic n-Tuple Network
Successful for:
1. Othello (Othello League) [Lucas, 2007; Jaśkowski, 2014],
2. Connect-4 [Thill, 2012],
3. 2048 [Szubert, 2015].
A linear weighted function of (a large number of) binary features; computationally efficient.
[Figure: an example 4-tuple covering four board locations (0-3) together with its lookup table (LUT), mapping each binary pattern 0000, 0001, ..., 1111 to a weight.]
$$ V_\theta(s) = \sum_{i=1}^{m} V_i(s) = \sum_{i=1}^{m} \mathrm{LUT}_i\!\left[\mathrm{index}\!\left(s_{loc_{i1}}, \ldots, s_{loc_{in_i}}\right)\right] $$
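A minimal sketch of this evaluation, assuming a binary board stored as a 2-D 0/1 NumPy array; the helper names (`tuple_index`, `ntuple_value`) are illustrative, not the paper's API:

```python
import numpy as np

def tuple_index(board, locs):
    """Read the board cells covered by one n-tuple as a binary number,
    which indexes that tuple's lookup table."""
    idx = 0
    for (r, c) in locs:
        idx = (idx << 1) | int(board[r, c])
    return idx

def ntuple_value(luts, tuples, board):
    """V_theta(s): sum over all tuples of LUT_i[index(covered cells)]."""
    return sum(lut[tuple_index(board, locs)] for lut, locs in zip(luts, tuples))
```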
Systematic n-Tuple Network: Board Coverage
Systematically cover the board with:
1. 3×3-tuples (size n = 9): $|\theta| = 72 \times 2^{9} = 36\,864$ weights,
2. 4×4-tuples (size n = 16): $|\theta| = 68 \times 2^{16} = 4\,456\,448$ weights.
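A sketch of one way to place square tuples systematically over the 10×20 board. The stride parameters are an assumption made here only because a horizontal stride of 2 reproduces the tuple counts from the slide (4 × 18 = 72 for 3×3 and 4 × 17 = 68 for 4×4); the paper's exact placement scheme should be checked against the linked source code:

```python
def square_tuples(height=20, width=10, k=3, v_stride=1, h_stride=2):
    """Enumerate k-by-k square tuples covering the board; each tuple is a
    list of (row, col) locations that can be fed to tuple_index() above."""
    tuples = []
    for top in range(0, height - k + 1, v_stride):
        for left in range(0, width - k + 1, h_stride):
            tuples.append([(top + dr, left + dc)
                           for dr in range(k) for dc in range(k)])
    return tuples

def make_luts(tuples, k):
    """One lookup table of 2^(k*k) weights per tuple."""
    return [np.zeros(2 ** (k * k)) for _ in tuples]
```

With these defaults, len(square_tuples(k=3)) == 72 and len(square_tuples(k=4)) == 68, which matches the weight counts 72 × 2^9 and 68 × 2^16 above.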
Direct Search Methods
Evolution strategies maintaining a multivariate Gaussian probability distribution $\mathcal{N}(\mu, \Sigma)$:
1. Cross-Entropy Method [CEM, Rubinstein, 2004],
2. Covariance Matrix Adaptation Evolution Strategy [CMA-ES, Hansen, 2001]: full matrix $\Sigma$, smart self-adaptation, $O(n^2)$,
3. CMA-ES for high dimensions [VD-CMA-ES, Akimoto, 2014]: $\Sigma = D(I + vv^\top)D$, where $D$ is a diagonal matrix and $v \in \mathbb{R}^n$, $O(n)$.
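To illustrate why the restricted covariance makes VD-CMA-ES scale, here is a minimal sketch of sampling from $\mathcal{N}(\mu, D(I + vv^\top)D)$ in O(n) time and memory; it covers only the sampling step, not the full VD-CMA-ES adaptation of $\mu$, $D$, $v$, and the step size:

```python
import numpy as np

def sample_vd_gaussian(mu, d, v, rng):
    """Draw x ~ N(mu, Sigma) with Sigma = D (I + v v^T) D, D = diag(d),
    without ever forming the n x n covariance matrix."""
    z = rng.standard_normal(len(mu))        # z ~ N(0, I)
    vnorm2 = float(v @ v)
    if vnorm2 == 0.0:
        return mu + d * z                   # degenerate case: Sigma = D^2
    vbar = v / np.sqrt(vnorm2)              # unit vector along v
    beta = np.sqrt(1.0 + vnorm2) - 1.0      # (I + beta vbar vbar^T)^2 = I + v v^T
    y = z + beta * (vbar @ z) * vbar        # y ~ N(0, I + v v^T)
    return mu + d * y                       # x = mu + D y

# usage: x = sample_vd_gaussian(mu, d, v, np.random.default_rng())
```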
Value Function-Based Methods: TD(0)
Learning of V: after a move, the agent gets a new experience $\langle s, a, r, s' \rangle$ and modifies $V$ in response using Sutton's TD(0) update rule:
$$ V(s) \leftarrow V(s) + \alpha \left( r + V(s') - V(s) \right), $$
where $\alpha$ is the learning rate.
General idea: reconcile the values of neighboring states $V(s)$ and $V(s')$, so that in the long run the Bellman equation holds:
$$ V(s) = \max_{a \in A(s)} \left[ R(s, a) + \sum_{s' \in S} P(s, a, s') V(s') \right] $$
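A minimal sketch of one TD(0) step for the n-tuple value function sketched earlier. Since $V_\theta$ is linear in the LUT weights with binary features, the semi-gradient update adds $\alpha\delta$ to every LUT entry active in $s$; the learning rate value is illustrative only:

```python
def td0_update(luts, tuples, s, r, s_next, alpha=0.001):
    """V(s) <- V(s) + alpha * (r + V(s') - V(s)), applied to the n-tuple
    representation: the TD error is spread over the LUT entries active in s.
    (At a terminal s_next, V(s_next) would be taken as 0 -- not shown here.)"""
    delta = r + ntuple_value(luts, tuples, s_next) - ntuple_value(luts, tuples, s)
    for lut, locs in zip(luts, tuples):
        lut[tuple_index(s, locs)] += alpha * delta
```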
Results for Evolutionary Methods
[Figure: learning curves of average score (cleared lines) vs. generation for CEM, CMA-ES, and VD-CMA-ES; left panel: B&I features, right panel: 3×3 tuple network.]
- CEM (B&I features): 117.0 ± 6.3
- CMA-ES (B&I features): 124.8 ± 13.1
- VD-CMA-ES (3×3 tuple network): 219.7 ± 2.8
Results for TD(0)
[Figure: learning curves of average score (cleared lines) vs. training games (×1000) for TD(0); left panel: 3×3 tuple network, right panel: 4×4 tuple network.]
- TD(0), 3×3 tuple network: 183.3 ± 4.3
- TD(0), 4×4 tuple network: 218.0 ± 5.2
- VD-CMA-ES, 3×3 tuple network (for reference): 219.7 ± 2.8
Results Summary
Results are reported as average score ± confidence interval delta.

Algorithm   | Function           | Features  | # Games | Result
Hand-coded  | -                  | -         | -       | 183.6 ± 1.4
CEM         | B&I                | 21        | 20 mln  | 117.0 ± 6.3
CMA-ES      | B&I                | 21        | 20 mln  | 124.8 ± 13.1
VD-CMA-ES   | 3×3-tuple network  | 36 864    | 100 mln | 219.7 ± 2.8
TD(0)       | 3×3-tuple network  | 36 864    | 4 mln   | 183.3 ± 4.3
TD(0)       | 4×4-tuple network  | 4 456 448 | 4 mln   | 218.0 ± 5.2

TD(0) with the 4×4 network shows larger variance across runs; its single best strategy scores nearly 300 points on average.
Best agent play
4×4 TDL agent play
Summary: RL Perspective
1. A high-dimensional representation (the systematic n-tuple network) is needed to make TD work at all on this problem.
2. VD-CMA-ES vs. TD:
   - VD-CMA-ES can work with tens of thousands of parameters (but needs large populations),
   - CEM < TD < VD-CMA-ES (on the 3×3 tuple network),
   - TD vs. VD-CMA-ES is a memory vs. time trade-off.
Source code: http://github.com/wjaskowski/gecco-2015-sztetris