Reinforcement Learning Generalization Using State Aggregation with a Maze-Solving Problem
Mohamed K. Gunady, Walid Gomaa
Department of Computer Science and Engineering
Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
Presented by: Mohamed K. Gunady
Overview
• Introduction
• Q-Learning
• State Aggregation
• SAQL Algorithm
• Experimental Results
• Future Work
• Conclusion
Introduction
• Reinforcement Learning:
  – Learn by trial and error.
  – Interact with the environment.
  – Find the best strategy to maximize the total reward.
  – Based on the MDP model: current state s, action a, next state s', reward R(s, a) (see the interaction sketch below).
• Many algorithms: Q-learning, TD(λ), Sarsa, ...
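A minimal sketch of the agent-environment interaction loop described above; `env`, `agent`, and their methods are hypothetical placeholders for illustration, not from the paper:

```python
# Hypothetical interaction loop (all names are illustrative assumptions).
def run_episode(env, agent, max_steps=1000):
    s = env.reset()                     # current state s
    for _ in range(max_steps):
        a = agent.choose_action(s)      # action a
        s_next, r, done = env.step(a)   # next state s', reward R(s, a)
        agent.update(s, a, r, s_next)   # learn from the <s, a, r, s'> experience
        s = s_next
        if done:
            break
```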
Q-Learning
• Observe experience tuples <s, a, r, s'>.
• Learn an optimal policy π*: S → A.
• Maximize the cumulative discounted reward:
  V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + …   (1)
• Let V*(s) = max_a Q(s, a); then
  Q(s, a) = r(s, a) + γ max_{a'} Q(s', a')   (2)
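A minimal tabular Q-learning sketch implementing update rule (2); the learning rate α and the ε-greedy exploration policy are standard assumptions, not taken from the slides:

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # lookup table for Q(s, a), initialized to 0

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step toward the target of equation (2):
    r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(s, actions, eps=0.1):
    """Explore with probability eps; otherwise act greedily w.r.t. Q."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```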
Q-Learning (Contd.)
• Guaranteed to converge, but with a slow convergence rate.
• Lookup table for Q(s, a): high computation and space complexity.
• Curse of dimensionality in the state-action space.
• Thus, more compact representations are needed; hence, generalization techniques.
State Aggregation
• Generalization techniques:
  – Function approximators, e.g. neural networks.
  – Hierarchical learning.
  – State aggregation.
• Reduces time and storage requirements.
• Assumes a mostly smooth state space, i.e. the values of adjacent states are nearly equal.
  – Combine similar states (a minimal sketch follows).
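In code, aggregation reduces to a many-to-one map from ground states to abstract states; a minimal sketch (the dictionary representation is an assumption for illustration, not the paper's data structure):

```python
# Ground states mapped onto abstract states. Smoothness means all ground
# states inside one abstract state have nearly equal values, so a single
# Q-entry can stand in for all of them.
ground_to_abstract = {
    (0, 0): "x0", (0, 1): "x0",   # four similar neighbouring states
    (1, 0): "x0", (1, 1): "x0",   # combined into abstract state x0
    (0, 2): "x1", (1, 2): "x1",
}

def abstract_state(s):
    # Ungrouped ground states act as their own singleton abstract state.
    return ground_to_abstract.get(s, s)
```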
State Aggregation (Contd.)
• Two questions:
  – How to determine the similarity between states?
  – How to learn over the new state-action space?
• Terminology:
  – Ground space: the actual MDP.
  – Abstract space: the hypothetical reduced MDP.
    • X denotes the abstract state space.
    • A_x denotes the abstract action space.
    • R_x denotes the new reward function.
SAQL Algorithm
• Main steps:
  – Discover similar states.
  – Group them into one abstract state.
  – Learn over this single abstract state instead of the many similar ground states.
• How to decide that a group of states is similar, i.e. consistent, enough to be grouped?
  – When the neighbouring states have consistent reward payoffs.
SAQL Algorithm: Consistency Test
• Let x be an abstract state, and s, s' ground states to be grouped into x.
• Consistency rule for x:
  IF condition (3) holds (see the paper), THEN the group is consistent; construct abstract state x.
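Condition (3) itself appears in the paper rather than on this slide; the sketch below is only a plausible stand-in, assuming consistency means the reward payoffs of the candidate ground states agree within a tolerance ε:

```python
def is_consistent(group, reward, actions, eps=1e-6):
    """Hypothetical consistency test for a candidate abstract state x.

    Assumption (standing in for the paper's condition (3)): every ground
    state in the group yields the same reward payoff, within eps, for
    each action; i.e. the region is smooth enough to aggregate.
    """
    states = list(group)
    ref = states[0]
    return all(
        abs(reward(s, a) - reward(ref, a)) <= eps
        for s in states[1:]
        for a in actions
    )
```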
SAQL Algorithm: Abstract-Ground Mapping
• Abstract action space A_x:
  – An abstract action a_x leads from abstract state x to x', one of the abstract states neighbouring x.
• Map a_x to the equivalent ground actions within x, i.e. the internal actions A_in.
• Internal actions have to be planned algorithmically, not learned.
  – Use a simple group topology, e.g. square-shaped groups (a sketch follows).
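A sketch of how an abstract action could expand into planned internal ground actions for a square-shaped group; the straight-line walk below is an assumption consistent with "planned algorithmically", not the paper's exact procedure:

```python
def internal_actions(pos, group_side, direction):
    """Expand abstract action `direction` ('N', 'S', 'E', 'W') into the
    ground actions that carry the agent from `pos` (row, col inside a
    square group of side `group_side`, 0-indexed from the top-left)
    out through the corresponding group boundary. Illustrative only."""
    row, col = pos
    if direction == "N":
        return ["N"] * (row + 1)            # up to the top edge, then out
    if direction == "S":
        return ["S"] * (group_side - row)
    if direction == "W":
        return ["W"] * (col + 1)
    return ["E"] * (group_side - col)       # direction == "E"
```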
SAQL Algorithm: System Architecture
• See the paper for the full algorithm.
Experimental Results
• Maze-solving problem as a test bed.
• Problem settings:
  – * marks the starting state, given as a (row, col) location.
  – X marks the absorbing (goal) state.
  – Actions: {N, S, E, W}.
  – Reward function (sketched below):
    • +10 for the goal state.
    • -10 for an obstacle state.
    • -0.02 otherwise.
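The reward function above translates directly to code; a small sketch (representing goal and obstacle cells as sets is an assumption for illustration):

```python
def reward(state, goals, obstacles):
    """Maze reward as defined in the problem settings:
    +10 at the goal, -10 on an obstacle, -0.02 per ordinary step."""
    if state in goals:
        return 10.0
    if state in obstacles:
        return -10.0
    return -0.02
```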
Experimental Results: Convergence Rate
• Learning episodes, each containing many iterations.
• 60×60 maze.
Experimental Results: State Space Size
• Different maze sizes: 100, 900, and 3600 states.
• QL suffers a steep increase in the number of iterations as the state space grows.
• SAQL suffers much less.
• Speedup of 3.9× for the 3600-state space, within the first 100 episodes.
Experimental Results: Maze Complexity
• Aggregation depends on the reward function with respect to the neighbouring states.
• The more consistent regions, the higher the aggregation efficiency, and hence the greater the reduction.
• In practice, this is determined by the number and distribution of obstacles.
Experimental Results: Maze Complexity (Contd.)
• 30×30 maze with different obstacle layouts.
Experimental Results: Iteration Complexity
• Trade-off: each learning iteration becomes more complex.
• Extra computational work:
  – The consistency test.
  – Merging states.
  – Mapping between ground/abstract states and actions.
• But usually, taking learning actions is highly costly compared to these computations.
Future Work
• The current grouping technique and group shapes are simple.
• Support arbitrary shapes rather than square ones.
  – How to plan internal actions, and quickly?
• More complex grouping techniques:
  – E.g. allow group breaking and regrouping.
• Extend to probabilistic and dynamic environments.
Conclusion
• RL generalization with state aggregation is promising.
• A modified Q-learning algorithm, SAQL, and its system architecture were introduced.
• Speedup of about 4×.
• 60% state space reduction.
شكراً (Thank You)