Reinforcement Learning Generalization Using State Aggregation with a Maze-Solving Problem
Mohamed K. Gunady, Walid Gomaa
Department of Computer Science and Engineering
Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
Presented by: Mohamed K. Gunady
Overview
• Introduction
• Q-Learning
• State Aggregation
• SAQL Algorithm
• Experimental Results
• Future Work
• Conclusion
Introduction
• Reinforcement Learning:
  – Learn by trial and error.
  – Interact with the environment.
  – Find the best strategy to maximize the total reward.
  – Based on the MDP model: current state s, action a, next state s', reward R(s, a) (see the interaction sketch below).
• Many algorithms: Q-learning, TD(λ), Sarsa, ...
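A minimal sketch of the agent-environment interaction loop described above; `env`, `agent`, and their methods are hypothetical placeholders for illustration, not from the paper:

```python
# Hypothetical interaction loop (all names are illustrative assumptions).
def run_episode(env, agent, max_steps=1000):
    s = env.reset()                     # current state s
    for _ in range(max_steps):
        a = agent.choose_action(s)      # action a
        s_next, r, done = env.step(a)   # next state s', reward R(s, a)
        agent.update(s, a, r, s_next)   # learn from the <s, a, r, s'> experience
        s = s_next
        if done:
            break
```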
Q-Learning
• Observe experience tuples <s, a, r, s'>.
• Learn an optimal policy π*: S → A.
• Maximize the cumulative discounted reward:
  V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + …   (1)
• Let V*(s) = max_a Q(s, a); then
  Q(s, a) = r(s, a) + γ max_{a'} Q(s', a')   (2)
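A minimal tabular Q-learning sketch implementing update rule (2); the learning rate α and the ε-greedy exploration policy are standard assumptions, not taken from the slides:

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # lookup table for Q(s, a), initialized to 0

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step toward the target of equation (2):
    r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(s, actions, eps=0.1):
    """Explore with probability eps; otherwise act greedily w.r.t. Q."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```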
Q-Learning (Contd.)
• Guaranteed to converge, but with a slow convergence rate.
• Lookup table for Q(s, a): high computation and space complexity.
• Curse of dimensionality in the state-action space.
• Thus, more compact representations are needed; hence, generalization techniques.
State Aggregation
• Generalization techniques:
  – Function approximators, e.g. neural networks.
  – Hierarchical learning.
  – State aggregation.
• Reduces time and storage requirements.
• Assumes a mostly smooth state space, i.e. the values of adjacent states are nearly equal.
  – Combine similar states (a minimal sketch follows).
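In code, aggregation reduces to a many-to-one map from ground states to abstract states; a minimal sketch (the dictionary representation is an assumption for illustration, not the paper's data structure):

```python
# Ground states mapped onto abstract states. Smoothness means all ground
# states inside one abstract state have nearly equal values, so a single
# Q-entry can stand in for all of them.
ground_to_abstract = {
    (0, 0): "x0", (0, 1): "x0",   # four similar neighbouring states
    (1, 0): "x0", (1, 1): "x0",   # combined into abstract state x0
    (0, 2): "x1", (1, 2): "x1",
}

def abstract_state(s):
    # Ungrouped ground states act as their own singleton abstract state.
    return ground_to_abstract.get(s, s)
```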
State Aggregation (Contd.)
• Two questions:
  – How to determine the similarity between states?
  – How to learn over the new state-action space?
• Terminology:
  – Ground space: the actual MDP.
  – Abstract space: the hypothetical reduced MDP.
    • X denotes the abstract state space.
    • A_x denotes the abstract action space.
    • R_x denotes the new reward function.
SAQL Algorithm
• Main steps:
  – Discover similar states.
  – Group them into one abstract state.
  – Learn over this single abstract state instead of the many similar ground states.
• How to decide that a group of states is similar, i.e. consistent, enough to be grouped?
  – When the neighbouring states have consistent reward payoffs.
SAQL Algorithm: Consistency Test
• Let x be an abstract state, and s, s' ground states to be grouped into x.
• Consistency rule for x:
  IF condition (3) holds (see the paper), THEN the group is consistent; construct abstract state x.
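Condition (3) itself appears in the paper rather than on this slide; the sketch below is only a plausible stand-in, assuming consistency means the reward payoffs of the candidate ground states agree within a tolerance ε:

```python
def is_consistent(group, reward, actions, eps=1e-6):
    """Hypothetical consistency test for a candidate abstract state x.

    Assumption (standing in for the paper's condition (3)): every ground
    state in the group yields the same reward payoff, within eps, for
    each action; i.e. the region is smooth enough to aggregate.
    """
    states = list(group)
    ref = states[0]
    return all(
        abs(reward(s, a) - reward(ref, a)) <= eps
        for s in states[1:]
        for a in actions
    )
```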
SAQL Algorithm: Abstract-Ground Mapping
• Abstract action space A_x:
  – An abstract action a_x leads from abstract state x to x', one of the abstract states neighbouring x.
• Map a_x to the equivalent ground actions within x, i.e. the internal actions A_in.
• Internal actions have to be planned algorithmically, not learned.
  – Use a simple group topology, e.g. square-shaped groups (a sketch follows).
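A sketch of how an abstract action could expand into planned internal ground actions for a square-shaped group; the straight-line walk below is an assumption consistent with "planned algorithmically", not the paper's exact procedure:

```python
def internal_actions(pos, group_side, direction):
    """Expand abstract action `direction` ('N', 'S', 'E', 'W') into the
    ground actions that carry the agent from `pos` (row, col inside a
    square group of side `group_side`, 0-indexed from the top-left)
    out through the corresponding group boundary. Illustrative only."""
    row, col = pos
    if direction == "N":
        return ["N"] * (row + 1)            # up to the top edge, then out
    if direction == "S":
        return ["S"] * (group_side - row)
    if direction == "W":
        return ["W"] * (col + 1)
    return ["E"] * (group_side - col)       # direction == "E"
```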
SAQL Algorithm: System Architecture
• See the paper for the full algorithm.
Experimental Results
• Maze-solving problem as a test bed.
• Problem settings:
  – * marks the starting state, given as a (row, col) location.
  – X marks the absorbing (goal) state.
  – Actions: {N, S, E, W}.
  – Reward function (sketched below):
    • +10 for the goal state.
    • -10 for an obstacle state.
    • -0.02 otherwise.
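The reward function above translates directly to code; a small sketch (representing goal and obstacle cells as sets is an assumption for illustration):

```python
def reward(state, goals, obstacles):
    """Maze reward as defined in the problem settings:
    +10 at the goal, -10 on an obstacle, -0.02 per ordinary step."""
    if state in goals:
        return 10.0
    if state in obstacles:
        return -10.0
    return -0.02
```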
Experimental Results: Convergence Rate
• Learning episodes, each containing many iterations.
• 60×60 maze.
Experimental Results: State Space Size
• Different maze sizes: 100, 900, and 3600 states.
• QL suffers a steep increase in the number of iterations as the state space grows.
• SAQL suffers much less.
• Speedup of 3.9× for the 3600-state space, within the first 100 episodes.
Experimental Results: Maze Complexity
• Aggregation depends on the reward function with respect to the neighbouring states.
• The more consistent regions, the higher the aggregation efficiency, and hence the greater the reduction.
• In practice, this is determined by the number and distribution of obstacles.
Experimental Results: Maze Complexity (Contd.)
• 30×30 maze with different obstacle layouts.
Experimental Results: Iteration Complexity
• Trade-off: each learning iteration becomes more complex.
• Extra computational work:
  – The consistency test.
  – Merging states.
  – Mapping between ground/abstract states and actions.
• But usually, taking learning actions is highly costly compared to these computations.
Future Work
• The current grouping technique and group shapes are simple.
• Support arbitrary shapes rather than square ones.
  – How to plan internal actions, and quickly?
• More complex grouping techniques:
  – E.g. allow group breaking and regrouping.
• Extend to probabilistic and dynamic environments.
Conclusion
• RL generalization with state aggregation is promising.
• A modified Q-learning algorithm, SAQL, and its system architecture were introduced.
• Speedup of about 4×.
• 60% state space reduction.
شكراً (Thank You)