Q-Learning • An agent tries an action at a particular state, and evaluates its consequences in terms of the immediate reward or penalty it receives and its estimate of the value of the state to which it is taken. • The paper shows that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
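The condition that all actions are repeatedly sampled in all states is usually met in practice with an exploration strategy such as ε-greedy action selection. A minimal sketch in Python, assuming the action-value table Q (introduced later in these slides) is stored as a dictionary keyed by (state, action); the function name and the epsilon value are illustrative, not from the paper:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest learned value (exploitation)."""
    if random.random() < epsilon:
        return random.choice(actions)   # keeps every action being sampled
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```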
The value of state x under policy π is

V_π(x) = R_x(π(x)) + γ · Σ_y P_xy[π(x)] · V_π(y)

where π(x) is the action recommended by policy π at state x, R_x(π(x)) is the instant reward of taking the action which policy π recommends, P_xy[π(x)] is the probability of moving to the new state y, V_π(y) is the value of the new state y, and γ is the discount factor.
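As an illustration of this formula, V_π can be computed for a fixed policy by iterating the recursion until it stabilizes. A minimal sketch on a hypothetical two-state MDP; the transition probabilities P, rewards R, and discount factor below are made-up values, not from the slides:

```python
# Hypothetical toy MDP with states 0 and 1, each with one recommended action pi(x).
P = {0: {0: 0.2, 1: 0.8},      # P[x][y]: prob. of landing in y after taking pi(x) in x
     1: {0: 0.6, 1: 0.4}}
R = {0: 1.0, 1: 0.0}           # R[x]: instant reward of taking pi(x) in state x
gamma = 0.9                    # discount factor

V = {0: 0.0, 1: 0.0}           # value of each state under policy pi
for _ in range(1000):          # iterate V(x) = R(x) + gamma * sum_y P[x][y] * V(y)
    V = {x: R[x] + gamma * sum(P[x][y] * V[y] for y in V) for x in V}
print(V)
```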
The goal of Q-learning is to find the optimal policy π*, such that for any given state x, π* recommends the action a that maximizes the value of the current state.
To get this optimal policy π*, we build the matrix Q in an incremental way, as in Dynamic Programming. Since we want π to be optimal, it will recommend, with probability 1, the action that attains the best value of the next state, max_a'(Q(y, a')). So the equation becomes:

Q(x, a) = R_x(a) + γ · max_a'(Q(x', a'))

where x' is the state reached by taking action a in state x.
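Q-learning applies a sampled version of this equation after every step, without needing the transition probabilities, by nudging Q(x, a) toward R_x(a) + γ·max_a'(Q(x', a')) with a learning rate. A minimal sketch of that update; the learning rate alpha and the dictionary-based Q table are assumptions for illustration:

```python
def q_update(Q, x, a, reward, x_next, next_actions, alpha=0.5, gamma=0.9):
    """One Q-learning step toward Q(x, a) = R_x(a) + gamma * max_a'(Q(x', a'))."""
    best_next = max((Q.get((x_next, a2), 0.0) for a2 in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(x, a)] = Q.get((x, a), 0.0) + alpha * (target - Q.get((x, a), 0.0))
```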
Data Structures
• Matrix “R” is the reward matrix. R[x][a] denotes the instant reward of performing action a at state x. Only the actions leading to the goal state have a positive reward.
• Matrix “Q” is the brain matrix: it represents the memory of what our agent has learned through experience. Q[x][a] denotes the learned reward of performing action a at state x. Q can be initialized to all zeros.
• However, the size of these matrices depends on the sizes of the action and state spaces, which could be exponential. So, we generally use look-up tables instead, as sketched below.
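A minimal end-to-end sketch of these data structures, using a dictionary as the look-up table for Q. The small graph world, its reward values, and the goal state 5 are made-up examples; only actions leading into the goal carry a positive reward, as described above:

```python
from collections import defaultdict
import random

# R[x][a]: instant reward of taking action a in state x (here an action is simply
# the state it moves to). Only actions leading into the goal state 5 pay +100.
R = {0: {4: 0}, 1: {3: 0, 5: 100}, 2: {3: 0}, 3: {1: 0, 2: 0, 4: 0},
     4: {0: 0, 3: 0, 5: 100}, 5: {1: 0, 4: 0, 5: 100}}

Q = defaultdict(float)          # look-up table instead of a dense |X| x |A| matrix
gamma, alpha = 0.8, 0.5

for episode in range(500):
    x = random.choice(list(R))              # start each episode in a random state
    while x != 5:                           # walk until the goal state is reached
        a = random.choice(list(R[x]))       # explore: pick a random valid action
        x_next = a                          # in this toy world, action == next state
        best_next = max(Q[(x_next, a2)] for a2 in R[x_next])
        Q[(x, a)] += alpha * (R[x][a] + gamma * best_next - Q[(x, a)])
        x = x_next
```

After training, reading out the action with the highest Q[(x, a)] at each state gives the greedy policy toward the goal.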
References • Watkins, C. J. C. H., & Dayan, P. (1992). "Q-learning." Machine Learning, 8(3–4), 279–292.