Recap: Q-Learning with state abstraction
• Using a feature representation, we can write a Q function (or value function) for any state using a few weights:
  V(s) = w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s)
  Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + … + w_n f_n(s,a)
• Advantage: our experience is summed up in a few powerful numbers
• Disadvantage: states may share features but actually be very different in value!
Function Approximation
  Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + … + w_n f_n(s,a)
• Q-learning with linear Q-functions: transition = (s, a, r, s')
  difference = [r + γ max_{a'} Q(s',a')] - Q(s,a)
  Exact Q's:        Q(s,a) ← Q(s,a) + α · difference
  Approximate Q's:  w_i ← w_i + α · difference · f_i(s,a)
• Intuitive interpretation:
  – Adjust weights of active features
  – E.g. if something unexpectedly bad happens, disprefer all states with that state's features
• Formal justification: online least squares
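To make the two update rules concrete, here is a minimal Python sketch (not from the slides); the feature dictionary, helper names, and parameters are placeholders for whatever a real agent would supply.

```python
# Minimal sketch of approximate Q-learning with a linear Q-function.
# `features` maps feature names to f_i(s,a); weights, alpha, gamma are placeholders.

def q_value(weights, features):
    """Q(s,a) = sum_i w_i * f_i(s,a)."""
    return sum(weights[name] * value for name, value in features.items())

def update(weights, features, reward, max_next_q, alpha, gamma):
    """One step: w_i <- w_i + alpha * difference * f_i(s,a)."""
    difference = (reward + gamma * max_next_q) - q_value(weights, features)
    for name, value in features.items():
        weights[name] += alpha * difference * value
    return difference
```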
Example: Q-Pacman
  Q(s,a) = 4.0 f_DOT(s,a) - 1.0 f_GST(s,a)
  f_DOT(s, NORTH) = 0.5,  f_GST(s, NORTH) = 1.0
  Q(s, NORTH) = +1
  Action: a = NORTH;  reward: r = R(s, a, s') = -500
  difference = -501
  w_DOT ← 4.0 + α [-501] 0.5
  w_GST ← -1.0 + α [-501] 1.0
  Q(s,a) = 3.0 f_DOT(s,a) - 3.0 f_GST(s,a)
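A quick numeric check of this update in Python. The learning rate is not legible on the slide; α = 0.004 is inferred here from the resulting weights (3.0 and -3.0), and the max over next-state actions is taken to be 0 because the episode ends.

```python
# Numeric check of the Q-Pacman update (alpha = 0.004 is an inferred assumption).
alpha, gamma = 0.004, 1.0
w_dot, w_gst = 4.0, -1.0
f_dot, f_gst = 0.5, 1.0                      # features of (s, NORTH)
q_sa = w_dot * f_dot + w_gst * f_gst         # = 1.0
difference = -500 + gamma * 0.0 - q_sa       # = -501 (terminal state: no future value)
w_dot += alpha * difference * f_dot          # -> ~3.0
w_gst += alpha * difference * f_gst          # -> ~-3.0
print(round(w_dot, 2), round(w_gst, 2))      # 3.0 -3.0, up to rounding
```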
Today: Reasoning over Time
• Often, we want to reason about a sequence of observations
  – Speech recognition
  – Robot localization
  – User attention
  – Medical monitoring
• Need to introduce time into our models
• Basic approach: hidden Markov models (HMMs)
• More general: dynamic Bayes' nets
Markov Models
• A Markov model is a chain-structured BN
  – Conditional probabilities are the same at every time step (stationarity)
  – Value of X at a given time is called the state
  – As a BN: p(X_1) and p(X_t | X_{t-1})
  – Parameters: called transition probabilities or dynamics; they specify how the state evolves over time (also, initial state probabilities)
Example: Markov Chain
• Weather:
  – States: X = {rain, sun}
  – Transitions: given by a state diagram (figure)
  – Initial distribution: 1.0 sun
  – What's the probability distribution after one step?
    p(X_2 = sun) = p(X_2 = sun | X_1 = sun) p(X_1 = sun) + p(X_2 = sun | X_1 = rain) p(X_1 = rain)
                 = 0.9 · 1.0 + p(X_2 = sun | X_1 = rain) · 0.0 = 0.9
Forward Algorithm
• Question: What's p(X) on some day t?
  p(x_t) = Σ_{x_{t-1}} p(x_t | x_{t-1}) p(x_{t-1}),   with p(x_1) known
• Forward simulation
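A small sketch of forward simulation for the two-state weather chain. p(sun | sun) = 0.9 comes from the example above; p(sun | rain) = 0.3 is an assumed illustrative value, since the full transition diagram is only given as a figure.

```python
# Forward simulation of the sun/rain Markov chain.
# p(sun|sun) = 0.9 is from the slides; p(sun|rain) = 0.3 is an assumed example value.
T = {('sun', 'sun'): 0.9, ('rain', 'sun'): 0.1,
     ('sun', 'rain'): 0.3, ('rain', 'rain'): 0.7}   # T[(x_t, x_{t-1})] = p(x_t | x_{t-1})

def forward_step(prior):
    """One step of p(x_t) = sum_{x_{t-1}} p(x_t | x_{t-1}) p(x_{t-1})."""
    return {x: sum(T[(x, prev)] * p for prev, p in prior.items()) for x in ('sun', 'rain')}

dist = {'sun': 1.0, 'rain': 0.0}      # initial distribution: definitely sun
for t in range(2, 6):
    dist = forward_step(dist)
    print(t, dist)                    # the distribution drifts toward the stationary one
```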
Example
• From initial observation of sun: p(X_1), p(X_2), p(X_3), …, p(X_∞) (figures)
• From initial observation of rain: p(X_1), p(X_2), p(X_3), …, p(X_∞) (figures)
Stationary Distributions
• If we simulate the chain long enough:
  – What happens?
  – Uncertainty accumulates
  – Eventually, we have no idea what the state is!
• Stationary distributions:
  – For most chains, the distribution we end up in is independent of the initial distribution
  – Called the stationary distribution of the chain
  – Usually, we can only predict a short time out
Computing the stationary distribution
• p(X = sun) = p(X = sun | X_{-1} = sun) p(X = sun) + p(X = sun | X_{-1} = rain) p(X = rain)
• p(X = rain) = p(X = rain | X_{-1} = sun) p(X = sun) + p(X = rain | X_{-1} = rain) p(X = rain)
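As a concrete check, a short sketch that solves these fixed-point equations for the two-state weather chain, using the same illustrative transition values as above (p(sun | sun) = 0.9 from the slides, p(sun | rain) = 0.3 assumed).

```python
# Solve the stationary-distribution equations for a two-state chain.
# pi_sun = p(sun|sun) * pi_sun + p(sun|rain) * pi_rain, with pi_sun + pi_rain = 1.
p_ss, p_sr = 0.9, 0.3              # p(sun|sun) from the slides; p(sun|rain) is assumed
pi_sun = p_sr / (1 - p_ss + p_sr)  # rearranging the fixed-point equation
print(pi_sun, 1 - pi_sun)          # 0.75 0.25 for these numbers
```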
Web Link Analysis
• PageRank over a web graph
  – Each web page is a state
  – Initial distribution: uniform over pages
  – Transitions:
    • With prob. c, uniform jump to a random page (dotted lines, not all shown)
    • With prob. 1-c, follow a random outlink (solid lines)
• Stationary distribution
  – Will spend more time on highly reachable pages
  – Somewhat robust to link spam
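A small sketch of this idea: power iteration on a toy three-page graph. The graph and the value c = 0.15 are made up for illustration, not the slides' example.

```python
# PageRank as the stationary distribution of a random surfer on a toy graph.
# The 3-page graph and c = 0.15 are illustrative assumptions.
links = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}   # outlinks per page
pages = list(links)
c = 0.15                                            # prob. of a uniform random jump

rank = {p: 1.0 / len(pages) for p in pages}         # start uniform
for _ in range(100):
    new = {p: c / len(pages) for p in pages}        # random-jump mass
    for p in pages:
        for q in links[p]:                          # follow a random outlink with prob. 1 - c
            new[q] += (1 - c) * rank[p] / len(links[p])
    rank = new
print(rank)                                         # pages that are easy to reach get more mass
```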
Restrictiveness of Markov models
• Are past and future really independent given the current state?
• E.g., suppose that when it rains, it rains for at most 2 days
  – The chain X_1 → X_2 → X_3 → X_4 → … then behaves as a second-order Markov process
• Workaround: change the meaning of "state" to the events of the last 2 days
  – … → (X_1, X_2) → (X_2, X_3) → (X_3, X_4) → (X_4, X_5) → …
• Another approach: add more information to the state
  – E.g., the full state of the world would include whether the sky is full of water
  – Additional information may not be observable
  – Blowup of the number of states…
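To see the workaround concretely, here is a tiny sketch that turns a second-order model into a first-order chain by pairing consecutive states; all probabilities below are placeholders chosen for illustration.

```python
# Sketch: turning a second-order weather model into a first-order Markov chain
# by augmenting the state to (yesterday, today). Probabilities are placeholders.
from itertools import product

days = ['rain', 'sun']
aug_states = list(product(days, days))          # augmented states: (yesterday, today)

def second_order(yesterday, today, tomorrow):
    """Placeholder model encoding 'it rains for at most 2 days in a row'."""
    if yesterday == today == 'rain':
        return 1.0 if tomorrow == 'sun' else 0.0
    return 0.5                                   # otherwise 50/50 (made up)

def aug_transition(prev_pair, next_pair):
    """First-order transition over augmented states; zero unless the pairs overlap."""
    (y, t), (t2, n) = prev_pair, next_pair
    return second_order(y, t, n) if t == t2 else 0.0

print(aug_transition(('rain', 'rain'), ('rain', 'sun')))   # 1.0: after 2 rainy days, sun
```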
Hidden Markov Models
• Markov chains are not so useful for most agents
  – Eventually you don't know anything anymore
  – Need observations to update your beliefs
• Hidden Markov models (HMMs)
  – Underlying Markov chain over states X
  – You observe outputs (effects) at each time step
  – As a Bayes' net: (figure)
Example
• Transition model p(R_t | R_{t-1}):
    R_{t-1} = t:  p(R_t = t) = 0.7
    R_{t-1} = f:  p(R_t = t) = 0.3
• Emission model p(U_t | R_t):
    R_t = t:  p(U_t = t) = 0.9
    R_t = f:  p(U_t = t) = 0.2
• An HMM is defined by:
  – Initial distribution: p(X_1)
  – Transitions: p(X_t | X_{t-1})
  – Emissions: p(E_t | X_t)
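These parameters written out as a code sketch; the uniform initial distribution is an assumption made here for illustration.

```python
# The rain/umbrella HMM from the tables above, written out explicitly.
# The uniform initial distribution is an illustrative assumption.
p_init = {True: 0.5, False: 0.5}         # p(R_1)
p_trans = {True: 0.7, False: 0.3}        # p(R_t = true | R_{t-1})
p_emit = {True: 0.9, False: 0.2}         # p(U_t = true | R_t)

def emission(rain, umbrella):
    """p(U_t = umbrella | R_t = rain)."""
    return p_emit[rain] if umbrella else 1 - p_emit[rain]
```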
Conditional Independence
• HMMs have two important independence properties:
  – Markov hidden process: the future depends on the past only via the present
  – The current observation is independent of everything else given the current state
• Quiz: does this mean that observations are independent of each other?
  – [No, they are correlated by the hidden state]
Real HMM Examples
• Speech recognition HMMs:
  – Observations are acoustic signals (continuous valued)
  – States are specific positions in specific words (so, tens of thousands of them)
• Robot tracking:
  – Observations are range readings (continuous)
  – States are positions on a map (continuous)
Filtering / Monitoring
• Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time
• We start with B(X) in an initial setting, usually uniform
• As time passes, or we get observations, we update B(X)
• The Kalman filter was invented in the 60's and first implemented as a method of trajectory estimation for the Apollo program
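A generic sketch of one belief update over a discrete state space, assuming the transition and observation models are given as functions; the names here are placeholders.

```python
# One step of filtering: predict with the transition model, weight by the
# observation likelihood, then normalize.
# trans(x_next, x) = p(x_next | x) and obs(e, x) = p(e | x) are assumed given.

def filter_step(belief, evidence, states, trans, obs):
    # Predict: push the belief through the dynamics.
    predicted = {x2: sum(trans(x2, x1) * belief[x1] for x1 in states) for x2 in states}
    # Update: weight each state by how well it explains the new evidence, then normalize.
    unnorm = {x: obs(evidence, x) * predicted[x] for x in states}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}
```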
Example: Robot Localization
• (sequence of figures)
• The robot may not execute its action, with small probability
Another weather example
• X_t is one of {s, c, r} (sun, cloudy, rain)
• Transition probabilities: given by a state-transition diagram over s, c, r (figure; note that the diagram itself is not a Bayes net!)
• Throughout, assume a uniform distribution over X_1
Weather example extended to HMM
• Transition probabilities: same state-transition diagram as before (figure)
• Observation: roommate wet or dry
• p(w|s) = .1,  p(w|c) = .3,  p(w|r) = .8
HMM weather example: a question
• (transition diagram and observation model as before: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8)
• You have been stuck in the dorm for three days (!)
• On those days, your roommate was dry, wet, wet, respectively
• What is the probability that it is now raining outside?
  p(X_3 = r | E_1 = d, E_2 = w, E_3 = w)
• By Bayes' rule, what we really need is the joint p(X_3, E_1 = d, E_2 = w, E_3 = w), which we can then normalize
Solving the question
• (transition diagram and observation model as before)
• Computationally efficient approach: first compute p(X_1 = i, E_1 = d) for all states i
• General case: solve for p(X_t, E_1 = e_1, …, E_t = e_t) for t = 1, then t = 2, … This is called monitoring
• p(X_t, E_1 = e_1, …, E_t = e_t) = Σ_{x_{t-1}} p(X_{t-1} = x_{t-1}, E_1 = e_1, …, E_{t-1} = e_{t-1}) · p(X_t | X_{t-1} = x_{t-1}) · p(E_t = e_t | X_t)
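A sketch of this monitoring recursion for the weather HMM. The emission probabilities and the uniform p(X_1) are from the slides; the transition matrix below is an assumed example, since the slides give the transitions only as a diagram.

```python
# Monitoring (forward recursion) for the s/c/r weather HMM.
states = ['s', 'c', 'r']
p_trans = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},      # assumed p(next | current)
           'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},
           'r': {'s': 0.2, 'c': 0.3, 'r': 0.5}}
p_wet = {'s': 0.1, 'c': 0.3, 'r': 0.8}               # p(roommate wet | state), from the slides

def p_evidence(e, x):                                 # p(E_t = e | X_t = x), e in {'w', 'd'}
    return p_wet[x] if e == 'w' else 1 - p_wet[x]

def monitor(evidence):
    """Return p(X_t, e_1..e_t) for t = len(evidence), starting from uniform p(X_1)."""
    alpha = {x: (1 / 3) * p_evidence(evidence[0], x) for x in states}
    for e in evidence[1:]:
        alpha = {x2: p_evidence(e, x2) * sum(p_trans[x1][x2] * alpha[x1] for x1 in states)
                 for x2 in states}
    return alpha

alpha3 = monitor(['d', 'w', 'w'])
total = sum(alpha3.values())
print({x: alpha3[x] / total for x in states})         # p(X_3 | d, w, w) after normalizing
```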
Predicting further out
• (transition diagram and observation model as before)
• You have been stuck in the dorm for three days
• On those days, your roommate was dry, wet, wet, respectively
• What is the probability that two days from now it will be raining outside?
  p(X_5 = r | E_1 = d, E_2 = w, E_3 = w)
Predicting further out, continued…
• (transition diagram and observation model as before)
• Want to know: p(X_5 = r | E_1 = d, E_2 = w, E_3 = w)
• Already know how to get: p(X_3 | E_1 = d, E_2 = w, E_3 = w)
• p(X_4 = r | E_1 = d, E_2 = w, E_3 = w)
    = Σ_{x_3} p(X_4 = r, X_3 = x_3 | E_1 = d, E_2 = w, E_3 = w)
    = Σ_{x_3} p(X_4 = r | X_3 = x_3) p(X_3 = x_3 | E_1 = d, E_2 = w, E_3 = w)
• Etc. for X_5
• So: monitoring first, then straightforward Markov process updates
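Continuing the monitoring sketch above (and therefore the same assumed transition matrix): predict by pushing the filtered distribution forward through the dynamics with no new evidence.

```python
# Prediction: Markov updates on the filtered distribution, with no new evidence.
# Reuses `states`, `p_trans`, and `monitor` from the monitoring sketch above.
alpha3 = monitor(['d', 'w', 'w'])
total = sum(alpha3.values())
belief = {x: alpha3[x] / total for x in states}       # p(X_3 | d, w, w)

for _ in range(2):                                    # two Markov updates: X_4, then X_5
    belief = {x2: sum(p_trans[x1][x2] * belief[x1] for x1 in states) for x2 in states}
print(belief['r'])                                    # p(X_5 = r | d, w, w)
```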
Integrating newer information
• (transition diagram and observation model as before)
• You have been stuck in the dorm for four days (!)
• On those days, your roommate was dry, wet, wet, dry, respectively
• What is the probability that two days ago it was raining outside?
  p(X_2 = r | E_1 = d, E_2 = w, E_3 = w, E_4 = d)
  – This is the smoothing or hindsight problem