Learning Predictive State Representations Using Non-Blind Policies
Michael Bowling, Peter McCracken, Michael James, James Neufeld, Dana Wilkinson
University of Alberta, Toyota Technical Center, University of Waterloo
ICML 2006
Bowling et al. PSRs and Non-Blind Policies ICML 2006 1 / 18

Outline
1. What is a PSR? (Very Brief Tutorial)
2. Extracting PSRs from Data. (Short)
3. Prediction Estimators: Problem and Solution (Punchline)
4. Non-Blind Exploration (Bonus)

Decision Process
A sequence of actions and observations: $a_1, o_1, a_2, o_2, \ldots, a_n, o_n$
General form: $\Pr(o_{n+1} \mid a_1, o_1, \ldots, a_n, o_n, a_{n+1})$

Markov Decision Process: $\Pr(o_{n+1} \mid a_1, o_1, \ldots, a_n, o_n, a_{n+1}) = \Pr(o_{n+1} \mid o_n, a_{n+1})$

Histories, Tests, and Predictions
Notation:
History ($h$): $a_1, o_1, a_2, o_2, \ldots, a_n, o_n$
Test ($t$): $a_1, o_1, a_2, o_2, \ldots, a_n, o_n$ (but in the future)
Prediction $p(t \mid h)$:
$$p(a_1, o_1, \ldots, a_n, o_n \mid h) \equiv \prod_{i=1}^{n} \Pr(o_i \mid h a_1, o_1, \ldots, a_i)$$
$$\pi(a_1, o_1, \ldots, a_n, o_n \mid h) \equiv \prod_{i=1}^{n} \Pr(a_i \mid h a_1, o_1, \ldots, a_{i-1}, o_{i-1})$$
$$\Pr(t \mid h) = p(t \mid h)\, \pi(t \mid h)$$

System Dynamics Matrix
- Countable number of tests and histories.
- Infinite matrix of all predictions: rows are histories $h$, columns are tests $t$, and each entry is $p(t \mid h)$.

POMDPs
- Underlying states, e.g. $s_1, s_2, s_3, s_4$, each with its own row of test predictions.
- Histories correspond to belief states $b = (b_1, b_2, b_3, b_4)$.
- History row is a linear combination of state rows.
$\therefore \operatorname{rank}(\text{SDM}) \le |S|$
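
The rank bound can be checked numerically on a small example. The sketch below assumes a hypothetical two-state POMDP (the transition table `T`, observation table `Z`, and initial belief `b0` are made up for illustration and do not come from the talk). It builds a finite block of the system dynamics matrix with exact fractions and computes its rank.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical POMDP (illustrative numbers only): 2 states, actions "a"/"b", observations 0/1.
T = {"a": [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]],    # T[a][s][s2] = Pr(s2 | s, a)
     "b": [[F(1, 2), F(1, 2)], [F(1, 10), F(9, 10)]]}
Z = {0: [F(4, 5), F(3, 10)], 1: [F(1, 5), F(7, 10)]}   # Z[o][s2] = Pr(o | arrive in s2)
STATES = range(2)

def step(w, a, o):
    """Unnormalised forward step: w2(s2) = sum_s w(s) T[a][s][s2] Z[o][s2]."""
    return [sum(w[s] * T[a][s][s2] for s in STATES) * Z[o][s2] for s2 in STATES]

def predict(belief, test):
    """p(t | belief): probability of the test's observations given its actions."""
    w = list(belief)
    for a, o in test:
        w = step(w, a, o)
    return sum(w)

def belief_after(b0, history):
    """Belief state reached from b0 after the given history."""
    w = list(b0)
    for a, o in history:
        w = step(w, a, o)
    total = sum(w)
    return [x / total for x in w]

def rank(rows):
    """Matrix rank by exact Gaussian elimination over fractions."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

steps = list(product(T, Z))                      # all (action, observation) pairs
tests = [[s] for s in steps] + [[s1, s2] for s1 in steps for s2 in steps]
histories = [[]] + tests                         # empty history plus all length-1 and length-2
b0 = [F(1, 2), F(1, 2)]
sdm_block = [[predict(belief_after(b0, h), t) for t in tests] for h in histories]
print(rank(sdm_block))
```

The block has 21 history rows and 20 test columns, yet its rank cannot exceed the number of states, because every row is the belief-weighted combination of the two state rows.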

Predictive State Representations
Find linearly independent tests, e.g. $q_1, q_2, q_3$: the "Core Tests" $Q$.
Any test is a linear combination of core tests:
$$p(t \mid h) = p(Q \mid h)\, m_t$$

Predictive State Representations
Update predictions after taking action $a$ and observing $o$:
$$p(Q \mid hao) = \frac{p(aoQ \mid h)}{p(ao \mid h)} = \frac{p(Q \mid h)\, M_{aoQ}}{p(Q \mid h)\, m_{ao}}$$
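
The update rule is a vector-matrix product followed by a normalisation. The sketch below is one possible way to write it, treating $p(Q \mid h)$ as a row vector. The function name `psr_update` and the toy parameters are assumptions for illustration: the toy system is a hypothetical fully observable two-state chain with a single action, where the core tests are "take the action, see state 1" and "take the action, see state 2".

```python
from fractions import Fraction as F

def psr_update(p_Q, M_aoQ, m_ao):
    """Return p(Q | hao) = p(Q | h) M_aoQ / (p(Q | h) m_ao).

    p_Q   : list, core-test predictions p(Q | h) as a row vector
    M_aoQ : list of rows, one column per core test
    m_ao  : list, weight vector for the one-step test ao
    """
    denom = sum(p * m for p, m in zip(p_Q, m_ao))          # p(ao | h)
    if denom == 0:
        raise ValueError("ao has zero probability after h")
    numer = [sum(p_Q[i] * M_aoQ[i][j] for i in range(len(p_Q)))
             for j in range(len(M_aoQ[0]))]                # p(aoQ | h)
    return [x / denom for x in numer]

# Hypothetical chain: T[j] = next-state distribution after observing state j.
T = [[F(9, 10), F(1, 10)], [F(2, 5), F(3, 5)]]
# For this toy system, m_ao picks out the matching core test, and M_aoQ has
# T[j] in row j and zeros elsewhere (derived for this toy only).
m_o1, M_o1 = [1, 0], [T[0], [0, 0]]
m_o2, M_o2 = [0, 1], [[0, 0], T[1]]

p_Q = [F(1, 2), F(1, 2)]             # assumed initial core-test predictions
p_Q = psr_update(p_Q, M_o1, m_o1)    # observe state 1
print(p_Q)
p_Q = psr_update(p_Q, M_o2, m_o2)    # then observe state 2
print(p_Q)
```

In this toy system the updated prediction vector is simply the transition row of the state just observed, which makes the result easy to check by hand.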

Extracting PSRs from Data

What Data?
$a_1, o_1, a_2, o_2, \ldots, a_n, o_n$
How are actions chosen?
- Unknown policy.
- Known policy.
- Controlled policy.
Note: Existing algorithms require a particular control policy. Either:
- exhaustively trying history-test pairs, or
- random actions.

Extracting PSRs from Data
(James & Singh, 2004) (Rosencrantz et al., 2004) (Wolfe et al., 2005) (Wiewiora, 2005) (McCracken & Bowling, 2006)
The common formula:
- Find core tests.
- Find update parameters.
Estimate part of the system dynamics matrix.
Estimate a subset of predictions $\hat{p}(t \mid h)$:
$$\hat{p}_\bullet(t \mid h) = \frac{\# h a_1 o_1 \ldots a_n o_n}{\# h a_1 \ldots a_n}$$
Here $\#$ counts occurrences in the data; the denominator counts $h$ followed by the actions $a_1 \ldots a_n$ with any observations.
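
The counting estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited algorithms' code: trajectories are assumed to start from a reset, each one is a list of `(action, observation)` pairs, and the function names and the tiny data set are hypothetical.

```python
from fractions import Fraction

def matches(traj, history, test, ignore_test_obs=False):
    """True if traj starts with history followed by test.

    With ignore_test_obs=True, only the test's actions must match; the
    observations inside the test may be anything.
    """
    seq = history + test
    if len(traj) < len(seq):
        return False
    for k, (a, o) in enumerate(seq):
        traj_a, traj_o = traj[k]
        if traj_a != a:
            return False
        obs_must_match = k < len(history) or not ignore_test_obs
        if obs_must_match and traj_o != o:
            return False
    return True

def naive_estimate(trajectories, history, test):
    """Ratio (# h a1 o1 ... an on) / (# h a1 ... an)."""
    num = sum(matches(tr, history, test) for tr in trajectories)
    den = sum(matches(tr, history, test, ignore_test_obs=True) for tr in trajectories)
    if den == 0:
        raise ValueError("the test's actions were never tried after this history")
    return Fraction(num, den)

# Hypothetical data: 8 two-step trajectories.
data = (3 * [[("x", 0), ("L", "z")]] + [[("x", 1), ("L", "z")]]
        + [[("x", 0), ("R", "z")]] + 3 * [[("x", 1), ("R", "z")]])

print(naive_estimate(data, [], [("x", 0)]))              # 4 of 8 trajectories
print(naive_estimate(data, [], [("x", 0), ("L", "z")]))  # 3 of the 4 that took x then L
```

The denominator pools trajectories with different observations inside the test, so how well the ratio tracks $p(t \mid h)$ depends on how the actions were chosen.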

Problem
$$E[\hat{p}_\bullet(t \mid h)] = p(t \mid h)\, \frac{\prod_{i=1}^{n} \Pr(a_i \mid h a_1 o_1 \ldots a_{i-1} o_{i-1})}{\prod_{i=1}^{n} \Pr(a_i \mid h a_1 \ldots a_{i-1})}$$
Definition: A policy is blind if actions are selected independently of preceding observations, i.e.,
$$\Pr(a_n \mid a_1, o_1, \ldots, a_{n-1}, o_{n-1}) = \Pr(a_n \mid a_1, \ldots, a_{n-1})$$
Observation: $\hat{p}_\bullet(t \mid h)$ is only an unbiased estimator of $p(t \mid h)$ if $\pi$ is blind.
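
A small worked example may make the bias concrete. The system and policy here are hypothetical, chosen only so the arithmetic is easy. Take the empty history and the test $t = x, 0, L, z$. Assume the first action is always $x$, the first observation is $0$ or $1$ with probability $1/2$ each, and the second observation is always $z$, so $p(t \mid h) = 1/2$. Assume the policy picks $L$ with probability $3/4$ after observing $0$ and with probability $1/4$ after observing $1$, which makes it non-blind.

$$\Pr(L \mid x, 0) = \tfrac{3}{4}, \qquad \Pr(L \mid x) = \tfrac{1}{2} \cdot \tfrac{3}{4} + \tfrac{1}{2} \cdot \tfrac{1}{4} = \tfrac{1}{2}$$

$$E[\hat{p}_\bullet(t \mid h)] = p(t \mid h)\, \frac{\Pr(x)\, \Pr(L \mid x, 0)}{\Pr(x)\, \Pr(L \mid x)} = \tfrac{1}{2} \cdot \frac{1 \cdot \tfrac{3}{4}}{1 \cdot \tfrac{1}{2}} = \tfrac{3}{4} \neq \tfrac{1}{2}$$

If the policy instead chose $L$ with the same probability after either observation, the two products would cancel and the expectation would equal $p(t \mid h)$.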

Prediction Estimators
Policy is Known:
$$\hat{p}_{\pi}(t \mid h) = \frac{\# ht}{\# h} \cdot \frac{1}{\pi(t \mid h)}$$
Policy is Not Known:
$$\hat{p}_{\pi}^{\times}(t \mid h) = \prod_{i=1}^{n} \frac{\# h a_1 o_1 \ldots a_i o_i}{\# h a_1 o_1 \ldots a_i}$$
Theorem: $\hat{p}_{\pi}(t \mid h)$ and $\hat{p}_{\pi}^{\times}(t \mid h)$ are unbiased estimators of $p(t \mid h)$.
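
Both estimators reduce to prefix counts. The sketch below is an illustration under assumptions, not the authors' implementation: trajectories are assumed to start from a reset and are stored as lists of `(action, observation)` pairs, the function names are hypothetical, and the data set is a made-up sample in which the policy picks `L` with probability 3/4 after observation `0` and 1/4 after observation `1`.

```python
from fractions import Fraction

def count_prefix(trajectories, seq, last_obs_free=False):
    """Number of trajectories that start with seq.

    With last_obs_free=True the final observation of seq is ignored, which
    gives the count written '# h a1 o1 ... ai' on the slide.
    """
    n = 0
    for tr in trajectories:
        if len(tr) < len(seq):
            continue
        ok = True
        for k, (a, o) in enumerate(seq):
            traj_a, traj_o = tr[k]
            is_last = k == len(seq) - 1
            if traj_a != a or (traj_o != o and not (last_obs_free and is_last)):
                ok = False
                break
        n += ok
    return n

def policy_known_estimate(trajectories, history, test, pi_t_given_h):
    """(# ht / # h) * 1 / pi(t | h), for a known policy."""
    ratio = Fraction(count_prefix(trajectories, history + test),
                     count_prefix(trajectories, history))
    return ratio / pi_t_given_h

def product_estimate(trajectories, history, test):
    """Product over i of (# h a1 o1 ... ai oi) / (# h a1 o1 ... ai)."""
    est = Fraction(1)
    for i in range(1, len(test) + 1):
        seq = history + test[:i]
        den = count_prefix(trajectories, seq, last_obs_free=True)
        if den == 0:
            raise ValueError("no data for step %d of the test" % i)
        est *= Fraction(count_prefix(trajectories, seq), den)
    return est

# Hypothetical data from the non-blind policy described above.
data = (3 * [[("x", 0), ("L", "z")]] + [[("x", 1), ("L", "z")]]
        + [[("x", 0), ("R", "z")]] + 3 * [[("x", 1), ("R", "z")]])
test = [("x", 0), ("L", "z")]
pi_t = Fraction(1) * Fraction(3, 4)   # Pr(x) * Pr(L | x, 0) under the assumed policy

print(policy_known_estimate(data, [], test, pi_t))
print(product_estimate(data, [], test))
```

On this sample both estimators return 1/2, which is the value of $p(t \mid h)$ in the assumed system (observation `0` with probability 1/2, then `z` with certainty).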

Exploration
Goal: Choose actions to reduce error in the estimated system dynamics matrix.
Approach:
- Add intelligent exploration to James & Singh's "reset" algorithm.
- Since $\hat{p}_{\pi}(t \mid h)$ is an unbiased estimator, we want to take actions to reduce the variance.
- Solve as an optimization problem.

Exploration
Intuition: Find the policy that maximizes the worst-case (over all predictions) bound on the root expected inverse variance.
Optimization Problem
Maximize: $\min_{h,t}$ of the bound on the root expected inverse variance, an expression in $v_{i-1}(h,t)^{-1}$, $k_i$, $p(h)$, and $\pi(ht)$.
Subject to: Sequence form constraints on $\pi(ht)$:
1. $\pi(\phi) = 1$,
2. $\forall h, o \in O$: $\pi(h) = \sum_{a} \pi(hao)$, and
3. $\forall h, a \in A, \{o, o'\} \subseteq O$: $\pi(hao) = \pi(hao')$.
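
The three sequence form constraints can be checked mechanically for any policy given as per-history action probabilities. The sketch below assumes a hypothetical non-blind policy (`action_probs`) over two actions and two observations; the names and numbers are illustrative and not taken from the talk. The sequence-form weight of a sequence is the product of the action probabilities along it.

```python
from fractions import Fraction as F
from itertools import product

ACTIONS = ("L", "R")
OBS = (0, 1)

def action_probs(history):
    """Hypothetical non-blind policy: favour L after observing 0, R after 1."""
    if not history:
        return {"L": F(1, 2), "R": F(1, 2)}
    last_obs = history[-1][1]
    if last_obs == 0:
        return {"L": F(3, 4), "R": F(1, 4)}
    return {"L": F(1, 4), "R": F(3, 4)}

def pi(seq):
    """Sequence-form weight: product of action probabilities along seq."""
    w = F(1)
    for k, (a, _o) in enumerate(seq):
        w *= action_probs(seq[:k])[a]
    return w

def satisfies_sequence_form(depth):
    """Check the three constraints for all histories shorter than depth."""
    if pi(()) != 1:                                   # constraint 1
        return False
    steps = list(product(ACTIONS, OBS))
    for n in range(depth):
        for h in product(steps, repeat=n):
            for o in OBS:                             # constraint 2
                if pi(h) != sum(pi(h + ((a, o),)) for a in ACTIONS):
                    return False
            for a in ACTIONS:                         # constraint 3
                if len({pi(h + ((a, o),)) for o in OBS}) != 1:
                    return False
    return True

print(satisfies_sequence_form(3))
print(pi((("L", 0), ("L", 1))))
```

Working with these weights rather than per-history probabilities keeps the constraints linear in $\pi$, which is the usual reason for adopting the sequence form in an optimization problem.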

Results
[Figure: testing error (log scale, 1e-04 to 0.1) versus sample size for Non-blind and Random exploration, in three panels: Tiger, Paint, and Float-reset.]

Summary
Contributions:
- Unbiased prediction estimators for non-blind policies.
- Variance analysis in the case of a known policy.
- Estimators used in "intelligent" exploration, which was shown to speed learning.
Future Work:
- Better objective functions for exploration.
- Investigate when non-blind exploration proves helpful.
Questions?
