

  1. CS885 Reinforcement Learning
     Lecture 15c: June 20, 2018
     Semi-Markov Decision Processes [Put] Sec. 11.1-11.3
     University of Waterloo, CS885 Spring 2018, Pascal Poupart

  2. Hierarchical RL
     • Hierarchy of goals and actions in autonomous driving (top to bottom):
       Reach Destination
       → Reach A / Reach B / Reach C
       → Turn / Overtake / Stop / Park
       → Brake / Gas / Steering
     • Theory: Semi-Markov Decision Processes

  3. Semi-Markov Process
     • Definition
       – Set of states: S
       – Transition dynamics: Pr(s', τ | s) = Pr(s' | s) Pr(τ | s),
         where τ indicates the time to transition
     • Semi-Markovian:
       – Next state depends only on the current state
       – Time spent in each state varies
     [Figure: timeline visiting states s, s', s'', s''' with transition times τ, τ', τ'']
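
To make the factored dynamics Pr(s', τ | s) = Pr(s' | s) Pr(τ | s) concrete, here is a minimal simulation sketch; the two-state chain and its distributions are made-up examples, not from the lecture:

```python
# Minimal sketch: sampling a semi-Markov process whose dynamics factor as
# Pr(s', tau | s) = Pr(s' | s) * Pr(tau | s).  All numbers are illustrative.
import random

next_state_dist = {"A": {"A": 0.2, "B": 0.8}, "B": {"A": 0.6, "B": 0.4}}   # Pr(s' | s)
holding_time_dist = {"A": {1: 0.5, 2: 0.5}, "B": {1: 0.1, 3: 0.9}}         # Pr(tau | s)

def sample(dist):
    """Draw a key from a {value: probability} dictionary."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r <= acc:
            return value
    return value  # guard against floating-point rounding

def simulate(s, n_jumps=5):
    """Return a list of (state, holding_time) pairs: the next state depends
    only on the current state, while the time spent in each state varies."""
    trajectory = []
    for _ in range(n_jumps):
        tau = sample(holding_time_dist[s])   # time spent in s
        trajectory.append((s, tau))
        s = sample(next_state_dist[s])       # next state
    return trajectory

print(simulate("A"))
```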

  4. Semi-Markov Decision Process
     • Definition
       – Set of states: S
       – Set of actions: A
       – Transition model: Pr(s', τ | s, a)
       – Reward model: R(s, a) = E[r | s, a]
       – Discount factor: 0 ≤ γ ≤ 1
         • discounted: γ < 1; undiscounted: γ = 1
       – Horizon (i.e., # of time steps): h
         • finite horizon: h ∈ ℕ; infinite horizon: h = ∞
     • Goal: find the optimal policy
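
As a sketch of how this tuple could be represented in code (the type and field names below are illustrative, not part of the lecture):

```python
# Sketch of the SMDP tuple (S, A, Pr, R, gamma, h); names are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class SMDP:
    states: List[str]                        # S
    actions: List[str]                       # A
    # transition(s, a) -> {(s', tau): prob}, i.e. Pr(s', tau | s, a)
    transition: Callable[[str, str], Dict[Tuple[str, int], float]]
    reward: Callable[[str, str], float]      # R(s, a) = E[r | s, a]
    gamma: float                             # discount factor, 0 <= gamma <= 1
    horizon: float                           # h; use math.inf for infinite horizon
```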

  5. Example from Queuing Theory
     • Consider a retail store with two queues:
       – Customer service queue
       – Cashier queue
     • Semi-Markov decision process
       – State: s = (n_1, n_2), where n_i = # of customers in queue i
       – Action: a ∈ {1, 2} (i.e., serve a customer in queue 1 or 2)
       – Transition model: distribution over arrival and service times for customers in each queue
       – Reward model: expected revenue of each serviced customer minus expected cost associated with waiting times
       – Discount factor: 0 ≤ γ < 1
       – Horizon (i.e., # of time steps): h = ∞
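
A possible simulator for one decision step of this example is sketched below; the arrival probabilities, service times, revenues, and waiting cost are made-up placeholders rather than values from the slides:

```python
# Sketch: one step of the two-queue retail-store SMDP.
# All numeric parameters below are assumed, purely for illustration.
import random

ARRIVAL_PROB = (0.3, 0.2)   # arrival probability per unit time, queues 1 and 2
SERVICE_TIME = (2, 3)       # time to serve one customer in each queue
REVENUE = (5.0, 8.0)        # revenue per serviced customer
WAIT_COST = 0.1             # cost per waiting customer per unit time

def step(state, action):
    """state = (n1, n2); action in {1, 2}. Returns (next_state, tau, reward)."""
    n = list(state)
    q = action - 1
    tau = SERVICE_TIME[q] if n[q] > 0 else 1   # serving (or idling) takes tau units
    reward = REVENUE[q] if n[q] > 0 else 0.0
    if n[q] > 0:
        n[q] -= 1                              # serviced customer leaves
    for _ in range(tau):                       # customers may arrive while we serve
        for i in (0, 1):
            if random.random() < ARRIVAL_PROB[i]:
                n[i] += 1
        reward -= WAIT_COST * (n[0] + n[1])    # waiting-time penalty
    return tuple(n), tau, reward

print(step((3, 1), action=1))
```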

  6. Value Function and Policy
     • Objective: V^π(s_{t_0}) = Σ_n γ^{t_n} R(s_{t_n}, π(s_{t_n}))
       – where t_n = τ_1 + τ_2 + ... + τ_n
       – Optimal policy: π* such that V^{π*}(s) ≥ V^π(s) ∀ s, π
     • Bellman's equation:
       V*(s) = max_a R(s, a) + Σ_{s', τ} Pr(s', τ | s, a) γ^τ V*(s')
     • Q-learning update:
       Q(s, a) ← Q(s, a) + α [r + γ^τ max_{a'} Q(s', a') − Q(s, a)]
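
The Q-learning update above maps directly to code; a minimal sketch (the learning rate, the tabular representation, and the example arguments are assumptions):

```python
# Sketch of the SMDP Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma**tau * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

def smdp_q_update(Q, s, a, r, s_next, tau, actions, alpha=0.1, gamma=0.95):
    """Q maps (state, action) -> value; tau is the observed transition time,
    so the future value is discounted by gamma**tau rather than gamma."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma ** tau * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q

Q = defaultdict(float)
Q = smdp_q_update(Q, s=(3, 1), a=1, r=4.2, s_next=(2, 2), tau=2, actions=(1, 2))
```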

  7. Option Framework
     • Semi-Markov decision process where actions are options (temporally extended sub-policies)
     • Let o be an option with sub-policy π and terminal states S_end.
       ∀ s_{t+τ} ∈ S_end:
       Pr(s_{t+τ}, τ | s_t, o) = Σ_{s_{t+1}, ..., s_{t+τ-1} ∉ S_end} Π_{k=1}^{τ} Pr(s_{t+k} | s_{t+k-1}, π(s_{t+k-1}))
       R(s_t, o, s_{t+τ}, τ) = R(s_t, π(s_t)) + γ Σ_{s_{t+1}} Pr(s_{t+1} | s_t, π(s_t)) R(s_{t+1}, π(s_{t+1}))
         + ... + γ^τ Σ_{s_{t+τ}} Pr(s_{t+τ} | s_{t+τ-1}, π(s_{t+τ-1})) R(s_{t+τ}, π(s_{t+τ}))
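
Operationally, these quantities correspond to running the option's sub-policy until it reaches a terminal state; the sketch below collects one sample of the landing state, the duration τ, and the per-step rewards (env.step, the sub-policy dictionary, and the terminal set are assumed interfaces, not from the slides):

```python
# Sketch: execute option o = (sub_policy, terminal_states) from state s.
# Each call yields one Monte-Carlo sample of the landing state s_{t+tau},
# the duration tau, and the rewards accumulated along the way.
def run_option(env, s, sub_policy, terminal_states):
    rewards, tau = [], 0
    while s not in terminal_states:
        a = sub_policy[s]            # the option follows its own policy pi
        s, r = env.step(s, a)        # assumed one-step environment interface
        rewards.append(r)
        tau += 1
    return s, tau, rewards
```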

  8. Option Framework
     • Bellman's equation:
       V*(s) = max_o Σ_{s', τ} Pr(s', τ | s, o) [R(s, o, s', τ) + γ^τ V*(s')]
     • Q-learning update:
       Q(s, o) ← Q(s, o) + α [r_τ + γ^τ max_{o'} Q(s', o') − Q(s, o)]
       where r_τ = Σ_{k=0}^{τ-1} γ^k r_{t+k} is the discounted reward accumulated while the option executes
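
Combining the two pieces, the option-level update uses the rewards collected while the option ran, discounted step by step; a minimal sketch (names and defaults are assumptions, and Q is a defaultdict(float) as in the earlier sketch):

```python
# Sketch of SMDP Q-learning over options:
# Q(s,o) <- Q(s,o) + alpha * (r_tau + gamma**tau * max_o' Q(s',o') - Q(s,o)),
# where r_tau = sum_k gamma**k * r_k is accumulated during the option.
def option_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.95):
    tau = len(rewards)                                   # option duration
    r_tau = sum(gamma ** k * r for k, r in enumerate(rewards))
    best_next = max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (r_tau + gamma ** tau * best_next - Q[(s, o)])
    return Q
```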
