Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
Lin F. Yang, Mengdi Wang
A Basic RL Model: Markov Decision Process
• States: 𝒮; Actions: 𝒜
• Reward: r(s, a)
• State transition: P(· | s, a)
• Policy: π (possibly random)
• Effective horizon: 1/(1 − γ)
• Optimal policy & value: π*, V*
• ε-optimal policy π: V^π(s) ≥ V*(s) − ε for all s
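For reference, the Bellman optimality equation behind the last two bullets (a textbook identity, not shown on the slide):

V^*(s) \;=\; \max_{a \in \mathcal{A}} \Big[ r(s,a) \;+\; \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^*(s') \Big],
\qquad
\text{and } \pi \text{ is } \varepsilon\text{-optimal if } V^{\pi}(s) \ge V^*(s) - \varepsilon \text{ for all } s.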
Curse of Dimensionality
• Optimal sample complexity: Θ̃( |𝒮||𝒜| / (ε²(1 − γ)³) )
• Too many states for most cases: Go has |𝒮| = 3^361; Atari has |𝒮| ≥ 256^(256×240)
• How to optimally reduce dimensions? Exploiting structures!
Parametric Q-Learning on Feature-Based MDP
• Transition is decomposable: P(s′ | s, a) = Σ_{k=1}^{K} φ_k(s, a) ψ_k(s′)
• Equivalently, the transition matrix factors as P = Φ Ψ with P ∈ ℝ^{(𝒮×𝒜)×𝒮}
• The features Φ are known; the factors Ψ are unknown
(Figure: a table of transition probabilities factored into known Φ and unknown Ψ.)
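A minimal sketch of such a decomposable transition model. The toy dimensions, the Dirichlet construction, and the helper name sample_next_state are hypothetical illustration, not the paper's setup.

import numpy as np

# Toy feature-based MDP: P(s'|s,a) = sum_k phi_k(s,a) * psi_k(s').
# Phi (known) has shape (S, A, K); Psi (unknown to the learner) has shape (K, S).
rng = np.random.default_rng(0)
S, A, K = 6, 3, 2

# Make each phi(s,a) a probability vector over k, and each psi_k a distribution
# over next states, so that every P(.|s,a) is a valid distribution.
Phi = rng.dirichlet(np.ones(K), size=(S, A))   # (S, A, K)
Psi = rng.dirichlet(np.ones(S), size=K)        # (K, S)

P = np.einsum("sak,kt->sat", Phi, Psi)         # full transition tensor (S, A, S)
assert np.allclose(P.sum(axis=-1), 1.0)

def sample_next_state(s, a):
    # Generative model: draw s' ~ P(.|s,a).
    return rng.choice(S, p=P[s, a])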
A Simple Regression-Based Algorithm
• Generative model: we are able to sample s′ ~ P(· | s, a) for any (s, a)
• Represent the Q-function with parameter θ ∈ ℝ^K:
  Q_θ(s, a) ≔ r(s, a) + γ φ(s, a)^⊤ θ
  V_θ(s) ≔ max_{a∈𝒜} Q_θ(s, a)
  π_θ(s) ≔ argmax_{a∈𝒜} Q_θ(s, a)
• Learn θ with modified Q-learning
• Sample complexity (K: feature dimension): Õ( K / (ε²(1 − γ)⁷) )
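Continuing the toy sketch above, here is one way to realize the regression idea: estimate E[V_θ(s′)] at each (s, a) from the generative model and regress the targets onto the features. This illustrates plain value iteration with regression, not the paper's modified Q-learning; greedy_value, parametric_q_iteration, and all iteration/sample counts are hypothetical.

def greedy_value(theta, r, Phi, gamma):
    # V_theta(s) = max_a [ r(s,a) + gamma * phi(s,a)^T theta ]
    return (r + gamma * Phi @ theta).max(axis=1)

def parametric_q_iteration(r, Phi, gamma, n_iters=200, n_samples=50):
    # Fit theta so that phi(s,a)^T theta ~ E_{s' ~ P(.|s,a)}[ V_theta(s') ].
    S, A, K = Phi.shape
    F = Phi.reshape(S * A, K)                  # design matrix over all (s,a)
    theta = np.zeros(K)
    for _ in range(n_iters):
        V = greedy_value(theta, r, Phi, gamma)
        # Monte-Carlo regression targets from the generative model.
        y = np.array([np.mean([V[sample_next_state(s, a)] for _ in range(n_samples)])
                      for s in range(S) for a in range(A)])
        theta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return theta

r = rng.random((S, A))                         # toy rewards
theta = parametric_q_iteration(r, Phi, gamma=0.9)
pi = (r + 0.9 * Phi @ theta).argmax(axis=1)    # greedy policy pi_theta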
Sample Optimality?
• Anchor condition: every transition distribution P(· | s, a) lies in the convex hull of the anchor distributions P(· | s₁, a₁), …, P(· | s_K, a_K)
(Figure: P(· | s, a) inside the convex hull of P(· | s₁, a₁) through P(· | s₆, a₆).)
• Sample complexity: Θ̃( K / (ε²(1 − γ)³) )
ArXiv: 1902.04779. Poster: 117
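One standard way to formalize the convex-hull picture (my reconstruction; see arXiv:1902.04779 for the precise statement): there exist K anchor pairs (s_k, a_k) such that

P(\cdot \mid s, a) \;=\; \sum_{k=1}^{K} \varphi_k(s,a)\, P(\cdot \mid s_k, a_k),
\qquad \varphi_k(s,a) \ge 0, \quad \sum_{k=1}^{K} \varphi_k(s,a) = 1,

i.e., the known features act as convex-combination weights over the anchors' transition distributions.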