Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Qiang Liu†, Lihong Li‡, Ziyang Tang†, Dengyong Zhou‡
† Department of Computer Science, The University of Texas at Austin
‡ Google Brain
Off-Policy Reinforcement Learning

Off-Policy Evaluation: evaluate a new policy π using only data collected under an old policy π0.

Widely useful when running new RL policies is costly or impossible due to cost, risk, or ethical and legal concerns:
- Healthcare
- Robotics & Control
- Advertising & Recommendation
“Curse of Horizon”

Importance Sampling (IS): given a trajectory τ = {s_t, a_t}_{t=1}^T ∼ π0,

    R^π = E_{τ ∼ π0}[ w(τ) R(τ) ],   where   w(τ) = ∏_{t=1}^T π(a_t | s_t) / π0(a_t | s_t).

The Curse of Horizon: the IS weight w(τ) is a product of T terms, where T is the horizon length.
- Variance can grow exponentially with T.
- Problematic for infinite-horizon problems (T = ∞).
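To make the variance blow-up concrete, here is a minimal Monte Carlo sketch (not from the slides): it draws per-step action ratios under a hypothetical two-action behavior/target policy pair and shows the empirical variance of the product weight w(τ) exploding as T grows.

```python
# Minimal sketch: variance of trajectory-wise IS weights vs. horizon T.
# The two-action policies `pi` and `pi0` below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
pi0 = np.array([0.5, 0.5])   # behavior policy over 2 actions
pi = np.array([0.7, 0.3])    # target policy over 2 actions

def trajectory_is_weight(T):
    """w(tau) = prod_{t=1}^T pi(a_t|s_t) / pi0(a_t|s_t), actions drawn from pi0."""
    actions = rng.choice(2, size=T, p=pi0)
    return np.prod(pi[actions] / pi0[actions])

for T in [10, 50, 100, 200]:
    weights = np.array([trajectory_is_weight(T) for _ in range(10_000)])
    print(f"T={T:4d}  mean={weights.mean():.3f}  var={weights.var():.3e}")
```

Each per-step ratio has mean 1 under π0, so the weight stays unbiased, but its variance compounds multiplicatively across the T factors.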
Breaking the Curse

Key idea: apply IS to (s, a) pairs, not to the whole trajectory τ:

    R^π = E_{(s,a) ∼ d_{π0}}[ w(s, a) r(s, a) ],   where   w(s, a) = d_π(s, a) / d_{π0}(s, a),

and d_π(s, a) is the stationary / average visitation distribution of (s, a) under policy π.

The stationary density ratio w(s, a):
- is NOT a product of T terms;
- can be small even for infinite horizon (T = ∞);
- but is more difficult to estimate.
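Given an estimated ratio ŵ, the resulting value estimator is just a weighted average over transitions. The sketch below is a minimal illustration, assuming a hypothetical `w_hat` function and (s, a, r) samples collected under π0; the self-normalized variant mirrors weighted IS.

```python
# Minimal sketch of the step-wise estimator R_hat = (1/n) sum_i w(s_i, a_i) r_i,
# assuming `w_hat` is an already-estimated stationary density-ratio function.
import numpy as np

def off_policy_value(w_hat, states, actions, rewards, self_normalize=True):
    """Average-reward estimate using stationary density ratios w(s, a)."""
    rewards = np.asarray(rewards, dtype=float)
    w = np.array([w_hat(s, a) for s, a in zip(states, actions)])
    if self_normalize:
        return np.sum(w * rewards) / np.sum(w)   # weighted-IS style normalization
    return np.mean(w * rewards)
```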
Main Algorithm

1. Estimate the density ratio with a new minimax objective:

       ŵ = arg min_{w ∈ W} max_{f ∈ F} L(w, f; D_{π0})

2. Value estimation by IS:

       R̂^π = Ê_{(s,a) ∼ d_{π0}}[ ŵ(s, a) r(s, a) ]

- Theoretical guarantees developed for the new minimax objective.
- Can be kernelized: the inner max has a closed form if F is an RKHS.
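The kernelization relies on a standard RKHS fact: when the inner maximization has the form max over unit-norm f of a sample average of δ_w(x) f(x), its value is a quadratic form in δ_w through the kernel matrix. The sketch below illustrates that generic closed form only; the discrepancy vector `delta` is a placeholder and not the paper's exact L(w, f; D_{π0}).

```python
# Generic sketch of the closed-form RKHS inner maximization:
#   max_{||f||_RKHS <= 1} ( (1/n) sum_i delta_w(x_i) f(x_i) )^2
#     = (1/n^2) sum_{i,j} delta_w(x_i) delta_w(x_j) k(x_i, x_j).
import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Gaussian RBF kernel matrix for the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def closed_form_inner_max(delta, K):
    """Value of the inner RKHS maximization: a quadratic form in delta."""
    n = len(delta)
    return delta @ K @ delta / n ** 2

# Usage: minimize closed_form_inner_max over the parameters of w
# (e.g., by autodiff), where delta depends on w and the transitions in D_pi0.
```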
Empirical Results

Domain: traffic control (using the SUMO simulator [5]).

[Figure: log MSE of the estimators — Naive Average, On Policy (oracle), WIS Trajectory-wise, WIS Step-wise, and Our Method — across (a) number of trajectories n, (b) different behavior policies, and (c) truncated length T.]
Thank You!

Location: Room 210 & 230 AB; Poster #121
Time: Wed Dec 5th, 05:00 – 07:00 PM

References & Acknowledgment
[1] [HLR'16] K. Hofmann, L. Li, and F. Radlinski. Online evaluation for information retrieval.
[2] [JL'16] N. Jiang and L. Li. Doubly robust off-policy value evaluation for reinforcement learning.
[3] [LMS'15] L. Li, R. Munos, and Cs. Szepesvári. Toward minimax off-policy value estimation.
[4] [TB'16] P. S. Thomas and E. Brunskill. Data-efficient off-policy policy evaluation for reinforcement learning.
[5] [KEBB'12] D. Krajzewicz, J. Erdmann, M. Behrisch, and L. Bieker. Recent development and applications of SUMO - Simulation of Urban MObility.

Work supported in part by NSF CRII 1830161 and Google Cloud.