Universal Value Function Approximators


  1. Universal Value Function Approximators
  Tom Schaul, Dan Horgan, Karol Gregor, Dave Silver

  2. Motivation
  Forecasts about the environment:
  • temporally abstract predictions ("questions")
  • not necessarily related to reward (unsupervised)
  • conditioned on a behavior (aka GVFs, "nexting")
  • many of them
  Why?
  • better, richer representations (features)
  • decomposition, modularity
  • temporally abstract planning, long horizons

  3. Example forecasts
  • Hitting the wall
    • if the agent aims for the nearest wall
    • if the agent goes for the door
  • Remaining time on battery
    • if the agent stands still
    • if the agent keeps moving
  • Luminosity increase
    • if the agent presses the light switch
    • if the agent waits for sunrise

  4. Concretely, for this work: subgoal forecasts
  • Reaching any of a set of states, at which point
    • the episode terminates (γ = 0)
    • and a pseudo-reward of 1 is given
  • Various time horizons induced by γ
  • Q-values are for the optimal policy that tries to reach the subgoal (alignment)
  • Neural networks as function approximators
  (the induced objective is formalized in the sketch below)
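As a minimal formalization of this convention (illustrative notation, loosely following the paper's pseudo-reward/pseudo-discount setup rather than quoting it exactly):

```latex
% A subgoal g induces a pseudo-reward and a pseudo-discount:
% reaching g pays 1 and terminates the episode (discount drops to 0).
R_g(s) =
  \begin{cases} 1 & \text{if } s = g \\ 0 & \text{otherwise} \end{cases}
\qquad
\gamma_g(s) =
  \begin{cases} 0 & \text{if } s = g \\ \gamma & \text{otherwise} \end{cases}

% The subgoal's Q-values are those of the optimal policy for this
% pseudo-problem, via the Bellman optimality equation:
Q^*_g(s, a) =
  \mathbb{E}_{s'}\big[\, R_g(s') + \gamma_g(s') \max_{a'} Q^*_g(s', a') \,\big]
```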

  5. Combinatorial numbers of subgoals
  Why?
  • because the environment admits tons of predictions
  • any of them could be useful for the task
  How?
  • efficiency
    • sub-linear cost in the number of subgoals
  • exploit shared structure in value space
  • generalize to similar subgoals

  6. Outline
  • Motivation
    • learn values for forecasts
    • efficiently for many subgoals
  • Approach
    • new architecture
    • one neat trick
  • Results

  7. Universal Value Function Approximator
  • a single neural network producing Q(s, a; g)
  • for many subgoals g
  • generalizes between subgoals
  • compact
  • UVFA (pronounced "you-fah")

  8. UVFA architectures
  • Vanilla (monolithic)
  • Two-stream (works better; sketched below)
    • separate embeddings φ and ψ for states and subgoals
    • Q-values = dot product of the embeddings
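A minimal sketch of what such a two-stream network could look like, assuming discrete actions and simple MLP streams; all names and layer sizes (TwoStreamUVFA, phi, psi, 64 hidden units) are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class TwoStreamUVFA(nn.Module):
    """Two-stream UVFA sketch: separate embeddings for state and subgoal,
    combined by a dot product, as on this slide."""
    def __init__(self, state_dim, goal_dim, embed_dim=16, n_actions=4):
        super().__init__()
        # phi: state -> one embedding per action (so Q covers all actions)
        self.phi = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions * embed_dim))
        # psi: subgoal -> a single embedding
        self.psi = nn.Sequential(
            nn.Linear(goal_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim))
        self.n_actions, self.embed_dim = n_actions, embed_dim

    def forward(self, s, g):
        # (batch, n_actions, embed_dim) state-action embeddings
        sa = self.phi(s).view(-1, self.n_actions, self.embed_dim)
        e_g = self.psi(g)  # (batch, embed_dim) subgoal embedding
        # Q(s, a; g) = <phi(s, a), psi(g)> for every action a
        return torch.einsum('bae,be->ba', sa, e_g)
```

The dot-product bottleneck is what later makes the factorization trick applicable: the network's output is by construction an inner product of a state embedding and a goal embedding.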

  9. UVFA learning
  • Method 1: bootstrapping (sketched below)
    • some stability issues
  • Method 2 (works better):
    • build a training set of subgoal values
    • train with a supervised objective
    • like neural fitted Q iteration
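For Method 1, a sketch of the one-step bootstrapped target under the slide-4 convention (pseudo-reward 1 and termination on reaching the subgoal); q_net and reached_goal are hypothetical names, not the paper's:

```python
import torch

def bootstrap_target(q_net, s_next, g, reached_goal, gamma=0.95):
    """One-step Q-learning target for subgoal g (Method 1 on this slide).
    reached_goal: boolean tensor, True where s_next hits the subgoal,
    in which case the pseudo-reward is 1 and the episode terminates."""
    with torch.no_grad():
        q_next = q_net(s_next, g).max(dim=1).values  # max_a' Q(s', a'; g)
    r = reached_goal.float()            # pseudo-reward: 1 at the subgoal
    cont = 1.0 - reached_goal.float()   # pseudo-discount: gamma_g = 0 at goal
    return r + gamma * cont * q_next
```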

  10. Outline
  • Motivation
    • learn values for forecasts
    • efficiently for many subgoals
  • Approach
    • new architecture: UVFA
    • one neat trick
  • Results

  11. Trick for supervised UVFA learning: FLE
  • Stage 1: Factorize
  • Stage 2: Learn Embeddings

  12. Stage 1: Factorize (low-rank)
  • target embeddings for states and goals (see the sketch below)
  [Figure: the matrix of subgoal values is approximated as the product of a state-embedding matrix and a goal-embedding matrix]
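A minimal dense-matrix sketch of this stage via truncated SVD; the paper also handles sparse, partially observed value matrices, which this toy version ignores:

```python
import numpy as np

def factorize_q_matrix(Q, rank):
    """Stage 1 (Factorize): split a |S| x |G| matrix of subgoal values
    into low-rank target embeddings, one row per state and one per goal."""
    U, sing, Vt = np.linalg.svd(Q, full_matrices=False)
    sqrt_s = np.sqrt(sing[:rank])
    phi_targets = U[:, :rank] * sqrt_s    # state target embeddings
    psi_targets = Vt[:rank].T * sqrt_s    # goal target embeddings
    # Q is approximated by phi_targets @ psi_targets.T
    return phi_targets, psi_targets
```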

  13. Stage 2: Learn Embeddings
  • regression from state/subgoal features to the target embeddings (see the sketch below)
  • optional Stage 3: end-to-end fine-tuning
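A minimal sketch of the regression stage: train each stream to reproduce its Stage-1 target embeddings under a mean-squared-error loss, running it once with net = φ on state features and once with net = ψ on subgoal features. Names and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

def learn_embeddings(net, inputs, targets, epochs=100, lr=1e-3):
    """Stage 2 (Learn Embeddings): regress a network (phi for states,
    psi for subgoals) onto the target embeddings produced by Stage 1."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(inputs), targets)  # match target embeddings
        loss.backward()
        opt.step()
    return net
```

The optional Stage 3 then drops the embedding targets and fine-tunes both streams end-to-end on the Q-values themselves.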

  14. FLE vs. end-to-end regression
  • FLE is between 10x and 100x faster

  15. Outline
  • Motivation
    • learn values for forecasts
    • efficiently for many subgoals
  • Approach
    • new architecture: UVFA
    • one neat trick: FLE
  • Results

  16. Results: Low-rank is enough

  17. Results: Low-rank embeddings

  18. Results: Generalizing to new subgoals

  19. Results: Extrapolation, even to subgoals in the unseen fourth room
  [Figure panels: ground truth vs. UVFA]

  20. Results: Transfer to new subgoals
  • Refining a UVFA is much faster than learning from scratch

  21. Results: Pacman pellet subgoals
  [Figure panels: training set vs. test set]

  22. Results: Pellet subgoal values (test set)
  [Figure panels: "truth" vs. UVFA generalization]

  23. Summary
  • UVFA
    • compactly represents values for many subgoals
    • generalization, even extrapolation
    • transfer learning
  • FLE
    • a trick for efficiently training UVFAs
    • side effect: interesting embedding spaces
    • scales to complex domains (Pacman from raw vision)
  Details: see our paper at ICML 2015.
