Model-Selection for Non-Parametric Function Approximation:
A Case Study in a Smart Energy System

Daniel Urieli, Peter Stone
Department of Computer Science, The University of Texas at Austin
{urieli,pstone}@cs.utexas.edu

ECML 2013
Motivation

A smart energy problem: controlling a thermostat to reduce energy consumption in an HVAC (Heating, Ventilation, and Air-Conditioning) system while maintaining comfort requirements.

General motivation: applying value-function based reinforcement learning (RL) to discrete-time, continuous-control problems.
Discrete-Time, Continuous Control Problems

- The system's state space is continuous.
- Control actions are taken at discrete times.
- We further assume that the action set is small and discrete.
Value-Function Based RL

- In theory, value-function based RL can solve such problems optimally.
- In practice, it is often unclear how to approximate the value function well enough.
- Indeed, recent successes have used direct policy search instead.
Value-Function Based RL

Still, value-function based RL has desirable advantages:
- It aims for a global optimum.
- Bootstrapping ⇒ fewer interactions with the real world.
Case Study: Smart Thermostat Control

Minimize energy consumption while satisfying a given comfort specification.
Case Study: Smart Thermostat Control

A straightforward turn-off strategy fails to satisfy both requirements (energy savings and comfort).
Smart Thermostat Control as an MDP

We model the problem as an MDP:
- S: {⟨T_in, T_out, Time⟩}
- A: {COOL, OFF, HEAT, AUX}
- P: computed by the simulator, initially unknown
- R: −energyConsumedByLastAction − C_6pm
- T: {s ∈ S | s.time == 23:59}
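The MDP components above can be sketched as plain data structures. This is a minimal illustration, not the authors' implementation; the names `energy_consumed` and `comfort_penalty` are hypothetical stand-ins for the energy term and the 6pm comfort-violation term C_6pm in the reward.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    """A: the small, discrete action set of the thermostat."""
    COOL = 0
    OFF = 1
    HEAT = 2
    AUX = 3  # auxiliary heating

@dataclass(frozen=True)
class State:
    """S: a state is the tuple <T_in, T_out, Time>."""
    t_in: float   # indoor temperature
    t_out: float  # outdoor temperature
    time: int     # minutes since midnight

def reward(energy_consumed: float, comfort_penalty: float) -> float:
    # R = -energyConsumedByLastAction - C_6pm
    return -energy_consumed - comfort_penalty

def is_terminal(s: State) -> bool:
    # T = { s in S | s.time == 23:59 }
    return s.time == 23 * 60 + 59
```

The transition function P is left out because, as the slide notes, it is computed by the simulator and initially unknown.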
Plan

For the value-function (VF) approximation part, we need to:
1. Choose a function approximator
2. Choose an algorithm to compute the approximate VF
3. Tune the function approximator's parameters through model-selection
The Challenge of Value-Function Approximation

[Figure: value function plotted against state]

- Must differentiate the optimal action from a suboptimal one.
- Non-trivial with "small" action effects + a smooth value function ⇒ losses accumulate over time.
Function Approximation Methods

[Figure: value function plotted against state]

Discretization:
- Suffers from the curse of dimensionality at the required resolution levels.
Function Approximation Methods

Linear function approximation:
- Depends on choosing good features.
- It is frequently unclear how to do that.
Function Approximation Methods

Non-parametric:
- Can represent any function, given lots of data.
Non-Parametric Value Function Approximation

[Figure: three panels of the value function plotted against state]

To minimize the assumptions about the VF representation, we use a smooth, non-parametric function approximator: Locally Weighted Linear Regression (LWR).
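To make LWR concrete, here is a minimal 1-D sketch assuming a Gaussian weighting kernel (the slides do not specify the kernel): each prediction solves a weighted least-squares line fit centered on the query point.

```python
import numpy as np

def lwr_predict(x_query, X, y, bandwidth):
    """Locally weighted linear regression at a single query point.

    Fits a line by weighted least squares, with Gaussian weights
    w_i = exp(-(x_i - x_query)^2 / (2 * bandwidth^2)), so training
    points near the query dominate the local fit.
    """
    w = np.exp(-((X - x_query) ** 2) / (2.0 * bandwidth ** 2))
    A = np.column_stack([np.ones_like(X), X])   # design matrix [1, x]
    # Solve the normal equations (A^T W A) theta = A^T W y
    theta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return theta[0] + theta[1] * x_query
```

On exactly linear data the local fit recovers the global line, so for example `lwr_predict(0.5, X, 2 * X + 1, 1.0)` returns 2.0 for any reasonable bandwidth; the method's flexibility shows up on curved data, where the fitted line changes from query to query.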
Compute an Approximate VF Using FVI

To compute the approximate VF, we use Fitted Value Iteration (FVI) over a fixed sample of states S_FVI = {s^(1), s^(2), ..., s^(m)}:

RepeatUntilConvergence {
    ∀ i ∈ 1,...,m:  y^(i) := max_a ( R(s^(i), a) + γ E_{s' | s^(i), a}[ V̂^{π*}(s') ] )
    V̂^{π*} := LWR( {⟨s^(i), y^(i)⟩ | i ∈ 1,...,m} )
}
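The FVI loop above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: it assumes a deterministic transition model `step(s, a)` (so the expectation over s' collapses to a single successor) and takes the function approximator as a caller-supplied `fit(states, targets)` routine, which could be LWR or anything else.

```python
import numpy as np

def fitted_value_iteration(states, actions, step, reward, fit,
                           gamma=0.99, n_iters=100, tol=1e-4):
    """Fitted Value Iteration over a fixed sample of states S_FVI.

    step(s, a)   -> successor state s' (deterministic model assumed)
    reward(s, a) -> immediate reward R(s, a)
    fit(S, y)    -> a fitted predictor V(s), e.g. LWR on <s_i, y_i> pairs
    """
    V = lambda s: 0.0                         # initial value estimate
    for _ in range(n_iters):
        # Backup targets: y_i = max_a [ R(s_i, a) + gamma * V(step(s_i, a)) ]
        y = np.array([max(reward(s, a) + gamma * V(step(s, a))
                          for a in actions)
                      for s in states])
        V_new = fit(states, y)                # re-fit approximator to targets
        if max(abs(V_new(s) - V(s)) for s in states) < tol:
            return V_new                      # converged on the sample states
        V = V_new
    return V
```

On a toy 3-state chain with a tabular `fit`, the loop converges to the familiar geometric-sum values, which is a quick sanity check before plugging in a real simulator and LWR.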
Model-Selection for LWR

LWR needs tuning, for instance the kernel bandwidth in 1-D.

[Figure: LWR fits of the same data with different kernel bandwidths]
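One standard way to tune a bandwidth (the slides motivate the need for tuning but this particular recipe is an illustrative assumption, not necessarily the paper's method) is leave-one-out cross-validation: for each candidate bandwidth, predict every training point from all the others and keep the bandwidth with the smallest squared error.

```python
import numpy as np

def lwr_predict(x_query, X, y, bandwidth):
    # Gaussian-weighted local line fit around x_query
    w = np.exp(-((X - x_query) ** 2) / (2.0 * bandwidth ** 2))
    A = np.column_stack([np.ones_like(X), X])
    theta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return theta[0] + theta[1] * x_query

def select_bandwidth(X, y, candidates):
    """Pick the bandwidth minimizing leave-one-out squared error."""
    best_h, best_err = None, float("inf")
    for h in candidates:
        err = 0.0
        for i in range(len(X)):
            mask = np.arange(len(X)) != i          # hold out point i
            pred = lwr_predict(X[i], X[mask], y[mask], h)
            err += (pred - y[i]) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h
```

On curved data (e.g. a sine wave) a moderate bandwidth beats a very large one, since a huge bandwidth degenerates LWR into a single global line, mirroring the underfit/overfit trade-off shown in the bandwidth figure.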