Critical Level Policies in Lost Sales Inventory Systems with Different Demand Classes

Aleksander Wieczorek 1,4, Ana Bušić 1, Emmanuel Hyon 2,3

1 INRIA/ENS, Paris, France
2 Université Paris Ouest Nanterre, Nanterre, France
3 LIP6, UPMC, Paris, France
4 Institute of Computing Science, Poznan University of Technology, Poznan, Poland

EPEW, Borrowdale, UK, October 13, 2011
Table of Contents

1. Markov Decision Processes: Definition; Optimal control
2. Model description: Admission control; Policies; Results
3. Extensions
Model presentation

Figure: J classes of customers (increasing costs) arrive at rate λ; a class-i customer occurs with probability p_i and carries cost c_i. The stock has capacity S and is replenished by a process with N phases of rates μ1, μ2, ..., μN.
Markov Decision Processes Definition Model description Optimal control Extensions Plan Markov Decision Processes 1 Definition Optimal control Model description 2 Admission control Policies Results Extensions 3 A. Wieczorek, A. Buˇ si´ c, E. Hyon
Markov Decision Process — Formalism and notation [3]

A collection of objects (X, A, p(y|x,a), c(x,a)) where:
- X — state space, X = {1, ..., S} × {1, ..., N} ∪ {(0,1)}; for (x, k) ∈ X, x — replenishment (stock) level, k — phase,
- A — set of actions, A = {0, 1}; 1 — acceptance, 0 — rejection,
- p(y|x,a) — probability of moving to state y from state x when action a is triggered,
- c(x,a) — instantaneous cost in state x when action a is triggered.
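As a concrete illustration (not from the slides), the state and action spaces can be enumerated directly; S and N below are hypothetical placeholders:

```python
# A minimal sketch (hypothetical S and N): the state space
# X = {1,...,S} x {1,...,N}  U  {(0,1)}  as a list of (x, k) tuples.
S, N = 5, 3  # placeholder capacity and number of replenishment phases

states = [(0, 1)] + [(x, k) for x in range(1, S + 1) for k in range(1, N + 1)]
actions = (0, 1)  # 0 = rejection, 1 = acceptance

print(len(states))  # 1 + S*N = 16 states
```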
Plan
1. Markov Decision Processes: Optimal control
Optimal control problem

Definition (Policy): A policy π is a sequence of decision rules, each mapping the information history (past states and actions) to the action set A.

Definition (Markov deterministic policy): A Markov deterministic policy is of the form (a(·), a(·), ...), where a(·) is a single deterministic decision rule that maps the current state to a decision (hence, in our case, a(·) is a function from X to A).
Optimal control problem — optimality criteria

Minimal long-run average cost:

\bar{v}^* = \min_{\pi} \lim_{n \to \infty} \frac{1}{n} \, \mathbb{E}^{\pi}_{y} \left[ \sum_{\ell=0}^{n-1} C(y_\ell, a_\ell) \right]

Policies π* optimising some optimality criterion are called optimal policies (with respect to the given criterion).

Goal: characterise an optimal policy π* that achieves \bar{v}^*.
Optimal control problem — optimality criteria

Minimal (expected) n-stage total cost:

V_n(y) = \min_{\pi(n)} \mathbb{E}^{\pi(n)}_{y} \left[ \sum_{\ell=0}^{n-1} C(y_\ell, a_\ell) \right], \quad y \in \mathcal{X}, \; y_0 = y

Convergence results [2], [3, Chapter 8]: The minimal n-stage total cost value function V_n does not converge as n tends to infinity. However, the difference V_{n+1}(y) − V_n(y) converges to the minimal long-run average cost \bar{v}^*.

Relation between different optimality criteria [2], [3, Chapter 8]: The optimal n-stage policy (minimising V_n) tends to the optimal average-cost policy π* (achieving \bar{v}^*) as n tends to infinity.
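To make the convergence statement concrete, here is a minimal sketch on a toy two-state MDP (all numbers are hypothetical and unrelated to the inventory model): the successive differences V_{n+1} - V_n of value iteration flatten out at the optimal long-run average cost.

```python
# A minimal sketch: value iteration on a toy 2-state, 2-action MDP
# (hypothetical numbers), showing V_{n+1}(y) - V_n(y) -> average cost.
import numpy as np

P = np.array([[[0.9, 0.1],   # P[a][x, y]: transitions under action 0
               [0.4, 0.6]],
              [[0.2, 0.8],   # transitions under action 1
               [0.7, 0.3]]])
C = np.array([[1.0, 3.0],    # C[x, a]: cost in state x under action a
              [2.0, 0.5]])

V = np.zeros(2)
for n in range(1000):
    # Bellman backup: min over a of C(x,a) + sum_y P(y|x,a) V(y)
    V_next = np.min(C + np.einsum('axy,y->xa', P, V), axis=1)
    diff = V_next - V
    V = V_next

print(diff)  # both entries approach the same value: the optimal average cost
```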
Cost value function

Bellman equation: V_{n+1} = T V_n, where T is the dynamic programming operator:

(Tf)(y) = \min_{a} (\hat{T}f)(y, a) = \min_{a} \left[ C(y, a) + \sum_{y' \in \mathcal{X}} P(y' \mid y, a) \, f(y') \right]

Decomposition of T: the dynamic programming equation is

V_n(x, k) = T_{\mathrm{unif}} \left( \sum_{i=1}^{J} p_i \, T_{CA(i)}(V_{n-1}), \; T_{D}(V_{n-1}) \right), \qquad (1)

where V_0(x, k) ≡ 0 and T_unif, T_CA(i) and T_D are the different event operators.
Plan
2. Model description: Admission control
Description of operators

Controlled arrival operator of a customer of class i, T_CA(i):

T_{CA(i)} f(x, k) = \begin{cases} \min \{ f(x+1, k), \, f(x, k) + c_i \} & \text{if } x < S, \\ f(x, k) + c_i & \text{if } x = S. \end{cases}
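A minimal sketch of T_CA(i) acting on a value function stored as an array f[x, k]; the array layout and all parameters are assumptions for illustration:

```python
# A minimal sketch (hypothetical S, N, c_i) of the controlled arrival
# operator: accept a class-i customer iff that is cheaper than the
# rejection cost c_i; at full stock x = S rejection is forced.
import numpy as np

S, N = 5, 3
c_i = 4.0  # rejection cost of class i

def T_CA(f):
    g = f + c_i                        # rejection branch, all (x, k)
    g[:S] = np.minimum(f[1:], g[:S])   # x < S: min{f(x+1,k), f(x,k)+c_i}
    return g                           # rows are x = 0..S, columns k = 1..N

f = np.zeros((S + 1, N))               # stand-in value function
print(T_CA(f)[S, 0])                   # at x = S: f(S,k) + c_i = 4.0
```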
Description of operators

Let μ′_k = μ_k / α.

Departure operator, T_D:

T_{D} f(x, k) = \mu'_k \begin{cases} f(x, k+1) & \text{if } k < N \text{ and } x > 0, \\ f((x-1)^+, 1) & \text{if } k = N \text{ or } x = 0 \end{cases} \; + \; (1 - \mu'_k) f(x, k).

Uniformization operator, T_unif:

T_{\mathrm{unif}}(f(x,k), g(x,k)) = \frac{\lambda}{\lambda + \alpha} f(x, k) + \frac{\alpha}{\lambda + \alpha} g(x, k).
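Putting the three event operators together gives a runnable sketch of recursion (1). All model parameters below are hypothetical placeholders, and the choices α = max_k μ_k and the array layout f[x, k] are my assumptions, not stated on the slides:

```python
# A minimal sketch of the event operators and recursion (1);
# parameters are placeholders, not from the paper.
import numpy as np

S, N, J = 5, 3, 2                          # capacity, phases, classes
lam = 1.0                                  # arrival rate lambda
mu = np.array([2.0, 1.5, 3.0])             # phase rates mu_1..mu_N
p = np.array([0.6, 0.4])                   # class probabilities p_i
c = np.array([1.0, 4.0])                   # rejection costs c_i (increasing)
alpha = mu.max()                           # uniformization rate (assumed)
mu_p = mu / alpha                          # mu'_k = mu_k / alpha

# Value functions as arrays f[x, k], x = 0..S, k = 1..N (column k-1);
# among the x = 0 states only (0, 1) is actually reachable.

def T_CA(f, i):
    g = f + c[i]                           # reject a class-i customer
    g[:S] = np.minimum(f[1:], g[:S])       # accept if x < S and cheaper
    return g

def T_D(f):
    nxt = np.empty_like(f)
    nxt[1:, :-1] = f[1:, 1:]               # k < N, x > 0: phase k -> k+1
    nxt[1:, -1] = f[:-1, 0]                # k = N: (x, N) -> (x-1, 1)
    nxt[0, :] = f[0, 0]                    # x = 0: stay in (0, 1)
    return mu_p * nxt + (1 - mu_p) * f     # phase completes w.p. mu'_k

def T_unif(f, g):
    return (lam * f + alpha * g) / (lam + alpha)

V = np.zeros((S + 1, N))                   # V_0 = 0
for n in range(200):                       # recursion (1)
    V = T_unif(sum(p[i] * T_CA(V, i) for i in range(J)), T_D(V))
```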
Plan
2. Model description: Policies
Critical level policies

Definition (Critical level policy): A policy is called a critical level policy if, for any fixed phase k and any customer class j, there exists a level t_{k,j} in x, depending on k and j, such that in state (x, k):
- for all 0 ≤ x < t_{k,j} it is optimal to accept any customer of class j,
- for all x ≥ t_{k,j} it is optimal to reject any customer of class j.
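A minimal sketch (with a synthetic rule, not one computed from the model) of how the critical levels t_{k,j} can be read off a 0/1 acceptance rule a[x, k, j]:

```python
# A minimal sketch: recover t_{k,j} as the smallest x at which class j
# is rejected in phase k; the rule below is synthetic, for illustration.
import numpy as np

S, N, J = 5, 3, 2
t_true = np.array([[2, 5], [3, 6], [1, 4]])  # hypothetical levels t_{k,j}
x = np.arange(S + 1)
a = x[:, None, None] < t_true[None, :, :]    # accept class j iff x < t_{k,j}

def critical_levels(a):
    S1, N, J = a.shape
    t = np.empty((N, J), dtype=int)
    for k in range(N):
        for j in range(J):
            rejected = np.flatnonzero(~a[:, k, j])
            t[k, j] = rejected[0] if rejected.size else S1  # never rejected
    return t

print(np.array_equal(critical_levels(a), np.minimum(t_true, S + 1)))  # True
```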
Structural properties of policies

Assume a critical level policy and consider the decision for a fixed customer class j.

Definition (Switching curve): For every k, we define a level t(k) = t_{k,j} such that, in state (x, k), decision 1 is taken if and only if x < t(k), and decision 0 otherwise. The mapping k ↦ t(k) is called a switching curve.

Definition (Monotone switching curve): We say that a decision rule is of the monotone switching curve type if the mapping k ↦ t(k) is monotone.
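Checking whether a given switching curve is monotone is then a one-line test; a minimal sketch:

```python
# A minimal sketch: test whether the switching curve k -> t(k),
# given as a sequence of levels, is monotone (increasing or decreasing).
def is_monotone(t):
    increasing = all(a <= b for a, b in zip(t, t[1:]))
    decreasing = all(a >= b for a, b in zip(t, t[1:]))
    return increasing or decreasing

print(is_monotone([2, 3, 3, 5]))  # True
print(is_monotone([2, 5, 1, 4]))  # False
```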
Example — critical levels, switching curve

Figure: Acceptance points for different customer classes (x — no. of customers in queue, against k — phase). Blue circle — all classes are accepted; green triangle — classes 2 and 3 are accepted; pink square — only class 3 is accepted; red asterisk — rejection of any class.
Properties of value functions

Definition (Convexity): f is convex in x (denoted Convex(x)) if for all y = (x, k):

2 f(x+1, k) ≤ f(x, k) + f(x+2, k).

Definition (Submodularity): f is submodular in x and k (denoted Sub(x, k)) if for all y = (x, k):

f(x+1, k+1) + f(x, k) ≤ f(x+1, k) + f(x, k+1).

Theorem (Th 8.1 [2]): Let a(y) be the optimal decision rule:
i) If f ∈ Convex(x), then a(y) is decreasing in x.
ii) If f ∈ Sub(x, k), then a(y) is increasing in k.
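Both properties are straightforward to verify numerically for a value function stored as an array f[x, k]; a minimal sketch with a hand-picked test function:

```python
# A minimal sketch: numeric checks of Convex(x) and Sub(x, k) for an
# array f[x, k]; the test function f(x, k) = x^2 + k is hypothetical.
import numpy as np

def is_convex_in_x(f):
    # 2 f(x+1, k) <= f(x, k) + f(x+2, k) for all x, k
    return bool(np.all(2 * f[1:-1] <= f[:-2] + f[2:]))

def is_submodular(f):
    # f(x+1, k+1) + f(x, k) <= f(x+1, k) + f(x, k+1) for all x, k
    return bool(np.all(f[1:, 1:] + f[:-1, :-1] <= f[1:, :-1] + f[:-1, 1:]))

f = np.add.outer(np.arange(6) ** 2, np.arange(3))  # f(x,k) = x^2 + k
print(is_convex_in_x(f), is_submodular(f))          # True True
```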
Plan
2. Model description: Results