Games and Policy Iteration
Oliver Friedmann (LMU), Zadeh Lower Bound

Policies and corresponding values
A policy π is a choice of an action from each state. The value val_π(i) of a state i ∈ S for a policy π is the expected sum of rewards obtained when moving according to π, starting from i.
An action is an improving switch w.r.t. π if it improves the values. It suffices to check whether an action is improving for one step w.r.t. the current values.
A policy π* is optimal iff there are no improving switches. Optimal policies simultaneously maximize the values of all states.
[Figure: example MDP with sink t, annotated with state values that change across the slide builds.]
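The one-step check for improving switches can be sketched in a few lines of Python. The MDP below is a hypothetical toy instance, not the one drawn on the slide, and the evaluation routine assumes that every policy reaches the sink t without cycles; both are assumptions made purely for illustration.

```python
from fractions import Fraction

# Hypothetical toy MDP (not the slide's example). Each action is a pair
# (reward, {successor: probability}); "t" is the sink with value 0.
ACTIONS = {
    "a": [(Fraction(3), {"t": Fraction(1)}),
          (Fraction(1), {"b": Fraction(1)})],
    "b": [(Fraction(-1), {"t": Fraction(1)}),
          (Fraction(2), {"c": Fraction(1, 2), "t": Fraction(1, 2)})],
    "c": [(Fraction(6), {"t": Fraction(1)})],
}

def evaluate(policy):
    """Return val_pi for every state (assumes pi reaches t without cycles)."""
    values = {"t": Fraction(0)}

    def val(state):
        if state not in values:
            reward, succ = ACTIONS[state][policy[state]]
            values[state] = reward + sum(p * val(j) for j, p in succ.items())
        return values[state]

    for state in ACTIONS:
        val(state)
    return values

def improving_switches(policy):
    """List (state, action index) pairs that improve for one step."""
    values = evaluate(policy)
    result = []
    for state, acts in ACTIONS.items():
        for k, (reward, succ) in enumerate(acts):
            one_step = reward + sum(p * values[j] for j, p in succ.items())
            if one_step > values[state]:
                result.append((state, k))
    return result

def policy_iteration(policy):
    """Apply one improving switch at a time until none is left."""
    policy = dict(policy)
    while True:
        switches = improving_switches(policy)
        if not switches:
            return policy, evaluate(policy)
        state, k = switches[0]
        policy[state] = k

start = {"a": 0, "b": 0, "c": 0}
optimal_policy, optimal_values = policy_iteration(start)
```

Picking `switches[0]` is an arbitrary choice; which switch to apply when several are improving is exactly what a pivoting rule decides.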
MDPs and linear programming
No improving switches for an optimal policy π*:
$$\forall i \in S:\quad \mathrm{val}_{\pi^*}(i) = \max_{a \in A_i} \Bigl( r_a + \sum_{j \in S} p_{a,j}\, \mathrm{val}_{\pi^*}(j) \Bigr)$$
where A_i is the set of actions from state i, r_a is the expected reward of using action a, and p_{a,j} is the probability of moving to state j when using action a.
This can be used to formulate an LP for solving the MDP:
$$\text{minimize } \sum_{i \in S} v_i \quad \text{s.t.} \quad \forall i \in S\ \forall a \in A_i:\ v_i \ge r_a + \sum_{j \in S} p_{a,j}\, v_j$$
Primal and dual LPs for MDPs
$$\text{minimize } \sum_{i \in S} v_i \quad \text{s.t.} \quad \forall i \in S\ \forall a \in A_i:\ v_i \ge r_a + \sum_{j \in S} p_{a,j}\, v_j$$
$$\text{maximize } \sum_{i \in S} \sum_{a \in A_i} r_a x_a \quad \text{s.t.} \quad \forall i \in S:\ \sum_{a \in A_i} x_a = 1 + \sum_{j \in S} \sum_{a \in A_j} p_{a,i}\, x_a$$
Flow conservation (example from the figure): $x_1 + x_2 = 1 + x_3 + x_4$, satisfied for instance by $x_1 = 1,\ x_2 = 6,\ x_3 = 4,\ x_4 = 2$ and by $x_1 = 7,\ x_2 = 0,\ x_3 = 4,\ x_4 = 2$.
Every basic feasible solution corresponds to a policy π.
Variables of the primal LP
x_a is the expected number of times action a is used, summed over all starting states.
We have:
$$\sum_{i \in S} \mathrm{val}_\pi(i) = \sum_{a \in \pi} r_a\, x^\pi_a$$
[Figure: example MDP with sink t, with x_1 = 0, x_2 = 1, x_3 = 0, x_4 = 2, x_5 = 0, x_6 = 1.]
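The identity between the sum of values and the reward-weighted usage counts can be checked numerically. The sketch below uses a hypothetical three-state policy (not the slide's MDP) and assumes the policy reaches the sink t without cycles, so that pushing one unit of flow from every starting state terminates.

```python
from fractions import Fraction

# Hypothetical fixed policy on a toy MDP (not the slide's example):
# state -> (reward, {successor: probability}); "t" is the sink.
POLICY = {
    "a": (Fraction(1), {"b": Fraction(1)}),
    "b": (Fraction(2), {"c": Fraction(1, 2), "t": Fraction(1, 2)}),
    "c": (Fraction(6), {"t": Fraction(1)}),
}

def usage_counts(policy):
    """x_a: expected number of uses of each state's chosen action,
    summed over all starting states (assumes an acyclic policy)."""
    x = {state: Fraction(0) for state in policy}

    def push(state, mass):
        if state == "t":
            return
        x[state] += mass
        for successor, prob in policy[state][1].items():
            push(successor, mass * prob)

    for start in policy:
        push(start, Fraction(1))  # one unit of flow per starting state
    return x

def value(policy, state):
    """val_pi(state): expected sum of rewards until the sink is reached."""
    if state == "t":
        return Fraction(0)
    reward, succ = policy[state]
    return reward + sum(p * value(policy, j) for j, p in succ.items())

x = usage_counts(POLICY)
total_value = sum(value(POLICY, s) for s in POLICY)
weighted_rewards = sum(POLICY[s][0] * x[s] for s in POLICY)
```

Here `x` also satisfies flow conservation: for example, the count for state c equals 1 plus half of the count for state b.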
From MDP to LP
Each build shows the dictionary for the current basis (policy) next to the example MDP; all variables are constrained by $x_1, x_2, x_3, x_4, x_5, x_6 \ge 0$.

Basis {x_2, x_4, x_6} (x_1 = x_3 = x_5 = 0):
$$\begin{aligned} \max\ & -1 + 2x_1 - 2x_3 - x_5 \\ \text{s.t.}\ & x_2 = 1 - \tfrac{1}{3}x_1 + \tfrac{2}{3}x_3 + \tfrac{2}{3}x_5 \\ & x_4 = 2 - x_3 - x_5 \\ & x_6 = 1 - x_5 \end{aligned}$$

Basis {x_1, x_4, x_6} (x_2 = x_3 = x_5 = 0):
$$\begin{aligned} \max\ & 5 - 6x_2 + 2x_3 + 3x_5 \\ \text{s.t.}\ & x_1 = 3 - 3x_2 + 2x_3 + 2x_5 \\ & x_4 = 2 - x_3 - x_5 \\ & x_6 = 1 - x_5 \end{aligned}$$

Basis {x_1, x_3, x_6} (x_2 = x_4 = x_5 = 0):
$$\begin{aligned} \max\ & 9 - 6x_2 - 2x_4 + x_5 \\ \text{s.t.}\ & x_1 = 7 - 3x_2 - 2x_4 \\ & x_3 = 2 - x_4 - x_5 \\ & x_6 = 1 - x_5 \end{aligned}$$

Basis {x_1, x_3, x_5} (x_2 = x_4 = x_6 = 0):
$$\begin{aligned} \max\ & 10 - 6x_2 - 2x_4 - x_6 \\ \text{s.t.}\ & x_1 = 7 - 3x_2 - 2x_4 \\ & x_3 = 1 - x_4 + x_6 \\ & x_5 = 1 - x_6 \end{aligned}$$
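The sequence of dictionaries on this slide can be reproduced mechanically. The following sketch starts from the first dictionary (basis {x2, x4, x6}) and performs the three pivots x1 for x2, x3 for x4, and x5 for x6; the dict-based representation and the function name `pivot` are illustrative choices, not taken from the talk.

```python
from fractions import Fraction as F

def pivot(objective, rows, entering, leaving):
    """Move `entering` into the basis and `leaving` out of it.

    An expression is a dict mapping the key 1 (constant term) and the
    names of nonbasic variables to their coefficients.
    """
    row = rows[leaving]
    coef = row[entering]
    # Solve the leaving row for the entering variable.
    solved = {k: -v / coef for k, v in row.items() if k != entering}
    solved[leaving] = 1 / coef

    def substitute(expr):
        expr = dict(expr)
        c = expr.pop(entering, F(0))
        for k, v in solved.items():
            expr[k] = expr.get(k, F(0)) + c * v
        # Drop zero coefficients but always keep the constant term.
        return {k: v for k, v in expr.items() if v != 0 or k == 1}

    new_rows = {b: substitute(r) for b, r in rows.items() if b != leaving}
    new_rows[entering] = dict(solved)
    return substitute(objective), new_rows

# First dictionary from the slide, basis {x2, x4, x6}.
objective = {1: F(-1), "x1": F(2), "x3": F(-2), "x5": F(-1)}
rows = {
    "x2": {1: F(1), "x1": F(-1, 3), "x3": F(2, 3), "x5": F(2, 3)},
    "x4": {1: F(2), "x3": F(-1), "x5": F(-1)},
    "x6": {1: F(1), "x5": F(-1)},
}

history = [objective[1]]  # objective value at each basic solution
for entering, leaving in [("x1", "x2"), ("x3", "x4"), ("x5", "x6")]:
    objective, rows = pivot(objective, rows, entering, leaving)
    history.append(objective[1])
```

Each entering variable has a positive coefficient in the objective row at the time it is chosen, which is the LP counterpart of an improving switch.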
Diameter
Question: is it theoretically possible to get by with polynomially many iterations?
Let G be a Markov decision process and n be the number of nodes.
Definition: the diameter of G is the least number of iterations required to solve G.
Small Diameter Theorem: The diameter of G is less than or equal to n.
Lower Bound for Zadeh’s Rule
Lower bound construction
We define a family of lower bound MDPs G_n such that the Least-Entered pivoting rule will simulate an n-bit binary counter.
We make use of exponentially growing rewards (and penalties): to get a higher reward, the MDP is willing to sacrifice everything that has been built up so far.
Notation: an integer priority p corresponds to the reward $(-N)^p$, where $N = 7n + 1$, so priorities are ordered by reward as $\dots < 5 < 3 < 1 < 2 < 4 < 6 < \dots$ For example, a node labelled 5 stands for the reward $(-N)^5$.
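A quick numeric check makes the ordering of priorities concrete. The sketch below evaluates $(-N)^p$ with $N = 7n + 1$ for the illustrative choice n = 1 (so N = 8) and sorts the priorities 1 to 6 by reward; the value of n and the range of priorities are assumptions chosen only to keep the numbers small.

```python
def reward(priority, n):
    """Reward of a priority, following the slide's notation N = 7n + 1."""
    big_n = 7 * n + 1
    return (-big_n) ** priority

n = 1  # illustrative choice, gives N = 8
priorities = range(1, 7)

# Odd priorities are penalties, even priorities are rewards.
by_reward = sorted(priorities, key=lambda p: reward(p, n))

# Each priority outweighs, in absolute value, all smaller ones combined.
dominates = all(
    abs(reward(p, n)) > sum(abs(reward(q, n)) for q in range(1, p))
    for p in priorities
)
```

The `dominates` check is one way to read "sacrifice everything that has been built up so far": a single larger priority decides the comparison regardless of the smaller ones.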
Background
The use of priorities is inspired by parity games.
Friedmann (2009): The strategy iteration algorithm may require exponentially many iterations to solve parity games.
Fearnley (2010): The strategy iteration algorithm may require exponentially many iterations to solve MDPs.
We also first proved a lower bound for parity games and then transferred the result to MDPs and linear programs.
Related game-theoretic settings
[Diagram with an axis running from Abstract to Concrete; LP-type problems is the most abstract setting.]
Left column (∈ NP ∩ coNP): turn-based stochastic games (2½ players), mean payoff games (2 players), parity games (2 players).
Right column (∈ P): linear programming, Markov decision problems (1½ players), deterministic MDPs (1 player).
Zadeh’s pivoting rule
Zadeh’s Least-Entered rule: perform the single improving switch that has been applied least often.
(taken from David Avis’ paper)
Tie-Breaking Rule
Tie-breaking rule = method of selecting a switch in case of a tie (w.r.t. the occurrence record).
The proof of the Small Diameter Theorem implies:
Corollary: There is a tie-breaking rule such that Zadeh’s rule requires only linearly many iterations in the worst case.
Consequence: the lower bound construction is equipped with a particular tie-breaking rule.
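The interplay between the occurrence record and the tie-breaking rule can be sketched as a small selection function. The names `least_entered`, `occurrence`, and `tie_break` are hypothetical, and taking the minimum as the default tie-break is an arbitrary choice for illustration; the lower bound construction relies on its own, specifically designed tie-breaking rule.

```python
from collections import Counter

def least_entered(improving, occurrence, tie_break=min):
    """Pick the improving switch that has been applied least often.

    `occurrence` is the occurrence record (a Counter, so unseen
    switches count as 0); `tie_break` selects among equally rare ones.
    """
    fewest = min(occurrence[s] for s in improving)
    candidates = [s for s in improving if occurrence[s] == fewest]
    choice = tie_break(candidates)
    occurrence[choice] += 1  # record that this switch was applied
    return choice

record = Counter({"e1": 2, "e2": 1})
first = least_entered(["e1", "e2", "e3"], record)   # e3 was never applied
second = least_entered(["e1", "e2"], record)        # e2 is rarer than e1
tied = least_entered(["b", "a"], Counter())         # tie, resolved by min
```

Swapping in a different `tie_break` callable changes which switch is performed on ties without touching the least-entered criterion itself.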
Binary Counting
[Animation: a 4-bit binary counter with one gadget per bit, counting from 0000 to 1111; R marks a bit that is being reset. The occurrence record shows how often each bit has been set.]
Principle: if a bit can be set, then all bits can be set.
Tie-breaking: we decide to set the first bit.
Set the second bit and reset the first bit.
Set the first bit again.
Continue...
When the counter reaches 1111, the occurrence record is 1 2 4 8 (most significant bit first).
Problem: occurrence record unbalanced!
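The unbalanced occurrence record is easy to reproduce with a plain binary counter. This is a minimal sketch that counts only how often each bit is set (resets are not recorded, matching the numbers in the animation); the function name is illustrative.

```python
def count_with_record(n_bits):
    """Count from 0...0 to 1...1 and record how often each bit is set."""
    bits = [0] * n_bits      # bits[0] is the least significant bit
    record = [0] * n_bits    # occurrence record per bit
    while not all(bits):
        k = bits.index(0)    # lowest unset bit
        bits[k] = 1
        record[k] += 1
        for j in range(k):
            bits[j] = 0      # reset all lower bits
    return record

record = count_with_record(4)
# Reverse to read it most significant bit first, as on the slide.
as_on_slide = list(reversed(record))
```

For n bits, the least significant bit is set 2^(n-1) times while the most significant bit is set once, so the gap in the occurrence record grows exponentially.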
Binary Counting (... again!)
Let’s do it again and watch the occurrence record this time!
[Animation: the same 4-bit counter, replayed from 0000.]
Everything okay so far... (up to state 0010, where the occurrence record is 0 0 1 1)
Problem: we have to set one of the higher bits now! The two higher bits have never been set, so the Least-Entered rule prefers them over setting the first bit a second time.
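The point at which plain binary counting stops being compatible with the Least-Entered rule can be located programmatically. The sketch below assumes, following the principle stated on the earlier slide, that every unset bit is an improving switch; the function name and the returned tuple are illustrative.

```python
def first_violation(n_bits):
    """Return (step, state) of the first counting step that sets a bit
    although some other unset bit has been set less often, or None."""
    bits = [0] * n_bits      # bits[0] is the least significant bit
    record = [0] * n_bits    # occurrence record per bit
    step = 0
    while not all(bits):
        step += 1
        k = bits.index(0)    # the bit binary counting wants to set
        # Assumption: every unset bit is an improving switch.
        unset = [j for j in range(n_bits) if bits[j] == 0]
        if record[k] > min(record[j] for j in unset):
            state = "".join(str(b) for b in reversed(bits))
            return step, state
        bits[k] = 1
        record[k] += 1
        for j in range(k):
            bits[j] = 0      # reset all lower bits
    return None

violation = first_violation(4)
```

With four bits the violation occurs at the third step, in state 0010, which is the state where the animation reports the problem.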
Binary Counting with conjunctive bits
[Animation: the 4-bit counter in which every bit gadget has two edges, each with its own occurrence record.]
Replace each gadget by a two-bit, conjunctive structure.
Gadget is set iff both edges are going in.
Set one improving edge of every gadget.
Set the other improving edge of the first gadget.
The other gadgets have updated to their old setting.
Set one improving edge of every gadget again.
Set the other improving edge of the second gadget.
Reset all other gadgets.
Everything okay so far, continue...