
A subexponential lower bound for Zadeh’s pivoting rule for solving linear programs and games

Oliver Friedmann, Department of Computer Science, Ludwig-Maximilians-Universität Munich, Germany


  5. Games and Policy Iteration: Policies and corresponding values
     A policy π is a choice of an action from each state.
     The value val_π(i) of a state i ∈ S for a policy π is the expected sum of rewards obtained when moving according to π, starting from i.
     An action is an improving switch w.r.t. π if it improves the values. It suffices to check whether an action is improving for one step w.r.t. the current values.
     A policy π* is optimal iff there are no improving switches. Optimal policies simultaneously maximize the values of all states.
     [Figure: example MDP with terminal state t, annotated with rewards and the current state values.]
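The one-step test for improving switches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state MDP, its rewards and probabilities, and the names `MDP`, `evaluate` and `improving_switches` are hypothetical and are not the example from the slides. Values are approximated by repeated sweeps, which assumes that every policy reaches the terminal state t with probability 1.

```python
# Hypothetical MDP: state -> {action name: (reward, {successor: probability})}.
# "t" is a terminal state with value 0.
MDP = {
    "a": {"a_to_t": (1.0, {"t": 1.0}), "a_to_b": (0.0, {"b": 1.0})},
    "b": {"b_to_t": (2.0, {"t": 1.0}), "b_mix": (3.0, {"t": 0.5, "a": 0.5})},
}


def evaluate(mdp, policy, sweeps=500):
    """Approximate val_pi by iterating the one-step value equations."""
    val = {s: 0.0 for s in mdp}
    val["t"] = 0.0
    for _ in range(sweeps):
        for s in mdp:
            reward, trans = mdp[s][policy[s]]
            val[s] = reward + sum(p * val[j] for j, p in trans.items())
    return val


def improving_switches(mdp, policy, eps=1e-9):
    """Actions whose one-step appraisal beats the current value of their state."""
    val = evaluate(mdp, policy)
    result = []
    for s, actions in mdp.items():
        for name, (reward, trans) in actions.items():
            appraisal = reward + sum(p * val[j] for j, p in trans.items())
            if name != policy[s] and appraisal > val[s] + eps:
                result.append((s, name))
    return result


start = {"a": "a_to_t", "b": "b_to_t"}
print(improving_switches(MDP, start))
```

In this toy instance both non-policy actions are improving for the starting policy, and the policy obtained by switching both has no improving switches left, so it is optimal in the sense of the slide.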

  7. Games and Policy Iteration: MDPs and linear programming
     No improving switches for an optimal policy π*:
     $$\forall i \in S:\quad \mathrm{val}_{\pi^*}(i) = \max_{a \in A_i} \Big( r_a + \sum_{j \in S} p_{a,j}\, \mathrm{val}_{\pi^*}(j) \Big)$$
     where A_i is the set of actions from state i, r_a is the expected reward of using action a, and p_{a,j} is the probability of moving to state j when using action a.
     This can be used to formulate an LP for solving the MDP:
     $$\text{minimize } \sum_{i \in S} v_i \quad \text{s.t.}\quad \forall i \in S\ \forall a \in A_i:\ v_i \ge r_a + \sum_{j \in S} p_{a,j}\, v_j$$

  10. Games and Policy Iteration: Primal and dual LPs for MDPs
     $$\text{minimize } \sum_{i \in S} v_i \quad \text{s.t.}\quad \forall i \in S\ \forall a \in A_i:\ v_i \ge r_a + \sum_{j \in S} p_{a,j}\, v_j$$
     $$\text{maximize } \sum_{i \in S} \sum_{a \in A_i} r_a x_a \quad \text{s.t.}\quad \forall i \in S:\ \sum_{a \in A_i} x_a = 1 + \sum_{j \in S} \sum_{a \in A_j} p_{a,i}\, x_a$$
     Flow conservation: in the example, the constraint x_1 + x_2 = 1 + x_3 + x_4 is satisfied by x_1 = 1, x_2 = 6, x_3 = 4, x_4 = 2 and also by x_1 = 7, x_2 = 0, x_3 = 4, x_4 = 2.
     Every basic feasible solution corresponds to a policy π.

  12. Games and Policy Iteration: Variables of the primal LP
     x_a is the expected number of times action a is used, summed over all starting states.
     We have:
     $$\sum_{i \in S} \mathrm{val}_{\pi}(i) = \sum_{a \in \pi} r_a\, x_a^{\pi}$$
     [Figure: example MDP with terminal state t, annotated with x_1 = 0, x_2 = 1, x_3 = 0, x_4 = 2, x_5 = 0, x_6 = 1.]

  13. Games and Policy Iteration: From MDP to LP
     Basis {x_2, x_4, x_6}; current solution x_1 = 0, x_2 = 1, x_3 = 0, x_4 = 2, x_5 = 0, x_6 = 1.
     $$\begin{aligned} \max\ & -1 + 2x_1 - 2x_3 - x_5 \\ \text{s.t.}\ & x_2 = 1 - \tfrac{1}{3}x_1 + \tfrac{2}{3}x_3 + \tfrac{2}{3}x_5 \\ & x_4 = 2 - x_3 - x_5 \\ & x_6 = 1 - x_5 \\ & x_1, x_2, x_3, x_4, x_5, x_6 \ge 0 \end{aligned}$$

  14. Games and Policy Iteration: From MDP to LP
     Basis {x_1, x_4, x_6}; current solution x_1 = 3, x_2 = 0, x_3 = 0, x_4 = 2, x_5 = 0, x_6 = 1.
     $$\begin{aligned} \max\ & 5 - 6x_2 + 2x_3 + 3x_5 \\ \text{s.t.}\ & x_1 = 3 - 3x_2 + 2x_3 + 2x_5 \\ & x_4 = 2 - x_3 - x_5 \\ & x_6 = 1 - x_5 \\ & x_1, x_2, x_3, x_4, x_5, x_6 \ge 0 \end{aligned}$$

  15. Games and Policy Iteration: From MDP to LP
     Basis {x_1, x_3, x_6}; current solution x_1 = 7, x_2 = 0, x_3 = 2, x_4 = 0, x_5 = 0, x_6 = 1.
     $$\begin{aligned} \max\ & 9 - 6x_2 - 2x_4 + x_5 \\ \text{s.t.}\ & x_1 = 7 - 3x_2 - 2x_4 \\ & x_3 = 2 - x_4 - x_5 \\ & x_6 = 1 - x_5 \\ & x_1, x_2, x_3, x_4, x_5, x_6 \ge 0 \end{aligned}$$

  16. Games and Policy Iteration: From MDP to LP
     Basis {x_1, x_3, x_5}; current solution x_1 = 7, x_2 = 0, x_3 = 1, x_4 = 0, x_5 = 1, x_6 = 0.
     $$\begin{aligned} \max\ & 10 - 6x_2 - 2x_4 - x_6 \\ \text{s.t.}\ & x_1 = 7 - 3x_2 - 2x_4 \\ & x_3 = 1 - x_4 + x_6 \\ & x_5 = 1 - x_6 \\ & x_1, x_2, x_3, x_4, x_5, x_6 \ge 0 \end{aligned}$$
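The pivots on slides 13 to 16 can be replayed mechanically. The following is a minimal sketch, not the author's code: the dictionary representation and the helper name `pivot` are assumptions made for illustration, while the coefficients are those of the slide-13 dictionary. Each pivot solves the leaving variable's row for the entering variable and substitutes the result everywhere else.

```python
from fractions import Fraction as F


def pivot(obj, rows, enter, leave):
    """Exchange `enter` (nonbasic) and `leave` (basic); return new (obj, rows).

    An expression is a pair (constant, {nonbasic variable: coefficient}).
    """
    const, coeffs = rows[leave]
    c = coeffs[enter]
    # Solve the leaving row for the entering variable.
    sol_const = -const / c
    sol = {v: -k / c for v, k in coeffs.items() if v != enter}
    sol[leave] = 1 / c

    def substitute(expr):
        k0, cs = expr
        e = cs.get(enter, F(0))
        out = {v: k for v, k in cs.items() if v != enter}
        for v, k in sol.items():
            out[v] = out.get(v, F(0)) + e * k
        # Drop variables whose coefficient cancelled to zero.
        return k0 + e * sol_const, {v: k for v, k in out.items() if k != 0}

    new_rows = {b: substitute(r) for b, r in rows.items() if b != leave}
    new_rows[enter] = (sol_const, sol)
    return substitute(obj), new_rows


# Dictionary from slide 13, basis {x2, x4, x6}.
obj = (F(-1), {"x1": F(2), "x3": F(-2), "x5": F(-1)})
rows = {
    "x2": (F(1), {"x1": F(-1, 3), "x3": F(2, 3), "x5": F(2, 3)}),
    "x4": (F(2), {"x3": F(-1), "x5": F(-1)}),
    "x6": (F(1), {"x5": F(-1)}),
}

values = []
# Entering/leaving pairs as on the slides: x1 for x2, x3 for x4, x5 for x6.
for enter, leave in [("x1", "x2"), ("x3", "x4"), ("x5", "x6")]:
    obj, rows = pivot(obj, rows, enter, leave)
    values.append(obj[0])

print(values)  # objective constants after each pivot
print(obj)     # final objective row
```

The objective constants after the three pivots are 5, 9 and 10, and the final objective row has only negative coefficients, which matches the dictionary on slide 16.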

  17. Games and Policy Iteration: Diameter
     Question: is it theoretically possible to have polynomially many iterations?
     Let G be a Markov decision process and n be the number of nodes.
     Definition: the diameter of G is the least number of iterations required to solve G.
     Small Diameter Theorem: The diameter of G is less than or equal to n.

  18. Lower Bound for Zadeh’s Rule

  21. Lower Bound for Zadeh’s Rule: Lower bound construction
     We define a family of lower bound MDPs G_n such that the Least-Entered pivoting rule will simulate an n-bit binary counter.
     We make use of exponentially growing rewards (and penalties): to get a higher reward, the MDP is willing to sacrifice everything that has been built up so far.
     Notation: integer priority p corresponds to reward (−N)^p, where N = 7n + 1. This orders the priorities by reward as ... < 5 < 3 < 1 < 2 < 4 < 6 < ...; for example, a node labeled 5 stands for reward (−N)^5.
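The priority notation can be checked numerically. The snippet below is a small sketch; the function names `reward` and `priorities_by_reward` are made up for this illustration, and only the formula (−N)^p with N = 7n + 1 comes from the slide.

```python
def reward(priority, n):
    """Reward encoded by an integer priority: (-N)**priority with N = 7n + 1."""
    N = 7 * n + 1
    return (-N) ** priority


def priorities_by_reward(priorities, n):
    """Sort priorities from the lowest to the highest reward."""
    return sorted(priorities, key=lambda p: reward(p, n))


print(priorities_by_reward(range(1, 7), n=3))
```

Odd priorities give penalties that grow with the priority, even priorities give rewards that grow with the priority, so the sorted order for priorities 1 to 6 is 5, 3, 1, 2, 4, 6.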

  24. Lower Bound for Zadeh’s Rule: Background
     The use of priorities is inspired by parity games.
     Friedmann (2009): The strategy iteration algorithm may require exponentially many iterations to solve parity games.
     Fearnley (2010): The strategy iteration algorithm may require exponentially many iterations to solve MDPs.
     We also first proved a lower bound for parity games and then transferred the result to MDPs and linear programs.

  26. Lower Bound for Zadeh’s Rule: Related game-theoretic settings
     LP-type problems
     Abstract (∈ NP ∩ coNP)                       Concrete (∈ P)
     Turn-based stochastic games (2½ players)     Linear programming
     Mean payoff games (2 players)                Markov decision problems (1½ players)
     Parity games (2 players)                     Deterministic MDPs (1 player)

  27. Lower Bound for Zadeh’s Rule: Zadeh’s pivoting rule
     Zadeh’s Least-Entered rule: Perform the single improving switch that has been applied least often.
     (taken from David Avis’ paper)
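A minimal sketch of the selection step, assuming the occurrence record is kept as a counter of how often each switch has been applied. The function name `least_entered`, the switch labels, and the default tie-breaker (`min`, i.e. the smallest label) are illustrative assumptions; the rule itself leaves tie-breaking open.

```python
from collections import Counter


def least_entered(improving, occurrence, tie_break=min):
    """Pick an improving switch that has been applied least often so far."""
    fewest = min(occurrence[s] for s in improving)
    candidates = [s for s in improving if occurrence[s] == fewest]
    # Several switches may share the lowest count; the tie-breaking rule decides.
    return tie_break(candidates)


occurrence = Counter({"e1": 2, "e2": 1, "e4": 1})
chosen = least_entered(["e1", "e2", "e3"], occurrence)  # e3 was never applied
occurrence[chosen] += 1                                 # update the occurrence record
print(chosen, dict(occurrence))
```

Passing a different `tie_break` function changes which of the tied switches is performed, and that freedom is what a lower bound construction has to pin down.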

  28. Lower Bound for Zadeh’s Rule: Tie-Breaking Rule
     Tie-breaking rule = method of selecting a switch in case of a tie (w.r.t. the occurrence record).
     The proof of the Small Diameter Theorem implies:
     Corollary: There is a tie-breaking rule s.t. Zadeh’s rule requires only linearly many iterations in the worst case.
     Consequence: the lower bound construction is equipped with a particular tie-breaking rule.

  52. Lower Bound for Zadeh’s Rule: Binary Counting
     Principle: If a bit can be set, then all bits can be set.
     Tie-breaking: We decide to set the first bit.
     Set the second bit and reset the first bit.
     Set the first bit again.
     Continue...
     Problem: Occurrence record unbalanced!
     [Animation: a 4-bit counter stepping from 0000 to 1111, with R marking a bit that is being reset. The occurrence record, listed from the highest bit to the lowest, grows from 0 0 0 0 to 1 2 4 8.]
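The imbalance can be reproduced by counting how often each bit is set while an n-bit counter runs from 0 to 2^n − 1. This is an illustrative sketch only; the function name `count_with_record` and the list layout (index 0 is the highest bit) are assumptions, not part of the construction.

```python
def count_with_record(n):
    """Run an n-bit counter from 0...0 to 1...1 and record how often each bit is set."""
    bits = [0] * n    # bits[0] is the highest bit, bits[n - 1] the lowest
    record = [0] * n  # occurrence record: number of times each bit was set
    for _ in range(2 ** n - 1):
        # An increment sets the lowest bit that is currently 0 ...
        k = max(i for i in range(n) if bits[i] == 0)
        bits[k] = 1
        record[k] += 1
        # ... and resets all bits below it.
        for i in range(k + 1, n):
            bits[i] = 0
    return bits, record


print(count_with_record(4))
```

For n = 4 the record ends at 1, 2, 4, 8: the lowest bit is set eight times as often as the highest one, which is the imbalance that conflicts with a rule preferring the least-used switch.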

  57. Lower Bound for Zadeh’s Rule: Binary Counting (... again!)
     Let’s do it again - watch the occurrence record this time!
     Everything okay so far...
     Problem: We have to set one of the higher bits now!
     [Animation: the same 4-bit counter, stopped at state 0010 with occurrence record 0 0 1 1.]

  68. Lower Bound for Zadeh’s Rule: Binary Counting with conjunctive bits
     Replace gadget by two-bit, conjunctive structure.
     Gadget is set iff both edges are going in.
     Set one improving edge of every gadget.
     Set other improving edge of first gadget.
     Other gadgets have updated to their old setting.
     Set one improving edge of every gadget again.
     Set other improving edge of second gadget.
     Reset all other gadgets.
     Everything okay so far, continue...
     [Animation: the 4-bit counter in which every bit gadget has two edges; the frames track the edge settings and their occurrence records.]
