
CSC304 Lecture 6 Game Theory: Minimax Theorem via Expert Learning



  1. CSC304 Lecture 6 Game Theory: Minimax Theorem via Expert Learning

  2. 2-Player Zero-Sum Games
  • Reward of P2 = − Reward of P1
    ➢ Matrix A s.t. A_{i,j} is the reward to P1 when P1 chooses her i-th action and P2 chooses her j-th action
    ➢ Mixed strategy profile (x_1, x_2) → reward to P1 is x_1^T A x_2
  • Minimax Theorem: For all A,
        max_{x_1} min_{x_2} x_1^T A x_2 = min_{x_2} max_{x_1} x_1^T A x_2
    ➢ Proof through online expert learning!
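  A small worked example (not on the slides) to make the statement concrete: Matching Pennies, where P1 gets +1 if the two chosen actions match and −1 otherwise.

      A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
      \qquad
      \max_{x_1} \min_{x_2} x_1^T A x_2 \;=\; \min_{x_2} \max_{x_1} x_1^T A x_2 \;=\; 0,

  attained at x_1 = x_2 = (1/2, 1/2). Restricted to pure strategies, \max_i \min_j A_{i,j} = -1 while \min_j \max_i A_{i,j} = 1, so mixed strategies are essential to the theorem.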

  3. Online Expert Learning
  • Setup:
    ➢ On each day, we want to predict whether a stock price will go up or down
    ➢ n experts provide their predictions every day
      o Each expert says either "up" or "down"
    ➢ Based on their advice, we make a final prediction
    ➢ At the end of the day, we learn whether our prediction was correct (reward = 1) or wrong (reward = 0)
  • Goal:
    ➢ Do almost as well as the best expert in hindsight!

  4. Online Expert Learning
  • Notation:
    ➢ n = #experts
    ➢ Predictions and ground truth: 1 or 0
    ➢ m_i^(T) = #mistakes of expert i in the first T steps
    ➢ M^(T) = #mistakes of the algorithm in the first T steps
  • Simplest idea:
    ➢ Keep a weight for each expert
    ➢ Use a weighted majority of the experts to make the prediction
    ➢ Decrease the weight of an expert whenever the expert makes a mistake

  5. Online Expert Learning
  • Weighted Majority:
    ➢ Fix ε ≤ 1/2. Start with w_i^(1) = 1 for every expert i.
    ➢ In time step t, predict 1 if the total weight of experts predicting 1 is larger than the total weight of experts predicting 0, and vice versa.
    ➢ At the end of time step t, set w_i^(t+1) ← w_i^(t) · (1 − ε) for every expert i that made a mistake.
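  A minimal Python sketch of the Weighted Majority rule above. The function name, input format (per-step lists of 0/1 expert predictions plus the revealed outcome), and the default ε are illustrative choices, not from the slides.

      def weighted_majority(predictions, outcomes, eps=0.5):
          """Deterministic Weighted Majority (sketch).

          predictions[t][i] in {0, 1} is expert i's prediction at step t,
          outcomes[t] in {0, 1} is the truth revealed at the end of step t.
          Returns the number of mistakes the algorithm makes.
          """
          n = len(predictions[0])          # number of experts
          w = [1.0] * n                    # w_i^(1) = 1
          mistakes = 0
          for preds, truth in zip(predictions, outcomes):
              # Predict with the side holding more total weight.
              weight_one = sum(w[i] for i in range(n) if preds[i] == 1)
              weight_zero = sum(w[i] for i in range(n) if preds[i] == 0)
              guess = 1 if weight_one > weight_zero else 0
              if guess != truth:
                  mistakes += 1
              # Penalize every expert that was wrong: w_i <- w_i * (1 - eps).
              for i in range(n):
                  if preds[i] != truth:
                      w[i] *= (1.0 - eps)
          return mistakes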

  6. Online Expert Learning
  • Theorem: For every i and T,
        M^(T) ≤ 2(1 + ε) · m_i^(T) + 2 ln n / ε
  • Proof:
    ➢ Consider the "potential function" Φ^(t) = Σ_i w_i^(t).
    ➢ If the algorithm makes a mistake in round t, at least half of the weight decreases by a factor of 1 − ε:
        Φ^(t+1) ≤ Φ^(t) · (1/2 + (1/2)(1 − ε)) = Φ^(t) · (1 − ε/2)

  7. Online Expert Learning
  • Theorem: For every i and T,
        M^(T) ≤ 2(1 + ε) · m_i^(T) + 2 ln n / ε
  • Proof (continued):
    ➢ Φ^(1) = n
    ➢ Thus: Φ^(T+1) ≤ n · (1 − ε/2)^{M^(T)}
    ➢ Weight of expert i: w_i^(T+1) = (1 − ε)^{m_i^(T)}
    ➢ Use Φ^(T+1) ≥ w_i^(T+1) and −ln(1 − ε) ≤ ε + ε² (as ε ≤ 1/2).
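  Filling in the algebra behind the last two bullets (not spelled out on the slide):

      (1-\varepsilon)^{m_i^{(T)}} = w_i^{(T+1)} \le \Phi^{(T+1)} \le n\,(1-\varepsilon/2)^{M^{(T)}}

  Taking logarithms and using -\ln(1-\varepsilon/2) \ge \varepsilon/2 as well as -\ln(1-\varepsilon) \le \varepsilon + \varepsilon^2:

      \frac{\varepsilon}{2}\, M^{(T)} \;\le\; -M^{(T)} \ln(1-\varepsilon/2) \;\le\; \ln n + m_i^{(T)} \bigl(-\ln(1-\varepsilon)\bigr) \;\le\; \ln n + (\varepsilon+\varepsilon^2)\, m_i^{(T)}

  which rearranges to M^{(T)} \le 2(1+\varepsilon)\, m_i^{(T)} + 2\ln n / \varepsilon, the stated bound.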

  8. Online Expert Learning
  • Beautiful!
    ➢ Comparison is to the best expert in hindsight.
    ➢ At most (roughly) twice as many mistakes, plus a small additive term.
    ➢ Holds in the worst case over how the experts make mistakes
      o No statistical assumptions.
    ➢ Simple policy to implement.
  • It can be shown that this bound is tight for any deterministic algorithm.

  9. Randomized Weighted Majority
  • Randomization ⇒ beat the factor of 2
  • Simple change:
    ➢ At the beginning of round t, let
      o Φ_1^(t) = total weight of experts predicting 1
      o Φ_0^(t) = total weight of experts predicting 0
    ➢ Deterministic: predict 1 if Φ_1^(t) > Φ_0^(t), 0 otherwise.
    ➢ Randomized: predict 1 with probability Φ_1^(t) / (Φ_1^(t) + Φ_0^(t)), 0 with the remaining probability.

  10. Randomized Weighted Majority
  • Equivalently:
    ➢ "Pick an expert with probability proportional to weight, and go with their prediction"
    ➢ Pr[picking expert i in step t] = p_i^(t) = w_i^(t) / Φ^(t)
  • Let c_i^(t) = 1 if expert i makes a mistake in step t, and 0 otherwise.
  • The algorithm makes a mistake in round t with probability
        Σ_i p_i^(t) · c_i^(t) = p^(t) · c^(t)
  • E[#mistakes after T rounds] = Σ_{t=1}^{T} p^(t) · c^(t)
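  The sampling view lends itself to a short Python sketch; the helper name, input format, and default ε are illustrative assumptions, not from the slides.

      import random

      def randomized_weighted_majority(predictions, outcomes, eps=0.1, rng=random):
          """Randomized Weighted Majority (sketch): follow a random expert,
          chosen with probability proportional to its current weight.

          predictions[t][i] in {0, 1}, outcomes[t] in {0, 1}; returns #mistakes.
          """
          n = len(predictions[0])
          w = [1.0] * n                    # w_i^(1) = 1
          mistakes = 0
          for preds, truth in zip(predictions, outcomes):
              # p_i^(t) = w_i^(t) / Phi^(t): sample an expert proportional to weight.
              i = rng.choices(range(n), weights=w, k=1)[0]
              if preds[i] != truth:
                  mistakes += 1
              # Multiplicative update: penalize every expert that was wrong.
              for j in range(n):
                  if preds[j] != truth:
                      w[j] *= (1.0 - eps)
          return mistakes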

  11. Randomized Weighted Majority
  • Φ^(t+1) = Σ_i w_i^(t+1) = Σ_i w_i^(t) · (1 − ε c_i^(t))
            = Φ^(t) − ε Φ^(t) Σ_i p_i^(t) c_i^(t)
            = Φ^(t) · (1 − ε p^(t) · c^(t))
            ≤ Φ^(t) · exp(−ε p^(t) · c^(t))
  • Applying iteratively: Φ^(T+1) ≤ n · exp(−ε · E[#mistakes])
  • But Φ^(T+1) ≥ w_i^(T+1) = (1 − ε)^{m_i^(T)}
  • QED!
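  The final step hidden in the "QED!": combining the two bounds on Φ^(T+1),

      (1-\varepsilon)^{m_i^{(T)}} \;\le\; n \cdot e^{-\varepsilon\, \mathbb{E}[M^{(T)}]}

  and taking logarithms with -\ln(1-\varepsilon) \le \varepsilon + \varepsilon^2 gives

      \varepsilon\, \mathbb{E}[M^{(T)}] \;\le\; \ln n + (\varepsilon + \varepsilon^2)\, m_i^{(T)}
      \quad\Rightarrow\quad
      \mathbb{E}[M^{(T)}] \;\le\; (1+\varepsilon)\, m_i^{(T)} + \frac{\ln n}{\varepsilon},

  which implies the bound stated on the next slide.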

  12. Randomized Weighted Majority
  • Theorem: For every i and T, the expected number of mistakes of randomized weighted majority in the first T rounds is
        M^(T) ≤ (1 + ε) · m_i^(T) + 2 ln n / ε
  • Setting ε = √(ln n / T):  M^(T) ≤ m_i^(T) + O(√(T · ln n))
  • We say that the algorithm has O(√(T · ln n)) regret
    ➢ Sublinear regret in T
    ➢ Regret per round → 0 as T → ∞
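  A quick check of the choice ε = √(ln n / T), using m_i^(T) ≤ T (this substitution is not spelled out on the slide):

      (1+\varepsilon)\, m_i^{(T)} + \frac{2\ln n}{\varepsilon}
      \;\le\; m_i^{(T)} + \varepsilon T + \frac{2\ln n}{\varepsilon}
      \;=\; m_i^{(T)} + \sqrt{T \ln n} + 2\sqrt{T \ln n}
      \;=\; m_i^{(T)} + O\bigl(\sqrt{T \ln n}\bigr).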

  13. How is this related to the minimax theorem?!!

  14. Minimax via Regret Learning
  • Recall:
        V_R = max_{x_1} min_{x_2} x_1^T A x_2
        V_C = min_{x_2} max_{x_1} x_1^T A x_2
  • Row player's guarantee: my reward ≥ V_R
  • Column player's guarantee: row player's reward ≤ V_C
  • Hence, V_R ≤ V_C (trivial direction)
  • To prove: V_R = V_C
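  The "trivial direction" in symbols: if x_1^* attains the max in V_R and x_2^* attains the min in V_C, then

      V_R = \min_{x_2} (x_1^*)^T A\, x_2 \;\le\; (x_1^*)^T A\, x_2^* \;\le\; \max_{x_1} x_1^T A\, x_2^* = V_C.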

  15. Minimax via Regret Learning
  • Scale the values in A to be in [0, 1].
    ➢ Without loss of generality.
  • Suppose for contradiction that V_R = V_C − δ for some δ > 0.
  • Suppose the row player R uses randomized weighted majority (experts = the row player's actions).
    ➢ In each round, the column player C responds by choosing the action that minimizes the row player's expected reward.

  16. Minimax via Regret Learning
  • After T iterations, the row player's total reward W satisfies:
    ➢ W ≤ T · V_R
    ➢ W ≥ (reward of the best action in hindsight) − O(√(T · ln n))
      o Reward of the best action in hindsight ≥ T · V_C. Why?
      o Suppose the column player plays action j_t in round t.
      o This is equivalent to her playing the mixed strategy q in every round, where q picks t ∈ {1, …, T} uniformly at random and plays j_t.
      o By the definition of V_C, q cannot ensure that the row player's reward is less than V_C.
      o So some row action has E[reward] at least V_C against q; its total reward against the actual sequence j_1, …, j_T equals T times that, i.e., at least T · V_C.

  17. Minimax via Regret Learning
  • After T iterations, the row player's total reward W satisfies:
    ➢ W ≤ T · V_R
    ➢ W ≥ T · V_C − O(√(T · ln n))
    ➢ So T · V_R = T · (V_C − δ) ≥ T · V_C − O(√(T · ln n))
    ➢ ⇒ δ · T ≤ O(√(T · ln n))
    ➢ Contradiction for sufficiently large T.
  • QED!
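  A small simulation sketch (not from the slides) of the dynamics used in the proof: the row player runs randomized weighted majority over her actions, the column player best-responds every round, and the row player's average reward approaches the value of the game. All names and the example matrix are my own choices.

      import math

      def rwm_vs_best_response(A, T=10000):
          """Row player runs randomized weighted majority over the rows of A
          (entries in [0, 1]); the column player picks, each round, the column
          minimizing the row player's expected reward. Returns the row player's
          average expected reward per round."""
          n_rows, n_cols = len(A), len(A[0])
          eps = math.sqrt(math.log(n_rows) / T)   # learning rate from the regret bound
          w = [1.0] * n_rows
          total = 0.0
          for _ in range(T):
              phi = sum(w)
              p = [wi / phi for wi in w]           # row player's mixed strategy
              # Column player best-responds to p.
              col_values = [sum(p[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)]
              j = min(range(n_cols), key=lambda k: col_values[k])
              total += col_values[j]
              # Multiplicative update with loss_i = 1 - A[i][j] in [0, 1].
              for i in range(n_rows):
                  w[i] *= (1.0 - eps * (1.0 - A[i][j]))
          return total / T

      # Matching Pennies with payoffs rescaled to [0, 1]; the value of the game is 0.5.
      A = [[1.0, 0.0],
           [0.0, 1.0]]
      print(rwm_vs_best_response(A))   # approaches 0.5 as T grows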

  18. Yao's Minimax Principle
  • Goal:
    ➢ Provide a lower bound on the expected running time that any randomized algorithm for a problem can achieve in the worst case over problem instances
  • Note:
    ➢ The expectation (of the running time) is over the randomization of the algorithm
    ➢ The problem instance (worst case) is chosen to maximize this expected running time

  19. Yao's Minimax Principle
  • Notation:
    ➢ Capital letters for "randomized", lowercase for "deterministic"
    ➢ d : a deterministic algorithm
    ➢ R : a randomized algorithm
    ➢ p : a problem instance
    ➢ P : a distribution over problem instances
    ➢ T : running time
  • We are interested in min_R max_p T(R, p)

  20. Yao's Minimax Principle
  [Figure: a table with deterministic algorithms along one axis, problem instances along the other, and running times as the entries]

  21. Yao's Minimax Principle
  • Minimax Theorem:
        min_R max_p T(R, p) = max_P min_d T(d, P)
  • So:
    ➢ To lower bound the E[running time] of any randomized algorithm R on its worst-case instance p by a quantity Q …
    ➢ Choose a distribution P over problem instances, and show that every deterministic algorithm d has expected running time at least Q on problems drawn from P
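  In symbols, the direction of the equality that lower bounds actually use: for any fixed distribution P over instances,

      \min_R \max_p \, T(R, p) \;\ge\; \min_R \, \mathbb{E}_{p \sim P}[\,T(R, p)\,] \;\ge\; \min_d \, \mathbb{E}_{p \sim P}[\,T(d, p)\,],

  so any lower bound Q on the right-hand side is a lower bound on the worst-case expected running time of every randomized algorithm.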
