CSC304 Lecture 6
Game Theory: Minimax Theorem via Expert Learning
2-Player Zero-Sum Games
• Reward of P2 = − Reward of P1
• Matrix $B$ s.t. $B(i, j)$ is the reward to P1 when P1 chooses her $i$-th action and P2 chooses her $j$-th action
• Mixed strategy profile $(y_1, y_2)$ → reward to P1 is $y_1^T B \, y_2$
• Minimax Theorem: For all $B$,
  $$\max_{y_1} \min_{y_2} \; y_1^T B \, y_2 \;=\; \min_{y_2} \max_{y_1} \; y_1^T B \, y_2$$
• Proof through online expert learning!
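(Aside, not from the slides: either side of the minimax identity can be computed with a linear program, which gives a quick numerical sanity check. A minimal sketch assuming numpy and scipy are available; the function name `zero_sum_value` and the matching-pennies example are illustrative.)

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(B):
    """Row player's maximin value: max over y1 of min_j (y1^T B)_j, via an LP."""
    n, m = B.shape
    # Variables: the mixed strategy y1 (n entries) and the guaranteed value v.
    c = np.zeros(n + 1)
    c[-1] = -1.0                                 # maximize v <=> minimize -v
    # For every column j:  v - sum_i B[i, j] * y1[i] <= 0.
    A_ub = np.hstack([-B.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # y1 must be a probability distribution.
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Matching pennies: optimal strategy (1/2, 1/2), value 0.
B = np.array([[1.0, -1.0], [-1.0, 1.0]])
y1, v = zero_sum_value(B)
print(y1, v)   # ~[0.5 0.5], ~0.0
```

By the minimax theorem, swapping the roles of the two players yields the same value.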
Online Expert Learning
• Setup:
  • On each day, we want to predict whether a stock price will go up or down
  • $n$ experts provide their predictions every day
    o Each expert says either up or down
  • Based on their advice, we make a final prediction
  • At the end of the day, we learn whether our prediction was correct (reward = 1) or wrong (reward = 0)
• Goal:
  • Do almost as well as the best expert in hindsight!
Online Expert Learning
• Notation:
  • $n$ = #experts
  • Predictions and ground truth: 1 or 0
  • $m_i^{(T)}$ = #mistakes of expert $i$ in the first $T$ steps
  • $M^{(T)}$ = #mistakes of the algorithm in the first $T$ steps
• Simplest idea:
  • Keep a weight for each expert
  • Use a weighted majority of the experts to make the prediction
  • Decrease the weight of an expert whenever the expert makes a mistake
Online Expert Learning
• Weighted Majority:
  • Fix $\epsilon \le 1/2$.
  • Start with $x_i^{(1)} = 1$ for every expert $i$.
  • In time step $t$, predict 1 if the total weight of experts predicting 1 is larger than the total weight of experts predicting 0, and vice versa.
  • At the end of time step $t$, set $x_i^{(t+1)} \leftarrow x_i^{(t)} \cdot (1 - \epsilon)$ for every expert $i$ that made a mistake.
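(A minimal sketch of this procedure, not from the slides; the function name and the list-based input format are assumptions for illustration.)

```python
def weighted_majority(expert_preds, truths, eps=0.5):
    """Deterministic Weighted Majority with penalty factor (1 - eps), eps <= 1/2.

    expert_preds[t][i] is expert i's 0/1 prediction in round t;
    truths[t] is the true 0/1 outcome.  Returns the number of mistakes made.
    """
    n = len(expert_preds[0])
    x = [1.0] * n                                # x_i^(1) = 1 for every expert
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        weight_one = sum(x[i] for i in range(n) if preds[i] == 1)
        weight_zero = sum(x[i] for i in range(n) if preds[i] == 0)
        guess = 1 if weight_one > weight_zero else 0
        mistakes += (guess != truth)
        for i in range(n):                       # penalize the experts that erred
            if preds[i] != truth:
                x[i] *= (1 - eps)
    return mistakes
```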
Online Expert Learning
• Theorem: For every expert $i$ and time $T$,
  $$M^{(T)} \le 2 (1 + \epsilon) \, m_i^{(T)} + \frac{2 \ln n}{\epsilon}$$
• Proof:
  • Consider a "potential function" $\Phi^{(t)} = \sum_i x_i^{(t)}$.
  • If the algorithm makes a mistake in round $t$, at least half of the weight decreases by a factor of $1 - \epsilon$:
    $$\Phi^{(t+1)} \le \Phi^{(t)} \left( \frac{1}{2} + \frac{1}{2} (1 - \epsilon) \right) = \Phi^{(t)} \left( 1 - \frac{\epsilon}{2} \right)$$
Online Expert Learning
• Theorem: For every expert $i$ and time $T$,
  $$M^{(T)} \le 2 (1 + \epsilon) \, m_i^{(T)} + \frac{2 \ln n}{\epsilon}$$
• Proof (contd.):
  • $\Phi^{(1)} = n$
  • Thus: $\Phi^{(T+1)} \le n \left( 1 - \frac{\epsilon}{2} \right)^{M^{(T)}}$.
  • Weight of expert $i$: $x_i^{(T+1)} = (1 - \epsilon)^{m_i^{(T)}}$
  • Use $\Phi^{(T+1)} \ge x_i^{(T+1)}$ and $-\ln(1 - \epsilon) \le \epsilon + \epsilon^2$ (as $\epsilon \le 1/2$).
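(Filling in the omitted algebra: chaining the two bounds on $\Phi^{(T+1)}$ and taking logarithms,)

```latex
(1 - \epsilon)^{m_i^{(T)}} = x_i^{(T+1)} \le \Phi^{(T+1)} \le n \left( 1 - \tfrac{\epsilon}{2} \right)^{M^{(T)}}
\;\Longrightarrow\;
\frac{\epsilon}{2} \, M^{(T)} \;\le\; -M^{(T)} \ln\!\left( 1 - \tfrac{\epsilon}{2} \right) \;\le\; \ln n + m_i^{(T)} \left( \epsilon + \epsilon^2 \right),
```

and dividing through by $\epsilon / 2$ gives the theorem.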
Online Expert Learning
• Beautiful!
  • Comparison to the best expert in hindsight
    o At most (roughly) twice as many mistakes + a small additive term
  • In the worst case over how the experts make mistakes
    o No statistical assumptions.
  • Simple policy to implement.
• It can be shown that this bound is tight for any deterministic algorithm.
Randomized Weighted Majority
• Randomization → beat the factor of 2
• Simple change:
  • At the beginning of round $t$, let
    o $\Phi_1^{(t)}$ = total weight of experts predicting 1
    o $\Phi_0^{(t)}$ = total weight of experts predicting 0
  • Deterministic: predict 1 if $\Phi_1^{(t)} > \Phi_0^{(t)}$, 0 otherwise.
  • Randomized: predict 1 with probability $\frac{\Phi_1^{(t)}}{\Phi_1^{(t)} + \Phi_0^{(t)}}$, 0 with the remaining probability.
Randomized Weighted Majority
• Equivalently:
  • "Pick an expert with probability proportional to weight, and go with their prediction"
  • $\Pr[\text{picking expert } i \text{ in step } t] = p_i^{(t)} = \frac{x_i^{(t)}}{\Phi^{(t)}}$
  • Let $m_i^{(t)} = 1$ if expert $i$ makes a mistake in step $t$, and 0 otherwise.
  • The algorithm makes a mistake in round $t$ with probability $\sum_i p_i^{(t)} m_i^{(t)} = p^{(t)} \cdot m^{(t)}$
  • $E[\#\text{mistakes after } T \text{ rounds}] = \sum_{t=1}^{T} p^{(t)} \cdot m^{(t)}$
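(A minimal sketch of this sampling view, not from the slides; the function name and input format are assumptions, matching the deterministic sketch above.)

```python
import random

def randomized_weighted_majority(expert_preds, truths, eps):
    """Follow a random expert, chosen with probability proportional to weight."""
    n = len(expert_preds[0])
    x = [1.0] * n
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        # Pr[pick expert i] = p_i = x_i / Phi, where Phi is the total weight.
        i = random.choices(range(n), weights=x)[0]
        mistakes += (preds[i] != truth)
        for j in range(n):                       # same update as before
            if preds[j] != truth:
                x[j] *= (1 - eps)
    return mistakes
```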
Randomized Weighted Majority
• $$\Phi^{(t+1)} = \sum_i x_i^{(t+1)} = \sum_i x_i^{(t)} \left( 1 - \epsilon \, m_i^{(t)} \right) = \Phi^{(t)} - \epsilon \, \Phi^{(t)} \sum_i p_i^{(t)} m_i^{(t)} = \Phi^{(t)} \left( 1 - \epsilon \, p^{(t)} \cdot m^{(t)} \right) \le \Phi^{(t)} \exp\left( -\epsilon \, p^{(t)} \cdot m^{(t)} \right)$$
• Applying iteratively: $\Phi^{(T+1)} \le n \cdot \exp\left( -\epsilon \cdot E[\#\text{mistakes}] \right)$
• But $\Phi^{(T+1)} \ge x_i^{(T+1)} = (1 - \epsilon)^{m_i^{(T)}}$
• QED!
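(The last two bounds combine, after taking logarithms and using $-\ln(1 - \epsilon) \le \epsilon + \epsilon^2$ again, into exactly the theorem on the next slide:)

```latex
(1 - \epsilon)^{m_i^{(T)}} \le n \exp\left( -\epsilon \cdot E[M^{(T)}] \right)
\;\Longrightarrow\;
\epsilon \cdot E[M^{(T)}] \le \ln n + m_i^{(T)} \left( \epsilon + \epsilon^2 \right)
\;\Longrightarrow\;
E[M^{(T)}] \le (1 + \epsilon) \, m_i^{(T)} + \frac{\ln n}{\epsilon}.
```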
Randomized Weighted Majority
• Theorem: For every expert $i$ and time $T$, the expected number of mistakes of randomized weighted majority in the first $T$ rounds is
  $$E[M^{(T)}] \le (1 + \epsilon) \, m_i^{(T)} + \frac{\ln n}{\epsilon}$$
• Setting $\epsilon = \sqrt{\ln n / T}$:
  $$E[M^{(T)}] \le m_i^{(T)} + O\left( \sqrt{T \ln n} \right)$$
• We say that the algorithm has $O\left( \sqrt{T \ln n} \right)$ regret
  • Sublinear regret in $T$
  • Regret per round → 0 as $T \to \infty$
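(An illustrative calculation, not from the slides: with $n = 10$ experts over $T = 10{,}000$ rounds,)

```python
import math

n, T = 10, 10_000
eps = math.sqrt(math.log(n) / T)           # tuned learning rate, ~0.015
regret = 2 * math.sqrt(T * math.log(n))    # additive regret bound, ~303.5
print(eps, regret, regret / T)             # per-round regret ~0.03
```

so the algorithm makes at most roughly 303 more mistakes than the best expert, about 0.03 extra per round.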
How is this related to the minimax theorem?!!
Minimax via Regret Learning
• Recall:
  $$V_R = \max_{y_1} \min_{y_2} \; y_1^T B \, y_2 \qquad V_C = \min_{y_2} \max_{y_1} \; y_1^T B \, y_2$$
• Row player's guarantee: my reward $\ge V_R$
• Column player's guarantee: row player's reward $\le V_C$
• Hence, $V_R \le V_C$ (trivial direction)
• To prove: $V_R = V_C$
Minimax via Regret Learning
• Scale values in $B$ to be in $[0, 1]$.
  • Without loss of generality.
• Suppose for contradiction that $V_R = V_C - \epsilon$, $\epsilon > 0$.
• Suppose the row player $R$ uses randomized weighted majority (experts = row player's actions).
• In each round, the column player $C$ responds by choosing her action that minimizes the row player's expected reward.
Minimax via Regret Learning
• After $T$ iterations, the row player's total reward $W$ satisfies:
  • $W \le T \cdot V_R$
  • $W \ge$ "reward of best action in hindsight" $- O\left( \sqrt{T \ln n} \right)$
    o Reward of best action in hindsight $\ge T \cdot V_C$.
    o Why?
    o Suppose the column player plays action $c_t$ in round $t$
    o Equivalent to playing mixed strategy $s$ in each round
      • $s$ picks $t \in \{1, \dots, T\}$ uniformly at random and plays $c_t$
    o By definition of $V_C$, $s$ cannot ensure that the row player's reward is less than $V_C$
      • Then, there is an action of the row player with expected reward at least $V_C$ against $s$
      • Playing that action in all $T$ rounds would have earned at least $T \cdot V_C$ in hindsight
Minimax via Regret Learning
• After $T$ iterations, the row player's total reward $W$ satisfies:
  • $W \le T \cdot V_R$
  • $W \ge T \cdot V_C - O\left( \sqrt{T \ln n} \right)$
• So $T \cdot V_R = T \cdot (V_C - \epsilon) \ge T \cdot V_C - O\left( \sqrt{T \ln n} \right)$
• $T \epsilon \le O\left( \sqrt{T \ln n} \right)$
• Contradiction for sufficiently large $T$.
• QED!
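(The proof is constructive: running the learning dynamics against a best-responding column player drives the row player's average reward to the value of the game. A minimal simulation sketch, not from the slides, using a Hedge-style fractional-loss version of the weight update and rock-paper-scissors, scaled into $[0,1]$, as an assumed example game:)

```python
import math
import numpy as np

def rwm_vs_best_response(B, T):
    """Row runs multiplicative weights; column best-responds every round."""
    n, _ = B.shape
    eps = math.sqrt(math.log(n) / T)
    x = np.ones(n)                            # one weight per row action
    total_reward = 0.0
    for _ in range(T):
        y1 = x / x.sum()                      # row's mixed strategy this round
        j = np.argmin(y1 @ B)                 # column's reward-minimizing reply
        total_reward += y1 @ B[:, j]
        x *= (1 - eps) ** (1 - B[:, j])       # lower reward => bigger penalty
    return total_reward / T

# Rock-paper-scissors with rewards scaled into [0, 1]; value = 0.5.
B = np.array([[0.5, 0.0, 1.0],
              [1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])
print(rwm_vs_best_response(B, 100_000))      # approaches 0.5 from below
```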
Yao's Minimax Principle
• Goal:
  • Provide a lower bound on the expected running time that any randomized algorithm for a problem can achieve in the worst case over problem instances
• Note:
  • The expectation (in running time) is over the randomization of the algorithm
  • The problem instance (worst case) is chosen to maximize this expected running time
Yao's Minimax Principle
• Notation:
  • Capital letters for "randomized", lowercase for deterministic
  • $a$: a deterministic algorithm
  • $A$: a randomized algorithm
  • $x$: a problem instance
  • $X$: a distribution over problem instances
  • $T$: running time
• We are interested in $\min_A \max_x T(A, x)$
Yao's Minimax Principle
[Figure: a payoff-style table with deterministic algorithms as rows, problem instances as columns, and running times as entries]
Yao's Minimax Principle
• Minimax Theorem:
  $$\min_A \max_x T(A, x) = \max_X \min_a T(a, X)$$
  • (A randomized algorithm is a mixed strategy over deterministic algorithms; a distribution over instances is a mixed strategy of the adversary)
• So:
  • To lower bound the expected running time of any randomized algorithm $A$ on its worst-case instance $x$ by a quantity $Q$ …
  • Choose a distribution $X$ over problem instances, and show that every deterministic algorithm $a$ has expected running time at least $Q$ on instances drawn from $X$
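(A classic illustration, not on the slides: searching for the single marked cell in an unordered array of $n$ cells. Take $X$ = uniform over the $n$ instances; a deterministic algorithm probes cells in some fixed order, so)

```latex
E_{x \sim X}[T(a, x)] \;=\; \frac{1}{n} \sum_{k=1}^{n} k \;=\; \frac{n + 1}{2}
\quad \text{for every deterministic } a
\;\Longrightarrow\;
\min_A \max_x T(A, x) \;\ge\; \frac{n + 1}{2}.
```

So no randomized algorithm can beat $(n+1)/2$ expected probes on its worst-case instance.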