No-Regret Learning
CMPUT 654: Modelling Human Strategic Behaviour
Hart & Mas-Colell (2000)
Nekipelov, Syrgkanis, and Tardos (2015)

Lecture Outline
1. Recap
2. Hart & Mas-Colell (2000)
3. Coarse Correlated Equilibrium
4. Nekipelov, Syrgkanis, and Tardos (2015)

Hart & Mas-Colell (2000)
Why:
• A no-regret algorithm (regret matching) whose empirical distribution of play converges to the set of correlated equilibria
• Influential: this paper is cited by nearly all later work in this area
1. Defines the regret matching algorithm and argues for its plausibility
2. Proves that it converges to correlated equilibrium

Correlated Equilibrium
Definition: Given an $n$-agent game $G = (N, A, u)$, a correlated equilibrium is a tuple $(v, \pi, \sigma)$, where
• $v = (v_1, \ldots, v_n)$ is a tuple of random variables with domains $(D_1, \ldots, D_n)$,
• $\pi$ is a joint distribution over $v$,
• $\sigma = (\sigma_1, \ldots, \sigma_n)$ is a vector of mappings $\sigma_i : D_i \to A_i$, and
• for every agent $i$ and every mapping $\sigma'_i : D_i \to A_i$,
$$\sum_{d \in D_1 \times \cdots \times D_n} \pi(d)\, u_i\big(\sigma_1(d_1), \ldots, \sigma_n(d_n)\big) \;\ge\; \sum_{d \in D_1 \times \cdots \times D_n} \pi(d)\, u_i\big(\sigma_1(d_1), \ldots, \sigma'_i(d_i), \ldots, \sigma_n(d_n)\big).$$

Correlated Equilibrium (simplified)
Definition: Given an $n$-agent game $G = (N, A, u)$, a correlated equilibrium is a distribution $\sigma \in \Delta(A)$ such that for every $i \in N$ and all actions $a'_i, a''_i \in A_i$,
$$\sum_{a \in A \,:\, a_i = a'_i} \sigma(a)\,\big[u_i(a''_i, a_{-i}) - u_i(a)\big] \le 0.$$
That is, an agent who is told to play $a'_i$ cannot gain in expectation by switching to $a''_i$.
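For a finite game, the simplified condition can be checked mechanically. The sketch below assumes payoffs are stored as NumPy arrays indexed by joint action (a representation chosen here for illustration; `ce_gap` is a hypothetical helper name) and returns the largest left-hand side over all $i$, $a'_i$, $a''_i$. The example uses one common parameterization of the Game of Chicken.

```python
import itertools

import numpy as np


def ce_gap(sigma, payoffs):
    """Largest left-hand side of the simplified CE constraints.

    sigma:   joint distribution over A, an array of shape (|A_1|, ..., |A_n|).
    payoffs: list of arrays; payoffs[i][a] is u_i(a), same shape as sigma.
    Returns max over i, a'_i, a''_i of
        sum_{a : a_i = a'_i} sigma(a) * (u_i(a''_i, a_{-i}) - u_i(a)).
    The value is never negative (a''_i = a'_i contributes 0), and sigma is a
    correlated equilibrium iff it is 0 up to floating-point error.
    """
    gap = -np.inf
    for i, u_i in enumerate(payoffs):
        m_i = sigma.shape[i]
        for a1, a2 in itertools.product(range(m_i), repeat=2):
            # Slices of sigma and u_i where agent i is told to play a'_i = a1.
            mass = np.take(sigma, a1, axis=i)
            told = np.take(u_i, a1, axis=i)      # u_i(a'_i, a_{-i})
            switched = np.take(u_i, a2, axis=i)  # u_i(a''_i, a_{-i})
            gap = max(gap, float(np.sum(mass * (switched - told))))
    return gap


# Game of Chicken: action 0 = Dare, 1 = Chicken; payoffs[i][a_1, a_2].
u1 = np.array([[0.0, 7.0], [2.0, 6.0]])
u2 = u1.T
# Probability 1/3 on each of (Dare, Chicken), (Chicken, Dare), (Chicken, Chicken).
sigma = np.array([[0.0, 1 / 3], [1 / 3, 1 / 3]])
print(ce_gap(sigma, [u1, u2]))      # 0.0: a correlated equilibrium
both_dare = np.array([[1.0, 0.0], [0.0, 0.0]])
print(ce_gap(both_dare, [u1, u2]))  # 2.0: swerving instead of daring gains 2
```

The loop over every pair $(a'_i, a''_i)$ mirrors the definition directly; it is quadratic in $|A_i|$, which is fine for small illustrative games.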

Repeated Setting
• A game $G = (N, A, u)$ is played repeatedly over $t = 1, 2, \ldots$
• At time $t$, agent $i$ selects action $a_i^t$
• Each agent $i$ receives utility $u_i(a^t)$

Regret Matching
• For every pair of actions $j, k$, let $W_{i,t}(j,k)$ be the utility that $i$ would have received at time $t$ by playing $k$ instead of $j$; this is unchanged from $u_i(a^t)$ if $i$ didn't play $j$ at time $t$
• $D_{i,t}(j,k)$ is the average of $W_{i,t}(j,k) - u_i(a^t)$ up until time $t$
• At each time step, an agent whose most-recent action is $j$ chooses among the actions $k$ with positive $D_{i,t}(j,k)$ and $j$ itself: it switches to each such $k$ with probability $D_{i,t}(j,k)/\mu$ for a fixed constant $\mu$, and repeats $j$ with the remaining probability

Convergence of Regret Matching
Theorem: If all players play according to regret matching, then the empirical distributions of play converge almost surely to the set of correlated equilibria.
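A small simulation can make the theorem concrete. Note that $\max_{j,k} D_{i,T}(j,k)$ is exactly the left-hand side of the simplified correlated-equilibrium constraint for the empirical distribution of the first $T$ periods (with $a'_i = j$, $a''_i = k$), so watching it shrink is watching the empirical distribution approach the set of correlated equilibria. This is a minimal two-agent sketch under stated assumptions: the names `rm_probs` and `regret_matching` are hypothetical, the first actions are drawn uniformly at random, and $\mu = 20$ is simply chosen large enough that the switch probabilities in the Prisoner's Dilemma example sum to at most 1.

```python
import numpy as np


def rm_probs(D, last, mu):
    """Next-period mixed action under regret matching.

    D:    matrix with D[j, k] = D_{i,t}(j, k), the average regret for j vs. k.
    last: the agent's most recent action j.
    mu:   inertia constant, large enough that the switch probabilities
          max(D(j, k), 0) / mu sum to at most 1.
    """
    p = np.maximum(D[last], 0.0) / mu  # switch to k w.p. D(j, k)^+ / mu
    p[last] = 0.0
    p[last] = 1.0 - p.sum()            # otherwise repeat j (inertia)
    return p


def regret_matching(u, T, mu, seed=0):
    """Two agents play regret matching for T periods.

    u: pair (u1, u2) of payoff matrices indexed [a_1, a_2].
    Returns the final average-regret matrices D_{1,T} and D_{2,T}.
    """
    rng = np.random.default_rng(seed)
    sizes = u[0].shape
    total = [np.zeros((m, m)) for m in sizes]  # running sums of W - u_i(a^t)
    a = [int(rng.integers(m)) for m in sizes]  # assumed: uniform first actions
    for t in range(1, T + 1):
        for i in range(2):
            # u_i(k, a_{-i}^t) for every alternative k against the actual opponent move
            alt = u[0][:, a[1]] if i == 0 else u[1][a[0], :]
            # W_{i,t}(j, k) - u_i(a^t) is zero except in row j = a_i^t
            total[i][a[i]] += alt - alt[a[i]]
        D = [s / t for s in total]
        a = [int(rng.choice(sizes[i], p=rm_probs(D[i], a[i], mu))) for i in range(2)]
    return D


# Prisoner's Dilemma: action 0 = cooperate, 1 = defect; payoffs in [0, 5].
u1 = np.array([[3.0, 0.0], [5.0, 1.0]])
D = regret_matching((u1, u1.T), T=10_000, mu=20.0)
# Largest CE violation of the empirical distribution, over both players
print(max(d.max() for d in D))
```

Here defecting is dominant, so once an agent defects its regret for switching back is negative and it never returns; the printed violation then decays roughly like $1/T$. In general games the convergence is slower and only to the set of correlated equilibria, not to a single point.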

Coarse Correlated Equilibrium
• Instead of letting an agent replace each recommended action with an arbitrary action, compare against committing in advance to a single action:
Definition: Given an $n$-agent game $G = (N, A, u)$, a coarse correlated equilibrium is a distribution $\sigma \in \Delta(A)$ such that for every $i \in N$ and action $a'_i \in A_i$,
$$\sum_{a \in A} \sigma(a)\, u_i(a'_i, a_{-i}) - \sum_{a \in A} \sigma(a)\, u_i(a) \le 0.$$
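The coarse condition only needs each agent's marginal over the others' actions, so it is cheaper to check than the correlated one. A minimal sketch, assuming the same array representation of payoffs as a hypothetical choice (`cce_gap` is an illustrative name): it returns the largest left-hand side over all $i$ and $a'_i$, and the distribution is a coarse correlated equilibrium when the result is at most 0.

```python
import numpy as np


def cce_gap(sigma, payoffs):
    """Largest left-hand side of the coarse correlated equilibrium constraints.

    sigma:   joint distribution over A, shape (|A_1|, ..., |A_n|).
    payoffs: list of arrays; payoffs[i][a] is u_i(a), same shape as sigma.
    sigma is a CCE iff the returned value is <= 0 (up to rounding).
    """
    gap = -np.inf
    for i, u_i in enumerate(payoffs):
        expected = float(np.sum(sigma * u_i))  # sum_a sigma(a) u_i(a)
        others = sigma.sum(axis=i)             # marginal over a_{-i}
        for dev in range(sigma.shape[i]):
            # Commit to a'_i = dev whatever the recommendation says.
            fixed = float(np.sum(others * np.take(u_i, dev, axis=i)))
            gap = max(gap, fixed - expected)
    return gap


# Rock-Paper-Scissors (0 = R, 1 = P, 2 = S), zero-sum.
u1 = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
u2 = -u1
# Uniform over the six profiles in which the two agents play different actions.
sigma = np.full((3, 3), 1 / 6)
np.fill_diagonal(sigma, 0.0)
print(cce_gap(sigma, [u1, u2]))
```

In this example the gap is 0 up to floating-point rounding: every fixed action earns 0 against the uniform marginal, matching the expected payoff of 0. The same distribution is not a correlated equilibrium, though: an agent told to play R faces P or S with equal probability and gains $1/2$ on average by switching to S.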

Convergence of Multiagent No-Regret Learning
Proposition: If every agent plays a no-regret learning algorithm, then the empirical distribution of play converges to the set of coarse correlated equilibria.
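The proposition can be illustrated by pairing two copies of a standard no-regret algorithm. The sketch below uses Hedge (multiplicative weights), a choice made here for illustration and not one the slides prescribe. It assumes payoffs in $[0, 1]$ and a known horizon $T$, and lets each agent collect the expected payoff of its mixed strategy so that the run is deterministic. If the empirical distribution is taken to be the average of the product distributions played, each agent's average external regret equals its coarse correlated equilibrium gap, and the standard Hedge bound keeps that below $\sqrt{\ln m / (2T)}$ for $m$ actions. The function name and the random game are hypothetical.

```python
import numpy as np


def hedge_self_play(u1, u2, T):
    """Both agents run Hedge on their expected payoffs for T rounds.

    u1, u2: payoff matrices indexed [a_1, a_2], entries assumed in [0, 1].
    Returns (r1, r2), each agent's average external regret, which is also its
    CCE gap for the average of the product distributions played.
    """
    m1, m2 = u1.shape
    eta1, eta2 = np.sqrt(8 * np.log(m1) / T), np.sqrt(8 * np.log(m2) / T)
    g1, g2 = np.zeros(m1), np.zeros(m2)  # cumulative payoff of each fixed action
    got1 = got2 = 0.0                    # cumulative expected payoff received
    for _ in range(T):
        # Hedge: probabilities proportional to exp(eta * cumulative payoff);
        # subtracting the max only avoids overflow.
        p1 = np.exp(eta1 * (g1 - g1.max()))
        p1 /= p1.sum()
        p2 = np.exp(eta2 * (g2 - g2.max()))
        p2 /= p2.sum()
        v1, v2 = u1 @ p2, u2.T @ p1      # this round's payoff of each action
        got1 += float(p1 @ v1)
        got2 += float(p2 @ v2)
        g1 += v1
        g2 += v2
    return (g1.max() - got1) / T, (g2.max() - got2) / T


rng = np.random.default_rng(1)
u1, u2 = rng.random((3, 3)), rng.random((3, 3))  # hypothetical random game
T = 2_000
r1, r2 = hedge_self_play(u1, u2, T)
bound = np.sqrt(np.log(3) / (2 * T))
print(r1, r2, bound)
```

The learning rate $\eta = \sqrt{8 \ln m / T}$ is the one that balances the two terms of the Hedge regret bound $\ln m / \eta + \eta T / 8$; any no-regret algorithm would serve the same illustrative purpose.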

Nekipelov, Syrgkanis, and Tardos (2015)
Why: Application of a non-equilibrium behavioural rule to econometrics
1. Define the rationalizable set NR
2. Prove properties of NR for sponsored search auctions
3. Apply to value estimation

Setting: Sponsored Search
• There are $k$ slots
• Each agent $i$ submits a bid $b_i$
• The highest bid gets the first slot, the second-highest the second slot, etc.
• Each agent pays the bid of the agent in the next slot down (the next-highest bid)
• Payments are per click rather than per impression
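The allocation and pricing rule above (commonly called the generalized second-price auction) fits in a few lines. This is a minimal sketch of the rule as the slide states it, ignoring reserve prices and any ad-quality weighting; `gsp`, `utility`, the tie-breaking rule, and the click-through rates are hypothetical.

```python
def gsp(bids, k):
    """Slot allocation and per-click prices for k slots.

    bids: dict mapping bidder -> per-click bid.
    Returns a list of (slot, bidder, price_per_click), slot 0 being the top.
    Ties are broken by dictionary order (an assumption; the slide is silent).
    """
    ranked = sorted(bids, key=bids.get, reverse=True)  # stable, highest first
    allocation = []
    for slot, bidder in enumerate(ranked[:k]):
        # Pay the next-highest bid, or 0 if nobody is ranked below.
        price = bids[ranked[slot + 1]] if slot + 1 < len(ranked) else 0.0
        allocation.append((slot, bidder, price))
    return allocation


def utility(value, slot, price, ctr):
    """Expected utility: clicks received in this slot times per-click surplus."""
    return ctr[slot] * (value - price)


bids = {"a": 3.0, "b": 5.0, "c": 1.0}
print(gsp(bids, k=2))  # [(0, 'b', 3.0), (1, 'a', 1.0)]
ctr = [0.10, 0.05]     # hypothetical click-through rates per slot
print(utility(4.0, 1, 1.0, ctr))
```

Because payments are per click, a bidder's utility is its click-through rate times (value minus price), which is linear in the value; that linearity is what makes the estimation problem below tractable.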

Problem: Estimating Types
• Each agent $i$ has a value $v_i$ for a click
• We want to estimate these values from the observed bids
• Previously: assume equilibrium
• Now: assume no-regret learning

Rationalizable Set
Definition: The rationalizable set NR is the set of pairs $(v_i, \varepsilon_i)$ such that $i$'s sequence of bids has average regret (against every fixed alternative bid) less than $\varepsilon_i$ if $i$'s value is $v_i$.
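Since utility is linear in the value, the regret of a fixed bid sequence is a maximum of linear functions of $v$, so the smallest rationalizable error for each candidate value is easy to compute once the counterfactual clicks and payments of every alternative bid are known. The sketch below assumes those counterfactual quantities are given on a finite grid of bids; the function name, the grid, and the single-slot toy data are hypothetical, and the code follows the slide's definition rather than the paper's full estimation procedure.

```python
import numpy as np


def smallest_rationalizable_error(v, clicks, payments, played):
    """Average regret of a bid sequence if the bidder's value is v.

    clicks[t, b], payments[t, b]: expected clicks and expected total payment
        the bidder would have had in round t with bid index b (finite grid).
    played[t]: index of the bid actually submitted in round t.
    Returns max over fixed bids b' of the average utility gain from b';
    (v, eps) is in NR exactly when this value is below eps.
    """
    rows = np.arange(len(played))
    d_clicks = clicks.mean(axis=0) - clicks[rows, played].mean()
    d_pay = payments.mean(axis=0) - payments[rows, played].mean()
    # The gain from always bidding b' is linear in v, so the max is convex in v.
    return float(np.max(v * d_clicks - d_pay))


# Toy data: one slot, one click per win, rival bids 2.0 every round.
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hypothetical bid grid
T, rival = 10, 2.0
wins = (grid > rival).astype(float)         # ties lose (assumption)
clicks = np.tile(wins, (T, 1))
payments = np.tile(wins * rival, (T, 1))    # pay the rival's bid per click
played = np.full(T, 3)                      # the bidder always bids 3.0
for v in (1.0, 2.0, 5.0):
    print(v, smallest_rationalizable_error(v, clicks, payments, played))
```

With a value of 1 the bidder would have done better never to win, so the smallest rationalizable error is 1; every value of at least 2 rationalizes the bids with zero regret, so the value minimizing the error need not be unique.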

Data Analysis
Claims:
1. Bids are highly shaded (only 60% of value)
2. Almost all accounts have a few keywords with very small error, and others with large error

Epilogue
Some questions:
1. Regret matching includes a notion of inertia. How closely related to I-SAW is it?
2. Why do we think that the smallest rationalizable error is the one to use for point estimates?
