Bandits in Auctions (& more) Vianney Perchet joint work with P. - PowerPoint PPT Presentation

Bandits in Auctions (& more)
Vianney Perchet, joint work with P. Rigollet (MIT) and J. Weed (MIT)
CEMRACS 2017, July 20, 2017
CMLA, ENS Paris-Saclay & Criteo Research

Motivations & Objectives

Classical Examples of Bandit Problems
– Size of data: n patients, each with some probability of getting cured
– Choose one of two treatments to prescribe
– Patients cured or dead
1) Inference: find the best treatment between the red and the blue
2) Cumulative: save as many patients as possible

Classical Examples of Bandit Problems
– Size of data: n banners, each with some probability of being clicked
– Choose one of two ads to display
– Banner clicked or ignored
1) Inference: find the best ad between the red and the blue
2) Cumulative: get as many clicks as possible

Example of Repeated Auctions
Ad slot sold by lemonde.fr. 2nd-price auctions
• Several (marketing) companies place bids
• Highest bid wins (...), say criteo, pays to lemonde 2nd bid (...)
• criteo chooses the ad of a client: Microsoft or Cdiscount or Booking
• criteo gets paid by the client if the user clicks on the ad
Main Problem: Repeated auctions with unknown private valuation
Learn valuations, find which ad to display & good strategies

Example of Repeated Auctions
Some companies whose cookies can be controlled

Back to Classical Examples of Bandit Problems
– Size of data: n mails, each with some probability of being spam
– Choose one of two actions: spam or ham
– Mail correctly or incorrectly classified
1) Inference: find the best between the red and the blue
2) Cumulative: minimize the number of errors

Back to Classical Examples of Bandit Problems
– Size of data: n patients, each with some probability of getting cured
– Choose one of two treatments to prescribe
– Patients cured or dead
1) Inference: find the best treatment between the red and the blue
2) Cumulative: save as many patients as possible

Two-Armed Bandit
– Patients arrive and are treated sequentially.
– Save as many as possible.

A bit of theory

Stochastic Multi-Armed Bandit

K-Armed Stochastic Bandit Problems
– K actions $i \in \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$, (sub-)Gaussian or bounded: $X^i_1, X^i_2, \dots \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu_i, 1)$
– Non-anticipative policy: $\pi_t\left(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\right) \in \{1, \dots, K\}$
– Goal: maximize expected reward $\sum_{t=1}^T \mathbb{E} X^{\pi_t}_t = \sum_{t=1}^T \mu_{\pi_t}$
– Performance: cumulative regret
$R_T = \max_{i \in \{1, \dots, K\}} \sum_{t=1}^T \mu_i - \sum_{t=1}^T \mu_{\pi_t} = \sum_{i \neq \star} \Delta_i \sum_{t=1}^T \mathbb{1}\{\pi_t = i\}$
with $\Delta_i = \mu_\star - \mu_i$, the "gap" or cost of error $i$.
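The two expressions for the cumulative regret (best mean minus collected means, and gaps weighted by the number of pulls) can be checked numerically. The sketch below is only an illustration: the arm means and the sequence of pulls are made-up values, not taken from the talk, and arms are indexed from 0.

```python
def pseudo_regret(means, pulls):
    """Sum over rounds of (mu_star - mu_{pi_t})."""
    mu_star = max(means)
    return sum(mu_star - means[arm] for arm in pulls)


def regret_by_gaps(means, pulls):
    """Same quantity written as sum_i Delta_i * (number of pulls of arm i)."""
    mu_star = max(means)
    counts = [0] * len(means)
    for arm in pulls:
        counts[arm] += 1
    return sum((mu_star - mu) * n for mu, n in zip(means, counts))


# Hypothetical example: three arms, arm 1 is the best one.
means = [0.5, 0.7, 0.4]
pulls = [0, 1, 2, 1, 1, 0, 1]
print(pseudo_regret(means, pulls), regret_by_gaps(means, pulls))
```

Both functions return the same value up to floating-point rounding, since pulling the best arm contributes a gap of zero.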

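Most Famous Algorithm [Auer, Cesa-Bianchi, Fischer, ’02]
• UCB - "Upper Confidence Bound"
$\pi_{t+1} = \arg\max_i \left\{ \bar{X}^i_t + \sqrt{\frac{2 \log(t)}{T_i(t)}} \right\}$,
where $T_i(t) = \sum_{s=1}^{t} \mathbb{1}\{\pi_s = i\}$ and $\bar{X}^i_t = \frac{1}{T_i(t)} \sum_{s \le t : \pi_s = i} X^i_s$
Regret: $\mathbb{E} R_T \lesssim \sum_k \frac{\log(T)}{\Delta_k}$
Worst-case: $\mathbb{E} R_T \lesssim \sup_\Delta \left\{ \frac{K \log(T)}{\Delta} \wedge T\Delta \right\} \simeq \sqrt{KT \log(T)}$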
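A minimal sketch of the UCB index rule, assuming Gaussian rewards with unit variance as in the stochastic model of the talk. The function name `ucb`, the initial round-robin over the arms, and the seed handling are illustrative choices, not details from the slides.

```python
import math
import random


def ucb(means, horizon, seed=0):
    """Run the UCB index rule on Gaussian arms N(means[i], 1).

    Returns the number of pulls of each arm and the sequence of pulls.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # T_i(t): number of pulls of arm i so far
    sums = [0.0] * k      # running sum of the rewards of arm i
    pulls = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # assumption: pull every arm once to initialise
        else:
            elapsed = t - 1   # rounds already played

            def index(i):
                bonus = math.sqrt(2.0 * math.log(elapsed) / counts[i])
                return sums[i] / counts[i] + bonus

            arm = max(range(k), key=index)
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return counts, pulls


counts, _ = ucb([0.0, 1.0], horizon=2000)
print(counts)  # most pulls should go to the second arm
```

With a gap of 1, the number of pulls of the worse arm should stay of order log(T), which is what the regret bound predicts.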

Ideas of proof
$\pi_{t+1} = \arg\max_i \left\{ \bar{X}^i_t + \sqrt{\frac{2 \log(t)}{T_i(t)}} \right\}$
• 2-lines proof:
$\pi_{t+1} = i \neq \star \iff \bar{X}^\star_t + \sqrt{\frac{2 \log(t)}{T_\star(t)}} \le \bar{X}^i_t + \sqrt{\frac{2 \log(t)}{T_i(t)}}$
"$\Longrightarrow$" $\Delta_i \le \sqrt{\frac{2 \log(t)}{T_i(t)}} \Longrightarrow T_i(t) \lesssim \frac{\log(t)}{\Delta_i^2}$
• Number of mistakes grows as $\frac{\log(t)}{\Delta_i^2}$; each mistake costs $\Delta_i$.
Regret at stage $T$ $\lesssim \sum_i \frac{\log(T)}{\Delta_i^2} \times \Delta_i \simeq \sum_i \frac{\log(T)}{\Delta_i}$
• "$\Longrightarrow$" actually happens with overwhelming probability
• "optimal": no algorithm always has a regret smaller than $\sum_i \frac{\log(T)}{\Delta_i}$

Other Algos
• Other algo, MOSS [Audibert, Bubeck], variants of UCB
$R_T \lesssim \frac{K \log(T \Delta_{\min} / K)}{\Delta_{\min}}$, worst case $R_T \le \sqrt{TK}$
• ETC [Perchet, Rigollet]: pull in round-robin, then eliminate
$R_T \lesssim \sum_k \frac{\log(T \Delta_k)}{\Delta_k}$, worst case $R_T \le \sqrt{T \log(K) K}$
• Infinite number of actions $x \in [0, 1]^d$ with $\Delta(x)$ 1-Lipschitz. Discretize + UCB gives
$R_T \lesssim T\varepsilon + \sqrt{T / \varepsilon} \le T^{2/3}$
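The T^{2/3} rate for the discretized problem comes from balancing the two terms of the bound. The short check below assumes the reading that a grid of mesh ε costs at most Tε in approximation error and √(T/ε) in learning error (about 1/ε arms, logarithmic factors ignored). The function name is hypothetical.

```python
import math


def discretization_bound(horizon, eps):
    """T * eps (approximation error) + sqrt(T / eps) (regret on the grid)."""
    return horizon * eps + math.sqrt(horizon / eps)


T = 10**6
eps = T ** (-1.0 / 3.0)   # mesh that makes both terms equal to T^(2/3)
print(discretization_bound(T, eps), 2 * T ** (2.0 / 3.0))
```

A much finer or much coarser mesh makes one of the two terms dominate, so the bound degrades in both directions.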

Adversarial Multi-Armed Bandit

K-Armed Adversarial Bandit Problems
• K actions $i \in [K] = \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$ bounded in $[0, 1]$. No assumption on $X^i_1, X^i_2, \dots$
• Non-anticipative policy: $\pi_t\left(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\right) \in [K]$
• Performance: cumulative regret $R_T = \max_{i \in [K]} \sum_{t=1}^T X^i_t - \sum_{t=1}^T X^{\pi_t}_t$
• Convex optimization of $p \mapsto \mathbb{E}_{i \sim p} \sum_{t=1}^T X^i_t$, from $\Delta([K])$ to $[0, 1]$

EXP-algo
• Main insight: $\pi_t \sim p_t \in \Delta([K])$, more weight on the best actions:
$p^i_t = \frac{e^{\eta \sum_{s=1}^{t-1} X^i_s}}{\sum_{j \in [K]} e^{\eta \sum_{s=1}^{t-1} X^j_s}}$, where $\eta$ is a parameter
• Only $X^{\pi_t}_t$ is observed, not $X_t$. Estimate $X^i_t$ by $\hat{X}^i_t = 1 - \frac{1 - X^i_t}{p^i_t} \mathbb{1}\{\pi_t = i\}$ and run EXP on $\hat{X}_t$
• $\mathbb{E} \hat{X}^i_t = 1 - \left[ (1 - p^i_t) \cdot 0 + p^i_t \frac{1 - X^i_t}{p^i_t} \right] = X^i_t$, unbiased estimator
• $\mathbb{E} \sum_{i \in [K]} p^i_t \left(\hat{X}^i_t\right)^2 \le 1 + \sum_{i \in [K]} p^i_t \, \mathbb{E} \left( \frac{1 - X^i_t}{p^i_t} \mathbb{1}\{\pi_t = i\} \right)^2 \le K + 1$, bounded variance
• Using this estimate we obtain that
$\mathbb{E} R_T \le \frac{\log(K)}{\eta} + \eta (K + 1) T \le 3 \sqrt{\log(K) K T}$
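A minimal sketch of exponential weights run on the importance-weighted estimate described above. It assumes rewards in [0, 1] given as a full table (of which the learner only reads the entry of the arm it plays). The tuning of η balances the two terms of the bound, and the function names are illustrative.

```python
import math
import random


def estimate(x_i, p_i, played):
    """Estimate of X^i_t: 1 - (1 - X^i_t) / p^i_t if arm i was played, else 1."""
    return 1.0 - (1.0 - x_i) / p_i if played else 1.0


def exp_weights_bandit(rewards, eta, seed=0):
    """rewards[t][i] in [0, 1]; returns the total reward collected."""
    rng = random.Random(seed)
    k = len(rewards[0])
    cum_est = [0.0] * k   # cumulative estimated rewards of each arm
    total = 0.0
    for x in rewards:
        top = max(cum_est)   # subtract the max for numerical stability
        weights = [math.exp(eta * (s - top)) for s in cum_est]
        norm = sum(weights)
        p = [w / norm for w in weights]
        arm = rng.choices(range(k), weights=p)[0]
        total += x[arm]      # only x[arm] is used from this round
        for i in range(k):
            # For unplayed arms the estimate is 1 and x[i] is never read.
            cum_est[i] += estimate(x[arm], p[arm], True) if i == arm else 1.0
    return total


# Hypothetical instance: arm 0 always pays 0.9, the other two pay 0.1.
T, K = 2000, 3
rewards = [[0.9, 0.1, 0.1] for _ in range(T)]
eta = math.sqrt(math.log(K) / ((K + 1) * T))
print(exp_weights_bandit(rewards, eta))
```

Averaging `estimate` over the random choice of the arm gives back the true reward, which is the unbiasedness property stated on the slide.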

Bandits & Repeated Auctions

Back to Repeated Auctions
Ad slot sold by lemonde.fr. 2nd-price auctions
• Several (marketing) companies place bids
• Highest bid wins (...), say criteo, pays to lemonde 2nd bid (...)
• criteo chooses the ad of a client: Microsoft or Cdiscount or Booking
• criteo gets paid by the client if the user clicks on the ad
Main Problem: Repeated auctions with unknown private valuation
Learn valuations, find which ad to display & good strategies

2nd-price Auctions
• A good is sold in a second-price auction.
• Each buyer, with valuation $v(i)$, places a bid $b(i)$
• The highest bidder wins and pays the second-highest bid $b^\sharp = \max_{i \neq \arg\max b} b(i)$ (ties broken arbitrarily)
• Utility of bidder $i$: $\left(v(i) - b^\sharp\right) \mathbb{1}\{b(i) \ge b^\sharp\}$
Truthful auctions: the optimal strategy is to bid one's own valuation, $b(i) = v(i)$
• if $b(i) > v(i)$: might only pay too much
• if $b(i) < v(i)$: might lose the auction
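The rules of the second-price auction and the truthfulness argument can be sketched as follows. The helper names and the numerical values are illustrative, and ties are resolved in favour of the bidder under study, which is one arbitrary tie-breaking choice among others.

```python
def second_price_auction(bids):
    """Return (index of the winner, price paid) for a list of at least two bids."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price


def utility(valuation, own_bid, other_bids):
    """Utility of one bidder facing the bids of the others."""
    best_other = max(other_bids)
    return valuation - best_other if own_bid >= best_other else 0.0


print(second_price_auction([0.3, 0.9, 0.5]))  # (1, 0.5)

# Bidding the valuation is never worse than any other bid on a grid.
valuation = 0.6
grid = [j / 20 for j in range(21)]
for others in ([0.5, 0.2], [0.8, 0.1], [0.6, 0.6]):
    truthful = utility(valuation, valuation, others)
    assert all(truthful >= utility(valuation, b, others) for b in grid)
```

Overbidding can only add wins at a price above the valuation, and underbidding can only remove wins at a price below it, which is what the loop verifies on a few cases.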

Reserve price
• Utility of highest value: $v^\star - b^\sharp$
• Utility of seller (value $v_0$): $b^\sharp - v_0$, can be negative!
Reserve price: a threshold $c$. If $b^\star \ge c$, the price is $\max\{b^\sharp, c\}$; otherwise the good is not sold
• Still truthful: $c$ is a bid
• Optimal reserve price $c^*$ maximizes $\mathbb{E} \left( \max\{v^\sharp, c\} - v_0 \right) \mathbb{1}\{v^\star \ge c\}$
• Depends on the (actually unknown) distributions of values.
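When the distributions of values are unknown, one natural illustration is to replace the expectation by an average over observed pairs (highest value, second-highest value) and to search over a finite set of candidate reserve prices. This empirical version is an assumption made for the sketch, not a method stated in the slides, and the sample values are made up.

```python
def seller_revenue(c, v_star, v_sharp, v0=0.0):
    """(max{v_sharp, c} - v0) if the highest value reaches the reserve c, else 0."""
    return max(v_sharp, c) - v0 if v_star >= c else 0.0


def best_reserve(samples, candidates, v0=0.0):
    """Candidate reserve price with the largest average revenue on the samples."""
    def average_revenue(c):
        total = sum(seller_revenue(c, hi, lo, v0) for hi, lo in samples)
        return total / len(samples)

    return max(candidates, key=average_revenue)


# Hypothetical observations (highest value, second-highest value).
samples = [(0.9, 0.2), (0.8, 0.1), (0.3, 0.25)]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
print(best_reserve(samples, candidates))  # 0.75
```

On this toy data a reserve of 0.75 loses the third sale but raises the price of the first two, while a reserve of 1.0 sells nothing.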

Main model
• Learning optimal reserve price [Cesa-Bianchi, Gentile, Mansour]
From the point of view of a bidder?
• At round $t = 1, \dots, T$: the bidder bids $b_t \in [0, 1]$; if $b_t > m_t$ (maximum of the other bids & reserve price), wins the good and observes the value $v_t \in [0, 1]$
• Total utility: $\sum_{t=1}^T (v_t - m_t) \mathbb{1}\{b_t > m_t\}$
• Total regret: $\max_{b \in [0, 1]} \sum_{t=1}^T (v_t - m_t) \mathbb{1}\{b > m_t\} - \sum_{t=1}^T (v_t - m_t) \mathbb{1}\{b_t > m_t\}$
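The bidder's regret compares the utility actually earned with that of the best fixed bid in hindsight. In the sketch below the maximum over b in [0, 1] is approximated on a finite grid, which is an assumption of the example. The values, thresholds and bids are made up.

```python
def round_utility(bid, value, threshold):
    """(v_t - m_t) if the bid strictly exceeds the threshold m_t, else 0."""
    return value - threshold if bid > threshold else 0.0


def bidder_regret(bids, values, thresholds, grid):
    """Utility of the best fixed bid on the grid minus the utility earned."""
    earned = sum(
        round_utility(b, v, m) for b, v, m in zip(bids, values, thresholds)
    )
    best_fixed = max(
        sum(round_utility(b, v, m) for v, m in zip(values, thresholds))
        for b in grid
    )
    return best_fixed - earned


# Hypothetical three-round history.
values = [0.8, 0.2, 0.6]
thresholds = [0.5, 0.4, 0.7]
bids = [0.3, 0.5, 0.9]
grid = [j / 10 for j in range(11)]
print(bidder_regret(bids, values, thresholds, grid))
```

Here the played bids lose the profitable first round and win the two unprofitable ones, while a fixed bid between 0.5 and 0.7 wins the first two rounds only.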
