Bandits in Auctions (& more), Vianney Perchet, joint work with P. Rigollet (MIT) and J. Weed (MIT)

  1. Bandits in Auctions (& more). Vianney Perchet, joint work with P. Rigollet (MIT) and J. Weed (MIT). CEMRACS 2017, July 20, 2017. CMLA, ENS Paris-Saclay & Criteo Research

  2. Motivations & Objectives

  3. Classical Examples of Bandits Problems
     – Choose one of two treatments to prescribe
     – Patients cured or dead
     – Size of data: n patients with some proba of getting cured
     1) Inference: Find the best treatment between the red and blue
     2) Cumul: Save as many patients as possible

  4. Classical Examples of Bandits Problems
     – Choose one of two ads to display
     – Banner clicked or ignored
     – Size of data: n banners with some proba of click
     1) Inference: Find the best ad between the red and blue
     2) Cumul: Get as many clicks as possible

  5. Example of Repeated Auctions: ad slot sold by lemonde.fr via 2nd-price auctions
     • Several (marketing) companies place bids
     • Highest bid wins (...), say criteo, and pays lemonde the 2nd bid (...)
     • criteo chooses the ad of a client: Microsoft, Cdiscount or Booking
     • criteo gets paid by the client if the user clicks on the ad
     Main Problem: Repeated auctions with unknown private valuation. Learn valuations, find which ad to display & good strategies

  8. Example of Repeated Auctions: some companies whose cookies can be controlled

  9. Back to Classical Examples of Bandits Problems
     – Choose one of two actions: spam or ham
     – Mail correctly or incorrectly classified
     – Size of data: n mails with some proba of spam
     1) Inference: Find the best action between the red and blue
     2) Cumul: Make as few errors as possible

  10. Back to Classical Examples of Bandits Problems

  11. Back to Classical Examples of Bandits Problems
     – Choose one of two treatments to prescribe
     – Patients cured or dead
     – Size of data: n patients with some proba of getting cured
     1) Inference: Find the best treatment between the red and blue
     2) Cumul: Save as many patients as possible

  12. Two-Armed Bandit – Patients arrive and are treated sequentially. – Save as many as possible.

  25. A bit of theory

  26. Stochastic Multi-Armed Bandit

  27. K-Armed Stochastic Bandit Problems
     – K actions $i \in \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$
     – $X^i_1, X^i_2, \dots$ i.i.d., bounded or (sub-)Gaussian, e.g. $X^i_1, X^i_2, \dots \sim \mathcal{N}(\mu_i, 1)$
     – Goal: Maximize expected reward $\sum_{t=1}^T \mathbb{E} X^{\pi_t}_t = \sum_{t=1}^T \mu_{\pi_t}$
     – Non-Anticipative Policy: $\pi_t = \pi_t\big(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\big) \in \{1, \dots, K\}$
     – Performance: Cumulative Regret
       $R_T = \max_{i \in \{1, \dots, K\}} \sum_{t=1}^T \mu_i - \sum_{t=1}^T \mu_{\pi_t} = \sum_{t=1}^T (\mu^\star - \mu_{\pi_t}) = \sum_{i \neq \star} \Delta_i \sum_{t=1}^T \mathbb{1}\{\pi_t = i\}$
       with $\Delta_i = \mu^\star - \mu_i$, the "gap" or cost of error $i$.
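To make the regret identity concrete, here is a small Python sketch (the function names are illustrative, not from the slides) that computes $R_T$ for given means $\mu_i$ and a sequence of pulled arms $\pi_1, \dots, \pi_T$, once as $\sum_t (\mu^\star - \mu_{\pi_t})$ and once through the gaps $\sum_{i \neq \star} \Delta_i T_i(T)$; the two should agree.

```python
def regret_from_means(means, pulls):
    """R_T = sum_t (mu_star - mu_{pi_t}) for a sequence of pulled arms."""
    mu_star = max(means)
    return sum(mu_star - means[i] for i in pulls)


def regret_from_gaps(means, pulls):
    """Same quantity written as sum_{i != star} Delta_i * T_i(T)."""
    mu_star = max(means)
    counts = [0] * len(means)          # T_i(T): number of pulls of arm i
    for i in pulls:
        counts[i] += 1
    # Delta_star = 0, so including the optimal arm in the sum is harmless
    return sum((mu_star - mu) * n for mu, n in zip(means, counts))


means = [0.5, 0.2, 0.4]
pulls = [0, 1, 2, 0, 1]
print(regret_from_means(means, pulls), regret_from_gaps(means, pulls))
```

Here arm 1 is pulled twice at cost 0.3 and arm 2 once at cost 0.1, so both functions give 0.7 up to floating-point rounding.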

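  28. Most Famous algorithm [Auer, Cesa-Bianchi, Fischer, '02]
     • UCB - "Upper Confidence Bound":
       $\pi_{t+1} = \arg\max_i \Big\{ \overline{X}^i_t + \sqrt{\tfrac{2\log(t)}{T_i(t)}} \Big\}$, where $T_i(t) = \sum_{s=1}^t \mathbb{1}\{\pi_s = i\}$ and $\overline{X}^i_t = \frac{1}{T_i(t)} \sum_{s \le t:\, \pi_s = i} X^i_s$
     • Regret: $\mathbb{E} R_T \lesssim \sum_{k \neq \star} \frac{\log(T)}{\Delta_k}$
     • Worst-Case: $\mathbb{E} R_T \lesssim \sup_{\Delta} \Big( K\frac{\log(T)}{\Delta} \wedge T\Delta \Big) \asymp \sqrt{KT\log(T)}$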
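A minimal simulation sketch of the UCB index above, assuming Gaussian arms $\mathcal{N}(\mu_i, 1)$ as in the stochastic model. The function name `ucb`, the seed, and the initialisation (one pull per arm so that $T_i(t) > 0$) are illustrative choices, not part of the slide.

```python
import math
import random


def ucb(means, T, seed=0):
    """UCB on Gaussian arms N(mu_i, 1); returns the pull counts T_i(T)."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K      # T_i(t)
    sums = [0.0] * K      # sum of rewards observed from arm i
    for t in range(T):    # t = number of rounds already played
        if t < K:
            arm = t       # initialisation: pull every arm once so T_i(t) > 0
        else:
            # index = empirical mean + sqrt(2 log(t) / T_i(t))
            arm = max(
                range(K),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], 1.0)
    return counts


counts = ucb([0.5, 0.0, -0.5], T=5000)
print(counts)
```

With these means the best arm should receive the large majority of the pulls, since a suboptimal arm is only pulled on the order of $\log(T)/\Delta_i^2$ times.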

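  29. Ideas of proof
     • $\pi_{t+1} = \arg\max_i \Big\{ \overline{X}^i_t + \sqrt{\tfrac{2\log(t)}{T_i(t)}} \Big\}$
     • 2-lines proof: $\pi_{t+1} = i \neq \star \Longrightarrow \overline{X}^\star_t + \sqrt{\tfrac{2\log(t)}{T_\star(t)}} \le \overline{X}^i_t + \sqrt{\tfrac{2\log(t)}{T_i(t)}}$ "$\Longrightarrow$" $\Delta_i \le \sqrt{\tfrac{2\log(t)}{T_i(t)}} \Longrightarrow T_i(t) \lesssim \frac{\log(t)}{\Delta_i^2}$
     • Number of mistakes grows as $\frac{\log(t)}{\Delta_i^2}$; each mistake costs $\Delta_i$. Regret at stage $T \lesssim \sum_i \frac{\log(T)}{\Delta_i^2} \times \Delta_i \asymp \sum_i \frac{\log(T)}{\Delta_i}$
     • "$\Longrightarrow$" actually happens with overwhelming proba
     • "optimal": no algo always has a regret smaller than $\sum_i \frac{\log(T)}{\Delta_i}$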
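The two-line argument can be written out with explicit constants under one assumption: the "good event" on which every empirical mean is within its confidence radius, $|\overline{X}^j_t - \mu_j| \le \sqrt{2\log(t)/T_j(t)}$ for all arms $j$ and rounds $t$. The constant 8 below is specific to this sketch; a complete proof must also bound the probability of leaving the good event.

```latex
\begin{align*}
\pi_{t+1} = i \neq \star
  &\;\Longrightarrow\;
  \mu^\star \le \overline{X}^\star_t + \sqrt{\tfrac{2\log(t)}{T_\star(t)}}
  \le \overline{X}^i_t + \sqrt{\tfrac{2\log(t)}{T_i(t)}}
  \le \mu_i + 2\sqrt{\tfrac{2\log(t)}{T_i(t)}} \\
  &\;\Longrightarrow\;
  \Delta_i \le 2\sqrt{\tfrac{2\log(t)}{T_i(t)}}
  \;\Longleftrightarrow\;
  T_i(t) \le \frac{8\log(t)}{\Delta_i^2}, \\
R_T = \sum_{i \neq \star} \Delta_i\, T_i(T)
  &\le \sum_{i \neq \star} \Delta_i \Big(\frac{8\log(T)}{\Delta_i^2} + 1\Big)
  = \sum_{i \neq \star} \Big(\frac{8\log(T)}{\Delta_i} + \Delta_i\Big).
\end{align*}
```

The "+1" accounts for the last pull of arm $i$, which is made while the bound on $T_i(t)$ still holds.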

  30. Other Algos
     • Other algos: MOSS [Audibert, Bubeck], variants of UCB: $R_T \lesssim \sum_k \frac{\log(T\Delta_k)}{\Delta_k}$, worst case $R_T \lesssim \sqrt{TK\log(K)}$
     • ETC [Perchet, Rigollet]: pull in round-robin, then eliminate: $R_T \lesssim K\frac{\log(T\Delta_{\min}/K)}{\Delta_{\min}}$, worst case $R_T \lesssim \sqrt{TK\log(K)}$
     • Infinite number of actions $x \in [0,1]^d$ with $\Delta(x)$ 1-Lipschitz. Discretize at scale $\varepsilon$ + UCB gives $R_T \lesssim T\varepsilon + \sqrt{T/\varepsilon} \le T^{2/3}$ (with $\varepsilon = T^{-1/3}$)
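A rough sketch of the "pull in round-robin, then eliminate" idea in Python, assuming Gaussian arms $\mathcal{N}(\mu_i, 1)$ and a hypothetical confidence radius $\sqrt{2\log(T)/n}$ after $n$ pulls per active arm. The exact elimination schedule and constants of [Perchet, Rigollet] may differ, so treat this as a generic successive-elimination variant; the function name is illustrative.

```python
import math
import random


def successive_elimination(means, T, seed=0):
    """Round-robin over the active arms, then drop arms that are clearly worse.

    Arms are Gaussian N(mu_i, 1); the radius sqrt(2 log(T) / n) is an
    illustrative choice. Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    active = list(range(K))
    t = 0
    while t < T:
        for arm in active:              # one round-robin pass
            if t == T:
                break
            counts[arm] += 1
            sums[arm] += rng.gauss(means[arm], 1.0)
            t += 1
        if t == T:
            break
        n = min(counts[a] for a in active)
        radius = math.sqrt(2 * math.log(T) / n)
        leader = max(sums[a] / counts[a] for a in active)
        # eliminate arm a if its UCB falls below the leader's LCB
        active = [a for a in active if sums[a] / counts[a] >= leader - 2 * radius]
    return counts


counts = successive_elimination([0.5, 0.0, -0.5], T=5000)
print(counts)
```

All surviving arms share the same count after each pass, so a single radius serves every active arm; once only one arm is left, every remaining round goes to it.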

  31. Adversarial Multi-Armed Bandit

  32. K-Armed Adversarial Bandit Problems
     • K actions $i \in [K] = \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$ bounded in $[0,1]$
     • No assumption on $X^i_1, X^i_2, \dots$
     • Non-Anticipative Policy: $\pi_t = \pi_t\big(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\big) \in [K]$
     • Performance: Cumulative Regret $R_T = \max_{i \in [K]} \sum_{t=1}^T X^i_t - \sum_{t=1}^T X^{\pi_t}_t$
     • Convex optimization of $p \mapsto \mathbb{E}_{i \sim p} X^i_t$, from $\Delta([K])$ to $[0,1]$
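Since no assumption is made on the sequence, the benchmark is the best fixed action in hindsight. A short Python sketch (the name `adversarial_regret` is illustrative) computing $R_T$ from a full reward table; unlike the stochastic pseudo-regret, this quantity can be negative.

```python
def adversarial_regret(rewards, pulls):
    """R_T = max_i sum_t X^i_t - sum_t X^{pi_t}_t.

    rewards[t][i] holds X^i_t in [0, 1]; pulls[t] holds pi_t.
    """
    K = len(rewards[0])
    best_fixed = max(sum(row[i] for row in rewards) for i in range(K))
    collected = sum(row[a] for row, a in zip(rewards, pulls))
    return best_fixed - collected


rewards = [[1, 0], [0, 1], [1, 0], [0, 1]]          # alternating sequence
print(adversarial_regret(rewards, [0, 1, 0, 1]))    # prints -2
print(adversarial_regret(rewards, [1, 0, 1, 0]))    # prints 2
```

Each fixed action collects 2 on this alternating sequence, so a policy that tracks the alternation beats the benchmark, while one that is always wrong pays the full 2.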

  33. EXP-algo
     • Main insight: $\pi_t \sim p_t \in \Delta([K])$, more weight on the best actions:
       $p^i_t = \dfrac{e^{\eta \sum_{s=1}^{t-1} X^i_s}}{\sum_{j \in [K]} e^{\eta \sum_{s=1}^{t-1} X^j_s}}$, where $\eta$ is a parameter
     • Only $X^{\pi_t}_t$ is observed, not $X_t$. Estimate $X^i_t$ by $\widehat{X}^i_t = 1 - \frac{1 - X^i_t}{p^i_t}\mathbb{1}\{\pi_t = i\}$ and run EXP on $\widehat{X}_t$
     • $\mathbb{E}\,\widehat{X}^i_t = 1 - \Big((1 - p^i_t)\cdot 0 + p^i_t\,\frac{1 - X^i_t}{p^i_t}\Big) = X^i_t$: unbiased estimator
     • $\mathbb{E}\sum_{i \in [K]} p^i_t (\widehat{X}^i_t)^2 \le 1 + \sum_{i \in [K]} p^i_t\, \mathbb{E}\Big(\frac{1 - X^i_t}{p^i_t}\mathbb{1}\{\pi_t = i\}\Big)^2 \le K + 1$: bounded variance
     • Using this estimate we obtain $\mathbb{E} R_T \le \frac{\log(K)}{\eta} + \eta (K+1) T \le 3\sqrt{\log(K) K T}$ (for a well-chosen $\eta$)
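Below is a minimal Python sketch of the EXP algorithm with the importance-weighted estimate $\widehat{X}^i_t$ from the slide (the scheme usually called EXP3). It assumes the reward table `rewards[t][i]` $= X^i_t \in [0,1]$ is fixed in advance (an oblivious adversary) and uses the tuning $\eta = \sqrt{\log(K)/((K+1)T)}$, which balances the two terms of the bound; the function name and seed are illustrative.

```python
import math
import random


def exp3(rewards, eta, seed=0):
    """EXP with importance-weighted estimates; returns the chosen actions.

    rewards[t][i] = X^i_t in [0, 1]; only rewards[t][pi_t] is ever read.
    """
    rng = random.Random(seed)
    K = len(rewards[0])
    cum_est = [0.0] * K                 # sum_{s < t} hat X^i_s
    pulls = []
    for row in rewards:
        # p^i_t proportional to exp(eta * cum_est[i]); shift by the max for stability
        top = max(cum_est)
        weights = [math.exp(eta * (c - top)) for c in cum_est]
        total = sum(weights)
        p = [w / total for w in weights]
        # sample pi_t ~ p_t by inverting the cumulative distribution
        u = rng.random()
        arm = max(range(K), key=lambda i: p[i])   # fallback if rounding leaves u >= acc
        acc = 0.0
        for i in range(K):
            acc += p[i]
            if u < acc:
                arm = i
                break
        pulls.append(arm)
        x = row[arm]                    # the only observed reward
        # hat X^i_t = 1 - (1 - X^i_t) / p^i_t * 1{pi_t = i}
        for i in range(K):
            cum_est[i] += 1.0
        cum_est[arm] -= (1.0 - x) / p[arm]
    return pulls


T, K = 5000, 3
rewards = [[1.0, 0.0, 0.0] for _ in range(T)]    # action 0 is always the best
eta = math.sqrt(math.log(K) / ((K + 1) * T))     # balances log(K)/eta and eta(K+1)T
pulls = exp3(rewards, eta)
regret = T - sum(row[a] for row, a in zip(rewards, pulls))
print(regret)
```

Subtracting the running maximum before exponentiating leaves $p_t$ unchanged and avoids overflow; an action with $p^i_t = 0$ after underflow can never be sampled, so the division by `p[arm]` is safe.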

  34. Bandits & Repeated Auctions

  35. Back to Repeated Auctions: ad slot sold by lemonde.fr via 2nd-price auctions
     • Several (marketing) companies place bids
     • Highest bid wins (...), say criteo, and pays lemonde the 2nd bid (...)
     • criteo chooses the ad of a client: Microsoft, Cdiscount or Booking
     • criteo gets paid by the client if the user clicks on the ad
     Main Problem: Repeated auctions with unknown private valuation. Learn valuations, find which ad to display & good strategies

  36. 2nd price Auctions
     • A good is sold via a second-price auction.
     • Each buyer, with valuation $v(i)$, puts a bid $b(i)$.
     • The highest bidder wins and pays the second highest bid $b^\sharp = \max_{i \neq \arg\max b} b(i)$ (ties broken arbitrarily).
     • Utility of bidder $i$: $(v(i) - b^\sharp)\,\mathbb{1}\{b(i) \ge b^\sharp\}$
     Truthful auctions: the optimal strategy is to bid one's own valuation, $b(i) = v(i)$:
     • if $b(i) > v(i)$, one might only pay too much
     • if $b(i) < v(i)$, one might lose the auction
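The truthfulness claim can be checked numerically. The Python sketch below (the name `second_price_utility` is illustrative) implements the utility from the slide against a list of competing bids and verifies on a grid of values, bids and competing bids that bidding $b = v$ is never worse than any other bid.

```python
def second_price_utility(bid, value, other_bids):
    """Utility (v - b_sharp) * 1{bid >= b_sharp} in a 2nd-price auction.

    b_sharp is the highest competing bid; ties go to the bidder, as in the
    slide's indicator 1{b(i) >= b_sharp}.
    """
    b_sharp = max(other_bids)
    return value - b_sharp if bid >= b_sharp else 0.0


# Truthful bidding is never worse than any other bid on a grid of cases.
grid = [k / 10 for k in range(11)]
truthful_dominates = all(
    second_price_utility(v, v, [o]) >= second_price_utility(b, v, [o])
    for v in grid for b in grid for o in grid
)
print(truthful_dominates)                      # prints True
print(second_price_utility(0.9, 0.5, [0.7]))   # overbid: wins but pays too much
print(second_price_utility(0.3, 0.8, [0.5]))   # underbid: loses a profitable auction
```

The two last calls illustrate the two bullets: overbidding can win an auction at a price above the value (negative utility), and underbidding can lose an auction that would have been profitable.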

  37. Reserve price
     • Utility of highest value: $v^* - b^\sharp$
     • Utility of seller (value $v_0$): $b^\sharp - v_0$, can be negative!
     Reserve price: a threshold $c$; if $b^* \ge c$, the good is sold at price $\max\{b^\sharp, c\}$, otherwise it is not sold.
     • Still truthful: $c$ acts as an extra bid
     • Optimal reserve price $c^*$ maximizes $\mathbb{E}\big[(\max\{v^\sharp, c\} - v_0)\,\mathbb{1}\{v^* \ge c\}\big]$
     • Depends on the (actually unknown) distributions of values.
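As an illustration of why the optimal reserve depends on the value distribution, the sketch below estimates the seller's objective $\mathbb{E}\big[(\max\{v^\sharp, c\} - v_0)\mathbb{1}\{v^* \ge c\}\big]$ by Monte Carlo and picks the best $c$ on a grid. The distribution (two bidders with i.i.d. uniform values on $[0,1]$), $v_0 = 0$, the grid, the sample size and the function names are all assumptions made for the example.

```python
import random


def seller_revenue(c, top_two, v0=0.0):
    """Monte Carlo estimate of E[(max{v_sharp, c} - v0) * 1{v_star >= c}].

    top_two is a list of (v_star, v_sharp) pairs: highest and 2nd-highest value.
    """
    total = 0.0
    for v_star, v_sharp in top_two:
        if v_star >= c:
            total += max(v_sharp, c) - v0
    return total / len(top_two)


def best_reserve(n_bidders=2, n_samples=50_000, seed=0):
    """Grid search for the reserve price c (assumed i.i.d. uniform values)."""
    rng = random.Random(seed)
    top_two = []
    for _ in range(n_samples):
        vals = sorted((rng.random() for _ in range(n_bidders)), reverse=True)
        top_two.append((vals[0], vals[1]))
    grid = [k / 20 for k in range(21)]
    # the same samples are reused for every c (common random numbers)
    return max(grid, key=lambda c: seller_revenue(c, top_two))


print(best_reserve())
```

For this particular distribution the expected revenue is $\frac13 + c^2 - \frac43 c^3$, maximized at $c^* = 1/2$ (revenue $5/12$ instead of $1/3$ without a reserve), so the grid search should land close to $0.5$. With another value distribution the optimum moves, which is why $c^*$ has to be learned when the distribution is unknown.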

  38. Main model
     • Learning optimal reserve price [Cesa-Bianchi, Gentile, Mansour]
     From the point of view of a bidder?
     • At round $t = 1, \dots, T$: the bidder bids $b_t \in [0,1]$; if $b_t > m_t$ (maximum of the other bids & reserve price), it wins the good and observes its value $v_t \in [0,1]$
     • Total utility: $\sum_{t=1}^T (v_t - m_t)\,\mathbb{1}\{b_t > m_t\}$
     • Total regret: $\max_{b \in [0,1]} \sum_{t=1}^T (v_t - m_t)\,\mathbb{1}\{b > m_t\} - \sum_{t=1}^T (v_t - m_t)\,\mathbb{1}\{b_t > m_t\}$
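The bidder's regret compares the realized utility with the best fixed bid in hindsight. Because $b \mapsto \sum_t (v_t - m_t)\mathbb{1}\{b > m_t\}$ is piecewise constant and only jumps at the values $m_t$, the maximum over $b \in [0,1]$ can be computed exactly by trying $b = 0$ and $b$ just above each $m_t < 1$. A Python sketch, assuming all $v_t$ are available for the benchmark (the learner itself only observes $v_t$ when it wins); the function name and the toy data are illustrative.

```python
def bidder_regret(values, prices, bids):
    """Regret against the best fixed bid b in [0, 1].

    values[t] = v_t, prices[t] = m_t (max of other bids and reserve price),
    bids[t] = b_t. Utility of round t is (v_t - m_t) * 1{b_t > m_t}.
    """
    def total_utility(wins):
        return sum(v - m for v, m, w in zip(values, prices, wins) if w)

    achieved = total_utility([b > m for b, m in zip(bids, prices)])
    # U(b) is piecewise constant and only jumps at the m_t: try b = 0 and
    # b slightly above each m_t < 1, which wins exactly the rounds with m_s <= m_t.
    best = total_utility([0.0 > m for m in prices])
    for threshold in set(prices):
        if threshold < 1.0:
            best = max(best, total_utility([m <= threshold for m in prices]))
    return best - achieved


values = [0.8, 0.3, 0.6]
prices = [0.5, 0.4, 0.2]
bids = [0.9, 0.9, 0.1]
print(bidder_regret(values, prices, bids))
```

In this toy example any fixed bid just above 0.5 wins all three rounds for a total of 0.6, against 0.2 achieved by `bids`, so the regret is 0.4 up to floating-point rounding. Bidding truthfully ($b_t = v_t$) collects 0.7 here, more than any fixed bid, so its regret is negative.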
