Learning to Bid Without Knowing your Value

Learning to Bid Without Knowing your Value. Zhe Feng, Harvard. Joint work with Chara Podimata (Harvard) and Vasilis Syrgkanis (MSR). 19th ACM Conference on Economics and Computation.


  1. Learning to Bid Without Knowing your Value. Zhe Feng, Harvard. Joint work with Chara Podimata (Harvard) and Vasilis Syrgkanis (MSR). 19th ACM Conference on Economics and Computation (EC’18), 6/21/2018.

  2. Warm-up. Auction theory & Mechanism Design: buyer $i$ has value $v_i$ and submits bid $b_i$; the auction returns $(a_i, p_i)$. Utility to buyer $i$: $u_i = a_i v_i - p_i$.

  3. Motivation. Key assumption in Auction Theory & Mechanism Design: the valuation is private, but known to the bidder himself/herself.

  5. Motivation. Key assumption in Auction Theory & Mechanism Design. Small markets: bidders have time to prepare to bid (market research). Digital economy (online advertisement auctions): no time to prepare to bid (market research).

  6. Main question. How to design a bidding strategy for the learner in online advertisement auctions when he/she doesn’t know the value before submitting the bid?

  7. Sponsored Search Example. The advertiser (learner) sends bids to the platform (auctioneer).

  8. Sponsored Search Example. The advertiser (learner) sends bids to the platform (auctioneer). Platform: generates $x_t(\cdot)$, $p_t(\cdot)$.

  9. Sponsored Search Example. The advertiser (learner) sends bids to the platform (auctioneer). Platform: generates $x_t(\cdot)$, $p_t(\cdot)$. Clicked by users: generates value $v_t$.

  10. Sponsored Search Example. The advertiser (learner) sends bids to the platform (auctioneer). Platform: generates $x_t(\cdot)$, $p_t(\cdot)$. Advertiser: observes (estimated) $x_t(\cdot)$, $p_t(\cdot)$. Clicked by users: generates value $v_t$.

  11. Sponsored Search Example. The advertiser (learner) sends bids to the platform (auctioneer). Platform: generates $x_t(\cdot)$, $p_t(\cdot)$. Advertiser: observes (estimated) $x_t(\cdot)$, $p_t(\cdot)$.

  12. Sponsored Search Example. The advertiser (learner) sends bids to the platform (auctioneer). Platform: generates $x_t(\cdot)$, $p_t(\cdot)$. Advertiser: observes (estimated) $x_t(\cdot)$, $p_t(\cdot)$. Clicked by users: generates value $v_t$. Reward: $v_t - p_t(\cdot)$. Expected utility: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$.

  13. Simple Model: Single-item Auctions. At each day $t$: • Designer and competitors choose allocation rule $x_t(\cdot)$ and payment rule $p_t(\cdot)$ • Learner submits $b_t \in B$ (finite set) • The learner wins the item with probability $x_t(b_t)$ • At the end, observes $x_t(\cdot)$, $p_t(\cdot)$ • If the learner wins, observes $v_t$

  14. Simple Model: Single-item Auctions. At each day $t$: • Designer and competitors choose allocation rule $x_t(\cdot)$ and payment rule $p_t(\cdot)$ • Learner submits $b_t \in B$ (finite set) • The learner wins the item with probability $x_t(b_t)$ • At the end, observes $x_t(\cdot)$, $p_t(\cdot)$ • If the learner wins, observes $v_t$ • Expected utility function: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$

  15. Simple Model: Single-item Auctions. At each day $t$: • Designer and competitors choose allocation rule $x_t(\cdot)$ and payment rule $p_t(\cdot)$ • Learner submits $b_t \in B$ (finite set) • The learner wins the item with probability $x_t(b_t)$ • At the end, observes $x_t(\cdot)$, $p_t(\cdot)$ • If the learner wins, observes $v_t$ • Expected utility function: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$ • Goal: minimize expected regret $R(T) = \sup_{b^*} \mathbb{E}\left[\sum_{t=1}^{T} u_t(b^*)\right] - \mathbb{E}\left[\sum_{t=1}^{T} u_t(b_t)\right]$, where the first term is the utility with the best fixed bid in hindsight and the second is the utility with the bids generated by the algorithm.
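The utility and regret definitions of the simple model can be sketched as follows, writing $x_t$ for the allocation rule, $p_t$ for the payment rule, $v_t$ for the value and $b$ for a bid. This is only an illustration under stated assumptions: the whole sequence is fixed and known, so no expectation is taken, and the two-day second-price example (the grid `bids`, the `competing` bids, and all function names) is hypothetical and not taken from the talk.

```python
def expected_utility(value, alloc, pay):
    """u_t(b) = (v_t - p_t(b)) * x_t(b) for a single bid b."""
    return (value - pay) * alloc


def regret_vs_best_fixed_bid(values, allocs, pays, chosen):
    """Regret of the bid indices in `chosen` against the best fixed bid.

    values[t] is v_t; allocs[t][j] and pays[t][j] are x_t and p_t at the
    j-th bid of the grid; chosen[t] is the index of the bid played on day t.
    """
    num_bids = len(allocs[0])
    # Total utility of each fixed bid over all days.
    totals = [
        sum(expected_utility(v, x[j], p[j]) for v, x, p in zip(values, allocs, pays))
        for j in range(num_bids)
    ]
    # Total utility of the bids that were actually played.
    achieved = sum(
        expected_utility(v, x[j], p[j])
        for v, x, p, j in zip(values, allocs, pays, chosen)
    )
    return max(totals) - achieved


# Hypothetical two-day second-price example on the bid grid (0, 0.5, 1):
# the learner wins when its bid exceeds the competing bid, and pays that bid.
bids = [0.0, 0.5, 1.0]
values = [1.0, 0.5]
competing = [0.6, 0.3]
allocs = [[1.0 if b > c else 0.0 for b in bids] for c in competing]
pays = [[c for _ in bids] for c in competing]
regret = regret_vs_best_fixed_bid(values, allocs, pays, chosen=[1, 1])
```

In this toy instance, always bidding 1 collects the most utility, so a learner that bids 0.5 on both days incurs positive regret.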

  16. Multi-Armed Bandit (MAB). At each round $t = 1, \dots, T$: • Adversary chooses reward vector $r_t = (r_{1,t}, \dots, r_{K,t})$ • Learner chooses an action $i_t \in B$ • Learner gets reward $r_{i_t,t}$ and only observes $r_{i_t,t}$. EXP3 achieves regret $O(\sqrt{T|B|})$.
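For reference, a minimal sketch of an EXP3-style learner under bandit feedback. It uses the common loss-based variant (losses $1 - r$ with rewards assumed to lie in $[0, 1]$) and a textbook step size; the function name, the `reward_fn` interface and the toy reward sequence are assumptions made for illustration, not details from the talk.

```python
import math
import random


def exp3(reward_fn, num_arms, horizon, eta, rng):
    """Run a loss-based EXP3 variant; returns the total reward collected.

    reward_fn(t, arm) returns the reward in [0, 1] of `arm` at round t.
    Only the pulled arm's reward is requested, matching bandit feedback.
    """
    log_weights = [0.0] * num_arms
    total = 0.0
    for t in range(horizon):
        # Exponential-weights distribution, computed stably.
        top = max(log_weights)
        weights = [math.exp(w - top) for w in log_weights]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Importance-weighted loss estimate: nonzero only for the pulled arm.
        log_weights[arm] -= eta * (1.0 - reward) / probs[arm]
    return total


# Toy run (hypothetical): arm 2 always pays 1, the other arms pay 0.
num_arms, horizon = 3, 2000
eta = math.sqrt(2.0 * math.log(num_arms) / (horizon * num_arms))
total = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0,
             num_arms, horizon, eta, random.Random(0))
```

Because each round reveals only one entry of the reward vector, the estimate for the pulled arm is scaled by the inverse of its probability, which is the source of the dependence on the number of arms in the regret rate.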

  17. Formal main question. Can we design an online learning algorithm for the learner to achieve better regret than generic MAB?

  18. Our Results: WIN-EXP algorithm. Utilize partial feedback information from the auctions. Partial feedback: between bandit feedback and full information feedback. Theorem 1. The WIN-EXP algorithm achieves regret at most $4\sqrt{T \log|B|}$. Recall: EXP3 achieves $O(\sqrt{T|B|})$.
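To get a feel for the gap between the two rates, the short sketch below evaluates them, writing $T$ for the horizon and $|B|$ for the number of bids. It assumes the bounds read $4\sqrt{T \log|B|}$ for WIN-EXP and order $\sqrt{T|B|}$ for EXP3; the constant hidden in the EXP3 rate is ignored, so only the growth in $|B|$ is meaningful, and the function names and sample values are illustrative.

```python
import math


def exp3_rate(horizon, num_bids):
    """Order of the EXP3 rate, sqrt(T |B|), with the hidden constant dropped."""
    return math.sqrt(horizon * num_bids)


def win_exp_bound(horizon, num_bids):
    """WIN-EXP bound as stated in Theorem 1: 4 sqrt(T log |B|)."""
    return 4.0 * math.sqrt(horizon * math.log(num_bids))


horizon = 10_000
fine_grid = (exp3_rate(horizon, 1000), win_exp_bound(horizon, 1000))
coarse_grid = (exp3_rate(horizon, 4), win_exp_bound(horizon, 4))
```

With a fine bid grid the logarithmic dependence wins clearly, while for a handful of bids the constant 4 dominates and the comparison says little.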

  19. Related Work. No-regret learning in GT/MD. From auctioneer side: [Blum et al., 04], [Amin et al., 05], [Amin et al., 06], [Cesa-Bianchi et al., 15], … From bidder side: [Dikkala & Tardos, 13], [Balseiro & Gur, 17], [Weed et al., 16]. Learning with partial feedback. Contextual Bandit: [Bubeck & Cesa-Bianchi, 12], [Agarwal et al., 14], … Feedback graphs: [Alon et al., 13], [Alon et al., 15].

  20. Technical Parts

  21. The Abstraction: Win-Only Feedback. At each day $t$: • Learner chooses an action $b_t \in B$.

  22. The Abstraction: Win-Only Feedback. At each day $t$: • Learner chooses an action $b_t \in B$. • The adversary chooses a reward function $r_t : B \to [-1, 1]$ and allocation function $x_t(\cdot)$.

  23. The Abstraction: Win-Only Feedback. At each day $t$: • Learner chooses an action $b_t \in B$. • The adversary chooses a reward function $r_t : B \to [-1, 1]$ and allocation function $x_t(\cdot)$. • The learner wins reward $r_t(b_t)$ with probability $x_t(b_t)$.

  24. The Abstraction: Win-Only Feedback. At each day $t$: • Learner chooses an action $b_t \in B$. • The adversary chooses a reward function $r_t : B \to [-1, 1]$ and allocation function $x_t(\cdot)$. • The learner wins reward $r_t(b_t)$ with probability $x_t(b_t)$. • Feedback: always learns the allocation rule $x_t$; if she wins, also learns $r_t(\cdot)$.
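One day of the win-only feedback protocol can be simulated as below, writing $x_t$ for the allocation function and $r_t$ for the reward function over the bid grid. This is a sketch under assumptions: the function name `play_round`, its return layout and the example numbers are hypothetical, and the adversary's choices are simply passed in as lists.

```python
import random


def play_round(bid_idx, rewards, allocs, rng):
    """Simulate one day of win-only feedback.

    rewards[j] is r_t and allocs[j] is x_t at the j-th bid of the grid.
    Returns (won, realized_reward, observed_allocs, observed_rewards), where
    observed_rewards is None unless the learner wins.
    """
    # The learner wins with probability x_t(b_t).
    won = rng.random() < allocs[bid_idx]
    realized = rewards[bid_idx] if won else 0.0
    # The allocation rule is always revealed; rewards only after a win.
    observed_rewards = list(rewards) if won else None
    return won, realized, list(allocs), observed_rewards


rng = random.Random(1)
sure_win = play_round(1, [0.2, 0.5], [0.0, 1.0], rng)
sure_loss = play_round(0, [0.2, 0.5], [0.0, 1.0], rng)
```

The feedback sits between the two classical extremes: richer than bandit feedback, since the allocation rule is seen for every bid, but poorer than full information, since rewards are hidden on a loss.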

  25. WIN-EXP Algorithm For Win-Only Feedback. At each round $t$: • Draw a bid $b_t \sim \pi_t$

  26. WIN-EXP Algorithm For Win-Only Feedback. At each round $t$: • Draw a bid $b_t \sim \pi_t$ • Observe allocation rule $x_t$; if wins, observe $r_t(\cdot)$

  27. WIN-EXP Algorithm For Win-Only Feedback. At each round $t$: • Draw a bid $b_t \sim \pi_t$ • Observe allocation rule $x_t$; if wins, observe $r_t(\cdot)$ • Compute the unbiased estimator of $u_t(b) - 1$: $\tilde{u}_t(b) = \begin{cases} \dfrac{(r_t(b) - 1) \cdot x_t(b)}{\sum_{b'} \pi_t(b') x_t(b')}, & \text{if the learner wins} \\ -\dfrac{1 - x_t(b)}{1 - \sum_{b'} \pi_t(b') x_t(b')}, & \text{if the learner doesn't win} \end{cases}$
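The estimator step can be checked numerically. The sketch below writes $\pi_t$ for the bid distribution, $x_t$ for the allocation rule and $r_t$ for the reward function, and assumes the expected reward of bid $b$ is $u_t(b) = r_t(b) \cdot x_t(b)$, in line with the win-only abstraction. Averaging the two cases with the marginal win probability $\sum_{b'} \pi_t(b') x_t(b')$ should then give $u_t(b) - 1$. The function names and example numbers are hypothetical, and `next_distribution` is an assumed standard exponential-weights step that the slides shown here do not spell out.

```python
import math


def win_exp_estimates(won, probs, allocs, rewards=None):
    """Estimate u_t(b) - 1 for every bid b on the grid.

    probs[j] is pi_t, allocs[j] is x_t and rewards[j] is r_t at the j-th bid;
    rewards are only needed (and only known) when the learner wins.
    """
    win_prob = sum(p * x for p, x in zip(probs, allocs))
    if won:
        return [(r - 1.0) * x / win_prob for r, x in zip(rewards, allocs)]
    return [-(1.0 - x) / (1.0 - win_prob) for x in allocs]


def expected_estimates(probs, allocs, rewards):
    """Average the win and loss cases with weights P(win) and 1 - P(win)."""
    win_prob = sum(p * x for p, x in zip(probs, allocs))
    on_win = win_exp_estimates(True, probs, allocs, rewards)
    on_loss = win_exp_estimates(False, probs, allocs)
    return [win_prob * w + (1.0 - win_prob) * l for w, l in zip(on_win, on_loss)]


def next_distribution(cumulative, eta):
    """ASSUMPTION: pi(b) proportional to exp(eta * cumulative estimate of b)."""
    top = max(cumulative)
    weights = [math.exp(eta * (c - top)) for c in cumulative]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical three-bid example.
probs = [0.2, 0.3, 0.5]
allocs = [0.1, 0.5, 0.9]
rewards = [0.4, -0.2, 0.7]
means = expected_estimates(probs, allocs, rewards)
targets = [r * x - 1.0 for r, x in zip(rewards, allocs)]
updated = next_distribution([0.0, -1.0, -2.0], eta=0.5)
```

Unlike the bandit estimate, every bid receives an estimate in every round, because the allocation rule is always observed and the full reward function is observed on a win.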
