Learning to Bid Without Knowing your Value
Zhe Feng, Harvard
Joint work with Chara Podimata (Harvard) and Vasilis Syrgkanis (MSR)
19th ACM Conference on Economics and Computation (EC’18), 6/21/2018
Warm-up: Auction Theory & Mechanism Design
Buyer i with value $v_i$ submits a bid $b_i$ to the auction, which returns an allocation and a payment $(a_i, p_i)$.
Utility to buyer i: $u_i = a_i v_i - p_i$
Motivation
Key assumption in Auction Theory & Mechanism Design: the valuation is private, but known to the bidder himself/herself.
• Small markets: bidders have time to prepare to bid (market research).
• Digital economy, e.g. online advertisement auctions: no time to prepare to bid (market research).
Main question
How to design a bidding strategy for the learner in online advertisement auctions when he/she doesn't know the value before submitting the bid.
Sponsored Search Example
• The advertiser (learner) submits bids to the platform (auctioneer).
• The platform generates the allocation and payment rules $x_t(\cdot)$, $p_t(\cdot)$; the advertiser observes (estimated) $x_t(\cdot)$, $p_t(\cdot)$.
• When the ad is clicked by users, it generates value $v_t$ for the advertiser.
• Reward: $v_t - p_t(\cdot)$; expected utility: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$.
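A minimal Python sketch of the expected-utility formula $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$ from this example. The rules `example_alloc` and `example_pay` are hypothetical stand-ins chosen only so the arithmetic is easy to check; they are not the platform's actual rules.

```python
from typing import Callable


def expected_utility(bid: float, value: float,
                     alloc: Callable[[float], float],
                     pay: Callable[[float], float]) -> float:
    """Expected utility u_t(b) = (v_t - p_t(b)) * x_t(b).

    alloc(b) plays the role of x_t(b), the learner's win probability;
    pay(b) plays the role of p_t(b), the payment charged on a win;
    value is v_t, the value generated when the learner wins.
    """
    return (value - pay(bid)) * alloc(bid)


# Hypothetical rules for illustration only.
def example_alloc(b: float) -> float:
    return min(b, 1.0)  # win probability grows with the bid, capped at 1


def example_pay(b: float) -> float:
    return b / 2  # pay half the bid when winning


print(expected_utility(0.5, 1.0, example_alloc, example_pay))  # (1.0 - 0.25) * 0.5 = 0.375
```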
Simple Model: Single-Item Auctions
• At each day $t$:
  • The designer and competitors choose an allocation rule $x_t(\cdot)$ and a payment rule $p_t(\cdot)$.
  • The learner submits a bid $b_t \in B$, where $B$ is a finite set.
  • The learner wins the item with probability $x_t(b_t)$.
  • At the end of the day, the learner observes $x_t(\cdot)$ and $p_t(\cdot)$.
  • If the learner wins, she also observes $v_t$.
• Expected utility function: $u_t(b) = (v_t - p_t(b)) \cdot x_t(b)$
• Goal: minimize the expected regret, i.e. the utility of the best fixed bid in hindsight minus the utility of the bids generated by the algorithm:
$$R(T) = \sup_{b^* \in B} \mathbb{E}\left[\sum_{t=1}^{T} u_t(b^*)\right] - \mathbb{E}\left[\sum_{t=1}^{T} u_t(b_t)\right]$$
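To make the regret definition concrete, here is a sketch that computes the quantity inside the expectations for a single run, assuming the utilities $u_t(b)$ of every bid are given as a table (a hypothetical input format). The expected regret averages this over the randomness of the algorithm and the environment.

```python
from typing import Sequence


def regret(utilities: Sequence[Sequence[float]], bids: Sequence[int]) -> float:
    """Realized regret of one run against the best fixed bid in hindsight.

    utilities[t][j] is u_t(b_j) for the j-th bid of the finite set B;
    bids[t] is the index of the bid b_t the algorithm submitted on day t.
    """
    num_bids = len(utilities[0])
    # Total utility of each fixed bid b* over all T days; keep the best one.
    best_fixed = max(sum(u[j] for u in utilities) for j in range(num_bids))
    # Total utility actually collected by the algorithm's bids b_1, ..., b_T.
    collected = sum(u[b] for u, b in zip(utilities, bids))
    return best_fixed - collected


# Hypothetical 3-day example with |B| = 2.
utilities = [[0.0, 0.5], [0.25, 0.5], [0.5, 0.0]]
print(regret(utilities, bids=[0, 0, 1]))  # best fixed bid earns 1.0, the run earns 0.25 -> 0.75
```

Regret can be negative for an adaptive sequence: bidding `[1, 1, 0]` collects 1.5 against the best fixed bid's 1.0.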
Multi-Armed Bandit (MAB)
At each round $t = 1, \dots, T$:
• The adversary chooses a reward vector $r_t = (r_{1,t}, \dots, r_{K,t})$.
• The learner chooses an action $i_t \in B$ (here $K = |B|$).
• The learner gets reward $r_{i_t,t}$ and observes only $r_{i_t,t}$.
EXP3 achieves regret $O(\sqrt{T|B|})$.
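For reference, a sketch of one standard EXP3 variant (exponential weights over importance-weighted loss estimates, without an explicit uniform-exploration mix), assuming rewards in $[0, 1]$, which the slide does not specify. The function `exp3` and its parameters are illustrative, not from the talk; the learning rate $\eta = \sqrt{2 \ln K / (TK)}$ used below is a common textbook tuning for this variant.

```python
import math
import random
from typing import Callable, List


def exp3(num_arms: int, horizon: int,
         reward_fn: Callable[[int, int], float],
         eta: float, rng: random.Random) -> List[int]:
    """Sketch of EXP3 with importance-weighted loss estimates.

    reward_fn(t, i) returns r_{i,t} in [0, 1] (assumed range); it is queried
    only for the arm actually pulled, i.e. bandit feedback.
    """
    cum_loss_est = [0.0] * num_arms  # estimated cumulative losses, one per arm
    chosen: List[int] = []
    for t in range(horizon):
        # Exponential weights; subtracting the minimum keeps the largest
        # weight equal to 1, so the total never underflows to 0.
        lowest = min(cum_loss_est)
        weights = [math.exp(-eta * (loss - lowest)) for loss in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(num_arms), weights=probs, k=1)[0]
        reward = reward_fn(t, arm)  # the only number observed this round
        # Unbiased estimate of the loss 1 - r: divide by the pull probability;
        # unpulled arms get estimate 0 and are left unchanged.
        cum_loss_est[arm] += (1.0 - reward) / probs[arm]
        chosen.append(arm)
    return chosen


# Hypothetical instance: arm 1 always pays 1, the other two arms pay 0.
T = 2000
picks = exp3(num_arms=3, horizon=T,
             reward_fn=lambda t, i: 1.0 if i == 1 else 0.0,
             eta=math.sqrt(2 * math.log(3) / (T * 3)),
             rng=random.Random(0))
print(picks[-200:].count(1))  # how often the best arm is pulled near the end
```

Each round uses only the pulled arm's reward; dividing by `probs[arm]` is what makes the estimate unbiased, and also what makes it high-variance when that probability is small.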
Formal main question
Can we design an online learning algorithm for the learner that achieves better regret than generic MAB algorithms?
Our Results: WIN-EXP algorithm
Utilize partial feedback information from the auctions.
Partial feedback: between bandit feedback and full-information feedback.
Theorem 1. The WIN-EXP algorithm achieves regret at most $4\sqrt{T \log|B|}$.
Recall: EXP3 achieves $O(\sqrt{T|B|})$.
Related Work
No-regret learning in GT/MD
• From the auctioneer side: [Blum et al., 04], [Amin et al., 05], [Amin et al., 06], [Cesa-Bianchi et al., 15], …
• From the bidder side: [Dikkala & Tardos, 13], [Balseiro & Gur, 17], [Weed et al., 16]
Learning with partial feedback
• Contextual bandits: [Bubeck & Cesa-Bianchi, 12], [Agarwal et al., 14], …
• Feedback graphs: [Alon et al., 13], [Alon et al., 15]
Technical Parts
The Abstraction: Win-Only Feedback
At each day $t$:
• The learner chooses an action $b_t \in B$.
• The adversary chooses a reward function $r_t: B \to [-1, 1]$ and an allocation function $x_t(\cdot)$.
• The learner wins reward $r_t(b_t)$ with probability $x_t(b_t)$.
• Feedback: the learner always learns the allocation rule $x_t$; if she wins, she also learns $r_t(\cdot)$.
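One day of the win-only feedback protocol can be simulated as below. This is a sketch under two assumptions not spelled out on the slide: bids are indexed by position in $B$, and the learner receives reward 0 when she does not win. `WinOnlyFeedback` and `play_round` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WinOnlyFeedback:
    won: bool
    reward: float                   # r_t(b_t) on a win, 0.0 otherwise (assumed)
    allocation: List[float]         # x_t(b) for every bid: always revealed
    rewards: Optional[List[float]]  # r_t(b) for every bid: revealed only on a win


def play_round(bid: int, allocation: List[float], rewards: List[float],
               rng: random.Random) -> WinOnlyFeedback:
    """One day of the win-only feedback protocol (hypothetical helper).

    bid indexes the finite bid set B; allocation and rewards are the
    adversary's x_t(.) and r_t(.), with rewards in [-1, 1].
    """
    won = rng.random() < allocation[bid]  # win with probability x_t(b_t)
    return WinOnlyFeedback(
        won=won,
        reward=rewards[bid] if won else 0.0,
        allocation=list(allocation),
        rewards=list(rewards) if won else None,
    )


rng = random.Random(0)
print(play_round(1, allocation=[0.0, 1.0], rewards=[0.25, -0.5], rng=rng))
print(play_round(0, allocation=[0.0, 1.0], rewards=[0.25, -0.5], rng=rng))
```

The allocations 0.0 and 1.0 make both outcomes deterministic regardless of the seed.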
WIN-EXP Algorithm for Win-Only Feedback
At each round $t$:
• Draw a bid $b_t \sim \pi_t$.
• Observe the allocation rule $x_t$; if the learner wins, observe $r_t(\cdot)$.
• Compute the unbiased estimator of $u_t(b) - 1$, where $u_t(b) = r_t(b)\, x_t(b)$ is the expected reward of bid $b$:
$$\tilde{u}_t(b) = \begin{cases} \dfrac{(r_t(b) - 1)\, x_t(b)}{\sum_{b'} \pi_t(b')\, x_t(b')}, & \text{if the learner wins,} \\[2ex] -\dfrac{1 - x_t(b)}{1 - \sum_{b'} \pi_t(b')\, x_t(b')}, & \text{if the learner doesn't win.} \end{cases}$$
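The estimator can be checked numerically. Writing $P = \sum_{b'} \pi_t(b')\, x_t(b')$ for the win probability and taking $u_t(b) = r_t(b)\, x_t(b)$ (our reading of the win-only abstraction), its expectation is $P \cdot \frac{(r_t(b)-1)\, x_t(b)}{P} + (1-P) \cdot \frac{-(1-x_t(b))}{1-P} = r_t(b)\, x_t(b) - 1$. The sketch below, with the hypothetical helper `win_exp_estimate`, computes the estimate for every bid and averages the two branches exactly on a small made-up instance. The slides shown here stop before the update of $\pi_t$, so the sketch does not implement it.

```python
import math
from typing import List, Optional


def win_exp_estimate(probs: List[float], allocation: List[float],
                     rewards: Optional[List[float]], won: bool) -> List[float]:
    """Estimate of u_t(b) - 1 for every bid b, following the slide's formula.

    probs[j]      = pi_t(b_j), the sampling distribution over bids
    allocation[j] = x_t(b_j), revealed every round
    rewards[j]    = r_t(b_j), revealed only if the learner won
    """
    # Probability of winning under pi_t: sum_b pi_t(b) x_t(b).
    win_prob = sum(p * x for p, x in zip(probs, allocation))
    if won:
        if rewards is None:
            raise ValueError("rewards must be observed after a win")
        return [(r - 1.0) * x / win_prob for r, x in zip(rewards, allocation)]
    # On a loss only x_t is known; r_t(.) is not needed.
    return [-(1.0 - x) / (1.0 - win_prob) for x in allocation]


# Exact unbiasedness check on a small hypothetical instance:
# weight each branch by its probability and compare with r * x - 1.
probs, allocation, rewards = [0.5, 0.5], [0.25, 0.75], [0.5, -0.5]
win_prob = sum(p * x for p, x in zip(probs, allocation))
on_win = win_exp_estimate(probs, allocation, rewards, won=True)
on_loss = win_exp_estimate(probs, allocation, None, won=False)
mean = [win_prob * w + (1 - win_prob) * l for w, l in zip(on_win, on_loss)]
target = [r * x - 1.0 for r, x in zip(rewards, allocation)]
print(mean, target)  # [-0.875, -1.375] [-0.875, -1.375]
print(all(math.isclose(m, t) for m, t in zip(mean, target)))  # True
```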