Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer

PowerPoint presentation by Alexey Drutsa


  1. Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer. Alexey Drutsa

  2. Setup

  3. Repeated Contextual Posted-Price Auctions. Different goods (e.g., ad spaces)
› are described by d-dimensional feature vectors (contexts) from [0,1]^d,
› are repeatedly offered for sale by a seller
› to a single buyer over T rounds (one good per round).
The buyer
› holds a private fixed valuation function v: [0,1]^d → [0,1],
› used to calculate his valuation v(x) for a good with context x ∈ [0,1]^d;
› v is unknown to the seller.
At each round t = 1, …, T,
› the feature vector x_t of the current good is observed by the seller and the buyer,
› a price p_t is offered by the seller,
› and an allocation decision a_t ∈ {0,1} is made by the buyer: a_t = 0 when the buyer rejects, and a_t = 1 when the buyer accepts.

  4. Seller's pricing algorithm and buyer strategy. The seller applies a pricing algorithm A that sets prices {p_t}_{t=1}^T in response to buyer decisions a = {a_t}_{t=1}^T and observed contexts x = {x_t}_{t=1}^T. The price p_t can depend only on
› the past decisions {a_s}_{s=1}^{t−1},
› the feature vectors {x_s}_{s=1}^{t},
› the horizon T.
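A minimal sketch of this interaction protocol in Python (the names run_auction, offer_price, decide, and observe are illustrative assumptions, not from the talk):

```python
def run_auction(seller, buyer, contexts):
    """Play T rounds: one good with context x_t per round (slides 3-4).

    The seller only ever observes the binary decisions a_t, never v itself.
    """
    prices, decisions = [], []
    for x_t in contexts:
        p_t = seller.offer_price(x_t)   # may use past decisions, past contexts, and T
        a_t = buyer.decide(x_t, p_t)    # 1 = accept, 0 = reject
        seller.observe(x_t, p_t, a_t)   # feedback used to set future prices
        prices.append(p_t)
        decisions.append(a_t)
    return prices, decisions
```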

  5. Strategic buyer. The seller announces her pricing algorithm A in advance. The buyer has some distribution (beliefs) D about future contexts. In each round t, given the history of previous rounds, he chooses his decision a_t such that it maximizes his expected future γ-discounted surplus:
𝔼_{x_s ∼ D} [ Σ_{s=t}^{T} γ^{s−1} a_s (v(x_s) − p_s) ],   γ ∈ (0,1].
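For a fixed realization of contexts, the buyer's objective can be written out as below; this is a sketch under the simplifying assumption that the contexts are already known (the talk takes an expectation over the beliefs D), and all names are mine:

```python
def discounted_surplus(valuations, prices, decisions, gamma):
    """Sum_s gamma^s * a_s * (v(x_s) - p_s) over rounds s = 0, 1, ...

    (0-indexed here; the slide's gamma^{s-1} corresponds to 1-indexed rounds).
    A strategic buyer chooses decisions to maximize the expectation of this.
    """
    return sum(
        gamma ** s * a * (v - p)
        for s, (v, p, a) in enumerate(zip(valuations, prices, decisions))
    )
```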

  6. The game's workflow and knowledge structure. [Diagram: a timeline with the actors Nature, Buyer, and Seller. Before the game starts, the valuation v and the beliefs D are private knowledge of the buyer, while the algorithm A is public knowledge; then, in each round t = 1, 2, 3, …, Nature reveals the context x_t, the Algorithm posts the price p_t, and the Buyer makes the decision a_t.]

  7. Seller's goal. The seller's strategic regret:
SReg(T, A, v, γ, x_{1:T}, D) := Σ_{t=1}^{T} (v(x_t) − a_t p_t).
We will learn the function v in a non-parametric way. For this, we assume that it is Lipschitz (a standard requirement for non-parametric learning):
Lip_L([0,1]^d) := { f: [0,1]^d → [0,1] | ∀x, y ∈ [0,1]^d: |f(x) − f(y)| ≤ L‖x − y‖_∞ }.
The seller seeks a no-regret pricing for the worst-case valuation function:
sup_{v ∈ Lip_L([0,1]^d), x_{1:T}, D} SReg(T, A, v, γ, x_{1:T}, D) = o(T).
Optimality: the lowest possible upper bound for the regret of the form O(f(T)).
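The strategic regret is straightforward to evaluate once a game has been played; a minimal sketch with hypothetical names:

```python
def strategic_regret(valuations, prices, decisions):
    """SReg = Sum_t (v(x_t) - a_t * p_t): the seller's loss relative to
    always selling at exactly the buyer's valuation v(x_t)."""
    return sum(v - a * p for v, p, a in zip(valuations, prices, decisions))
```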

  8. Background & Research question

  9. Background.
[Kleinberg et al., FOCS'2003] Non-contextual setup (d = 0). Horizon-dependent optimal algorithm against a myopic buyer (γ = 0) with truthful regret Θ(log log T).
[Amin et al., NIPS'2013] Non-contextual setup (d = 0). The strategic setting is introduced; there exists no no-regret pricing in the non-discounted case γ = 1.
[Drutsa, WWW'2017] Non-contextual setup (d = 0). Horizon-independent optimal algorithm against a strategic buyer with regret Θ(log log T) for γ < 1.
[Mao et al., NIPS'2018] Our non-parametric contextual setup (d > 0). Horizon-dependent optimal algorithm against a myopic buyer (γ = 0) with truthful regret Θ(T^{d/(d+1)}).

  10. Research question. The key approaches of the non-contextual optimal algorithms ([pre]PRRFES) cannot be directly applied to the contextual algorithm of [Mao et al., NIPS'2018]. In order to search for the valuation of the strategic buyer without context:
› penalization rounds are used;
› we do not propose prices below the ones that were earlier accepted.
In the approach of [Mao et al., NIPS'2018]:
› standard penalization does not help;
› proposed prices can be below the ones that were earlier accepted by the buyer.
In this study, I overcome these issues and propose an optimal non-parametric algorithm for the contextual setting with a strategic buyer.

  11. Novel optimal algorithm

  12. Penalized Exploiting Lipschitz Search (PELS). PELS has three parameters:
› the price offset θ ∈ [1, +∞),
› the degree of penalization r ∈ ℕ,
› the exploitation rate g: ℤ_+ → ℤ_+.
The algorithm keeps track of a partition 𝔜 of the feature domain [0,1]^d, initialized to ⌈(4θ + 6)L⌉^d cubes (boxes) with side length l = 1/⌈(4θ + 6)L⌉:
𝔜 = { J_1 × J_2 × ⋯ × J_d | J_1, J_2, …, J_d ∈ { [0, l), [l, 2l), …, [1 − l, 1] } }.
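A sketch of this initialization, assuming boxes are represented by their lower corners (my representation choice, not the talk's):

```python
import itertools
import math

def init_partition(d, theta, L):
    """Split [0,1]^d into m^d axis-aligned boxes of side length 1/m,
    where m = ceil((4*theta + 6) * L), as on slide 12."""
    m = math.ceil((4 * theta + 6) * L)
    side = 1.0 / m
    lower_corners = itertools.product([i * side for i in range(m)], repeat=d)
    return list(lower_corners), side
```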

  13. Penalized Exploiting Lipschitz Search (PELS). For each box Y ∈ 𝔜, PELS also keeps track of:
› the lower bound u_Y ∈ [0,1],
› the upper bound w_Y ∈ [0,1],
› the depth h_Y ∈ ℤ_+.
They are initialized as follows: u_Y = 0, w_Y = 1, and h_Y = 0 for all Y ∈ 𝔜. The workflow of the algorithm is organized independently in each box Y ∈ 𝔜:
› the algorithm receives a good with a feature vector x_t ∈ [0,1]^d,
› finds the box Y in the current partition 𝔜 such that x_t ∈ Y.
Then the proposed price p_t is determined only from the current state associated with the box Y, while the buyer decision a_t is used only to update the state associated with this box Y.
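A sketch of the per-box state and the box lookup (the field names u, w, h mirror the bounds and depth above; the linear scan is a naive illustration):

```python
from dataclasses import dataclass

@dataclass
class Box:
    corner: tuple   # lower corner of the box in [0,1]^d
    side: float     # side length; the l_inf-diameter of the box equals side
    u: float = 0.0  # lower bound on the valuation over the box
    w: float = 1.0  # upper bound on the valuation over the box
    h: int = 0      # depth (number of bisections performed so far)

def find_box(boxes, x):
    """Return the first box of the partition containing the context x."""
    for box in boxes:
        if all(c <= xi <= c + box.side for c, xi in zip(box.corner, x)):
            return box
    raise ValueError("x lies outside [0,1]^d or the partition is inconsistent")
```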

  14. Penalized Exploiting Lipschitz Search (PELS). In each box Y ∈ 𝔜, the algorithm iteratively offers the exploration price u_Y + θ L diam(Y).
If this price is accepted by the buyer:
› the lower bound u_Y is increased by L diam(Y).
If this price is rejected:
› the upper bound w_Y is decreased by w_Y − u_Y − 2(θ + 1) L diam(Y), i.e., set to u_Y + 2(θ + 1) L diam(Y);
› the price 1 is offered as a penalization price for the next r − 1 rounds in this box Y (if one of them is accepted, we continue offering 1 in all the remaining rounds).
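The accept/reject update can be expressed as a pure function on the box bounds; a sketch under the notation above (the return conventions are mine):

```python
def exploration_update(u, w, diam, L, theta, accepted, r):
    """One exploration round in a box with bounds (u, w) and diameter diam.

    The offered exploration price is u + theta * L * diam. Returns the
    updated bounds and the number of penalization rounds (price 1) owed.
    """
    if accepted:
        return u + L * diam, w, 0            # raise the lower bound, no penalty
    # A rejection caps the valuation on this box from above:
    return u, u + 2 * (theta + 1) * L * diam, r - 1
```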

  15. Penalized Exploiting Lipschitz Search (PELS). If, after an acceptance of an exploration price or after penalization rounds, we have w_Y − u_Y < (2θ + 3) L diam(Y), then PELS:
› offers the exploitation price u_Y for the next g(h_Y) rounds in this box Y (buyer decisions made at them do not affect further pricing);
› bisects each side of the box Y to obtain 2^d boxes 𝔜_Y := {Y_1, …, Y_{2^d}} with ℓ_∞-diameter equal to diam(Y)/2;
› refines the partition 𝔜, replacing the box Y by the new boxes 𝔜_Y. These new boxes 𝔜_Y
› inherit the state of the bounds u_Y and w_Y from the current state of Y,
› while their depth h_{Y'} = h_Y + 1 for all Y' ∈ 𝔜_Y.
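The bisection step in isolation might look as follows (a sketch; the dict-based box representation and names are my assumptions):

```python
import itertools

def bisect_box(corner, side, u, w, h):
    """Split a box into 2^d children of half the side length (slide 15).

    Children inherit the bounds (u, w) and get depth h + 1."""
    half = side / 2.0
    return [
        {"corner": tuple(c + o for c, o in zip(corner, offsets)),
         "side": half, "u": u, "w": w, "h": h + 1}
        for offsets in itertools.product((0.0, half), repeat=len(corner))
    ]
```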

  16. PELS is optimal. Theorem 1. Let d ≥ 1 and γ_0 ∈ [0,1). Then, for the pricing algorithm PELS A with:
› the number of penalization rounds r ≥ log_{γ_0}((1 − γ_0)/2),
› the exploitation rate g(h) = 2^h, h ∈ ℤ_+,
› the price offset θ ≥ 2/(1 − γ_0),
and for any valuation function v ∈ Lip_L([0,1]^d), discount γ ≤ γ_0, distribution D, and feature vectors x_{1:T}, the strategic regret is upper bounded:
SReg(T, A, v, γ, x_{1:T}, D) ≤ C K_0^{1/(d+1)} T^{d/(d+1)} + K_0 = Θ(T^{d/(d+1)}),
where C := 2^d (r(2θ + 3) + L^{−1} + 1) and K_0 := ⌈(4θ + 6)L⌉^d.
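The parameter choices of Theorem 1 are easy to instantiate for a given discount bound; a sketch (taking the smallest admissible integer r is an assumption on my part):

```python
import math

def pels_parameters(gamma0):
    """Parameters satisfying Theorem 1 for a discount bound gamma0 in (0, 1)."""
    assert 0 < gamma0 < 1
    r = math.ceil(math.log((1 - gamma0) / 2, gamma0))  # penalization rounds
    theta = 2.0 / (1 - gamma0)                         # price offset
    g = lambda h: 2 ** h                               # exploitation rate
    return r, theta, g
```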

  17. PELS: main properties and extensions
› Can be applied against a myopic buyer (γ = 0), the setup of [Mao et al., NIPS'2018].
› PELS is horizon-independent (in contrast to [Mao et al., NIPS'2018]).
What if the loss is symmetric?
› We can generalize the algorithm to classical online learning losses.
› For instance, we want to optimize regret of the form Σ_{t=1}^{T} |v(x_t) − p_t|.
› But the interaction with the strategic buyer still has to be taken into account.
› A slight modification of PELS has regret O(T^{(d−1)/d}), which is tight for d > 1.
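For contrast with the one-sided strategic regret, the symmetric loss from the bullet above is simply (names mine):

```python
def symmetric_loss_regret(valuations, prices):
    """Sum_t |v(x_t) - p_t|: the classical (symmetric) online-learning loss."""
    return sum(abs(v - p) for v, p in zip(valuations, prices))
```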

  18. Thank you! Alexey Drutsa, Yandex, adrutsa@yandex.ru
