Optimal Non-parametric Learning in Repeated Contextual Auctions with Strategic Buyer
Alexey Drutsa
Setup
Repeated Contextual Posted-Price Auctions
› Different goods (e.g., ad spaces), described by d-dimensional feature vectors (contexts) from [0,1]^d, are repeatedly offered for sale by a seller to a single buyer over T rounds (one good per round).
› The buyer holds a private fixed valuation function v: [0,1]^d → [0,1], used to calculate his valuation v(x) for a good with context x ∈ [0,1]^d; v is unknown to the seller.
At each round t = 1, …, T:
› a feature vector x_t of the current good is observed by the seller and the buyer;
› a price p_t is offered by the seller;
› an allocation decision a_t ∈ {0,1} is made by the buyer: a_t = 0 when the buyer rejects, and a_t = 1 when the buyer accepts.
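A minimal Python sketch of this protocol may help fix notation; all names here (run_auction, seller_price, buyer_decision) are illustrative placeholders, not from the talk, and the two parties' strategies are left as unspecified callables.

```python
# A minimal sketch of the interaction protocol, assuming illustrative names.

def run_auction(T, contexts, seller_price, buyer_decision):
    """Play T rounds and return the public history of (x_t, p_t, a_t)."""
    history = []
    for t in range(T):
        x_t = contexts[t]                        # context of the current good
        p_t = seller_price(history, x_t, T)      # seller posts a price
        a_t = buyer_decision(history, x_t, p_t)  # buyer accepts (1) or rejects (0)
        history.append((x_t, p_t, a_t))
    return history
```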
Seller's pricing algorithm and buyer strategy: the seller applies a pricing algorithm A that sets prices {p_t}_{t=1}^T in response to buyer decisions a = {a_t}_{t=1}^T and observed contexts x = {x_t}_{t=1}^T. The price p_t can depend only on:
› the past decisions {a_s}_{s=1}^{t−1};
› the feature vectors {x_s}_{s=1}^{t};
› the horizon T.
Strategic buyer
› The seller announces her pricing algorithm A in advance.
› The buyer has some distribution (beliefs) D about future contexts.
› In each round t, given the history of previous rounds, he chooses his decision a_t so that it maximizes his expected future γ-discounted surplus:
E_{x_s ∼ D}[ Σ_{s=t}^{T} γ^{s−1} a_s (v(x_s) − p_s) ],  γ ∈ (0,1].
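As a worked illustration, the sketch below evaluates this objective for a fixed context sequence (the expectation over D is dropped purely for illustration; discounted_surplus is my name, not from the talk).

```python
# Sketch: the buyer's objective for a *fixed* context sequence.

def discounted_surplus(v, decisions, contexts, prices, gamma, start):
    """Sum over s = start..T of gamma^(s-1) * a_s * (v(x_s) - p_s), rounds 1-indexed."""
    return sum(gamma ** (s - 1) * a * (v(x) - p)
               for s, (a, x, p) in enumerate(zip(decisions, contexts, prices), start=1)
               if s >= start)
```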
The game's workflow and knowledge structure
[Diagram: a timeline from "before game starts" through rounds t = 1, 2, 3, … Private knowledge: the buyer holds v (drawn by Nature) and his beliefs D. Public knowledge: the seller's algorithm A, announced before the game starts, and, in each round t, the context x_t, the price p_t, and the decision a_t.]
Seller's goal
The seller's strategic regret:
SReg(T, A, v, γ, x_{1:T}, D) := Σ_{t=1}^{T} (v(x_t) − a_t p_t).
We will learn the function v in a non-parametric way. For this, we assume that it is Lipschitz (a standard requirement for non-parametric learning):
Lip_L([0,1]^d) := { f: [0,1]^d → [0,1] | ∀ x, y ∈ [0,1]^d: |f(x) − f(y)| ≤ L‖x − y‖_∞ }.
The seller seeks a no-regret pricing for the worst-case valuation function:
sup_{v ∈ Lip_L([0,1]^d), x_{1:T}, D} SReg(T, A, v, γ, x_{1:T}, D) = o(T).
Optimality: the lowest possible upper bound on the regret of the form O(f(T)).
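A short sketch of this regret as code (the benchmark sells each good exactly at its valuation); strategic_regret is my name, and the example valuation is just one arbitrary member of the Lipschitz class.

```python
# Sketch: strategic regret of one play-through, per the definition above.

def strategic_regret(v, contexts, prices, decisions):
    """Benchmark revenue sum_t v(x_t) minus realized revenue sum_t a_t * p_t."""
    return sum(v(x) - a * p for x, p, a in zip(contexts, prices, decisions))

v = lambda x: 0.5 * max(x)  # a 0.5-Lipschitz valuation on [0,1]^d (l_inf norm)
```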
Background & Research question
Background
[Kleinberg et al., FOCS'2003] Non-contextual setup (d = 0). Horizon-dependent optimal algorithm against a myopic buyer (γ = 0) with truthful regret Θ(log log T).
[Amin et al., NIPS'2013] Non-contextual setup (d = 0). The strategic setting is introduced. There is no no-regret pricing for the non-discounted case γ = 1.
[Drutsa, WWW'2017] Non-contextual setup (d = 0). Horizon-independent optimal algorithm against a strategic buyer with regret Θ(log log T) for γ < 1.
[Mao et al., NIPS'2018] Our non-parametric contextual setup (d > 0). Horizon-dependent optimal algorithm against a myopic buyer (γ = 0) with truthful regret Θ(T^{d/(d+1)}).
Research question
The key approaches of the non-contextual optimal algorithms ([pre]PRRFES) cannot be directly applied to the contextual algorithm of [Mao et al., NIPS'2018].
In order to search for the valuation of the strategic buyer without context:
› penalization rounds are used;
› we do not propose prices below the ones that were earlier accepted.
In the approach of [Mao et al., NIPS'2018]:
› standard penalization does not help;
› proposed prices can be below the ones that were earlier accepted by the buyer.
In this study, I overcome these issues and propose an optimal non-parametric algorithm for the contextual setting with a strategic buyer.
Novel optimal algorithm
Penalized Exploiting Lipschitz Search (PELS)
PELS has three parameters:
› the price offset g ∈ [1, +∞);
› the degree of penalization r ∈ ℕ;
› the exploitation rate R: ℤ≥0 → ℤ≥0.
This algorithm keeps track of a partition P of the feature domain [0,1]^d, initialized to ⌈(4g+6)L⌉^d cubes (boxes) with side length ε = 1/⌈(4g+6)L⌉:
P = { I_1 × I_2 × ⋯ × I_d | I_1, I_2, …, I_d ∈ {[0, ε), [ε, 2ε), …, [1−ε, 1]} }.
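As a minimal sketch, this state could be kept as follows (Box and make_partition are my names; the talk specifies only the quantities, not an implementation).

```python
import math
from dataclasses import dataclass
from itertools import product

@dataclass
class Box:
    lower: tuple    # lower corner of the cube in [0,1]^d
    side: float     # side length; the box's l_inf-diameter equals `side`
    u: float = 0.0  # lower bound u_B on the valuation inside the box
    w: float = 1.0  # upper bound w_B
    depth: int = 0  # depth n_B (number of bisections applied so far)

def make_partition(d, L, g):
    """Split [0,1]^d into ceil((4g + 6) * L)^d cubes of side 1/ceil((4g + 6) * L)."""
    k = math.ceil((4 * g + 6) * L)  # cubes per axis
    eps = 1.0 / k
    return [Box(lower=tuple(i * eps for i in idx), side=eps)
            for idx in product(range(k), repeat=d)]
```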
Penalized Exploiting Lipschitz Search (PELS)
For each box B ∈ P, PELS also keeps track of:
› the lower bound u_B ∈ [0,1],
› the upper bound w_B ∈ [0,1],
› the depth n_B ∈ ℤ≥0.
They are initialized as follows: u_B = 0, w_B = 1, and n_B = 0 for all B ∈ P.
The workflow of the algorithm is organized independently in each box B ∈ P:
› the algorithm receives a good with a feature vector x_t ∈ [0,1]^d;
› it finds the box B ∈ P in the current partition P such that x_t ∈ B.
Then the proposed price p_t is determined only from the current state associated with the box B, while the buyer decision a_t is used only to update the state associated with this box B.
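Continuing the sketch above, routing a context to its box could look like this; a real implementation would index the boxes, and the linear scan here just keeps the sketch short.

```python
# Continuing the Box sketch: route a context to the box that contains it.

def find_box(partition, x):
    """Return the box of the current partition containing x in [0,1]^d."""
    x = tuple(min(xi, 1.0 - 1e-12) for xi in x)  # fold the top edge inward
    for box in partition:
        if all(lo <= xi < lo + box.side for lo, xi in zip(box.lower, x)):
            return box
    raise ValueError("context outside [0,1]^d")
```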
Penalized Exploiting Lipschitz Search (PELS)
In each box B ∈ P, the algorithm iteratively offers the exploration price
u_B + g·L·diam(B).
If this price is accepted by the buyer:
› the lower bound u_B is increased by L·diam(B).
If this price is rejected:
› the upper bound w_B is decreased by w_B − u_B − 2(g+1)·L·diam(B);
› the price 1 is offered as a penalization price for the r next rounds in this box B (if one of them is accepted, we continue offering 1 for all the remaining rounds).
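Continuing the sketch, the per-box exploration step could read as below. The increment L·diam(B) and the shrinkage constant 2(g+1) follow my reading of the slides, whose symbols are partly garbled in this transcript.

```python
# Continuing the Box sketch: the exploration update in one box.

def exploration_price(box, g, L):
    return box.u + g * L * box.side  # u_B + g * L * diam(B)

def on_accept(box, L):
    box.u += L * box.side            # raise the lower bound u_B

def on_reject(box, g, L):
    # shrink the upper bound to u_B + 2(g + 1) * L * diam(B); the seller then
    # offers the penalization price 1 for the next r rounds in this box (and
    # forever in this box, if any penalization price is accepted)
    box.w = box.u + 2 * (g + 1) * L * box.side
```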
Penalized Exploiting Lipschitz Search (PELS) β If, after an acceptance of an exploration price or after penalization rounds we have π₯ β β π£ β < (2π + 3)πdiam(π) , β then PELS: βΊ offers the exploitation price π£ β for π(π β ) next rounds in this box π (buyer decisions made at them do not affect further pricing); βΊ bisects each side of the box π to obtain 2 % boxes π β β π 9 , β¦ , π J g with β Ε½ -diameter equal to diam(π)/2 ; βΊ refines the partition π β replacing the box π by the new boxes π β . These new boxes π β βΊ inherit the state of the bounds π£ β and π₯ β from the current state of π , βΊ while their depth π β’ = π β + 1 βπ β π β .
PELS is optimal
Theorem 1. Let L ≥ 1 and γ_0 ∈ [0,1). Then, for the pricing algorithm PELS A with:
› the number of penalization rounds r ≥ log_{γ_0}((1 − γ_0)/2),
› the exploitation rate R(n) = 2^n, n ∈ ℤ≥0,
› the price offset g ≥ 2/(1 − γ_0),
and for any valuation function v ∈ Lip_L([0,1]^d), feature vectors x_{1:T}, discount γ ≤ γ_0, and distribution D, the strategic regret is upper bounded:
SReg(T, A, v, γ, x_{1:T}, D) ≤ C·(K_0^{1/(d+1)} T^{d/(d+1)} + K_0) = Θ(T^{d/(d+1)}),
where C := 2^d L(2g + 3 + L^{−1}) + 1 and K_0 := ⌈(4g+6)L⌉^d.
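A small helper (pels_parameters is my own, hypothetical name) that picks parameter values meeting the theorem's conditions for a given discount cap 0 < γ_0 < 1:

```python
import math

# Sketch: parameter choices satisfying Theorem 1, assuming 0 < gamma0 < 1.

def pels_parameters(d, L, gamma0):
    g = 2.0 / (1.0 - gamma0)                                # price offset
    r = math.ceil(math.log((1.0 - gamma0) / 2.0, gamma0))   # penalization rounds
    R = lambda n: 2 ** n                                    # exploitation rate
    K0 = math.ceil((4 * g + 6) * L) ** d                    # initial number of boxes
    return g, r, R, K0

# e.g. d = 2, L = 1, gamma0 = 1/2 gives g = 4, r = 2, K0 = 22**2 = 484
g, r, R, K0 = pels_parameters(d=2, L=1.0, gamma0=0.5)
```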
PELS: main properties and extensions
› PELS can be applied against a myopic buyer (γ = 0), the setup of [Mao et al., NIPS'2018].
› PELS is horizon-independent (in contrast to [Mao et al., NIPS'2018]).
What if the loss is symmetric?
› We can generalize the algorithm to classical online learning losses.
› For instance, we want to optimize regret of the form Σ_{t=1}^{T} |v(x_t) − p_t|, while still interacting with the strategic buyer.
› A slight modification of PELS has regret O(T^{d/(d+1)}), which is tight for d > 1.
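For reference, a one-line sketch of this symmetric objective (the function name is mine):

```python
# Sketch: the symmetric, online-learning-style loss mentioned above.
def symmetric_loss(v, contexts, prices):
    return sum(abs(v(x) - p) for x, p in zip(contexts, prices))
```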
Thank you!
Alexey Drutsa, Yandex
adrutsa@yandex.ru