  1. Allocating Resources, in the Future
     Sid Banerjee, School of ORIE
     May 3, 2018, Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making

  2. online resource allocation: basic model
     [figure: arrival sequence θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T); remaining capacity B_1 = 3]
     • single resource, initial capacity B; T agents arrive sequentially
     • agent t has type θ(t) = reward earned if the agent is allocated

  3. online resource allocation: basic model
     [figure: arrival sequence θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T); remaining capacity B_2 = 3]
     • single resource, initial capacity B; T agents arrive sequentially
     • agent t has type θ(t) = reward earned if the agent is allocated
     • principal makes irrevocable decisions

  4. online resource allocation: basic model
     [figure: arrival sequence θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T); remaining capacity B_3 = 2]
     • single resource, initial capacity B; T agents arrive sequentially
     • agent t has type θ(t) = reward earned if the agent is allocated
     • principal makes irrevocable decisions; resource is non-replenishable

  5. online resource allocation: basic model
     [figure: arrival sequence θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T); remaining capacity B_t = 1]
     • single resource, initial capacity B; T agents arrive sequentially
     • agent t has type θ(t) = reward earned if the agent is allocated
     • principal makes irrevocable decisions; resource is non-replenishable
     • assumptions on agent types {θ(t)}: values drawn from a finite set {v_i}_{i=1}^n, e.g. θ(t) = v_i with prob p_i, i.i.d.
     • in general: arrivals can be time-varying, correlated

  6. online resource allocation: basic model
     [figure: arrival sequence θ(1), θ(2), θ(3), ..., θ(t), ..., θ(T); remaining capacity B_t = 1]
     • single resource, initial capacity B; T agents arrive sequentially
     • agent t has type θ(t) = reward earned if the agent is allocated
     • principal makes irrevocable decisions; resource is non-replenishable
     • assumptions on agent types {θ(t)}: values drawn from a finite set {v_i}_{i=1}^n, e.g. θ(t) = v_i with prob p_i, i.i.d.
     • in general: arrivals can be time-varying, correlated
     online resource allocation problem: allocate resources to maximize the sum of rewards
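To make the basic model concrete, here is a minimal simulation sketch in Python. The names (`simulate`, `policy`) and the naive always-accept policy at the end are illustrative assumptions, not part of the talk.

```python
import random

def simulate(policy, B, T, values, probs, seed=0):
    """Simulate the basic model: one resource of capacity B, T i.i.d. arrivals.
    policy(t, budget, v) returns True to allocate to the arrival with value v;
    decisions are irrevocable and the resource is non-replenishable."""
    rng = random.Random(seed)
    budget, reward = B, 0.0
    for t in range(1, T + 1):
        v = rng.choices(values, weights=probs)[0]  # theta(t) = v_i w.p. p_i
        if budget > 0 and policy(t, budget, v):    # irrevocable accept/reject
            budget -= 1
            reward += v
    return reward

# a naive greedy policy: accept every arrival while capacity remains
print(simulate(lambda t, b, v: True, B=3, T=10, values=[1.0, 5.0], probs=[0.7, 0.3]))
```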

  7. online resource allocation: first generalization
     [figure: arrival sequence θ(1), θ(2), ..., θ(t), ..., θ(T); θ ~ (A_i, v_i) w.p. p_i]
     • d resources, initial capacities (B_1, B_2, ..., B_d)
     • T agents; each has type θ_i = (A_i, v_i)
     • A_i ∈ {0,1}^d: resource requirement; v_i: value
     • an agent has type θ_i with prob p_i
     also known as: network revenue management; single-minded buyer

  8. online resource allocation: second generalization
     [figure: arrival sequence θ(1), θ(2), ..., θ(t), ..., θ(T); θ ~ (v_{i1}, v_{i2}) w.p. p_i]
     • d resources, initial capacities (B_1, B_2, ..., B_d)
     • T agents arrive sequentially
     • each has type θ = (v_{i1}, v_{i2}, ..., v_{id}), and wants a single resource
     also known as: online weighted matching; unit-demand buyer

  9. online allocation across fields
     • related problems studied in Markov decision processes, online algorithms, prophet inequalities, revenue management, etc.
     • informational variants: distributional knowledge ≺ bandit settings ≺ adversarial inputs

  10. the technological zeitgeist
      the 'deep' learning revolution: vast improvements in machine learning for data-driven prediction

  11. axiomatizing the zeitgeist
      the deep learning revolution: vast improvements in machine learning for data-driven prediction
      • axiom: we have access to black-box predictive algorithms

  12. axiomatizing the zeitgeist
      the deep learning revolution: vast improvements in machine learning for data-driven prediction
      • axiom: we have access to black-box predictive algorithms
      core question of this talk: how does having such an oracle affect online resource allocation?
      • TL;DR: new online allocation policies with strong regret bounds
      • re-examining old questions leads to surprising new insights

  13. bridging online allocation and predictive models
      The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
      Alberto Vera & S.B. (2018), https://ssrn.com/abstract_id=3158062

  14. focus of talk: allocation with single-minded agents
      [figure: arrival sequence θ(1), θ(2), ..., θ(t), ..., θ(T); θ ~ (A_i, v_i) w.p. p_i]
      • d resources, initial capacities (B_1, B_2, ..., B_d)
      • T agents arrive sequentially; each has type θ = (A, v)
      • A = resource requirement, v = value
      • an agent has type θ_i with prob p_i, i.i.d.
      online allocation problem: allocate resources to maximize the sum of rewards

  15. performance measure
      [figure: arrival sequence θ(1), θ(2), ..., θ(t), ..., θ(T); θ ~ (A_i, v_i) w.p. p_i]
      the optimal policy can be computed via dynamic programming, but this
      – requires exact distributional knowledge
      – suffers a 'curse of dimensionality': |state space| = T × B_1 × ... × B_d
      – does not quantify the cost of uncertainty

  16. performance measure
      [figure: arrival sequence θ(1), θ(2), ..., θ(t), ..., θ(T); θ ~ (A_i, v_i) w.p. p_i]
      the optimal policy can be computed via dynamic programming, but this
      – requires exact distributional knowledge
      – suffers a 'curse of dimensionality': |state space| = T × B_1 × ... × B_d
      – does not quantify the cost of uncertainty
      'prophet' benchmark V_off: the OFFLINE optimal policy, which has full knowledge of {θ(1), θ(2), ..., θ(T)}

  17. performance measure: regret
      prophet benchmark V_off:
      • OFFLINE knows the entire type sequence {θ(t) | t = 1, ..., T}
      • for the network revenue management setting, V_off is given by the hindsight LP
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ N_i[1:T]
        where N_i[1:T] = # of arrivals of type θ_i = (A_i, v_i) over {1, 2, ..., T}
      regret: E[Regret] = E[V_off − V_alg]
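Since V_off is a hindsight LP, it can be computed directly once the arrival counts N_i[1:T] are realized. A minimal sketch using `scipy.optimize.linprog`; the helper name `v_off` and the toy instance are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def v_off(A, v, B, N):
    """Hindsight LP: max sum_i v_i x_i  s.t.  sum_i A_i x_i <= B,  0 <= x_i <= N_i.
    A: (d, n) resource requirements; v: (n,) values; B: (d,) capacities;
    N: (n,) realized arrival counts N_i[1:T]."""
    res = linprog(c=-np.asarray(v),             # linprog minimizes, so negate v
                  A_ub=np.asarray(A), b_ub=np.asarray(B),
                  bounds=[(0, n_i) for n_i in N])
    return -res.fun

# toy instance: d = 2 resources, n = 2 types, type i uses one unit of resource i
print(v_off(A=[[1, 0], [0, 1]], v=[1.0, 2.0], B=[3, 2], N=[5, 1]))  # -> 5.0
```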

  18. online allocation with prediction oracle
      given: a black-box predictive oracle for the performance of OFFLINE (specifically, for any t and B, we have statistical information about V_off[t,T])

  19. online allocation with prediction oracle
      given: a black-box predictive oracle for the performance of OFFLINE (specifically, for any t and B, we have statistical information about V_off[t,T])
      • let π_t = P[ V_off[t,T] decreases if OFFLINE accepts the t-th arrival ]

  20. online allocation with prediction oracle
      given: a black-box predictive oracle for the performance of OFFLINE (specifically, for any t and B, we have statistical information about V_off[t,T])
      • let π_t = P[ V_off[t,T] decreases if OFFLINE accepts the t-th arrival ]
      Bayes selector: accept the t-th arrival iff π_t > 0.5

  21. online allocation with prediction oracle
      given: a black-box predictive oracle for the performance of OFFLINE (specifically, for any t and B, we have statistical information about V_off[t,T])
      • let π_t = P[ V_off[t,T] decreases if OFFLINE accepts the t-th arrival ]
      Bayes selector: accept the t-th arrival iff π_t > 0.5
      theorem [Vera & B., 2018]: under mild tail bounds on N_i[t:T], the Bayes selector has E[Regret] independent of T, B_1, B_2, ..., B_d

  22. online allocation with prediction oracle
      given: a black-box predictive oracle for the performance of OFFLINE (specifically, for any t and B, we have statistical information about V_off[t,T])
      • let π_t = P[ V_off[t,T] decreases if OFFLINE accepts the t-th arrival ]
      Bayes selector: accept the t-th arrival iff π_t > 0.5
      theorem [Vera & B., 2018]: under mild tail bounds on N_i[t:T], the Bayes selector has E[Regret] independent of T, B_1, B_2, ..., B_d
      • arrivals can be time-varying, correlated; discounted rewards
      • works for general settings (single-minded, unit-demand, etc.)
      • can use an approximate oracle (e.g., built from samples)
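One way such a selector might be implemented with a sample-based oracle is by Monte Carlo: resample the future arrivals, and in each sample check whether the hindsight LP does at least as well by accepting the current arrival. This is a rough sketch under that assumption; the names (`v_lp`, `bayes_selector`), the multinomial resampling, and the tie-breaking are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def v_lp(A, v, B, N):
    """Hindsight LP value with capacities B and arrival counts N (as on slide 17)."""
    res = linprog(c=-np.asarray(v), A_ub=A, b_ub=B,
                  bounds=[(0, n_i) for n_i in N])
    return -res.fun

def bayes_selector(A, v, p, B, i_cur, t, T, n_samples=200, seed=0):
    """Accept the type-i_cur arrival at time t iff, in more than half of the
    sampled futures, hindsight does at least as well by accepting it now."""
    rng = np.random.default_rng(seed)
    A, B = np.asarray(A), np.asarray(B)
    if np.any(B < A[:, i_cur]):
        return False                         # not enough capacity left
    wins = 0
    for _ in range(n_samples):
        N = rng.multinomial(T - t, p)        # sample future arrival counts N[t+1:T]
        gain_accept = v[i_cur] + v_lp(A, v, B - A[:, i_cur], N)
        gain_reject = v_lp(A, v, B, N)
        wins += gain_accept >= gain_reject   # hindsight weakly prefers accepting
    return wins > n_samples / 2              # estimated probability > 0.5
```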

  23. standard approach: randomized admission control (RAC)
      offline optimum V_off:
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ N_i[1:T]

  24. standard approach: randomized admission control (RAC)
      offline optimum V_off:
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ N_i[1:T]
      (upfront) fluid LP V_fl:
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ E[N_i[1:T]] = T p_i
      – E[V_off] ≤ V_fl (via Jensen's inequality and concavity of V_off w.r.t. N_i)
      – fluid RAC: accept type θ_i with prob x_i / (T p_i)

  25. standard approach: randomized admission control (RAC)
      offline optimum V_off:
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ N_i[1:T]
      (upfront) fluid LP V_fl:
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ E[N_i[1:T]] = T p_i
      – E[V_off] ≤ V_fl (via Jensen's inequality and concavity of V_off w.r.t. N_i)
      – fluid RAC: accept type θ_i with prob x_i / (T p_i)
      proposition: fluid RAC has E[Regret] = Θ(√T)
      – [Gallego & van Ryzin '97], [Maglaras & Meissner '06]
      – N.B. this is a static policy!
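A minimal sketch of the static fluid RAC policy, solving the fluid LP once upfront; the helper name `fluid_rac` and the toy instance are assumptions (every p_i is assumed positive):

```python
import numpy as np
from scipy.optimize import linprog

def fluid_rac(A, v, p, B, T):
    """Solve the fluid LP once, upfront (0 <= x_i <= T p_i), and return the
    static acceptance probabilities x_i / (T p_i); assumes every p_i > 0."""
    res = linprog(c=-np.asarray(v), A_ub=np.asarray(A), b_ub=np.asarray(B),
                  bounds=[(0, T * p_i) for p_i in p])
    return [x_i / (T * p_i) for x_i, p_i in zip(res.x, p)]

# toy instance: two types sharing a single pool of 4 units over T = 10 arrivals
print(fluid_rac(A=[[1, 1]], v=[1.0, 2.0], p=[0.5, 0.5], B=[4], T=10))  # [0.0, 0.8]
```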

  26. RAC with re-solving
      offline optimum V_off:
          max  Σ_{i=1}^n v_i x_i
          s.t. Σ_{i=1}^n A_i x_i ≤ B
               0 ≤ x_i ≤ N_i
      re-solved fluid LP V_fl(t):
          max  Σ_{i=1}^n v_i x_i[t]
          s.t. Σ_{i=1}^n A_i x_i[t] ≤ B[t]
               0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i
      RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)
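The re-solving variant repeats the same computation at each arrival, with the remaining capacities B[t] and horizon T − t; a sketch under the same assumptions (hypothetical name `resolving_rac`, t < T, every p_i > 0):

```python
import numpy as np
from scipy.optimize import linprog

def resolving_rac(A, v, p, B_t, t, T):
    """Re-solve the fluid LP at time t < T with remaining capacities B_t and
    horizon T - t; return the acceptance probabilities x_i[t] / ((T - t) p_i)."""
    res = linprog(c=-np.asarray(v), A_ub=np.asarray(A), b_ub=np.asarray(B_t),
                  bounds=[(0, (T - t) * p_i) for p_i in p])
    return [x_i / ((T - t) * p_i) for x_i, p_i in zip(res.x, p)]
```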
