Optimal Large-Scale Internet Media Selection Gareth James - PowerPoint PPT Presentation


  1. Optimal Large-Scale Internet Media Selection
     Gareth James, Department of Data Sciences and Operations, Marshall School of Business, University of Southern California. October 21st, 2016. Joint with Paat Rusmevichientong, Lan Luo and Courtney Paulson.
     Outline: Introduction; Solving the Criterion; Empirical Results; Conclusion.

  2. A General Constrained Optimization
     Consider the following constrained, penalized optimization problem:
     $$\arg\min_{\beta} \; g(\beta) + \lambda \|\beta\|_1 \quad \text{subject to} \quad C\beta = b, \tag{1}$$
     where $g$ is a convex function, $\beta \in \mathbb{R}^p$, and $C \in \mathbb{R}^{m \times p}$ and $b \in \mathbb{R}^m$ are a predefined matrix and vector. Many important problems can be formulated in this fashion, so we are interested in algorithms for solving (1) and in the statistical properties of the resulting coefficient estimates.

  6. Maximizing Reach or Click Through Rate
     The reach of an advertising campaign is defined as the probability that a random customer views our ad at least once during the campaign, while click through rate (CTR) is the probability that a random customer clicks the ad.
     We have $p$ websites and an advertising budget of $B$ dollars. Let $\beta_j$ be the allocation to the $j$th website and let $g(\beta)$ represent the estimated non-reach (or non-CTR) for a given budget allocation. Hence, we wish to minimize $g(\beta)$ such that $\sum_j \beta_j \le B$ and $\beta_j \ge 0$, or equivalently minimize $g(\beta) + \lambda \|\beta\|_1$.
     However, in many campaigns we also wish to place restrictions on subsets of websites; e.g., a cruise operator may wish to spend 30% of their budget on travel websites. This imposes a natural constraint of the form $C\beta = b$.
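
As a minimal sketch of how the cruise-operator restriction might be written in the form $C\beta = b$: one row of $C$ flags the websites in the subset, and $b$ holds the required spend. The helper name `subset_budget_constraint`, the five websites and the dollar figures are hypothetical and chosen only for illustration.

```python
import numpy as np

def subset_budget_constraint(is_in_subset, budget, share):
    """Return (C, b) encoding: sum of beta over the subset == share * budget."""
    # One constraint (m = 1): a row with 1 for subset websites, 0 elsewhere.
    C = np.asarray(is_in_subset, dtype=float).reshape(1, -1)
    b = np.array([share * budget])
    return C, b

# Hypothetical example: 5 websites, the first two are travel sites.
is_travel = [True, True, False, False, False]
C, b = subset_budget_constraint(is_travel, budget=1_000_000.0, share=0.30)

# An allocation that spends 300,000 on the travel sites satisfies C beta = b.
beta = np.array([200_000.0, 100_000.0, 300_000.0, 250_000.0, 150_000.0])
constraint_holds = np.allclose(C @ beta, b)
```

Several such restrictions would simply stack as additional rows of `C` and entries of `b`, giving $m > 1$.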

  7. Developing an Algorithm: A Quadratic Approximation
     We can approximate $g$ by
     $$g(\beta) \approx \frac{1}{2}\|Y - X\beta\|_2^2 + K,$$
     where $X = H^{1/2}$, $Y = H^{-1/2}(J - H\tilde{\beta})$, $H$ is the Hessian and $J$ is the Jacobian (both evaluated at $\tilde{\beta}$). Hence, (1) can be approximated using the constrained lasso criterion:
     $$\arg\min_{\beta} \; \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 \quad \text{subject to} \quad C\beta = b. \tag{2}$$
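
The quadratic approximation can be checked numerically. The sketch below assumes a hypothetical convex function `g` (a sum of exponentials, not the reach function from the talk) and takes $J$ to be the gradient of $g$. With that convention the least-squares form matches the second-order Taylor expansion when $Y = H^{-1/2}(H\tilde{\beta} - J)$; the slide's $(J - H\tilde{\beta})$ would correspond to $J$ denoting the negative gradient, so the sign used here is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(8, 3))  # hypothetical data defining g

def g(beta):
    return np.exp(-A @ beta).sum()

def grad(beta):
    return -A.T @ np.exp(-A @ beta)

def hess(beta):
    return A.T @ (np.exp(-A @ beta)[:, None] * A)

beta_tilde = np.array([0.5, 1.0, 0.2])  # expansion point
H, J = hess(beta_tilde), grad(beta_tilde)

# Symmetric square root of H and its inverse via an eigen-decomposition.
evals, evecs = np.linalg.eigh(H)
X = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
X_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
Y = X_inv @ (H @ beta_tilde - J)  # assumed sign convention (J = gradient)

# Constant K chosen so the surrogate equals g at beta_tilde.
K = g(beta_tilde) - 0.5 * np.sum((Y - X @ beta_tilde) ** 2)

def surrogate(beta):
    return 0.5 * np.sum((Y - X @ beta) ** 2) + K

def taylor(beta):
    d = beta - beta_tilde
    return g(beta_tilde) + J @ d + 0.5 * d @ H @ d
```

Here `surrogate` and `taylor` agree at every `beta`, so minimizing the least-squares term is the same as minimizing the second-order expansion of `g` around `beta_tilde`.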

  10. Intuition for Equality Constraints
      Suppose that we are given an index set, $A$, of size $m$. Then we can partition $\beta = (\beta_A, \beta_{\bar{A}})$ and $C = (C_A, C_{\bar{A}})$. Hence,
      $$C_A \beta_A + C_{\bar{A}} \beta_{\bar{A}} = b, \quad \text{or} \quad \beta_A = C_A^{-1}\left(b - C_{\bar{A}} \beta_{\bar{A}}\right).$$

  12. Removing the Constraint
      Hence, all we need to do is compute
      $$\beta_{\bar{A}} = \arg\min_{\theta} \; \frac{1}{2}\|Y^* - X^*\theta\|_2^2 + \lambda \|\theta\|_1 + \lambda \left\|C_A^{-1}\left(b - C_{\bar{A}}\theta\right)\right\|_1, \tag{3}$$
      and set $\beta_A = C_A^{-1}(b - C_{\bar{A}} \beta_{\bar{A}})$. The difficulty in computing (3) lies in the second $\ell_1$ penalty, which is both non-differentiable and non-separable.
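
The slides do not define $Y^*$ and $X^*$. Substituting $\beta_A = C_A^{-1}(b - C_{\bar{A}}\theta)$ into $Y - X\beta$ suggests one consistent reading, $Y^* = Y - X_A C_A^{-1} b$ and $X^* = X_{\bar{A}} - X_A C_A^{-1} C_{\bar{A}}$, which is assumed in the sketch below. The function name `eliminate_constraint` and the random test data are hypothetical.

```python
import numpy as np

def eliminate_constraint(X, Y, C, b, A_idx):
    """Eliminate beta_A using C beta = b; return X*, Y* and a map theta -> beta."""
    p = X.shape[1]
    A_idx = np.asarray(A_idx)
    Abar_idx = np.setdiff1d(np.arange(p), A_idx)
    C_A, C_Abar = C[:, A_idx], C[:, Abar_idx]
    X_A, X_Abar = X[:, A_idx], X[:, Abar_idx]

    CA_inv_b = np.linalg.solve(C_A, b)           # C_A^{-1} b
    CA_inv_CAbar = np.linalg.solve(C_A, C_Abar)  # C_A^{-1} C_Abar

    Y_star = Y - X_A @ CA_inv_b                  # assumed definition of Y*
    X_star = X_Abar - X_A @ CA_inv_CAbar         # assumed definition of X*

    def recover(theta):
        """Rebuild the full beta from theta = beta_Abar."""
        beta = np.empty(p)
        beta[Abar_idx] = theta
        beta[A_idx] = CA_inv_b - CA_inv_CAbar @ theta
        return beta

    return X_star, Y_star, recover

# Hypothetical data: n = 10 observations, p = 6 coefficients, m = 2 constraints.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(10, 6)), rng.normal(size=10)
C, b = rng.normal(size=(2, 6)), rng.normal(size=2)

X_star, Y_star, recover = eliminate_constraint(X, Y, C, b, A_idx=[1, 4])
theta = rng.normal(size=4)
beta = recover(theta)
```

For any `theta`, the recovered `beta` satisfies the constraint and gives the same residual, so the constrained problem in $\beta$ and the unconstrained problem (3) in $\theta$ have the same objective value.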

  14. Intuition
      However, if we choose an $m$-vector, $s$, such that $s = \operatorname{sign}(\beta_A)$, then for $\theta$ close enough to $\beta_{\bar{A}}$,
      $$\left\|C_A^{-1}\left(b - C_{\bar{A}}\theta\right)\right\|_1 = s^T C_A^{-1}\left(b - C_{\bar{A}}\theta\right),$$
      and we can replace the $\ell_1$ penalty by a differentiable term, which no longer needs to be separable. Now our optimization becomes
      $$\beta_{\bar{A}} = \arg\min_{\theta} \; \frac{1}{2}\|Y^* - X^*\theta\|_2^2 + \lambda s^T C_A^{-1}\left(b - C_{\bar{A}}\theta\right) + \lambda \|\theta\|_1 = \arg\min_{\theta} \; \frac{1}{2}\|\tilde{Y} - \tilde{X}\theta\|_2^2 + \lambda \|\theta\|_1.$$
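
A minimal sketch of this step on a toy problem follows. It does not construct $\tilde{Y}$ and $\tilde{X}$, whose definitions are not given here; instead it minimizes the linearized objective directly with proximal gradient descent (ISTA), a swapped-in solver rather than the standard lasso routine the talk refers to. The data, the index set, the sign vector `s`, and the definitions of `X_star` and `Y_star` are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def solve_linearized(X_star, Y_star, w, lam, n_iter=5000):
    """ISTA for 0.5*||Y* - X* theta||^2 - lam * (w @ theta) + lam * ||theta||_1."""
    step = 1.0 / np.linalg.norm(X_star, 2) ** 2  # 1 / Lipschitz constant of smooth part
    theta = np.zeros(X_star.shape[1])
    for _ in range(n_iter):
        grad = X_star.T @ (X_star @ theta - Y_star) - lam * w
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Hypothetical toy problem: p = 5 coefficients, one constraint sum(beta) = 4.5.
rng = np.random.default_rng(1)
n, p, lam = 20, 5, 0.5
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0])
Y = X @ beta_true + 0.1 * rng.normal(size=n)
C, b = np.ones((1, p)), np.array([4.5])

A_idx, Abar_idx = np.array([0]), np.array([1, 2, 3, 4])  # assumed index set A (m = 1)
s = np.array([1.0])                                      # assumed sign of beta_A

# Eliminate beta_A (assumed definitions of X* and Y*).
M = np.linalg.solve(C[:, A_idx], C[:, Abar_idx])  # C_A^{-1} C_Abar
c0 = np.linalg.solve(C[:, A_idx], b)              # C_A^{-1} b
X_star = X[:, Abar_idx] - X[:, A_idx] @ M
Y_star = Y - X[:, A_idx] @ c0

# lam * s^T C_A^{-1} (b - C_Abar theta) = constant - lam * w^T theta
w = M.T @ s
theta_hat = solve_linearized(X_star, Y_star, w, lam)
beta_A = c0 - M @ theta_hat

# The linearization is only valid if beta_A kept the assumed sign.
signs_agree = np.array_equal(np.sign(beta_A), s)
```

If `signs_agree` were false, the coefficient in the index set would have crossed zero and a different index set (or sign vector) would be needed before trusting the solution.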

  15. Toy Example
      [Figure: toy example plot]

  16. Select Index Set A for m = 2
      [Figure annotation: Two largest coefficients for m = 2.]

  17. Check Coefficients in A Maintained Same Sign
      [Figure annotation: Coefficients have not crossed zero so solution is correct.]

  18. Not Every Index Set Will Work
      [Figure annotations: Coefficients have not crossed zero so solution is correct. This coefficient crossed zero, so it would have caused a problem to use.]

  19. Select New Index Set for Next Step
      [Figure annotation: New index set.]

  20. Website Data: Click Through Rate
      [Figure: two panels, CTR (Full) and CTR (Travel Subset), each plotted against Budget (in millions) from 0 to 10.]

  21. Summary
      A large number of real-world problems are special cases of this constrained and penalized framework. A simple algorithm, using standard lasso fitting methods, can be used to efficiently compute the solution to our optimization problem. Theoretical bounds on the coefficients can be extended from the lasso and suggest better performance. Simulation results show practical improvement, computational efficiency and relative insensitivity to the constraints. The approach is highly efficient and practical for selecting optimal allocations of advertising budget in situations involving thousands of websites.
