1. On Coresets For Regularized Regression. ICML 2020. Rachit Chhaya, Anirban Dasgupta and Supratim Shit, IIT Gandhinagar. June 15, 2020.

2. Motivation
◮ Coresets: small summaries of the data that serve as a proxy for the original data with respect to some cost function
◮ Smaller coresets for ridge regression were shown by [ACW17]
◮ No prior study of coresets for regularized regression under a general $p$-norm

3. Our Contributions
◮ No coreset for $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r + \lambda \|x\|_q^s$ with $r \neq s$ is smaller in size than one for $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r$
◮ This implies no coreset for Lasso smaller in size than one for least squares regression
◮ We introduce the modified Lasso and build a smaller coreset for it
◮ Coresets for $\ell_p$ regression with $\ell_p$ regularization, with an extension to multiple response regression
◮ Empirical evaluations

4. Coresets
Definition: For $\epsilon > 0$, a dataset $A$, a non-negative function $f$ and a query space $Q$, $C$ is an $\epsilon$-coreset of $A$ if for all $q \in Q$,
$$|f_q(A) - f_q(C)| \le \epsilon \, f_q(A).$$
We construct coresets which are (rescaled) subsamples of the original data.
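
A minimal sketch of what the definition asks for, using the least-squares cost $f_x(A, b) = \|Ax - b\|_2^2$ on synthetic data and a uniformly sampled, rescaled subsample. Illustration only: uniform sampling carries no coreset guarantee in general, which is exactly why sensitivity-based sampling (next slide) is needed.

```python
# Spot-check the epsilon-coreset property for the least-squares cost.
# Uniform sampling with 1/probability rescaling, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 5, 500           # full size, dimension, sample size

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

idx = rng.choice(n, size=m, replace=False)
w = n / m                          # uniform importance weight
C_A, C_b = np.sqrt(w) * A[idx], np.sqrt(w) * b[idx]  # sqrt: cost is squared

def cost(M, y, x):
    return np.linalg.norm(M @ x - y) ** 2

for _ in range(3):                 # a few random queries x
    x = rng.standard_normal(d)
    full, core = cost(A, b, x), cost(C_A, C_b, x)
    print(f"relative error: {abs(full - core) / full:.3f}")
```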

5. Sensitivity [LS10]
Definition: The sensitivity of the $i$-th point of some dataset $X$ for a function $f$ and query space $Q$ is defined as
$$s_i = \sup_{q \in Q} \frac{f_q(x_i)}{\sum_{x' \in X} f_q(x')}.$$
◮ Determines the highest fractional contribution of a point to the cost function
◮ Can be used to create coresets; the coreset size is a function of the sum of sensitivities and the dimension of the query space
◮ Upper bounds on the sensitivities are enough [FL11, BFL16]
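
A sketch of sensitivity-based sampling for the $\ell_2$ case, where leverage scores (squared row norms of an orthonormal basis for the column space of $A$) upper-bound the sensitivities. The sample-and-reweight recipe below is the standard one; the coreset-size constants come from the papers cited above, not from this snippet.

```python
# Sensitivity-based coreset sampling for least squares: sample rows
# proportional to leverage-score upper bounds, reweight by 1/(m * p_i)
# so the sampled cost is an unbiased estimate of the full cost.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 10_000, 5, 300

A = rng.standard_normal((n, d))
Q, _ = np.linalg.qr(A)                  # orthonormal basis for range(A)
scores = (Q ** 2).sum(axis=1)           # leverage score of each row
p = scores / scores.sum()               # sampling probabilities

idx = rng.choice(n, size=m, replace=True, p=p)
weights = 1.0 / (m * p[idx])            # importance-sampling weights
C = np.sqrt(weights)[:, None] * A[idx]  # rescaled coreset rows
```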

6. Coresets for Regularized Regression
◮ Regularization is important to prevent overfitting, improve numerical stability, induce sparsity, etc.
We are interested in the following problem: for $\lambda > 0$,
$$\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^r + \lambda \|x\|_q^s$$
for $p, q \ge 1$ and $r, s > 0$.
A coreset for this problem is $(\tilde{A}, \tilde{b})$ such that for all $x \in \mathbb{R}^d$ and all $\lambda > 0$,
$$\|\tilde{A}x - \tilde{b}\|_p^r + \lambda \|x\|_q^s \in (1 \pm \epsilon)\left(\|Ax - b\|_p^r + \lambda \|x\|_q^s\right).$$
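
One detail worth making concrete: the coreset $(\tilde{A}, \tilde{b})$ only replaces the data term; the regularizer $\lambda \|x\|_q^s$ depends on $x$ alone and is evaluated exactly on both sides. A hypothetical helper, just to fix the notation in code:

```python
# Hypothetical helper: the regularized cost from the slide. A coreset
# (A_tilde, b_tilde) is substituted for (M, y) in the data term only;
# the regularizer lam * ||x||_q^s is computed exactly in both costs.
import numpy as np

def reg_cost(M, y, x, lam, p=2.0, q=1.0, r=2.0, s=1.0):
    data = np.linalg.norm(M @ x - y, ord=p) ** r
    reg = lam * np.linalg.norm(x, ord=q) ** s
    return data + reg
```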

7. Main Question
◮ Coresets for unregularized regression also work for the regularized counterpart
◮ [ACW17] showed a coreset for ridge regression using ridge leverage scores, smaller than coresets for least squares regression
◮ Intuition: regularization imposes a constraint on the solution space
◮ Can we expect all regularized problems to have smaller coresets than their unregularized versions, e.g. Lasso?

8. Our Main Result
Theorem: Given a matrix $A \in \mathbb{R}^{n \times d}$ and $\lambda > 0$, any coreset for the problem $\|Ax\|_p^r + \lambda \|x\|_q^s$, where $r \neq s$, $p, q \ge 1$ and $r, s > 0$, is also a coreset for $\|Ax\|_p^r$.
Implication: smaller coresets for the regularized problem are not obtained when $r \neq s$.
The popular Lasso problem falls in this category and hence does not have a coreset smaller than one for least squares regression.
Proof by contradiction.
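
Why $r \neq s$ matters can be seen from a scaling argument; the sketch below is our reading of the idea behind the contradiction, shown for the case $r > s$ (the paper's proof handles the general case and is more careful):

```latex
% Evaluating the regularized cost at a scaled query cx separates the two
% terms, because they grow at different rates when r != s:
\|A(cx)\|_p^r + \lambda \|cx\|_q^s \;=\; c^r \|Ax\|_p^r + c^s \lambda \|x\|_q^s .
% For r > s, divide by c^r and let c -> \infty: the regularizer vanishes,
% so a (1 \pm \epsilon) guarantee on the sum forces the same guarantee
% on \|Ax\|_p^r alone, i.e. on the unregularized cost.
```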

9. Modified Lasso
$$\min_{x \in \mathbb{R}^d} \|Ax - b\|_2^2 + \lambda \|x\|_1^2$$
◮ Its constrained version is the same as that of Lasso
◮ Empirically shown to induce sparsity like Lasso
◮ Allows a smaller coreset than least squares regression
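
A quick way to see the objective in action, on synthetic data with a sparse ground truth. The paper does not prescribe a solver; the derivative-free Powell method is used here only because $\|x\|_1^2$ is non-smooth and the dimension is tiny.

```python
# Minimize the modified Lasso objective ||Ax - b||_2^2 + lam * ||x||_1^2
# with a generic derivative-free optimizer (illustration only).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, d, lam = 200, 5, 1.0
x_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])     # sparse ground truth
A = rng.standard_normal((n, d))
b = A @ x_true + 0.05 * rng.standard_normal(n)

def objective(x):
    return np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x, 1) ** 2

res = minimize(objective, x0=np.zeros(d), method="Powell")
print(np.round(res.x, 3))   # near-sparse solution, like Lasso
```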

10. Coreset for Modified Lasso
Theorem: Given a matrix $A \in \mathbb{R}^{n \times d}$ and a corresponding vector $b \in \mathbb{R}^n$, any coreset for the function $\|Ax - b\|_p^p + \lambda \|x\|_p^p$ is also a coreset for the function $\|Ax - b\|_p^p + \lambda \|x\|_q^p$ where $q \le p$, $p, q \ge 1$.
◮ Implication: coresets for ridge regression also work for the modified Lasso
◮ With high probability, a coreset for the modified Lasso of size $O\left(\frac{\mathrm{sd}_\lambda(A) \log \mathrm{sd}_\lambda(A)}{\epsilon^2}\right)$
◮ $\mathrm{sd}_\lambda(A) = \sum_{j \in [d]} \frac{1}{1 + \lambda/\sigma_j^2} \le d$
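
Both the statistical dimension $\mathrm{sd}_\lambda(A)$ and the ridge leverage scores of [ACW17] are cheap to compute from an SVD; a sketch follows. The sampling constants and failure probability in the theorem come from the paper, not from this code.

```python
# Ridge leverage scores tau_i = a_i^T (A^T A + lam*I)^{-1} a_i, computed
# via the SVD; their sum equals the statistical dimension sd_lambda(A).
import numpy as np

def ridge_leverage_scores(A, lam):
    U, sig, Vt = np.linalg.svd(A, full_matrices=False)
    scale = sig / np.sqrt(sig ** 2 + lam)   # per-singular-value shrinkage
    return ((U * scale) ** 2).sum(axis=1)

rng = np.random.default_rng(3)
A, lam = rng.standard_normal((1000, 6)), 10.0
tau = ridge_leverage_scores(A, lam)
sd = tau.sum()                              # = sum_j 1/(1 + lam/sigma_j^2)
print(f"sd_lambda(A) = {sd:.2f}  (d = {A.shape[1]})")
```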

11. Coresets for $\ell_p$ Regression with $\ell_p$ Regularization
The $\ell_p$ regression with $\ell_p$ regularization problem is given as
$$\min_{x \in \mathbb{R}^d} \|Ax - b\|_p^p + \lambda \|x\|_p^p$$
Coresets for $\ell_p$ regression are constructed using a well-conditioned basis.
Well-Conditioned Basis [DDH+09]: A matrix $U$ is called an $(\alpha, \beta, p)$ well-conditioned basis for $A$ if $\|U\|_p \le \alpha$ and for all $x \in \mathbb{R}^d$, $\|x\|_q \le \beta \|Ux\|_p$, where $\frac{1}{p} + \frac{1}{q} = 1$.
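
For $p = 2$, an orthonormal basis from a QR factorization is a $(\sqrt{d}, 1, 2)$ well-conditioned basis, and row norms of such a basis yield sensitivity upper bounds; a sketch of that special case follows. Constructing a well-conditioned basis for general $p$ (e.g. via [DDH+09]) is more involved and is not attempted here.

```python
# p = 2 special case: QR gives an orthonormal basis Q for range(A),
# which is a (sqrt(d), 1, 2) well-conditioned basis; squared row norms
# of Q bound the l_2 sensitivities.
import numpy as np

def l2_sensitivity_bounds(A):
    Q, _ = np.linalg.qr(A)          # orthonormal columns
    return (Q ** 2).sum(axis=1)     # ||Q_i||_2^2 per row

rng = np.random.default_rng(4)
A = rng.standard_normal((1000, 6))
s = l2_sensitivity_bounds(A)
print(f"sum of bounds = {s.sum():.2f}  (equals d = {A.shape[1]})")
```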
