Minimax Rates for Memory-Constrained Sparse Linear Regression



  1. Minimax Rates for Memory-Constrained Sparse Linear Regression
     Jacob Steinhardt, John Duchi
     Stanford University, {jsteinha, jduchi}@stanford.edu
     July 6, 2015

  2. Resource-Constrained Learning
     How do we solve statistical problems with limited resources?
     - computation (Natarajan, 1995; Berthet & Rigollet, 2013; Zhang et al., 2014; Foster et al., 2015)
     - privacy (Kasiviswanathan et al., 2011; Duchi et al., 2013)
     - communication / memory (Zhang et al., 2013; Shamir, 2014; Garg et al., 2014; Braverman et al., 2015)

  3. Setting
     Sparse linear regression in $\mathbb{R}^d$:
         $Y^{(i)} = \langle w^*, X^{(i)} \rangle + \varepsilon^{(i)}, \qquad \|w^*\|_0 = k, \quad k \ll d$
     Memory constraint (a minimal simulation of this protocol is sketched below):
     - $(X^{(i)}, Y^{(i)})$ observed as a read-only stream
     - only $b$ bits of state $Z^{(i)}$ kept between successive observations
     [Diagram: $w^*$ generates the stream $(X^{(1)}, Y^{(1)}), (X^{(2)}, Y^{(2)}), (X^{(3)}, Y^{(3)}), \ldots$; the learner carries only a $b$-bit state $Z^{(1)} \to Z^{(2)} \to Z^{(3)} \to \cdots$ between successive observations.]
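To make the streaming protocol concrete, here is a minimal Python simulation of the setting above. It is an illustrative sketch only: the names `generate_stream` and `BoundedStateLearner`, and the decision to model the $b$-bit state $Z^{(i)}$ as a fixed-size byte buffer, are assumptions introduced here, not part of the paper.

```python
import numpy as np

def generate_stream(n, d, k, noise_std=1.0, seed=0):
    """Yield (X^(i), Y^(i)) pairs from a k-sparse linear model Y = <w*, X> + eps."""
    rng = np.random.default_rng(seed)
    w_star = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)    # k nonzero coordinates of w*
    w_star[support] = rng.choice([-1.0, 1.0], size=k)
    for _ in range(n):
        x = rng.standard_normal(d)
        y = x @ w_star + noise_std * rng.standard_normal()
        yield x, y                                    # the learner never sees w_star

class BoundedStateLearner:
    """Abstract b-bit learner: the only memory carried between observations
    is the byte buffer `state` (Z^(i) in the slide's notation)."""
    def __init__(self, b_bits):
        self.b_bits = b_bits
        self.state = bytearray(max(b_bits // 8, 1))

    def update(self, x, y):
        # A concrete algorithm must compress (x, y, old state) into a new b-bit state.
        raise NotImplementedError

    def estimate(self):
        # Decode a weight estimate w-hat from the final state.
        raise NotImplementedError
```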

  4. Motivating Question
     If we have enough memory to represent the answer, can we also efficiently learn the answer?

  5. Problem Statement
     How much data $n$ is needed to obtain an estimator $\hat{w}$ with $\mathbb{E}\big[\|\hat{w} - w^*\|_2^2\big] \le \varepsilon$?
     Classical case (no memory constraint):
       Theorem (Wainwright, 2009). $\frac{k}{\varepsilon}\log(d) \lesssim n \lesssim \frac{k}{\varepsilon}\log(d)$
       Achievable with $\tilde{O}(d)$ memory (Agarwal et al., 2012; S., Wager, & Liang, 2015).
     With a memory constraint of $b$ bits:
       Theorem (S. & Duchi, 2015). $\frac{kd}{\varepsilon b} \lesssim n \lesssim \frac{kd}{\varepsilon^2 b}$
     Exponential increase in the dependence on $d$ if $b \ll d$! (A numerical comparison of the two rates follows below.)
     [Note: up to log factors; assumes $k \log(d) \ll b \le d$.]
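For a rough sense of the gap between the two theorems, the short computation below plugs illustrative values into the stated scalings, dropping constants and log factors as the slide's note allows. The particular choices of d, k, eps, and b are arbitrary.

```python
import math

d, k, eps = 10**6, 10, 0.1   # illustrative dimension, sparsity, target squared error

# Unconstrained rate (Wainwright, 2009): n on the order of (k / eps) * log d
n_classical = (k / eps) * math.log(d)

# Memory-constrained upper bound (S. & Duchi, 2015): n on the order of k*d / (eps^2 * b)
for b in (10**3, 10**4, 10**5, 10**6):      # memory budgets in bits
    n_constrained = k * d / (eps**2 * b)
    print(f"b = {b:>7} bits: n ~ {n_constrained:.1e}   (classical n ~ {n_classical:.1e})")
```

The takeaway matches the slide: when $b \ll d$, the dependence on $d$ jumps from $\log d$ in the classical rate to $d$ in the memory-constrained rate.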

  6. Proof Overview
     Lower bound:
     - information-theoretic
     - strong data-processing inequality
       [Diagram: Markov chain $W^* \to (X, Y) \to Z$; the $b$-bit message $Z$ retains only about a $b/d$ fraction of the information that $(X, Y)$ carries about $W^*$.]
     - main challenge: dependence between $X$ and $Y$
     Upper bound:
     - count-min sketch + $\ell_1$-regularized dual averaging (an illustrative code sketch follows below)
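For intuition about how the upper bound can operate within $b$ bits, here is a minimal, self-contained sketch that pairs a sketching data structure with an $\ell_1$-regularized dual-averaging (RDA) update. This is a reconstruction of the general idea, not the authors' algorithm or analysis: the function name `sketched_l1_rda`, the step-size rule, and the use of a signed count-sketch (rather than the count-min sketch named on the slide) are assumptions made for illustration. A genuinely memory-bounded implementation would also evaluate its hash functions on the fly rather than storing them as arrays.

```python
import numpy as np

class CountSketch:
    """Signed count-sketch holding an approximate d-dimensional vector in
    depth x width cells. (The hash tables below use O(d) memory for simplicity;
    a truly memory-bounded version would hash coordinates on the fly.)"""
    def __init__(self, d, depth, width, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width))
        self.buckets = rng.integers(0, width, size=(depth, d))  # bucket hash h_r(j)
        self.signs = rng.choice([-1.0, 1.0], size=(depth, d))   # sign hash s_r(j)

    def add(self, vec):
        """Add a dense d-dimensional vector into every row of the sketch."""
        for r in range(self.table.shape[0]):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def query(self):
        """Estimate the stored vector; the median over rows de-biases collisions."""
        rows = np.arange(self.table.shape[0])[:, None]
        return np.median(self.signs * self.table[rows, self.buckets], axis=0)

def sketched_l1_rda(stream, d, depth=5, width=2000, lam=0.1, gamma=1.0):
    """Accumulate squared-loss gradients in a sketch and form each iterate by the
    standard l1-RDA soft-thresholding step applied to the (approximate) average
    gradient. Purely illustrative; no claim to match the paper's guarantees."""
    sketch = CountSketch(d, depth, width)
    t = 0
    for x, y in stream:
        t += 1
        g_bar = sketch.query() / max(t - 1, 1)   # approximate average gradient so far
        w = -np.sqrt(t) / gamma * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
        grad = (x @ w - y) * x                   # gradient of 0.5 * (<w, x> - y)^2
        sketch.add(grad)                         # the sketch is the only retained statistic
    g_bar = sketch.query() / max(t, 1)
    return -np.sqrt(t) / gamma * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
```

The `width` parameter is the knob corresponding to the memory budget $b$: with 64-bit floats the sketch table occupies roughly depth * width * 64 bits, and the stream can be supplied by a generator such as the one sketched after the Setting slide.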
