Hardness of Certification for Constrained PCA
Alex Wein, Courant Institute, NYU
Joint work with: Afonso Bandeira (NYU) and Tim Kunisky (NYU)

Part I: Statistical-to-Computational Gaps and the Low-Degree Method


The Low-Degree Method

Suppose we want to hypothesis test (with error probability o(1)) between two distributions:
◮ Null model: Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model: Y ∼ P_n, e.g. G(n, 1/2) ∪ {planted k-clique}

Look for a degree-D multivariate polynomial f that distinguishes P from Q:

\[ \max_{f \in \mathbb{R}[Y]_{\le D}} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}} \]

We want f(Y) to be big when Y ∼ P and small when Y ∼ Q.

The Low-Degree Method (continued)

\[ \max_{f \in \mathbb{R}[Y]_{\le D}} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}}
 = \max_{f \in \mathbb{R}[Y]_{\le D}} \frac{\mathbb{E}_{Y \sim Q}[L(Y)\, f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}}
 = \max_{f \in \mathbb{R}[Y]_{\le D}} \frac{\langle L, f \rangle}{\|f\|}
 = \|L^{\le D}\| \]

where:
◮ R[Y]_{≤D} denotes the polynomials of degree ≤ D (a subspace)
◮ L(Y) = dP/dQ (Y) is the likelihood ratio
◮ ⟨f, g⟩ = E_{Y∼Q}[f(Y) g(Y)] and ‖f‖ = √⟨f, f⟩
◮ The maximizer is f = L^{≤D} := proj_{R[Y]_{≤D}}(L), the projection of L onto R[Y]_{≤D}

So the optimal value is the norm of the low-degree likelihood ratio.

The Low-Degree Method (continued)

Conclusion:
\[ \max_{f \in \mathbb{R}[Y]_{\le D}} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}} = \|L^{\le D}\| \]

Heuristically:
◮ ‖L^{≤D}‖ = ω(1): some degree-D polynomial can distinguish Q, P
◮ ‖L^{≤D}‖ = O(1): degree-D polynomials fail

Degree-O(log n) polynomials ⇔ polynomial-time algorithms:
◮ Spectral method: distinguish via the top eigenvalue of a matrix M = M(Y) whose entries are O(1)-degree polynomials in Y
◮ Log-degree distinguisher: f(Y) = Tr(M^q) with q = Θ(log n) (a toy numerical sketch follows below)
◮ Spectral methods ⇔ sum-of-squares [HKPRSS '17]

Conjecture (informal variant of [Hopkins '18]): For "nice" Q, P, if ‖L^{≤D}‖ = O(1) for D = log^{1+Ω(1)}(n), then no polynomial-time algorithm can distinguish Q, P with success probability 1 − o(1).
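To make the "log-degree distinguisher" concrete, here is a minimal sketch for planted clique (not from the talk; the parameters n, k, q and the choice of M as the ±1 centered adjacency matrix are illustrative assumptions). The statistic Tr(M^q) is a degree-q polynomial in the input and, for even q = Θ(log n), is dominated by λ_max(M)^q, so it separates the two models roughly once the clique eigenvalue (≈ k) exceeds the bulk edge (≈ 2√n).

```python
# Sketch: log-degree spectral distinguisher f(Y) = Tr(M^q) for planted clique.
# Assumptions: M is the +/-1 "centered" adjacency matrix; n, k, q illustrative.
import numpy as np

rng = np.random.default_rng(0)

def centered_adjacency(n, clique_size=0):
    """Sample G(n, 1/2), optionally with a planted clique; return the symmetric
    +/-1 matrix M (zero diagonal), whose entries are degree-1 polynomials in Y."""
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T
    if clique_size:
        idx = rng.choice(n, size=clique_size, replace=False)
        A[np.ix_(idx, idx)] = 1          # force all edges inside the clique
    M = 2.0 * A - 1.0
    np.fill_diagonal(M, 0.0)
    return M

def trace_power(M, q):
    """f(Y) = Tr(M^q): a degree-q polynomial in the entries of Y."""
    return np.trace(np.linalg.matrix_power(M, q))

n, k = 400, 60                            # clique size k above the ~2*sqrt(n) bulk edge
q = 2 * int(np.log(n))                    # even exponent, Theta(log n)
null_vals = [trace_power(centered_adjacency(n), q) for _ in range(5)]
clique_vals = [trace_power(centered_adjacency(n, k), q) for _ in range(5)]
print("null    Tr(M^q) ~", np.mean(null_vals))
print("planted Tr(M^q) ~", np.mean(clique_vals))   # noticeably larger
```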

Advantages of the Low-Degree Method

◮ Can actually calculate/bound ‖L^{≤D}‖ for many problems
◮ And the predictions are correct! (i.e. they match widely-believed conjectures)
  ◮ Planted clique, sparse PCA, stochastic block model, tensor PCA, ...
◮ Heuristically, the low-degree prediction matches the performance of sum-of-squares
  ◮ But the low-degree calculation is much easier than proving SOS lower bounds
◮ By varying the degree D, one can explore the power of subexponential-time algorithms:
  ◮ Degree-n^δ polynomials ⇔ time-2^{n^δ} algorithms, for δ ∈ (0, 1)

How to Compute ‖L^{≤D}‖

Additive Gaussian noise model:  P: Y = X + Z  versus  Q: Y = Z,
where X ∼ P (any distribution over R^N) and Z has i.i.d. N(0, 1) entries.

\[ L(Y) = \frac{dP}{dQ}(Y)
 = \frac{\mathbb{E}_X \exp(-\tfrac{1}{2}\|Y - X\|^2)}{\exp(-\tfrac{1}{2}\|Y\|^2)}
 = \mathbb{E}_X \exp\!\big(\langle Y, X \rangle - \tfrac{1}{2}\|X\|^2\big) \]

Write L = Σ_α c_α h_α, where {h_α} are the Hermite polynomials (an orthonormal basis w.r.t. Q). Then

\[ \|L^{\le D}\|^2 = \sum_{|\alpha| \le D} c_\alpha^2, \qquad
   c_\alpha = \langle L, h_\alpha \rangle = \mathbb{E}_{Y \sim Q}[L(Y)\, h_\alpha(Y)] \]

After a computation (sketched below), the result is

\[ \|L^{\le D}\|^2 = \sum_{d=0}^{D} \frac{1}{d!}\, \mathbb{E}_{X, X'}\big[\langle X, X' \rangle^d\big] \]
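The computation elided on the slide is a standard Hermite-polynomial calculation; here is a hedged sketch (not from the talk), using the identity E_{Z∼N(0,1)}[h_k(x+Z)] = x^k/√(k!) for the normalized Hermite polynomials together with the multinomial theorem:

\begin{align*}
c_\alpha &= \mathbb{E}_{Y \sim Q}[L(Y)\, h_\alpha(Y)]
          = \mathbb{E}_{Y \sim P}[h_\alpha(Y)]
          = \mathbb{E}_{X}\,\mathbb{E}_{Z}\big[h_\alpha(X + Z)\big]
          = \frac{\mathbb{E}_X[X^\alpha]}{\sqrt{\alpha!}}, \\
\|L^{\le D}\|^2 &= \sum_{|\alpha| \le D} c_\alpha^2
  = \sum_{d=0}^{D} \sum_{|\alpha| = d} \frac{\mathbb{E}_X[X^\alpha]\, \mathbb{E}_{X'}[(X')^\alpha]}{\alpha!}
  = \sum_{d=0}^{D} \frac{1}{d!}\, \mathbb{E}_{X, X'}\bigg[\sum_{|\alpha| = d} \binom{d}{\alpha} \prod_i (X_i X_i')^{\alpha_i}\bigg]
  = \sum_{d=0}^{D} \frac{1}{d!}\, \mathbb{E}_{X, X'}\big[\langle X, X' \rangle^d\big],
\end{align*}

where X' denotes an independent copy of X.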

Part II: Hardness of Certification for Constrained PCA Problems

Constrained PCA

Let W ∼ GOE(n), the "Gaussian orthogonal ensemble":
◮ n × n random symmetric matrix with W_ij = W_ji ∼ N(0, 1/n) for i ≠ j and W_ii ∼ N(0, 2/n)
◮ Its eigenvalues follow the semicircle law on [−2, 2]

PCA: max_{‖x‖=1} x⊤Wx = λ_max(W) → 2 as n → ∞

Constrained PCA: φ(W) := max_{x ∈ {±1/√n}^n} x⊤Wx

Statistical physics: this is the "Sherrington–Kirkpatrick spin glass model"
◮ φ(W) → 2P* ≈ 1.5264 as n → ∞ [Parisi '80; Talagrand '06]
(A small numerical illustration follows below.)
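A minimal numerical sketch of the two quantities above (not from the talk; at such small n, brute force over the hypercube is feasible, but the limits 2 and 2P* ≈ 1.5264 are only roughly visible due to finite-size effects):

```python
# Sketch: lambda_max(W) vs. phi(W) for W ~ GOE(n), brute-forcing the hypercube.
# Assumptions: small n (exponential-time brute force), illustrative seed.
import itertools
import numpy as np

rng = np.random.default_rng(1)

def sample_goe(n):
    """W_ij = W_ji ~ N(0, 1/n) off-diagonal, W_ii ~ N(0, 2/n)."""
    G = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    return (G + G.T) / np.sqrt(2.0)

def phi_bruteforce(W):
    """phi(W) = max over x in {+-1/sqrt(n)}^n of x^T W x (2^(n-1) candidates)."""
    n = W.shape[0]
    best = -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=n - 1):
        x = np.array((1.0,) + signs) / np.sqrt(n)   # fix x_1 = +1 by sign symmetry
        best = max(best, float(x @ W @ x))
    return best

n = 18
W = sample_goe(n)
print("lambda_max(W)       ~", np.linalg.eigvalsh(W)[-1])   # -> 2 as n grows
print("phi(W), brute force ~", phi_bruteforce(W))           # -> 2P* ~ 1.5264
```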

Search vs Certification

φ(W) := max_{x ∈ {±1/√n}^n} x⊤Wx,  W ∼ GOE(n)

Two computational problems:
◮ Search: given W, find x ∈ {±1/√n}^n with large x⊤Wx
  ◮ This proves a lower bound on φ(W)
◮ Certification: given W, prove that φ(W) ≤ B for some bound B
  ◮ Formally: an algorithm {f_n} outputs f_n(W) ∈ R such that:
    (i) φ(W) ≤ f_n(W) for all W ∈ R^{n×n}
    (ii) if W ∼ GOE(n), then f_n(W) ≤ B + o(1) with probability 1 − o(1)
  ◮ Note: cannot just output the constant f_n(W) = 2P* + ε, since (i) must hold for every W, not just typical W

Search vs Certification: Prior Work

Perfect search is possible in polynomial time:
◮ Can find x ∈ {±1/√n}^n such that x⊤Wx ≥ 2P* − ε [Montanari '18]
◮ Optimization of full-RSB models [Subag '18]

Trivial spectral certification:
φ(W) ≤ max_{‖x‖=1} x⊤Wx = λ_max(W) → 2

Can we do better (in polynomial time)?
◮ Convex relaxation?
◮ Sum-of-squares?

Answer: no!
◮ In particular, any convex relaxation fails
(A small numerical illustration with the basic SDP relaxation follows below.)
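As an illustration of that claim (not a proof, and not from the talk), here is a hedged sketch of the basic semidefinite relaxation of φ(W), assuming the cvxpy package is available. The relaxation replaces xx⊤ by a PSD matrix X with X_ii = 1/n; its value lies between φ(W) and λ_max(W), and it is known to concentrate near 2 rather than near 2P* [Montanari, Sen '16], consistent with "any convex relaxation fails".

```python
# Sketch: basic SDP relaxation of phi(W); assumes cvxpy is installed.
#   max <W, X>   s.t.   X PSD,  X_ii = 1/n.
# Any feasible x gives X = x x^T, so this upper-bounds phi(W); since trace(X) = 1
# and X is PSD, the value is also at most lambda_max(W).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)

n = 80
G = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
W = (G + G.T) / np.sqrt(2.0)                     # W ~ GOE(n)

X = cp.Variable((n, n), PSD=True)
problem = cp.Problem(cp.Maximize(cp.trace(W @ X)),
                     [cp.diag(X) == 1.0 / n])
problem.solve()

print("SDP value     ~", problem.value)          # tracks lambda_max, not 2P* ~ 1.53
print("lambda_max(W) ~", np.linalg.eigvalsh(W)[-1])
```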

Main Result

Theorem (informal): Conditional on the low-degree method, for any ε > 0, no polynomial-time algorithm can certify an upper bound of 2 − ε on φ(W).

◮ In fact, essentially exponential time is needed: 2^{n^{1−o(1)}}
◮ The result also holds for constraint sets other than {±1/√n}^n

Proof outline:
(i) Reduce a hypothesis testing problem (the negatively-spiked Wishart model) to the certification problem
(ii) Use the low-degree method to show that the hypothesis testing problem is hard

Spiked Wishart Model

Q: observe N independent samples y_1, ..., y_N, where y_i ∼ N(0, I_n)

P: a planted vector x ∼ Unif({±1/√n}^n); observe y_1, ..., y_N, where y_i ∼ N(0, I_n + β xx⊤)

Parameters: n/N → γ, β ∈ [−1, ∞)

Spectral threshold: if β² > γ, one can distinguish Q, P using the top/bottom eigenvalue of the sample covariance matrix Y = (1/N) Σ_i y_i y_i⊤ [Baik, Ben Arous, Péché '05]

Using the low-degree method, we show: if β² < γ, one cannot distinguish Q, P (unless given exponential time).
(A numerical illustration of the spectral threshold follows below.)
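A minimal numerical sketch of the spectral threshold (not from the talk; the parameters n, N, β are illustrative assumptions): for a negative spike, the bottom eigenvalue of the sample covariance separates from the bulk edge (1 − √γ)² only when β² > γ.

```python
# Sketch: BBP-type spectral threshold for the (negatively-)spiked Wishart model.
# Assumptions: illustrative n, N, beta; x is a hypercube vector as in the model.
import numpy as np

rng = np.random.default_rng(2)

def min_sample_cov_eig(n, N, beta):
    """Smallest eigenvalue of Y = (1/N) sum_i y_i y_i^T, y_i ~ N(0, I + beta*x x^T)."""
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)          # unit-norm spike
    z = rng.normal(size=(N, n))                                # rows ~ N(0, I_n)
    y = z + (np.sqrt(1.0 + beta) - 1.0) * np.outer(z @ x, x)   # apply (I + beta*x x^T)^(1/2)
    Y = y.T @ y / N
    return np.linalg.eigvalsh(Y)[0]

n, N = 500, 2000                  # gamma = n/N = 0.25; bulk edge (1 - sqrt(gamma))^2 = 0.25
for beta in (0.0, -0.3, -0.9):    # beta^2 = 0, 0.09 (< gamma), 0.81 (> gamma)
    print(f"beta = {beta:+.1f}:  lambda_min ~ {min_sample_cov_eig(n, N, beta):.3f}")
# Only beta = -0.9 (beta^2 > gamma) pushes lambda_min clearly below the bulk edge ~0.25.
```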

Negatively-Spiked Wishart Model

Our case of interest: β = −1 (technically β > −1 with β ≈ −1). Note that at β = −1 the covariance I_n − xx⊤ is a projection, so every sample y_i is exactly orthogonal to the planted hypercube vector x.
