Introduction to the Low-Degree Polynomial Method
Alex Wein, Courant Institute, New York University

Part I: Why Low-Degree Polynomials?

Problems in High-Dimensional Statistics
Example: finding a large clique in a random graph


The Low-Degree Polynomial Method

Claim: low-degree polynomials provide a unified explanation of information-computation gaps in detection/recovery/optimization.

For all of these problems... planted clique, sparse PCA, community detection, tensor PCA, planted CSPs, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, p-spin optimization, max independent set ...it is the case that
◮ the best known poly-time algorithms are low-degree (spectral/AMP/local)
◮ low-degree polynomials fail in the “hard” regime

“Low-degree conjecture” (informal): low-degree polynomials are as powerful as all poly-time algorithms for “natural” high-dimensional problems [Hopkins ’18]

Overview

This talk: techniques to prove that all low-degree polynomials fail
◮ Gives evidence for computational hardness

Settings:
◮ Detection [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18] (PhD thesis) [Kunisky, W., Bandeira ’19] (survey)
◮ Recovery [Schramm, W. ’20]
◮ Optimization [Gamarnik, Jagannath, W. ’20]

Part II: Detection

Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : R^{n×n} → R that distinguishes P from Q
◮ f(Y) is “big” when Y ∼ P and “small” when Y ∼ Q

Compute the “advantage” (mean in P over fluctuations in Q):

    Adv_{≤D} := max_{f deg D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

◮ Adv_{≤D} = ω(1): “degree-D polynomials succeed”
◮ Adv_{≤D} = O(1): “degree-D polynomials fail”
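To make the definition concrete, here is a minimal Monte Carlo sketch (mine, not from the talk) that estimates the advantage of the simplest degree-1 polynomial, the signed edge count, in the planted k-clique model with a ±1 edge encoding; all parameter values are arbitrary illustrative choices.

    import numpy as np

    # Estimate E_{Y~P}[f(Y)] / sqrt(E_{Y~Q}[f(Y)^2]) for the degree-1
    # polynomial f(Y) = sum of +/-1 edge variables, planted k-clique model.
    rng = np.random.default_rng(0)

    def sample(n, k=None):
        """Upper-triangular +/-1 edge signs; plant a k-clique if k is given."""
        Y = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), 1)
        if k is not None:
            clique = rng.choice(n, size=k, replace=False)
            mask = np.zeros((n, n), dtype=bool)
            mask[np.ix_(clique, clique)] = True
            Y[np.triu(mask, 1)] = 1.0      # clique edges forced present
        return Y

    def f(Y):
        return Y.sum()                      # degree-1: signed edge count

    n, k, trials = 400, 40, 200             # here k = 2*sqrt(n): the "easy" side
    mean_P = np.mean([f(sample(n, k)) for _ in range(trials)])
    second_Q = np.mean([f(sample(n)) ** 2 for _ in range(trials)])
    print("empirical advantage:", mean_P / np.sqrt(second_Q))

With these parameters the printed ratio should come out near C(k,2)/√C(n,2) ≈ 2.8; it grows with n when k ≫ √n and stays bounded when k ≪ √n, matching the ω(1) vs O(1) criterion above.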

Detection (e.g. [Hopkins, Steurer ’17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv_{≤D} = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
◮ if k = O(n^{1/2−ε}) then Adv_{≤D} = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes one can rule out polynomials of degree D = n^δ.

Extended low-degree conjecture [Hopkins ’18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; in particular, D = n^δ ⇔ exp(n^{δ ± o(1)}) time.

Detection (e.g. [Hopkins, Steurer ’17])

Goal: compute Adv_{≤D} := max_{f deg D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

Suppose Q is i.i.d. Unif(±1).

Write f(Y) = Σ_{S ⊆ [m], |S| ≤ D} f̂_S Y^S, where Y^S := Π_{i∈S} Y_i.

{Y^S}_{S ⊆ [m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 1{S = T}

Numerator: E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩, where c_S := E_{Y∼P}[Y^S]

Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²   (orthonormality)

So Adv_{≤D} = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖, with optimizer f̂* = c, giving

    Adv_{≤D} = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² )
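As a concrete check of the ‖c‖ formula, here is a small exact computation (mine, not from the talk) for planted clique with ±1 edge variables and the clique placed on a uniformly random k-subset, so that c_S = Pr[all vertices touched by the edge set S lie in the clique].

    from itertools import combinations
    from math import comb, sqrt

    # Exact Adv_{<=D} = ||c|| for planted k-clique (small n only).
    # For an edge set S spanning v vertices,
    #   c_S = E_P[Y^S] = Pr[V(S) inside the clique] = C(n-v, k-v) / C(n, k).
    def adv(n, k, D):
        edges = list(combinations(range(n), 2))
        total = 0.0
        for d in range(D + 1):
            for S in combinations(edges, d):
                v = len({u for e in S for u in e})
                if v <= k:
                    total += (comb(n - v, k - v) / comb(n, k)) ** 2
        return sqrt(total)

    for k in (3, 6):
        print("n=12, k=%d, D=3:" % k, adv(12, k, 3))

The empty set always contributes 1, so Adv_{≤D} ≥ 1; the question is whether the sum stays bounded or diverges as n grows with the appropriate scaling of k.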

Detection (e.g. [Hopkins, Steurer ’17])

Remarks:
◮ The best test is the likelihood ratio (Neyman-Pearson lemma): L(Y) = dP/dQ(Y)
◮ The best degree-D test (the maximizer of Adv_{≤D}) is f* = L^{≤D} := projection of L onto the degree-D subspace
  (orthogonal projection w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y) g(Y)]), the “low-degree likelihood ratio”
◮ Adv_{≤D} = ‖L^{≤D}‖, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²]), the “norm of the low-degree likelihood ratio”

Proof: f̂*_S = E_{Y∼P}[Y^S] · 1{|S| ≤ D} and L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S].
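Spelling out the last equality in the proof (the change-of-measure step it relies on):

    \[
      \hat{L}_S \;=\; \mathbb{E}_{Y\sim Q}\!\bigl[L(Y)\,Y^S\bigr]
               \;=\; \mathbb{E}_{Y\sim Q}\!\Bigl[\tfrac{dP}{dQ}(Y)\,Y^S\Bigr]
               \;=\; \mathbb{E}_{Y\sim P}\bigl[Y^S\bigr],
    \]

so the degree-≤D Fourier coefficients of L are exactly the vector c from the previous slide, which is why the projection L^{≤D} is the optimizer of Adv_{≤D}.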

Detection (e.g. [Hopkins, Steurer ’17])

User-friendly results (X′ denotes an independent copy of the signal X):

◮ Additive Gaussian model: P : Y = X + Z vs Q : Y = Z

    Adv²_{≤D} = Σ_{d=0}^{D} (1/d!) E_{X,X′} ⟨X, X′⟩^d

◮ Rademacher model, Y ∈ {±1}^m: P : E[Y | X] = X vs Q : E[Y] = 0

    Adv²_{≤D} ≤ Σ_{d=0}^{D} (1/d!) E_{X,X′} ⟨X, X′⟩^d
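As an illustration (mine, not from the talk), the additive-Gaussian formula can be evaluated by Monte Carlo over two independent signal draws. Here the signal prior is the planted-submatrix one used later in the talk, X = λ vv^⊤ with v_i ∼ Bernoulli(ρ); the parameter values are arbitrary choices.

    import numpy as np
    from math import factorial

    # Monte Carlo for  Adv_{<=D}^2 = sum_{d=0}^D (1/d!) E_{X,X'} <X, X'>^d
    # in the additive Gaussian model, with X = lam * v v^T, v_i ~ Bernoulli(rho).
    rng = np.random.default_rng(1)

    def sample_signal(n, rho, lam):
        v = (rng.random(n) < rho).astype(float)
        return lam * np.outer(v, v)

    def adv_squared(n, rho, lam, D, trials=10000):
        total = 0.0
        for _ in range(trials):
            X, Xp = sample_signal(n, rho, lam), sample_signal(n, rho, lam)
            ip = float(np.sum(X * Xp))                 # <X, X'>
            total += sum(ip ** d / factorial(d) for d in range(D + 1))
        return total / trials

    # vary n (scaling lam, rho appropriately) to see whether this stays O(1) or grows
    print(adv_squared(n=50, rho=0.1, lam=0.5, D=4))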

Detection (e.g. [Hopkins, Steurer ’17])

Recap (detection):
◮ Given P, Q, one can compute (via linear algebra)

    Adv_{≤D} = ‖L^{≤D}‖ = max_{f deg D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

◮ Need to know the orthogonal polynomials w.r.t. Q
  ◮ Possible when Q has independent coordinates
◮ To predict computational complexity: for D ≈ log n,

    Adv_{≤D} = ω(1) ⇒ “easy”,    Adv_{≤D} = O(1) ⇒ “hard”

◮ These predictions are “correct” for: planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, ... [BHKKMP16, HS17, HKPRSS17, Hop18, BKW19, KWB19, DKWB19]

Part III: Recovery

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λ vv^⊤, v_i ∼ Bernoulli(ρ), λ > 0
◮ Noise: Z i.i.d. N(0, 1)

Detection: distinguish P : Y = X + Z vs Q : Y = Z w.h.p.
Recovery: given Y ∼ P, recover v

If you can recover then you can detect (poly-time reduction)
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n, then check v̂^⊤ Y v̂ (sketched below)

So if Adv_{≤D} = O(1), this suggests recovery is hard.
But planted submatrix has a detection-recovery gap.
How to show hardness of recovery when detection is easy?
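A minimal sketch of this reduction (mine, not from the talk): `recover` is a placeholder for any recovery algorithm, and the threshold is an arbitrary illustrative choice, reasonable only in regimes where λ(ρn)² dominates the noise fluctuations of order ρn.

    import numpy as np

    def detect_via_recovery(Y, recover, rho, lam):
        """Recovery => detection: run recovery, then test the quadratic form v_hat^T Y v_hat.

        `recover` is a hypothetical black box returning a 0/1 vector; the threshold
        below assumes v_hat overlaps most of the planted v.
        """
        n = Y.shape[0]
        v_hat = recover(Y)
        stat = float(v_hat @ Y @ v_hat)
        # Under Q the statistic is pure noise of size about ||v_hat||^2 ~ rho*n;
        # under P (with good recovery) it also picks up about lam * (rho*n)^2.
        threshold = 0.5 * lam * (rho * n) ** 2
        return "P" if stat > threshold else "Q"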

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λ vv^⊤, v_i ∼ Bernoulli(ρ), λ > 0
◮ Noise: Z i.i.d. N(0, 1)

Goal: given Y, estimate v_1 via a polynomial f : R^{n×n} → R

Low-degree minimum mean squared error:

    MMSE_{≤D} := min_{f deg D} E (f(Y) − v_1)²

Equivalent to low-degree maximum correlation:

    Corr_{≤D} := max_{f deg D} E[f(Y) · v_1] / √(E[f(Y)²])

Fact: MMSE_{≤D} = E[v_1²] − Corr²_{≤D}
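A short derivation of the stated fact (a standard scaling argument, included for completeness): write f = t·g with E[g(Y)²] = 1, so that E[f(Y) v_1] = t·E[g(Y) v_1] and

    \[
      \mathbb{E}\,(f(Y)-v_1)^2
      \;=\; t^2 - 2t\,\mathbb{E}[g(Y)\,v_1] + \mathbb{E}[v_1^2]
      \;\ge\; \mathbb{E}[v_1^2] - \mathbb{E}[g(Y)\,v_1]^2 ,
    \]

with equality at t = E[g(Y) v_1]. Maximizing E[g(Y) v_1] over unit-norm degree-D polynomials g gives exactly Corr_{≤D}, hence MMSE_{≤D} = E[v_1²] − Corr²_{≤D}.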

Recovery [Schramm, W. ’20]

For hardness, we want an upper bound on Corr_{≤D} = max_{f deg D} E[f(Y) · v_1] / √(E[f(Y)²])

Same proof as detection? Write f = Σ_{|S|≤D} f̂_S Y^S.

Numerator: E[f(Y) · v_1] = Σ_{|S|≤D} f̂_S E[Y^S · v_1] =: ⟨f̂, c⟩

Denominator: E[f(Y)²] = Σ_{S,T} f̂_S f̂_T E[Y^S · Y^T] = f̂^⊤ M f̂,
where M_{S,T} := E[Y^S Y^T] and all expectations are now under the planted model (Y ∼ P).
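Why the detection argument does not carry over verbatim: under P the monomials Y^S are no longer orthonormal, so M is not the identity. A tiny Monte Carlo check of one off-diagonal entry (mine, not from the talk; parameter values arbitrary):

    import numpy as np

    # Off-diagonal entry of M for the planted submatrix model:
    #   M_{S,T} with S = {(0,1)}, T = {(2,3)}, i.e. E_P[Y_01 * Y_23].
    # Exactly lam^2 * rho^4 (nonzero), since Y_ij = lam*v_i*v_j + Z_ij.
    rng = np.random.default_rng(2)
    rho, lam, trials = 0.5, 2.0, 100000
    v = (rng.random((trials, 4)) < rho).astype(float)
    y01 = lam * v[:, 0] * v[:, 1] + rng.standard_normal(trials)
    y23 = lam * v[:, 2] * v[:, 3] + rng.standard_normal(trials)
    print("empirical:", np.mean(y01 * y23), " exact:", lam**2 * rho**4)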
