Optimality of Low-Degree Polynomials?

Low-degree polynomials seem to be optimal for many problems! For all of these problems (planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, ...) it is the case that:
◮ the best known poly-time algorithms are captured by O(log n)-degree polynomials (spectral/AMP)
◮ low-degree polynomials fail in the "hard" regime

"Low-degree conjecture" (informal): for "natural" problems, if low-degree polynomials fail then all poly-time algorithms fail [Hopkins ’18]

Caveat: Gaussian elimination for planted XOR-SAT
Overview

This talk: techniques to prove that all low-degree polynomials fail
◮ Gives evidence for computational hardness

Settings:
◮ Detection (prior work) [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18] (PhD thesis) [Kunisky, W., Bandeira ’19] (survey)
◮ Recovery (this work) [Schramm, W. ’20]
◮ Optimization [Gamarnik, Jagannath, W. ’20]
Relation to Other Frameworks

◮ Sum-of-squares lower bounds [BHKKMP16, ...]
  ◮ Actually for certification
  ◮ Connected to low-degree [HKPRSS17]
◮ Statistical query lower bounds [FGRVX12, ...]
  ◮ Need i.i.d. samples
  ◮ Equivalent to low-degree [BBHLS20]
◮ Approximate message passing (AMP) [DMM09, LKZ15, ...]
  ◮ AMP algorithms are low-degree
  ◮ AMP can be sub-optimal (e.g. tensor PCA) [MR14]
◮ Overlap gap property / MCMC lower bounds [GS13, GZ17, ...]
  ◮ MCMC algorithms are not low-degree (?)
  ◮ MCMC can be sub-optimal (e.g. tensor PCA) [BGJ18]
◮ Average-case reductions [BR13, ...]
  ◮ Need to argue that the starting problem is hard [BB20]
Part II: Detection
Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : R^{n×n} → R that distinguishes P from Q
◮ f(Y) is "big" when Y ∼ P and "small" when Y ∼ Q

Compute the "advantage" (mean in P over fluctuations in Q):

  Adv_{≤D} := max_{f of degree ≤ D}  E_{Y∼P}[f(Y)] / √( E_{Y∼Q}[f(Y)²] )

◮ Adv_{≤D} = ω(1): "degree-D polynomials succeed"
◮ Adv_{≤D} = O(1): "degree-D polynomials fail"

(A worked degree-1 example is sketched below.)
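A worked degree-1 example (my own sanity check, not from the slides): for planted clique with ±1 edge labels, take the simplest degree-1 polynomial f(Y) = Σ_{i<j} Y_{ij}. Then

  E_{Y∼Q}[f(Y)] = 0,   E_{Y∼Q}[f(Y)²] = C(n,2),   E_{Y∼P}[f(Y)] = C(k,2),

so Adv_{≤1} ≥ C(k,2)/√C(n,2) ≍ k²/n, which is ω(1) once k ≫ √n. At the k ≍ √n threshold itself, the theorem on the next slide uses degree O(log n) rather than degree 1.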
Detection (e.g. [Hopkins, Steurer ’17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv_{≤D} = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
◮ if k = O(n^{1/2−ε}) then Adv_{≤D} = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes one can rule out polynomials of degree D = n^δ.

Extended low-degree conjecture [Hopkins ’18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; in particular, D = n^δ corresponds to exp(n^{δ±o(1)}) time.

(A small Monte Carlo check of the succeed/fail regimes is sketched below.)
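A minimal numerical sketch (my own, not from the talk; the parameter values and the restriction to the degree-1 edge-count statistic are assumptions). It estimates the advantage of f(Y) = Σ_{i<j} Y_{ij}, which lower-bounds Adv_{≤D}:

```python
import numpy as np

# Monte Carlo sanity check for planted clique; toy parameters, not from the talk.
def sample(n, k, planted, rng):
    """One draw of the +/-1 edge matrix; the upper triangle holds the m = C(n,2) edges."""
    Y = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), 1)
    if planted:
        S = rng.choice(n, size=k, replace=False)       # clique vertices
        mask = np.zeros((n, n), dtype=bool)
        mask[np.ix_(S, S)] = True
        Y[np.triu(mask, 1)] = 1.0                      # clique edges are present
    return Y

def edge_count_advantage(n, k, trials=2000, seed=0):
    """Estimate E_P[f] / sqrt(E_Q[f^2]) for f = sum of edge signs (a lower bound on Adv)."""
    rng = np.random.default_rng(seed)
    f_P = [sample(n, k, True, rng).sum() for _ in range(trials)]
    f_Q = [sample(n, k, False, rng).sum() for _ in range(trials)]
    return np.mean(f_P) / np.sqrt(np.mean(np.square(f_Q)))

# Roughly k^2 / (sqrt(2) n): O(1) for k near sqrt(n), growing for larger k.
print(edge_count_advantage(200, 14), edge_count_advantage(200, 60))
```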
Detection (e.g. [Hopkins, Steurer ’17])

Goal: compute Adv_{≤D} := max_{f of degree ≤ D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

Suppose Q is i.i.d. Unif(±1) on {±1}^m.

Write f(Y) = Σ_{S⊆[m], |S|≤D} f̂_S Y^S, where Y^S := Π_{i∈S} Y_i.

{Y^S}_{S⊆[m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 𝟙[S = T]

Numerator: E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩, where c_S := E_{Y∼P}[Y^S]

Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²  (orthonormality)

Therefore

  Adv_{≤D} = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖ = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² )

with optimizer f̂* = c.

(A toy exact evaluation of this formula is sketched below.)
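A toy exact evaluation of Adv_{≤D} = √(Σ_{|S|≤D} E_P[Y^S]²) for planted clique (my own sketch, not from the talk; the tiny parameters are assumptions, and the brute force over all edge subsets only works for very small n and D). Here S ranges over sets of edges, and E_P[Y^S] = Pr[all vertices touched by S lie in the planted clique], since edges inside the clique are +1 and all other edges have mean zero:

```python
import itertools
from math import comb, sqrt

def low_degree_advantage(n, k, D):
    """Exact Adv_{<=D} for planted k-clique on n vertices with +/-1 edge labels (toy sizes only)."""
    edges = list(itertools.combinations(range(n), 2))
    total = 0.0
    for d in range(D + 1):
        for S in itertools.combinations(edges, d):
            verts = {v for e in S for v in e}
            if len(verts) <= k:
                # c_S = Pr[V(S) is contained in the planted clique]
                c_S = comb(n - len(verts), k - len(verts)) / comb(n, k)
                total += c_S ** 2
    return sqrt(total)

# Stays O(1) for k well below sqrt(n); grows once k is larger.
print(low_degree_advantage(n=12, k=3, D=2), low_degree_advantage(n=12, k=8, D=2))
```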
Detection (e.g. [Hopkins, Steurer ’17])

Remarks:
◮ Best test is the likelihood ratio (Neyman-Pearson lemma): L(Y) = dP/dQ (Y)
◮ Best degree-D test (maximizer of Adv_{≤D}) is f* = L^{≤D} := projection of L onto the degree-D subspace
  (orthogonal projection w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y) g(Y)])
  "low-degree likelihood ratio"
◮ Adv_{≤D} = ‖L^{≤D}‖, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²])
  "norm of the low-degree likelihood ratio"

Proof: f̂*_S = L̂_S 𝟙[|S| ≤ D], and L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S] = c_S (change of measure).
Part III: Recovery
Planted Submatrix

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λ vv^⊤ with v_i ∼ Bernoulli(ρ) i.i.d. and λ > 0
◮ Noise: Z i.i.d. N(0, 1)

Regime: 1/√n ≪ ρ ≪ 1

Detection: distinguish P : Y = X + Z vs Q : Y = Z w.h.p.
◮ Sum of all entries succeeds when λ ≫ (ρ√n)^{−2}

Recovery: given Y ∼ P, recover v
◮ Leading eigenvector succeeds when λ ≫ (ρ√n)^{−1}
◮ Exhaustive search succeeds when λ ≫ (ρn)^{−1/2}

Detection-recovery gap

(A small simulation of the sum and eigenvector statistics is sketched below.)
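A small simulation (my own sketch, not from the talk; the specific parameter values are assumptions, chosen above both polynomial-time thresholds):

```python
import numpy as np

# Toy planted-submatrix experiment; parameters are illustrative assumptions.
def planted_submatrix_experiment(n=2000, rho=0.05, lam=1.0, seed=0):
    """Draw Y = lam * v v^T + Z and run the two statistics from this slide."""
    rng = np.random.default_rng(seed)
    v = (rng.random(n) < rho).astype(float)        # v_i ~ Bernoulli(rho)
    Y = lam * np.outer(v, v) + rng.standard_normal((n, n))

    # Detection: sum of all entries, scaled so the null standard deviation is 1;
    # under P it is shifted by roughly lam * (rho * n)^2 / n = lam * rho^2 * n.
    sum_stat = Y.sum() / n

    # Recovery: leading eigenvector of the symmetrized matrix, compared with v.
    eigvals, eigvecs = np.linalg.eigh((Y + Y.T) / 2)
    u = eigvecs[:, -1]
    overlap = abs(u @ v) / np.linalg.norm(v)       # in [0, 1]; near 1 means success
    return sum_stat, overlap

print(planted_submatrix_experiment())
```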
Recovery Hardness from Detection Hardness?

If you can recover then you can detect (poly-time reduction)
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n; check v̂^⊤ Y v̂ (a minimal sketch follows below)

So if Adv_{≤D} = O(1), this suggests recovery is hard

But how to show hardness of recovery when detection is easy?
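A minimal sketch of the reduction in the bullet above (my own; the recover black box and the threshold tau are hypothetical placeholders not fixed by the slide): if recovery succeeds under the planted model, v̂ᵀYv̂ ≈ λ(ρn)² is large, so a suitable threshold separates the two models.

```python
import numpy as np

def detect_via_recovery(Y, recover, tau):
    """Hypothetical reduction sketch: run any recovery algorithm, then threshold."""
    v_hat = recover(Y)                        # assumed to return v_hat in {0,1}^n
    stat = float(v_hat @ Y @ v_hat)           # large if a planted block was found
    return stat > tau                         # declare "planted" iff stat > tau
```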