The Kikuchi Hierarchy and Tensor PCA

Alex Wein (Courant Institute, NYU). Joint work with: Ahmed El Alaoui (Stanford) and Cris Moore (Santa Fe Institute).


Algorithms for Tensor PCA

Local algorithms: keep track of a "guess" v ∈ R^n and locally maximize the log-likelihood L(v) = ⟨Y, v^{⊗p}⟩
◮ Gradient descent [Ben Arous-Gheissari-Jagannath '18]
◮ Tensor power iteration [Richard-Montanari '14] (sketch below)
◮ Langevin dynamics [Ben Arous-Gheissari-Jagannath '18]
◮ Approximate message passing (AMP) [Richard-Montanari '14]

These only succeed when λ ≫ n^{-1/2}
◮ Recall: MLE works for λ ∼ n^{(1-p)/2}
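For concreteness, a minimal sketch of the tensor power iteration listed above, assuming p = 3 and a dense numpy array Y of shape (n, n, n); the initialization and iteration count are illustrative, not the tuned versions from the cited papers.

```python
import numpy as np

def tensor_power_iteration(Y, iters=100, seed=0):
    """Sketch of tensor power iteration for an order-3 tensor Y of shape (n, n, n).

    Repeatedly contract Y against the current guess v along two modes,
    i.e. v_i <- sum_{j,k} Y[i, j, k] * v_j * v_k, then renormalize.
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', Y, v, v)  # contraction with v along modes 2 and 3
        v /= np.linalg.norm(v)
    return v
```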

Algorithms for Tensor PCA

Sum-of-squares (SoS) and spectral methods:
◮ SoS semidefinite program [Hopkins-Shi-Steurer '15]
◮ Spectral SoS [Hopkins-Shi-Steurer '15, Hopkins-Schramm-Shi-Steurer '15]
◮ Tensor unfolding [Richard-Montanari '14, Hopkins-Shi-Steurer '15] (sketch below)

These are poly-time and succeed when λ ≫ n^{-p/4}

SoS lower bounds suggest no poly-time algorithm when λ ≪ n^{-p/4} [Hopkins-Shi-Steurer '15, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer '17]

[Phase diagram over λ with thresholds n^{(1-p)/2}, n^{-p/4}, n^{-1/2}: impossible below n^{(1-p)/2}; MLE succeeds but the problem is conjectured hard up to n^{-p/4}; SoS/spectral methods are poly-time above n^{-p/4}; local algorithms need λ ≫ n^{-1/2}]

Local algorithms (gradient descent, AMP, ...) are suboptimal when p ≥ 3
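A minimal sketch of the tensor unfolding listed above, assuming p = 4 and a dense numpy array Y of shape (n, n, n, n); the exact step for reading off the planted vector varies across the cited papers, so this is only one natural choice.

```python
import numpy as np

def tensor_unfolding_estimate(Y):
    """Sketch of tensor unfolding for an order-4 tensor Y of shape (n, n, n, n).

    Unfold Y into an n^2 x n^2 matrix, take its leading eigenvector,
    fold it back into an n x n matrix, and take the leading eigenvector
    of that matrix as the estimate of the planted vector.
    """
    n = Y.shape[0]
    A = Y.reshape(n * n, n * n)
    A = (A + A.T) / 2                        # symmetrize the unfolding
    w, V = np.linalg.eigh(A)
    u = V[:, np.argmax(np.abs(w))]           # leading eigenvector (by magnitude)
    B = u.reshape(n, n)
    B = (B + B.T) / 2
    w2, V2 = np.linalg.eigh(B)
    return V2[:, np.argmax(np.abs(w2))]
```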

Subexponential-Time Algorithms

Subexponential-time: 2^{n^δ} for δ ∈ (0, 1)

Tensor PCA has a smooth tradeoff between runtime and statistical power: for δ ∈ (0, 1), there is a 2^{n^δ}-time algorithm for λ ∼ n^{-p/4 + δ(1/2 - p/4)} [Raghavendra-Rao-Schramm '16, Bhattiprolu-Guruswami-Lee '16]

Interpolates between SoS and MLE (endpoint check below):
◮ δ = 0 ⇒ poly-time algorithm for λ ∼ n^{-p/4}
◮ δ = 1 ⇒ 2^n-time algorithm for λ ∼ n^{(1-p)/2}

[Phase diagram over λ as on the previous slide: impossible, hard, SoS, local regions]

In contrast, some problems have a sharp threshold
◮ E.g., λ > 1 is nearly-linear time; λ < 1 needs time 2^n

For "soft" thresholds (like tensor PCA): BP/AMP can't be optimal
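Endpoint check (plugging the endpoints into the exponent above): at δ = 0 the exponent is −p/4, the SoS threshold, and at δ = 1 it is −p/4 + 1/2 − p/4 = (1 − p)/2, the MLE threshold.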

Aside: Low-Degree Likelihood Ratio

Recall: there is a 2^{n^δ}-time algorithm for λ ∼ n^{-p/4 + δ(1/2 - p/4)}

Evidence that this tradeoff is optimal: low-degree likelihood ratio
◮ A relatively simple calculation that predicts the computational complexity of high-dimensional inference problems
◮ Arose from the study of SoS lower bounds, pseudo-calibration [Barak-Hopkins-Kelner-Kothari-Moitra-Potechin '16, Hopkins-Steurer '17, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer '17, Hopkins PhD thesis '18]
◮ Idea: look for a low-degree polynomial (of Y) that distinguishes P (spiked tensor) and Q (pure noise)

    max_{f : deg f ≤ D}  E_{Y∼P}[f(Y)] / sqrt(E_{Y∼Q}[f(Y)²])   is   O(1) ⇒ "hard",   ω(1) ⇒ "easy"

◮ Take deg-D polynomials as a proxy for n^{Θ̃(D)}-time algorithms

For more, see the survey Kunisky-W.-Bandeira, "Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio", arXiv:1907.11636
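A note on why this is "a relatively simple calculation" (this step is from the cited survey, not spelled out on the slide): the maximization above has a closed form in terms of the likelihood ratio L = dP/dQ,

    max_{f : deg f ≤ D} E_{Y∼P}[f(Y)] / sqrt(E_{Y∼Q}[f(Y)²]) = ‖L^{≤D}‖_Q,

where L^{≤D} is the projection of L onto polynomials of degree at most D in L²(Q), and the optimizer is f = L^{≤D}. So the prediction reduces to computing the norm of a low-degree projection of the likelihood ratio.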

Our Contributions

◮ We give a hierarchy of increasingly powerful BP/AMP-type algorithms: level ℓ requires n^{O(ℓ)} time
  ◮ Analogous to SoS hierarchy
◮ We prove that these algorithms match the performance of SoS
  ◮ Both for poly-time and for subexponential-time tradeoff
◮ This refines and "redeems" the statistical physics approach to algorithm design
◮ Our algorithms and analysis are simpler than prior work
◮ This talk: even-order tensors only
◮ Similar results for refuting random XOR formulas

Motivating the Algorithm: Belief Propagation / AMP

General setup: unknown signal x ∈ {±1}^n, observed data Y

Want to understand the posterior Pr[x | Y]

Find a distribution µ over {±1}^n minimizing the free energy F(µ) = E(µ) − S(µ)
◮ "Energy" and "entropy" terms
◮ The unique minimizer is Pr[x | Y]

Problem: need exponentially many parameters to describe µ

BP/AMP: just keep track of the marginals m_i = E[x_i] and minimize a proxy, the Bethe free energy B(m)
◮ Locally minimize B(m) via an iterative update

Generalized BP and Kikuchi Free Energy

Recall: BP/AMP keeps track of marginals m_i = E[x_i] and minimizes Bethe free energy B(m)

Natural higher-order variant:
◮ Keep track of m_i = E[x_i], m_{ij} = E[x_i x_j], ... (up to degree ℓ)
◮ Minimize Kikuchi free energy K_ℓ(m) [Kikuchi '51]

Various ways to locally minimize Kikuchi free energy:
◮ Gradient descent
◮ Generalized belief propagation (GBP) [Yedidia-Freeman-Weiss '03]
◮ We will use a spectral method based on the Kikuchi Hessian

The Kikuchi Hessian

Bethe Hessian approach [Saade-Krzakala-Zdeborová '14]
◮ Recall: want to minimize B(m) with respect to m = {m_i}
◮ Trivial "uninformative" stationary point m* where ∇B(m) = 0
◮ Bethe Hessian matrix H_{ij} = ∂²B / ∂m_i ∂m_j evaluated at m = m*
◮ Algorithm: compute the bottom eigenvector of H
◮ Why: best direction of local improvement
◮ Spectral method with performance essentially as good as BP for community detection (a concrete sketch follows below)

Our approach: Kikuchi Hessian
◮ Bottom eigenvector of the Hessian of K(m) with respect to the moments m = {m_i, m_{ij}, ...}
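As a concrete illustration of the Bethe Hessian approach (for community detection on a graph, not for tensor PCA), here is a minimal sketch; the closed form H(r) = (r² − 1)I − rA + D and the choice r ≈ sqrt(average degree) follow Saade-Krzakala-Zdeborová '14, and the helper name is ours.

```python
import numpy as np

def bethe_hessian(A, r):
    """Bethe Hessian H(r) = (r^2 - 1) I - r A + D for a graph with adjacency
    matrix A (dense numpy array) and degree matrix D, following
    Saade-Krzakala-Zdeborova '14."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    return (r ** 2 - 1.0) * np.eye(n) - r * A + D

# Usage sketch: with r ~ sqrt(average degree), negative eigenvalues of H(r)
# indicate community structure, and the corresponding bottom eigenvectors
# are used to cluster the vertices (e.g., by the signs of their entries).
# r = np.sqrt(A.sum() / A.shape[0])
# eigvals, eigvecs = np.linalg.eigh(bethe_hessian(A, r))
```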

The Algorithm

Definition (Symmetric Difference Matrix)
Input: an order-p tensor Y = (Y_U)_{|U| = p} (with p even) and an integer ℓ in the range p/2 ≤ ℓ ≤ n − p/2. Define the (n choose ℓ) × (n choose ℓ) matrix (indexed by ℓ-subsets of [n])

    M_{S,T} = Y_{S△T} if |S△T| = p, and 0 otherwise.

◮ This is (approximately) a submatrix of the Kikuchi Hessian
◮ Algorithm: compute the leading eigenvalue/eigenvector of M (code sketch below)
◮ Runtime: n^{O(ℓ)}
◮ The case ℓ = p/2 is "tensor unfolding," which is poly-time and succeeds up to the SoS threshold
◮ ℓ = n^δ gives an algorithm with runtime n^{O(ℓ)} = 2^{n^{δ + o(1)}}
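A brute-force sketch of the symmetric difference matrix and the resulting spectral algorithm, assuming the tensor is given as a dict Y mapping p-element frozensets of {0, ..., n−1} to entries (our representation, not from the paper); this is only practical for small n and ℓ, since the matrix has (n choose ℓ) rows.

```python
import itertools
import numpy as np

def symmetric_difference_matrix(Y, n, p, ell):
    """Build the (n choose ell) x (n choose ell) matrix with
    M[S, T] = Y[S symmetric-difference T] when |S ^ T| = p, and 0 otherwise.

    Y: dict mapping frozensets of size p to the tensor entries Y_U.
    """
    subsets = [frozenset(S) for S in itertools.combinations(range(n), ell)]
    N = len(subsets)
    M = np.zeros((N, N))
    for a, S in enumerate(subsets):
        for b, T in enumerate(subsets):
            D = S ^ T                        # symmetric difference S △ T
            if len(D) == p:
                M[a, b] = Y[D]
    return M, subsets

def kikuchi_spectral_algorithm(Y, n, p, ell):
    """Return the maximum eigenvalue (the test statistic to threshold) and the
    corresponding eigenvector (whose entry v_S estimates prod_{i in S} x_i)."""
    M, subsets = symmetric_difference_matrix(Y, n, p, ell)
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1], subsets
```

In the spiked model from earlier in the talk, one would populate Y with Y_U = λ · ∏_{i∈U} x_i plus standard Gaussian noise for each p-subset U, then threshold the returned eigenvalue (detection) or read the signal off the eigenvector (recovery).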

Intuition for the Symmetric Difference Matrix

Recall: M_{S,T} = 1[|S△T| = p] · Y_{S△T}, where |S| = |T| = ℓ

Compute the top eigenvector via power iteration: v ← Mv (sketch below)
◮ v ∈ R^{(n choose ℓ)}, where v_S is an estimate of x_S := ∏_{i∈S} x_i

Expand the formula v ← Mv:

    v_S ← Σ_{T : |S△T| = p} Y_{S△T} v_T

◮ Recall: Y_{S△T} is a noisy measurement of x_{S△T}
◮ So Y_{S△T} v_T is T's opinion about x_S

This is a message-passing algorithm among sets of size ℓ
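A minimal power-iteration sketch on the matrix M built in the earlier code sketch (generically this converges to the eigenvector whose eigenvalue is largest in magnitude, which is all the intuition above needs):

```python
import numpy as np

def power_iteration(M, iters=200, seed=0):
    """Repeatedly apply v <- M v and renormalize. Coordinate-wise this is
    v_S <- sum over T with |S ^ T| = p of Y_{S ^ T} * v_T, i.e. each set T
    passes its "opinion" about x_S to S."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v
```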

Analysis

Simplest statistical task: detection
◮ Distinguish between λ = λ̄ (spiked tensor) and λ = 0 (noise)

Algorithm: given Y, build the matrix M_{S,T} = 1[|S△T| = p] · Y_{S△T}, threshold the maximum eigenvalue

Key step: bound the spectral norm ‖M‖ when Y ∼ i.i.d. N(0, 1)

Theorem (Matrix Chernoff Bound [Oliveira '10, Tropp '10])
Let M = Σ_i z_i A_i, where z_i ∼ N(0, 1) independently and {A_i} is a finite sequence of fixed symmetric d × d matrices. Then, for all t ≥ 0,

    P(‖M‖ ≥ t) ≤ 2d · exp(−t² / (2σ²)),   where σ² = ‖Σ_i (A_i)²‖.

In our case, Σ_i (A_i)² is a multiple of the identity (a short check follows below)
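To make the last point concrete (the constant here is our own check, implicit on the slide): write M = Σ_{|U| = p} Y_U A_U with (A_U)_{S,T} = 1[S△T = U]. If S△T = U then T = S△U is determined, and |T| = ℓ exactly when |S ∩ U| = p/2, so

    (A_U²)_{S,S'} = 1[S = S'] · 1[|S ∩ U| = p/2].

Summing over the p-subsets U (choose p/2 elements of U inside S and p/2 outside) gives

    Σ_{|U| = p} A_U² = (ℓ choose p/2) · (n − ℓ choose p/2) · I,   so   σ² = (ℓ choose p/2)(n − ℓ choose p/2).

With d = (n choose ℓ), the theorem then gives ‖M‖ ≤ sqrt(2σ² log(2d/η)) with probability at least 1 − η, for any η > 0.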

Comparison to Prior Work

SoS approach: given a noise tensor Y, want to certify (prove) an upper bound on the tensor injective norm

    ‖Y‖_inj := max_{‖x‖ = 1} |⟨Y, x^{⊗p}⟩|

Spectral certification: find an n^ℓ × n^ℓ matrix M such that (x^{⊗ℓ})^⊤ M (x^{⊗ℓ}) = ⟨Y, x^{⊗p}⟩^{2ℓ/p}, and so ‖Y‖_inj ≤ ‖M‖^{p/(2ℓ)}
◮ Each entry of M is a degree-(2ℓ/p) polynomial in Y
◮ Analysis: trace moment method (complicated) [Raghavendra-Rao-Schramm '16, Bhattiprolu-Guruswami-Lee '16]

Our method: instead find M (symm. diff. matrix) such that (x^{⊗ℓ})^⊤ M (x^{⊗ℓ}) = ⟨Y, x^{⊗p}⟩ ‖x‖^{2ℓ−p}, and so ‖Y‖_inj ≤ ‖M‖
◮ Each entry of M is a degree-1 polynomial in Y
