Tensor estimation with structured priors
Clément Luneau, Nicolas Macris
Laboratoire de Théorie des Communications, EPFL, Switzerland
June 29, 2020
Statistical model for tensor estimation

Noisy observations of a symmetric rank-one tensor:

Matrix:  ∀ 1 ≤ i ≤ j ≤ n:  Y_{ij} = √(λ/n) X_i X_j + Z_{ij}   ⇔   Y = √(λ/n) X Xᵀ + Z
Tensor:  ∀ 1 ≤ i ≤ j ≤ k ≤ n:  Y_{ijk} = (√λ/n) X_i X_j X_k + Z_{ijk}   ⇔   Y = (√λ/n) X^{⊗3} + Z

• n-dimensional spike X ∈ ℝⁿ
• additive white Gaussian noise Z_{ij(k)} i.i.d. ∼ N(0, 1)
• λ > 0 ∝ signal-to-noise ratio

Goal: estimate the spike X and/or the underlying rank-one tensor X^{⊗3}
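For concreteness, a minimal sampling sketch of the order-3 observation model. This is our illustration, not code from the talk; the i.i.d. N(0,1) prior and the dimension/SNR values are arbitrary choices.

```python
# Sample Y = (sqrt(lambda)/n) X^{(x)3} + Z for an i.i.d. Gaussian spike.
import numpy as np

rng = np.random.default_rng(0)
n, snr = 50, 2.0                      # dimension n and lambda (assumed values)
X = rng.standard_normal(n)            # spike; here an i.i.d. N(0,1) prior
Z = rng.standard_normal((n, n, n))    # additive white Gaussian noise
Y = (np.sqrt(snr) / n) * np.einsum("i,j,k->ijk", X, X, X) + Z
# Only the entries with i <= j <= k are observed in the model; the full
# array above simply stores them redundantly for convenience.
```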
High-dimensional regime for i.i.d. prior

• i.i.d. prior on the spike: X_1, X_2, …, X_n i.i.d. ∼ P_X
• Precise formula¹ for MMSE := (1/n) E‖X − E[X | Y]‖² when n → +∞
• Performance of the Approximate Message Passing algorithm precisely tracked²

¹ Lelarge and Miolane, “Fundamental limits of symmetric low-rank matrix estimation”.
² Lesieur et al., “Statistical and computational phase transitions in spiked tensor estimation”.
Algorithmic gap for low sparsity prior

Bernoulli-Rademacher prior³:  P_X(1) = P_X(−1) = ρ/2,  P_X(0) = 1 − ρ

Algorithmic gap even for matrix estimation if the sparsity ρ is low (below ρ = 0.05)
Structured prior

Data in nature has structure:
• High-dimensional signal effectively lies on a low-dimensional manifold
• Compressed sensing: signal to estimate is sparse in some domain

Recently³, use of generative models to encode structure; proposed by Aubin et al. in the context of matrix estimation:

X_i := φ( (WS)_i / √p )

• S: p-dimensional latent vector, S_1, …, S_p i.i.d. ∼ P_S
• W: sensing matrix, W_{ij} i.i.d. ∼ N(0, 1)
• φ: (nonlinear) activation function

³ Aubin et al., “The spiked matrix model with generative priors”.
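A hedged sketch of sampling this generative prior; phi = tanh is an arbitrary choice for illustration (the slides only require a fixed, possibly nonlinear, activation), and the values of n and p are ours.

```python
# Sample the structured spike X = phi(W S / sqrt(p)).
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 100                        # so alpha = n / p = 2 (assumed values)
S = rng.standard_normal(p)             # latent vector, P_S = N(0, 1)
W = rng.standard_normal((n, p))        # Gaussian sensing matrix
phi = np.tanh                          # any (nonlinear) activation
X = phi(W @ S / np.sqrt(p))            # n-dimensional structured spike
```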
Matrix estimation with generative priors

High-dimensional limit n → +∞ with fixed ratio α := n/p

“No algorithmic gap with generative-model priors”⁴

Figure 1: MMSE as a function of Δ = 1/λ for linear (left), sign (centre) and ReLU (right) activations. Figure by Aubin et al.

⁴ Aubin et al., “The spiked matrix model with generative priors”.
Tensor estimation with generative priors

Can we leverage generative priors in tensor estimation to obtain a finite algorithmic gap for a centered prior?

In this talk:
1. Formulas for the asymptotic mutual information & MMSE
2. Visualization of MMSE(X^{⊗3}) for different settings
3. Limit α := n/p → 0: simplified equivalent model with i.i.d. prior
Asymptotic normalized mutual information

Theorem: asymptotic normalized mutual information⁵

∀ 1 ≤ i ≤ j ≤ k ≤ n:  Y_{ijk} = (√λ/n) X_i X_j X_k + Z_{ijk}   with   ∀ i: X_i := φ( (WS)_i / √p )

lim_{n→+∞, n/p→α} I(X; Y | W)/n = inf_{q_x ∈ [0,ρ_x]} inf_{q_s ∈ [0,ρ_s]} sup_{r_s ≥ 0} ψ_{λ,α}(q_x, q_s, r_s)

with potential function

ψ_{λ,α}(q_x, q_s, r_s) := I( U ; √(λ q_x²/2) φ(√(ρ_s − q_s) U + √q_s V) + Ẑ | V )
                          + (1/α) [ I( S ; √r_s S + Z̃ ) − r_s (ρ_s − q_s)/2 ]
                          + (λ/12) (ρ_x − q_x)² (ρ_x + 2 q_x)

where S ∼ P_S; U, V, Z̃, Ẑ i.i.d. ∼ N(0, 1); ρ_s := E[S²] and ρ_x := E[φ(√ρ_s U)²].

⁵ Luneau and Macris, Tensor estimation with structured priors.
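As a sanity check, here is a sketch (ours, not from the paper) that evaluates ψ_{λ,α} and the inf-inf-sup numerically in the one case where both mutual informations are closed-form: φ(x) = x and P_S = N(0,1), so each I(·;·) is a scalar Gaussian-channel mutual information ½ log(1 + snr · variance). The grids and the truncation of the sup over r_s are crude illustration-only choices.

```python
# Evaluate psi_{lambda,alpha} for phi(x) = x, P_S = N(0,1) (closed forms).
import numpy as np

def psi(qx, qs, rs, lam, alpha, rho=1.0):
    # For phi(x) = x we have rho_x = rho_s = rho.
    snr_x = lam * qx**2 / 2                        # effective SNR, x-channel
    i_x = 0.5 * np.log1p(snr_x * (rho - qs))       # I(U; . | V), linear phi
    i_s = 0.5 * np.log1p(rs * rho)                 # I(S; sqrt(rs) S + Z)
    return (i_x + (i_s - rs * (rho - qs) / 2) / alpha
            + lam / 12 * (rho - qx)**2 * (rho + 2 * qx))

# Crude grid search for inf_qx inf_qs sup_rs (sup over rs truncated at 50):
lam, alpha = 5.0, 1.0
grid = np.linspace(0.0, 1.0, 51)
rgrid = np.linspace(0.0, 50.0, 201)
best = min((max(psi(qx, qs, rs, lam, alpha) for rs in rgrid), qx, qs)
           for qx in grid for qs in grid)
print("inf inf sup psi ~ %.4f at qx=%.2f, qs=%.2f" % best)
```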
Minimum mean square error

Theorem: asymptotic tensor MMSE⁶

Q*_x(λ) := { q*_x ∈ [0, ρ_x] : inf_{q_s ∈ [0,ρ_s]} sup_{r_s ≥ 0} ψ_{λ,α}(q*_x, q_s, r_s)
                               = inf_{q_x ∈ [0,ρ_x]} inf_{q_s ∈ [0,ρ_s]} sup_{r_s ≥ 0} ψ_{λ,α}(q_x, q_s, r_s) }

For almost every λ > 0, Q*_x(λ) = { q*_x(λ) } is a singleton and

lim_{n→+∞, n/p→α} (1/n³) E‖ X^{⊗3} − E[X^{⊗3} | Y, W] ‖² = ρ_x³ − q*_x(λ)³

⁶ Luneau and Macris, Tensor estimation with structured priors.
Algorithmic gap

critical point equation ∇ψ_{λ,α}(q_x, q_s, r_s) = 0
⇕
fixed point equation (q_x, q_s, r_s) = F_{λ,α}(q_x, q_s, r_s)

• Fixed point with lowest potential ψ_{λ,α}(q_x, q_s, r_s) used to compute the asymptotic MMSE
• Uninformative fixed point q_x = 0 iff φ is an odd function and P_S is centered
• Strongly stable uninformative fixed point ⇒ infinite algorithmic gap persists (see the iteration sketch below)
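A minimal sketch of iterating the fixed-point map, again specialised to φ(x) = x and P_S = N(0,1); the closed-form updates below are our own derivation from ∇ψ = 0 under these assumptions, not formulas from the talk. Started near q_x = 0 (the uninformative point, which exists here since φ is odd and P_S is centered), the iteration does not move, which is the algorithmic-gap phenomenon.

```python
# State-evolution-style iteration of (qx, qs, rs) -> F_{lam,alpha}(...).
import numpy as np

def state_evolution(lam, alpha, qx0=1e-6, rho=1.0, n_iter=200):
    qx, qs, rs = qx0, 0.0, 0.0
    for _ in range(n_iter):
        snr_x = lam * qx**2 / 2                       # SNR of the x-channel
        qx = rho - (rho - qs) / (1 + snr_x * (rho - qs))  # rho_x - mmse_x
        rs = alpha * snr_x / (1 + snr_x * (rho - qs))
        qs = rho - rho / (1 + rs * rho)               # Gaussian MMSE for S
    return qx, qs, rs

# From an uninformative start the overlaps stay ~0 even at large lambda:
print(state_evolution(lam=20.0, alpha=1.0))
```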
Asymptotic MMSE in the plane (α, λ)

Information-theoretic threshold λ_IT decreases with the ratio α of signal-to-latent space dimensions.

Figure 2: Asymptotic MMSE(X^{⊗3}) as a function of (α, λ) for φ(x) = x. Left: P_S ∼ N(0, 1). Right: P_S ∼ ½(δ_1 + δ_{−1}).
Asymptotic MMSE

Information-theoretic threshold λ_IT decreases with the ratio α of signal-to-latent space dimensions.

Figure 3: Asymptotic MMSE(X^{⊗3}) as a function of λ for φ(x) = sign(x), P_S ∼ N(0, 1) and different values of α. Limit α → 0⁺ given by the tensor estimation problem with i.i.d. Rademacher prior.
Limit of vanishing signal-to-latent space dimensions

Limit α → 0⁺ of the asymptotic mutual information:

lim_{α→0⁺} lim_{n→+∞, n/p→α} I(X; Y | W)/n
  = inf_{q_x ∈ [0,ρ_x]} { I( U ; √(λ q_x²/2) φ(√(ρ_s − (E S)²) U + |E S| V) + Ẑ | V ) + (λ/12)(ρ_x − q_x)²(ρ_x + 2 q_x) }

Same asymptotic mutual information as

Ỹ_{ijk} = (√λ/n) X̃_i X̃_j X̃_k + Z̃_{ijk},  1 ≤ i ≤ j ≤ k ≤ n,

with X̃_i = φ(√(ρ_s − (E S)²) U_i + |E S| V_i); U, V i.i.d. ∼ N(0, I_n); V known.

• E_{S∼P_S}[S] = 0: i.i.d. prior X̃_1, …, X̃_n i.i.d. ∼ φ(N(0, ρ_s))
• E_{S∼P_S}[S] ≠ 0: side information V, proof in⁷ easily adapted

⁷ Lelarge and Miolane, “Fundamental limits of symmetric low-rank matrix estimation”.
Limit of vanishing signal-to-latent space dimensions

“No algorithmic gap with generative-model priors”⁸?

1. Similar behavior for matrix estimation with generative priors
2. We can choose φ to obtain any equivalent i.i.d. prior φ(N(0, ρ_s)) when α → 0⁺, including a prior exhibiting an algorithmic gap

Algorithmic gap for matrix estimation with generative prior:

X = φ(WS/√p) with S_1, …, S_p i.i.d. ∼ P_S centered with unit variance, and

φ(x) = −1 if x < −ε,  0 if −ε ≤ x ≤ ε,  +1 if x > ε;   ε such that 2 ∫_{−∞}^{−ε} e^{−x²/2}/√(2π) dx = ρ

Equivalent to an i.i.d. Bernoulli-Rademacher prior when α → 0⁺:

φ(N(0, ρ_s)) ∼ (1 − ρ) δ_0 + (ρ/2) δ_1 + (ρ/2) δ_{−1}

⁸ Aubin et al., “The spiked matrix model with generative priors”.
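A quick empirical check (our addition) that this threshold activation does produce the Bernoulli-Rademacher distribution; ρ = 0.04 and the sample size are arbitrary illustration values.

```python
# Check phi(N(0,1)) ~ (1-rho) delta_0 + rho/2 delta_{+1} + rho/2 delta_{-1}.
import numpy as np
from scipy.stats import norm

rho = 0.04
eps = norm.ppf(1 - rho / 2)             # so that P(|N(0,1)| > eps) = rho
x = np.random.default_rng(2).standard_normal(1_000_000)
phi_x = np.sign(x) * (np.abs(x) > eps)  # -1 / 0 / +1 threshold activation
print("P(0) ~", np.mean(phi_x == 0), " P(+1) ~", np.mean(phi_x == 1))
```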
Limit of vanishing signal-to-latent space dimensions

However, the regime α → 0⁺ does not correspond to a high-dimensional signal X lying on a lower p-dimensional space.

Does the algorithmic gap vanish when α increases?
References

Aubin, Benjamin et al. “The spiked matrix model with generative priors”. In: Advances in Neural Information Processing Systems 32. 2019, pp. 8366–8377.

Lelarge, Marc and Léo Miolane. “Fundamental limits of symmetric low-rank matrix estimation”. In: Probability Theory and Related Fields 173.3 (2019). ISSN: 1432-2064. DOI: 10.1007/s00440-018-0845-x.

Lesieur, Thibault et al. “Statistical and computational phase transitions in spiked tensor estimation”. In: 2017 IEEE International Symposium on Information Theory (ISIT) (2017). DOI: 10.1109/isit.2017.8006580.

Luneau, Clément and Nicolas Macris. Tensor estimation with structured priors. 2020. arXiv: 2006.14989 [cs.IT].