Sparse Regression Codes



  1. Sparse Regression Codes
     Andrew Barron (Yale University) and Ramji Venkataramanan (University of Cambridge)
     Joint work with Antony Joseph, Sanghee Cho, Cynthia Rush, Adam Greig, Tuhin Sarkar, Sekhar Tatikonda
     ISIT 2016

  2. Part II of the tutorial:
     • Approximate message passing (AMP) decoding
     • Power-allocation schemes to improve finite block-length performance
     (Joint work with Cynthia Rush and Adam Greig)

  3. SPARC Decoding
     The design matrix A has n rows and L sections of M columns each.
     β^T = [ 0, …, √(nP_1), …, 0 | 0, …, √(nP_2), …, 0 | … | 0, …, √(nP_L), …, 0 ],
     i.e. β has exactly one non-zero entry per section, with value √(nP_ℓ) in section ℓ.
     Channel output: y = Aβ + ε.
     Want an efficient algorithm to decode β from y.
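The structure on this slide is easy to set up numerically. Below is a minimal Python/NumPy sketch; the sizes n, M, L, the noise variance, and the flat power allocation are illustrative assumptions, not values from the talk.

# Minimal sketch of the SPARC structure above (all sizes and the flat power
# allocation are illustrative assumptions, not values from the talk).
import numpy as np

n, M, L = 400, 64, 32                 # assumed code length and section sizes (N = M*L columns)
sigma2 = 1.0                          # assumed noise variance
P = np.full(L, 2.0 / L)               # assumed flat power allocation; sum(P) = total power

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, M * L))   # i.i.d. N(0, 1/n) design matrix

beta = np.zeros(M * L)
for l in range(L):
    j = rng.integers(M)                    # index of the non-zero entry in section l
    beta[l * M + j] = np.sqrt(n * P[l])    # its value is sqrt(n * P_l)

y = A @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)  # channel output y = A beta + noise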

  4. AMP for Compressed Sensing
     • Approximation of loopy belief propagation for dense graphs
       [Donoho-Maleki-Montanari ’09, Rangan ’11, Krzakala et al. ’11, …]
     • Compressed sensing (CS): want to recover β from y = Aβ + ε,
       where A is an n × N measurement matrix and β is i.i.d. with a known prior.
     • In CS, we often solve LASSO: β̂ = argmin_β ‖y − Aβ‖²₂ + λ‖β‖₁
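The LASSO objective above can also be handed to an off-the-shelf solver as a baseline. The sketch below is one possible way to do this with scikit-learn, reusing A and y from the previous sketch; scikit-learn's Lasso minimizes (1/(2n))·‖y − Aw‖² + alpha·‖w‖₁, so alpha corresponds to λ only up to that scaling, and alpha = 0.1 is an arbitrary choice.

# Minimal sketch: solve the LASSO objective from this slide with scikit-learn,
# reusing A and y from the sketch above. alpha = 0.1 is an arbitrary assumed
# choice, and it matches lambda only up to scikit-learn's 1/(2n) scaling.
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000)
beta_lasso = lasso.fit(A, y).coef_          # LASSO estimate of beta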

  5-7. Min-Sum Message Passing for LASSO
     [Factor graph: variable nodes β_1, …, β_N connected to factor nodes for each measurement y_i]
     Want to compute β̂ = argmin_β ∑_{i=1}^n (y_i − (Aβ)_i)² + λ ∑_{j=1}^N |β_j|
     For j = 1, …, N and i = 1, …, n, pass messages on the factor graph:
       M^t_{j→i}(β_j) = λ|β_j| + ∑_{i′ ∈ [n]∖i} M̂^{t−1}_{i′→j}(β_j)
       M̂^t_{i→j}(β_j) = min_{β∖β_j} [ (y_i − (Aβ)_i)² + ∑_{j′ ∈ [N]∖j} M^t_{j′→i}(β_{j′}) ]

  8. The exact min-sum messages are
       M^t_{j→i}(β_j) = λ|β_j| + ∑_{i′ ∈ [n]∖i} M̂^{t−1}_{i′→j}(β_j)
       M̂^t_{i→j}(β_j) = min_{β∖β_j} [ (y_i − (Aβ)_i)² + ∑_{j′ ∈ [N]∖j} M^t_{j′→i}(β_{j′}) ]
     But computing these messages is infeasible:
     • Each message needs to be computed for all β_j ∈ ℝ
     • There are nN such messages
     Further, the factor graph is not anything like a tree!

  9-10. Quadratic Approximation of Messages
     Each message is approximated by two numbers:
       r^t_{i→j} = y_i − ∑_{j′ ∈ [N]∖j} A_{ij′} β^t_{j′→i}
       β^{t+1}_{j→i} = η_t( ∑_{i′ ∈ [n]∖i} A_{i′j} r^t_{i′→j} )
     • For LASSO, η_t is the soft-thresholding operator (see the sketch below)
     • We still have nN messages in each step
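For reference, a minimal sketch of the soft-thresholding operator mentioned in the first bullet; the threshold theta is left as a free parameter here (in AMP it is tied to the effective noise level τ_t).

# Minimal sketch of the soft-thresholding denoiser used by AMP for LASSO.
# theta is a free parameter in this sketch; in AMP it is tied to the noise
# level tau_t of the effective observation beta + tau_t * Z.
import numpy as np

def soft_threshold(s, theta):
    """Shrink each entry of s toward zero by theta; entries in [-theta, theta] become 0."""
    return np.sign(s) * np.maximum(np.abs(s) - theta, 0.0)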

  11. Rewriting the messages with sums over all nodes:
       r^t_{i→j} = y_i − ∑_{j′ ∈ [N]} A_{ij′} β^t_{j′→i} + A_{ij} β^t_{j→i}
       β^{t+1}_{j→i} = η_t( ∑_{i′ ∈ [n]} A_{i′j} r^t_{i′→j} − A_{ij} r^t_{j→i} )
      Using Taylor approximations of these updates gives the AMP algorithm on the next slide.

  12-13. The AMP Algorithm for LASSO
     [Donoho-Maleki-Montanari ’09, Rangan ’11, Krzakala et al. ’11, …]
       r^t = y − Aβ^t + r^{t−1} ‖β^t‖_0 / n
       β^{t+1} = η_t( A^T r^t + β^t )
     • AMP iteratively produces estimates β^0 = 0, β^1, …, β^t, …
     • r^t is the ‘modified residual’ after step t
     • η_t denoises the effective observation to produce β^{t+1}
     The momentum term in r^t ensures that asymptotically
       A^T r^t + β^t ≈ β + τ_t Z_t,  where Z_t ∼ N(0, I)
     ⇒ The effective observation A^T r^t + β^t is the true signal observed in independent Gaussian noise.
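A minimal NumPy sketch of this iteration, reusing soft_threshold from the earlier sketch. The threshold schedule theta_t = alpha·tau_t with tau_t estimated as ‖r^t‖/√n is a common heuristic and an assumption of this sketch, not a tuning prescribed in the talk.

# Minimal sketch of the AMP iteration for LASSO shown above, reusing the
# soft_threshold function from the earlier sketch. The threshold schedule
# theta_t = alpha * tau_t, with tau_t estimated as ||r^t|| / sqrt(n), is an
# assumed (common) choice.
import numpy as np

def amp_lasso(y, A, alpha=1.5, n_iter=30):
    n, N = A.shape
    beta = np.zeros(N)
    r = np.zeros(n)
    for _ in range(n_iter):
        # residual plus the momentum (Onsager) term  r^{t-1} * ||beta^t||_0 / n
        r = y - A @ beta + r * np.count_nonzero(beta) / n
        tau = np.linalg.norm(r) / np.sqrt(n)     # estimate of the effective noise level
        s = A.T @ r + beta                       # effective observation ~ beta + tau * Z
        beta = soft_threshold(s, alpha * tau)    # denoise
    return beta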

  14. AMP for SPARC Decoding
     y = Aβ + ε,  ε i.i.d. ∼ N(0, σ²), with β the sectioned SPARC message vector
     (one non-zero per section, value √(nP_ℓ) in section ℓ).
     SPARC decoding is a different optimization problem from LASSO:
     • Want argmin_β ‖y − Aβ‖² s.t. β is a SPARC message
     • β has one non-zero per section, section size M → ∞
     • The undersampling ratio n/(ML) → 0
     Let us revisit the (approximated) min-sum updates.

  15-16. Approximated Min-Sum
       r^t_{i→j} = y_i − ∑_{j′ ∈ [N]∖j} A_{ij′} β^t_{j′→i}
       β^{t+1}_{j→i} = η_{t,j}( stat_{t,j} ),  where  stat_{t,j} = ∑_{i′ ∈ [n]∖i} A_{i′j} r^t_{i′→j}
     If, for each j ∈ [N], stat_{t,j} is approximately distributed as β_j + τ_t Z_{t,j},
     then the Bayes-optimal choice of η_{t,j} is given on the next slide.

  17-18. Bayes Optimal η_t
     η_t( stat_t = s ) = E[ β | β + τ_t Z = s ]:
       η_{t,j}(s) = √(nP_ℓ) · exp( s_j √(nP_ℓ) / τ_t² ) / ∑_{j′ ∈ sec ℓ} exp( s_{j′} √(nP_ℓ) / τ_t² ),   j ∈ section ℓ.
     Note that β^{t+1} is
     • the MMSE estimate of β given the observation β + τ_t Z_t
     • ∝ the posterior probability of entry j of β being non-zero
     As before, rewrite the messages with sums over all nodes,
       r^t_{i→j} = y_i − ∑_{j′ ∈ [N]} A_{ij′} β^t_{j′→i} + A_{ij} β^t_{j→i}
       β^{t+1}_{j→i} = η_t( ∑_{i′ ∈ [n]} A_{i′j} r^t_{i′→j} − A_{ij} r^t_{j→i} )
     and use Taylor approximations to obtain the AMP decoder.
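A minimal sketch of this denoiser: within each section, a softmax over the effective observation scaled by √(nP_ℓ). It reuses the n, M, L, P from the first sketch; subtracting the per-section maximum is only an implementation detail for numerical stability, not part of the slide.

# Minimal sketch of the Bayes-optimal denoiser above, using the n, M, L, P
# from the first sketch.
import numpy as np

def eta(s, tau2, n, M, L, P):
    """Entry j of section l gets sqrt(n P_l) times the posterior probability
    that it is the non-zero entry of that section."""
    out = np.empty_like(s)
    for l in range(L):
        sec = slice(l * M, (l + 1) * M)
        c = np.sqrt(n * P[l])
        x = s[sec] * c / tau2
        w = np.exp(x - x.max())          # softmax weights, stabilized by subtracting the max
        out[sec] = c * w / w.sum()
    return out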

  19-20. AMP Decoder
     Set β^0 = 0. For t ≥ 0:
       r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1}) ( P − ‖β^t‖²/n )
       β^{t+1}_j = η_{t,j}( A^T r^t + β^t ),   for j = 1, …, ML
     with
       η_{t,j}(s) = √(nP_ℓ) · exp( s_j √(nP_ℓ) / τ_t² ) / ∑_{j′ ∈ sec ℓ} exp( s_{j′} √(nP_ℓ) / τ_t² ),   j ∈ section ℓ.
     β^{t+1} is the MMSE estimate of β given that β^t + A^T r^t ≈ β + τ_t Z_t.
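A minimal sketch of the full decoder, reusing A, y, n, M, L, P and the eta function from the earlier sketches. Tracking τ_t² by the empirical estimate ‖r^t‖²/n (rather than a state-evolution recursion) is an assumption of this sketch, and the final per-section hard decision is added for completeness.

# Minimal sketch of the AMP decoder above, reusing A, y, n, M, L, P and eta
# from the earlier sketches. tau_t^2 is tracked by the empirical estimate
# ||r^t||^2 / n, an assumption of this sketch.
import numpy as np

def amp_sparc_decode(y, A, n, M, L, P, n_iter=30):
    Ptot = P.sum()
    beta_t = np.zeros(M * L)
    r = np.zeros_like(y)
    tau2 = 1.0 + Ptot                         # assumed initial value; unused while r = 0
    for _ in range(n_iter):
        r = y - A @ beta_t + (r / tau2) * (Ptot - beta_t @ beta_t / n)
        tau2 = r @ r / n                      # empirical estimate of tau_t^2
        beta_t = eta(A.T @ r + beta_t, tau2, n, M, L, P)
    # hard decision: in each section, place sqrt(n * P_l) at the largest entry
    beta_hat = np.zeros_like(beta_t)
    for l in range(L):
        sec = slice(l * M, (l + 1) * M)
        beta_hat[l * M + np.argmax(beta_t[sec])] = np.sqrt(n * P[l])
    return beta_hat

# decoded = amp_sparc_decode(y, A, n, M, L, P)   # decode the y generated in the first sketch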

  21. The statistic β^t + A^T r^t
     Suppose r^t = y − Aβ^t. Then
       β^t + A^T r^t = β + A^T ε + (I − A^T A)(β^t − β),
     where the entries of A^T ε are ≈ N(0, σ²) and the entries of (I − A^T A) are ≈ N(0, 1/n).
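Since the decomposition on this slide is a purely algebraic identity, it can be checked numerically. The sketch below reuses A, beta and y from the first sketch and picks an arbitrary current estimate for β^t.

# Numerical check of the algebraic identity above, reusing A, beta, y from the
# first sketch; beta_curr is an arbitrary assumed current estimate beta^t.
import numpy as np

rng = np.random.default_rng(1)
eps = y - A @ beta                                   # the realised channel noise
beta_curr = beta + rng.normal(0.0, 1.0, size=beta.size)
lhs = beta_curr + A.T @ (y - A @ beta_curr)          # beta^t + A^T r^t with r^t = y - A beta^t
rhs = beta + A.T @ eps + (np.eye(M * L) - A.T @ A) @ (beta_curr - beta)
print(np.allclose(lhs, rhs))                         # prints True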
