Sparse Regression Codes
Andrew Barron (Yale University) and Ramji Venkataramanan (University of Cambridge)
Joint work with Antony Joseph, Sanghee Cho, Cynthia Rush, Adam Greig, Tuhin Sarkar, Sekhar Tatikonda
ISIT 2016
Part II of the tutorial:
• Approximate message passing (AMP) decoding
• Power-allocation schemes to improve finite block-length performance
(Joint work with Cynthia Rush and Adam Greig)
SPARC Decoding

[Figure: the design matrix A (n rows, ML columns) is split into L sections of M columns each; β^T has exactly one non-zero entry per section, equal to √(nP_ℓ) in section ℓ.]

Channel output: y = Aβ + ε.

Want an efficient algorithm to decode β from y.
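For concreteness, here is a minimal Python/NumPy sketch of this setup. All parameter values (L, M, n, P, σ²) and the flat power allocation below are illustrative choices, not taken from the slides.

```python
import numpy as np

# Illustrative parameters (not from the slides).
L, M, n = 32, 64, 400          # L sections of M columns each; n channel uses
N = L * M
P, sigma2 = 1.0, 1.0           # total power P, noise variance sigma^2
P_l = np.full(L, P / L)        # flat power allocation across sections (an assumption)

# Gaussian design matrix with i.i.d. N(0, 1/n) entries.
A = np.random.randn(n, N) / np.sqrt(n)

# SPARC message: one non-zero per section, value sqrt(n * P_l) in section l.
beta = np.zeros(N)
for l in range(L):
    j = l * M + np.random.randint(M)   # position of the transmitted column in section l
    beta[j] = np.sqrt(n * P_l[l])

# Channel output y = A beta + noise.
y = A @ beta + np.sqrt(sigma2) * np.random.randn(n)
```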
AMP for Compressed Sensing
• Approximation of loopy belief propagation for dense graphs [Donoho-Montanari-Maleki '09, Rangan '11, Krzakala et al. '11, ...]
• Compressed sensing (CS): want to recover β from y = Aβ + ε, where A is an n × N measurement matrix and β is i.i.d. with a known prior.

[Figure: bipartite factor graph between the n measurements and the variables β_1, β_2, ..., β_N.]

In CS, we often solve LASSO:
\[
\hat\beta = \arg\min_\beta \; \|y - A\beta\|_2^2 + \lambda \|\beta\|_1
\]
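A small sketch of this CS setup and the LASSO objective; the dimensions, sparsity level, and λ below are arbitrary illustrative values.

```python
import numpy as np

# Illustrative compressed-sensing instance.
n, N, lam = 100, 500, 0.1
A = np.random.randn(n, N) / np.sqrt(n)
beta_true = np.zeros(N)
beta_true[np.random.choice(N, 20, replace=False)] = np.random.randn(20)  # sparse signal
y = A @ beta_true + 0.1 * np.random.randn(n)

def lasso_objective(beta, y, A, lam):
    """||y - A beta||_2^2 + lam * ||beta||_1, the objective minimised by LASSO."""
    return np.sum((y - A @ beta) ** 2) + lam * np.sum(np.abs(beta))
```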
Min-Sum Message Passing for LASSO

Want to compute
\[
\hat\beta = \arg\min_\beta \; \sum_{i=1}^n \big( y_i - (A\beta)_i \big)^2 + \lambda \sum_{j=1}^N |\beta_j|
\]

[Figure: factor graph over β_1, β_2, ..., β_N.]

For j = 1, ..., N and i = 1, ..., n:
\[
\hat M^t_{j \to i}(\beta_j) = \lambda |\beta_j| + \sum_{i' \in [n] \setminus i} M^{t-1}_{i' \to j}(\beta_j)
\]
\[
M^t_{i \to j}(\beta_j) = \min_{\beta \setminus \beta_j} \; \big( y_i - (A\beta)_i \big)^2 + \sum_{j' \in [N] \setminus j} \hat M^t_{j' \to i}(\beta_{j'})
\]
[Figure: factor graph over β_1, β_2, ..., β_N.]

\[
\hat M^t_{j \to i}(\beta_j) = \lambda |\beta_j| + \sum_{i' \in [n] \setminus i} M^{t-1}_{i' \to j}(\beta_j)
\]
\[
M^t_{i \to j}(\beta_j) = \min_{\beta \setminus \beta_j} \; \big( y_i - (A\beta)_i \big)^2 + \sum_{j' \in [N] \setminus j} \hat M^t_{j' \to i}(\beta_{j'})
\]

But computing these messages is infeasible:
— Each message needs to be computed for all β_j ∈ ℝ
— There are nN such messages

Further, the factor graph is not anything like a tree!
Quadratic Approximation of Messages

[Figure: factor graph over β_1, β_2, ..., β_N.]

Messages approximated by two numbers:
\[
r^t_{i \to j} = y_i - \sum_{j' \in [N] \setminus j} A_{ij'}\, \beta^t_{j' \to i},
\qquad
\beta^{t+1}_{j \to i} = \eta_t \Big( \sum_{i' \in [n] \setminus i} A_{i'j}\, r^t_{i' \to j} \Big)
\]

• For LASSO, η_t is the soft-thresholding operator
• We still have nN messages in each step
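A minimal sketch of the soft-thresholding operator referred to above; the threshold `theta` would in practice be tied to the noise level τ_t of the effective observation, a detail not specified on this slide.

```python
import numpy as np

def soft_threshold(x, theta):
    """Entrywise soft-thresholding: shrink x towards zero by theta.
    This is the eta_t used by AMP when the target is the LASSO solution."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)
```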
\[
r^t_{i \to j} = y_i - \sum_{j' \in [N]} A_{ij'}\, \beta^t_{j' \to i} + A_{ij}\, \beta^t_{j \to i}
\]
\[
\beta^{t+1}_{j \to i} = \eta_t \Big( \sum_{i' \in [n]} A_{i'j}\, r^t_{i' \to j} - A_{ij}\, r^t_{j \to i} \Big)
\]

Using Taylor approximations ...
The AMP Algorithm for LASSO
[Donoho-Montanari-Maleki '09, Rangan '11, Krzakala et al. '11, ...]

\[
r^t = y - A\beta^t + r^{t-1}\, \frac{\|\beta^t\|_0}{n},
\qquad
\beta^{t+1} = \eta_t \big( A^T r^t + \beta^t \big)
\]

• AMP iteratively produces estimates β^0 = 0, β^1, ..., β^t, ...
• r^t is the 'modified residual' after step t
• η_t denoises the effective observation to produce β^{t+1}

The momentum term in r^t ensures that asymptotically
\[
A^T r^t + \beta^t \approx \beta + \tau_t Z_t, \qquad Z_t \sim \mathcal{N}(0, I)
\]
⇒ The effective observation A^T r^t + β^t is the true signal observed in independent Gaussian noise.
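A sketch of this iteration in Python/NumPy, using soft-thresholding as η_t. The fixed threshold `theta` and the iteration count are simplifying assumptions; in practice the threshold would track τ_t across iterations.

```python
import numpy as np

def amp_lasso(y, A, theta, num_iter=30):
    """Sketch of the AMP iteration on this slide, with soft-thresholding as eta_t."""
    n, N = A.shape
    beta = np.zeros(N)
    r = np.zeros(n)
    for _ in range(num_iter):
        # Modified residual: ordinary residual plus the momentum (Onsager) term.
        r = y - A @ beta + r * np.count_nonzero(beta) / n
        # Effective observation ~ beta + tau_t * Z; denoise it entrywise.
        s = A.T @ r + beta
        beta = np.sign(s) * np.maximum(np.abs(s) - theta, 0.0)
    return beta
```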
AMP for SPARC Decoding

[Figure: design matrix A (n rows, ML columns) in L sections of M columns; β^T has one non-zero per section, equal to √(nP_ℓ) in section ℓ.]

\[
y = A\beta + \varepsilon, \qquad \varepsilon_i \ \text{i.i.d.} \sim \mathcal{N}(0, \sigma^2)
\]

SPARC decoding is a different optimization problem from LASSO:
• Want argmin_β ‖y − Aβ‖² s.t. β is a SPARC message
• β has one non-zero per section, section size M → ∞
• The undersampling ratio n/(ML) → 0

Let us revisit the (approximated) min-sum updates ...
Approximated Min-Sum

[Figure: factor graph over β_1, β_2, ..., β_N.]

\[
r^t_{i \to j} = y_i - \sum_{j' \in [N] \setminus j} A_{ij'}\, \beta^t_{j' \to i},
\qquad
\beta^{t+1}_{j \to i} = \eta_{t,j} \Big( \underbrace{\sum_{i' \in [n] \setminus i} A_{i'j}\, r^t_{i' \to j}}_{\text{stat}_{t,j}} \Big)
\]

If, for j ∈ [N], stat_{t,j} is approximately distributed as β_j + τ_t Z_{t,j}, then the Bayes optimal choice of η_{t,j} is ...
Bayes Optimal η_t

η_t(stat_t = s) = E[β | β + τ_t Z = s]:
\[
\eta_{t,j}(s) = \sqrt{nP_\ell}\;
\frac{\exp\big( s_j \sqrt{nP_\ell} / \tau_t^2 \big)}
     {\sum_{j' \in \text{sec } \ell} \exp\big( s_{j'} \sqrt{nP_\ell} / \tau_t^2 \big)},
\qquad j \in \text{section } \ell.
\]

Note that β^{t+1} is
• the MMSE estimate of β given the observation β + τ_t Z_t
• ∝ the posterior probability of entry j of β being non-zero

Recall the approximated messages before the Taylor step:
\[
r^t_{i \to j} = y_i - \sum_{j' \in [N]} A_{ij'}\, \beta^t_{j' \to i} + A_{ij}\, \beta^t_{j \to i},
\qquad
\beta^{t+1}_{j \to i} = \eta_t \Big( \sum_{i' \in [n]} A_{i'j}\, r^t_{i' \to j} - A_{ij}\, r^t_{j \to i} \Big)
\]
Using Taylor approximations ...
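A sketch of this section-wise denoiser in Python/NumPy. The argument layout (L consecutive sections of M entries) and the max-subtraction for numerical stability are implementation choices, not from the slides.

```python
import numpy as np

def eta_bayes(s, n, P_l, tau):
    """Section-wise Bayes-optimal denoiser from the slide:
    in section l, eta_{t,j}(s) = sqrt(n P_l) * softmax(s * sqrt(n P_l) / tau^2)."""
    L = len(P_l)
    M = len(s) // L
    beta_next = np.empty_like(s, dtype=float)
    for l in range(L):
        sec = slice(l * M, (l + 1) * M)
        c = np.sqrt(n * P_l[l])
        x = s[sec] * c / tau**2
        x -= x.max()                      # subtract the max for numerical stability
        w = np.exp(x)
        beta_next[sec] = c * w / w.sum()  # posterior-weighted estimate within section l
    return beta_next
```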
AMP Decoder

Set β^0 = 0. For t ≥ 0:
\[
r^t = y - A\beta^t + \frac{r^{t-1}}{\tau_{t-1}^2} \Big( P - \frac{\|\beta^t\|^2}{n} \Big)
\]
\[
\beta^{t+1}_j = \eta_{t,j}\big( A^T r^t + \beta^t \big), \qquad j = 1, \ldots, ML
\]
where
\[
\eta_{t,j}(s) = \sqrt{nP_\ell}\;
\frac{\exp\big( s_j \sqrt{nP_\ell} / \tau_t^2 \big)}
     {\sum_{j' \in \text{sec } \ell} \exp\big( s_{j'} \sqrt{nP_\ell} / \tau_t^2 \big)},
\qquad j \in \text{section } \ell.
\]

β^{t+1} is the MMSE estimate of β given that β^t + A^T r^t ≈ β + τ_t Z_t.
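To tie the pieces together, here is a sketch of this decoder loop in Python/NumPy. It reuses `eta_bayes` from the sketch after the previous slide. One caveat: the code estimates τ_t² as ‖r^t‖²/n, a common practical substitute; the τ_t used in the analysis on the slides is defined differently, so this is an illustration rather than the exact algorithm being analysed.

```python
import numpy as np

def amp_sparc_decode(y, A, L, M, P_l, num_iter=25):
    """Sketch of the SPARC AMP decoder on this slide (uses eta_bayes defined earlier)."""
    n, N = A.shape
    P = np.sum(P_l)
    beta = np.zeros(N)
    r = np.zeros(n)
    tau2_prev = None
    for _ in range(num_iter):
        # Modified residual with the Onsager-style correction term.
        onsager = 0.0 if tau2_prev is None else (r / tau2_prev) * (P - np.sum(beta**2) / n)
        r = y - A @ beta + onsager
        tau2 = np.sum(r**2) / n            # estimate of tau_t^2 (practical substitute)
        s = A.T @ r + beta                 # effective observation ~ beta + tau_t * Z
        beta = eta_bayes(s, n, P_l, np.sqrt(tau2))
        tau2_prev = tau2
    # Hard decision: keep the largest entry in each section.
    beta_hat = np.zeros(N)
    for l in range(L):
        sec = slice(l * M, (l + 1) * M)
        j = l * M + int(np.argmax(beta[sec]))
        beta_hat[j] = np.sqrt(n * P_l[l])
    return beta_hat
```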
The statistic β^t + A^T r^t

Suppose r^t = y − Aβ^t. Then
\[
\beta^t + A^T r^t = \beta
+ \underbrace{A^T \varepsilon}_{\mathcal{N}(0,\, \sigma^2)}
+ \underbrace{(I - A^T A)}_{\text{entries} \,\approx\, \mathcal{N}(0,\, 1/n)} (\beta^t - \beta)
\]
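A quick numerical check (with illustrative dimensions) that the decomposition above is an exact algebraic identity; the Gaussian statements on the slide concern the distributions of A^Tε and of the entries of I − A^TA.

```python
import numpy as np

n, N = 200, 800
A = np.random.randn(n, N) / np.sqrt(n)   # i.i.d. N(0, 1/n) entries
beta = np.random.randn(N)                # true signal (illustrative)
beta_t = np.random.randn(N)              # some current estimate
eps = np.random.randn(n)                 # noise with sigma^2 = 1
y = A @ beta + eps

r_t = y - A @ beta_t                     # plain residual (no Onsager term)
lhs = beta_t + A.T @ r_t
rhs = beta + A.T @ eps + (np.eye(N) - A.T @ A) @ (beta_t - beta)
print(np.allclose(lhs, rhs))             # True: the decomposition holds exactly
```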