

  1. High-Rate Sparse Superposition Codes with Iteratively Optimal Estimates
     Andrew Barron and Sanghee Cho, Department of Statistics, Yale University
     2012 IEEE International Symposium on Information Theory, July 2, 2012, MIT

  2–7. Sparse Superposition Code for the Gaussian Channel

     Block diagram: input bits u (length K) → sparse coefficient vector β (length N, L non-zero entries, ‖β‖² = P) → dictionary X (n by N, entries independent N(0,1)) → codeword Xβ (length n) → channel noise ε ~ N(0, σ²I) → received Y (length n) → decoder → û.

     • Linear model: Y = Xβ + ε, with snr = P/σ²
     • Partitioned coefficients: β = (00∗0000, 000∗000, …, 0∗00000)
     • L sections of size M = N/L, one non-zero entry in each
     • Rate R = K/n with K = L log M input bits, so n = (L log M)/R; capacity C = (1/2) log(1 + snr)
     • Ultra-sparse case: M = 2^(nR/L) with L constant is impractical (successive decoder reliable for R < C: Cover 1972 IT)
     • Moderately sparse case: M = L^a with n = (L log M)/R (reliable for R < C), via
       – the maximum likelihood decoder (Joseph & Barron 2010a ISIT, 2012a IT)
       – the adaptive successive decoder with thresholding (J&B 2010b ISIT, 2012b)
       – the adaptive successive decoder with soft decisions (B&C, this talk)
     A toy encoder sketch follows below.
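To make the construction concrete, here is a minimal encoder sketch following the definitions above. The function name, the specific bit-to-index mapping, and the flat power allocation P_ℓ = P/L are our simplifications (the talk's exponential allocation appears on slides 11–12).

```python
import numpy as np

def encode(bits, L, M, X, powers):
    """Sparse superposition encoder: split the K = L*log2(M) input bits into
    L groups; each group selects one of the M columns in its section."""
    logM = int(np.log2(M))
    assert len(bits) == L * logM
    beta = np.zeros(L * M)
    for ell in range(L):
        group = bits[ell * logM:(ell + 1) * logM]
        j = int("".join(map(str, group)), 2)       # index within section ell
        beta[ell * M + j] = np.sqrt(powers[ell])   # one non-zero per section
    return X @ beta, beta                          # codeword of length n

# Small illustrative parameters (the talk uses M = 2^9, L = M).
rng = np.random.default_rng(1)
L, M, snr, R = 32, 32, 7.0, 1.0
n = int(L * np.log2(M) / R)                        # n = (L log M) / R
X = rng.standard_normal((n, L * M))                # i.i.d. N(0,1) dictionary
P = 1.0
powers = np.full(L, P / L)                         # flat allocation (simplified)
bits = list(rng.integers(0, 2, L * int(np.log2(M))))
codeword, beta = encode(bits, L, M, X, powers)
Y = codeword + np.sqrt(P / snr) * rng.standard_normal(n)   # Y = X beta + eps
```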

  8–10. Progression of success rate

     [Figure, repeated over three animation frames: progression of the success rate (vertical axis, 0–1) against x (horizontal axis, 0–1) for the soft-decision decoder and for thresholding with a = 0.5; parameters M = 2^9, L = M, snr = 7, C = 1.5 bits, R = 1.05 bits (0.7 C).]

  11–12. Power Allocation

     • Power control: Σ_{ℓ=1}^{L} P_ℓ = P, so that ‖β‖² = P
     • Special choice: P_ℓ proportional to e^(−2Cℓ/L) for ℓ = 1, …, L
     [Figure: the allocation P_ℓ decaying exponentially in the section index ℓ = 1, …, 100.]
     A short numerical sketch of this allocation follows below.
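A few lines of Python make the special choice concrete. The function name is ours, and we take C in nats, C = (1/2) ln(1 + snr), which we believe is the intended reading of the e-base exponent.

```python
import numpy as np

def exponential_power_allocation(L, snr, P=1.0):
    """P_ell proportional to exp(-2 C ell / L), normalized to sum to P,
    with C = (1/2) ln(1 + snr), the capacity in nats."""
    C = 0.5 * np.log(1 + snr)
    p = np.exp(-2 * C * np.arange(1, L + 1) / L)
    return P * p / p.sum()

powers = exponential_power_allocation(L=100, snr=7)
# First-to-last ratio is (1 + snr)^((L-1)/L): about 7.8 for snr = 7, L = 100.
print(powers[0] / powers[-1])
```

Earlier sections get more power, matching the decaying profile in the figure.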

  13–14. Coefficient Vectors β and Coefficient Estimates β̂

     • Power control: Σ_{ℓ=1}^{L} P_ℓ = P, so ‖β‖² = P, with P_ℓ proportional to e^(−2Cℓ/L)
     • Coefficients sent: β = (00√P₁0000, 000√P₂000, …, 0√P_L00000)
     • Terms sent: (j₁, j₂, …, j_L)
     • β_j = √P_ℓ · 1{j = j_ℓ} for j in section ℓ, for ℓ = 1, …, L
     • B = the set of such allowed vectors β for codewords Xβ
     • The estimate β̂ is restricted to B or to the convex hull of B:
       β̂_j = √P_ℓ ŵ_j for j in section ℓ, with ŵ_j ≥ 0 and Σ_{j ∈ sec ℓ} ŵ_j = 1

  15–17. Iterative Estimation

     For k ≥ 1:
     • Coefficient fits: β̂_{k,j} (initially 0)
     • Codeword fits: F_k = Xβ̂_k; also the leave-one-out fits F_{k,−j} = Xβ̂_{k,−j}
     • Vector of statistics: stat_k = function of (X, Y, F₁, …, F_k),
       e.g. stat_{k,j} proportional to X_jᵀ(Y − F_{k,−j})
     • Update β̂_{k+1} as a function of stat_k:
       – Thresholding (adaptive successive decoder): β̂_{k+1,j} = √P_ℓ if stat_{k,j} is above threshold, in sections ℓ not previously decoded
       – Soft decision: β̂_{k+1,j} = E[β_j | stat_k], with thresholding on the final step
     A sketch of the soft-decision loop follows below.
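Here is a minimal sketch of the soft-decision iteration, under assumptions we spell out: it uses the residual-based statistic of slide 20, and it evaluates E[β_j | stat_k] via the idealized normal representation of slides 21–22, under which the conditional expectation within each section reduces to a softmax with shift α_ℓ = √(n P_ℓ / c_k). This is our reading of the scheme, not the authors' exact estimator.

```python
import numpy as np

def soft_decision_decode(X, Y, L, M, powers, sigma2, iters=10):
    """Iterative soft-decision sketch: within each section, the estimate is
    sqrt(P_ell) times posterior weights for which term was sent."""
    n = X.shape[0]
    col_norms_sq = (X * X).sum(axis=0)                 # ||X_j||^2, roughly n
    beta_hat = np.zeros(L * M)
    for _ in range(iters):
        resid = Y - X @ beta_hat                       # Y - F_k
        c_k = max(resid @ resid / n, sigma2)           # n c_k = ||Y - F_k||^2
        # Residual statistic (slide 20):
        #   stat_j = [X_j^T (Y - F_k) + ||X_j||^2 beta_hat_j] / sqrt(n c_k)
        stat = (X.T @ resid + col_norms_sq * beta_hat) / np.sqrt(n * c_k)
        # Idealized posterior: softmax within each section, shift alpha_ell
        alpha = np.sqrt(n * powers / c_k)              # one shift per section
        logits = alpha[:, None] * stat.reshape(L, M)
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)              # weights sum to 1 per section
        beta_hat = (np.sqrt(powers)[:, None] * w).reshape(-1)
    j_hat = w.argmax(axis=1)        # last-step hard decision in each section
    return j_hat, beta_hat
```

With the toy encoder above, comparing j_hat to the encoder's chosen indices gives the section error rate.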

  18–19. Statistics

     • stat_k = function of (X, Y, F₁, …, F_k), with F_k = Xβ̂_k
     • Orthogonalization: let G₀ = Y and, for k ≥ 1,
       G_k = the part of F_k orthogonal to G₀, G₁, …, G_{k−1}
     • Components of the statistics: Z_{k,j} = X_jᵀ G_k / ‖G_k‖
     • Class of statistics stat_k formed by combining Z₀, …, Z_k:
       stat_{k,j} = Z^comb_{k,j} + (√n / √c_k) β̂_{k,j},
       where Z^comb_k = √λ_{k,0} Z₀ − √λ_{k,1} Z₁ − … − √λ_{k,k} Z_k
       with λ_{k,0} + λ_{k,1} + … + λ_{k,k} = 1
     The orthogonalization step is sketched below.
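The orthogonalization is plain Gram–Schmidt on (Y, F₁, …, F_k); a minimal sketch (function name ours):

```python
import numpy as np

def statistic_components(X, Y, fits):
    """G_0 = Y; G_k = the part of F_k orthogonal to G_0, ..., G_{k-1}.
    Returns the component statistics Z_k = X^T G_k / ||G_k||."""
    Gs, Zs = [], []
    for v in [Y] + fits:                       # fits = [F_1, ..., F_k]
        g = v.astype(float)
        for G in Gs:
            g = g - (G @ v) / (G @ G) * G      # project out earlier directions
        Gs.append(g)
        Zs.append(X.T @ g / np.linalg.norm(g))
    return Zs
```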

  20. Statistics based on residuals

     Let stat_{k,j} be proportional to X_jᵀ(Y − Xβ̂_{k,−j}). Since Xβ̂_{k,−j} = Xβ̂_k − X_j β̂_{k,j}, this can be computed from the full residual:

       stat_{k,j} = X_jᵀ(Y − Xβ̂_k) / √(n c_k) + ‖X_j‖² β̂_{k,j} / √(n c_k)

     This arises with λ_k proportional to ((‖Y‖ − Z₀ᵀβ̂_k)², (Z₁ᵀβ̂_k)², …, (Z_kᵀβ̂_k)²) and n c_k = ‖Y − Xβ̂_k‖². Here c_k is typically between σ² and σ² + P.

  21–22. Idealized Statistics

     There exists λ_k yielding stat_k^ideal with the distributional representation

       stat_k^ideal = (√n / √(σ² + ‖β − β̂_k‖²)) β + Z^comb_k,   with Z^comb_k ~ N(0, I).

     This is a normal shift that improves as ‖β − β̂_k‖² decreases. For the terms sent, the shift α_{ℓ,k} has an effective-snr interpretation:

       α_{ℓ,k} = √( n P_ℓ / (σ² + P_remaining,k) ),   where P_remaining,k = ‖β − β̂_k‖².

     A back-of-envelope evaluation of this shift follows below.
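For intuition, the shift can be evaluated at the talk's operating point. The arithmetic below is ours, simplified to a flat allocation P_ℓ = P/L rather than the exponential one.

```python
import numpy as np

# Talk parameters: M = 2^9, L = M, snr = 7, R = 1.05 bits per symbol.
M, L, snr, R = 2 ** 9, 2 ** 9, 7.0, 1.05
n = L * np.log2(M) / R                    # codelength, about 4389
P, sigma2 = 1.0, 1.0 / 7.0                # normalize P = 1, so sigma^2 = P/snr
P_ell = P / L                             # flat allocation (our simplification)
for P_rem in (P, 0.0):                    # first iteration vs. P_remaining -> 0
    alpha = np.sqrt(n * P_ell / (sigma2 + P_rem))
    print(f"P_remaining = {P_rem}: alpha = {alpha:.2f}")
# Prints about 2.74 at the start, rising toward about 7.75 as the fit improves.
```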

  23–24. Distributional Analysis

     Lemma 1 (shifted-normal conditional distribution). Given
     F_{k−1} = (‖G₀‖, …, ‖G_{k−1}‖, Z₀, Z₁, …, Z_{k−1}), Z_k has the distributional representation

       Z_k = (‖G_k‖ / σ_k) b_k + Z̃_k

     where
     • ‖G_k‖² / σ_k² ~ chi-square(n − k)
     • Z̃_k ~ N(0, Σ_k), independent of ‖G_k‖
     • b₀, b₁, …, b_k are the successive orthonormal components of (β/σ₀, β̂₁, …, β̂_k)   (∗)
     • Σ_k = I − b₀b₀ᵀ − b₁b₁ᵀ − … − b_kb_kᵀ = the projection onto the space orthogonal to (∗)
     • σ_k² = β̂_kᵀ Σ_{k−1} β̂_k
     A simulation check of the k = 0 case follows below.
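As a sanity check on the k = 0 case, here is a small Monte Carlo experiment of our own construction, taking b₀ = β/σ₀ with σ₀² = σ² + P: ‖G₀‖² = ‖Y‖² scaled by σ₀² should be chi-square with n degrees of freedom, and Z₀ = XᵀY/‖Y‖, after removing the shift (‖Y‖/σ₀)b₀, should be centered normal with variance 1 − β_j²/σ₀² in coordinate j.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, P, sigma2 = 200, 50, 1.0, 1.0 / 7.0
sigma0_sq = sigma2 + P                         # sigma_0^2 = sigma^2 + P
beta = np.zeros(N)
beta[0] = np.sqrt(P)                           # a single term carrying all power

chisq, resid = [], []
for _ in range(2000):
    X = rng.standard_normal((n, N))
    Y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    chisq.append(Y @ Y / sigma0_sq)            # should average n
    Z0 = X.T @ Y / np.linalg.norm(Y)
    resid.append(Z0[0] - np.linalg.norm(Y) * beta[0] / sigma0_sq)

print(np.mean(chisq))                          # about n = 200
print(np.std(resid))                           # about sqrt(1 - P/sigma0_sq) = 0.35
```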
