High-Rate Sparse Superposition Codes with Iteratively Optimal Estimates

Andrew Barron, Sanghee Cho
Department of Statistics, Yale University

2012 IEEE International Symposium on Information Theory
July 2, 2012, MIT
Sparse Superposition Code for the Gaussian Channel

[Diagram: input bits u (length K) → sparse coefficient vector β (length N, L non-zero, ‖β‖² = P) → dictionary X (n by N, entries i.i.d. N(0,1)) → codeword Xβ (length n) → channel noise ε ~ N(0, σ²I) → received Y (length n) → decoder → û]

Linear model: Y = Xβ + ε, with snr = P/σ²

• Partitioned coefficients: β = (00∗0000, 000∗000, …, 0∗00000)
• L sections of size M = N/L, one non-zero entry in each
• Rate R = K/n with K = L log M; capacity C = ½ log(1 + snr)
• Ultra-sparse case: impractical M = 2^{nR/L} with L constant
  (successive decoder reliable for R < C: Cover 1972 IT)
• Moderately sparse: M = L^a with n = (L log M)/R (reliable for R < C), as in the sketch below
  – Maximum-likelihood decoder (Joseph & Barron 2010a ISIT, 2012a IT)
  – Adaptive successive decoder with thresholding (J&B 2010b ISIT, 2012b)
  – Adaptive successive decoder with soft decisions (B&C, this talk)
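To make the setup concrete, here is a minimal simulation sketch (not from the talk) of the encoder and Gaussian channel in Python/NumPy. The parameter values, the uniform power split P_ℓ = P/L, and the base-2 logarithms are illustrative assumptions; the exponentially decaying allocation used in the talk appears on a later slide.

```python
# Minimal sketch of the sparse superposition encoder and AWGN channel.
# Assumptions: uniform power split P_l = P/L, sigma^2 = 1, rate in bits.
import numpy as np

rng = np.random.default_rng(0)

L, M = 32, 32                        # L sections of size M; N = L*M columns
snr = 7.0
sigma2 = 1.0
P = snr * sigma2                     # ||beta||^2 = P, so snr = P/sigma^2
C = 0.5 * np.log2(1 + snr)           # capacity in bits per channel use
R = 0.7 * C                          # operate below capacity
K = L * np.log2(M)                   # input bits: K = L log2 M
n = int(np.ceil(K / R))              # codeword length from R = K/n

N = L * M
X = rng.standard_normal((n, N))      # dictionary, i.i.d. N(0,1) entries

P_l = np.full(L, P / L)              # uniform split (illustrative only)
j_sent = rng.integers(0, M, size=L)  # one term sent per section
beta = np.zeros(N)
beta[np.arange(L) * M + j_sent] = np.sqrt(P_l)

eps = np.sqrt(sigma2) * rng.standard_normal(n)
Y = X @ beta + eps                   # received: Y = X beta + eps
```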
Progression of success rate

[Figure, shown over three build slides: progression of the success rate against x, comparing the soft-decision decoder with thresholding at a = 0.5; parameters M = 2⁹, L = M, snr = 7, C = 1.5 bits, R = 1.05 bits (0.7 C).]
Power Allocation

• Power control: Σ_{ℓ=1}^{L} P_ℓ = P, so that ‖β‖² = P
• Special choice: P_ℓ proportional to e^{−2Cℓ/L} for ℓ = 1, …, L

[Figure: the power allocation P_ℓ decays exponentially with the section index ℓ = 1, …, L]
Coefficient Vectors β and Estimates β̂

• Coefficients sent: β = (00√P₁0000, 000√P₂000, …, 0√P_L00000), as sketched below
• Terms sent: (j₁, j₂, …, j_L)
• β_j = √P_ℓ 1{j = j_ℓ} for j in section ℓ, for ℓ = 1, …, L
• B = the set of such allowed vectors β for codewords Xβ
• β̂ restricted to B or the convex hull of B
• β̂_j = √P_ℓ ŵ_j for j in section ℓ, with ŵ_j ≥ 0 and Σ_{j ∈ sec ℓ} ŵ_j = 1
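A short sketch of the exponentially decaying allocation and of the coefficient vector β built from it. The only assumption beyond the slides is that the C in the exponent e^{−2Cℓ/L} is taken in nats, matching the e-base.

```python
# Exponentially decaying power allocation: P_l proportional to
# exp(-2*C*l/L), normalized so that sum_l P_l = P (power control).
import numpy as np

def power_allocation(L, P, snr):
    C = 0.5 * np.log(1 + snr)                    # capacity in nats
    w = np.exp(-2 * C * np.arange(1, L + 1) / L)
    return P * w / w.sum()                       # the P_l sum to P

def make_beta(j_sent, P_l, M):
    # beta_j = sqrt(P_l) * 1{j = j_l} for j in section l.
    L = len(P_l)
    beta = np.zeros(L * M)
    beta[np.arange(L) * M + j_sent] = np.sqrt(P_l)
    return beta
```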
Iterative Estimation

For k ≥ 1:
• Coefficient fits: β̂_{k,j} (initially 0)
• Codeword fits: F_k = Xβ̂_k; also F_{k,−j} = Xβ̂_{k,−j}
• Vector of statistics: stat_k = function of (X, Y, F₁, …, F_k),
  e.g. stat_{k,j} proportional to X_jᵀ(Y − F_k) or X_jᵀ(Y − F_{k,−j})
• Update β̂_{k+1} as a function of stat_k
• Thresholding (adaptive successive decoder): β̂_{k+1,j} = √P_ℓ if stat_{k,j} is
  above threshold, in sections ℓ not previously decoded
• Soft decision: β̂_{k+1,j} = E[β_j | stat_k], with thresholding on the last step
  (see the sketch below)
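A sketch of one soft-decision update under the idealized normal model of the later slides: within section ℓ, the statistic for the sent term is shifted by α_ℓ = √(nP_ℓ/(σ² + P_remaining)), so with a uniform prior over the M terms the conditional expectation E[β_j | stat_k] reduces to √P_ℓ times a softmax weight. The form of α used here is an assumption carried over from the idealized-statistics slide.

```python
# Sketch of one soft-decision update beta_hat_{k+1,j} = E[beta_j | stat_k],
# assuming the idealized model: within section l,
# stat_{k,j} ~ N(alpha_l * 1{j = j_l}, 1) with
# alpha_l = sqrt(n * P_l / (sigma2 + P_remaining)). With a uniform prior
# on the sent term, the posterior weights are a softmax and
# E[beta_j | stat_k] = sqrt(P_l) * w_j.
import numpy as np

def soft_decision_update(stat, P_l, M, n, sigma2, P_remaining):
    L = len(P_l)
    beta_hat = np.empty(L * M)
    for l in range(L):
        s = stat[l * M:(l + 1) * M]
        alpha = np.sqrt(n * P_l[l] / (sigma2 + P_remaining))
        u = alpha * s
        w = np.exp(u - u.max())          # numerically stable softmax
        w /= w.sum()                     # w_j >= 0, sum_j w_j = 1
        beta_hat[l * M:(l + 1) * M] = np.sqrt(P_l[l]) * w
    return beta_hat
```

On the last step, thresholding replaces the softmax: the maximizing term in each section is decoded.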
Statistics

• stat_k = function of (X, Y, F₁, …, F_k), where F_k = Xβ̂_k
• Orthogonalization: let G₀ = Y and, for k ≥ 1,
  G_k = the part of F_k orthogonal to G₀, G₁, …, G_{k−1}
• Components of the statistics: Z_{k,j} = X_jᵀ G_k / ‖G_k‖
• Class of statistics stat_k formed by combining Z₀, …, Z_k:

  stat_{k,j} = Z^comb_{k,j} + √(n/c_k) β̂_{k,j}

  where Z^comb_k = √λ_{k,0} Z₀ − √λ_{k,1} Z₁ − … − √λ_{k,k} Z_k
  with λ_{k,0} + λ_{k,1} + … + λ_{k,k} = 1
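A Gram–Schmidt sketch of this orthogonalization: G₀ = Y, each fit F_k is orthogonalized against G₀, …, G_{k−1}, and Z_{k,j} = X_jᵀG_k/‖G_k‖. The function name and list-based interface are illustrative.

```python
# Sketch of the orthogonalized statistics: G_0 = Y; for k >= 1, G_k is
# the part of F_k = X beta_hat_k orthogonal to G_0, ..., G_{k-1}
# (classical Gram-Schmidt); Z_{k,j} = X_j^T G_k / ||G_k||.
import numpy as np

def z_statistics(X, Y, F_list):
    # F_list = [F_1, ..., F_k]; returns [Z_0, Z_1, ..., Z_k].
    G = [Y.astype(float)]
    for F in F_list:
        g = F.astype(float).copy()
        for h in G:                      # subtract projections onto G_0..G_{k-1}
            g -= (h @ g) / (h @ h) * h
        G.append(g)
    return [X.T @ g / np.linalg.norm(g) for g in G]
```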
Statistics based on residuals

Let stat_{k,j} be proportional to X_jᵀ(Y − Xβ̂_{k,−j}):

  stat_{k,j} = X_jᵀ(Y − Xβ̂_k)/√(n c_k) + ‖X_j‖² β̂_{k,j}/√(n c_k)

This arises with λ_k proportional to

  ((‖Y‖ − Z₀ᵀβ̂_k)², (Z₁ᵀβ̂_k)², …, (Z_kᵀβ̂_k)²)

and n c_k = ‖Y − Xβ̂_k‖². Here, c_k is typically between σ² and σ² + P.
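A direct translation of this residual-based statistic into code; n c_k is computed as the squared residual norm, as on the slide.

```python
# Residual-based statistic:
# stat_{k,j} = [X_j^T (Y - X beta_hat_k) + ||X_j||^2 beta_hat_{k,j}]
#              / sqrt(n c_k),  with n c_k = ||Y - X beta_hat_k||^2.
import numpy as np

def residual_stat(X, Y, beta_hat):
    r = Y - X @ beta_hat                 # residual Y - X beta_hat_k
    nc_k = r @ r                         # n * c_k
    col_norms_sq = np.sum(X * X, axis=0) # ||X_j||^2 per column
    return (X.T @ r + col_norms_sq * beta_hat) / np.sqrt(nc_k)
```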
Idealized Statistics

A choice of λ_k exists yielding stat^ideal_k with the distributional representation

  stat^ideal_k = (√n / √(σ² + ‖β − β̂_k‖²)) β + Z^comb_k,  with Z^comb_k ~ N(0, I).

This is a normal shift that improves as ‖β − β̂_k‖² decreases.

For terms sent, the shift α_{ℓ,k} has an effective-snr interpretation:

  α_{ℓ,k} = √(n P_ℓ / (σ² + P_remaining,k)),  where P_remaining,k = ‖β − β̂_k‖².
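For reference, the shift computed directly from these definitions (a sketch; in a real decoder β is unknown and P_remaining,k must be tracked analytically rather than computed from β):

```python
# Effective-snr shift for the terms sent:
# alpha_{l,k} = sqrt(n * P_l / (sigma2 + P_remaining_k)),
# with P_remaining_k = ||beta - beta_hat_k||^2.
import numpy as np

def shift(n, P_l, sigma2, beta, beta_hat):
    P_remaining = np.sum((beta - beta_hat) ** 2)
    return np.sqrt(n * P_l / (sigma2 + P_remaining))  # one entry per section
```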
Distributional Analysis

Lemma 1 (shifted normal conditional distribution). Given
F_{k−1} = (‖G₀‖, …, ‖G_{k−1}‖, Z₀, Z₁, …, Z_{k−1}), Z_k has the distributional representation

  Z_k = (‖G_k‖/σ_k) b_k + Z̃_k

• ‖G_k‖²/σ_k² ~ Chi-square(n − k)
• Z̃_k ~ N(0, Σ_k), independent of ‖G_k‖
• b₀, b₁, …, b_k are the successive orthonormal components of
  (β/σ₀, β̂₁, …, β̂_k)  (∗)
• Σ_k = I − b₀b₀ᵀ − b₁b₁ᵀ − … − b_k b_kᵀ = projection onto the space orthogonal to (∗)
• σ_k² = β̂_kᵀ Σ_{k−1} β̂_k