Capacity of Continuous Channels with Memory via Directed Information Neural Estimator

Ziv Aharoni^1, Dor Tsur^1, Ziv Goldfeld^2, Haim H. Permuter^1
^1 Ben-Gurion University of the Negev   ^2 Cornell University

International Symposium on Information Theory, June 21st, 2020
Communication Channel

[Block diagram: message M -> Encoder -> X_i -> Channel -> Y_i -> Decoder -> \hat{M}; the feedback Y_{i-1} reaches the encoder through a unit delay \Delta]

- Continuous alphabet
- Time-invariant channel with memory
- The channel is unknown
Capacity

Feedback is not present:

    C_{FF} = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n; Y^n) = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n \to Y^n)

Feedback is present:

    C_{FB} = \lim_{n\to\infty} \sup_{P_{X^n \| Y^{n-1}}} \frac{1}{n} I(X^n \to Y^n)

where I(X^n \to Y^n) is the directed information (DI).

DI is a unifying measure for feed-forward (FF) and feedback (FB) capacity.
Talk Outline

- Directed Information Neural Estimator (DINE)
- Neural Distribution Transformer (NDT)
- Capacity estimation

[Block diagram of the estimation setup: X_i -> Channel -> Y_i -> Decoder -> \hat{M}, with a gradient signal and the delayed feedback Y_{i-1}]
Preliminaries - Donsker-Varadhan

Theorem (Donsker-Varadhan representation)
The KL divergence between the probability measures P and Q can be represented as

    D_{KL}(P \| Q) = \sup_{T:\Omega \to \mathbb{R}} E_P[T] - \log E_Q[e^T]

where T is measurable and both expectations are finite.

For mutual information:

    I(X;Y) = \sup_{T:\Omega \to \mathbb{R}} E_{P_{XY}}[T] - \log E_{P_X \otimes P_Y}[e^T]
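As a sanity check, here is a minimal NumPy sketch (not from the talk; all names are illustrative) that evaluates the DV bound for two 1-D Gaussians using the known optimal witness T* = log(dP/dQ), and compares it to the closed-form KL divergence:

```python
# Sanity check of the DV representation for two 1-D Gaussians.
# With the optimal witness T*(x) = log p(x) - log q(x), the DV bound
# is tight: E_P[T*] - log E_Q[e^{T*}] equals D_KL(P || Q).
import numpy as np

rng = np.random.default_rng(0)
mu1, s1 = 0.0, 1.0   # parameters of P
mu2, s2 = 1.0, 2.0   # parameters of Q

def t_star(x):
    # log-density ratio log p(x) - log q(x)
    lp = -0.5 * np.log(2 * np.pi * s1 ** 2) - (x - mu1) ** 2 / (2 * s1 ** 2)
    lq = -0.5 * np.log(2 * np.pi * s2 ** 2) - (x - mu2) ** 2 / (2 * s2 ** 2)
    return lp - lq

xp = rng.normal(mu1, s1, 100_000)  # samples from P
xq = rng.normal(mu2, s2, 100_000)  # samples from Q
dv = t_star(xp).mean() - np.log(np.exp(t_star(xq)).mean())
closed_form = np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5
print(f"DV estimate: {dv:.4f}  closed form: {closed_form:.4f}")
```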
MINE (Y. Bengio Keynote, ISIT '19)

Mutual Information Neural Estimator: given samples \{x_i, y_i\}_{i=1}^n

Approximation:

    \hat{I}(X;Y) = \sup_{\theta \in \Theta} E_{P_{XY}}[T_\theta] - \log E_{P_X \otimes P_Y}[e^{T_\theta}]

Estimation:

    \hat{I}_n(X,Y) = \sup_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n T_\theta(x_i, y_i) - \log \frac{1}{n}\sum_{i=1}^n e^{T_\theta(x_i, \tilde{y}_i)}

where the \tilde{y}_i are samples from the product of the marginals (e.g., obtained by shuffling).
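A minimal MINE-style training step, sketched in PyTorch under assumed choices (an MLP witness, with shuffled y samples standing in for the product marginal); this is not the authors' code:

```python
# Sketch of a MINE training step (assumed implementation, PyTorch).
# T_theta is an MLP; the product marginal P_X x P_Y is simulated by
# shuffling the y samples within the batch.
import math
import torch
import torch.nn as nn

class TNet(nn.Module):
    def __init__(self, dx, dy, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dx + dy, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def mine_step(t_net, opt, x, y):
    n = y.shape[0]
    y_shuf = y[torch.randperm(n)]                      # break the joint
    joint = t_net(x, y).mean()                         # E_PXY[T]
    marg = torch.logsumexp(t_net(x, y_shuf), dim=0) - math.log(n)
    mi_lb = joint - marg                               # DV lower bound
    opt.zero_grad()
    (-mi_lb).backward()                                # ascend the bound
    opt.step()
    return mi_lb.item()
```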
Estimator Derivation

DI as a difference of entropies:

    I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n)

where h(Y^n \| X^n) = \sum_{i=1}^n h(Y_i | X^i, Y^{i-1}) is the causally conditioned entropy.

Using a reference measure:

    I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D_{KL}(P_{Y^n \| X^n} \,\|\, P_{Y^{n-1} \| X^{n-1}} \otimes P_{\tilde{Y}} \,|\, P_{X^n})}_{D^{(n)}_{Y \| X}} - \underbrace{D_{KL}(P_{Y^n} \,\|\, P_{Y^{n-1}} \otimes P_{\tilde{Y}})}_{D^{(n)}_Y}

P_{\tilde{Y}} is a uniform i.i.d. reference measure over the support of the dataset.
Estimator Derivation

DI rate as a difference of KL divergences:

    I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D^{(n)}_{Y \| X} - D^{(n)}_Y}_{\text{increment in info. in step } n}

and

    D^{(n)}_{Y \| X} - D^{(n)}_Y \xrightarrow{n \to \infty} I(X \to Y)

The limit exists for stationary and ergodic processes.

The goal: estimate D^{(n)}_{Y \| X} and D^{(n)}_Y.
Directed Information Neural Estimator

Apply the DV formula to D^{(n)}_{Y \| X} and D^{(n)}_Y:

    D^{(n)}_Y = \sup_{T:\Omega \to \mathbb{R}} E_{P_{Y^n}}[T(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}\left[\exp\left(T(Y^{n-1}, \tilde{Y})\right)\right]

where the optimal solution is T^* = \log \frac{P_{Y_n | Y^{n-1}}}{P_{\tilde{Y}}}.

Approximate T with a recurrent neural network (RNN):

    D^{(n)}_Y = \sup_{\theta_Y} E_{P_{Y^n}}[T_{\theta_Y}(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}\left[\exp\left(T_{\theta_Y}(Y^{n-1}, \tilde{Y})\right)\right]

Estimate the expectations with empirical means:

    \hat{D}^{(n)}_Y = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^n T_{\theta_Y}(y_i | y^{i-1}) - \log \frac{1}{n}\sum_{i=1}^n e^{T_{\theta_Y}(\tilde{y}_i | y^{i-1})}

Finally, \hat{I}^{(n)}(X \to Y) = \hat{D}^{(n)}_{Y \| X} - \hat{D}^{(n)}_Y.
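Putting the two DV terms together, a hedged sketch (tensor names are assumed) of how the empirical DINE estimate would be assembled from the network outputs on true and reference samples:

```python
# Assembling the DINE estimate from per-step network outputs
# (illustrative; t_y / t_y_ref are T_{theta_Y} evaluated on true and
# reference samples, t_xy / t_xy_ref the causally conditioned analogue).
import math
import torch

def dv_estimate(t_true, t_ref):
    # empirical DV bound: mean(T) - log mean(exp(T_ref))
    t_true, t_ref = t_true.flatten(), t_ref.flatten()
    return t_true.mean() - (torch.logsumexp(t_ref, dim=0) - math.log(t_ref.numel()))

def dine_estimate(t_y, t_y_ref, t_xy, t_xy_ref):
    d_y = dv_estimate(t_y, t_y_ref)      # \hat D^{(n)}_Y
    d_y_x = dv_estimate(t_xy, t_xy_ref)  # \hat D^{(n)}_{Y||X}
    return d_y_x - d_y                   # \hat I^{(n)}(X -> Y)
```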
Consistency

Theorem (DINE consistency)
Let \{X_i, Y_i\}_{i=1}^\infty \sim P be jointly stationary ergodic stochastic processes. Then there exist RNNs F_1 \in \text{RNN}_{d_y,1}, F_2 \in \text{RNN}_{d_{xy},1} such that the DINE \hat{I}_n(F_1, F_2) is a strongly consistent estimator of I(X \to Y), i.e.,

    \lim_{n \to \infty} \hat{I}_n(F_1, F_2) = I(X \to Y) \quad \text{a.s.}

Sketch of proof:
- Represent the optimal solution T^* by a dynamical system.
- Universally approximate the dynamical system with RNNs.
- Estimate the expectations with empirical means.
Implementation

    \hat{D}^{(n)}_Y = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^n T_{\theta_Y}(y_i | y^{i-1}) - \log \frac{1}{n}\sum_{i=1}^n e^{T_{\theta_Y}(\tilde{y}_i | y^{i-1})}

Adjust the RNN to process both inputs while carrying only the state generated by the true samples.

[Diagram: an unrolled RNN F with shared states S_0, S_1, ..., S_T; at each step t both Y_t and the reference sample \tilde{Y}_t pass through F, but the state propagated forward is the one produced by the true sample Y_t]
Implementation

Complete system layout for the calculation of \hat{D}^{(n)}_Y:

[Diagram: the input Y_i and a reference sample \tilde{Y}_i from a reference generator feed a modified LSTM; its states S_i pass through Dense layers producing T_{\theta_Y}(Y_i | Y^{i-1}) and T_{\theta_Y}(\tilde{Y}_i | Y^{i-1}), which are combined in a DV layer to output \hat{D}_Y(\theta_Y, \mathcal{D}_n)]
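A possible PyTorch rendering of the modified-LSTM idea (a sketch under assumptions, not the released implementation): both the true and reference samples are scored against a shared state, but only the true sample advances the recurrence.

```python
# Sketch of the modified LSTM (assumed PyTorch rendering, not the
# authors' code): at each step both y_i and the reference sample
# y~_i are scored against the state built from the true past y^{i-1},
# but only the true sample is used to advance the recurrence.
import torch
import torch.nn as nn

class ModifiedLSTM(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(dim, hidden)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, y, y_ref):
        # y, y_ref: (batch, time, dim); y_ref drawn i.i.d. from the reference
        b, T, _ = y.shape
        h = y.new_zeros(b, self.cell.hidden_size)
        c = y.new_zeros(b, self.cell.hidden_size)
        t_true, t_ref = [], []
        for i in range(T):
            h_ref, _ = self.cell(y_ref[:, i], (h, c))  # score y~_i on the shared state
            h, c = self.cell(y[:, i], (h, c))          # state advances with y_i only
            t_ref.append(self.head(h_ref))             # T(y~_i | y^{i-1})
            t_true.append(self.head(h))                # T(y_i | y^{i-1})
        return torch.stack(t_true, dim=1), torch.stack(t_ref, dim=1)
```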
NDT

Neural Distribution Transformer (NDT)

[Block diagram of the estimation setup with the NDT in the encoder's role: X_i -> Channel -> Y_i -> Decoder -> \hat{M}, a gradient signal, and the delayed feedback Y_{i-1}]
NDT

Model M as i.i.d. Gaussian noise \{N_i\}_{i \in \mathbb{Z}}. The NDT is a mapping

    w/o feedback:  NDT : N_i \mapsto X_i
    w/ feedback:   NDT : (N_i, Y^{i-1}) \mapsto X_i

The NDT is modeled by an RNN:

[Diagram: (N_i, Y_{i-1}) -> LSTM -> Dense -> Dense -> power-constraint layer -> X_i -> Channel]
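A sketch of the NDT as an RNN, following the diagram above (LSTM, two dense layers, and a power-constraint normalization); the architectural details are assumptions:

```python
# Sketch of the NDT (assumed details): an LSTM followed by two dense
# layers and a normalization enforcing the average power constraint
# E[X^2] <= P on each batch.
import torch
import torch.nn as nn

class NDT(nn.Module):
    def __init__(self, noise_dim=1, fb_dim=0, hidden=32, power=1.0):
        super().__init__()
        self.rnn = nn.LSTM(noise_dim + fb_dim, hidden, batch_first=True)
        self.dense = nn.Sequential(nn.Linear(hidden, hidden), nn.ELU(),
                                   nn.Linear(hidden, 1))
        self.power = power

    def forward(self, noise, feedback=None):
        # noise: (batch, time, noise_dim); feedback: (batch, time, fb_dim) or None
        inp = noise if feedback is None else torch.cat([noise, feedback], dim=-1)
        h, _ = self.rnn(inp)
        x = self.dense(h)
        # rescale so the empirical average power meets the constraint
        scale = torch.sqrt(self.power / x.pow(2).mean().clamp_min(1e-8))
        return x * scale
```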
Capacity Estimation

Iterate between DINE and NDT:

[Diagram: noise N_i -> NDT (RNN) -> X_i -> channel P_{Y_i | X^i, Y^{i-1}} -> Y_i -> DINE (RNN) -> output \hat{I}_n(X \to Y); the DINE output supplies the gradient for training the NDT, and the feedback path feeds Y_{i-1} back to the NDT through a delay \Delta]
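A hedged sketch of the resulting alternating optimization loop; the channel and dine interfaces, step counts, and learning rates are assumptions made for illustration:

```python
# Alternating optimization sketch (interfaces of channel/dine assumed):
# on even steps the DINE parameters are updated to tighten the DI
# bound; on odd steps the NDT is updated to maximize the estimated DI.
import torch

def estimate_capacity(ndt, dine, channel, steps=10_000, batch=64, horizon=50):
    opt_dine = torch.optim.Adam(dine.parameters(), lr=1e-3)
    opt_ndt = torch.optim.Adam(ndt.parameters(), lr=1e-4)
    di = None
    for step in range(steps):
        noise = torch.randn(batch, horizon, 1)
        x = ndt(noise)          # channel inputs generated by the NDT
        y = channel(x)          # simulated channel outputs
        di = dine(x, y)         # \hat I_n(X -> Y), built from the two DV terms
        if step % 2 == 0:       # DINE step
            opt_dine.zero_grad(); (-di).backward(); opt_dine.step()
        else:                   # NDT step
            opt_ndt.zero_grad(); (-di).backward(); opt_ndt.step()
    return di.item()
```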
Results

Channel - MA(1) additive Gaussian noise (AGN):

    Z_i = \alpha U_{i-1} + U_i
    Y_i = X_i + Z_i

where U_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1), X_i is the channel input sequence subject to the power constraint E[X_i^2] \le P, and Y_i is the channel output.
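For concreteness, a minimal simulator of this channel (tensor shapes and the value of alpha are illustrative assumptions):

```python
# Minimal MA(1) AGN channel simulator (shapes and alpha illustrative):
# Z_i = alpha * U_{i-1} + U_i with U_i ~ N(0,1) i.i.d., Y_i = X_i + Z_i.
import torch

def ma1_agn(x, alpha=0.5):
    # x: (batch, time, 1) channel inputs
    u = torch.randn_like(x)                                             # U_1..U_T
    u_prev = torch.cat([torch.randn_like(u[:, :1]), u[:, :-1]], dim=1)  # U_0..U_{T-1}
    z = alpha * u_prev + u
    return x + z
```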
MA(1) AGN Results

Estimation performance:

[Two plots over an x-axis range of -20 to 15, with estimated rates between 0 and about 2: (a) Feed-forward Capacity, (b) Feedback Capacity]
Conclusion and Future Work

Conclusions:
- An estimation method for both FF and FB capacity.
- Pros: mild assumptions on the channel.
- Cons: lack of provable bounds.

Future Work:
- Generalize to more complex scenarios (e.g., multi-user).
- Obtain provable bounds on fundamental limits.

Thank You!