Capacity of Continuous Channels with Memory via Directed Information Neural Estimator

Ziv Aharoni^1, Dor Tsur^1, Ziv Goldfeld^2, Haim H. Permuter^1
^1 Ben-Gurion University of the Negev   ^2 Cornell University

International Symposium on Information Theory, June 21st, 2020
Communication Channel

[Block diagram: message M -> Encoder -> X_i -> Channel -> Y_i -> Decoder -> \hat{M}; the feedback Y_{i-1} reaches the encoder through a unit delay \Delta]

- Continuous alphabet
- Time-invariant channel with memory
- The channel is unknown
Capacity

Feedback is not present:

    C_{FF} = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n; Y^n) = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n \to Y^n)

Feedback is present:

    C_{FB} = \lim_{n\to\infty} \sup_{P_{X^n \| Y^{n-1}}} \frac{1}{n} I(X^n \to Y^n)

where I(X^n \to Y^n) is the directed information (DI).

DI is a unifying measure for feed-forward (FF) and feedback (FB) capacity.
Talk Outline

- Directed Information Neural Estimator (DINE)
- Neural Distribution Transformer (NDT)
- Capacity estimation

[Block diagram of the estimation setup: X_i -> Channel -> Y_i -> Decoder -> \hat{M}, with a gradient signal and the delayed feedback Y_{i-1}]
Preliminaries - Donsker-Varadhan

Theorem (Donsker-Varadhan representation)
The KL divergence between the probability measures P and Q can be represented as

    D_{KL}(P \| Q) = \sup_{T:\Omega \to \mathbb{R}} E_P[T] - \log E_Q[e^T]

where T is measurable and both expectations are finite.

For mutual information:

    I(X;Y) = \sup_{T:\Omega \to \mathbb{R}} E_{P_{XY}}[T] - \log E_{P_X \otimes P_Y}[e^T]
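As a sanity check, here is a minimal NumPy sketch (not from the talk; all names are illustrative) that evaluates the DV bound for two 1-D Gaussians using the known optimal witness T* = log(dP/dQ), and compares it to the closed-form KL divergence:

```python
# Sanity check of the DV representation for two 1-D Gaussians.
# With the optimal witness T*(x) = log p(x) - log q(x), the DV bound
# is tight: E_P[T*] - log E_Q[e^{T*}] equals D_KL(P || Q).
import numpy as np

rng = np.random.default_rng(0)
mu1, s1 = 0.0, 1.0   # parameters of P
mu2, s2 = 1.0, 2.0   # parameters of Q

def t_star(x):
    # log-density ratio log p(x) - log q(x)
    lp = -0.5 * np.log(2 * np.pi * s1 ** 2) - (x - mu1) ** 2 / (2 * s1 ** 2)
    lq = -0.5 * np.log(2 * np.pi * s2 ** 2) - (x - mu2) ** 2 / (2 * s2 ** 2)
    return lp - lq

xp = rng.normal(mu1, s1, 100_000)  # samples from P
xq = rng.normal(mu2, s2, 100_000)  # samples from Q
dv = t_star(xp).mean() - np.log(np.exp(t_star(xq)).mean())
closed_form = np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5
print(f"DV estimate: {dv:.4f}  closed form: {closed_form:.4f}")
```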
MINE (Y. Bengio Keynote, ISIT '19)

Mutual Information Neural Estimator: given samples \{x_i, y_i\}_{i=1}^n

Approximation:

    \hat{I}(X;Y) = \sup_{\theta \in \Theta} E_{P_{XY}}[T_\theta] - \log E_{P_X \otimes P_Y}[e^{T_\theta}]

Estimation:

    \hat{I}_n(X,Y) = \sup_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n T_\theta(x_i, y_i) - \log \frac{1}{n}\sum_{i=1}^n e^{T_\theta(x_i, \tilde{y}_i)}

where the \tilde{y}_i are samples from the product of the marginals (e.g., obtained by shuffling).
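A minimal MINE-style training step, sketched in PyTorch under assumed choices (an MLP witness, with shuffled y samples standing in for the product marginal); this is not the authors' code:

```python
# Sketch of a MINE training step (assumed implementation, PyTorch).
# T_theta is an MLP; the product marginal P_X x P_Y is simulated by
# shuffling the y samples within the batch.
import math
import torch
import torch.nn as nn

class TNet(nn.Module):
    def __init__(self, dx, dy, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dx + dy, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def mine_step(t_net, opt, x, y):
    n = y.shape[0]
    y_shuf = y[torch.randperm(n)]                      # break the joint
    joint = t_net(x, y).mean()                         # E_PXY[T]
    marg = torch.logsumexp(t_net(x, y_shuf), dim=0) - math.log(n)
    mi_lb = joint - marg                               # DV lower bound
    opt.zero_grad()
    (-mi_lb).backward()                                # ascend the bound
    opt.step()
    return mi_lb.item()
```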
Estimator Derivation

DI as a difference of entropies:

    I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n)

where h(Y^n \| X^n) = \sum_{i=1}^n h(Y_i | X^i, Y^{i-1}) is the causally conditioned entropy.

Using a reference measure:

    I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D_{KL}(P_{Y^n \| X^n} \,\|\, P_{Y^{n-1} \| X^{n-1}} \otimes P_{\tilde{Y}} \,|\, P_{X^n})}_{D^{(n)}_{Y \| X}} - \underbrace{D_{KL}(P_{Y^n} \,\|\, P_{Y^{n-1}} \otimes P_{\tilde{Y}})}_{D^{(n)}_Y}

P_{\tilde{Y}} is a uniform i.i.d. reference measure over the support of the dataset.
Estimator Derivation

DI rate as a difference of KL divergences:

    I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D^{(n)}_{Y \| X} - D^{(n)}_Y}_{\text{increment in info. in step } n}

and

    D^{(n)}_{Y \| X} - D^{(n)}_Y \xrightarrow{n \to \infty} I(X \to Y)

The limit exists for stationary and ergodic processes.

The goal: estimate D^{(n)}_{Y \| X} and D^{(n)}_Y.
Directed Information Neural Estimator

Apply the DV formula to D^{(n)}_{Y \| X} and D^{(n)}_Y:

    D^{(n)}_Y = \sup_{T:\Omega \to \mathbb{R}} E_{P_{Y^n}}[T(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}\left[\exp\left(T(Y^{n-1}, \tilde{Y})\right)\right]

where the optimal solution is T^* = \log \frac{P_{Y_n | Y^{n-1}}}{P_{\tilde{Y}}}.

Approximate T with a recurrent neural network (RNN):

    D^{(n)}_Y = \sup_{\theta_Y} E_{P_{Y^n}}[T_{\theta_Y}(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}\left[\exp\left(T_{\theta_Y}(Y^{n-1}, \tilde{Y})\right)\right]

Estimate the expectations with empirical means:

    \hat{D}^{(n)}_Y = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^n T_{\theta_Y}(y_i | y^{i-1}) - \log \frac{1}{n}\sum_{i=1}^n e^{T_{\theta_Y}(\tilde{y}_i | y^{i-1})}

Finally, \hat{I}^{(n)}(X \to Y) = \hat{D}^{(n)}_{Y \| X} - \hat{D}^{(n)}_Y.
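Putting the two DV terms together, a hedged sketch (tensor names are assumed) of how the empirical DINE estimate would be assembled from the network outputs on true and reference samples:

```python
# Assembling the DINE estimate from per-step network outputs
# (illustrative; t_y / t_y_ref are T_{theta_Y} evaluated on true and
# reference samples, t_xy / t_xy_ref the causally conditioned analogue).
import math
import torch

def dv_estimate(t_true, t_ref):
    # empirical DV bound: mean(T) - log mean(exp(T_ref))
    t_true, t_ref = t_true.flatten(), t_ref.flatten()
    return t_true.mean() - (torch.logsumexp(t_ref, dim=0) - math.log(t_ref.numel()))

def dine_estimate(t_y, t_y_ref, t_xy, t_xy_ref):
    d_y = dv_estimate(t_y, t_y_ref)      # \hat D^{(n)}_Y
    d_y_x = dv_estimate(t_xy, t_xy_ref)  # \hat D^{(n)}_{Y||X}
    return d_y_x - d_y                   # \hat I^{(n)}(X -> Y)
```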
Consistency

Theorem (DINE consistency)
Let \{X_i, Y_i\}_{i=1}^\infty \sim P be jointly stationary ergodic stochastic processes. Then there exist RNNs F_1 \in \text{RNN}_{d_y,1}, F_2 \in \text{RNN}_{d_{xy},1} such that the DINE \hat{I}_n(F_1, F_2) is a strongly consistent estimator of I(X \to Y), i.e.,

    \lim_{n \to \infty} \hat{I}_n(F_1, F_2) = I(X \to Y) \quad \text{a.s.}

Sketch of proof:
- Represent the optimal solution T^* by a dynamical system.
- Universally approximate the dynamical system with RNNs.
- Estimate the expectations with empirical means.
Implementation

    \hat{D}^{(n)}_Y = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^n T_{\theta_Y}(y_i | y^{i-1}) - \log \frac{1}{n}\sum_{i=1}^n e^{T_{\theta_Y}(\tilde{y}_i | y^{i-1})}

Adjust the RNN to process both inputs while carrying only the state generated by the true samples.

[Diagram: an unrolled RNN F with shared states S_0, S_1, ..., S_T; at each step t both Y_t and the reference sample \tilde{Y}_t pass through F, but the state propagated forward is the one produced by the true sample Y_t]
Implementation

Complete system layout for the calculation of \hat{D}^{(n)}_Y:

[Diagram: the input Y_i and a reference sample \tilde{Y}_i from a reference generator feed a modified LSTM; its states S_i pass through Dense layers producing T_{\theta_Y}(Y_i | Y^{i-1}) and T_{\theta_Y}(\tilde{Y}_i | Y^{i-1}), which are combined in a DV layer to output \hat{D}_Y(\theta_Y, \mathcal{D}_n)]
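A possible PyTorch rendering of the modified-LSTM idea (a sketch under assumptions, not the released implementation): both the true and reference samples are scored against a shared state, but only the true sample advances the recurrence.

```python
# Sketch of the modified LSTM (assumed PyTorch rendering, not the
# authors' code): at each step both y_i and the reference sample
# y~_i are scored against the state built from the true past y^{i-1},
# but only the true sample is used to advance the recurrence.
import torch
import torch.nn as nn

class ModifiedLSTM(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(dim, hidden)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, y, y_ref):
        # y, y_ref: (batch, time, dim); y_ref drawn i.i.d. from the reference
        b, T, _ = y.shape
        h = y.new_zeros(b, self.cell.hidden_size)
        c = y.new_zeros(b, self.cell.hidden_size)
        t_true, t_ref = [], []
        for i in range(T):
            h_ref, _ = self.cell(y_ref[:, i], (h, c))  # score y~_i on the shared state
            h, c = self.cell(y[:, i], (h, c))          # state advances with y_i only
            t_ref.append(self.head(h_ref))             # T(y~_i | y^{i-1})
            t_true.append(self.head(h))                # T(y_i | y^{i-1})
        return torch.stack(t_true, dim=1), torch.stack(t_ref, dim=1)
```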
NDT

Neural Distribution Transformer (NDT)

[Block diagram of the estimation setup with the NDT in the encoder's role: X_i -> Channel -> Y_i -> Decoder -> \hat{M}, a gradient signal, and the delayed feedback Y_{i-1}]
NDT

Model M as i.i.d. Gaussian noise \{N_i\}_{i \in \mathbb{Z}}. The NDT is a mapping

    w/o feedback:  NDT : N_i \mapsto X_i
    w/ feedback:   NDT : (N_i, Y^{i-1}) \mapsto X_i

The NDT is modeled by an RNN:

[Diagram: (N_i, Y_{i-1}) -> LSTM -> Dense -> Dense -> power-constraint layer -> X_i -> Channel]
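A sketch of the NDT as an RNN, following the diagram above (LSTM, two dense layers, and a power-constraint normalization); the architectural details are assumptions:

```python
# Sketch of the NDT (assumed details): an LSTM followed by two dense
# layers and a normalization enforcing the average power constraint
# E[X^2] <= P on each batch.
import torch
import torch.nn as nn

class NDT(nn.Module):
    def __init__(self, noise_dim=1, fb_dim=0, hidden=32, power=1.0):
        super().__init__()
        self.rnn = nn.LSTM(noise_dim + fb_dim, hidden, batch_first=True)
        self.dense = nn.Sequential(nn.Linear(hidden, hidden), nn.ELU(),
                                   nn.Linear(hidden, 1))
        self.power = power

    def forward(self, noise, feedback=None):
        # noise: (batch, time, noise_dim); feedback: (batch, time, fb_dim) or None
        inp = noise if feedback is None else torch.cat([noise, feedback], dim=-1)
        h, _ = self.rnn(inp)
        x = self.dense(h)
        # rescale so the empirical average power meets the constraint
        scale = torch.sqrt(self.power / x.pow(2).mean().clamp_min(1e-8))
        return x * scale
```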
Capacity Estimation

Iterate between DINE and NDT:

[Diagram: noise N_i -> NDT (RNN) -> X_i -> channel P_{Y_i | X^i, Y^{i-1}} -> Y_i -> DINE (RNN) -> output \hat{I}_n(X \to Y); the DINE output supplies the gradient for training the NDT, and the feedback path feeds Y_{i-1} back to the NDT through a delay \Delta]
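A hedged sketch of the resulting alternating optimization loop; the channel and dine interfaces, step counts, and learning rates are assumptions made for illustration:

```python
# Alternating optimization sketch (interfaces of channel/dine assumed):
# on even steps the DINE parameters are updated to tighten the DI
# bound; on odd steps the NDT is updated to maximize the estimated DI.
import torch

def estimate_capacity(ndt, dine, channel, steps=10_000, batch=64, horizon=50):
    opt_dine = torch.optim.Adam(dine.parameters(), lr=1e-3)
    opt_ndt = torch.optim.Adam(ndt.parameters(), lr=1e-4)
    di = None
    for step in range(steps):
        noise = torch.randn(batch, horizon, 1)
        x = ndt(noise)          # channel inputs generated by the NDT
        y = channel(x)          # simulated channel outputs
        di = dine(x, y)         # \hat I_n(X -> Y), built from the two DV terms
        if step % 2 == 0:       # DINE step
            opt_dine.zero_grad(); (-di).backward(); opt_dine.step()
        else:                   # NDT step
            opt_ndt.zero_grad(); (-di).backward(); opt_ndt.step()
    return di.item()
```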
Results

Channel - MA(1) additive Gaussian noise (AGN):

    Z_i = \alpha U_{i-1} + U_i
    Y_i = X_i + Z_i

where U_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1), X_i is the channel input sequence subject to the power constraint E[X_i^2] \le P, and Y_i is the channel output.
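For concreteness, a minimal simulator of this channel (tensor shapes and the value of alpha are illustrative assumptions):

```python
# Minimal MA(1) AGN channel simulator (shapes and alpha illustrative):
# Z_i = alpha * U_{i-1} + U_i with U_i ~ N(0,1) i.i.d., Y_i = X_i + Z_i.
import torch

def ma1_agn(x, alpha=0.5):
    # x: (batch, time, 1) channel inputs
    u = torch.randn_like(x)                                             # U_1..U_T
    u_prev = torch.cat([torch.randn_like(u[:, :1]), u[:, :-1]], dim=1)  # U_0..U_{T-1}
    z = alpha * u_prev + u
    return x + z
```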
MA(1) AGN Results

Estimation performance:

[Two plots over an x-axis range of -20 to 15, with estimated rates between 0 and about 2: (a) Feed-forward Capacity, (b) Feedback Capacity]
Conclusion and Future Work

Conclusions:
- An estimation method for both FF and FB capacity.
- Pros: mild assumptions on the channel.
- Cons: lack of provable bounds.

Future Work:
- Generalize to more complex scenarios (e.g., multi-user).
- Obtain provable bounds on fundamental limits.

Thank You!