Chapter 28: Entropy and Shannon's Theorem
CS 573: Algorithms, Fall 2013
December 10, 2013

28.1 Entropy

28.2 Extracting randomness

28.2.1 Enumerating binary strings with j ones

28.2.1.1 Storing all strings of length n with j bits on
(A) $S_{n,j}$: the set of all binary strings of length $n$ with exactly $j$ ones in them.
(B) $T_{n,j}$: a prefix tree storing all the strings of $S_{n,j}$.
[Figure: the base cases $T_{0,0}$, $T_{1,0}$, $T_{1,1}$.]

28.2.1.2 Prefix tree for all binary strings of length n with j ones
[Figure: the tree $T_{n,j}$, whose root has edges labeled 0 and 1 leading to the subtrees $T_{n-1,j}$ and $T_{n-1,j-1}$.]
Number of leaves:
$$|T_{n,j}| = |T_{n-1,j}| + |T_{n-1,j-1}|, \qquad \binom{n}{j} = \binom{n-1}{j} + \binom{n-1}{j-1} \;\implies\; |T_{n,j}| = \binom{n}{j}.$$
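The counting recurrence above is easy to sanity-check in code. Below is a minimal sketch (the function name `num_leaves` and the use of Python's `math.comb` for the comparison are my own choices, not from the notes):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_leaves(n: int, j: int) -> int:
    """Number of leaves of the prefix tree T_{n,j}, computed via the recurrence
    |T_{n,j}| = |T_{n-1,j}| + |T_{n-1,j-1}|."""
    if j < 0 or j > n:
        return 0          # no length-n string has a negative number of ones, or more than n
    if n == 0:
        return 1          # T_{0,0} is a single leaf: the empty string
    return num_leaves(n - 1, j) + num_leaves(n - 1, j - 1)

# The recurrence reproduces the binomial coefficients, as claimed.
assert all(num_leaves(n, j) == comb(n, j) for n in range(12) for j in range(n + 1))
```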
28.2.1.3 Encoding a string in S_{n,j}
(A) The leaves of $T_{n,j}$ correspond to the strings of $S_{n,j}$.
(B) Order all the strings of $S_{n,j}$ in lexicographic order...
(C) ... $\equiv$ ordering the leaves of $T_{n,j}$ from left to right.
(D) Input: $s \in S_{n,j}$; compute the index of $s$ in the sorted set $S_{n,j}$.
(E) Let EncodeBinomCoeff($s$) denote this polynomial-time procedure.

28.2.1.4 Decoding a string in S_{n,j}
(A) The leaves of $T_{n,j}$ correspond to the strings of $S_{n,j}$.
(B) Order all the strings of $S_{n,j}$ in lexicographic order...
(C) ... $\equiv$ ordering the leaves of $T_{n,j}$ from left to right.
(D) Input: $x \in \bigl\{1, \ldots, \binom{n}{j}\bigr\}$; compute the $x$th string in $S_{n,j}$ in polynomial time.
(E) Let DecodeBinomCoeff($x$) denote this procedure.

28.2.1.5 Encoding/decoding strings of S_{n,j}
Lemma 28.2.1. Let $S_{n,j}$ be the set of binary strings of length $n$ with $j$ ones, sorted lexicographically.
(A) EncodeBinomCoeff($\alpha$): given a string $\alpha \in S_{n,j}$, computes the index $x$ of $\alpha$ in $S_{n,j}$, in time polynomial in $n$.
(B) DecodeBinomCoeff($x$): given an index $x \in \bigl\{1, \ldots, \binom{n}{j}\bigr\}$, outputs the $x$th string $\alpha$ in $S_{n,j}$, in time $O(\operatorname{polylog} n + n)$.
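A minimal sketch of how EncodeBinomCoeff and DecodeBinomCoeff might be implemented, via the standard lexicographic ranking/unranking of binary strings with a fixed number of ones (the notes only specify the interface of Lemma 28.2.1; the function names, the 1-based indexing, and the use of Python's `math.comb` are my own choices):

```python
from math import comb

def encode_binom_coeff(s: str) -> int:
    """Lexicographic rank (1-based) of a binary string s among all strings of
    the same length with the same number of ones."""
    n, j = len(s), s.count("1")
    rank = 0
    for i, bit in enumerate(s):
        remaining = n - i - 1              # positions still to be filled after i
        if bit == "1":
            # every string with a '0' here (and all j remaining ones placed
            # among the later positions) comes earlier in lexicographic order
            rank += comb(remaining, j)
            j -= 1
    return rank + 1                        # index in {1, ..., C(n, j)}

def decode_binom_coeff(x: int, n: int, j: int) -> str:
    """The x-th (1-based) binary string of length n with j ones, in
    lexicographic order."""
    x -= 1
    out = []
    for i in range(n):
        remaining = n - i - 1
        zero_block = comb(remaining, j)    # strings with '0' at this position
        if x < zero_block:
            out.append("0")
        else:
            out.append("1")
            x -= zero_block
            j -= 1
    return "".join(out)

# Round-trip check, e.g. for n = 6, j = 3.
for idx in range(1, comb(6, 3) + 1):
    assert encode_binom_coeff(decode_binom_coeff(idx, 6, 3)) == idx
```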
28.2.2 Extracting randomness

28.2.2.1 Extracting randomness
Theorem 28.2.2. Consider a coin that comes up heads with probability $p > 1/2$. For any constant $\delta > 0$ and for $n$ sufficiently large:
(A) One can extract, from an input sequence of $n$ flips, an output sequence of $(1-\delta)\, n H(p)$ (unbiased) independent random bits.
(B) One cannot extract more than $n H(p)$ bits from such a sequence.

28.2.2.2 Proof...
(A) There are $\binom{n}{j}$ input strings with exactly $j$ heads.
(B) Each such string has probability $p^j (1-p)^{n-j}$.
(C) Map such a string $s$ to its index in the set $S_j = \bigl\{1, \ldots, \binom{n}{j}\bigr\}$.
(D) Conditioned on the input string $s$ having $j$ ones (out of $n$ bits), $s$ is uniformly distributed over $S_{n,j}$.
(E) $x \leftarrow$ EncodeBinomCoeff($s$).
(F) $x$ is uniformly distributed in $\{1, \ldots, N\}$, where $N = \binom{n}{j}$.
(G) Seen in previous lecture...
(H) ... one can extract, in expectation, at least $\lfloor \lg N \rfloor - 1$ bits from a uniform random variable in the range $1, \ldots, N$.
(I) Extract these bits using ExtractRandomness($x$, $N$) (a sketch of one possible extractor appears after slide 28.2.2.4).

28.2.2.3 Exciting proof continued...
(A) $Z$: random variable counting the number of heads in the input string $s$.
(B) $B$: number of random bits extracted. Then
$$\mathbb{E}[B] = \sum_{k=0}^{n} \Pr[Z = k] \; \mathbb{E}[B \mid Z = k].$$
(C) Know: $\mathbb{E}[B \mid Z = k] \ge \bigl\lfloor \lg \binom{n}{k} \bigr\rfloor - 1$.
(D) Let $\varepsilon < p - 1/2$ be a sufficiently small constant.
(E) For $n(p-\varepsilon) \le k \le n(p+\varepsilon)$:
$$\binom{n}{k} \ge \binom{n}{\lfloor n(p+\varepsilon) \rfloor} \ge \frac{2^{n H(p+\varepsilon)}}{n+1},$$
(F) ... since $2^{n H(p)}$ is a good approximation to $\binom{n}{np}$, as proved in the previous lecture.

28.2.2.4 Super exciting proof continued...
$$\begin{aligned}
\mathbb{E}[B] &= \sum_{k=0}^{n} \Pr[Z = k] \; \mathbb{E}[B \mid Z = k] \\
&\ge \sum_{k=\lfloor n(p-\varepsilon) \rfloor}^{\lceil n(p+\varepsilon) \rceil} \Pr[Z = k] \; \mathbb{E}[B \mid Z = k] \\
&\ge \sum_{k=\lfloor n(p-\varepsilon) \rfloor}^{\lceil n(p+\varepsilon) \rceil} \Pr[Z = k] \left( \Bigl\lfloor \lg \binom{n}{k} \Bigr\rfloor - 1 \right) \\
&\ge \sum_{k=\lfloor n(p-\varepsilon) \rfloor}^{\lceil n(p+\varepsilon) \rceil} \Pr[Z = k] \left( \lg \frac{2^{n H(p+\varepsilon)}}{n+1} - 2 \right) \\
&= \Bigl( n H(p+\varepsilon) - \lg(n+1) - 2 \Bigr) \Pr\bigl[\, |Z - np| \le \varepsilon n \,\bigr]
\end{aligned}$$
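For concreteness, here is a sketch of one standard way to realize ExtractRandomness from step (I). The actual extractor was defined in the previous lecture, so treat the recursive scheme below as an assumption about which variant is meant: given a value uniform on $\{0, \ldots, N-1\}$ (shift the index from $\{1, \ldots, N\}$ by one), it outputs unbiased independent bits, and its expected output length is at least $\lfloor \lg N \rfloor - 1$.

```python
def extract_randomness(x: int, n: int) -> str:
    """Sketch of a recursive extractor (an assumption, not necessarily the exact
    variant from the previous lecture): x is uniform on {0, ..., n-1}; the output
    is a string of unbiased, independent bits."""
    out = []
    while n >= 2:
        k = n.bit_length() - 1                   # largest k with 2^k <= n
        if x < 2 ** k:
            out.append(format(x, f"0{k}b"))      # x is uniform on {0, ..., 2^k - 1}: emit its k bits
            break
        x -= 2 ** k                              # otherwise, recurse on the leftover block
        n -= 2 ** k
    return "".join(out)

# Example: with n = 12, values 0..7 yield 3 bits; values 8..11 yield 2 bits from the smaller block.
assert extract_randomness(5, 12) == "101"
assert extract_randomness(9, 12) == "01"
```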
$$\ge \Bigl( n H(p+\varepsilon) - \lg(n+1) - 2 \Bigr) \left( 1 - 2\exp\Bigl( -\frac{n\varepsilon^2}{4p} \Bigr) \right),$$
since $\mu = \mathbb{E}[Z] = np$ and
$$\Pr\left[ |Z - np| \ge \frac{\varepsilon}{p} \cdot pn \right] \le 2 \exp\left( -\frac{np}{4} \cdot \frac{\varepsilon^2}{p^2} \right) = 2 \exp\left( -\frac{n\varepsilon^2}{4p} \right),$$
by the Chernoff inequality.

28.2.2.5 Hyper super exciting proof continued...
(A) Fix $\varepsilon > 0$ such that $H(p+\varepsilon) > (1 - \delta/4) H(p)$; here $p$ is fixed.
(B) $\implies n H(p) = \Omega(n)$.
(C) For $n$ sufficiently large: $-\lg(n+1) \ge -\frac{\delta}{10} n H(p)$.
(D) ... also $2\exp\bigl(-\frac{n\varepsilon^2}{4p}\bigr) \le \frac{\delta}{10}$.
(E) For $n$ large enough:
$$\mathbb{E}[B] \ge n H(p) \left( 1 - \frac{\delta}{4} - \frac{\delta}{10} \right) \left( 1 - \frac{\delta}{10} \right) \ge (1 - \delta)\, n H(p).$$

28.2.2.6 Hyper super duper exciting proof continued...
(A) Need to prove the upper bound.
(B) If an input sequence $x$ has probability $\Pr[X = x]$, then $y = \mathrm{Ext}(x)$ has probability at least $\Pr[X = x]$ of being generated.
(C) All sequences of length $|y|$ have equal probability of being generated (by definition).
(D) $2^{|\mathrm{Ext}(x)|} \Pr[X = x] \le 2^{|\mathrm{Ext}(x)|} \Pr[y = \mathrm{Ext}(x)] \le 1$.
(E) $\implies |\mathrm{Ext}(x)| \le \lg\bigl(1 / \Pr[X = x]\bigr)$.
(F) $\displaystyle \mathbb{E}[B] = \sum_x \Pr[X = x] \, |\mathrm{Ext}(x)| \le \sum_x \Pr[X = x] \lg \frac{1}{\Pr[X = x]} = H(X)$.

28.3 Coding: Shannon's Theorem

28.3.0.7 Shannon's Theorem
Definition 28.3.1. The input to a binary symmetric channel with parameter $p$ is a sequence of bits $x_1, x_2, \ldots$, and the output is a sequence of bits $y_1, y_2, \ldots$, such that $\Pr[x_i = y_i] = 1 - p$ independently for each $i$. (A small simulation sketch of such a channel appears after slide 28.3.0.9.)

28.3.0.8 Encoding/decoding with noise
Definition 28.3.2. A $(k, n)$ encoding function $\mathrm{Enc}: \{0,1\}^k \to \{0,1\}^n$ takes as input a sequence of $k$ bits and outputs a sequence of $n$ bits. A $(k, n)$ decoding function $\mathrm{Dec}: \{0,1\}^n \to \{0,1\}^k$ takes as input a sequence of $n$ bits and outputs a sequence of $k$ bits.

28.3.0.9 Claude Elwood Shannon
Claude Elwood Shannon (April 30, 1916 - February 24, 2001), an American electrical engineer and mathematician, has been called "the father of information theory". His master's thesis showed how to build Boolean circuits for any Boolean function.
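A minimal sketch of simulating the binary symmetric channel of Definition 28.3.1 (the function name and the example parameters are my own illustration, not part of the notes):

```python
import random

def binary_symmetric_channel(bits, p, rng=random):
    """Each transmitted bit is flipped independently with probability p,
    so Pr[x_i = y_i] = 1 - p for every position i."""
    return [b ^ int(rng.random() < p) for b in bits]

# Example: send twenty zeros over a channel with p = 0.1; on average about two bits flip.
received = binary_symmetric_channel([0] * 20, 0.1)
```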
28.3.0.10 Shannon's theorem (1948)
Theorem 28.3.3 (Shannon's theorem). For a binary symmetric channel with parameter $p < 1/2$ and for any constants $\delta, \gamma > 0$, where $n$ is sufficiently large, the following holds:
(i) For any $k \le n(1 - H(p) - \delta)$ there exist $(k, n)$ encoding and decoding functions such that the probability that the receiver fails to obtain the correct message is at most $\gamma$, for every possible $k$-bit input message.
(ii) There are no $(k, n)$ encoding and decoding functions with $k \ge n(1 - H(p) + \delta)$ such that the probability of decoding correctly is at least $\gamma$ for a $k$-bit input message chosen uniformly at random.

28.3.0.11 Intuition behind Shannon's theorem

28.3.0.12 When the sender sends a string...
$S = s_1 s_2 \ldots s_n$.
[Figure: the sent string $S$ and the ring of received strings around it, with inner radius $(1-\delta)np$ and outer radius $(1+\delta)np$. "One ring to rule them all!"]

28.3.0.13 Some intuition...
(A) The sender sent the string $S = s_1 s_2 \ldots s_n$.
(B) The receiver got the string $T = t_1 t_2 \ldots t_n$.
(C) $p = \Pr[t_i \ne s_i]$, for all $i$.
(D) $U$: the Hamming distance between $S$ and $T$: $U = \sum_i \bigl[ s_i \ne t_i \bigr]$.
(E) By assumption: $\mathbb{E}[U] = pn$, and $U$ is a binomial variable.
(F) By the Chernoff inequality: $U \in \bigl[ (1-\delta)np, (1+\delta)np \bigr]$ with high probability, where $\delta$ is a tiny constant.
(G) So $T$ lies in a ring $R$ centered at $S$, with inner radius $(1-\delta)np$ and outer radius $(1+\delta)np$.
(H) This ring has
$$\sum_{i = (1-\delta)np}^{(1+\delta)np} \binom{n}{i} \;\le\; \alpha = 2 \binom{n}{(1+\delta)np} \;\le\; 2 \cdot 2^{n H((1+\delta)p)}$$
strings in it.
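A small numerical illustration of the ring-size bound in (H); the binary entropy helper and the particular values of $n$, $p$, $\delta$ below are illustration choices, not from the notes:

```python
from math import comb, log2

def H(q: float) -> float:
    """Binary entropy: H(q) = -q lg q - (1 - q) lg(1 - q), with H(0) = H(1) = 0."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

n, p, delta = 100, 0.1, 0.1
lo, hi = round((1 - delta) * n * p), round((1 + delta) * n * p)
ring_size = sum(comb(n, i) for i in range(lo, hi + 1))   # number of strings in the ring R
bound = 2 * 2 ** (n * H((1 + delta) * p))                # the 2 * 2^{n H((1+delta) p)} bound
print(f"|R| = {ring_size:.3e}  vs  bound = {bound:.3e}")
```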
28.3.0.14 Many rings for many codewords...

28.3.0.15 Some more intuition...
(A) Pick as many disjoint rings as possible: $R_1, \ldots, R_\kappa$.
(B) If every word in the hypercube were covered...
(C) ... we would use $2^n$ codewords $\implies \kappa \ge \dfrac{2^n}{|R|} \ge \dfrac{2^n}{2 \cdot 2^{n H((1+\delta)p)}} \approx 2^{n(1 - H((1+\delta)p))}$.
(D) Consider all possible strings of length $k$ such that $2^k \le \kappa$.
(E) Map the $i$th string in $\{0,1\}^k$ to the center $C_i$ of the $i$th ring $R_i$.
(F) If $C_i$ is sent $\implies$ the receiver gets a string in $R_i$.
(G) Decoding is easy: find the ring $R_i$ containing the received string, take its center string $C_i$, and output the original string it was mapped to (a small sketch of this rule appears after slide 28.3.0.17).
(H) How many bits? $k = \lfloor \lg \kappa \rfloor = n\bigl(1 - H((1+\delta)p)\bigr) \approx n(1 - H(p))$.

28.3.0.16 What is wrong with the above?

28.3.0.17 What is wrong with the above?
(A) One cannot find such a large set of disjoint rings.
(B) The reason is that when you pack rings (or balls) you inevitably waste space around them.
(C) To overcome this: allow the rings to overlap somewhat.
(D) This makes things considerably more involved.
(E) Details in the class notes.
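A tiny sketch of the decoding rule from (G) of slide 28.3.0.15, phrased as a brute-force ring lookup; the helper names and the exhaustive search are illustration choices only:

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))

def decode_by_ring(received, codewords, outer_radius):
    """Return the index of the unique codeword whose ring (of the given outer
    radius) contains the received word, or None if no ring or several rings do."""
    hits = [i for i, c in enumerate(codewords) if hamming(received, c) <= outer_radius]
    return hits[0] if len(hits) == 1 else None

# Example with two toy codewords of length 6 and outer radius 1.
codewords = [(0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 1, 1)]
assert decode_by_ring((0, 1, 0, 0, 0, 0), codewords, 1) == 0
```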