Kolmogorov complexity

◮ For a string $x \in \{0,1\}^*$, let $K(x)$ be the length of the shortest C++ program (written in binary) that outputs $x$ (on empty input).
◮ Now the term "described" is well defined.
◮ Why C++?
◮ All (Turing-complete) programming languages/computational models are essentially equivalent.
◮ Let $K'(x)$ be the description length of $x$ in another complete language; then $|K(x) - K'(x)| \le \text{const}$.
◮ What is $K(x)$ for $x = \underbrace{0101\ldots01}_{n\text{ pairs}}$?
◮ "For $i = 1$ to $n$: print 01"
◮ $K(x) \le \log n + \text{const}$
◮ This is considered small complexity; we typically ignore $\log n$ factors.
◮ What is $K(x)$ for $x$ being the first $n$ digits of $\pi$?
◮ $K(x) \le \log n + \text{const}$ (a fixed program computes the digits of $\pi$; only $n$ needs to be encoded).
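For concreteness, a minimal C++ witness of the $\log n + \text{const}$ bound (a sketch; the value 1000000 stands in for an arbitrary $n$, and only its encoding grows with $n$):

```cpp
#include <iostream>

// Prints "01" n times. Everything except the literal value of n is a
// fixed-size program, so as a description of x this costs
// (bits to write n) + const, i.e. about log n + const.
int main() {
    const long long n = 1000000;  // stand-in for an arbitrary n
    for (long long i = 0; i < n; ++i)
        std::cout << "01";
    std::cout << '\n';
    return 0;
}
```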
More examples

◮ What is $K(x)$ for $x \in \{0,1\}^n$ with $k$ ones?
◮ Recall that $\binom{n}{k} \le 2^{n h(k/n)}$.
◮ Hence $K(x) \le \log n + n h(k/n) + \text{const}$: encode $k$ ($\le \log n$ bits) and the index of $x$ among the $\binom{n}{k}$ strings with $k$ ones ($\le n h(k/n)$ bits).
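A quick numerical sanity check of the inequality (a sketch; $n = 100$ and $k = 30$ are arbitrary example values):

```cpp
#include <cmath>
#include <cstdio>

// Binary entropy h(p), with h(0) = h(1) = 0.
double h(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * std::log2(p) - (1 - p) * std::log2(1 - p);
}

int main() {
    int n = 100, k = 30;
    // log2 C(n,k) via log-gamma: lgamma(m+1) = ln(m!).
    double log2_binom = (std::lgamma(n + 1) - std::lgamma(k + 1)
                         - std::lgamma(n - k + 1)) / std::log(2.0);
    std::printf("log2 C(%d,%d) = %.2f  <=  n*h(k/n) = %.2f\n",
                n, k, log2_binom, n * h((double)k / n));
    return 0;
}
```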
Bounds

◮ $K(x) \le |x| + \text{const}$
◮ Proof: the program "output $x$" (with $x$ hard-coded).
◮ Most sequences have high Kolmogorov complexity:
◮ There are at most $2^{n-1} - 1$ (C++) programs of length $\le n - 2$,
◮ but $2^n$ strings of length $n$.
◮ Hence, at least $\frac{1}{2}$ of the $n$-bit strings have Kolmogorov complexity at least $n - 1$.
◮ In particular, a random sequence has Kolmogorov complexity $\approx n$ w.h.p.
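Spelling out the count behind the last two steps: programs are binary strings, so the number of programs of length at most $n - 2$ is
$$\sum_{i=0}^{n-2} 2^i = 2^{n-1} - 1 < 2^{n-1},$$
and each program outputs at most one string. Hence fewer than $2^{n-1}$ of the $2^n$ strings of length $n$ can have $K(x) \le n - 2$, so at least half have $K(x) \ge n - 1$.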
Conditional Kolmogorov complexity

◮ $K(x \mid y)$ — the Kolmogorov complexity of $x$ given $y$: the length of the shortest program that outputs $x$ on input $y$. For example, $K(x \mid x) \le \text{const}$ (the program "output the input").
◮ Chain rule: $K(x, y) \approx K(y) + K(x \mid y)$.
H vs. K

$H(X)$ speaks about a random variable $X$ and $K(x)$ about a fixed string $x$, but:
◮ Both quantities measure the amount of uncertainty or randomness in an object.
◮ Both measure the number of bits it takes to describe an object.
◮ Another property: let $X_1, \ldots, X_n$ be i.i.d.; then w.h.p. $K(X_1, \ldots, X_n) \approx H(X_1, \ldots, X_n) = n H(X_1)$.
◮ Proof: the AEP — w.h.p. the sample falls in the typical set, whose $\approx 2^{n H(X_1)}$ elements can each be described by an index of $\approx n H(X_1)$ bits.
◮ Example: for $(0.7, 0.3)$ coin flips, w.h.p. we get a string with $K(x) \approx n \cdot h(0.3)$.
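Working out the example (values rounded):
$$h(0.3) = -0.3 \log_2 0.3 - 0.7 \log_2 0.7 \approx 0.521 + 0.360 = 0.881,$$
so for, say, $n = 1000$ flips, the typical outcome has $K(x) \approx 881$, noticeably below the trivial bound $K(x) \le n + \text{const}$.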
Universal compression

◮ A program of length $K(x)$ that outputs $x$ compresses $x$ into $K(x)$ bits of information.
◮ Example: the length of the human genome is $6 \cdot 10^9$ bits.
◮ But the code is redundant.
◮ The relevant number for measuring the number of possible values is the Kolmogorov complexity of the code.
◮ No one knows its value...
Universal probability

$K(x) = \min_{p \,:\, p() = x} |p|$, where $p()$ denotes the output of the C++ program defined by $p$.

Definition 1
The universal probability of a string $x$ is $P_U(x) = \sum_{p \,:\, p() = x} 2^{-|p|} = \Pr_{p \leftarrow \{0,1\}^\infty}[p() = x]$.

◮ Namely, the probability that a program picked at random prints $x$.
◮ Insensitive (up to a constant factor) to the computational model.
◮ Interpretation: $P_U(x)$ is the probability that you observe $x$ in nature.
◮ Computer as an intelligence amplifier.

Theorem 2
$\exists c > 0$ such that $2^{-K(x)} \le P_U(x) \le c \cdot 2^{-K(x)}$ for every $x \in \{0,1\}^*$.

◮ The interesting part is $P_U(x) \le c \cdot 2^{-K(x)}$.
◮ Hence, for $X \sim P_U$, it holds that $\left| \mathbb{E}[K(X)] - H(X) \right| \le \log c$.
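The last bullet follows by taking logarithms in Theorem 2: for every $x$,
$$K(x) - \log c \;\le\; \log \frac{1}{P_U(x)} \;\le\; K(x),$$
and averaging over $X \sim P_U$ (using $\mathbb{E}\left[\log \frac{1}{P_U(X)}\right] = H(X)$) gives $\mathbb{E}[K(X)] - \log c \le H(X) \le \mathbb{E}[K(X)]$.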
Proving Theorem 2

◮ We need to find $c > 0$ such that $K(x) \le \log \frac{1}{P_U(x)} + c$ for every $x \in \{0,1\}^*$.
◮ In other words, find a program that outputs $x$ whose length is $\log \frac{1}{P_U(x)} + c$.
◮ Idea: the program picks $x$'s leaf in the Shannon code for $P_U$ (in which $x$ sits at depth $\lceil \log \frac{1}{P_U(x)} \rceil$).
◮ Problem: $P_U$ is not computable.
◮ Solution: compute better and better estimates of the code tree of $P_U$, along with the "mapping" from the tree nodes back to codewords.
Proving Theorem 2

◮ Initialize $T$ to be the infinite binary tree.

Program 3 (M)
Enumerate over all programs in $\{0,1\}^*$: at round $i$, emulate the first $i$ programs (one after the other) for $i$ steps each, and do: if program $p$ outputs a string $x$ and $(*, x, n(x)) \notin T$, place $(p, x, n(x))$ at an unused $n(x)$-depth node of $T$, for $\hat{P}_U(x) = \sum_{p' \,:\, \text{emulated } p' \text{ has output } x} 2^{-|p'|}$ and $n(x) = \lceil \log \frac{1}{\hat{P}_U(x)} \rceil + 1$.

◮ The program never gets stuck (it can always add the node).
Proof: Let $x \in \{0,1\}^*$. At each point during the execution of M, the nodes assigned to $x$ sit at distinct depths, each at least $\lceil \log \frac{1}{P_U(x)} \rceil + 1$, so their total Kraft weight is at most $2^{-\lceil \log \frac{1}{P_U(x)} \rceil} \le P_U(x)$. Since $\sum_x P_U(x) \le 1$, the proof follows by the Kraft inequality.
◮ $\forall x \in \{0,1\}^*$: M eventually adds a node $(\cdot, x, \cdot)$ to $T$ at depth $\le \lceil \log \frac{1}{P_U(x)} \rceil + 2$.
Proof: $\hat{P}_U(x)$ converges to $P_U(x)$ from below; once $\hat{P}_U(x) > P_U(x)/2$, we have $n(x) \le \lceil \log \frac{1}{P_U(x)} \rceil + 2$.
◮ For $x \in \{0,1\}^*$, let $\ell(x)$ be the location of its $(\lceil \log \frac{1}{P_U(x)} \rceil + 2)$-depth node.
◮ Program for printing $x$: run M until it assigns the node at location $\ell(x)$, and output the string written there. Its length is $|\ell(x)| + \text{const} = \log \frac{1}{P_U(x)} + \text{const}$, as required.
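A schematic of M's bookkeeping in C++ (a sketch under stated assumptions: `HaltedProgram` and the dovetailing driver that would call `on_halt` are hypothetical, and the real M must also record where in $T$ each node is placed):

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical event produced by the dovetailing emulator whenever a
// simulated program halts: the program's bits and the string it output.
struct HaltedProgram { std::string program_bits; std::string output; };

std::map<std::string, double> p_hat;  // running estimates \hat{P}_U(x)
std::map<std::string, int>    depth;  // depth n(x) of x's current node in T

void on_halt(const HaltedProgram& e) {
    double& q = p_hat[e.output];
    q += std::pow(2.0, -static_cast<double>(e.program_bits.size()));  // += 2^{-|p|}
    // n(x) = ceil(log2(1 / \hat{P}_U(x))) + 1; it shrinks as the estimate
    // grows, so x gets re-placed at shallower and shallower depths.
    int n = static_cast<int>(std::ceil(std::log2(1.0 / q))) + 1;
    auto it = depth.find(e.output);
    if (it == depth.end() || n < it->second)
        depth[e.output] = n;  // place (p, x, n(x)) at an unused depth-n node
}
```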
Applications

◮ (Another) proof that there are infinitely many primes:
◮ Assume there are finitely many primes $p_1, \ldots, p_m$.
◮ Then any $n$-bit integer $x$ can be written as $x = \prod_{i=1}^{m} p_i^{d_i}$.
◮ Each $d_i \le n$, hence has length $\le \log n$.
◮ Hence $K(x) \le m \cdot \log n + \text{const}$.
◮ But for most numbers $K(x) \ge n - 1$, which is impossible for large $n$ since $m$ is fixed.
Computability of K

◮ Can we compute $K(x)$?
◮ Answer: no.
◮ Proof: assume $K$ is computable by a program of length $C$.
◮ Let $s$ be the smallest positive integer s.t. $K(s) > 2C + 10{,}000$ (such an $s$ exists, since only finitely many integers have bounded complexity).
◮ $s$ can be computed by the following program:
1. $x = 0$
2. While $K(x) \le 2C + 10{,}000$: $x{+}{+}$
3. Output $x$
◮ Thus $K(s) < C + \log C + \log 10{,}000 + \text{const} < 2C + 10{,}000$ — a contradiction.
◮ Berry's paradox, revisited: "the smallest positive integer not definable in under sixty letters" — itself a definition of under sixty letters.