Communications essentials: Communications and Redundancy

Benefits of redundancy

Crossword puzzles
Understanding foreigners with imperfect pronunciation
How much would you understand of a lecture without redundancy?
Hearing in a noisy environment
Reading bad handwriting
How could I mark exam scripts without redundancy?
Cryptanalysis? Steganalysis?
What if there were no redundancy?

There would be no use for steganography!
Any text would be meaningful; in particular, ciphertext would be meaningful.
Simple encryption would give a stegogramme indistinguishable from cover-text.
Anderson and Petitcolas 1999
Perfect compression

Compression removes redundancy: it minimises the average string length (file size) while retaining the information content.
Decompression replaces the redundancy and recovers the original (lossless compression).
Perfect means no redundancy in the compressed string.
Consequently all strings are used: any random string can be decompressed and yields sensible output.
Steganography by Perfect Compression (Anderson and Petitcolas 1998)

Ingredients: a perfect compression scheme and a secure cipher.
Diagram: the message is encrypted under the key into ciphertext such as "adf!haj dgh a", which is then decompressed into innocent-looking text such as "Once upon a time there was a red herring..."; the recipient compresses the stegotext back into the ciphertext and decrypts it to recover the message.
Steganography without data hiding.
Digital Communications
Problems in natural language: how efficient is the redundancy?

Natural languages are arbitrary: some words or sentences have a lot of redundancy, others have very little.
The redundancy is unstructured, so correction is hard to automate.
Structured redundancy is necessary for digital communications: Coding Theory.
Coding: channel and source coding

Source coding (a.k.a. compression): remove redundancy and make a compact representation.
Channel coding (a.k.a. error-control coding): add mathematically structured redundancy for computationally efficient error correction, optimised for low error rate and small space.
These are two aspects of Information Theory.
Channel and Source Coding

Block diagram: the message is compressed (remove redundancy), encrypted (scramble), and channel-encoded (add redundancy) before transmission over the channel; the receiver channel-decodes, decrypts, and decompresses to obtain the received message r.
Shannon Entropy
Uncertainty: Shannon entropy

m and r are stochastic variables (drawn at random from a distribution).
How much uncertainty is there about the message m?
Uncertainty is measured by entropy: H(m) before anything is received, and the conditional entropy H(m | r) after receipt of r.
Mutual information is derived from entropy: I(m; r) = H(m) − H(m | r).
I(m; r) is the amount of information contained in r about m, and I(m; r) = I(r; m).
Shannon entropy: definition

For a random variable X ∈ 𝒳:
H_q(X) = − Σ_{x ∈ 𝒳} Pr(X = x) log_q Pr(X = x)

Usually q = 2, giving entropy in bits; q = e (natural logarithm) gives entropy in nats.
If Pr(X = x_i) = p_i for x_1, x_2, ... ∈ 𝒳, we write H(X) = h(p_1, p_2, ...).
Example: one question Q whose yes/no answer has 50-50 probability:
H(Q) = −2 · (1/2) log_2 (1/2) = 1
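To make the definition concrete, here is a small Python sketch (my own illustration, not part of the lecture; the function name is an assumption) that evaluates the entropy of a given distribution:

```python
import math

def entropy(probs, q=2):
    """Shannon entropy H_q(X) of a distribution given as a list of probabilities."""
    return -sum(p * math.log(p, q) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # the 50-50 yes/no question: 1.0 bit
print(entropy([0.9, 0.1]))    # a biased question carries less uncertainty: ~0.469 bits
```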
Example

Alice has a 1-bit message m with a 50-50 distribution.
The entropy (Bob's uncertainty) is H(m) = 1.
Binary symmetric channel with an error rate of 25%, i.e. a 25% risk that Alice's bit is flipped.
Alice's uncertainty about the received word r is
H(r | 1) = H(r | 0) = −0.25 log 0.25 − 0.75 log 0.75 ≈ 0.811,
H(r | m) = 0.5 H(r | 0) + 0.5 H(r | 1) ≈ 0.811.
The information received by Bob is
I(m; r) = H(m) − H(m | r) = H(r) − H(r | m) = 1 − 0.811 = 0.189.
What if the error rate is 50%? Or 10%?
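The closing question can be answered numerically. A short sketch (my own, under the assumption of a uniform 1-bit message over a binary symmetric channel) computes I(m; r) = 1 − h(e) for error rate e:

```python
import math

def h2(p):
    """Binary entropy function h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# For a uniform 1-bit message over a binary symmetric channel with error rate e,
# H(r) = 1 and H(r | m) = h2(e), so I(m; r) = 1 - h2(e).
for e in (0.25, 0.50, 0.10):
    print(e, 1 - h2(e))
# 0.25 -> ~0.189, 0.50 -> 0.0 (no information gets through), 0.10 -> ~0.531
```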
Shannon entropy: properties

1. Additivity: if X and Y are independent, then H(X, Y) = H(X) + H(Y). If you are uncertain about two completely different questions, the entropy is the sum of the uncertainties for each question.
2. If X is uniformly distributed, then H(X) increases when the size of the alphabet increases: the more possibilities, the more uncertainty.
3. Continuity: h(p_1, p_2, ...) is continuous in each p_i.
Shannon entropy is a measure in mathematical terms.
What the Shannon entropy tells us

Consider a message X of entropy k = H(X) (in bits).
The average size of a file F describing X is at least k bits.
If the size of F is exactly k bits on average, then we have found a perfect compression of X, and each bit of F contains one bit of information on average.
A trivial example

A single bit may convey more than one bit's worth of content, e.g. image compression:
0: Mona Lisa
10: Lenna
110: Baboon
11100: Peppers
11110: F-16
11101: Che Guevara
11111...: other images
However, on average, the maximum information in one bit is one bit (and most of the time it is less).
The example is based on Huffman coding.
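To see why such a code is efficient, the sketch below compares the average codeword length with the entropy for a hypothetical probability distribution that I have chosen to match the codeword lengths above (the probabilities are assumptions, not from the lecture, and the small "other images" mass is omitted):

```python
import math

code = {                       # image -> codeword, as on the slide
    "Mona Lisa": "0", "Lenna": "10", "Baboon": "110",
    "Peppers": "11100", "F-16": "11110", "Che Guevara": "11101",
}
# Assumed probabilities: powers of 1/2 matching the codeword lengths.
prob = {"Mona Lisa": 1/2, "Lenna": 1/4, "Baboon": 1/8,
        "Peppers": 1/32, "F-16": 1/32, "Che Guevara": 1/32}

avg_len = sum(prob[x] * len(code[x]) for x in code)
H = -sum(p * math.log2(p) for p in prob.values())
print(avg_len, H)   # both ~1.84: with these probabilities the code is perfect
```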
Security
Cryptography

Alice encrypts the message m into the ciphertext c, which is sent to Bob and decrypted back to m; Eve observes c.
Eve seeks information about m by observing c.
If I(m; c) > 0, or if I(k; c) > 0 for the key k, then Eve succeeds in theory.
If H(m | c) = H(m), then the system is absolutely secure.
These are strong statements: even if Eve has information, I(m; c) > 0, she may be unable to make sense of it.
Steganalysis

Question: does Alice send secret information to Bob? Answer: X ∈ {yes, no}.
What is the uncertainty H(X)?
Eve intercepts a message S. Is there any information I(X; S)?
If H(X | S) = H(X), then the system is absolutely secure.
Prediction
Random sequences

Text is a sequence of random samples (letters): (l_1, l_2, l_3, ...) with l_i ∈ A = {A, B, ..., Z}.
Each letter has a probability distribution P(l), l ∈ A.
Statistical dependence (which implies redundancy): P(l_i | l_{i−1}) ≠ P(l_i).
H(l_i | l_{i−1}) < H(l_i): letter l_{i−1} contains information about l_i, so we can use this information to guess l_i.
The more letters l_{i−j}, ..., l_{i−1} we have seen, the more reliably we can predict l_i.
Wayner (Ch. 6.1) gives examples of first-, second-, ..., fifth-order prediction, using j = 0, 1, 2, 3, 4.
First-order prediction: example from Wayner.
Second-order prediction: example from Wayner.
Third-order prediction: example from Wayner.
Fourth-order prediction: example from Wayner.
Markov models

A Markov source is a sequence M_1, M_2, ... of stochastic (random) variables.
An n-th order Markov source is completely described by the probability distributions P[M_1, M_2, ..., M_n] and P[M_i | M_{i−n}, ..., M_{i−1}] (identical for all i).
This is a finite-state machine (automaton): the state of the source is the last n symbols M_{i−n}, ..., M_{i−1}, which determine the probability distribution of the next symbol.
The random texts from Wayner are generated using 1st-, 2nd-, 3rd-, and 4th-order Markov models, as in the sketch below.
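A minimal Python sketch of such an n-th order model, in the spirit of Wayner's examples (entirely my own code; the sample file name is a placeholder): it estimates the conditional distribution of the next letter from a sample text and then samples from it.

```python
import random
from collections import defaultdict, Counter

def train(text, n):
    """Estimate P(next symbol | previous n symbols) by counting (n+1)-grams."""
    model = defaultdict(Counter)
    for i in range(len(text) - n):
        model[text[i:i + n]][text[i + n]] += 1
    return model

def generate(model, n, length):
    """Generate random text from the n-th order model, starting from a random state."""
    out = random.choice(list(model))
    for _ in range(length):
        counts = model.get(out[-n:])
        if not counts:                      # dead end: no continuation seen in the sample
            break
        letters, weights = zip(*counts.items())
        out += random.choices(letters, weights=weights)[0]
    return out

sample = open("sample.txt").read()          # any natural-language sample text
print(generate(train(sample, 4), 4, 200))   # fourth-order random text
```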
A related example

A group of MIT students wrote software that generates random 'science' papers; one such random paper was accepted for WMSCI 2005.
You can generate your own paper on-line at http://pdos.csail.mit.edu/scigen/ and the source code (SCIgen) is available.
If you are brave, modify SCIgen for steganography as a poster topic, or maybe for your dissertation if you have a related topic you can tweak.
Compression
Huffman Coding
Compression

Let F* be the set of binary strings of arbitrary length.

Definition: a compression system is a function c : F* → F* such that E(length(m)) > E(length(c(m))) when m is drawn from F*. The compressed string is expected to be shorter than the original.

Definition: a compression c is perfect if E(length(c(m))) = H(m).

It follows from the definition that the compression is one-to-one.
Decompress any random string m, and c⁻¹(m) makes sense!
Huffman Coding

Short codewords for frequent symbols; long codewords for unusual symbols.
Each code bit should be equally probable.
Tree sketch: from the root, bit 0 leads to a leaf of probability 50%, while bit 1 leads to a node that splits into two leaves of 25% each.
Example

A larger Huffman tree: three leaves of probability 25% (codewords of length 2), one leaf of 12 1/2 % (length 3), and two smaller leaves below it (length 4).
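For reference, a compact Python sketch of the standard Huffman construction (my own illustration, not lecture code) that produces trees like the ones above from a symbol distribution:

```python
import heapq

def huffman(probs):
    """Return a prefix-free code (symbol -> bitstring) built by Huffman's algorithm."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    counter = len(heap)                      # tie-breaker so tuples never compare lists
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # two least probable subtrees
        p1, _, group1 = heapq.heappop(heap)
        for s in group0:
            code[s] = "0" + code[s]          # prepend the branch bit
        for s in group1:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p0 + p1, counter, group0 + group1))
        counter += 1
    return code

print(huffman({"a": 0.5, "b": 0.25, "c": 0.25}))   # {'a': '0', 'b': '10', 'c': '11'}
```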
Decoding

Huffman codes are prefix-free: no codeword is the prefix of another. This simplifies the decoding.
The prefix-free property is expressed in the Huffman tree: follow an edge for each coded bit; only a leaf node resolves to a message symbol.
When a message symbol is recovered, start over for the next symbol.
Ideal Huffman code

Each branch is equally likely: P(b_i | b_{i−1}, b_{i−2}, ...) = 1/2.
Maximum entropy: H(B_i | B_{i−1}, B_{i−2}, ...) = 1.
A uniform distribution of compressed files implies perfect compression.
In practice, the probabilities are rarely powers of 1/2, hence the Huffman code is imperfect.
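A quick numerical illustration of this imperfection, using a four-letter distribution and a matching Huffman code (the numbers are my own, chosen only for illustration):

```python
import math

probs = {"e": 0.4, "t": 0.3, "a": 0.2, "o": 0.1}       # not powers of 1/2
code  = {"e": "0", "t": "10", "a": "110", "o": "111"}   # an optimal Huffman code for probs

avg_len = sum(probs[s] * len(code[s]) for s in probs)   # 1.9 bits per symbol
H = -sum(p * math.log2(p) for p in probs.values())      # about 1.846 bits
print(avg_len, H)   # the average length slightly exceeds the entropy
```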
Huffman Steganography
Reverse Huffman

Core reading: Peter Wayner, Disappearing Cryptography, Ch. 6-7.
Use a Huffman code for each state in the Markov model.
Stegano-encoder: Huffman decompression. Stegano-decoder: Huffman compression.
Is this similar to Anderson & Petitcolas' steganography by perfect compression?
The Steganogram

The steganogram looks like random text.
Use a probability distribution based on sample text; higher-order statistics make it look more natural.
Fifth-order statistics are reasonable; higher orders will look more natural.
Example: fifth order

For each 5-tuple of letters A_0, A_1, A_2, A_3, A_4: let l_{i−4}, ..., l_i be consecutive letters in natural text and tabulate P(l_i = A_0 | l_{i−j} = A_j, j = 1, 2, 3, 4).
For each 4-tuple A_1, A_2, A_3, A_4, make an (approximate) Huffman code for A_0; we may omit some values of A_0, or have non-unique codewords.
We encode a message by Huffman decompression, using the Huffman code determined by the last four stegogramme symbols, and so obtain a fifth-order random text.
Example: fifth order (continued)

Consider the four preceding letters "comp". The next letter may be:

  next letter:   r     e     l     a     o
  probability:   40%   12%   22%   18%   8%
  combined:      52% (r, e)  22% (l)     26% (a, o)
  rounded:       50%         25%         25%

Probabilities are rounded to powers of 1/2; combining several letters reduces the rounding error.
The numbers in this example are arbitrary and fictitious.
Example: the Huffman code

Huffman code based on the fifth-order conditional probabilities: bit 0 leads to r or e; bits 10 lead to a or o; bits 11 lead to l.
When two letters are possible, choose one at random (according to their probabilities in natural text).
Decoding (compression) is still unique; encoding (decompression) is not unique.
This evens out the statistics in the stegogramme.
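A toy Python sketch of the reverse-Huffman idea (my own simplification: one fixed code instead of a separate code for every four-letter context) makes the encoder/decoder symmetry explicit:

```python
# Assumed toy code for a single context: letter -> codeword.
code = {"r": "0", "e": "10", "l": "11"}
decode = {v: k for k, v in code.items()}

def embed(bits):
    """Stegano-encoder = Huffman decompression: message bits -> stegotext letters."""
    out, word = [], ""
    for b in bits:
        word += b
        if word in decode:          # a complete codeword has been read
            out.append(decode[word])
            word = ""
    return "".join(out)

def extract(letters):
    """Stegano-decoder = Huffman compression: stegotext letters -> message bits."""
    return "".join(code[l] for l in letters)

msg = "010110"
stego = embed(msg)                     # 'relr'
print(stego, extract(stego) == msg)    # relr True
```

In Wayner's full scheme the code used at each step would depend on the four preceding stegogramme letters, as described above.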
Miscellanea
Synthesis by Grammar
Grammar

A grammar describes the structure of a language. A simple grammar:

sentence → noun verb
noun → Mr. Brown | Miss Scarlet
verb → eats | drinks

Each choice can map to a message symbol: 0: Mr. Brown, eats; 1: Miss Scarlet, drinks.
Two message symbols can thus be stego-encoded (see the sketch below).
No cover-text is input.
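A minimal sketch of this idea in Python (my own code; the convention that the first alternative encodes 0 and the second encodes 1 is an assumption): every binary production choice consumes one message bit, so this toy grammar carries two bits per sentence, and decoding amounts to parsing the sentence and reading the choices back.

```python
# Toy grammar mirroring the slide; terminals are any symbols not listed as keys.
grammar = {
    "sentence": [["noun", "verb"]],
    "noun": [["Mr. Brown"], ["Miss Scarlet"]],   # choice encodes one bit
    "verb": [["eats"], ["drinks"]],              # choice encodes one bit
}

def encode(symbol, bits):
    """Expand `symbol`, spending one bit of `bits` whenever a rule offers a choice."""
    if symbol not in grammar:                    # terminal word
        return [symbol]
    rules = grammar[symbol]
    rule = rules[int(bits.pop(0))] if len(rules) > 1 else rules[0]
    words = []
    for s in rule:
        words.extend(encode(s, bits))
    return words

bits = list("10")
print(" ".join(encode("sentence", bits)))        # Miss Scarlet eats
```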
A more complex grammar

sentence → noun verb addition
noun → Mr. Brown | Miss Scarlet | ... | Mrs. White
verb → eats | drinks | celebrates | ... | cooks
addition → addition term | ∅
term → on Monday | in March | with Mr. Green | ... | in Alaska | at home
general → sentence | question
question → Does noun verb addition ?
xgeneral → general | sentence , because sentence