Information theory
  1. Information theory
     • Information content of a message
       – a boolean value "true"/"false" can be encoded as one bit without losing information: 1/0
       – a direction up/down/right/left: 2 bits
       – the strings "AAAAAAAAAAAAAAAAAAAAAA" and "22*A" can be considered to have the same meaning, but different length
     • The information content of a string/message is measured by its entropy
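A minimal sketch of the fixed-width encodings above (my own illustration; the names and bit assignments are not from the slides):

```python
def encode_bool(b: bool) -> str:
    # A boolean carries exactly one bit of information.
    return "1" if b else "0"

# Four equally likely directions need log2(4) = 2 bits each.
DIRECTION_CODE = {"up": "00", "down": "01", "right": "10", "left": "11"}

def encode_directions(dirs):
    return "".join(DIRECTION_CODE[d] for d in dirs)

print(encode_bool(True))                    # 1
print(encode_directions(["up", "left"]))    # 0011
```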

  2. Entropy
     • For X = {x_1, ..., x_n} with associated probabilities p(x_1), ..., p(x_n) such that their sum is 1 and all are positive, the entropy is
       H(X) = −∑_{i=1..n} p(x_i) · log2(p(x_i))
     • The entropy of a message is higher if the probabilities are evenly distributed
       – booleans B with p(true) = p(false) = ½: H(B) = −(½·log2(½) + ½·log2(½)) = 1
       – X s.t. p(x_i) = 0 except p(x_k) = 1: H(X) = 0 (only one possible value: no info)
       – p(x_i) = 1/n for all i: H(X) = log2(n)
     • Fact: 0 ≤ H(X) ≤ log2(n), where |X| = n
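As a sanity check on the definition, a small Python sketch (the function name is my own) that reproduces the three cases above:

```python
import math

def entropy(probs):
    """H(X) = -sum_i p(x_i) * log2(p(x_i)), over the positive probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair boolean: 1.0
print(entropy([1.0]))         # one certain value: 0.0
print(entropy([0.25] * 4))    # uniform over n=4 values: log2(4) = 2.0
```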

  3. Entropy (cont)
     • The entropy is a measure of the uncertainty of the contents of, e.g., a message.
     • Higher entropy ⇒ more difficult to use, e.g., frequency analysis
     • Compression raises the entropy (per character) of a message ⇒ good to compress m before encryption
     • First lab tomorrow: Huffman encoding, a kind of compression (sketched below)
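Since the lab covers Huffman encoding, here is a minimal textbook-style construction in Python (an illustration only, not the lab's reference code):

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix-free code table {char: bitstring} from letter frequencies."""
    # Heap of (frequency, tiebreaker, partial code table).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

msg = "this is an example of huffman encoding"
table = huffman_code(msg)
print(len("".join(table[ch] for ch in msg)), "bits vs", 8 * len(msg), "fixed-width")
```

Frequent characters get short codes, so the average number of bits per character approaches the entropy of the letter distribution.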

  4. Redundancy
     • How much of a message can be discarded without losing information? The redundancy is D = R − r, where
       – r = H(X)/N is the rate of the language for messages of length N (the entropy per character; average info per character)
       – R = log2|X| is the absolute rate (the maximum info per character; maximum entropy)
       The redundancy ratio is D/R (how much can be discarded).
     • English: 26 chars ⇒ R ≈ 4.7; 1.0 ≤ r ≤ 1.5 (for large N) ⇒ 3.2 ≤ D ≤ 3.7 ⇒ between 68% and 79% redundant
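Plugging the slide's numbers for English into these formulas:

```python
import math

R = math.log2(26)            # absolute rate: ~4.70 bits/character
for r in (1.5, 1.0):         # estimated rate of English for large N
    D = R - r                # redundancy D = R - r
    print(f"r = {r}: D = {D:.1f}, ratio D/R = {D / R:.0%}")
# r = 1.5: D = 3.2, ratio D/R = 68%
# r = 1.0: D = 3.7, ratio D/R = 79%
```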

  5. Equivocation
     • With additional information, the uncertainty may be reduced
       – a random 32-bit integer has H(X) = 32, but if we learn that it is even, the uncertainty is reduced by 1 bit (worked below)
     • The equivocation H_Y(X) is the conditional entropy of X given Y
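The 32-bit example, worked numerically (my own illustration):

```python
import math

H_X = math.log2(2**32)         # uniform 32-bit integer: H(X) = 32.0 bits
H_X_even = math.log2(2**31)    # "X is even" leaves 2^31 equally likely values
print(H_X - H_X_even)          # 1.0 bit of uncertainty removed
```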

  6. Conditional probabilities
     • For Y ∈ {y_1, ..., y_m}, a probability distribution, let p_Y(X) be the conditional probability of X given Y – sometimes written p(X|Y)
     • and p(X,Y) = p_Y(X) · p(Y) the joint probability of X and Y
     • Perfect secrecy: iff p_M(C) = p(C) for all M
       – the probability of receiving C given that M was encrypted is the same as that of receiving C if some other M' was encrypted
       – requires that |K| ≥ |M|
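A quick numerical check of the perfect-secrecy condition for a one-character shift over Z_26 with a uniformly random key (a sketch under my own naming; here |K| = |M| = 26):

```python
from itertools import product

n = 26  # message, key, and ciphertext space: Z_26

def p_C_given_M(c, m):
    # Exactly one key k = (c - m) mod n maps m to c, and keys are uniform.
    return sum(1 for k in range(n) if (m + k) % n == c) / n

# p_M(C) = 1/26 for every pair, i.e. equal to p(C) regardless of M.
assert all(abs(p_C_given_M(c, m) - 1 / n) < 1e-12
           for c, m in product(range(n), repeat=2))
print("p_M(C) = p(C) = 1/26 for all M, C")
```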

  7. Equivocation (cont)
     • H_Y(X) = −∑_{X,Y} p(X,Y) · log2(p_Y(X))
     • H_Y(X) = ∑_{X,Y} p(X,Y) · log2(1/p_Y(X))
     • H_Y(X) = ∑_Y p(Y) · ∑_X p_Y(X) · log2(1/p_Y(X))
     • Note: H_Y(X) ≤ H(X) – extra knowledge of Y cannot increase the uncertainty of X
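The same formulas in code, applied to the parity example from slide 5 (function and variable names are my own):

```python
import math

def conditional_entropy(joint):
    """H_Y(X) = sum_{x,y} p(x,y) * log2(1/p_Y(x)), where p_Y(x) = p(x,y) / p(y).

    `joint` maps (x, y) -> p(x, y)."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # log2(1 / p_Y(x)) = log2(p(y) / p(x,y))
    return sum(p * math.log2(p_y[y] / p) for (x, y), p in joint.items() if p > 0)

# X uniform over {0,1,2,3}; Y = parity of X.
joint = {(x, x % 2): 0.25 for x in range(4)}
print(conditional_entropy(joint))   # H_Y(X) = 1.0
print(math.log2(4))                 # H(X)   = 2.0, so H_Y(X) <= H(X) as noted
```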

  8. Key equivocation
     • How uncertain is the key, given a cryptogram? H_C(K) – the key equivocation.
     • If H_C(K) = 0: no uncertainty, the cipher can be broken
     • Usually lim_{N→∞} H_C(K) = 0
       – i.e., the longer the message, the easier to break
     • H_C(K) is difficult to compute exactly, but can be approximated (see the toy model below)
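A toy model (entirely my own, not from the slides) that makes the limit visible: a 3-letter "language" with skewed i.i.d. letter frequencies, encrypted with a shift cipher over Z_3 under a uniform key. Computing H_C(K) exactly for growing message lengths N shows it falling toward 0:

```python
import math
from itertools import product

p_letter = {0: 0.7, 1: 0.2, 2: 0.1}   # skewed plaintext letter model
n = 3                                  # alphabet and key space: Z_3

def key_equivocation(N):
    total = 0.0
    for c in product(range(n), repeat=N):          # every possible ciphertext
        # p(c, k) = p(k) * p(the plaintext that key k decrypts c to)
        joint = [(1 / n) * math.prod(p_letter[(ci - k) % n] for ci in c)
                 for k in range(n)]
        p_c = sum(joint)
        # H_C(K) = sum_{c,k} p(c,k) * log2(p(c) / p(c,k))
        total += sum(p * math.log2(p_c / p) for p in joint if p > 0)
    return total

for N in (1, 2, 4, 8):
    print(N, round(key_equivocation(N), 3))  # shrinks toward 0 as N grows
```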

  9. Unicity distance
     • The unicity distance N_u is the smallest N such that H_C(K) is close to 0
       – the amount of ciphertext needed to uniquely determine the key
       – but finding the key may still be computationally infeasible
     • Can be approximated by H(K)/D for random ciphers (where, given c and k, D_k(c) is as likely to produce one cleartext as another)
     • Unconditional security:
       – if H_C(K) never approaches 0, even for large N

  10. Unicity distance (cont)
     • The DES algorithm encrypts 64 bits at a time with a 56-bit key
       – H(K) = 56, and D = 3.2 for English ⇒ N_u = 56/3.2 ≈ 17.5 characters (140 bits > 2·64)
       – but it takes a lot of effort...
     • Shift cipher, K = Z_26. Then H(K) = 4.7, D = 3.2, and N_u ≈ 1.5 characters!
       – but D = 3.2 only holds for long messages
       – and the shift cipher is a poor approximation of a random cipher
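The two estimates, computed directly with the slide's D = 3.2 bits/character:

```python
import math

D = 3.2                      # redundancy of English, bits per character

print(56 / D)                # DES: H(K) = 56       ->  N_u ≈ 17.5 characters
print(math.log2(26) / D)     # shift: H(K) ≈ 4.7    ->  N_u ≈ 1.5 characters
```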
