The Method of Types and Its Application to Information Hiding

Pierre Moulin
University of Illinois at Urbana-Champaign
www.ifp.uiuc.edu/~moulin/talks/eusipco05-slides.pdf

EUSIPCO, Antalya, September 7, 2005
Outline

• Part I: General Concepts
  – Introduction
  – Definitions
  – What is it useful for?
• Part II: Application to Information Hiding
  – Performance guarantees against an omnipotent attacker?
  – Steganography, Watermarking, Fingerprinting
Part I: General Concepts
Reference Materials

• I. Csiszár, “The Method of Types,” IEEE Trans. Information Theory, Oct. 1998 (commemorative Shannon issue)
• A. Lapidoth and P. Narayan, “Reliable Communication under Channel Uncertainty,” same issue
• Application areas:
  – capacity analyses
  – computation of error probabilities (exponential behavior)
  – universal coding/decoding
  – hypothesis testing
Basic Notation

• Discrete alphabets $\mathcal{X}$ and $\mathcal{Y}$
• Random variables $X, Y$ with joint pmf $p(x, y)$
• The entropy of $X$ is $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$ (will sometimes be denoted by $H(p_X)$)
• Joint entropy: $H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)$
• The conditional entropy of $Y$ given $X$ is
  $H(Y|X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x) = H(X, Y) - H(X)$
• The mutual information between $X$ and $Y$ is
  $I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} = H(Y) - H(Y|X)$
• The Kullback-Leibler divergence between pmf's $p$ and $q$ is
  $D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$
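For concreteness, a minimal Python sketch of these three quantities, computed from pmf's represented as dictionaries (function names are illustrative, logs in nats):

```python
import math

def entropy(p):
    """H(p) in nats; p is a dict mapping outcomes to probabilities."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def kl_divergence(p, q):
    """D(p || q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def mutual_information(pxy):
    """I(X; Y) from a joint pmf {(x, y): prob}, via I = D(p_XY || p_X p_Y)."""
    px, py = {}, {}
    for (x, y), v in pxy.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return kl_divergence(pxy, {(x, y): px[x] * py[y] for x in px for y in py})

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(entropy(pxy), mutual_information(pxy))   # joint entropy H(X,Y) and I(X;Y)
```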
Types

• Deterministic notion
• Given a length-$n$ sequence $\mathbf{x} \in \mathcal{X}^n$, count the frequency of occurrence of each letter of the alphabet $\mathcal{X}$
• Example: $\mathcal{X} = \{0, 1\}$, $n = 12$, $\mathbf{x} = 110100101110$ contains 5 zeroes and 7 ones
  ⇒ the sequence $\mathbf{x}$ has type $\hat{p}_\mathbf{x} = (\frac{5}{12}, \frac{7}{12})$
• $\hat{p}_\mathbf{x}$ is also called the empirical pmf. It may be viewed as a pmf over $\mathcal{X}$
• Each $\hat{p}_\mathbf{x}(x)$ is a multiple of $\frac{1}{n}$
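A one-function Python sketch (illustrative) that computes the type of a sequence and reproduces the example above:

```python
from collections import Counter
from fractions import Fraction

def empirical_type(x, alphabet):
    """Type (empirical pmf) of sequence x; each entry is an exact multiple of 1/n."""
    n = len(x)
    counts = Counter(x)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}

x = "110100101110"
print(empirical_type(x, alphabet="01"))
# {'0': Fraction(5, 12), '1': Fraction(7, 12)}
```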
Joint Types

• Given two length-$n$ sequences $\mathbf{x} \in \mathcal{X}^n$ and $\mathbf{y} \in \mathcal{Y}^n$, count the frequency of occurrence of each pair $(x, y) \in \mathcal{X} \times \mathcal{Y}$
• Example:
  $\mathbf{x} = 110100101110$
  $\mathbf{y} = 111100101110$
• $(\mathbf{x}, \mathbf{y})$ have joint type $\hat{p}_{\mathbf{xy}} = \begin{pmatrix} 4/12 & 1/12 \\ 0 & 7/12 \end{pmatrix}$
• Empirical pmf over $\mathcal{X} \times \mathcal{Y}$
Conditional Types

• By analogy with Bayes' rule, define the conditional type of $\mathbf{y}$ given $\mathbf{x}$ as
  $\hat{p}_{\mathbf{y}|\mathbf{x}}(y|x) = \frac{\hat{p}_{\mathbf{xy}}(x, y)}{\hat{p}_\mathbf{x}(x)}$
  which is an empirical conditional pmf
• Example:
  $\mathbf{x} = 110100101110$
  $\mathbf{y} = 111100101110$
  ⇒ $\hat{p}_{\mathbf{y}|\mathbf{x}} = \begin{pmatrix} 4/5 & 1/5 \\ 0 & 1 \end{pmatrix}$
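Continuing the sketch, joint and conditional types of the example pair (function names illustrative; Fraction reduces 4/12 to 1/3):

```python
from collections import Counter
from fractions import Fraction

def joint_type(x, y):
    """Empirical pmf of the pairs (x_i, y_i); unlisted pairs have probability 0."""
    n = len(x)
    return {pair: Fraction(c, n) for pair, c in Counter(zip(x, y)).items()}

def conditional_type(x, y):
    """Empirical conditional pmf: p_hat_{y|x}(y|x) = p_hat_xy(x, y) / p_hat_x(x)."""
    n = len(x)
    counts_x = Counter(x)
    return {(a, b): q / Fraction(counts_x[a], n) for (a, b), q in joint_type(x, y).items()}

x = "110100101110"
y = "111100101110"
print(joint_type(x, y))        # (0,0) -> 1/3, (0,1) -> 1/12, (1,1) -> 7/12
print(conditional_type(x, y))  # (0,0) -> 4/5, (0,1) -> 1/5, (1,1) -> 1
```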
Type Classes

• The type class $T_\mathbf{x}$ is the set of all sequences that have the same type as $\mathbf{x}$. Example: all sequences with 5 zeroes and 7 ones
• The joint type class $T_{\mathbf{xy}}$ is the set of all sequence pairs that have the same joint type as $(\mathbf{x}, \mathbf{y})$
• The conditional type class $T_{\mathbf{y}|\mathbf{x}}$ is the set of all sequences $\mathbf{y}'$ whose conditional type given $\mathbf{x}$ is the same as that of $\mathbf{y}$
Information Measures

• Any type may be represented by a dummy sequence
• Can define empirical information measures:
  $H(\mathbf{x}) \triangleq H(\hat{p}_\mathbf{x})$
  $H(\mathbf{y}|\mathbf{x}) \triangleq H(\hat{p}_{\mathbf{y}|\mathbf{x}})$
  $I(\mathbf{x}; \mathbf{y}) \triangleq I(X; Y)$ for $(X, Y) \sim \hat{p}_{\mathbf{xy}}$
• Will be useful to design universal decoders
Typicality

• Consider a pmf $p$ over $\mathcal{X}$
• Length-$n$ sequence $\mathbf{x} \sim$ i.i.d. $p$. Notation: $\mathbf{x} \sim p^n$
• Example: $\mathcal{X} = \{0, 1\}$, $n = 12$, $\mathbf{x} = 110100101110$
• For large $n$, all typical sequences have approximately composition $p$
• This can be measured in various ways:
  – Entropy $\epsilon$-typicality: $\left| -\frac{1}{n} \log p^n(\mathbf{x}) - H(X) \right| < \epsilon$
  – Strong $\epsilon$-typicality: $\max_{x \in \mathcal{X}} |\hat{p}_\mathbf{x}(x) - p(x)| < \epsilon$
  Both define sets of typical sequences
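A small Python sketch of the two typicality tests (illustrative; pmf's as dicts, logs in nats):

```python
import math
from collections import Counter

def is_entropy_typical(x, p, eps):
    """Entropy eps-typicality: | -(1/n) log p^n(x) - H(p) | < eps."""
    n = len(x)
    log_pn = sum(math.log(p[a]) for a in x)                      # log p^n(x)
    H = -sum(q * math.log(q) for q in p.values() if q > 0)
    return abs(-log_pn / n - H) < eps

def is_strongly_typical(x, p, eps):
    """Strong eps-typicality: max_a | p_hat_x(a) - p(a) | < eps."""
    n = len(x)
    counts = Counter(x)
    return max(abs(counts.get(a, 0) / n - p[a]) for a in p) < eps

p = {"0": 0.4, "1": 0.6}
x = "110100101110"
print(is_entropy_typical(x, p, eps=0.1), is_strongly_typical(x, p, eps=0.1))  # True True
```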
Application to Channel Coding

• Channel input $\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$, output $\mathbf{y} = (y_1, \cdots, y_n) \in \mathcal{Y}^n$
• Discrete Memoryless Channel (DMC): $p^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n p(y_i|x_i)$
• Many fundamental coding theorems can be proven using the concept of entropy typicality. Examples:
  – Shannon's coding theorem (capacity of a DMC)
  – Rate-distortion bound for memoryless sources
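A toy Python simulation of a DMC applied letter by letter (the binary symmetric channel parameters below are illustrative):

```python
import random

def dmc(x, p_y_given_x, seed=None):
    """Pass sequence x through a discrete memoryless channel:
    p^n(y|x) = prod_i p(y_i|x_i), sampled one letter at a time."""
    rng = random.Random(seed)
    out = []
    for xi in x:
        dist = p_y_given_x[xi]
        out.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return "".join(out)

# Binary symmetric channel with crossover probability 0.1
bsc = {"0": {"0": 0.9, "1": 0.1}, "1": {"1": 0.9, "0": 0.1}}
print(dmc("110100101110", bsc, seed=1))
```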
• Many fundamental coding theorems cannot be proved using the concept of entropy typicality. Examples:
  – precise calculations of error log-probability
  – various kinds of unknown channels
• So let's derive some useful facts about types
• Number of types $\leq (n+1)^{|\mathcal{X}|}$ (polynomial in $n$)
• Size of the type class $T_\mathbf{x}$:
  $(n+1)^{-|\mathcal{X}|} e^{nH(\hat{p}_\mathbf{x})} \leq |T_\mathbf{x}| \leq e^{nH(\hat{p}_\mathbf{x})}$
  Ignoring polynomial terms, we write $|T_\mathbf{x}| \doteq e^{nH(\hat{p}_\mathbf{x})}$
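These bounds can be checked exhaustively for small $n$; a brute-force Python sketch for the binary alphabet (illustrative):

```python
import math
from itertools import product
from collections import Counter

def check_type_class_bounds(n, alphabet="01"):
    """Verify (n+1)^{-|X|} e^{nH(p_hat)} <= |T_x| <= e^{nH(p_hat)} by enumeration."""
    classes = Counter()
    for seq in product(alphabet, repeat=n):                 # group sequences by type
        classes[tuple(sorted(Counter(seq).items()))] += 1
    assert len(classes) <= (n + 1) ** len(alphabet)         # polynomially many types
    for type_counts, size in classes.items():
        H = -sum((c / n) * math.log(c / n) for _, c in type_counts if c > 0)
        assert (n + 1) ** (-len(alphabet)) * math.exp(n * H) <= size <= math.exp(n * H)
    print("bounds hold for all types, n =", n)

check_type_class_bounds(8)
```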
• Probability of $\mathbf{x}$ under distribution $p^n$:
  $p^n(\mathbf{x}) = \prod_{x \in \mathcal{X}} p(x)^{n \hat{p}_\mathbf{x}(x)} = e^{-n \sum_{x \in \mathcal{X}} \hat{p}_\mathbf{x}(x) \log \frac{1}{p(x)}} = e^{-n [H(\hat{p}_\mathbf{x}) + D(\hat{p}_\mathbf{x} \| p)]}$
  — the same for all $\mathbf{x}$ in the same type class
• Probability of the type class $T_\mathbf{x}$ under distribution $p^n$:
  $P^n(T_\mathbf{x}) = |T_\mathbf{x}| \, p^n(\mathbf{x}) \doteq e^{-n D(\hat{p}_\mathbf{x} \| p)}$
• Similarly:
  $|T_{\mathbf{y}|\mathbf{x}}| \doteq e^{nH(\hat{p}_{\mathbf{y}|\mathbf{x}})}$
  $P^n_{Y|X}(T_{\mathbf{y}|\mathbf{x}} \mid \mathbf{x}) \doteq e^{-n D(\hat{p}_{\mathbf{xy}} \| p_{Y|X} \hat{p}_\mathbf{x})}$
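A numerical check of the type-class probability estimate for a Bernoulli source (parameters illustrative):

```python
import math
from math import comb

def binary_kl(q, p):
    """D(Bernoulli(q) || Bernoulli(p)) in nats."""
    d = 0.0
    if q > 0:
        d += q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d

def type_class_prob(k, n, p1):
    """Exact P^n(T_x) for the type with k ones vs. the estimate e^{-n D(p_hat || p)}."""
    exact = comb(n, k) * p1**k * (1 - p1)**(n - k)
    approx = math.exp(-n * binary_kl(k / n, p1))
    return exact, approx

print(type_class_prob(k=70, n=100, p1=0.5))
# roughly (2.3e-05, 2.7e-04): the exponents agree; the gap is the polynomial factor
```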
Constant-Composition Codes

• All codewords have the same type $\hat{p}_\mathbf{x}$
• Random coding: generate codewords $\mathbf{x}_m$, $m \in \mathcal{M}$, randomly and independently from the uniform pmf on the type class $T_\mathbf{x}$
• Note that the channel outputs have different types in general
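One way to generate such a codebook: shuffle a fixed-composition word, which yields a uniform draw from the type class (Python sketch; names illustrative):

```python
import random

def constant_composition_codebook(num_codewords, composition, seed=0):
    """Draw codewords i.i.d. uniformly over the type class with the given composition,
    e.g. composition = {'0': 5, '1': 7} for the type (5/12, 7/12)."""
    rng = random.Random(seed)
    base = [a for a, count in composition.items() for _ in range(count)]
    codebook = []
    for _ in range(num_codewords):
        word = base[:]
        rng.shuffle(word)              # uniform random permutation of the fixed composition
        codebook.append("".join(word))
    return codebook

print(constant_composition_codebook(4, {"0": 5, "1": 7}))
```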
Unknown DMC's – Universal Codes

• Channel $p_{Y|X}$ is revealed neither to the encoder nor to the decoder
  ⇒ neither the encoding rule nor the decoding rule may depend on $p_{Y|X}$
  $C = \max_{p_X} \min_{p_{Y|X}} I(X; Y)$
• Universal codes: same error exponent as in the known-$p_{Y|X}$ case (existence?)
• Encoder: select $T_\mathbf{x}$, use constant-composition codes
• Decoder: uses the Maximum Mutual Information (MMI) rule
  $\hat{m} = \arg\max_{m \in \mathcal{M}} I(\mathbf{x}_m; \mathbf{y}) = \arg\min_{m \in \mathcal{M}} H(\mathbf{y}|\mathbf{x}_m)$
• Note: the GLRT decoder is in general not universal (GLRT: first estimate $p_{Y|X}$, then plug into the ML decoding rule)
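A Python sketch of the MMI rule using the empirical information measures defined earlier (the codebook and received word below are illustrative):

```python
import math
from collections import Counter

def empirical_mutual_information(x, y):
    """I(x; y) computed from the joint type (in nats)."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mmi_decode(codebook, y):
    """Maximum Mutual Information decoder: argmax_m I(x_m; y)."""
    return max(range(len(codebook)), key=lambda m: empirical_mutual_information(codebook[m], y))

codebook = ["110100101110", "011011010011", "101010101010"]
y = "111100101110"
print(mmi_decode(codebook, y))   # 0: the first codeword has the largest empirical MI with y
```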
Key idea in proof

• Denote by $D_m \subset \mathcal{Y}^n$ the decoding region for message $m$
• Polynomial number of type classes, forming a partition of $\mathcal{Y}^n$
• Given that $m$ was transmitted, partition the error event $\mathbf{y} \in \mathcal{Y}^n \setminus D_m$ into a union over conditional type classes:
  $\bigcup_{T_{\mathbf{y}|\mathbf{x}_m}} \left( T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \right)$
• The probability of the error event is therefore given by
  $\Pr[\text{error} \mid m] = \Pr\left[ \bigcup_{T_{\mathbf{y}|\mathbf{x}_m}} \left( T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \right) \right] \leq \sum_{T_{\mathbf{y}|\mathbf{x}_m}} \Pr\left[ T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \right] \doteq \max_{T_{\mathbf{y}|\mathbf{x}_m}} \Pr\left[ T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \right]$
  Since all sequences in a conditional type class are equiprobable,
  $\Pr\left[ T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m \right] = \Pr\left[ T_{\mathbf{y}|\mathbf{x}_m} \right] \frac{|T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m|}{|T_{\mathbf{y}|\mathbf{x}_m}|}$
  and therefore
  $\Pr[\text{error} \mid m] \doteq \max_{T_{\mathbf{y}|\mathbf{x}_m}} e^{-n D(\hat{p}_{\mathbf{x}_m \mathbf{y}} \| p_{Y|X} \hat{p}_{\mathbf{x}_m})} \, \frac{|T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m|}{|T_{\mathbf{y}|\mathbf{x}_m}|}$
  ⇒ the worst conditional type class dominates the error probability
• The calculation mostly involves combinatorics: finding out $|T_{\mathbf{y}|\mathbf{x}_m} \setminus D_m|$
Extensions

• Channels with memory
• “Arbitrarily Varying” Channels ⇒ randomized codes
• Continuous alphabets (difficult!)
Part II: Applications to WM
Reference Materials

[SM'03] A. Somekh-Baruch and N. Merhav, “On the Error Exponent and Capacity Games of Private Watermarking Systems,” IEEE Trans. Information Theory, March 2003
[SM'04] A. Somekh-Baruch and N. Merhav, “On the Capacity Game of Public Watermarking Systems,” IEEE Trans. Information Theory, March 2004
[MO'03] P. Moulin and J. O'Sullivan, “Information-Theoretic Analysis of Information Hiding,” IEEE Trans. Information Theory, March 2003
[MW'04] P. Moulin and Y. Wang, “Error Exponents for Channel Coding with Side Information,” preprint, Sep. 2004
Communication Model for Data Hiding

[Block diagram: message $m$, host $\mathbf{s}$, and secret key $\mathbf{k}$ enter the encoder $\mathbf{x} = f(\mathbf{s}, m, \mathbf{k})$; the attack channel maps $\mathbf{x}$ to $\mathbf{y}$; the decoder $\hat{M} = g(\mathbf{y}, \mathbf{k})$ recovers the message.]

• Memoryless host sequence $\mathbf{s}$
• Message $M$ uniformly distributed over $\{1, 2, \cdots, 2^{nR}\}$
• Unknown attack channel $p(\mathbf{y}|\mathbf{x})$
• Randomization via secret key sequence $\mathbf{k}$, arbitrary alphabet $\mathcal{K}$
Attack Channel Model

• First IT formulations of this problem assumed a fixed attack channel (e.g., AWGN) or a family of memoryless channels (1998–1999)
• The memoryless assumption was later relaxed (2001)
• We'll just require the following distortion constraint:
  $d^n(\mathbf{x}, \mathbf{y}) \triangleq \sum_{i=1}^n d(x_i, y_i) \leq D_2 \quad \forall \mathbf{x}, \mathbf{y}$ (w.p. 1)
  ⇒ unknown channel with arbitrary memory
• Similarly, the following embedding constraint will be assumed:
  $d^n(\mathbf{s}, \mathbf{x}) \leq D_1 \quad \forall \mathbf{s}, \mathbf{k}, m, \mathbf{x}$ (w.p. 1)
Data-Hiding Capacity [SM'04]

• Single-letter formula:
  $C(D_1, D_2) = \sup_{p(x,u|s) \in \mathcal{Q}(D_1)} \; \min_{p(y|x) \in \mathcal{A}(D_2)} \; [I(U; Y) - I(U; S)]$
  where $U$ is an auxiliary random variable,
  $\mathcal{Q}(D_1) = \{ p_{XU|S} : \sum_{x,u,s} p(x, u|s) \, p(s) \, d(s, x) \leq D_1 \}$
  $\mathcal{A}(D_2) = \{ p_{Y|X} : \sum_{x,y} p(y|x) \, p(x) \, d(x, y) \leq D_2 \}$
• Same capacity formula as in [MO'03], where $p(\mathbf{y}|\mathbf{x})$ was constrained to belong to the family $\mathcal{A}^n(D_2)$ of memoryless channels
• Why?