Coding Theorems for Reversible Embedding
Frans Willems and Ton Kalker
T.U. Eindhoven, Philips Research
DIMACS, March 16-19, 2003
Outline
1. Gelfand-Pinsker coding theorem
2. Noise-free embedding
3. Reversible embedding
4. Robust and reversible embedding
5. Partially reversible embedding
6. Remarks
I. The Gelfand-Pinsker Coding Theorem

[Block diagram: the encoder maps the message and the side information to the channel input, $Y^N = e(W, X^N)$; the channel $P_c(z|y,x)$ produces $Z^N$; the decoder forms $\hat{W} = d(Z^N)$. The i.i.d. side-information sequence $X^N \sim P_s$ is known to the encoder but not to the decoder, and also acts on the channel.]

Messages: $\Pr\{W = w\} = 1/M$ for $w \in \{1, 2, \ldots, M\}$.
Side information: $\Pr\{X^N = x^N\} = \prod_{n=1}^{N} P_s(x_n)$ for $x^N \in \mathcal{X}^N$.
Channel: discrete memoryless $\{\mathcal{Y} \times \mathcal{X}, P_c(z|y,x), \mathcal{Z}\}$.
Error probability: $P_E = \Pr\{\hat{W} \neq W\}$.
Rate: $R = \frac{1}{N}\log_2(M)$.
Capacity

The side-information capacity $C_{si}$ is the largest $\rho$ such that, for all $\epsilon > 0$ and all large enough $N$, there exist encoders and decoders with $R \geq \rho - \epsilon$ and $P_E \leq \epsilon$.

THEOREM (Gelfand-Pinsker [1980]):
$$C_{si} = \max_{P_t(u,y|x)} \; I(U;Z) - I(U;X). \qquad (1)$$

Achievability proof: Fix a test channel $P_t(u,y|x)$. Consider sets $A_\epsilon(\cdot)$ of strongly typical sequences, etc.
(a) For each message index $w \in \{1, \ldots, 2^{NR}\}$, generate $2^{N R_u}$ sequences $u^N$ at random according to $P(u) = \sum_{x,y} P_s(x) P_t(u,y|x)$. Give these sequences the label $w$.
(b) When message index $w$ has to be transmitted, choose a sequence $u^N$ with label $w$ such that $(u^N, x^N) \in A_\epsilon(U,X)$. Such a sequence exists almost always if $R_u > I(U;X)$ (roughly).
(c) The channel input sequence $y^N$ results from applying the "channel" $P(y|u,x) = P_t(u,y|x) / \sum_{y'} P_t(u,y'|x)$ to $u^N$ and $x^N$. Then $y^N$ is transmitted.
(d) The decoder, upon receiving $z^N$, looks for the unique sequence $u^N$ such that $(u^N, z^N) \in A_\epsilon(U,Z)$. If $R + R_u < I(U;Z)$ (roughly), such a unique sequence exists. The message index is the label of this $u^N$.

Conclusion: every $R < I(U;Z) - I(U;X)$ is achievable.

Observations
A: As an intermediate result the decoder recovers the sequence $u^N$.
B: The transmitted $u^N$ is jointly typical with the side-information sequence $x^N$, i.e. $(u^N, x^N) \in A_\epsilon(U,X)$, thus their joint composition is OK. Note that $P(u,x) = \sum_y P_s(x) P_t(u,y|x)$.
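As an illustration (not part of the original slides), here is a minimal Python sketch that evaluates the Gelfand-Pinsker rate $I(U;Z) - I(U;X)$ for a single fixed test channel on small finite alphabets; maximizing this quantity over all $P_t(u,y|x)$ would give $C_{si}$ of Eq. (1). The array layouts, function names, and the toy binary example are assumptions made for the sketch.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits, computed from a joint probability matrix p_ab[a, b]."""
    pa = p_ab.sum(axis=1, keepdims=True)
    pb = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (pa @ pb)[mask])))

def gp_rate(P_s, P_t, P_c):
    """Gelfand-Pinsker rate I(U;Z) - I(U;X) for one fixed test channel.

    P_s[x]       : side-information distribution
    P_t[u, y, x] : test channel P_t(u, y | x)
    P_c[z, y, x] : channel P_c(z | y, x)
    """
    U, Y, X = P_t.shape
    Z = P_c.shape[0]
    # Joint distribution P(u, x, y, z) = P_s(x) P_t(u, y | x) P_c(z | y, x).
    joint = np.zeros((U, X, Y, Z))
    for u in range(U):
        for x in range(X):
            for y in range(Y):
                for z in range(Z):
                    joint[u, x, y, z] = P_s[x] * P_t[u, y, x] * P_c[z, y, x]
    p_uz = joint.sum(axis=(1, 2))   # marginal P(u, z)
    p_ux = joint.sum(axis=(2, 3))   # marginal P(u, x)
    return mutual_information(p_uz) - mutual_information(p_ux)

# Hypothetical toy check: binary alphabets, noiseless channel Z = Y, and U = Y.
P_s = np.array([0.8, 0.2])
d0, d1 = 0.1, 0.0
P_t = np.zeros((2, 2, 2))           # P_t[u, y, x], with U glued to Y
P_t[0, 0, 0], P_t[1, 1, 0] = 1 - d0, d0
P_t[1, 1, 1], P_t[0, 0, 1] = 1 - d1, d1
P_c = np.zeros((2, 2, 2))
for y in range(2):
    P_c[y, y, :] = 1.0              # Z = Y with probability 1
print(gp_rate(P_s, P_t, P_c))       # equals H(Y|X), approx. 0.375 bit here
```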
II. Noise-free Embedding

[Block diagram: the embedder maps the message and the host sequence to $Y^N = e(W, X^N)$; the decoder observes $Y^N$ directly (no channel) and forms $\hat{W} = d(Y^N)$.]

Messages: $\Pr\{W = w\} = 1/M$ for $w \in \{1, 2, \ldots, M\}$.
Source (host): $\Pr\{X^N = x^N\} = \prod_{n=1}^{N} P_s(x_n)$ for $x^N \in \mathcal{X}^N$.
Error probability: $P_E = \Pr\{\hat{W} \neq W\}$.
Rate: $R = \frac{1}{N}\log_2(M)$.
Embedding distortion: $D_{xy} = E\left[\frac{1}{N}\sum_{n=1}^{N} D_{xy}(X_n, e_n(W, X^N))\right]$ for some distortion matrix $\{D_{xy}(x,y), x \in \mathcal{X}, y \in \mathcal{Y}\}$.
Achievable region for noise-free embedding

A rate-distortion pair $(\rho, \Delta_{xy})$ is said to be achievable if for all $\epsilon > 0$ there exist, for all large enough $N$, encoders and decoders such that
$$R \geq \rho - \epsilon, \quad D_{xy} \leq \Delta_{xy} + \epsilon, \quad P_E \leq \epsilon.$$

THEOREM (Chen [2000], Barron [2000]): The set of achievable rate-distortion pairs is equal to $\mathcal{G}_{nfe}$, defined as
$$\mathcal{G}_{nfe} = \Big\{(\rho, \Delta_{xy}) : 0 \leq \rho \leq H(Y|X), \;\; \Delta_{xy} \geq \sum_{x,y} P(x,y) D_{xy}(x,y), \text{ for } P(x,y) = P_s(x) P_t(y|x)\Big\}. \qquad (2)$$

Again $\{\mathcal{X}, P_t(y|x), \mathcal{Y}\}$ is called the test channel.
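The following short Python sketch (my own illustration, not from the slides) evaluates one point of $\mathcal{G}_{nfe}$: for a chosen test channel $P_t(y|x)$ it returns the maximal rate $H(Y|X)$ and the associated expected distortion. The binary host with $p_x = 0.2$ and the Hamming distortion matrix are hypothetical example values.

```python
import numpy as np

def nfe_point(P_s, P_t, D):
    """One point (rate, distortion) of G_nfe for a given test channel.

    P_s[x]    : host distribution
    P_t[x, y] : test channel P_t(y | x)
    D[x, y]   : distortion matrix D_xy(x, y)
    """
    # Rate: H(Y|X) = - sum_{x,y} P_s(x) P_t(y|x) log2 P_t(y|x).
    P_xy = P_s[:, None] * P_t
    mask = P_xy > 0
    rate = float(-np.sum(P_xy[mask] * np.log2(P_t[mask])))
    # Distortion: sum_{x,y} P(x, y) D_xy(x, y).
    dist = float(np.sum(P_xy * D))
    return rate, dist

# Hypothetical binary example with Hamming distortion (p_x = 0.2).
P_s = np.array([0.8, 0.2])
P_t = np.array([[0.9, 0.1],
                [0.0, 1.0]])
D = np.array([[0, 1],
              [1, 0]])
print(nfe_point(P_s, P_t, D))   # rate approx. 0.375 bit, distortion 0.08
```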
Proof:

Achievability: In the Gelfand-Pinsker achievability proof, note that $Z = Y$ (noiseless channel) and take the auxiliary random variable $U = Y$. Then $(x^N, y^N) \in A_\epsilon(X,Y)$, hence $D_{xy}$ is OK. For the embedding rate we obtain
$$R = I(U;Z) - I(U;X) = I(Y;Y) - I(Y;X) = H(Y|X).$$

Converse, rate part:
$$\begin{aligned}
\log_2(M) &\leq H(W) - H(W|\hat{W}) + \text{Fano term} \\
&\leq H(W|X^N) - H(W|X^N, Y^N) + \text{Fano term} \\
&= I(W; Y^N | X^N) + \text{Fano term} \\
&= H(Y^N | X^N) + \text{Fano term} \\
&\leq \sum_{n=1}^{N} H(Y_n | X_n) + \text{Fano term} \\
&\leq N H(Y|X) + \text{Fano term},
\end{aligned}$$
where $X$ and $Y$ are random variables with
$$\Pr\{(X,Y) = (x,y)\} = \frac{1}{N} \sum_{n=1}^{N} \Pr\{(X_n, Y_n) = (x,y)\},$$
for $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Note that $\Pr\{X = x\} = P_s(x)$ for $x \in \mathcal{X}$.

Distortion part:
$$D_{xy} = \sum_{x^N, y^N} \Pr\{(X^N, Y^N) = (x^N, y^N)\} \, \frac{1}{N} \sum_{n} D_{xy}(x_n, y_n) = \sum_{x,y} \Pr\{(X,Y) = (x,y)\} D_{xy}(x,y).$$

Let $P_E \downarrow 0$, etc.
III. Reversible Embedding

[Block diagram: the embedder maps the message and the host sequence to $Y^N = e(W, X^N)$; the decoder observes $Y^N$ and forms $(\hat{W}, \hat{X}^N) = d(Y^N)$, i.e. it must reconstruct the host as well.]

Messages: $\Pr\{W = w\} = 1/M$ for $w \in \{1, 2, \ldots, M\}$.
Source (host): $\Pr\{X^N = x^N\} = \prod_{n=1}^{N} P_s(x_n)$ for $x^N \in \mathcal{X}^N$.
Error probability: $P_E = \Pr\{\hat{W} \neq W \vee \hat{X}^N \neq X^N\}$.
Rate: $R = \frac{1}{N}\log_2(M)$.
Embedding distortion: $D_{xy} = E\left[\frac{1}{N}\sum_{n=1}^{N} D_{xy}(X_n, e_n(W, X^N))\right]$ for some distortion matrix $\{D_{xy}(x,y), x \in \mathcal{X}, y \in \mathcal{Y}\}$.

Inspired by Fridrich, Goljan, and Du, "Lossless data embedding for all image formats," Proc. SPIE, Security and Watermarking of Multimedia Contents, San Jose, CA, 2002.
Achievable region for reversible embedding

A rate-distortion pair $(\rho, \Delta_{xy})$ is said to be achievable if for all $\epsilon > 0$ there exist, for all large enough $N$, encoders and decoders such that
$$R \geq \rho - \epsilon, \quad D_{xy} \leq \Delta_{xy} + \epsilon, \quad P_E \leq \epsilon.$$

RESULT (Kalker-Willems [2002]): The set of achievable rate-distortion pairs is equal to $\mathcal{G}_{re}$, defined as
$$\mathcal{G}_{re} = \Big\{(\rho, \Delta_{xy}) : 0 \leq \rho \leq H(Y) - H(X), \;\; \Delta_{xy} \geq \sum_{x,y} P(x,y) D_{xy}(x,y), \text{ for } P(x,y) = P_s(x) P_t(y|x)\Big\}. \qquad (3)$$

Note that $\{\mathcal{X}, P_t(y|x), \mathcal{Y}\}$ is the test channel.
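A companion Python sketch (again my own illustration, reusing the same hypothetical binary test channel as in the noise-free case) evaluates one point of $\mathcal{G}_{re}$. The reversible rate $H(Y) - H(X)$ is smaller than the noise-free rate $H(Y|X)$ by exactly $H(X|Y)$, the price paid for host recovery.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def re_point(P_s, P_t, D):
    """One point (rate, distortion) of G_re: rate = H(Y) - H(X)."""
    P_xy = P_s[:, None] * P_t          # joint P(x, y), with P_t[x, y] = P_t(y | x)
    rate = entropy(P_xy.sum(axis=0)) - entropy(P_s)
    dist = float(np.sum(P_xy * D))
    return rate, dist

# Same hypothetical binary test channel as in the noise-free sketch.
P_s = np.array([0.8, 0.2])
P_t = np.array([[0.9, 0.1],
                [0.0, 1.0]])
D = np.array([[0, 1],
              [1, 0]])
print(re_point(P_s, P_t, D))   # rate = h(0.28) - h(0.2), approx. 0.134; distortion 0.08
```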
Proof:

Achievability: In the Gelfand-Pinsker achievability proof, note that $Z = Y$ (noiseless channel) and take the auxiliary random variable $U = [X, Y]$. Then $x^N$ can be reconstructed by the decoder, and $(x^N, y^N) \in A_\epsilon(X,Y)$, hence $D_{xy}$ is OK. For the embedding rate we obtain
$$R = I(U;Z) - I(U;X) = I([X,Y];Y) - I([X,Y];X) = H(Y) - H(X).$$

Converse, rate part:
$$\begin{aligned}
\log_2(M) &\leq H(W) - H(W, X^N | \hat{W}, \hat{X}^N) + \text{Fano term} \\
&= H(W, X^N) - H(W, X^N | \hat{W}, \hat{X}^N) - H(X^N) + \text{Fano term} \\
&\leq H(W, X^N) - H(W, X^N | Y^N, \hat{W}, \hat{X}^N) - H(X^N) + \text{Fano term} \\
&= I(W, X^N; Y^N) - H(X^N) + \text{Fano term} \\
&= H(Y^N) - H(X^N) + \text{Fano term} \\
&\leq \sum_{n=1}^{N} [H(Y_n) - H(X_n)] + \text{Fano term} \\
&\leq N[H(Y) - H(X)] + \text{Fano term},
\end{aligned}$$
where $X$ and $Y$ are random variables with
$$\Pr\{(X,Y) = (x,y)\} = \frac{1}{N} \sum_{n=1}^{N} \Pr\{(X_n, Y_n) = (x,y)\},$$
for $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Note that $\Pr\{X = x\} = P_s(x)$ for $x \in \mathcal{X}$.

Distortion part:
$$D_{xy} = \sum_{x^N, y^N} \Pr\{(X^N, Y^N) = (x^N, y^N)\} \, \frac{1}{N} \sum_{n} D_{xy}(x_n, y_n) = \sum_{x,y} \Pr\{(X,Y) = (x,y)\} D_{xy}(x,y).$$

Let $P_E \downarrow 0$, etc.
Example: Binary source, Hamming distortion

[Test channel diagram: a binary asymmetric channel from $X$ to $Y$ with crossover probabilities $d_0 = P_t(1|0)$ and $d_1 = P_t(0|1)$; the host has $\Pr\{X = 1\} = p_x$ and the composite has $\Pr\{Y = 1\} = p_y$.]

Since
$$\Delta_{xy} \geq p_x d_1 + (1 - p_x) d_0 \quad \text{and} \quad p_y = p_x(1 - d_1) + (1 - p_x) d_0,$$
we can write
$$p_y \leq \Delta_{xy} + p_x(1 - 2 d_1) \leq \Delta_{xy} + p_x.$$

Assume w.l.o.g. that $p_x \leq 1/2$. First let $\Delta_{xy}$ be such that $\Delta_{xy} + p_x \leq 1/2$, i.e. $\Delta_{xy} \leq 1/2 - p_x$. Then we have
$$p_y \leq \Delta_{xy} + p_x \leq 1/2,$$
and hence
$$\rho \leq h(p_y) - h(p_x) \leq h(p_x + \Delta_{xy}) - h(p_x).$$

However, $\rho = h(p_x + \Delta_{xy}) - h(p_x)$ is achievable with distortion $\Delta_{xy}$ by taking
$$d_1 = 0 \quad \text{and} \quad d_0 = \frac{\Delta_{xy}}{1 - p_x}.$$

Note that the test channel is not symmetric and that
$$d_0 = \frac{\Delta_{xy}}{1 - p_x} \leq \frac{1/2 - p_x}{1 - p_x} \leq 1/2.$$

For $\Delta_{xy} + p_x \geq 1/2$ the rate is bounded as $\rho \leq 1 - h(p_x)$, and this rate is also achievable.
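For this binary/Hamming example the boundary of $\mathcal{G}_{re}$ derived above can be evaluated directly; the sketch below (my own, with hypothetical parameter values) implements $\rho(\Delta_{xy}) = h(p_x + \Delta_{xy}) - h(p_x)$ for $\Delta_{xy} \leq 1/2 - p_x$ and the saturation value $1 - h(p_x)$ beyond.

```python
import numpy as np

def h(p):
    """Binary entropy function in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def re_boundary(p_x, delta):
    """Maximum reversible embedding rate at distortion delta (assumes p_x <= 1/2)."""
    if delta <= 0.5 - p_x:
        return h(p_x + delta) - h(p_x)
    return 1.0 - h(p_x)

# Hypothetical values with p_x = 0.2.
print(re_boundary(0.2, 0.08))   # h(0.28) - h(0.2); test channel d1 = 0, d0 = delta / (1 - p_x)
print(re_boundary(0.2, 0.40))   # saturated at 1 - h(0.2), approx. 0.278
```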
Plot of rate-distortion region $\mathcal{G}_{re}$

[Figure: boundary of $\mathcal{G}_{re}$; horizontal axis $\Delta_{xy}$ (distortion), vertical axis $\rho$ (rate in bits), for $p_x = 0.2$. Maximum embedding rate $1 - h(0.2) \approx 0.278$.]
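The plot can be reproduced with a few lines of matplotlib; this is a sketch assuming the boundary formula derived on the previous slides, not the original plotting code.

```python
import numpy as np
import matplotlib.pyplot as plt

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p <= 0 or p >= 1 else float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

p_x = 0.2
deltas = np.linspace(0.0, 0.5, 501)
rates = [h(min(p_x + d, 0.5)) - h(p_x) for d in deltas]   # boundary of G_re

plt.plot(deltas, rates)
plt.xlabel('DISTORTION')
plt.ylabel('RATE in BITS')
plt.title('Rate-distortion region G_re, p_x = 0.2')
plt.show()
```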
Another perspective

[Diagram: a blocked system; in block $k$ the composite $y^N(k)$ is formed from the host $x^N(k)$ and message part $w(k)$, and similarly in block $k+1$.]

Consider a blocked system with blocks of length $N$. In block $k$, message bits can be embedded noise-free at rate $H(Y|X)$ with the corresponding distortion. Then in block $k+1$, message bits are embedded that allow for reconstruction of $x^N(k)$ given $y^N(k)$. This requires $N H(X|Y)$ bits. Therefore the resulting embedding rate is
$$R = H(Y|X) - H(X|Y) = H(Y,X) - H(X) - H(X|Y) = H(Y) - H(X).$$
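A small numeric check of this two-block accounting (my own sketch, using the same hypothetical binary test channel as earlier): the gross noise-free rate $H(Y|X)$ minus the $H(X|Y)$ bits per symbol spent on restoring the previous host block indeed equals $H(Y) - H(X)$.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector or matrix."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint P(x, y) from P_s and a test channel P_t[x, y].
P_s = np.array([0.8, 0.2])
P_t = np.array([[0.9, 0.1],
                [0.0, 1.0]])
P_xy = P_s[:, None] * P_t

H_x, H_y, H_xy = entropy(P_s), entropy(P_xy.sum(axis=0)), entropy(P_xy)
H_y_given_x = H_xy - H_x   # gross rate embedded noise-free in block k
H_x_given_y = H_xy - H_y   # bits per symbol spent in block k+1 to restore x^N(k)

print(H_y_given_x - H_x_given_y)   # net rate of the blocked scheme
print(H_y - H_x)                   # equals the reversible-embedding rate
```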
IV. Robust and Reversible Embedding

[Block diagram: the embedder maps the message and the host sequence to $Y^N = e(W, X^N)$; $Y^N$ is sent over an attack channel $P_c(z|y)$; the decoder forms $(\hat{W}, \hat{X}^N) = d(Z^N)$.]

Messages: $\Pr\{W = w\} = 1/M$ for $w \in \{1, 2, \ldots, M\}$.
Source (host): $\Pr\{X^N = x^N\} = \prod_{n=1}^{N} P_s(x_n)$ for $x^N \in \mathcal{X}^N$.
Channel: discrete memoryless $\{\mathcal{Y}, P_c(z|y), \mathcal{Z}\}$.
Error probability: $P_E = \Pr\{\hat{W} \neq W \vee \hat{X}^N \neq X^N\}$.
Rate: $R = \frac{1}{N}\log_2(M)$.
Embedding distortion: $D_{xy} = E\left[\frac{1}{N}\sum_{n=1}^{N} D_{xy}(X_n, e_n(W, X^N))\right]$ for some distortion matrix $\{D_{xy}(x,y), x \in \mathcal{X}, y \in \mathcal{Y}\}$.