SLIDE 1

Scalable Source Coding With Causal Side Information and a Causal Helper

Shraga Bross

Faculty of Engineering, Bar-Ilan University, Israel
brosss@biu.ac.il

ISIT, June 2020

SLIDE 2

Transmission model

[Figure: the encoder observes $X^n$ and sends the index $\phi_1(X^n)$ to Decoder 1 and the indices $(\phi_1(X^n), \phi_2(X^n))$ to Decoder 2. At each time $k$, Decoder 1 observes $Z^k$, outputs $\hat{X}_{1,k}$, and forwards the helper message $g_k(\phi_1(X^n), Z^k)$, $k = 1, \ldots, n$, to Decoder 2, which observes $Y^k$ and outputs $\hat{X}_{2,k}$.]

Figure: Scalable source coding with causal side-information and causal helper.


SLIDE 3

Definition of a code

An $(n, M_1, M_2, \sum_{k=1}^{n} L_k, D_1, D_2)$ scalable code for the source $X$ with causal SI $(Y, Z)$ and a causal helper consists of:

1. A first-stage encoder map $\phi_1 : \mathcal{X}^n \to \{1, \ldots, M_1\}$ and a sequence $\psi_{1,1}, \ldots, \psi_{1,n}$ of reconstruction mappings $\psi_{1,k} : \{1, \ldots, M_1\} \times \mathcal{Z}^k \to \hat{\mathcal{X}}$, $k = 1, \ldots, n$, such that, with $E$ denoting the expectation operator,
$$E\, d\bigl[X^n, \bigl(\psi_{1,1}(\phi_1(X^n), Z^1), \ldots, \psi_{1,k}(\phi_1(X^n), Z^k), \ldots, \psi_{1,n}(\phi_1(X^n), Z^n)\bigr)\bigr] \le D_1. \quad (1)$$

2. A unidirectional conference between Decoder 1 and Decoder 2, consisting of a sequence of mappings $g_k : \{1, \ldots, M_1\} \times \mathcal{Z}^k \to \{1, \ldots, L_k\}$, $k = 1, \ldots, n$.

3. A second-stage encoder map $\phi_2 : \mathcal{X}^n \to \{1, \ldots, M_2\}$ and a sequence $\psi_{2,1}, \ldots, \psi_{2,n}$ of reconstruction mappings $\psi_{2,k} : \{1, \ldots, M_1\} \times \{1, \ldots, M_2\} \times \{1, \ldots, L_1\} \times \cdots \times \{1, \ldots, L_k\} \times \mathcal{Y}^k \to \hat{\mathcal{X}}$


SLIDE 4

Definition of achievable rate-distortion tuple

such that
$$E\, d\bigl[X^n, \bigl(\psi_{2,1}(\phi_1(X^n), \phi_2(X^n), g_1(\phi_1(X^n), Z^1), Y^1), \ldots, \psi_{2,k}(\phi_1(X^n), \phi_2(X^n), \{g_j(\phi_1(X^n), Z^j)\}_{j=1}^{k}, Y^k), \ldots,$$
$$\psi_{2,n}(\phi_1(X^n), \phi_2(X^n), \{g_j(\phi_1(X^n), Z^j)\}_{j=1}^{n}, Y^n)\bigr)\bigr] \le D_2. \quad (2)$$

The rate tuple $(R_1, R_2, R_h)$ of the scalable code is
$$R_1 = \frac{1}{n} \log M_1, \qquad R_2 = R_1 + \frac{1}{n} \log M_2, \qquad R_h = \frac{1}{n} \sum_{k=1}^{n} \log L_k.$$

A $D$-achievable tuple $(R_1, R_2, R_h)$ is defined in the regular way. The collection of all $D$-achievable rate tuples is the achievable scalable source-coding region $\mathcal{R}(D)$.
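Since the causality constraints live entirely in the domains of these mappings, the definition translates directly into code. The following minimal Python sketch (a hypothetical interface of this write-up, not from the talk) enforces causality by the signatures alone: at time $k$, Decoder 1 and the helper see only $(\phi_1(X^n), Z^k)$, while Decoder 2 additionally sees both indices, the helper messages sent so far, and $Y^k$.

```python
# Hypothetical interface illustrating the causal structure of an
# (n, M1, M2, sum_k L_k, D1, D2) scalable code; not from the talk.
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class ScalableCode:
    phi1: Callable[[Sequence[int]], int]             # X^n -> {1,...,M1}
    phi2: Callable[[Sequence[int]], int]             # X^n -> {1,...,M2}
    psi1: List[Callable[[int, Sequence[int]], int]]  # (i1, Z^k) -> X1hat_k
    g:    List[Callable[[int, Sequence[int]], int]]  # (i1, Z^k) -> {1,...,Lk}
    psi2: List[Callable[[int, int, Sequence[int], Sequence[int]], int]]
                                                     # (i1, i2, W^k, Y^k) -> X2hat_k

def run(code: ScalableCode, x, z, y):
    """One pass through the transmission model on sequences x, z, y."""
    i1, i2 = code.phi1(x), code.phi2(x)
    msgs, x1hat, x2hat = [], [], []
    for k in range(1, len(x) + 1):
        x1hat.append(code.psi1[k - 1](i1, z[:k]))    # Decoder 1: causal in Z
        msgs.append(code.g[k - 1](i1, z[:k]))        # helper message W_k
        x2hat.append(code.psi2[k - 1](i1, i2, list(msgs), y[:k]))  # causal in Y
    return x1hat, x2hat
```

Averaging $d$ between `x` and the two returned sequences then checks (1) and (2) empirically for any concrete code instance.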


SLIDE 5

Related work

With non-causal SI at the decoders:
• "On successive refinement for the Wyner-Ziv problem" [Steinberg-Merhav (2004)]: characterization of $\mathcal{R}(D)$ when $X - Y - Z$.
• "Side-information scalable source coding" [Tian-Diggavi (2008)]: inner and outer bounds on $\mathcal{R}(D)$ when $X - Z - Y$.
• "On successive refinement for the Wyner-Ziv problem with partially cooperating decoders" [Bross-Weissman (2008)]: conclusive characterization of the encoder rates, but a gap in the helper's rate.

With causal SI at the decoders:
• "On successive refinement with causal side information at the decoders" [Maor-Merhav (2008)]: characterization of $\mathcal{R}(D)$ regardless of the relative SI quality at the decoders.

This work extends the last result in the sense that Decoder 1 can communicate with Decoder 2, via a conference channel, at a rate not exceeding $R_h$.


SLIDE 6

Definition of R∗(D)

Fix a pair $D = (D_1, D_2)$. Define $\mathcal{R}^*(D)$ to be the set of all $(R_1, R_2, R_h)$ for which there exist random variables $(U, V, W)$ taking values in finite alphabets $\mathcal{U}, \mathcal{V}, \mathcal{W}$, respectively, such that:

1. $(U, V) - X - (Y, Z)$ forms a Markov chain.

2. There exist deterministic maps
$$f_1 : \mathcal{U} \times \mathcal{Z} \to \hat{\mathcal{X}}, \qquad g : \mathcal{U} \times \mathcal{Z} \to \mathcal{W}, \qquad f_2 : \mathcal{U} \times \mathcal{V} \times \mathcal{W} \times \mathcal{Y} \to \hat{\mathcal{X}}$$
such that, with $W \triangleq g(U, Z)$,
$$E\, d(X, f_1(U, Z)) \le D_1, \qquad E\, d(X, f_2(U, V, W, Y)) \le D_2.$$

3. The rates $R_1$, $R_2$ and $R_h$ satisfy
$$R_1 \ge I(X; U), \quad (3a)$$
$$R_2 \ge I(X; UV), \quad (3b)$$
$$R_h \ge H(W \mid U). \quad (3c)$$
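For a concrete candidate $(U, V, W)$ over finite alphabets, the three bounds in (3) can be computed directly from the joint distribution. Below is a minimal numerical sketch (the array conventions and function names are assumptions of this write-up, not from the talk), assuming the Markov chain $(U, V) - X - (Y, Z)$ of item 1 and a deterministic table `g[u, z]` realizing $W = g(U, Z)$.

```python
# Sketch: evaluate the rate bounds (3a)-(3c) for a candidate (U, V, W).
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in bits for a joint pmf pxy[x, y]."""
    px = pxy.sum(1, keepdims=True)
    py = pxy.sum(0, keepdims=True)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / (px * py)[m])).sum())

def cond_entropy(puw):
    """H(W|U) in bits for a joint pmf puw[u, w]."""
    pu = puw.sum(1, keepdims=True)
    pu = np.where(pu > 0, pu, 1.0)        # guard rows of zero probability
    m = puw > 0
    return float(-(puw[m] * np.log2((puw / pu)[m])).sum())

def rate_bounds(px, puv_given_x, pz_given_x, g):
    """Return (I(X;U), I(X;UV), H(W|U)).

    px[x]              source pmf
    puv_given_x[x,u,v] test channel for (U, V)
    pz_given_x[x,z]    Decoder-1 SI channel
    g[u,z]             deterministic helper table, W = g(U, Z)
    """
    p_xuv = px[:, None, None] * puv_given_x
    nx, nu, nv = p_xuv.shape
    i_xu = mutual_information(p_xuv.sum(2))                  # (3a)
    i_xuv = mutual_information(p_xuv.reshape(nx, nu * nv))   # (3b): (U,V) jointly
    # By the Markov chain, p(u, z) = sum_x p(x, u) p(z | x).
    p_uz = np.einsum('xu,xz->uz', p_xuv.sum(2), pz_given_x)
    p_uw = np.zeros((nu, int(g.max()) + 1))
    for u in range(nu):
        for z in range(p_uz.shape[1]):
            p_uw[u, g[u, z]] += p_uz[u, z]
    return i_xu, i_xuv, cond_entropy(p_uw)                   # (3c)
```

Any $(R_1, R_2, R_h)$ componentwise at least the returned triple, for a candidate that also meets the two distortion constraints of item 2, lies in $\mathcal{R}^*(D)$.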


SLIDE 7

Main result

Theorem. $\mathcal{R}(D) = \mathcal{R}^*(D)$.

Remarks:
• The converse shows that $(U, V) - X - (Y, Z)$, in the definition of $\mathcal{R}^*(D)$, can be replaced by the Markov chain $U - V - X - (Y, Z)$.
• The converse holds as well when the causal helper and the reconstructors are allowed to depend also on $X^{k-1}$. That is, $g_k = g_k(\phi_1(X^n), Z^k, X^{k-1})$, $\hat{X}_{1,k} = \psi_{1,k}(\phi_1(X^n), Z^k, X^{k-1})$, and $\hat{X}_{2,k} = \psi_{2,k}(\phi_1(X^n), \phi_2(X^n), \{g_j(\cdot)\}_{j=1}^{k}, Y^k, X^{k-1})$.


SLIDE 8

Example

Let $X$ be a BSS and let $d(\cdot, \cdot)$ be the Hamming loss. The SI pair $(Z, Y)$ is conditionally independent given $X$, where $Z$ is the output of a BSC($\delta_1$) with input $X$ and $Y$ is the output of a BSC($\delta_2$) with input $X$, with $0 < \delta_1 < \delta_2 < 0.5$. Define
$$R_{X,\delta}(\Delta) = \begin{cases} 1 - H_b(\Delta), & 0 \le \Delta \le d_c(\delta) \\ H_b'(d_c(\delta))\,(\delta - \Delta), & d_c(\delta) < \Delta \le \delta, \end{cases}$$
where $d_c(\delta)$ is the solution to the equation $(1 - H_b(d_c))/(d_c - \delta) = -H_b'(d_c)$ with $0 < d_c < \delta < 0.5$. Here $R_{X,\delta}(\Delta)$ is the RDF for source coding with causal SI $Y$, where $Y$ is the output of a BSC($\delta$) with input $X$ [Weissman-El Gamal (2006)].

Fix a rate constraint $R_h$ and a distortion pair $(D_1, D_2)$, where $D_2 < D_1 < \delta_1$.
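Since $d_c(\delta)$ is only implicitly defined, evaluating $R_{X,\delta}(\Delta)$ takes a short numerical routine. A minimal sketch (this write-up's code, not from the talk): bisection on the tangency condition, then the piecewise formula.

```python
# Sketch: numerical evaluation of R_{X,delta}(Delta) for the BSS example.
import math

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def hb_prime(p):
    """Derivative of the binary entropy."""
    return math.log2((1 - p) / p)

def d_c(delta, tol=1e-12):
    """Solve the tangency condition, in the equivalent form
    (1 - H_b(d)) / (delta - d) = H_b'(d), on (0, delta) by bisection."""
    lo, hi = tol, delta - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (1 - hb(mid)) / (delta - mid) < hb_prime(mid):
            lo = mid          # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rdf_causal(delta, Delta):
    """R_{X,delta}(Delta): causal-SI RDF of a BSS with SI from a BSC(delta)."""
    dc = d_c(delta)
    if Delta <= dc:
        return 1 - hb(Delta)
    return hb_prime(dc) * (delta - Delta)   # straight-line segment

# Example: the first-stage rate R_{X,delta1}(D1) for delta1 = 0.25, D1 = 0.2.
print(d_c(0.25), rdf_causal(0.25, 0.2))
```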


SLIDE 9

Example cont.

Let $\tilde{V} = X \oplus S$, where $S \sim \mathrm{Ber}(\beta_2)$, $\beta_2 \triangleq \min\{\tilde{D}_2, d_c(\delta_2)\}$, is independent of $(X, Z, Y)$ and $\tilde{D}_2 > D_2$ will be defined in the sequel. Let
$$B_1 \sim \mathrm{Ber}\!\left(\frac{\delta_2 - \max\{\tilde{D}_2, d_c(\delta_2)\}}{\delta_2 - d_c(\delta_2)}\right)$$
independently of $(X, Y, Z, S)$. Let $T \sim \mathrm{Ber}(\Pr\{T = 1\})$, independently of $(X, Y, Z, S, B_1)$, be such that
$$\Pr\{T = 1\} * \beta_2 = \min\{D_1, d_c(\delta_1)\} \triangleq \beta_1,$$
where $*$ denotes binary convolution, $a * b = a(1 - b) + b(1 - a)$. With the assumption that
$$\gamma \triangleq \frac{\delta_1 - \max\{D_1, d_c(\delta_1)\}}{\delta_1 - d_c(\delta_1)} \cdot \frac{\delta_2 - d_c(\delta_2)}{\delta_2 - \max\{\tilde{D}_2, d_c(\delta_2)\}} \le 1,$$
let $B_2 \sim \mathrm{Ber}(\gamma)$ independently of $(X, Y, Z, S, B_1, T)$.

[Figure: the test-channel cascade $\tilde{V} = X \oplus S$, $V = \tilde{V} \otimes B_1$, $U = (V \oplus T) \otimes B_2$, so that $U - V - X - (Y, Z)$.]

Figure: Scalable source coding scheme for a BSS.


SLIDE 10

Example cont.

Let $\Theta_h$ be the time fraction during which Decoder 1 (the helper) describes $W$, and define
$$f_1(U, Z) = B_1 B_2\, U + (1 - B_1 B_2)\, Z,$$
$$f_2(U, V, W, Y) = B_1\, V + \Theta_h\, W + (1 - B_1 - \Theta_h)\, Y.$$
With the choice $W = g(U, Z) = Z$, the distortion constraint at Decoder 2 is fulfilled as long as
$$B_1 \beta_2 + \Theta_h \delta_1 + (1 - B_1 - \Theta_h)\, \delta_2 \le D_2, \quad (4)$$
and the distortion constraint at Decoder 1 is met as long as
$$B_1 B_2\, \beta_1 + (1 - B_1 B_2)\, \delta_1 \le D_1.$$
Consequently, for the first stage,
$$I(X; U) = I(X; T \oplus \tilde{V} \mid B_1 B_2 = 1) = R_{X,\delta_1}(D_1).$$
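As a sanity check on the first-stage analysis, the cascade and the gating can be simulated directly. A minimal Monte Carlo sketch (this write-up's, not from the talk): the parameter values are illustrative, $d_c(\delta_1) \approx 0.0805$ is carried over from the previous sketch, and the product of the $B_1$ and $B_2$ parameters collapses, by their definitions on the previous slide, to the ratio `p_gate` below.

```python
# Sketch: Monte Carlo check of B1 B2 beta1 + (1 - B1 B2) delta1 <= D1.
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
delta1, D1 = 0.25, 0.20           # illustrative values with D1 < delta1
dc1 = 0.0805                      # d_c(delta1), from the previous sketch
beta1 = min(D1, dc1)              # first-stage test-channel parameter
beta2 = 0.05                      # assumed value of beta2 (must be <= beta1)
pT = (beta1 - beta2) / (1 - 2 * beta2)   # Pr{T = 1}: solves pT * beta2 = beta1
p_gate = (delta1 - max(D1, dc1)) / (delta1 - dc1)   # Pr{B1 B2 = 1}

X = rng.integers(0, 2, n)                          # the BSS
Z = X ^ (rng.random(n) < delta1).astype(int)       # BSC(delta1) SI at Decoder 1
S = (rng.random(n) < beta2).astype(int)            # V~ = X + S
T = (rng.random(n) < pT).astype(int)               # U = V + T when the gates pass
gate = rng.random(n) < p_gate                      # the event {B1 = B2 = 1}
U = X ^ S ^ T
X1hat = np.where(gate, U, Z)      # f1(U, Z) = B1 B2 U + (1 - B1 B2) Z
# The empirical average approaches p_gate * beta1 + (1 - p_gate) * delta1 = D1.
print("E d(X, f1) ~=", (X1hat != X).mean(), "  D1 =", D1)
```

With these choices the first-stage constraint holds with equality, so the printed distortion should hover around $D_1 = 0.2$.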


SLIDE 11

Example cont.

Assuming that the helper describes $Z$ using a binary quantizer followed by an entropy encoder, given $R_h$ we obtain $R_h \ge \Theta_h H(Z) = \Theta_h H_b(\delta_1)$. Thus, choosing
$$\Theta_h = \min\{R_h / H_b(\delta_1),\; 1 - B_1\}, \quad (5)$$
this quantity is an upper bound on the time fraction during which Decoder 2 can observe the lossless description of $Z$ in forming its reconstruction. The constraint (4) thus becomes
$$B_1 \beta_2 + (1 - B_1)\, \delta_2 \;\le\; D_2 + \min\{R_h / H_b(\delta_1),\; 1 - B_1\}\, (\delta_2 - \delta_1) \;\triangleq\; \tilde{D}_2, \quad (6)$$
and consequently, for the second stage,
$$I(X; UV) = I(X; V) = I(X; \tilde{V} \mid B_1 = 1) = R_{X,\delta_2}(\tilde{D}_2).$$
Comparing the first- and second-stage rate expressions with the corresponding [Maor-Merhav] expressions, inequality (6) reflects the helper's assistance in terms of the relaxation of $R_2 - R_1$.
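Numerically, the relaxation in (6) is easy to make concrete. A minimal sketch (this write-up's, with illustrative values; $B_1$ stands for $\Pr\{B_1 = 1\}$, as in the slide's own abuse of notation):

```python
# Sketch: the helper's relaxation of the second-stage target, eqs. (5)-(6).
import math

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

delta1, delta2 = 0.25, 0.40       # SI channel crossover probabilities
D2, Rh, B1 = 0.15, 0.10, 0.5      # illustrative targets and helper rate
theta_h = min(Rh / hb(delta1), 1 - B1)        # helper time share, eq. (5)
D2_tilde = D2 + theta_h * (delta2 - delta1)   # relaxed target, eq. (6)
print("theta_h =", round(theta_h, 4), " D2~ =", round(D2_tilde, 4))
# Since R_{X,delta2}(.) is nonincreasing, R_{X,delta2}(D2~) <= R_{X,delta2}(D2):
# a positive helper rate Rh lowers the second-stage rate R2 - R1.
```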


SLIDE 12

Converse

Assume that $(R_1, R_2, R_h)$ is $(D_1, D_2)$-achievable. Let $T_1 \triangleq \phi_1(X^n)$ and $T_2 \triangleq \phi_2(X^n)$. Then, with $U_j \triangleq (T_1, X^{j-1}, Y^{j-1}, Z^{j-1})$,
$$nR_1 \ge \log M_1 \ge H(T_1) = I(X^n; T_1) = \sum_{k=1}^{n} I(X_k; T_1 \mid X^{k-1}) \overset{(a)}{=} \sum_{k=1}^{n} I(X_k; T_1 X^{k-1})$$
$$\overset{(b)}{=} \sum_{k=1}^{n} I(X_k; T_1 X^{k-1} Y^{k-1} Z^{k-1}) = \sum_{k=1}^{n} I(X_k; U_k),$$
where (a) follows since $X^n$ is memoryless, and (b) follows since $X_k - (T_1, X^{k-1}) - (Y^{k-1}, Z^{k-1})$ forms a Markov chain.


SLIDE 13

Converse cont.

Next, with $V_j \triangleq (T_2, U_j)$,
$$n(R_2 - R_1) \ge \log M_2 \ge H(T_2 \mid T_1) \ge I(X^n; T_2 \mid T_1) = \sum_{k=1}^{n} I(X_k; T_2 \mid T_1 X^{k-1})$$
$$\overset{(c)}{=} \sum_{k=1}^{n} I(X_k; T_2 \mid T_1 X^{k-1} Y^{k-1} Z^{k-1}) = \sum_{k=1}^{n} I(X_k; V_k \mid U_k),$$
where (c) follows since $X_k - (T_1, X^{k-1}) - (Y^{k-1}, Z^{k-1})$ and $X_k - (T_1, T_2, X^{k-1}) - (Y^{k-1}, Z^{k-1})$ are Markov chains. Furthermore, $U_k - V_k - X_k - (Y_k, Z_k)$ forms a Markov chain.

Consequently,
$$nR_2 \ge nR_1 + \sum_{k=1}^{n} I(X_k; V_k \mid U_k) \ge \sum_{k=1}^{n} \bigl[I(X_k; U_k) + I(X_k; V_k \mid U_k)\bigr] = \sum_{k=1}^{n} I(X_k; U_k V_k).$$


SLIDE 14

Converse cont.

With $W_j \triangleq g_j(T_1, Z^j)$ we have
$$nR_h \ge \sum_{k=1}^{n} \log L_k \ge H(W_1, W_2, \ldots, W_n) \ge H(W_1, W_2, \ldots, W_n \mid T_1) = \sum_{k=1}^{n} H(W_k \mid T_1 W^{k-1})$$
$$\ge \sum_{k=1}^{n} H(W_k \mid T_1 W^{k-1} Z^{k-1} X^{k-1} Y^{k-1}) \overset{(d)}{=} \sum_{k=1}^{n} H(W_k \mid T_1 Z^{k-1} X^{k-1} Y^{k-1}) = \sum_{k=1}^{n} H(W_k \mid U_k),$$
where (d) follows since each $W_j$ is a function of $(T_1, Z^j)$, so $W^{k-1}$ is determined by $(T_1, Z^{k-1})$.

Defining $f_1(U_i, Z_i) = \psi_{1,i}(T_1, Z^i)$ we may write
$$nD_1 \ge \sum_{i=1}^{n} E[d(X_i, \hat{X}_{1,i})] = \sum_{i=1}^{n} E[d(X_i, \psi_{1,i}(T_1, Z^i))] = \sum_{i=1}^{n} E[d(X_i, f_1(U_i, Z_i))].$$


SLIDE 15

Converse cont.

Also, we may write
$$nD_2 \ge \sum_{i=1}^{n} E[d(X_i, \hat{X}_{2,i})] = \sum_{i=1}^{n} E[d(X_i, \psi_{2,i}(T_1, T_2, W_1, \ldots, W_i, Y^i))]$$
$$\overset{(e)}{\ge} \sum_{i=1}^{n} E[d(X_i, \psi^*_{2,i}(T_1, T_2, X^{i-1}, Y^{i-1}, Z^{i-1}, W_i, Y_i))] = \sum_{i=1}^{n} E[d(X_i, \psi^*_{2,i}(U_i, V_i, W_i, Y_i))] = n\, E[d(X, f_2(U, V, W, Y))].$$
Here (e) follows since $W_1, \ldots, W_{i-1}$ are deterministic functions of $(T_1, Z^{i-1})$, so that $(T_2, X^{i-1}, Y^{i-1}, W_i, Y_i) - (T_1, Z^{i-1}) - (W_1, \ldots, W_{i-1})$ forms a Markov chain. Hence, given $(U_k, V_k, W_k, Y_k)$, the tuple $(T_1, T_2, W_1, \ldots, W_k, Y^k)$ is conditionally independent of $X_k$. This guarantees the existence of a reconstruction $\hat{X}^*_{2,k}(U_k, V_k, W_k, Y_k)$ which dominates $\hat{X}_{2,k}$ in the sense that
$$E\bigl[d(X_k, \hat{X}^*_{2,k}(U_k, V_k, W_k, Y_k))\bigr] \le E\bigl[d(X_k, \hat{X}_{2,k}(T_1, T_2, W_1, \ldots, W_k, Y_k, Y^{k-1}))\bigr].$$


SLIDE 16

Summary

Scalable source coding with causal SI and a causal helper:
• Characterization of the RD region.
• Example: computation of the RD region for a BSS.
