Transform Coding - Overview
- Principle of block-wise transform coding
- Properties of orthonormal transforms
- Discrete cosine transform (DCT)
- Bit allocation for transform coefficients
- Threshold coding
- Typical coding artifacts
- Fast implementation of the DCT
6-1 Girod: Image and Video Compression
Transform Coding
[Block diagram: original image -> original image block -> transform A -> transform coefficients -> quantization & transmission -> quantized transform coefficients -> inverse transform A^-1 -> reconstructed image block -> reconstructed image]
Properties of Orthonormal Transforms
- Forward transform: $\vec{y} = A \vec{x}$, where
  - $\vec{x}$ is the input signal block of size N*N, arranged as a vector of length $N^2$,
  - $\vec{y}$ holds the N*N transform coefficients, arranged as a vector of length $N^2$,
  - $A$ is the transform matrix of size $N^2 \times N^2$.
- Inverse transform: $\vec{x} = A^{-1} \vec{y} = A^T \vec{y}$
- Linearity: $\vec{x}$ is represented as a linear combination of "basis functions".
- Parseval's theorem holds: the transform is a rotation of the signal vector around the origin of an $N^2$-dimensional vector space.
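The properties above can be checked numerically on a toy case. The following is a minimal sketch, not taken from the slides: it assumes a 2x2 block (N = 2, vector length 4), uses the normalized 4x4 Hadamard matrix as a stand-in for the transform matrix A, and the sample values and helper names (`matvec`, `transpose`, `energy`) are hypothetical.

```python
# Toy check of y = A x, x = A^T y and Parseval's theorem.
# Assumption: the normalized 4x4 Hadamard matrix stands in for A.
H = [[1,  1,  1,  1],
     [1, -1,  1, -1],
     [1,  1, -1, -1],
     [1, -1, -1,  1]]
A = [[h / 2.0 for h in row] for row in H]  # each row now has unit length

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * s for m, s in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def energy(v):
    """Sum of squared samples."""
    return sum(s * s for s in v)

x = [201.0, 195.0, 193.0, 188.0]   # hypothetical block, arranged as a vector
y = matvec(A, x)                   # forward transform  y = A x
x_rec = matvec(transpose(A), y)    # inverse transform  x = A^T y

print("energy of x:", energy(x))
print("energy of y:", energy(y))
print("reconstructed x:", x_rec)
```

The two energies agree up to rounding, which is what Parseval's theorem states for any orthonormal A, and applying the transposed matrix recovers the input.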
Separable Orthonormal Transforms, I
- An orthonormal transform is separable if the transform of a signal block of size N*N can be expressed by $y = A x A^T$, where
  - $x$ is the N*N block of the input signal,
  - $y$ is the N*N block of transform coefficients,
  - $A$ is an orthonormal transform matrix of size N*N.
- Note: the equivalent $N^2 \times N^2$ transform matrix acting on the vectorized block is the Kronecker product $A \otimes A$.
- The inverse transform is $x = A^T y A$.
- Great practical importance: the transform requires 2 matrix multiplications of size N*N instead of one multiplication of a vector of size $1 \times N^2$ with a matrix of size $N^2 \times N^2$.
- Reduction of the complexity from $O(N^4)$ to $O(N^3)$.
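As an illustration of the separability claim, the sketch below compares the two-multiplication form with the single large Kronecker-product matrix. It is a minimal example under assumptions: N = 2, a hypothetical 2x2 orthonormal matrix, hypothetical sample values, and row-wise stacking of the block into a vector.

```python
import math

def matmul(P, Q):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def kron(P, Q):
    """Kronecker product of P and Q."""
    return [[P[i][j] * Q[k][l]
             for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]                  # hypothetical orthonormal matrix, N = 2
X = [[201.0, 195.0], [193.0, 188.0]]   # hypothetical 2x2 signal block

# Separable form: two small matrix products, y = A x A^T.
Y = matmul(matmul(A, X), transpose(A))

# Non-separable form: one N^2 x N^2 matrix applied to the row-wise stacked block.
x_vec = [v for row in X for v in row]
big = kron(A, A)
y_vec = [sum(b * v for b, v in zip(row, x_vec)) for row in big]

# Inverse transform x = A^T y A.
X_rec = matmul(matmul(transpose(A), Y), A)

print("separable result:", [v for row in Y for v in row])
print("Kronecker result:", y_vec)
```

Both forms give the same coefficients. For a block of size N*N the separable form needs on the order of 2 N^3 multiplications, the large matrix on the order of N^4.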
Separable Orthonormal Transforms, II
- The 2D transform is realized by two one-dimensional transforms, applied along the columns and the rows of the signal block.
[Diagram: N*N block of pixels $x$ -> column-wise N-transform -> $A x$ -> row-wise N-transform -> $A x A^T$, the N*N block of transform coefficients]
Criteria for the Selection of a Particular Transform
- Decorrelation, energy concentration (e.g., KLT, DCT, ...)
- Visually pleasant basis functions (e.g., pseudo-random noise, m-sequences, lapped transforms)
- Low complexity of computation
Karhunen-Loève Transform (KLT)
- The Karhunen-Loève Transform (KLT) yields decorrelated transform coefficients.
- Its basis functions are eigenvectors of the covariance matrix of the input signal.
- The KLT achieves optimum energy concentration.
- Disadvantages:
  - The KLT depends on the signal statistics.
  - The KLT is not separable for image blocks.
  - The transform matrix cannot be factored into sparse matrices.
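A very small example may help to make the decorrelation property concrete. The sketch below assumes a toy signal model that is not part of the slides: two neighbouring samples with unit variance and an assumed correlation `rho = 0.9`. For this 2x2 covariance matrix the eigenvectors are known in closed form, so no eigenvalue solver is needed.

```python
import math

def matmul(P, Q):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

rho = 0.9                          # assumed correlation between the two samples
C_x = [[1.0, rho], [rho, 1.0]]     # covariance matrix of the input vector

# Rows of A are the unit-length eigenvectors of C_x:
# (1, 1)/sqrt(2) with eigenvalue 1 + rho, (1, -1)/sqrt(2) with eigenvalue 1 - rho.
s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]

# Covariance of the transform coefficients y = A x is A C_x A^T.
C_y = matmul(matmul(A, C_x), transpose(A))

for row in C_y:
    print([round(v, 6) for v in row])
```

The off-diagonal entries of `C_y` vanish (the coefficients are decorrelated) and the variances 1 + rho and 1 - rho show the energy concentrated in the first coefficient. A different `rho` changes the variances, which illustrates the dependence on signal statistics listed as a disadvantage.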
Comparison of Various Transforms, I
Comparison of 1D basis functions for block size N=8:
- Karhunen-Loève transform (1948/1960)
- Haar transform (1910)
- Walsh-Hadamard transform (1923)
- Slant transform (Enomoto, Shibata, 1971)
- Discrete Cosine Transform (DCT) (Ahmed, Natarajan, Rao, 1974)
Comparison of Various Transforms, II
Energy concentration measured for typical natural images, block size 1x32 (Lohscheller):
- The KLT is optimum.
- The DCT performs only slightly worse than the KLT.
Discrete Cosine Transform and Discrete Fourier Transform
- Transform coding of images using the Discrete Fourier Transform (DFT): for stationary image statistics, the energy concentration properties of the DFT converge to those of the KLT for large block sizes.
- Problem of blockwise DFT coding: blocking effects due to the circular topology of the DFT and Gibbs phenomena.
- Remedy: reflect the image at the block boundaries and take the DFT of the larger symmetric block -> "DCT"
[Figure: pixel block folded (mirrored) at its edge]
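The "reflect, then DFT" remedy can be verified on a single row of pixels. The sketch below is an illustration under assumptions: it uses an unnormalized type II DCT sum, a naive O(n^2) DFT, and eight sample values chosen only as an example. The helper names `dft` and `dct2_unnormalized` are hypothetical.

```python
import cmath
import math

def dft(seq):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(seq)
    return [sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dct2_unnormalized(x):
    """Type II DCT sum without scaling factors."""
    n = len(x)
    return [sum(x[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n)) for m in range(n))
            for k in range(n)]

x = [201.0, 195.0, 188.0, 193.0, 169.0, 157.0, 196.0, 190.0]  # example row of pixels
N = len(x)

mirrored = x + x[::-1]        # symmetric block of length 2N
Y = dft(mirrored)

# Removing the half-sample phase shift leaves a real spectrum equal to twice the DCT sum.
from_dft = [0.5 * (cmath.exp(-1j * cmath.pi * k / (2 * N)) * Y[k]).real for k in range(N)]
direct = dct2_unnormalized(x)

for a, b in zip(from_dft, direct):
    print(round(a, 6), round(b, 6))
```

The two columns agree, so the DCT coefficients are the DFT coefficients of the mirrored block up to a known phase factor and scaling. Because the mirrored block has no jump at its boundaries, the periodic extension implied by the DFT no longer introduces an artificial edge.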
DCT
- The type II DCT of block size M x M is defined by a transform matrix $A$ containing the elements
  $a_{ik} = \alpha_i \cos\frac{\pi (2k+1) i}{2M}$, $i, k = 0, \ldots, M-1$,
  with $\alpha_0 = \sqrt{\frac{1}{M}}$ and $\alpha_i = \sqrt{\frac{2}{M}} \;\; \forall\, i \neq 0$.
[Figure: 2D basis functions of the DCT]
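The definition translates directly into code. The following sketch builds the matrix element by element, assuming the orthonormal scaling $\alpha_0 = \sqrt{1/M}$, $\alpha_i = \sqrt{2/M}$, and applies it in the separable form $A x A^T$. The function names and the constant test block are illustrative choices.

```python
import math

def dct_matrix(M):
    """Type II DCT matrix with elements a_ik = alpha_i * cos(pi * (2k + 1) * i / (2M))."""
    A = []
    for i in range(M):
        alpha = math.sqrt(1.0 / M) if i == 0 else math.sqrt(2.0 / M)
        A.append([alpha * math.cos(math.pi * (2 * k + 1) * i / (2.0 * M))
                  for k in range(M)])
    return A

def matmul(P, Q):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dct2d(X):
    """Separable 2D DCT of a square block: A X A^T."""
    A = dct_matrix(len(X))
    return matmul(matmul(A, X), transpose(A))

A = dct_matrix(8)
gram = matmul(A, transpose(A))            # should be the 8x8 identity matrix

flat = [[185.0] * 8 for _ in range(8)]    # constant block with value 185
Y = dct2d(flat)
print("DC coefficient:", round(Y[0][0], 6))
```

For a constant 8x8 block only the DC coefficient is non-zero, and with this scaling it equals 8 times the block mean (1480 for a mean of 185).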
Bit Allocation for Transform Coefficients I
- Problem: divide the bit-rate R among the MxM transform coefficients such that the resulting distortion D is minimized.
- Assumptions:
  - $R = \sum_i R_i$: the total rate is the sum of the rates $R_i$ for the coefficients i.
  - $D = \sum_i D_i$: the total distortion is the sum of the distortions $D_i$ contributed by the coefficients i.
- These assumptions lead to the "Pareto condition":
  $\frac{\partial D_i}{\partial R_i} = \frac{\partial D_j}{\partial R_j}$ for all i, j
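One way to see where the Pareto condition comes from is a constrained minimization. The sketch below is an illustration, not part of the slides: the Lagrange multiplier $\lambda$ and the distortion-rate model $D_i(R_i) = \sigma_i^2\, 2^{-2R_i}$ (Gaussian coefficient with variance $\sigma_i^2$, mean squared error) are assumptions introduced here.

```latex
\begin{align*}
J &= \sum_i D_i(R_i) + \lambda \Big( \sum_i R_i - R \Big)
  && \text{minimize } D \text{ subject to } \textstyle\sum_i R_i = R \\
\frac{\partial J}{\partial R_i} &= \frac{\partial D_i}{\partial R_i} + \lambda = 0
  && \Rightarrow\ \frac{\partial D_i}{\partial R_i} = \frac{\partial D_j}{\partial R_j} = -\lambda
     \ \text{ for all } i, j \\
\frac{\partial D_i}{\partial R_i} &= -2 \ln 2 \cdot \sigma_i^2\, 2^{-2R_i} = -2 \ln 2 \cdot D_i
  && \Rightarrow\ D_i = D_j \ \text{ for all } i, j
\end{align*}
```

Under the assumed model, equal slopes therefore mean equal distortion for every coefficient that receives a positive rate. Solving $D_i = \sigma_i^2\, 2^{-2R_i}$ for the rate gives $R_i = \frac{1}{2} \log_2 \frac{\sigma_i^2}{D_i}$.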
Bit Allocation for Transform Coefficients II
- The additional assumptions "Gaussian r.v." and mse distortion yield the optimum rate for each transform coefficient i:
  $R_i = \max\left[ \frac{1}{2} \log_2 \frac{\sigma_i^2}{D},\ 0 \right]$ bit
  where $\sigma_i^2$ is the variance of transform coefficient i and $D$ is the maximum acceptable mean squared error.
- The literature contains many practical bit allocation schemes that are based on this insight.
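The rate formula is simple enough to evaluate directly. The sketch below applies it to a list of coefficient variances; the variances and the distortion target are hypothetical numbers chosen so that the results are easy to check by hand, and `allocate_bits` is an illustrative name.

```python
import math

def allocate_bits(variances, D):
    """R_i = max(0.5 * log2(sigma_i^2 / D), 0) for each coefficient variance."""
    return [max(0.5 * math.log2(v / D), 0.0) if v > 0 else 0.0 for v in variances]

variances = [1024.0, 256.0, 64.0, 16.0, 4.0, 1.0]   # hypothetical coefficient variances
D = 4.0                                             # hypothetical maximum acceptable mse

rates = allocate_bits(variances, D)
for v, r in zip(variances, rates):
    print("variance", v, "-> rate", r, "bit")
print("total rate:", sum(rates), "bit")
```

Coefficients whose variance does not exceed D receive zero bits, which is the role of the max operation: such coefficients are simply not transmitted.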
Amplitude Distribution of the DCT Coefficients
Histograms for 8x8 DCT coefficient amplitudes measured for natural images (from Mauersberger):
- The DC coefficient is typically uniformly distributed.
- For the other coefficients, the distribution resembles a Laplacian pdf.
Threshold Coding, I
- Transform coefficients that fall below a threshold are discarded.
- Implementation by a uniform quantizer with threshold characteristic.
  [Figure: quantizer characteristic, quantizer output vs. quantizer input]
- The positions of the non-zero transform coefficients are transmitted in addition to their amplitude values.
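A quantizer of this kind might look as follows. This is only a sketch under stated assumptions: the slides do not give the step size, the threshold, or the rounding rule, so `step = 16`, `threshold = 16`, rounding to the nearest level, and the sample coefficients are all hypothetical.

```python
import math

def threshold_quantize(c, step, threshold):
    """Return the integer level for coefficient c; amplitudes below the threshold map to 0.

    Assumption: levels above the threshold are obtained by rounding |c| / step
    to the nearest integer and restoring the sign.
    """
    if abs(c) < threshold:
        return 0
    return int(math.copysign(math.floor(abs(c) / step + 0.5), c))

def reconstruct(level, step):
    """Decoder side: scale the level back to a coefficient value."""
    return level * step

coeffs = [49.0, -7.5, 20.0, -38.0, 3.0, 15.9]   # hypothetical transform coefficients
levels = [threshold_quantize(c, 16.0, 16.0) for c in coeffs]

# Only non-zero levels need to be sent, together with their positions.
nonzero = [(pos, lvl) for pos, lvl in enumerate(levels) if lvl != 0]

print("levels:", levels)
print("non-zero (position, level):", nonzero)
print("reconstruction:", [reconstruct(l, 16.0) for l in levels])
```

The list of (position, level) pairs makes the last bullet explicit: once small coefficients are discarded, the decoder has to be told where the surviving coefficients are located.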
Threshold Coding, II
- Efficient encoding of the positions of the non-zero transform coefficients: zig-zag scan + run-level coding
[Figure: ordering of the transform coefficients by zig-zag scan]
Threshold Coding, III
Original 8x8 block:
  201 195 188 193 169 157 196 190
  193 188 187 201 195 193 213 193
  184 192 180 195 182 151 199 193
  176 172 179 179 152 148 198 183
  196 195 169 171 159 185 218 175
  214 213 205 170 173 185 206 150
  207 205 207 184 180 167 173 160
  198 203 205 186 196 149 159 163
DCT:
  1480   49   33  -15  -14   33  -38   20
    10  -52   11  -12   16   17  -13  -12
    19   32  -22  -10   22  -20    9    8
    16   10   17   27  -31   12    6   -5
   -30   -6   13  -12    8    4   -3   -3
   -25   16    6  -24    9    3    3    3
    -2   17    4   -6    0   -4   -9    8
     1   -2    6    0    7   -5   -8   -7
Q (quantized coefficients):
  185   3   1   1  -3   2  -1   0
    1   1  -1   0  -1   0   0   1
    0   0   1   0  -1   0   0   0
    1   1   0  -1   0   0   0  -1
    0   0   1   0   0   0  -1   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
Zig-zag scan:
  (185 3 1 0 1 1 1 -1 0 1 0 1 1 0 -3 2 -1 0 0 0 0 0 0 1 -1 -1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 -1 EOB)
Run-level coding:
  Mean of block: 185
  (0,3) (0,1) (1,1) (0,1) (0,1) (0,-1) (1,1) (1,1) (0,1) (1,-3) (0,2) (0,-1) (6,1) (0,-1) (0,-1) (1,-1) (14,1) (9,-1) (0,-1) (EOB)
Transmission
Run-level decoding:
  Mean of block: 185
  (0,3) (0,1) (1,1) (0,1) (0,1) (0,-1) (1,1) (1,1) (0,1) (1,-3) (0,2) (0,-1) (6,1) (0,-1) (0,-1) (1,-1) (14,1) (9,-1) (0,-1) (EOB)
  (185 3 1 0 1 1 1 -1 0 1 0 1 1 0 -3 2 -1 0 0 0 0 0 0 1 -1 -1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 -1 EOB)
Inverse zig-zag scan:
  185   3   1   1  -3   2  -1   0
    1   1  -1   0  -1   0   0   1
    0   0   1   0  -1   0   0   0
    1   1   0  -1   0   0   0  -1
    0   0   1   0   0   0  -1   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
Scaling and inverse DCT, reconstructed 8x8 block:
  196 193 187 192 179 176 196 189
  198 188 182 198 196 192 208 200
  185 189 191 197 174 159 184 189
  167 181 182 177 154 153 187 189
  201 199 178 165 163 185 206 179
  220 217 193 176 165 179 197 170
  194 198 195 193 169 156 180 179
  210 196 192 209 185 149 157 160
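The zig-zag scan and run-level coding steps of this example can be reproduced with a few lines of code. The sketch below is an illustration, not the slides' implementation: the function names are hypothetical, the scan order is generated on the assumption that the scan first moves to the right of the DC position, and each pair is read as (number of zeros skipped, non-zero level). The input is the quantized coefficient block from the example.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + col selects an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_level_encode(ac):
    """Encode a coefficient sequence as (run of zeros, level) pairs, closed by 'EOB'."""
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")                   # trailing zeros are implied by end-of-block
    return pairs

# Quantized coefficients of the example block.
Q = [[185, 3,  1,  1, -3, 2, -1,  0],
     [1,   1, -1,  0, -1, 0,  0,  1],
     [0,   0,  1,  0, -1, 0,  0,  0],
     [1,   1,  0, -1,  0, 0,  0, -1],
     [0,   0,  1,  0,  0, 0, -1,  0],
     [0,   0,  0,  0,  0, 0,  0,  0],
     [0,   0,  0,  0,  0, 0,  0,  0],
     [0,   0,  0,  0,  0, 0,  0,  0]]

scan = [Q[r][c] for r, c in zigzag_order(8)]
mean_of_block = scan[0]                   # first value, sent separately
pairs = run_level_encode(scan[1:])

print("Mean of block:", mean_of_block)
print(pairs)
```

The printed pairs match the run-level sequence of the example, from (0,3) through (14,1), (9,-1), (0,-1) to the end-of-block symbol. The long zero runs toward the end of the scan are what make this representation compact.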
Detail in a Block vs. DCT Coefficients Transmitted
[Figure panels: image block; DCT coefficients of block; quantized DCT coefficients of block; block reconstructed from quantized DCT coefficients]
Typical DCT Coding Artifacts
- DCT coding with increasingly coarse quantization, block size 8x8
[Figure: three coded images, with quantizer stepsize for AC coefficients 25, 100, and 200]
Adaptive Transform Coding
[Block diagram: input signal -> transform -> quantization -> entropy coding, with a block classification stage that outputs the block class]
- Quantization and entropy coding are optimized separately for each class.
- Typical classes:
  - Blocks without detail
  - Horizontal structures
  - Vertical structures
  - Diagonals
  - Textures without preferred orientation
Influence of DCT Block Size
- Efficiency as a function of block size NxN, measured for 8 bit quantization in the original domain and equivalent quantization in the transform domain:
  $G = \frac{\text{memoryless entropy of original signal}}{\text{mean entropy of transform coefficients}}$
- Block size 8x8 is a good compromise.