
Information Theory and Coding: Image, Video and Audio Compression


  1. Information Theory and Coding: Image, Video and Audio Compression
     Markus Kuhn, Computer Laboratory. Lent 2003, Part II.
     http://www.cl.cam.ac.uk/Teaching/2002/InfoTheory/

     Structure of modern audiovisual communication systems
     Signal → sensor + sampling → perceptual coding → entropy coding → channel coding → channel (with noise) → channel decoding → entropy decoding → perceptual decoding → display → human senses.
     The dashed box marks the focus of the main part of this course as taught by Neil Dodgson.

     Sampling, aliasing and Nyquist limit
     [Figure: spectrum with copies at $i \cdot f_s \pm f$; frequency axis from $-3f_s$ to $3f_s$.]
     A wave $\cos(2\pi t f)$ sampled with frequency $f_s$ cannot be distinguished from $\cos(2\pi t (i f_s \pm f))$ for any $i \in \mathbb{Z}$, therefore ensure $|f| < f_s/2$.

     Quantization
     Uniform: [Figure: uniform quantization characteristic.]
     Non-uniform (e.g., logarithmic): [Figure: logarithmic quantization characteristic.]
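The aliasing statement on the sampling slide can be checked numerically. The following is a minimal sketch, assuming an illustrative sampling frequency of 8 Hz and a signal frequency of 3 Hz; the function names are hypothetical and not part of the course material.

```python
import math

def sample_cosine(f, fs, n_samples):
    """Return n_samples values of cos(2*pi*t*f) taken at times t = n / fs."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(n_samples)]

def max_alias_difference(f, fs, i, sign, n_samples=16):
    """Largest sample-wise gap between frequency f and i*fs + sign*f."""
    base = sample_cosine(f, fs, n_samples)
    alias = sample_cosine(i * fs + sign * f, fs, n_samples)
    return max(abs(a - b) for a, b in zip(base, alias))

if __name__ == "__main__":
    fs = 8.0  # sampling frequency in Hz (assumed, for illustration only)
    f = 3.0   # signal frequency, below the Nyquist limit fs/2 = 4 Hz
    for i in (1, 2, -1):
        for sign in (+1, -1):
            diff = max_alias_difference(f, fs, i, sign)
            print(f"i={i:+d} sign={sign:+d}: max sample difference {diff:.2e}")
```

Every printed difference should be at the level of floating-point rounding, which is why a signal must be band-limited to below $f_s/2$ before sampling if it is to be reconstructed unambiguously.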

  2. Psychophysics of perception
     Sensation limit (SL) = lowest intensity stimulus that can still be perceived.
     Difference limit (DL) = smallest perceivable stimulus difference at a given intensity level.

     Weber's law
     The difference limit $\Delta\phi$ is proportional to the intensity $\phi$ of the stimulus (except for a small correction constant $a$ that describes the deviation of experimental results near SL):
     $\Delta\phi = c \cdot (\phi + a)$

     Fechner's scale
     Define a perception intensity scale $\psi$ using the sensation limit $\phi_0$ as the origin and the respective difference limit $\Delta\phi = c \cdot \phi$ as a unit step. The result is a logarithmic relationship between stimulus intensity and scale value:
     $\psi = \log_c \frac{\phi}{\phi_0}$
     Fechner's scale matches older subjective intensity scales that follow differentiability of stimuli, e.g. the astronomical magnitude numbers for star brightness introduced by Hipparchos (≈ 150 BC).

     Stevens' law
     A sound that is 20 DL over SL is perceived as more than twice as loud as one that is 10 DL over SL, i.e. Fechner's scale does not describe perceived intensity well. A rational scale attempts to reflect subjective relations perceived between different values of stimulus intensity $\phi$. Stevens observed that such rational scales $\psi$ follow a power law:
     $\psi = k \cdot (\phi - \phi_0)^a$
     Example coefficients $a$: temperature 1.6, weight 1.45, loudness 0.6, brightness 0.33.

     Example for non-uniform quantization: digital telephone network
     [Figure: µ-law (US) and A-law (Europe) curves; signal voltage against byte value from −128 to 128.]
     A simple logarithm fails for values ≤ 0 → apply µ-law compression
     $y = \frac{V \log(1 + \mu |x| / V)}{\log(1 + \mu)} \operatorname{sgn}(x)$
     before uniform quantization ($\mu = 255$, $V$ maximum value).
     Lloyd's algorithm: finds the least-square-optimal non-uniform quantization function for a given probability distribution of sample values.
     S.P. Lloyd: Least Squares Quantization in PCM. IEEE Trans. on Information Theory. Vol. 28, March 1982, pp 129–137.

     Decibel
     Communications engineers love logarithmic units:
     → Quantities often vary over many orders of magnitude → difficult to agree on a common SI prefix
     → Quotient of quantities (amplification/attenuation) usually more interesting than difference
     → Signal strength usefully expressed as field quantity (voltage, current, pressure, etc.) or power, but quadratic relationship between these two ($P = U^2/R = I^2 R$) rather inconvenient
     → Weber/Fechner: perception is logarithmic
     Plus: Using magic special-purpose units has its own odd attractions (→ typographers, navigators)
     Neper (Np) denotes the natural logarithm of the quotient of a field quantity $F$ and a reference value $F_0$.
     Bel (B) denotes the base-10 logarithm of the quotient of a power $P$ and a reference power $P_0$.
     Common prefix: 10 decibel (dB) = 1 bel.
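The µ-law compression formula from the telephone-network slide can be sketched as follows. This is a minimal illustration of the continuous formula only, not the exact byte encoding used in telephone networks. The expansion function is derived here by inverting the slide's formula, and the 256-level rounding quantizer, the default $V = 1$ and all function names are assumptions made for the example.

```python
import math

MU = 255.0  # value of mu stated on the slide

def sgn(x):
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)

def mu_law_compress(x, v=1.0, mu=MU):
    """y = V * log(1 + mu*|x|/V) / log(1 + mu) * sgn(x), for |x| <= V."""
    return sgn(x) * v * math.log(1 + mu * abs(x) / v) / math.log(1 + mu)

def mu_law_expand(y, v=1.0, mu=MU):
    """Inverse of mu_law_compress (obtained by solving the formula for x)."""
    return sgn(y) * (v / mu) * ((1 + mu) ** (abs(y) / v) - 1)

def quantize_uniform(y, v=1.0, levels=256):
    """Hypothetical uniform quantizer: round to a multiple of 2V/levels."""
    step = 2 * v / levels
    return round(y / step) * step

def uniform_error(x):
    """Absolute error when x is quantized directly."""
    return abs(quantize_uniform(x) - x)

def companded_error(x):
    """Absolute error when x is compressed, quantized, then expanded."""
    return abs(mu_law_expand(quantize_uniform(mu_law_compress(x))) - x)

if __name__ == "__main__":
    for x in (0.01, 0.1, 0.9):
        print(f"x={x}: uniform error {uniform_error(x):.2e}, "
              f"companded error {companded_error(x):.2e}")
```

For small amplitudes the companded path should show the smaller error, at the cost of coarser steps near the maximum value $V$; this is the trade-off a logarithmic quantizer makes.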

  3. Decibel (continued)
     Where $P$ is some power and $P_0$ a 0 dB reference power, or $F$ is a field quantity and $F_0$ the reference:
     $10\,\mathrm{dB} \cdot \log_{10} \frac{P}{P_0} = 20\,\mathrm{dB} \cdot \log_{10} \frac{F}{F_0}$
     Common reference values are indicated with an additional letter after dB:
     0 dBW = 1 W
     0 dBm = 1 mW = −30 dBW
     0 dBµV = 1 µV
     0 dB SPL = 20 µPa (sound pressure level)
     0 dB SL = perception threshold (sensation level)
     3 dB ≈ double power, 6 dB ≈ double pressure/voltage/etc.
     10 dB = 10 × power, 20 dB = 10 × pressure/voltage/etc.

     RGB video colour coordinates
     Hardware interface (VGA): red, green, blue signals with 0–0.7 V.
     Electron-beam current and photon count of a cathode-ray display are proportional to $(v - v_0)^\gamma$, where $v$ is the video-interface or screen-grid voltage and $\gamma$ is usually in the range 1.5–3.0. CRT non-linearity is compensated electronically in TV cameras and approximates Stevens' scale.
     Software interfaces map RGB voltage linearly to {0, 1, ..., 255} or 0–1.
     Mapping of numeric RGB values to colour and luminosity is at present still highly hardware and sometimes even operating-system or device-driver dependent.
     New specification "sRGB" aims to fix the meaning of RGB with $\gamma = 2.2$ and standard primary colour coordinates.
     http://www.w3.org/Graphics/Color/sRGB
     http://www.srgb.com/
     IEC 61966

     YCrCb video colour coordinates
     The human eye processes colour and luminosity at different resolutions, therefore use a colour space with luminance coordinate
     $Y = 0.3R + 0.6G + 0.1B$
     and colour components
     $V = R - Y = 0.7R - 0.6G - 0.1B$
     $U = B - Y = -0.3R - 0.6G + 0.9B$
     Since $-0.7 \le V \le 0.7$ and $-0.9 \le U \le 0.9$, a more convenient normalized encoding of chrominance is:
     $Cb = \frac{U}{2.0} + 0.5$
     $Cr = \frac{V}{1.6} + 0.5$
     Modern image compression techniques operate on Y, Cr, Cb channels separately, using half the resolution of Y for storing Cr, Cb.

     Correlation of neighbour pixels
     [Figure: four scatter plots of the values of neighbour pixels at distance 1, 2, 4 and 8; both axes run from 0 to 250.]
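The YCrCb equations from the colour-coordinate slide can be written out directly. This is a minimal sketch that assumes R, G, B values in the range 0 to 1 and uses the rounded coefficients from the slide. The inverse function is not in the slides; it is derived here by solving the same equations for R, G and B, and the function names are hypothetical.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert R, G, B in [0, 1] to (Y, Cr, Cb) using the slide's coefficients."""
    y = 0.3 * r + 0.6 * g + 0.1 * b   # luminance
    v = r - y                          # -0.7 <= V <= 0.7
    u = b - y                          # -0.9 <= U <= 0.9
    cr = v / 1.6 + 0.5                 # normalized chrominance
    cb = u / 2.0 + 0.5
    return y, cr, cb

def ycrcb_to_rgb(y, cr, cb):
    """Inverse mapping, obtained by solving the forward equations."""
    v = (cr - 0.5) * 1.6
    u = (cb - 0.5) * 2.0
    r = v + y
    b = u + y
    g = (y - 0.3 * r - 0.1 * b) / 0.6
    return r, g, b

if __name__ == "__main__":
    for name, rgb in (("red", (1, 0, 0)), ("white", (1, 1, 1)), ("blue", (0, 0, 1))):
        print(name, rgb_to_ycrcb(*rgb))
```

Any grey value (R = G = B) maps to Cr = Cb = 0.5 up to rounding, which shows that all brightness information ends up in Y and the two chrominance channels carry only colour differences.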

  4. Karhunen-Loève transform (KLT)
     Two random variables $x$, $y$ are not correlated if their covariance
     $\operatorname{cov}(x, y) = E\{(x - E\{x\}) \cdot (y - E\{y\})\} = 0$.
     Take an image (or in practice a small 8 × 8 pixel block) as a random-variable vector $b$. The components of a random-variable vector $b = (b_1, \ldots, b_k)$ are decorrelated if the covariance matrix $\operatorname{cov}(b)$ with
     $(\operatorname{cov}(b))_{i,j} = E\{(b_i - E\{b_i\}) \cdot (b_j - E\{b_j\})\} = \operatorname{cov}(b_i, b_j)$
     is a diagonal matrix. The Karhunen-Loève transform of $b$ is the matrix $A$ with which $\operatorname{cov}(Ab)$ is diagonal.
     Since $\operatorname{cov}(b)$ is symmetric, its eigenvectors are orthogonal. Using these eigenvectors as the rows of $A$ and the corresponding eigenvalues as the diagonal elements of the diagonal matrix $D$, we obtain the decomposition $\operatorname{cov}(b) = A^T D A$, and therefore $\operatorname{cov}(Ab) = D$.
     The Karhunen-Loève transform is the orthogonal matrix of the singular-value decomposition of the covariance matrix of its input.

     Discrete cosine transform (DCT)
     The forward and inverse discrete cosine transform
     $S(u) = \frac{C(u)}{\sqrt{N/2}} \sum_{x=0}^{N-1} s(x) \cos\frac{(2x+1)u\pi}{2N}$
     $s(x) = \sum_{u=0}^{N-1} \frac{C(u)}{\sqrt{N/2}} S(u) \cos\frac{(2x+1)u\pi}{2N}$
     with
     $C(u) = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & u > 0 \end{cases}$
     is an orthonormal transform:
     $\sum_{x=0}^{N-1} \frac{C(u)}{\sqrt{N/2}} \cos\frac{(2x+1)u\pi}{2N} \cdot \frac{C(u')}{\sqrt{N/2}} \cos\frac{(2x+1)u'\pi}{2N} = \begin{cases} 1 & u = u' \\ 0 & u \neq u' \end{cases}$

     The 2-dimensional variant of the DCT applies the 1-D transform on both rows and columns of an image:
     $S(u, v) = \frac{C(u)}{\sqrt{N/2}} \frac{C(v)}{\sqrt{N/2}} \cdot \sum_{y=0}^{N-1} \sum_{x=0}^{N-1} s(y, x) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}$
     Breakthrough: Ahmed/Natarajan/Rao discovered the DCT as an excellent approximation of the KLT for typical photographic images, but far more efficient to calculate.
     Ahmed, Natarajan, Rao: Discrete Cosine Transform. IEEE Transactions on Computers, Vol. 23, January 1974, pp. 90–93.
     A range of fast algorithms have been found for calculating 1-D and 2-D DCTs (e.g., Ligtenberg/Vetterli).

     Whole-image DCT
     [Figure: original image next to its 2-D discrete cosine transform (log10 magnitude, colour scale from −4 to 4).]
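The DCT definitions on these slides translate almost literally into code. The following is a minimal sketch of the direct O(N²) evaluation of the formulas, not one of the fast algorithms mentioned above; the function names are hypothetical, and the 2-D version assumes a square block stored as a list of rows.

```python
import math

def c(u):
    """Normalization factor C(u): 1/sqrt(2) for u = 0, otherwise 1."""
    return 1 / math.sqrt(2) if u == 0 else 1.0

def basis(u, x, n):
    """Entry of the DCT matrix: C(u)/sqrt(N/2) * cos((2x+1)*u*pi / (2N))."""
    return c(u) / math.sqrt(n / 2) * math.cos((2 * x + 1) * u * math.pi / (2 * n))

def dct(s):
    """Forward 1-D DCT: S(u) = sum over x of basis(u, x) * s(x)."""
    n = len(s)
    return [sum(basis(u, x, n) * s[x] for x in range(n)) for u in range(n)]

def idct(S):
    """Inverse 1-D DCT: s(x) = sum over u of basis(u, x) * S(u)."""
    n = len(S)
    return [sum(basis(u, x, n) * S[u] for u in range(n)) for x in range(n)]

def dct_2d(block):
    """2-D DCT of a square block: 1-D DCT on every row, then on every column."""
    rows = [dct(row) for row in block]
    cols = [dct(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def inner_product(u, u2, n):
    """Sum over x of basis(u, x) * basis(u2, x); 1 if u == u2, else 0."""
    return sum(basis(u, x, n) * basis(u2, x, n) for x in range(n))

if __name__ == "__main__":
    signal = [8.0, 9.0, 10.0, 12.0, 15.0, 15.0, 14.0, 13.0]  # arbitrary samples
    coeffs = dct(signal)
    print("coefficients:", [round(v, 3) for v in coeffs])
    print("round trip:  ", [round(v, 3) for v in idct(coeffs)])
```

Because the transform is orthonormal, the inverse uses the same basis values as the forward transform, and the sum of squares of the coefficients equals the sum of squares of the samples.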

  5. Whole-image DCT with coefficient cutoff
     [Figures: for coefficient cutoffs of 80%, 90%, 95% and 99%, the truncated 2-D DCT (log10 magnitude, colour scale from −4 to 4) next to the image reconstructed from it.]
