Image/video compression: Basics and research issues Christine GUILLEMOT
Outline
• A few basics in source coding
• Practical use in standardized solutions
• Research issues: towards better transforms; towards better prediction
• Inpainting-based compression
Compression: a few basics
Basics in source coding Lossless rate bounds: a function of the source probability distributions
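The dependence of the lossless rate bound on the source distribution can be illustrated with a minimal sketch. It computes the zero-order empirical entropy (in bits per symbol) of two hypothetical symbol sequences; the sequences and the helper name `entropy_bits` are assumptions made for illustration only.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Zero-order empirical entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sources over the same 4-symbol alphabet.
uniform = [0, 1, 2, 3] * 4            # four equiprobable symbols
skewed = [0] * 12 + [1, 1, 2, 3]      # probabilities 3/4, 1/8, 1/16, 1/16

h_uniform = entropy_bits(uniform)     # 2 bits/symbol: no gain over a fixed-length code
h_skewed = entropy_bits(skewed)       # lower bound drops as the distribution gets skewed
print(h_uniform, h_skewed)
```

The more skewed the distribution, the lower the bound, which is what an entropy coder exploits.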
Basics in source coding
Basics in source coding How can dependent symbols be encoded « optimally » when each one is coded separately? Lossless coding is limited in terms of compression factor (on the order of 2 to 3 for natural images, and 3 to 4 for video)
Basics in source coding To further decrease the bit rate, one has to tolerate distortion => lossy compression under a rate or distortion constraint, R(D). Figure labels: source information, redundancy, entropy, information not relevant, useful information; R(D) curve with axes R and D and the maximum distortion. Uniform scalar quantization + entropy coding: scheme quasi-optimal if pixels were independent
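The rate/distortion trade-off of uniform scalar quantization followed by entropy coding can be sketched as below. This is only an illustration under stated assumptions: the source is a hypothetical i.i.d. Gaussian, the rate is approximated by the empirical entropy of the quantization indices, and distortion is the mean squared error. The function names are not from the slides.

```python
import math
import random
from collections import Counter

def entropy_bits(symbols):
    """Zero-order empirical entropy, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rate_distortion(samples, step):
    """Quantize with a uniform step; return (rate in bits/sample, MSE)."""
    indices = [round(s / step) for s in samples]   # quantization indices
    recon = [i * step for i in indices]            # dequantized values
    mse = sum((s - r) ** 2 for s, r in zip(samples, recon)) / len(samples)
    return entropy_bits(indices), mse

random.seed(0)
samples = [random.gauss(0.0, 10.0) for _ in range(5000)]  # hypothetical source

# One (step, rate, distortion) point per quantization step.
points = [(step,) + rate_distortion(samples, step) for step in (1.0, 2.0, 4.0, 8.0)]
for step, rate, mse in points:
    print(f"step={step:4.1f}  rate={rate:5.2f} bits  mse={mse:6.3f}")
```

A larger step lowers the rate and raises the distortion, tracing points along an operational R(D) curve.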
Basics in source coding How to address dependency between symbols? Transform the pixels into independent data
Basics in source coding Classical transforms: discrete cosine transform (DCT), discrete wavelet transform (DWT)
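Why a transform helps can be seen on a tiny example. The sketch below applies an orthonormal 1D DCT-II to a hypothetical smooth row of 8 pixel values (the values are invented for illustration) and measures how much of the energy ends up in the first coefficient.

```python
import math

def dct_ii(x):
    """Orthonormal 1D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

row = [100, 102, 104, 107, 109, 110, 112, 115]   # hypothetical smooth pixel row
coeffs = dct_ii(row)

energy_pixels = sum(v * v for v in row)
energy_coeffs = sum(c * c for c in coeffs)       # preserved by an orthonormal transform
dc_share = coeffs[0] ** 2 / energy_coeffs        # fraction of energy in the DC coefficient
print([round(c, 2) for c in coeffs], round(dc_share, 4))
```

Strongly correlated pixels are mapped to a few significant coefficients, which are far less dependent and cheaper to code after quantization.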
Basics in source coding Further/better suppressing dependencies: prediction
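A minimal sketch of prediction as a way to remove dependency: each sample is predicted by its left neighbour and only the prediction error is kept. The signal is a hypothetical smooth scan line, and the zero-order entropy is used as a rough proxy for the coding cost; both are assumptions for illustration.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Zero-order empirical entropy, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical smooth scan line: a slowly varying sinusoid.
samples = [round(128 + 100 * math.sin(2 * math.pi * i / 256)) for i in range(1024)]

# Predict each sample by the previous one; keep the first sample and the errors.
residual = [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]

# The decoder inverts the prediction exactly by accumulating the residual.
decoded = []
for r in residual:
    decoded.append(r if not decoded else decoded[-1] + r)

h_samples = entropy_bits(samples)
h_residual = entropy_bits(residual)
print(round(h_samples, 2), round(h_residual, 2))
```

The residual takes only a handful of small values, so its entropy is much lower than that of the raw samples while the reconstruction stays lossless.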
Basics in source coding In summary
Practical use of these concepts in standardized solutions
Three decades of standards development … guided by the same concepts (figure labels: JPEG-2000, JPEG)
… leading to a common framework: the same hybrid scheme (motion-compensated temporal prediction + DCT) over the years
First key ingredient: motion-compensated temporal prediction Exploiting pixel dependency in the temporal dimension, with many optimizations over the years (e.g. multiple reference frames)
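Motion-compensated prediction relies on finding, for each block, a matching block in a reference frame. The sketch below is a plain full-search block matching with a sum of absolute differences (SAD) criterion; the frames are synthetic, and the block size, search range and function names are hypothetical choices, not those of any standard.

```python
import random

def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def motion_search(cur, ref, top, left, size, search_range):
    """Full search: return (best SAD, dy, dx) within +/- search_range."""
    height, width = len(ref), len(ref[0])
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(ref, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best

rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
# Synthetic current frame: reference content moved 2 pixels right and 1 pixel down.
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
       for y in range(16)]

best = motion_search(cur, ref, top=4, left=4, size=4, search_range=3)
print(best)  # (SAD, dy, dx) of the best match in the reference frame
```

Only the motion vector and the (here zero) prediction residue would then need to be coded for that block.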
Second key ingredient: Spatial prediction Exploiting dependency in the spatial dimension (H.264). If the prediction is efficient, the difference between original and prediction (the residue) is made of independent samples. Many optimizations over the years (up to 35 modes in HEVC)
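Spatial (intra) prediction can be sketched with three simple modes (vertical, horizontal, DC) that predict a block from the already decoded pixels above and to its left. This is a simplified illustration, not the exact H.264 or HEVC procedure; the block values and the SAD-based mode choice are assumptions.

```python
def predict(mode, top, left, size):
    """Build a size x size prediction from the row above and the column to the left."""
    if mode == "vertical":
        return [list(top) for _ in range(size)]
    if mode == "horizontal":
        return [[left[r]] * size for r in range(size)]
    dc = round((sum(top) + sum(left)) / (2 * size))   # "dc" mode: mean of neighbours
    return [[dc] * size for _ in range(size)]

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def best_mode(block, top, left):
    """Return (mode, SAD) of the mode whose prediction is closest to the block."""
    size = len(block)
    costs = {m: sad(block, predict(m, top, left, size))
             for m in ("vertical", "horizontal", "dc")}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]

# Hypothetical 4x4 block made of vertical stripes, with its neighbouring pixels.
top = [10, 50, 90, 130]
left = [10, 10, 10, 10]
block = [[10, 50, 90, 130] for _ in range(4)]

mode, cost = best_mode(block, top, left)
print(mode, cost)
```

When the selected mode matches the local structure, the residue to be transformed and coded is close to zero.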
Third key ingredient: Transform + joint RD optimization A joint rate-distortion optimization of the prediction and transform supports adapts them to local image characteristics (flat regions, contours, texture, ...). Transform: a simple block transform (DCT) with R-D optimized support
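A common way to carry out such a rate-distortion optimized choice is to minimize a Lagrangian cost J = D + λR over the candidate supports. The slide does not give the criterion, so this formulation is an assumption, and the candidate names, distortions and rates below are invented numbers.

```python
def choose(candidates, lam):
    """Pick the candidate minimizing the Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])

# Hypothetical candidates for one region: larger supports cost fewer bits
# but leave more distortion; smaller supports do the opposite.
candidates = [
    {"name": "16x16", "distortion": 400.0, "rate": 20.0},
    {"name": "8x8", "distortion": 150.0, "rate": 60.0},
    {"name": "4x4", "distortion": 40.0, "rate": 160.0},
]

for lam in (0.5, 2.0, 10.0):
    print(lam, choose(candidates, lam)["name"])
```

A small λ favours low distortion (small supports), a large λ favours low rate (large supports), so one parameter sweeps the operating point.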
Fourth key ingredient: entropy coding Higher-order statistics to exploit remaining dependencies: context modeling, on-line learning of probability laws, binarization followed by arithmetic coding
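The chain "binarization, on-line probability learning, arithmetic coding" can be sketched as follows. Assumptions: a unary binarization, a single context with a count-based adaptive estimator, and the ideal arithmetic-coding cost of -log2(p) bits per bin instead of an actual arithmetic coder. This is not the CABAC design, only an illustration of the principle.

```python
import math

def unary(value):
    """Unary binarization: value ones followed by a terminating zero."""
    return [1] * value + [0]

def adaptive_code_length(bins):
    """Ideal code length (bits) with probabilities learned on-line from counts."""
    counts = [1, 1]                      # initial counts for bin values 0 and 1
    total_bits = 0.0
    for b in bins:
        p = counts[b] / (counts[0] + counts[1])
        total_bits += -math.log2(p)      # ideal arithmetic-coding cost of this bin
        counts[b] += 1                   # update the probability model
    return total_bits

# Hypothetical symbols dominated by small values (e.g. quantized residues).
symbols = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0] * 20
bins = [b for s in symbols for b in unary(s)]

cost = adaptive_code_length(bins)
print(len(bins), round(cost, 1))  # number of bins vs. estimated coded bits
```

Because the bins are skewed towards zero, the adaptive model codes them in fewer bits than one bit per bin; context modeling would refine this by keeping separate counts per context.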
Performance evolution of video compression over the years
Research Issues: Towards better transforms • Anisotropic transforms • Graph-based transforms • Sparse approximations
Block-based transforms: limitations
Assuming an image is a piecewise smooth function, i.e., it contains sharp boundaries between smooth regions (figure: super-pixels obtained with the SLIC method):
• Block-based transforms are limited when blocks contain arbitrarily shaped discontinuities
• 2D separable wavelets are well adapted to point singularities only, not so well to smooth boundaries (contours), whereas in 2D images there are mostly line and curve singularities
=> Design of alternative transforms like curvelets, bandelets, oriented wavelets, etc., or graph-based transforms
Bandelets [E. Le Pennec & S. Mallat 2003] Using modified (warped) orthogonal wavelets in the flow direction, to perform a transform on smooth functions. Figure labels: quad-tree segmentation; estimation of the geometrical flow in each sub-square (each arrow is a vector orienting the support of the wavelet transform); sample geometry (green lines); warped 1D filtering; 1D signal; 1D wavelet transform
Bandelets [E. Le Pennec & S. Mallat 2003] Figure panels: original; 0.44 bpp; wavelets (0.2 bpp); bandelets (0.2 bpp)
Oriented wavelet transforms [ V. Chappelier & C. Guillemot TIP-2006] Lifting scheme of the 1D-wavelet transform Generalization to 2D Separation of the square grid into 2 quincunx cosets Iteration of the splitting on one of the grids
Oriented wavelet transforms [ V. Chappelier & C. Guillemot TIP-2006] Multi-scale quincunx sampling pyramid Downsampling by a factor of at each scale L k {0,1} either square or quincunx grids Orientation of the 1D wavelets along edges with binary orientations
Oriented wavelet transforms [ V. Chappelier & C. Guillemot TIP-2006] Better preservation of directional frequencies (figure panels: LL0-wavelet, L1-wavelet)
The field of transform design is reviving with graph-based transforms [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015] (figure labels: signal values, pixels)
Towards graph-based transforms [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015] Characterization of the graph Real Symmetric matrix Laplacian operator: difference operator
Towards graph-based transforms [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015] The Laplacian of the graph Has a complete set of eigenvectors: Associated to real non-negative eigen-values (defining the spectrum of the graph) Normalized Laplacian: weights normalized by
Towards graph-based transforms • The eigenvectors associated to the eigenvalues carry a notion of frequency. The eigenvector associated to the eigenvalue 0 is constant whereas the eigenvector associated to a higher eigenvalue varies more on the vertices of the graph. • The number of zero crossings is higher with a higher eigenvalue. Analogous to classical Fourier analysis where a higher f means faster oscillation (Exponentials) • The eigenvectors of the Laplacian define the Graph Fourier Transform [Shuman et al. 2013] GFT iGFT
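The frequency interpretation can be checked on the simplest case. The sketch below assumes an unweighted path graph (pixels of one row, each linked to its neighbours), a hypothetical choice for which the Laplacian eigenvectors are known in closed form (they coincide with the DCT-II basis). For a general graph a numerical eigendecomposition would be needed instead. All function names are illustrative.

```python
import math

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph with n vertices."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap

def gft_basis(n):
    """Closed-form (eigenvalue, eigenvector) pairs of the path-graph Laplacian."""
    pairs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        vec = [scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        pairs.append((2.0 - 2.0 * math.cos(math.pi * k / n), vec))
    return pairs

def gft(signal, pairs):
    """Graph Fourier transform: project the signal on each eigenvector."""
    return [sum(u * s for u, s in zip(vec, signal)) for _, vec in pairs]

def igft(coeffs, pairs):
    """Inverse GFT: recombine the eigenvectors weighted by the coefficients."""
    return [sum(c * vec[i] for c, (_, vec) in zip(coeffs, pairs))
            for i in range(len(coeffs))]

def zero_crossings(vec):
    """Number of sign changes between neighbouring vertices."""
    return sum(1 for a, b in zip(vec, vec[1:]) if a * b < 0)

n = 8
lap = path_laplacian(n)
pairs = gft_basis(n)
signal = [3.0, 3.1, 3.0, 2.9, 7.0, 7.1, 7.0, 6.9]   # hypothetical graph signal
coeffs = gft(signal, pairs)
restored = igft(coeffs, pairs)
crossings = [zero_crossings(vec) for _, vec in pairs]
print(crossings)
```

The eigenvector of eigenvalue 0 is constant, and the number of zero crossings grows with the eigenvalue, mirroring the behaviour of classical Fourier modes.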
Towards graph-based transforms
Active area of research:
• Wavelets on graphs via spectral graph theory [Hammond et al. 11]
• Wavelet filterbanks [Narang & Ortega 12, Gadde et al. 13, …]
• Overcomplete dictionaries on graphs [Zhang et al. 12, …]
Nevertheless, a big issue in compression: the rate cost for signalling the graph structure
Sparse approximations for compression
Given an input vector $y \in \mathbb{R}^{n}$ and a dictionary $D \in \mathbb{R}^{n \times M}$, with $M > n$ and $D$ of full rank:
$\min_x \|x\|_0 \ \text{s.t.}\ Dx = y$
$\|x\|_0$ is the $L_0$ norm of $x$ (its number of non-zero entries); $D$ is the dictionary, whose columns $d_k$ are the atoms, with $\|d_k\|_2 = 1$. The “basis” vectors are not required to be orthogonal. Dimensions: $y$ is $n \times 1$, $D$ is $n \times M$, $x$ is $M \times 1$.
Finding an exact solution is difficult. In practice, approximate solutions are good enough:
$\min_x \|x\|_0 \ \text{s.t.}\ \|y - Dx\|_2 \le \rho$
Or, equivalently, given $D$ and $y$, a computationally tractable search algorithm for an approximate solution:
$\arg\min_x \|y - Dx\|_2 \ \text{s.t.}\ \|x\|_0 \le L$
• Greedy pursuit algorithms: MP [Mallat & Zhang (1993)], OMP [Pati 1993], OOMP, …
• L2-L1 minimization (constrained least squares): BP denoising [Chen, Donoho, & Saunders (1995)]
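A greedy pursuit can be sketched in a few lines. The block below is a plain matching pursuit (MP), assuming unit-norm atoms; the dictionary, the signal and the number of iterations are hypothetical. OMP would additionally re-fit all selected coefficients by least squares at each step, which is not done here.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matching_pursuit(y, atoms, n_iter):
    """Greedy sparse approximation of y over unit-norm atoms (list of columns)."""
    residual = list(y)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(corr[j]))  # best-matching atom
        coeffs[k] += corr[k]
        residual = [r - corr[k] * v for r, v in zip(residual, atoms[k])]
    return coeffs, residual

s = 1.0 / math.sqrt(2.0)
# Hypothetical overcomplete dictionary in R^3: 5 unit-norm atoms (M = 5 > n = 3).
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [s, s, 0.0],
    [0.0, s, s],
]
# Signal built from two atoms: 3 * atoms[3] + 0.5 * atoms[2].
y = [3.0 * s, 3.0 * s, 0.5]

coeffs, residual = matching_pursuit(y, atoms, n_iter=2)
print([round(c, 3) for c in coeffs], math.sqrt(dot(residual, residual)))
```

Two iterations recover a representation with two non-zero coefficients, whereas the three canonical atoms alone would need three.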
L1-minimization: Basis Pursuit (BP) [Chen, Donoho, & Saunders (1995)]
Instead of solving $\min_x \|x\|_0 \ \text{s.t.}\ Dx = y$, solve $\min_x \|x\|_1 \ \text{s.t.}\ Dx = y$
• The problem becomes convex (linear programming)
• Very efficient solvers: interior point methods [Chen, Donoho, & Saunders (`95)], sequential shrinkage for unions of ortho-bases [Bruce et al. (`98)], iterated shrinkage [Figueiredo & Nowak (`03), Daubechies, Defrise, & De Mol (`04), E. (`05), E., Matalon, & Zibulevsky (`06)]
• L1 regularization: quadratic programming. Basis Pursuit Denoising (LASSO): $\min_x \frac{1}{2}\|y - Dx\|_2^2 + \lambda \|x\|_1$
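The iterated-shrinkage family can be illustrated with a basic ISTA loop for the Basis Pursuit Denoising objective. Assumptions: the step size uses the squared Frobenius norm of D as a (loose) upper bound on the Lipschitz constant, and the dictionary, signal, λ and iteration count are hypothetical. This is a sketch of the general idea, not the specific algorithms of the cited papers.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def synthesize(atoms, x):
    """Compute D x, with D given as a list of columns (atoms)."""
    n = len(atoms[0])
    return [sum(xj * a[i] for xj, a in zip(x, atoms)) for i in range(n)]

def soft_threshold(v, t):
    """Shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def objective(y, atoms, x, lam):
    """0.5 * ||y - D x||_2^2 + lam * ||x||_1"""
    r = [yi - ai for yi, ai in zip(y, synthesize(atoms, x))]
    return 0.5 * dot(r, r) + lam * sum(abs(v) for v in x)

def ista(y, atoms, lam, n_iter=500):
    """Iterative shrinkage-thresholding for the BPDN / LASSO objective."""
    lip = sum(v * v for a in atoms for v in a)   # ||D||_F^2 >= largest eigenvalue of D^T D
    x = [0.0] * len(atoms)
    for _ in range(n_iter):
        r = [yi - ai for yi, ai in zip(y, synthesize(atoms, x))]
        x = [soft_threshold(xj + dot(a, r) / lip, lam / lip)
             for xj, a in zip(x, atoms)]
    return x

s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0], [0.0, s, s]]
y = [3.0 * s, 3.0 * s, 0.5]
lam = 0.1

x_hat = ista(y, atoms, lam)
print([round(v, 3) for v in x_hat], round(objective(y, atoms, x_hat, lam), 4))
```

Each iteration is a gradient step on the quadratic term followed by a soft threshold, so no combinatorial search over supports is needed.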
Sparsity depends on how well the dictionary is adapted to the data at hand
Given training vectors $Y = [Y_1, \dots, Y_T]$, learn $D$ that minimizes the averaged error of the sparse representation of the training vectors:
$\arg\min_{D, X} \|Y - DX\|_F^2 \ \text{s.t.}\ \|X_n\|_0 \le L,\ n = 1, \dots, T$
The optimization problem is combinatorial and highly non-convex, but the objective is convex with respect to one of its variables when the other one is fixed => two-step approach:
• Sparse coding ($D$ fixed): $\min_X \|Y - DX\|_F^2 \ \text{s.t.}\ \|X_n\|_0 \le L,\ n = 1, \dots, T$
• Dictionary update ($X$ fixed): $\arg\min_D \|Y - DX\|_F^2$
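For the dictionary-update step with $X$ fixed, the unconstrained least-squares problem has a closed-form minimizer, which is the update used by MOD. A short derivation, assuming $X X^{T}$ is invertible (an assumption not stated on the slide):

```latex
\begin{aligned}
f(D) &= \|Y - DX\|_F^2 \\
\nabla_D f(D) &= -2\,(Y - DX)\,X^{T} = 0 \\
\hat{D} &= Y X^{T} \left(X X^{T}\right)^{-1}
\end{aligned}
```

The sparse-coding step, in contrast, has no closed form and is typically handled with a pursuit algorithm applied to each training vector.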
Sparsity depends on how well the dictionary is adapted to the data at hand
Extensive work on dictionary learning:
Non-structural learned dictionaries
• MOD (Engan et al., 1999)
• K-SVD (Aharon et al., 2006): SVD-based atom-by-atom dictionary update
Imposing constraints on dictionaries
• Sparse Dictionary [Rubinstein’10]
• Translation invariant [Jost’06; Aharon and Elad, 2008]
• Multiscale dictionaries (Mairal’08)
• Unions of orthonormal bases (Lesage 2005; Sezer et al., 2008)
• Online learned dictionaries [Mairal’10]
• Tree-structured dictionaries [Monaci 2004; Jenatton et al., 2011]
Not so easy to use in compression due to the dimension of the sparse vectors