On the use of Gaussian models on patches for image denoising
Antoine Houdard
Young Researchers in Imaging Seminars, Institut Henri Poincaré
Wednesday, February 27th
Digital photography: noise in images
[Figure: the same scene shot with different ISO settings at constant exposure, 25600 ISO vs. 200 ISO.]
Noise modeling and the denoising problem
Patch-based image denoising
Many denoising methods rely on the description of the image by patches:
- NL-means, Buades, Coll, Morel (2005)
- BM3D, Dabov, Foi, Katkovnik (2007)
- PLE, Yu, Sapiro, Mallat (2012)
- NL-Bayes, Lebrun, Buades, Morel (2012)
- LDMM, Shi, Osher, Zhu (2017)
- and many others...
Patch-based image denoising
Hypothesis: the $N_i$ are i.i.d.
Patch-based image denoising
The Bayesian paradigm
- We consider each clean patch $x$ as a realization of a random vector $X$ with prior distribution $P_X$.
- With the Gaussian white noise model $Y = X + N$, $N \sim \mathcal{N}(0, \sigma^2 I)$, Bayes' theorem yields the posterior distribution
$$P_{X|Y}(x \mid y) = \frac{P_{Y|X}(y \mid x)\, P_X(x)}{P_Y(y)}.$$
Patch-based image denoising
Denoising strategies
- $\hat{x} = \mathbb{E}[X \mid Y = y]$: the minimum mean square error (MMSE) estimator;
- $\hat{x} = Dy + \alpha$ where $D$ and $\alpha$ minimize $\mathbb{E}\big[\|DY + \alpha - X\|^2\big]$: the linear MMSE, also called the Wiener estimator;
- $\hat{x} = \arg\max_{x \in \mathbb{R}^p} p(x \mid y)$: the maximum a posteriori (MAP) estimator.
Outline
1. Gaussian priors for $X$: why are they widely used?
2. How to infer parameters in high dimension?
3. Presentation of the HDMI method.
4. Limitations of model-based patch-based approaches.
1. Modeling the clean patches $X_i$
Choice of the model
In the literature:
- local Gaussian models
  - patch-based PCA, Deledalle, Salmon, Dalalyan (2011)
  - NL-Bayes, Lebrun, Buades, Morel (2012)
  - ...
- Gaussian mixture models
  - EPLL, Zoran, Weiss (2011)
  - PLE, Yu, Sapiro, Mallat (2012)
  - Single-frame Image Denoising, Teodoro, Almeida, Figueiredo (2015)
  - ...
Why are Gaussian models so widely used?
Gaussian is convenient
- Gaussian model: if $X \sim \mathcal{N}(\mu, \Sigma)$, then
$$\hat{x}_{\mathrm{MMSE}} = \hat{x}_{\mathrm{Wiener}} = \hat{x}_{\mathrm{MAP}} = \mu + \Sigma (\Sigma + \sigma^2 I)^{-1} (y - \mu).$$
- Gaussian mixture model (GMM): if $X \sim \sum_{k=1}^{K} \pi_k \mathcal{N}(\mu_k, \Sigma_k)$, then
$$\hat{x}_{\mathrm{MMSE}} = \sum_{k=1}^{K} \mathbb{P}(Z = k \mid Y = y) \left[ \mu_k + \Sigma_k (\Sigma_k + \sigma^2 I)^{-1} (y - \mu_k) \right].$$
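As a minimal numpy sketch of these two closed-form estimators (the function names and the log-sum-exp handling of the posterior weights are my own choices, not from the talk):

```python
import numpy as np

def wiener_gaussian(y, mu, Sigma, sigma2):
    """MMSE = Wiener = MAP estimator under a single Gaussian prior N(mu, Sigma)."""
    p = mu.shape[0]
    W = Sigma @ np.linalg.inv(Sigma + sigma2 * np.eye(p))    # filtering matrix
    return mu + W @ (y - mu)

def mmse_gmm(y, pis, mus, Sigmas, sigma2):
    """MMSE estimator under a GMM prior: posterior-weighted group-wise Wiener estimates."""
    p = y.shape[0]
    # log( pi_k * N(y; mu_k, Sigma_k + sigma^2 I) ), up to a common additive constant
    log_w = np.empty(len(pis))
    for k, (pi_k, mu_k, S_k) in enumerate(zip(pis, mus, Sigmas)):
        C = S_k + sigma2 * np.eye(p)
        r = y - mu_k
        _, logdet = np.linalg.slogdet(C)
        log_w[k] = np.log(pi_k) - 0.5 * (logdet + r @ np.linalg.solve(C, r))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                             # posterior P(Z = k | Y = y)
    return sum(w_k * wiener_gaussian(y, mu_k, S_k, sigma2)
               for w_k, mu_k, S_k in zip(w, mus, Sigmas))
```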
What do Gaussian models encode?
The covariance matrix in Gaussian models and GMMs encodes geometric structures up to some contrast change.
[Figure: $s \times s$ patches generated from $\mathcal{N}(m, \Sigma)$; the covariance matrix $\Sigma$.]
What do Gaussian models encode?
A covariance matrix cannot encode multiple translated versions of a structure.
[Figure: a set of 10000 patches representing edges with random grey levels and random translations; $s \times s$ patches generated from $\mathcal{N}(m, \Sigma)$; the covariance matrix $\Sigma$.]
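The phenomenon can be reproduced with a small numpy experiment; this toy setup (patch size, number of patches, grey-level law) is my own, not the one used for the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
s, n = 8, 10000                               # patch size s x s, number of patches

# Edge patches: a vertical edge at a random position, random grey levels on each side
patches = np.empty((n, s * s))
for i in range(n):
    t = rng.integers(1, s)                    # random horizontal translation of the edge
    a, b = rng.uniform(0.0, 1.0, size=2)      # random grey levels
    patch = np.full((s, s), a)
    patch[:, t:] = b
    patches[i] = patch.ravel()

# Fit a single Gaussian to the whole set and sample from it
m = patches.mean(axis=0)
Sigma = np.cov(patches, rowvar=False)
samples = rng.multivariate_normal(m, Sigma, size=16).reshape(-1, s, s)
# The samples mix all edge positions: they look blurry, no sharp edge survives
```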
Restore with the right model
[Figure: covariance matrix, clean patch, noisy patch, denoised patch.]
Conclusion
Modeling the patches with Gaussian models is a good idea:
- they are convenient for computing the estimators;
- they are able to encode the geometric structures of the patches.
But the model needs good parameters!
2. How to infer parameters in high dimension?
Parameter inference
Gaussian model case: $X \sim \mathcal{N}(\mu_X, \Sigma_X)$, with observed data $\{y_1, \ldots, y_n\}$ sampled from $Y = X + N \sim \mathcal{N}(\mu_Y, \Sigma_Y)$.
The maximization of the log-likelihood
$$L(y; \theta) = -\frac{1}{2} \sum_{i=1}^{n} (y_i - \mu_Y)^T \Sigma_Y^{-1} (y_i - \mu_Y) - \frac{n}{2} \log \det \Sigma_Y + \text{const}$$
yields the maximum likelihood estimators (MLE)
$$\hat{\mu}_Y = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad \hat{\Sigma}_Y = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu}_Y)(y_i - \hat{\mu}_Y)^T.$$
Since $\Sigma_Y = \Sigma_X + \sigma^2 I_p$, this yields
$$\hat{\mu}_X = \hat{\mu}_Y, \qquad \hat{\Sigma}_X = \hat{\Sigma}_Y - \sigma^2 I_p.$$
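In code the two steps are immediate; the eigenvalue clipping at the end is a standard precaution I add (the difference $\hat{\Sigma}_Y - \sigma^2 I_p$ need not be positive semi-definite), not something stated on the slide:

```python
import numpy as np

def estimate_clean_gaussian(Y, sigma2):
    """MLE of (mu_Y, Sigma_Y) from noisy patches Y (n x p), then Sigma_X = Sigma_Y - sigma^2 I_p."""
    n, p = Y.shape
    mu = Y.mean(axis=0)
    R = Y - mu
    Sigma_Y = R.T @ R / n
    Sigma_X = Sigma_Y - sigma2 * np.eye(p)
    # Project back onto positive semi-definite matrices by clipping negative eigenvalues
    vals, vecs = np.linalg.eigh(Sigma_X)
    Sigma_X = (vecs * np.clip(vals, 0.0, None)) @ vecs.T
    return mu, Sigma_X
```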
How to group patches?
We need to group together the patches representing the same structure:
- for instance with the $\|\cdot\|_2$ distance → not robust under strong noise;
- Gaussian mixture models naturally provide a (more robust) grouping!
Parameter inference
Gaussian mixture model case: $X \sim \sum_{k=1}^{K} \pi_k \mathcal{N}(\mu_k, \Sigma_k)$.
This implies a GMM on the noisy patches: $Y \sim \sum_{k=1}^{K} \pi_k \mathcal{N}(\mu_k, S_k)$ with $S_k = \Sigma_k + \sigma^2 I_p$.
EM algorithm: maximize the conditional expectation of the complete log-likelihood
$$\sum_{k=1}^{K} \sum_{i=1}^{n} t_{ik} \log\big(\pi_k\, g(y_i; \theta_k)\big),$$
where $t_{ik} = \mathbb{P}(Z_i = k \mid y_i, \theta^*)$ and $\theta^*$ is a given set of parameters.
- E-step: estimate the $t_{ik}$ knowing the current parameters.
- M-step: compute the maximum likelihood estimators (MLE) of the parameters:
$$\hat{\pi}_k = \frac{n_k}{n}, \qquad \hat{\mu}_k = \frac{1}{n_k} \sum_i t_{ik}\, y_i, \qquad \hat{S}_k = \frac{1}{n_k} \sum_i t_{ik}\, (y_i - \hat{\mu}_k)(y_i - \hat{\mu}_k)^T,$$
with $n_k = \sum_i t_{ik}$.
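In practice this EM can be run with any full-covariance GMM implementation on the noisy patches, subtracting the noise variance afterwards; the scikit-learn shortcut below is mine, not the author's implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_noisy_gmm(Y, K, sigma2):
    """Fit a K-component full-covariance GMM on noisy patches Y (n x p) and
    recover the clean-patch covariances Sigma_k = S_k - sigma^2 I_p."""
    gmm = GaussianMixture(n_components=K, covariance_type="full", reg_covar=1e-6).fit(Y)
    p = Y.shape[1]
    Sigmas = gmm.covariances_ - sigma2 * np.eye(p)   # may need eigenvalue clipping, as before
    t = gmm.predict_proba(Y)                         # responsibilities t_ik
    return gmm.weights_, gmm.means_, Sigmas, t
```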
Sketch of a denoising algorithm
With all these ingredients, we can design a denoising algorithm:
- extract the patches from the image with the $P_i$ operators;
- learn a GMM for the clean patches $X$ from the observations of $Y$;
- denoise each patch with the MMSE estimator;
- aggregate all the denoised patches with the $P_i^T$ operators.
A minimal end-to-end sketch is given below.
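In this sketch, aggregation is done by plain averaging of overlapping patches; patch size, stride, number of components and the sklearn-based EM are my own choices, not the author's actual algorithm:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def denoise_image(img, sigma, s=10, K=40, stride=2):
    """Toy GMM-MMSE patch denoiser: extract patches, learn a GMM, denoise each patch, aggregate."""
    H, W = img.shape
    coords = [(i, j) for i in range(0, H - s + 1, stride) for j in range(0, W - s + 1, stride)]
    patches = np.array([img[i:i + s, j:j + s].ravel() for i, j in coords])

    # Learn a GMM on the noisy patches (model of Y)
    gmm = GaussianMixture(n_components=K, covariance_type="full", reg_covar=1e-6).fit(patches)
    T = gmm.predict_proba(patches)                       # responsibilities t_ik
    p = s * s

    # Group-wise Wiener filters Sigma_k (Sigma_k + sigma^2 I)^{-1} = (S_k - sigma^2 I) S_k^{-1}
    filters = [(S_k - sigma**2 * np.eye(p)) @ np.linalg.inv(S_k) for S_k in gmm.covariances_]
    means = gmm.means_

    # MMSE denoising of each patch, then aggregation by averaging the overlapping estimates
    acc = np.zeros((H, W))
    weight = np.zeros((H, W))
    for idx, (i, j) in enumerate(coords):
        y = patches[idx]
        x_hat = sum(T[idx, k] * (means[k] + filters[k] @ (y - means[k])) for k in range(K))
        acc[i:i + s, j:j + s] += x_hat.reshape(s, s)
        weight[i:i + s, j:j + s] += 1.0
    return acc / np.maximum(weight, 1e-8)
```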
But...
The curse of dimensionality
Parameter estimation for Gaussian models or GMMs suffers from the curse of dimensionality: the number of samples needed to estimate a parameter grows exponentially with the dimension.
The curse of dimensionality in patch space
We consider patches of size $p = 10 \times 10$ → high dimension.
→ The estimation of sample covariance matrices is difficult: they are ill-conditioned or even singular.
In the literature, this issue is generally worked around by:
- using small patches ($3 \times 3$ or $5 \times 5$): NL-Bayes [Lebrun, Buades, Morel];
- adding $\varepsilon I$ to singular covariance matrices: PLE [Yu, Sapiro, Mallat];
- fixing a lower dimension for the covariance matrices: S-PLE [Wang, Morel].
But there is no reason to be afraid of this curse!
The blessing of dimensionality?
In high-dimensional spaces, it is easier to separate data: many patches represent structures that locally live in a low-dimensional space, and using this latent lower dimension makes it possible to group the patches in a more robust way.
This "blessing" is exploited in clustering algorithms designed for high-dimensional data: High-Dimensional Data Clustering [Bouveyron, Girard, Schmid] (2007).
The blessing of dimensionality?
An illustration in the context of patches: an image made of vertical stripes of width greater than 2 pixels, with random grey levels.
[Figure: two views of the patch cloud. In the full patch space, we cannot distinguish three classes.]
[Figure: the same two views restricted to the first 3 pixel coordinates. The algorithm is now able to separate these classes!]
3. High-Dimensional Mixture Models for Image Denoising
HDMI: presentation of the model
Model for the clean patches $X$:
- $Z$: latent random variable indicating group membership;
- $X$ lives in a low-dimensional subspace which is specific to its latent group:
$$X \mid Z = k \sim \mathcal{N}(\mu_k, U_k \Lambda_k U_k^T),$$
where $U_k$ is a $p \times d_k$ orthogonal matrix and $\Lambda_k = \mathrm{diag}(\lambda_{k1}, \ldots, \lambda_{k d_k})$ is a diagonal matrix of size $d_k \times d_k$.
HDMI: induced model
Induced model on the noisy patches $Y$: the model on $X$ implies that $Y$ follows a full-rank GMM
$$p(y) = \sum_{k=1}^{K} \pi_k\, g(y; \mu_k, \Sigma_k),$$
where, in an orthonormal basis whose first $d_k$ vectors are the columns of $U_k$, the covariance $\Sigma_k$ has the specific diagonal structure
$$\mathrm{diag}(\underbrace{a_{k1}, \ldots, a_{k d_k}}_{d_k}, \underbrace{\sigma^2, \ldots, \sigma^2}_{p - d_k}),$$
with $a_{kj} = \lambda_{kj} + \sigma^2$ and $a_{kj} > \sigma^2$ for $j = 1, \ldots, d_k$.
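For concreteness, one such group covariance can be assembled and checked in a few lines of numpy (a sketch under the parametrization above; names and toy values are mine):

```python
import numpy as np

def hdmi_class_covariance(U_k, lambdas_k, sigma2):
    """Sigma_k = U_k Lambda_k U_k^T + sigma^2 I: signal in a d_k-dimensional subspace plus isotropic noise.
    Its eigenvalues are a_kj = lambda_kj + sigma^2 on that subspace and sigma^2 on its complement."""
    p = U_k.shape[0]
    return U_k @ np.diag(lambdas_k) @ U_k.T + sigma2 * np.eye(p)

# Sanity check on random parameters
rng = np.random.default_rng(0)
p, d_k, sigma2 = 25, 4, 0.01
U_k, _ = np.linalg.qr(rng.standard_normal((p, d_k)))      # random p x d_k orthonormal basis
lambdas_k = np.sort(rng.random(d_k))[::-1]
Sigma_k = hdmi_class_covariance(U_k, lambdas_k, sigma2)
print(np.round(np.linalg.eigvalsh(Sigma_k)[::-1], 4))     # lambda_kj + sigma^2, then sigma^2 repeated
```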
Denoising with the HDMI model
The HDMI model being known, each patch is denoised with the MMSE estimator
$$\hat{x}_i = \mathbb{E}[X \mid Y = y_i] = \sum_{k=1}^{K} t_{ik}\, \psi_k(y_i),$$
where $t_{ik}$ is the posterior probability that the patch $y_i$ belongs to the $k$-th group and
$$\psi_k(y_i) = \mu_k + U_k\, \mathrm{diag}\!\left(\frac{a_{k1} - \sigma^2}{a_{k1}}, \ldots, \frac{a_{k d_k} - \sigma^2}{a_{k d_k}}\right) U_k^T (y_i - \mu_k).$$
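The group-wise filter $\psi_k$ and the final mixture then take a few lines of numpy (a sketch; variable names are mine):

```python
import numpy as np

def psi_k(y, mu_k, U_k, a_k, sigma2):
    """Group-wise HDMI filter: shrink the d_k subspace coefficients by (a_kj - sigma^2) / a_kj."""
    coeffs = U_k.T @ (y - mu_k)            # coordinates of y - mu_k in the group subspace
    shrink = (a_k - sigma2) / a_k          # one shrinkage factor per retained dimension
    return mu_k + U_k @ (shrink * coeffs)

def hdmi_mmse(y, t_i, mus, Us, a_list, sigma2):
    """MMSE estimate of a patch: posterior-weighted combination of the group-wise filters."""
    return sum(t_ik * psi_k(y, mu_k, U_k, a_k, sigma2)
               for t_ik, mu_k, U_k, a_k in zip(t_i, mus, Us, a_list))
```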
Model inference
With an EM algorithm, the parameters are updated during the M-step:
- $\hat{U}_k$ is formed by the $d_k$ first eigenvectors of the sample covariance matrix of the group;
- $\hat{a}_{kj}$ is the $j$-th eigenvalue of this sample covariance matrix.
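A sketch of the per-group M-step update (here the intrinsic dimension $d_k$ is taken as given; the responsibility weighting follows the GMM M-step shown earlier):

```python
import numpy as np

def hdmi_m_step_group(Y, t_k, d_k):
    """M-step for one group: responsibility-weighted moments, then eigendecomposition.
    Returns (pi_k, mu_k, U_k, a_k) where a_k holds the d_k leading eigenvalues."""
    n_k = t_k.sum()
    pi_k = n_k / len(Y)
    mu_k = (t_k[:, None] * Y).sum(axis=0) / n_k
    R = Y - mu_k
    S_k = (t_k[:, None] * R).T @ R / n_k          # weighted sample covariance of the noisy patches
    vals, vecs = np.linalg.eigh(S_k)              # eigenvalues in ascending order
    U_k = vecs[:, ::-1][:, :d_k]                  # d_k leading eigenvectors
    a_k = vals[::-1][:d_k]                        # d_k leading eigenvalues
    return pi_k, mu_k, U_k, a_k
```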