Low Complexity Regularization of Inverse Problems

Gabriel Peyré
Joint work with: Samuel Vaiter, Jalal Fadili
www.numerical-tours.com
Inverse Problems

Recovering x0 ∈ R^N from noisy observations y = Φ x0 + w ∈ R^P.

Examples: inpainting, super-resolution, ...
Inverse Problems in Medical Imaging

Tomography: Φ x = (p_{θk})_{1 ≤ k ≤ K}, projections of the image along K angles θk.
Magnetic resonance imaging (MRI): Φ x = (f̂(ω))_{ω ∈ Ω}, samples of the Fourier transform f̂ on a set Ω of frequencies.
Other examples: MEG, EEG, ...
Compressed Sensing   [Rice Univ.]

Single-pixel camera: N micro-mirrors, P measurements y[i] = ⟨x0, φi⟩.

Figure: recovered images x̃0 for P/N = 1, P/N = 0.16 and P/N = 0.02.
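A minimal numpy sketch of this measurement model; the binary mirror patterns and the sizes are illustrative assumptions, not the actual Rice hardware:

    import numpy as np

    rng = np.random.default_rng(0)
    N, P = 1024, 164                # N micro-mirrors, P measurements (P/N ≈ 0.16)
    x0 = rng.random(N)              # stand-in for the vectorized scene

    # Each measurement is one inner product with a random mirror pattern phi_i.
    Phi = rng.choice([0.0, 1.0], size=(P, N))
    y = Phi @ x0                    # y[i] = <x0, phi_i>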
Inverse Problem Regularization

Observations: y = Φ x0 + w ∈ R^P.
Estimator: x(y) depends only on y and on a parameter λ.

Example: variational methods

    x(y) ∈ argmin_{x ∈ R^N}  1/2 ||y − Φ x||² + λ J(x)

where ||y − Φ x||² is the data fidelity and J(x) measures the regularity of x.

Choice of λ: tradeoff between the noise level ||w|| and the regularity J(x0) of x0.

No noise: λ → 0⁺, minimize

    x(y) ∈ argmin_{Φ x = y} J(x).

Performance analysis: choice of (||w||, λ) to ensure
→ L2 stability: ||x(y) − x0|| = O(||w||);
→ model stability.
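When J = ||·||₁ this estimator is the Lasso, which can be computed by proximal gradient descent (ISTA); a minimal sketch, one solver among many:

    import numpy as np

    def soft_threshold(u, t):
        # Proximal operator of t * ||.||_1 (soft-thresholding).
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    def ista(Phi, y, lam, n_iter=500):
        # Minimize 1/2 ||y - Phi x||^2 + lam ||x||_1 by proximal gradient.
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L, L = Lipschitz constant
        x = np.zeros(Phi.shape[1])
        for _ in range(n_iter):
            x = soft_threshold(x - step * Phi.T @ (Phi @ x - y), step * lam)
        return x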
Overview
Union of Models for Data Processing

Union of models: M ⊂ R^N, a union of subspaces or manifolds.

Synthesis sparsity: sparse coefficients x, image Ψ x.
Analysis sparsity: image x with sparse gradient D* x (see the sketch after this slide).
Structured sparsity: sparsity along blocks of coefficients.
Low-rank: multi-spectral imaging, x_{i,·} = Σ_{j=1}^{r} A_{i,j} S_{j,·} (each pixel mixes r source spectra S_{1,·}, ..., S_{r,·}).
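To illustrate analysis sparsity: a piecewise-constant signal has a sparse finite-difference gradient. A minimal sketch, taking D* to be 1-D finite differences (as in total variation):

    import numpy as np

    x = np.array([0., 0., 2., 2., 2., 5., 5.])   # piecewise-constant signal
    Dx = np.diff(x)                              # D* x: 1-D finite differences
    print(np.count_nonzero(Dx))                  # 2 jumps -> D* x is 2-sparse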
Partly Smooth Functions

J : R^N → R is partly smooth at x for a manifold Mx if:   [Lewis 2003]
(i) J is C² along Mx around x;
(ii) ∀ h ∈ T_x(Mx)^⊥, t ↦ J(x + th) is non-smooth at t = 0;
(iii) ∂J is continuous on Mx around x.
Here T_x(Mx) denotes the tangent space of Mx at x.

Example: J(x) = max(0, ||x|| − 1).
Examples of Partly-smooth Regularizers

ℓ1 sparsity: J(x) = ||x||₁,
    Mx = {z ; supp(z) ⊂ supp(x)}.

Structured sparsity: J(x) = Σ_b ||x_b|| (sum over blocks b), same Mx.
    Example in R³: J(x) = |x₁| + ||x_{2,3}||.

Nuclear norm: J(x) = ||x||_*,
    Mx = {z ; rank(z) = rank(x)}.

Anti-sparsity: J(x) = ||x||∞,
    Mx = {z ; z_I ∝ x_I}  where  I = {i ; |x_i| = ||x||∞}.
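A tiny numpy illustration of how these regularizers read different models off the same object; the example matrix is ours:

    import numpy as np

    x = np.diag([3.0, 1.0, 0.0])     # a rank-2 matrix with a 2-sparse diagonal

    J_l1  = np.abs(x).sum()                            # l1 sparsity ||x||_1
    J_nuc = np.linalg.svd(x, compute_uv=False).sum()   # nuclear norm ||x||_*
    J_inf = np.abs(x).max()                            # anti-sparsity ||x||_inf

    # The associated models Mx are indexed by different features of x:
    support = np.argwhere(x)                 # Mx = {z ; supp(z) ⊂ supp(x)}
    rank = np.linalg.matrix_rank(x)          # Mx = {z ; rank(z) = rank(x)}
    print(J_l1, J_nuc, J_inf, len(support), rank)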
Overview
Dual Certificates

Noiseless recovery:

    min_{Φ x = Φ x0} J(x)        (P0)

Subdifferential: ∂J(x) = {η ; ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩}.

Dual certificates: D(x0) = Im(Φ*) ∩ ∂J(x0).

Proposition: x0 is a solution of (P0) ⟺ ∃ η ∈ D(x0).

Example: J(x) = ||x||₁ and Φ x = x ⋆ φ (convolution):
    D(x0) = {η = Φ* q ; η_i = sign(x0,i) for i ∈ supp(x0), ||η||∞ ≤ 1}.
Dual Certificates and L2 Stability

Non-degenerate dual certificates:

    D̄(x0) = Im(Φ*) ∩ ri(∂J(x0))

where ri(E) is the relative interior of E, i.e. the interior for the topology of aff(E).

Theorem [Fadili et al. 2013]: If ∃ η ∈ D̄(x0), then for λ ∼ ||w|| one has

    ||x(y) − x0|| = O(||w||).

Previous works:
→ [Grasmair, Haltmeier, Scherzer 2010]: J = ||·||₁.
→ [Grasmair 2012]: J(x − x0) = O(||w||).
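For J = ||·||₁, membership in D̄(x0) is easy to test numerically. A minimal sketch (the function name and tolerance are ours):

    import numpy as np

    def check_l1_certificate(eta, Phi, x0, tol=1e-10):
        # Is eta in D(x0) = Im(Phi*) ∩ ∂||.||_1(x0)?  Non-degenerate if strict.
        I = np.flatnonzero(x0)
        Ic = np.setdiff1d(np.arange(x0.size), I)
        # eta ∈ Im(Phi*): the least-squares residual of Phi* q = eta must vanish.
        q = np.linalg.lstsq(Phi.T, eta, rcond=None)[0]
        in_row_space = np.linalg.norm(Phi.T @ q - eta) <= tol
        # eta ∈ ∂||.||_1(x0): eta_I = sign(x0_I), |eta_i| <= 1 off the support.
        in_subdiff = np.allclose(eta[I], np.sign(x0[I])) and np.abs(eta[Ic]).max() <= 1
        nondegenerate = in_row_space and in_subdiff and np.abs(eta[Ic]).max() < 1
        return in_row_space and in_subdiff, nondegenerate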
Compressed Sensing Setting

Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.

Sparse vectors, J = ||·||₁:

Theorem [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]:
Let s = ||x0||₀. If P ≥ 2 s log(N/s), then ∃ η ∈ D̄(x0) with high probability on Φ.

Low-rank matrices, J = ||·||_*:

Theorem [Chandrasekaran et al. 2011]:
Let r = rank(x0), x0 ∈ R^{N1×N2}. If P ≥ 3 r (N1 + N2 − r), then ∃ η ∈ D̄(x0) with high probability on Φ.

Similar guarantees hold for ||·||_{1,2} and ||·||∞.
Phase Transitions   [Amelunxen et al. 2013]

Figure: empirical probability of exact recovery as a function of P/N; for J = ||·||₁ the transition is plotted against the sparsity s/N, for J = ||·||_* against the rank r/√N.
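Such phase transitions can be reproduced empirically: ℓ1 minimization with equality constraints is a linear program. A minimal Monte Carlo sketch (a single trial per P; sizes are illustrative):

    import numpy as np
    from scipy.optimize import linprog

    def l1_minimize(Phi, y):
        # min ||x||_1 s.t. Phi x = y, as an LP in the splitting x = x+ - x-.
        P, N = Phi.shape
        res = linprog(np.ones(2 * N), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                      bounds=(0, None))
        return res.x[:N] - res.x[N:]

    rng = np.random.default_rng(0)
    N, s = 100, 5
    x0 = np.zeros(N)
    x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)

    for P in (15, 30, 60):               # sweep the number of measurements
        Phi = rng.standard_normal((P, N))
        x_rec = l1_minimize(Phi, Phi @ x0)
        print(P, np.linalg.norm(x_rec - x0) < 1e-6)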
Overview
Minimal Norm Certificate

Minimal-norm certificate:

    η0 = Φ* q0,   q0 = argmin {||q|| ; Φ* q ∈ ∂J(x0)}.

Affine hull of the subdifferential: ∂J(x0) ⊂ A(x0) = AffHull(∂J(x0)).

Linearized pre-certificate:

    ηF = Φ* qF,   qF = argmin {||q|| ; Φ* q ∈ A(x0)}.

Case J = ||·||₁: the figure shows ∂J(x0), its affine hull A(x0) and the tangent space T = T_{x0}(M_{x0}) at x0.

Theorem [Vaiter et al. 2014]: If ker(Φ) ∩ T = {0}, then
    ηF ∈ D(x0)  ⟹  ηF = η0,
    η0 ∈ D̄(x0)  ⟹  ηF = η0.
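For J = ||·||₁, A(x0) = {η ; η_I = sign(x0,I)} with I = supp(x0), so ηF has a closed form via a pseudo-inverse. A minimal sketch checking the non-degeneracy of ηF (sizes are illustrative):

    import numpy as np

    def linearized_precertificate(Phi, x0):
        # eta_F = Phi* q, q of minimal norm such that (Phi* q)_I = sign(x0_I).
        I = np.flatnonzero(x0)
        q = np.linalg.pinv(Phi[:, I].T) @ np.sign(x0[I])   # minimal-norm solution
        return Phi.T @ q

    rng = np.random.default_rng(0)
    P, N, s = 60, 100, 5
    Phi = rng.standard_normal((P, N))
    x0 = np.zeros(N)
    x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)

    eta_F = linearized_precertificate(Phi, x0)
    off_support = np.abs(eta_F[x0 == 0]).max()
    print(off_support < 1)     # true => eta_F ∈ D̄(x0), model stability holds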
Model Stability

Theorem [Vaiter et al. 2014]: If ηF ∈ D̄(x0), there exists C such that if max(λ, ||w||/λ) ≤ C, the unique solution x(y) of the variational problem for y = Φ x0 + w satisfies

    x(y) ∈ M_{x0}   and   ||x(y) − x0|| = O(||w||, λ).

Previous works:
→ [Fuchs 2004]: J = ||·||₁.
→ [Bach 2008]: J = ||·||_{1,2} and J = ||·||_*.
→ [Vaiter et al. 2011]: J = ||D* ·||₁.
Compressed Sensing Setting

Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d. Sparse vectors: J = ||·||₁.

Theorem [Wainwright 2009] [Dossal et al. 2011]:
Let s = ||x0||₀. If P > 2 s log(N), then η0 ∈ D̄(x0) with high probability on Φ.

Phase transitions: L2 stability at P ∼ 2 s log(N/s) vs. model stability at P ∼ 2 s log(N).

Extensions to ||·||_{1,2}, ||·||_* and ||·||∞.
1-D Sparse Spikes Deconvolution

Φ x = Σ_i x_i φ(· − Δ i),   J(x) = ||x||₁.

Increasing the spacing Δ:
→ reduces correlation;
→ reduces resolution.

Let I = {j ; x0(j) ≠ 0}. Then

    ||η_{F,Ic}||∞ < 1  ⟺  η0 = ηF ∈ D̄(x0)  ⟹  support recovery.

Figure: x0 and Φ x0; plot of ||η_{F,Ic}||∞ as a function of the spacing, with support recovery where the curve stays below 1.
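A minimal numpy sketch of this criterion: the columns of Φ are shifted Gaussians, and we track ||η_{F,Ic}||∞ for two opposite-sign spikes as their spacing grows (filter width and sizes are illustrative assumptions):

    import numpy as np

    def precert(Phi, x0):
        # Linearized pre-certificate eta_F for J = ||.||_1 (see the earlier sketch).
        I = np.flatnonzero(x0)
        return Phi.T @ (np.linalg.pinv(Phi[:, I].T) @ np.sign(x0[I]))

    N, sigma = 200, 3.0
    t = np.arange(N)
    Phi = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * sigma ** 2))  # shifted Gaussians

    for delta in (4, 8, 16):                  # spacing between the two spikes
        x0 = np.zeros(N)
        x0[N // 2], x0[N // 2 + delta] = 1.0, -1.0
        eta_F = precert(Phi, x0)
        off = np.abs(eta_F[x0 == 0]).max()
        print(delta, round(off, 3), off < 1)  # < 1 <=> support recovery at small noise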
Conclusion

Partial smoothness: encodes models using singularities.

Performance measures: L2 error vs. model selection → different CS guarantees.

Specific certificates: η0, ηF, ...

Open problems:
– CS performance with arbitrary gauges.
– Infinite-dimensional regularizations (BV, ...).
– Convergence discrete → continuous.