Robust PCA
Data are increasingly high dimensional, and gross errors frequently occur in many applications.
Applications: image processing, web data analysis, bioinformatics, ...
Typical gross errors: occlusions, malicious tampering, sensor failures, ...
(figure: $d \times n$ data matrix $(x_{ij})$ with a few grossly corrupted entries marked ❆)
It is important to make PCA robust.
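Gross errors
(figure: users × movies ratings matrix with observed and corrupted entries, marked ❆ and ×)
We observe corrupted entries
$Y_{ij} = L_{ij} + S_{ij}, \quad (i, j) \in \Omega_{\text{obs}}$
where $L$ is a low-rank matrix and $S$ holds the entries that have been tampered with (impulsive noise).
Problem: recover $L$ from missing and corrupted samples.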
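When does separation make sense?
$M = L + S$
The sparse component cannot be low rank: a matrix such as
$S = \begin{pmatrix} * & 0 & \cdots & 0 \\ * & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ * & 0 & \cdots & 0 \end{pmatrix}$
is sparse but also has rank one. The sparsity pattern will therefore be assumed to be (uniformly) random.
The low-rank component cannot be sparse: a matrix such as
$L = \begin{pmatrix} * & * & \cdots & * \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$
has rank one but is also sparse.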
Sparse component cannot be low rank
Let $L = \mathbf{1} x^*$, the rank-one matrix whose rows all equal $x^* = (x_1, x_2, \ldots, x_n)$, and let $S$ be supported on the first column only (entries marked ❆). Then $L + S$ is $L$ with its first column replaced, which is still low rank (rank at most 2): nothing in the data distinguishes this sparse $S$ from part of the low-rank component.
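A quick numerical check of this ambiguity, as a NumPy sketch: the size n = 6 and the random values of $x$ and of the corrupted column are arbitrary illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x = rng.standard_normal(n)

# L = 1 x^*: every row equals x^*, so rank(L) = 1
L = np.outer(np.ones(n), x)

# S is supported on the first column only: sparse, yet also rank one
S = np.zeros((n, n))
S[:, 0] = rng.standard_normal(n)

# M = L + S is L with its first column changed; it is still low rank,
# so (M, 0) is an equally valid "low-rank + sparse" split of the same data
M = L + S
rank = np.linalg.matrix_rank
print(rank(L), rank(S), rank(M))  # prints: 1 1 2
```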
Low-rank component cannot be sparse
$L = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & \cdots & x_{n-1} & x_n \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$
Incoherence condition [C. and Recht ('08)]: the column and row spaces are not aligned with the coordinate axes (there cannot be small subsets of rows and/or columns that are singular).
Low-rank component cannot be sparse
If a few entries of such an $L$ are corrupted (❆), the observed $M = L + S$ gives no way to tell which of its few nonzero entries belong to $L$ and which to $S$.
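One common way to quantify incoherence is the coherence of the column space used in the matrix-completion literature, $\mu(U) = (n/r) \max_i \|U^* e_i\|^2$, where $U$ is an orthonormal basis of the column space; it ranges from 1 to $n/r$. The NumPy sketch below (the function name `coherence` and the test matrices are illustrative assumptions, and the theorem's actual conditions also involve the row space) shows that the sparse rank-one $L$ above is maximally coherent, while a generic rank-one matrix is not.

```python
import numpy as np

def coherence(A, r):
    """mu(U) = (n / r) * max_i ||U^* e_i||^2 for the top-r left singular vectors U of A."""
    n = A.shape[0]
    U = np.linalg.svd(A)[0][:, :r]
    leverage = np.sum(np.abs(U) ** 2, axis=1)   # ||U^* e_i||^2 for every row i
    return n / r * leverage.max()

rng = np.random.default_rng(1)
n, r = 100, 1
x = rng.standard_normal(n)

L_sparse = np.zeros((n, n))
L_sparse[0] = x                                   # only the first row is nonzero
L_generic = np.outer(rng.standard_normal(n), x)   # generic rank-one matrix

print(coherence(L_sparse, r))    # n / r = 100 up to rounding: column space is e_1
print(coherence(L_generic, r))   # far smaller than n / r
```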
Demixing by convex programming
$M = L + S$
$L$ unknown (rank unknown)
$S$ unknown (number of nonzero entries, locations and magnitudes all unknown)
Recovery via SDP:
$\min_{\hat L, \hat S} \ \|\hat L\|_* + \lambda \|\hat S\|_1 \quad \text{subject to} \quad \hat L + \hat S = M$
See also Chandrasekaran, Sanghavi, Parrilo, Willsky ('09).
Nuclear norm: $\|L\|_* = \sum_i \sigma_i(L)$ (sum of singular values)
$\ell_1$ norm: $\|S\|_1 = \sum_{ij} |S_{ij}|$ (sum of absolute values)
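The convex program above can be solved at small scale without a general SDP solver. The sketch below uses the alternating-direction (augmented Lagrangian) iteration that is common in the RPCA literature, swapped in here for an SDP solver: it alternates singular value thresholding for $\hat L$ and entrywise soft thresholding for $\hat S$. The helper names, the heuristic for the penalty parameter `mu`, the stopping rule and the synthetic test data are all assumptions.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Entrywise soft thresholding: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=5000):
    """ADMM for  min ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))            # the theorem's choice of lambda
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())      # common heuristic (assumption)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                          # multiplier for L + S = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S                             # primal residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S

# Usage: rank-2 matrix plus 5% gross corruptions of magnitude 10
rng = np.random.default_rng(0)
n, r = 100, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
S0 = np.zeros((n, n))
corrupt = rng.random((n, n)) < 0.05
S0[corrupt] = rng.choice([-10.0, 10.0], size=corrupt.sum())
L_hat, S_hat = pcp(L0 + S0)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
```

No parameter is tuned to the data here beyond the default $\lambda = 1/\sqrt{\max \text{dim}}$; `mu` only affects the speed of the iteration, not the minimizer.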
Exact recovery via SDP
$\min \ \|\hat L\|_* + \lambda \|\hat S\|_1 \quad \text{s.t.} \quad \hat L + \hat S = M$
Theorem: Suppose $L$ is $n \times n$, incoherent, with $\operatorname{rank}(L) \le \rho_r n (\log n)^{-2}$, and $S$ is $n \times n$ with a random sparsity pattern of cardinality at most $\rho_s n^2$ ($\rho_r, \rho_s$ numerical constants). Then with probability $1 - O(n^{-10})$, the SDP with $\lambda = 1/\sqrt{n}$ is exact: $\hat L = L$, $\hat S = S$.
Same conclusion for rectangular matrices with $\lambda = 1/\sqrt{\max \text{dim}}$.
No tuning parameter! This holds whatever the magnitudes of $L$ and $S$.
(figure: data matrix with a few corrupted entries marked ×)
Phase transitions in probability of success
(figure panels: (a) RPCA, random signs; (b) RPCA, coherent signs; (c) matrix completion)
$L = XY^T$ is a product of independent $n \times r$ matrices with i.i.d. $N(0, 1/n)$ entries.
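A NumPy sketch of how test instances with this $L$ might be generated. Everything beyond the distribution of $L$ is an assumption: the Bernoulli sparsity pattern, corruptions of magnitude 1, and the reading of "coherent signs" as corruptions whose signs match those of $L$.

```python
import numpy as np

def rpca_instance(n, r, rho_s, coherent_signs=False, seed=None):
    """L = X Y^T with X, Y n x r i.i.d. N(0, 1/n); S with Bernoulli(rho_s) support."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, r))
    Y = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, r))
    L = X @ Y.T
    support = rng.random((n, n)) < rho_s              # assumed Bernoulli sparsity pattern
    if coherent_signs:
        signs = np.sign(L)                            # assumption: signs follow those of L
    else:
        signs = rng.choice([-1.0, 1.0], size=(n, n))  # random signs
    S = np.where(support, signs, 0.0)                 # assumed corruption magnitude 1
    return L, S

L, S = rpca_instance(50, 3, 0.1, seed=0)
L_c, S_c = rpca_instance(50, 3, 0.1, coherent_signs=True, seed=1)
```

Sweeping the rank fraction $r/n$ and the density `rho_s` over a grid, and recording how often a solver recovers $L$, is how a phase-transition plot of this kind is typically produced.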
Missing and corrupted
(figure: matrix with missing (?), observed (❆) and corrupted (×) entries)
RPCA: $\min \ \|\hat L\|_* + \lambda \|\hat S\|_1 \quad \text{s.t.} \quad \hat L_{ij} + \hat S_{ij} = L_{ij} + S_{ij}, \ (i, j) \in \Omega_{\text{obs}}$
Same theorem: with high probability, $\lambda = 1/\sqrt{\text{frac. observed} \times \max \text{dim}} \Rightarrow \hat L = L$!
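The same alternating-direction sketch extends to missing entries by leaving $S$ unpenalized off $\Omega_{\text{obs}}$, so it simply absorbs the unobserved entries; this is equivalent to enforcing $\hat L + \hat S = Y$ only on $\Omega_{\text{obs}}$. As before, the iteration, the `mu` heuristic and the test data are assumptions; $\lambda$ follows the formula on the slide.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp_missing(Y_obs, observed, tol=1e-7, max_iter=5000):
    """ADMM for  min ||L||_* + lam * ||P_Omega(S)||_1  s.t.  L + S = P_Omega(Y_obs)."""
    m, n = Y_obs.shape
    lam = 1.0 / np.sqrt(observed.mean() * max(m, n))  # lambda from the slide
    M = np.where(observed, Y_obs, 0.0)                # zero-fill missing entries
    mu = m * n / (4.0 * np.abs(M).sum())              # heuristic (assumption)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Z = np.zeros_like(M)                              # multiplier for L + S = M
    for _ in range(max_iter):
        L = svt(M - S + Z / mu, 1.0 / mu)
        T = M - L + Z / mu
        shrunk = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        S = np.where(observed, shrunk, T)             # l1 prox on Omega, free elsewhere
        R = M - L - S
        Z = Z + mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L

# Usage: 80% of entries observed, 5% of the observed ones grossly corrupted
rng = np.random.default_rng(0)
n, r = 100, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
observed = rng.random((n, n)) < 0.8
corrupt = observed & (rng.random((n, n)) < 0.05)
Y = L0 + np.where(corrupt, 10.0 * rng.choice([-1.0, 1.0], size=(n, n)), 0.0)
L_hat = pcp_missing(Y, observed)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
```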
Video surveillance
Sequence of 200 video frames (144 × 172 pixels) with a static background.
Problem: detect any activity in the foreground … RPCA …
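Casting the video as a matrix: each frame is vectorized into one column, so 200 frames of 144 × 172 = 24,768 pixels give a 24,768 × 200 matrix $M$. A static background contributes identical columns (rank one) and a small moving object contributes a sparse term, which is why RPCA is a natural fit: ideally it puts the background into $L$ and the moving object into $S$. The synthetic video below (random background, moving square) and the helper names are purely illustrative.

```python
import numpy as np

def frames_to_matrix(frames):
    """(T, H, W) video -> (H*W, T) matrix with one vectorized frame per column."""
    T, H, W = frames.shape
    return frames.reshape(T, H * W).T

def matrix_to_frames(X, H, W):
    """Inverse of frames_to_matrix: (H*W, T) matrix -> (T, H, W) video."""
    return X.T.reshape(-1, H, W)

# Sizes from the slide: 200 frames of 144 x 172 pixels
T, H, W = 200, 144, 172
rng = np.random.default_rng(0)
background = rng.random((H, W))

# Hypothetical synthetic video: static background plus a 10 x 10 moving square
frames = np.repeat(background[None, :, :], T, axis=0)
for t in range(T):
    col = (t * 3) % (W - 10)
    frames[t, 60:70, col:col + 10] += 1.0

M = frames_to_matrix(frames)                                  # 24768 x 200
B = frames_to_matrix(np.repeat(background[None], T, axis=0))  # background only
print(M.shape, np.linalg.matrix_rank(B))                      # (24768, 200) 1
```

After running an RPCA solver on $M$, `matrix_to_frames` turns the columns of $L$ and $S$ back into background and foreground videos.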
L + S background subtraction
From GoDec
L + S reconstruction of MR angiography
(figure panels: L + S, L, S)
Automatic and improved background suppression.
Joint with R. Otazo and D. Sodickson
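Free-breathing MRI of the liver
(figure panels: NUFFT, standard L + S, motion-guided L + S)
12.8-fold acceleration
$\min \ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad \mathcal{A}(L + S) = y$
where $\mathcal{A}$ is the undersampled acquisition operator and $y$ the measured k-space data.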
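In practice the constraint $\mathcal{A}(L + S) = y$ is often relaxed to a penalized least-squares form. The sketch below is not the reconstruction used in this work; it is a generic proximal-gradient iteration for $\tfrac12\|\mathcal{A}(L + S) - y\|^2 + \lambda_L\|L\|_* + \lambda_S\|S\|_1$, with a hypothetical operator (orthonormal FFT along each column, then random undersampling) standing in for the MRI acquisition. The regularization weights, sampling mask and synthetic data are arbitrary.

```python
import numpy as np

def svt(X, tau):
    """Prox of tau * nuclear norm (works for complex X)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def soft(X, tau):
    """Complex soft thresholding: shrink magnitudes by tau, keep phases."""
    mag = np.abs(X)
    return X * (np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-300))

def make_operator(mask):
    """Hypothetical A: orthonormal FFT along each column, then keep sampled entries."""
    A = lambda X: mask * np.fft.fft(X, axis=0, norm="ortho")
    At = lambda Z: np.fft.ifft(mask * Z, axis=0, norm="ortho")  # adjoint of A
    return A, At

def objective(L, S, y, A, lam_L, lam_S):
    residual = A(L + S) - y
    return (0.5 * np.linalg.norm(residual) ** 2
            + lam_L * np.linalg.svd(L, compute_uv=False).sum()
            + lam_S * np.abs(S).sum())

def lps_prox_grad(y, A, At, shape, lam_L, lam_S, n_iter=200):
    """Proximal gradient on 0.5||A(L+S) - y||^2 + lam_L ||L||_* + lam_S ||S||_1."""
    L = np.zeros(shape, dtype=complex)
    S = np.zeros(shape, dtype=complex)
    step = 0.5  # 1/Lipschitz: (L, S) -> A(L + S) has norm <= sqrt(2) since ||A|| <= 1
    history = []
    for _ in range(n_iter):
        g = At(A(L + S) - y)  # the gradient is the same for L and for S
        L, S = svt(L - step * g, step * lam_L), soft(S - step * g, step * lam_S)
        history.append(objective(L, S, y, A, lam_L, lam_S))
    return L, S, history

# Usage on synthetic data: columns are time frames, rows are pixels
rng = np.random.default_rng(0)
nx, nt = 64, 20
L0 = np.outer(rng.standard_normal(nx), np.ones(nt))       # static "background"
S0 = np.zeros((nx, nt))
S0[rng.integers(0, nx, size=nt), np.arange(nt)] = 3.0      # one bright moving pixel per frame
mask = rng.random((nx, nt)) < 0.4                           # hypothetical sampling pattern
A, At = make_operator(mask)
y = A(L0 + S0)
L, S, history = lps_prox_grad(y, A, At, (nx, nt), lam_L=0.5, lam_S=0.05)
```

With the step set to the inverse Lipschitz constant, the objective values in `history` are non-increasing, which is a convenient sanity check when swapping in a different operator.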
Free-breathing MRI of the liver
(figure panels: NUFFT, standard L + S, motion-guided L + S; annotation: temporal blurring)
Free-breathing MRI of the kidneys
(figure panels: NUFFT, standard L + S, motion-guided L + S)
12.8-fold acceleration
$\min \ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad \mathcal{A}(L + S) = y$
Story #3: Super-resolution
Collaborator: C. Fernandez-Granda
Limits of resolution
In any optical imaging system, diffraction imposes a fundamental limit on resolution.
"The physical phenomenon called diffraction is of the utmost importance in the theory of optical imaging systems" (Joseph Goodman)
We are interested in the usual bandlimited imaging systems.
(figure panels: pupil, Airy disk, cross section)
Rayleigh resolution limit
(figure: Lord Rayleigh)
The super-resolution problem
(figure: objective and data)
Fundamental problem: retrieve fine-scale information from low-pass data.
Applications: radar, microscopy, spectroscopy, medical imaging, astronomy, geophysics, ...
Equivalent description: extrapolate the spectrum.
Single molecule imaging
The microscope receives light from fluorescent molecules.
Problem: resolution is much coarser than the size of individual molecules (low-pass data).
Can we "beat" the diffraction limit and super-resolve those molecules?
Higher molecule density → faster imaging
Mathematical model
Signal: $x = \sum_j a_j \delta_{\tau_j}, \quad a_j \in \mathbb{C}, \ \tau_j \in [0, 1]$
Data: $y = F_n x$, the $n = 2 f_{lo} + 1$ low-frequency coefficients (Nyquist sampling):
$y(k) = \int_0^1 e^{-i 2\pi k t} \, x(dt) = \sum_j a_j e^{-i 2\pi k \tau_j}, \quad k \in \mathbb{Z}, \ |k| \le f_{lo}$
Resolution limit: $\lambda_{lo} = 1/f_{lo}$ ($\lambda_{lo}/2$ is the Rayleigh distance)
Question: can we resolve the signal beyond this limit?
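The forward model is easy to evaluate directly. A NumPy sketch; the function name and the example amplitudes and locations are arbitrary choices.

```python
import numpy as np

def low_freq_data(amplitudes, locations, f_lo):
    """y(k) = sum_j a_j exp(-i 2 pi k tau_j) for k = -f_lo, ..., f_lo."""
    k = np.arange(-f_lo, f_lo + 1)
    return np.exp(-2j * np.pi * np.outer(k, locations)) @ amplitudes

f_lo = 10                       # n = 2 * f_lo + 1 = 21 measurements
lambda_lo = 1.0 / f_lo          # resolution limit
a = np.array([1.0, -0.5 + 0.5j, 2.0])
tau = np.array([0.1, 0.45, 0.8])
y = low_freq_data(a, tau, f_lo)
```

Entry `y[f_lo]` corresponds to $k = 0$ and equals $\sum_j a_j$.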
Can you find the spikes?
(figure: low-frequency data about a spike train)
Recovery by minimum total variation
Recovery by convex programming:
$\min \ \|\tilde x\|_{TV} \quad \text{subject to} \quad F_n \tilde x = y$
$\|x\|_{TV} = \int |x(dt)|$ is the continuous analog of the discrete $\ell_1$ norm $\|x\|_{\ell_1} = \sum_t |x_t|$:
$x = \sum_j a_j \delta_{\tau_j} \ \Rightarrow \ \|x\|_{TV} = \sum_j |a_j|$
Work on $\ell_1$ minimization: Logan, Donoho, Stark, Tropp, Elad, C., Tao...
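A common way to experiment with this program is to restrict $\tilde x$ to a fine grid, where $\|\tilde x\|_{TV}$ becomes the discrete $\ell_1$ norm and the problem becomes a linear program. The sketch below does this with `scipy.optimize.linprog` (HiGHS backend); the grid, the restriction to real amplitudes and the test signal are simplifying assumptions, and this is not the grid-free approach through the dual problem.

```python
import numpy as np
from scipy.optimize import linprog

def l1_on_grid(y, f_lo, grid):
    """min ||x||_1 s.t. F_n x = y, for real x supported on `grid`, as an LP.

    Write x = x_plus - x_minus with x_plus, x_minus >= 0 and split the complex
    equality constraints into real and imaginary parts."""
    k = np.arange(-f_lo, f_lo + 1)
    F = np.exp(-2j * np.pi * np.outer(k, grid))      # n x N partial Fourier matrix
    A = np.vstack([F.real, F.imag])
    b = np.concatenate([y.real, y.imag])
    N = grid.size
    res = linprog(c=np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:N] - res.x[N:]

# Three spikes on a 200-point grid, separated by 0.3 = 3 / f_lo
f_lo = 10
grid = np.arange(200) / 200
x_true = np.zeros(200)
x_true[[20, 80, 140]] = [1.0, -0.7, 0.5]
k = np.arange(-f_lo, f_lo + 1)
y = np.exp(-2j * np.pi * np.outer(k, grid)) @ x_true
x_hat = l1_on_grid(y, f_lo, grid)
```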
Recovery by convex programming
$y(k) = \int_0^1 e^{-i 2\pi k t} \, x(dt), \quad |k| \le f_{lo}$
Theorem (C. and Fernandez-Granda (2012)): if the spikes are separated by at least $1.86/f_{lo} = 1.86\,\lambda_{lo}$, then the min-TV solution is exact! (Current state of the art: $1.4\,\lambda_{lo}$.)
Infinite precision! (Whatever the amplitudes.)
Cannot go below $\lambda_{lo}$.
Can recover $(2\lambda_{lo})^{-1} = f_{lo}/2 \approx n/4$ spikes from $n$ low-frequency samples.
Essentially the same result holds in higher dimensions.
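The spike count above follows from a packing argument: on $[0, 1)$ with wrap-around, points at mutual distance at least $\Delta$ number at most $1/\Delta$. With $\Delta = 2\lambda_{lo}$ and $n = 2 f_{lo} + 1$:

```latex
\Delta(T) \ge 2\lambda_{lo}
\;\Longrightarrow\;
|T| \le \frac{1}{2\lambda_{lo}} = \frac{f_{lo}}{2},
\qquad
n = 2 f_{lo} + 1
\;\Longrightarrow\;
\frac{f_{lo}}{2} = \frac{n - 1}{4} \approx \frac{n}{4}.
```

For example, $f_{lo} = 100$ gives $n = 201$ samples and at most 50 spikes.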
Formulation as a finite-dimensional problem
Primal problem: $\min \ \|x\|_{TV}$ s.t. $F_n x = y$; infinite-dimensional variable $x$, finitely many constraints.
Dual problem: $\max \ \operatorname{Re}\langle y, c\rangle$ s.t. $\|F_n^* c\|_\infty \le 1$; finite-dimensional variable $c$, infinitely many constraints, where
$(F_n^* c)(t) = \sum_{|k| \le f_{lo}} c_k e^{i 2\pi k t}$
Semidefinite representability: $|(F_n^* c)(t)| \le 1$ for all $t \in [0, 1]$ is equivalent to the existence of a Hermitian $Q$ such that
(1) $\begin{pmatrix} Q & c \\ c^* & 1 \end{pmatrix} \succeq 0$
(2) $\operatorname{Tr}(Q) = 1$
(3) the sums along the superdiagonals vanish: $\sum_{i=1}^{n-j} Q_{i,i+j} = 0$ for $1 \le j \le n - 1$
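The constraint $\|F_n^* c\|_\infty \le 1$ can only be checked approximately by sampling $t$, whereas conditions (1)-(3) give a finite certificate. The NumPy sketch below evaluates the trigonometric polynomial on a grid and checks (1)-(3) for a given Hermitian $Q$; finding $Q$ (that is, solving the SDP) would require an SDP solver and is not shown. The function names and the small example ($f_{lo} = 3$, $n = 7$) are illustrative.

```python
import numpy as np

def dual_poly(c, t):
    """(F_n^* c)(t) = sum_{|k| <= f_lo} c_k exp(i 2 pi k t), c indexed k = -f_lo..f_lo."""
    f_lo = (len(c) - 1) // 2
    k = np.arange(-f_lo, f_lo + 1)
    return np.exp(2j * np.pi * np.outer(t, k)) @ c

def sup_norm_on_grid(c, grid_size=10_000):
    """Grid approximation of ||F_n^* c||_inf: samples the infinitely many constraints."""
    t = np.arange(grid_size) / grid_size
    return np.abs(dual_poly(c, t)).max()

def satisfies_sdp_conditions(c, Q, tol=1e-9):
    """Check conditions (1)-(3) for a candidate Hermitian Q."""
    n = len(c)
    block = np.block([[Q, c[:, None]], [c.conj()[None, :], np.ones((1, 1))]])
    psd = np.linalg.eigvalsh(block).min() >= -tol                                 # (1)
    unit_trace = abs(np.trace(Q) - 1.0) <= tol                                    # (2)
    superdiags_vanish = all(abs(np.trace(Q, offset=j)) <= tol for j in range(1, n))  # (3)
    return psd and unit_trace and superdiags_vanish

# Example (f_lo = 3, n = 7): two coefficients equal to 1/2, so |p(t)| <= 1
n = 7
c = np.zeros(n, dtype=complex)
c[[1, 4]] = 0.5
Q = np.diag([0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0]).astype(complex)
print(satisfies_sdp_conditions(c, Q))      # True

# Scaling c by 1.2 breaks the bound (p(0) = 1.2), and Q no longer certifies it
c_bad = 1.2 * c
print(satisfies_sdp_conditions(c_bad, Q))  # False
```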
Dual solution $c$: coefficients of the low-pass trigonometric polynomial $\sum_k c_k e^{i 2\pi k t}$ interpolating the sign of the primal solution
Super-resolution via semidefinite programming
1. Solve the semidefinite program to obtain the dual solution.
2. Locate the points at which the corresponding polynomial has unit magnitude.
3. Estimate the amplitudes via least squares.
(figure: signal and estimate)
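Steps 2 and 3 can be sketched in NumPy. Here a fine grid search stands in for more accurate root finding, and the usage example feeds in a hand-made dual polynomial for a single spike (a normalized Dirichlet kernel, whose magnitude reaches 1 only at the spike) rather than an actual SDP solution. Function names, grid size, tolerance and test values are assumptions.

```python
import numpy as np

def trig_poly(c, t):
    """p(t) = sum_{|k| <= f_lo} c_k exp(i 2 pi k t); c is indexed k = -f_lo..f_lo."""
    f_lo = (len(c) - 1) // 2
    k = np.arange(-f_lo, f_lo + 1)
    return np.exp(2j * np.pi * np.outer(t, k)) @ c

def locate_support(c, grid_size=20_000, tol=1e-3):
    """Step 2 (grid search): local maxima of |p| on [0, 1) with |p| >= 1 - tol."""
    t = np.arange(grid_size) / grid_size
    mag = np.abs(trig_poly(c, t))
    is_peak = (mag >= np.roll(mag, 1)) & (mag >= np.roll(mag, -1)) & (mag >= 1 - tol)
    return t[is_peak]

def estimate_amplitudes(y, support, f_lo):
    """Step 3: least squares fit of y(k) = sum_j a_j exp(-i 2 pi k tau_j)."""
    k = np.arange(-f_lo, f_lo + 1)
    F_T = np.exp(-2j * np.pi * np.outer(k, support))
    a, *_ = np.linalg.lstsq(F_T, y, rcond=None)
    return a

# Usage with a hand-made dual polynomial for ONE spike (not an SDP output)
f_lo, tau, amp = 10, 0.3, 2.0 - 1.0j
n = 2 * f_lo + 1
k = np.arange(-f_lo, f_lo + 1)
y = amp * np.exp(-2j * np.pi * k * tau)
c = (amp / abs(amp)) * np.exp(-2j * np.pi * k * tau) / n   # p(tau) = sign(amp)
support = locate_support(c)
amplitudes = estimate_amplitudes(y, support, f_lo)
```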
Support-location accuracy

$f_c$            25          50          75          100
Average error    6.66e-9     1.70e-9     5.58e-10    2.96e-10
Maximum error    1.83e-7     8.14e-8     2.55e-8     2.31e-8

For each $f_c$, 100 random signals with $|T| = f_c/4$ and $\Delta(T) \ge 2/f_c$.
Example
Minimum separation: $1.5\,\lambda_c$
Example
SNR 20 dB
(figure panels: noisy, noiseless)
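The slide does not spell out how the noise is scaled. Assuming the usual definition $\text{SNR} = 10 \log_{10}(\|y\|^2 / \|z\|^2)$, noise at a prescribed SNR can be added as follows (complex white Gaussian noise, rescaled so the SNR is met exactly; the function name and the test signal are hypothetical).

```python
import numpy as np

def add_noise_at_snr(y, snr_db, rng):
    """Return y + z, with complex white Gaussian z rescaled so that
    10 * log10(||y||^2 / ||z||^2) equals snr_db exactly."""
    z = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
    z *= np.linalg.norm(y) / (np.linalg.norm(z) * 10 ** (snr_db / 20))
    return y + z

rng = np.random.default_rng(0)
k = np.arange(-10, 11)
y = np.exp(-2j * np.pi * k * 0.3) + 0.5 * np.exp(-2j * np.pi * k * 0.6)
y_noisy = add_noise_at_snr(y, 20.0, rng)
snr = 10 * np.log10(np.linalg.norm(y) ** 2 / np.linalg.norm(y_noisy - y) ** 2)
```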