Early use of the $\ell_1$ norm

Rich history in applied science:
- Logan ('50s)
- Claerbout ('70s)
- Santosa and Symes ('80s)
- Donoho ('90s)
- Osher and Rudin ('90s)
- Tibshirani ('90s)
- Many since then

Ben Logan (1927–): mathematician and bluegrass fiddler
A Taste of Analysis: Geometry and Probability
Geometry

Cone of descent: $\mathcal{C} = \{\, h : \|x + th\| \le \|x\| \text{ for some } t > 0 \,\}$

Exact recovery if $\mathcal{C} \cap \mathrm{null}(A) = \{0\}$
Gaussian models

Entries of $A$ iid $N(0,1)$ $\Longrightarrow$ row vectors $a_1, \ldots, a_m$ are iid $N(0, I)$

Important consequence: $\mathrm{null}(A)$ is uniformly distributed

$\mathbb{P}(\mathcal{C} \cap \mathrm{null}(A) = \{0\})$ $\Longrightarrow$ a volume calculation
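As a concrete illustration of this setup (not from the slides), here is a minimal numerical sketch: draw a Gaussian $A$ and an $s$-sparse $x$, then solve the $\ell_1$ problem. The dimensions, the cvxpy-based solver, and the $10^{-3}$ success threshold are all my own illustrative choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, s = 200, 100, 10                   # m comfortably above ~2s log(n/s) + 2s ~ 80

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)   # s-sparse signal
A = rng.standard_normal((m, n))          # iid N(0,1) entries, so null(A) is
y = A @ x                                # uniformly distributed

z = cp.Variable(n)                       # min ||z||_1  s.t.  Az = y
cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == y]).solve()
rel_err = np.linalg.norm(z.value - x) / np.linalg.norm(x)
print("exact recovery:", rel_err < 1e-3)
```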
Volume calculations: geometric functional analysis
Volume of a cone

Polar cone: $\mathcal{C}^\circ = \{\, y : \langle y, z \rangle \le 0 \text{ for all } z \in \mathcal{C} \,\}$

Statistical dimension:
$$\delta(\mathcal{C}) := \mathbb{E}_g \min_{z \in \mathcal{C}^\circ} \|g - z\|_{\ell_2}^2 = \mathbb{E}_g \|\pi_{\mathcal{C}}(g)\|_{\ell_2}^2, \qquad g \sim N(0, I)$$

[Figure: descent cone $\mathcal{C}$, polar cone $\mathcal{C}^\circ$, and a Gaussian vector $g$]
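To make the definition concrete, here is a small Monte Carlo sketch (my own illustrative example, not from the talk) using the nonnegative orthant $K = \mathbb{R}^n_+$, for which the projection is coordinatewise $\max(g, 0)$ and $\delta(K) = n/2$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20000
g = rng.standard_normal((trials, n))     # g ~ N(0, I)

# For K = R^n_+ the metric projection is pi_K(g) = max(g, 0),
# so delta(K) = E ||pi_K(g)||^2 = n/2.
delta_hat = (np.maximum(g, 0) ** 2).sum(axis=1).mean()
print(f"estimated delta: {delta_hat:.2f}   exact: {n / 2}")
```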
Gordon's escape lemma

Theorem (Gordon '88). Let $K \subset \mathbb{R}^n$ be a convex cone and $A$ an $m \times n$ Gaussian matrix. With probability at least $1 - e^{-t^2/2}$,
$$\underbrace{m}_{\mathrm{codim}(\mathrm{null}(A))} \ge \big(\sqrt{\delta(K)} + t\big)^2 + 1 \;\Longrightarrow\; \mathrm{null}(A) \cap K = \{0\}$$

Implication: exact recovery if $m \gtrsim \delta(\mathcal{C})$ (roughly) [Rudelson & Vershynin ('08)]

Gordon's lemma was originally stated with the Gaussian width
$$w(K) := \mathbb{E}_g \sup_{z \in K \cap S^{n-1}} \langle g, z \rangle, \qquad \delta(K) - 1 \le w^2(K) \le \delta(K)$$
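Continuing the orthant example above (again my own illustration): for a closed convex cone, $\sup_{z \in K,\,\|z\| \le 1} \langle g, z \rangle = \|\pi_K(g)\|$, so the width and the statistical dimension can be estimated from the same samples and checked against the sandwich inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20000
proj = np.maximum(rng.standard_normal((trials, n)), 0)  # pi_K(g) for K = R^n_+

# sup over K cap S^{n-1} of <g, z> equals ||pi_K(g)|| whenever the
# projection is nonzero (essentially always for this n)
w_hat = np.linalg.norm(proj, axis=1).mean()   # Gaussian width w(K)
delta_hat = (proj ** 2).sum(axis=1).mean()    # statistical dimension delta(K)
print(f"w^2 = {w_hat**2:.2f} <= delta = {delta_hat:.2f} <= w^2 + 1 = {w_hat**2 + 1:.2f}")
```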
Statistical dimension of $\ell_1$ descent cone

$\mathcal{C}^\circ$ is the cone of the subdifferential:
$$\mathcal{C}^\circ = \{\, tu : t > 0 \text{ and } u \in \partial\|x\| \,\}$$

$u \in \partial\|x\|$ iff $\|x + h\| \ge \|x\| + \langle u, h \rangle$ for all $h$
Statistical dimension of $\ell_1$ descent cone

$$x^\star = (\underbrace{*, *, \ldots, *}_{s \text{ times}}, \underbrace{0, 0, \ldots, 0}_{n-s \text{ times}}), \qquad u \in \partial\|x^\star\|_{\ell_1} \iff \begin{cases} u_i = \mathrm{sgn}(x^\star_i) & i \le s \\ |u_i| \le 1 & i > s \end{cases}$$

$$\delta(\mathcal{C}) = \mathbb{E}_g \min_{z \in \mathcal{C}^\circ} \|g - z\|_{\ell_2}^2 = \mathbb{E} \inf_{t \ge 0,\; u \in \partial\|x^\star\|_{\ell_1}} \Big\{ \sum_{i \le s} (g_i - t u_i)^2 + \sum_{i > s} (g_i - t u_i)^2 \Big\}$$

Minimizing over $u$ coordinatewise gives
$$\delta(\mathcal{C}) = \mathbb{E} \inf_{t \ge 0} \Big\{ \sum_{i \le s} (g_i \pm t)^2 + \sum_{i > s} (|g_i| - t)_+^2 \Big\}$$

Swapping the expectation and the infimum yields the upper bound
$$\delta(\mathcal{C}) \le \inf_{t \ge 0} \big\{\, s\,(1 + t^2) + (n - s)\, \mathbb{E}(|g_1| - t)_+^2 \,\big\} \le \underbrace{2s \log(n/s) + 2s}_{\text{sufficient \# of equations}}$$

Stojnic ('09); Chandrasekaran, Recht, Parrilo, Willsky ('12)
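The one-dimensional minimization over $t$ is easy to carry out numerically. The sketch below (my own check, using scipy for the Gaussian integral and the scalar minimization) evaluates the bound $\inf_{t \ge 0}\{s(1+t^2) + (n-s)\,\mathbb{E}(|g_1|-t)_+^2\}$ and compares it with $2s\log(n/s) + 2s$.

```python
import numpy as np
from scipy import integrate, optimize

def excess(t):
    # E (|g| - t)_+^2 for g ~ N(0,1): integrate (x - t)^2 * 2*phi(x) over [t, inf)
    f = lambda x: (x - t) ** 2 * np.sqrt(2 / np.pi) * np.exp(-x ** 2 / 2)
    return integrate.quad(f, t, np.inf)[0]

def delta_bound(s, n):
    obj = lambda t: s * (1 + t ** 2) + (n - s) * excess(t)
    return optimize.minimize_scalar(obj, bounds=(0, 10), method="bounded").fun

n, s = 1000, 50
print(f"numerical bound:  {delta_bound(s, n):.1f}")
print(f"2s log(n/s) + 2s: {2 * s * np.log(n / s) + 2 * s:.1f}")
```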
Phase transitions for Gaussian maps

Theorem (Amelunxen, Lotz, McCoy and Tropp '13). Let $\mathcal{C}$ be the descent cone of a norm $\|\cdot\|$ at a fixed $x^\star \in \mathbb{R}^n$. Then for a fixed $\varepsilon \in (0, 1)$,
$$m \le \delta(\mathcal{C}) - a_\varepsilon \sqrt{n} \;\Longrightarrow\; \text{cvx. prog. succeeds with prob.} \le \varepsilon$$
$$m \ge \delta(\mathcal{C}) + a_\varepsilon \sqrt{n} \;\Longrightarrow\; \text{cvx. prog. succeeds with prob.} \ge 1 - \varepsilon$$
$$a_\varepsilon = \sqrt{8 \log(4/\varepsilon)}$$
Phase transitions for Gaussian maps

[Figure: empirical phase-transition diagrams, courtesy of Amelunxen, Lotz, McCoy and Tropp]

Asymptotic phase transition for $\ell_1$ recovery: Donoho ('06), Donoho & Tanner ('09)
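One can trace this transition empirically. A rough sketch follows (my own experiment design, with small dimensions and few trials so it runs quickly): sweep $m$ and record the fraction of successful $\ell_1$ recoveries, which should jump from 0 to 1 in a window of width $O(\sqrt{n})$ around $\delta(\mathcal{C})$.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, s, trials = 100, 5, 20

def success_rate(m):
    hits = 0
    for _ in range(trials):
        x = np.zeros(n)
        x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
        A = rng.standard_normal((m, n))
        z = cp.Variable(n)
        cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == A @ x]).solve()
        hits += np.linalg.norm(z.value - x) <= 1e-3 * np.linalg.norm(x)
    return hits / trials

# 2s log(n/s) + 2s ~ 40 here, so this sweep brackets the transition
for m in range(10, 71, 10):
    print(m, success_rate(m))
```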
Discrete geometry approach (Donoho and Tanner '06, '09)

Cross-polytope $P = \{x \in \mathbb{R}^n : \|x\|_{\ell_1} \le 1\}$ and projected polytope $AP$

[Figure: cross-polytope with vertices $\pm e_i$ and its projection with vertices $Ae_i$ in the range of $A$]

$s$-sparse $x$ lies on an $(s-1)$-dimensional face $F$ of $P$

$\ell_1$ succeeds $\iff$ face $F$ is conserved ($AF$: face of the projected polytope)

Integral geometry of convex sets: McMullen ('75), Grünbaum ('68)
Polytope angle calculations: Vershik and Sporyshev ('86, '92), Affentranger and Schneider ('92)
Non-Gaussian models

MRI; collaborative filtering

Under incoherence, cvx. prog. succeeds if $\underbrace{m}_{\#\text{ eqns}} \gtrsim \mathrm{df} \cdot \log n$
Dual certificates

$$\min \|x\| \quad \text{s.t.} \quad y = Ax$$

$x$ is the solution iff there exists $v \perp \mathrm{null}(A)$ with $v \in \mathcal{C}^\circ$, namely $v \in \partial\|x\|$

[Figure: $\mathrm{null}(A)$, $\mathrm{row}(A)$, descent cone, and polar cone]
Sparse recovery

Dual certificate: $v \in \mathrm{row}(A) = \mathrm{span}(a_1, \ldots, a_m)$ and
$$\begin{cases} v_i = \mathrm{sgn}(x_i) & x_i \ne 0 \\ |v_i| \le 1 & x_i = 0 \end{cases}$$

Example: Fourier sampling, $a_k(t) = e^{i 2\pi \omega_k t}$ with $\omega_k$ random; then
$$v(t) = \sum_k c_k e^{i 2\pi \omega_k t} \qquad (\text{so } v \in \mathrm{row}(A))$$

[Figure: trigonometric polynomial $v(t)$ interpolating $\mathrm{sgn}(x)$ where $x \ne 0$, staying between $-1$ and $+1$ elsewhere]
Dual certificate construction

$$v \in \mathrm{row}(A), \qquad Pv = \mathrm{sgn}(x), \qquad \|(I - P)v\|_{\ell_\infty} \le 1, \qquad \text{where } (Pv)_i = \begin{cases} v_i & x_i \ne 0 \\ 0 & x_i = 0 \end{cases}$$

Candidate certificate:
$$\left.\begin{aligned} &\text{minimize } \|v\|_{\ell_2} \\ &\text{subject to } Pv = \mathrm{sgn}(x), \; v \in \mathrm{row}(A) \end{aligned}\right\} \;\Longrightarrow\; v = A^* A \,(P A^* A P)^{-1} \mathrm{sgn}(x)$$

- Analysis via combinatorial methods: sparse signal recovery (C., Romberg and Tao '04); matrix completion (C. and Tao '09)
- Analysis for matrix completion via tools from geometric functional analysis (C. and Recht '08)
- Gives accurate answers in the Gaussian case: $m \ge 2s \log n$ (C. and Recht '12)
- Widely used since then
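A numpy sketch of this least-norm construction (the dimensions and the Gaussian model are my own choices): write $v = A^\top c$ with $c$ chosen so that $v$ matches the sign pattern on the support, then inspect the off-support entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 200, 160, 10
S = rng.choice(n, s, replace=False)            # support of x
sgn = np.sign(rng.standard_normal(s))          # sgn(x) on the support

A = rng.standard_normal((m, n)) / np.sqrt(m)
AS = A[:, S]                                   # columns indexed by the support

# Least-norm v in row(A) with Pv = sgn(x):
# v = A^T c,  c = A_S (A_S^T A_S)^{-1} sgn(x_S)
c = AS @ np.linalg.solve(AS.T @ AS, sgn)
v = A.T @ c

print("Pv = sgn(x):  ", np.allclose(v[S], sgn))
print("||(I-P)v||_inf:", np.abs(np.delete(v, S)).max())   # valid certificate if <= 1
```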
Some Immediate and (Far) Less Immediate Applications
Impact on MR pediatrics

Lustig (UCB), Pauly, Vasanawala (Stanford)

6-year-old; 8× acceleration; 16-second scan; 0.875 mm in-plane resolution; 1.6 mm slice thickness; 32 channels
1-year-old female with liver lesions: 8× acceleration

Lustig (UCB), Pauly, Vasanawala (Stanford)

Parallel imaging (PI) vs. compressed sensing + PI

Lesions are barely seen with the linear reconstruction
6-year-old male abdomen: 8× acceleration

Lustig (UCB), Pauly, Vasanawala (Stanford)

Parallel imaging (PI) vs. compressed sensing + PI

Fine structures (arrows) are buried in noise (artifacts + noise amplification) and recovered by CS ($\ell_1$ + wavelets)
Missing phase problem

Eyes and detectors see intensity, but light is a wave $\to$ it has intensity and phase

Phase retrieval: for $x \in \mathbb{C}^n$,
$$\text{find } x \quad \text{subject to} \quad y = |Ax|^2 \qquad (\text{or } y_k = |\langle a_k, x \rangle|^2, \; k = 1, \ldots, m)$$
Origin in X-ray crystallography

10 Nobel Prizes in X-ray crystallography, and counting...
Another look at phase retrieval

With Eldar, Strohmer and Voroninski

$$\text{find } x \quad \text{subject to} \quad |\langle a_k, x \rangle|^2 = y_k, \quad k = 1, \ldots, m$$

Solving quadratic equations is NP-hard in general $\to$ ad hoc solutions

Lifting: with $X = xx^*$,
$$|\langle a_k, x \rangle|^2 = \mathrm{Tr}(a_k a_k^* \, xx^*) := \mathrm{Tr}(a_k a_k^* X)$$

Phase retrieval problem:
$$\text{find } X \quad \text{such that} \quad \mathcal{A}(X) = y, \; X \succeq 0, \; \mathrm{rank}(X) = 1$$

PhaseLift:
$$\text{minimize } \mathrm{Tr}(X) \quad \text{subject to} \quad \mathcal{A}(X) = y, \; X \succeq 0$$

Other convex relaxations of quadratically constrained QPs: Shor ('87); Goemans and Williamson ('95) [MAX-CUT]
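A small PhaseLift sketch (my own real-valued toy instance, with real Gaussian $a_k$ rather than the sphere model and a generous $m = 6n$): solve the trace-minimization SDP with cvxpy and read $x$ off the top eigenvector of the solution.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 10, 60
x = rng.standard_normal(n)               # unknown signal
A = rng.standard_normal((m, n))          # measurement vectors a_k as rows
y = (A @ x) ** 2                         # phaseless data y_k = <a_k, x>^2

X = cp.Variable((n, n), symmetric=True)  # lifted variable, ideally X = x x^T
cons = [X >> 0]
# Tr(a_k a_k^T X) = sum of the elementwise product of a_k a_k^T and X
cons += [cp.sum(cp.multiply(np.outer(A[k], A[k]), X)) == y[k] for k in range(m)]
cp.Problem(cp.Minimize(cp.trace(X)), cons).solve(solver=cp.SCS)

# recover x (up to global sign) from the leading eigenpair of X
w, V = np.linalg.eigh(X.value)
xhat = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
err = min(np.linalg.norm(xhat - x), np.linalg.norm(xhat + x))
print("relative error:", err / np.linalg.norm(x))
```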
A surprise

Phase retrieval: find $x$ s.t. $y_k = |\langle a_k, x \rangle|^2$ $\qquad$ PhaseLift: $\min \mathrm{Tr}(X)$ s.t. $\mathcal{A}(X) = y$, $X \succeq 0$

Theorem (C. and Li ('12); C., Strohmer and Voroninski ('11)). Suppose the $a_k$ are independently and uniformly sampled on the unit sphere and $m \gtrsim n$. Then with prob. $1 - O(e^{-\gamma m})$, the only feasible point is $xx^*$:
$$\{X : \mathcal{A}(X) = y \text{ and } X \succeq 0\} = \{xx^*\}$$

Proof via construction of dual certificates
A separation problem

Candès, Li, Wright, Ma ('09); Chandrasekaran, Sanghavi, Parrilo, Willsky ('09)

$$Y = L + S$$

- $Y$: data matrix (observed)
- $L$: low-rank (unobserved)
- $S$: sparse (unobserved)

Can we recover $L$ and $S$ accurately? Looks impossible

Recovering low-dimensional structure from corrupted data: an approach to robust principal component analysis (PCA)
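A minimal convex-programming sketch of the separation (my own toy instance; the nuclear-norm plus $\ell_1$ objective and the $\lambda = 1/\sqrt{n}$ weighting follow the robust PCA papers cited above, but the dimensions and corruption level are arbitrary choices).

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, r = 30, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # low-rank part
S0 = np.zeros((n, n))
mask = rng.random((n, n)) < 0.1                 # ~10% gross corruptions
S0[mask] = 10 * rng.standard_normal(mask.sum())
Y = L0 + S0                                     # observed data only

lam = 1 / np.sqrt(n)
L, S = cp.Variable((n, n)), cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.norm1(S)), [L + S == Y])
prob.solve(solver=cp.SCS)
print("rel. error in L:",
      np.linalg.norm(L.value - L0, "fro") / np.linalg.norm(L0, "fro"))
```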