

Mathematics of Sparsity (and a Few Other Things)
Emmanuel Candès
International Congress of Mathematicians (ICM 2014), Seoul, August 2014

Some Motivation: Magnetic Resonance Imaging (MRI)
[Figure: MR scanner and MR image. Image from K. Pauly, G. Gold, …]


  1. Early use of the $\ell_1$ norm. Rich history in applied science: Logan ('50s), Claerbout ('70s), Santosa and Symes ('80s), Donoho ('90s), Osher and Rudin ('90s), Tibshirani ('90s), and many since then. [Photo: Ben Logan (1927–), mathematician and bluegrass fiddler]

  2. A Taste of Analysis: Geometry and Probability

  3–5. Geometry. Let
$$C = \{ h : \|x + th\| \le \|x\| \text{ for some } t > 0 \}$$
be the cone of descent of the norm at $x$. Exact recovery if $C \cap \mathrm{null}(A) = \{0\}$. [Figure: the descent cone at $x$ and the null space of $A$ meeting only at the origin]
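
To make the condition concrete, here is a minimal numerical sketch (illustrative sizes and a Gaussian $A$, not from the talk): it minimizes $\|x + h\|_{\ell_1}$ over $h \in \mathrm{null}(A)$ as a linear program; if the minimizer is $h = 0$, the descent cone meets $\mathrm{null}(A)$ only at the origin and $\ell_1$-minimization recovers $x$ exactly from $y = Ax$.

```python
# Sketch: test the descent-cone condition by minimizing ||x + h||_1 over
# h in null(A) via a linear program. Sizes are illustrative choices.
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 100, 50, 5

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n))
B = null_space(A)                 # columns of B span null(A); h = B @ z
k = B.shape[1]

# Epigraph LP: min 1'u  s.t.  -u <= x + B z <= u,  variables (z, u)
c = np.concatenate([np.zeros(k), np.ones(n)])
A_ub = np.block([[B, -np.eye(n)], [-B, -np.eye(n)]])
b_ub = np.concatenate([-x, x])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))

h = B @ res.x[:k]
print(f"min ||x+h||_1 = {res.fun:.4f}  vs  ||x||_1 = {np.abs(x).sum():.4f}")
print("descent cone meets null(A) only at 0:", np.linalg.norm(h) < 1e-6)
```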

  6. Gaussian models. Entries of $A$ i.i.d. $N(0, 1)$, so the row vectors $a_1, \ldots, a_m$ are i.i.d. $N(0, I)$. Important consequence: $\mathrm{null}(A)$ is uniformly distributed (by rotation invariance), so $\mathbb{P}(C \cap \mathrm{null}(A) = \{0\})$ is a volume calculation.

  7. Volume calculations: geometric functional analysis

  8–9. Volume of a cone. Polar cone:
$$C^\circ = \{ y : \langle y, z \rangle \le 0 \text{ for all } z \in C \}.$$
Statistical dimension:
$$\delta(C) := \mathbb{E}_g \min_{z \in C^\circ} \|g - z\|_{\ell_2}^2 = \mathbb{E}_g \|\pi_C(g)\|_{\ell_2}^2, \qquad g \sim N(0, I).$$
[Figure: a descent cone $C$, its polar cone $C^\circ$, and a Gaussian vector $g$]
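
When the projection $\pi_C$ is computable, $\delta(C)$ can be estimated by Monte Carlo. A minimal sketch on the nonnegative orthant (an illustrative cone, not one from the talk), where $\pi_C(g) = \max(g, 0)$ coordinatewise and $\delta(C) = n/2$ exactly:

```python
# Monte Carlo estimate of the statistical dimension, assuming C is the
# nonnegative orthant: pi_C(g) = max(g, 0) and delta(C) = n/2 exactly.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 20000

g = rng.standard_normal((trials, n))
proj = np.maximum(g, 0.0)                      # pi_C(g) for the orthant
delta_hat = (proj ** 2).sum(axis=1).mean()     # E ||pi_C(g)||_2^2

print(f"Monte Carlo delta(C) ~ {delta_hat:.2f}, exact value {n / 2}")
```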

  10–12. Gordon's escape lemma.
Theorem (Gordon '88). Let $K \subset \mathbb{R}^n$ be a convex cone and $A$ an $m \times n$ Gaussian matrix. With probability at least $1 - e^{-t^2/2}$,
$$m \ge (\sqrt{\delta(K)} + t)^2 + 1 \implies \mathrm{null}(A) \cap K = \{0\},$$
where $m = \mathrm{codim}(\mathrm{null}(A))$.
Implication: exact recovery if $m \gtrsim \delta(C)$ (roughly) [Rudelson & Vershynin ('08)].
Gordon's lemma was originally stated with the Gaussian width
$$w(K) := \mathbb{E}_g \sup_{z \in K \cap S^{n-1}} \langle g, z \rangle, \qquad \delta(K) - 1 \le w^2(K) \le \delta(K).$$
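
A quick numerical check of the width–dimension sandwich on a cone where both quantities are computable by sampling (again the nonnegative orthant, an illustrative choice not taken from the talk):

```python
# Monte Carlo check of delta(K) - 1 <= w(K)^2 <= delta(K), assuming K is the
# nonnegative orthant: sup over unit z >= 0 of <g, z> equals ||max(g, 0)||_2
# (whenever g has a positive coordinate, which is essentially always here).
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 20000

g = rng.standard_normal((trials, n))
norms = np.linalg.norm(np.maximum(g, 0.0), axis=1)

w = norms.mean()                # Gaussian width w(K)
delta = (norms ** 2).mean()     # statistical dimension delta(K)
print(f"{delta - 1:.2f} <= {w ** 2:.2f} <= {delta:.2f}")
```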

  13–18. Statistical dimension of the $\ell_1$ descent cone. The polar cone is the cone of the subdifferential:
$$C^\circ = \{ tu : t > 0 \text{ and } u \in \partial\|x\| \}, \qquad u \in \partial\|x\| \iff \|x + h\| \ge \|x\| + \langle u, h \rangle \ \text{for all } h.$$
For $x^\star = (\underbrace{*, \ldots, *}_{s \text{ times}}, \underbrace{0, \ldots, 0}_{n-s \text{ times}})$:
$$u \in \partial\|x^\star\|_{\ell_1} \iff \begin{cases} u_i = \mathrm{sgn}(x^\star_i), & 1 \le i \le s \\ |u_i| \le 1, & i > s. \end{cases}$$
Hence
$$\delta(C) = \mathbb{E}_g \min_{z \in C^\circ} \|g - z\|_{\ell_2}^2 = \mathbb{E} \inf_{t \ge 0,\ u \in \partial\|x^\star\|_{\ell_1}} \Big\{ \sum_{i \le s} (g_i - t u_i)^2 + \sum_{i > s} (g_i - t u_i)^2 \Big\}.$$
Minimizing over the free coordinates $u_i$, $i > s$, and then bounding by swapping the infimum and the expectation:
$$\delta(C) = \mathbb{E} \inf_{t \ge 0} \Big\{ \sum_{i \le s} (g_i \pm t)^2 + \sum_{i > s} (|g_i| - t)_+^2 \Big\} \le \inf_{t \ge 0} \Big\{ s \cdot (1 + t^2) + (n - s) \cdot \mathbb{E}(|g_1| - t)_+^2 \Big\} \le 2s \log(n/s) + 2s,$$
a sufficient number of equations. Stojnic ('09); Chandrasekaran, Recht, Parrilo, Willsky ('12).
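
The final bound is easy to evaluate: $\mathbb{E}(|g_1| - t)_+^2$ has the closed form $2[(1 + t^2)(1 - \Phi(t)) - t\varphi(t)]$ in the standard normal pdf $\varphi$ and cdf $\Phi$, so the infimum over $t$ can be taken on a grid. A sketch with illustrative $n$ and $s$:

```python
# Numerical evaluation of the bound on delta(C) for the l1 descent cone:
#   inf_t { s (1 + t^2) + (n - s) E(|g| - t)_+^2 },
# using E(|g| - t)_+^2 = 2 [ (1 + t^2)(1 - Phi(t)) - t phi(t) ], g ~ N(0,1).
# The sizes n, s and the grid are illustrative choices.
import numpy as np
from scipy.stats import norm

def excess(t):
    """E(|g| - t)_+^2 for standard normal g."""
    return 2.0 * ((1.0 + t ** 2) * norm.sf(t) - t * norm.pdf(t))

n, s = 10000, 100
ts = np.linspace(0.0, 10.0, 2001)
bound = np.min(s * (1.0 + ts ** 2) + (n - s) * excess(ts))

print(f"inf_t bound      : {bound:.1f}")
print(f"2s log(n/s) + 2s : {2 * s * np.log(n / s) + 2 * s:.1f}")
```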

  19–21. Phase transitions for Gaussian maps.
Theorem (Amelunxen, Lotz, McCoy and Tropp '13). Let $C$ be the descent cone of the norm $\|\cdot\|$ at a fixed $x^\star \in \mathbb{R}^n$. Then for fixed $\varepsilon \in (0, 1)$,
$$m \le \delta(C) - a_\varepsilon \sqrt{n} \implies \text{cvx. prog. succeeds with prob.} \le \varepsilon,$$
$$m \ge \delta(C) + a_\varepsilon \sqrt{n} \implies \text{cvx. prog. succeeds with prob.} \ge 1 - \varepsilon,$$
with $a_\varepsilon = \sqrt{8 \log(4/\varepsilon)}$.
[Figure, courtesy of Amelunxen, Lotz, McCoy and Tropp: empirical success probabilities showing a sharp transition at $m \approx \delta(C)$]
Asymptotic phase transition for $\ell_1$ recovery: Donoho ('06), Donoho & Tanner ('09).
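
A small experiment in the spirit of these plots (sizes, trial count, and tolerance are illustrative choices): sweep $m$, solve the $\ell_1$ problem as a linear program in split variables, and record the empirical success rate. For the sizes below the transition should appear near $\delta(C) \approx 2s \log(n/s) + 2s \approx 40$.

```python
# Empirical l1 phase transition sketch: for each m, solve min ||x||_1
# s.t. Ax = y (as an LP in x = x+ - x-) and record the success rate.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, s, trials = 100, 5, 20

def l1_recover(A, y):
    m, n = A.shape
    c = np.ones(2 * n)
    A_eq = np.hstack([A, -A])     # A (x+ - x-) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

for m in range(10, 61, 10):
    successes = 0
    for _ in range(trials):
        x = np.zeros(n)
        x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
        A = rng.standard_normal((m, n))
        successes += np.allclose(l1_recover(A, A @ x), x, atol=1e-5)
    print(f"m = {m:2d}: success rate {successes / trials:.2f}")
```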

  22–23. Discrete geometry approach (Donoho and Tanner '06, '09). Cross-polytope $P = \{ x \in \mathbb{R}^n : \|x\|_{\ell_1} \le 1 \}$ and projected polytope $AP$. An $s$-sparse $x$ lies in an $(s-1)$-dimensional face $F$ of $P$, and $\ell_1$ succeeds $\iff$ the face $F$ is preserved ($AF$ is a face of the projected polytope). [Figure: the cross-polytope with vertices $\pm e_1, \pm e_2, \pm e_3$ and its projection onto the range of $A$]
Integral geometry of convex sets: McMullen ('75), Grünbaum ('68). Polytope angle calculations: Vershik and Sporyshev ('86, '92), Affentranger and Schneider ('92).

  24. Non-Gaussian models: MRI, collaborative filtering. Under incoherence, the convex program succeeds if $m \gtrsim \mathrm{df} \cdot \log n$, where $m$ is the number of equations and $\mathrm{df}$ the number of degrees of freedom.
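
For the collaborative-filtering model, a minimal nuclear-norm matrix completion sketch; the solver (cvxpy), the sampling rate, and the sizes are illustrative assumptions, not choices made in the talk.

```python
# Nuclear-norm matrix completion sketch: min ||X||_*  s.t.  X matches the
# observed entries. Solver (cvxpy) and sizes are illustrative choices.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, r = 30, 2

M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r truth
rows, cols = np.nonzero(rng.random((n, n)) < 0.5)               # ~50% observed

X = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [X[rows, cols] == M[rows, cols]])
prob.solve()

print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```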

  25–26. Dual certificates. Consider $\min \|x\|$ subject to $y = Ax$. Then $x$ is a solution iff there exists $v \perp \mathrm{null}(A)$ with $v \in C^\circ$, i.e., $v \in \partial\|x\|$. [Figure: $\mathrm{row}(A)$, $\mathrm{null}(A)$, the descent cone and its polar cone]

  27. Sparse recovery. Dual certificate: $v \in \mathrm{row}(A) = \mathrm{span}(a_1, \ldots, a_m)$ with $v_i = \mathrm{sgn}(x_i)$ if $x_i \ne 0$ and $|v_i| \le 1$ if $x_i = 0$.
Example: Fourier sampling, $a_k(t) = e^{i 2\pi \omega_k t}$ with $\omega_k$ random; then $v(t) = \sum_k c_k e^{i 2\pi \omega_k t}$, so $v \in \mathrm{row}(A)$. [Figure: a trigonometric polynomial $v$ interpolating $\mathrm{sgn}(x)$ on the support of $x$ and bounded between $-1$ and $+1$ elsewhere]

  28–31. Dual certificate construction. With $P$ the projection onto the support of $x$, $(Pv)_i = v_i$ if $x_i \ne 0$ and $0$ if $x_i = 0$, we seek $v \in \mathrm{row}(A)$ with
$$Pv = \mathrm{sgn}(x), \qquad \|(I - P)v\|_{\ell_\infty} \le 1.$$
Candidate certificate: minimize $\|v\|_{\ell_2}$ subject to $Pv = \mathrm{sgn}(x)$ and $v \in \mathrm{row}(A)$, which gives
$$v = A^* A (P A^* A P)^{-1} \mathrm{sgn}(x).$$
[Figure: the least-squares certificate interpolating $\mathrm{sgn}(x)$ on the support]
Analysis via combinatorial methods: sparse signal recovery (C., Romberg and Tao '04), matrix completion (C. and Tao '09). Analysis for matrix completion via tools from geometric functional analysis (C. and Recht '08). Gives accurate answers in the Gaussian case: $m \ge 2s \log n$ (C. and Recht '12). Widely used since then.
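
A small numerical sketch of this construction for a Gaussian $A$ (sizes illustrative): build $v$ by solving the $s \times s$ system on the support, then check the off-support condition.

```python
# Sketch: build the least-squares certificate v = A*A (P A*A P)^{-1} sgn(x)
# and verify ||(I - P)v||_inf <= 1. Sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(5)
n, m, s = 200, 80, 5

x = np.zeros(n)
S = rng.choice(n, s, replace=False)               # support of x
x[S] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)
G = A.T @ A                                       # A*A

w = np.linalg.solve(G[np.ix_(S, S)], np.sign(x[S]))   # (P A*A P)^{-1} sgn(x)
v = G[:, S] @ w                                   # v lies in row(A)

off = np.setdiff1d(np.arange(n), S)
print("Pv = sgn(x) on support:", np.allclose(v[S], np.sign(x[S])))
print("||(I - P)v||_inf =", np.abs(v[off]).max())
```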

  32. Some Immediate and (Far) Less Immediate Applications

  33. Impact on MR pediatrics. Lustig (UCB), Pauly, Vasanawala (Stanford). 6-year-old: 8× acceleration, 16-second scan, 0.875 mm in-plane resolution, 1.6 mm slice thickness, 32 channels.

  34. 1-year-old female with liver lesions: 8× acceleration. Lustig (UCB), Pauly, Vasanawala (Stanford). [Images: parallel imaging (PI) vs. compressed sensing + PI] Lesions are barely seen with the linear reconstruction.

  35–36. 6-year-old male abdomen: 8× acceleration. Lustig (UCB), Pauly, Vasanawala (Stanford). [Images: parallel imaging (PI) vs. compressed sensing + PI] Fine structures (arrows) are buried in noise (artifacts + noise amplification) and recovered by CS ($\ell_1$ + wavelets).

  37. Missing phase problem. Eyes and detectors see intensity, but light is a wave: it has both intensity and phase. Phase retrieval: find $x \in \mathbb{C}^n$ subject to $y = |Ax|^2$ (i.e., $y_k = |\langle a_k, x \rangle|^2$, $k = 1, \ldots, m$).

  38. Origin in X-ray crystallography: 10 Nobel Prizes in X-ray crystallography, and counting...

  39–42. Another look at phase retrieval (with Eldar, Strohmer and Voroninski).
Find $x$ subject to $|\langle a_k, x \rangle|^2 = y_k$, $k = 1, \ldots, m$. Solving quadratic equations is NP-hard in general, hence ad-hoc solutions.
Lifting: set $X = xx^*$, so that $|\langle a_k, x \rangle|^2 = \mathrm{Tr}(a_k a_k^* xx^*) =: \mathrm{Tr}(a_k a_k^* X)$ is linear in $X$; write $\mathcal{A}(X)$ for the vector of these traces.
Phase retrieval problem: find $X$ such that $\mathcal{A}(X) = y$, $X \succeq 0$, $\mathrm{rank}(X) = 1$.
PhaseLift (drop the rank constraint): minimize $\mathrm{Tr}(X)$ subject to $\mathcal{A}(X) = y$, $X \succeq 0$.
Other convex relaxations of quadratically constrained QPs: Shor ('87); Goemans and Williamson ('95) [MAX-CUT].
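
A minimal PhaseLift sketch, restricted to real-valued signals for simplicity; the solver (cvxpy) and the sizes are illustrative assumptions. The signal is read off the leading eigenpair of the SDP solution, up to a global sign.

```python
# PhaseLift sketch (real case): min Tr(X) s.t. Tr(a_k a_k^T X) = y_k, X PSD.
# Solver (cvxpy) and sizes are illustrative choices.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
n, m = 20, 120                                    # m on the order of n

x = rng.standard_normal(n)
a = rng.standard_normal((m, n))
y = (a @ x) ** 2                                  # phaseless measurements

X = cp.Variable((n, n), PSD=True)
constraints = [cp.sum(cp.multiply(np.outer(a[k], a[k]), X)) == y[k]
               for k in range(m)]
cp.Problem(cp.Minimize(cp.trace(X)), constraints).solve()

# Recover x (up to sign) from the leading eigenpair of X
vals, vecs = np.linalg.eigh(X.value)
xhat = np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]
err = min(np.linalg.norm(xhat - x), np.linalg.norm(xhat + x))
print("relative error (up to sign):", err / np.linalg.norm(x))
```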

  43–44. A surprise.
Phase retrieval: find $x$ s.t. $y_k = |\langle a_k, x \rangle|^2$. PhaseLift: $\min \mathrm{Tr}(X)$ s.t. $\mathcal{A}(X) = y$, $X \succeq 0$.
Theorem (C. and Li ('12); C., Strohmer and Voroninski ('11)). Suppose the $a_k$ are independently and uniformly sampled on the unit sphere and $m \gtrsim n$. Then with probability $1 - O(e^{-\gamma m})$, the only feasible point is $xx^*$:
$$\{ X : \mathcal{A}(X) = y \text{ and } X \succeq 0 \} = \{ xx^* \}.$$
Proof via construction of dual certificates.

  45–47. A separation problem. Candès, Li, Wright, Ma ('09); Chandrasekaran, Sanghavi, Parrilo, Willsky ('09).
$$Y = L + S$$
$Y$: data matrix (observed); $L$: low-rank (unobserved); $S$: sparse (unobserved). [Figure: a fully observed data matrix split into a low-rank component plus sparsely supported corruptions]
Can we recover $L$ and $S$ accurately? Looks impossible. Recovering low-dimensional structure from corrupted data: an approach to robust principal component analysis (PCA).
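
A minimal principal component pursuit sketch for this separation, with the standard weight $\lambda = 1/\sqrt{n}$; the solver (cvxpy), corruption level, and sizes are illustrative assumptions.

```python
# Principal component pursuit sketch for Y = L + S:
#   min ||L||_* + lambda * ||S||_1  s.t.  L + S = Y,  lambda = 1/sqrt(n).
# Solver (cvxpy) and sizes are illustrative choices.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
n, r = 30, 2

L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # low-rank part
S0 = np.zeros((n, n))
corrupt = rng.random((n, n)) < 0.1                              # ~10% corrupted
S0[corrupt] = 10.0 * rng.standard_normal(corrupt.sum())
Y = L0 + S0

L = cp.Variable((n, n))
S = cp.Variable((n, n))
objective = cp.normNuc(L) + (1.0 / np.sqrt(n)) * cp.sum(cp.abs(S))  # entrywise l1
cp.Problem(cp.Minimize(objective), [L + S == Y]).solve()

print("relative error in L:", np.linalg.norm(L.value - L0) / np.linalg.norm(L0))
```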
