Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming

  1. Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming
     Yuxin Chen†, Yuejie Chi∗, Andrea J. Goldsmith†
     †Stanford University, ∗Ohio State University (May 9)

  2. High-Dimensional Sequential Data / Signals
     • Data streams / stochastic processes
       ◦ each data instance can be high-dimensional
       ◦ we are interested in the information in the data rather than the data themselves
     • Covariance estimation
       ◦ second-order statistics Σ ∈ R^{n×n}
       ◦ a cornerstone of many information processing tasks

  3. What are Quadratic Measurements?
     • Quadratic measurements
       ◦ obtain m measurements of Σ of the form y_i ≈ a_i^⊤ Σ a_i (1 ≤ i ≤ m)
       ◦ rank-1 measurements!
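A minimal NumPy sketch may make the measurement model concrete; the sizes n, m, r, the rank-r construction of Σ, and the Gaussian choice of sampling vectors are all illustrative assumptions, not specifics from the talk.

```python
# Hypothetical toy setup: m quadratic (rank-1) measurements of a covariance.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 20, 60, 2                       # assumed sizes, not from the talk

U = rng.standard_normal((n, r))
Sigma = U @ U.T                           # a rank-r covariance matrix

A = rng.standard_normal((m, n))           # rows are the sampling vectors a_i
y = np.einsum('mi,ij,mj->m', A, Sigma, A) # y_i = a_i^T Sigma a_i
```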

  4. Example: Applications in Spectral Estimation
     • High-frequency wireless and signal processing (energy measurements)
       ◦ spectral estimation of stationary processes (possibly sparse)
       ◦ channel estimation in MIMO channels

  5. Example: Applications in Optics
     • Phase space tomography
       ◦ measure correlation functions of a wave field
       [Figure: measured correlation functions of a wave field; courtesy of Chi et al.]
     • Phase retrieval
       ◦ signal recovery from magnitude measurements
       [Figure courtesy of Candès et al.]

  6. Example: Applications in Data Streams
     • Covariance sketching
       ◦ data stream: real-time data {x_t}_{t=1}^∞ arriving sequentially at a high rate ...
     • Challenges
       ◦ limited memory
       ◦ computational efficiency
       ◦ hopefully a single pass over the data

  7. Proposed Quadratic Sketching Method
     1) Sketching:
       ◦ at each time t, obtain a quadratic sketch (a_i^⊤ x_t)², where a_i is a sketching vector
     2) Aggregation:
       ◦ all sketches are aggregated into m measurements
           y_i = a_i^⊤ ( (1/T) Σ_{t=1}^T x_t x_t^⊤ ) a_i ≈ a_i^⊤ Σ a_i   (1 ≤ i ≤ m)
     • Benefits:
       ◦ one pass
       ◦ minimal storage (as will be shown)
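A one-pass implementation of this sketching and aggregation scheme might look like the following sketch; the Gaussian stream model and all sizes are assumptions for illustration.

```python
# Sketch of one-pass quadratic sketching: only m scalar accumulators are kept,
# never the raw stream {x_t} or the n-by-n sample covariance.
import numpy as np

rng = np.random.default_rng(1)
n, m, T = 20, 60, 5000                   # assumed sizes

A = rng.standard_normal((m, n))          # fixed sketching vectors a_i
acc = np.zeros(m)                        # m running sums: O(m) memory

L = rng.standard_normal((n, n)) / np.sqrt(n)   # toy model: Sigma = L @ L.T
for t in range(T):
    x_t = L @ rng.standard_normal(n)     # stream sample, x_t ~ N(0, Sigma)
    acc += (A @ x_t) ** 2                # quadratic sketches (a_i^T x_t)^2

y = acc / T   # y_i = a_i^T ((1/T) sum_t x_t x_t^T) a_i  ≈  a_i^T Sigma a_i
```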

  8. Problem Formulation
     • Given: m (≪ n²) quadratic measurements y = {y_i}_{i=1}^m with
           y_i = a_i^⊤ Σ a_i + η_i,   i = 1, …, m
       ◦ a_i: sampling vectors
       ◦ η = {η_i}_{i=1}^m: noise terms
       ◦ more concise operator form: y = A(Σ) + η
     • Goal: recover Σ ∈ R^{n×n}
     • Sampling model
       ◦ i.i.d. sub-Gaussian sampling vectors

  9. Geometry of Covariance Structure
     • # unknowns > # stored measurements
       ◦ exploit low-dimensional structures!
     • Structures considered in this talk:
       1) low rank
       2) Toeplitz low rank
       3) jointly sparse and low rank

 10. Low Rank
     • Low-rank structure:
       ◦ a few components explain most of the data variability
       ◦ arises in metric learning, array signal processing, collaborative filtering, ...
     • rank(Σ) = r ≪ n

 11. Trace Minimization for Low-Rank Structure
     • Trace minimization (TraceMin):
           minimize_M  trace(M)              ← promotes low rank
           s.t.        ‖A(M) − y‖₁ ≤ ε       ← noise bound
                       M ⪰ 0
       ◦ inspired by Candès et al.'s approach to phase retrieval
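As a sketch of how TraceMin could be prototyped (not the authors' code), CVXPY expresses the program directly; `A`, `y`, and `eps` follow the toy setup above.

```python
# Minimal CVXPY sketch of TraceMin (illustrative, not the authors' solver).
import cvxpy as cp
import numpy as np

def trace_min(A: np.ndarray, y: np.ndarray, eps: float) -> np.ndarray:
    """A: (m, n) rows are sampling vectors; y: (m,) measurements."""
    n = A.shape[1]
    M = cp.Variable((n, n), PSD=True)             # M ⪰ 0 (symmetric PSD)
    quad = cp.sum(cp.multiply(A @ M, A), axis=1)  # quad[i] = a_i^T M a_i = A(M)_i
    prob = cp.Problem(cp.Minimize(cp.trace(M)),   # trace surrogate for rank
                      [cp.norm(quad - y, 1) <= eps])
    prob.solve()                                  # SDP; handled by e.g. SCS
    return M.value
```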

 12. Near-Optimal Recovery for Low-Rank Structure
         minimize tr(M)   s.t. ‖A(M) − y‖₁ ≤ ε,  M ⪰ 0
     Theorem 1 (Low Rank). With high probability, for all Σ with rank(Σ) ≤ r, the solution Σ̂ to TraceMin obeys
         ‖Σ̂ − Σ‖_F ≲ ‖Σ − Σ_r‖_∗ / √r + ε / m,
     where the first term is due to imperfect structure and the second due to noise, provided that m ≳ rn. (Σ_r: best rank-r approximation of Σ)
     • Exact recovery in the noiseless case
     • Universal recovery: simultaneously works for all low-rank matrices
     • Robust recovery when Σ is only approximately low-rank
     • Stable recovery against bounded noise

 13. Phase Transition for Low-Rank Recovery
     [Figure: empirical success probability over Monte Carlo trials, n = 50; horizontal axis m/n², vertical axis r/n, with the theoretic sampling limit overlaid]
     • Near-optimal storage complexity!
       ◦ degrees of freedom ≈ rn

 14. Toeplitz Low Rank
     • Toeplitz low-rank structure:
       ◦ spectral sparsity!
         ∗ possibly off-the-grid frequency spikes (Vandermonde decomposition)
       ◦ wireless communication, array signal processing, ...
     • rank(Σ) = r ≪ n

 15. Trace Minimization for Toeplitz Low-Rank Structure
     • Trace minimization (ToepTraceMin):
           minimize_M  trace(M)              ← promotes low rank
           s.t.        ‖A(M) − y‖₂ ≤ ε₂      ← noise bound
                       M ⪰ 0,  M is Toeplitz
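Extending the sketch above, the Toeplitz constraint can be imposed by forcing constant diagonals; again a hypothetical prototype, not the authors' implementation.

```python
# Sketch of ToepTraceMin: TraceMin plus a Toeplitz (constant-diagonal) constraint.
import cvxpy as cp
import numpy as np

def toep_trace_min(A: np.ndarray, y: np.ndarray, eps2: float) -> np.ndarray:
    n = A.shape[1]
    M = cp.Variable((n, n), PSD=True)
    quad = cp.sum(cp.multiply(A @ M, A), axis=1)   # A(M)
    constraints = [
        cp.norm(quad - y, 2) <= eps2,              # l2 noise bound
        M[:-1, :-1] == M[1:, 1:],                  # M[i, j] == M[i+1, j+1]: Toeplitz
    ]
    prob = cp.Problem(cp.Minimize(cp.trace(M)), constraints)
    prob.solve()
    return M.value
```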

 16. Near-Optimal Recovery for Toeplitz Low-Rank Structure
         minimize tr(M)   s.t. ‖A(M) − y‖₂ ≤ ε₂,  M ⪰ 0,  M is Toeplitz
     Theorem 2 (Toeplitz Low Rank). With high probability, for all Toeplitz Σ with rank(Σ) ≤ r, the solution Σ̂ to ToepTraceMin obeys
         ‖Σ̂ − Σ‖_F ≲ ε₂ / √m,
     where the error is due to noise, provided that m ≳ r · polylog(n).
     • Exact recovery in the absence of noise
     • Universal recovery: simultaneously works for all Toeplitz low-rank matrices
     • Stable recovery against bounded noise

 17. Phase Transition for Toeplitz Low-Rank Recovery
     [Figure: empirical success probability over Monte Carlo trials, n = 50; horizontal axis m (number of measurements), vertical axis r (rank), with the theoretic sampling limit overlaid]
     • Near-optimal storage complexity!
       ◦ degrees of freedom ≈ r

 18. Simultaneous Structure
     • Joint structure: Σ is simultaneously sparse and low-rank
       ◦ rank: r
       ◦ sparsity: k
       ◦ SVD: Σ = U Λ U^⊤, where U = [u_1, …, u_r]
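For concreteness, a jointly k-sparse, rank-r covariance of this form can be built as below; the sizes and the random model are illustrative assumptions.

```python
# Toy construction of a simultaneously sparse and low-rank covariance
# Sigma = U Lam U^T, with each column u_i supported on k coordinates.
import numpy as np

rng = np.random.default_rng(2)
n, r, k = 50, 2, 5                                  # assumed sizes

U = np.zeros((n, r))
for j in range(r):
    support = rng.choice(n, size=k, replace=False)  # k-sparse column
    U[support, j] = rng.standard_normal(k)
    U[:, j] /= np.linalg.norm(U[:, j])
Lam = np.diag(rng.uniform(1.0, 2.0, size=r))        # positive eigenvalues
Sigma = U @ Lam @ U.T                               # sparse and low-rank
```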

 19. Convex Relaxation for Simultaneous Structure
     • Convex relaxation:
           minimize_M  trace(M) + λ‖M‖₁      ← low rank + sparsity
           s.t.        ‖A(M) − y‖₁ ≤ ε       ← noise bound
                       M ⪰ 0
       ◦ coincides with the formulation of Li and Voroninski in the rank-1 case
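In the prototyping style used earlier, this relaxation only changes the objective of trace_min; λ is a tuning parameter the user must supply.

```python
# Sketch of the sparse + low-rank relaxation: trace plus elementwise l1 penalty.
import cvxpy as cp
import numpy as np

def sparse_lowrank_sdp(A: np.ndarray, y: np.ndarray, eps: float, lam: float):
    n = A.shape[1]
    M = cp.Variable((n, n), PSD=True)
    quad = cp.sum(cp.multiply(A @ M, A), axis=1)       # A(M)
    objective = cp.trace(M) + lam * cp.sum(cp.abs(M))  # low rank + sparsity
    prob = cp.Problem(cp.Minimize(objective),
                      [cp.norm(quad - y, 1) <= eps])
    prob.solve()
    return M.value
```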

 20. Exact Recovery for Simultaneous Structure
         minimize tr(M) + λ‖M‖₁   s.t. A(M) = y,  M ⪰ 0
     Theorem 3 (Simultaneous Structure). The SDP with λ ∈ [1/n, 1/N_Σ] is exact with high probability, provided that
         m ≳ (r log n) / λ²,                                   (1)
     where N_Σ := max{ ‖sign(Σ_Ω)‖, √( k / ((1/r) Σ_{i=1}^r ‖u_i‖₁²) ) }.
     • Exact recovery with appropriate regularization parameters
     • Question: how good is the storage complexity (1)?

 21. Compressible Covariance Matrices: Near-Optimal Recovery
     Definition (Compressible Matrices)
     • the non-zero entries of u_i exhibit power-law decays
       ◦ ‖u_i‖₁ = O(polylog(n))
     Corollary 1 (Compressible Case). For compressible covariance matrices, the SDP with λ ≍ 1/√k is exact w.h.p., provided that
         m ≳ kr · polylog(n).
     • Near-minimal measurements!
       ◦ degrees of freedom: Θ(kr)

 22. Stability and Robustness
     • noise: ‖η‖₁ ≤ ε
     • imperfect structural assumption: Σ = Σ_Ω + Σ_c, where Σ_Ω is simultaneously sparse and low-rank and Σ_c collects the residuals
     Theorem 4. Under the same λ as in Theorem 3 or Corollary 1,
         ‖Σ̂ − Σ_Ω‖_F ≲ (1/√r) ( ‖Σ_c‖_∗ + λ‖Σ_c‖₁ ) + ε / m,
     where the first term is due to imperfect structure and the second due to noise.
     • stable against bounded noise
     • robust against imperfect structural assumptions

 23. Mixed-Norm RIP (for Low-Rank and Joint Structure)
     • Restricted isometry property (RIP): a powerful notion from compressed sensing
           ∀ X in some class:  ‖B(X)‖₂ ≈ ‖X‖_F
       ◦ unfortunately, it does NOT hold for quadratic models
     • A mixed-norm variant, RIP-ℓ₂/ℓ₁:
           ∀ X in some class:  ‖B(X)‖₁ ≈ ‖X‖_F
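A quick numerical illustration (not a verification of the theory) of the mixed-norm idea: for random low-rank test matrices X, the normalized quantity (1/m)‖B(X)‖₁ stays within a constant band of ‖X‖_F. All sizes and the Gaussian model are assumptions.

```python
# Empirical look at RIP-l2/l1: (1/m) * ||B(X)||_1 vs ||X||_F over random rank-r X.
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 30, 300, 2
A = rng.standard_normal((m, n))                  # sampling vectors a_i

ratios = []
for _ in range(20):
    U = rng.standard_normal((n, r))
    X = U @ U.T                                  # random rank-r test matrix
    BX = np.einsum('mi,ij,mj->m', A, X, A)       # B(X)_i = a_i^T X a_i
    ratios.append(np.abs(BX).sum() / m / np.linalg.norm(X, 'fro'))

print(min(ratios), max(ratios))                  # stays within a constant band
```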
