Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming
Yuxin Chen†, Yuejie Chi∗, Andrea J. Goldsmith†
†Stanford University, ∗Ohio State University
High-Dimensional Sequential Data / Signals
• Data Streams / Stochastic Processes
◦ each data instance can be high-dimensional
◦ we are interested in the information in the data rather than the data themselves
• Covariance Estimation
◦ second-order statistics Σ ∈ R^{n×n}
◦ a cornerstone of many information processing tasks
What are Quadratic Measurements?
• Quadratic Measurements
◦ obtain m measurements of Σ taking the form y_i ≈ a_i^⊤ Σ a_i (1 ≤ i ≤ m)
◦ rank-1 measurements!
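To make the measurement model concrete, here is a minimal numerical sketch (ours, not from the talk) that draws i.i.d. Gaussian sampling vectors — one instance of the sub-Gaussian model used later — and forms the rank-1 quadratic measurements; the function name and toy dimensions are illustrative:

```python
import numpy as np

def quadratic_measurements(Sigma, m, rng=np.random.default_rng(0)):
    """Form y_i = a_i^T Sigma a_i for m i.i.d. Gaussian sampling vectors a_i."""
    n = Sigma.shape[0]
    A = rng.standard_normal((m, n))            # row i is the sampling vector a_i
    y = np.einsum('ij,jk,ik->i', A, Sigma, A)  # y_i = a_i^T Sigma a_i (rank-1 measurement)
    return y, A

# toy example: a rank-2 covariance in dimension n = 10
rng = np.random.default_rng(1)
n, r = 10, 2
U = rng.standard_normal((n, r))
Sigma = U @ U.T                                # PSD, rank r
y, A = quadratic_measurements(Sigma, m=4 * r * n)
```

Each y_i equals ⟨a_i a_i^⊤, Σ⟩, an inner product with a rank-1 matrix — hence "rank-1 measurements."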
Example: Applications in Spectral Estimation
• High-frequency wireless and signal processing (energy measurements)
◦ spectral estimation of stationary processes (possibly sparse)
◦ channel estimation in MIMO channels
Example: Applications in Optics
• Phase Space Tomography
◦ measure correlation functions of a wave field
[figure: measured correlation functions, courtesy of Chi et al.]
• Phase Retrieval
◦ signal recovery from magnitude measurements
[figure courtesy of Candès et al.]
Example: Applications in Data Streams
• Covariance Sketching
◦ data stream: real-time data {x_t}_{t=1}^∞ arriving sequentially at a high rate
• Challenges
◦ limited memory
◦ computational efficiency
◦ ideally a single pass over the data
[figure: binary data stream, by Kazmin]
Proposed Quadratic Sketching Method
1) Sketching:
◦ at each time t, obtain a quadratic sketch (a_i^⊤ x_t)^2 — a_i: sketching vector
2) Aggregation:
◦ all sketches are aggregated into m measurements
y_i = a_i^⊤ ( (1/T) ∑_{t=1}^T x_t x_t^⊤ ) a_i ≈ a_i^⊤ Σ a_i,  1 ≤ i ≤ m
• Benefits:
◦ one pass
◦ minimal storage (as will be shown)
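A minimal one-pass implementation of this sketching-plus-aggregation pipeline (our illustration, not the authors' code): each arriving x_t updates m running sums of (a_i^⊤ x_t)^2, so the stream is never revisited and storage is O(mn) for the sketching vectors plus O(m) for the sketches:

```python
import numpy as np

class QuadraticSketch:
    """One-pass covariance sketch maintaining y_i = (1/T) sum_t (a_i^T x_t)^2."""
    def __init__(self, n, m, rng=np.random.default_rng(0)):
        self.A = rng.standard_normal((m, n))  # fixed sketching vectors a_i (rows)
        self.y = np.zeros(m)                  # running sums of quadratic sketches
        self.T = 0                            # number of samples seen so far

    def update(self, x_t):
        self.y += (self.A @ x_t) ** 2         # quadratic sketch (a_i^T x_t)^2, all i at once
        self.T += 1

    def measurements(self):
        return self.y / max(self.T, 1)        # y_i ~= a_i^T Sigma a_i for large T
```

Note that the aggregation never materializes the n × n sample covariance; the running averages converge to a_i^⊤ Σ a_i as T grows.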
Problem Formulation
• Given: m (≪ n²) quadratic measurements y = {y_i}_{i=1}^m
y_i = a_i^⊤ Σ a_i + η_i,  i = 1, …, m
◦ a_i: sampling vectors
◦ η = {η_i}_{i=1}^m: noise terms
◦ more concise operator form: y = A(Σ) + η
• Goal: recover Σ ∈ R^{n×n}
• Sampling model
◦ i.i.d. sub-Gaussian sampling vectors
Geometry of Covariance Structure
• # unknowns > # stored measurements
◦ exploit low-dimensional structures!
• Structures considered in this talk:
1) low rank
2) Toeplitz low rank
3) simultaneously sparse and low rank
[figure: Piet Mondrian]
Low Rank
• Low-Rank Structure:
◦ a few components explain most of the data variability
◦ metric learning, array signal processing, collaborative filtering, ...
• rank(Σ) = r ≪ n
Trace Minimization for Low-Rank Structure
• Trace Minimization (TraceMin)
minimize_M  trace(M)    ← promotes low rank
s.t.  ‖A(M) − y‖_1 ≤ ε    ← noise bound
      M ⪰ 0
◦ inspired by Candès et al. for phase retrieval
Near-Optimal Recovery for Low-Rank Structure
minimize tr(M)  s.t.  ‖A(M) − y‖_1 ≤ ε,  M ⪰ 0
Theorem 1 (Low Rank). With high probability, for all Σ with rank(Σ) ≤ r, the solution Σ̂ to TraceMin obeys
‖Σ̂ − Σ‖_F ≲ ‖Σ − Σ_r‖_* / √r  (due to imperfect structure)  +  ε/m  (due to noise),
provided that m ≳ rn. (Σ_r: best rank-r approximation of Σ)
• Exact recovery in the noiseless case
• Universal recovery: simultaneously works for all low-rank matrices
• Robust recovery when Σ is approximately low-rank
• Stable recovery against bounded noise
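A sketch of how TraceMin could be posed in CVXPY (an illustration under the toy Gaussian setup above, not the authors' solver; the function name and the choice of `eps` are ours). The map M ↦ A(M) = [a_i^⊤ M a_i]_{i=1..m} is linear in M, so this is an ordinary SDP:

```python
import cvxpy as cp

def trace_min(A, y, eps):
    """TraceMin: minimize trace(M) s.t. ||A(M) - y||_1 <= eps, M PSD."""
    n = A.shape[1]
    M = cp.Variable((n, n), PSD=True)            # enforces M >= 0 (symmetric PSD)
    AM = cp.sum(cp.multiply(A @ M, A), axis=1)   # vector of a_i^T M a_i, i = 1..m
    prob = cp.Problem(cp.Minimize(cp.trace(M)),
                      [cp.norm(AM - y, 1) <= eps])
    prob.solve()
    return M.value
```

In the noiseless case one can take eps = 0, matching the exact-recovery claim above.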
Phase Transition for Low-Rank Recovery
[figure: empirical success probability over Monte Carlo trials (n = 50), rank ratio r/n versus sampling ratio m/n², with the theoretic sampling limit overlaid]
• Near-Optimal Storage Complexity!
◦ degrees of freedom ≈ rn
Toeplitz Low Rank
• Toeplitz Low-Rank Structure:
◦ spectral sparsity!
∗ possibly off-the-grid frequency spikes (Vandermonde decomposition, recalled below)
◦ wireless communication, array signal processing, ...
• rank(Σ) = r ≪ n
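For context, the Vandermonde decomposition invoked here is the classical Carathéodory–Fejér fact (our restatement, with notation not on the slide): a PSD Toeplitz matrix of rank r < n splits into r frequency spikes that need not lie on any discrete grid,

```latex
\Sigma \;=\; \sum_{i=1}^{r} d_i\, v(f_i)\, v(f_i)^{H}, \qquad d_i > 0,
\quad v(f) = \bigl[1,\; e^{j2\pi f},\; \dots,\; e^{j2\pi (n-1) f}\bigr]^{\top},
\quad f_i \in [0,1).
```

This is exactly why Toeplitz low rank encodes spectral sparsity: the rank equals the number of (possibly off-grid) spectral lines.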
Trace Minimization for Toeplitz Low-Rank Structure
• Trace Minimization (ToepTraceMin)
minimize_M  trace(M)    ← promotes low rank
s.t.  ‖A(M) − y‖_2 ≤ ε₂    ← noise bound
      M ⪰ 0,  M is Toeplitz
Near-Optimal Recovery for Toeplitz Low-Rank Structure
minimize tr(M)  s.t.  ‖A(M) − y‖_2 ≤ ε₂,  M ⪰ 0,  M is Toeplitz
Theorem 2 (Toeplitz Low Rank). With high probability, for all Toeplitz Σ with rank(Σ) ≤ r, the solution Σ̂ to ToepTraceMin obeys
‖Σ̂ − Σ‖_F ≲ ε₂ / √m  (due to noise),
provided that m ≳ r · poly log(n).
[figure: Toeplitz ball]
• Exact recovery in the absence of noise
• Universal recovery: simultaneously works for all Toeplitz low-rank matrices
• Stable recovery against bounded noise
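The Toeplitz requirement is a linear constraint, so ToepTraceMin is still an SDP; one illustrative CVXPY formulation (ours) parameterizes M by its first row c and reuses the measurement map from trace_min above:

```python
import cvxpy as cp

def toep_trace_min(A, y, eps2):
    """ToepTraceMin: minimize trace(M) s.t. ||A(M) - y||_2 <= eps2, M PSD Toeplitz."""
    n = A.shape[1]
    c = cp.Variable(n)                            # first row/column of M
    # symmetric Toeplitz matrix built from c: M[i, j] = c[|i - j|]
    M = cp.bmat([[c[abs(i - j)] for j in range(n)] for i in range(n)])
    AM = cp.sum(cp.multiply(A @ M, A), axis=1)    # vector of a_i^T M a_i
    prob = cp.Problem(cp.Minimize(cp.trace(M)),
                      [cp.norm(AM - y, 2) <= eps2, M >> 0])
    prob.solve()
    return M.value
```

Only n scalar variables appear rather than n², reflecting the Toeplitz structure the theorem exploits.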
Phase Transition for Toeplitz Low-Rank Recovery
[figure: empirical success probability over Monte Carlo trials (n = 50), rank r versus number of measurements m, with the theoretic sampling limit overlaid]
• Near-Optimal Storage Complexity!
◦ degrees of freedom ≈ r
Simultaneous Structure
• Joint Structure: Σ is simultaneously sparse and low-rank.
◦ rank: r
◦ sparsity: k
◦ SVD: Σ = U Λ U^⊤, where U = [u_1, …, u_r]
Convex Relaxation for Simultaneous Structure
• Convex Relaxation
minimize_M  trace(M) + λ‖M‖_1    ← low rank + sparsity
s.t.  ‖A(M) − y‖_1 ≤ ε    ← noise bound
      M ⪰ 0
◦ coincides with Li and Voroninski for the rank-1 case
Exact Recovery for Simultaneous Structure
minimize tr(M) + λ‖M‖_1  s.t.  A(M) = y,  M ⪰ 0
Theorem 3 (Simultaneous Structure). The SDP with λ ∈ [1/n, 1/N_Σ] is exact with high probability, provided that
m ≳ (r log n) / λ²,    (1)
where N_Σ := max{ ‖sign(Σ_Ω)‖, √( (k/r) ∑_{i=1}^r ‖u_i‖_1² ) }.
• Exact recovery with appropriate regularization parameters
• Question: how good is the storage complexity (1)?
Compressible Covariance Matrices: Near-Optimal Recovery
Definition (Compressible Matrices)
• non-zero entries of u_i exhibit power-law decay
◦ ‖u_i‖_1 = O(poly log(n))
Corollary 1 (Compressible Case). For compressible covariance matrices, the SDP with λ ≍ 1/√k is exact w.h.p., provided that
m ≳ kr · poly log(n).
• Near-Minimal Measurements!
◦ degrees of freedom: Θ(kr)
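An illustrative CVXPY version of this regularized SDP (ours; noiseless equality constraints as on the previous slide, with λ = 1/√k per the compressible-case corollary — both choices are assumptions of this sketch):

```python
import cvxpy as cp
import numpy as np

def sparse_lowrank_sdp(A, y, k):
    """minimize trace(M) + lam * ||M||_1  s.t.  A(M) = y,  M PSD."""
    n = A.shape[1]
    lam = 1.0 / np.sqrt(k)                        # regularization, compressible case
    M = cp.Variable((n, n), PSD=True)
    AM = cp.sum(cp.multiply(A @ M, A), axis=1)    # vector of a_i^T M a_i
    obj = cp.trace(M) + lam * cp.sum(cp.abs(M))   # trace + elementwise l1 norm
    prob = cp.Problem(cp.Minimize(obj), [AM == y])
    prob.solve()
    return M.value
```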
Stability and Robustness
• noise: ‖η‖_1 ≤ ε
• imperfect structural assumption: Σ = Σ_Ω (simultaneously sparse and low-rank) + Σ_c (residual)
Theorem 4. Under the same λ as in Theorem 3 or Corollary 1,
‖Σ̂ − Σ_Ω‖_F ≲ (1/√r)( ‖Σ_c‖_* + λ‖Σ_c‖_1 )  (due to imperfect structure)  +  ε/m  (due to noise)
• stable against bounded noise
• robust against imperfect structural assumptions
Mixed-Norm RIP (for Low-Rank and Joint Structure)
• Restricted Isometry Property: a powerful notion in compressed sensing
∀ X in some class:  ‖B(X)‖_2 ≈ ‖X‖_F
◦ unfortunately, it does NOT hold for quadratic models
• A Mixed-Norm Variant: RIP-ℓ2/ℓ1
∀ X in some class:  ‖B(X)‖_1 ≈ ‖X‖_F
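An informal numerical illustration of the contrast (our own experiment, using the raw quadratic map rather than the paper's debiased operator B): over random unit-Frobenius-norm rank-1 matrices X, the normalized ℓ1 norm of the measurements concentrates tightly, which is the behavior RIP-ℓ2/ℓ1 captures:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 50, 500, 200

ratios = []
for _ in range(trials):
    u = rng.standard_normal(n)
    X = np.outer(u, u)
    X /= np.linalg.norm(X, 'fro')               # rank-1, ||X||_F = 1
    A = rng.standard_normal((m, n))
    BX = np.einsum('ij,jk,ik->i', A, X, A)      # quadratic measurements of X
    ratios.append(np.abs(BX).sum() / m)         # normalized l1 norm of B(X)

print(f"||B(X)||_1 / m over {trials} trials: "
      f"mean = {np.mean(ratios):.3f}, std = {np.std(ratios):.3f}")
```

The small spread around the mean is the ℓ1 concentration; the corresponding ℓ2 quantity is dominated by a few heavy-tailed entries and fails to concentrate uniformly over all low-rank X, which is why the standard RIP does not hold for quadratic models.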