Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming
Yuxin Chen†, Yuejie Chi∗, Andrea J. Goldsmith†
†Stanford University, ∗Ohio State University
High-Dimensional Sequential Data / Signals
• Data Streams / Stochastic Processes
◦ each data instance can be high-dimensional
◦ we are interested in the information in the data rather than the data themselves
• Covariance Estimation
◦ second-order statistics Σ ∈ R^{n×n}
◦ a cornerstone of many information processing tasks
What are Quadratic Measurements?
• Quadratic Measurements
◦ obtain m measurements of Σ taking the form y_i ≈ a_i^⊤ Σ a_i (1 ≤ i ≤ m)
◦ rank-1 measurements!
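To make the measurement model concrete, here is a minimal numerical sketch (ours, not from the talk) that draws i.i.d. Gaussian sampling vectors — one instance of the sub-Gaussian model used later — and forms the rank-1 quadratic measurements; the function name and toy dimensions are illustrative:

```python
import numpy as np

def quadratic_measurements(Sigma, m, rng=np.random.default_rng(0)):
    """Form y_i = a_i^T Sigma a_i for m i.i.d. Gaussian sampling vectors a_i."""
    n = Sigma.shape[0]
    A = rng.standard_normal((m, n))            # row i is the sampling vector a_i
    y = np.einsum('ij,jk,ik->i', A, Sigma, A)  # y_i = a_i^T Sigma a_i (rank-1 measurement)
    return y, A

# toy example: a rank-2 covariance in dimension n = 10
rng = np.random.default_rng(1)
n, r = 10, 2
U = rng.standard_normal((n, r))
Sigma = U @ U.T                                # PSD, rank r
y, A = quadratic_measurements(Sigma, m=4 * r * n)
```

Each y_i equals ⟨a_i a_i^⊤, Σ⟩, an inner product with a rank-1 matrix — hence "rank-1 measurements."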
Example: Applications in Spectral Estimation
• High-frequency wireless and signal processing (energy measurements)
◦ spectral estimation of stationary processes (possibly sparse)
◦ channel estimation in MIMO channels
Example: Applications in Optics
• Phase Space Tomography
◦ measure correlation functions of a wave field
[figure: measured correlation functions, courtesy of Chi et al.]
• Phase Retrieval
◦ signal recovery from magnitude measurements
[figure courtesy of Candès et al.]
Example: Applications in Data Streams
• Covariance Sketching
◦ data stream: real-time data {x_t}_{t=1}^∞ arriving sequentially at a high rate
• Challenges
◦ limited memory
◦ computational efficiency
◦ ideally a single pass over the data
[figure: binary data stream, by Kazmin]
Proposed Quadratic Sketching Method
1) Sketching:
◦ at each time t, obtain a quadratic sketch (a_i^⊤ x_t)^2 — a_i: sketching vector
2) Aggregation:
◦ all sketches are aggregated into m measurements
y_i = a_i^⊤ ( (1/T) ∑_{t=1}^T x_t x_t^⊤ ) a_i ≈ a_i^⊤ Σ a_i,  1 ≤ i ≤ m
• Benefits:
◦ one pass
◦ minimal storage (as will be shown)
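A minimal one-pass implementation of this sketching-plus-aggregation pipeline (our illustration, not the authors' code): each arriving x_t updates m running sums of (a_i^⊤ x_t)^2, so the stream is never revisited and storage is O(mn) for the sketching vectors plus O(m) for the sketches:

```python
import numpy as np

class QuadraticSketch:
    """One-pass covariance sketch maintaining y_i = (1/T) sum_t (a_i^T x_t)^2."""
    def __init__(self, n, m, rng=np.random.default_rng(0)):
        self.A = rng.standard_normal((m, n))  # fixed sketching vectors a_i (rows)
        self.y = np.zeros(m)                  # running sums of quadratic sketches
        self.T = 0                            # number of samples seen so far

    def update(self, x_t):
        self.y += (self.A @ x_t) ** 2         # quadratic sketch (a_i^T x_t)^2, all i at once
        self.T += 1

    def measurements(self):
        return self.y / max(self.T, 1)        # y_i ~= a_i^T Sigma a_i for large T
```

Note that the aggregation never materializes the n × n sample covariance; the running averages converge to a_i^⊤ Σ a_i as T grows.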
Problem Formulation
• Given: m (≪ n²) quadratic measurements y = {y_i}_{i=1}^m
y_i = a_i^⊤ Σ a_i + η_i,  i = 1, …, m
◦ a_i: sampling vectors
◦ η = {η_i}_{i=1}^m: noise terms
◦ more concise operator form: y = A(Σ) + η
• Goal: recover Σ ∈ R^{n×n}
• Sampling model
◦ i.i.d. sub-Gaussian sampling vectors
Geometry of Covariance Structure
• # unknowns > # stored measurements
◦ exploit low-dimensional structures!
• Structures considered in this talk:
1) low rank
2) Toeplitz low rank
3) simultaneously sparse and low rank
[figure: Piet Mondrian]
Low Rank
• Low-Rank Structure:
◦ a few components explain most of the data variability
◦ metric learning, array signal processing, collaborative filtering, ...
• rank(Σ) = r ≪ n
Trace Minimization for Low-Rank Structure
• Trace Minimization (TraceMin)
minimize_M  trace(M)    ← promotes low rank
s.t.  ‖A(M) − y‖_1 ≤ ε    ← noise bound
      M ⪰ 0
◦ inspired by Candès et al. for phase retrieval
Near-Optimal Recovery for Low-Rank Structure
minimize tr(M)  s.t.  ‖A(M) − y‖_1 ≤ ε,  M ⪰ 0
Theorem 1 (Low Rank). With high probability, for all Σ with rank(Σ) ≤ r, the solution Σ̂ to TraceMin obeys
‖Σ̂ − Σ‖_F ≲ ‖Σ − Σ_r‖_* / √r  (due to imperfect structure)  +  ε/m  (due to noise),
provided that m ≳ rn. (Σ_r: best rank-r approximation of Σ)
• Exact recovery in the noiseless case
• Universal recovery: simultaneously works for all low-rank matrices
• Robust recovery when Σ is approximately low-rank
• Stable recovery against bounded noise
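A sketch of how TraceMin could be posed in CVXPY (an illustration under the toy Gaussian setup above, not the authors' solver; the function name and the choice of `eps` are ours). The map M ↦ A(M) = [a_i^⊤ M a_i]_{i=1..m} is linear in M, so this is an ordinary SDP:

```python
import cvxpy as cp

def trace_min(A, y, eps):
    """TraceMin: minimize trace(M) s.t. ||A(M) - y||_1 <= eps, M PSD."""
    n = A.shape[1]
    M = cp.Variable((n, n), PSD=True)            # enforces M >= 0 (symmetric PSD)
    AM = cp.sum(cp.multiply(A @ M, A), axis=1)   # vector of a_i^T M a_i, i = 1..m
    prob = cp.Problem(cp.Minimize(cp.trace(M)),
                      [cp.norm(AM - y, 1) <= eps])
    prob.solve()
    return M.value
```

In the noiseless case one can take eps = 0, matching the exact-recovery claim above.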
Phase Transition for Low-Rank Recovery
[figure: empirical success probability over Monte Carlo trials (n = 50), rank ratio r/n versus sampling ratio m/n², with the theoretic sampling limit overlaid]
• Near-Optimal Storage Complexity!
◦ degrees of freedom ≈ rn
Toeplitz Low Rank
• Toeplitz Low-Rank Structure:
◦ spectral sparsity!
∗ possibly off-the-grid frequency spikes (Vandermonde decomposition, recalled below)
◦ wireless communication, array signal processing, ...
• rank(Σ) = r ≪ n
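For context, the Vandermonde decomposition invoked here is the classical Carathéodory–Fejér fact (our restatement, with notation not on the slide): a PSD Toeplitz matrix of rank r < n splits into r frequency spikes that need not lie on any discrete grid,

```latex
\Sigma \;=\; \sum_{i=1}^{r} d_i\, v(f_i)\, v(f_i)^{H}, \qquad d_i > 0,
\quad v(f) = \bigl[1,\; e^{j2\pi f},\; \dots,\; e^{j2\pi (n-1) f}\bigr]^{\top},
\quad f_i \in [0,1).
```

This is exactly why Toeplitz low rank encodes spectral sparsity: the rank equals the number of (possibly off-grid) spectral lines.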
Trace Minimization for Toeplitz Low-Rank Structure
• Trace Minimization (ToepTraceMin)
minimize_M  trace(M)    ← promotes low rank
s.t.  ‖A(M) − y‖_2 ≤ ε₂    ← noise bound
      M ⪰ 0,  M is Toeplitz
Near-Optimal Recovery for Toeplitz Low-Rank Structure
minimize tr(M)  s.t.  ‖A(M) − y‖_2 ≤ ε₂,  M ⪰ 0,  M is Toeplitz
Theorem 2 (Toeplitz Low Rank). With high probability, for all Toeplitz Σ with rank(Σ) ≤ r, the solution Σ̂ to ToepTraceMin obeys
‖Σ̂ − Σ‖_F ≲ ε₂ / √m  (due to noise),
provided that m ≳ r · poly log(n).
[figure: Toeplitz ball]
• Exact recovery in the absence of noise
• Universal recovery: simultaneously works for all Toeplitz low-rank matrices
• Stable recovery against bounded noise
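The Toeplitz requirement is a linear constraint, so ToepTraceMin is still an SDP; one illustrative CVXPY formulation (ours) parameterizes M by its first row c and reuses the measurement map from trace_min above:

```python
import cvxpy as cp

def toep_trace_min(A, y, eps2):
    """ToepTraceMin: minimize trace(M) s.t. ||A(M) - y||_2 <= eps2, M PSD Toeplitz."""
    n = A.shape[1]
    c = cp.Variable(n)                            # first row/column of M
    # symmetric Toeplitz matrix built from c: M[i, j] = c[|i - j|]
    M = cp.bmat([[c[abs(i - j)] for j in range(n)] for i in range(n)])
    AM = cp.sum(cp.multiply(A @ M, A), axis=1)    # vector of a_i^T M a_i
    prob = cp.Problem(cp.Minimize(cp.trace(M)),
                      [cp.norm(AM - y, 2) <= eps2, M >> 0])
    prob.solve()
    return M.value
```

Only n scalar variables appear rather than n², reflecting the Toeplitz structure the theorem exploits.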
Phase Transition for Toeplitz Low-Rank Recovery
[figure: empirical success probability over Monte Carlo trials (n = 50), rank r versus number of measurements m, with the theoretic sampling limit overlaid]
• Near-Optimal Storage Complexity!
◦ degrees of freedom ≈ r
Simultaneous Structure
• Joint Structure: Σ is simultaneously sparse and low-rank.
◦ rank: r
◦ sparsity: k
◦ SVD: Σ = U Λ U^⊤, where U = [u_1, …, u_r]
Convex Relaxation for Simultaneous Structure
• Convex Relaxation
minimize_M  trace(M) + λ‖M‖_1    ← low rank + sparsity
s.t.  ‖A(M) − y‖_1 ≤ ε    ← noise bound
      M ⪰ 0
◦ coincides with Li and Voroninski for the rank-1 case
Exact Recovery for Simultaneous Structure
minimize tr(M) + λ‖M‖_1  s.t.  A(M) = y,  M ⪰ 0
Theorem 3 (Simultaneous Structure). The SDP with λ ∈ [1/n, 1/N_Σ] is exact with high probability, provided that
m ≳ (r log n) / λ²,    (1)
where N_Σ := max{ ‖sign(Σ_Ω)‖, √( (k/r) ∑_{i=1}^r ‖u_i‖_1² ) }.
• Exact recovery with appropriate regularization parameters
• Question: how good is the storage complexity (1)?
Compressible Covariance Matrices: Near-Optimal Recovery
Definition (Compressible Matrices)
• non-zero entries of u_i exhibit power-law decay
◦ ‖u_i‖_1 = O(poly log(n))
Corollary 1 (Compressible Case). For compressible covariance matrices, the SDP with λ ≍ 1/√k is exact w.h.p., provided that
m ≳ kr · poly log(n).
• Near-Minimal Measurements!
◦ degrees of freedom: Θ(kr)
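An illustrative CVXPY version of this regularized SDP (ours; noiseless equality constraints as on the previous slide, with λ = 1/√k per the compressible-case corollary — both choices are assumptions of this sketch):

```python
import cvxpy as cp
import numpy as np

def sparse_lowrank_sdp(A, y, k):
    """minimize trace(M) + lam * ||M||_1  s.t.  A(M) = y,  M PSD."""
    n = A.shape[1]
    lam = 1.0 / np.sqrt(k)                        # regularization, compressible case
    M = cp.Variable((n, n), PSD=True)
    AM = cp.sum(cp.multiply(A @ M, A), axis=1)    # vector of a_i^T M a_i
    obj = cp.trace(M) + lam * cp.sum(cp.abs(M))   # trace + elementwise l1 norm
    prob = cp.Problem(cp.Minimize(obj), [AM == y])
    prob.solve()
    return M.value
```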
Stability and Robustness
• noise: ‖η‖_1 ≤ ε
• imperfect structural assumption: Σ = Σ_Ω (simultaneously sparse and low-rank) + Σ_c (residual)
Theorem 4. Under the same λ as in Theorem 3 or Corollary 1,
‖Σ̂ − Σ_Ω‖_F ≲ (1/√r)( ‖Σ_c‖_* + λ‖Σ_c‖_1 )  (due to imperfect structure)  +  ε/m  (due to noise)
• stable against bounded noise
• robust against imperfect structural assumptions
Mixed-Norm RIP (for Low-Rank and Joint Structure)
• Restricted Isometry Property: a powerful notion in compressed sensing
∀ X in some class:  ‖B(X)‖_2 ≈ ‖X‖_F
◦ unfortunately, it does NOT hold for quadratic models
• A Mixed-Norm Variant: RIP-ℓ2/ℓ1
∀ X in some class:  ‖B(X)‖_1 ≈ ‖X‖_F
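An informal numerical illustration of the contrast (our own experiment, using the raw quadratic map rather than the paper's debiased operator B): over random unit-Frobenius-norm rank-1 matrices X, the normalized ℓ1 norm of the measurements concentrates tightly, which is the behavior RIP-ℓ2/ℓ1 captures:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 50, 500, 200

ratios = []
for _ in range(trials):
    u = rng.standard_normal(n)
    X = np.outer(u, u)
    X /= np.linalg.norm(X, 'fro')               # rank-1, ||X||_F = 1
    A = rng.standard_normal((m, n))
    BX = np.einsum('ij,jk,ik->i', A, X, A)      # quadratic measurements of X
    ratios.append(np.abs(BX).sum() / m)         # normalized l1 norm of B(X)

print(f"||B(X)||_1 / m over {trials} trials: "
      f"mean = {np.mean(ratios):.3f}, std = {np.std(ratios):.3f}")
```

The small spread around the mean is the ℓ1 concentration; the corresponding ℓ2 quantity is dominated by a few heavy-tailed entries and fails to concentrate uniformly over all low-rank X, which is why the standard RIP does not hold for quadratic models.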