  1. Recovery of sparse signals from a mixture of linear samples Arya Mazumdar Soumyabrata Pal University of Massachusetts Amherst June 15, 2020 ICML 2020

  2. A relationship between features and labels: x is the feature and y is the label. Consider the tuple (x, y) with y = f(x).

  3. Example: Music Perception

  4. Application of Mixture of ML Models • Multi-modal data, Heterogeneous data • Recent Works: Stadler, Buhlmann, De Geer, 2010; Faria and Soromenho, 2010; Chaganty and Liang, 2013 • Yi, Caramanis, Sanghavi 2014-2016: Algorithms • An expressive and rich model • Modeling a complicated relation as a mixture of simple components • Advantage: Clean theoretical analysis

  5. Semi-supervised Active Learning framework: Advantages • In this framework, we can carefully design the data to query for labels. • Objective: Recover the parameters of the models with the minimum number of queries/samples. • Advantages: 1. Can avoid the millions of parameters used by a deep learning model to fit the data! 2. Learn with significantly less data! 3. We can use crowd knowledge, which is difficult to incorporate into an algorithm. • Crowdsourcing/active learning has become very popular but is expensive (Dasgupta et al., Freund et al.)

  6. Mixture of sparse linear regression • Suppose we have two unknown distinct vectors β₁, β₂ ∈ Rⁿ and an oracle O: Rⁿ → R. • We assume that β₁, β₂ have k significant entries, where k ≪ n. • The oracle O takes as input a vector x ∈ Rⁿ and returns a noisy output (sample) y ∈ R: y = ⟨x, β⟩ + ζ, where β ∼ U{β₁, β₂} and ζ ∼ N(0, σ²) with known σ. • Generalization of Compressed Sensing
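A minimal NumPy sketch of this query model (the helper make_oracle and the toy sizes below are illustrative, not from the paper):

    import numpy as np

    def make_oracle(beta1, beta2, sigma, rng):
        """Query oracle: on input x, pick beta uniformly from {beta1, beta2}
        and return <x, beta> plus N(0, sigma^2) noise."""
        def oracle(x):
            beta = beta1 if rng.random() < 0.5 else beta2
            return float(x @ beta) + rng.normal(0.0, sigma)
        return oracle

    # Toy instance: two k-sparse vectors in R^n (illustrative values).
    rng = np.random.default_rng(0)
    n, k, sigma = 1000, 5, 0.1
    beta1 = np.zeros(n); beta1[rng.choice(n, k, replace=False)] = 1.0
    beta2 = np.zeros(n); beta2[rng.choice(n, k, replace=False)] = -1.0
    oracle = make_oracle(beta1, beta2, sigma, rng)
    y = oracle(rng.standard_normal(n))   # one noisy sample from the mixture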

  7. Mixture of sparse linear regression • We also define the Signal-to-Noise Ratio (SNR) for a query x as SNR(x) ≜ E|⟨x, β₁ − β₂⟩|² / E ζ², and SNR = max_x SNR(x). • Objective: For each β ∈ {β₁, β₂}, we want to recover β̂ such that ||β̂ − β|| ≤ c||β − β(k)|| + γ, where β(k) is the best k-sparse approximation of β, using the minimum number of queries for a fixed SNR.
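A quick sanity check (not on the slide), assuming the query is drawn as x ∼ N(0, I_n) independently of the noise and of which β is picked:

    \[
      \mathbb{E}\,\big|\langle x,\ \beta_1-\beta_2\rangle\big|^2
        = \lVert \beta_1-\beta_2\rVert_2^2
      \quad\Longrightarrow\quad
      \mathrm{SNR}(x)
        = \frac{\mathbb{E}\,|\langle x,\ \beta_1-\beta_2\rangle|^2}{\mathbb{E}\,\zeta^2}
        = \frac{\lVert \beta_1-\beta_2\rVert_2^2}{\sigma^2}.
    \]

So for Gaussian queries the SNR is governed by the separation ||β₁ − β₂||₂ relative to the noise level σ.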

  8. Previous and Our results • First studied by Yin et al. (2019), who made the following assumptions: 1. the unknown vectors are exactly k-sparse, i.e., have at most k nonzero entries; 2. β₁(j) ≠ β₂(j) for each j ∈ supp β₁ ∩ supp β₂; 3. for some ε > 0, β₁, β₂ ∈ {0, ±ε, ±2ε, ±3ε, ...}ⁿ; and showed a query complexity exponential in σ/ε. • Krishnamurthy et al. (2019) removed the first two assumptions, but their query complexity was still exponential in (σ/ε)^(2/3). • We get rid of all assumptions and need a query complexity of O( (k log n log²k / γ²) · max(1, (σ⁴ + σ²)/γ⁴) · log(σ√SNR / γ) / √SNR ), which is polynomial in σ.

  9. Insight 1: Compressed Sensing 1. If β₁ = β₂ (a single unknown vector), the objective is exactly the same as in compressed sensing. 2. It is well known (Candes and Tao) that for an m × n matrix A with m = O(k log n) whose entries are i.i.d. N(0, 1) scaled by 1/√m, using its rows as queries is sufficient in the CS setting. 3. Can we cluster the samples in our framework?
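As a concrete baseline, here is a small sketch of the single-vector setting in point 2: Gaussian queries plus ℓ1-minimization (basis pursuit) written as a linear program. The helper basis_pursuit and the toy sizes are illustrative, not the paper's code.

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, y):
        """min ||z||_1 s.t. A z = y, via the split z = z_plus - z_minus with z_plus, z_minus >= 0."""
        m, n = A.shape
        c = np.ones(2 * n)                       # objective: sum(z_plus) + sum(z_minus)
        A_eq = np.hstack([A, -A])                # A (z_plus - z_minus) = y
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
        return res.x[:n] - res.x[n:]

    # Noiseless demo: m = O(k log n) Gaussian queries recover a k-sparse beta.
    rng = np.random.default_rng(1)
    n, k = 200, 4
    m = int(4 * k * np.log(n))
    beta = np.zeros(n); beta[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = A @ beta
    print(np.linalg.norm(basis_pursuit(A, y) - beta))   # ~0 up to solver tolerance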

  10. Insight 2: Gaussian mixtures 1. For a given x ∈ Rⁿ, repeating x as a query to the oracle gives us samples distributed according to (1/2)·N(⟨x, β₁⟩, σ²) + (1/2)·N(⟨x, β₂⟩, σ²). 2. With known σ², how many samples do we need to recover ⟨x, β₁⟩ and ⟨x, β₂⟩?
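A quick empirical check of point 1 with a toy β₁, β₂ and a fixed query x (all values illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    n, sigma, T = 50, 0.2, 20000
    beta1, beta2 = rng.standard_normal(n), rng.standard_normal(n)
    x = rng.standard_normal(n)

    # Repeating the same query T times: each response is drawn from
    # 0.5*N(<x,beta1>, sigma^2) + 0.5*N(<x,beta2>, sigma^2).
    picks = rng.random(T) < 0.5
    y = np.where(picks, x @ beta1, x @ beta2) + rng.normal(0.0, sigma, T)
    print(y.mean(), 0.5 * (x @ beta1 + x @ beta2))   # sample mean ~ mixture mean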

  11. Recover the means of a Gaussian mixture with the same & known variance Input: samples from a two-component mixture of Gaussians M ≜ (1/2)·N(μ₁, σ²) + (1/2)·N(μ₂, σ²). Output: estimates μ̂₁, μ̂₂.

  12. EM algorithm (Daskalakis et al. 2017, Xu et al. 2016)

  13. Method of Moments (Hardt and Price 2015) • Estimate the first moment M̂₁ and the second central moment M̂₂ • Set up a system of equations to calculate μ̂₁, μ̂₂: μ̂₁ + μ̂₂ = 2M̂₁ and (μ̂₁ − μ̂₂)² = 4M̂₂ − 4σ²
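A short sketch of this estimator under the slide's assumptions (equal weights, equal and known variance); the function name and toy numbers are illustrative:

    import numpy as np

    def moment_estimate(samples, sigma):
        """Estimate the two means of 0.5*N(mu1, sigma^2) + 0.5*N(mu2, sigma^2)
        from the empirical first moment and second central moment."""
        M1 = samples.mean()
        M2 = samples.var()                                  # second central moment
        gap_sq = max(4.0 * M2 - 4.0 * sigma ** 2, 0.0)      # (mu1 - mu2)^2, clipped at 0
        half_gap = 0.5 * np.sqrt(gap_sq)
        return M1 - half_gap, M1 + half_gap                 # estimates, up to ordering

    # Toy check (illustrative numbers).
    rng = np.random.default_rng(4)
    mu1, mu2, sigma = -1.0, 2.0, 0.5
    z = np.where(rng.random(50000) < 0.5, mu1, mu2) + rng.normal(0.0, sigma, 50000)
    print(moment_estimate(z, sigma))                        # approximately (-1.0, 2.0)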

  14. Fit a single Gaussian (Daskalakis et al. 2017) Estimate the mean M̂₁ and return it as both μ̂₁ and μ̂₂

  15. How to choose which algorithm to use? We can design a test to infer the parameter regime correctly.

  16. Stage 1: Denoising • We sample x ∼ N(0, I_{n×n}). • For an unknown permutation π: {1, 2} → {1, 2}, the estimates μ̂₁, μ̂₂ satisfy |μ̂_i − μ_{π(i)}| ≤ γ. • We can show that E(T₁ + T₂) ≤ O( (σ⁵/(γ⁴ ||β₁ − β₂||₂) + σ²/γ²) · log η⁻¹ ). • We follow identical steps for x₁, x₂, ..., x_m.
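A rough sketch of this stage (reusing the moment-based mean estimator of slide 13): each Gaussian query is repeated T times and the two noisy inner products are estimated from the resulting mixture. The fixed repetition count T is an illustrative constant, not the paper's choice of T₁, T₂.

    import numpy as np

    rng = np.random.default_rng(5)
    n, m, sigma, T = 200, 40, 0.2, 20000
    beta1, beta2 = rng.standard_normal(n), rng.standard_normal(n)

    def two_means(samples, sigma):
        # Moment-based estimates of the two mixture means (slide 13).
        M1, M2 = samples.mean(), samples.var()
        half_gap = 0.5 * np.sqrt(max(4.0 * M2 - 4.0 * sigma ** 2, 0.0))
        return M1 - half_gap, M1 + half_gap

    denoised = []
    for _ in range(m):
        x = rng.standard_normal(n)                            # x ~ N(0, I_n)
        picks = rng.random(T) < 0.5                           # latent choice of beta
        ys = np.where(picks, x @ beta1, x @ beta2) + rng.normal(0.0, sigma, T)
        denoised.append((x, *two_means(ys, sigma)))           # (x, mu_hat_1, mu_hat_2)
    # Each estimate pair is close to {<x,beta1>, <x,beta2>}, but in an unknown
    # per-query order; Stage 2 (next slide) aligns the two values across queries.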

  17. Stage 2: Alignment across queries

  18. Stage 3: Cluster & Recover • After the denoising and alignment steps, we are able to recover two vectors u and v of length m = O(k log n) each, such that |u[i] − ⟨x_i, β_{π(1)}⟩| ≤ 10γ and |v[i] − ⟨x_i, β_{π(2)}⟩| ≤ 10γ for some permutation π: {1, 2} → {1, 2} and all i ∈ [m], w.p. at least 1 − η. • We now solve the following convex optimization problems to recover β̂_{π(1)}, β̂_{π(2)}: with A = (1/√m)·[x₁ x₂ x₃ ... x_m]ᵀ, β̂_{π(1)} = argmin_{z ∈ Rⁿ} ||z||₁ s.t. ||Az − u/√m||₂ ≤ 10γ, and β̂_{π(2)} = argmin_{z ∈ Rⁿ} ||z||₁ s.t. ||Az − v/√m||₂ ≤ 10γ.
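These two ℓ1-minimization programs can be written directly in a generic convex solver; below is a sketch using cvxpy (a tooling assumption, not the paper's implementation), with a toy end-to-end check for one of the two programs.

    import numpy as np
    import cvxpy as cp

    def l1_recover(A, b, radius):
        """min ||z||_1  s.t.  ||A z - b||_2 <= radius."""
        z = cp.Variable(A.shape[1])
        cp.Problem(cp.Minimize(cp.norm1(z)), [cp.norm2(A @ z - b) <= radius]).solve()
        return z.value

    # Toy check (illustrative sizes): u plays the role of the denoised vector.
    rng = np.random.default_rng(6)
    n, k, gamma = 200, 4, 0.05
    m = int(4 * k * np.log(n))
    beta = np.zeros(n); beta[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    X = rng.standard_normal((m, n))                     # rows are the queries x_i
    A = X / np.sqrt(m)
    u = X @ beta + rng.uniform(-gamma, gamma, m)        # each entry off by at most gamma
    beta_hat = l1_recover(A, u / np.sqrt(m), 10 * gamma)
    print(np.linalg.norm(beta_hat - beta))              # error on the order of the slack 10*gamma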

  19. Simulations

  20. Conclusion and Future Work • Our work removes the assumptions on the two unknown vectors that previous papers depended on. • Our algorithm contains all the main ingredients for extension to a larger number of components L. The main technical bottleneck is obtaining tight bounds for untangling Gaussian mixtures with more than two components. • Can we handle other noise distributions? • Lower bounds on query complexity?
