Lecture 5: Sparse Models
• Homework 3 discussion (Nima)
• Sparse Models Lecture
  - Reading: Murphy, Chapter 13.1, 13.3, 13.6.1
  - Reading: Peter Knee, Chapter 2
• Paolo Gabriel (TA): Neural Brain Control
• After class
  - Project groups
  - Installation: TensorFlow, Python, Jupyter
Homework 3: Fisher Discriminant
Sparse model
• Linear regression (with sparsity constraints)
• Slide 4 from Lecture 4
Sparse model
• y = Ax + n
• y: measurements, A: dictionary
• n: noise, x: sparse weights
• Dictionary (A) – either from physical models or learned from data (dictionary learning)
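A minimal numpy sketch of this model, with illustrative sizes M, N, K and a Gaussian random dictionary (my choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, K = 20, 100, 3                            # measurements, dictionary atoms, sparsity
A = rng.standard_normal((M, N)) / np.sqrt(M)    # random dictionary, columns roughly unit norm

x = np.zeros(N)                                 # K-sparse weight vector
support = rng.choice(N, K, replace=False)
x[support] = rng.standard_normal(K)

n = 0.01 * rng.standard_normal(M)               # additive noise
y = A @ x + n                                   # noisy measurements: y = Ax + n
```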
Sparse processing
• Linear regression (with sparsity constraints)
  - An underdetermined system of equations has many solutions
  - By exploiting the fact that x is sparse, it can often be solved
  - This depends on the structure of A (RIP – Restricted Isometry Property)
• Various sparse algorithms
  - Convex optimization (Basis pursuit / LASSO / L1 regularization)
  - Greedy search (Matching pursuit / OMP)
  - Bayesian analysis (Sparse Bayesian learning / SBL)
• Low-dimensional understanding of high-dimensional data sets
• Also referred to as compressive sensing (CS)
Different applications, but the same algorithm
  y                     A               x
  Frequency signal      DFT matrix      Time-signal
  Compressed-Image      Random matrix   Pixel-image
  Array signals         Beam weight     Source-location
  Reflection sequence   Time delay      Layer-reflector
CS approach to geophysical data analysis
• CS beamforming, Sequential CS, CS of Earthquakes
  (Xenaki, JASA 2014, 2015; Mecklenbrauker, TSP 2013; Yao, GRL 2011, PNAS 2013; Gerstoft, JASA 2015)
• CS fathometer, CS matched field, CS sound speed estimation
  (Gemba, JASA 2016; Yardim, JASA 2014; Bianco, JASA 2016)
[Figure panels: DOA (deg) vs. time examples]
Sparse signals / compressive signals are important
• We don't need to sample at the Nyquist rate
• Many signals are sparse, but are often solved under non-sparse assumptions
  - Beamforming
  - Fourier transform
  - Layered structure
• Inverse methods are inherently sparse: we seek the simplest way to describe the data
• All this requires new developments
  - Mathematical theory
  - New algorithms (interior point solvers, convex optimization)
  - Signal processing
  - New applications/demonstrations
Sparse Recovery
• We try to find the sparsest solution that explains our noisy measurements
• L0-norm: min_x ||x||_0 subject to ||Ax − y||_2 < ε
• Here, the L0-norm is shorthand notation for counting the number of non-zero elements in x.
Sparse Recovery using the L0-norm
• Underdetermined problem: y = Ax, with M < N
• Prior information: x is K-sparse, K ≪ N
  ||x||_0 = Σ_{n=1}^{N} 1(x_n ≠ 0) = K
• Not really a norm: ||a x||_0 = ||x||_0 ≠ |a| ||x||_0
• There are only a few sources, with unknown locations and amplitudes
• The L0-norm solution involves exhaustive search
  - Combinatorial complexity, not computationally feasible
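To see why the L0 approach is combinatorial, here is a brute-force sketch that tests every size-K support with a least-squares fit; the function name and problem sizes are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def l0_exhaustive(y, A, K):
    """Brute-force L0 recovery: try every size-K support and keep the
    least-squares fit with the smallest residual. Cost grows as C(N, K)."""
    M, N = A.shape
    best_err, best_x = np.inf, None
    for support in combinations(range(N), K):
        As = A[:, support]                              # columns on this candidate support
        coef, *_ = np.linalg.lstsq(As, y, rcond=None)
        err = np.linalg.norm(y - As @ coef)
        if err < best_err:
            best_err = err
            best_x = np.zeros(N)
            best_x[list(support)] = coef
    return best_x

# Feasible only for tiny problems: already C(100, 3) = 161700 candidate supports.
```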
Lp-norm
• ||x||_p = ( Σ_{m=1}^{M} |x_m|^p )^{1/p} for p > 0
• Classic choices for p are 1, 2, and ∞.
• We will misuse notation and also allow p = 0.
Lp-norm (graphical representation)
• ||x||_p = ( Σ_{m=1}^{M} |x_m|^p )^{1/p}
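A quick numerical check of these norms (the example vector is arbitrary):

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])

for p in (1, 2, np.inf):
    print(f"L{p}-norm:", np.linalg.norm(x, ord=p))   # 7.0, 5.0, 4.0

# "L0-norm": not a true norm, just the count of non-zero entries
print("L0 count:", np.count_nonzero(x))              # 2
```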
Solutions for sparse recovery
• Exhaustive search – L0 regularization, not computationally feasible
• Convex optimization – Basis pursuit / LASSO / L1 regularization
• Greedy search – Matching pursuit / Orthogonal matching pursuit (OMP)
• Bayesian analysis – Sparse Bayesian Learning / SBL
• Regularized least squares – L2 regularization, reference solution, not actually sparse
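As a concrete instance of the greedy-search family, here is a minimal orthogonal matching pursuit sketch, assuming a real-valued dictionary and a known sparsity level K (both simplifying assumptions):

```python
import numpy as np

def omp(y, A, K):
    """Minimal orthogonal matching pursuit: greedily pick the column most
    correlated with the residual, then re-fit on the selected support."""
    M, N = A.shape
    residual = y.copy()
    support = []
    for _ in range(K):
        idx = int(np.argmax(np.abs(A.T @ residual)))     # best-aligned dictionary column
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat
```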
Regularized least squares (L2 regularization)
• Slide 8/9, Lecture 4
• Regularized least squares solution
• Solution is not sparse
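For reference, a sketch of the closed-form L2-regularized (ridge) solution; the regularization value lam is an arbitrary placeholder:

```python
import numpy as np

def ridge_solution(y, A, lam=0.1):
    """L2-regularized (ridge) least squares, the non-sparse reference:
    x_hat = (A^T A + lam * I)^{-1} A^T y."""
    N = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)

# Typical behaviour: energy is spread over all N coefficients, none exactly zero.
```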
Basis Pursuit / LASSO / L1 regularization
• The L0-norm minimization is not convex and requires a combinatorial search, making it computationally impractical
• We make the problem convex by substituting the L1-norm in place of the L0-norm:
  min_x ||x||_1 subject to ||Ax − y||_2 < ε
• This can also be formulated as an unconstrained problem (next slide)
The unconstrained (LASSO) formulation
• Constrained formulation of the L1-norm minimization problem:
  x̂_L1(ε) = argmin_{x ∈ C^N} ||x||_1 subject to ||y − Ax||_2 ≤ ε
• Unconstrained formulation as least squares with an L1-norm regularizer:
  x̂_LASSO(μ) = argmin_{x ∈ C^N} ||y − Ax||_2^2 + μ ||x||_1
• For every ε there exists a μ such that the two formulations are equivalent
• μ: regularization parameter
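A minimal sketch of solving the unconstrained LASSO by iterative soft-thresholding (ISTA), one standard option among many solvers; the step size rule and iteration count are my assumptions:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(y, A, mu, n_iter=500):
    """Minimize ||y - Ax||_2^2 + mu * ||x||_1 by iterative soft-thresholding."""
    t = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)   # step size from the Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = -2 * A.T @ (y - A @ x)           # gradient of the quadratic data term
        x = soft_threshold(x - t * grad, t * mu)
    return x
```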
Basis Pursuit / LASSO / L1 regularization
• Why is it OK to substitute the L1-norm for the L0-norm?
• What are the conditions such that the two problems have the same solution?
  min_x ||x||_0 subject to ||Ax − y||_2 < ε   vs.   min_x ||x||_1 subject to ||Ax − y||_2 < ε
• Restricted Isometry Property (RIP)
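The constrained L1 problem can also be handed to a general convex solver; this sketch assumes the cvxpy package (not required by the lecture) and an assumed noise level eps:

```python
import cvxpy as cp

def basis_pursuit_denoise(y, A, eps):
    """Solve min ||x||_1 subject to ||Ax - y||_2 <= eps with a convex solver."""
    x = cp.Variable(A.shape[1])
    problem = cp.Problem(cp.Minimize(cp.norm1(x)),
                         [cp.norm(A @ x - y, 2) <= eps])
    problem.solve()
    return x.value
```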
Geometrical view (Figure from Bishop)
• L2 regularization
• L1 regularization
Regularization parameter selection
• The objective function of the LASSO problem: L(x, μ) = ||y − Ax||_2^2 + μ ||x||_1
• Regularization parameter μ: sparsity depends on μ
  - μ large: x = 0
  - μ small: non-sparse solution
Regularization Path (Figure from Murphy)
• Panels: L2 regularization and L1 regularization, weights plotted against 1/μ
• As the regularization parameter μ is decreased, more and more weights become active
• Thus μ controls the sparsity of the solution
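A sketch of this effect, sweeping the regularization strength and counting active weights with scikit-learn's Lasso (an assumed tool; note its alpha scales the squared error by 1/(2M), so it plays the role of μ only up to scaling):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
M, N, K = 50, 200, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = A @ x_true + 0.01 * rng.standard_normal(M)

# Smaller regularization -> more non-zero (active) weights
for alpha in (1.0, 0.1, 0.01, 0.001):
    w = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A, y).coef_
    print(f"alpha={alpha:6.3f}  non-zeros={np.count_nonzero(w)}")
```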
Applications
• MEG/EEG/MRI source location (earthquake location)
• Channel equalization
• Compressive sampling (beyond Nyquist sampling)
• Compressive camera!
• Beamforming
• Fathometer
• Geoacoustic inversion
• Sequential estimation
Beamforming / DOA estimation
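A minimal sketch of DOA estimation as sparse recovery, assuming a 16-sensor uniform linear array with half-wavelength spacing, a 1-degree angle grid as the dictionary, and the greedy OMP solver from earlier (reproduced in complex form so the snippet stands alone); all values are illustrative:

```python
import numpy as np

def steering_matrix(n_sensors, angles_deg, spacing=0.5):
    """ULA steering matrix, spacing given in wavelengths (0.5 = half-wavelength)."""
    n = np.arange(n_sensors)[:, None]
    theta = np.deg2rad(np.asarray(angles_deg))[None, :]
    return np.exp(2j * np.pi * spacing * n * np.sin(theta))

def omp_complex(y, A, K):
    """Orthogonal matching pursuit for complex-valued dictionaries."""
    support, residual = [], y.copy()
    for _ in range(K):
        idx = int(np.argmax(np.abs(A.conj().T @ residual)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x

# Two plane waves at -20 and 35 degrees hitting a 16-sensor ULA (illustrative values)
rng = np.random.default_rng(1)
grid = np.arange(-90, 91, 1.0)                   # DOA grid defining the dictionary
A = steering_matrix(16, grid)
y = (1.0 * A[:, np.searchsorted(grid, -20.0)]
     + 0.5 * A[:, np.searchsorted(grid, 35.0)]
     + 0.05 * (rng.standard_normal(16) + 1j * rng.standard_normal(16)))

x_hat = omp_complex(y, A, K=2)
print("Estimated DOAs (deg):", grid[np.abs(x_hat) > 1e-6])
```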
Additional Resources