The smoothed multivariate square-root Lasso: an optimization lens on concomitant estimation
Joseph Salmon, http://josephsalmon.eu
IMAG, Univ. Montpellier, CNRS
Series of works with: Quentin Bertrand (INRIA), Mathurin Massias (University of Genova), Olivier Fercoq (Institut Polytechnique de Paris), Alexandre Gramfort (INRIA)
1 / 40
Table of Contents
Neuroimaging: the M/EEG problem
Statistical model
Estimation procedures: sparsity and multi-task approaches
Smoothing interpretation of concomitant estimation and the √Lasso
Optimization algorithm
2 / 40
The M/EEG inverse problem
◮ observe the electromagnetic field outside the scalp (100 sensors)
◮ reconstruct cerebral activity inside the brain (10,000 candidate locations)
p = 10,000 locations, n = 100 sensors, n ≪ p: ill-posed problem
◮ Motivation: identify the brain regions responsible for the signals
◮ Applications: epilepsy treatment, brain aging, anesthesia risks
3 / 40
M/EEG inverse problem for brain imaging ◮ sensors: electric and magnetic fields during a cognitive task 4 / 40
MEG elements: magnetometers and gradiometers
[Figure: sensors, detail of a sensor, device]
5 / 40
M/EEG = MEG + EEG Photo Credit: Stephen Whitmarsh 6 / 40
Table of Contents
Neuroimaging: the M/EEG problem
Statistical model
Estimation procedures: sparsity and multi-task approaches
Smoothing interpretation of concomitant estimation and the √Lasso
Optimization algorithm
7 / 40
Source modeling
[Figure: space and time views of candidate sources]
Position a few thousand candidate sources over the brain (e.g., every 5 mm)
8 / 40
Design matrix - Forward operator 9 / 40
Mathematical model: linear regression 10 / 40
Experiments repeated r times
[Figure: stimuli presented to the patient; M/EEG signals observed for each repetition]
11 / 40
M/EEG specificity #1: combined measurements
[Figure: sensor detail, device, sensors; structure of Y and X]
12 / 40
Sensor types & noise structure
[Figure: evoked responses (0–175 ms) and noise covariances for each sensor type: EEG (59 channels, µV), gradiometers (203 channels, fT/cm), magnetometers (102 channels, fT)]
13 / 40
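Because the three sensor types have very different noise scales and correlations, the data are typically whitened before fitting (cf. the footnote on slide 21: "whitened, say using baseline data"). A minimal sketch of such whitening, assuming the noise covariance is estimated from pre-stimulus baseline samples (illustrative only, not the exact pipeline of the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_baseline = 100, 500

# Stand-in for pre-stimulus (baseline) recordings, where no source is active
baseline = rng.standard_normal((n_sensors, n_baseline))

# Empirical noise covariance across baseline time samples
cov = baseline @ baseline.T / n_baseline

# Whitening operator cov^{-1/2}, via the eigendecomposition of cov
eigvals, eigvecs = np.linalg.eigh(cov)
whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Apply the same operator to the measurements Y (and to the forward matrix X)
Y = rng.standard_normal((n_sensors, 50))
Y_white = whitener @ Y  # whitened noise is (approximately) isotropic
```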
M/EEG specificity #2: averaging repetitions of the experiment
[Figure: stimuli presented to the patient; M/EEG signals observed for each repetition, then averaged into a single signal]
14 / 40
M/EEG specificity #2: averaged signals (EEG only)
[Figure: averaged EEG signals]
Limit on the repetitions: subject/patient fatigue
15 / 40
A multi-task framework
Multi-task regression notation:
◮ n observations (number of sensors)
◮ T tasks (temporal information)
◮ p features (spatial description)
◮ r: number of repetitions of the experiment
◮ Y^(1), ..., Y^(r) ∈ ℝ^{n×T}: observation matrices; Ȳ = (1/r) Σ_{l=1}^{r} Y^(l)
◮ X ∈ ℝ^{n×p}: forward matrix

Model: Y^(l) = X B* + S* E^(l), where
◮ B* ∈ ℝ^{p×T}: true source activity matrix (unknown)
◮ S* ∈ S^n_{++}: co-standard deviation matrix (1) (unknown)
◮ E^(1), ..., E^(r) ∈ ℝ^{n×T}: white noise (standard Gaussian)
A simulation sketch of this model is given below.
(1) S ⪰ σ means that S − σ Id_n is positive semi-definite
16 / 40
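A minimal numpy simulation of this generative model (dimensions are made up, and S* is taken diagonal purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, T, r = 100, 1000, 50, 20   # sensors, sources, time points, repetitions

X = rng.standard_normal((n, p))             # forward matrix
B_true = np.zeros((p, T))
B_true[:5] = rng.standard_normal((5, T))    # few active sources: row-sparse B*
S_true = np.diag(rng.uniform(0.5, 2.0, n))  # co-standard deviation (diagonal here)

# Y^(l) = X B* + S* E^(l), with E^(l) i.i.d. standard Gaussian noise
Y = np.stack([X @ B_true + S_true @ rng.standard_normal((n, T))
              for _ in range(r)])
Y_bar = Y.mean(axis=0)                      # averaged observations Ȳ
```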
Table of Contents
Neuroimaging: the M/EEG problem
Statistical model
Estimation procedures: sparsity and multi-task approaches
Smoothing interpretation of concomitant estimation and the √Lasso
Optimization algorithm
17 / 40
Sparsity everywhere
Signals can often be represented by combining a few atoms/features:
◮ Fourier decomposition for sounds
◮ Wavelets for images (1990's) (2)
◮ Dictionary learning for images (2000's) (3)
◮ Neuroimaging: measurements assumed to be explained by a few active brain sources
(2) I. Daubechies. Ten lectures on wavelets. SIAM, 1992.
(3) B. A. Olshausen and D. J. Field. "Sparse coding with an overcomplete basis set: A strategy employed by V1?" In: Vision Research (1997).
18 / 40
Justification for the dipolarity assumption
Sparsity holds: dipolar patterns are equivalent to focal sources
◮ short duration
◮ simple cognitive task
◮ repetitions of the experiment average out other sources
◮ ICA recovers dipolar patterns, (4) well modeled by focal sources
(4) A. Delorme et al. "Independent EEG sources are dipolar". In: PLoS ONE 7.2 (2012), e30135.
19 / 40
(Structured) sparsity-inducing penalties (5)

B̂ ∈ arg min_{B ∈ ℝ^{p×T}} (1/(2nT)) ‖Y − XB‖²_F + λ‖B‖_1

Lasso penalty: ‖B‖_1 = Σ_{j=1}^{p} Σ_{t=1}^{T} |B_{jt}|
Sparse support: no structure ✗
[Figure: sources × time support pattern, scattered entries]
(5) G. Obozinski, B. Taskar, and M. I. Jordan. "Joint covariate selection and joint subspace selection for multiple classification problems". In: Statistics and Computing 20.2 (2010), pp. 231–252.
20 / 40
(Structured) sparsity-inducing penalties (5)

B̂ ∈ arg min_{B ∈ ℝ^{p×T}} (1/(2nT)) ‖Y − XB‖²_F + λ‖B‖_{2,1}

Group-Lasso penalty: ‖B‖_{2,1} = Σ_{j=1}^{p} ‖B_{j:}‖_2, with B_{j:} the j-th row of B
Sparse support: group structure ✓ (entire rows selected jointly; see the sketch below)
[Figure: sources × time support pattern, full rows]
(5) G. Obozinski, B. Taskar, and M. I. Jordan. "Joint covariate selection and joint subspace selection for multiple classification problems". In: Statistics and Computing 20.2 (2010), pp. 231–252.
20 / 40
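A small sketch contrasting the two penalties on a row-sparse coefficient matrix (names and dimensions are illustrative):

```python
import numpy as np

def lasso_penalty(B):
    # ||B||_1: sum of absolute values of all entries, no structure
    return np.abs(B).sum()

def group_lasso_penalty(B):
    # ||B||_{2,1}: sum over rows j of ||B_{j:}||_2, so a source (row)
    # is selected or discarded jointly across all time points
    return np.linalg.norm(B, axis=1).sum()

B = np.zeros((1000, 50))
B[:5] = np.random.default_rng(0).standard_normal((5, 50))  # 5 active sources
print(lasso_penalty(B), group_lasso_penalty(B))
```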
Data-fitting term and experiment repetitions
◮ Classical estimator: use the averaged (6) signal Ȳ

B̂ ∈ arg min_{B ∈ ℝ^{p×T}} (1/(2nT)) ‖Ȳ − XB‖²_F + λ‖B‖_{2,1}

◮ How to take advantage of the number of repetitions? Intuitive estimator:

B̂^repet ∈ arg min_{B ∈ ℝ^{p×T}} (1/(2nTr)) Σ_{l=1}^{r} ‖Y^(l) − XB‖²_F + λ‖B‖_{2,1}

◮ Fail: B̂^repet = B̂ (because of the datafit ‖·‖²_F; see the expansion below)
↪ investigate other datafits
(6) & whitened, say using baseline data
21 / 40
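Why the two estimators coincide: expanding the squared Frobenius datafit around Ȳ (a one-line computation left implicit on the slide):

```latex
\frac{1}{r}\sum_{l=1}^{r} \|Y^{(l)} - XB\|_F^2
  = \|\bar{Y} - XB\|_F^2
  + \underbrace{\frac{1}{r}\sum_{l=1}^{r} \|Y^{(l)} - \bar{Y}\|_F^2}_{\text{independent of } B}
```

The cross term vanishes since Σ_l (Y^(l) − Ȳ) = 0, so the two objectives differ by a constant in B and share the same minimizers: a squared-error datafit is blind to the repetition structure.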
Table of Contents
Neuroimaging: the M/EEG problem
Statistical model
Estimation procedures: sparsity and multi-task approaches
Smoothing interpretation of concomitant estimation and the √Lasso
Optimization algorithm
22 / 40
Lasso (7),(8): the "modern least-squares" (9)

β̂ ∈ arg min_{β ∈ ℝ^p} (1/(2n)) ‖y − Xβ‖²_2 + λ‖β‖_1

◮ y ∈ ℝ^n: observations
◮ X ∈ ℝ^{n×p}: design matrix
◮ sparsity: for λ large enough, ‖β̂‖_0 ≪ p
(7) R. Tibshirani. "Regression Shrinkage and Selection via the Lasso". In: J. R. Stat. Soc. Ser. B Stat. Methodol. 58.1 (1996), pp. 267–288.
(8) S. S. Chen and D. L. Donoho. "Atomic decomposition by basis pursuit". In: SPIE. 1995.
(9) E. J. Candès, M. B. Wakin, and S. P. Boyd. "Enhancing Sparsity by Reweighted ℓ₁ Minimization". In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877–905.
23 / 40
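This is exactly the objective scikit-learn's Lasso minimizes (its alpha parameter playing the role of λ); a quick check of the sparsity claim on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0                      # 5 relevant features out of 500
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# scikit-learn minimizes (1/(2n)) ||y - X beta||_2^2 + alpha * ||beta||_1
lasso = Lasso(alpha=0.5).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(lasso.coef_), "out of", p)
```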