Rémi Gribonval, Inria Rennes - Bretagne Atlantique, remi.gribonval@inria.fr
Contributors & Collaborators: Anthony Bourrier, Nicolas Keriven, Yann Traonmilin, Gilles Puy, Gilles Blanchard, Mike Davies, Tomer Peleg, Patrick Perez
Agenda
- From Compressive Sensing to Compressive Learning?
- Information-preserving projections & sketches
- Compressive Clustering / Compressive GMM
- Conclusion
Machine Learning

Available data: a training collection of feature vectors X = a point cloud.
Goals: infer parameters to achieve a certain task, with generalization to future samples drawn from the same probability distribution.
Examples (task and learned parameters): PCA (principal subspace), clustering (centroids), dictionary learning (dictionary), classification (classifier parameters, e.g. support vectors).
Challenging dimensions

Point cloud = large matrix of feature vectors $X = [x_1, x_2, \ldots, x_N]$:
- high feature dimension $n$;
- large collection size $N$.

Challenge: compress $X$ before learning?
Compressive Machine Learning?

Point cloud = large matrix of feature vectors $X = [x_1, \ldots, x_N]$.
Reduce the feature dimension: $Y = MX$, i.e. $y_i = M x_i$, via a (random) feature projection [Calderbank & al 2009, Reboredo & al 2013]. This exploits / needs a low-dimensional feature model.
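As an illustration, here is a minimal numpy sketch of such a random feature projection. The dimensions, the Gaussian matrix, and the $1/\sqrt{m}$ scaling are illustrative assumptions, not a prescription from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

n, N = 2000, 500      # feature dimension and collection size (illustrative)
m = 64                # reduced feature dimension, m << n
X = rng.normal(size=(n, N))   # point cloud: one column per feature vector

# Random Gaussian projection; the 1/sqrt(m) scaling keeps squared norms
# roughly unchanged in expectation (Johnson-Lindenstrauss flavour).
M = rng.normal(size=(m, n)) / np.sqrt(m)
Y = M @ X             # compressed features -- but still N columns
print(X.shape, "->", Y.shape)   # (2000, 500) -> (64, 500)
```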
Challenges of large collections

Feature projection has limited impact: $Y = MX$ shrinks each column, but the compressed collection still has $N$ columns. The "Big Data" challenge is to compress the collection size.
Compressive Machine Learning?

Point cloud = ... empirical probability distribution.
Reduce the collection dimension:
- coresets, see e.g. [Agarwal & al 2003, Feldman 2010];
- sketching & hashing, see e.g. [Thaper & al 2002, Cormode & al 2005].

Sketching operator $\mathcal{M}: X \mapsto z \in \mathbb{R}^m$: nonlinear in the feature vectors, linear in their probability distribution.
Example: Compressive Clustering

$N = 1000$ points, $n = 2$ dimensions, sketch size $m = 60$. A recovery algorithm applied to the sketch $z \in \mathbb{R}^m$ returns estimated centroids that match the ground truth. A toy reconstruction of this experiment follows below.
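The slide does not spell out the sketching functions or the decoder, so the following is a hedged toy reconstruction: it assumes a random-Fourier-moment sketch and uses a generic multi-start nonlinear least-squares decoder, whereas the cited Ph.D. work relies on dedicated greedy decoders. The frequency scale and cluster geometry are arbitrary choices for the demo.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# toy data matching the slide: N = 1000 points, n = 2, K = 3 clusters
n, K, N = 2, 3, 1000
true_c = rng.uniform(-5, 5, size=(K, n))
X = true_c[rng.integers(K, size=N)] + 0.3 * rng.normal(size=(N, n))

# sketching operator: m = 60 random frequencies (an assumed choice of h_l)
m = 60
Omega = 0.5 * rng.normal(size=(m, n))

def sketch(points):
    """Mean of complex exponentials exp(i * Omega @ x) over the point set."""
    return np.exp(1j * Omega @ points.T).mean(axis=1)

z = sketch(X)   # m numbers summarize the whole collection

# decoder: centroids whose equal-weight sketch best matches z; plain
# quasi-Newton descent can hit local minima, hence the multi-start
def cost(c_flat):
    return np.sum(np.abs(sketch(c_flat.reshape(K, n)) - z) ** 2)

best = min((minimize(cost, rng.uniform(-5, 5, K * n)) for _ in range(10)),
           key=lambda r: r.fun)
print(best.x.reshape(K, n))   # estimated centroids, compare with true_c
```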
Computational impact of sketching

[Figures: computation time (s) and memory usage (bytes) as a function of the collection size $N$.]
Ph.D. work of A. Bourrier & N. Keriven.
The Sketch Trick

Data distribution: $X \sim p(x)$. Sketch:
$$z_\ell = \frac{1}{N} \sum_{i=1}^{N} h_\ell(x_i) \approx \mathbb{E}\, h_\ell(X) = \int h_\ell(x)\, p(x)\, dx.$$
The sketch is nonlinear in the feature vectors, but linear in the distribution $p(x)$.
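The linearity in the distribution can be checked numerically: merging two collections averages their sketches with weights proportional to their sizes. A minimal demonstration, again assuming complex exponentials for the (unspecified) functions $h_\ell$:

```python
import numpy as np

rng = np.random.default_rng(1)
Omega = rng.normal(size=(60, 2))   # random frequencies; h_l(x) = exp(i w_l.x)
                                   # is an assumed choice

def sketch(X):
    """Empirical sketch z_l = (1/N) sum_i h_l(x_i)."""
    return np.exp(1j * Omega @ X.T).mean(axis=1)

# sketch of the merged collection = size-weighted average of the sketches:
# the sketch is linear in the underlying (empirical) distribution
X1, X2 = rng.normal(size=(100, 2)), rng.normal(size=(300, 2))
z_union = sketch(np.vstack([X1, X2]))
z_mix = (100 * sketch(X1) + 300 * sketch(X2)) / 400
assert np.allclose(z_union, z_mix)
```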
The Sketch Trick: information preservation? dimension reduction?

The analogy between the two fields:
- Signal processing (inverse problems, compressive sensing): a signal $x$ in the signal space is mapped by a linear "projection" $M$ to an observation $y$ in the observation space.
- Machine learning (method of moments, compressive learning): a distribution $p$ in the probability space is mapped by the linear "projection" $\mathcal{M}$ to a sketch $z$ in the sketch space.

Two questions follow: does the sketch preserve the information needed for the task, and does it reduce the dimension?
Information-preserving projections
Stable recovery

Signal space $\mathbb{R}^n$; model set $\Sigma$ = signals of interest. Example: the set of $k$-sparse vectors, $\Sigma_k = \{x \in \mathbb{R}^n : \|x\|_0 \le k\}$.
A linear "projection" $M$ maps $x$ to $y$ in the observation space $\mathbb{R}^m$, with $m \ll n$.
Ideal goal: build a decoder $\Delta$ (a recovery algorithm) with the guarantee that
$$\|x - \Delta(Mx + e)\| \le C \|e\|, \quad \forall x \in \Sigma$$
(instance optimality [Cohen & al 2009]). Are there such decoders?
Stable recovery of k-sparse vectors

Typical decoders:
- $\ell^1$ minimization, $\Delta(y) := \arg\min_{x : Mx = y} \|x\|_1$: LASSO [Tibshirani 1994], Basis Pursuit [Chen & al 1999];
- greedy algorithms: (Orthogonal) Matching Pursuit [Mallat & Zhang 1993], Iterative Hard Thresholding (IHT) [Blumensath & Davies 2009], ...

Guarantees, assuming the Restricted Isometry Property [Candès & al 2004]
$$(1 - \delta)\, \|z\|_2^2 \;\le\; \|Mz\|_2^2 \;\le\; (1 + \delta)\, \|z\|_2^2 \quad \text{whenever } \|z\|_0 \le 2k:$$
exact recovery, stability to noise, robustness to model error.
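A minimal IHT sketch, following the gradient-step-plus-hard-thresholding scheme of [Blumensath & Davies 2009]; the fixed step size, iteration count, and problem dimensions below are illustrative choices, not the paper's tuning.

```python
import numpy as np

def iht(y, M, k, n_iter=200):
    """Iterative Hard Thresholding: gradient step on ||y - Mx||^2 followed
    by projection onto the k-sparse model set Sigma_k."""
    m, n = M.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(M, 2) ** 2   # conservative fixed step size
    for _ in range(n_iter):
        x = x + step * M.T @ (y - M @ x)     # gradient step
        support = np.argsort(np.abs(x))[-k:] # keep the k largest entries
        x_thr = np.zeros(n)
        x_thr[support] = x[support]
        x = x_thr
    return x

# toy check: recover a k-sparse vector from m << n random measurements
rng = np.random.default_rng(0)
n, m, k = 200, 80, 5
M = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
x_hat = iht(M @ x_true, M, k)
print(np.linalg.norm(x_hat - x_true))  # small if M satisfies the RIP at 2k
```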
Stable recovery: low-dimensional models

Model set $\Sigma$ = signals of interest in the signal space $\mathbb{R}^n$; as before, a linear "projection" $M$ maps $x$ to $y$ in the observation space $\mathbb{R}^m$, with $m \ll n$. Examples of low-dimensional models:
- sparse vectors;
- sparse in a dictionary D;
- co-sparse in an analysis operator A (total variation, physics-driven sparse models, ...);
- low-rank matrices or tensors (matrix completion, phase retrieval, blind sensor calibration, ...).

A sketch of the projection onto the low-rank model set follows below.
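For the low-rank model set, the analogue of hard thresholding is truncation of the singular values; this standard projection (Eckart-Young) is sketched below with illustrative dimensions.

```python
import numpy as np

def project_rank_r(X, r):
    """Projection onto the model set of rank-r matrices: keep the r largest
    singular values (the low-rank analogue of hard thresholding)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# nearest rank-2 approximation of a noisy rank-2 matrix
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))   # exactly rank 2
A_hat = project_rank_r(A + 0.01 * rng.normal(size=A.shape), r=2)
print(np.linalg.norm(A_hat - A) / np.linalg.norm(A))      # small relative error
```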