Learning from random moments
Rémi Gribonval - Inria Rennes - Bretagne Atlantique - remi.gribonval@inria.fr
Joint work with: G. Blanchard (U. Potsdam), N. Keriven, Y. Traonmilin (Inria Rennes)
Main Contributors & Collaborators
Anthony Bourrier, Nicolas Keriven, Yann Traonmilin, Gilles Puy, Nicolas Tremblay, Gilles Blanchard, Mike Davies, Patrick Perez
Foreword
Signal processing & machine learning:
- inverse problems & generalized method of moments
- embeddings with random projections & random features / kernels
- image super-resolution, source localization & k-means
Continuous vs. discrete?
- wavelets (1990s): from continuous to discrete
- compressive sensing (2000s): in the discrete world
- current trends: back to continuous! off-the-grid compressive sensing, FRI, high-resolution methods, compressive statistical learning from random moments
Outline:
- Learning from random moments: the concept
- Compressive Statistical Learning (guarantees)
- Recent developments & perspectives
Large-scale learning
Training collection X = (x_1, x_2, ..., x_n)
- High feature dimension d
- Large collection size n = "volume"
- Challenge: compress before learning?
Compressive learning: three routes (dimension reduction, subsampling, sketching)
Route 1 - dimension reduction: Y = MX via random projections (Johnson-Lindenstrauss lemma); see e.g. [Calderbank & al 2009, Reboredo & al 2013]
Route 2 - subsampling: keep only a subset of the items x_i; Nyström method & coresets, see e.g. [Williams & Seeger 2000, Agarwal & al 2003, Feldman 2010]
Route 3 - sketching with random moments: z = (E Φ_1(X), ..., E Φ_m(X)) ∈ R^m
Inspiration: compressive sensing [Foucart & Rauhut 2013], sketching/hashing [Thaper & al 2002, Cormode & al 2005]
Connections with: generalized method of moments [Hall 2005], kernel mean embeddings [Smola & al 2007, Sriperumbudur & al 2010]
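To make the moment sketch concrete, here is a minimal illustrative snippet that computes the empirical counterpart of z, replacing each expectation E Φ_j(X) by an average over the n samples. The specific moment functions used here (low-order monomials) are placeholders of my choosing, not the random features introduced on the next slides.

```python
import numpy as np

def empirical_sketch(X, moment_fns):
    """Empirical version of z = (E Phi_1(X), ..., E Phi_m(X)).

    X          : (n, d) array, one sample per row
    moment_fns : list of m functions, each mapping an (n, d) array to (n,) values
    returns    : (m,) sketch vector, averaging each moment over the n samples
    """
    return np.array([fn(X).mean() for fn in moment_fns])

# Example: n = 1000 samples in dimension d = 2, summarized by m = 3 simple moments
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
moments = [
    lambda X: X[:, 0],            # E[x_1]
    lambda X: X[:, 1] ** 2,       # E[x_2^2]
    lambda X: X[:, 0] * X[:, 1],  # E[x_1 x_2]
]
z = empirical_sketch(X, moments)  # 3 numbers summarize the 1000 samples
```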
Example: Compressive K-means
Training set (MNIST): n = 70000, d = 784, k = 10; spectral features reduce the data to d = k = 10 dimensions.
Sketch vector: z = (1/n) Σ_{i=1}^n Φ(x_i) ∈ R^m
- memory size independent of n, with m on the order of kd = 100
- streaming / distributed computation
- privacy-aware
Learn centroids from sketch = moment fitting (see the sketch of the idea below).
Using: random Fourier features Φ(x) := {e^{i ω_j^T x}}_{j=1}^m, a vector-valued function.
[Figure: training points and learned centroids plotted in spectral feature dimensions 5 and 6]
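The snippet below is a toy illustration of compressive k-means under simplifying assumptions: the frequencies ω_j are drawn from an isotropic Gaussian with a hand-picked scale, the mixture weights are fixed to 1/k, and the moment fitting is done with a generic off-the-shelf optimizer (Nelder-Mead) rather than the dedicated greedy algorithm used in the actual work. The helper names `rff_sketch` and `mixture_sketch` are mine, introduced only for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def rff_sketch(X, W):
    """Empirical sketch z = (1/n) sum_i Phi(x_i), with Phi(x) = {exp(i w_j^T x)}_j."""
    return np.exp(1j * X @ W.T).mean(axis=0)           # shape (m,)

def mixture_sketch(C, W):
    """Sketch of a uniform mixture of Diracs located at the rows of C (the centroids)."""
    return np.exp(1j * C @ W.T).mean(axis=0)           # shape (m,)

# Toy data: n points around k = 3 centroids in dimension d = 2
d, k, n = 2, 3, 10_000
true_C = rng.uniform(-1, 1, size=(k, d))
X = true_C[rng.integers(k, size=n)] + 0.05 * rng.normal(size=(n, d))

# Random frequencies (assumption: isotropic Gaussian, hand-tuned scale)
m = 10 * k * d
W = rng.normal(scale=3.0, size=(m, d))

z = rff_sketch(X, W)            # the only quantity kept from the data

# "Learn centroids from sketch = moment fitting": match the sketch of a mixture of Diracs to z
def objective(c_flat):
    C = c_flat.reshape(k, d)
    return np.sum(np.abs(z - mixture_sketch(C, W)) ** 2)

res = minimize(objective, rng.uniform(-1, 1, size=k * d), method="Nelder-Mead",
               options={"maxiter": 20_000, "xatol": 1e-6, "fatol": 1e-9})
C_hat = res.x.reshape(k, d)     # estimated centroids (up to permutation)
```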
Sketching & Neural networks
Sketching for k-means = empirical characteristic function: z_ℓ = (1/n) Σ_{i=1}^n e^{j w_ℓ^T x_i}
In matrix form: the data X passes through a random linear layer W, a pointwise nonlinearity h(·) = e^{j(·)}, and an average over the n samples, producing the m-dimensional sketch z.
Sketching ≈ one-layer random neural net; DNN ≈ hierarchical sketching? See also [Bruna & al 2013, Giryes & al 2015].
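A minimal batch-matrix rendering of this "one-layer random network" view, under the same illustrative assumptions as above (Gaussian random weights, data stored one sample per column as on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, m = 10, 5_000, 200
X = rng.normal(size=(d, n))        # data matrix, one sample per column
W = rng.normal(size=(m, d))        # random linear layer = random frequencies
h = lambda u: np.exp(1j * u)       # pointwise nonlinearity h(.) = exp(j .)
z = h(W @ X).mean(axis=1)          # average pooling over the n samples -> sketch of size m
```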
Sketching & Privacy
Privacy-preserving: "sketch and forget" - once the sketch z has been computed, the individual samples x_i need not be kept.
Sketching & Online Learning
Streaming algorithms: one pass over the data, with an online update of the sketch (a running average).
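A one-pass streaming version of the sketch computation, sketched under the assumption that the data arrives as mini-batches (the batch interface and shapes are illustrative):

```python
import numpy as np

def sketch_stream(batches, W):
    """One-pass, online computation of z = (1/n) sum_i exp(j W x_i).

    batches : iterable of (n_b, d) arrays arriving over time
    W       : (m, d) random frequencies, drawn once and kept fixed
    """
    z = np.zeros(W.shape[0], dtype=complex)
    n = 0
    for B in batches:                            # single pass, nothing stored but (z, n)
        z += np.exp(1j * B @ W.T).sum(axis=0)    # running unnormalized sum
        n += B.shape[0]                          # running sample count
    return z / n                                 # normalize once at the end
```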
Sketching & Distributed Computing
Distributed computing: decentralized (Hadoop) / parallel (GPU) - each node sketches its own data shard, and the partial sketches are merged by averaging.
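Because the sketch is an average, merging partial sketches reduces to a count-weighted average; a minimal sketch of that merge step (helper names are mine):

```python
import numpy as np

def local_sketch(X_shard, W):
    """Sketch of one data shard, computed locally (e.g. on one node or one GPU)."""
    return np.exp(1j * X_shard @ W.T).mean(axis=0), X_shard.shape[0]

def merge_sketches(partials):
    """Merge (sketch, count) pairs from all nodes into the global sketch."""
    total = sum(n for _, n in partials)
    return sum(n * z for z, n in partials) / total
```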
Outline:
- Learning from random moments: the concept
- Compressive Statistical Learning (guarantees)
- Recent developments & perspectives
Statistical learning 101
Statistical risk: R(p, θ) = E_{x∼p} ℓ(x, θ)
Target: θ* ∈ arg min_θ R(p*, θ)
Empirical version: θ̂_n ∈ arg min_θ R(p̂_n, θ), where p̂_n := (1/n) Σ_{i=1}^n δ_{x_i} and x_i ∼ p* i.i.d.
PAC / excess risk control / generalization error: R(p*, θ̂_n) ≤ R(p*, θ*) + η_n
This can be achieved under uniform convergence, i.e., if with high probability sup_θ |R(p̂_n, θ) − R(p*, θ)| ≤ η_n / 2.
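The last implication is the standard empirical-risk-minimization argument, spelled out for completeness:

```latex
\[
\mathcal{R}(p_\star,\hat\theta_n) - \mathcal{R}(p_\star,\theta_\star)
 = \underbrace{\big[\mathcal{R}(p_\star,\hat\theta_n) - \mathcal{R}(\hat p_n,\hat\theta_n)\big]}_{\le\, \eta_n/2 \text{ (unif. conv.)}}
 + \underbrace{\big[\mathcal{R}(\hat p_n,\hat\theta_n) - \mathcal{R}(\hat p_n,\theta_\star)\big]}_{\le\, 0 \text{ (def. of } \hat\theta_n)}
 + \underbrace{\big[\mathcal{R}(\hat p_n,\theta_\star) - \mathcal{R}(p_\star,\theta_\star)\big]}_{\le\, \eta_n/2 \text{ (unif. conv.)}}
 \;\le\; \eta_n .
\]
```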