UTLC: Unsupervised and Transfer Learning Challenge

Grégoire Mesnil 1,2, Yann Dauphin 1, Xavier Glorot 1, Salah Rifai 1, Yoshua Bengio 1, et al.

1 LISA, Université de Montréal, Canada
2 LITIS, Université de Rouen, France

July 2nd, 2011
UTL Challenge Workshop, ICML
Plan

1. Introduction
2. Deep Architecture
   - Preprocessing
   - Feature Extraction
   - Postprocessing
3. Results
4. Summary
UTL Challenge: Presentation

Dates:
- Phase 1: Unsupervised Learning; start: January 3, end: March 4.
- Phase 2: Transfer Learning; start: March 4, end: April 15.

Five different data sets:

  data set    domain               # samples   dimension   sparsity
  AVICENNA    Arabic manuscripts      150205         120       0 %
  HARRY       human actions            69652        5000      98 %
  RITA        CIFAR-10                111808        7200       1 %
  SYLVESTER   ecology                 572820         100       0 %
  TERRY       NLP                     217034       47236      99 %
UTL Challenge: Evaluation

ALC: Area under the Learning Curve, computed from 1 to 64 labeled examples per class.
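To make the metric concrete, here is a small numpy sketch that approximates an ALC by integrating a learning curve over a log-scaled sample axis rescaled to [0, 1]. The exact challenge normalization (e.g., how the random-guess baseline is handled) may differ, so treat this as an illustration rather than the official scoring code.

```python
import numpy as np

def alc(scores, n_per_class=(1, 2, 4, 8, 16, 32, 64)):
    """Approximate Area under the Learning Curve.

    scores[k] is the evaluation score (e.g., AUC of a linear classifier)
    obtained with n_per_class[k] labeled examples per class. The x-axis
    is logarithmic in the number of labeled examples, rescaled to [0, 1],
    and the area is computed with the trapezoidal rule.
    """
    x = np.log2(np.asarray(n_per_class, dtype=float))
    x = (x - x[0]) / (x[-1] - x[0])
    return np.trapz(scores, x)

# A learning curve that improves as more labels become available:
print(alc([0.55, 0.60, 0.68, 0.74, 0.79, 0.82, 0.84]))  # ~0.72
```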
UTL Challenge: Performance

How do we evaluate the performance of a model without any labels or prior knowledge of the training set? Three proxies:
- the ALC on the Valid set versus the Test set (Phase 1)
- the valid ALC returned by the competition servers (Phases 1 & 2)
- the ALC computed with the given labels (Phase 2)

From Phase 1 to Phase 2, we explored the hyperparameters of the models presented next much more extensively to grab the 1st place.
Deep Architecture: Stack Different Blocks

We used the following template (sketched in code after this list):
1. Pre-processing: PCA with/without whitening, contrast normalization, uniformization
2. Feature extraction: rectifiers, DAE, CAE, µ-ss-RBM
3. Post-processing: transductive PCA
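Conceptually, the template is a composition of three stages, each mapping an (n_samples, n_features) array to a new representation. The sketch below uses hypothetical block names to show the idea; it is not the actual challenge code.

```python
def make_pipeline(*blocks):
    """Compose blocks left to right; each block maps an array of shape
    (n_samples, n_features) to a new representation."""
    def transform(X):
        for block in blocks:
            X = block(X)
        return X
    return transform

# Hypothetical usage with the three-stage template:
# pipeline = make_pipeline(pca_whiten, dae_features, transductive_pca)
# representation = pipeline(raw_data)
```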
Preprocessing

Given a training set D = {x^(j)}_{j=1..n} where x^(j) ∈ R^d:

- Uniformization (t-IDF): rank all the x_i^(j) and map the ranks to [0, 1].
- Contrast normalization: for each x^(j), compute its mean µ^(j) = (1/d) ∑_{i=1}^d x_i^(j) and its standard deviation σ^(j), then set x^(j) ← (x^(j) − µ^(j)) / σ^(j).
- Principal component analysis, with or without whitening, i.e., dividing each component by the square root of its eigenvalue or not.
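A minimal numpy sketch of the three steps, assuming rows are samples and columns are features; these are illustrative implementations (e.g., ties in the ranking are broken arbitrarily), not the challenge code.

```python
import numpy as np

def uniformize(X):
    """Rank each feature across samples and map ranks to [0, 1]."""
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    return ranks / (X.shape[0] - 1)

def contrast_normalize(X, eps=1e-8):
    """Per sample: subtract the mean over features, divide by the std."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / (sigma + eps)

def pca(X, n_components, whiten=False, eps=1e-8):
    """Project centered data on the top principal components; whitening
    divides each component by the square root of its eigenvalue."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:n_components]
    Z = Xc @ eigvec[:, order]
    if whiten:
        Z = Z / np.sqrt(eigval[order] + eps)
    return Z
```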
Feature Extraction: µ-ss-RBM

The µ-Spike & Slab Restricted Boltzmann Machine models the interaction between three random vectors:
1. a visible vector v representing the observed data
2. binary "spike" variables h
3. real-valued "slab" variables s

It is defined by the energy function

E(v, s, h) = − ∑_{i=1}^N v^T W_i s_i h_i + (1/2) v^T (Λ + ∑_{i=1}^N Φ_i h_i) v + (1/2) ∑_{i=1}^N s_i^T α_i s_i
             − ∑_{i=1}^N µ_i^T α_i s_i h_i − ∑_{i=1}^N b_i h_i + ∑_{i=1}^N µ_i^T α_i µ_i h_i.

For training, we use Persistent Contrastive Divergence with a Gibbs sampling procedure.
More details in A. Courville, J. Bergstra and Y. Bengio, "Unsupervised Models of Images by Spike-and-Slab RBMs", ICML 2011.

[Figure: pools of filters learned on CIFAR-10]
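As a concrete reading of the energy function above, the numpy sketch below evaluates E(v, s, h) for a single configuration, storing the diagonal matrices Λ, Φ_i and α_i as vectors. Shapes and names are illustrative assumptions; the PCD/Gibbs training loop is not shown.

```python
import numpy as np

def ssrbm_energy(v, s, h, W, Lam, Phi, alpha, mu, b):
    """Energy of the mu-ss-RBM for one configuration (a sketch).

    Assumed shapes (diagonal matrices stored as vectors):
      v: (D,)       visible vector
      s: (N, K)     slab variables, one K-vector per hidden unit
      h: (N,)       binary spike variables
      W: (N, D, K)  filters;  Lam: (D,);  Phi: (N, D)
      alpha: (N, K) slab precisions;  mu: (N, K);  b: (N,)
    """
    interaction = -np.einsum('d,ndk,nk,n->', v, W, s, h)    # -sum_i v^T W_i s_i h_i
    quadratic_v = 0.5 * v @ ((Lam + Phi.T @ h) * v)         # 1/2 v^T (Lam + sum_i Phi_i h_i) v
    slab_prior  = 0.5 * np.einsum('nk,nk->', alpha, s**2)   # 1/2 sum_i s_i^T alpha_i s_i
    slab_mean   = -np.einsum('nk,nk,nk,n->', mu, alpha, s, h)
    spike_bias  = -(b @ h)
    mean_shift  = np.einsum('nk,nk,nk,n->', mu, alpha, mu, h)
    return interaction + quadratic_v + slab_prior + slab_mean + spike_bias + mean_shift
```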
Feature Extraction: Denoising Autoencoders

A Denoising Autoencoder (DAE) is an autoencoder trained to denoise artificially corrupted training samples.
- Corruption, e.g., x̃ = x + ε where ε ∼ N(0, σ²)
- Encoder: h(x̃) = s(W x̃ + b), where s is the sigmoid function.
- Decoder: r(x̃) = W^T h(x̃) + b′ (tied weights).

Different loss functions to be minimized using stochastic gradient descent:
- ‖r(x̃) − x‖² (linear reconstruction and MSE)
- ‖s(r(x̃)) − x‖² (non-linear reconstruction)
- −∑_i [x_i log r(x̃)_i + (1 − x_i) log(1 − r(x̃)_i)] (cross-entropy)
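A minimal numpy sketch of a DAE with tied weights, Gaussian corruption, a linear decoder and the MSE loss from the first bullet; initialization and hyperparameters are illustrative, not those used in the challenge.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    def __init__(self, n_visible, n_hidden, sigma=0.1, lr=0.01):
        self.W = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))
        self.b = np.zeros(n_hidden)          # encoder bias
        self.b_prime = np.zeros(n_visible)   # decoder bias
        self.sigma, self.lr = sigma, lr

    def step(self, x):
        """One stochastic gradient step on 1/2 ||r(x_tilde) - x||^2."""
        x_tilde = x + rng.normal(0.0, self.sigma, size=x.shape)  # corruption
        h = sigmoid(self.W @ x_tilde + self.b)                   # encoder
        r = self.W.T @ h + self.b_prime                          # decoder (tied weights)
        err = r - x                      # dL/dr
        dh = self.W @ err                # backprop into the hidden layer
        da = dh * h * (1.0 - h)          # through the sigmoid
        # Tied weights: the gradient sums the decoder and encoder terms.
        self.W -= self.lr * (np.outer(h, err) + np.outer(da, x_tilde))
        self.b -= self.lr * da
        self.b_prime -= self.lr * err
        return 0.5 * np.sum(err ** 2)    # loss before the update

# Hypothetical usage on a data matrix X of shape (n_samples, n_features):
# dae = DenoisingAutoencoder(n_visible=X.shape[1], n_hidden=256)
# for x in X:
#     dae.step(x)
# features = sigmoid(X @ dae.W.T + dae.b)  # clean inputs through the encoder
```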