CSCE 479/879 Lecture 5: Autoencoders
Stephen Scott
(Adapted from Eleanor Quint and Ian Goodfellow)
sscott@cse.unl.edu
Introduction
Autoencoding is training a network to replicate its input to its output.
Applications:
- Unlabeled pre-training for semi-supervised learning
- Learning embeddings to support information retrieval
- Generation of new instances similar to those in the training set
- Data compression
Outline
- Basic idea
- Stacking
- Types of autoencoders: denoising, sparse, contractive, variational
- Generative adversarial networks
Basic Idea (Mitchell, 1997)
- Sigmoid activation functions, 5000 training epochs, square loss, no regularization
- What's special about the hidden layer outputs?
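The Mitchell (1997) example is commonly the 8-3-8 identity network: eight one-hot inputs must be reproduced through a three-unit hidden layer, and the hidden units end up computing a roughly binary code. A minimal sketch of that setup, assuming the 8-3-8 configuration (the optimizer settings here are illustrative, not from the slide):

import numpy as np
import tensorflow as tf

# Eight one-hot patterns; the network must reproduce each at its output.
X = np.eye(8, dtype=np.float32)

# 8-3-8 autoencoder: sigmoid units, square loss, no regularization.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),   # hidden layer (the embedding)
    tf.keras.layers.Dense(8, activation="sigmoid"),   # output layer
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.3, momentum=0.9),
              loss="mse")
model.fit(X, X, epochs=5000, verbose=0)

# The hidden outputs tend toward distinct, roughly binary 3-bit codes.
encoder = tf.keras.Model(model.input, model.layers[0].output)
print(np.round(encoder.predict(X, verbose=0), 2))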
Basic Idea
- An autoencoder is a network trained to learn the identity function: output = input
- Subnetwork called the encoder f(·) maps the input to an embedded representation
- Subnetwork called the decoder g(·) maps back to input space
- Can be thought of as lossy compression of the input
- Need to identify the important attributes of inputs to reproduce them faithfully
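A minimal sketch of this encoder/decoder split (the layer sizes and activations are illustrative, not from the slides), with the encoder f available as its own tf.keras model:

import tensorflow as tf

input_dim, embed_dim = 784, 32                 # illustrative sizes

inputs = tf.keras.Input(shape=(input_dim,))
# Encoder f(.): input space -> embedded representation
code = tf.keras.layers.Dense(embed_dim, activation="relu", name="embedding")(inputs)
# Decoder g(.): embedded representation -> back to input space
recon = tf.keras.layers.Dense(input_dim, activation="sigmoid", name="reconstruction")(code)

autoencoder = tf.keras.Model(inputs, recon)
encoder = tf.keras.Model(inputs, code)         # f alone, for extracting embeddings

# Targets are the inputs themselves: the network learns the identity function.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=10)             # X: (unlabeled) training data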
Basic Idea
General types of autoencoders, based on the size of the hidden layer:
- Undercomplete autoencoders have hidden layer size smaller than the input layer size
  ⇒ Dimension of embedded space lower than that of input space ⇒ cannot simply memorize training instances
- Overcomplete autoencoders have much larger hidden layer sizes
  ⇒ Regularize to avoid overfitting, e.g., enforce a sparsity constraint
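One common way to realize the sparsity constraint on an overcomplete hidden layer is an L1 activity penalty on its activations; a hedged sketch (the penalty weight and layer sizes are illustrative):

import tensorflow as tf

input_dim, hidden_dim = 784, 2048              # overcomplete: hidden layer larger than input

inputs = tf.keras.Input(shape=(input_dim,))
# L1 activity penalty pushes most hidden activations toward zero for each input.
code = tf.keras.layers.Dense(
    hidden_dim, activation="relu",
    activity_regularizer=tf.keras.regularizers.l1(1e-5))(inputs)
recon = tf.keras.layers.Dense(input_dim, activation="sigmoid")(code)

overcomplete_ae = tf.keras.Model(inputs, recon)
overcomplete_ae.compile(optimizer="adam", loss="mse")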
Basic Idea
Example: Principal Component Analysis
- A 3-2-3 autoencoder with linear units and square loss performs principal component analysis: find the linear transformation of the data that maximizes variance
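A hedged sketch of that special case (the data generation and training settings are made up for illustration): train a 3-2-3 linear autoencoder with square loss and compare its reconstruction error against projecting onto the top two principal components.

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
# Illustrative 3-D data whose variance is concentrated in two directions.
X = (rng.normal(size=(1000, 3)) * np.array([3.0, 2.0, 0.3])).astype(np.float32)
X -= X.mean(axis=0)

# 3-2-3 autoencoder with linear units and square loss.
inputs = tf.keras.Input(shape=(3,))
code = tf.keras.layers.Dense(2, use_bias=False)(inputs)
recon = tf.keras.layers.Dense(3, use_bias=False)(code)
linear_ae = tf.keras.Model(inputs, recon)
linear_ae.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")
linear_ae.fit(X, X, epochs=200, batch_size=64, verbose=0)

# The learned 2-D code spans (approximately) the top-2 principal subspace.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pca_recon = X @ Vt[:2].T @ Vt[:2]
print("Linear AE reconstruction MSE:", float(linear_ae.evaluate(X, X, verbose=0)))
print("PCA(2)    reconstruction MSE:", float(np.mean((X - pca_recon) ** 2)))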
Stacked Autoencoders
- A stacked autoencoder has multiple hidden layers
- Can share parameters to reduce their number by exploiting symmetry: W4 = W1^T and W3 = W2^T

weights1 = tf.Variable(weights1_init, dtype=tf.float32, name="weights1")
weights2 = tf.Variable(weights2_init, dtype=tf.float32, name="weights2")
weights3 = tf.transpose(weights2, name="weights3")  # shared weights
weights4 = tf.transpose(weights1, name="weights4")  # shared weights
Stacked Autoencoders
Incremental Training
- Can simplify training by starting with a single hidden layer H1
- Then train a second AE to mimic the output of H1
- Insert this into the first network
- Can build by using H1's output as the training set for Phase 2
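A hedged sketch of this greedy, layer-by-layer scheme (layer sizes, data shapes, and epoch counts are illustrative): train a one-hidden-layer AE on the raw data, train a second AE on the first hidden layer's outputs, then stack the pieces into one network.

import numpy as np
import tensorflow as tf

def train_ae(data, hidden_units, epochs=10):
    """Train a one-hidden-layer AE on `data`; return (encoder model, decoder layer)."""
    inputs = tf.keras.Input(shape=(data.shape[1],))
    hidden = tf.keras.layers.Dense(hidden_units, activation="relu")(inputs)
    recon = tf.keras.layers.Dense(data.shape[1])(hidden)   # linear output, square loss
    ae = tf.keras.Model(inputs, recon)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(data, data, epochs=epochs, verbose=0)
    return tf.keras.Model(inputs, hidden), ae.layers[-1]

X = np.random.rand(256, 784).astype("float32")   # stand-in for unlabeled training data

# Phase 1: AE with hidden layer H1 on the raw inputs.
enc1, dec1 = train_ae(X, 300)
# Phase 2: second AE trained on H1's outputs, as on the slide.
h1_out = enc1.predict(X, verbose=0)
enc2, dec2 = train_ae(h1_out, 150)

# Insert the second AE into the first network: x -> H1 -> H2 -> H1' -> x'.
stacked = tf.keras.Sequential([enc1, enc2, dec2, dec1])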
Stacked Autoencoders
Incremental Training (Single TF Graph)
- The previous approach requires multiple TensorFlow graphs
- Can instead train both phases in a single graph: first the left side, then the right
Stacked Autoencoders
Visualization
- Input MNIST digit vs. network output
- Weights (features selected) for five nodes from H1
Stacked Autoencoders
Semi-Supervised Learning
- Can pre-train the network with unlabeled data ⇒ learn useful features, then train the "logic" of the dense layer with labeled data
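A hedged sketch of that workflow (layer sizes, the class count, and the data names X_unlabeled, X_labeled, y_labeled are placeholders): pre-train an autoencoder on unlabeled data, then reuse its encoder beneath a dense classification layer trained on the labeled data.

import tensorflow as tf

input_dim, embed_dim, n_classes = 784, 64, 10    # illustrative

# Phase 1: unsupervised pre-training as an autoencoder.
inputs = tf.keras.Input(shape=(input_dim,))
code = tf.keras.layers.Dense(embed_dim, activation="relu")(inputs)
recon = tf.keras.layers.Dense(input_dim, activation="sigmoid")(code)
ae = tf.keras.Model(inputs, recon)
ae.compile(optimizer="adam", loss="mse")
# ae.fit(X_unlabeled, X_unlabeled, epochs=10)     # plenty of unlabeled data

# Phase 2: keep the pre-trained encoder, add a dense "logic" layer, train on labels.
encoder = tf.keras.Model(inputs, code)
encoder.trainable = False                         # optionally freeze the learned features
classifier = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(X_labeled, y_labeled, epochs=20)  # small labeled set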
Transfer Learning from a Trained Classifier
- Can also transfer from a classifier trained on a different task, e.g., transfer a GoogleNet architecture to ultrasound classification
- Often choose an existing one from a model zoo
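For instance, a hedged sketch using tf.keras.applications as the model zoo; InceptionV3 is used here as a stand-in for the GoogleNet/Inception family, and the ultrasound data names and class count are placeholders:

import tensorflow as tf

# Pre-trained Inception backbone (ImageNet weights), without its original classifier head.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                         input_shape=(299, 299, 3), pooling="avg")
base.trainable = False                       # first reuse the learned features as-is

n_classes = 3                                # placeholder for the target task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(n_classes, activation="softmax"),  # new head for the new task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(ultrasound_images, ultrasound_labels, epochs=5)  # placeholder data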
Transposed Convolutions
- What if some encoder layers are convolutional? How to upsample to the original resolution?
- Can use, e.g., linear interpolation, bilinear interpolation, etc.
- Or transposed convolution, e.g., tf.layers.conv2d_transpose
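A hedged decoder fragment using the Keras counterpart tf.keras.layers.Conv2DTranspose (the feature-map sizes below are illustrative, chosen to upsample 7×7 features back to a 28×28 image):

import tensorflow as tf

decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(7, 7, 32)),        # illustrative encoder output
    # stride 2 doubles the spatial resolution: 7x7 -> 14x14
    tf.keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="same",
                                    activation="relu"),
    # 14x14 -> 28x28, single output channel
    tf.keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="same",
                                    activation="sigmoid"),
])
print(decoder.output_shape)                  # (None, 28, 28, 1)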
Transposed Convolutions (2)
- Consider this example convolution
Transposed Convolutions (3)
- An alternative way of representing the kernel
Transposed Convolutions (4)
- This representation works with matrix multiplication on the flattened input:
Transposed Convolutions (5)
- Transpose the kernel matrix and multiply by the flat 2×2 to get a flat 4×4
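A hedged NumPy sketch of the idea on slides (4) and (5), assuming the usual 4×4 input / 3×3 kernel example (the specific kernel and input values are illustrative): build the sparse matrix C so that C @ x_flat gives the flat 2×2 convolution output, then Cᵀ @ y_flat maps a flat 2×2 back up to a flat 4×4.

import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Build C so that C @ x.flatten() equals the 'valid' convolution output, flattened."""
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(k_h):
                for b in range(k_w):
                    C[i * out_w + j, (i + a) * in_w + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(1, 10).reshape(3, 3)   # illustrative 3x3 kernel
x = np.arange(16).reshape(4, 4)           # illustrative 4x4 input

C = conv_matrix(kernel, 4, 4)             # shape (4, 16): the convolution as a matrix
y = C @ x.flatten()                       # flat 2x2 output of the convolution
print(y.reshape(2, 2))

up = C.T @ y                              # transposed: flat 2x2 back up to flat 4x4
print(up.reshape(4, 4))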
Denoising Autoencoders
Vincent et al. (2010)
- Can train an autoencoder to learn to denoise input by giving it a corrupted instance x̃ as input and targeting the uncorrupted instance x
- Example noise models:
  - Gaussian noise: x̃ = x + z, where z ∼ N(0, σ²I)
  - Masking noise: zero out some fraction ν of the components of x
  - Salt-and-pepper noise: choose some fraction ν of the components of x and set each to its min or max value (equally likely)
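A hedged NumPy sketch of the three corruption processes (the noise levels σ and ν below are illustrative); a denoising autoencoder is then trained with corrupt(x) as input and the clean x as the target.

import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(x, sigma=0.1):
    """x_tilde = x + z, with z ~ N(0, sigma^2 I)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def masking_noise(x, nu=0.25):
    """Zero out a fraction nu of the components of x."""
    x_tilde = x.copy()
    x_tilde[rng.random(x.shape) < nu] = 0.0
    return x_tilde

def salt_and_pepper_noise(x, nu=0.25, lo=0.0, hi=1.0):
    """Set a fraction nu of the components of x to the min or max value, equally likely."""
    x_tilde = x.copy()
    corrupt = rng.random(x.shape) < nu
    x_tilde[corrupt] = rng.choice([lo, hi], size=x.shape)[corrupt]
    return x_tilde

# Training pairs for a denoising AE: corrupted input, clean target.
# model.fit(masking_noise(X_train), X_train, epochs=10)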
Denoising Autoencoders
Denoising Autoencoders
Example
Denoising Autoencoders
How does it work?
- Even though, e.g., MNIST data are in a 784-dimensional space, they lie on a low-dimensional manifold that captures their most important features
- The corruption process moves an instance x off of the manifold
- The encoder f_θ and decoder g_θ′ are trained to project x̃ back onto the manifold