CSCE 496/896 Lecture 5: Autoencoders
Stephen Scott
(Adapted from Paul Quint and Ian Goodfellow)
sscott@cse.unl.edu

Introduction
Autoencoding is training a network to replicate its input to its output
Applications:
  - Unlabeled pre-training for semi-supervised learning
  - Learning embeddings to support information retrieval
  - Generation of new instances similar to those in the training set
  - Data compression

Outline
  - Basic idea
  - Stacking
  - Types of autoencoders
    - Denoising
    - Sparse
    - Contractive
    - Variational
  - Generative adversarial networks

Basic Idea
[Figure] Sigmoid activation functions, 5000 training epochs, square loss, no regularization
What’s special about the hidden layer outputs?

Basic Idea
An autoencoder is a network trained to learn the identity function: output = input
  - Subnetwork called encoder $f(\cdot)$ maps input to an embedded representation
  - Subnetwork called decoder $g(\cdot)$ maps back to input space
Can be thought of as lossy compression of input
The network must identify the important attributes of the inputs in order to reproduce them faithfully

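The encoder/decoder composition above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 8-3-8 layer sizes, the random untrained weights, and the helper names `dense`, `make_layer`, and `reconstruction_loss` are hypothetical and do not come from the lecture; only the structure (decoder g applied to encoder f, sigmoid units, square loss) follows the slides.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dense(x, weights, biases):
    """One sigmoid layer; weights holds one row of input weights per output unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    """Random, untrained parameters (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
enc_w, enc_b = make_layer(8, 3, rng)  # encoder f: 8 inputs -> 3-unit embedding
dec_w, dec_b = make_layer(3, 8, rng)  # decoder g: 3-unit embedding -> 8 outputs

def f(x):
    return dense(x, enc_w, enc_b)

def g(h):
    return dense(h, dec_w, dec_b)

def reconstruction_loss(x):
    """Square loss between the input x and its reconstruction g(f(x))."""
    x_hat = g(f(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
h = f(x)                       # embedded representation (3 values)
loss = reconstruction_loss(x)  # quantity that training would minimize
```

Training would adjust both weight sets to drive `reconstruction_loss` down over the training set; the sketch stops at the forward pass.
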
Basic Idea
General types of autoencoders based on size of hidden layer
  - Undercomplete autoencoders have hidden layer size smaller than input layer size
    ⇒ Dimension of embedded space lower than that of input space
    ⇒ Cannot simply memorize training instances
  - Overcomplete autoencoders have much larger hidden layer sizes
    ⇒ Regularize to avoid overfitting, e.g., enforce a sparsity constraint

Basic Idea
Example: Principal Component Analysis
[Figure]
A 3-2-3 autoencoder with linear units and square loss performs principal component analysis: Find linear transformation of data to maximize variance

Stacked Autoencoders
A stacked autoencoder has multiple hidden layers
Can share parameters to reduce their number by exploiting symmetry: $W_4 = W_1^\top$ and $W_3 = W_2^\top$

    import tensorflow as tf

    # weights1_init and weights2_init are initial-value tensors defined elsewhere
    weights1 = tf.Variable(weights1_init, dtype=tf.float32, name="weights1")
    weights2 = tf.Variable(weights2_init, dtype=tf.float32, name="weights2")
    weights3 = tf.transpose(weights2, name="weights3")  # shared weights
    weights4 = tf.transpose(weights1, name="weights4")  # shared weights

Stacked Autoencoders
Incremental Training
  - Can simplify training by starting with single hidden layer $H_1$
  - Then, train a second AE to mimic the output of $H_1$
  - Insert this into first network
  - Can build by using $H_1$’s output as training set for Phase 2

Stacked Autoencoders
Incremental Training (Single TF Graph)
[Figure]
  - Previous approach requires multiple TensorFlow graphs
  - Can instead train both phases in a single graph: First left side, then right

Stacked Autoencoders
Visualization
[Figure: input MNIST digit and network output]
[Figure: weights (features selected) for five nodes from $H_1$]

Stacked Autoencoders
Semi-Supervised Learning
Can pre-train network with unlabeled data ⇒ learn useful features and then train “logic” of dense layer with labeled data

Transfer Learning from Trained Classifier
  - Can also transfer from a classifier trained on a different task, e.g., transfer a GoogLeNet architecture to ultrasound classification
  - Often choose an existing one from a model zoo

Denoising Autoencoders
Vincent et al. (2010)
Can train an autoencoder to denoise its input by feeding it a corrupted instance $\tilde{x}$ and targeting the uncorrupted instance $x$
Example noise models:
  - Gaussian noise: $\tilde{x} = x + z$, where $z \sim \mathcal{N}(0, \sigma^2 I)$
  - Masking noise: zero out some fraction $\nu$ of components of $x$
  - Salt-and-pepper noise: choose some fraction $\nu$ of components of $x$ and set each to its min or max value (equally likely)

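The three noise models listed above can be sketched with the Python standard library. The function names and the `lo`/`hi` arguments are hypothetical, and the sketch assumes each instance is a flat list of floats whose min and max values are 0 and 1 (as for pixel intensities scaled to that range); treat it as an illustration rather than the corruption code used in the lecture.

```python
import random

def gaussian_noise(x, sigma, rng):
    """x~ = x + z with z ~ N(0, sigma^2 I): independent noise per component."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def masking_noise(x, nu, rng):
    """Zero out a fraction nu of the components, chosen uniformly at random."""
    k = round(nu * len(x))
    chosen = set(rng.sample(range(len(x)), k))
    return [0.0 if i in chosen else xi for i, xi in enumerate(x)]

def salt_and_pepper_noise(x, nu, rng, lo=0.0, hi=1.0):
    """Set a fraction nu of the components to lo or hi, each equally likely."""
    k = round(nu * len(x))
    chosen = set(rng.sample(range(len(x)), k))
    return [rng.choice((lo, hi)) if i in chosen else xi
            for i, xi in enumerate(x)]
```

During training, the corrupted list would be the network input and the original `x` the target, so the loss compares the reconstruction against the clean instance.
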
Denoising Autoencoders
Example
[Figure]

Denoising Autoencoders
How does it work?
  - Even though, e.g., MNIST data are in a 784-dimensional space, they lie on a low-dimensional manifold that captures their most important features
  - Corruption process moves instance $x$ off of manifold
  - Encoder $f_\theta$ and decoder $g_{\theta'}$ are trained to project $\tilde{x}$ back onto manifold

Sparse Autoencoders
  - An overcomplete architecture
  - Regularize outputs of hidden layer to enforce sparsity:
    $$\tilde{J}(x) = J(x, g(f(x))) + \alpha\,\Omega(h),$$
    where $J$ is loss function, $f$ is encoder, $g$ is decoder, $h = f(x)$, and $\Omega$ penalizes non-sparsity of $h$
  - E.g., can use $\Omega(h) = \sum_i |h_i|$ and ReLU activation to force many zero outputs in hidden layer
  - Can also measure average activation of $h_i$ across mini-batch and compare it to user-specified target sparsity value $p$ (e.g., 0.1) via square error or Kullback-Leibler divergence:
    $$p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q},$$
    where $q$ is average activation of $h_i$ over mini-batch

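Both sparsity penalties above are short enough to write out directly. The following is a minimal sketch, assuming a mini-batch is a list of hidden-activation lists and that the average activations lie strictly between 0 and 1 (e.g., sigmoid units) so the logarithms are defined; the function names are hypothetical.

```python
import math

def l1_penalty(h):
    """Omega(h) = sum_i |h_i| for one hidden activation vector."""
    return sum(abs(hi) for hi in h)

def kl_sparsity(p, q):
    """KL divergence between target sparsity p and average activation q."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def mean_activations(batch):
    """Average activation q of each hidden unit h_i over the mini-batch."""
    n = len(batch)
    return [sum(column) / n for column in zip(*batch)]

def kl_penalty(batch, p=0.1):
    """Sum the per-unit KL terms; each term is zero when that unit's q equals p."""
    return sum(kl_sparsity(p, q) for q in mean_activations(batch))
```

Either penalty would be multiplied by $\alpha$ and added to the reconstruction loss, as in the regularized objective on this slide.
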
Contractive Autoencoders
  - Similar to sparse autoencoder, but use
    $$\Omega(h) = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( \frac{\partial h_i}{\partial x_j} \right)^2$$
  - I.e., penalize large partial derivatives of encoder outputs with respect to input values
  - This contracts the output space by mapping input points in a neighborhood near $x$ to a smaller output neighborhood near $f(x)$
    ⇒ Resists perturbations of input $x$
  - If $h$ has sigmoid activation, the encoding is nearly binary, and a contractive autoencoder (CE) pushes embeddings to corners of a hypercube

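For a single-layer sigmoid encoder the contractive penalty has a closed form, which the sketch below computes and checks against finite differences. The single-layer form $h_i = \mathrm{sigmoid}(\sum_j W_{ij} x_j + b_i)$ is an assumption made for illustration (the slide does not fix the encoder architecture), and all function names are hypothetical. Under that assumption, $\partial h_i / \partial x_j = h_i (1 - h_i) W_{ij}$.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def encode(x, W, b):
    """Single sigmoid layer: h_i = sigmoid(sum_j W[i][j] * x[j] + b[i])."""
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def contractive_penalty(x, W, b):
    """Sum of squared partials, using dh_i/dx_j = h_i * (1 - h_i) * W[i][j]."""
    total = 0.0
    for hi, row in zip(encode(x, W, b), W):
        slope = hi * (1.0 - hi)
        total += sum((slope * wij) ** 2 for wij in row)
    return total

def numeric_penalty(x, W, b, eps=1e-5):
    """Same quantity via central finite differences, as a sanity check."""
    total = 0.0
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        h_plus = encode(x_plus, W, b)
        h_minus = encode(x_minus, W, b)
        total += sum(((hp - hm) / (2 * eps)) ** 2
                     for hp, hm in zip(h_plus, h_minus))
    return total
```

The factor $h_i (1 - h_i)$ vanishes as $h_i$ approaches 0 or 1, which is one way to see why the penalty favors saturated, near-binary encodings.
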
Variational Autoencoders
  - A VAE is an autoencoder that is also a generative model
    ⇒ Can generate new instances according to a probability distribution
    - Other generative models include, e.g., hidden Markov models and Bayesian networks
    - Contrast with discriminative models, which predict classifications
  - Encoder $f$ outputs $[\mu, \sigma]^\top$
    - Pair $(\mu_i, \sigma_i)$ parameterizes Gaussian distribution for dimension $i = 1, \ldots, n$
    - Draw $z_i \sim \mathcal{N}(\mu_i, \sigma_i)$
    - Decode this latent variable $z$ to get $g(z)$

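The draw-then-decode step above can be sketched as follows. Two assumptions are made that the slide does not state: $\sigma_i$ is treated as a standard deviation, and the draw is written as $\mu_i + \sigma_i \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, which has the same distribution as drawing from the Gaussian directly. `toy_decoder` is a hypothetical stand-in for a trained decoder $g$.

```python
import random

def sample_latent(mu, sigma, rng):
    """Draw each z_i independently from a Gaussian with mean mu_i, std sigma_i."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def generate(mu, sigma, decoder, rng):
    """Sample a latent vector z and decode it to get g(z)."""
    z = sample_latent(mu, sigma, rng)
    return decoder(z)

def toy_decoder(z):
    # Hypothetical placeholder for the trained decoder g.
    return [2.0 * zi for zi in z]
```

In a real VAE, `mu` and `sigma` would be the encoder outputs for a given input, and `decoder` would be the trained network.
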
Variational Autoencoders
Latent Variables
  - Independence of $z$ dimensions makes it easy to generate instances wrt complex distributions via decoder $g$
  - Latent variables can be thought of as values of attributes describing inputs
    - E.g., for MNIST, latent variables might represent “thickness”, “slant”, “loop closure”

Variational Autoencoders
Architecture
[Figure]