Understanding Geometric Attributes with Autoencoders
Alasdair Newson, Télécom ParisTech
alasdair.newson@telecom-paristech.fr
April 3, 2019
Subject of this talk
Understanding geometric attributes of images with autoencoders
(Diagram: Encoder → latent code → Decoder)
Subjects of this talk:
1. Understand how autoencoders can encode/decode basic geometric attributes: size, position
2. Propose an autoencoder algorithm which effectively separates different image attributes in the latent space: a PCA-like autoencoder
   - Encourage meaningful interpolation and navigation of the latent space
Collaborators
This work was carried out in collaboration with the following colleagues:
Saïd Ladjal (Télécom ParisTech), Andrés Almansa (Université Paris Descartes), Chi-Hieu Pham (Télécom ParisTech), Yann Gousseau (Télécom ParisTech)
Introduction: Autoencoders
- Deep neural networks: cascaded operations (filtering, non-linearities); great flexibility, able to approximate a large class of functions
- Autoencoder: a neural network designed for compressing and uncompressing data
Goal(s) of this (ongoing) work:
- Describe the mechanisms autoencoders use to encode/decode simple geometric shapes
- Propose an autoencoder architecture/algorithm where the latent space is interpretable, allowing meaningful interpolation and navigation of the latent space
Autoencoders: introduction
What are autoencoders?
- Autoencoder (AE): a neural network which compresses (encoding) and decompresses (decoding) some input information
(Diagram: Encoder → Decoder)
- Often uses convolution and subsampling/upsampling
- Underlying goal: learn the data manifold/space
Introduction
What are autoencoders used for?
- Synthesis of high-level/abstract images (synthesis examples from "Real NVP" †)
- They produce impressive results; however, the autoencoder mechanisms are not necessarily understood
- Our work attempts to understand the underlying mechanisms, and to create interpretable latent spaces
† Density estimation using Real NVP, L. Dinh, J. Sohl-Dickstein, S. Bengio, arXiv:1605.08803, 2016
Summary
1. Autoencoding size (disks)
2. Autoencoding position
3. PCA-like autoencoder
4. Applications and future work
Disk autoencoder: problem setup
Autoencoding size: can AEs encode and decode a disk "optimally", and if so, how?
- Training set: square images of size 64 × 64, each containing one centred disk of random radius r
- Images are blurred slightly to avoid a discrete parameterisation
- Optimality: perfect reconstruction, x = D ∘ E(x), with the smallest latent dimension d possible (here d = 1)
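For concreteness, a minimal NumPy/SciPy sketch of how such a training set could be generated; the radius range, blur width and sample count are illustrative assumptions, not taken from the talk.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_disk(radius, size=64, sigma=1.0):
    """Centred disk of the given radius in a size x size image, slightly blurred."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    disk = ((xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2).astype(np.float32)
    return gaussian_filter(disk, sigma=sigma)

# Radii drawn uniformly (illustrative range), one centred disk per image
rng = np.random.default_rng(0)
radii = rng.uniform(2.0, 30.0, size=1000)
x_train = np.stack([make_disk(r) for r in radii])[:, None, :, :]  # shape (N, 1, 64, 64)
```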
Disk autoencoder: problem setup
Disk autoencoder design
- Encoder: six stages of [Conv 3×3 → subsampling → bias → LeakyReLU]
- Decoder: six stages of [Conv 3×3 → upsampling → bias → LeakyReLU]
Four operations: convolution, sub/up-sampling, additive biases, and the Leaky ReLU
$$\varphi_\alpha(t) = \begin{cases} t, & \text{if } t > 0 \\ \alpha t, & \text{if } t \leq 0 \end{cases}$$
The number of layers is determined by the subsampling factor s = 1/2.
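A possible PyTorch sketch of this kind of architecture. The channel widths, the use of strided convolutions for the subsampling, and the final 1×1 projection to a scalar latent code are my own assumptions for illustration; the talk does not specify these details.

```python
import torch
import torch.nn as nn

class DiskAE(nn.Module):
    def __init__(self, ch=16, alpha=0.2):
        super().__init__()
        enc, c_in = [], 1
        for _ in range(6):                                   # 64 -> 32 -> ... -> 1
            enc += [nn.Conv2d(c_in, ch, 3, stride=2, padding=1),  # conv + subsampling + bias
                    nn.LeakyReLU(alpha)]
            c_in = ch
        enc += [nn.Conv2d(c_in, 1, 1)]                       # project to a scalar code (d = 1)
        self.encoder = nn.Sequential(*enc)

        dec = [nn.Conv2d(1, ch, 1)]
        for _ in range(6):                                   # 1 -> 2 -> ... -> 64
            dec += [nn.Upsample(scale_factor=2, mode='nearest'),
                    nn.Conv2d(ch, ch, 3, padding=1),         # conv + bias
                    nn.LeakyReLU(alpha)]
        dec += [nn.Conv2d(ch, 1, 3, padding=1)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```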
Disk autoencoder
Disk autoencoding training minimisation problem:
$$\hat{\Theta}_E, \hat{\Theta}_D = \arg\min_{\Theta_E, \Theta_D} \sum_{x_r} \left\| D \circ E(x_r) - x_r \right\|_2^2 \qquad (1)$$
- $\Theta_E, \Theta_D$: parameters of the network (weights and biases)
- $x_r$: image containing a disk of radius r
NB: we do not enter into the minimisation details here (Adam optimiser).
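A hedged sketch of the corresponding training loop for Eq. (1) with the Adam optimiser, reusing the DiskAE and x_train sketches above; batch size, learning rate and epoch count are illustrative choices.

```python
import torch

model = DiskAE()                                     # architecture sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.from_numpy(x_train)                     # synthetic disks from the sketch above

for epoch in range(100):
    for batch in torch.split(data, 64):
        opt.zero_grad()
        loss = ((model(batch) - batch) ** 2).mean()  # || D(E(x_r)) - x_r ||^2
        loss.backward()
        opt.step()
```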
Investigating autoencoders
First question: can we compress disks to 1 dimension? Yes!
(Figure: input x and reconstructed output y)
Let us try to understand how this works.
Investigating autoencoders
How does the autoencoder work in the case of disks?
- First idea: inspect the network weights
- Unfortunately, these are very difficult to interpret
(Figure: example of learned weights, 3 × 3 convolutions)
Investigating autoencoders
How does the encoder work? Inspect the latent space.
(Plot: latent code z as a function of disk radius r)
- The encoding is relatively simple to understand: an averaging filter
- How about decoding? Inspecting weights and biases is tricky
- We can describe the decoding function when we remove the biases (ablation study)
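If the encoder really behaves like an averaging filter, the latent code should grow roughly like the disk area, i.e. like r². A quick illustrative check using the synthetic disks above (this mimics, but is not, the talk's exact experiment):

```python
import numpy as np

radii = np.arange(2, 31)
z = np.array([make_disk(r).mean() for r in radii])   # "averaging filter" used as encoder
area = np.pi * radii.astype(float) ** 2 / 64 ** 2    # normalised disk area
print(np.corrcoef(z, area)[0, 1])                    # close to 1: z is essentially the area
```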
Decoding a disk
Ablation study: remove the biases of the network.
(Plots: disk profile vs. output profile y(t) as a function of spatial position t, for inputs and outputs of several disk sizes)
Investigating autoencoders
Positive multiplicative action of the decoder without bias
Consider a decoder, without biases, with
$$D_{\ell+1} = \mathrm{LeakyReLU}_\alpha\!\left( U(D_\ell) * w_\ell \right),$$
where U is an upsampling operator. In this case, we have
$$\forall z,\ \forall \lambda \in \mathbb{R}^+,\quad D(\lambda z) = \lambda D(z). \qquad (2)$$
Proof (one layer):
$$\mathrm{LeakyReLU}_\alpha\!\left( U(\lambda z) * w_\ell \right) = \lambda \max\!\left( U(z) * w_\ell,\, 0 \right) + \lambda \alpha \min\!\left( U(z) * w_\ell,\, 0 \right) = \lambda\, \mathrm{LeakyReLU}_\alpha\!\left( U(z) * w_\ell \right),$$
and applying this at every layer gives $D(\lambda z) = \lambda D(z)$.
- The output can therefore be written y = h(r) f, with f learned during training
- In the case without bias, we can rewrite the training problem in a simpler form
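A small numerical check of this homogeneity property, using a toy bias-free decoder; the layer sizes are arbitrary, only the absence of biases matters.

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Upsample(scale_factor=2), nn.Conv2d(1, 8, 3, padding=1, bias=False), nn.LeakyReLU(0.2),
    nn.Upsample(scale_factor=2), nn.Conv2d(8, 8, 3, padding=1, bias=False), nn.LeakyReLU(0.2),
    nn.Upsample(scale_factor=2), nn.Conv2d(8, 1, 3, padding=1, bias=False),
)

z = torch.randn(1, 1, 1, 1)
lam = 3.7                                   # any lambda >= 0
print(torch.allclose(decoder(lam * z), lam * decoder(z), atol=1e-5))  # True
```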
Decoding a disk
Disk autoencoding training problem (continuous case), without biases:
$$\hat{f} = \arg\max_f \int_0^R \langle f, \mathbb{1}_{B_r} \rangle^2 \, dr \qquad (3)$$
Proof: the continuous training minimisation problem can be written as
$$\hat{f}, \hat{h} = \arg\min_{f, h} \int_0^R \int_\Omega \left( h(r) f(t) - \mathbb{1}_{B_r}(t) \right)^2 dt \, dr \qquad (4)$$
Also, for a fixed f, the optimal h is given by
$$\hat{h}(r) = \frac{\langle f, \mathbb{1}_{B_r} \rangle}{\| f \|_2^2} \qquad (5)$$
Decoding a disk
- We insert the optimal $\hat{h}(r)$, and choose the (arbitrary) normalisation $\| f \|_2^2 = 1$
- Since the disks are radially symmetric, the integration can be simplified to one dimension
This gives us the final result:
$$\hat{f} = \arg\min_f \; -\int_0^R \langle f, \mathbb{1}_{B_r} \rangle^2 \, dr \qquad (6)$$
$$\;\;\; = \arg\max_f \int_0^R \langle f, \mathbb{1}_{B_r} \rangle^2 \, dr. \qquad (7)$$
The first variation of the functional in Equation (3) leads to a differential equation, Airy's equation:
$$f''(\rho) = -k f(\rho)\, \rho, \qquad (8)$$
with f(0) = 1, f'(0) = 0.
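Equation (8) can be integrated numerically; a SciPy sketch follows. The constant k depends on the normalisation of f, so its value here is an arbitrary illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 1e-3

def rhs(rho, y):                        # y = [f, f']
    return [y[1], -k * y[0] * rho]      # f'' = -k * f * rho

sol = solve_ivp(rhs, (0.0, 30.0), [1.0, 0.0], dense_output=True)
rho = np.linspace(0.0, 30.0, 200)
f = sol.sol(rho)[0]                     # radial profile to compare with the decoder's output
```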
Decoding a disk
The functional is indeed minimised by the training procedure.
(Plot: comparison of the radial profile f(t) obtained by the autoencoder, by numerical minimisation of the energy, and from Airy's function)
Decoding a disk
Summary of the disk encoder/decoder:
- Encoder: integration (an averaging filter) is sufficient
- Decoder: a learned function, scaled and thresholded
Further work: apply to general scaling (scaled MNIST data)
Decoding a disk
Further questions:
- What happens when samples are missing from the database? (Image synthesis results of "Real NVP" †)
- Is it possible to interpolate in the latent space?
† Density estimation using Real NVP, L. Dinh, J. Sohl-Dickstein, S. Bengio, arXiv:1605.08803, 2016
Investigating autoencoders
Interpolation of disks in the learned space
(Figure: effect of linearly increasing z)
- Interpolation in the latent space is meaningful here
- What about interpolating inside regions unobserved in the data set?
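The interpolation experiment amounts to decoding a linearly spaced set of latent codes between those of a small and a large disk. A sketch reusing the model and make_disk helpers above; the chosen radii and number of steps are illustrative.

```python
import torch

model.eval()
with torch.no_grad():
    z_small = model.encoder(torch.from_numpy(make_disk(5.0))[None, None])
    z_large = model.encoder(torch.from_numpy(make_disk(25.0))[None, None])
    for t in torch.linspace(0.0, 1.0, 8):
        z = (1 - t) * z_small + t * z_large
        disk = model.decoder(z)         # ideally a disk of intermediate radius
```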
Interpolation
Interpolating disks: we trained our AE with radii of 11–18 pixels missing from the training set.
(Figure: input disks and their reconstructed outputs)
Interpolation
What is this due to? Inspect the latent space.
(Plot: disk radius r as a function of latent code z)
How can this be remedied? Regularisation of the latent space.
Interpolation
Various regularisation approaches are available:
- Maintaining norms between objects in the latent space
- Denoising AEs, etc.
- Regularising the weights
Three types considered:
- ℓ2 regularisation in the latent space (type 1): $\left( \| x - x' \|_2^2 - \| E(x) - E(x') \|_2^2 \right)^2$
- Denoising autoencoder (type 2): $\sum_{\ell=1}^{L} \| D(E(x + \eta)) - x \|_2^2$
- Weight regularisation of the encoder (type 3): $\sum_{\ell=1}^{L} \| W_\ell \|_2^2$
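Hedged PyTorch sketches of these three regularisation terms as I read them; the variable names, noise model and reductions are my own choices, not the talk's implementation.

```python
import torch

def latent_l2_reg(E, x, x2):
    """Type 1: keep pairwise distances comparable in image and latent space."""
    d_img = ((x - x2) ** 2).flatten(1).sum(1)
    d_lat = ((E(x) - E(x2)) ** 2).flatten(1).sum(1)
    return ((d_img - d_lat) ** 2).mean()

def denoising_reg(E, D, x, sigma=0.1):
    """Type 2: denoising autoencoder term, reconstruct x from a noisy input."""
    noisy = x + sigma * torch.randn_like(x)
    return ((D(E(noisy)) - x) ** 2).mean()

def encoder_weight_reg(E):
    """Type 3: l2 penalty on the encoder's convolution weights."""
    return sum((w ** 2).sum() for w in E.parameters() if w.dim() > 1)
```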