Understanding Geometric Attributes with Autoencoders
Alasdair Newson, Télécom ParisTech
alasdair.newson@telecom-paristech.fr
April 3, 2019
Subject of this talk
Understanding geometric attributes of images with autoencoders
(Diagram: Encoder → latent code → Decoder)
Subjects of this talk:
1. Understand how autoencoders can encode/decode basic geometric attributes: size, position
2. Propose an autoencoder algorithm which effectively separates different image attributes in the latent space: a PCA-like autoencoder
   - Encourage meaningful interpolation and navigation of the latent space
Collaborators
This work was carried out in collaboration with the following colleagues:
Saïd Ladjal (Télécom ParisTech), Andrés Almansa (Université Paris Descartes), Chi-Hieu Pham (Télécom ParisTech), Yann Gousseau (Télécom ParisTech)
Introduction: Autoencoders
- Deep neural networks: cascaded operations (filtering, non-linearities); great flexibility, able to approximate a large class of functions
- Autoencoder: a neural network designed for compressing and uncompressing data
Goal(s) of this (ongoing) work:
- Describe the mechanisms autoencoders use to encode/decode simple geometric shapes
- Propose an autoencoder architecture/algorithm where the latent space is interpretable, allowing meaningful interpolation and navigation of the latent space
Autoencoders: introduction
What are autoencoders?
- Autoencoder (AE): a neural network which compresses (encoding) and decompresses (decoding) some input information
(Diagram: Encoder → Decoder)
- Often uses convolution and subsampling/upsampling
- Underlying goal: learn the data manifold/space
Introduction
What are autoencoders used for?
- Synthesis of high-level/abstract images (synthesis examples from "Real NVP" †)
- They produce impressive results; however, the autoencoder mechanisms are not necessarily understood
- Our work attempts to understand the underlying mechanisms, and to create interpretable latent spaces
† Density estimation using Real NVP, L. Dinh, J. Sohl-Dickstein, S. Bengio, arXiv:1605.08803, 2016
Summary
1. Autoencoding size (disks)
2. Autoencoding position
3. PCA-like autoencoder
4. Applications and future work
Disk autoencoder: problem setup
Autoencoding size: can AEs encode and decode a disk "optimally", and if so, how?
- Training set: square images of size 64 × 64, each containing one centred disk of random radius r
- Images are blurred slightly to avoid a discrete parameterisation
- Optimality: perfect reconstruction, x = D ∘ E(x), with the smallest latent dimension d possible (here d = 1)
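For concreteness, a minimal NumPy/SciPy sketch of how such a training set could be generated; the radius range, blur width and sample count are illustrative assumptions, not taken from the talk.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_disk(radius, size=64, sigma=1.0):
    """Centred disk of the given radius in a size x size image, slightly blurred."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    disk = ((xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2).astype(np.float32)
    return gaussian_filter(disk, sigma=sigma)

# Radii drawn uniformly (illustrative range), one centred disk per image
rng = np.random.default_rng(0)
radii = rng.uniform(2.0, 30.0, size=1000)
x_train = np.stack([make_disk(r) for r in radii])[:, None, :, :]  # shape (N, 1, 64, 64)
```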
Disk autoencoder: problem setup
Disk autoencoder design
- Encoder: six stages of [Conv 3×3 → subsampling → bias → LeakyReLU]
- Decoder: six stages of [Conv 3×3 → upsampling → bias → LeakyReLU]
Four operations: convolution, sub/up-sampling, additive biases, and the Leaky ReLU
$$\varphi_\alpha(t) = \begin{cases} t, & \text{if } t > 0 \\ \alpha t, & \text{if } t \leq 0 \end{cases}$$
The number of layers is determined by the subsampling factor s = 1/2.
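A possible PyTorch sketch of this kind of architecture. The channel widths, the use of strided convolutions for the subsampling, and the final 1×1 projection to a scalar latent code are my own assumptions for illustration; the talk does not specify these details.

```python
import torch
import torch.nn as nn

class DiskAE(nn.Module):
    def __init__(self, ch=16, alpha=0.2):
        super().__init__()
        enc, c_in = [], 1
        for _ in range(6):                                   # 64 -> 32 -> ... -> 1
            enc += [nn.Conv2d(c_in, ch, 3, stride=2, padding=1),  # conv + subsampling + bias
                    nn.LeakyReLU(alpha)]
            c_in = ch
        enc += [nn.Conv2d(c_in, 1, 1)]                       # project to a scalar code (d = 1)
        self.encoder = nn.Sequential(*enc)

        dec = [nn.Conv2d(1, ch, 1)]
        for _ in range(6):                                   # 1 -> 2 -> ... -> 64
            dec += [nn.Upsample(scale_factor=2, mode='nearest'),
                    nn.Conv2d(ch, ch, 3, padding=1),         # conv + bias
                    nn.LeakyReLU(alpha)]
        dec += [nn.Conv2d(ch, 1, 3, padding=1)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```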
Disk autoencoder
Disk autoencoding training minimisation problem:
$$\hat{\Theta}_E, \hat{\Theta}_D = \arg\min_{\Theta_E, \Theta_D} \sum_{x_r} \left\| D \circ E(x_r) - x_r \right\|_2^2 \qquad (1)$$
- $\Theta_E, \Theta_D$: parameters of the network (weights and biases)
- $x_r$: image containing a disk of radius r
NB: we do not enter into the minimisation details here (Adam optimiser).
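A hedged sketch of the corresponding training loop for Eq. (1) with the Adam optimiser, reusing the DiskAE and x_train sketches above; batch size, learning rate and epoch count are illustrative choices.

```python
import torch

model = DiskAE()                                     # architecture sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.from_numpy(x_train)                     # synthetic disks from the sketch above

for epoch in range(100):
    for batch in torch.split(data, 64):
        opt.zero_grad()
        loss = ((model(batch) - batch) ** 2).mean()  # || D(E(x_r)) - x_r ||^2
        loss.backward()
        opt.step()
```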
Investigating autoencoders
First question: can we compress disks to 1 dimension? Yes!
(Figure: input x and reconstructed output y)
Let us try to understand how this works.
Investigating autoencoders
How does the autoencoder work in the case of disks?
- First idea: inspect the network weights
- Unfortunately, these are very difficult to interpret
(Figure: example of learned weights, 3 × 3 convolutions)
Investigating autoencoders
How does the encoder work? Inspect the latent space.
(Plot: latent code z as a function of disk radius r)
- The encoding is relatively simple to understand: an averaging filter
- How about decoding? Inspecting weights and biases is tricky
- We can describe the decoding function when we remove the biases (ablation study)
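If the encoder really behaves like an averaging filter, the latent code should grow roughly like the disk area, i.e. like r². A quick illustrative check using the synthetic disks above (this mimics, but is not, the talk's exact experiment):

```python
import numpy as np

radii = np.arange(2, 31)
z = np.array([make_disk(r).mean() for r in radii])   # "averaging filter" used as encoder
area = np.pi * radii.astype(float) ** 2 / 64 ** 2    # normalised disk area
print(np.corrcoef(z, area)[0, 1])                    # close to 1: z is essentially the area
```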
Decoding a disk
Ablation study: remove the biases of the network.
(Plots: disk profile vs. output profile y(t) as a function of spatial position t, for inputs and outputs of several disk sizes)
Investigating autoencoders
Positive multiplicative action of the decoder without bias
Consider a decoder, without biases, with
$$D_{\ell+1} = \mathrm{LeakyReLU}_\alpha\!\left( U(D_\ell) * w_\ell \right),$$
where U is an upsampling operator. In this case, we have
$$\forall z,\ \forall \lambda \in \mathbb{R}^+,\quad D(\lambda z) = \lambda D(z). \qquad (2)$$
Proof (one layer):
$$\mathrm{LeakyReLU}_\alpha\!\left( U(\lambda z) * w_\ell \right) = \lambda \max\!\left( U(z) * w_\ell,\, 0 \right) + \lambda \alpha \min\!\left( U(z) * w_\ell,\, 0 \right) = \lambda\, \mathrm{LeakyReLU}_\alpha\!\left( U(z) * w_\ell \right),$$
and applying this at every layer gives $D(\lambda z) = \lambda D(z)$.
- The output can therefore be written y = h(r) f, with f learned during training
- In the case without bias, we can rewrite the training problem in a simpler form
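A small numerical check of this homogeneity property, using a toy bias-free decoder; the layer sizes are arbitrary, only the absence of biases matters.

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Upsample(scale_factor=2), nn.Conv2d(1, 8, 3, padding=1, bias=False), nn.LeakyReLU(0.2),
    nn.Upsample(scale_factor=2), nn.Conv2d(8, 8, 3, padding=1, bias=False), nn.LeakyReLU(0.2),
    nn.Upsample(scale_factor=2), nn.Conv2d(8, 1, 3, padding=1, bias=False),
)

z = torch.randn(1, 1, 1, 1)
lam = 3.7                                   # any lambda >= 0
print(torch.allclose(decoder(lam * z), lam * decoder(z), atol=1e-5))  # True
```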
Decoding a disk
Disk autoencoding training problem (continuous case), without biases:
$$\hat{f} = \arg\max_f \int_0^R \langle f, \mathbb{1}_{B_r} \rangle^2 \, dr \qquad (3)$$
Proof: the continuous training minimisation problem can be written as
$$\hat{f}, \hat{h} = \arg\min_{f, h} \int_0^R \int_\Omega \left( h(r) f(t) - \mathbb{1}_{B_r}(t) \right)^2 dt \, dr \qquad (4)$$
Also, for a fixed f, the optimal h is given by
$$\hat{h}(r) = \frac{\langle f, \mathbb{1}_{B_r} \rangle}{\| f \|_2^2} \qquad (5)$$
Decoding a disk
- We insert the optimal $\hat{h}(r)$, and choose the (arbitrary) normalisation $\| f \|_2^2 = 1$
- Since the disks are radially symmetric, the integration can be simplified to one dimension
This gives us the final result:
$$\hat{f} = \arg\min_f \; -\int_0^R \langle f, \mathbb{1}_{B_r} \rangle^2 \, dr \qquad (6)$$
$$\;\;\; = \arg\max_f \int_0^R \langle f, \mathbb{1}_{B_r} \rangle^2 \, dr. \qquad (7)$$
The first variation of the functional in Equation (3) leads to a differential equation, Airy's equation:
$$f''(\rho) = -k f(\rho)\, \rho, \qquad (8)$$
with f(0) = 1, f'(0) = 0.
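Equation (8) can be integrated numerically; a SciPy sketch follows. The constant k depends on the normalisation of f, so its value here is an arbitrary illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 1e-3

def rhs(rho, y):                        # y = [f, f']
    return [y[1], -k * y[0] * rho]      # f'' = -k * f * rho

sol = solve_ivp(rhs, (0.0, 30.0), [1.0, 0.0], dense_output=True)
rho = np.linspace(0.0, 30.0, 200)
f = sol.sol(rho)[0]                     # radial profile to compare with the decoder's output
```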
Decoding a disk
The functional is indeed minimised by the training procedure.
(Plot: comparison of the radial profile f(t) obtained by the autoencoder, by numerical minimisation of the energy, and from Airy's function)
Decoding a disk
Summary of the disk encoder/decoder:
- Encoder: integration (an averaging filter) is sufficient
- Decoder: a learned function, scaled and thresholded
Further work: apply to general scaling (scaled MNIST data)
Decoding a disk
Further questions:
- What happens when samples are missing from the database? (Image synthesis results of "Real NVP" †)
- Is it possible to interpolate in the latent space?
† Density estimation using Real NVP, L. Dinh, J. Sohl-Dickstein, S. Bengio, arXiv:1605.08803, 2016
Investigating autoencoders
Interpolation of disks in the learned space
(Figure: effect of linearly increasing z)
- Interpolation in the latent space is meaningful here
- What about interpolating inside regions unobserved in the data set?
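The interpolation experiment amounts to decoding a linearly spaced set of latent codes between those of a small and a large disk. A sketch reusing the model and make_disk helpers above; the chosen radii and number of steps are illustrative.

```python
import torch

model.eval()
with torch.no_grad():
    z_small = model.encoder(torch.from_numpy(make_disk(5.0))[None, None])
    z_large = model.encoder(torch.from_numpy(make_disk(25.0))[None, None])
    for t in torch.linspace(0.0, 1.0, 8):
        z = (1 - t) * z_small + t * z_large
        disk = model.decoder(z)         # ideally a disk of intermediate radius
```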
Interpolation
Interpolating disks: we trained our AE with radii of 11–18 pixels missing from the training set.
(Figure: input disks and their reconstructed outputs)
Interpolation
What is this due to? Inspect the latent space.
(Plot: disk radius r as a function of latent code z)
How can this be remedied? Regularisation of the latent space.
Interpolation
Various regularisation approaches are available:
- Maintaining norms between objects in the latent space
- Denoising AEs, etc.
- Regularising the weights
Three types considered:
- ℓ2 regularisation in the latent space (type 1): $\left( \| x - x' \|_2^2 - \| E(x) - E(x') \|_2^2 \right)^2$
- Denoising autoencoder (type 2): $\sum_{\ell=1}^{L} \| D(E(x + \eta)) - x \|_2^2$
- Weight regularisation of the encoder (type 3): $\sum_{\ell=1}^{L} \| W_\ell \|_2^2$
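Hedged PyTorch sketches of these three regularisation terms as I read them; the variable names, noise model and reductions are my own choices, not the talk's implementation.

```python
import torch

def latent_l2_reg(E, x, x2):
    """Type 1: keep pairwise distances comparable in image and latent space."""
    d_img = ((x - x2) ** 2).flatten(1).sum(1)
    d_lat = ((E(x) - E(x2)) ** 2).flatten(1).sum(1)
    return ((d_img - d_lat) ** 2).mean()

def denoising_reg(E, D, x, sigma=0.1):
    """Type 2: denoising autoencoder term, reconstruct x from a noisy input."""
    noisy = x + sigma * torch.randn_like(x)
    return ((D(E(noisy)) - x) ** 2).mean()

def encoder_weight_reg(E):
    """Type 3: l2 penalty on the encoder's convolution weights."""
    return sum((w ** 2).sum() for w in E.parameters() if w.dim() > 1)
```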