Invertible Residual Networks
Jens Behrmann*, Will Grathwohl*, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen* (*equal contribution)
What are Invertible Neural Networks?
Invertible Neural Networks (INNs) are bijective function approximators which have a forward mapping x ↦ F(x) = z and an inverse mapping z ↦ F⁻¹(z) = x.
[Figure: a non-invertible vs. an invertible mapping]
Why Invertible Networks?
• Mostly known because of Normalizing Flows
  – Training via maximum likelihood and evaluation of the likelihood
[Figure: generated samples from GLOW (Kingma et al. 2018)]
Why Invertible Networks?
• Generative modeling via invertible mappings with exact likelihoods (Dinh et al. 2014, Dinh et al. 2016, Kingma et al. 2018, Ho et al. 2019) – Normalizing Flows
• Mutual information preservation
• Analysis and regularization of invariance (Jacobsen et al. 2019)
• Memory-efficient backprop (Gomez et al. 2017)
• Analyzing inverse problems (Ardizzone et al. 2019)
Workshop: Invertible Networks and Normalizing Flows
Invertible Networks use Exotic Architectures
• Dimension partitioning and coupling layers (Dinh et al. 2014/2016, Gomez et al. 2017, Jacobsen et al. 2018, Kingma et al. 2018)
  – Transforms one part of the input at a time
  – Choice of partitioning is important
• Invertible dynamics via Neural ODEs (Chen et al. 2018, Grathwohl et al. 2019)
  – Requires numerical integration
  – Hard to tune and often slow due to the need for an ODE solver
Why do we move away from standard architectures?
• Partitioning, coupling layers, and ODE-based approaches move further away from standard architectures
  – Many new design choices are necessary and not yet well understood
• Why not use the most successful discriminative architecture? ResNets
• Use the connection between ResNets and Euler integration of ODEs (Haber et al. 2018)
Making ResNets invertible
Theorem (sufficient condition for an invertible residual layer):
Let F(x) = x + g(x) be a residual layer; then F is invertible if Lip(g) < 1, where Lip(g) is the Lipschitz constant of g.
Such networks are called Invertible Residual Networks (i-ResNets).
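A minimal PyTorch sketch of such a layer (illustrative: the module and argument names are ours, and g is assumed to already satisfy Lip(g) < 1, e.g. via the spectral normalization described later):

```python
import torch.nn as nn

class InvertibleResidualBlock(nn.Module):
    """Residual layer F(x) = x + g(x); invertible whenever Lip(g) < 1."""

    def __init__(self, g: nn.Module):
        super().__init__()
        self.g = g  # assumed contractive: built from Lipschitz-constrained layers

    def forward(self, x):
        return x + self.g(x)
```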
i-ResNets: Constructive Proof
Theorem (invertible residual layer): Let F(x) = x + g(x) be a residual layer; then F is invertible if Lip(g) < 1.
Proof sketch:
• Features: y = F(x) = x + g(x)
• Fixed-point equation: x = y − g(x)
• Use the fixed-point iteration x^{k+1} = y − g(x^k)
• Guaranteed convergence to x if g is contractive (Banach fixed-point theorem)
Inverting i-ResNets
• Inversion method from the proof
• Fixed-point iteration:
  – Init: x^0 = y
  – Iteration: x^{k+1} = y − g(x^k)
• Rate of convergence depends on the Lipschitz constant of g
• In practice: the cost of the inverse is 5-10 forward passes
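A sketch of this inversion routine (the function name and the default of 10 iterations are illustrative; the slides report 5-10 forward passes in practice):

```python
import torch

@torch.no_grad()
def invert_residual_layer(g, y, n_iters=10):
    """Solve F(x) = x + g(x) = y for x by fixed-point iteration.

    Init:      x^0     = y
    Iteration: x^{k+1} = y - g(x^k)
    Converges to the unique solution when Lip(g) < 1 (Banach fixed-point
    theorem); the convergence rate depends on Lip(g).
    """
    x = y.clone()
    for _ in range(n_iters):
        x = y - g(x)
    return x
```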
How to build i-ResNets
• Satisfy the Lipschitz condition with a data-independent upper bound: for g built from linear/convolutional layers W_i and contractive nonlinearities, Lip(g) < 1 holds if ||W_i||_2 < 1 for all layers
• Spectral normalization (Miyato et al. 2018, Gouk et al. 2018): rescale each W_i by an approximation of its largest singular value, obtained via power iteration
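A rough sketch of power-iteration-based spectral normalization for a single weight matrix (illustrative values for c and the iteration count; in practice the normalization is applied to every linear/convolutional layer and the singular-vector estimate is cached across training steps):

```python
import torch

def spectral_normalize(W, c=0.9, n_power_iters=5):
    """Rescale W so its spectral norm (largest singular value) is at most c < 1."""
    v = torch.randn(W.shape[1])
    for _ in range(n_power_iters):
        u = W @ v
        u = u / u.norm()
        v = W.t() @ u
        v = v / v.norm()
    sigma = u @ (W @ v)                  # power-iteration estimate of ||W||_2
    scale = min(1.0, c / sigma.item())   # only shrink, never enlarge
    return W * scale
```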
Validation: Reconstructions
[Figure: CIFAR10 data, reconstructions from an i-ResNet, and reconstructions from a standard ResNet]
Classification Performance
• Competitive performance
• But what do we get additionally? Generative models via Normalizing Flows
Maximum-Likelihood Generative Modeling with i-ResNets
• We can define a simple generative model as z ~ N(0, I) (Gaussian distribution), x = F⁻¹(z) (data distribution)
• Maximization (and evaluation) of the likelihood via the change-of-variables formula
  ln p_x(x) = ln p_z(F(x)) + ln |det J_F(x)| … if F is invertible
• Challenges:
  – Flexible invertible models
  – Efficient computation of the log-determinant
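A sketch of how the two directions are used, assuming a hypothetical flow object with forward_and_logdet and inverse methods (the latter implemented e.g. via the fixed-point iteration above):

```python
import torch
import torch.distributions as D

def log_likelihood(flow, x):
    """ln p_x(x) = ln p_z(F(x)) + ln |det J_F(x)|  (change of variables)."""
    z, logdet = flow.forward_and_logdet(x)   # hypothetical API
    prior = D.Normal(torch.zeros_like(z), torch.ones_like(z))
    return prior.log_prob(z).sum(dim=-1) + logdet

def sample(flow, n_samples, dim):
    """Draw z ~ N(0, I), then map back through the inverse: x = F^{-1}(z)."""
    z = torch.randn(n_samples, dim)
    return flow.inverse(z)
```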
Efficient Estimation of the Likelihood
• Likelihood with the log-determinant of the Jacobian: for i-ResNets, ln |det J_F(x)| = tr(ln(I + J_g(x)))
• Previous approaches:
  – Exact computation of the log-determinant by constraining the architecture to have a triangular Jacobian (Dinh et al. 2016, Kingma et al. 2018)
  – ODE solver with estimation of only the trace of the Jacobian (Grathwohl et al. 2019)
• We propose an efficient estimator for i-ResNets based on trace estimation and truncation of a power series
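A sketch of such an estimator for a single residual block (illustrative: one Hutchinson probe vector and a fixed truncation length; it relies on ln det(I + J_g) = sum_{k>=1} (-1)^{k+1} tr(J_g^k)/k, which converges because Lip(g) < 1):

```python
import torch

def logdet_estimator(g, x, n_terms=10):
    """Stochastic estimate of ln det(I + J_g(x)) via a truncated power series
    and Hutchinson's trace estimator tr(A) ~ E_v[v^T A v]."""
    x = x.requires_grad_(True)
    y = g(x)
    v = torch.randn_like(x)   # Hutchinson probe vector
    w = v                     # holds v^T J_g^k after k vector-Jacobian products
    logdet = 0.0
    for k in range(1, n_terms + 1):
        # one vector-Jacobian product: w <- w J_g(x)
        w = torch.autograd.grad(y, x, grad_outputs=w, retain_graph=True)[0]
        logdet = logdet + (-1) ** (k + 1) * (w * v).sum() / k
    return logdet  # use create_graph=True in the grad call to train through this
```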
Generative Modeling Results
[Figure: data samples, samples from GLOW, and samples from i-ResNets]
Generative Modeling Results
[Table: quantitative comparison of GLOW (Kingma et al. 2018), FFJORD (Grathwohl et al. 2019), and i-ResNet]
i-ResNets Across Tasks
• i-ResNet is an architecture which works well in both discriminative and generative modeling
• i-ResNets are generative models which use the best discriminative architecture
• Promising for:
  – Unsupervised pre-training
  – Semi-supervised learning
Drawbacks
• Iterative inverse
  – Fast convergence in practice
  – Rate depends on the Lipschitz constant, not on the dimension
• Requires estimation of the log-determinant
  – Due to the free-form Jacobian
  – Properties of i-ResNets allow the design of an efficient estimator
Conclusion
• A simple modification makes ResNets invertible
• Stability is guaranteed by construction
• A new class of likelihood-based generative models – without structural constraints
• Excellent performance in discriminative and generative tasks – with one unified architecture
• Promising approach for:
  – Unsupervised pre-training
  – Semi-supervised learning
  – Tasks which require invertibility
See us at Poster #11 (Pacific Ballroom)
• Paper:
• Code:
• Follow-up work: Residual Flows for Invertible Generative Modeling
• Invertible Networks and Normalizing Flows, workshop on Saturday (contributed talk)