Cheap Orthogonal Constraints in Neural Networks:
A Simple Parametrization of the Orthogonal and Unitary Group

Mario Lezcano-Casado (Mathematical Institute)
David Martínez-Rubio (Department of Computer Science)
University of Oxford

June 12, 2019

Visit our poster (#27 on Wednesday)
Optimization with orthogonal constraints

We study the optimization of neural networks under orthogonal constraints: $B \in \mathbb{R}^{n \times n}$, $B^\top B = I$.

Motivation:
◮ Orthogonal matrices have eigenvalues of norm 1.
◮ This makes them convenient for mitigating the exploding and vanishing gradient problems in RNNs.
◮ They constitute an implicit regularization method.
◮ They are the basic building block of matrix factorizations such as the SVD and QR.
◮ They allow for the implementation of factorized linear layers.
Optimization with orthogonal constraints

$$\underbrace{\min_{B \in \mathrm{SO}(n)} f(B)}_{\text{constrained problem}} \qquad \text{is equivalent to solving} \qquad \underbrace{\min_{A \in \mathrm{Skew}(n)} f(\exp(A))}_{\text{unconstrained problem}}$$

◮ The matrix exponential maps skew-symmetric matrices to orthogonal matrices.
◮ Computing the exponential lets us optimize over the unconstrained space of skew-symmetric matrices.
◮ No orthogonality constraint needs to be enforced explicitly.
◮ It adds negligible overhead to the neural network.
◮ General-purpose optimizers can be used (SGD, Adam, Adagrad, ...).
◮ No new extremal points are created in the main parametrization region.

A minimal sketch of the parametrization is given below.
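The following PyTorch sketch illustrates the idea, assuming `torch.matrix_exp` for the exponential. The class and variable names are illustrative, not the authors' API; their actual implementation (expRNN) computes the gradient of the exponential with a dedicated exact formula, whereas here we simply rely on autograd.

```python
# Minimal sketch: parametrize B in SO(n) as B = exp(A), A skew-symmetric.
import torch

class Orthogonal(torch.nn.Module):
    """Returns B = exp(A) with A skew-symmetric, so B is always in SO(n)."""
    def __init__(self, n):
        super().__init__()
        # Unconstrained parameter; only its strict upper triangle is used.
        self.log_weight = torch.nn.Parameter(torch.zeros(n, n))

    def forward(self):
        upper = self.log_weight.triu(diagonal=1)
        skew = upper - upper.T             # A = -A^T by construction
        return torch.matrix_exp(skew)      # exp maps Skew(n) into SO(n)

orth = Orthogonal(64)
B = orth()
print(torch.dist(B.T @ B, torch.eye(64)))  # ~0: orthogonality holds by design
```

Any standard optimizer can update `log_weight` directly: the forward pass always returns an orthogonal matrix (up to floating-point error), so no projection or re-orthogonalization step is needed during training.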
[Figure: cross entropy over 4000 training iterations on the copying problem with L = 2000, comparing Baseline, EURNN, LSTM, scoRNN, and expRNN.]

Cross entropy in the copying problem for L = 2000. The copying problem uses synthetic data of the form:

            Random numbers   Wait for L steps   Recall
    Input:  14221            ------             :----
    Output: -----            ------             14221

A sketch of a data generator for this task follows.
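A small NumPy sketch of how such batches can be generated. This is our own construction for illustration; the function name, token conventions, and marker placement are assumptions, not the authors' data pipeline.

```python
import numpy as np

def copying_batch(batch_size, num_digits=5, wait=2000, alphabet=8):
    """Generate one batch for the copying problem.

    Tokens: 1..alphabet are data digits, 0 is the blank '-',
    and alphabet + 1 is the recall marker ':'.
    """
    blank, marker = 0, alphabet + 1
    seq_len = num_digits + wait + num_digits
    x = np.full((batch_size, seq_len), blank, dtype=np.int64)
    y = np.full((batch_size, seq_len), blank, dtype=np.int64)
    digits = np.random.randint(1, alphabet + 1, size=(batch_size, num_digits))
    x[:, :num_digits] = digits        # random numbers to memorize
    x[:, num_digits + wait] = marker  # ':' signals the start of recall
    y[:, -num_digits:] = digits       # target: recall after the wait
    return x, y

x, y = copying_batch(128, wait=2000)  # matches the L = 2000 experiment
```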
Model    n     # params   Valid. MSE   Test MSE
expRNN   224   ≈ 83K        5.34         5.30
expRNN   322   ≈ 135K       4.42         4.38
expRNN   425   ≈ 200K       5.52         5.48
scoRNN   224   ≈ 83K        9.26         8.50
scoRNN   322   ≈ 135K       8.48         7.82
scoRNN   425   ≈ 200K       7.97         7.36
LSTM      84   ≈ 83K       15.42        14.30
LSTM     120   ≈ 135K      13.93        12.95
LSTM     158   ≈ 200K      13.66        12.62
EURNN    158   ≈ 83K       15.57        18.51
EURNN    256   ≈ 135K      15.90        15.31
EURNN    378   ≈ 200K      16.00        15.15
RGD      128   ≈ 83K       15.07        14.58
RGD      192   ≈ 135K      15.10        14.50
RGD      256   ≈ 200K      14.96        14.69

RNNs trained on a speech prediction task on the TIMIT dataset. Reported is the best validation MSE together with the test MSE of that model (lower is better).