Neural Network Optimization 1
CS 519: Deep Learning, Winter 2018
Fuxin Li
With materials from Zsolt Kira
Backpropagation learning of a network • The algorithm • 1. Compute a forward pass on the compute graph (DAG) from the input to all the outputs • 2. Backpropagate from all the outputs back to the input and collect all gradients • 3. Take a gradient step on all the weights in all layers
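A minimal sketch of these three steps in NumPy for a one-hidden-layer network on toy data (all names, shapes, and the squared-error loss are illustrative assumptions, not the course's reference code):

```python
import numpy as np

# Toy data: 8 examples, 4 input features, 3 output targets (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
Y = np.eye(3)[rng.integers(0, 3, size=8)]            # one-hot targets

W1 = rng.normal(scale=0.1, size=(4, 5))               # input -> hidden
W2 = rng.normal(scale=0.1, size=(5, 3))               # hidden -> output
lr = 0.1

for step in range(100):
    # 1. Forward pass through the compute graph (DAG)
    H = np.maximum(0, X @ W1)                          # ReLU hidden layer
    P = H @ W2                                         # linear output
    loss = 0.5 * np.mean(np.sum((P - Y) ** 2, axis=1))

    # 2. Backpropagate from the output back to the input, collecting gradients
    dP = (P - Y) / X.shape[0]                          # dLoss/dP
    dW2 = H.T @ dP                                     # dLoss/dW2
    dH = dP @ W2.T                                     # dLoss/dH
    dH[H <= 0] = 0                                     # back through the ReLU
    dW1 = X.T @ dH                                     # dLoss/dW1

    # 3. Gradient-descent update of all the weights in all layers
    W1 -= lr * dW1
    W2 -= lr * dW2
```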
Modules (Layers) • Each layer can be seen as a module • Given input $x$ and parameters $W$, return • Output $y = f(x; W)$ • Network gradient $\frac{\partial E}{\partial x} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial x}$ • Gradient of module parameters $\frac{\partial E}{\partial W} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial W}$ • During backprop, propagate/update using the backpropagated gradient $\frac{\partial E}{\partial y}$ coming from the layer above
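A sketch of this module interface for a fully connected layer, assuming NumPy and a hypothetical `LinearModule` class: `forward` caches the input, and `backward` takes the backpropagated gradient $\partial E/\partial y$, stores the parameter gradients, and returns $\partial E/\partial x$ for the layer below:

```python
import numpy as np

class LinearModule:
    """A fully connected layer as a module: y = x W + b (illustrative sketch)."""

    def __init__(self, n_in, n_out, rng=np.random.default_rng(0)):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                      # cache the input for the backward pass
        return x @ self.W + self.b      # module output y

    def backward(self, dE_dy):
        # Given the backpropagated gradient dE/dy, compute:
        #   dE/dW, dE/db  (gradients of this module's own parameters)
        self.dW = self.x.T @ dE_dy
        self.db = dE_dy.sum(axis=0)
        #   dE/dx          (gradient passed on to the previous module)
        return dE_dy @ self.W.T
```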
The abundance of layer types available online
Learning Rates • Gradient descent is only guaranteed to converge with small enough learning rates • If the energy explodes, that is a sign you should decrease your learning rate • Example: minimizing a simple function with several different learning rates (see the sketch below)
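A small stand-in example (not the one from the slide) on the quadratic $E(w) = \frac{1}{2}a w^2$, where gradient descent converges only if the learning rate is below $2/a$:

```python
# Gradient descent on E(w) = 0.5 * a * w^2; the gradient is a * w.
# The iteration w <- (1 - lr*a) * w converges only if lr < 2/a,
# so a learning rate that is too large makes the energy explode.
a = 10.0

def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * a * w
    return w

print(run(lr=0.05))   # small enough: converges toward 0
print(run(lr=0.25))   # too large (> 2/a = 0.2): |w| blows up
```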
Weight decay regularization • Instead of using a normal gradient step, add a $\mu W$ term to the gradient (shrinking the weights slightly at every step) • This corresponds to: $\min_W \sum_{i=1}^{n} L(f(x_i; W), y_i) + \frac{\mu}{2}\|W\|^2$ • Early stopping as well! • Both help generalization
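A sketch of one such step, assuming plain SGD with the $\mu W$ term added to the gradient (the function name and values are illustrative):

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad_loss, lr=0.01, mu=1e-4):
    """One step on  L(w) + (mu/2) * ||w||^2.

    The penalty adds mu * w to the gradient, so each step shrinks the
    weights by a factor (1 - lr * mu) on top of the usual loss update.
    """
    return w - lr * (grad_loss + mu * w)

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.0, -0.2])        # pretend dL/dw from backprop
print(sgd_step_with_weight_decay(w, g))
```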
Momentum • Basic updating equation (with momentum): $v_{t+1} = \rho v_t - \eta \nabla E(W_t)$, $\quad W_{t+1} = W_t + v_{t+1}$ • With a large momentum coefficient $\rho$, there is a lot of "inertia" in optimization • Check the previous example with a momentum of 0.5
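A sketch of the classical momentum update in the common velocity form (the function, names, and values are illustrative, not tied to the slide's exact notation):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, rho=0.5):
    """Classical momentum: v accumulates past gradients ("inertia")."""
    v = rho * v - lr * grad      # velocity: decayed history plus new step
    w = w + v                    # move along the velocity
    return w, v

w = np.array([1.0, -1.0])
v = np.zeros_like(w)
for _ in range(5):
    grad = 2 * w                 # gradient of E(w) = ||w||^2
    w, v = momentum_step(w, v, grad, lr=0.1, rho=0.5)
print(w)
```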
Normalization • Normalize each input component to 0 mean, 1 standard deviation • For ease of L2 regularization + optimization convergence rates • [Figure: error surfaces over $w_1$, $w_2$; color indicates training case. Example inputs (101, 101) and (101, 99) become (1, 1) and (1, -1) after shifting; inputs (0.1, 10) and (0.1, -10) become (1, 1) and (1, -1) after scaling]
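A sketch of per-component standardization, fitting the mean and standard deviation on the training set and reusing them elsewhere (function names and data are illustrative):

```python
import numpy as np

def fit_standardizer(X_train, eps=1e-8):
    """Per-feature mean/std computed from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps
    return mean, std

def standardize(X, mean, std):
    """Shift each component to 0 mean and scale to unit standard deviation."""
    return (X - mean) / std

X_train = np.array([[101.0, 10.0], [99.0, -10.0], [100.0, 0.0]])
mean, std = fit_standardizer(X_train)
print(standardize(X_train, mean, std))   # each column now has mean 0, std 1
```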
Computing the energy function and gradient • Usual ERM energy function: $\min_W E(W) = \sum_{i=1}^{n} L(f(x_i; W), y_i)$ • Gradient: $\nabla_W E = \sum_{i=1}^{n} \frac{\partial L(f(x_i; W), y_i)}{\partial W}$ • One problem: • Very slow to compute when $n$ is large • One gradient step takes a long time! • Approximate?
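As a concrete stand-in for $L$ and $f$, a sketch of the full-batch gradient for squared loss with a linear model, summing per-example gradients over all $n$ examples:

```python
import numpy as np

def full_gradient(w, X, y):
    """Sum of per-example gradients of L_i(w) = 0.5 * (x_i . w - y_i)^2.

    Touches every one of the n examples, so a single gradient step is
    slow when n is large.
    """
    grad = np.zeros_like(w)
    for x_i, y_i in zip(X, y):             # loop over all n examples
        grad += (x_i @ w - y_i) * x_i      # gradient of the i-th loss term
    return grad
```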
Stochastic Mini-batch Approximation • $\min_W E(W) = \sum_{i=1}^{n} L(f(x_i; W), y_i)$, $\quad \nabla_W E = \sum_{i=1}^{n} \frac{\partial L(f(x_i; W), y_i)}{\partial W}$ • Approximate with a mini-batch $B$: $\widehat{\nabla_W E} = \frac{n}{|B|}\sum_{i \in B} \frac{\partial L(f(x_i; W), y_i)}{\partial W}$ • Ensure the expectation is the same: $\mathbb{E}\big[\widehat{\nabla_W E}\big] = \nabla_W E$ • Uniformly sample $B$ every time • Sample how many? 1 (SGD) – 256 (Mini-batch SGD) • Common mini-batch size is 32-256 • In practice: dependent on GPU memory size
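A sketch of the uniform mini-batch estimator for the same squared-loss example as above (hypothetical names); the $n/|B|$ rescaling makes its expectation equal the summed full-batch gradient:

```python
import numpy as np

def minibatch_gradient(w, X, y, batch_size=32, rng=np.random.default_rng(0)):
    """Unbiased estimate of the full (summed) gradient.

    Uniform sampling plus the n/|B| rescaling makes the expectation of
    this estimate equal to the full-batch gradient.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=batch_size, replace=False)    # uniform sample B
    grad = np.zeros_like(w)
    for x_i, y_i in zip(X[idx], y[idx]):
        grad += (x_i @ w - y_i) * x_i
    return grad * (n / batch_size)                          # match E[grad] to the full sum
```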
In Practice • Randomly re-arrange (shuffle) the input examples, then use that fixed order on the input examples • Define an iteration to be every time the gradient is computed • An epoch to be every time all the input examples are looped through once • [Figure: the data is split into mini-batches; each mini-batch gradient computation is one iteration, and one full pass over the data is one epoch]
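A sketch of this loop for a toy linear-regression problem, reshuffling at the start of every epoch (a common choice; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)
w = np.zeros(5)

batch_size, lr, n = 32, 0.01, X.shape[0]
for epoch in range(5):                        # one epoch = one pass over all examples
    order = rng.permutation(n)                # randomly re-arrange the examples
    for start in range(0, n, batch_size):     # each gradient computation = one iteration
        idx = order[start:start + batch_size]
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad
    print(f"epoch {epoch}: train MSE = {np.mean((X @ w - y) ** 2):.4f}")
```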
A practical run of training a neural network • Check: • Energy • Training error • Validation error
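A sketch of computing these three quantities each epoch, assuming a linear model with $\pm 1$ labels and squared-error energy (purely illustrative names and loss):

```python
import numpy as np

def monitor(w, X_tr, y_tr, X_val, y_val):
    """Quantities to check during training (binary +/-1 labels, illustrative)."""
    energy = 0.5 * np.sum((X_tr @ w - y_tr) ** 2)            # training energy (loss)
    train_err = np.mean(np.sign(X_tr @ w) != y_tr)           # training error rate
    val_err = np.mean(np.sign(X_val @ w) != y_val)           # validation error rate
    return energy, train_err, val_err
```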
Data Augmentation • Create artificial data to increase the size of the dataset • Example: Elastic deformations on MNIST
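A sketch of such an elastic deformation using SciPy, in the spirit of the MNIST trick: a random displacement field is smoothed with a Gaussian, scaled, and used to resample the image (the parameter values `alpha` and `sigma` are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=8.0, sigma=3.0, rng=np.random.default_rng(0)):
    """Elastic deformation of a 2-D image (e.g., a 28x28 MNIST digit)."""
    h, w = image.shape
    # Random displacement fields, smoothed and scaled
    dx = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    # Resample the image at the displaced coordinates
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.vstack([(ys + dy).ravel(), (xs + dx).ravel()])
    return map_coordinates(image, coords, order=1, mode="reflect").reshape(h, w)
```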
Data Augmentation • One of the easiest ways to prevent overfitting is to augment the dataset • Example: take random 224x224 crops of a 256x256 training image, plus horizontal flips
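A sketch of this crop-and-flip augmentation in NumPy (the helper name and crop logic are illustrative):

```python
import numpy as np

def random_crop_and_flip(image, crop=224, rng=np.random.default_rng(0)):
    """Take a random crop from a larger training image and flip it
    horizontally half the time (illustrative sketch)."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]            # horizontal flip
    return patch

image = np.zeros((256, 256, 3))           # stand-in for a 256x256 RGB training image
print(random_crop_and_flip(image).shape)  # (224, 224, 3)
```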
CIFAR-10 dataset • 60,000 images in 10 classes • 50,000 training • 10,000 test • Designed to mimic MNIST • 32x32 color images • Assignment (will be posted on Canvas with more detailed instructions): • Write your own backpropagation NN and test it on CIFAR-10