Analyzing Backprop 3-4-16
Reading Quiz Q1: If a neural network has 3 layers with 10 input, 6 hidden, and 8 output units, what is the dimension of backpropagation’s local search space?
a) 10 + 6 + 8 = 24
b) 10 + 6 * 8 = 58
c) 10 * 6 + 6 * 8 = 108
d) 10 * 6 + 10 * 8 + 6 * 8 = 188
e) 10 * 6 * 8 = 480
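(Worked count: backpropagation searches the space of weights, and a fully connected 10-6-8 network has one weight per input-to-hidden and hidden-to-output edge, so 10 · 6 + 6 · 8 = 60 + 48 = 108 dimensions, ignoring any bias weights.)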
Reading Quiz Q2: An arbitrary function can be approximated by a neural network with ____ (non-input) layers.
a) 1
b) 2
c) 3
d) 4
e) infinite
Backpropagation Review

for 1:epochs
    for each example in training_data:
        run example through network
        compute error for each output node
        for each layer (starting from output):
            for each node in layer:
                update_weights(node)
Updating weights

for each incoming edge i:
    w_i ← w_i + α · δ · x_i

if node is in the output layer:
    δ = out · (1 − out) · (target − out)

if node is in a hidden layer:
    δ = out · (1 − out) · Σ (w_k · δ_k), summed over all nodes k in the next layer
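As a minimal sketch of the loop and update rules above (not the conx implementation used in class), here is a tiny 2-2-1 sigmoid network trained on XOR with plain numpy; the network size, learning rate, epoch count, and the XOR task are illustrative assumptions.

import random
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 2-2-1 sigmoid network trained on XOR.
w_hidden = np.random.uniform(-0.5, 0.5, size=(2, 2))   # input -> hidden weights
b_hidden = np.random.uniform(-0.5, 0.5, size=2)        # hidden biases
w_output = np.random.uniform(-0.5, 0.5, size=(2, 1))   # hidden -> output weights
b_output = np.random.uniform(-0.5, 0.5, size=1)        # output bias
alpha = 0.5                                             # learning rate

training_data = [(np.array([a, b], dtype=float), np.array([float(a ^ b)]))
                 for a in (0, 1) for b in (0, 1)]

for epoch in range(10000):
    random.shuffle(training_data)                       # random example order each epoch
    for x, target in training_data:
        # Run the example through the network.
        hidden = sigmoid(x @ w_hidden + b_hidden)
        output = sigmoid(hidden @ w_output + b_output)

        # Output-layer delta: out * (1 - out) * (target - out).
        delta_out = output * (1.0 - output) * (target - output)

        # Hidden-layer delta: out * (1 - out) * sum over next-layer w * delta.
        delta_hidden = hidden * (1.0 - hidden) * (w_output @ delta_out)

        # Each incoming weight changes by alpha * delta * incoming activation
        # (a bias acts like an incoming edge whose activation is always 1).
        w_output += alpha * np.outer(hidden, delta_out)
        b_output += alpha * delta_out
        w_hidden += alpha * np.outer(x, delta_hidden)
        b_hidden += alpha * delta_hidden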
Local search issues

Backpropagation is performing local search in a high-dimensional space. Like other local search methods, it can get stuck in:
● Local minima
● Plateaus

High dimensionality helps a bit, because it’s hard to be at a local minimum in every dimension simultaneously.
Local search improvements

We can use the techniques we already know for improving local search.
● random moves
○ We’re already doing this (by randomly ordering the training examples on each epoch).
○ Non-random moves would mean computing the average error over all training examples before doing a backpropagation step.
● random restarts
○ In conx, the function n.reset() gives new random initial weights.
● momentum
○ Keep moving in the same direction: add a fraction of the previous weight change to the current one, Δw(t) = α · δ · x + μ · Δw(t−1).
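A minimal sketch of the momentum idea; the learning rate, momentum constant, and the random stand-in gradients below are illustrative values, not from the reading.

import numpy as np

# Momentum: blend each new step with a fraction of the previous step,
# so the weights keep moving in roughly the same direction across updates.
learning_rate = 0.1
momentum = 0.9

weights = np.zeros(5)
previous_step = np.zeros_like(weights)

for gradient in np.random.randn(20, 5):   # stand-in for per-example error gradients
    step = -learning_rate * gradient + momentum * previous_step
    weights += step
    previous_step = step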
Overfitting

Don’t just run n.train()!!! This will learn the training data perfectly and fit the test data badly.

Possible solutions:
● Weight decay: dampen all weights by some small factor every round.
● Learn with targets of 0.1 and 0.9 instead of 0 and 1.
● Cross validation: split into training and test sets; stop training when performance stops improving on the test set.
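Two of these tricks are easy to show directly; in this sketch the weight matrix, target vector, decay factor, and epoch count are made up, and the actual backpropagation step is elided.

import numpy as np

# Made-up weights and 1-of-n targets, just to illustrate the two tricks.
weights = np.random.uniform(-0.5, 0.5, size=(6, 8))
targets = np.array([0, 1, 0, 0, 1, 0, 0, 0], dtype=float)

# Softened targets: 0 -> 0.1 and 1 -> 0.9, so the sigmoids never have to saturate.
soft_targets = 0.1 + 0.8 * targets

# Weight decay: shrink every weight slightly after each training round.
decay = 0.001
for epoch in range(100):
    # ... one epoch of backpropagation would go here ...
    weights *= 1.0 - decay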
Output representation

For classification:
● Round the output sigmoids (treat them as thresholds).
● 1-of-n is better than more compact representations. Why?

For regression:
● Sigmoid output is continuous, but bounded between 0 and 1.
● Normalize the targets to the range [0,1] before training.

For dimensionality reduction:
● Throw away the output layer and make the hidden units the output.
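A small sketch of the classification and regression conventions; the labels, outputs, and raw targets below are made up for illustration.

import numpy as np

# Classification: 1-of-n (one-hot) targets, then threshold the sigmoid outputs.
labels = np.array([2, 0, 1])                # three examples, classes 0..2
one_hot_targets = np.eye(3)[labels]         # e.g. class 2 -> [0, 0, 1]

outputs = np.array([[0.1, 0.2, 0.8],
                    [0.9, 0.3, 0.1],
                    [0.2, 0.7, 0.4]])       # pretend sigmoid outputs
rounded = (outputs >= 0.5).astype(int)      # treat each sigmoid as a threshold
predicted_classes = outputs.argmax(axis=1)  # or just take the most active unit

# Regression: squeeze the raw targets into [0, 1] so a sigmoid can reach them.
raw_targets = np.array([12.0, 37.5, 80.0])
normalized = (raw_targets - raw_targets.min()) / (raw_targets.max() - raw_targets.min())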
A perspective from 15 years ago
● Backpropagation is extremely slow to converge and requires tons of input data for networks with many hidden layers.
● Having multiple hidden layers makes the network hard to interpret.
● A 3-layer network can represent any function.
● Why bother with deep (many-layer) networks?
A more recent perspective
● Shallow networks with huge hidden layers make the learning problem harder.
● We can use GPU parallelization to speed up training.
● If we need tons of data, we can get it.
● We can set backpropagation up for success by how we design the network.
Deep Learning

● Convolutional neural networks
○ Hidden layer units connected to only a small subset of the previous layer.
○ Connections have spatial locality (input from several nearby pixels).
○ These hidden units “convolve” the input (like a blurring filter).

● Deep belief networks
○ Unsupervised pre-training of hidden layers (like the encoder example).
○ Use weight reduction or smaller layers to avoid exact matching.
○ Puts the backprop starting point in a good region of weight space.
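To make the “small local subset” idea concrete, here is a minimal 2-D convolution sketch in plain numpy; the image size, the blur kernel, and the convolve2d function name are illustrative choices, not from the reading.

import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution: each output unit only sees a small local patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]     # small subset of the previous layer
            output[r, c] = np.sum(patch * kernel)
    return output

image = np.random.rand(8, 8)           # a tiny grayscale "image"
blur = np.full((3, 3), 1.0 / 9.0)      # blurring filter: average of nearby pixels
feature_map = convolve2d(image, blur)  # shape (6, 6)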