Lecture 7 - April 22, 2019

Fei-Fei Li & Justin Johnson & Serena Yeung

Administrative: Project Proposal
- Due tomorrow, 4/24, on GradeScope
- 1 person per group needs to submit, but tag all group members

Administrative: Alternate Midterm
- See Piazza for the form to request an alternate midterm time or other midterm accommodations
- Alternate midterm requests due Thursday!

Administrative: A2
- A2 is out, due Wednesday 5/1
- We recommend using Google Cloud for the assignment, especially if your local machine uses Windows

Where we are now... Computational graphs
[Figure: computational graph. x and W feed a multiply node (*) that produces the scores s; s feeds the hinge loss; W also feeds the regularizer R; the hinge loss and R are added (+) to give the loss L.]

Where we are now... Neural Networks
Linear score function; 2-layer Neural Network
[Figure: x (3072) -> W1 -> h (100) -> W2 -> s (10)]
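
The layer sizes on the slide above (3072 -> 100 -> 10) can be traced with a small forward pass. This is a minimal sketch, not the course's code: it assumes a ReLU between the two layers, i.e. s = W2 max(0, W1 x), which the extracted slide text does not state, and the random weights and input are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the slide: 3072 inputs, 100 hidden units, 10 class scores.
D, H, C = 3072, 100, 10

x = rng.standard_normal(D)                # stand-in for one flattened 32x32x3 image
W1 = 0.01 * rng.standard_normal((H, D))   # first-layer weights (placeholder values)
W2 = 0.01 * rng.standard_normal((C, H))   # second-layer weights (placeholder values)

h = np.maximum(0, W1 @ x)   # hidden activations (assumed ReLU), shape (100,)
s = W2 @ h                  # class scores, shape (10,)

print(h.shape, s.shape)
```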

Where we are now... Convolutional Neural Networks
Illustration of LeCun et al. 1998 from CS231n 2017 Lecture 1

Where we are now... Convolutional Layer
[Figure: a 5x5x3 filter is convolved (slid) over all spatial locations of a 32x32x3 image, producing a 28x28x1 activation map.]

Where we are now... Convolutional Layer
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps.
[Figure: a Convolution Layer maps the 32x32x3 image to 6 activation maps, each 28x28.]
We stack these up to get a “new image” of size 28x28x6!
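
The 32 -> 28 arithmetic on the slide above follows from the standard output-size formula for a convolution. The sketch below is illustrative only: the helper name `conv_output_size` is hypothetical, and stride 1 with no padding is an assumption (the slide does not state either, but those values reproduce its 28x28 result).

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size for an n x n input and an f x f filter."""
    return (n + 2 * pad - f) // stride + 1

out = conv_output_size(32, 5)   # assumed stride 1, no padding
num_filters = 6                 # one activation map per filter

print((out, out, num_filters))  # (28, 28, 6)
```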

Where we are now... Learning network parameters through optimization
(Landscape image and walking man image are CC0 1.0 public domain.)

Where we are now... Mini-batch SGD
Loop:
1. Sample a batch of data
2. Forward prop it through the graph (network), get loss
3. Backprop to calculate the gradients
4. Update the parameters using the gradient
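
The four steps of the loop can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the data is synthetic, the "network" is a single linear layer with a mean squared error loss, and the learning rate, batch size, and step count are arbitrary choices rather than values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): y = X @ w_true + small noise.
N, D = 256, 5
X = rng.standard_normal((N, D))
w_true = rng.standard_normal(D)
y = X @ w_true + 0.01 * rng.standard_normal(N)

w = np.zeros(D)
learning_rate, batch_size = 0.1, 32

for step in range(500):
    # 1. Sample a batch of data
    idx = rng.choice(N, size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    # 2. Forward pass: predictions and mean squared error loss
    err = xb @ w - yb
    loss = np.mean(err ** 2)
    # 3. Backward pass: gradient of the loss with respect to w
    grad = 2 * xb.T @ err / batch_size
    # 4. Update the parameters using the gradient
    w -= learning_rate * grad

print(np.abs(w - w_true).max())  # small if training converged
```

In a real network, steps 2 and 3 would be the forward and backward passes through the whole computational graph rather than one matrix product.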

Where we are now... Hardware + Software
PyTorch, TensorFlow

Next: Training Neural Networks

Overview
1. One time setup: activation functions, preprocessing, weight initialization, regularization, gradient checking
2. Training dynamics: babysitting the learning process, parameter updates, hyperparameter optimization
3. Evaluation: model ensembles, test-time augmentation

Part 1
- Activation Functions
- Data Preprocessing
- Weight Initialization
- Batch Normalization
- Babysitting the Learning Process
- Hyperparameter Optimization

Activation Functions

Activation Functions
[Figure: plots of Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, and ELU.]

Activation Functions
Sigmoid
- Squashes numbers to range [0,1]
- Historically popular since they have nice interpretation as a saturating “firing rate” of a neuron

Activation Functions
Sigmoid
- Squashes numbers to range [0,1]
- Historically popular since they have nice interpretation as a saturating “firing rate” of a neuron
3 problems:
1. Saturated neurons “kill” the gradients

[Figure: a sigmoid gate with input x.]
What happens when x = -10?
What happens when x = 0?
What happens when x = 10?
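
The three questions above can be checked numerically. The sketch below uses the standard sigmoid derivative, sigma'(x) = sigma(x)(1 - sigma(x)); the function names are illustrative and not taken from the lecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Local gradient of the sigmoid gate: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```

At x = 0 the local gradient is 0.25, its maximum. At x = -10 and x = 10 it is roughly 4.5e-5, so whatever gradient arrives from upstream is multiplied by a number close to zero before flowing on: the saturated gate “kills” the gradient.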

Activation Functions
Sigmoid
- Squashes numbers to range [0,1]
- Historically popular since they have nice interpretation as a saturating “firing rate” of a neuron
3 problems:
1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered

Consider what happens when the input to a neuron is always positive...
What can we say about the gradients on w?

Consider what happens when the input to a neuron is always positive...
What can we say about the gradients on w?
Always all positive or all negative :(
[Figure: the allowed gradient update directions cover only two opposite quadrants, so reaching the hypothetical optimal w vector requires a zig zag path.]

Consider what happens when the input to a neuron is always positive...
What can we say about the gradients on w?
Always all positive or all negative :(
(For a single element! Minibatches help)
[Figure: the allowed gradient update directions cover only two opposite quadrants, so reaching the hypothetical optimal w vector requires a zig zag path.]
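
The sign argument can be made concrete for a single neuron f = sigma(w . x + b). By the chain rule, dL/dw = (dL/df) * sigma'(z) * x, and since sigma'(z) > 0, an all-positive x makes every component of dL/dw share the sign of the scalar dL/df. The sketch below is illustrative: the upstream gradient values and the input range are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = rng.standard_normal(4)
b = 0.0

signs = []
for upstream in (+1.0, -1.0):            # hypothetical dL/df from later layers
    x = rng.uniform(0.1, 1.0, size=4)    # all-positive input, e.g. sigmoid outputs
    f = sigmoid(w @ x + b)
    local = f * (1.0 - f)                # df/dz, always positive
    grad_w = upstream * local * x        # dL/dw = dL/df * df/dz * dz/dw
    signs.append(np.sign(grad_w))
    print(upstream, signs[-1])
```

Each example yields a gradient on w whose components are all positive or all negative. Averaging over a minibatch mixes examples with different upstream signs, which is why the restriction applies to a single element only.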

Activation Functions
Sigmoid
- Squashes numbers to range [0,1]
- Historically popular since they have nice interpretation as a saturating “firing rate” of a neuron
3 problems:
1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered
3. exp() is a bit compute expensive

Activation Functions
tanh(x) [LeCun et al., 1991]
- Squashes numbers to range [-1,1]
- zero centered (nice)
- still kills gradients when saturated :(
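
Both tanh properties listed above can be checked numerically. This is a rough sketch: the zero-mean Gaussian pre-activations are an assumption chosen to make the comparison with the sigmoid easy to see, and `tanh_grad` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)         # zero-mean pre-activations (assumed)

tanh_out = np.tanh(z)
sigmoid_out = 1.0 / (1.0 + np.exp(-z))
print(tanh_out.mean(), sigmoid_out.mean())   # near 0 versus near 0.5

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

print(tanh_grad(0.0), tanh_grad(10.0))
```

The tanh outputs average close to 0 while the sigmoid outputs average close to 0.5. The local gradient is 1 at x = 0 but falls below 1e-7 at x = 10, so saturation remains a problem.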

Activation Functions
ReLU (Rectified Linear Unit) [Krizhevsky et al., 2012]
- Computes f(x) = max(0,x)
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)

Activation Functions
ReLU (Rectified Linear Unit)
- Computes f(x) = max(0,x)
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- Not zero-centered output

Activation Functions
ReLU (Rectified Linear Unit)
- Computes f(x) = max(0,x)
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- Not zero-centered output
- An annoyance: (hint: what is the gradient when x < 0?)
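
The hint can be answered by evaluating the ReLU and its local gradient on a few inputs. A minimal sketch; using a gradient of 0 at exactly x = 0 is a convention assumed here, not something the slide specifies.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Local gradient: 1 where x > 0, otherwise 0 (0 at x == 0 by assumed convention).
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(relu(x))
print(relu_grad(x))
```

For every negative input the local gradient is exactly 0, so no gradient flows back through the unit for that example, while for positive inputs the gradient is 1 no matter how large x gets.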
