Deep Learning (CNNs)

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Deep Learning (CNNs)
Matt Gormley
Lecture 21, April 05, 2017
Deep Learning Readings: Murphy 28; Bishop --; HTF --; Mitchell --

Reminders
• Homework 5 (Part II): Peer Review
– Release: Wed, Mar. 29
– Due: Wed, Apr. 05 at 11:59pm
– Expectation: You should spend at most 1 hour on your reviews
• Peer Tutoring
• Homework 7: Deep Learning
– Release: Wed, Apr. 05
– Watch for multiple due dates!!

BACKPROPAGATION

A Recipe for Machine Learning (Background)
1. Given training data:
2. Choose each of these:
– Decision function
– Loss function
3. Define goal:
4. Train with SGD: (take small steps opposite the gradient)
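The four steps of the recipe can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the linear decision function, the squared-error loss, the toy data set, and the names `predict`, `loss_grad`, and `sgd` are all assumptions chosen to keep the example small.

```python
import random

def predict(theta, x):
    """Decision function (assumed here): linear model, theta dot x."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss_grad(theta, x, y):
    """Gradient w.r.t. theta of the assumed loss 0.5 * (predict(theta, x) - y)**2."""
    err = predict(theta, x) - y
    return [err * xi for xi in x]

def sgd(data, dim, lr=0.05, epochs=200, seed=0):
    """Train with SGD: take small steps opposite the per-example gradient."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(epochs):
        examples = list(data)
        rng.shuffle(examples)          # visit the examples in random order
        for x, y in examples:
            g = loss_grad(theta, x, y)
            theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Step 1: toy training data generated from y = 2*x - 1 (first feature is a constant 1).
data = [([1.0, x], 2.0 * x - 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
# Steps 2-4: the decision function and loss are chosen above; the goal is to minimize
# the summed loss, and training uses SGD.
theta = sgd(data, dim=2)
print(theta)  # should be close to [-1.0, 2.0]
```

Swapping in a different decision function or loss only changes `predict` and `loss_grad`; the training loop stays the same, which is the point of the recipe.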

Backpropagation (Training)
Whiteboard
– Example: Backpropagation for Calculus Quiz #1
Calculus Quiz #1: Suppose x = 2 and z = 3, what are dy/dx and dy/dz for the function below?

Backpropagation (Training)
Automatic Differentiation – Reverse Mode (aka. Backpropagation)
Forward Computation
1. Write an algorithm for evaluating the function $y = f(x)$. The algorithm defines a directed acyclic graph, where each variable is a node (i.e. the "computation graph").
2. Visit each node in topological order. For variable $u_i$ with inputs $v_1, \ldots, v_N$:
a. Compute $u_i = g_i(v_1, \ldots, v_N)$
b. Store the result at the node
Backward Computation
1. Initialize all partial derivatives $dy/du_j$ to 0 and $dy/dy = 1$.
2. Visit each node in reverse topological order. For variable $u_i = g_i(v_1, \ldots, v_N)$:
a. We already know $dy/du_i$
b. Increment $dy/dv_j$ by $(dy/du_i)(du_i/dv_j)$ (Choice of algorithm ensures computing $(du_i/dv_j)$ is easy)
Return partial derivatives $dy/du_i$ for all variables
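The forward and backward computations above can be sketched as a tiny reverse-mode differentiator. Everything named here (`Node`, `add`, `mul`, `sin`, `backward`) is hypothetical, and the function $y = xz + \sin(x)$ is an arbitrary illustration, not the quiz function.

```python
import math

class Node:
    """One variable u_i in the computation graph."""
    def __init__(self, value, parents=()):
        self.value = value      # result stored at the node during the forward computation
        self.parents = parents  # pairs (input node v_j, local derivative du_i/dv_j)
        self.grad = 0.0         # dy/du_i, initialized to 0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def sin(a):
    return Node(math.sin(a.value), ((a, math.cos(a.value)),))

def topological_order(output):
    """Depth-first post-order: every node appears after all of its inputs."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)
    return order

def backward(output):
    order = topological_order(output)
    for node in order:
        node.grad = 0.0            # initialize all dy/du_j to 0
    output.grad = 1.0              # dy/dy = 1
    for node in reversed(order):   # reverse topological order
        for parent, local in node.parents:
            parent.grad += node.grad * local  # increment dy/dv_j by (dy/du_i)(du_i/dv_j)

x, z = Node(2.0), Node(3.0)
y = add(mul(x, z), sin(x))         # forward computation builds the graph
backward(y)
print(x.grad, z.grad)              # analytically: z + cos(x) and x
```

Each operation records its local derivatives while the forward value is computed, which is what makes step 2b of the backward computation cheap.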

Backpropagation (Training)
Simple Example: The goal is to compute $J = \cos(\sin(x^2) + 3x^2)$ on the forward pass and the derivative $\frac{dJ}{dx}$ on the backward pass.
Forward
$J = \cos(u)$
$u = u_1 + u_2$
$u_1 = \sin(t)$
$u_2 = 3t$
$t = x^2$

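Backpropagation (Training)
Simple Example: The goal is to compute $J = \cos(\sin(x^2) + 3x^2)$ on the forward pass and the derivative $\frac{dJ}{dx}$ on the backward pass.
Forward: $J = \cos(u)$
Backward: $\frac{dJ}{du} \mathrel{+}= -\sin(u)$
Forward: $u = u_1 + u_2$
Backward: $\frac{dJ}{du_1} \mathrel{+}= \frac{dJ}{du}\frac{du}{du_1}$, $\frac{du}{du_1} = 1$
Backward: $\frac{dJ}{du_2} \mathrel{+}= \frac{dJ}{du}\frac{du}{du_2}$, $\frac{du}{du_2} = 1$
Forward: $u_1 = \sin(t)$
Backward: $\frac{dJ}{dt} \mathrel{+}= \frac{dJ}{du_1}\frac{du_1}{dt}$, $\frac{du_1}{dt} = \cos(t)$
Forward: $u_2 = 3t$
Backward: $\frac{dJ}{dt} \mathrel{+}= \frac{dJ}{du_2}\frac{du_2}{dt}$, $\frac{du_2}{dt} = 3$
Forward: $t = x^2$
Backward: $\frac{dJ}{dx} \mathrel{+}= \frac{dJ}{dt}\frac{dt}{dx}$, $\frac{dt}{dx} = 2x$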
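The simple example translates line by line into code. The sketch below mirrors the slide's intermediate variables and checks the result against a central finite difference; the function names and the test point are assumptions made for illustration.

```python
import math

def forward_backward(x):
    # Forward pass: evaluate intermediate variables from the input up to J.
    t = x ** 2
    u1 = math.sin(t)
    u2 = 3 * t
    u = u1 + u2
    J = math.cos(u)

    # Backward pass: visit the same variables in reverse order.
    dJ_du = -math.sin(u)
    dJ_du1 = dJ_du * 1.0                         # du/du1 = 1
    dJ_du2 = dJ_du * 1.0                         # du/du2 = 1
    dJ_dt = dJ_du1 * math.cos(t) + dJ_du2 * 3.0  # t feeds both u1 and u2, so contributions add
    dJ_dx = dJ_dt * 2 * x                        # dt/dx = 2x
    return J, dJ_dx

def finite_difference(x, eps=1e-6):
    """Numerical check of dJ/dx using a central difference."""
    f = lambda v: math.cos(math.sin(v ** 2) + 3 * v ** 2)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

J, dJ_dx = forward_backward(0.7)
print(J, dJ_dx, finite_difference(0.7))
```

The two contributions to `dJ_dt` show why the backward computation increments derivatives rather than assigning them: a variable used in several places receives one term per use.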

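Backpropagation (Training)
Case 1: Logistic Regression
[Figure: logistic regression drawn as a network, with the inputs connected to a single output by weights $\theta_1, \theta_2, \theta_3, \ldots, \theta_M$]
Forward: $J = y^* \log y + (1 - y^*) \log(1 - y)$
Backward: $\frac{dJ}{dy} = \frac{y^*}{y} + \frac{1 - y^*}{y - 1}$
Forward: $y = \frac{1}{1 + \exp(-a)}$
Backward: $\frac{dJ}{da} = \frac{dJ}{dy}\frac{dy}{da}$, $\frac{dy}{da} = \frac{\exp(-a)}{(\exp(-a) + 1)^2}$
Forward: $a = \sum_{j=0}^{D} \theta_j x_j$
Backward: $\frac{dJ}{d\theta_j} = \frac{dJ}{da}\frac{da}{d\theta_j}$, $\frac{da}{d\theta_j} = x_j$
Backward: $\frac{dJ}{dx_j} = \frac{dJ}{da}\frac{da}{dx_j}$, $\frac{da}{dx_j} = \theta_j$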
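As a rough sketch, the logistic regression case can be coded directly from the forward and backward columns. The function name and the example numbers are assumptions; the formulas are the ones on the slide.

```python
import math

def logistic_forward_backward(x, theta, y_star):
    # Forward pass
    a = sum(xj * tj for xj, tj in zip(x, theta))
    y = 1.0 / (1.0 + math.exp(-a))
    J = y_star * math.log(y) + (1 - y_star) * math.log(1 - y)

    # Backward pass
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dy_da = math.exp(-a) / (math.exp(-a) + 1) ** 2
    dJ_da = dJ_dy * dy_da
    dJ_dtheta = [dJ_da * xj for xj in x]    # da/dtheta_j = x_j
    dJ_dx = [dJ_da * tj for tj in theta]    # da/dx_j = theta_j
    return J, y, dJ_dtheta, dJ_dx

# Hypothetical example values.
x = [1.0, 2.0, -1.0]
theta = [0.5, -0.25, 0.1]
J, y, dJ_dtheta, dJ_dx = logistic_forward_backward(x, theta, y_star=1.0)
print(J, dJ_dtheta)
```

Multiplying out $\frac{dJ}{dy}\frac{dy}{da}$ simplifies to $y^* - y$, so each entry of `dJ_dtheta` should equal $(y^* - y)\,x_j$; that identity is a convenient sanity check on the chain-rule code.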

Backpropagation (Training)
[Figure: network with an input layer, one hidden layer, and a single output]
(F) Loss
(E) Output (sigmoid): $y = \frac{1}{1 + \exp(-b)}$
(D) Output (linear): $b = \sum_{j=0}^{D} \beta_j z_j$
(C) Hidden (sigmoid): $z_j = \frac{1}{1 + \exp(-a_j)}, \forall j$
(B) Hidden (linear): $a_j = \sum_{i=0}^{M} \alpha_{ji} x_i, \forall j$
(A) Input: Given $x_i, \forall i$

Backpropagation (Training)
[Figure: network with an input layer, one hidden layer, and a single output]
(F) Loss: $J = \frac{1}{2}(y - y^*)^2$
(E) Output (sigmoid): $y = \frac{1}{1 + \exp(-b)}$
(D) Output (linear): $b = \sum_{j=0}^{D} \beta_j z_j$
(C) Hidden (sigmoid): $z_j = \frac{1}{1 + \exp(-a_j)}, \forall j$
(B) Hidden (linear): $a_j = \sum_{i=0}^{M} \alpha_{ji} x_i, \forall j$
(A) Input: Given $x_i, \forall i$
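Read from (A) up to (F), the layers define a forward pass. A minimal sketch in plain Python follows; the function names and the list-of-lists layout for $\alpha$ are assumptions, and the code treats every index uniformly, so any bias term would have to be supplied by the caller as a constant input.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, alpha, beta, y_star):
    # (A) Input: x is given.
    # (B) Hidden (linear): a_j = sum_i alpha[j][i] * x[i]
    a = [sum(w * xi for w, xi in zip(row, x)) for row in alpha]
    # (C) Hidden (sigmoid)
    z = [sigmoid(aj) for aj in a]
    # (D) Output (linear): b = sum_j beta[j] * z[j]
    b = sum(bj * zj for bj, zj in zip(beta, z))
    # (E) Output (sigmoid)
    y = sigmoid(b)
    # (F) Loss: squared error
    J = 0.5 * (y - y_star) ** 2
    return a, z, b, y, J

# Hypothetical weights: with all weights zero, every sigmoid outputs 0.5.
a, z, b, y, J = forward(
    x=[1.0, -2.0, 0.5],
    alpha=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    beta=[0.0, 0.0],
    y_star=1.0,
)
print(y, J)
```

Returning the intermediate values `a`, `z`, and `b` matters because the backward pass reuses them.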

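Backpropagation (Training)
Case 2: Neural Network
[Figure: network with an input layer, one hidden layer, and a single output]
Forward: $J = y^* \log y + (1 - y^*) \log(1 - y)$
Backward: $\frac{dJ}{dy} = \frac{y^*}{y} + \frac{1 - y^*}{y - 1}$
Forward: $y = \frac{1}{1 + \exp(-b)}$
Backward: $\frac{dJ}{db} = \frac{dJ}{dy}\frac{dy}{db}$, $\frac{dy}{db} = \frac{\exp(-b)}{(\exp(-b) + 1)^2}$
Forward: $b = \sum_{j=0}^{D} \beta_j z_j$
Backward: $\frac{dJ}{d\beta_j} = \frac{dJ}{db}\frac{db}{d\beta_j}$, $\frac{db}{d\beta_j} = z_j$
Backward: $\frac{dJ}{dz_j} = \frac{dJ}{db}\frac{db}{dz_j}$, $\frac{db}{dz_j} = \beta_j$
Forward: $z_j = \frac{1}{1 + \exp(-a_j)}$
Backward: $\frac{dJ}{da_j} = \frac{dJ}{dz_j}\frac{dz_j}{da_j}$, $\frac{dz_j}{da_j} = \frac{\exp(-a_j)}{(\exp(-a_j) + 1)^2}$
Forward: $a_j = \sum_{i=0}^{M} \alpha_{ji} x_i$
Backward: $\frac{dJ}{d\alpha_{ji}} = \frac{dJ}{da_j}\frac{da_j}{d\alpha_{ji}}$, $\frac{da_j}{d\alpha_{ji}} = x_i$
Backward: $\frac{dJ}{dx_i} = \sum_{j=0}^{D} \frac{dJ}{da_j}\frac{da_j}{dx_i}$, $\frac{da_j}{dx_i} = \alpha_{ji}$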

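Backpropagation (Training)
Case 2: Neural Network
[Figure: network with an input layer, one hidden layer, and a single output]
Loss
Forward: $J = y^* \log y + (1 - y^*) \log(1 - y)$
Backward: $\frac{dJ}{dy} = \frac{y^*}{y} + \frac{1 - y^*}{y - 1}$
Sigmoid
Forward: $y = \frac{1}{1 + \exp(-b)}$
Backward: $\frac{dJ}{db} = \frac{dJ}{dy}\frac{dy}{db}$, $\frac{dy}{db} = \frac{\exp(-b)}{(\exp(-b) + 1)^2}$
Linear
Forward: $b = \sum_{j=0}^{D} \beta_j z_j$
Backward: $\frac{dJ}{d\beta_j} = \frac{dJ}{db}\frac{db}{d\beta_j}$, $\frac{db}{d\beta_j} = z_j$
Backward: $\frac{dJ}{dz_j} = \frac{dJ}{db}\frac{db}{dz_j}$, $\frac{db}{dz_j} = \beta_j$
Sigmoid
Forward: $z_j = \frac{1}{1 + \exp(-a_j)}$
Backward: $\frac{dJ}{da_j} = \frac{dJ}{dz_j}\frac{dz_j}{da_j}$, $\frac{dz_j}{da_j} = \frac{\exp(-a_j)}{(\exp(-a_j) + 1)^2}$
Linear
Forward: $a_j = \sum_{i=0}^{M} \alpha_{ji} x_i$
Backward: $\frac{dJ}{d\alpha_{ji}} = \frac{dJ}{da_j}\frac{da_j}{d\alpha_{ji}}$, $\frac{da_j}{d\alpha_{ji}} = x_i$
Backward: $\frac{dJ}{dx_i} = \sum_{j=0}^{D} \frac{dJ}{da_j}\frac{da_j}{dx_i}$, $\frac{da_j}{dx_i} = \alpha_{ji}$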
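The table for Case 2 can be turned into code one row at a time. The sketch below is an illustration under assumptions (function names, list-based weights, and example numbers are all hypothetical); it follows the slide's formulas and includes an `objective` helper so the gradients can be compared with finite differences.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def objective(x, alpha, beta, y_star):
    """Forward pass only: returns J for the one-hidden-layer network."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in alpha]
    y = sigmoid(sum(bj * zj for bj, zj in zip(beta, z)))
    return y_star * math.log(y) + (1 - y_star) * math.log(1 - y)

def backprop(x, alpha, beta, y_star):
    # Forward pass, storing every intermediate quantity.
    a = [sum(w * xi for w, xi in zip(row, x)) for row in alpha]
    z = [sigmoid(aj) for aj in a]
    b = sum(bj * zj for bj, zj in zip(beta, z))
    y = sigmoid(b)

    # Backward pass, in the order Loss, Sigmoid, Linear, Sigmoid, Linear.
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dJ_db = dJ_dy * math.exp(-b) / (math.exp(-b) + 1) ** 2
    dJ_dbeta = [dJ_db * zj for zj in z]
    dJ_dz = [dJ_db * bj for bj in beta]
    dJ_da = [dz * math.exp(-aj) / (math.exp(-aj) + 1) ** 2
             for dz, aj in zip(dJ_dz, a)]
    dJ_dalpha = [[da * xi for xi in x] for da in dJ_da]
    # Each input x_i feeds every hidden unit, so its derivative sums over j.
    dJ_dx = [sum(dJ_da[j] * alpha[j][i] for j in range(len(alpha)))
             for i in range(len(x))]
    return dJ_dalpha, dJ_dbeta, dJ_dx

# Hypothetical example values.
x = [1.0, 0.5, -1.5]
alpha = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
beta = [0.7, -0.3]
dJ_dalpha, dJ_dbeta, dJ_dx = backprop(x, alpha, beta, y_star=1.0)
print(dJ_dbeta, dJ_dx)
```

One forward pass and one backward pass yield the derivatives for every weight at once, whereas a finite-difference check needs two extra forward passes per parameter.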

Backpropagation (Training)
Whiteboard
– SGD for Neural Network
– Example: Backpropagation for Neural Network

Backpropagation (Training)
Backpropagation (Auto.Diff. - Reverse Mode)
Forward Computation
1. Write an algorithm for evaluating the function $y = f(x)$. The algorithm defines a directed acyclic graph, where each variable is a node (i.e. the "computation graph").
2. Visit each node in topological order.
a. Compute the corresponding variable's value
b. Store the result at the node
Backward Computation
1. Initialize all partial derivatives $dy/du_j$ to 0 and $dy/dy = 1$.
2. Visit each node in reverse topological order. For variable $u_i = g_i(v_1, \ldots, v_N)$:
a. We already know $dy/du_i$
b. Increment $dy/dv_j$ by $(dy/du_i)(du_i/dv_j)$ (Choice of algorithm ensures computing $(du_i/dv_j)$ is easy)
Return partial derivatives $dy/du_i$ for all variables

A Recipe for Machine Learning (Background)
1. Given training data:
2. Choose each of these:
– Decision function
– Loss function
3. Define goal:
4. Train with SGD: (take small steps opposite the gradient)
Gradients: Backpropagation can compute this gradient! And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!

Summary
1. Neural Networks…
– provide a way of learning features
– are highly nonlinear prediction functions
– (can be) a highly parallel network of logistic regression classifiers
– discover useful hidden representations of the input
2. Backpropagation…
– provides an efficient way to compute gradients
– is a special case of reverse-mode automatic differentiation

DEEP LEARNING

Deep Learning Outline
• Background: Computer Vision
– Image Classification
– ILSVRC 2010 - 2016
– Traditional Feature Extraction Methods
– Convolution as Feature Extraction
• Convolutional Neural Networks (CNNs)
– Learning Feature Abstractions
– Common CNN Layers:
  • Convolutional Layer
  • Max-Pooling Layer
  • Fully-connected Layer (w/tensor input)
  • Softmax Layer
  • ReLU Layer
– Background: Subgradient
– Architecture: LeNet
– Architecture: AlexNet
• Training a CNN
– SGD for CNNs
– Backpropagation for CNNs

Why is everyone talking about Deep Learning? (Motivation)
• Because a lot of money is invested in it…
– DeepMind: Acquired by Google for $400 million
– DNNResearch: Three person startup (including Geoff Hinton) acquired by Google for unknown price tag
– Enlitic, Ersatz, MetaMind, Nervana, Skylab: Deep Learning startups commanding millions of VC dollars
• Because it made the front page of the New York Times

Why is everyone talking about Deep Learning? (Motivation)
Deep learning:
– Has won numerous pattern recognition competitions
– Does so with minimal feature engineering
This wasn't always the case!
Since 1980s: Form of models hasn't changed much, but lots of new tricks…
– More hidden units
– Better (online) optimization
– New nonlinear functions (ReLUs)
– Faster computers (CPUs and GPUs)
[Figure: timeline marked 1960s, 1980s, 1990s, 2006, 2016]

BACKGROUND: COMPUTER VISION
