Deep Learning for Mobile: Part I. Instructor: Simon Lucey. 16-623 - Designing Computer Vision Apps
Today • Single-Layer Perceptron • Multi-Layer Perceptron • Convolutional Neural Network
Linear Binary Classification ("Perceptron", also called a "Linear Discriminant")
The input image is flattened into a raw pixel vector $x = [65, 09, 67, \ldots, 78, 66, 76, 215]^T \in \mathbb{R}^D$. The classifier assigns a class from the sign of a linear score:
$w^T x + w_0 \ge 0 \Rightarrow x \in C_1$, otherwise ($w^T x + w_0 < 0$) $x \in C_2$.
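As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this decision rule; the pixel values and weights below are hypothetical:

```python
import numpy as np

def classify_linear(x, w, w0):
    """Assign class C1 if w^T x + w0 >= 0, else class C2."""
    score = np.dot(w, x) + w0
    return "C1" if score >= 0 else "C2"

# Hypothetical example: a short flattened pixel vector and arbitrary weights.
x = np.array([65, 9, 67, 78, 66, 76, 215], dtype=float)
w = np.random.randn(x.size)
w0 = -0.5
print(classify_linear(x, w, w0))
```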
Why Linear?
• Linear discriminant functions are attractive because the number of required training samples grows only linearly with the dimensionality D of the input.
[Plot: number of samples versus dimensionality (D).]
Perceptron
• Rosenblatt simulated the perceptron on an IBM 704 computer at Cornell in 1957.
• The input scene (i.e. a printed character) was illuminated by powerful lights and captured on a 20x20 array of cadmium sulphide photocells.
• The weights of the perceptron were applied using variable rotary resistors.
• Often referred to as the very first neural network.
"Frank Rosenblatt"
Perceptron
Linear Discriminant Functions
[Figure: geometry of a linear discriminant in two dimensions $(x_1, x_2)$. The decision surface $y = 0$ separates region $R_1$ ($y > 0$, class $C_1$) from region $R_2$ ($y < 0$, class $C_2$). The weight vector $w$ is orthogonal to the decision surface; the signed distance of a point $x$ from the surface is $y(x)/\|w\|$, the surface's distance from the origin is $-w_0/\|w\|$, and $x_\perp$ denotes the orthogonal projection of $x$ onto the surface.]
Linear Binary Classification
The bias can be absorbed into the weight vector by augmenting the input with a constant 1:
$\begin{bmatrix} w \\ w_0 \end{bmatrix}^T \begin{bmatrix} x \\ 1 \end{bmatrix} \ge 0 \Rightarrow x \in C_1$, otherwise $x \in C_2$.
With this augmentation understood, the rule for the raw pixel vector $x \in \mathbb{R}^D$ is written simply as
$w^T x \ge 0 \Rightarrow x \in C_1$, otherwise $x \in C_2$.
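A small sketch (my own illustration, not from the slides) of how the bias is folded into the weight vector; the dimensions and values are made up:

```python
import numpy as np

def augment(x):
    """Append a constant 1 so the bias w0 becomes part of the weight vector."""
    return np.concatenate((x, [1.0]))

# With w_aug = [w; w0], the rule w_aug^T [x; 1] >= 0 matches w^T x + w0 >= 0.
x = np.random.randn(5)                 # hypothetical D = 5 input
w, w0 = np.random.randn(5), 0.3
w_aug = np.concatenate((w, [w0]))
assert np.isclose(w_aug @ augment(x), w @ x + w0)
```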
Perceptron Linear Discriminant
$t_i \in \{+1, -1\}$ = binary labels, $x_i$ = $i$-th training example, $w$ = weight vector.
The perceptron criterion penalizes only misclassified samples:
$\arg\min_w \sum_{n=1}^{N} \max(0, -t_n \cdot x_n^T w)$
More generally, the perceptron loss can be replaced by a generic per-sample objective $E(\cdot)$:
$\arg\min_w \sum_{n=1}^{N} E(t_n \cdot x_n^T w)$
Adding an $\ell_2$ regularizer encourages a large margin (margin $\propto (w^T w)^{-1}$):
$\arg\min_w \sum_{n=1}^{N} E(t_n \cdot x_n^T w) + \frac{\lambda}{2} \|w\|_2^2$
Other Objectives
• Other objectives $E(z)$ are possible:
  least-squares: $E(z) = \|z - 1\|_2^2$
  hinge: $E(z) = \max(0, 1 - z)$
  sigmoid: $E(z) = \dfrac{1}{1 + \exp(-z)}$
[Plot: the objectives $E(z)$ for $z \in [-2, 2]$.]
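A minimal sketch of these per-sample objectives as functions of the score $z = t_n x_n^T w$ (my own code, not from the slides; the sigmoid form is written exactly as on the slide):

```python
import numpy as np

def least_squares(z):
    return (z - 1.0) ** 2

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate each objective over the same range shown in the plot.
z = np.linspace(-2, 2, 5)
for E in (least_squares, hinge, sigmoid):
    print(E.__name__, E(z))
```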
Optimizing Weights
• Expressing the final objective as
$f(w) = \sum_{n=1}^{N} E(t_n \cdot x_n^T w) + \frac{\lambda}{2} \|w\|_2^2$
• The simplest strategy is to employ gradient-descent optimization,
$w \leftarrow w - \eta \frac{\partial f(w)}{\partial w}$
where $\eta$ is the "learning rate". A code sketch of this update follows.
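A minimal sketch of this gradient-descent loop, assuming the hinge loss from the previous slide; the data `X`, labels `t`, and hyper-parameters `eta`, `lam` are all hypothetical:

```python
import numpy as np

def grad_f(w, X, t, lam):
    """Gradient of sum_n max(0, 1 - t_n x_n^T w) + (lam/2) ||w||^2."""
    z = t * (X @ w)                          # scores t_n * x_n^T w
    active = (z < 1.0).astype(float)         # samples inside the margin
    grad_loss = -(X * (active * t)[:, None]).sum(axis=0)
    return grad_loss + lam * w

X = np.random.randn(100, 10)                 # hypothetical data (N x D)
t = np.sign(np.random.randn(100))            # labels in {-1, +1}
w, eta, lam = np.zeros(10), 0.01, 0.1

for _ in range(200):                         # plain full-batch gradient descent
    w -= eta * grad_f(w, X, t, lam)
```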
Gradient-Descent Optimization
• Works for any function whose gradient can be estimated.
• Guaranteed to converge towards a local minimum.
• Scales well to extremely large amounts of data.
• Notoriously slow (linear convergence).
• There is often guesswork associated with tuning the learning rate.
Optimizing Weights
Written element-wise, the update adjusts each weight along its own partial derivative:
$\begin{bmatrix} w_1 \\ \vdots \\ w_K \end{bmatrix} \leftarrow \begin{bmatrix} w_1 \\ \vdots \\ w_K \end{bmatrix} - \eta \begin{bmatrix} \partial f(w)/\partial w_1 \\ \vdots \\ \partial f(w)/\partial w_K \end{bmatrix}$
Optimizing Weights - Per Sample
• The objective is nearly always a summation over $N$ samples,
$f(w) = \sum_{n=1}^{N} f_n(w)$
• So one can update the weights per sample,
$w \rightarrow w - \eta N \frac{\partial f_n(w)}{\partial w}$
where $\eta$ is the "learning rate".
Single Layer - Example
$f_n(w) = \frac{1}{2} \|1 - t_n \cdot x_n^T w\|_2^2 + \frac{\lambda}{2N} \|w\|_2^2$
$\frac{\partial f_n(w)}{\partial w} = (x_n^T w - t_n)\, x_n + \frac{\lambda}{N} w$
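A minimal sketch of the corresponding per-sample (stochastic) update using this gradient; the dataset and hyper-parameters are made up for illustration:

```python
import numpy as np

def sgd_step(w, x_n, t_n, eta, lam, N):
    """One per-sample update using grad = (x_n^T w - t_n) x_n + (lam/N) w."""
    grad = (x_n @ w - t_n) * x_n + (lam / N) * w
    return w - eta * N * grad                # scale by N as in the per-sample rule

# Hypothetical training loop over a small random dataset.
N, D = 100, 10
X = np.random.randn(N, D)
t = np.sign(np.random.randn(N))
w, eta, lam = np.zeros(D), 1e-3, 0.1

for epoch in range(10):
    for n in np.random.permutation(N):
        w = sgd_step(w, X[n], t[n], eta, lam, N)
```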
Today • Single-Layer Perceptron • Multi-Layer Perceptron • Convolutional Neural Network
Shallow Networks
• Theorem: Gaussian kernel machines need at least k examples to learn a function that has 2k zero-crossings along some line.
• Theorem: For a Gaussian kernel machine to learn some maximally varying functions over d inputs requires O(2^d) examples.
Y. Bengio, O. Delalleau, and N. Le Roux, "The Curse of Highly Variable Functions for Local Kernel Machines", NIPS 2006.
Hierarchical Learning
[Figure: the ventral visual stream (V1 to V2/V4 to IT), progressing from simple cells to complex cells to view-tuned cells. Illustration: Bob Crimi]
Hierarchical Learning (Lee, Grosse, Ranganath & Ng, ICML 2009)
Successive model layers learn deeper intermediate representations.
[Figure: learned feature hierarchy from Layer 1 to Layer 3; parts combine to form objects, up to high-level representations.]
Prior: underlying factors and concepts are compactly expressed with multiple levels of abstraction.
Why Deep?
• A deep network can be considered an MLP with several or more hidden layers.
• Deeper nets are exponentially more expressive than shallow ones.
[Figure: a shallow network versus a deep network.]
Montufar, Guido F., et al. "On the number of linear regions of deep neural networks." NIPS 2014.
Shallow Computer Program
[Figure: a "shallow" program. main calls subroutine1 and subroutine2, each of which inlines the code of the lower-level routines (subsub1, subsub2, subsub3, subsubsub1, subsubsub3, ...) rather than reusing it.]
Deep Computer Program
[Figure: a "deep" program. main calls sub1, sub2, sub3, which call subsub1, subsub2, subsub3, which in turn call subsubsub1, subsubsub2, subsubsub3, so routines are shared and reused across levels.]
Multi-Layer Perceptron
The input $x$ is first mapped through the first-layer weights $W^{(1)}$ ($M \times D$) to give $W^{(1)} x$.
[Plot: the element-wise non-linearity $h(x)$, a sigmoidal curve with outputs in $[-1, 1]$ for $x \in [-4, 4]$.]
Applying the non-linearity element-wise gives the hidden activations $z = h(W^{(1)} x)$.
A second layer of weights $w^{(2)}$ ($1 \times M$) then produces the decision:
$[w^{(2)}]^T z \ge 0 \Rightarrow x \in C_1$, otherwise $x \in C_2$.
[Figure: network diagram of a two-layer MLP. Inputs $x_0, x_1, \ldots, x_D$ feed hidden units $z_0, z_1, \ldots, z_M$ through first-layer weights $w^{(1)}_{MD}$; the hidden units feed outputs $y_1, \ldots, y_K$ through second-layer weights $w^{(2)}_{KM}$ (including the bias weight $w^{(2)}_{10}$).]
Layer 1 - MLP
$z = \begin{bmatrix} z_1 \\ \vdots \\ z_M \end{bmatrix} \leftarrow \begin{bmatrix} h[x^T w^{(1)}_1] \\ \vdots \\ h[x^T w^{(1)}_M] \end{bmatrix}$
where $h(\cdot)$ = non-linear function, $[w^{(1)}_1, \ldots, w^{(1)}_M]$ = the 1st layer's $D \times M$ weights, and $x$ = the $D \times 1$ raw input.
Layer 2 - MLP
The raw pixel vector $x = [65, 09, 67, \ldots, 78, 66, 76, 215]^T \in \mathbb{R}^D$ is mapped by layer 1 to $z \in \mathbb{R}^M$, and the second layer classifies it:
$z^T w^{(2)} \ge 0 \Rightarrow z \in C_1$, otherwise $z \in C_2$,
where $z$ = the $M \times 1$ output of layer 1 and $w^{(2)}$ = the 2nd layer's $M \times 1$ weight vector.
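A minimal NumPy sketch of this two-layer forward pass (my own illustration; `tanh` stands in for the unspecified non-linearity $h$, and all sizes and weights are hypothetical):

```python
import numpy as np

def mlp_forward(x, W1, w2):
    """Two-layer MLP: z = h(W1 x), then classify by the sign of z^T w2."""
    z = np.tanh(W1 @ x)                      # layer 1: M hidden activations
    score = z @ w2                           # layer 2: scalar score
    return "C1" if score >= 0 else "C2"

D, M = 400, 32                               # e.g. a flattened 20x20 input, 32 hidden units
W1 = 0.01 * np.random.randn(M, D)            # first-layer weights (M x D)
w2 = 0.01 * np.random.randn(M)               # second-layer weights (M x 1)
x = np.random.rand(D)                        # hypothetical raw input
print(mlp_forward(x, W1, w2))
```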
Obvious Questions?
• How many layers?
• Is the solution globally optimal?
• What non-linearity should you use?
• What learning rate?
• How should I estimate my gradients?
How Deep?
• Recent work has suggested that network depth is crucial for good performance (e.g. on ImageNet).
• Counter-intuitively, naively trained deeper networks tend to have higher training error than shallow networks.
• The innovation of residual learning has greatly helped with this.
[Figure 2, He et al.: a residual learning building block. The input x passes through a weight layer, a ReLU, and a second weight layer to produce F(x); an identity shortcut adds x back, giving F(x) + x, followed by a ReLU.]
He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).
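A minimal sketch of the residual building block described in Figure 2, assuming two hypothetical weight matrices and ReLU non-linearities (a simplified fully-connected stand-in, not the paper's convolutional implementation):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x), where F(x) = W2 relu(W1 x)."""
    F = W2 @ relu(W1 @ x)                    # residual branch
    return relu(F + x)                       # identity shortcut added back

D = 64                                       # hypothetical feature dimension
W1 = 0.1 * np.random.randn(D, D)
W2 = 0.1 * np.random.randn(D, D)
x = np.random.randn(D)
y = residual_block(x, W1, W2)
```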
How Deep?
[Plot: error (%) versus training iteration (x 1e4) for ResNet-20, ResNet-32, ResNet-44, ResNet-56 and ResNet-110; dashed lines denote training error and bold lines denote testing error. The 110-layer network reaches lower error than the 20-layer network.]
He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).
Obvious Questions?
• How many layers?
• Is the solution globally optimal?
• What non-linearity should you use?
• What learning rate?
• How should I estimate my gradients?