Introduction to Deep Learning: Concepts and Terminologies
CSE 5194.01, Autumn ‘20
Arpan Jain, The Ohio State University
E-mail: jain.575@osu.edu
Outline
• Introduction
• DNN Training
• Essential Concepts
• Parallel and Distributed DNN Training
Deep Learning
• According to Yoshua Bengio: “Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features”
• Uses Deep Neural Networks and their variants
• Based on learning data representations
• Can be supervised or unsupervised
• Examples: Convolutional Neural Network (CNN), Recurrent Neural Network, Hybrid Networks
Source: https://thenewstack.io/demystifying-deep-learning-and-artificial-intelligence/
One Line (Unofficial) Definitions
• Machine Learning – ability of machines to learn without being explicitly programmed
• Supervised Learning – we provide the machine with the “right answers” (labels)
  – Classification – discrete value output (e.g. email is spam or not-spam)
  – Regression – continuous output values (e.g. house prices)
• Unsupervised Learning – no “right answers” given. Learn yourself; no labels for you!
  – Clustering – group the data points that are “close” to each other (e.g. cocktail party problem)
  – Finding structure in the data is the key here!
• Features – input attributes (e.g. tumor size, age, etc. in a cancer detection problem)
  – A very important concept in learning, so please remember this!
• Deep Learning – learning that uses Deep Neural Networks
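This vocabulary maps directly onto the arrays a learning algorithm consumes. A tiny sketch of features and labels; the numbers and the cancer-detection framing are invented for illustration, not data from the course:

```python
import numpy as np

# Each row is one example; each column is one feature (input attribute).
features = np.array([[2.3, 61.0],   # [tumor size (cm), age] for patient 1
                     [0.7, 45.0]])  # [tumor size (cm), age] for patient 2

# Supervised learning also gets the "right answers" (labels);
# unsupervised learning would receive only `features`.
labels = np.array([1, 0])           # 1 = malignant, 0 = benign (a classification task)
```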
Spot Quiz: Supervised vs. Unsupervised?
[Figure: two scatter plots, each with axes X1 and X2]
• Left picture: supervised or unsupervised?
• What are X1 and X2?
• Right picture: supervised or unsupervised?
• What do colors/shapes represent?
• What is the green line?
TensorFlow Playground
• To actually train a network, please visit: http://playground.tensorflow.org
Handwritten Numbers (Quick Demo)
• To try handwritten digit recognition, please visit: https://microsoft.github.io/onnxjs-demo/#/mnist
Outline
• Introduction
• DNN Training
• Essential Concepts
• Parallel and Distributed DNN Training
DNN Training: Forward Pass
[Figure, built up over several slides: a network with an input layer, two hidden layers, and an output layer. The forward pass carries the input X through the weights W1–W8, layer by layer, to produce a prediction Pred; the error is then computed as Error = Loss(Pred, Output).]
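A minimal NumPy sketch of the forward pass pictured above; the layer sizes and the mean-squared-error loss are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(1, 1))        # one input sample
target = np.array([[1.0]])         # its known output (label)

W1 = rng.normal(size=(1, 2))       # input layer    -> hidden layer 1
W2 = rng.normal(size=(2, 2))       # hidden layer 1 -> hidden layer 2
W3 = rng.normal(size=(2, 1))       # hidden layer 2 -> output layer

def relu(z):
    return np.maximum(z, 0.0)

h1 = relu(X @ W1)                  # activations of hidden layer 1
h2 = relu(h1 @ W2)                 # activations of hidden layer 2
pred = h2 @ W3                     # Pred: the network's prediction

error = np.mean((pred - target) ** 2)   # Error = Loss(Pred, Output), here MSE
print("error:", error)
```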
DNN Training: Backward Pass
[Figure, built up over several slides: the same network, with Error = Loss(Pred, Output) propagated from the output layer back through the hidden layers, producing the error terms E1–E8 used to update the weights.]
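The backward pass runs the chain rule in the opposite direction, producing one gradient (error term) per weight. A minimal sketch using PyTorch's autograd, with made-up layer sizes; frameworks hide the calculus behind a single backward() call:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 2), nn.ReLU(),    # input          -> hidden layer 1
    nn.Linear(2, 2), nn.ReLU(),    # hidden layer 1 -> hidden layer 2
    nn.Linear(2, 1),               # hidden layer 2 -> output
)

x = torch.randn(1, 1)
target = torch.ones(1, 1)

pred = model(x)                                  # forward pass
error = nn.functional.mse_loss(pred, target)     # Error = Loss(Pred, Output)
error.backward()                                 # backward pass

for name, p in model.named_parameters():
    print(name, p.grad.shape)                    # one gradient per weight/bias tensor
```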
DNN Training
Outline
• Introduction
• DNN Training
• Essential Concepts
• Parallel and Distributed DNN Training
Essential Concepts: Activation Function and Back-propagation
• Back-propagation involves complicated mathematics
  – Luckily, most DL frameworks give you a one-line implementation – model.backward()
• What are activation functions?
  – ReLU (a max fn.) is the most common activation fn. (I encourage everyone to take CSE 5526!)
  – Sigmoid, tanh, etc. are also used
Courtesy: https://www.jeremyjordan.me/neural-networks-training/
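A small plain-NumPy sketch of the activation functions named above (every DL framework ships its own built-in versions):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)            # ReLU: max(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes values into (0, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu:   ", relu(z))
print("sigmoid:", sigmoid(z))
print("tanh:   ", np.tanh(z))            # squashes values into (-1, 1)
```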
Essential Concepts: Stochastic Gradient Descent (SGD)
• Goal of SGD:
  – Minimize a cost fn. J(θ) as a function of θ
• SGD is iterative
• Only two equations to remember:
  – θ_i := θ_i + Δθ_i
  – Δθ_i = −α * ∂J(θ)/∂θ_i
• α = learning rate
Courtesy: https://www.jeremyjordan.me/gradient-descent/
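A minimal sketch of the two update equations on a toy cost J(θ) = (θ − 3)², whose gradient is 2(θ − 3); the cost function and learning rate are invented for illustration:

```python
import numpy as np

theta = np.array([0.0])    # initial parameter
alpha = 0.1                # learning rate

for step in range(50):
    grad = 2.0 * (theta - 3.0)     # ∂J(θ)/∂θ for J(θ) = (θ - 3)²
    delta = -alpha * grad          # Δθ = −α * ∂J(θ)/∂θ
    theta = theta + delta          # θ := θ + Δθ

print(theta)                       # approaches the minimizer θ = 3
```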
Essential Concepts: Learning Rate (α)
Courtesy: https://www.jeremyjordan.me/nn-learning-rate/
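The figure from this slide is not reproduced here; the effect it illustrates can be seen by rerunning the toy cost J(θ) = (θ − 3)² from the SGD sketch above with different values of α:

```python
def run_sgd(alpha, steps=20):
    theta = 0.0
    for _ in range(steps):
        theta -= alpha * 2.0 * (theta - 3.0)   # same toy cost J(θ) = (θ - 3)²
    return theta

print(run_sgd(0.01))   # too small: crawls toward 3
print(run_sgd(0.1))    # reasonable: reaches ~3 quickly
print(run_sgd(1.1))    # too large: overshoots and diverges
```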
Essential Concepts: Batch Size
• Batched Gradient Descent
  – Batch Size = N (the full dataset)
• Stochastic Gradient Descent
  – Batch Size = 1
• Mini-batch Gradient Descent
  – Somewhere in the middle
  – Common: Batch Size = 64, 128, 256, etc.
• Finding the optimal batch size will yield the fastest learning
• One full pass over all N samples is called an epoch of training
Courtesy: https://www.jeremyjordan.me/gradient-descent/
Mini-batch Gradient Descent (Example)
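The animated example from the original slide is not reproduced here; a minimal NumPy sketch of mini-batch gradient descent on an invented one-parameter regression problem (dataset size N, batch size, and learning rate are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, batch_size, alpha = 1024, 64, 0.1
X = rng.normal(size=N)
y = 3.0 * X + rng.normal(scale=0.1, size=N)        # toy data: y ≈ 3x

w = 0.0
for epoch in range(5):                              # one epoch = one full pass over all N samples
    order = rng.permutation(N)
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]       # one mini-batch
        grad = np.mean(2.0 * (w * X[idx] - y[idx]) * X[idx])   # gradient on this batch only
        w -= alpha * grad                           # one SGD update per mini-batch

print(w)                                            # approaches the true slope 3
```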
Essential Concepts: Model Size
• How to define the “size” of a model? (a model is also called a DNN or a network)
• Size means several things and context is important
  – Model Size: # of parameters (weights on edges)
  – Model Size: # of layers (model depth)
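A quick PyTorch sketch of measuring both notions of size; the architecture below is an arbitrary example, not a model from the course:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

num_params = sum(p.numel() for p in model.parameters())               # size as number of weights
depth = sum(1 for m in model.modules() if isinstance(m, nn.Linear))   # size as number of trainable layers
print("parameters:", num_params, "layers:", depth)
```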
Essential Concepts: Accuracy and Throughput (Speed)
• What is the end goal of training a model with SGD and back-propagation?
  – Of course, to train the machine to predict something useful for you
• How do we measure success?
  – Well, accuracy of the trained model on “new” data is the metric of success
• How quickly we can get there is:
  – “good to have” for some models
  – “practically necessary” for most state-of-the-art models
  – In Computer Vision: images/second is the metric of throughput/speed
• Why?
  – Let’s hear some opinions from the class
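A rough sketch of how the images/second throughput metric is obtained; the model and batch size are placeholders, and the timing ignores details such as GPU warm-up and data loading:

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))   # stand-in for a real CNN
batch = torch.randn(64, 3, 224, 224)                                # one batch of "images"

steps = 10
start = time.time()
with torch.no_grad():
    for _ in range(steps):
        model(batch)                                                # forward passes only
elapsed = time.time() - start
print("images/second:", steps * batch.shape[0] / elapsed)
```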
Outline
• Introduction
• DNN Training
• Essential Concepts
• Parallel and Distributed DNN Training
Impact of Model Size and Dataset Size
• Larger models → better accuracy
• More data → better accuracy
• Single-node training is good for:
  – Small models and small datasets
• Distributed training is good for:
  – Large models and large datasets
Courtesy: http://engineering.skymind.io/distributed-deep-learning-part-1-an-introduction-to-distributed-training-of-neural-networks
Overfitting and Underfitting
• Overfitting – model > data, so the model is not learning but memorizing your data
• Underfitting – data > model, so the model is not learning because it cannot capture the complexity of your data
Courtesy: https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
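An illustrative NumPy sketch of both failure modes using polynomial regression; the dataset and polynomial degrees are invented for the demonstration. A degree that is too low underfits, while a degree that is too high fits the training points almost perfectly but does poorly on new data:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=10)
x_new = np.linspace(0, 1, 100)                    # "new" data the model never saw
y_new = np.sin(2 * np.pi * x_new)

for degree in (1, 3, 9):                          # underfit, reasonable, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, error on new data {new_err:.4f}")
```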
Parallelization Strategies
• What are the parallelization strategies?
  – Model Parallelism
  – Data Parallelism (has received the most attention)
  – Hybrid (Model and Data) Parallelism
  – Automatic Selection
Courtesy: http://engineering.skymind.io/distributed-deep-learning-part-1-an-introduction-to-distributed-training-of-neural-networks
Need for Data Parallelism
• Let’s revisit Mini-batch Gradient Descent
• Drawback: if the dataset has 1 million images, it will take forever to run the model on such a big dataset
• Solution: can we use multiple machines to speed up the training of deep learning models? (i.e. utilize supercomputers to parallelize)
Need for Communication in Data Parallelism
[Figure: the labeled dataset (Y/N examples) is split into shards across Machine 1 through Machine 5, each machine training on its own shard.]
• Problem: train a single model on the whole dataset, not 5 models on different subsets of the dataset
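A minimal sketch of the communication that solves this: every machine computes gradients on its own shard, and an allreduce averages them so all machines apply the same update and therefore stay one model. The example uses mpi4py as one illustrative option (production trainings typically rely on Horovod, PyTorch DistributedDataParallel, or similar); the toy regression problem and hyper-parameters are invented:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)                 # each machine holds a different data shard
X = rng.normal(size=256)
y = 3.0 * X + rng.normal(scale=0.1, size=256)

w = np.zeros(1)
for step in range(100):
    grad = np.array([np.mean(2.0 * (w[0] * X - y) * X)])   # local gradient on this shard
    comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)          # sum gradients across machines
    grad /= size                                            # ... and average them
    w -= 0.1 * grad                                         # identical update everywhere -> one model

if rank == 0:
    print(w)        # e.g. run with: mpirun -np 5 python data_parallel_sketch.py
```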