Deep Learning = Hierarchical Compositionality
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”
[Figure: feature visualizations of a convolutional net trained on ImageNet, from Zeiler & Fergus 2013]
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
So what is Deep (Machine) Learning?
• A few different ideas:
• (Hierarchical) Compositionality
– Cascade of non-linear transformations
– Multiple layers of representations
• End-to-End Learning
– Learning (goal-driven) representations
– Learning feature extraction
• Distributed Representations
– No single neuron “encodes” everything
– Groups of neurons work together
(C) Dhruv Batra & Zsolt Kira 43
Traditional Machine Learning
VISION: SIFT/HOG (hand-crafted features, fixed) → your favorite classifier (learned) → “car”
SPEECH: MFCC (hand-crafted features, fixed) → your favorite classifier (learned) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → Bag-of-words (hand-crafted features, fixed) → your favorite classifier (learned) → “+”
44 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Feature Engineering SIFT Spin Images HoG Textons and many many more…. (C) Dhruv Batra & Zsolt Kira 45
Traditional Machine Learning (more accurately)
VISION: SIFT/HOG (fixed) → K-Means/pooling (“learned”, unsupervised) → classifier (supervised) → “car”
SPEECH: MFCC (fixed) → Mixture of Gaussians (“learned”, unsupervised) → classifier (supervised) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → n-grams (fixed) → Syntactic Parse Tree (“learned”, unsupervised) → classifier (supervised) → “+”
(C) Dhruv Batra & Zsolt Kira 46 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Deep Learning = End-to-End Learning
VISION: SIFT/HOG (fixed) → K-Means/pooling (“learned”, unsupervised) → classifier (supervised) → “car”
SPEECH: MFCC (fixed) → Mixture of Gaussians (“learned”, unsupervised) → classifier (supervised) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → n-grams (fixed) → Syntactic Parse Tree (“learned”, unsupervised) → classifier (supervised) → “+”
(C) Dhruv Batra & Zsolt Kira 47 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
“Shallow” vs Deep Learning
• “Shallow” models: hand-crafted Feature Extractor (fixed) → “Simple” Trainable Classifier (learned)
• Deep models: Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier (Learned Internal Representations)
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
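The cascade idea above can be sketched in a few lines of plain Python. The `relu_layer` and `compose` helpers below are invented for illustration (scalar toy functions, not code from the slides): each layer is a trainable non-linear transform, and a deep model is just their composition.

```python
def relu_layer(weight, bias):
    """Return a toy trainable transform: x -> max(0, weight*x + bias)."""
    def f(x):
        return max(0.0, weight * x + bias)
    return f

def compose(*layers):
    """Cascade transforms: the output of each layer feeds the next."""
    def model(x):
        for layer in layers:
            x = layer(x)
        return x
    return model

# A "shallow" model: a single fixed-style transform before the classifier.
shallow = compose(relu_layer(1.0, 0.0))

# A "deep" model: every stage is a trainable non-linear transform.
deep = compose(relu_layer(2.0, -1.0), relu_layer(0.5, 0.25), relu_layer(1.0, 0.0))

print(deep(1.0))
```

Training would adjust every weight and bias in the cascade jointly, which is exactly what makes the internal representations "learned" rather than hand-crafted.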
So what is Deep (Machine) Learning?
• A few different ideas:
• (Hierarchical) Compositionality
– Cascade of non-linear transformations
– Multiple layers of representations
• End-to-End Learning
– Learning (goal-driven) representations
– Learning feature extraction
• Distributed Representations
– No single neuron “encodes” everything
– Groups of neurons work together
(C) Dhruv Batra & Zsolt Kira 49
Distributed Representations Toy Example • Local vs Distributed (C) Dhruv Batra & Zsolt Kira 50 Slide Credit: Moontae Lee
Distributed Representations Toy Example • Can we interpret each dimension? (C) Dhruv Batra & Zsolt Kira 51 Slide Credit: Moontae Lee
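One way to make the local-vs-distributed contrast concrete is a counting argument, sketched below in plain Python (the variable names are invented for illustration): with a local, one-hot code each unit is dedicated to one concept, so n units encode only n concepts; with a distributed code, concepts are patterns of activity across the whole group, so the same n units can in principle distinguish 2^n concepts.

```python
from itertools import product

n_units = 4

# Local (one-hot) code: each concept gets its own dedicated unit.
local_codes = [tuple(1 if i == j else 0 for i in range(n_units))
               for j in range(n_units)]

# Distributed code: each concept is a pattern over all units.
distributed_codes = list(product([0, 1], repeat=n_units))

print(len(local_codes))        # n concepts from n units
print(len(distributed_codes))  # 2**n concepts from the same n units
```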
Ideal Feature Extractor (C) Dhruv Batra & Zsolt Kira 52
Power of distributed representations! Local Distributed (C) Dhruv Batra & Zsolt Kira 53 Slide Credit: Moontae Lee
Power of distributed representations! • United States:Dollar :: Mexico:? (C) Dhruv Batra & Zsolt Kira 54 Slide Credit: Moontae Lee
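The analogy above is typically answered by vector arithmetic in the learned embedding space. The sketch below uses tiny hand-made vectors (invented for illustration; real systems like word2vec learn these from data) to show the mechanic: vec(Dollar) − vec(United States) + vec(Mexico) lands nearest to vec(Peso).

```python
import math

# Toy, hand-made word vectors (an assumption for illustration only).
vecs = {
    "united_states": [1.0, 0.0, 0.0],
    "mexico":        [0.0, 1.0, 0.0],
    "canada":        [0.0, 0.0, 1.0],
    "dollar":        [1.0, 0.0, 1.0],
    "peso":          [0.0, 1.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# United States : Dollar :: Mexico : ?
query = [d - u + m for d, u, m in
         zip(vecs["dollar"], vecs["united_states"], vecs["mexico"])]
answer = max((w for w in vecs if w not in {"dollar", "united_states", "mexico"}),
             key=lambda w: cosine(vecs[w], query))
print(answer)
```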
ThisPlusThat.me Image Credit: (C) Dhruv Batra & Zsolt Kira 55 http://insightdatascience.com/blog/thisplusthat_a_search_engine_that_lets_you_add_words_as_vectors.html
So what is Deep (Machine) Learning?
• A few different ideas:
• (Hierarchical) Compositionality
– Cascade of non-linear transformations
– Multiple layers of representations
• End-to-End Learning
– Learning (goal-driven) representations
– Learning feature extraction
• Distributed Representations
– No single neuron “encodes” everything
– Groups of neurons work together
(C) Dhruv Batra & Zsolt Kira 56
Benefits of Deep/Representation Learning • (Usually) Better Performance – “Because gradient descent is better than you” Yann LeCun • New domains without “experts” – RGBD – Multi-spectral data – Gene-expression data – Unclear how to hand-engineer (C) Dhruv Batra & Zsolt Kira 57
“Expert” intuitions can be misleading • “Every time I fire a linguist, the performance of our speech recognition system goes up” – Fred Jelinek, IBM ’98 (C) Dhruv Batra & Zsolt Kira 58
Benefits of Deep/Representation Learning • Modularity! • Plug and play architectures! (C) Dhruv Batra & Zsolt Kira 59
Differentiable Computation Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 60 Slide Credit: Marc'Aurelio Ranzato
Logistic Regression as a Cascade Given a library of simple functions Compose into a complicated function (C) Dhruv Batra & Zsolt Kira 62 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Key Computation: Forward-Prop (C) Dhruv Batra & Zsolt Kira 64 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
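The forward pass through the logistic-regression cascade can be written out module by module. The sketch below is a minimal pure-Python illustration (the numbers and helper names are invented, not from the slides): a linear module feeds a sigmoid, which feeds a cross-entropy loss.

```python
import math

# Forward pass through the cascade:
# x --(w·x + b)--> z --(sigmoid)--> p --(cross-entropy vs. label y)--> loss
def forward(x, w, b, y):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b            # linear module
    p = 1.0 / (1.0 + math.exp(-z))                          # sigmoid module
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))   # loss module
    return z, p, loss

x, w, b, y = [1.0, 2.0], [0.5, -0.25], 0.1, 1
z, p, loss = forward(x, w, b, y)
print(z, p, loss)
```

Each module stores only its local computation; forward-prop is just evaluating them in graph order and caching intermediate values (here z and p) for the backward pass.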
Key Computation: Back-Prop (C) Dhruv Batra & Zsolt Kira 65 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
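Back-prop through the same cascade is the chain rule applied module by module, from the loss back to the parameters. The sketch below (again a toy illustration with invented values) uses the standard simplification that d(loss)/dz = p − y for a sigmoid followed by cross-entropy, and checks the analytic gradient against a finite difference.

```python
import math

def forward(x, w, b, y):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def backward(x, w, b, y):
    # Chain rule, module by module:
    # dloss/dz = p - y (sigmoid + cross-entropy), then dz/dw_i = x_i, dz/db = 1.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    dz = p - y
    return [dz * xi for xi in x], dz  # gradients w.r.t. w and b

x, w, b, y = [1.0, 2.0], [0.5, -0.25], 0.1, 1
grad_w, grad_b = backward(x, w, b, y)

# Sanity check against a central finite difference on w[0].
eps = 1e-6
numeric = (forward(x, [w[0] + eps, w[1]], b, y)
           - forward(x, [w[0] - eps, w[1]], b, y)) / (2 * eps)
print(grad_w[0], numeric)
```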
Differentiable Computation Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 66 Slide Credit: Marc'Aurelio Ranzato
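The "any DAG" claim can be illustrated with a minimal scalar reverse-mode sketch (a toy `Node` class invented for this example, not a real framework): when a value feeds two branches that later merge, gradients from both paths simply accumulate into its `.grad`.

```python
class Node:
    """Minimal scalar reverse-mode autodiff node (toy sketch)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_node, local_gradient)
        self.grad = 0.0

    def backward(self, upstream=1.0):
        # Accumulate this path's contribution, then push upstream*local
        # gradients along every incoming edge (path-wise chain rule).
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

# A diamond-shaped DAG: x feeds two branches that are merged again.
x = Node(3.0)
left = mul(x, x)            # x^2
right = add(x, Node(1.0))   # x + 1
out = mul(left, right)      # x^2 * (x + 1)
out.backward()

# Analytically, d/dx [x^2 (x + 1)] = 3x^2 + 2x = 33 at x = 3;
# the contributions from both branches accumulate into x.grad.
print(out.value, x.grad)
```

Frameworks like PyTorch do the same thing at tensor scale (with a topological ordering instead of this naive path-wise recursion), which is what makes arbitrary DAGs of differentiable modules trainable.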
Visual Dialog Model #1 Late Fusion Encoder Slide Credit: Abhishek Das
Yes it works, but how? (C) Dhruv Batra & Zsolt Kira 89
Outline • What is Deep Learning, the field, about? • What is this class about? • What to expect? – Logistics • FAQ (C) Dhruv Batra & Zsolt Kira 90
What is this class about? (C) Dhruv Batra & Zsolt Kira 92
What is this class about?
• Introduction to Deep Learning
• Goal:
– After finishing this class, you should be ready to get started on your first DL research project.
• Topics:
– Convolutional Neural Networks (CNNs)
– Recurrent Neural Networks (RNNs)
– Deep Reinforcement Learning
– Generative Models (VAEs, GANs)
• Target Audience:
– Senior undergrads, MS-ML, and new PhD students
• Note: Materials largely follow those developed by Dhruv Batra, with slight modifications
(C) Dhruv Batra & Zsolt Kira 93
What this class is NOT
• NOT the target audience:
– Advanced grad students already working in ML/DL areas
– People looking to understand the latest and greatest cutting-edge research (e.g., GANs, AlphaGo)
– Undergraduate/Masters students looking to graduate with a DL class on their resume
• NOT the goal:
– Teaching a toolkit: “Intro to TensorFlow/PyTorch”
– Intro to Machine Learning
(C) Dhruv Batra & Zsolt Kira 94
Caveat
• This is an ADVANCED Machine Learning class
– This should NOT be your first introduction to ML
– You will need a formal class; not just self-reading/Coursera
– Taking these concurrently does not count!
– If you took CS 7641/ISYE 6740/CSE 6740 @GT, you’re in the right place
– If you took an equivalent class elsewhere, see the list of topics taught in CS 7641 to be sure
(C) Dhruv Batra & Zsolt Kira 95
Prerequisites
• Intro Machine Learning
– Classifiers, regressors, loss functions, MLE, MAP
• Linear Algebra
– Matrix multiplication, eigenvalues, positive semi-definiteness…
• Calculus
– Multi-variate gradients, Hessians, Jacobians…
• If you do not have these prerequisites, consider dropping!
– This is for your benefit, as well as the benefit of others
(C) Dhruv Batra & Zsolt Kira 96
Prerequisites
• Intro Machine Learning
– Classifiers, regressors, loss functions, MLE, MAP
• Linear Algebra
– Matrix multiplication, eigenvalues, positive semi-definiteness…
• Calculus
– Multi-variate gradients, Hessians, Jacobians…
• Programming!
– Homeworks will require Python, C++!
– Libraries/Frameworks: PyTorch
– HW1 (pure Python + PyTorch), HW2-4 (PyTorch)
– Your language of choice for project
(C) Dhruv Batra & Zsolt Kira 98
Course Information
• Instructor: Zsolt Kira – zkira@gatech – Location: 222 CCB
• I will always be available; just contact me or come to office hours
• My job is to:
– Teach the course such that you learn a lot
– Provide any support needed towards that
– Have fun and develop a passion for these topics
(C) Dhruv Batra and Zsolt Kira 99
Course Information
• Instructor: Zsolt Kira – zkira@gatech – Location: CODA room S1181B
• Incoming Ph.D. students: Zubair Irshad, Ben Wilson, James Smith
(C) Dhruv Batra & Zsolt Kira 100
Current TAs
• Sameer Dharur – MS-CS student – https://www.linkedin.com/in/sameerdharur/
• Rahul Duggal – 2nd year CS PhD student – http://www.rahulduggal.com/
• Patrick Grady – 2nd year Robotics PhD student – https://www.linkedin.com/in/patrick-grady
• Jiachen Yang – 2nd year MSCSE student – https://www.cc.gatech.edu/~jyang462/
• Anishi Mehta – 2nd year ML PhD – https://www.linkedin.com/in/anishimehta
• Yinquan Lu – MSCS student
More TAs coming soon!
(C) Dhruv Batra & Zsolt Kira 101
Organization & Deliverables
• PS0 (2%) + 4 homeworks (78%)
– PS0 is warm-up, graded pass/fail – do it!
– In general, PS/HWs are a mix of theory and implementation
– First real one goes out next week
• Start early. Start early. Start early.
• Final project (20%)
– Projects done in groups of 3-4
• (Bonus) Class participation (up to 3%)
– Top contributors to discussions (mainly on Piazza)
– Ask questions, answer questions
(C) Dhruv Batra & Zsolt Kira 102
New Element: FB Co-Teaching!
• Several elements, including:
– Guest lectures: 6 in-class lectures by FB
• Data wrangling
• Embeddings and world2vec
• Self-attention and transformers
• Language modeling and translation
• Large-scale systems
• Fairness, privacy, ethics
– Assignments: volunteers developing some new elements for assignments
– Project ideas: instructors will provide ideas for real-world projects and possible (surrogate/public) data sources that mirror some of the challenges they are working on
(C) Dhruv Batra & Zsolt Kira 103
Late Days • “Free” Late Days – 7 late days for the semester • Use for HWs • Cannot use for project related deadlines – After free late days are used up: • 25% penalty for each late day (C) Dhruv Batra & Zsolt Kira 104
PS0 • Out today; due 01/14 – Available on website (will show up on Canvas today) • Grading: pass/fail – <=80% means that you might not be prepared for the class – Consider dropping or talk to me if that’s the case! • Topics – Probability, calculus, convexity, proving things (C) Dhruv Batra & Zsolt Kira 105
Project
• Goal:
– Chance to try Deep Learning
– Encouraged to apply it to your research (computer vision, NLP, robotics, …)
– Must be done this semester
– Can combine with other classes if the thrusts are separate
• Get permission from both instructors; delineate the different parts
– Extra credit for shooting for a publication
– Teams of 3-4 people
• Undergraduates and graduates on separate teams
• Contributions of each member must be explained and cannot just be report writing, etc.
• Main categories:
– Application/Survey: compare a bunch of existing algorithms on a new application domain of your interest
– Formulation/Development: formulate a new model or algorithm for a new or old problem
– Theory: theoretically analyze an existing algorithm
(C) Dhruv Batra & Zsolt Kira 106
Computing
• Major bottleneck: GPUs
• Options:
– Your own / group / advisor’s resources
– Google Cloud credits
• $50 in credits to every registered student, courtesy of Google
– Google Colaboratory allows free TPU access!
• https://colab.research.google.com/notebooks/welcome.ipynb
– Minsky cluster in IC
(C) Dhruv Batra & Zsolt Kira 107