CS 4803 / 7643: Deep Learning Website: https://www.cc.gatech.edu/classes/AY2020/cs7643_fall/ Piazza: https://piazza.com/gatech/fall2019/cs48037643 Canvas: https://gatech.instructure.com/courses/60374 (4803) https://gatech.instructure.com/courses/60364 (7643) Gradescope: https://www.gradescope.com/courses/56799 (4803) https://www.gradescope.com/courses/53817 (7643) Dhruv Batra School of Interactive Computing Georgia Tech
What are we here to discuss? Some of the most exciting developments in Machine Learning, Vision, NLP, Speech, Robotics & AI in general in the last decade! (C) Dhruv Batra 2
Proxy for public interest (C) Dhruv Batra 3
AlphaGo vs Lee Sedol (C) Dhruv Batra 4
Outline • What is Deep Learning, the field, about? – Highlight of some recent projects from my lab • What is this class about? – What to expect? – Logistics • FAQ (C) Dhruv Batra 5
Outline • What is Deep Learning, the field, about? – Highlight of some recent projects from my lab • What is this class about? – What to expect? – Logistics • FAQ (C) Dhruv Batra 6
Demo time: vqa.cloudcv.org, demo.visualdialog.org (C) Dhruv Batra 7
Concepts (C) Dhruv Batra 8 Image Credit: https://www.sumologic.com/blog/machine-learning-deep-learning/
What is (general) intelligence? • Boring textbook answer: “The ability to acquire and apply knowledge and skills” – Dictionary • My favorite: “The ability to navigate in problem space” – Siddhartha Mukherjee, Columbia (C) Dhruv Batra 9
What is artificial intelligence? • Boring textbook answer: “Intelligence demonstrated by machines” – Wikipedia • My favorite: “The science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.” – Andrew Moore, CMU (C) Dhruv Batra 10
What is machine learning? • My favorite: “Study of algorithms that improve their performance (P) at some task (T) with experience (E)” – Tom Mitchell, CMU (C) Dhruv Batra 11
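As a concrete illustration of the (T, P, E) framing, here is a small toy sketch (not from the slides): the task T is separating two point clouds, the performance P is accuracy on a held-out set, and the experience E is the number of labeled training points; P typically improves as E grows. The data and the least-squares "classifier" below are made up purely for illustration.

```python
# Toy illustration of Mitchell's definition: performance P at task T improves with experience E.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Two Gaussian blobs (per-class size n) with labels -1 and +1."""
    x = np.vstack([rng.normal(-1, 1, (n, 2)), rng.normal(+1, 1, (n, 2))])
    y = np.hstack([-np.ones(n), np.ones(n)])
    return x, y

x_test, y_test = make_data(500)             # held-out set for measuring P

for n in [5, 50, 500]:                      # growing experience E
    x_tr, y_tr = make_data(n)               # task T: binary classification
    w = np.linalg.lstsq(x_tr, y_tr, rcond=None)[0]   # least-squares linear "classifier"
    acc = np.mean(np.sign(x_test @ w) == y_test)     # performance P on held-out data
    print(f"E = {2 * n:4d} examples  ->  P = {acc:.2f}")
```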
Image Classification: ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 1000 object classes, 1.4M/50k/100k train/val/test images. Example labels: Person, Dalmatian. http://image-net.org/challenges/LSVRC/{2010,…,2015} (C) Dhruv Batra 12
Image Classification (C) Dhruv Batra 13
Tasks are getting bolder: image captioning (“A group of young people playing a game of Frisbee”, Vinyals et al., 2015), visual question answering (Antol et al., 2015), and visual dialog (Das et al., 2017). (C) Dhruv Batra 15
Embodied Question Answering [CVPR ’18] Georgia Gkioxari (FAIR), Abhishek Das (Georgia Tech), Samyak Datta (Georgia Tech), Devi Parikh (Georgia Tech / FAIR), Dhruv Batra (Georgia Tech / FAIR), Stefan Lee (Georgia Tech)
Q: What is to the left of the shower? A: Cabinet
PACMAN-RL
So what is Deep (Machine) Learning? • Representation Learning • Neural Networks • Deep Unsupervised/Reinforcement/Structured/<insert-qualifier-here> Learning • Simply: Deep Learning (C) Dhruv Batra 33
So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra 34
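As a minimal illustration of the first idea, the sketch below stacks two affine-plus-ReLU layers, i.e. a cascade of non-linear transformations producing multiple layers of representation. The dimensions and random weights are hypothetical, and nothing is trained here.

```python
# A cascade of non-linear transformations: each layer is an affine map followed by a ReLU.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    return np.maximum(0.0, x @ w + b)   # affine transform + ReLU non-linearity

x = rng.normal(size=(1, 4))             # a toy 4-dimensional input
h1 = layer(x,  rng.normal(size=(4, 8)), np.zeros(8))    # first layer of representation
h2 = layer(h1, rng.normal(size=(8, 8)), np.zeros(8))    # second layer of representation
out = h2 @ rng.normal(size=(8, 3))                      # linear read-out, e.g. 3 class scores
print(out.shape)  # (1, 3)
```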
Traditional Machine Learning
VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”
SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”
35 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Hierarchical Compositionality
VISION: pixels → edge → texton → motif → part → object
SPEECH: sample → spectral band → formant → motif → phone → word
NLP: character → word → NP/VP/.. → clause → sentence → story
36 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function. Given a library of simple functions, compose them into a complicated function. (C) Dhruv Batra 37 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function. Given a library of simple functions, compose them into a complicated function. Idea 1: Linear Combinations (Boosting, Kernels, …): $f(x) = \sum_i \alpha_i g_i(x)$ (C) Dhruv Batra 38 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
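Idea 1 in code, as a rough sketch: a fixed library of simple functions g_i combined linearly as f(x) = Σ_i α_i g_i(x). The particular functions and coefficients below are invented for illustration; this shows the additive form shared by boosting and kernel methods, not an actual fitted model.

```python
# Idea 1: a linear combination of simple basis functions.
import numpy as np

g = [np.sin, np.cos, np.tanh, np.abs]      # library of simple functions g_i
alpha = [0.5, -1.0, 2.0, 0.25]             # coefficients alpha_i (illustrative, not learned here)

def f(x):
    return sum(a * gi(x) for a, gi in zip(alpha, g))   # f(x) = sum_i alpha_i * g_i(x)

print(f(np.linspace(0.0, 1.0, 5)))
```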
Building A Complicated Function. Given a library of simple functions, compose them into a complicated function. Idea 2: Compositions (Deep Learning, Grammar models, Scattering transforms, …): $f(x) = g_1(g_2(\cdots(g_n(x))\cdots))$ (C) Dhruv Batra 39 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Building A Complicated Function. Given a library of simple functions, compose them into a complicated function. Idea 2: Compositions (Deep Learning, Grammar models, Scattering transforms, …): $f(x) = \log(\cos(\exp(\sin^3(x))))$ (C) Dhruv Batra 40 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
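Idea 2 in code: composing simple functions rather than summing them. The sketch below just evaluates the slide's example f(x) = log(cos(exp(sin³(x)))) at a point where the expression is defined (cos(·) must be positive for the log).

```python
# Idea 2: depth by composition of simple functions.
import numpy as np

def f(x):
    # f(x) = log(cos(exp(sin^3(x)))): four simple functions nested inside each other.
    return np.log(np.cos(np.exp(np.sin(x) ** 3)))

x = 0.1                     # a point where cos(exp(sin^3(x))) > 0, so f is well-defined
print(f(x))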
Deep Learning = Hierarchical Compositionality “car” Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Deep Learning = Hierarchical Compositionality
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
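A minimal sketch of this hierarchy, assuming PyTorch is available: three convolutional stages (roughly low-, mid-, and high-level features) feeding a trainable linear classifier. The layer sizes are illustrative and this is not the specific network visualized in [Zeiler & Fergus 2013].

```python
# Low-level -> mid-level -> high-level features -> trainable classifier.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),          # low-level: edges, colors
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),         # mid-level: textures, motifs
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), # high-level: parts, objects
    nn.Flatten(),
    nn.Linear(64, 1000),   # trainable classifier over 1000 ImageNet-style classes
)

logits = net(torch.randn(1, 3, 224, 224))   # one fake RGB image
print(logits.shape)                          # torch.Size([1, 1000])
```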
So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra 44
Traditional Machine Learning
VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”
SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”
45 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Feature Engineering SIFT Spin Images HoG Textons and many many more…. (C) Dhruv Batra 46
Traditional Machine Learning (more accurately); the unsupervised and supervised stages are “Learned”
VISION: SIFT/HOG (fixed) → K-Means/pooling (unsupervised) → classifier (supervised) → “car”
SPEECH: MFCC (fixed) → Mixture of Gaussians (unsupervised) → classifier (supervised) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (unsupervised) → classifier (supervised) → “+”
(C) Dhruv Batra 47 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Deep Learning = End-to-End Learning; contrast with the “Learned” stages of the pipeline above
VISION: SIFT/HOG (fixed) → K-Means/pooling (unsupervised) → classifier (supervised) → “car”
SPEECH: MFCC (fixed) → Mixture of Gaussians (unsupervised) → classifier (supervised) → \ˈdēp\
NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (unsupervised) → classifier (supervised) → “+”
(C) Dhruv Batra 48 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
“Shallow” vs Deep Learning
• “Shallow” models: hand-crafted Feature Extractor (fixed) → “Simple” Trainable Classifier (learned)
• Deep models: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier (Learned Internal Representations)
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
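A minimal sketch of the deep, end-to-end column, assuming PyTorch is available: several trainable feature transforms plus a classifier, all updated jointly by one optimizer from the loss, rather than fixing hand-crafted features and learning only the final stage. The data, sizes, and hyperparameters below are hypothetical.

```python
# End-to-end learning: gradients flow through every stage, so all stages are learned jointly.
import torch
import torch.nn as nn

model = nn.Sequential(                       # transform -> transform -> classifier, all trainable
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)   # one optimizer over ALL stages
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20)                      # fake batch of 32 inputs
y = torch.randint(0, 3, (32,))               # fake labels in {0, 1, 2}

for step in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                          # gradients reach every layer, not just the classifier
    opt.step()
```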
So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra 51
Distributed Representations Toy Example • Local vs Distributed (C) Dhruv Batra 52 Slide Credit: Moontae Lee
Distributed Representations Toy Example • Can we interpret each dimension? (C) Dhruv Batra 53 Slide Credit: Moontae Lee
Power of distributed representations! Local Distributed (C) Dhruv Batra 54 Slide Credit: Moontae Lee
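A toy sketch of the local-vs-distributed contrast: with 6 units, a local (one-hot) code names at most 6 concepts, while a distributed code over the same 6 units can in principle distinguish 2^6 = 64 patterns, because groups of units work together. The "concepts" below are invented for illustration.

```python
# Local (one-hot) codes vs distributed codes over the same number of units.
import numpy as np

n_units = 6

# Local: one neuron per concept, so at most n_units concepts.
local_codes = np.eye(n_units, dtype=int)

# Distributed: each concept is a pattern over shared units, so concepts can overlap.
distributed_codes = np.array([
    [1, 0, 1, 0, 1, 0],   # hypothetical "dog"
    [1, 0, 1, 0, 0, 1],   # hypothetical "cat" (shares most units with "dog")
    [0, 1, 0, 1, 1, 0],   # hypothetical "car"
])

print(local_codes.shape[0], "local concepts vs up to", 2 ** n_units, "distributed patterns")
```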
Power of distributed representations! • United States:Dollar :: Mexico:? (C) Dhruv Batra 55 Slide Credit: Moontae Lee
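One way to read this analogy is as vector arithmetic in an embedding space: v(Dollar) - v(United States) + v(Mexico) should land near v(Peso). The sketch below uses tiny hand-made vectors, not real word2vec embeddings, and answers the analogy by a cosine-similarity nearest-neighbor lookup.

```python
# Word analogy by vector arithmetic over toy, hand-made embeddings.
import numpy as np

vec = {
    "United States": np.array([1.0, 0.0, 0.9, 0.1]),
    "Dollar":        np.array([1.0, 1.0, 0.9, 0.1]),
    "Mexico":        np.array([0.1, 0.0, 0.2, 0.9]),
    "Peso":          np.array([0.1, 1.0, 0.2, 0.9]),
    "Taco":          np.array([0.0, 0.1, 0.1, 0.8]),
}

query = vec["Dollar"] - vec["United States"] + vec["Mexico"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

answer = max((w for w in vec if w != "Mexico"), key=lambda w: cosine(vec[w], query))
print(answer)  # expected: "Peso"
```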
ThisPlusThat.me (C) Dhruv Batra 56 Image Credit: http://insightdatascience.com/blog/thisplusthat_a_search_engine_that_lets_you_add_words_as_vectors.html
So what is Deep (Machine) Learning? • A few different ideas: • (Hierarchical) Compositionality – Cascade of non-linear transformations – Multiple layers of representations • End-to-End Learning – Learning (goal-driven) representations – Learning feature extraction • Distributed Representations – No single neuron “encodes” everything – Groups of neurons work together (C) Dhruv Batra 57