CS 4803 / 7643: Deep Learning


CS 4803 / 7643: Deep Learning
Website: http://www.cc.gatech.edu/classes/AY2020/cs7643_spring/
Piazza: https://piazza.com/gatech/spring2020/cs4803dl7643a/
Staff mailing list (personal questions): cs4803-7643-staff@lists.gatech.edu
Gradescope:


  1. Deep Learning = Hierarchical Compositionality
  Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”
  Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
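
A sketch of hierarchical compositionality as plain function composition, assuming toy scalar features and hand-picked weights (every function and constant here is hypothetical, for illustration only):

```python
import math

def low_level(x):
    # Edge-like response: a simple non-linearity (ReLU).
    return max(0.0, x)

def mid_level(h):
    # Combines low-level responses into a part-like feature.
    return max(0.0, 2.0 * h - 1.0)

def high_level(h):
    # Squashes part responses into an object-like feature.
    return math.tanh(h)

def classifier(h):
    # Trainable linear layer + sigmoid: P("car" | input).
    return 1.0 / (1.0 + math.exp(-(3.0 * h - 1.0)))

def predict(x):
    # The whole model is a cascade: classifier(high(mid(low(x)))).
    return classifier(high_level(mid_level(low_level(x))))
```

Deep learning replaces the hand-picked constants above with parameters that are all learned jointly from data.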

  2. So what is Deep (Machine) Learning? A few different ideas:
  • (Hierarchical) Compositionality – Cascade of non-linear transformations; multiple layers of representations
  • End-to-End Learning – Learning (goal-driven) representations; learning feature extraction
  • Distributed Representations – No single neuron “encodes” everything; groups of neurons work together
  (C) Dhruv Batra & Zsolt Kira 43

  3. Traditional Machine Learning
  VISION: hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”
  SPEECH: hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\
  NLP: “This burrito place is yummy and fun!” → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+”
  44 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  4. Feature Engineering SIFT Spin Images HoG Textons and many many more…. (C) Dhruv Batra & Zsolt Kira 45

  5. Traditional Machine Learning (more accurately)
  VISION: SIFT/HOG (fixed) → K-Means/pooling (unsupervised, “learned”) → classifier (supervised) → “car”
  SPEECH: MFCC (fixed) → Mixture of Gaussians (unsupervised, “learned”) → classifier (supervised) → \ˈdēp\
  NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (unsupervised, “learned”) → classifier (supervised) → “+”
  (C) Dhruv Batra & Zsolt Kira 46 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  6. Deep Learning = End-to-End Learning
  VISION: SIFT/HOG (fixed) → K-Means/pooling (unsupervised, “learned”) → classifier (supervised) → “car”
  SPEECH: MFCC (fixed) → Mixture of Gaussians (unsupervised, “learned”) → classifier (supervised) → \ˈdēp\
  NLP: “This burrito place is yummy and fun!” → Parse Tree Syntactic (fixed) → n-grams (unsupervised, “learned”) → classifier (supervised) → “+”
  (C) Dhruv Batra & Zsolt Kira 47 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  7. “Shallow” vs Deep Learning
  • “Shallow” models: hand-crafted Feature Extractor (fixed) → “Simple” Trainable Classifier (learned)
  • Deep models: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier (Learned Internal Representations)
  Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  8. So what is Deep (Machine) Learning? A few different ideas:
  • (Hierarchical) Compositionality – Cascade of non-linear transformations; multiple layers of representations
  • End-to-End Learning – Learning (goal-driven) representations; learning feature extraction
  • Distributed Representations – No single neuron “encodes” everything; groups of neurons work together
  (C) Dhruv Batra & Zsolt Kira 49

  9. Distributed Representations Toy Example • Local vs Distributed (C) Dhruv Batra & Zsolt Kira 50 Slide Credit: Moontae Lee

  10. Distributed Representations Toy Example • Can we interpret each dimension? (C) Dhruv Batra & Zsolt Kira 51 Slide Credit: Moontae Lee

  11. Ideal Feature Extractor (C) Dhruv Batra & Zsolt Kira 52

  12. Power of distributed representations! Local Distributed (C) Dhruv Batra & Zsolt Kira 53 Slide Credit: Moontae Lee
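
One way to see the point of this slide: with n neurons, a local (one-hot) code can represent only n concepts, while a binary distributed code can in principle index up to 2^n activation patterns. A toy sketch (the shapes and attribute dimensions are made up):

```python
# Local (one-hot) code: one neuron per concept.
local = {
    "circle":   (1, 0, 0),
    "square":   (0, 1, 0),
    "triangle": (0, 0, 1),
}

# Distributed code: concepts share dimensions
# (hypothetical attributes: "has corners", "has four sides").
distributed = {
    "circle":   (0, 0),
    "square":   (1, 1),
    "triangle": (1, 0),
}

def capacity_local(n_neurons):
    return n_neurons        # one concept per neuron

def capacity_distributed(n_neurons):
    return 2 ** n_neurons   # one concept per activation pattern
```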

  13. Power of distributed representations! • United States:Dollar :: Mexico:? (C) Dhruv Batra & Zsolt Kira 54 Slide Credit: Moontae Lee
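
Analogies like the one above are commonly answered with embedding arithmetic: vec(Dollar) − vec(United States) + vec(Mexico) should land near vec(Peso). A toy sketch with made-up 2-D vectors (real systems use word2vec-style embeddings with hundreds of dimensions):

```python
# Hypothetical 2-D embeddings: dim 0 ~ "is the USA", dim 1 ~ "is a currency".
vecs = {
    "united_states": (1.0, 0.0),
    "dollar":        (1.0, 1.0),
    "mexico":        (0.0, 0.0),
    "peso":          (0.0, 1.0),
}

def analogy(a, b, c):
    # Compute b - a + c, then take the nearest remaining word.
    query = tuple(vb - va + vc
                  for va, vb, vc in zip(vecs[a], vecs[b], vecs[c]))
    candidates = [w for w in vecs if w not in (a, b, c)]
    return min(candidates,
               key=lambda w: sum((x - y) ** 2
                                 for x, y in zip(vecs[w], query)))

print(analogy("united_states", "dollar", "mexico"))  # peso
```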

  14. ThisPlusThat.me – Image Credit: http://insightdatascience.com/blog/thisplusthat_a_search_engine_that_lets_you_add_words_as_vectors.html (C) Dhruv Batra & Zsolt Kira 55

  15. So what is Deep (Machine) Learning? A few different ideas:
  • (Hierarchical) Compositionality – Cascade of non-linear transformations; multiple layers of representations
  • End-to-End Learning – Learning (goal-driven) representations; learning feature extraction
  • Distributed Representations – No single neuron “encodes” everything; groups of neurons work together
  (C) Dhruv Batra & Zsolt Kira 56

  16. Benefits of Deep/Representation Learning • (Usually) Better Performance – “Because gradient descent is better than you” – Yann LeCun • New domains without “experts” – RGBD – Multi-spectral data – Gene-expression data – Unclear how to hand-engineer (C) Dhruv Batra & Zsolt Kira 57

  17. “Expert” intuitions can be misleading • “Every time I fire a linguist, the performance of our speech recognition system goes up” – Fred Jelinek, IBM ’98 (C) Dhruv Batra & Zsolt Kira 58

  18. Benefits of Deep/Representation Learning • Modularity! • Plug and play architectures! (C) Dhruv Batra & Zsolt Kira 59

  19. Differentiable Computation Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 60 Slide Credit: Marc'Aurelio Ranzato

  20. (C) Dhruv Batra & Zsolt Kira 61

  21. Logistic Regression as a Cascade Given a library of simple functions, compose them into a complicated function (C) Dhruv Batra & Zsolt Kira 62 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  22. Logistic Regression as a Cascade Given a library of simple functions, compose them into a complicated function (C) Dhruv Batra & Zsolt Kira 63 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
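
The slide's idea in code: given a library of simple functions (dot product, sigmoid, negative log-likelihood), logistic regression is just their composition. A minimal sketch in plain Python:

```python
import math

# Library of simple functions.
def linear(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(p, y):
    # Negative log-likelihood for a binary label y in {0, 1}.
    return -math.log(p if y == 1 else 1.0 - p)

# Composed into a more complicated function: the logistic-regression loss.
def loss(w, b, x, y):
    return nll(sigmoid(linear(w, b, x)), y)
```

For example, with w = [0.5, -0.3], b = 0.1, and x = [1.0, 2.0], the score z is 0, so p = 0.5 and the loss is log 2 for either label.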

  23. Key Computation: Forward-Prop (C) Dhruv Batra & Zsolt Kira 64 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  24. Key Computation: Back-Prop (C) Dhruv Batra & Zsolt Kira 65 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
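
Back-prop runs the same cascade in reverse, multiplying local gradients via the chain rule. For a sigmoid followed by negative log-likelihood, the gradient at the score simplifies to the well-known p − y; a hand-derived sketch (the result is standard, the variable names are mine):

```python
import math

def forward_backward(w, b, x, y):
    # Forward-prop: compute and cache intermediate values.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -math.log(p if y == 1 else 1.0 - p)
    # Back-prop: dL/dz = p - y for sigmoid + NLL, then the
    # chain rule gives the parameter gradients.
    dz = p - y
    dw = [dz * xi for xi in x]
    db = dz
    return loss, dw, db
```

A finite-difference check (perturb each parameter by ±ε and compare the loss change against the analytic gradient) is the usual way to validate such hand-written gradients.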

  25. Differentiable Computation Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 66 Slide Credit: Marc'Aurelio Ranzato

  26. Visual Dialog Model #1 Late Fusion Encoder Slide Credit: Abhishek Das


  34. Yes it works, but how? (C) Dhruv Batra & Zsolt Kira 89

  35. Outline • What is Deep Learning, the field, about? • What is this class about? • What to expect? – Logistics • FAQ (C) Dhruv Batra & Zsolt Kira 90


  37. What is this class about? (C) Dhruv Batra & Zsolt Kira 92

  38. What is this class about? • Introduction to Deep Learning • Goal: – After finishing this class, you should be ready to get started on your first DL research project. • Convolutional Neural Networks (CNNs) • Recurrent Neural Networks (RNNs) • Deep Reinforcement Learning • Generative Models (VAEs, GANs) • Target Audience: – Senior undergrads, MS-ML, and new PhD students • Note: Materials largely follow those developed by Dhruv Batra but with slight modifications (C) Dhruv Batra & Zsolt Kira 93

  39. What this class is NOT • NOT the target audience: – Advanced grad students already working in ML/DL areas – People looking to understand the latest and greatest cutting-edge research (e.g. GANs, AlphaGo, etc.) – Undergraduate/Masters students looking to graduate with a DL class on their resume. • NOT the goal: – Teaching a toolkit. “Intro to TensorFlow/PyTorch” – Intro to Machine Learning (C) Dhruv Batra & Zsolt Kira 94

  40. Caveat • This is an ADVANCED Machine Learning class – This should NOT be your first introduction to ML – You will need a formal class; not just self-reading/Coursera – Taking these concurrently does not count! – If you took CS 7641/ISYE 6740/CSE 6740 @GT, you’re in the right place – If you took an equivalent class elsewhere, see the list of topics taught in CS 7641 to be sure. (C) Dhruv Batra & Zsolt Kira 95

  41. Prerequisites • Intro Machine Learning – Classifiers, regressors, loss functions, MLE, MAP • Linear Algebra – Matrix multiplication, eigenvalues, positive semi-definiteness… • Calculus – Multi-variate gradients, hessians, jacobians… If you do not have these prerequisites, consider dropping! • This is for your benefit, as well as the benefit of others (C) Dhruv Batra & Zsolt Kira 96

  42. Prerequisites • Intro Machine Learning – Classifiers, regressors, loss functions, MLE, MAP • Linear Algebra – Matrix multiplication, eigenvalues, positive semi-definiteness… • Calculus – Multi-variate gradients, hessians, jacobians… (C) Dhruv Batra & Zsolt Kira 97

  43. Prerequisites • Intro Machine Learning – Classifiers, regressors, loss functions, MLE, MAP • Linear Algebra – Matrix multiplication, eigenvalues, positive semi-definiteness… • Calculus – Multi-variate gradients, hessians, jacobians… • Programming! – Homeworks will require Python, C++! – Libraries/Frameworks: PyTorch – HW1 (pure python + PyTorch), HW2-4 (PyTorch) – Your language of choice for project (C) Dhruv Batra & Zsolt Kira 98

  44. Course Information • Instructor: Zsolt Kira – zkira@gatech – Location: 222 CCB • I will always be available; just contact me or come to office hours • My job is to: – Teach the course such that you learn a lot – Provide any support needed towards that – Have fun and develop a passion for these topics (C) Dhruv Batra and Zsolt Kira 99

  45. Course Information • Instructor: Zsolt Kira – zkira@gatech – Location: CODA room S1181B • Incoming Ph.D. students: Zubair Irshad, Ben Wilson, James Smith (C) Dhruv Batra & Zsolt Kira 100

  46. Current TAs
  • Sameer Dharur – MS-CS student – https://www.linkedin.com/in/sameerdharur/
  • Rahul Duggal – 2nd year CS PhD student – http://www.rahulduggal.com/
  • Patrick Grady – 2nd year Robotics PhD student – https://www.linkedin.com/in/patrick-grady
  • Jiachen Yang – 2nd year MSCSE student – https://www.cc.gatech.edu/~jyang462/
  • Anishi Mehta – 2nd year ML PhD – https://www.linkedin.com/in/anishimehta
  • Yinquan Lu – MSCS student – https://www.cc.gatech.edu/~jyang462/
  More TAs coming soon! (C) Dhruv Batra & Zsolt Kira 101

  47. Organization & Deliverables • PS0 (2%) + 4 homeworks (78%) – PS0 is a warm-up, graded pass/fail – Do it! – In general, PS/HWs are a mix of theory and implementation – First real one goes out next week • Start early, Start early, Start early! • Final project (20%) – Projects done in groups of 3-4 • (Bonus) Class Participation (up to 3%) – Top contributors to discussions (mainly on Piazza) – Ask questions, answer questions (C) Dhruv Batra & Zsolt Kira 102

  48. New Element: FB Co-Teaching! • Several elements including: – Guest Lectures – 6 in-class lectures by FB • Data wrangling • Embeddings and world2vec • Self-attention and transformers • Language modeling and translation • Large-scale systems • Fairness, privacy, ethics – Assignments – Volunteers developing some new elements for assignments – Project ideas – Instructors will provide ideas for real-world projects and possible (surrogate/public) data sources that mirror some of the challenges they are working on (C) Dhruv Batra & Zsolt Kira 103

  49. Late Days • “Free” Late Days – 7 late days for the semester • Use for HWs • Cannot use for project related deadlines – After free late days are used up: • 25% penalty for each late day (C) Dhruv Batra & Zsolt Kira 104

  50. PS0 • Out today; due 01/14 – Available on website (will show up on Canvas today) • Grading: pass/fail – <=80% means that you might not be prepared for the class – Consider dropping or talk to me if that’s the case! • Topics – Probability, calculus, convexity, proving things (C) Dhruv Batra & Zsolt Kira 105

  51. Project • Goal – Chance to try Deep Learning – Encouraged to apply it to your research (computer vision, NLP, robotics, …) – Must be done this semester. – Can combine with other classes with separate thrusts • get permission from both instructors; delineate the different parts – Extra credit for shooting for a publication – Teams of 3-4 people • Undergraduates and graduates on separate teams • Contributions of each member must be explained and cannot just be report writing, etc. • Main categories – Application/Survey • Compare a bunch of existing algorithms on a new application domain of your interest – Formulation/Development • Formulate a new model or algorithm for a new or old problem – Theory • Theoretically analyze an existing algorithm (C) Dhruv Batra & Zsolt Kira 106

  52. Computing • Major bottleneck – GPUs • Options – Your own / group / advisor’s resources – Google Cloud credits • $50 in credits to every registered student, courtesy of Google – Google Colaboratory allows free TPU access!! • https://colab.research.google.com/notebooks/welcome.ipynb – Minsky cluster in IC (C) Dhruv Batra & Zsolt Kira 107
