Lecture 1: Review of 109A, Preview of 109B
CS109B: Introduction to Data Science
Pavlos Protopapas and Mark Glickman
Outline
• Who
• What have we learned in 109a?
• What is covered in 109b
• Course Logistics
CS109B, Protopapas, Glickman
Who: Instructors
Mark Glickman: Senior Lecturer in Statistics
Who: Instructors (cont)
About Mark Glickman:
• BA in Statistics from Princeton; PhD in Statistics from Harvard
• Chess master, inventor of the Glicko and Glicko-2 rating systems for head-to-head competition, ratings committee chair of US Chess
• Former Editor-in-Chief of the Journal of Quantitative Analysis in Sports (2015-2017)
• Director of the Harvard Sports Analytics Laboratory
• Senior Statistician at the Center for Healthcare Organization and Implementation Research, a Veterans Administration Center of Innovation
• Fellow of the American Statistical Association (ASA)
• Member of the ASA Board of Directors; Co-Chair of the ASA Committee on Data Science
Who: Instructors (cont)
Pavlos Protopapas: Scientific Director of the Institute for Applied Computational Science (IACS)
Who: Instructors (cont)
About Pavlos Protopapas:
• BSc in Physics, Imperial College; PhD in Theoretical Physics, UPENN
• Scientific Director of the Institute for Applied Computational Science (IACS)
• Teaches CS109 and the IACS Capstone Course for the Data Science master's program
• Active member of the astrostatistics community. Research at the intersection of astronomy, machine learning and statistics; excited about the new telescopes coming online in the next few years
• Member of Alerce, an intelligent broker for online annotation of celestial objects from streaming data
• Loves classical music, hiking and anything adventurous
Who: Lab Instructors
• Rahul Dave: Lecturer at IACS. Holds a PhD in cosmology and teaches AM207. He loves climbing and hiking, and is also known as the human Google. Lab: AWS and scaling up your calculations (Lab 4)
• Eleni Kaxiras: Head TF. Eleni has been the CS109a/b Head TF for 3 years. She is also a staff member at SEAS, advising courses in the use of computation for teaching and learning. She holds a Bachelor's in Physics and produces her own olive oil. Labs: NN optimization (Lab 3) and CNNs (Lab 5)
Who: Lab Instructors
• Will Claybaugh: IACS Master's student, former social network analyst at Booz Allen Hamilton. Former fencer; built and flew on a cluster of 18 weather balloons. Labs: Setting up environments (Lab 1), Smoothing/GAM (Lab 2), Clustering (Lab 7), Bayes 2 (Lab 9)
• Srivatsan Srinivasan: IACS Master's student, former summer data science intern at Facebook. Incoming Research Engineer at DeepMind. Enjoys occasional creative writing and theater. Labs: RNNs (Lab 6) and GANs (Lab 11). Advanced Sections: Deep RL (a-sec 6), Variational Inference (a-sec 7)
Who: Lab Instructors
• Vivek Hv: Vivek is a graduate student in the Design Engineering program. He has a background in product development, healthcare, and computer science. After his undergraduate studies in Aerospace Engineering, he joined Honeywell, where he worked on rapid prototyping and development of products for private jets. Beyond this, Vivek enjoys art, cats, soccer, waffles, programming, and trekking. Labs: Bayes (Lab 8), Autoencoders and variational autoencoders (Lab 10). Advanced Sections: GANs (a-sec 8)
Who: Advanced Section Leaders
• Javier Zazo: Postdoc at SEAS. Works in optimal transport and neural signal processing. Comes from Madrid. Loves going to the mountains, good weather, and being outdoors. Many hobbies, including playing Go, watching movies, and hanging out. Hates cooking and survives on minimal-effort cooked food, but still loves delicious food. Advanced Sections: Optimization (a-sec 1), Dropout (a-sec 2), Advanced CNNs (a-sec 3), NN transfer learning (a-sec 5)
• Marios Matthaiakis: Postdoctoral fellow at IACS and computational physicist, trying to apply physical laws in neural network architectures. He comes from Crete, a beautiful island in Greece. Advanced Section: LSTM, GRU in NLP + (a-sec 4)
Who: Teaching Fellows
• Sol Girouard: Teaching Fellow for 109a/b and a top-of-class, award-winning student graduating as part of the Harvard Class of 2018. She is a quant, mathematical economist and data scientist who channels her applied interdisciplinary background at the intersection of financial markets and technology. Sol is training for her 2nd degree black belt in full-contact Taekwondo.
• Brandon Walker: Principal data scientist for the LexisNexis Risk Solutions Healthcare Analytics Group. He has TF'ed CS109a twice.
• Yujiao Chen: Ph.D. student at GSD. She loves TF'ing 109.
Who: Teaching Fellows
• Rashmi Banthia: She has been a TF for CS109A/B for a long time. Interests: Indian food and, most recently, Orangetheory ("doesn't mean I'm good at it").
• Evan Mackay: Harvard College. Evan is from Florida and enjoys biking, podcasts, and sweet potatoes.
• Alex Lin: Harvard College. Alex enjoys working with Python(s).
Who: Teaching Fellows
• Curtis Hsu: Curtis is a senior at Harvard College living in Mather House, studying statistics and computer science. He enjoys hip hop dancing in his free time!
• Anirudh (Ani) Suresh: Ani ('20) is a Harvard undergraduate concentrating in Math & CS.
109A
• Scraping, sklearn, numpy, Pandas, matplotlib
• Visualization best practices
• Linear, multiple and polynomial regression
• Model selection and regularization
• Logistic regression, multiple and polynomial
• kNN classification
• Decision Trees, RF, Boosting, Stacking
• SVM
• AB testing and experimental design
Topics
The semester is divided into 2 parts.
• Part 1: Smoothing, Unsupervised Learning and Bayesian inference, all in Python; Neural Networks in Python and Keras.
• Modules: Incorporates everything from 109a and 109b into modules.
Course topics covered by Glickman
• Regression splines, smoothers, additive and generalized additive models
• Unsupervised learning and cluster analysis
• Introduction to Bayesian methods
  • Hierarchical modeling
  • Latent Dirichlet Allocation (topic modeling)
Course topics covered by Glickman (cont): Smoothers and GAMs (raw data) [figure]
Course topics covered by Glickman (cont): Smoothers and GAMs (smoothed fit) [figure]
Course topics covered by Glickman (cont): Cluster analysis [figure]
Course topics covered by Glickman (cont): Example use of cluster analysis [figure]
Course topics covered by Glickman (cont): Bayesian statistics [figure]
Course topics covered by Glickman (cont): Bayesian statistics, hierarchical modeling [figure]
Course topics covered by Glickman (cont): Bayesian statistics, hierarchical modeling. Hospital Variation in Carotid Stenting Outcomes [figure]
Course topics covered by Glickman (cont): Bayesian statistics, Latent Dirichlet Allocation [figure]
Course topics covered by Pavlos
Deep Neural Networks
• Review from 109a: Neural Net Basics & Math, Deep Feed Forward, Regularization
• Optimization
• CNNs
• RNNs
• Autoencoders
• Variational Autoencoders
• GANs
• Deep reinforcement learning
Course topics covered by Pavlos (cont)
SGD is slow when there is high curvature. [figure: SGD vs. SGD with momentum]
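The contrast between plain SGD and SGD with momentum named on this slide can be illustrated with a small sketch. The toy problem below is an assumption chosen for illustration, not an example from the lecture: a quadratic whose curvature is 100 times larger along y than along x. It uses exact (deterministic) gradients rather than noisy minibatch gradients, so it shows only the effect of the momentum term. The names `grad`, `loss` and `descend` are hypothetical helpers.

```python
# Minimal sketch (assumed toy problem, not from the lecture): compare plain
# gradient descent with the momentum update on the ill-conditioned quadratic
# f(x, y) = 0.5 * (x**2 + 100 * y**2).

def grad(p):
    """Gradient of f at the point p = (x, y)."""
    x, y = p
    return (x, 100.0 * y)

def loss(p):
    """Value of f at the point p = (x, y)."""
    x, y = p
    return 0.5 * (x ** 2 + 100.0 * y ** 2)

def descend(start, lr, beta, steps):
    """Run `steps` updates from `start`; beta = 0 gives plain gradient descent."""
    p = start
    v = (0.0, 0.0)  # velocity: decaying accumulation of past gradient steps
    for _ in range(steps):
        g = grad(p)
        v = tuple(beta * vi - lr * gi for vi, gi in zip(v, g))
        p = tuple(pi + vi for pi, vi in zip(p, v))
    return p

start = (1.0, 1.0)
plain = descend(start, lr=0.01, beta=0.0, steps=100)
momentum = descend(start, lr=0.01, beta=0.9, steps=100)

print("plain loss:   ", loss(plain))
print("momentum loss:", loss(momentum))
```

The learning rate is capped by the steep y direction, so plain descent creeps along the shallow x direction. With the same learning rate and step budget, the momentum run ends at a lower loss because the velocity keeps accumulating along x.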
Course topics covered by Pavlos (cont): You Only Look Once (YOLO), 2016 [figure]
Course topics covered by Pavlos (cont): Mask R-CNN, 2017 [figure]
Course topics covered by Pavlos (cont)
RNN classification, e.g. sentiment analysis
Sentence: While the music was great, the screenplay was not so engaging and hence even if I started to enjoy it, the movie failed to work for me eventually.
Actual Sentiment: Negative
Predicted Sentiment: ?
Course topics covered by Pavlos (cont)
RNN sequence-to-sequence modeling
English: I love this course
Spanish: Me encanta esta clase
Greek: Λατρεύω αυτή την τάξη