ECE 6504: Advanced Topics in Machine Learning - Probabilistic Graphical Models and Large-Scale Learning


  1. ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Dhruv Batra Virginia Tech

  2. What is this class about? Some of the most exciting developments in Machine Learning, AI, Statistics & related fields in the last 3 decades (C) Dhruv Batra 2

  3. First Caveat
     • This is an ADVANCED Machine Learning class
       – This should not be your first introduction to ML
       – You will need a formal class; not just self-reading/Coursera
       – If you took ECE 4984/5984, you’re in the right place
       – If you took ECE 5524 or equivalent, see the list of topics taught in ECE 4984/5984
     (C) Dhruv Batra 3

  4. Topics Covered in Intro to ML&P
     • Basics of Statistical Learning
       – Loss function, MLE, MAP, Bayesian estimation, bias-variance tradeoff, overfitting, regularization, cross-validation
     • Supervised Learning
       – Naïve Bayes, Logistic Regression, Nearest Neighbour, Neural Networks, Support Vector Machines, Kernels
       – Ensemble Methods: Bagging, Boosting
     • Unsupervised Learning
       – Clustering: k-means, Gaussian mixture models, EM
       – Dimensionality reduction: PCA, SVD, LDA
     • Perception
       – Applications to Vision, Natural Language Processing
     (C) Dhruv Batra 4

  5. What is this class about? • Making global predictions from local observations • Learning such models from large quantities of data (C) Dhruv Batra 5

  6. Exciting Developments
     • Probabilistic Graphical Models
       – Directed: Bayesian Networks (Bayes Nets)
       – Undirected: Markov/Conditional Random Fields
       – Structured Prediction
     • Large-Scale Learning
       – Online learning
       – Distributed learning
     • Deep Learning (not covered in this class)
       – Convolutional Nets
       – Distributed backprop
       – Dropout
     (C) Dhruv Batra 6

  7. What is Machine Learning? • What is learning? • [Kevin Murphy] algorithms that – automatically detect patterns in data – use the uncovered patterns to predict future data or other outcomes of interest • [Tom Mitchell] algorithms that – improve their performance (P) – at some task (T) – with experience (E) (C) Dhruv Batra 7

  8. Tasks
     • Supervised Learning
       – Classification: x → y, with y discrete
       – Regression: x → y, with y continuous
     • Unsupervised Learning
       – Clustering: x → c, with c a discrete cluster ID
       – Dimensionality Reduction: x → z, with z continuous
     (C) Dhruv Batra 8
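
A minimal sketch of what these four tasks look like in code, assuming scikit-learn and a made-up toy dataset; the data, model choices, and hyperparameters below are illustrative assumptions, not part of the course material:

```python
# Illustrative scikit-learn sketch of the four task types (toy data is made up).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # inputs x
y_class = (X[:, 0] > 0).astype(int)      # discrete labels y
y_reg = X @ rng.normal(size=5)           # continuous targets y

clf = LogisticRegression().fit(X, y_class)           # classification: x -> discrete y
reg = LinearRegression().fit(X, y_reg)               # regression:     x -> continuous y
c = KMeans(n_clusters=3, n_init=10).fit_predict(X)   # clustering:     x -> discrete cluster ID c
z = PCA(n_components=2).fit_transform(X)             # dim. reduction: x -> continuous z
```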

  9. Classification: x → y, with y discrete (C) Dhruv Batra 9

  10. Speech Recognition (C) Dhruv Batra Slide Credit: Carlos Guestrin 10

  11. Machine Translation (C) Dhruv Batra Figure Credit: Kevin Gimpel 11

  12. Object/Face detection • Many new digital cameras now detect faces – Canon, Sony, Fuji, … (C) Dhruv Batra Slide Credit: Noah Snavely, Steve Seitz, Pedro Felzenszwalb 12

  13. Reading a noun (vs verb) [Rustandi et al., 2005] Slide Credit: Carlos Guestrin 13

  14. Regression: x → y, with y continuous (C) Dhruv Batra 14

  15. Stock market (C) Dhruv Batra 15

  16. Weather Prediction, e.g., temperature (C) Dhruv Batra Slide Credit: Carlos Guestrin 16

  17. Tasks
     • Supervised Learning
       – Classification: x → y, with y discrete
       – Regression: x → y, with y continuous
     • Unsupervised Learning
       – Clustering: x → c, with c a discrete cluster ID
       – Dimensionality Reduction: x → z, with z continuous
     (C) Dhruv Batra 17

  18. Need for Joint Prediction (C) Dhruv Batra 18

  19. Handwriting recognition • Character recognition, e.g., kernel SVMs (figure: handwritten characters and their per-character predictions)

  20. Handwriting recognition 2

  21. Local Ambiguity [Smyth et al., 1994]

  22. Local Ambiguity (C) Dhruv Batra 22 slide credit: Fei-Fei Li, Rob Fergus & Antonio Torralba

  23. Joint Prediction
     • Classification: x1, x2, …, xn → y1, y2, …, yn, with each yi discrete
     • Regression: x1, x2, …, xn → y1, y2, …, yn, with each yi continuous
     (C) Dhruv Batra 23

  24. How many parameters? • P(X1, X2, …, Xn) • Each Xi takes k states • What if all the Xi are independent? (C) Dhruv Batra 24
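
A worked version of the count (my addition, not on the slide), written as a short LaTeX snippet:

```latex
% Parameter counting for P(X_1,\dots,X_n), each X_i taking k states.
% Full joint table: one entry per configuration, minus 1 for normalization:
k^{n} - 1
% If all the X_i are independent, the joint factorizes as
P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i)
% and each marginal needs only k - 1 numbers, for a total of
n\,(k - 1)
```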

  25. Probabilistic Graphical Models
     • One of the most exciting advances in statistical AI in the last 10-20 years
     • A marriage of Graph Theory and Probability
     • Compact representation for exponentially-large probability distributions
       – Exploit conditional independencies
     • Generalize
       – naïve Bayes
       – logistic regression
       – many more …
     (C) Dhruv Batra 25
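
To see how a graph buys compactness, here is one concrete case (my illustrative example, continuing the counting above): a Markov chain X1 → X2 → … → Xn, whose factorization brings the parameter count from exponential down to linear in n.

```latex
% Chain-structured Bayes net X_1 -> X_2 -> ... -> X_n (illustrative example).
P(X_1,\dots,X_n) = P(X_1)\prod_{i=2}^{n} P(X_i \mid X_{i-1})
% With k states per variable: (k - 1) parameters for P(X_1) plus
% (n - 1)\,k\,(k - 1) for the conditionals -- linear in n,
% versus k^{n} - 1 for the unrestricted joint table.
```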

  26. Types of PGMs (taxonomy figure spanning directed models such as Markov chains, HMMs, Bayesian networks, and dynamic Bayes nets; undirected models such as Markov networks, CRFs, and Boltzmann machines; factor graphs, chain graphs, and junction/clique trees) (C) Dhruv Batra Image Credit: David Barber 26

  27. Main Issues in PGMs
     • Representation
       – How do we store P(X1, X2, …, Xn)?
       – What does my model mean/imply/assume? (Semantics)
     • Inference
       – How do I answer questions/queries with my model? For example:
       – Marginal estimation: P(X5 | X1, X4)
       – Most Probable Explanation: argmax P(X1, X2, …, Xn)
     • Learning
       – How do we learn the parameters and structure of P(X1, X2, …, Xn) from data?
       – Which model is right for my data?
     (C) Dhruv Batra 27
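
To make the two inference queries concrete, here is a minimal brute-force sketch on a tiny chain model; the variables, CPT values, and queries are made-up toy choices, not the course's notation or an efficient algorithm (the class covers Variable Elimination and Belief Propagation for that).

```python
# Brute-force inference on a toy chain Bayes net A -> B -> C (illustrative only).
from itertools import product

p_a = {0: 0.6, 1: 0.4}                                    # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P(B | A)
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # P(C | B)

def joint(a, b, c):
    # Chain factorization: P(A, B, C) = P(A) P(B | A) P(C | B)
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Marginal query, e.g., P(C = 1 | A = 1): sum out B, then normalize.
num = sum(joint(1, b, 1) for b in (0, 1))
den = sum(joint(1, b, c) for b, c in product((0, 1), repeat=2))
print("P(C=1 | A=1) =", num / den)

# Most Probable Explanation: argmax over all joint assignments.
mpe = max(product((0, 1), repeat=3), key=lambda abc: joint(*abc))
print("MPE (A, B, C):", mpe)
```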

  28. Key Ingredient • Exploit independence assumptions – Encoded in the graph structure • Structured Prediction vs Unstructured Prediction (C) Dhruv Batra 28

  29. Application: Evolutionary Biology [Friedman et al.] (C) Dhruv Batra 29

  30. Application: Computer Vision – Chain model (hidden Markov model): interpreting sign language sequences (C) Dhruv Batra Image Credit: Simon JD Prince 30

  31. Application: Speech (C) Dhruv Batra 31

  32. Application: Sensor Network (figure: sensor nodes A, B, C) (C) Dhruv Batra Image Credit: Carlos Guestrin & Erik Sudderth 32

  33. Application: Medical Diagnosis (C) Dhruv Batra Image Credit: Erik Sudderth 33

  34. Application: Coding (figure: observed bits, true bits, parity constraints) (C) Dhruv Batra 34

  35. Application: Protein Folding • Foldit – http://youtu.be/bTlNNFQxs_A?t=175 – http://www.youtube.com/watch?v=lGYJyur4FUA (C) Dhruv Batra 35

  36. Application: Protein Folding • Foldit – http://youtu.be/bTlNNFQxs_A?t=175 – http://www.youtube.com/watch?v=lGYJyur4FUA (C) Dhruv Batra 36

  37. Application: Computer Vision – Tree model: parsing the human body (C) Dhruv Batra Image Credit: Simon JD Prince 37

  38. Application: Computer Vision – Grid model (Markov random field): semantic segmentation (blue nodes) (C) Dhruv Batra Image Credit: Simon JD Prince 38

  39. Application: Computer Vision • Geometric Labelling – [Hoiem et al. IJCV ’07], [Hoiem et al. CVPR ’08], [Saxena PAMI ’08], [Ramalingam et al. CVPR ‘08]. (C) Dhruv Batra 39

  40. Application: Computer Vision • Name-Face Association [Berg et al. CVPR ’04, PhD thesis ’07], [Gallagher et al. CVPR ’08] (figure: photos tagged Lisa, Mildred, and both; plot of probability of birth year, 1900-2000, for the names Mildred, Lisa, Nora, Peyton, and Linda) (C) Dhruv Batra 40

  41. Application: Computer Vision
     • Name-Face Association [Berg et al. CVPR ’04, PhD thesis ’07], [Gallagher et al. CVPR ’08]
     • Example news-photo captions:
       – "President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters"
       – "British director Sam Mendes and his partner actress Kate Winslet arrive at the London premiere of 'The Road to Perdition', September 18, 2002. The film stars Tom Hanks as a Chicago hit man who has a separate family life and co-stars Paul Newman and Jude Law. REUTERS/Dan Chung"
     (C) Dhruv Batra 41

  42. And many many many many many more … (C) Dhruv Batra 42

  43. Course Information • Instructor: Dhruv Batra – dbatra@vt – Office Hours: Fri 1-2pm – Location: 468 Whittemore (C) Dhruv Batra 43

  44. Syllabus
     • Directed Graphical Models (Bayes Nets)
       – Representation: Directed Acyclic Graphs (DAGs), Conditional Probability Tables (CPTs), d-Separation, v-structures, Markov Blanket, I-Maps
       – Parameter Learning: MLE, MAP, EM
       – Structure Learning: Chow-Liu, decomposable scores, hill climbing
       – Inference: Marginals, MAP/MPE, Variable Elimination
     • Undirected Graphical Models (MRFs/CRFs)
       – Representation: Junction trees, Factor graphs, treewidth, Local Markov Assumptions, Moralization, Triangulation
       – Inference: Belief Propagation, Message Passing, Linear Programming Relaxations, Dual Decomposition, Variational Inference, Mean Field
       – Parameter Learning: MLE, gradient descent
       – Structured Prediction: Structured SVMs, Cutting-Plane training
     • Large-Scale Learning
       – Online learning: perceptrons, stochastic (sub-)gradients (a minimal perceptron sketch follows this list)
       – Distributed Learning: Dual Decomposition, Alternating Direction Method of Multipliers (ADMM)
     (C) Dhruv Batra 44
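
As flagged above, here is a minimal sketch of the classic online perceptron update, one of the Large-Scale Learning topics; the toy data, function name, and defaults are my own illustrative assumptions, not course code.

```python
# Minimal online perceptron (illustrative sketch; toy data is made up).
import numpy as np

def perceptron(stream, n_features, epochs=5):
    """On each mistake, update w <- w + y * x (labels y in {-1, +1})."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for x, y in stream:
            if y * np.dot(w, x) <= 0:   # mistake (or exactly on the boundary)
                w = w + y * x
    return w

# Toy linearly-separable data: the label is the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
data = [(x, 1 if x[0] > 0 else -1) for x in X]
print("learned weights:", perceptron(data, n_features=3))
```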
