ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Dhruv Batra Virginia Tech
What is this class about? Some of the most exciting developments in Machine Learning, AI, Statistics & related fields in the last 3 decades (C) Dhruv Batra 2
First Caveat • This is an ADVANCED Machine Learning class – This should not be your first introduction to ML – You will need a formal class; not just self-reading/Coursera – If you took ECE 4984/5984, you’re in the right place – If you took ECE 5524 or equivalent, see the list of topics taught in ECE 4984/5984. (C) Dhruv Batra 3
Topics Covered in Intro to ML & Perception • Basics of Statistical Learning – Loss functions, MLE, MAP, Bayesian estimation, bias-variance tradeoff, overfitting, regularization, cross-validation • Supervised Learning – Naïve Bayes, Logistic Regression, Nearest Neighbour, Neural Networks, Support Vector Machines, Kernels – Ensemble Methods: Bagging, Boosting • Unsupervised Learning – Clustering: k-means, Gaussian mixture models, EM – Dimensionality reduction: PCA, SVD, LDA • Perception – Applications to Vision, Natural Language Processing (C) Dhruv Batra 4
What is this class about? • Making global predictions from local observations • Learning such models from large quantities of data (C) Dhruv Batra 5
Exciting Developments • Probabilistic Graphical Models – Directed: Bayesian Networks (Bayes Nets) – Undirected: Markov/Conditional Random Fields – Structured Prediction • Large-Scale Learning – Online learning – Distributed learning • Deep Learning (not covered in this class) – Convolutional Nets – Distributed backprop – Dropout (C) Dhruv Batra 6
What is Machine Learning? • What is learning? • [Kevin Murphy] algorithms that – automatically detect patterns in data – use the uncovered patterns to predict future data or other outcomes of interest • [Tom Mitchell] algorithms that – improve their performance (P) – at some task (T) – with experience (E) (C) Dhruv Batra 7
Tasks • Supervised Learning – Classification: x → y (discrete) – Regression: x → y (continuous) • Unsupervised Learning – Clustering: x → c (discrete cluster ID) – Dimensionality Reduction: x → z (continuous) (C) Dhruv Batra 8
Classification: x → y (discrete) (C) Dhruv Batra 9
Speech Recognition (C) Dhruv Batra Slide Credit: Carlos Guestrin 10
Machine Translation (C) Dhruv Batra Figure Credit: Kevin Gimpel 11
Object/Face Detection • Many new digital cameras now detect faces – Canon, Sony, Fuji, … (C) Dhruv Batra Slide Credit: Noah Snavely, Steve Seitz, Pedro Felzenszwalb 12
Reading a noun (vs verb) [Rustandi et al., 2005] Slide Credit: Carlos Guestrin 13
Regression: x → y (continuous) (C) Dhruv Batra 14
Stock market (C) Dhruv Batra 15
Weather Prediction Temperature (C) Dhruv Batra Slide Credit: Carlos Guestrin 16
Tasks • Supervised Learning – Classification: x → y (discrete) – Regression: x → y (continuous) • Unsupervised Learning – Clustering: x → c (discrete cluster ID) – Dimensionality Reduction: x → z (continuous) (C) Dhruv Batra 17
Need for Joint Prediction (C) Dhruv Batra 18
Handwriting recognition • Character recognition, e.g., kernel SVMs [figure: handwritten letters with per-character predictions]
Handwriting recognition 2
Local Ambiguity [Smyth et al., 1994]
Local Ambiguity (C) Dhruv Batra 22 slide credit: Fei-Fei Li, Rob Fergus & Antonio Torralba
Joint Prediction • Classification: x1, x2, …, xn → y1, y2, …, yn (discrete) • Regression: x1, x2, …, xn → y1, y2, …, yn (continuous) (C) Dhruv Batra 23
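To make the joint-prediction idea concrete, here is a minimal sketch (my illustration, not from the slides) of decoding a word with a chain model: each position has local per-character scores, neighboring characters share a transition score, and Viterbi decoding picks the jointly best sequence. All numbers below are made up; the point is that a locally ambiguous middle letter gets corrected by the joint decision.

```python
import numpy as np

def viterbi(unary, transition):
    """Jointly decode a label sequence.

    unary[t, y]       : local score for label y at position t
    transition[y, y2] : score for label y followed by label y2
    Returns the label sequence maximizing the total (joint) score.
    """
    T, K = unary.shape
    score = unary[0].copy()                # best score ending in each label
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition + unary[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace back the best path
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1]

# Toy example: 3 letter positions, alphabet {a, c, r} = indices {0, 1, 2}.
# Position 1 is locally ambiguous between 'a' and 'c'; a bigram score
# that favors "c a" over "c c" resolves it jointly.  (Made-up numbers.)
unary = np.array([[0.1, 2.0, 0.0],     # clearly 'c'
                  [1.0, 1.1, 0.0],     # 'c' barely beats 'a' locally
                  [0.0, 0.1, 2.0]])    # clearly 'r'
transition = np.array([[0.0, 0.0, 1.0],   # a -> r likely
                       [1.5, 0.0, 0.0],   # c -> a likely
                       [0.0, 0.0, 0.0]])
print(viterbi(unary, transition))  # joint decoding returns c, a, r -> [1, 0, 2]
```

Greedy per-character decoding would read this toy word as "ccr"; the joint decoder returns "car" because the c-to-a transition outweighs the tiny local preference for a second c.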
How many parameters? • P(X1, X2, …, Xn) • Each Xi takes k states • What if all Xi are independent? (C) Dhruv Batra 24
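A worked answer to the counting question (my derivation, consistent with the slide's setup): the full joint table is exponential in n, while full independence is linear in n.

```latex
% Full joint table over X_1, ..., X_n, each taking k states:
% one entry per configuration, minus 1 for normalization.
P(X_1, \dots, X_n) \;\Rightarrow\; k^n - 1 \ \text{parameters}

% If all X_i are independent, the joint factorizes, and each
% factor P(X_i) needs only k - 1 parameters:
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i) \;\Rightarrow\; n(k - 1) \ \text{parameters}
```

Graphical models live between these two extremes: conditional independencies encoded by a graph let the joint factorize into small local tables.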
Probabilistic Graphical Models • One of the most exciting advancements in statistical AI in the last 10-20 years • Marriage – Graph Theory + Probability • Compact representation for exponentially-large probability distributions – Exploit conditional independencies • Generalize – naïve Bayes – logistic regression – Many more … (C) Dhruv Batra 25
Types of PGMs [taxonomy figure of graphical models: directed graphs (Bayesian networks, Markov chains, HMMs, dynamic Bayes nets, LDS, influence diagrams, latent-variable/mixture/clustering models, discrete and continuous), undirected graphs (Markov networks, CRFs, pairwise models, Boltzmann machines, Gaussian processes), factor graphs, chain graphs, junction/clique trees] (C) Dhruv Batra Image Credit: David Barber 26
Main Issues in PGMs • Representation – How do we store P(X1, X2, …, Xn)? – What does my model mean/imply/assume? (Semantics) • Inference – How do I answer questions/queries with my model? Such as – Marginal Estimation: P(X5 | X1, X4) – Most Probable Explanation: argmax P(X1, X2, …, Xn) • Learning – How do we learn the parameters and structure of P(X1, X2, …, Xn) from data? – Which model is right for my data? (C) Dhruv Batra 27
Key Ingredient • Exploit independence assumptions – Encoded in the graph structure • Structured Prediction vs Unstructured Prediction (C) Dhruv Batra 28
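A minimal, hypothetical sketch of what "exploiting independence" buys for the inference queries on the previous slide: if the joint happens to factorize as a chain, P(X1) ∏ P(Xi | Xi-1), the marginal of the last variable can be computed by passing messages along the chain in O(n k^2) time instead of summing a k^n-entry table. The CPTs below are random placeholders, not a real model.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, k = 6, 3                                    # 6 variables, 3 states each

# Chain-structured model: P(X1) * prod_i P(X_{i+1} | X_i)
prior = rng.dirichlet(np.ones(k))              # P(X1)
cpts = [rng.dirichlet(np.ones(k), size=k)      # cpts[i][a, b] = P(X_{i+2}=b | X_{i+1}=a)
        for _ in range(n - 1)]

def marginal_last_chain(prior, cpts):
    """Marginal P(X_n) by message passing along the chain: O(n k^2)."""
    msg = prior                                # message = P(X_1)
    for cpt in cpts:                           # sum out one variable at a time
        msg = msg @ cpt                        # P(X_{i+1}) = sum_a P(X_i=a) P(X_{i+1} | X_i=a)
    return msg

def marginal_last_bruteforce(prior, cpts):
    """Same marginal by summing over all k**n joint configurations: O(k^n)."""
    marg = np.zeros(k)
    for x in product(range(k), repeat=n):
        p = prior[x[0]]
        for i, cpt in enumerate(cpts):
            p *= cpt[x[i], x[i + 1]]
        marg[x[-1]] += p
    return marg

print(marginal_last_chain(prior, cpts))        # fast
print(marginal_last_bruteforce(prior, cpts))   # exponential, but same answer
```

The message-passing loop is exactly variable elimination on a chain; the brute-force loop is only there to confirm that the two agree on this toy model.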
Application: Evolutionary Biology [Friedman et al.] (C) Dhruv Batra 29
Application: Computer Vision • Interpreting sign language sequences • Chain model (hidden Markov model) (C) Dhruv Batra Image Credit: Simon JD Prince 30
Application: Speech (C) Dhruv Batra 31
Application: Sensor Network C ¡ B ¡ A ¡ Image Credit: Carlos Guestrin (C) Dhruv Batra & Erik Sudderth 32
Application: Medical Diagnosis (C) Dhruv Batra Image Credit: Erik Sudderth 33
Application: Coding [figure: observed bits, true bits, and parity-constraint factors] (C) Dhruv Batra 34
Application: Protein Folding • Foldit – http://youtu.be/bTlNNFQxs_A?t=175 – http://www.youtube.com/watch?v=lGYJyur4FUA (C) Dhruv Batra 35
Application: Computer Vision • Parsing the human body • Tree model (C) Dhruv Batra Image Credit: Simon JD Prince 37
Application: Computer Vision • Semantic segmentation • Grid model: Markov random field (blue nodes) (C) Dhruv Batra Image Credit: Simon JD Prince 38
Application: Computer Vision • Geometric Labelling – [Hoiem et al. IJCV ’07], [Hoiem et al. CVPR ’08], [Saxena PAMI ’08], [Ramalingam et al. CVPR ‘08]. (C) Dhruv Batra 39
Application: Computer Vision • Name-Face Association [Berg et al. CVPR ’04, Phd-Thesis ‘07], [Gallagher et al. CVPR ’08]. [figure: face crops labeled Lisa, Mildred, and “Mildred and Lisa”; plot of probability of birth year (1900 to 2000) for the names Mildred, Lisa, Nora, Peyton, Linda] (C) Dhruv Batra 40
Application: Computer Vision • Name-Face Association [Berg et al. CVPR ’04, Phd-Thesis ‘07], [Gallagher et al. CVPR ’08]. – President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters British director Sam Mendes and his partner actress Kate Winslet arrive at the London premiere of ’The Road to Perdition’, September 18, 2002. The films stars Tom Hanks as a Chicago hit man who has a separate family life and co-stars Paul Newman and Jude Law. REUTERS/Dan Chung (C) Dhruv Batra 41
And many many many many many more … (C) Dhruv Batra 42
Course Information • Instructor: Dhruv Batra – dbatra@vt – Office Hours: Fri 1-2pm – Location: 468 Whittemore (C) Dhruv Batra 43
Syllabus • Directed Graphical Models (Bayes Nets) – Representation: Directed Acyclic Graphs (DAGs), Conditional Probability Tables (CPTs), d-Separation, v-structures, Markov Blanket, I-Maps – Parameter Learning: MLE, MAP, EM – Structure Learning: Chow-Liu, Decomposable scores, hill climbing – Inference: Marginals, MAP/MPE, Variable Elimination • Undirected Graphical Models (MRFs/CRFs) – Representation: Junction trees, Factor graphs, treewidth, Local Markov Assumptions, Moralization, Triangulation – Inference: Belief Propagation, Message Passing, Linear Programming Relaxations, Dual-Decomposition, Variational Inference, Mean Field – Parameter Learning: MLE, gradient descent – Structured Prediction: Structured SVMs, Cutting-Plane training • Large-Scale Learning – Online learning: perceptrons, stochastic (sub-)gradients – Distributed Learning: Dual Decomposition, Alternating Direction Method of Multipliers (ADMM) (C) Dhruv Batra 44