ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics – Bayes Nets – (Finish) Parameter Learning – Structure Learning Readings: KF 18.1, 18.3; Barber 9.5, 10.4 Dhruv Batra Virginia Tech
Administrativia • HW1 – Out – Due in 2 weeks: Feb 17, Feb 19, 11:59pm – Please please please please start early – Implementation: TAN, structure + parameter learning – Please post questions on Scholar Forum. (C) Dhruv Batra 2
Recap of Last Time (C) Dhruv Batra 3
Learning Bayes nets
                          Known structure    Unknown structure
  Fully observable data   Very easy          Hard
  Missing data            Somewhat easy      Very very hard (EM)
[Figure: data x(1), …, x(m) → structure + parameters (CPTs P(X_i | Pa_X_i))]
(C) Dhruv Batra Slide Credit: Carlos Guestrin 4
Learning the CPTs • For each discrete variable X_i, given data x(1), …, x(m):
  P_MLE(X_i = a | Pa_X_i = b) = Count(X_i = a, Pa_X_i = b) / Count(Pa_X_i = b)
(C) Dhruv Batra Slide Credit: Carlos Guestrin 5
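A minimal sketch of this counting procedure, assuming each training case is a dict from variable names to observed values; the function and variable names below are illustrative, not from the lecture:

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by counting, as on the slide:
    P_MLE(X_i = a | Pa_X_i = b) = Count(X_i = a, Pa_X_i = b) / Count(Pa_X_i = b)."""
    joint = Counter()   # counts of (parent assignment, child value)
    margin = Counter()  # counts of parent assignment alone
    for x in data:      # each x is a dict: variable name -> observed value
        b = tuple(x[p] for p in parents)
        joint[(b, x[child])] += 1
        margin[b] += 1
    return {(b, a): c / margin[b] for (b, a), c in joint.items()}

# Toy example: one parent, Flu -> Sinus
data = [{"Flu": 1, "Sinus": 1}, {"Flu": 1, "Sinus": 0},
        {"Flu": 0, "Sinus": 0}, {"Flu": 0, "Sinus": 0}]
print(mle_cpt(data, child="Sinus", parents=["Flu"]))
# {((1,), 1): 0.5, ((1,), 0): 0.5, ((0,), 0): 1.0}
```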
Plan for today • (Finish) BN Parameter Learning – Parameter Sharing – Plate notation • (Start) BN Structure Learning – Log-likelihood score – Decomposability – Information never hurts (C) Dhruv Batra 6
Meta BN • Explicitly showing parameters as variables • Example on board – One variable X; parameter θ X – Two variables X,Y; parameters θ X , θ Y|X (C) Dhruv Batra 7
Global parameter independence • Global parameter independence: – All CPT parameters are independent – Prior over parameters is the product of priors over the individual CPTs [Figure: Flu, Allergy, Sinus, Nose, Headache BN] • Proposition: For fully observable data D, if the prior satisfies global parameter independence, then so does the posterior: it factorizes as a product of posteriors over the individual CPTs
Parameter Sharing • What if X 1 , … , X n are n random variables for coin tosses of the same coin? (C) Dhruv Batra 9
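If all n tosses share one parameter θ_X, the MLE pools the counts, a standard consequence of parameter sharing (sketched here, not taken verbatim from the slide):
\[
\hat{\theta}_X \;=\; \frac{\sum_{i=1}^{n} \mathbb{1}[x_i = \text{heads}]}{n}
\]
rather than a separate θ_{X_i} estimated from a single toss each.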
Naïve Bayes vs Bag-of-Words • What’s the difference? • Parameter sharing! (C) Dhruv Batra 10
Text classification • Classify e-mails – Y = {Spam, NotSpam} • What about the features X? – X_i represents the i-th word in the document, i = 1 to doc-length – X_i takes values in a vocabulary (e.g., 10,000 words) (C) Dhruv Batra 11
Bag of Words • Position in document doesn’t matter : P(X i =x i |Y=y) = P(X k =x i |Y=y) – Order of words on the page ignored – Parameter sharing When the lecture is over, remember to wake up the person sitting next to you in the lecture room. (C) Dhruv Batra Slide Credit: Carlos Guestrin 12
Bag of Words • Position in document doesn’t matter : P(X i =x i |Y=y) = P(X k =x i |Y=y) – Order of words on the page ignored – Parameter sharing in is lecture lecture next over person remember room sitting the the the to to up wake when you (C) Dhruv Batra Slide Credit: Carlos Guestrin 13
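A minimal bag-of-words Naïve Bayes sketch that makes the parameter sharing explicit: every position in a document uses the same per-class word distribution P(word | Y). The class labels, smoothing constant, and function names are illustrative assumptions, not from the slides:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Bag-of-words Naive Bayes: one shared P(word | y) for all positions."""
    vocab = {w for d in docs for w in d}
    word_counts = defaultdict(Counter)   # y -> Counter over words
    class_counts = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    log_prior = {y: math.log(c / len(docs)) for y, c in class_counts.items()}
    log_lik = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {w: math.log((word_counts[y][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    # Words outside the training vocabulary are simply ignored in this sketch.
    scores = {y: log_prior[y] + sum(log_lik[y].get(w, 0.0) for w in doc)
              for y in log_prior}
    return max(scores, key=scores.get)

docs = [["win", "money", "now"], ["lecture", "is", "over"]]
labels = ["Spam", "NotSpam"]
print(predict(["money", "now"], *train_nb(docs, labels)))  # -> "Spam"
```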
HMMs semantics: Details • Hidden states X_1, …, X_5, each taking values in {a, …, z}; observations O_1, …, O_5 • Just 3 distributions: the initial distribution P(X_1), the transition distribution P(X_i | X_i-1), and the emission distribution P(O_i | X_i) (C) Dhruv Batra Slide Credit: Carlos Guestrin 14
N-grams • Learnt from Darwin's On the Origin of Species [Figure: unigram letter frequencies (e.g., space ≈ 0.161, 'e' ≈ 0.111, 't' ≈ 0.076) and a bigram letter-pair matrix over the alphabet {_, a, …, z}] (C) Dhruv Batra Image Credit: Kevin Murphy 15
Plate Notation • X 1 , … , X n are n random variables for coin tosses of the same coin • Plate denotes replication (C) Dhruv Batra 16
Plate Notation • Plates denote replication of random variables [Figure: Y with X_j inside a plate indexed j = 1, …, D] (C) Dhruv Batra 17
Hierarchical Bayesian Models • Why stop with a single prior? (C) Dhruv Batra 18
BN: Parameter Learning: What you need to know • Parameter Learning – MLE • Decomposes; results in counting procedure • Will shatter dataset if too many parents – Bayesian Estimation • Conjugate priors • Priors = regularization (also viewed as smoothing) • Hierarchical priors – Plate notation – Shared parameters (C) Dhruv Batra 19
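One way to see "priors = regularization / smoothing": under a symmetric Dirichlet(α) prior on each CPT row (a standard conjugate choice; the symbol α is introduced here for illustration), the posterior-mean estimate simply adds pseudo-counts to the MLE counts:
\[
\hat{P}_{\text{Bayes}}(X_i = a \mid \mathrm{Pa}_{X_i} = b) \;=\; \frac{\mathrm{Count}(X_i = a, \mathrm{Pa}_{X_i} = b) + \alpha}{\mathrm{Count}(\mathrm{Pa}_{X_i} = b) + \alpha\,|\mathrm{Val}(X_i)|}
\]
Here α = 1 gives Laplace smoothing, and α → 0 recovers the MLE.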
Learning Bayes nets
                          Known structure    Unknown structure
  Fully observable data   Very easy          Hard
  Missing data            Somewhat easy      Very very hard (EM)
[Figure: data x(1), …, x(m) → structure + parameters (CPTs P(X_i | Pa_X_i))]
(C) Dhruv Batra Slide Credit: Carlos Guestrin 20
Goals of Structure Learning • Prediction – Care about a good structure because presumably it will lead to good predictions • Discovery – I want to understand some system [Figure: data x(1), …, x(m) → structure + parameters (CPTs P(X_i | Pa_X_i))] (C) Dhruv Batra 21
Types of Errors • Truth: [Figure: true BN over Flu, Allergy, Sinus, Nose, Headache] • Recovered: [Figure: two candidate structures over the same variables] (C) Dhruv Batra 22
Learning the structure of a BN • Constraint-based approach – Test conditional independencies in data – Find an I-map • Score-based approach – Finding a structure and parameters is a density estimation task – Evaluate model as we evaluated parameters: Maximum likelihood, Bayesian, etc. [Figure: data <x_1(1), …, x_n(1)>, …, <x_1(m), …, x_n(m)> → learn structure and parameters of the Flu/Allergy/Sinus/Nose/Headache BN] (C) Dhruv Batra Slide Credit: Carlos Guestrin 23
Score-based approach • From data <x_1(1), …, x_n(1)>, …, <x_1(m), …, x_n(m)>: for each possible structure, learn parameters and score the structure [Figure: three candidate structures over Flu, Allergy, Sinus, Nose, Headache, scored −52, −60, −500] (C) Dhruv Batra Slide Credit: Carlos Guestrin 24
How many graphs? • N vertices. • How many (undirected) graphs? • How many (undirected) trees? (C) Dhruv Batra 25
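For reference, the standard counts (not shown in the extracted slide text): on N labeled vertices there are
\[
2^{\binom{N}{2}} \ \text{undirected graphs} \qquad\text{and}\qquad N^{\,N-2} \ \text{labeled trees (Cayley's formula).}
\]
Both counts are super-exponential, which motivates score-based search and, in particular, the tree-structured restriction where the optimum can still be found efficiently.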
What’s a good score? • Score(G) = log-likelihood(G : D, θ MLE ) (C) Dhruv Batra 26
Information-theoretic interpretation of Maximum Likelihood Score • Consider two node graph – Derived on board (C) Dhruv Batra 27
Information-theoretic interpretation of Maximum Likelihood Score • For a general graph G [Figure: Flu, Allergy, Sinus, Nose, Headache BN] (C) Dhruv Batra Slide Credit: Carlos Guestrin 28
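Since the slide's equation is an image, here is the standard decomposition it refers to (see KF 18.3), where M is the number of samples and Î, Ĥ denote empirical mutual information and entropy:
\[
\mathrm{score}_{\mathrm{ML}}(G : D) \;=\; M \sum_{i} \hat{I}\!\left(X_i;\, \mathrm{Pa}^{G}_{X_i}\right) \;-\; M \sum_{i} \hat{H}(X_i)
\]
Only the first term depends on the graph, so maximizing the score means choosing parents with high empirical mutual information with each node.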
Information-theoretic interpretation of Maximum Likelihood Score [Figure: Flu, Allergy, Sinus, Nose, Headache BN] • Implications: – Intuitive: higher mutual information → higher score – Decomposes over families in the BN (a node and its parents) – Same score for I-equivalent structures! – Information never hurts! (C) Dhruv Batra 29
Chow-Liu tree learning algorithm 1 • For each pair of variables X i ,X j – Compute empirical distribution: – Compute mutual information: • Define a graph – Nodes X 1 , … ,X n – Edge (i,j) gets weight (C) Dhruv Batra Slide Credit: Carlos Guestrin 30
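The quantities on this slide are images in the original; in the standard Chow-Liu setup they are the empirical pairwise distribution, its mutual information, and the resulting edge weight:
\[
\hat{P}(x_i, x_j) = \frac{\mathrm{Count}(x_i, x_j)}{M}, \qquad
\hat{I}(X_i; X_j) = \sum_{x_i, x_j} \hat{P}(x_i, x_j)\,\log \frac{\hat{P}(x_i, x_j)}{\hat{P}(x_i)\,\hat{P}(x_j)}, \qquad
w_{ij} = \hat{I}(X_i; X_j).
\]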
Chow-Liu tree learning algorithm 2 • Optimal tree BN – Compute maximum weight spanning tree – Directions in BN: pick any node as root, and direct edges away from root • breadth-first-search defines directions (C) Dhruv Batra Slide Credit: Carlos Guestrin 31
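Putting the two slides together, a minimal end-to-end sketch in Python, assuming discrete data in a NumPy array; the function names, the pure-Python Prim step, and the toy data are illustrative rather than taken from the lecture:

```python
import numpy as np
from collections import deque

def empirical_mi(x, y):
    """Empirical mutual information between two discrete columns."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu(data, root=0):
    """data: (M, n) array of discrete values. Returns directed tree edges."""
    n = data.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = empirical_mi(data[:, i], data[:, j])
    # Maximum-weight spanning tree via Prim's algorithm on the complete graph
    in_tree, undirected = {root}, []
    while len(in_tree) < n:
        i, j = max(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: W[e])
        undirected.append((i, j))
        in_tree.add(j)
    # Direct edges away from the root by breadth-first search
    adj = {v: [] for v in range(n)}
    for i, j in undirected:
        adj[i].append(j); adj[j].append(i)
    edges, seen, queue = [], {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                edges.append((u, v)); seen.add(v); queue.append(v)
    return edges  # list of (parent, child) pairs

data = np.array([[0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]])
print(chow_liu(data))  # e.g. [(0, 1), (0, 2)] on this toy data
```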
Can we extend Chow-Liu? • Tree augmented naïve Bayes (TAN) [Friedman et al. '97] – The naïve Bayes model over-counts evidence because correlations between features are not considered – Same as Chow-Liu, but score edges with: (C) Dhruv Batra Slide Credit: Carlos Guestrin 32
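In the standard TAN construction [Friedman et al. '97], the edge weight replacing the Chow-Liu score is the conditional mutual information given the class label (reconstructed here; the slide's formula is an image):
\[
\hat{I}(X_i; X_j \mid Y) \;=\; \sum_{x_i, x_j, y} \hat{P}(x_i, x_j, y)\,\log \frac{\hat{P}(x_i, x_j \mid y)}{\hat{P}(x_i \mid y)\,\hat{P}(x_j \mid y)}
\]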