  1. Statistical Learning (part II)
     October 28, 2008
     CS 486/686, University of Waterloo
     (Lecture slides © 2008 P. Poupart)

  2. Outline
     • Learning from incomplete data
       – EM algorithm
     • Reading: R&N Ch. 20.3

  3. Incomplete data
     • So far…
       – Values of all attributes are known
       – Learning is relatively easy
     • But many real-world problems have hidden variables (a.k.a. latent variables)
       – Incomplete data
       – Values of some attributes are missing

  4. Unsupervised Learning
     • Incomplete data → unsupervised learning
     • Examples:
       – Categorisation of stars by astronomers
       – Categorisation of species by anthropologists
       – Market segmentation for marketing
       – Pattern identification for fraud detection
       – Research in general!

  5. Maximum Likelihood Learning
     • ML learning of Bayes net parameters:
       – θ_{V=true, pa(V)=v} = Pr(V=true | pa(V)=v)
       – θ_{V=true, pa(V)=v} = #[V=true, pa(V)=v] / (#[V=true, pa(V)=v] + #[V=false, pa(V)=v])
       – Assumes all attributes have values…
     • What if the values of some attributes are missing?
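
As a concrete illustration of the counting estimate above, here is a minimal sketch (not from the lecture; the record format and function name are my own) that computes θ_{V=true, pa(V)=v} from fully observed records:

```python
# Minimal sketch: ML estimate of a Bayes net parameter by counting complete records.
# `records` is a list of dicts mapping variable names to boolean values.

def ml_estimate(records, child, parent_assignment):
    """Return #[child=true, pa(child)=v] / #[pa(child)=v]."""
    matching = [r for r in records
                if all(r[p] == v for p, v in parent_assignment.items())]
    n_true = sum(1 for r in matching if r[child])
    return n_true / len(matching)  # assumes at least one matching record

# Example: estimate P(HeartDisease=true | Smoking=true, Exercise=false)
# (variable names borrowed from the heart-disease example on slide 7)
records = [
    {"Smoking": True, "Exercise": False, "HeartDisease": True},
    {"Smoking": True, "Exercise": False, "HeartDisease": False},
    {"Smoking": False, "Exercise": True, "HeartDisease": False},
]
print(ml_estimate(records, "HeartDisease", {"Smoking": True, "Exercise": False}))  # 0.5
```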

  6. “Naive” solutions for incomplete data
     • Solution #1: Ignore records with missing values
       – But what if all records are missing values? (When a variable is hidden, none of the records have any value for it.)
     • Solution #2: Ignore hidden variables
       – The model may become significantly more complex!

  7. Heart disease example
     [Figure: two Bayes nets over Smoking, Diet, Exercise and Symptom 1–3.
      (a) includes a hidden HeartDisease node: the CPTs have 2, 2, 2, 54, 6, 6, 6 parameters (78 in total).
      (b) omits it: the CPTs have 2, 2, 2, 54, 162, 486 parameters (708 in total).]
     • (a) simpler (i.e., fewer CPT parameters)
     • (b) complex (i.e., lots of CPT parameters)

  8. “Direct” maximum likelihood
     • Solution #3: maximize the likelihood directly
       – Let Z be hidden and E observable
       – h_ML = argmax_h P(e|h)
              = argmax_h Σ_Z P(e, Z|h)
              = argmax_h Σ_Z Π_i CPT(V_i)
              = argmax_h log Σ_Z Π_i CPT(V_i)
       – Problem: can’t push the log past the sum to linearize the product

  9. Expectation-Maximization (EM)
     • Solution #4: EM algorithm
       – Intuition: if we knew the missing values, computing h_ML would be trivial
     • Guess h_ML
     • Iterate:
       – Expectation: based on h_ML, compute the expectation of the missing values
       – Maximization: based on the expected missing values, compute a new estimate of h_ML
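
The alternating loop on this slide can be written schematically as follows; this is only a sketch, and the function names (e_step, m_step) are placeholders rather than anything from the lecture:

```python
# Schematic EM loop: `e_step` and `m_step` stand in for the model-specific
# computations described on the slide.

def em(data, initial_guess, e_step, m_step, iterations=100):
    h = initial_guess                   # guess h_ML
    for _ in range(iterations):
        expected = e_step(data, h)      # E: expected values of the missing data, given h
        h = m_step(data, expected)      # M: re-estimate h_ML from the "completed" data
    return h
```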

  10. Expectation-Maximization (EM)
     • More formally:
       – Approximate maximum likelihood
       – Iteratively compute:
         h_{i+1} = argmax_h Σ_Z P(Z | h_i, e) log P(e, Z | h)
         (the sum weighted by P(Z | h_i, e) is the Expectation; the argmax over h is the Maximization)

  11. Expectation-Maximization (EM)
     • Derivation:
       – log P(e|h) = log [ P(e, Z|h) / P(Z|e, h) ]
                    = log P(e, Z|h) − log P(Z|e, h)
                    = Σ_Z P(Z|e, h) log P(e, Z|h) − Σ_Z P(Z|e, h) log P(Z|e, h)
                    ≥ Σ_Z P(Z|e, h) log P(e, Z|h)
         (the third line averages over Z with respect to P(Z|e, h), which leaves log P(e|h) unchanged;
          the inequality holds because −Σ_Z P(Z|e, h) log P(Z|e, h) ≥ 0)
     • EM finds a local maximum of Σ_Z P(Z|e, h) log P(e, Z|h), which is a lower bound on log P(e|h)

  12. Expectation-Maximization (EM)
     • The log inside the sum can linearize the product:
       – h_{i+1} = argmax_h Σ_Z P(Z | h_i, e) log P(e, Z | h)
                 = argmax_h Σ_Z P(Z | h_i, e) log Π_j CPT_j
                 = argmax_h Σ_Z P(Z | h_i, e) Σ_j log CPT_j
     • Monotonic improvement of the likelihood:
       – P(e | h_{i+1}) ≥ P(e | h_i)
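
The monotonic-improvement claim is stated without proof on the slide; for completeness, here is a compact version of the standard argument (my addition, written with the usual Q notation for the expected complete-data log-likelihood):

```latex
% Standard EM monotonicity argument (not on the slide).
\begin{align*}
Q(h \mid h_i) &= \textstyle\sum_Z P(Z \mid e, h_i) \log P(e, Z \mid h) \\
\log P(e \mid h) &= Q(h \mid h_i) - \textstyle\sum_Z P(Z \mid e, h_i) \log P(Z \mid e, h) \\
\log P(e \mid h_{i+1}) - \log P(e \mid h_i)
  &= \underbrace{Q(h_{i+1} \mid h_i) - Q(h_i \mid h_i)}_{\ge 0 \text{ since } h_{i+1} \text{ maximizes } Q(\cdot \mid h_i)}
   \; + \; \underbrace{\mathrm{KL}\big(P(Z \mid e, h_i) \,\|\, P(Z \mid e, h_{i+1})\big)}_{\ge 0}
   \;\ge\; 0
\end{align*}
```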

  13. Candy Example
     • Suppose you buy two bags of candies of unknown type (e.g., flavour ratios)
     • You plan to eat sufficiently many candies from each bag to learn its type
     • Ignoring your plan, your roommate mixes both bags…
     • How can you learn the type of each bag despite the mixing?

  14. Candy Example
     • The “Bag” variable is hidden

  15. Unsupervised Clustering
     • The “Class” variable is hidden
     • Naïve Bayes model
     [Figure: naïve Bayes networks — a hidden Bag node with prior P(Bag=1) and observable
      children Flavor, Wrapper, Holes (CPT entries such as P(F=cherry | Bag=i)),
      shown alongside the generic version with class C and feature X.]

  16. Candy Example
     • Unknown parameters:
       – θ_i = P(Bag=i)
       – θ_Fi = P(Flavour=cherry | Bag=i)
       – θ_Wi = P(Wrapper=red | Bag=i)
       – θ_Hi = P(Hole=yes | Bag=i)
     • When eating a candy:
       – F, W and H are observable
       – B is hidden

  17. Candy Example
     • Let the true parameters be:
       – θ = 0.5, θ_F1 = θ_W1 = θ_H1 = 0.8, θ_F2 = θ_W2 = θ_H2 = 0.3
     • After eating 1000 candies:

                       W=red           W=green
                     H=1    H=0      H=1    H=0
       F=cherry      273     93      104     90
       F=lime         79    100       94    167
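
For experimentation, a data set comparable to the table above can be generated from the true parameters with a small sketch like the following (not part of the lecture; the function name and record format are my own). The bag label is discarded so that it is genuinely hidden:

```python
import random

def sample_candies(n=1000, seed=0):
    """Sample candies from the true model: theta = 0.5,
    theta_F1 = theta_W1 = theta_H1 = 0.8, theta_F2 = theta_W2 = theta_H2 = 0.3."""
    rng = random.Random(seed)
    candies = []
    for _ in range(n):
        bag = 1 if rng.random() < 0.5 else 2
        p = 0.8 if bag == 1 else 0.3
        candies.append({
            "F": "cherry" if rng.random() < p else "lime",
            "W": "red" if rng.random() < p else "green",
            "H": 1 if rng.random() < p else 0,
        })  # note: the bag label is not stored -- it is the hidden variable
    return candies

candies = sample_candies()
```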

  18. Candy Example
     • EM algorithm
     • Guess h_0:
       – θ = 0.6, θ_F1 = θ_W1 = θ_H1 = 0.6, θ_F2 = θ_W2 = θ_H2 = 0.4
     • Alternate:
       – Expectation: expected # of candies in each bag
       – Maximization: new parameter estimates

  19. Candy Example
     • Expectation: expected # of candies in each bag
       – #[Bag=i] = Σ_j P(B=i | f_j, w_j, h_j)
       – Compute P(B=i | f_j, w_j, h_j) by variable elimination (or any other inference algorithm)
     • Example:
       – #[Bag=1] = 612
       – #[Bag=2] = 388
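
Because the candy model is a naïve Bayes net, the posterior P(B=1 | f, w, h) needed above can also be computed directly with Bayes' rule rather than full variable elimination. The sketch below is illustrative; the parameter-dictionary layout and function name are assumptions of mine:

```python
def posterior_bag1(candy, theta):
    """P(Bag=1 | f, w, h) for one candy.
    theta = {"pi": P(Bag=1),
             1: {"F": theta_F1, "W": theta_W1, "H": theta_H1},
             2: {"F": theta_F2, "W": theta_W2, "H": theta_H2}}"""
    def likelihood(bag):
        p = theta[bag]
        pf = p["F"] if candy["F"] == "cherry" else 1 - p["F"]
        pw = p["W"] if candy["W"] == "red" else 1 - p["W"]
        ph = p["H"] if candy["H"] == 1 else 1 - p["H"]
        return pf * pw * ph
    joint1 = theta["pi"] * likelihood(1)
    joint2 = (1 - theta["pi"]) * likelihood(2)
    return joint1 / (joint1 + joint2)

# Expected number of candies in bag 1:  #[Bag=1] = sum_j P(B=1 | f_j, w_j, h_j)
# expected_bag1 = sum(posterior_bag1(c, theta) for c in candies)
```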

  20. Candy Example
     • Maximization: relative frequency of each bag
       – θ_1 = 612 / 1000 = 0.612
       – θ_2 = 388 / 1000 = 0.388

  21. Candy Example
     • Expectation: expected # of cherry candies in each bag
       – #[B=i, F=cherry] = Σ_{j : f_j = cherry} P(B=i | f_j, w_j, h_j)
       – Compute P(B=i | f_j, w_j, h_j) by variable elimination (or any other inference algorithm)
     • Maximization:
       – θ_F1 = #[B=1, F=cherry] / #[B=1] = 0.668
       – θ_F2 = #[B=2, F=cherry] / #[B=2] = 0.389
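
Putting the last three slides together, one full EM iteration for the candy model might look like the sketch below (illustrative only; it reuses the hypothetical posterior_bag1 and parameter dictionary introduced earlier):

```python
def em_step(candies, theta):
    """One EM iteration for the candy model."""
    n = len(candies)
    w1 = [posterior_bag1(c, theta) for c in candies]   # E-step: P(B=1 | f_j, w_j, h_j)
    w2 = [1.0 - w for w in w1]
    n1, n2 = sum(w1), sum(w2)                          # expected #[Bag=1], #[Bag=2]

    def weighted_fraction(weights, total, predicate):
        # expected #[Bag=i, predicate holds] / expected #[Bag=i]
        return sum(w for w, c in zip(weights, candies) if predicate(c)) / total

    return {
        "pi": n1 / n,                                  # new theta_1 (theta_2 = 1 - theta_1)
        1: {"F": weighted_fraction(w1, n1, lambda c: c["F"] == "cherry"),
            "W": weighted_fraction(w1, n1, lambda c: c["W"] == "red"),
            "H": weighted_fraction(w1, n1, lambda c: c["H"] == 1)},
        2: {"F": weighted_fraction(w2, n2, lambda c: c["F"] == "cherry"),
            "W": weighted_fraction(w2, n2, lambda c: c["W"] == "red"),
            "H": weighted_fraction(w2, n2, lambda c: c["H"] == 1)},
    }

# Usage with the initial guess h_0 from slide 18:
# theta = {"pi": 0.6, 1: {"F": 0.6, "W": 0.6, "H": 0.6}, 2: {"F": 0.4, "W": 0.4, "H": 0.4}}
# for _ in range(50):
#     theta = em_step(candies, theta)
```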

  22. Candy Example
     [Figure: log-likelihood of the observed data vs. EM iteration number
      (y-axis roughly −2025 to −1975, x-axis 0 to 120 iterations).]

  23. Bayesian networks
     • EM algorithm for general Bayes nets
     • Expectation:
       – #[V_i=v_ij, Pa(V_i)=pa_ik] = expected frequency
     • Maximization:
       – θ_{vij, paik} = #[V_i=v_ij, Pa(V_i)=pa_ik] / #[Pa(V_i)=pa_ik]
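
A hedged sketch of what one such iteration could look like for a discrete Bayes net is given below. Every name and data structure here is an assumption for illustration (in particular, `infer` stands for a black-box inference routine such as variable elimination, which is not implemented here):

```python
from collections import defaultdict
from itertools import product

def em_step_bn(records, cpts, parents, domains, infer):
    """One EM iteration for a discrete Bayes net (sketch).
    cpts[V][(v, pa)] = current P(V=v | Pa(V)=pa);
    infer(record, cpts, family) must return a dict mapping each joint assignment of
    `family` (a tuple of values, child first) to its posterior probability given the
    observed variables in `record`."""
    expected = defaultdict(float)                      # expected counts #[V=v, Pa(V)=pa]
    for record in records:
        for V in cpts:
            family = [V] + parents[V]
            for assignment, prob in infer(record, cpts, family).items():
                expected[(V, assignment)] += prob      # E-step: accumulate expected frequencies

    new_cpts = {}
    for V in cpts:                                     # M-step: normalize expected counts
        new_cpts[V] = {}
        for pa in product(*(domains[P] for P in parents[V])):
            total = sum(expected[(V, (v,) + pa)] for v in domains[V])
            for v in domains[V]:
                new_cpts[V][(v, pa)] = expected[(V, (v,) + pa)] / total
    return new_cpts
```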

  24. Next Class
     • Ensemble Learning
     • Russell and Norvig, Sect. 18.4
