  1. CS480/680 Lecture 4: May 15, 2019, Statistical Learning. Readings [RN]: Sec. 20.1, 20.2; [M]: Sec. 2.2, 3.2. University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

  2. Statistical Learning • View: we have uncertain knowledge of the world • Idea: learning simply reduces this uncertainty

  3. Terminology • Probability distribution: – A specification of a probability for each event in our sample space – Probabilities must sum to 1 • Assume the world is described by two (or more) random variables – Joint probability distribution • Specification of probabilities for all combinations of events

  4. Joint distribution • Given two random variables X and Y: • Joint distribution: Pr(X = x ∧ Y = y) for all x, y • Marginalisation (sum-out rule): Pr(X = x) = Σ_y Pr(X = x ∧ Y = y) and Pr(Y = y) = Σ_x Pr(X = x ∧ Y = y)
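
A minimal Python sketch of the sum-out rule, using a small made-up joint distribution over two binary variables X and Y (the numbers are illustrative, not from the slides):

```python
# Joint distribution Pr(X = x, Y = y) stored as a dictionary; values sum to 1.
joint = {
    ("x0", "y0"): 0.3, ("x0", "y1"): 0.2,
    ("x1", "y0"): 0.1, ("x1", "y1"): 0.4,
}

def marginal_x(joint, x):
    """Sum-out rule: Pr(X = x) = sum over y of Pr(X = x, Y = y)."""
    return sum(p for (xv, yv), p in joint.items() if xv == x)

def marginal_y(joint, y):
    """Sum-out rule: Pr(Y = y) = sum over x of Pr(X = x, Y = y)."""
    return sum(p for (xv, yv), p in joint.items() if yv == y)

print(marginal_x(joint, "x0"))  # 0.5
print(marginal_y(joint, "y1"))  # 0.6
```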

  5. Example: Joint Distribution
                   sunny                ~sunny
               cold     ~cold       cold     ~cold
   headache    0.072    0.008       0.108    0.012
   ~headache   0.144    0.576       0.016    0.064
   P(headache ∧ sunny ∧ cold) = ?
   P(~headache ∧ sunny ∧ ~cold) = ?
   P(headache ∨ sunny) = ?
   P(headache) = ?  (by marginalization)
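
The blanks above are meant to be filled in class. A sketch of how they could be computed, assuming the column grouping shown in the reconstructed table (sunny spans the first two columns):

```python
# Joint distribution over (headache, sunny, cold), taken from the table above.
# The grouping of columns under sunny/~sunny is a reconstruction of the slide.
joint = {
    (True,  True,  True):  0.072, (True,  True,  False): 0.008,
    (True,  False, True):  0.108, (True,  False, False): 0.012,
    (False, True,  True):  0.144, (False, True,  False): 0.576,
    (False, False, True):  0.016, (False, False, False): 0.064,
}

def prob(pred):
    """Probability of any event, given as a predicate over (headache, sunny, cold)."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

print(prob(lambda h, s, c: h and s and c))            # P(headache ∧ sunny ∧ cold)
print(prob(lambda h, s, c: (not h) and s and not c))  # P(~headache ∧ sunny ∧ ~cold)
print(prob(lambda h, s, c: h or s))                   # P(headache ∨ sunny)
print(prob(lambda h, s, c: h))                        # P(headache), by marginalization
```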

  6. Conditional Probability • Pr(A|B): fraction of worlds in which B is true that also have A true • H = "Have headache", F = "Have flu" • Pr(H) = 1/10, Pr(F) = 1/40, Pr(H|F) = 1/2 • Headaches are rare and flu is rarer, but if you have the flu, then there is a 50-50 chance you will have a headache

  7. Conditional Probability • H = "Have headache", F = "Have flu" • Pr(H|F) = fraction of flu-inflicted worlds in which you have a headache = (# worlds with flu and headache) / (# worlds with flu) = (area of "H and F" region) / (area of "F" region) = Pr(H ∧ F) / Pr(F) • Pr(H) = 1/10, Pr(F) = 1/40, Pr(H|F) = 1/2

  8. Conditional Probability • Definition: Pr(A|B) = Pr(A ∧ B) / Pr(B) • Chain rule: Pr(A ∧ B) = Pr(A|B) Pr(B) • Memorize these!
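
A tiny numeric check of the chain rule, reusing the headache/flu numbers from the previous slides:

```python
p_f = 1 / 40         # Pr(F): probability of flu
p_h_given_f = 1 / 2  # Pr(H|F): probability of headache given flu

# Chain rule: Pr(H ∧ F) = Pr(H|F) * Pr(F)
p_h_and_f = p_h_given_f * p_f
print(p_h_and_f)         # 0.0125, i.e. 1/80

# The definition run in reverse recovers the conditional: Pr(H|F) = Pr(H ∧ F) / Pr(F)
print(p_h_and_f / p_f)   # 0.5
```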

  9. Inference • One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with the flu." Is your reasoning correct? • H = "Have headache", F = "Have flu" • Given: Pr(H) = 1/10, Pr(F) = 1/40, Pr(H|F) = 1/2 • Pr(F ∧ H) = ? Pr(F|H) = ?
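
A sketch of the correct inference, combining the chain rule with the definition of conditional probability (equivalently, Bayes rule on the next slide), using the numbers given on the slide:

```python
p_h = 1 / 10         # Pr(H)
p_f = 1 / 40         # Pr(F)
p_h_given_f = 1 / 2  # Pr(H|F)

# Pr(F ∧ H) = Pr(H|F) * Pr(F)
p_f_and_h = p_h_given_f * p_f

# Pr(F|H) = Pr(F ∧ H) / Pr(H)
p_f_given_h = p_f_and_h / p_h

print(p_f_and_h)     # 0.0125
print(p_f_given_h)   # 0.125, i.e. 1/8 -- not a 50-50 chance
```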

  10. Example: Joint Distribution
                   sunny                ~sunny
               cold     ~cold       cold     ~cold
   headache    0.072    0.008       0.108    0.012
   ~headache   0.144    0.576       0.016    0.064
   Pr(headache ∧ cold | sunny) = ?
   Pr(headache ∧ cold | ~sunny) = ?
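
These conditionals can be read off the same joint table: divide the joint probability of the event by the probability of the conditioning event. A sketch, reusing the table reconstruction above:

```python
# Same reconstructed joint distribution over (headache, sunny, cold) as before.
joint = {
    (True,  True,  True):  0.072, (True,  True,  False): 0.008,
    (True,  False, True):  0.108, (True,  False, False): 0.012,
    (False, True,  True):  0.144, (False, True,  False): 0.576,
    (False, False, True):  0.016, (False, False, False): 0.064,
}

def prob(pred):
    return sum(p for outcome, p in joint.items() if pred(*outcome))

def cond(pred, given):
    """Pr(pred | given) = Pr(pred ∧ given) / Pr(given)."""
    return prob(lambda *o: pred(*o) and given(*o)) / prob(given)

print(cond(lambda h, s, c: h and c, lambda h, s, c: s))      # Pr(headache ∧ cold | sunny)
print(cond(lambda h, s, c: h and c, lambda h, s, c: not s))  # Pr(headache ∧ cold | ~sunny)
```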

  11. Bayes Rule • Note: Pr(A|B) Pr(B) = Pr(A ∧ B) = Pr(B ∧ A) = Pr(B|A) Pr(A) • Bayes rule: Pr(B|A) = Pr(A|B) Pr(B) / Pr(A) • Memorize this!

  12. Using Bayes Rule for inference • Often we want to form a hypothesis about the world based on what we have observed • Bayes rule is vitally important when viewed as stating the belief given to a hypothesis H given evidence e: Pr(H|e) = Pr(e|H) Pr(H) / Pr(e), where Pr(H) is the prior probability, Pr(e|H) the likelihood, Pr(H|e) the posterior probability, and Pr(e) the normalizing constant

  13. Bayesian Learning • Prior: Pr(H) • Likelihood: Pr(d|H) • Evidence: d = <d1, d2, ..., dn> • Bayesian learning amounts to computing the posterior using Bayes' theorem: Pr(H|d) = k Pr(d|H) Pr(H), where k is a normalizing constant

  14. Bayesian Prediction • Suppose we want to make a prediction about an unknown quantity X • Pr(X|d) = Σ_i Pr(X|d, h_i) Pr(h_i|d) = Σ_i Pr(X|h_i) Pr(h_i|d) • Predictions are weighted averages of the predictions of the individual hypotheses • Hypotheses serve as "intermediaries" between raw data and prediction

  15. Candy Example • Favorite candy sold in two flavors: – Lime (ugh) – Cherry (yum) • Same wrapper for both flavors • Sold in bags with different ratios: – 100% cherry – 75% cherry + 25% lime – 50% cherry + 50% lime – 25% cherry + 75% lime – 100% lime

  16. Candy Example • You bought a bag of candy but don't know its flavor ratio • After eating k candies: – What's the flavor ratio of the bag? – What will be the flavor of the next candy?

  17. Statistical Learning • Hypothesis H: probabilistic theory of the world – h1: 100% cherry – h2: 75% cherry + 25% lime – h3: 50% cherry + 50% lime – h4: 25% cherry + 75% lime – h5: 100% lime • Examples E: evidence about the world – e1: 1st candy is cherry – e2: 2nd candy is lime – e3: 3rd candy is lime – ...

  18. Candy Example • Assume prior Pr(H) = <0.1, 0.2, 0.4, 0.2, 0.1> • Assume candies are i.i.d. (independently and identically distributed): Pr(d|h) = Π_i Pr(d_i|h) • Suppose the first 10 candies all taste lime: Pr(d|h5) = ? Pr(d|h3) = ? Pr(d|h1) = ?
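
A sketch of the i.i.d. likelihood computation for the five bag hypotheses, assuming the first 10 candies are all lime:

```python
# Probability of drawing a lime candy under each hypothesis h1..h5.
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]

# i.i.d. assumption: Pr(d|h) is the product of Pr(d_i|h) over the observed candies,
# which for 10 limes in a row is just Pr(lime|h) raised to the 10th power.
n_limes = 10
likelihood = [p ** n_limes for p in p_lime]
print(likelihood)  # [0.0, ~9.5e-07, ~9.8e-04, ~0.056, 1.0]
```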

  19. Posterior (plot of the posterior probabilities Pr(h_i|d) as more candies are observed)
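
A sketch of the posterior computation Pr(h_i|d) ∝ Pr(d|h_i) Pr(h_i), after observing k lime candies in a row (prior and per-hypothesis lime probabilities from the slides):

```python
prior = [0.1, 0.2, 0.4, 0.2, 0.1]     # Pr(h1), ..., Pr(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]  # Pr(lime | h_i)

def posterior(n_limes):
    """Pr(h_i | d) after n_limes lime candies, via Bayes rule with normalization."""
    unnorm = [pr * (p ** n_limes) for pr, p in zip(prior, p_lime)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

for k in range(11):
    print(k, [round(p, 3) for p in posterior(k)])
# h5 (100% lime) quickly dominates as more lime candies are observed.
```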

  20. Prediction (plot of the probability that the next candy is lime as more candies are observed)
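
A sketch of the Bayesian prediction Pr(next candy is lime | d) = Σ_i Pr(lime|h_i) Pr(h_i|d), reusing the posterior computation from the previous sketch:

```python
prior = [0.1, 0.2, 0.4, 0.2, 0.1]
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]

def posterior(n_limes):
    unnorm = [pr * (p ** n_limes) for pr, p in zip(prior, p_lime)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def predict_lime(n_limes):
    """Weighted average of each hypothesis's prediction for the next candy."""
    return sum(p * post for p, post in zip(p_lime, posterior(n_limes)))

for k in range(11):
    print(k, round(predict_lime(k), 3))
# The predicted probability of lime rises toward 1 as more limes are observed.
```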

  21. Bayesian Learning • Bayesian learning properties: – Optimal (i.e., given the prior, no other prediction is correct more often than the Bayesian one) – No overfitting (all hypotheses considered and weighted) • There is a price to pay: – When the hypothesis space is large, Bayesian learning may be intractable – i.e., the sum (or integral) over hypotheses is often intractable • Solution: approximate Bayesian learning

  22. Maximum a posteriori (MAP) • Idea: make predictions based on the most probable hypothesis h_MAP: h_MAP = argmax_h Pr(h|d), Pr(X|d) ≈ Pr(X|h_MAP) • In contrast, Bayesian learning makes predictions based on all hypotheses, weighted by their probability
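
A sketch of MAP prediction on the candy example: pick the single hypothesis with the highest posterior and predict with it alone (helper names are hypothetical; numbers as above):

```python
prior = [0.1, 0.2, 0.4, 0.2, 0.1]
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]

def posterior(n_limes):
    unnorm = [pr * (p ** n_limes) for pr, p in zip(prior, p_lime)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def map_predict_lime(n_limes):
    """Pr(next is lime | h_MAP), where h_MAP = argmax_h Pr(h|d)."""
    post = posterior(n_limes)
    h_map = max(range(len(post)), key=lambda i: post[i])
    return p_lime[h_map]

for k in range(5):
    print(k, map_predict_lime(k))
# The MAP prediction jumps in steps (0.5, 0.5, 0.75, 1.0, ...) as h_MAP changes,
# instead of averaging over hypotheses as full Bayesian prediction does.
```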

  23. MAP properties • MAP prediction less accurate than Bayesian prediction since it relies only on one hypothesis h_MAP • But MAP and Bayesian predictions converge as data increases • Controlled overfitting (prior can be used to penalize complex hypotheses) • Finding h_MAP may be intractable: – h_MAP = argmax_h Pr(h|d) – Optimization may be difficult

  24. Maximum Likelihood (ML) • Idea: simplify MAP by assuming a uniform prior (i.e., Pr(h_i) = Pr(h_j) for all i, j), so h_MAP = argmax_h Pr(h) Pr(d|h) becomes h_ML = argmax_h Pr(d|h) • Make predictions based on h_ML only: Pr(X|d) ≈ Pr(X|h_ML)
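
A sketch of ML on the same example: with a uniform prior the posterior is proportional to the likelihood, so h_ML is simply the argmax of Pr(d|h):

```python
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]  # Pr(lime | h_i)
n_limes = 10                          # evidence: 10 lime candies in a row

likelihood = [p ** n_limes for p in p_lime]
h_ml = max(range(len(likelihood)), key=lambda i: likelihood[i])

print("h_ML =", h_ml + 1)                         # h5 (100% lime)
print("Pr(next is lime | h_ML) =", p_lime[h_ml])  # 1.0
```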

  25. ML properties • ML prediction less accurate than Bayesian and MAP predictions since it ignores prior information and relies only on one hypothesis h_ML • But ML, MAP and Bayesian predictions converge as data increases • Subject to overfitting (no prior to penalize complex hypotheses that could exploit statistically insignificant data patterns) • Finding h_ML is often easier than h_MAP: h_ML = argmax_h Σ_i log Pr(d_i|h)
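
When the hypothesis is a continuous parameter θ = Pr(lime), maximizing the sum of log-likelihoods gives the familiar closed form θ_ML = (# limes) / (# candies). A short sketch under made-up counts (the grid search only illustrates the argmax):

```python
import math

limes, cherries = 7, 3   # made-up counts of observed candies

def log_likelihood(theta):
    """Sum_i log Pr(d_i | theta): limes have probability theta, cherries 1 - theta."""
    return limes * math.log(theta) + cherries * math.log(1 - theta)

# Crude grid search over theta in (0, 1) to locate the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
theta_ml = max(grid, key=log_likelihood)

print(theta_ml)                    # ~0.7
print(limes / (limes + cherries))  # closed form: 0.7
```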
