Lecture 7: Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) Estimation
Aykut Erdem, October 2016, Hacettepe University
Administrative
• Assignment 2 will be out on Thursday
− It is due November 10 (i.e., in 2 weeks)
− You will implement a Naive Bayes classifier for sentiment analysis on movie reviews
Administrative
• Project proposal due October 31
• A half-page description of:
− the problem to be investigated,
− why it is interesting,
− what data you will use,
− related work.
Today
• Probabilities: dependence, independence, conditional independence
• Parameter estimation:
− Maximum Likelihood Estimation (MLE)
− Maximum a Posteriori (MAP) estimation
Last time… Sample Space
Def: A sample space Ω is the set of all possible outcomes of a (conceptual or physical) random experiment. (Ω can be finite or infinite.)
Examples:
• Ω may be the set of all possible outcomes of a die roll: {1, 2, 3, 4, 5, 6}
• Pages of a book opened randomly: {1, …, 157}
• Real numbers for temperature, location, time, etc.
slide by Barnabás Póczos & Alex Smola
Last time… Events
We will ask the question: what is the probability of a particular event?
Def: An event A is a subset of the sample space Ω.
Examples: What is the probability that
− the book is open at an odd-numbered page?
− a die roll shows a number < 4?
− a random person's height X satisfies a < X < b?
Last time… Probability
Def: The probability P(A) that event (subset) A happens is a function that maps the event A onto the interval [0, 1]. P(A) is also called the probability measure of A.
Example: What is the probability that the number on the die is 2 or 4?
Outcomes in which A is true: {2, 4}; outcomes in which A is false: {1, 3, 5, 6}. If the sample space is drawn as a region, P(A) is the (normalized) volume of the area where A is true, so here P(A) = 2/6 = 1/3.
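To make the "fraction of outcomes" picture concrete, here is a small simulation sketch (not from the slides; the function name, seed, and trial count are our own choices) estimating P(A) for the event "the die shows 2 or 4":

```python
import random

def estimate_event_probability(event, n_trials=100_000, seed=0):
    """Estimate P(A) as the fraction of trials in which event A occurs."""
    rng = random.Random(seed)
    hits = sum(event(rng.randint(1, 6)) for _ in range(n_trials))
    return hits / n_trials

# Event A: the die shows 2 or 4 (true probability 2/6 = 1/3).
p_hat = estimate_event_probability(lambda outcome: outcome in {2, 4})
print(p_hat)
```

As the number of trials grows, the estimate concentrates around the true value 1/3.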
Last time… Kolmogorov Axioms
• P(A) ≥ 0 for every event A
• P(Ω) = 1
• If A1, A2, … are disjoint, then P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
Consequences:
• P(∅) = 0
• P(Aᶜ) = 1 − P(A)
• If A ⊆ B, then P(A) ≤ P(B)
Last time… Venn Diagram
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
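The inclusion–exclusion identity above can be checked exactly on a finite sample space; this is an illustrative sketch (the particular events A and B are our own choices):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space of one die roll
A = {2, 4}                   # two overlapping events
B = {4, 6}

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

lhs = prob(A | B)                           # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)       # P(A) + P(B) − P(A ∩ B)
assert lhs == rhs == Fraction(1, 2)
```

Using exact fractions avoids floating-point noise, so the identity holds with equality.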
Last time… Random Variables
Def: A real-valued random variable is a function of the outcome of a randomized experiment.
Examples (discrete random variables, Ω discrete):
• X(ω) = True if a randomly drawn person ω from our class Ω is female
• X(ω) = the hometown of a randomly drawn person ω from our class Ω
Last time… Discrete Distributions
• Bernoulli distribution Ber(p): P(X = 1) = p, P(X = 0) = 1 − p
• Binomial distribution Bin(n, p): suppose a coin with head probability p is tossed n times. The probability of getting k heads and n − k tails is
P(X = k) = C(n, k) p^k (1 − p)^(n − k)
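The binomial probability of k heads in n tosses can be computed directly from the counting formula; this is a minimal sketch (the function name is our own):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k heads in n tosses of a coin with P(heads) = p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Bernoulli is the n = 1 special case: Ber(p) = Bin(1, p).
assert binomial_pmf(1, 1, 0.7) == 0.7

# Probability of 2 heads in 3 tosses of a fair coin: 3 * (1/2)^3 = 0.375.
print(binomial_pmf(2, 3, 0.5))   # 0.375
```

Summing the pmf over k = 0, …, n gives 1, as any distribution must.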
Last time… Conditional Probability
P(X|Y) = fraction of worlds in which event X is true, given that event Y is true.
Joint probabilities (X = Headache, Y = Flu):
               Flu     No Flu
Headache       1/80    7/80
No Headache    1/80    71/80
For example, P(Headache | Flu) = (1/80) / (2/80) = 1/2.
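Conditioning on a joint table is just "restrict and renormalize". The sketch below assumes the reading of the slide's 2×2 table in which P(Headache, Flu) = 1/80, P(Headache, No Flu) = 7/80, P(No Headache, Flu) = 1/80, and P(No Headache, No Flu) = 71/80:

```python
from fractions import Fraction as F

# Joint distribution P(Headache, Flu); the four entries sum to 1.
joint = {
    ("headache", "flu"):       F(1, 80),
    ("headache", "no flu"):    F(7, 80),
    ("no headache", "flu"):    F(1, 80),
    ("no headache", "no flu"): F(71, 80),
}

def p_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (xi, yi), p in joint.items() if yi == y)
    return joint[(x, y)] / p_y

print(p_x_given_y("headache", "flu"))   # 1/2
```

With these numbers, P(Flu) = 2/80 = 1/40 and P(Headache) = 8/80 = 1/10, so knowing someone has the flu raises the headache probability from 1/10 to 1/2.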
Independence
Independent random variables X and Y don't contain information about each other: observing Y doesn't help predict X, and observing X doesn't help predict Y. Formally, P(X, Y) = P(X) P(Y).
Examples:
• Independent: winning at roulette this week and next week.
• Dependent: Russian roulette.
Dependent / Independent
[Scatter plots of samples (X, Y): independent X, Y vs. dependent X, Y]
Conditionally Independent
Conditionally independent: knowing Z makes X and Y independent.
Examples:
• Dependent: shoe size of children and reading skills.
• Conditionally independent: shoe size of children and reading skills, given age.
Storks deliver babies? A highly statistically significant correlation exists between stork populations and human birth rates across Europe.
Conditionally Independent
• London taxi drivers: a survey pointed out a positive and significant correlation between the number of accidents and wearing coats. It concluded that coats could hinder drivers' movements and cause accidents, and a new law was prepared to prohibit drivers from wearing coats while driving. Finally, another study pointed out that people wear coats when it rains…
Correlation ≠ Causation
The number of people who drowned by falling into a swimming pool correlates with the number of films Nicolas Cage appeared in (correlation: 0.666). http://www.tylervigen.com
Conditional Independence
Formally: X is conditionally independent of Y given Z if
P(X | Y, Z) = P(X | Z)
Equivalent to:
P(X, Y | Z) = P(X | Z) P(Y | Z)
Note: this does NOT mean Thunder is independent of Rain; but given Lightning, knowing Rain doesn't give more information about Thunder.
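A Thunder/Rain/Lightning-style joint distribution can be checked numerically for conditional independence. The numbers below are illustrative, not from the slides: we build a joint in which Thunder and Rain are independent given Lightning, then verify they are nevertheless dependent marginally:

```python
from itertools import product

# Hypothetical conditional probabilities (illustrative values).
p_L = {1: 0.1, 0: 0.9}            # P(Lightning = l)
p_T_given_L = {1: 0.9, 0: 0.05}   # P(Thunder = 1 | Lightning = l)
p_R_given_L = {1: 0.8, 0: 0.2}    # P(Rain = 1 | Lightning = l)

def bern(p, x):
    return p if x == 1 else 1 - p

# Joint built so that T ⟂ R | L by construction.
joint = {
    (t, r, l): p_L[l] * bern(p_T_given_L[l], t) * bern(p_R_given_L[l], r)
    for t, r, l in product([0, 1], repeat=3)
}

def marg(t=None, r=None, l=None):
    """Marginal/joint probability, summing over unspecified variables."""
    return sum(p for (ti, ri, li), p in joint.items()
               if (t is None or ti == t)
               and (r is None or ri == r)
               and (l is None or li == l))

# Conditional independence: P(T=1, R=1 | L=1) = P(T=1 | L=1) * P(R=1 | L=1).
lhs = marg(t=1, r=1, l=1) / marg(l=1)
rhs = (marg(t=1, l=1) / marg(l=1)) * (marg(r=1, l=1) / marg(l=1))
assert abs(lhs - rhs) < 1e-12

# ...but marginally, Thunder and Rain are NOT independent.
assert abs(marg(t=1, r=1) - marg(t=1) * marg(r=1)) > 1e-3
```

The common cause Lightning induces the marginal correlation; once it is observed, the correlation disappears.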
Conditional vs. Marginal Independence
• C calls A and B separately and tells them a number n ∈ {1, …, 10}.
• Due to noise on the phone, A and B each imperfectly (and independently) draw a conclusion about what the number was.
• A thinks the number was n_a and B thinks it was n_b.
• Are n_a and n_b marginally independent?
− No; we expect, e.g., P(n_a = 1 | n_b = 1) > P(n_a = 1).
• Are n_a and n_b conditionally independent given n?
− Yes: if we know the true number, the outcomes n_a and n_b are purely determined by the noise in each phone, e.g. P(n_a = 1 | n_b = 1, n = 2) = P(n_a = 1 | n = 2).
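The phone example can be simulated; the noise model below (mishear with probability 0.2, replacing the number by a uniform one) is our own assumption, chosen only to illustrate the two claims:

```python
import random

rng = random.Random(42)

def noisy(n):
    """With probability 0.2, mishear the number as a uniformly random one."""
    return rng.randint(1, 10) if rng.random() < 0.2 else n

samples = []
for _ in range(200_000):
    n = rng.randint(1, 10)                    # C picks a number
    samples.append((n, noisy(n), noisy(n)))   # A's and B's noisy versions

def prob(pred):
    return sum(pred(*s) for s in samples) / len(samples)

# Marginally, n_a and n_b are dependent: seeing n_b = 1 raises P(n_a = 1).
p_a1 = prob(lambda n, a, b: a == 1)
p_a1_given_b1 = (prob(lambda n, a, b: a == 1 and b == 1)
                 / prob(lambda n, a, b: b == 1))
assert p_a1_given_b1 > p_a1 + 0.1

# Given the true number n = 2, n_b = 1 carries no extra information:
# P(n_a = 1 | n_b = 1, n = 2) stays close to P(n_a = 1 | n = 2) = 0.02.
cond = (prob(lambda n, a, b: a == 1 and b == 1 and n == 2)
        / prob(lambda n, a, b: b == 1 and n == 2))
assert abs(cond - 0.02) < 0.05
```

Under this noise model P(n_a = 1) = 0.1 but P(n_a = 1 | n_b = 1) ≈ 0.68, while conditioning on n removes the dependence.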
Parameter Estimation: MLE, MAP
Estimating Probabilities
Flipping a Coin
I have a coin; if I flip it, what's the probability that it will land heads up?
Let us flip it a few times to estimate the probability. Say we observe 3 heads in 5 flips; the estimated probability ("frequency of heads") is 3/5.
Flipping a Coin
The estimated probability ("frequency of heads") is 3/5.
Questions:
(1) Why frequency of heads?
(2) How good is this estimate?
(3) Why is this a machine learning problem?
We are going to answer these questions.
Question (1): Why frequency of heads?
• The frequency of heads is exactly the maximum likelihood estimator for this problem.
• MLE has nice properties (clear interpretation, statistical guarantees, simplicity).
Maximum Likelihood Estimation
MLE for Bernoulli distribution
Data: D = (x_1, …, x_n), a sequence of coin flips, with P(Heads) = θ and P(Tails) = 1 − θ.
Flips are i.i.d.:
− independent events,
− identically distributed according to a Bernoulli distribution.
MLE: choose the θ that maximizes the probability of the observed data:
θ̂ = argmax_θ P(D | θ)
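For i.i.d. Bernoulli flips the maximizer has the closed form θ̂ = (number of heads) / (number of flips), which is exactly the "frequency of heads". A small sketch (function names are our own) verifies this against a brute-force search over the log-likelihood:

```python
import math

def bernoulli_log_likelihood(theta, flips):
    """log P(D | theta) for i.i.d. coin flips (1 = heads, 0 = tails)."""
    k, n = sum(flips), len(flips)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

flips = [1, 0, 1, 1, 0]            # 3 heads out of 5, as in the slides
mle = sum(flips) / len(flips)      # closed form: theta_hat = k / n
assert mle == 0.6

# Sanity check: no theta on a fine grid beats the closed-form estimate.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: bernoulli_log_likelihood(t, flips))
assert abs(best - mle) < 1e-3
```

Maximizing the log-likelihood rather than the likelihood itself gives the same argmax (log is monotone) while avoiding numerical underflow for long flip sequences.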