A Probabilistic View of Machine Learning (2/2)
CMSC 422
Marine Carpuat (marine@cs.umd.edu)
Some slides based on material by Tom Mitchell
What we know so far…
• Bayes rule
• A probabilistic view of machine learning
  – If we know the data generating distribution, we can define the Bayes optimal classifier
  – Under the iid assumption
• How to estimate a probability distribution from data
  – Maximum likelihood estimation
Today
• How to compute Maximum Likelihood Estimates
  – For Bernoulli and Categorical Distributions
• Naïve Bayes classifier
Maximum Likelihood Estimates
Given a data set D of iid flips, which contains α_1 ones and α_0 zeros:
  P_θ(D) = θ^{α_1} (1 − θ)^{α_0}
  θ_MLE = argmax_θ P_θ(D) = α_1 / (α_1 + α_0)
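A minimal numeric sketch of this estimate (the data and variable names are illustrative, not from the slides):

# Bernoulli MLE: theta_hat = (# ones) / (# ones + # zeros)
flips = [1, 0, 1, 1, 0, 1]            # hypothetical iid flips
alpha_1 = sum(flips)                  # number of ones
alpha_0 = len(flips) - alpha_1        # number of zeros
theta_mle = alpha_1 / (alpha_1 + alpha_0)
print(theta_mle)                      # 0.666...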
Maximum Likelihood Estimates
Given a data set D of iid rolls of a K-sided die, which contains x_k outcomes of side k for each k:
  ∀k, P(X = k) = θ_k   (Categorical Distribution)
  P_θ(D) = ∏_{k=1}^{K} θ_k^{x_k}
  θ_MLE = argmax_θ P_θ(D)
        = argmax_θ log P_θ(D)
        = argmax_θ Σ_{k=1}^{K} x_k log(θ_k)
Problem: this objective lacks constraints!
Maximum Likelihood Estimates
A constrained optimization problem (K-sided die: ∀k, P(X = k) = θ_k):
  θ_MLE = argmax_θ Σ_{k=1}^{K} x_k log(θ_k)   with   Σ_{k=1}^{K} θ_k = 1
How to solve it? Use Lagrange multipliers to turn it into an unconstrained objective (on board)
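A sketch of the standard Lagrange-multiplier argument, filling in the step done on the board:
  Lagrangian: L(θ, λ) = Σ_k x_k log(θ_k) + λ (1 − Σ_k θ_k)
  Setting ∂L/∂θ_k = x_k / θ_k − λ = 0 gives θ_k = x_k / λ
  The constraint Σ_k θ_k = 1 then forces λ = Σ_k x_k, hence θ_k = x_k / Σ_{k'} x_{k'}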
Maximum Likelihood Estimates
The parameters that maximize the likelihood of the data are given by:
  θ_k = x_k / Σ_{k'} x_{k'}
  (K-sided die: ∀k, P(X = k) = θ_k)
This is the relative frequency of rolls where side k comes up!
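A minimal sketch of this relative-frequency estimate for die rolls (illustrative data, not from the slides):

from collections import Counter

rolls = [1, 3, 3, 6, 2, 3, 1, 6]      # hypothetical iid rolls of a 6-sided die
counts = Counter(rolls)               # counts[k] = x_k, the number of rolls showing side k
total = sum(counts.values())
theta_mle = {k: x_k / total for k, x_k in counts.items()}
print(theta_mle)                      # e.g. theta_3 = 3/8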
Today
• How to compute Maximum Likelihood Estimates
  – For Bernoulli and Categorical Distributions
• Naïve Bayes classifier
Let’s learn a classifier by learning P(Y|X)
• Goal: learn a classifier P(Y|X)
• Prediction:
  – Given an example x
  – Predict ŷ = argmax_y P(Y = y | X = x)
Parameters for P(X,Y) vs. P(Y|X)
  Y = Wealth
  X = <Gender, Hours_worked>
• Joint probability distribution P(X,Y)
• Conditional probability distribution P(Y|X)
Parameters for P(X,Y) and P(Y|X)
• P(Y|X) requires estimating fewer parameters than P(X,Y)
• But that is still too many parameters in practice!
• So we need simplifying assumptions to make estimation more practical
Naïve Bayes Assumption
Naïve Bayes assumes
  P(X_1, X_2, …, X_d | Y) = ∏_{i=1}^{d} P(X_i | Y)
i.e., that X_i and X_j are conditionally independent given Y, for all i ≠ j
Conditional Independence
• Definition: X is conditionally independent of Y given Z if P(X|Y,Z) = P(X|Z)
• Recall that X is independent of Y if P(X|Y) = P(X)
Naïve Bayes classifier
  ŷ = argmax_y P(Y = y | X = x)
     = argmax_y P(Y = y) P(X = x | Y = y)
     = argmax_y P(Y = y) ∏_{i=1}^{d} P(X_i = x_i | Y = y)
Bayes rule + conditional independence assumption
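A sketch of this decision rule in code, assuming the parameters P(Y = y) and P(X_i = v | Y = y) have already been estimated (the data structures and names are illustrative):

import math

def predict(x, prior, likelihood):
    # prior[y] = P(Y = y); likelihood[y][i][v] = P(X_i = v | Y = y)
    # Returns argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y), computed in log space
    best_y, best_score = None, -math.inf
    for y, p_y in prior.items():
        score = math.log(p_y)
        for i, v in enumerate(x):
            score += math.log(likelihood[y][i][v])
        if score > best_score:
            best_y, best_score = y, score
    return best_y

Working in log space only avoids numerical underflow when d is large; it does not change the argmax.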
How many parameters do we need to learn?
• To describe P(Y)?
• To describe P(X = <X_1, X_2, …, X_d> | Y)?
  – Without conditional independence assumption?
  – With conditional independence assumption?
(Suppose all random variables are Boolean)
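For a quick sanity check with Boolean Y and X_i: P(Y) takes 1 parameter; P(X_1, …, X_d | Y) takes 2(2^d − 1) parameters without the conditional independence assumption, but only 2d with it.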
Training a Naïve Bayes classifier
Let’s assume discrete X_i and Y

TrainNaïveBayes(Data)
  for each value y_k of Y
    estimate π_k = P(Y = y_k) = (# examples for which Y = y_k) / (# examples)
  for each value x_ij of X_i
    estimate θ_ijk = P(X_i = x_ij | Y = y_k) = (# examples for which X_i = x_ij and Y = y_k) / (# examples for which Y = y_k)
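A minimal Python sketch of this counting procedure (plain maximum-likelihood estimates; the function and variable names are illustrative, and the output pairs with the predict sketch above):

from collections import Counter, defaultdict

def train_naive_bayes(examples):
    # examples: list of (x, y) pairs, where x is a tuple of discrete feature values
    y_counts = Counter(y for _, y in examples)
    n = len(examples)
    prior = {y: c / n for y, c in y_counts.items()}        # pi_k = P(Y = y_k)

    feature_counts = defaultdict(Counter)                   # (y, i) -> counts of values of X_i
    for x, y in examples:
        for i, v in enumerate(x):
            feature_counts[(y, i)][v] += 1

    likelihood = {y: {} for y in y_counts}                  # theta_ijk = P(X_i = x_ij | Y = y_k)
    for (y, i), counter in feature_counts.items():
        likelihood[y][i] = {v: c / y_counts[y] for v, c in counter.items()}
    return prior, likelihood

Note that a feature value never seen with class y_k gets no probability mass here, which is exactly the zero-estimate issue raised on the wrap-up slide.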
Naïve Bayes Wrap-up
• A simple classifier that performs well in practice
• Subtleties
  – Often the X_i are not really conditionally independent
  – What if the Maximum Likelihood estimate for P(X_i|Y) is zero?
What you should know
• The Naïve Bayes classifier
  – Conditional independence assumption
  – How to train it?
  – How to make predictions?
  – How does it relate to other classifiers we know? [HW]
• Fundamental Machine Learning concepts
  – iid assumption
  – Bayes optimal classifier