Machine Learning: Lecture 4
Chenhao Tan, University of Colorado Boulder
Slides adapted from Jordan Boyd-Graber and Chris Ketelsen
Logistics
• Piazza: https://piazza.com/colorado/fall2017/csci5622/
• Office hours
• HW1 due
• Final projects
• Feedback
Recap
• Supervised learning
• K-nearest neighbors
• Training/validation/test splits; overfitting/underfitting
Overview
• Generative vs. Discriminative models
• Naïve Bayes Classifier
  – Motivating Naïve Bayes Example
  – Naïve Bayes Definition
  – Estimating Probability Distributions
• Logistic regression
  – Logistic Regression Example
Probabilistic Models
• A hypothesis function h : X → Y.
• In this special case, we define h based on estimating a probabilistic model P(X, Y).
Probabilistic Classification
Input: training examples S_train = {(x_i, y_i)}_{i=1}^N with labels y_i ∈ {c_1, c_2, ..., c_J}
Goal: h : X → Y
• For each class c_j, estimate P(y = c_j | x, S_train)
• Assign to x the class with the highest probability:
$$\hat{y} = h(x) = \arg\max_{c} P(y = c \mid x, S_{\text{train}})$$
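A minimal sketch of this decision rule in Python (not from the slides), assuming a hypothetical posterior(c, x) function that returns the estimated P(y = c | x, S_train):

```python
# Minimal sketch of the decision rule, assuming `posterior(c, x)` is a
# hypothetical estimator of P(y = c | x, S_train) fit on the training set.
def classify(x, classes, posterior):
    """Assign x to the class with the highest estimated probability."""
    return max(classes, key=lambda c: posterior(c, x))
```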
Generative vs. Discriminative Models

Discriminative: model only the conditional probability p(y | x), excluding the data x.
• Example: logistic regression
  – Logistic: a special mathematical function it uses
  – Regression: combines a weight vector with observations to create an answer
  – A general cookbook for building conditional probability distributions

Generative: model the joint probability p(x, y), including the data x.
• Example: Naïve Bayes
  – Uses Bayes' rule to reverse the conditioning: p(x | y) → p(y | x)
  – Naïve because it ignores joint probabilities within the data distribution
A Classification Problem
• Suppose that I have two coins, C1 and C2.
• Now suppose I pull a coin out of my pocket, flip it a bunch of times, record the coin and outcomes, and repeat many times:
  C1: 0 1 1 1 1
  C1: 1 1 0
  C2: 1 0 0 0 0 0 0 1
  C1: 0 1
  C1: 1 1 0 1 1 1
  C2: 0 0 1 1 0 1
  C2: 1 0 0 0
• Now suppose I am given a new sequence, 0 0 1; which coin is it from?
A Classification Problem
This problem has particular challenges:
• different numbers of covariates for each observation
• the number of covariates can be large
However, there is some structure:
• Easy to get P(C1) = 4/7 and P(C2) = 3/7 (four of the seven recorded sequences come from C1)
• Also easy to get P(X_i = 1 | C1) = 12/16 and P(X_i = 1 | C2) = 6/18 (12 of the 16 flips recorded for C1 are heads, and 6 of the 18 for C2)
• By conditional independence, P(X = 0 0 1 | C1) = P(X_1 = 0 | C1) P(X_2 = 0 | C1) P(X_3 = 1 | C1)
• Can we use these to get P(C1 | X = 0 0 1)?
A Classification Problem
Summary: we have P(data | class), but we want P(class | data).
Solution: Bayes' rule!
$$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class})\, P(\text{class})}{P(\text{data})} = \frac{P(\text{data} \mid \text{class})\, P(\text{class})}{\sum_{\text{class}'=1}^{C} P(\text{data} \mid \text{class}')\, P(\text{class}')}$$
To compute this, we need to estimate P(data | class) and P(class) for all classes.
A Classification Problem
Putting the pieces together:
• P(C1) = 4/7, P(C2) = 3/7
• P(X_i = 1 | C1) = 12/16, P(X_i = 1 | C2) = 6/18
• By conditional independence, P(X = 0 0 1 | C1) = P(X_1 = 0 | C1) P(X_2 = 0 | C1) P(X_3 = 1 | C1)
$$P(C_1 \mid X = 0\,0\,1) = \frac{4/7 \times 4/16 \times 4/16 \times 12/16}{4/7 \times 4/16 \times 4/16 \times 12/16 + 3/7 \times 12/18 \times 12/18 \times 6/18} \approx 0.30$$
so the new sequence is more likely to have come from C2.
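As a sanity check on this arithmetic, here is a small Python sketch (not part of the original slides) that recovers the estimates from the recorded flips and applies Bayes' rule to the new sequence:

```python
import math

# Estimate P(coin) and P(heads | coin) from the recorded flips, then
# apply Bayes' rule to the new sequence 0 0 1.
flips = {
    "C1": ["01111", "110", "01", "110111"],
    "C2": ["10000001", "001101", "1000"],
}

# Priors: fraction of recorded sequences per coin (4/7 and 3/7).
n_seqs = sum(len(seqs) for seqs in flips.values())
prior = {c: len(seqs) / n_seqs for c, seqs in flips.items()}

# Heads probability per coin: fraction of 1s among its flips (12/16, 6/18).
p_heads = {
    c: sum(s.count("1") for s in seqs) / sum(len(s) for s in seqs)
    for c, seqs in flips.items()
}

def posterior(new_seq):
    """P(coin | new_seq) via Bayes' rule, assuming flips are independent."""
    joint = {
        c: prior[c]
        * math.prod(p_heads[c] if o == "1" else 1 - p_heads[c] for o in new_seq)
        for c in flips
    }
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

print(posterior("001"))  # {'C1': ~0.297, 'C2': ~0.703}: C2 is more likely
```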
The Naïve Bayes classifier
• The Naïve Bayes classifier is a probabilistic classifier.
• We compute the probability of a document d being in a class c as follows:
$$P(c \mid d) \propto P(c, d) = P(c) \prod_{1 \le i \le n_d} P(w_i \mid c)$$
• n_d is the length of the document (number of tokens)
• P(w_i | c) is the conditional probability of term w_i occurring in a document of class c
• P(w_i | c) can be read as a measure of how much evidence w_i contributes that c is the correct class
• P(c) is the prior probability of c
• If a document's terms do not provide clear evidence for one class vs. another, we choose the c with the higher P(c)
Maximum a posteriori class
• Our goal is to find the "best" class.
• The best class in Naïve Bayes classification is the most likely, or maximum a posteriori (MAP), class c_MAP:
$$c_{\text{MAP}} = \arg\max_{c_j \in C} \hat{P}(c_j \mid d) = \arg\max_{c_j \in C} \hat{P}(c_j) \prod_{1 \le i \le n_d} \hat{P}(w_i \mid c_j)$$
• We write P̂ for P since these values are estimates from the training set.
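A minimal sketch of the MAP rule in Python, assuming the estimates P̂(c_j) and P̂(w_i | c_j) have already been computed (how to estimate them is the topic of the next subsection). Summing log probabilities, an implementation detail not on the slides, avoids floating-point underflow on long documents:

```python
# Minimal sketch of the MAP rule, assuming `log_prior[c]` holds log P̂(c)
# and `log_cond[c][w]` holds log P̂(w | c) for every token w (i.e., the
# estimates are already smoothed so no token has zero probability).
def nb_map_class(doc_tokens, log_prior, log_cond):
    """Return the maximum a posteriori class for a tokenized document."""
    scores = {
        c: log_prior[c] + sum(log_cond[c][w] for w in doc_tokens)
        for c in log_prior
    }
    return max(scores, key=scores.get)
```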
Naïve Bayes Classifier: More examples
This works because the coin flips are independent given the coin parameter. What about this case?
• We want to identify the type of fruit given a set of features: color, shape, and size
  – color: red, green, yellow, or orange (discrete)
  – shape: round, oval, or long+skinny (discrete)
  – size: diameter in inches (continuous)
Naïve Bayes Classifier: More examples
Conditioned on the type of fruit, these features are not necessarily independent. Given the category "apple," the color "green" has a higher probability when the size is small:
P(green | size < 2, apple) > P(green | apple)
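Naïve Bayes would nonetheless model each feature independently given the class. Below is a sketch of that factorization with entirely made-up parameters: categorical distributions for color and shape, and a Gaussian for the continuous size feature (a common choice, but an assumption here, not something specified on the slides):

```python
import math

# Hypothetical per-class estimates for the fruit example. Naive Bayes
# scores a class as if the three features were independent given the
# fruit type, even though (as noted above) they may not be.
params = {
    "apple": {
        "prior": 0.6,
        "color": {"red": 0.5, "green": 0.4, "yellow": 0.1},
        "shape": {"round": 0.9, "oval": 0.1},
        "size_mean": 3.0, "size_std": 0.5,
    },
    "banana": {
        "prior": 0.4,
        "color": {"yellow": 0.8, "green": 0.2},
        "shape": {"long+skinny": 1.0},
        "size_mean": 1.5, "size_std": 0.3,
    },
}

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used for the continuous feature."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def score(color, shape, size, p):
    """Unnormalized P(class) * P(features | class) under independence."""
    return (p["prior"]
            * p["color"].get(color, 0.0)
            * p["shape"].get(shape, 0.0)
            * gaussian_pdf(size, p["size_mean"], p["size_std"]))

best = max(params, key=lambda c: score("green", "round", 2.5, params[c]))
print(best)  # "apple" under these made-up parameters
```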