Introduction to Machine Learning: Classification and the Noisy Channel Model
CMSC 473/673, UMBC
Some slides adapted from 3SLP
Outline
Classification
Why incorporate uncertainty
Classification with Bayes Rule
Example: Email Classifier
Evaluation
Probabilistic Classification
Directly model the posterior, $p(Y \mid X) = h(X; \theta)$: a discriminatively trained classifier.
Model the posterior with Bayes rule, $p(Y \mid X) \propto p(X \mid Y)\, p(Y)$: a generatively trained classifier.
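A minimal sketch of the discriminative route, assuming $h(X; \theta)$ is a softmax over linear feature scores (multinomial logistic regression). The slide does not say what $h$ is; the names h, theta, and the feature weights are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued class scores into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def h(x, theta):
    """Hypothetical discriminative model p(Y | X) = h(X; theta).

    x: dict mapping feature name -> feature value
    theta: dict mapping class -> dict of feature weights
    """
    scores = {
        y: sum(w.get(f, 0.0) * v for f, v in x.items())
        for y, w in theta.items()
    }
    return softmax(scores)

# Toy weights, invented for illustration only.
theta = {
    "SPAM": {"dollars": 2.0, "selected": 1.5},
    "NOT_SPAM": {"meeting": 1.0},
}
x = {"dollars": 1, "selected": 1}
posterior = h(x, theta)
print(posterior)
```

The posterior comes straight out of $h$; no model of $p(X \mid Y)$ is ever built, which is what separates this from the generative (Bayes rule) route.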
Classification
Classes: POLITICS, TERRORISM, SPORTS, TECH, HEALTH, FINANCE
Document: "Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region. …"
Classification
Classes: POLITICS, TERRORISM, SPORTS, TECH, HEALTH, FINANCE
Document: "Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport. …"
Source: http://www.nytimes.com/2016/09/20/nyregion/cellphone-alerts-used-in-search-of-manhattan-bombing-suspect.html
Classify with Uncertainty
Use probabilities*
*There are non-probabilistic ways to handle uncertainty… but probabilities sure are handy!
Classification
Document: "Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport. …"
POLITICS .05
TERRORISM .48
SPORTS .0001
TECH .39
HEALTH .0001
FINANCE .0002
Source: http://www.nytimes.com/2016/09/20/nyregion/cellphone-alerts-used-in-search-of-manhattan-bombing-suspect.html
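A short sketch of how these scores might be used: predict the single most probable class, but keep the whole distribution so that a close runner-up (TECH here) is not lost. The probabilities are copied from the slide as shown; the function names predict and top_k are hypothetical.

```python
# Class probabilities as shown on the slide for the NYT excerpt.
probs = {
    "POLITICS": 0.05,
    "TERRORISM": 0.48,
    "SPORTS": 0.0001,
    "TECH": 0.39,
    "HEALTH": 0.0001,
    "FINANCE": 0.0002,
}

def predict(dist):
    """Return the most probable class along with its probability."""
    best = max(dist, key=dist.get)
    return best, dist[best]

def top_k(dist, k):
    """Return the k most probable classes, most probable first."""
    return sorted(dist, key=dist.get, reverse=True)[:k]

label, p = predict(probs)
print(label, p)          # TERRORISM 0.48
print(top_k(probs, 2))   # ['TERRORISM', 'TECH']
```

A hard label alone would report only TERRORISM; the distribution also tells us the classifier found TECH nearly as plausible.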
Text Classification
Tasks: assigning subject categories, topics, or genres; spam detection; authorship identification; age/gender identification; language identification; sentiment analysis; …
Input: a document (a "linguistic blob") and a fixed set of classes $C = \{c_1, c_2, \ldots, c_J\}$
Output: a predicted class $c \in C$
Text Classification: Hand-coded Rules?
Rules are based on combinations of words or other features, e.g.
spam: black-list-address OR ("dollars" AND "have been selected")
Accuracy can be high if the rules are carefully refined by an expert.
Building and maintaining these rules is expensive.
Can humans faithfully assign uncertainty?
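A minimal sketch of the spam rule on this slide written as code, assuming a plain substring/word match on the lowercased body. The blacklist contents and the function name is_spam are hypothetical.

```python
import re

# Hypothetical blacklist; a real one would be curated and maintained by hand.
BLACKLIST = {"winner@example.com"}

def is_spam(sender, body):
    """Hand-coded rule:
    spam if the sender is blacklisted OR
    (the body contains "dollars" AND "have been selected").
    """
    text = body.lower()
    has_dollars = re.search(r"\bdollars\b", text) is not None
    has_selected = "have been selected" in text
    return sender.lower() in BLACKLIST or (has_dollars and has_selected)

print(is_spam("friend@example.org", "You have been selected to receive 1,000,000 dollars!"))  # True
print(is_spam("friend@example.org", "Lunch at noon?"))  # False
```

The rule returns only True or False: it has no way to say "probably spam", and every new spam pattern means another hand-written clause.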
Text Classification: Supervised Machine Learning
Input: a document $d$; a fixed set of classes $C = \{c_1, c_2, \ldots, c_J\}$; a training set of $m$ hand-labeled documents $(d_1, c_1), \ldots, (d_m, c_m)$
Output: a learned classifier $\gamma$ that maps documents to classes
Example methods: Naïve Bayes, logistic regression, support-vector machines, k-nearest neighbors, …
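As one concrete instance of learning $\gamma$ from labeled pairs $(d_i, c_i)$, here is a minimal k-nearest-neighbors sketch. It assumes whitespace tokenization and cosine similarity over word counts; train_knn, gamma, and the toy training set are hypothetical.

```python
from collections import Counter
import math

def bag_of_words(doc):
    """Lowercased whitespace tokenization into term counts (a simplifying assumption)."""
    return Counter(doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train_knn(training_set, k=3):
    """Learn gamma from hand-labeled pairs (d_i, c_i).

    For k-NN, "training" just stores the labeled documents.
    """
    examples = [(bag_of_words(d), c) for d, c in training_set]

    def gamma(d):
        x = bag_of_words(d)
        # Rank stored documents by similarity and let the top k vote.
        neighbors = sorted(examples, key=lambda ex: cosine(x, ex[0]), reverse=True)[:k]
        votes = Counter(c for _, c in neighbors)
        return votes.most_common(1)[0][0]

    return gamma

training = [
    ("the team won the game", "SPORTS"),
    ("the striker scored a late goal", "SPORTS"),
    ("stocks fell as markets closed", "FINANCE"),
    ("the bank raised interest rates", "FINANCE"),
]
gamma = train_knn(training, k=1)
print(gamma("the team scored a goal"))  # SPORTS
print(gamma("markets fell"))            # FINANCE
```

k-NN learns almost nothing up front and does its work at prediction time; the other listed methods instead fit parameters during training.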
Multi-class Classification
Given input $x$, predict a discrete label $y$ (single output).
If $y \in \{0, 1\}$ (or $y \in \{\text{True}, \text{False}\}$), then it is a binary classification task.
If $y \in \{0, 1, \ldots, K-1\}$ (for finite $K$), then it is a multi-class classification task.
Q: What are some examples of multi-class classification?
Multi-label Classification
Given input $x$, predict multiple discrete labels $y = (y_1, \ldots, y_L)$ (multi-output).
If multiple $y_i$ are predicted, then it is a multi-label classification task.
Each $y_i$ could be binary or multi-class.
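A small sketch contrasting the decision step in each setting, assuming a model has already produced scores. Multi-class picks exactly one label; multi-label decides each binary $y_i$ independently (here by a hypothetical 0.5 threshold on an assumed estimate of $p(y_i = 1 \mid x)$); a multi-output task with multi-class $y_i$ takes one argmax per label. All function names and numbers are hypothetical.

```python
def predict_multiclass(scores):
    """Single output: pick exactly one label y from {0, ..., K-1}."""
    return max(range(len(scores)), key=lambda k: scores[k])

def predict_multilabel(probs, threshold=0.5):
    """Multi-output with binary y_i: decide each label independently.

    probs[i] is an assumed estimate of p(y_i = 1 | x); threshold is a hypothetical choice.
    """
    return tuple(int(p >= threshold) for p in probs)

def predict_multioutput(score_lists):
    """Multi-output where each y_i is itself multi-class: one argmax per label."""
    return tuple(predict_multiclass(s) for s in score_lists)

print(predict_multiclass([0.1, 0.7, 0.2]))                   # 1
print(predict_multilabel([0.9, 0.2, 0.6]))                   # (1, 0, 1)
print(predict_multioutput([[0.2, 0.8], [0.5, 0.3, 0.2]]))    # (1, 0)
```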
Probabilistic Text Classification
$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$
$Y$: the class; $X$: the observed data
$p(Y)$: prior probability of the class
$p(X \mid Y)$: class-based likelihood (a language model)
$p(X)$: observation likelihood, averaged over all classes: $p(X) = \sum_{y} p(X \mid y)\, p(y)$
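A worked sketch of Bayes rule for a single observed email, computing $p(X)$ as the prior-weighted sum of the class likelihoods. The two classes and all numbers are invented for illustration; posterior is a hypothetical helper name.

```python
def posterior(prior, likelihood):
    """Bayes rule: p(Y | X) = p(X | Y) * p(Y) / p(X),
    where p(X) = sum over classes y of p(X | y) * p(y).
    """
    joint = {y: likelihood[y] * prior[y] for y in prior}
    evidence = sum(joint.values())  # p(X): observation likelihood averaged over classes
    return {y: j / evidence for y, j in joint.items()}

# Invented numbers for a two-class spam example.
prior = {"SPAM": 0.2, "HAM": 0.8}          # p(Y)
likelihood = {"SPAM": 0.01, "HAM": 0.001}  # p(X | Y) for one observed email X
post = posterior(prior, likelihood)
print({y: round(p, 3) for y, p in post.items()})  # {'SPAM': 0.714, 'HAM': 0.286}
```

Even though SPAM has the smaller prior, its likelihood is ten times larger, so it ends up with the larger posterior.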
Classification with Bayes Rule
$\arg\max_{Y} p(Y \mid X)$
$= \arg\max_{Y} \frac{p(X \mid Y)\, p(Y)}{p(X)}$ (Bayes rule)
$= \arg\max_{Y} p(X \mid Y)\, p(Y)$ ($p(X)$ is constant with respect to $Y$, so dropping it does not change the argmax)
$= \arg\max_{Y} \left[\log p(X \mid Y) + \log p(Y)\right]$ (log is monotonically increasing, so the argmax is unchanged)
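A minimal sketch of the final line, $\arg\max_Y [\log p(X \mid Y) + \log p(Y)]$, assuming the class-based likelihood is a unigram language model with add-one smoothing (i.e., a naive Bayes classifier) and that words unseen in training are skipped. These modeling choices, the function names train and classify, and the toy emails are assumptions for illustration. Summing logs avoids the underflow that multiplying many small probabilities would cause.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label).

    Estimates log p(Y) and an add-one-smoothed unigram language model
    log p(word | Y) for each class (a naive Bayes assumption).
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {y: math.log(n / len(docs)) for y, n in label_counts.items()}
    log_lik = {}
    for y, counts in word_counts.items():
        total = sum(counts.values())
        # Add-one smoothing over the training vocabulary.
        log_lik[y] = {w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab}
    return log_prior, log_lik

def classify(text, log_prior, log_lik):
    """argmax_Y [log p(X | Y) + log p(Y)]; words unseen in training are skipped."""
    def score(y):
        words = text.lower().split()
        return log_prior[y] + sum(log_lik[y][w] for w in words if w in log_lik[y])
    return max(log_prior, key=score)

docs = [
    ("win dollars now", "SPAM"),
    ("you have been selected to win dollars", "SPAM"),
    ("meeting moved to noon", "HAM"),
    ("notes from the meeting", "HAM"),
]
log_prior, log_lik = train(docs)
print(classify("win dollars", log_prior, log_lik))      # SPAM
print(classify("meeting at noon", log_prior, log_lik))  # HAM
```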