Classification Example: Spam Filter Input: an email Dear - PowerPoint PPT Presentation

CS 232: Artificial Intelligence
Naïve Bayes
Oct 26, 2015

Machine Learning
§ Part 1 of course: how to use a model to make optimal decisions (state space, MDPs)
§ Machine learning: how to acquire a model from data / experience
  § Learning parameters (e.g. probabilities)
  § Learning structure (e.g. Bayesian Nets graphs)
  § Learning hidden concepts (e.g. clustering)
§ Today: model-based classification with Naive Bayes
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Classification Example: Spam Filter
§ Input: an email
    Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …
    TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99
    Ok, Iknow this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.
§ Output: spam/ham
§ Setup:
  § Get a large collection of example emails, each labeled “spam” or “ham”
  § Note: someone has to hand label all this data!
  § Want to learn to predict labels of new, future emails
§ Features: The attributes used to make the ham / spam decision
  § Words: FREE!
  § Text Patterns: $dd, CAPS
  § Non-text: SenderInContacts
  § …
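The three kinds of features listed for the spam filter (words, text patterns, non-text) can be illustrated with a minimal Python sketch. The function name `extract_features`, the feature names, and the exact regular expressions are assumptions made for illustration; the slides only name the feature categories.

```python
import re

def extract_features(body, sender, contacts):
    """Map one email to a dict of binary features (all names are illustrative)."""
    words = re.findall(r"[A-Za-z']+", body)
    return {
        # Word feature: does the token "free" appear (case-insensitive)?
        "contains_free": any(w.lower() == "free" for w in words),
        # Text pattern "$dd": a dollar sign followed by two digits.
        "has_dollar_amount": re.search(r"\$\d\d", body) is not None,
        # Text pattern CAPS: at least one all-uppercase word of 3+ letters.
        "has_caps_word": any(len(w) >= 3 and w.isupper() for w in words),
        # Non-text feature: is the sender already in the user's contacts?
        "sender_in_contacts": sender in contacts,
    }

features = extract_features(
    "99 MILLION EMAIL ADDRESSES FOR ONLY $99",
    "stranger@example.com",
    {"friend@example.com"},
)
```

Each email becomes a small feature vector of this kind, and the classifier only ever sees these features, not the raw text.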

Example: Digit Recognition
§ Input: images / pixel grids
§ Output: a digit 0-9
§ Setup:
  § Get a large collection of example images, each labeled with a digit
  § Note: someone has to hand label all this data!
  § Want to learn to predict labels of new, future digit images
§ Features: The attributes used to make the digit decision
  § Pixels: (6,8)=ON
  § Shape Patterns: NumComponents, AspectRatio, NumLoops
  § …
[Figure: example digit images labeled 0, 1, 2, 1, and one unlabeled image marked ??]

Other Classification Tasks
§ Classification: given inputs x, predict labels (classes) y
§ Examples:
  § Spam detection (input: document, classes: spam / ham)
  § OCR (input: images, classes: characters)
  § Medical diagnosis (input: symptoms, classes: diseases)
  § Automatic essay grading (input: document, classes: grades)
  § Fraud detection (input: account activity, classes: fraud / no fraud)
  § Customer service email routing
  § … many more
§ Classification is an important commercial technology!

Model-Based Classification
§ Model-based approach
  § Build a model (e.g. a Bayesian network, BN) where both the label and features are random variables
  § Instantiate any observed features
  § Query for the distribution of the label conditioned on the features
§ Challenges
  § What structure should the BN have?
  § How should we learn its parameters?
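The "instantiate, then query" recipe of the model-based approach can be sketched on the simplest possible model: a full joint table over the label and the features. This is a simplification chosen for illustration, not a Bayesian network with learned structure; the function name `posterior_from_joint` and the probabilities in `toy_joint` are made up.

```python
from collections import defaultdict

def posterior_from_joint(joint, observed):
    """Return P(label | observed) from a full joint table.

    joint: dict mapping (label, feature_tuple) -> probability (toy model)
    observed: the feature tuple that was actually seen
    """
    # Instantiate the observed features: keep only matching joint entries.
    scores = defaultdict(float)
    for (label, features), p in joint.items():
        if features == observed:
            scores[label] += p
    total = sum(scores.values())
    if total == 0:
        raise ValueError("observed features have zero probability under the model")
    # Query: normalize to obtain the distribution over the label.
    return {label: p / total for label, p in scores.items()}

# Toy joint over label in {spam, ham} and one binary feature (made-up numbers).
toy_joint = {
    ("spam", (True,)): 0.24, ("spam", (False,)): 0.16,
    ("ham", (True,)): 0.06, ("ham", (False,)): 0.54,
}
posterior = posterior_from_joint(toy_joint, (True,))
```

With one binary feature the table has only four entries, but a full joint grows exponentially with the number of features, which is why the choice of structure for the BN matters.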

Naïve Bayes for Digits
§ Naïve Bayes: Assume all features are independent effects of the label
§ Simple digit recognition version:
  § One feature (variable) F_ij for each grid position <i,j>
  § Feature values are on / off, based on whether intensity is more or less than 0.5 in underlying image
  § Each input maps to a feature vector
  § Here: lots of features, each is binary valued
§ Naïve Bayes model:
[Figure: Bayes net with label Y as the single parent of features F_1, F_2, …, F_n]
§ What do we need to learn?

General Naïve Bayes
§ A general Naive Bayes model:
    P(Y, F_1 … F_n) = P(Y) ∏_i P(F_i | Y)
  § Full joint P(Y, F_1 … F_n): |Y| x |F|^n values
  § Prior P(Y): |Y| parameters
  § Conditionals P(F_i | Y): n x |F| x |Y| parameters
[Figure: Bayes net with label Y as the single parent of features F_1, F_2, …, F_n]
§ We only have to specify how each feature depends on the class
§ Total number of parameters is linear in n
§ Model is very simplistic, but often works anyway

Inference for Naïve Bayes
§ Goal: compute posterior distribution over label variable Y
  § Step 1: get joint probability of label and evidence for each label
  § Step 2: sum to get probability of evidence
  § Step 3: normalize by dividing Step 1 by Step 2

General Naïve Bayes
§ What do we need in order to use Naïve Bayes?
  § Inference method
    § Start with a bunch of probabilities: P(Y) and the P(F_i|Y) tables
    § Use standard inference to compute P(Y|F_1…F_n)
  § Estimates of local conditional probability tables
    § P(Y), the prior over labels
    § P(F_i|Y) for each feature (evidence variable)
    § These probabilities are collectively called the parameters of the model and denoted by θ
    § Up until now, we assumed these appeared by magic, but…
    § …they typically come from training data counts: we’ll look at this soon
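The three inference steps can be sketched in a few lines of Python, assuming the tables P(Y) and P(F_i|Y) are already given. The function name `naive_bayes_posterior`, the nested-dict layout of the tables, and all probabilities below are assumptions made for illustration, not values from the slides.

```python
def naive_bayes_posterior(prior, cpts, observed):
    """Return P(Y | f_1..f_n) for a Naive Bayes model.

    prior: dict label -> P(Y = label)
    cpts: list with one table per feature; cpts[i][label][value] = P(F_i = value | Y = label)
    observed: list of observed feature values, in the same order as cpts
    """
    # Step 1: joint probability of each label with the evidence,
    # P(y, f_1..f_n) = P(y) * prod_i P(f_i | y).
    joint = {}
    for label, p_label in prior.items():
        p = p_label
        for cpt, value in zip(cpts, observed):
            p *= cpt[label][value]
        joint[label] = p
    # Step 2: sum over labels to get the probability of the evidence.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under the model")
    # Step 3: normalize by dividing Step 1 by Step 2.
    return {label: p / evidence for label, p in joint.items()}

# Toy parameters (made-up numbers): two binary features, two labels.
prior = {"spam": 0.5, "ham": 0.5}
cpts = [
    {"spam": {True: 0.8, False: 0.2}, "ham": {True: 0.1, False: 0.9}},
    {"spam": {True: 0.5, False: 0.5}, "ham": {True: 0.25, False: 0.75}},
]
posterior = naive_bayes_posterior(prior, cpts, [True, True])
```

With many features the product in Step 1 can underflow, so practical implementations commonly sum log probabilities instead; the plain product is kept here to mirror the three steps directly.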
