Bayesian Decision Theory
Steven J Zeil
Old Dominion Univ.
Fall 2010
Outline

1. Classification
2. Losses & Risks
3. Discriminant Functions
4. Association Rules
Bernoulli Distribution

Random variable $X \in \{0, 1\}$

Bernoulli: $P\{X = x\} = p_0^x (1 - p_0)^{(1-x)}$

Given a sample $\mathcal{X} = \{x^t\}_{t=1}^N$, we can estimate $\hat{p}_0 = \frac{\sum_t x^t}{N}$
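As a concrete illustration (not from the original slides), here is a minimal sketch of that estimate; the sample values are made up for the example:

```python
import numpy as np

# Hypothetical sample of N Bernoulli draws, each x^t in {0, 1}
sample = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])

# Maximum-likelihood estimate: p_hat_0 = (sum of x^t) / N
p_hat = sample.sum() / len(sample)
print(f"p_hat_0 = {p_hat}")  # 0.7 for this sample
```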
Classification

Input $\vec{x} = [x_1, x_2]$, Output $C \in \{0, 1\}$

Prediction: choose $\hat{C} = 1$ if $P(C = 1 | \vec{x}) > 0.5$, and $\hat{C} = 0$ otherwise

Equivalently: choose $\hat{C} = 1$ if $P(C = 1 | \vec{x}) > P(C = 0 | \vec{x})$, and $\hat{C} = 0$ otherwise

E.g., credit scoring: inputs are income and savings; output is low-risk versus high-risk
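A minimal sketch of this two-class decision rule, assuming the posterior $P(C = 1 | \vec{x})$ has already been produced by some fitted model (the function and its argument are hypothetical):

```python
def choose_class(posterior_c1: float) -> int:
    """Two-class Bayes decision: pick class 1 iff P(C=1|x) > 0.5."""
    return 1 if posterior_c1 > 0.5 else 0

# e.g., an applicant with a high posterior is labeled class 1 (low-risk)
print(choose_class(0.82))  # -> 1
print(choose_class(0.31))  # -> 0
```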
Bayes' Rule

$$P(C | \vec{x}) = \frac{P(C)\, p(\vec{x} | C)}{p(\vec{x})}$$

$P(C | \vec{x})$: posterior probability. Given that we have learned something ($\vec{x}$), what is the probability that $\vec{x}$ is in class $C$?

$P(C)$: prior probability. What would we expect for the probability of getting something in $C$ if we had no info about the specific case?

$p(\vec{x} | C)$: likelihood. If we knew that an item really was in $C$, what is the probability that it would have values $\vec{x}$? In effect, the reverse of what we are trying to find out.

$p(\vec{x})$: evidence. If we ignore the classes, how likely are we to see a value $\vec{x}$?
Bayes' Rule - Multiple Classes

$$P(C_i | \vec{x}) = \frac{P(C_i)\, p(\vec{x} | C_i)}{p(\vec{x})} = \frac{P(C_i)\, p(\vec{x} | C_i)}{\sum_{k=1}^{K} p(\vec{x} | C_k) P(C_k)}$$

where $P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$

Choose $C_i$ if $P(C_i | \vec{x}) = \max_k P(C_k | \vec{x})$
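A minimal sketch of the multi-class rule, assuming the priors and the per-class likelihood values $p(\vec{x} | C_k)$ for a particular $\vec{x}$ are already in hand (the numbers are made up):

```python
import numpy as np

# Hypothetical priors P(C_k) and likelihoods p(x | C_k) at one input x
priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.10, 0.40, 0.25])

# Bayes' rule: posterior proportional to prior * likelihood;
# the evidence p(x) is the normalizing sum over all classes
evidence = np.sum(priors * likelihoods)
posteriors = priors * likelihoods / evidence

print(posteriors)             # sums to 1
print(np.argmax(posteriors))  # index of the chosen class
```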
Unequal Risks

In many situations, different actions carry different potential gains and costs.

Actions: $\alpha_i$

Let $\lambda_{ik}$ denote the loss incurred by taking action $\alpha_i$ when the current state is actually $C_k$

Expected risk of taking action $\alpha_i$:
$$R(\alpha_i | \vec{x}) = \sum_{k=1}^{K} \lambda_{ik} P(C_k | \vec{x})$$

This is simply the expected value of the loss function given that we have chosen $\alpha_i$

Choose $\alpha_i$ if $R(\alpha_i | \vec{x}) = \min_k R(\alpha_k | \vec{x})$
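A minimal sketch of minimum-risk action selection, assuming a hypothetical loss matrix $\lambda$ and posteriors like those computed above:

```python
import numpy as np

def min_risk_action(loss: np.ndarray, posteriors: np.ndarray) -> int:
    """Pick the action minimizing R(alpha_i|x) = sum_k loss[i,k] * P(C_k|x).

    loss       -- actions x classes matrix; loss[i, k] = lambda_ik
    posteriors -- P(C_k | x) for each class k
    """
    risks = loss @ posteriors  # R(alpha_i | x) for every action i
    return int(np.argmin(risks))

# Hypothetical asymmetric losses: row i = action alpha_i, column k = true class C_k
loss = np.array([[0.0, 10.0],   # acting for class 0 when truth is class 1 is costly
                 [1.0,  0.0]])
posteriors = np.array([0.8, 0.2])
print(min_risk_action(loss, posteriors))  # 1, despite class 0 being more probable
```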
Special Case: Equal Risks

Suppose
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$

Expected risk of taking action $\alpha_i$:
$$R(\alpha_i | \vec{x}) = \sum_{k=1}^{K} \lambda_{ik} P(C_k | \vec{x}) = \sum_{k \ne i} P(C_k | \vec{x}) = 1 - P(C_i | \vec{x})$$

Choose $\alpha_i$ if $R(\alpha_i | \vec{x}) = \min_k R(\alpha_k | \vec{x})$, which happens when $P(C_i | \vec{x})$ is largest

So if all actions have equal cost, choose the action for the most probable class.
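A quick numeric check of this special case, using a 0/1 loss matrix and made-up posteriors:

```python
import numpy as np

posteriors = np.array([0.5, 0.3, 0.2])

# 0/1 loss: zero on the diagonal, one everywhere else
zero_one_loss = 1.0 - np.eye(3)

risks = zero_one_loss @ posteriors  # equals 1 - posteriors
print(risks)                                       # [0.5, 0.7, 0.8]
print(np.argmin(risks) == np.argmax(posteriors))   # True: min risk = max posterior
```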
Special Case: Indecision

Suppose that making the wrong decision is more expensive than making no decision at all (i.e., falling back to some other procedure)

Introduce a special reject action $\alpha_{K+1}$ that denotes the decision to not select a "real" action

Cost of a reject is $\lambda$, with $0 < \lambda < 1$:
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{if } i \ne k \end{cases}$$
The Risk of Indecision

Risk:
$$R(\alpha_{K+1} | \vec{x}) = \sum_{k=1}^{K} \lambda P(C_k | \vec{x}) = \lambda$$
$$R(\alpha_i | \vec{x}) = \sum_{k \ne i} P(C_k | \vec{x}) = 1 - P(C_i | \vec{x})$$

Choose $\alpha_i$ if $P(C_i | \vec{x}) > P(C_k | \vec{x})\ \forall k \ne i$ and $P(C_i | \vec{x}) > 1 - \lambda$; otherwise reject all actions
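A minimal sketch of the reject option, assuming a hypothetical reject cost `reject_cost` between 0 and 1:

```python
import numpy as np

REJECT = -1  # sentinel index standing in for the reject action alpha_{K+1}

def classify_with_reject(posteriors: np.ndarray, reject_cost: float) -> int:
    """Return the best class index, or REJECT if no posterior exceeds 1 - lambda."""
    best = int(np.argmax(posteriors))
    # Accept only when P(C_i|x) > 1 - lambda; otherwise rejecting is cheaper
    if posteriors[best] > 1.0 - reject_cost:
        return best
    return REJECT

posteriors = np.array([0.4, 0.35, 0.25])
print(classify_with_reject(posteriors, reject_cost=0.3))  # -1: 0.4 <= 0.7, reject
print(classify_with_reject(posteriors, reject_cost=0.7))  #  0: 0.4 >  0.3, accept
```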
Discriminant Functions

An alternate vision: instead of searching for the most probable class, we seek a set of functions that divide the space into $K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:
$$\mathcal{R}_i = \left\{ \vec{x} \mid g_i(\vec{x}) = \max_k g_k(\vec{x}) \right\}$$
Why Discriminants?

Discriminant functions are more general than posterior probabilities: they do not have to lie in the range $0 \ldots 1$, nor correspond to actual probabilities.

This allows us to use them when we have no info about the underlying distribution.

Later techniques will seek discriminant functions directly.
Bayes Classifier as Discriminant Functions

We can form a discriminant function for the Bayes classifier very simply:
$$g_i(\vec{x}) = -R(\alpha_i | \vec{x})$$

If we have a constant loss function, we can use
$$g_i(\vec{x}) = P(C_i | \vec{x}) = \frac{P(C_i)\, p(\vec{x} | C_i)}{p(\vec{x})}$$
Bayes Classifier as Discriminant Functions (cont.)

$$g_i(\vec{x}) = \frac{P(C_i)\, p(\vec{x} | C_i)}{p(\vec{x})}$$

Because all the $g_i$ above would have the same denominator, we could alternatively use:
$$g_i(\vec{x}) = P(C_i)\, p(\vec{x} | C_i)$$
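A minimal sketch of these unnormalized discriminants, assuming (purely for illustration) one-dimensional Gaussian class likelihoods with made-up parameters:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical priors and Gaussian likelihood parameters for two classes
priors = np.array([0.6, 0.4])
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])

def discriminants(x: float) -> np.ndarray:
    """g_i(x) = P(C_i) * p(x | C_i); no need to divide by the evidence p(x)."""
    return priors * norm.pdf(x, loc=means, scale=stds)

g = discriminants(1.2)
print(g)             # unnormalized scores, not probabilities
print(np.argmax(g))  # same decision as using the full posteriors
```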
Association Rules

Suppose that we want to learn an association rule $X \to Y$