CS 7616 Pattern Recognition
Bayesian Decision Theory
Aaron Bobick, School of Interactive Computing
Outline for "today"
- A simple tuberculosis example as a reminder of Bayes rule and how it relates to decision making
- Some basic discussion of what it means to make a good decision, and the relation to Bayes
- Basic Bayesian decision making: minimum loss
- Application to normal distributions: the origins of linear classifiers?
- Why normals? Obvious and less obvious reasons
Special thanks…
Professor Srihari at Buffalo, who posted lots of slides…
So you go to the doctor…
Assume you go to the doctor because it's that time of year. He tells you that you're overdue for your tuberculosis test. You take the TB test ($T$) and it's positive!!! ($T^+$) But then he tells you not to worry, because:
- The detection rate is 100%: $P(T^+ \mid TB^+) = 1$
- The false alarm rate is 5%: $P(T^+ \mid TB^-) = 0.05$
- The incidence rate of TB in Atlanta is 0.1%: $P(TB^+) = 0.001$
(The states $TB^+$ and $TB^-$ are mutually exclusive and collectively exhaustive.)
Therefore, by Bayes rule, the odds that you have TB given the test are:
$$P(TB^+ \mid T^+) = \frac{P(T^+ \mid TB^+)\,P(TB^+)}{P(T^+)} = \frac{P(T^+ \mid TB^+)\,P(TB^+)}{P(T^+ \mid TB^+)\,P(TB^+) + P(T^+ \mid TB^-)\,P(TB^-)} = \frac{1.0 \times 0.001}{1.0 \times 0.001 + 0.05 \times 0.999} \approx 0.0196$$
(i.e., about 20 times what it was before the test)
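A minimal numeric sketch of this calculation in Python, assuming the rates quoted above (the variable names are just illustrative):

    # Bayes rule for the TB example: P(TB+|T+) = P(T+|TB+) P(TB+) / P(T+)
    p_pos_given_tb = 1.0   # detection rate, P(T+|TB+)
    p_pos_given_no = 0.05  # false alarm rate, P(T+|TB-)
    p_tb = 0.001           # prior incidence, P(TB+)

    evidence = p_pos_given_tb * p_tb + p_pos_given_no * (1.0 - p_tb)  # P(T+)
    posterior = p_pos_given_tb * p_tb / evidence
    print(posterior)  # ~0.0196: about 20x the 0.001 prior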
So…
Q1: If you had to decide right then whether you have TB or not, what would you decide?
Q2: Would you go get a chest X-ray? Why can't you really answer that question?
- Cost of the X-ray?
- Cost of having TB and not finding out? (Prostate cancer treatments…)
So to make the "right" decisions we needed to know:
- Prior probabilities: $P(TB^+)$
- Likelihoods: $P(T^+ \mid TB^+)$ and $P(T^+ \mid TB^-)$
- Cost (loss) functions
Bayes decision theory
Bayesian theory is fundamental to decision theory and pattern recognition. It is basically the mechanism by which one can evaluate the probability of being right (and thus wrong), and it allows one to compute an expectation of cost/reward (assuming some very non-ICBM, no-infinities, types of loss).
But… it presumes that a variety of probabilities are known, or that it is at least known how much they are unknown (Bayes meets Rumsfeld???). We'll ignore this concern for now…
Bayes 1: Priors
We have states of nature $\omega_i$ that are mutually exclusive and collectively exhaustive:
$$\sum_i P(\omega_i) = 1$$
Decision rule if there are only two classes and the decision is based only on the prior: if $P(\omega_1) > P(\omega_2)$ choose class $\omega_1$, otherwise choose $\omega_2$.
Bayes 2: Class conditional probabilities
We need to know the probability of our data (measurements) given the possible states of nature: $p(x \mid \omega_j)$.
These are probability densities, as opposed to the distributions on the priors. I will definitely confuse this in class.
Bayes rule to get the data-conditioned probability
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}, \qquad \text{where the "evidence" is } p(x) = \sum_j p(x \mid \omega_j)\,P(\omega_j)$$
Read: "posterior is the likelihood times the prior divided by the evidence". And since the "evidence" $p(x)$ is fixed, we can usually ignore it.
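A short sketch of this computation, assuming the class-conditional densities have already been evaluated at a particular $x$ (the numbers are made up):

    import numpy as np

    likelihoods = np.array([0.6, 0.1])  # p(x|w_1), p(x|w_2), evaluated at x
    priors = np.array([0.3, 0.7])       # P(w_1), P(w_2)

    evidence = likelihoods @ priors               # p(x) = sum_j p(x|w_j) P(w_j)
    posteriors = likelihoods * priors / evidence  # P(w_j|x); sums to 1 by construction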
The posteriors from the division…
Bayesian decision rule
If $P(\omega_1 \mid x) > P(\omega_2 \mid x)$ then choose $\omega_1$, since the true state of nature is more likely $\omega_1$… assuming there is no significant difference between being wrong in one direction or the other.
What is the probability of making an error? $P(\text{error} \mid x) = P(\omega_1 \mid x)$ when we decided $\omega_2$, and $P(\text{error} \mid x) = P(\omega_2 \mid x)$ when we decided $\omega_1$. So:
$$P(\text{error} \mid x) = \min[P(\omega_1 \mid x), P(\omega_2 \mid x)]$$
(the Bayes error)
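Continuing the earlier sketch, the decision and its error probability at a given $x$ (again with made-up posterior values):

    import numpy as np

    posteriors = np.array([0.7, 0.3])  # e.g. P(w_1|x), P(w_2|x) from above
    decision = np.argmax(posteriors)   # index 0 -> w_1, index 1 -> w_2
    p_error = np.min(posteriors)       # min[P(w_1|x), P(w_2|x)], the Bayes error at x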
Obvious generalizations:
- The feature is a vector (no real difference)
- More than two classes (as long as they are mutually exclusive and collectively exhaustive, no problem)
- Introduce a general loss function, which is more general than just making an error… we'll do this in a minute…
- And you can refuse to give an answer: "I don't know". We'll talk more about that another time.
Loss functions and minimum risk
Let $\omega_j$ be the possible states of nature. Let $\{\alpha_i\}$ be the possible actions taken (usually announcing the class, so there are as many actions as classes). Let $\lambda(\alpha_i \mid \omega_j)$ be the "loss" incurred for taking action $i$ when the actual state of nature is $j$. Then the expected loss of taking action $i$ given measurement $x$ is:
$$R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x)$$
So: select the $\alpha_i$ with minimum expected loss. That's what you're "risking". The Bayes risk is the best you can do.
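A minimal sketch of minimum-risk action selection; the loss matrix entries and posteriors below are invented for illustration:

    import numpy as np

    # lam[i, j] = lambda(alpha_i | w_j): loss of action alpha_i when truth is w_j
    lam = np.array([[0.0, 10.0],
                    [1.0,  0.0]])
    posteriors = np.array([0.8, 0.2])  # P(w_1|x), P(w_2|x)

    risks = lam @ posteriors        # R(alpha_i|x) = sum_j lam[i,j] P(w_j|x)
    best_action = np.argmin(risks)  # take the action with minimum expected loss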
LRT – likelihood ratio test
Action $\alpha_i$ is to choose class $i$. Cost $\lambda_{ij}$ is the cost of choosing $i$ when reality is $j$. The two risks:
$$R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)$$
$$R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)$$
Choose $\alpha_1$ if its risk is lower:
$$(\lambda_{21} - \lambda_{11})\,p(x \mid \omega_1)\,P(\omega_1) > (\lambda_{12} - \lambda_{22})\,p(x \mid \omega_2)\,P(\omega_2)$$
which gives a ratio test based on costs and priors: choose $\alpha_1$ if
$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\,P(\omega_2)}{(\lambda_{21} - \lambda_{11})\,P(\omega_1)} = T$$
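The threshold test translates directly into code; a hedged sketch (the function name is hypothetical, and lam/priors follow the conventions of the previous sketch):

    import numpy as np

    def choose_class_1(p_x_w1, p_x_w2, priors, lam):
        """Return True iff the LRT says take action alpha_1 (announce w_1)."""
        # T = (lam_12 - lam_22) P(w_2) / ((lam_21 - lam_11) P(w_1))
        T = ((lam[0, 1] - lam[1, 1]) * priors[1]) / ((lam[1, 0] - lam[0, 0]) * priors[0])
        return p_x_w1 / p_x_w2 > T

    # Example: likelihood values 0.6 and 0.1 at some x, with the loss matrix above
    choose_class_1(0.6, 0.1, np.array([0.3, 0.7]), np.array([[0.0, 10.0], [1.0, 0.0]]))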
A special loss function
Cost $\lambda_{ij}$ is 0 if $i = j$, 1 otherwise. This is called the zero-one loss function (duh). It gives the ratio test: choose $\alpha_1$ if
$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)}$$
i.e., choose whichever class is more likely given the data. Which really means you combine likelihoods and priors, and you never separate them. That is, you just have a decision boundary on $x$. That is, you just discriminate based upon $x$…
Introduction to discriminant functions
Let $g_i(x) = -R(\alpha_i \mid x)$. (So the "max" discriminant function is minimum risk.)
For minimum error rate (zero-one loss): $g_i(x) = P(\omega_i \mid x)$ (max discriminant is max posterior).
Using Bayes rule: $g_i(x) \propto p(x \mid \omega_i)\,P(\omega_i)$
Finally, using the monotonicity of $\ln$, let:
$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$
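A sketch of these log-discriminants for two hypothetical 1-D Gaussian class models (the means, sigmas, and priors are made up, and scipy is assumed available):

    import numpy as np
    from scipy.stats import norm

    priors = np.array([0.4, 0.6])  # P(w_1), P(w_2)
    means = np.array([0.0, 2.0])   # class-conditional Gaussian means
    sigmas = np.array([1.0, 1.5])  # class-conditional Gaussian std devs

    def g(x):
        # g_i(x) = ln p(x|w_i) + ln P(w_i); the argmax is the MAP decision
        return norm.logpdf(x, means, sigmas) + np.log(priors)

    decision = np.argmax(g(1.0))  # decide w_1 or w_2 at x = 1.0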
Two class discrimination
Let $g(x) = g_1(x) - g_2(x)$.
Decide class $\omega_1$ if $g(x) > 0$; otherwise decide $\omega_2$.
Next time…
Linear discriminants applied to normal distributions.
Remember your first assignment!
Due next Tuesday, Jan 14.
- Find an available data set with a "modest" number of features and a "small" number of classes.
  - Modest: plausible to try all or many possible subsets of features.
  - Small: maybe less than 5. 2 is ideal. 30 would be too many.
- Submit a one-page description of the data and how we would get it, within a week. (Are you making it? That's OK.)