CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering, Department of Cybernetics

Pattern Recognition. Bayesian and non-Bayesian Tasks.
Petr Pošík

This lecture is based on the book Ten Lectures on Statistical and Structural Pattern Recognition by Michail I. Schlesinger and Václav Hlaváč (Kluwer, 2002). (The Czech edition of the book was published by the ČVUT publishing house in 1999 under the title Deset přednášek z teorie statistického a strukturálního rozpoznávání.)

P. Pošík © 2014, Artificial Intelligence
Pattern Recognition
Definitions of concepts

An object of interest is characterized by the following parameters:
■ observation $x \in X$ (vector of numbers, graph, picture, sound, ECG, ...), and
■ hidden state $k \in K$.
■ $k$ is often viewed as the object class, but it may be something different, e.g. when we seek the location $k$ of an object based on the picture $x$ taken by a camera.

Joint probability distribution $p_{XK} : X \times K \to \langle 0, 1 \rangle$
■ $p_{XK}(x, k)$ is the joint probability that the object is in the state $k$ and we observe $x$.
■ $p_{XK}(x, k) = p_{X|K}(x \mid k) \cdot p_K(k)$

Decision strategy (or function or rule) $q : X \to D$
■ $D$ is a set of possible decisions. (Very often $D = K$.)
■ $q$ is a function that assigns a decision $d = q(x)$, $d \in D$, to each $x \in X$.

Penalty function (or loss function) $W : K \times D \to \mathbb{R}$ (real numbers)
■ $W(k, d)$ is a penalty for decision $d$ if the object is in state $k$.

Risk $R : Q \to \mathbb{R}$
■ the mathematical expectation of the penalty which must be paid when using the strategy $q$.
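The definitions above can be made concrete on a small discrete problem. The following is a minimal sketch, assuming finite sets X, K and D; all numbers, the zero-one penalty and the example strategy are hypothetical and chosen only for illustration. It builds the joint distribution from the factorization p_XK(x, k) = p_X|K(x | k) · p_K(k) and evaluates the risk of one fixed strategy as the expected penalty.

```python
# Hypothetical toy problem: every number below is made up for illustration.
K = [0, 1]       # hidden states
X = [0, 1, 2]    # possible observations
D = [0, 1]       # possible decisions (here D = K)

# Prior p_K(k) and conditional p_{X|K}(x | k), indexed as p_X_given_K[k][x].
p_K = {0: 0.7, 1: 0.3}
p_X_given_K = {
    0: {0: 0.6, 1: 0.3, 2: 0.1},
    1: {0: 0.1, 1: 0.3, 2: 0.6},
}

# Joint distribution p_XK(x, k) = p_{X|K}(x | k) * p_K(k).
p_XK = {(x, k): p_X_given_K[k][x] * p_K[k] for x in X for k in K}

# Zero-one penalty W(k, d): 0 for a correct decision, 1 otherwise (an assumption).
W = {(k, d): 0.0 if k == d else 1.0 for k in K for d in D}


def risk(q):
    """Expected penalty of strategy q, where q[x] is the decision for observation x."""
    return sum(p_XK[x, k] * W[k, q[x]] for x in X for k in K)


# One particular strategy q: X -> D, written as a lookup table.
q = {0: 0, 1: 0, 2: 1}
print("risk of q:", risk(q))
```

With a zero-one penalty the risk of a strategy is simply its probability of making a wrong decision, which makes the toy numbers easy to check by hand.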
Notes on decision tasks

In the following, we consider decision tasks where
■ the decisions do not influence the state of nature (unlike game theory or control theory),
■ a single decision is made and issues of time are ignored in the model (unlike control theory, where decisions are typically taken continuously in real time),
■ the costs of obtaining the observations are not modelled (unlike sequential decision theory).

The hidden parameter k (state, class) is considered not observable. Common situations are:
■ k can be observed, but at a high cost.
■ k is a future state (e.g. price of gold) and will be observed later.
Pattern recognition task examples

The description of the concepts is very general; so far we did not specify what the items of the X, K, and D sets actually are and how they are represented.

Application                      Observation (measurement)           Decisions
-------------------------------  ----------------------------------  -------------------
Coin value in a slot machine     x ∈ R^n                             Value
Cancerous tissue detection       Gene-expression profile, x ∈ R^n    {yes, no}
Medical diagnostics              Results of medical tests, x ∈ R^n   Diagnosis
Optical character recognition    2D bitmap, intensity image          Words, numbers
License plate recognition        2D bitmap, grey-level image         Characters, numbers
Fingerprint recognition          2D bitmap, grey-level image         Personal identity
Face detection                   2D bitmap                           {yes, no}
Speech recognition               x(t)                                Words
Speaker identification           x(t)                                Personal identity
Speaker verification             x(t)                                {yes, no}
EEG, ECG analysis                x(t)                                Diagnosis
Forfeit detection                Various                             {yes, no}
Two types of pattern recognition

1. Statistical pattern recognition
■ Objects are represented as points in a vector space.
■ The point (vector) x contains the individual observations (in a numerical form) as its coordinates.

2. Structural pattern recognition
■ The object observations contain a structure which is represented and used for recognition.
■ A typical example of the representation of a structure is a grammar.
Bayesian Decision Theory
Bayesian decision task

Given the sets $X$, $K$, and $D$, and functions $p_{XK} : X \times K \to \langle 0, 1 \rangle$ and $W : K \times D \to \mathbb{R}$, find a strategy $q : X \to D$ which minimizes the Bayesian risk of the strategy $q$

$$R(q) = \sum_{x \in X} \sum_{k \in K} p_{XK}(x, k) \cdot W(k, q(x)).$$

The optimal strategy $q$, denoted as $q^*$, is then called the Bayesian strategy.
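For finite sets, the Bayesian strategy can be computed directly. The sketch below relies on the observation that the double sum in R(q) splits into one independent term per observation x, so q*(x) can be chosen separately for each x as the decision d that minimizes the sum over k of p_XK(x, k) · W(k, d). The toy distribution, the asymmetric penalty values and the helper names (`risk`, `bayes_strategy`) are hypothetical and serve only as an illustration; the brute-force search over all strategies is included as a sanity check, not as a practical method.

```python
from itertools import product

# Hypothetical toy problem: every number below is made up for illustration.
K = [0, 1]       # hidden states
X = [0, 1, 2]    # possible observations
D = [0, 1]       # possible decisions

p_K = {0: 0.7, 1: 0.3}
p_X_given_K = {
    0: {0: 0.6, 1: 0.3, 2: 0.1},
    1: {0: 0.1, 1: 0.3, 2: 0.6},
}
p_XK = {(x, k): p_X_given_K[k][x] * p_K[k] for x in X for k in K}

# Asymmetric penalty W(k, d): deciding 0 when the state is 1 costs 5,
# deciding 1 when the state is 0 costs 1 (assumed values).
W = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 5.0, (1, 1): 0.0}


def risk(q):
    """Bayesian risk R(q) = sum over x and k of p_XK(x, k) * W(k, q(x))."""
    return sum(p_XK[x, k] * W[k, q[x]] for x in X for k in K)


def bayes_strategy():
    """For each x, pick the decision with the smallest partial risk."""
    q_star = {}
    for x in X:
        q_star[x] = min(D, key=lambda d: sum(p_XK[x, k] * W[k, d] for k in K))
    return q_star


q_star = bayes_strategy()

# Sanity check: enumerate all |D|^|X| strategies and take the smallest risk.
all_strategies = [dict(zip(X, ds)) for ds in product(D, repeat=len(X))]
best_risk = min(risk(q) for q in all_strategies)

print("q* =", q_star)
print("R(q*) =", risk(q_star), " brute-force minimum =", best_risk)
```

The per-observation minimization needs |X| · |D| · |K| operations, whereas the brute-force check grows as |D|^|X|, which is why the decomposition matters in practice.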