Some Results on Imprecise Discriminant Analysis
11th Workshop on Principles and Methods of Statistical Inference with Interval Probability
Yonatan-Carlos Carranza-Alarcon, Ph.D. Candidate in Computer Science
Sébastien Destercke, Ph.D. Supervisor
30 July - 01 August 2018
Overview

● Classification
  ❍ Decision Making
  ❍ Discriminant Analysis
● Imprecise Classification
  ❍ Imprecise Decision
  ❍ Imprecise Linear Discriminant Analysis
● Future Work
● Conclusions
Classification - Setting

A classic classification problem is composed of:
● Training data D = {(x_i, y_i)}_{i=1}^N such that:
  ❍ (Input) x_i ∈ X are regressors or features (often x_i ∈ R^p).
  ❍ (Output) y_i ∈ K is a categorical response variable, with K = {m_1, ..., m_K}.

Objective
Given the training data D = {(x_i, y_i)}_{i=1}^N, we need to learn a classification rule

  φ : X → Y

in order to predict the class φ(x*) of a new observation x*.
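To make this setting concrete, here is a minimal Python sketch (not part of the original slides) using scikit-learn's LinearDiscriminantAnalysis; the toy data and the class labels m1, m2, m3 are invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Invented toy training data D = {(x_i, y_i)}: 2 features, 3 classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(20, 2)) for c in (0, 3, 6)])
y = np.repeat(["m1", "m2", "m3"], 20)

# Learn a classification rule phi : X -> Y
phi = LinearDiscriminantAnalysis().fit(X, y)

# Predict the class of a new observation x*
x_star = np.array([[2.5, 2.5]])
print(phi.predict(x_star))        # point prediction y* = phi(x*)
print(phi.predict_proba(x_star))  # posterior probabilities P(y = m_k | x*)
```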
Classification - Outline (Example)

Getting training data → Learning a classification rule φ : X → Y → Predicting the class of new instances: ŷ* := φ(x* | X, y)

But:
● How can we learn the "classification rule" (model) from the training data?
Decision Making in Statistics

● In statistics, the classification rule is often seen as a decision-making problem under the risk of misclassification: the rule is the minimizer of the risk R, the expected loss

  φ := argmin_{φ(X) ∈ K} E_{X×Y}[L(y, φ(X))]    (1)

● Under the 0/1 loss function L, minimizing R is equivalent to:

  φ(x* | X, y) := argmax_{m_k ∈ K} P(y = m_k | X = x*)    (2)

● Where:
  1. The predicted class ŷ* = φ(x* | X, y) is the most probable one (equation (2)).
  2. Equation (2) is also known as the Bayes classifier [1, p. 21].
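A hedged sketch (not from the slides; the posterior values are invented) of how this decision rule works: minimize the expected loss over candidate classes, which under 0/1 loss reduces to the argmax of the posterior:

```python
import numpy as np

classes = ["m1", "m2", "m3"]
posterior = np.array([0.5, 0.3, 0.2])          # invented P(y = m_k | x*)

# General rule (1): minimize the expected loss E_P[L(., m_k) | x*].
# L[j, k] = loss of predicting m_k when the true class is m_j; here 0/1 loss.
L = 1.0 - np.eye(3)
expected_loss = posterior @ L                  # expected loss of each candidate m_k
print(classes[int(np.argmin(expected_loss))])  # -> m1

# Under 0/1 loss this reduces to rule (2): the argmax of the posterior.
print(classes[int(np.argmax(posterior))])      # -> m1 (Bayes classifier)
```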
Decision Making in Statistics

Definition (Preference ordering [5, p. 47])
With a general loss L(·, ·), m_a is preferred to m_b, denoted by m_a ≻ m_b, if and only if:

  E_P[L(·, m_a) | x*] < E_P[L(·, m_b) | x*]

In the particular case where L(·, ·) is the 0/1 loss function, we get:

  m_a ≻ m_b ⟺ P(y = m_a | X = x*) / P(y = m_b | X = x*) > 1

where P(y = m_a | X = x*) is the class posterior probability. We then take the maximal element of the complete order ≻, i.e.

  m_{i_K} ≻ m_{i_{K−1}} ≻ ... ≻ m_{i_1} ⟺ P(y = m_{i_K} | x*) ≥ ... ≥ P(y = m_{i_1} | x*)
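Under 0/1 loss, this complete order is simply the classes sorted by decreasing posterior probability; a one-line sketch with the same invented numbers as above:

```python
import numpy as np

classes = np.array(["m1", "m2", "m3"])
posterior = np.array([0.5, 0.3, 0.2])   # invented P(y = m_k | x*)

# Complete order: sort classes by decreasing posterior probability;
# its maximal element is the Bayes prediction.
order = classes[np.argsort(-posterior)]
print(" > ".join(order))                # m1 > m2 > m3
```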
(Precise) Discriminant Analysis

Applying Bayes' rule to P(y = m_k | X = x*):

  P(y = m_k | X = x*) = P(X = x* | y = m_k) P(y = m_k) / Σ_{m_l ∈ K} P(X = x* | y = m_l) P(y = m_l)

where π_k := P(Y = m_k) such that Σ_{j=1}^{K} π_j = 1, and G_k := P(X | Y = m_k) ~ N(μ_k, Σ_k).

A frequentist point estimation:

  π̂_k = n_k / N
  μ̂_k = (1 / n_k) Σ_{i=1}^{n_k} x_{i,k}
  Σ̂_k = (1 / (n_k − 1)) Σ_{i=1}^{n_k} (x_{i,k} − x̄_k)(x_{i,k} − x̄_k)^t
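A minimal sketch (invented data, not the authors' implementation) of these frequentist estimates and the resulting Gaussian posterior, using scipy's multivariate normal density:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Invented training data: two classes m1, m2 in R^2
rng = np.random.default_rng(1)
X = {"m1": rng.normal(0.0, 1.0, size=(30, 2)),
     "m2": rng.normal(3.0, 1.0, size=(40, 2))}
N = sum(len(v) for v in X.values())

# Frequentist point estimates per class k
pi = {k: len(v) / N for k, v in X.items()}                  # prior: n_k / N
mu = {k: v.mean(axis=0) for k, v in X.items()}              # class mean
Sigma = {k: np.cov(v, rowvar=False) for k, v in X.items()}  # class covariance (1/(n_k - 1))

# Posterior P(y = m_k | x*) via Bayes' rule with Gaussian class densities
x_star = np.array([1.5, 1.5])
joint = {k: pi[k] * multivariate_normal.pdf(x_star, mean=mu[k], cov=Sigma[k]) for k in X}
Z = sum(joint.values())
print({k: v / Z for k, v in joint.items()})
```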
Decision Making in Imprecise Probabilities

Definition (Partial ordering by the maximality criterion)
Let P be a set of probabilities; then m_a is preferred to m_b if the cost of exchanging m_a with m_b has a positive lower expectation:

  m_a ≻_M m_b ⟺ inf_{P ∈ P} E_P[L(·, m_b) − L(·, m_a) | x*] > 0

If L(·, ·) is the 0/1 loss function, this becomes:

  m_a ≻_M m_b ⟺ inf_{P ∈ P} P(y = m_a | X = x*) / P(y = m_b | X = x*) > 1
Decision Making in Imprecise Probabilities

By applying Bayes' theorem to P(y = m_a | X = x*), we get:

  m_a ≻_M m_b ⟺ inf_{P_{X|y} ∈ P_1, P_y ∈ P_2} [P(x* | y = m_a) P(y = m_a)] / [P(x* | y = m_b) P(y = m_b)] > 1

The resulting set of cautious decisions keeps every undominated class:

  Ŷ_M = {m_a ∈ K | ∄ m_b : m_b ≻_M m_a}

For instance, if K = {m_a, m_b, m_c} with m_a ≻_M m_b, m_c ≻_M m_b, and m_a, m_c incomparable, then Ŷ_M = {m_a, m_c}.
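A hedged sketch of maximality over a credal set, reproducing the example above; the credal set is invented and, as an assumption of the sketch, represented by a few extreme points (for 0/1 loss the infimum of the posterior ratio over a convex set is attained at an extreme point):

```python
import numpy as np

# Invented extreme points of a credal set P over K = {m_a, m_b, m_c}
K = ["m_a", "m_b", "m_c"]
P = np.array([[0.50, 0.20, 0.30],
              [0.35, 0.15, 0.50],
              [0.45, 0.25, 0.30]])

def dominates(a, b):
    """m_a >_M m_b under 0/1 loss: inf over P of P(m_a | x*) / P(m_b | x*) > 1."""
    return np.min(P[:, a] / P[:, b]) > 1.0

# Cautious decision set Y_M: keep every class that no other class dominates
Y_M = [K[a] for a in range(len(K))
       if not any(dominates(b, a) for b in range(len(K)) if b != a)]
print(Y_M)  # ['m_a', 'm_c'] -- m_b is dominated by both m_a and m_c
```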
Imprecise Linear Discriminant Analysis (ILDA)

Objective: make the mean parameter μ_k of each Gaussian distribution family G_k := P(X | Y = m_k) ~ N(μ_k, Σ̂) imprecise.

Assumptions:
● Covariances precisely estimated and homoscedastic, i.e. Σ_k = Σ:

  Σ̂ = (1 / (N − K)) Σ_{k=1}^{K} Σ_{i=1}^{n_k} (x_{i,k} − x̄_k)(x_{i,k} − x̄_k)^t

● Prior probabilities precisely estimated: π̂_k = n_k / N
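A minimal sketch (invented data, not the authors' implementation) of the two quantities ILDA keeps precise, the pooled within-class covariance Σ̂ and the priors π̂_k:

```python
import numpy as np

# Invented data: K = 3 classes in R^2 sharing a (homoscedastic) covariance
rng = np.random.default_rng(2)
X = [rng.normal(c, 1.0, size=(n, 2)) for c, n in [(0, 25), (3, 30), (6, 20)]]
N, K = sum(len(Xk) for Xk in X), len(X)

# Precise priors: pi_k = n_k / N
pi = np.array([len(Xk) / N for Xk in X])

# Pooled within-class covariance: 1/(N - K) * sum_k sum_i (x - mean_k)(x - mean_k)^t
Sigma = sum((Xk - Xk.mean(axis=0)).T @ (Xk - Xk.mean(axis=0)) for Xk in X) / (N - K)
print(pi)
print(Sigma)
# In ILDA only the class means mu_k are then made imprecise.
```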