Making Set-valued Predictions in Evidential Classification: A Comparison of Different Approaches
Liyao Ma & Thierry Denœux
ISIPTA 2019 - 5th July
Introduction
● Classification: predict a label from Ω = {ω_1, ..., ω_n}
● Uncertainty → set-valued predictions
● Uncertainty quantified by Dempster-Shafer theory
Decision making view of classification
● Precise assignments F = {f_{ω_1}, ..., f_{ω_n}}
● Precise assignments + complete preorder: Maximum Expected Utility principle
● The uncertain case
  ❍ Precise assignments + partial preorder
  ❍ Partial assignments + complete preorder
● Partial assignments F = {f_A, A ∈ 2^Ω \ {∅}}
Two families of decision strategies
● Precise assignments + partial preorder
  ❍ F = {f_{ω_1}, ..., f_{ω_n}}
  ❍ Interval dominance, maximality, weak dominance, ...
  ❍ Lack of information → interval-valued expectations $[\underline{E}_m(f_i), \overline{E}_m(f_i)]$
  ❍ Output: a set of non-dominated acts, e.g. F* = {f_{ω_1}, f_{ω_2}} (see the sketch below)
● Partial assignments + complete preorder
  ❍ F = {f_A, A ∈ 2^Ω \ {∅}}
  ❍ Generalized maximin, maximax, Hurwicz, minimax regret, ...
  ❍ Output: a single optimal act, e.g. F* = {f_{ω_1,ω_2}}
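To make the first family concrete, here is a minimal Python sketch (not from the paper) of the lower and upper expected utilities of a precise act under a mass function, and of the interval-dominance rule that keeps the non-dominated acts. The encoding of the mass function as a dict of frozensets and the toy numbers are assumptions for illustration only.

```python
def lower_upper_expectations(mass, utilities, act):
    """Lower/upper expected utility of the precise act f_act under a mass function.

    mass      : dict mapping frozensets of class indices (focal sets) to masses
    utilities : utilities[i][j] = utility of predicting class i when the truth is j
    act       : index i of the precise assignment f_{omega_i}
    """
    low = sum(m_B * min(utilities[act][j] for j in B) for B, m_B in mass.items())
    up = sum(m_B * max(utilities[act][j] for j in B) for B, m_B in mass.items())
    return low, up


def interval_dominance_set(mass, utilities, n_classes):
    """Non-dominated acts under interval dominance: f_i is discarded when some
    other act's lower expectation strictly exceeds f_i's upper expectation."""
    bounds = [lower_upper_expectations(mass, utilities, i) for i in range(n_classes)]
    return [i for i, (_, up_i) in enumerate(bounds)
            if not any(bounds[k][0] > up_i for k in range(n_classes) if k != i)]


# Toy example using the 3x3 utility matrix from the following slides and an
# illustrative (made-up) mass function.
U = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.2], [0.1, 0.2, 1.0]]
m = {frozenset({0}): 0.5, frozenset({0, 1}): 0.3, frozenset({0, 1, 2}): 0.2}
print(interval_dominance_set(m, U, 3))  # -> [0, 1]: the act "predict omega_3" is dominated
```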
Defining the utility of set-valued predictions

                      states of nature
acts                  ω_1      ω_2      ω_3
f_{ω_1}               1.0000   0.2000   0.1000
f_{ω_2}               0.2000   1.0000   0.2000
f_{ω_3}               0.1000   0.2000   1.0000
f_{ω_1,ω_2}           ?        ?        ?
f_{ω_1,ω_3}           ?        ?        ?
f_{ω_2,ω_3}           ?        ?        ?
f_{ω_1,ω_2,ω_3}       ?        ?        ?
Defining the utility of set-valued predictions
● Ordered Weighted Average (OWA) operator:
  $\hat{u}_{A,j} = F(\{u_{ij} \mid \omega_i \in A\}) = \sum_{k=1}^{|A|} w_k\, u_{A(k)j}$
● Tolerance degree of imprecision:
  $\mathrm{TOL}(w) = \sum_{k=1}^{|A|} \frac{|A|-k}{|A|-1}\, w_k$
● Weights obtained by maximizing the entropy:
  $\max_{w} \; \mathrm{ENT}(w) = -\sum_{k=1}^{|A|} w_k \log w_k \quad \text{s.t.} \quad \mathrm{TOL}(w) = \gamma, \quad \sum_{k=1}^{|A|} w_k = 1$
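As an illustration of this weight-selection step, the sketch below computes maximum-entropy OWA weights numerically with scipy. The function name owa_weights and the SLSQP-based solver are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize


def owa_weights(size, gamma):
    """Maximum-entropy OWA weights for a subset of `size` classes and tolerance
    degree gamma: maximize ENT(w) = -sum_k w_k log w_k subject to
    TOL(w) = sum_k (|A|-k)/(|A|-1) w_k = gamma and sum_k w_k = 1."""
    if size == 1:
        return np.array([1.0])
    k = np.arange(1, size + 1)
    tol_coef = (size - k) / (size - 1)

    def neg_entropy(w):
        w = np.clip(w, 1e-12, None)  # guard against log(0)
        return float(np.sum(w * np.log(w)))

    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        {"type": "eq", "fun": lambda w: float(tol_coef @ w) - gamma},
    ]
    res = minimize(neg_entropy, np.full(size, 1.0 / size),
                   bounds=[(0.0, 1.0)] * size, constraints=constraints,
                   method="SLSQP")
    return res.x


print(owa_weights(2, 0.8))  # ~[0.8, 0.2]: reproduces the pairwise value 0.84 = 0.8*1.0 + 0.2*0.2
print(owa_weights(3, 0.8))  # weights used for the full frame in the table below
```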
Defining the utility of set-valued predictions

Utility matrix extended by an OWA operator with γ = 0.8:

                      states of nature
acts                  ω_1      ω_2      ω_3
f_{ω_1}               1.0000   0.2000   0.1000
f_{ω_2}               0.2000   1.0000   0.2000
f_{ω_3}               0.1000   0.2000   1.0000
f_{ω_1,ω_2}           0.8400   0.8400   0.1800
f_{ω_1,ω_3}           0.8200   0.2000   0.8200
f_{ω_2,ω_3}           0.1800   0.8400   0.8400
f_{ω_1,ω_2,ω_3}       0.7373   0.7455   0.7373
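The table above can be reproduced by OWA-aggregating, for each subset A and each state of nature, the sorted utilities of the precise acts in A. The sketch below is illustrative and assumes the owa_weights helper from the previous sketch is available.

```python
from itertools import combinations

import numpy as np


def extend_utility_matrix(U, gamma, weight_fn):
    """Extend an n x n utility matrix to all non-empty subsets A of classes by
    OWA-aggregating, state by state, the utilities of the precise acts in A
    (values sorted in decreasing order, as the OWA operator requires)."""
    n = len(U)
    U = np.asarray(U, dtype=float)
    extended = {}
    for size in range(1, n + 1):
        w = weight_fn(size, gamma)
        for A in combinations(range(n), size):
            extended[A] = [float(np.dot(w, sorted(U[list(A), j], reverse=True)))
                           for j in range(n)]
    return extended


U = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.2], [0.1, 0.2, 1.0]]
ext = extend_utility_matrix(U, gamma=0.8, weight_fn=owa_weights)
for A, row in ext.items():
    print(A, [round(v, 4) for v in row])  # matches the extended table above
```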
Experimental Comparisons
● UCI and artificial Gaussian data sets
● Classification performance with varying γ
● Performance on noisy test sets
● Performance with increasing training set size
Conclusions
● Two approaches are contrasted:
  ❍ a partial preorder among precise assignments
  ❍ a complete preorder among partial assignments
● The utility of a set-valued prediction is defined via an OWA operator
● Experimental comparisons:
  ❍ set-valued predictions perform better than precise ones
  ❍ the most cautious rules should be preferred
Thank you!

Poster: Making Set-valued Predictions in Evidential Classification: A Comparison of Different Approaches
Liyao Ma, Thierry Denœux

Two families of set-valued decision strategies

Partial preorders among precise assignments
Patterns are assigned to one and only one of the n classes: F = {f_1, ..., f_n}, with lower and upper expected utilities
$\underline{E}_m(f_i) = \sum_{B \subseteq \Omega} m(B) \min_{\omega_j \in B} u_{ij}, \qquad \overline{E}_m(f_i) = \sum_{B \subseteq \Omega} m(B) \max_{\omega_j \in B} u_{ij}.$

decision criterion    preference relation
interval dominance    $f_i \succeq_{ID} f_j \iff \underline{E}_m(f_i) \geq \overline{E}_m(f_j)$
maximality            $f_i \succeq_{\max} f_j \iff \underline{E}_m(f_i - f_j) \geq 0$
weak dominance        $f_i \succeq_{WD} f_j \iff (\underline{E}_m(f_i) \geq \underline{E}_m(f_j)) \wedge (\overline{E}_m(f_i) \geq \overline{E}_m(f_j))$

Complete preorders among partial assignments
Patterns are assigned partially to a non-empty subset of Ω: F = {f_A, A ∈ 2^Ω \ {∅}}
- generalized maximin          $f_{A_i} \succeq_* f_{A_j} \iff \underline{E}_m(f_{A_i}) \geq \underline{E}_m(f_{A_j})$
- generalized maximax          $f_{A_i} \succeq^* f_{A_j} \iff \overline{E}_m(f_{A_i}) \geq \overline{E}_m(f_{A_j})$
- generalized Hurwicz          $f_{A_i} \succeq_\alpha f_{A_j} \iff E_{m,\alpha}(f_{A_i}) \geq E_{m,\alpha}(f_{A_j})$
- generalized OWA              $f_{A_i} \succeq_\beta f_{A_j} \iff E^{owa}_{m,\beta}(f_{A_i}) \geq E^{owa}_{m,\beta}(f_{A_j})$
- generalized minimax regret   $f_{A_i} \succeq_r f_{A_j} \iff R(f_{A_i}) \leq R(f_{A_j})$
- maximum expected utility     $f_{A_i} \succeq_m f_{A_j} \iff EU(f_{A_i}) \geq EU(f_{A_j})$
- pignistic criterion          $f_{A_i} \succeq_p f_{A_j} \iff E_p(f_{A_i}) \geq E_p(f_{A_j})$

Extending the utility matrix via an OWA operator
The extended utility matrix $\hat{U}_{(2^n-1) \times n}$ is crucial to both decision-making and performance evaluation. The utility of assigning an instance to a set A should intuitively be a function of the utilities of the precise assignments within A:
$\hat{u}_{A,j} = F(\{u_{ij} \mid \omega_i \in A\}) = \sum_{k=1}^{|A|} w_k\, u_{A(k)j}.$
Given the decision maker's tolerance degree of imprecision
$\mathrm{TOL}(w) = \sum_{k=1}^{|A|} \frac{|A|-k}{|A|-1}\, w_k = \gamma,$
the OWA weights are obtained by maximizing the entropy
$\mathrm{ENT}(w) = -\sum_{k=1}^{|A|} w_k \log w_k$
subject to $\mathrm{TOL}(w) = \gamma$ and $\sum_{k=1}^{|A|} w_k = 1$.
Example: the utility matrix extended by an OWA operator with γ = 0.8 (see the table in the slides above).

Evaluation of set-valued predictions
The classification performance is evaluated by the averaged utility over the test set T:
$\mathrm{Acc}(T) = \frac{1}{|T|} \sum_{i=1}^{|T|} \hat{u}_{F^*_i, i^*}.$

Experimental data
UCI Balance scale dataset and simulated Gaussian datasets.
[Figure: scatter plot of a simulated Gaussian dataset, attribute x vs. attribute y, three classes]

Experiments
Belief functions concerning the states of nature were generated by the DS theory-based neural network classifier.
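As a small illustration of the second family, the sketch below applies the generalized maximin criterion: it selects the partial assignment f_A with the largest lower expected utility computed from the extended utility matrix. The dictionary encodings and the toy mass function are assumptions, and `ext` refers to the extended matrix built in the earlier sketch.

```python
def generalized_maximin(mass, extended_utilities):
    """Select the partial assignment f_A with the largest lower expected utility
    E_lower(f_A) = sum_B m(B) min_{omega_j in B} u_hat_{A,j}."""
    def lower_expectation(row):
        return sum(m_B * min(row[j] for j in B) for B, m_B in mass.items())
    return max(extended_utilities, key=lambda A: lower_expectation(extended_utilities[A]))


# Toy mass function over three classes; `ext` is the extended utility matrix
# (gamma = 0.8) built in the previous sketch.
m = {frozenset({0}): 0.5, frozenset({0, 1}): 0.3, frozenset({0, 1, 2}): 0.2}
print(generalized_maximin(m, ext))  # with evidence this imprecise, the full set (0, 1, 2) is chosen
```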
Classification performances with varying γ (UCI Balance scale dataset)

Averaged utility:
        DC1     DC2     DC3     DC4     DC5     DC6     DC7     DC8     DC9
γ=0.5   0.9186  0.9188  0.9186  0.9186  0.9186  0.9186  0.9187  0.9187  0.9187
γ=0.6   0.9179  0.9184  0.9176  0.9179  0.9184  0.9176  0.9187  0.9188  0.9188
γ=0.7   0.9059  0.9064  0.9052  0.9059  0.9056  0.9054  0.9190  0.9190  0.9187
γ=0.8   0.9043  0.9032  0.9028  0.9043  0.9030  0.9024  0.9191  0.9191  0.9188
γ=0.9   0.9319  0.9325  0.9331  0.9319  0.9339  0.9339  0.9192  0.9192  0.9188
γ=1.0   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  0.9194  0.9194  0.9188

% of precise predictions:
        DC1      DC2      DC3      DC4      DC5      DC6      DC7     DC8     DC9
γ=0.5   100.00%  100.00%  100.00%  100.00%  100.00%  100.00%  97.44%  97.44%  99.97%
γ=0.6   88.96%   89.47%   88.96%   88.96%   89.18%   89.06%   97.44%  97.44%  99.97%
γ=0.7   80.10%   80.77%   80.06%   80.10%   80.22%   80.26%   97.44%  97.44%  99.97%
γ=0.8   69.70%   70.14%   69.63%   69.70%   69.82%   69.63%   97.44%  97.44%  99.97%
γ=0.9   57.02%   57.76%   57.12%   57.02%   57.38%   57.12%   97.44%  97.44%  99.97%
γ=1.0   0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    97.44%  97.44%  99.97%

Performances with noised test sets (Gaussian dataset)
[Figure: averaged utility and % of precise predictions as a function of the noise parameter, for the complete-preorder rules F1 (maximin/minimax regret, maximax, pignistic, Hurwicz, OWA) and the partial-preorder rules F2 (interval dominance, maximality, weak dominance)]

Performances with increasing training set size (Gaussian dataset)
[Figure: averaged utility and % of precise predictions as a function of the number of training instances, for the same F1 and F2 rules]

Conclusions
The set-valued predictions induced by a partial preorder turn into precise ones when the information becomes more precise. In contrast, the criteria based on a complete preorder can provide set-valued predictions even when uncertainty is quantified by probabilities. Set-valued predictions perform better than precise ones on complex data sets: therefore, the most cautious rules should be preferred in highly uncertain environments.