Cautious label-wise ranking with constraint satisfaction
Sébastien Destercke, Yonatan Carlos Carranza Alarcon
DA2PL 2018
Some announcements: SUM 2019
When: 16-18 December 2019
Where: Compiègne
What: (scalable) uncertainty management
How: papers (long/short/abstracts), but also tutorials/surveys of particular areas
Where is Compiègne?
Our approach in a nutshell
What? Cautious label ranking by rank-wise decomposition
How?
- Rank-wise decomposition: for each label, predict a set of ranks using imprecise probabilities
- Use a CSP to:
  - resolve inconsistencies
  - remove impossible assignments
Why?
- cautious (weak) information is more likely to be useful in structured settings
- few rank-wise approaches (except score-based ones) exist for this problem
Introduction and decomposition
Ranking data - preferences
To each instance x corresponds an ordering over possible labels.
Blog theme: a blog x can be about Politics ≻ Literature ≻ Movies ≻ ...
Customer preferences: a customer x may prefer White wine ≻ Red wine ≻ Beer ≻ ...
Classification problem/data
C = {c1, c2, c3}

X1      X2    X3      X4    Y
107.1   25    Blue    60    c3
-50     10    Red     40    c1
200.6   30    Blue    58    c2
107.1   5     Green   60    c4
...     ...   ...     ...   ...
Label ranking problem/data
W = {w1, w2, w3}

X1      X2    X3      X4    Y
107.1   25    Blue    60    w1 ≻ w3 ≻ w2
-50     10    Red     40    w2 ≻ w1 ≻ w3
200.6   30    Blue    58    w1 ≻ w2 ≻ w3
107.1   5     Green   60    w3 ≻ w1 ≻ w2
...     ...   ...     ...   ...

Potentially huge output space (K! with K labels)
→ a naive extension (one ranking = one class) is doomed to fail
One solution: rank-wise decomposition

D
X1      X2    X3      X4    Y
107.1   25    Blue    60    λ1 ≻ λ3 ≻ λ2
-50     10    Red     40    λ2 ≻ λ3 ≻ λ1
200.6   30    Blue    58    λ2 ≻ λ1 ≻ λ3
107.1   5     Green   33    λ1 ≻ λ2 ≻ λ3
...     ...   ...     ...   ...

D1                      D2                      D3
X1      X4    Y         X1      X4    Y         X1      X4    Y
107.1   60    1         107.1   60    3         107.1   60    2
-50     40    3         -50     40    1         -50     40    2
200.6   58    2         200.6   58    1         200.6   58    3
107.1   33    1         107.1   33    2         107.1   33    3
...     ...   ...       ...     ...   ...       ...     ...   ...

For each label, solve an ordinal regression problem.
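As a hedged illustration (not the authors' code), the decomposition itself is only a few lines of Python; the data encoding below is an assumption, with a ranking given as a list of label indices from most to least preferred.

```python
# A minimal sketch of the rank-wise decomposition (illustrative, not the authors' code).
# Each training example is (features, ranking); a ranking lists label indices from
# most to least preferred, e.g. [0, 2, 1] encodes λ1 ≻ λ3 ≻ λ2.

def decompose(dataset, n_labels):
    """Split a label-ranking dataset into n_labels ordinal-regression datasets.

    The k-th output dataset pairs the original features with the rank (1..K)
    that label k receives in each example's ranking.
    """
    per_label = [[] for _ in range(n_labels)]
    for features, ranking in dataset:
        for rank, label in enumerate(ranking, start=1):
            per_label[label].append((features, rank))
    return per_label

# Example matching the slide (only features X1 and X4 kept, as on the slide):
D = [
    ((107.1, 60), [0, 2, 1]),   # λ1 ≻ λ3 ≻ λ2
    ((-50, 40),   [1, 2, 0]),   # λ2 ≻ λ3 ≻ λ1
    ((200.6, 58), [1, 0, 2]),   # λ2 ≻ λ1 ≻ λ3
    ((107.1, 33), [0, 1, 2]),   # λ1 ≻ λ2 ≻ λ3
]
D1, D2, D3 = decompose(D, n_labels=3)
# D1 == [((107.1, 60), 1), ((-50, 40), 3), ((200.6, 58), 2), ((107.1, 33), 1)]
```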
Predicting candidate ranks
Learning with IP: a crash course
Classical case: input space X and output space Y
- a set D = {(x1, y1), ..., (xn, yn)} of data
- given x, estimate P(y|x) using D
- P(y|x) = information about y when observing x

However, the estimate P̂(y|x) of P(y|x) can be pretty bad if
- data are noisy, missing, imprecise
- the estimation is based on little data

Idea: replace the single estimate P̂(y|x) by a set P of estimates.
[Figure: two binary-classification scatter plots over features X1 and X2, each with an unlabelled instance "?"]
Lack of information: P(one class | ?) ∈ [0, 0.7], P(the other | ?) ∈ [0.3, 1]
Ambiguity: P(one class | ?) ∈ [0.49, 0.51], P(the other | ?) ∈ [0.49, 0.51]
Decision with probability sets
If $\ell_\omega : \mathcal{Y} \to \mathbb{R}$ is the loss function of choice $\omega \in \mathcal{Y}$, then

$\omega \succeq \omega' \iff \underline{E}(\ell_{\omega'} - \ell_\omega) := \inf_{P \in \mathcal{P}(y|x)} E_P(\ell_{\omega'} - \ell_\omega) \geq 0$
$\qquad\;\;\; \iff \inf_{P \in \mathcal{P}(y|x)} \sum_{y \in \mathcal{Y}} P(y|x)\,(\ell_{\omega'}(y) - \ell_\omega(y)) \geq 0$

⇒ if information is insufficient, we can have $\omega \not\succeq \omega'$ and $\omega' \not\succeq \omega$,
  that is, $\underline{E}(\ell_{\omega'} - \ell_\omega) < 0$ and $\underline{E}(\ell_\omega - \ell_{\omega'}) < 0$
⇒ possibly optimal decisions = maximal element(s) of $\succeq$
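A minimal sketch of this maximality rule, under the assumption that the credal set P(y|x) is approximated by a finite list of probability vectors (with a convex set one would take the infimum over its extreme points or solve a small linear program instead); the toy numbers are made up.

```python
# Maximality under a set of distributions: a sketch assuming a finite
# approximation of the credal set by probability vectors over Y.

def lower_expectation(diff, prob_set):
    """inf over P in prob_set of E_P[diff]; diff is a list indexed by outcomes y."""
    return min(sum(p * d for p, d in zip(P, diff)) for P in prob_set)

def maximal_choices(losses, prob_set):
    """losses maps each choice omega to its loss vector [l_omega(y) for y in Y].
    omega is discarded if some omega2 strictly dominates it, i.e.
    inf_P E_P[l_omega - l_omega2] > 0 (omega2 preferred under every P)."""
    maximal = []
    for w, lw in losses.items():
        dominated = any(
            lower_expectation([a - b for a, b in zip(lw, l2)], prob_set) > 0
            for w2, l2 in losses.items() if w2 != w
        )
        if not dominated:
            maximal.append(w)
    return maximal

# Toy example: 3 possible ranks, L1 loss, credal set approximated by two distributions.
prob_set = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
losses = {1: [0, 1, 2], 2: [1, 0, 1], 3: [2, 1, 0]}   # loss of predicting rank 1, 2, 3
print(maximal_choices(losses, prob_set))              # -> [1, 2]: a set-valued prediction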
Our choice of ℓ and P
What?
- ℓ = L1 norm between ranks: the loss of predicting rank j when k is the true rank is ℓ_j(k) = |j − k|
- P described by lower/upper cumulative distributions F̲, F̄
Why?
- the prediction is guaranteed to be an "interval" of ranks (dedicated CSP models)
- it corresponds to the set of possible medians (very easy to get)
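A sketch of how such a rank interval can be read off the cumulative bounds, assuming the prediction is the set of possible medians of the distributions lying between F̲ and F̄; the tie-handling convention and the example numbers below are mine, not the paper's.

```python
# Set of possible medians from lower/upper cumulative distributions (a sketch,
# using one common convention: rank j is kept if some CDF F with F_low <= F <= F_up
# has its median at j, i.e. F_up[j] >= 0.5 and F_low[j-1] <= 0.5).

def possible_median_ranks(F_low, F_up):
    """F_low, F_up: cumulative probabilities for ranks 1..K, F_low <= F_up pointwise."""
    ranks = []
    for j in range(len(F_up)):
        below_ok = (j == 0) or (F_low[j - 1] <= 0.5)   # mass below rank j can stay <= 0.5
        above_ok = F_up[j] >= 0.5                       # mass up to rank j can reach >= 0.5
        if below_ok and above_ok:
            ranks.append(j + 1)
    return ranks

# Hypothetical example (not the slide's numbers):
F_up  = [0.20, 0.60, 0.90, 1.0, 1.0]
F_low = [0.05, 0.30, 0.70, 0.9, 1.0]
print(possible_median_ranks(F_low, F_up))   # -> [2, 3]
```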
An example of rank prediction

Rank j   1      2      3      4      5
F̄_j     0.15   0.55   0.7    0.95   1
F̲_j     0.1    0.3    0.45   0.8    1

[Figure: the lower and upper cumulative distributions plotted over ranks 1-5]

Predicted rank for the label: {2, 3}
Making a final cautious prediction
Inconsistency and assignment reductions

Inconsistency
Consider four labels λ1, λ2, λ3, λ4; the predicted possible ranks
R̂1 = {1,3}, R̂2 = {1,3}, R̂3 = {1,3}, R̂4 = {2,4}
are inconsistent: λ1, λ2, λ3 should all take different ranks, yet only two values are available to them.

Removal of impossible solutions
Consider the predictions
R̂1 = {1,2}, R̂2 = {1,2,3}, R̂3 = {2}, R̂4 = {1,2,3,4}.
As λ3 has to take rank 2, λ1 has to take rank 1, ... until we get
R̂'1 = {1}, R̂'2 = {3}, R̂'3 = {2}, R̂'4 = {4}
Dealing with the issue: CSP modelling
- A possible assignment: R̂_i ⊆ {1, ..., K}
- We need to check whether each label can take a different rank
- This is exactly what the "all different" constraint does in CSP
- So we can just apply standard libraries
- Bonus: if all R̂_i are intervals, efficient (polynomial) algorithms exist
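For small K, both the consistency check and the pruning of impossible ranks can be sketched by brute force over permutations (an illustration only; in practice one would rely on a CSP solver's all-different propagation or the polynomial interval algorithms mentioned above):

```python
from itertools import permutations

def prune_rank_sets(rank_sets):
    """rank_sets[i] is the set of candidate ranks for label i+1.
    Returns, for each label, the ranks occurring in at least one assignment
    where all labels get different ranks; all-empty output signals inconsistency.
    Brute force over the K! permutations, fine only for small K."""
    K = len(rank_sets)
    kept = [set() for _ in range(K)]
    for perm in permutations(range(1, K + 1)):
        if all(perm[i] in rank_sets[i] for i in range(K)):
            for i, r in enumerate(perm):
                kept[i].add(r)
    return kept

# The two examples from the previous slide:
print(prune_rank_sets([{1, 3}, {1, 3}, {1, 3}, {2, 4}]))
# -> [set(), set(), set(), set()]  (inconsistent)
print(prune_rank_sets([{1, 2}, {1, 2, 3}, {2}, {1, 2, 3, 4}]))
# -> [{1}, {3}, {2}, {4}]
```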
Experiments
Setting

Material and method
- Classification and regression data sets turned into ranking data
- Binary decomposition + naive imprecise classifier

Measuring result quality
- Completeness: $CP(\hat{R}) = \frac{k^2 - \sum_{i=1}^{k} |\hat{R}_i|}{k^2 - k}$
  (maximal when a single ranking remains possible, minimal when all rankings remain possible)
- Correctness: $CR(\hat{R}) = 1 - \frac{\sum_{i=1}^{k} \min_{\hat{r}_i \in \hat{R}_i} |\hat{r}_i - r_i|}{0.5\, k^2}$
  (equivalent to the Spearman footrule when a single ranking is predicted)
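A direct transcription of these two measures as a sketch (variable names are mine):

```python
def completeness(R_hat):
    """CP: 1 when every label gets a single rank, 0 when every rank set is full."""
    k = len(R_hat)
    return (k**2 - sum(len(Ri) for Ri in R_hat)) / (k**2 - k)

def correctness(R_hat, true_ranks):
    """CR: 1 minus the normalised best-case L1 distance between the predicted
    rank sets and the true ranks; reduces to 1 - (Spearman footrule / 0.5 k^2)
    when a single ranking is predicted."""
    k = len(R_hat)
    dist = sum(min(abs(r_hat - r) for r_hat in Ri) for Ri, r in zip(R_hat, true_ranks))
    return 1 - dist / (0.5 * k**2)

# Example with the pruned prediction from the CSP example and true ranking λ1 ≻ λ3 ≻ λ2 ≻ λ4:
R_hat = [{1}, {3}, {2}, {4}]
print(completeness(R_hat), correctness(R_hat, [1, 3, 2, 4]))   # -> 1.0 1.0
```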
An example of results
[Figure: two plots of correctness vs. completeness ("Discretization 6 intervals"), with points labelled by parameter values from 0.1 to 10.1]
Why rank-wise approaches?
Yes, why?
- Different expressiveness when it comes to representing partial predictions: e.g., the set-valued prediction {λ1 ≻ λ2 ≻ λ3, λ3 ≻ λ2 ≻ λ1} over three labels is perfectly representable by imprecise ranks, but not through pairwise information or partial orders (i.e., interval-valued scores)
- It is not (entirely) clear how to make score-based methods imprecise (IP-SVM?), and their outputs would still need to be turned into imprecise ranks