  1. SUPERSET LEARNING AND DATA IMPRECISIATION. Eyke Hüllermeier, Intelligent Systems Group, Department of Computer Science, University of Paderborn, Germany (eyke@upb.de). TFML 2017, Krakow, 15-FEB-2017.

  2. OUTLINE. PART 1: Superset learning. What it is about ... PART 2: Optimistic loss minimization. A general approach to superset learning ... PART 3: Data imprecisiation. Using superset learning for weighted learning ...

  3. SUPERSET LEARNING ... is a specific type of weakly supervised learning, studied under different names in machine learning: learning from partial labels, multiple label learning, learning from ambiguously labeled examples, ... It is also connected to learning from coarse data in statistics (Rubin, 1976; Heitjan and Rubin, 1991), missing values, and data augmentation (Tanner and Wong, 2012).

  4. SUPERSET LEARNING. Consider a standard setting of supervised learning with instance space $\mathcal{X}$, output space $\mathcal{Y}$, and hypothesis space $\mathcal{H}$. Output values $y_n \in \mathcal{Y}$ associated with training instances $x_n$, $n = 1, \ldots, N$, are not necessarily observed precisely but only characterised in terms of supersets $Y_n \ni y_n$. The set of imprecise/ambiguous/coarse observations is denoted $\mathcal{O} = \big( (x_1, Y_1), \ldots, (x_N, Y_N) \big)$. An instantiation of $\mathcal{O}$, denoted $\mathcal{D}$, is obtained by replacing each $Y_n$ with a candidate $y_n \in Y_n$.
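
As a concrete illustration of this data model, here is a minimal Python sketch (the names `SupersetSample` and `instantiations` are mine, not from the talk): each coarse observation pairs an instance with a candidate set, and an instantiation picks one candidate per observation.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class SupersetSample:
    """One coarse observation (x_n, Y_n): the true label y_n lies somewhere in Y_n."""
    x: Tuple[float, ...]   # feature vector
    Y: FrozenSet[str]      # candidate (superset) label set

def instantiations(data: List[SupersetSample]):
    """Yield every precise instantiation D of the coarse sample O,
    i.e. every way of picking one candidate y_n from each Y_n."""
    for choice in product(*(sorted(s.Y) for s in data)):
        yield [(s.x, y) for s, y in zip(data, choice)]

# Example: two coarse observations with 2 and 3 candidates give 6 instantiations.
O = [SupersetSample((0.2, 1.1), frozenset({"a", "b"})),
     SupersetSample((1.5, 0.3), frozenset({"a", "b", "c"}))]
print(sum(1 for _ in instantiations(O)))   # 6
```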

  5. EXAMPLE: CLASSIFICATION. [Figure: instances with set-valued class labels.]

  6. EXAMPLE: CLASSIFICATION. [Figure: one of many instantiations.]

  7. EXAMPLE: REGRESSION. [Figure: one of infinitely many instantiations.]

  8. DATA DISAMBIGUATION. How to learn from (super)set-valued data?

  9. DATA DISAMBIGUATION. [Figure.]

  10. DATA DISAMBIGUATION. [Figure.]

  11. DATA DISAMBIGUATION. [Figure.]

  12. DATA DISAMBIGUATION. [Figure.]

  13. DATA DISAMBIGUATION. [Figure.]

  14. DATA DISAMBIGUATION. MORE PLAUSIBLE: a plausible instantiation that can be fitted reasonably well with a LINEAR model! LESS PLAUSIBLE: a less plausible instantiation, because there is no LINEAR model with a good fit!

  15. DATA DISAMBIGUATION. PLAUSIBLE and PLAUSIBLE: two plausible instantiations, each of which can be fitted quite well with a QUADRATIC model! It all depends on how you look at the data!

  16. DATA DISAMBIGUATION. [Figure: two-class scatter plot with set-valued labels.] Assume both class distributions to be Gaussian.

  17. DATA DISAMBIGUATION. [Figure: a plausible instantiation, separated by a quadratic discriminant.] Assume both class distributions to be Gaussian.

  18. DATA DISAMBIGUATION. [Figure: an implausible instantiation.] Assume both class distributions to be Gaussian.

  19. DATA DISAMBIGUATION. Model identification and data disambiguation should be performed simultaneously: the data informs the model (identification), and the model informs the data (disambiguation). This is quite natural from a Bayesian perspective: $P(h, \mathcal{D}) = P(h)\, P(\mathcal{D} \mid h) = P(\mathcal{D})\, P(h \mid \mathcal{D})$.

  20. OUTLINE. PART 1: Superset learning. PART 2: Optimistic loss minimization. PART 3: Data imprecisiation.

  21. MAXIMUM LIKELIHOOD ESTIMATION. Likelihood of a model $h \in \mathcal{H}$: $\ell(h) = P(\mathcal{O}, \mathcal{D} \mid h) = P(\mathcal{D} \mid h)\, P(\mathcal{O} \mid \mathcal{D}, h) = P(\mathcal{D} \mid h)\, P(\mathcal{O} \mid \mathcal{D})$. The imprecise observation depends only on the true data, not on the model. [Diagram: the MODEL generates precise DATA; coarsening/imprecisiation turns it into imprecise DATA.]

  22. SUPERSET ASSUMPTION. The imprecise data is a superset of the precise data; no other assumption is made. [Diagram: the MODEL generates precise DATA; ambiguation/imprecisiation turns it into imprecise DATA.]

  23. GENERALIZED ERM. We derive a principle of generalized empirical risk minimization with the empirical risk $R_{\mathrm{emp}}(h) = \frac{1}{N} \sum_{n=1}^{N} L^{*}\big(Y_n, h(x_n)\big)$ and the optimistic superset loss (OSL) function $L^{*}(Y, \hat{y}) = \min\big\{ L(y, \hat{y}) \mid y \in Y \big\}$, which measures how well the (precise) model fits the imprecise data.
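
The OSL construction is easy to state in code. The following Python sketch (the names `osl`, `generalized_emp_risk`, and the example data are mine, not from the talk) plugs an arbitrary base loss into the minimum over candidates and averages the result over the coarse sample; the interval special case for squared-error regression is included with its closed form.

```python
from typing import Callable, Iterable, Sequence, Tuple

def osl(loss: Callable, candidates: Iterable, y_hat) -> float:
    """Optimistic superset loss: L*(Y, y_hat) = min over y in Y of L(y, y_hat)."""
    return min(loss(y, y_hat) for y in candidates)

def generalized_emp_risk(loss: Callable, model: Callable,
                         coarse_data: Sequence[Tuple[object, Iterable]]) -> float:
    """Generalized empirical risk: R_emp(h) = (1/N) * sum_n L*(Y_n, h(x_n))."""
    return sum(osl(loss, Y_n, model(x_n)) for x_n, Y_n in coarse_data) / len(coarse_data)

# Two special cases of the base loss L:
zero_one = lambda y, y_hat: float(y != y_hat)   # classification with candidate label sets
squared  = lambda y, y_hat: (y - y_hat) ** 2    # regression with finite candidate sets

# For interval-valued regression data [lo, hi], the OSL of the squared loss has a
# closed form: the squared distance of the prediction y_hat to the interval.
interval_osl_squared = lambda lo, hi, y_hat: max(lo - y_hat, 0.0, y_hat - hi) ** 2

# Tiny usage example on hypothetical data: a constant model that always predicts class 1.
data = [((1.0,), {0, 1}), ((2.0,), {1})]
print(generalized_emp_risk(zero_one, lambda x: 1, data))   # 0.0: class 1 is in both candidate sets
```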

  24. SPECIAL CASES.

  25. GENERALIZATION TO FUZZY DATA. [Figure: an interval versus a fuzzy interval, both with membership degree 1 on their core.]

  26. GENERALIZATION TO FUZZY DATA. [Figure: the LOSS evaluated on an α-cut of the fuzzy observation.]

  27. GENERALIZATION TO FUZZY DATA. LOSS: $L^{**}(Y, \hat{y}) = \int_{0}^{1} L^{*}\big( [Y]_{\alpha}, \hat{y} \big)\, d\alpha$. RISK: $R_{\mathrm{emp}}(h) = \frac{1}{N} \sum_{n=1}^{N} L^{**}\big(Y_n, h(x_n)\big)$.
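
A hedged numerical sketch of the α-cut integral: for a trapezoidal fuzzy observation and a convex base loss, each α-cut is an interval, the interval OSL is the loss at the projection of the prediction onto that interval, and $L^{**}$ is approximated by averaging over an α grid. All names (`alpha_cut_trapezoid`, `interval_osl`, `fuzzy_osl`) are illustrative, not from the talk.

```python
import numpy as np

def alpha_cut_trapezoid(a, b, c, d, alpha):
    """alpha-cut [Y]_alpha of a trapezoidal fuzzy interval with support [a, d]
    and core [b, c]: the interval shrinks towards the core as alpha grows."""
    return a + alpha * (b - a), d - alpha * (d - c)

def interval_osl(loss, lo, hi, y_hat):
    """L*([lo, hi], y_hat) = min over y in [lo, hi] of loss(y, y_hat).
    For a convex loss, the minimum is attained at the projection of y_hat onto the interval."""
    y_star = min(max(y_hat, lo), hi)
    return loss(y_star, y_hat)

def fuzzy_osl(loss, a, b, c, d, y_hat, n_alpha=200):
    """L**(Y, y_hat) = integral over alpha in (0, 1] of L*([Y]_alpha, y_hat) d(alpha),
    approximated by averaging over a midpoint grid of alpha values."""
    alphas = (np.arange(n_alpha) + 0.5) / n_alpha
    vals = [interval_osl(loss, *alpha_cut_trapezoid(a, b, c, d, al), y_hat)
            for al in alphas]
    return float(np.mean(vals))

abs_loss = lambda y, y_hat: abs(y - y_hat)
print(fuzzy_osl(abs_loss, a=1.0, b=2.0, c=3.0, d=4.0, y_hat=5.0))  # ~1.5
```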

  28. GENERALIZATION TO FUZZY DATA. $L^{**}(Y, \hat{y})$ → Huber loss!

  29. GENERALIZATION TO FUZZY DATA. → (generalized) Huber loss!
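
As a sanity check of this Huber connection (my own calculation, offered tentatively): with the absolute-error base loss and a symmetric triangular fuzzy observation with core y0 and spread delta, the α-cut integral evaluates numerically to the classical Huber loss, quadratic within delta of the core and linear outside. All names below are mine.

```python
import numpy as np

def huber(r, delta):
    """Classical Huber loss of a residual r with threshold delta."""
    r = abs(r)
    return r ** 2 / (2 * delta) if r <= delta else r - delta / 2

def triangular_fuzzy_osl_abs(y0, delta, y_hat, n_alpha=2000):
    """alpha-cut integral of the optimistic absolute-error loss for a symmetric
    triangular fuzzy observation with core y0 and support [y0 - delta, y0 + delta]."""
    alphas = (np.arange(n_alpha) + 0.5) / n_alpha
    lo = y0 - (1 - alphas) * delta
    hi = y0 + (1 - alphas) * delta
    dist = np.maximum(lo - y_hat, 0.0) + np.maximum(y_hat - hi, 0.0)
    return float(np.mean(dist))

y0, delta = 2.0, 1.5
for y_hat in (-1.0, 1.0, 2.0, 3.0, 5.0):
    print(f"{y_hat:5.1f}  alpha-cut integral {triangular_fuzzy_osl_abs(y0, delta, y_hat):6.3f}"
          f"  Huber {huber(y_hat - y0, delta):6.3f}")
```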

  30. STRUCTURED OUTPUT PREDICTION. Superset learning naturally applies to learning problems with structured outputs, which are often only partially specified and can then be associated with the set of all consistent completions.

  31. LABEL RANKING ... is the problem of learning a model that maps instances to TOTAL ORDERS over a fixed set of alternatives/labels: A, B, C, D.

  32. LABEL RANKING ... is the problem of learning a model that maps instances to TOTAL ORDERS over a fixed set of alternatives/labels: for example, an instance described by the feature vector (0, 37, 46, 325, 1, 0) is mapped to the order A ≻ D ≻ C ≻ B (... likes more ... reads more ... recommends more ...).

  33. LABEL RANKING ... is the problem of learning a model that maps instances to TOTAL ORDERS over a fixed set of alternatives/labels: for the instance (0, 37, 46, 325, 1, 0), only the preference A ≻ C is observed. Training data is typically incomplete!

  34. LABEL RANKING ... is the problem of learning a model that maps instances to TOTAL ORDERS over a fixed set of alternatives/labels: the incomplete observation for (0, 37, 46, 325, 1, 0) is associated with its set of linear extensions. Training data is typically incomplete!

  35. LABEL RANKING LOSSES. KENDALL, SPEARMAN.
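
For small label sets, the OSL for label ranking can be evaluated by brute force: an incomplete observation is identified with its set of linear extensions, and the optimistic loss is the smallest Kendall distance between the prediction and any consistent completion. The sketch below is illustrative only (it enumerates all permutations, so it scales exponentially in the number of labels) and is not the label ranker used in the talk's experiments; all names are mine.

```python
from itertools import combinations, permutations

def kendall_distance(pi, sigma):
    """Number of label pairs ordered differently by the two rankings
    (rankings are tuples of labels, most preferred first)."""
    pos_pi = {lab: i for i, lab in enumerate(pi)}
    pos_sig = {lab: i for i, lab in enumerate(sigma)}
    return sum((pos_pi[a] < pos_pi[b]) != (pos_sig[a] < pos_sig[b])
               for a, b in combinations(pi, 2))

def linear_extensions(labels, partial_order):
    """All total orders over `labels` consistent with the observed
    pairwise preferences in `partial_order` (a set of (better, worse) pairs)."""
    for perm in permutations(labels):
        pos = {lab: i for i, lab in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in partial_order):
            yield perm

def osl_ranking(labels, partial_order, predicted):
    """Optimistic superset loss for label ranking: the smallest Kendall distance
    between the prediction and any consistent completion of the observation."""
    return min(kendall_distance(ext, predicted)
               for ext in linear_extensions(labels, partial_order))

labels = ("A", "B", "C", "D")
observed = {("A", "C")}                                      # only "A preferred to C" was observed
print(osl_ranking(labels, observed, ("A", "D", "C", "B")))   # 0: the prediction is consistent
```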

  36. EXPERIMENTAL STUDIES. Cheng and H. (2015) compare an approach to label ranking based on superset learning with a state-of-the-art label ranker based on the Plackett-Luce model (PL). Two missing-label scenarios are considered: missing at random and top-rank. General conclusion: the superset approach is more robust toward incompleteness. [Figure: ranking performance as a function of the fraction of missing labels (0 to 60%) on the authorship, glass, iris, pendigits, segment, vowel, and wine datasets.]

  37. OUTLINE. PART 1: Superset learning. PART 2: Optimistic loss minimization. PART 3: Data imprecisiation.

  38. DATA IMPRECISIATION. So far: observations are imprecise/incomplete, and we have to deal with that! Now: deliberately turn precise into imprecise data, so as to modulate the influence of an observation on the learning process!

  39. EXAMPLE WEIGHING. [Figure.]

  40. EXAMPLE WEIGHING. [Figure.]

  41. EXAMPLE WEIGHING. We suggest an alternative way of weighing examples, namely via „data imprecisiation“ ... [Figure: full support for a precise observation.]

  42. EXAMPLE WEIGHING. [Figure.]

  43. EXAMPLE WEIGHING. Weighing through „imprecisiation“. [Figure.]

  44. EXAMPLE WEIGHING. [Figure: OSL versus weighted loss.] Different ways of (individually) discounting the loss function. In (Lu and H., 2015), we empirically compared standard locally weighted linear regression with this approach and essentially found no difference.

  45. EXAMPLE WEIGHING. We suggest an alternative way of weighing examples, namely via „data imprecisiation“ ... [Figure: certainly positive versus less certainly positive.]

  46. FUZZY MARGIN LOSSES. [Figure: GENERALIZED HINGE LOSS for weights w = 1, 3/4, 1/2, 1/4, 0.]
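
The following sketch encodes one plausible reading of these fuzzy margin losses (my reconstruction, not verbatim from the talk): a positive example down-weighted to certainty w keeps full membership for the label +1 and gets membership 1 - w for the label -1, and the α-cut integral of the hinge OSL then mixes the ordinary hinge loss with the minimum of the two hinges. Function names are mine.

```python
import numpy as np

def hinge(y, s):
    """Standard hinge loss for a label y in {-1, +1} and score s."""
    return max(0.0, 1.0 - y * s)

def fuzzy_margin_loss(s, w):
    """Assumed fuzzy margin (generalized hinge) loss for a positive example with
    certainty w in [0, 1]: the alpha-cuts of the fuzzy label set are {+1} for
    alpha > 1 - w and {-1, +1} for alpha <= 1 - w, so the alpha-cut integral of
    the hinge OSL is
        w * hinge(+1, s) + (1 - w) * min(hinge(+1, s), hinge(-1, s)).
    For w = 1 this is the ordinary hinge loss; for w = 0 it is the 'hat' loss."""
    return w * hinge(+1, s) + (1 - w) * min(hinge(+1, s), hinge(-1, s))

for w in (1.0, 0.75, 0.5, 0.25, 0.0):
    vals = [round(fuzzy_margin_loss(s, w), 2) for s in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    print(f"w={w:4.2f}: {vals}")
```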

  47. FUZZY MARGIN LOSSES. [Figure: OSL versus weighted loss for weights w = 1, 3/4, 1/2, 1/4, 0.] Different ways of (individually) discounting the loss function.

  48. THE HAT LOSS. [Figure.]

  49. DATA DISAMBIGUATION. [Figure.]

  50. DATA DISAMBIGUATION. [Figure.]

  51. DATA DISAMBIGUATION. [Figure.]

  52. EXPERIMENTS. Robust loss minimization techniques: Robust truncated-hinge-loss support vector machines (RSVM) train SVMs with a truncated version of the hinge loss in order to be more robust toward outliers and noisy data (Wu and Liu, 2007). The one-step weighted SVM (OWSVM) first trains a standard SVM; it then weighs each training example based on its distance to the decision boundary and retrains using the weighted hinge loss (Wu and Liu, 2013). Our approach (FLSVM) is the same as OWSVM, except for the weighted loss: instead of a simple weighting of the hinge loss, we use the optimistic fuzzy loss. The resulting non-convex optimization problem is solved by the concave-convex procedure (Yuille and Rangarajan, 2002).
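
To make the last step concrete, here is a generic concave-convex procedure sketch for the fuzzy margin loss above, under my own difference-of-convex decomposition (min(a, b) = a + b - max(a, b)); the solver choice, the L2 regularizer, and all names (`cccp_fit`, `surrogate`) are assumptions for illustration, not the talk's actual FLSVM implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed decomposition: the per-example fuzzy margin loss is f(s) = g(s) - h(s) with
#   g(s) = hinge(y, s) + (1 - w) * hinge(-y, s)        (convex)
#   h(s) = (1 - w) * max(hinge(y, s), hinge(-y, s))    (convex, subtracted)
# CCCP repeatedly linearizes h at the current iterate and minimizes the convex surrogate.

hinge = lambda y, s: np.maximum(0.0, 1.0 - y * s)

def surrogate(beta, X, y, w, slopes):
    """Convex part minus the linearization of the subtracted convex part, plus L2 regularizer."""
    s = X @ beta
    g = hinge(y, s) + (1 - w) * hinge(-y, s)
    return np.sum(g - slopes * s) + 0.5 * np.sum(beta ** 2)

def cccp_fit(X, y, w, n_iter=10):
    """Fit a linear scoring model by CCCP on the assumed fuzzy margin loss."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        s = X @ beta
        # Subgradient (w.r.t. the score s) of h(s) at the current iterate.
        winner_is_true = hinge(y, s) >= hinge(-y, s)
        label = np.where(winner_is_true, y, -y)
        active = hinge(label, s) > 0
        slopes = (1 - w) * np.where(active, -label, 0.0)
        beta = minimize(surrogate, beta, args=(X, y, w, slopes), method="Powell").x
    return beta

# Tiny smoke test on synthetic, assumed data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] >= 0.0, 1.0, -1.0)
w = np.full(40, 0.8)          # all examples kept at certainty 0.8
print(cccp_fit(X, y, w))
```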

  53. EXPERIMENTAL RESULTS. [Table of results.]

  54. THEORETICAL FOUNDATIONS. Under what conditions is (successful) learning in the superset setting actually possible?

  55. THEORETICAL FOUNDATIONS. [Figure.]

  56. THEORETICAL FOUNDATIONS. [Figure: systematic imprecisiation.]

  57. THEORETICAL FOUNDATIONS. [Figure: non-systematic imprecisiation.]

  58. THEORETICAL FOUNDATIONS. Liu and Dietterich (2014) consider the ambiguity degree, which is defined as the largest probability that a particular distractor label co-occurs with the true label in multi-class classification: $\sup \big\{ P_{Y \sim D_{s(x,y)}}(\ell \in Y) \;\big|\; (x, y) \in \mathcal{X} \times \mathcal{Y},\ \ell \in \mathcal{Y},\ p(x, y) > 0,\ \ell \neq y \big\}$.
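
A simple plug-in estimate of the ambiguity degree can be computed from a finite superset-labeled sample, assuming repeated draws of the candidate set for the same (x, y); the helper below (`empirical_ambiguity_degree`, a name of my own) is only a rough illustration and says nothing about the population quantity for unseen instances.

```python
from collections import defaultdict

def empirical_ambiguity_degree(samples):
    """Plug-in estimate: for every observed (x, y) and every distractor label l != y,
    compute the fraction of candidate sets drawn for (x, y) that contain l, and
    return the largest such fraction. `samples` is a list of (x, y, Y) triples
    with hashable x and true label y contained in the candidate set Y."""
    counts = defaultdict(int)          # (x, y) -> number of draws
    co_occ = defaultdict(int)          # (x, y, l) -> draws whose Y contains the distractor l
    for x, y, Y in samples:
        counts[(x, y)] += 1
        for l in Y:
            if l != y:
                co_occ[(x, y, l)] += 1
    return max((co_occ[(x, y, l)] / counts[(x, y)] for (x, y, l) in co_occ),
               default=0.0)

samples = [("x1", "a", {"a", "b"}), ("x1", "a", {"a", "b"}),
           ("x1", "a", {"a", "c"}), ("x2", "b", {"b"})]
print(empirical_ambiguity_degree(samples))   # 2/3: distractor "b" for instance ("x1", "a")
```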
