MULTI-TARGET PREDICTION: CHALLENGES AND APPLICATIONS
Grigorios Tsoumakas
School of Informatics, Aristotle University of Thessaloniki
with: N. Bassiliades, W. Groves, M. Laliotis, N. Markantonatos, F. Markatopoulou, C. & E. Papagiannopoulou, Y. Papanikolaou, E. Spyromitros-Xioufis, I. Tsamardinos, I. Vlahavas, A. Vrekou
MULTI-TARGET PREDICTION
Tasks: Multi-label learning; Multi-target regression; Label ranking; Multi-task learning; Collaborative filtering; Dyadic prediction
Challenges: Exploiting dependencies among the targets; Scaling to extreme sizes of output spaces; Dealing with class imbalance; Target heterogeneity
Applications: Multimedia annotation (video, image, audio, text); Gene function prediction; Ecological modelling; Demand forecasting; Ensemble pruning
Source: Willem Waegeman, Krzysztof Dembczynski, Eyke Hüllermeier, Multi-Target Prediction, Tutorial @ ICML 2013
OUTLINE
Exploiting dependencies among the targets
1. Deterministic label relationships
2. From multi-label classification to multi-target regression
Applications
3. Semantic indexing of biomedical literature
4. Multi-label classification for instance-based ensemble pruning
OUTLINE
1. Deterministic label relationships
   - Papagiannopoulou, C., Tsoumakas, G., Tsamardinos, I. (2015). Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15).
   - Papagiannopoulou, E., Tsoumakas, G., Bassiliades, N. (2015). On Discovering Relationships in Multi-Label Learning via Linked Open Data. In Proceedings of the Know@LOD Workshop of ESWC.
2. From multi-label classification to multi-target regression
3. Semantic indexing of biomedical literature
4. Multi-label classification for instance-based ensemble pruning
MULTI-LABEL LEARNING
p input variables X1, X2, …, Xp and q binary output variables Y1, Y2, …, Yq

                        X1     X2   …   Xp   |  Y1  Y2  …  Yq
  training examples     0.12   1    …   12   |  0   1   …  1
                        2.34   9    …   -5   |  1   1   …  0
                        1.22   3    …   40   |  1   0   …  1
  unknown instances     2.18   2    …   8    |  ?   ?   …  ?
                        1.76   7    …   23   |  ?   ?   …  ?
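One common way to learn this mapping, and the one used later in the empirical study, is Binary Relevance: train one independent binary model per output variable Yj. The sketch below is a minimal illustration under stated assumptions: it keeps only the visible columns X1, X2, Xp and Y1, Y2, Yq from the table (the "…" columns are dropped), and it uses a hypothetical `NearestCentroid` scorer as a stand-in for the Random Forest base learner; `fit_binary_relevance` and `predict_binary_relevance` are illustrative names, not a library API.

```python
import math

class NearestCentroid:
    """Hypothetical minimal base learner: scores by distance to the two class centroids."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        # Assumes both classes occur in the training data for this label.
        self.pos = [sum(c) / len(pos) for c in zip(*pos)]
        self.neg = [sum(c) / len(neg) for c in zip(*neg)]
        return self

    def predict_proba(self, x):
        d_pos = math.dist(x, self.pos)
        d_neg = math.dist(x, self.neg)
        if d_pos + d_neg == 0:
            return 0.5
        # Closer to the positive centroid -> score closer to 1.
        return d_neg / (d_pos + d_neg)

def fit_binary_relevance(X, Y, make_learner):
    """Binary Relevance: one independent binary model per output column of Y."""
    q = len(Y[0])
    return [make_learner().fit(X, [row[j] for row in Y]) for j in range(q)]

def predict_binary_relevance(models, x):
    """One score per label; the labels are predicted independently of each other."""
    return [m.predict_proba(x) for m in models]

# Visible columns of the table above (X1, X2, Xp and Y1, Y2, Yq).
X = [[0.12, 1, 12], [2.34, 9, -5], [1.22, 3, 40]]
Y = [[0, 1, 1], [1, 1, 0], [1, 0, 1]]
unknown = [[2.18, 2, 8], [1.76, 7, 23]]

models = fit_binary_relevance(X, Y, NearestCentroid)
for x in unknown:
    print(x, [round(p, 3) for p in predict_binary_relevance(models, x)])
```

Because each label gets its own model, nothing in Binary Relevance forces the q scores to respect relationships among the labels; that is the gap the post-processing discussed next is meant to close.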
THE SEED
ImageCLEF 2011 challenge: automatic annotation of Flickr images
Input: JPG, EXIF information & user tags
Output: 99 concepts (e.g., tower, sky, river, flowers)
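THE QUESTION
Label relationships
- Positive entailment: River → Water; Car → Vehicle
- Mutual exclusion: {Autumn, Winter, Spring, Summer}; {Single person, Small group, Big group, No persons}
Sample output
  Label    Probability
  River    0.7
  Water    0.5
  Autumn   0.6
  Winter   0.4
  Spring   0.2
  Summer   0.1
  …        …
The sample output violates both kinds of relationship: River → Water requires P(Water) ≥ P(River), yet 0.5 < 0.7, and mutually exclusive labels can have a total probability of at most 1, yet the four seasons sum to 1.3.
Can we post-process the probabilities in a sound way so that they obey the relationships?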
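The two violations in the sample output can be checked mechanically. This is a minimal sketch: the function names are illustrative, and it only detects inconsistencies, it does not repair them. An entailment x → y is violated when P(x) > P(y); a mutually exclusive group is violated when its probabilities sum to more than 1.

```python
def entailment_violations(probs, entailments):
    """Pairs (x, y) with x -> y but P(x) > P(y); an entailed label must be at least as probable."""
    return [(x, y) for x, y in entailments if probs[x] > probs[y]]

def exclusion_violations(probs, groups, tol=1e-9):
    """Mutually exclusive groups whose probabilities sum to more than 1."""
    return [g for g in groups if sum(probs[label] for label in g) > 1 + tol]

# Sample output from the slide.
probs = {"River": 0.7, "Water": 0.5, "Autumn": 0.6,
         "Winter": 0.4, "Spring": 0.2, "Summer": 0.1}
seasons = ("Autumn", "Winter", "Spring", "Summer")

print(entailment_violations(probs, [("River", "Water")]))  # [('River', 'Water')]
print(exclusion_violations(probs, [seasons]))               # the seasons group, sum 1.3
```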
EXTRACTING RELATIONSHIPS
Contingency table for labels A and B:
          a    ¬a
    b     S    T
    ¬b    U    V
Positive entailment
- a → b is extracted when U = 0
- b → a is extracted when T = 0
- The relationship's support is S
Mutual exclusion
- a → ¬b ∧ b → ¬a is extracted when S = 0
- The relationship's support is T + U
Higher-order relationships are extracted following the Apriori algorithm paradigm.
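The pairwise rules above can be sketched directly over a binary label matrix. This is an illustrative sketch, not the authors' code: `contingency` returns the cells S, T, U, V of the table for a pair of label columns, and `pairwise_relationships` (a hypothetical name) applies the extraction conditions with a minimum support of 2. The data is the toy label matrix of the following slides, written as one string per training example.

```python
from itertools import combinations

# Toy label matrix: columns A..F, one string per training example.
ROWS = ["111000", "111100", "000010", "011010", "111000",
        "011101", "001110", "000001", "001000", "000001"]
NAMES = "ABCDEF"
Y = [[int(ch) for ch in row] for row in ROWS]

def contingency(x, y):
    """Cells of the 2x2 table: S=(x, y), T=(not x, y), U=(x, not y), V=(not x, not y)."""
    s = sum(1 for p, q in zip(x, y) if p and q)
    t = sum(1 for p, q in zip(x, y) if not p and q)
    u = sum(1 for p, q in zip(x, y) if p and not q)
    v = sum(1 for p, q in zip(x, y) if not p and not q)
    return s, t, u, v

def pairwise_relationships(Y, names, min_support=2):
    cols = {n: [row[j] for row in Y] for j, n in enumerate(names)}
    entailments, exclusions = [], []
    for a, b in combinations(names, 2):
        s, t, u, _ = contingency(cols[a], cols[b])
        if u == 0 and s >= min_support:      # a -> b, support S
            entailments.append((a, b, s))
        if t == 0 and s >= min_support:      # b -> a, support S
            entailments.append((b, a, s))
        if s == 0 and t + u >= min_support:  # a and b never co-occur, support T + U
            exclusions.append((a, b, t + u))
    return entailments, exclusions

ent, exc = pairwise_relationships(Y, NAMES)
print(ent)  # [('A', 'B', 3), ('A', 'C', 3), ('B', 'C', 5), ('D', 'C', 3)]
print(exc)  # [('A', 'E', 6), ('A', 'F', 6), ('E', 'F', 6)]
```

The four entailments match the toy example; the three pairwise exclusions are the building blocks of the higher-order group {A, E, F}.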
TOY EXAMPLE
6 labels, 10 training examples
  A B C D E F
  1 1 1 0 0 0
  1 1 1 1 0 0
  0 0 0 0 1 0
  0 1 1 0 1 0
  1 1 1 0 0 0
  0 1 1 1 0 1
  0 0 1 1 1 0
  0 0 0 0 0 1
  0 0 1 0 0 0
  0 0 0 0 0 1
TOY EXAMPLE
Positive entailments
- a → b (support 3)
- a → c (support 3)
- b → c (support 5)
- d → c (support 3)
Mutual exclusion
- {A, E, F} (support 9)
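The higher-order group {A, E, F} can be found Apriori-style: start from mutually exclusive pairs, join k-sets into (k+1)-sets, and keep a candidate only if all of its k-subsets are mutually exclusive. This is a sketch under assumptions, not the authors' implementation: in it, support is counted as the number of examples containing exactly one label of the group, and since that count only grows as an exclusive group grows, it is applied as a final filter rather than used for pruning. `mutual_exclusion_sets` and `maximal` are hypothetical names.

```python
from itertools import combinations

ROWS = ["111000", "111100", "000010", "011010", "111000",
        "011101", "001110", "000001", "001000", "000001"]
NAMES = "ABCDEF"
Y = [[int(ch) for ch in row] for row in ROWS]

def mutual_exclusion_sets(Y, names, min_support=2):
    cols = {n: [row[j] for row in Y] for j, n in enumerate(names)}
    n_rows = len(Y)

    def exclusive(group):
        # No training example contains two labels of the group.
        return all(sum(cols[n][i] for n in group) <= 1 for i in range(n_rows))

    def support(group):
        # Examples in which exactly one label of the group is present.
        return sum(1 for i in range(n_rows) if sum(cols[n][i] for n in group) == 1)

    level = {frozenset(p) for p in combinations(names, 2) if exclusive(p)}
    found = set(level)
    while level:
        # Join: merge two k-sets whose union has k + 1 labels.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        # Prune (Apriori property): every k-subset must be mutually exclusive,
        # which for k >= 2 makes the candidate mutually exclusive as well.
        level = {c for c in candidates if all(c - {x} in level for x in c)}
        found |= level
    return {g: support(g) for g in found if support(g) >= min_support}

def maximal(groups):
    """Keep only groups not strictly contained in another reported group."""
    return {g: s for g, s in groups.items() if not any(g < h for h in groups)}

print(maximal(mutual_exclusion_sets(Y, NAMES)))
```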
EXPLOITING RELATIONSHIPS: POSITIVE ENTAILMENT
Label A entails label B (a → b): A becomes a parent of B, with the conditional probability table
          b    ¬b
    a     1    0
    ¬a    0    1
Generalization to a1 → b, …, ak → b: A1, …, Ak and a leak node L_B are the parents of B, with
                                      b    ¬b
    At least one parent of B true     1    0
    otherwise                         0    1
Leak node L_B, to account for other causes of B: a virtual label that is
- true where B is true and all of A1, …, Ak are false
- false in all other training examples
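On the toy data, the leak label and the deterministic OR table can be sketched as follows. As an assumption, the entailed label c gets only the parents b and d, since a → c already follows from a → b and b → c; the leak names LeakA and LeakBD in the toy output are consistent with this reading. `entailment_leak` and `or_table_holds` are illustrative names.

```python
ROWS = ["111000", "111100", "000010", "011010", "111000",
        "011101", "001110", "000001", "001000", "000001"]
NAMES = "ABCDEF"
Y = [[int(ch) for ch in row] for row in ROWS]

def col(name):
    """Column of label `name` in the toy label matrix."""
    j = NAMES.index(name)
    return [row[j] for row in Y]

def entailment_leak(child, parents):
    """Virtual leak label: 1 where `child` is true and every entailing parent is false."""
    c = col(child)
    ps = [col(p) for p in parents]
    return [int(c[i] == 1 and not any(p[i] for p in ps)) for i in range(len(Y))]

def or_table_holds(child, parents):
    """True if child == OR(parents, leak) on every training example."""
    c = col(child)
    ps = [col(p) for p in parents] + [entailment_leak(child, parents)]
    return all(c[i] == int(any(p[i] for p in ps)) for i in range(len(Y)))

print(entailment_leak("B", ["A"]))       # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(entailment_leak("C", ["B", "D"]))  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(or_table_holds("B", ["A"]), or_table_holds("C", ["B", "D"]))  # True True
```

The check fails for a parent that does not entail the child (for example `or_table_holds("A", ["B"])`), because an example with the parent true and the child false cannot satisfy the OR table.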
EXPLOITING RELATIONSHIPS: MUTUAL EXCLUSION
Among k labels A1, …, Ak: a new node B, observed as B = true, has A1, …, Ak and a leak node L_B as parents, with
                                  b    ¬b
    Only one parent of B true     1    0
    otherwise                     0    1
Leak node L_B, to cover all training examples, i.e. to make the group exhaustive: a virtual label that is
- true where all other parents of B are false
- false in all other examples
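For the toy group {A, E, F}, the leak label and the "only one parent true" table can be sketched the same way. `exclusion_leak` and `exactly_one_holds` are illustrative names; the check simply confirms that, once the leak is added, every training example has exactly one true parent, which is what makes observing B = true consistent with the data.

```python
ROWS = ["111000", "111100", "000010", "011010", "111000",
        "011101", "001110", "000001", "001000", "000001"]
NAMES = "ABCDEF"
Y = [[int(ch) for ch in row] for row in ROWS]

def col(name):
    j = NAMES.index(name)
    return [row[j] for row in Y]

def exclusion_leak(group):
    """Virtual leak label: 1 where none of the mutually exclusive labels is true."""
    cols = [col(n) for n in group]
    return [int(not any(c[i] for c in cols)) for i in range(len(Y))]

def exactly_one_holds(group):
    """True if, with the leak added, exactly one parent is true in every example."""
    cols = [col(n) for n in group] + [exclusion_leak(group)]
    return all(sum(c[i] for c in cols) == 1 for i in range(len(Y)))

print(exclusion_leak(["A", "E", "F"]))     # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(exactly_one_holds(["A", "E", "F"]))  # True
print(exactly_one_holds(["A", "B"]))       # False: A and B co-occur
```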
TOY EXAMPLE
Positive entailments: a → b (support 3), a → c (support 3), b → c (support 5), d → c (support 3)
Mutual exclusion: {A, E, F} (support 9)
  Node       Before   After
  A          0.400    0.022
  LeakA      0.350    0.082
  B          0.250    0.096
  D          0.600    0.031
  LeakBD     0.010    0.050
  C          0.200    0.345
  F          0.300    0.064
  E          0.850    0.850
  LeakEFA    0.300    0.064
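A minimal sketch of the post-processing step, under several assumptions. The network is read off the node names in the table: B = A OR LeakA, C = B OR D OR LeakBD, plus a mutual-exclusion node over A, E, F and LeakEFA that is observed true. Root nodes get a uniform prior of 0.5, and exact inference is done by brute-force enumeration instead of the clustering algorithm in jSMILE. Each classifier output p is inserted as virtual evidence with likelihood p for "true" and 1 − p for "false". Because the priors are assumed, the printed posteriors need not match the After column; what the sketch does guarantee is that they obey the relationships.

```python
from itertools import product

# "Before" column: classifier outputs, inserted as virtual evidence.
BEFORE = {"A": 0.400, "LeakA": 0.350, "B": 0.250, "D": 0.600, "LeakBD": 0.010,
          "C": 0.200, "F": 0.300, "E": 0.850, "LeakEFA": 0.300}

# Root nodes of the assumed network; B, C and ME are deterministic functions of them.
ROOTS = ["A", "LeakA", "D", "LeakBD", "E", "F", "LeakEFA"]

def complete(state):
    s = dict(state)
    s["B"] = s["A"] or s["LeakA"]             # a -> b, with leak node
    s["C"] = s["B"] or s["D"] or s["LeakBD"]  # b -> c and d -> c, with leak node
    # Mutual-exclusion node for {A, E, F}: true iff exactly one parent is true.
    s["ME"] = sum([s["A"], s["E"], s["F"], s["LeakEFA"]]) == 1
    return s

def posterior(evidence, prior=0.5):
    marginals = {n: 0.0 for n in evidence}
    total = 0.0
    for values in product([False, True], repeat=len(ROOTS)):
        s = complete(zip(ROOTS, values))
        if not s["ME"]:
            continue  # ME is observed true, so these states have zero probability
        weight = 1.0
        for n in ROOTS:
            weight *= prior if s[n] else 1.0 - prior
        for n, p in evidence.items():
            weight *= p if s[n] else 1.0 - p  # virtual evidence likelihood
        total += weight
        for n in evidence:
            if s[n]:
                marginals[n] += weight
    return {n: m / total for n, m in marginals.items()}

after = posterior(BEFORE)
for node in BEFORE:
    print(f"{node:8s} before={BEFORE[node]:.3f} after={after[node]:.3f}")
```

Whatever the priors, every state with A true also has B true, so P(A) ≤ P(B) ≤ P(C) and P(D) ≤ P(C) after inference, and the posteriors of A, E, F and LeakEFA sum to exactly 1.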
EMPIRICAL STUDY
- 12 multi-label datasets
- Relationship discovery: minimum support of 2, increased exponentially in case of memory issues
- Learning: Binary Relevance + Random Forest with 10 trees (Weka, Mulan)
- Inference: virtual evidence insertion and exact inference via the clustering algorithm (jSMILE library)
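The minimum-support schedule can be sketched as a retry loop. Assumptions: the increase is a doubling (the minimum supports reported in the mutual-exclusion results, 2, 8, 16, 32, 64, 128 and 2048, are all powers of two, which is consistent with this), memory exhaustion surfaces as Python's `MemoryError`, and `discover_with_backoff` and `fake_discover` are hypothetical names.

```python
def discover_with_backoff(discover, start=2, max_support=2 ** 20):
    """Run discover(min_support), doubling min_support after each MemoryError."""
    support = start
    while support <= max_support:
        try:
            return support, discover(support)
        except MemoryError:
            support *= 2  # exponential increase of the minimum support
    raise RuntimeError("no minimum support up to max_support fits in memory")

# Hypothetical stand-in for relationship discovery that runs out of memory below 32.
def fake_discover(min_support):
    if min_support < 32:
        raise MemoryError
    return ["relationship"] * (100 // min_support)

print(discover_with_backoff(fake_discover))  # (32, ['relationship', 'relationship', 'relationship'])
```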
POSITIVE ENTAILMENT IN “MEDICAL”
3 entailment relationships extracted from 978 radiologists’ reports annotated with ICD-9 codes
  Entailing label → Entailed label                                    Support
  Congenital obstruction of ureteropelvic junction → Hydronephrosis   4
  Shortness of breath → Renal agenesis and dysgenesis                 3
  Vomiting alone → Renal agenesis and dysgenesis                      3
Ureteropelvic junction obstruction is the most common pathologic cause of antenatally detected hydronephrosis.
MUTUAL EXCLUSION
Emotions: quiet-still XOR amazed-surprised
Enron: “Company Business, Strategy, etc.” XOR “friendship / affection”
“In business, sir, one has no friends, only correspondents.” ~Alexandre Dumas
RESULTS: POSITIVE ENTAILMENT
Wilcoxon test p-value: 0.0156
  Dataset         Minimum support   Number of labels   Number of relations   % MAP improvement
  Bibtex          2                 159                11                    0.279
  Bookmarks       2                 208                4                     0.068
  Enron           2                 53                 4                     0.391
  ImageCLEF2011   2                 99                 28                    2.977
  ImageCLEF2012   2                 94                 1                     0.168
  Medical         2                 45                 6                     2.284
  Yeast           2                 14                 3                     1.584
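Assuming the Wilcoxon test here is the two-sided signed-rank test applied to the per-dataset MAP improvements, its exact p-value can be computed by enumerating all 2^n sign assignments. All seven improvements are positive, so the smaller rank sum is 0 and p = 2/2^7 = 0.015625, which agrees with the reported 0.0156. The function below is an illustrative sketch that assumes no zero differences and no ties among the absolute values.

```python
from itertools import product

def wilcoxon_exact_two_sided(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value (no zeros, no ties in |diffs|)."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = n * (n + 1) // 2 - w_plus
    t = min(w_plus, w_minus)
    # Under the null hypothesis each rank is positive or negative with probability 1/2.
    count = sum(1 for signs in product((0, 1), repeat=n)
                if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= t)
    return min(1.0, 2 * count / 2 ** n)

improvements = [0.279, 0.068, 0.391, 2.977, 0.168, 2.284, 1.584]
print(round(wilcoxon_exact_two_sided(improvements), 4))  # 0.0156
```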
RESULTS: MUTUAL EXCLUSION (1/2)
Wilcoxon test p-value: 0.1099
  Dataset         Minimum support   Number of labels   Number of relations   % MAP improvement
  Bibtex          128               159                76                    -1.626
  Bookmarks       2048              208                1                     -0.068
  Emotions        8                 6                  1                     1.424
  Enron           2                 53                 481                   -8.434
  ImageCLEF2011   32                99                 325                   1.865
  ImageCLEF2012   64                94                 278                   -2.862
  IMDB            2                 28                 22                    4.222
  Medical         16                45                 31                    3.769
  Scene           2                 6                  4                     3.023
  Slashdot        2                 22                 23                    11.803
  TMC2007         2                 22                 8                     6.044
  Yeast           2                 14                 2                     1.760
RESULTS: MUTUAL EXCLUSION (2/2)
Datasets (two minimum-support settings each): Bibtex (128, 256), Enron (2, 32), ImageCLEF2011 (32, 128), ImageCLEF2012 (64, 256)
Number of relationships: 3, 481, 278, 40, 76, 22, 325, 56
% MAP improvement: -1.63, -8.4, 0.21, 1.87, 3.34, -2.87, 0.63, 0.60