A meta-learning system for multi-instance classification Gitte Vanwinckelen and Hendrik Blockeel KU Leuven, Belgium
Motivation ● Performed extensive evaluation of multi-instance (MI) learners on datasets from different domains ● Performance of MI algorithms is very sensitive to the application domain ● Can we formalize this knowledge by learning a meta-model?
Outline 1) Motivation 2) What is multi-instance learning? 3) Design principles of the meta-model 4) Performance evaluation of MI learners 5) Meta-learning results 6) Conclusion
MI learning
Relationship between instances and bags ● Traditional MI learning – At least one positive instance in a bag – Learn a concept that describes all positive instances (or bags) ● Generalized MI learning – All instances in a bag contribute to its label – Learn a concept that identifies the positive bags
Standard multi-instance learning Drug activity prediction Identifying musky molecule configurations [Dietterich, Artificial Intelligence 1997]
Generalized multi-instance learning: which bags describe a beach? [J. Amores, Artificial Intelligence '13]
Meta-learning ● Which learner performs best on which MI dataset? ● Construct meta-features from the original learning tasks – e.g. number of attributes, training set size, correlation of features with the output, ... ● Learn a model on the meta-dataset (decision tree) ● Landmarkers: fast algorithms whose performance indicates the performance of more expensive algorithms [Pfahringer '00]
Meta-learning with landmarking ● Reduce MI datasets to single-instance datasets based on different MI assumptions (see the sketch below) ● Standard MI assumption – Label instances with the bag label – One-sided noisy dataset ● Collective assumption – All instances contribute equally to the bag label – Average feature values over all instances in a bag
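A minimal sketch of the two reductions, assuming a bag is represented as an (n_instances, n_features) NumPy array paired with a binary bag label; the function names are illustrative, not from the paper.

```python
import numpy as np

def reduce_standard(bags, labels):
    """Standard MI assumption: propagate the bag label to every instance.
    Negative bags give clean negatives; positive bags give one-sided label noise."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lab) for b, lab in zip(bags, labels)])
    return X, y

def reduce_collective(bags, labels):
    """Collective assumption: every instance contributes equally,
    so average the feature values over the instances in each bag."""
    X = np.vstack([b.mean(axis=0) for b in bags])
    y = np.asarray(labels)
    return X, y
```

The landmarkers are then ordinary single-instance learners trained on the reduced datasets.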
MI experiments: Datasets ● SIVAL image classification, CBIR (25) ● Synthetic newsgroups, text classification (20) ● Binary classification UCI datasets (27) – adult, tictactoe, diabetes, transfusion, spam – i.i.d. sampled to create bags (sketched below) – Bag configurations: ½, ⅓, ¼, … ● Evaluation: Area Under the ROC curve (AUC)
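A sketch of turning a single-instance UCI dataset into bags by i.i.d. sampling; interpreting the bag configuration (½, ⅓, ¼, …) as the fraction of positive instances in a positive bag is an assumption for illustration.

```python
import numpy as np

def make_bags(X, y, bag_size=10, pos_fraction=0.5, n_bags=100, rng=None):
    """Create MI bags from a single-instance dataset under the standard assumption:
    a bag is positive iff it contains at least one positive instance."""
    rng = np.random.default_rng(rng)
    pos_idx, neg_idx = np.where(y == 1)[0], np.where(y == 0)[0]
    bags, labels = [], []
    for i in range(n_bags):
        if i % 2 == 0:  # positive bag: mix positives and negatives
            n_pos = max(1, int(round(pos_fraction * bag_size)))
            idx = np.concatenate([rng.choice(pos_idx, n_pos),
                                  rng.choice(neg_idx, bag_size - n_pos)])
            labels.append(1)
        else:           # negative bag: negative instances only
            idx = rng.choice(neg_idx, bag_size)
            labels.append(0)
        bags.append(X[idx])
    return bags, np.asarray(labels)
```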
MI experiments: Algorithms ● Decision trees: SimpleMI-J48, MIWrapper-J48, Adaboost-MITI ● Rule inducer MIRI ● Nearest neighbors: CitationKNN ● OptimalBall ● Diverse Density: MDD, EM-DD, MIDD ● TLD ● Support Vector Machines: mi-SVM, MISMO (NSK) ● Logistic regression: MILR, MILR-C
Performance overview of MI algorithms ● Comparison of classifiers over multiple datasets [Demsar '06] ● Are performance differences statistically significant? ● Friedman test with post-hoc Nemenyi test (sketched below) – Rank the algorithms on each dataset – Average the ranks over the datasets of the same domain – Hypothesis test that all algorithms perform equally well – Nemenyi test identifies statistically equivalent groups of classifiers ● Critical difference diagram
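A hedged sketch of this protocol: rank the learners per dataset, average the ranks, and run a Friedman test; the AUC matrix here is a random placeholder, not the paper's results.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

auc = np.random.rand(27, 14)      # rows: datasets of one domain, columns: MI learners (placeholder)
ranks = rankdata(-auc, axis=1)    # rank 1 = best AUC on that dataset
avg_ranks = ranks.mean(axis=0)    # average rank per learner over the domain

stat, p = friedmanchisquare(*[auc[:, j] for j in range(auc.shape[1])])
print(avg_ranks, p)               # if p is small, follow up with a Nemenyi post-hoc test
                                  # (e.g. via the scikit-posthocs package) and plot a CD diagram
```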
Critical difference diagrams (AUC) for the UCI, text, and CBIR domains [figure]
Meta-learning setup (sketched below) ● 14 learners → binary classification tasks for all combinations of learners (one vs. one) ● Leave-one-out cross-validation ● Three dataset domains (CBIR, text, UCI datasets) ● Landmarkers (standard and collective assumption): – Naive Bayes – 1-nearest neighbor – Logistic regression – Decision stump
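A minimal sketch of the one-vs-one meta-learning setup with leave-one-out cross-validation; the meta-feature layout (4 landmarkers × 2 reductions) and the placeholder values are assumptions, but the 72 rows match the 25 + 20 + 27 datasets above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# One row per MI dataset: landmark AUCs of the 4 fast learners under both reductions (placeholder).
X_meta = np.random.rand(72, 8)
# Binary target for one learner pair, e.g. "learner A beats learner B on this dataset" (placeholder).
y_meta = np.random.randint(0, 2, size=72)

scores = cross_val_score(DecisionTreeClassifier(max_depth=3),
                         X_meta, y_meta, cv=LeaveOneOut())
print("LOO accuracy:", scores.mean())   # compare against the majority-class baseline
```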
UCI meta-model based on the number of features and the noise level [figure: datasets where the meta-model wins vs. where the majority classifier wins]
UCI meta-model, landmarker approach [figure: meta-model wins vs. majority classifier wins, for standard and collective MI landmarkers (decision stump, NB, 1NN, LR)]
CBIR meta-model, landmarker approach [figure: meta-model wins vs. majority classifier wins, for standard and collective MI landmarkers]
Relationship between landmarkers: logistic regression [figure: CBIR, UCI, and text domains]
Conclusions and future work ● Demonstrated large differences in MI learner performance across domains ● It is not sufficient to evaluate on multiple datasets from the same domain ● A larger meta-dataset is needed ● Define alternative MI assumptions and translate them to SI datasets – e.g. the metadata assumption (NSK)