Keepin’ It Real: Semi-Supervised Learning with Realistic Tuning
Andrew B. Goldberg and Xiaojin Zhu
goldberg@cs.wisc.edu, jerryzhu@cs.wisc.edu
Computer Sciences Department, University of Wisconsin-Madison
Gap between Semi-Supervised Learning (SSL) research and practical applications

Semi-Supervised Learning: using unlabeled data to build better classifiers

Real World
• natural language processing
• computer vision
• web search & IR
• bioinformatics
• etc.

Assumptions
• manifold? clusters?
• low-density gap?
• multiple views?

Parameters
• regularization?
• graph weights?
• kernel parameters?

Model Selection
• little labeled data
• many parameters
• computational costs

Wrong choices could hurt performance!
How can we ensure that SSL is never worse than supervised learning?

Andrew B. Goldberg (UW-Madison), SSL with Realistic Tuning
Our Focus
• Two critical issues
  • Parameter tuning
  • Choosing which (if any) SSL algorithm to use
• Interested in realistic settings:
  • Practitioner is given some new labeled and unlabeled data
  • Must produce the best classifier possible
Our Contributions
• Medium-scale empirical study
  • Compares one supervised learning (SL) and two SSL methods
  • Eight less-familiar NLP tasks, three evaluation metrics
  • Experimental protocol explores several real-world settings
  • All parameters are tuned realistically via cross validation
• Findings under these conditions:
  • Each SSL method can be worse than SL on some data sets
  • Can achieve agnostic SSL by using cross validation accuracy to select among SL and SSL algorithms
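The "agnostic SSL" idea above can be sketched in code: run cross validation on the small labeled set for each candidate algorithm (SSL candidates also see the unlabeled points during training) and keep whichever scores best. This is a minimal illustration assuming scikit-learn; the dataset, candidate models, and fold count are hypothetical stand-ins, not the paper's actual SL/SSL methods.

```python
# Sketch of agnostic SSL: pick between a supervised baseline and an SSL
# method by cross-validation accuracy on the labeled data alone.
# All specifics (toy data, models, k=5) are illustrative assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
# Pretend only 20 points are labeled (10 per class); the rest are unlabeled.
labeled = np.concatenate([np.where(y == c)[0][:10] for c in (0, 1)])
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

def cv_accuracy(model, is_ssl, k=5):
    """Mean accuracy over k folds of the labeled set.
    SSL models additionally train on unlabeled points marked with y = -1."""
    accs = []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    for tr, te in skf.split(X[labeled], y[labeled]):
        m = clone(model)
        if is_ssl:
            Xtr = np.vstack([X[labeled][tr], X[unlabeled]])
            ytr = np.concatenate([y[labeled][tr],
                                  -np.ones(len(unlabeled), dtype=int)])
        else:
            Xtr, ytr = X[labeled][tr], y[labeled][tr]
        m.fit(Xtr, ytr)
        accs.append(np.mean(m.predict(X[labeled][te]) == y[labeled][te]))
    return float(np.mean(accs))

candidates = {
    "supervised LR": (LogisticRegression(), False),
    "label spreading": (LabelSpreading(kernel="knn", n_neighbors=7), True),
}
scores = {name: cv_accuracy(m, ssl) for name, (m, ssl) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the winner is chosen by labeled-data CV accuracy, the procedure falls back to the supervised baseline whenever the SSL candidate's assumptions do not fit the data, which is the safeguard the contribution describes.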
Outline
• Introduce “realistic tuning” for SSL
• Empirical study protocol
  • Data sets
  • Algorithms
  • Meta algorithm for SSL model selection
  • Performance metrics
• Results
• Conclusions
SSL With Realistic Tuning
• Given labeled and unlabeled data, {(x_1, y_1), ..., (x_l, y_l), x_{l+1}, ..., x_{l+u}}, how should you set parameters for some algorithm?
  • Tune based on test set performance? No, this is cheating.
  • Use default values based on heuristics/experience? May fail on new data.
  • k-fold cross validation? Little labeled data, but best available option.
• Cross validation choices:
  • number of folds
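The k-fold recipe above also applies to hyperparameter tuning within a single SSL method: score each candidate value by cross validation on the few labeled points and keep the best. The sketch below assumes scikit-learn; the graph-based method, the neighbor grid, and the small fold count (small because labels are scarce) are illustrative choices, not the deck's actual setup.

```python
# Sketch of realistic tuning: choose an SSL hyperparameter (here, the
# number of graph neighbors) by cross validation on the labeled set only.
# Toy data, the (3, 7, 15) grid, and k=3 are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import StratifiedKFold
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=300, noise=0.1, random_state=1)
# Only 30 labeled points (15 per class); the rest are unlabeled.
labeled = np.concatenate([np.where(y == c)[0][:15] for c in (0, 1)])
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

def cv_score(n_neighbors, k=3):  # small k: only 30 labels to split
    accs = []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=1)
    for tr, te in skf.split(X[labeled], y[labeled]):
        # Train on the labeled fold plus all unlabeled points (y = -1).
        Xtr = np.vstack([X[labeled][tr], X[unlabeled]])
        ytr = np.concatenate([y[labeled][tr],
                              -np.ones(len(unlabeled), dtype=int)])
        model = LabelSpreading(kernel="knn", n_neighbors=n_neighbors)
        model.fit(Xtr, ytr)
        accs.append(np.mean(model.predict(X[labeled][te]) == y[labeled][te]))
    return float(np.mean(accs))

grid = {nn: cv_score(nn) for nn in (3, 7, 15)}
best_nn = max(grid, key=grid.get)
print(best_nn, grid)
```

With so few labeled examples each fold's estimate is noisy, which is exactly the tension the slide raises: cross validation is imperfect here, but it is the best realistic option.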