Active Learning with Model Selection
Neil Rubens, Sugiyama Lab / Tokyo Institute of Technology
Active Learning (NLP Motivation)
• NLP (common scenario)
  – Large amounts of unlabeled data
  – Labeling data is expensive
• Active Learning (Optimal Experimental Design)
  – Allows selecting the most informative examples
Supervised Learning as Function Approximation
• Target function f(x); learned function f̂(x); training samples (x_1, y_1), …, (x_n, y_n)
• Goal: from the training samples, obtain an f̂ that minimizes the generalization error G = ∫ (f̂(x) − f(x))² q(x) dx, where q(x) is the test input density
Design Cycle (Common)
Collect Data (Active Learning) → Model Selection → Parameter Learning → Evaluation
There is a problem with this flow.
Active Learning (AL)
[Figure: target vs. learned functions for good and poor choices of input points]
• The choice of training input points can significantly affect the learned function.
• Active Learning: choose the training input points so that the generalization error is minimized.
Setting
• Linear model (the learned function is a linear combination of fixed basis functions)
• Least-squares learning
• In AL, the training output values cannot be used for estimating the generalization error (the inputs must be chosen before their outputs are observed)
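To make the setting concrete, here is a minimal least-squares sketch. The Gaussian basis, its centers and width, and the toy target are illustrative assumptions; the slides only specify a linear model with fixed basis functions.

```python
import numpy as np

# Gaussian basis functions (an illustrative assumption; the slides only
# specify a linear model with fixed basis functions).
def design_matrix(x, centers, width=1.0):
    # Phi[i, j] = phi_j(x_i)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def least_squares_fit(x, y, centers):
    # theta = argmin_theta || Phi theta - y ||^2  (ordinary least squares)
    Phi = design_matrix(x, centers)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def predict(x, theta, centers):
    return design_matrix(x, centers) @ theta

# Toy target function and noisy training outputs
rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)

centers = np.linspace(-3, 3, 5)   # model: 5 Gaussian basis functions
theta = least_squares_fit(x_train, y_train, centers)
x_test = np.linspace(-3, 3, 200)
y_hat = predict(x_test, theta, centers)
```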
Orthogonal Decomposition
Bias-Variance Decomposition
The generalization error decomposes into model error C, bias B, and variance V.
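As a rough illustration of the decomposition, the bias and variance terms of a least-squares fit can be estimated empirically by resampling the output noise and refitting. The polynomial model, noise level, and target function here are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                       # target function
x = np.linspace(-3, 3, 50)       # fixed training inputs
noise = 0.3                      # output noise standard deviation

def fit_poly(y, degree):
    # least-squares polynomial fit, evaluated back on the training inputs
    coef = np.polyfit(x, y, degree)
    return np.polyval(coef, x)

# Repeatedly resample noisy outputs and refit to estimate B and V
preds = np.array([fit_poly(f(x) + noise * rng.standard_normal(x.size), 3)
                  for _ in range(500)])
mean_pred = preds.mean(axis=0)
bias2 = np.mean((mean_pred - f(x)) ** 2)   # B: squared bias of the estimator
variance = np.mean(preds.var(axis=0))      # V: variance of the estimator
```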
Active Learning
• Nothing can be done about the model error C
• Bias → 0 (least-squares is unbiased)
• Minimize variance → minimize error
Variance-based AL (assuming zero bias)
Active Learning - Approximation
• In general, simultaneously optimizing all n points is not tractable
• Approximation approaches:
  – Optimize the points one by one (greedy)
  – Optimize a probability distribution from which the points are drawn
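The greedy, one-by-one option can be sketched with a classical variance (A-optimality) criterion: for least squares, the estimator's variance is governed by (ΦᵀΦ)⁻¹, so each step adds the candidate input that most reduces its trace. The candidate pool, Gaussian basis, and small ridge term are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def gaussian_basis(x, centers, width=1.0):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def greedy_variance_al(candidates, centers, n_pick, ridge=1e-6):
    """Greedy A-optimal design: pick inputs minimizing tr[(Phi'Phi)^-1],
    which governs the least-squares estimator's variance."""
    chosen = []
    for _ in range(n_pick):
        best_x, best_score = None, np.inf
        for x in candidates:
            trial = np.array(chosen + [x])
            Phi = gaussian_basis(trial, centers)
            # ridge term keeps the matrix invertible while few points are chosen
            A = Phi.T @ Phi + ridge * np.eye(len(centers))
            score = np.trace(np.linalg.inv(A))
            if score < best_score:
                best_x, best_score = x, score
        chosen.append(best_x)
    return np.array(chosen)

candidates = np.linspace(-3, 3, 61)   # pool of unlabeled candidate inputs
centers = np.linspace(-3, 3, 5)       # model: 5 Gaussian basis functions
X = greedy_variance_al(candidates, centers, n_pick=10)
```

The greedy choice spreads the inputs so that every basis direction is covered, which is what drives the variance down.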
Bias / Variance (no unbiasedness guarantee)
[Figure: target function f and learned functions illustrating bias and variance]
Bias / Variance: best fit ("min error")
Model Selection (MS)
• Model: could be represented by the number and type of basis functions
• Model Selection: select an appropriate model M
[Figure: target vs. learned functions for models that are too simple, appropriate, and too complex]
Model Selection
• Cross-validation: measure generalization accuracy by testing on data unused during training
• Regularization: penalize complex models, E′ = (error on data) + λ·(model complexity); e.g. Akaike's information criterion (AIC), Bayesian information criterion (BIC)
• Minimum description length (MDL): Kolmogorov complexity, the shortest description of the data
• Structural risk minimization (SRM)
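The cross-validation option can be sketched as follows: candidate models (here, polynomial degrees, an illustrative stand-in for "number and type of basis functions") are scored by k-fold validation error, and the best-scoring model is selected. The toy data and degree range are assumptions.

```python
import numpy as np

def cv_error(x, y, degree, k=5, seed=0):
    """Mean squared validation error of a degree-`degree` polynomial,
    estimated by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[trn], y[trn], degree)       # train on k-1 folds
        errs.append(np.mean((np.polyval(coef, x[val]) - y[val]) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60)
y = np.sin(x) + 0.2 * rng.standard_normal(60)

# Model selection: pick the polynomial degree with the lowest CV error
degrees = range(1, 10)
best = min(degrees, key=lambda d: cv_error(x, y, d))
```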
Active Learning with Model Selection
Active Learning and Model Selection share the same goal: minimizing the generalization error.
Possible approaches: naïve, sequential, batch
Naïve Approach
• Naïve approach: combine existing AL and MS methods
• The naïve approach is not possible, due to the ALMS dilemma:
  – Active Learning: the model should already be fixed (MS already performed) [Fedorov 78, MacKay 92, Kanamori and Shimodaira 04]
  – Model Selection: the points should already be fixed (AL already performed) [Akaike 78, Rissanen 78, Schwarz 78]
Sequential Approach
Alternate Model Selection and Active Learning as samples are collected.
[Figure: model complexity b vs. number of samples n]
• The optimal points depend on the model
• Has a risk of large error (due to overfitting to a different model)
Batch Approach
Initial Model Selection → Active Learning → Final Model Selection
[Figure: model complexity b vs. number of samples n]
• The initial MS is not reliable
• Has a risk of large error (due to overfitting to a different model)
Motivation – Hedge the Risk of Large Error
Active Learning with Model Selection:
• Naïve – impossible
• Batch, Sequential – risk of large error (due to overfitting to a different model)
Goal: hedge the risk of large error (minimize the risk of overfitting to a different model)
Ensemble Active Learning Approach (Proposed)
• Hedge the risk of overfitting by designing the input points for all of the candidate models.
• Criterion ensemble (C-EAL): combine the generalization-error criteria G_1, G_2, … of the candidate models into a single criterion C_EAL.
• Data ensemble (D-EAL): combine the training input locations X_1, X_2, … designed for the individual models.
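The criterion-ensemble idea can be sketched as follows: instead of optimizing a variance criterion for a single model, each candidate input is scored by a combination (here, the worst case) of the criteria of all candidate models. The model set, variance criterion, and max-combination rule are illustrative assumptions; the paper's exact C-EAL/D-EAL criteria differ.

```python
import numpy as np

def poly_design(x, degree):
    # polynomial basis as a stand-in for each candidate model's basis
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

def variance_criterion(X, degree, ridge=1e-6):
    # tr[(Phi'Phi)^-1] governs the least-squares estimator's variance;
    # the small ridge keeps the matrix invertible for few points
    Phi = poly_design(X, degree)
    A = Phi.T @ Phi + ridge * np.eye(degree + 1)
    return np.trace(np.linalg.inv(A))

def ensemble_al(candidates, degrees, n_pick):
    """Greedy criterion-ensemble AL: each new point minimizes the worst
    (max) variance criterion over all candidate models."""
    chosen = []
    for _ in range(n_pick):
        def score(x):
            X = np.array(chosen + [x])
            return max(variance_criterion(X, d) for d in degrees)
        chosen.append(min(candidates, key=score))
    return np.array(chosen)

candidates = np.linspace(-3, 3, 31)          # pool of candidate inputs
X = ensemble_al(candidates, degrees=[1, 2, 3], n_pick=6)
```

Because the worst-case score is taken over all candidate models, the design stays useful whichever model the final model selection step picks.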
Evaluation
Legend: D – D-EAL, C – C-EAL (proposed); B – Batch; S – Sequential; P – Passive
• Compares favorably with existing methods:
  o Minimized worst-case performance (in most cases)
  o Surprisingly, improved average performance (in some cases)
Current / Future Work
• Improving AL by utilizing existing data
• My work mostly deals with theoretical aspects; I am also looking for practical applications.
  – If you have any problems that involve active learning, I would be very glad to help.
References
• Sugiyama. Active learning in approximately linear regression based on conditional expectation of generalization error. JMLR, 2006.
• Bishop. Pattern Recognition and Machine Learning.
• Alpaydin. Introduction to Machine Learning.
• Rubens and Sugiyama. Coping with active learning with model selection dilemma: Minimizing expected generalization error. IBIS, 2006.