
Active Learning with Model Selection, Neil Rubens (PowerPoint presentation)



  1. Active Learning with Model Selection
     Neil Rubens, Sugiyama Lab / Tokyo Institute of Technology

  2. Active Learning (NLP Motivation)
     • NLP (common scenario)
       – Large amounts of unlabeled data
       – Labeling data is expensive
     • Active Learning (Optimal Experimental Design)
       – Allows selecting the most informative examples

  3. Supervised Learning as Function Approximation
     [Figure: target function $f(x)$, learned function $\hat{f}(x)$, and training samples $(x_1, y_1), \dots, (x_n, y_n)$]
     Goal: from the training samples $\{(x_i, y_i)\}_{i=1}^{n}$, obtain $\hat{f}(x)$ that minimizes the generalization error
       $G = \int \big(\hat{f}(x) - f(x)\big)^2 q(x)\,dx$,
     where $q(x)$ is the test input density.
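The generalization error G can be approximated by Monte Carlo: draw test inputs from q(x) and average the squared difference between the learned and target functions. A minimal sketch, assuming NumPy and a hypothetical `sample_q` helper that draws from q(x); the standard normal q, the target sin(x), and the learned function x in the usage lines are illustrative choices only.

```python
import numpy as np

def generalization_error(f_hat, f, sample_q, n_test=100_000, seed=0):
    """Monte Carlo estimate of G = ∫ (f_hat(x) - f(x))^2 q(x) dx.

    sample_q(rng, n) is a hypothetical helper returning n draws from q(x).
    """
    rng = np.random.default_rng(seed)
    x = sample_q(rng, n_test)                    # test inputs x ~ q(x)
    return float(np.mean((f_hat(x) - f(x)) ** 2))

# Illustrative choice: q is the standard normal density, target f(x) = sin(x),
# learned function f_hat(x) = x.
sample_q = lambda rng, n: rng.standard_normal(n)
G = generalization_error(lambda x: x, np.sin, sample_q)
```

Because the estimate only needs samples from q(x), the same helper works for any test input density that can be sampled.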

  4. Design Cycle (Common)
     Collect Data (Active Learning) → Model Selection → Parameter Learning → Evaluation
     There is a problem with this flow.

  5. Active Learning (AL)
     [Figure: target function and learned function for good inputs vs. poor inputs]
     • The choice of training input points can significantly affect the learned function.
     • Active Learning: choose the training input points so that the generalization error is minimized.

  6. Setting
     • Linear model
     • Least-squares learning
     • In AL, the training output values cannot be used to estimate the generalization error, since outputs are observed only after the inputs have been chosen.
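Why AL can still work without outputs is easiest to see in the least-squares parameter covariance. A minimal sketch, assuming NumPy, a linear model $\hat{f}(x) = \sum_j \alpha_j \varphi_j(x)$ with a polynomial basis $\varphi_j(x) = x^j$ (the slide does not name the basis), and i.i.d. output noise with variance σ². The covariance of the least-squares estimate, σ²(XᵀX)⁻¹, depends only on the training inputs.

```python
import numpy as np

def design_matrix(x, degree):
    """X[i, j] = phi_j(x_i), assuming the polynomial basis phi_j(x) = x**j."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def least_squares(x, y, degree):
    """alpha_hat = (X^T X)^{-1} X^T y, computed stably with lstsq."""
    alpha_hat, *_ = np.linalg.lstsq(design_matrix(x, degree), y, rcond=None)
    return alpha_hat

def parameter_covariance(x, degree, noise_var=1.0):
    """Cov[alpha_hat] = sigma^2 (X^T X)^{-1}; note that no outputs y are needed."""
    X = design_matrix(x, degree)
    return noise_var * np.linalg.inv(X.T @ X)

# Usage: noiseless outputs of f(x) = 1 + 2x are recovered (up to rounding).
x = np.linspace(-1.0, 1.0, 10)
alpha = least_squares(x, 1.0 + 2.0 * x, degree=1)
# Spread-out inputs give smaller total parameter variance than clustered ones.
spread = np.trace(parameter_covariance(np.linspace(-1.0, 1.0, 10), degree=1))
clustered = np.trace(parameter_covariance(np.linspace(-0.1, 0.1, 10), degree=1))
```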

  7. Orthogonal Decomposition

  8. Bias-Variance Decomposition
     $\mathbb{E}[G] = C + B + V$, with model error $C$, bias $B$, and variance $V$ (expectation over the output noise)
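The three terms can be computed for a small least-squares problem and checked against a direct average of G. A minimal sketch, assuming NumPy, a polynomial basis, zero-mean Gaussian noise with variance σ², and a discrete test density q given by points z with probabilities w (so the integrals become weighted sums). The term definitions are the standard ones and are assumptions here, since the slide gives only their names: C is the error of the best model g in the q-weighted norm, B is the distance from E[f̂] to g, and V = σ² tr(U L Lᵀ) with U = E_q[φ(x)φ(x)ᵀ] and f̂'s parameters α̂ = L y.

```python
import numpy as np

def basis(x, degree):
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def decompose(f, x_train, z, w, noise_var, degree=1):
    """Return (C, B, V) for least squares; q is the discrete distribution (z, w)."""
    Phi, X = basis(z, degree), basis(x_train, degree)
    U = Phi.T @ (w[:, None] * Phi)                          # U = E_q[phi(x) phi(x)^T]
    beta_star = np.linalg.solve(U, Phi.T @ (w * f(z)))      # best model g in the q-norm
    L = np.linalg.solve(X.T @ X, X.T)                       # alpha_hat = L y
    mean_alpha = L @ f(x_train)                             # E[alpha_hat] (zero-mean noise)
    C = np.sum(w * (Phi @ beta_star - f(z)) ** 2)           # model error
    B = np.sum(w * (Phi @ (mean_alpha - beta_star)) ** 2)   # bias
    V = noise_var * np.trace(U @ L @ L.T)                   # variance
    return float(C), float(B), float(V)

def expected_G_monte_carlo(f, x_train, z, w, noise_var, degree=1, trials=20_000, seed=0):
    """Average G over independent draws of Gaussian output noise."""
    rng = np.random.default_rng(seed)
    Phi, X = basis(z, degree), basis(x_train, degree)
    L = np.linalg.solve(X.T @ X, X.T)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(trials, len(x_train)))
    alphas = (f(x_train) + noise) @ L.T                     # one alpha_hat per row
    G = ((alphas @ Phi.T - f(z)) ** 2) @ w                  # G for each noise draw
    return float(G.mean())

# Usage: a straight line fitted to sin(x), so the model error C is non-zero.
z = np.linspace(-2.0, 2.0, 41)
w = np.full(z.size, 1.0 / z.size)
x_train = np.linspace(-1.5, 1.5, 8)
C, B, V = decompose(np.sin, x_train, z, w, noise_var=0.1)
mc = expected_G_monte_carlo(np.sin, x_train, z, w, noise_var=0.1)
```

The cross terms vanish because g is the q-weighted projection of f onto the model and the noise has zero mean, so `mc` should agree with C + B + V up to Monte Carlo error.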

  9. Active Learning
     • Nothing can be done about the model error C
     • Bias B → 0 (least squares is unbiased when the model is correctly specified)
     • Minimize the variance → minimize the error

  10. Variance AL (assuming zero bias)
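With zero bias, E[G] = C + V and C does not depend on the inputs, so minimizing the variance V minimizes the expected generalization error, and V can be evaluated before any outputs are observed. A minimal sketch, assuming NumPy, a polynomial basis, the least-squares variance V = σ² tr(U (XᵀX)⁻¹) with U = E_q[φ(x)φ(x)ᵀ] estimated from samples of q, and two hypothetical candidate designs. This is a generic variance criterion, not necessarily the exact one on the original slide.

```python
import numpy as np

def basis(x, degree):
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def variance_criterion(x_train, z_test, degree, noise_var=1.0):
    """V = sigma^2 tr(U (X^T X)^{-1}), U estimated from test samples z_test ~ q.

    Only training *inputs* are used, so V can be evaluated before labeling.
    """
    Phi = basis(z_test, degree)
    U = Phi.T @ Phi / len(z_test)
    X = basis(x_train, degree)
    return float(noise_var * np.trace(U @ np.linalg.inv(X.T @ X)))

def best_design(candidate_designs, z_test, degree):
    """Index of the candidate input set with the smallest variance, plus all scores."""
    scores = [variance_criterion(x, z_test, degree) for x in candidate_designs]
    return int(np.argmin(scores)), scores

# Usage: q is standard normal; compare clustered vs spread-out inputs for a line.
rng = np.random.default_rng(0)
z_test = rng.standard_normal(10_000)
designs = [np.linspace(-0.2, 0.2, 10), np.linspace(-2.0, 2.0, 10)]
best, scores = best_design(designs, z_test, degree=1)
```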

  11. Active Learning: Approximation
     • In general, simultaneously optimizing all n points is not tractable
     • Approximation approaches:
       – Optimize the points one by one (greedy)
       – Optimize the probability distribution from which the points are drawn
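The greedy approximation adds one input at a time from a finite candidate pool, each time choosing the point that most reduces the variance criterion. A minimal sketch, assuming NumPy, a polynomial basis, the criterion tr(U (XᵀX)⁻¹) with U estimated from samples of q, and a small hypothetical `ridge` term so the criterion is defined before enough points have been chosen; the same input may be picked more than once (repeated measurements).

```python
import numpy as np

def basis(x, degree):
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def greedy_design(candidates, z_test, degree, n_points, ridge=1e-8):
    """Greedily pick n_points inputs (with replacement) minimizing tr(U (X^T X)^{-1})."""
    Phi_test = basis(z_test, degree)
    U = Phi_test.T @ Phi_test / len(z_test)       # U = E_q[phi phi^T], estimated
    Phi_cand = basis(candidates, degree)
    eye = np.eye(degree + 1)
    chosen = []
    for _ in range(n_points):
        best_i, best_score = 0, np.inf
        for i in range(len(candidates)):
            rows = Phi_cand[chosen + [i]]         # design if candidate i were added
            M = rows.T @ rows + ridge * eye       # ridge keeps M invertible early on
            score = np.trace(U @ np.linalg.inv(M))
            if score < best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
    return np.asarray(candidates)[chosen]

def criterion(x_train, z_test, degree):
    """Unregularized criterion, used to compare finished designs."""
    Phi, X = basis(z_test, degree), basis(x_train, degree)
    return float(np.trace((Phi.T @ Phi / len(z_test)) @ np.linalg.inv(X.T @ X)))

# Usage: q is uniform on [-1, 1]; pick 4 inputs for a straight-line model.
rng = np.random.default_rng(0)
z_test = rng.uniform(-1.0, 1.0, 10_000)
picked = greedy_design(np.linspace(-1.0, 1.0, 21), z_test, degree=1, n_points=4)
```

Each step costs one criterion evaluation per candidate, which is what makes the greedy scheme tractable compared with searching over all n-point subsets jointly.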

  12. Bias / Variance (no unbiasedness guarantee)
     [Figure: target and learned functions illustrating the bias and variance terms]

  13. Bias / Variance
     [Figure: best fit (“min error”)]

  14. Model Selection (MS)
     • Model: can be represented by the number and type of basis functions
     • Model Selection: select an appropriate model M
     [Figure: target function and learned functions for models that are too simple, appropriate, and too complex]

  15. Model Selection
     • Cross-validation: measure generalization accuracy by testing on data unused during training
     • Regularization: penalize complex models, E' = error on data + λ · model complexity
     • Akaike’s information criterion (AIC), Bayesian information criterion (BIC)
     • Minimum description length (MDL): Kolmogorov complexity, shortest description of data
     • Structural risk minimization (SRM)
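Cross-validation, the first method listed, can be used to choose the degree of a polynomial model. A minimal sketch, assuming NumPy, k-fold splits with a fixed random seed, squared error as the accuracy measure, and synthetic quadratic data used purely for illustration.

```python
import numpy as np

def fit_predict(x_tr, y_tr, x_te, degree):
    """Least-squares polynomial fit on (x_tr, y_tr), evaluated at x_te."""
    X = np.vander(x_tr, degree + 1, increasing=True)
    alpha, *_ = np.linalg.lstsq(X, y_tr, rcond=None)
    return np.vander(x_te, degree + 1, increasing=True) @ alpha

def cv_error(x, y, degree, k=5, seed=0):
    """k-fold cross-validation estimate of the squared generalization error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for fold in folds:
        train = np.ones(len(x), dtype=bool)
        train[fold] = False                      # hold this fold out of training
        pred = fit_predict(x[train], y[train], x[fold], degree)
        errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errors))

def select_degree(x, y, degrees, k=5):
    """Model selection: return the degree with the lowest CV error, plus all scores."""
    scores = {d: cv_error(x, y, d, k) for d in degrees}
    return min(scores, key=scores.get), scores

# Usage on synthetic data from a quadratic target with small noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)
degree, scores = select_degree(x, y, degrees=[0, 1, 2])
```

Note that this selection uses the training outputs y, which is exactly what AL cannot do before labeling; that tension is the ALMS dilemma discussed next.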

  16. Active Learning with Model Selection
     • Active Learning and Model Selection share the same goal: minimizing the generalization error G
     • Active Learning with Model Selection, possible approaches: naïve, sequential, batch

  17. Naïve Approach
     • Naïve approach: combine existing AL and MS methods
     • The naïve approach is not possible due to the ALMS dilemma:
       – Active Learning: the model should be fixed (MS already performed) [Fedorov 78, MacKay 92, Kanamori and Shimodaira 04]
         [Figure: target vs. learned function for good and poor inputs]
       – Model Selection: the points should be fixed (AL already performed) [Akaike 78, Rissanen 78, Schwarz 78]
         [Figure: models that are too simple, appropriate, and too complex]

  18. Sequential Approach
     [Figure: alternating Model Selection and Active Learning steps, plotted as b (complexity) against n (number of samples)]
     • Optimal points depend on the model
     • Has a risk of large error (due to overfitting to a different model)

  19. Batch Approach
     [Figure: Initial Model Selection → Active Learning → Final Model Selection, plotted as b (complexity) against n (number of samples)]
     • Initial MS is not reliable
     • Has a risk of large error (due to overfitting to a different model)

  20. Motivation: Hedge the Risk of Large Error
     Active Learning with Model Selection:
     • Naïve: impossible
     • Batch, Sequential: risk of large error (due to overfitting to a different model)
     Goal: hedge the risk of large error (minimize the risk of overfitting to a different model)

  21. Ensemble Active Learning Approach (Proposed)
     • Hedge the risk of overfitting by designing input points for all of the models.
     [Figure: Criterion EAL (C-EAL) forms a criterion G_CEAL from the per-model criteria G_1, G_2, ...; Data EAL (D-EAL) forms the training point locations X_DEAL from the per-model locations X_1, X_2, ...]
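The criterion-combining idea can be illustrated by scoring each candidate design against every model at once. A minimal sketch, assuming NumPy, polynomial models of several degrees, the least-squares variance criterion G_m = tr(U (XᵀX)⁻¹) per model, and an unweighted sum G = G_1 + G_2 + ... as the combined criterion. The sum is one reading of the slide's diagram; the actual C-EAL criterion may differ, and D-EAL (combining the per-model point locations X_1, X_2, ...) is not shown.

```python
import numpy as np

def variance_criterion(x_train, z_test, degree):
    """G_m for one model: tr(U (X^T X)^{-1}) for a polynomial model of this degree."""
    Phi = np.vander(z_test, degree + 1, increasing=True)
    X = np.vander(x_train, degree + 1, increasing=True)
    return float(np.trace((Phi.T @ Phi / len(z_test)) @ np.linalg.inv(X.T @ X)))

def ensemble_select(candidate_designs, z_test, degrees):
    """Pick the design minimizing the summed criterion over all candidate models."""
    totals = [sum(variance_criterion(x, z_test, d) for d in degrees)
              for x in candidate_designs]
    return int(np.argmin(totals)), totals

# Usage: q is a uniform grid on [-1, 1]. Design B piles points at the ends,
# which suits a straight line but serves a quadratic poorly.
z_test = np.linspace(-1.0, 1.0, 201)
design_a = np.linspace(-1.0, 1.0, 10)
design_b = np.array([-1.0] * 4 + [0.0] * 2 + [1.0] * 4)
line_only, _ = ensemble_select([design_a, design_b], z_test, degrees=[1])
hedged, _ = ensemble_select([design_a, design_b], z_test, degrees=[1, 2])
```

Optimizing for the line alone selects design B, while the summed criterion selects the spread-out design A, which hedges against the quadratic model turning out to be the right one.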

  22. Evaluation
     Legend: D = D-EAL, C = C-EAL (proposed); B = Batch; S = Sequential; P = Passive
     • Compares favorably with existing methods:
       – Minimized worst-case error (in most cases)
       – Surprisingly, improved average performance (in some cases)

  23. Current / Future Work
     • Improving AL by utilizing existing data
     • My work mostly deals with theoretical aspects.
     • I am also looking for practical applications.
       – If you have any problems that involve active learning, I would be very glad to help.

  24. References
     • Sugiyama. Active learning in approximately linear regression based on conditional expectation of generalization error. JMLR 2006.
     • Bishop. Pattern Recognition and Machine Learning.
     • Alpaydin. Introduction to Machine Learning.
     • Rubens, Sugiyama. Coping with active learning with model selection dilemma: Minimizing expected generalization error. IBIS 2006.
