IJCNN 2002, May 12-17, 2002

Release from Active Learning / Model Selection Dilemma:
Optimizing Sample Points and Models at the Same Time

Masashi Sugiyama and Hidemitsu Ogawa
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
Slide 2: Supervised Learning: Function Approximation

Learning target: $f(x)$. Learned result: $\hat{f}(x)$.
Samples: $\{(x_m, y_m)\}_{m=1}^{M}$, where $y_m = f(x_m) + \epsilon_m$.
From $\{(x_m, y_m)\}_{m=1}^{M}$, find $\hat{f}(x)$ so that it is as close to $f(x)$ as possible.
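As a concrete illustration (not from the slides), the following minimal Python sketch generates noisy samples $y_m = f(x_m) + \epsilon_m$; the target function, noise variance, and sample locations are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learning target f(x); the slides do not fix a specific one.
def f(x):
    return np.sin(2 * x) + 0.5 * np.cos(3 * x)

M, sigma2 = 50, 0.02                              # assumed sample size and noise variance
x = rng.uniform(-np.pi, np.pi, M)                 # sample points x_m
y = f(x) + rng.normal(0.0, np.sqrt(sigma2), M)    # y_m = f(x_m) + eps_m
```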
Slide 3: Active Learning

[Figure: the same target function learned from two different placements of the sample points $x_1, x_2, \ldots$]

The location of the sample points heavily affects the generalization capability.
Determine $\{x_m\}_{m=1}^{M}$ for optimal generalization:
$\min_{\{x_m\}} J_G$, where $J_G$ denotes the generalization error.
Slide 4: Model Selection

[Figure: learned results for three models: too simple, appropriate, too complex]

The choice of model heavily affects the generalization capability. (A model refers to, e.g., the order of a polynomial.)
Select a model $S$ for optimal generalization:
$\min_{S \in \mathcal{C}} J_G$, where $\mathcal{C}$ is the set of model candidates and $J_G$ is the generalization error.
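To make this step concrete, here is a minimal sketch of model selection over polynomial orders. The slides do not specify which selection criterion was used; held-out validation error stands in here purely for illustration.

```python
import numpy as np

def select_order(x_tr, y_tr, x_val, y_val, orders):
    """Pick the polynomial order S_n with the smallest held-out error."""
    best_n, best_err = None, np.inf
    for n in orders:
        coef = np.polyfit(x_tr, y_tr, n)                       # fit candidate model S_n
        err = np.mean((np.polyval(coef, x_val) - y_val) ** 2)  # validation error
        if err < best_err:
            best_n, best_err = n, err
    return best_n
```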
Slide 5: Simultaneous Optimization of Sample Points and Models

So far, active learning and model selection have been studied thoroughly, but INDEPENDENTLY.
Simultaneously determine the sample points $\{x_m\}_{m=1}^{M}$ and a model $S$ for optimal generalization:
$\min_{\{x_m\},\, S \in \mathcal{C}} J_G$, where $\mathcal{C}$ is the set of model candidates and $J_G$ is the generalization error.
Slide 6: Active Learning / Model Selection Dilemma

We can NOT directly optimize sample points and models simultaneously by simply combining existing active learning and model selection methods, because:
- the model should be fixed for active learning, and
- the sample points should be fixed for model selection.
Slide 7: How to Dissolve the Dilemma

Model candidates: $\mathcal{C} = \{S_1, S_2, S_3\}$.
Suppose a single set of sample points is optimal for every candidate:
$\{x_m\}^{(\mathrm{OPT})} = \arg\min_{\{x_m\}} J_G$ for $S_1$ $= \arg\min_{\{x_m\}} J_G$ for $S_2$ $= \arg\min_{\{x_m\}} J_G$ for $S_3$,
i.e., $\{x_m\}^{(\mathrm{OPT})} = \arg\min_{\{x_m\}} J_G$ for all $S \in \mathcal{C}$.
Then:
1. Find sample points $\{x_m^{(\mathrm{OPT})}\}_{m=1}^{M}$ that are commonly optimal for all models.
2. Just perform model selection as usual (see the sketch below).
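The two-step procedure can be written schematically as follows. `choose_common_points`, `observe`, and `estimated_gen_error` are hypothetical placeholders, since the slides define the steps only abstractly.

```python
def dissolve_dilemma(model_candidates, choose_common_points, observe,
                     estimated_gen_error):
    # Step 1: sample points that are optimal for EVERY candidate model
    # (for trigonometric polynomial models: equidistant sampling, slide 9).
    x = choose_common_points()
    y = observe(x)                        # query the target at those points
    # Step 2: ordinary model selection on the now-fixed sample.
    return min(model_candidates,
               key=lambda S: estimated_gen_error(S, x, y))
```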
Slide 8: Is It Just Idealistic?

No! Commonly optimal sample points surely exist for trigonometric polynomial models.
Trigonometric polynomial model of order $n$:
$\hat{f}(x) = \theta_1 + \sum_{p=1}^{n} \left( \theta_{2p} \sin px + \theta_{2p+1} \cos px \right)$
From here on, we assume:
- the least mean squares (LMS) estimate, and
- the generalization measure $J_G = E_{\epsilon} \int_{-\pi}^{\pi} \big( \hat{f}(x) - f(x) \big)^2 \, dx$, where $E_{\epsilon}$ denotes the expectation over the noise.
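A minimal sketch of this setup, assuming NumPy's least-squares solver as the LMS estimator; the parameterization follows the model definition above, and the numerical stand-in for $J_G$ is an assumption (one noise realization on a grid, no expectation).

```python
import numpy as np

def design_matrix(x, n):
    """Columns 1, sin(x), cos(x), ..., sin(nx), cos(nx): 2n+1 parameters."""
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols += [np.sin(p * x), np.cos(p * x)]
    return np.stack(cols, axis=1)

def lms_fit(x, y, n):
    """Least mean squares estimate of the coefficients theta."""
    theta, *_ = np.linalg.lstsq(design_matrix(x, n), y, rcond=None)
    return theta

def gen_error(theta, n, f, grid=np.linspace(-np.pi, np.pi, 2001)):
    """Grid approximation of the integral of (fhat - f)^2 over [-pi, pi]."""
    fhat = design_matrix(grid, n) @ theta
    return 2 * np.pi * np.mean((fhat - f(grid)) ** 2)
```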
Slide 9: Theorem

For any trigonometric polynomial model that includes the learning target function, equidistant sampling gives the optimal generalization capability.
1-dimensional input: $M$ sample points $x_1, x_2, \ldots, x_M$ placed equidistantly over $[-\pi, \pi)$ with spacing $2\pi / M$.
Slide 10: Multi-Dimensional Input Cases

[Figure: 2-dimensional input, sample points $x_1, x_2, \ldots$ arranged on a regular grid]

Sampling on a regular grid is optimal.
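The optimal designs from the theorem are easy to construct. A sketch, assuming the input domain is $[-\pi, \pi)$ in each dimension:

```python
import numpy as np

def equidistant(M):
    """M equidistant points on [-pi, pi), spacing 2*pi/M (slide 9)."""
    return -np.pi + 2 * np.pi * np.arange(M) / M

def regular_grid_2d(m_per_axis):
    """Regular grid on [-pi, pi)^2, optimal in the 2-D case (slide 10)."""
    g = equidistant(m_per_axis)
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])
```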
Slide 11: Computer Simulations (Artificial, Realizable)

Learning target function: $f \in S_{50}$ ($S_n$: trigonometric polynomial model of order $n$).
Model candidates: $\mathcal{C} = \{S_0, S_1, S_2, \ldots, S_{100}\}$.
Generalization measure: $J_G = \frac{1}{2\pi} \int_{-\pi}^{\pi} \big( \hat{f}(x) - f(x) \big)^2 \, dx$.
Sampling schemes: equidistant sampling vs. random sampling.
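A rough re-creation of this realizable experiment, under assumed settings (a random target in $S_{50}$, a reduced trial count, and a grid approximation of $J_G$); this is not the authors' code, and it reuses the design-matrix sketch from slide 8.

```python
import numpy as np

rng = np.random.default_rng(0)

def design(x, n):                          # trig design matrix, as on slide 8
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols += [np.sin(p * x), np.cos(p * x)]
    return np.stack(cols, axis=1)

n_true, M, sigma2, trials = 50, 500, 0.02, 10       # assumed settings
theta_true = rng.normal(size=2 * n_true + 1)        # random target f in S_50
f = lambda x: design(x, n_true) @ theta_true
x_test = np.linspace(-np.pi, np.pi, 2000)

def avg_gen_error(x, n):
    """Average over noise realizations of (1/2pi) * integral of (fhat - f)^2."""
    err = 0.0
    for _ in range(trials):
        y = f(x) + rng.normal(0.0, np.sqrt(sigma2), len(x))
        theta, *_ = np.linalg.lstsq(design(x, n), y, rcond=None)
        err += np.mean((design(x_test, n) @ theta - f(x_test)) ** 2)
    return err / trials

x_eq = -np.pi + 2 * np.pi * np.arange(M) / M        # equidistant sampling
x_rd = rng.uniform(-np.pi, np.pi, M)                # random sampling
for n in (25, 50, 75):
    print(n, avg_gen_error(x_eq, n), avg_gen_error(x_rd, n))
```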
Slide 12: Simulation Results (Large Samples)

Number of samples: 500. Noise variance: 0.02 and 0.08.
[Figure: generalization error (vertical) vs. model order (horizontal), averaged over 100 trials]

Equidistant sampling outperforms random sampling for all models!
Slide 13: Simulation Results (Small Samples)

Number of samples: 230. Noise variance: 0.02 and 0.08.
[Figure: generalization error (vertical) vs. model order (horizontal), averaged over 100 trials]

Even with small samples, equidistant sampling performs excellently for all models!
Slide 14: Computer Simulations (Unrealizable)

Interpolate a chaotic series of 600 points (red) from noisy samples (blue).
Model candidates: $\mathcal{C} = \{S_0, S_1, S_2, \ldots, S_{40}\}$ ($S_n$: trigonometric polynomial model of order $n$).
Slide 15: Simulation Results (Unrealizable)

Settings: $(M, \sigma^2) = (100, 0.07)$ and $(M, \sigma^2) = (300, 0.04)$.
[Figure: test error at all 600 points (vertical) vs. model order (horizontal), averaged over 100 trials]

Equidistant sampling outperforms random sampling for all models!
Slide 16: Interpolated Chaotic Series

[Figure: interpolated series after model selection with equidistant sampling]
Selected model: $S_{13}$.
Slide 17: Compared with True Series

[Figure: interpolated series overlaid on the true chaotic series]
We obtained good estimates from sparse data!
Slide 18: Conclusions

Active learning / model selection dilemma: sample points and models can not be simultaneously optimized by simply combining existing active learning and model selection methods.
How to dissolve the dilemma: find sample points that are commonly optimal for all models.
Is it realistic? Commonly optimal sample points surely exist for trigonometric polynomial models: equidistant sampling.
Is it practical? Computer simulations showed that the proposed method works excellently even in unrealizable cases.