LS-SVMlab & Large Scale Modeling
Kristiaan Pelckmans, J.A.K. Suykens, B. De Moor
ESAT-SCD/SISTA, K.U.Leuven
Content
• I. Overview
• II. Classification
• III. Regression
• IV. Unsupervised Learning
• V. Time-series
• VI. Conclusions and Outlooks
Acknowledgements / People

Contributors to LS-SVMlab:
• Kristiaan Pelckmans
• Johan Suykens
• Tony Van Gestel
• Jos De Brabanter
• Lukas Lukas
• Bart Hamers
• Emmanuel Lambert

Supervisors:
• Bart De Moor
• Johan Suykens
• Joos Vandewalle

Our research is supported by grants from several funding agencies and sources: Research Council K.U.Leuven: Concerted Research Action GOA-Mefisto 666 (Mathematical Engineering), IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants; Flemish Government: Fund for Scientific Research FWO Flanders (several PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0080.01 (collective intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power islands), research communities ICCoS, ANMMM), AWI (Bil. Int. Collaboration South Africa, Hungary and Poland), IWT (Soft4s (soft sensors), STWW-Genprom (gene promotor prediction), GBOU-McKnow (knowledge management algorithms), Eureka-Impact (MPC-control), Eureka-FLiTE (flutter modeling), several PhD grants); Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical Systems and Control: Computation, Identification & Modelling), Program Sustainable Development PODO-II (CP-TR-18: Sustainability effects of Traffic Management Systems); Direct contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS is a professor at K.U.Leuven, Belgium, and a postdoctoral researcher with FWO Flanders. BDM and JWDW are full professors at K.U.Leuven, Belgium.
I. Overview
• Goal of the presentation:
  1. Overview & intuition
  2. Demonstration of LS-SVMlab
  3. Pinpoint research challenges
  4. Preparation for NIPS 2002
• Research results and challenges
• Towards applications
• Overview of LS-SVMlab
I.2 Overview of research
"Learning, generalization, extrapolation, identification, smoothing, modeling"
• Prediction (black-box modeling)
• Points of view: Statistical Learning, Machine Learning, Neural Networks, Optimization, SVM
I.2 Type, Target, Topic
I.3 Towards applications
• System identification
• Financial engineering
• Biomedical signal processing
• Data mining
• Bio-informatics
• Text mining
• Adaptive signal processing
I.4 LS-SVMlab
I.4 LS-SVMlab (2)
• Starting points:
  – Modularity
  – Object-oriented & functional interface
  – Basic bricks for advanced research
• Website and tutorial
• Reproducibility (preprocessing)
II. Classification
"Learn the decision function associated with a set of labeled data points to predict the values of unseen data"
• Least Squares Support Vector Machines
• Bayesian framework
• Different norms
• Coding schemes
II.1 Least Squares Support Vector Machines (LS-SVM(L, a))
1. Least squares cost function + regularization (γ) & equality constraints
2. Non-linearity by Mercer kernels K_σ(·,·)
3. Primal-dual interpretation (Lagrange multipliers)

Primal parametric model:     y_i = w^T x_i + b + e_i
Dual non-parametric model:   y_i = Σ_{j=1}^n α_j K_σ(x_j, x_i) + b + e_i
II.1 LS-SVM(L, a)
"Learning representations from relations"

      [ <a_1, a_1>   <a_1, a_2>   ...   <a_1, a_N> ]
Ω  =  [ <a_2, a_1>      ...       ...      ...     ]
      [    ...          ...       ...      ...     ]
      [ <a_N, a_1>      ...       ...   <a_N, a_N> ]
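The dual model above reduces training to a single linear system in (b, α) built from the relation matrix Ω. Below is a minimal numerical sketch of that system in Python, not the LS-SVMlab toolbox itself; the RBF kernel choice, the function names and the plain dense solver are illustrative assumptions.

import numpy as np

def rbf_kernel(X, Z, sigma):
    # K[i, j] = exp(-||X_i - Z_j||^2 / sigma^2): the Mercer kernel K_sigma
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma, sigma):
    # Dual LS-SVM: one linear (KKT) system in (b, alpha):
    #   [ 0       1^T          ] [ b     ]   [ 0 ]
    #   [ 1   Omega + I/gamma  ] [ alpha ] = [ y ]
    # with Omega[i, j] = K_sigma(x_i, x_j).
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                     # b, alpha

def lssvm_predict(X, alpha, b, Xtest, sigma):
    # y_hat(x) = sum_j alpha_j K_sigma(x_j, x) + b
    return rbf_kernel(Xtest, X, sigma) @ alpha + b

For classification with ±1 labels the predicted class is the sign of the model output; for regression the output is used directly.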
II.2 Bayesian Inference
• Bayes rule (MAP): P(θ | X) = P(X | θ) P(θ) / P(X)
• Closed-form formulas; approximations:
  – Hessian in the optimum
  – Gaussian distribution
• Three levels of posteriors:
  Level 1: P(α | γ, K_σ, X)
  Level 2: P(γ | K_σ, X)
  Level 3: P(K_σ | X)
II.3 SVM formulations & norms
• 1-norm + inequality constraints: SVM; extensions to any convex cost function
• 2-norm + equality constraints: LS-SVM; weighted versions
II.4 Coding schemes
Multi-class classification task → (multiple) binary classifiers

Encoding: multi-class labels, e.g. ... 1 2 4 6 2 1 3 ..., are mapped onto binary (±1) label sequences, e.g. ... -1 -1 -1 1 ..., ... 1 -1 -1 -1 ..., ... 1 -1 1 1 ...; decoding maps the binary classifier outputs back to class labels.
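As a concrete instance of such a coding scheme, here is a minimal one-versus-all encoding/decoding sketch using the slide's example labels; the hard ±1 outputs and largest-output decoding are illustrative assumptions, and other codes (e.g. error-correcting output codes) follow the same encode/decode pattern.

import numpy as np

def encode_one_vs_all(y, classes):
    # Encoding: one binary (+1/-1) label sequence per class.
    return np.where(y[:, None] == classes[None, :], 1, -1)

def decode_one_vs_all(outputs, classes):
    # Decoding: the class whose binary classifier responds strongest
    # (minimum Hamming distance for hard +1/-1 outputs).
    return classes[np.argmax(outputs, axis=1)]

y = np.array([1, 2, 4, 6, 2, 1, 3])                # the slide's example labels
classes = np.unique(y)
Y = encode_one_vs_all(y, classes)                  # one +/-1 column per class
assert (decode_one_vs_all(Y, classes) == y).all()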
III. Regression
"Learn the underlying function from a set of data points and its corresponding noisy targets in order to predict the values of unseen data"
• LS-SVM(L, a)
• Cross-validation (CV)
• Bayesian inference
• Robustness
III.1 LS-SVM(L, a)
• Least squares cost function + regularization & equality constraints
• Mercer kernels
• Lagrange multipliers: primal parametric → dual non-parametric
III.1 LS-SVM(L, a) (2)
• Regularization parameter:
  – Do not fit the noise (overfitting)!
  – Trade off noise and information
• Example: f(x) = sinc(x) + sin(10x)/5 + e
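A toy illustration of this trade-off on the slide's test function, reusing lssvm_train/lssvm_predict from the sketch in Section II.1; the noise level, kernel width σ and the γ grid are assumptions chosen for the demonstration (note that np.sinc is the normalized sinc, sin(πx)/(πx)).

import numpy as np
# reuses lssvm_train / lssvm_predict from the Section II.1 sketch

def f(x):
    # the slide's test function: sinc(x) + sin(10x)/5
    return np.sinc(x) + np.sin(10 * x) / 5

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, (100, 1))
y = f(X[:, 0]) + 0.1 * rng.standard_normal(100)    # noisy targets e

Xtest = np.linspace(-3.0, 3.0, 200)[:, None]
for gamma in (0.1, 10.0, 1000.0):                  # small: oversmooths; large: fits the noise
    b, alpha = lssvm_train(X, y, gamma, sigma=0.5)
    yhat = lssvm_predict(X, alpha, b, Xtest, sigma=0.5)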
III.2 Cross-validation (CV)
"How to estimate the generalization power of a model?"
• Division into training set and test set:
  1 2 3 ... t-1 | t ... n
• Repeated division: leave-one-out CV (fast implementation):
  1 2 3 ... t-2 t-1 [t] t+1 t+2 ... n
• L-fold cross-validation:
  1 2 3 ... t-l-1 [t-l ... t+l] t+1+l ... n
• Generalized cross-validation (GCV): the smoother matrix maps targets to fitted values,
  [ŷ_1, ..., ŷ_N]^T = S(γ | X, K_σ) [y_1, ..., y_N]^T
• Complexity criteria: AIC, BIC, ...
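A minimal L-fold cross-validation sketch for tuning γ and σ, again reusing the Section II.1 routines; the fold count, the random splitting and the grids are assumptions, and in practice the toolbox's own tuning and fast leave-one-out implementations would be used instead.

import numpy as np
# reuses lssvm_train / lssvm_predict from the Section II.1 sketch

def lfold_cv_mse(X, y, gamma, sigma, L=10, seed=0):
    # Average held-out mean squared error over L folds.
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for fold in np.array_split(idx, L):
        train = np.setdiff1d(idx, fold)
        b, alpha = lssvm_train(X[train], y[train], gamma, sigma)
        yhat = lssvm_predict(X[train], alpha, b, X[fold], sigma)
        errs.append(np.mean((y[fold] - yhat) ** 2))
    return float(np.mean(errs))

# simple grid search over the tuning parameters (illustrative grids):
# best = min(((g, s) for g in (0.1, 1, 10, 100) for s in (0.2, 0.5, 1.0)),
#            key=lambda p: lfold_cv_mse(X, y, *p))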
III.2 Cross-validation procedure (CVP)
"How to optimize the model for optimal generalization performance?"
• Trade-off between fitting and model complexity
• Kernel parameters
• Optimization routine?
III.1 LS-SVM(L, a) (3)
• Kernel type and parameter
  "Zoölogy as elephantism and non-elephantism"
• Model comparison: by cross-validation or Bayesian inference
III.3 Applications
"OK, but does it work?"
• Soft4s
  – Together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
  – Soft sensor
• ELIA
  – Together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
  – Prediction of short- and long-term electricity consumption
III.2 Bayesian Inference
• Bayes rule (MAP): P(θ | X) = P(X | θ) P(θ) / P(X)
• Closed-form formulas
• Three levels of posteriors:
  Level 1 (model parameters): P(α | γ, K_σ, X)
  Level 2 (regularization): P(γ | K_σ, X)
  Level 3 (model comparison): P(K_σ | X)
III.4 Robustness
"How to build good models in the case of non-Gaussian noise or outliers"
• Influence function
• Breakdown point
• How (see the sketch below):
  – Depreciating the influence of large residuals
  – Mean → trimmed mean → median
• Robust CV, GCV, AIC, ...
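One standard way to depreciate the influence of large residuals is a weighted LS-SVM: fit once, derive per-sample weights from the residuals, then re-solve with weighted regularization. The sketch below follows that recipe; the Hampel-type weight function, the cutoffs c1, c2 and the MAD scale estimate are illustrative assumptions rather than the slide's exact choices.

import numpy as np
# reuses lssvm_train / rbf_kernel from the Section II.1 sketch

def weighted_lssvm(X, y, gamma, sigma, c1=2.5, c2=3.0):
    # Robust (weighted) LS-SVM, one reweighting step:
    # 1) initial fit; residuals follow from the duality relation e_i = alpha_i / gamma
    b, alpha = lssvm_train(X, y, gamma, sigma)
    e = alpha / gamma
    # 2) robust scale estimate (MAD) and Hampel-type weights that
    #    depreciate the influence of large residuals
    s = 1.483 * np.median(np.abs(e - np.median(e)))
    r = np.abs(e) / s
    v = np.where(r <= c1, 1.0, np.where(r <= c2, (c2 - r) / (c2 - c1), 1e-8))
    # 3) re-solve the KKT system with per-sample regularization 1/(gamma * v_i)
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                     # b, alpha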
IV. Unsupervised Learning
"Extract important features from the unlabeled data"
• Kernel PCA and related methods
• Nyström approximation
  – From dual to primal
  – Fixed-size LS-SVM
IV.1 Kernel PCA
[Figure: principal component analysis vs. kernel-based PCA (x, y, z axes)]
IV.2 Kernel PCA (2)
• Primal-dual LS-SVM style formulations
• For kernel PCA, CCA, PLS
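A minimal kernel PCA sketch in that spirit (eigendecomposition of the centered kernel matrix), reusing rbf_kernel from Section II.1; normalization conventions differ across formulations, and the one below is just one reasonable choice.

import numpy as np
# reuses rbf_kernel from the Section II.1 sketch

def kernel_pca(X, sigma, n_components):
    # Kernel PCA: eigendecomposition of the centered kernel matrix.
    n = len(X)
    C = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kc = C @ rbf_kernel(X, X, sigma) @ C
    lam, V = np.linalg.eigh(Kc)                    # ascending eigenvalues
    lam = lam[::-1][:n_components]
    V = V[:, ::-1][:, :n_components]
    scores = V * np.sqrt(np.maximum(lam, 0.0))     # nonlinear principal components
    return scores, lam, V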
IV.2 Nyström approximation
• Sampling of the integral equation:
  ∫ K_σ(x, y) φ_i(x) p(x) dx = λ_i φ_i(y)
  ⇓
  Σ_{j=1}^N K_σ(x_j, y) φ_i(x_j) = λ_i φ_i(y)   ≈   Σ_{j=1}^n K_σ(x_j, y) φ_i(x_j) = λ_i φ_i(y)
• Approximating the feature map φ(·) for the Mercer kernel K_σ(x, y) = φ(x)^T φ(y)
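A numerical sketch of the resulting approximate feature map, built from an eigendecomposition on a subsample of n "landmark" points and reusing rbf_kernel from Section II.1; the landmark selection and the eigenvalue cutoff are assumptions.

import numpy as np
# reuses rbf_kernel from the Section II.1 sketch

def nystrom_features(X, landmarks, sigma):
    # Approximate feature map phi_tilde such that
    # phi_tilde(x)^T phi_tilde(y) ~= K_sigma(x, y),
    # built from an eigendecomposition on n landmark points.
    Knn = rbf_kernel(landmarks, landmarks, sigma)
    lam, U = np.linalg.eigh(Knn)
    keep = lam > 1e-10                             # drop numerically zero modes
    lam, U = lam[keep], U[:, keep]
    return rbf_kernel(X, landmarks, sigma) @ U / np.sqrt(lam)

# usage: Phi = nystrom_features(X, X[:50], sigma); then Phi @ Phi.T ~= K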
IV.3 Fixed Size LS-SVM

y_i = w^T φ(x_i) + b + e_i   →?   y_i = Σ_{j=1}^n α_j K_σ(x_j, x_i) + b + e_i

Can the dual (kernel) model be mapped back to the primal one? With the Nyström-approximated feature map φ(·), w and b can be estimated directly in the primal (see the sketch below).
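Putting the pieces together: a fixed-size LS-SVM sketch that estimates w and b directly in the primal on the Nyström features from the previous sketch, so the cost scales with the number of landmarks rather than with the full sample size; the ridge formulation with an unpenalized bias term is an assumed, standard choice.

import numpy as np
# reuses nystrom_features from the Section IV.2 sketch

def fixed_size_lssvm(X, y, landmarks, gamma, sigma):
    # Primal LS-SVM on the approximate feature map:
    # min_{w,b}  (1/2) w^T w + (gamma/2) sum_i (y_i - w^T phi(x_i) - b)^2
    Phi = np.hstack([nystrom_features(X, landmarks, sigma),
                     np.ones((len(X), 1))])        # bias column appended
    d = Phi.shape[1]
    R = np.eye(d) / gamma
    R[-1, -1] = 0.0                                # do not penalize the bias
    wb = np.linalg.solve(Phi.T @ Phi + R, Phi.T @ y)
    return wb[:-1], wb[-1]                         # w, b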
V. Time-series
"Learn to predict future values given a sequence of past values"
• NARX
• Recurrent vs. feedforward
V.1 NARX

ŷ_t = f(y_{t-1}, y_{t-2}, ..., y_{t-l})

• Reducible to static regression: slide a window ..., y_{t+1}, y_{t+2}, y_{t+3}, y_{t+4}, y_{t+5}, ... over the series and apply f (see the sketch below)
• CV and complexity criteria
• Predicting in recurrent mode
• Fixed-size LS-SVM (sparse representation)
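A sketch of that reduction and of recurrent-mode prediction; the lag-matrix layout and the generic predict_fn hook (for instance a fixed-size LS-SVM trained on the output of lag_matrix) are illustrative assumptions.

import numpy as np

def lag_matrix(y, l):
    # Static regression view of the series: each row is
    # [y_{t-1}, ..., y_{t-l}] and the target is y_t.
    X = np.column_stack([y[l - k - 1 : len(y) - k - 1] for k in range(l)])
    return X, y[l:]

def predict_recurrent(predict_fn, history, l, steps):
    # Recurrent mode: feed each prediction back in as an input
    # (as opposed to one-step-ahead prediction on measured values).
    buf = list(history)
    out = []
    for _ in range(steps):
        x = np.array(buf[-l:][::-1])       # [y_{t-1}, ..., y_{t-l}]
        yhat = float(predict_fn(x[None, :]))
        buf.append(yhat)
        out.append(yhat)
    return np.array(out)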
V.1 NARX (2)
Santa Fe time-series competition