
Applied Machine Learning: Timon Schroeter, Konrad Rieck, Sören Sonnenburg



  1. Applied Machine Learning. Timon Schroeter, Konrad Rieck, Sören Sonnenburg. Intelligent Data Analysis Group, Fraunhofer FIRST, http://ida.first.fhg.de/. 22C3, Berlin, 27.12.2005

  2. Roadmap • Some Background • SVMs & Kernels • Applications. Rationale: let computers learn, to allow humans • to automate processes • to understand highly complex data

  3. Example: Spam Classification

     Email 1:
     From: manfred@cse.ucsc.edu
     Subject: ML Positions in Santa Cruz
     Date: 4. December 2004 06:00:37 MEZ
     We have a Machine Learning position at Computer Science Department of the University of California at Santa Cruz (at the assistant, associate or full professor level). Current faculty members in related areas: Machine Learning: DAVID HELMBOLD and MANFRED WARMUTH Artificial Intelligence: BOB LEVINSON DAVID HAUSSLER was one of the main ML researchers in our department. He now has launched the new Biomolecular Engineering department at Santa Cruz There is considerable synergy for Machine Learning at Santa Cruz: -New department of Applied Math and Statistics with an emphasis on Bayesian Methods http://www.ams.ucsc.edu/ -- New department of Biomolecular Engineering http://www.cbse.ucsc.edu/ …

     Email 2:
     From: smartballlottery@hf-uk.org
     Subject: Congratulations
     Date: 16. December 2004 02:12:54 MEZ
     LOTTERY COORDINATOR, INTERNATIONAL PROMOTIONS/PRIZE AWARD DEPARTMENT. SMARTBALL LOTTERY, UK. DEAR WINNER, WINNER OF HIGH STAKES DRAWS Congratulations to you as we bring to your notice, the results of the the end of year, HIGH STAKES DRAWS of SMARTBALL LOTTERY UNITED KINGDOM. We are happy to inform you that you have emerged a winner under the HIGH STAKES DRAWS SECOND CATEGORY,which is part of our promotional draws. The draws were held on15th DECEMBER 2004 and results are being officially announced today. Participants were selected through a computer ballot system drawn from 30,000 names/email addresses of individuals and companies from Africa, America, Asia, Australia,Europe, Middle East, and Oceania as part of our International Promotions Program. …

     Goal: Classify emails into spam / no spam. How? Learn from previously labeled emails! • Training: analyze previous emails • Application: classify new emails
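The training/application split on the spam slide can be sketched in a few lines. The sketch below is an illustration only: it assumes a bag-of-words representation and a simple perceptron, which the slide does not prescribe, and the four training texts are hypothetical stand-ins for previously labeled emails (+1 = spam, -1 = no spam).

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train_perceptron(emails, epochs=10):
    """Training: analyze previously labeled emails and learn one weight per word."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in emails:
            words = tokenize(text)
            score = bias + sum(weights[w] for w in words)
            if label * score <= 0:  # misclassified (or undecided): update
                for w in words:
                    weights[w] += label
                bias += label
    return weights, bias

def classify(model, text):
    """Application: classify a new email as spam (+1) or no spam (-1)."""
    weights, bias = model
    score = bias + sum(weights.get(w, 0.0) for w in tokenize(text))
    return 1 if score > 0 else -1

# Hypothetical labeled emails, loosely modeled on the two examples above.
labeled_emails = [
    ("congratulations winner of lottery prize draws", 1),
    ("lottery promotions prize award for winner", 1),
    ("machine learning position at the university", -1),
    ("new department of applied math and statistics", -1),
]

model = train_perceptron(labeled_emails)
spam_guess = classify(model, "dear winner lottery prize")
ham_guess = classify(model, "machine learning department position")
```

With real mail the feature extraction and the learner would both need more care; the point here is only the two phases, learning from labeled examples and then predicting labels for unseen ones.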

  4. Problem Formulation (Figure: example apples labeled Natural (+1) or Plastic (-1), plus one unlabeled example "?") The “World”: • Data: Pairs (x, y) • Feature vector x • Individual features e.g. x ∈ ℝ • e.g. Volume, Mass, RGB-Channels • Labels y ∈ {+1, -1} • Unknown Target Function y = f(x) • Unknown Distribution x ~ p(x) • Objective: Given new x predict y

  5. Premises for Machine Learning • Supervised Machine Learning • Observe N training examples with label • Learn function • Predict label of unseen example • Examples generated from statistical process • Relationship between features and label • Assumption: unseen examples are generated from same or similar process

  6. Problem Formulation

  7. Problem Formulation • Want model to generalize • Need to find a good level of complexity (Figures: a fit of y against x; training and test error against model complexity) • In practice e.g. model / parameter selection via cross-validation
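Model selection via cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions: the learner is a k-nearest-neighbour classifier on a hypothetical 1-D toy dataset (neither is taken from the slides), and the neighbourhood size k plays the role of the complexity parameter, with small k giving a more flexible model.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (labels are +1/-1).
    Distance ties are broken in favour of the smaller x value."""
    nearest = sorted(train, key=lambda pair: (abs(pair[0] - x), pair[0]))[:k]
    return 1 if sum(label for _, label in nearest) > 0 else -1

def cv_error(data, k, n_folds=3):
    """Fraction of points misclassified when each point is held out exactly once."""
    mistakes = 0
    for fold in range(n_folds):
        test = [p for i, p in enumerate(data) if i % n_folds == fold]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        mistakes += sum(knn_predict(train, x, k) != y for x, y in test)
    return mistakes / len(data)

# Hypothetical toy data: label -1 below x = 6, +1 from 6 upwards ...
data = [(x, -1 if x < 6 else 1) for x in range(12)]
data[2] = (2, 1)  # ... plus one deliberately mislabeled point (label noise)

# Estimate the held-out error for each candidate and keep the best one.
errors = {k: cv_error(data, k) for k in (1, 3, 7)}
best_k = min(errors, key=errors.get)
```

On this toy set the most flexible model (k = 1) is hurt by the noisy point and the stiffest one (k = 7) cannot follow the class boundary, so the held-out error picks the value in between.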

  8. Example: Natural vs. Plastic Apples

  9. Example: Natural vs. Plastic Apples

  10. Linear Separation (Figure: scatter plot with axes property 1 and property 2)

  11. Linear Separation (Figure: scatter plot with axes property 1 and property 2 and an unlabeled example "?")

  12. Linear Separation with Margins (Figures: two scatter plots with axes property 1 and property 2 and an unlabeled example "?", one with the margin marked) large margin => good generalization
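To make "margin" concrete, the sketch below computes the geometric margin of a separating line, i.e. the distance from the line to the closest training example. The four points and the two candidate lines are hypothetical values chosen for illustration, not data from the talk.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest value of y * (w.x + b) / ||w|| over all examples.
    Positive only if the hyperplane (w, b) separates the two classes."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical 2-D examples: (property 1, property 2) with labels +1 / -1.
points = [(2, 2), (3, 3), (0, 0), (1, 0)]
labels = [1, 1, -1, -1]

# Two lines that both separate the classes, with different margins.
margin_a = geometric_margin((1, 1), -2.5, points, labels)  # diagonal line
margin_b = geometric_margin((1, 0), -1.5, points, labels)  # vertical line
```

Both lines classify the training points correctly, but the diagonal one keeps every point at least about 1.06 units away while the vertical one passes within 0.5 units of two points; the large-margin idea prefers the former.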

  13. Large Margin Separation. Idea: • Find hyperplane that maximizes margin • Use for prediction (Figure: separating hyperplane with the margin marked) Solution: • Linear combination of examples • many α's are zero • Support Vector Machines → Demo
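For reference, the large-margin idea is usually written as the following optimization problem. This is the standard textbook hard-margin formulation, given here as a sketch; the notation (w, b, α_i) is an assumption and may differ from what the slide showed.

```latex
\begin{aligned}
&\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad \text{subject to} \quad
  y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \quad i = 1,\dots,N \\
&\mathbf{w} = \sum_{i=1}^{N} \alpha_i\, y_i\, \mathbf{x}_i, \qquad \alpha_i \ge 0 \\
&f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i\, y_i\, \langle \mathbf{x}_i, \mathbf{x} \rangle + b \Big)
\end{aligned}
```

Minimizing the norm of w maximizes the distance 2/‖w‖ between the two margin boundaries. The second line is the "linear combination of examples"; only the examples with α_i > 0, the support vectors, contribute to the prediction in the third line.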

  14. Example: Polynomial Kernel
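Since only the slide title survives here, the following sketch illustrates what a polynomial kernel does, under the assumption of the homogeneous degree-2 kernel k(x, z) = (x·z)² on 2-D inputs: it returns the same number as an ordinary dot product in a space of all degree-2 products of the input features, without ever building that space. The function names and example vectors are illustrative.

```python
import math

def poly_kernel(x, z, degree=2):
    """Homogeneous polynomial kernel: (x . z) ** degree."""
    return sum(a * b for a, b in zip(x, z)) ** degree

def phi(x):
    """Explicit degree-2 feature map for a 2-D input: all products of two features."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)

kernel_value = poly_kernel(x, z)                                # (1*3 + 2*4) ** 2
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))     # dot product of feature maps
```

Both routes give 121 (up to floating-point rounding), but the kernel needs only one dot product in the original space, which matters once the inputs have hundreds of dimensions and the degree grows.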

  15. Support Vector Machines • Demo: Gaussian Kernel • Many other algorithms can use kernels • Many other application specific kernels
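As an illustration of "many other algorithms can use kernels", the sketch below plugs a Gaussian kernel into a kernel perceptron rather than an SVM. The choice of algorithm, the XOR-style toy data and the parameter `gamma` are assumptions made for this example, not material from the demo.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision(points, labels, alphas, kernel, x):
    """Kernel expansion f(x) = sum_i alpha_i * y_i * k(x_i, x)."""
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))

def train_kernel_perceptron(points, labels, kernel, epochs=10):
    """Perceptron in dual form: alpha_i counts the mistakes made on example i."""
    alphas = [0] * len(points)
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision(points, labels, alphas, kernel, x) <= 0:
                alphas[i] += 1
    return alphas

# XOR-style toy data: no straight line separates the two classes.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [-1, -1, 1, 1]

alphas = train_kernel_perceptron(points, labels, gaussian_kernel)
predictions = [
    1 if decision(points, labels, alphas, gaussian_kernel, x) > 0 else -1
    for x in points
]
```

The learning rule never looks at the coordinates directly, only at kernel values between pairs of examples, which is why swapping in a different kernel changes the shape of the decision boundary without changing the algorithm.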

  16. Capabilities of Current Techniques • Theoretically & algorithmically well understood: Classification with few classes; Regression (real valued); Novelty / Anomaly Detection • Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research: Complex objects; Many classes; Complex learning setup (active learning); Prediction of complex properties

  21. Many Applications • Handwritten Letter/Digit recognition • Gene Finding • Drug Discovery • Brain-Computer Interfacing • Intrusion Detection Systems (unsupervised) • Document Classification (by topic, spam mails) • Face/Object detection in natural scenes • Non-Intrusive Load Monitoring of electric appliances • Company Fraud Detection (Questionnaires) • Fake Interviewer identification (e.g. in social studies) • Optimized Disk caching strategies • Speaker recognition (e.g. on tapped phone lines) • …

  22. Will discuss in more Detail: • Handwritten Letter/Digit recognition • Drug Discovery • Fun examples • Gene Finding • Brain-Computer Interfacing. Want to try this at home? • Libsvm (C++) http://www.csie.ntu.edu.tw/~cjlin/libsvm/ • Torch (Java, C++) http://torch.ch • Numarray (Python) http://sourceforge.net/projects/numpy

  23. MNIST Benchmark: SVM with polynomial kernel (considers d-th order correlations of pixels)

  24. MNIST Error Rates
