  1. Machine Learning in Science and Engineering Gunnar Rätsch Friedrich Miescher Laboratory Max Planck Society Tübingen, Germany http://www.tuebingen.mpg.de/~raetsch 1 Gunnar Rätsch Machine Learning in Science and Engineering CCC Berlin, December 27, 2004

  2. Roadmap • Motivating Examples • Some Background • Boosting & SVMs • Applications Rationale: Let computers learn to automate processes and to understand highly complex data

  3. Example 1: Spam Classification Two emails shown side by side on the slide:
Email A (From: manfred@cse.ucsc.edu, Subject: ML Positions in Santa Cruz, Date: 4. December 2004 06:00:37 MEZ): We have a Machine Learning position at the Computer Science Department of the University of California at Santa Cruz (at the assistant, associate or full professor level). Current faculty members in related areas: Machine Learning: DAVID HELMBOLD and MANFRED WARMUTH; Artificial Intelligence: BOB LEVINSON. DAVID HAUSSLER was one of the main ML researchers in our department. He has now launched the new Biomolecular Engineering department at Santa Cruz. There is considerable synergy for Machine Learning at Santa Cruz: a new department of Applied Math and Statistics with an emphasis on Bayesian Methods (http://www.ams.ucsc.edu/) and a new department of Biomolecular Engineering (http://www.cbse.ucsc.edu/). …
Email B (From: smartballlottery@hf-uk.org, Subject: Congratulations, Date: 16. December 2004 02:12:54 MEZ): LOTTERY COORDINATOR, INTERNATIONAL PROMOTIONS/PRIZE AWARD DEPARTMENT, SMARTBALL LOTTERY, UK. DEAR WINNER, WINNER OF HIGH STAKES DRAWS. Congratulations to you as we bring to your notice the results of the end-of-year HIGH STAKES DRAWS of SMARTBALL LOTTERY UNITED KINGDOM. We are happy to inform you that you have emerged a winner under the HIGH STAKES DRAWS SECOND CATEGORY, which is part of our promotional draws. The draws were held on 15th DECEMBER 2004 and results are being officially announced today. Participants were selected through a computer ballot system drawn from 30,000 names/email addresses of individuals and companies from Africa, America, Asia, Australia, Europe, Middle East, and Oceania as part of our International Promotions Program. …
Goal: Classify emails into spam / no spam. How? Learn from previously classified emails! Training: analyze previous emails. Application: classify new emails.

  4. Example 2: Drug Design [Figure: a chemist sorts candidate compounds, shown as chemical structures, into Actives and Inactives]

  5. The Drug Design Cycle [Figure: the design cycle in which a chemist classifies compounds into Actives and Inactives; former CombiChem technology]

  6. The Drug Design Cycle [Figure: the same cycle with the chemist replaced by Machine Learning; former CombiChem technology]

  7. Example 3: Face Detection

  8. Premises for Machine Learning • Supervised Machine Learning • Observe N training examples (x_i, y_i) with labels y_i • Learn a function g mapping x to y • Predict the label of an unseen example • Examples are generated by a statistical process • Relationship between features and label • Assumption: unseen examples are generated from the same or a similar process

  9. Problem Formulation [Figure: natural vs. plastic apples labelled +1 and -1, plus an unseen example marked ?] The “World”: • Data • Unknown Target Function • Unknown Distribution • Objective Problem: the distribution generating the data is unknown
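The slide's formulas did not survive extraction, so the “World” above is restated here in the standard statistical-learning notation (the symbols are assumed, not copied from the deck):

```latex
% Training data, drawn i.i.d. from an unknown distribution:
(x_1, y_1), \dots, (x_N, y_N) \sim P(x, y), \qquad y_i \in \{-1, +1\}
% Objective: find a function g minimising the expected misclassification risk
R[g] = \int \mathbf{1}\!\left[ g(x) \neq y \right] \, dP(x, y)
% Problem: R[g] cannot be evaluated directly, because P is unknown.
```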

  10. Problem Formulation

  11. Example: Natural vs. Plastic Apples

  12. Example: Natural vs. Plastic Apples

  13. Example: Natural vs. Plastic Apples

  14. AdaBoost (Freund & Schapire, 1996) • Idea: • Use many simple “rules of thumb” • Simple hypotheses are not perfect! • Combining hypotheses => increased accuracy • Problems • How to generate different hypotheses? • How to combine them? • Method • Compute a distribution on the examples • Find a hypothesis on the weighted sample • Combine hypotheses linearly: f(x) = Σ_t α_t h_t(x)

  15. Boosting: 1st iteration (simple hypothesis)

  16. Boosting: recompute weighting

  17. Boosting: 2nd iteration

  18. Boosting: 2nd hypothesis

  19. Boosting: recompute weighting

  20. Boosting: 3rd hypothesis

  21. Boosting: 4th hypothesis

  22. Boosting: combination of hypotheses

  23. Boosting: decision

  24. AdaBoost Algorithm

  25. AdaBoost algorithm • Combination of • Decision stumps/trees • Neural networks • Heuristic rules • Further reading • http://www.boosting.org • http://www.mlss.cc

  26. Linear Separation [Figure: data points from two classes plotted against property 1 and property 2]

  27. Linear Separation [Figure: several candidate separating lines; which one to choose? Axes: property 1, property 2]

  28. Linear Separation with Margins [Figure: separating lines with and without a large margin; axes property 1 and property 2] large margin => good generalization

  29. Large Margin Separation Idea: • Find the hyperplane w·x + b = 0 that maximizes the margin • Use sign(w·x + b) for prediction Solution: • w is a linear combination of the examples, w = Σ_i α_i y_i x_i • many α’s are zero (the examples with non-zero α are the support vectors) • Support Vector Machines Demo

  30. Kernel Trick [Figure: a boundary that is linear in feature space corresponds to a non-linear boundary in input space; panel captions: linear in input space, non-linear in input space, linear in feature space]

  31. Example: Polynomial Kernel

  32. Support Vector Machines • Demo: Gaussian Kernel • Many other algorithms can use kernels • Many other application-specific kernels

  33. Capabilities of Current Techniques • Theoretically & algorithmically well understood: • Classification with few classes • Regression (real valued) • Novelty Detection Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research • Complex objects • Many classes • Complex learning setup (active learning) • Prediction of complex properties

