Twelve Key Ideas in Machine Learning
Pedro Domingos
Dept. of Computer Science & Engineering, University of Washington
Traditional Programming vs. Machine Learning
- Traditional programming: Data + Program → Computer → Output
- Machine learning: Data + Output → Computer → Program
Example: Classification
- Classifier
  - Input: vector of discrete/numeric values (features)
  - Output: class
  - Example: spam filter
- Learner
  - Input: training set of (input, output) examples
  - Output: classifier
- Test: predictions on new examples (see the sketch below)
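A minimal sketch of the learner/classifier split, using scikit-learn; the word-count features and toy labels are illustrative assumptions, not from the slides.

```python
# Minimal learner/classifier sketch (toy, hand-made data).
from sklearn.naive_bayes import MultinomialNB

# Training set: each row is a feature vector (e.g., counts of three
# hypothetical words in an email); each label is the class (1 = spam).
X_train = [[3, 0, 1],
           [0, 2, 0],
           [4, 1, 0],
           [0, 0, 3]]
y_train = [1, 0, 1, 0]

learner = MultinomialNB()                     # the learner
classifier = learner.fit(X_train, y_train)    # its output: a classifier

# Test: predictions on new, previously unseen examples.
X_new = [[2, 0, 0], [0, 1, 2]]
print(classifier.predict(X_new))
```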
1. Learning = Representation + Evaluation + Optimization
- Thousands of learning algorithms
- Combinations of just three elements (one combination is sketched below):

  Representation     Evaluation         Optimization
  Instances          Accuracy           Greedy search
  Hyperplanes        Precision/Recall   Branch & bound
  Decision trees     Squared error      Gradient descent
  Sets of rules      Likelihood         Quasi-Newton
  Neural networks    Posterior prob.    Linear progr.
  Graphical models   Margin             Quadratic progr.
  Etc.               Etc.               Etc.
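A sketch of one concrete combination from the table, on synthetic data: representation = hyperplane (linear model), evaluation = squared error, optimization = gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)                                # representation: y ≈ X w
lr = 0.1
for _ in range(200):                           # optimization: gradient descent
    grad = 2 * X.T @ (X @ w - y) / len(y)      # gradient of the squared error
    w -= lr * grad

print("learned weights:", w)
print("squared error:", np.mean((X @ w - y) ** 2))   # evaluation
```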
2. It's Generalization that Counts
- Test examples have never been seen before
- Training examples can simply be memorized
- Set data aside to test
- Don't tune parameters on test data
- Use cross-validation (see the sketch below)
- We have no access to the quantity we actually want to optimize (test error)
- A local optimum may be fine
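A sketch of holding out a test set and cross-validating on the rest, with scikit-learn; the dataset and model are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Set data aside for a final test; never tune on it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000)

# Cross-validation on the training data only, for tuning and model selection.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy:", cv_scores.mean())

# The test set is touched exactly once, at the very end.
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```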
3. Data Alone Is Not Enough
- Classes of unseen examples are arbitrary
- So the learner must make assumptions
- "No free lunch" theorems
- Luckily, the real world is not random
- Induction is a knowledge lever
4. Overfitting Has Many Faces
- Overfitting = hallucinating patterns = the chosen classifier is not the best one on test data
- The biggest problem in machine learning
- Bias and variance
- Less powerful learners can be better
- Solutions (sketched below):
  - Cross-validation
  - Regularization
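A sketch of regularization as a guard against overfitting: an unregularized versus an L2-regularized fit of high-degree polynomial features to noisy synthetic data. The data, degree, and alpha are illustrative assumptions; typically the regularized fit generalizes better here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + 0.3 * rng.normal(size=30)
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

for name, reg in [("no regularization", LinearRegression()),
                  ("ridge (alpha=0.01)", Ridge(alpha=0.01))]:
    # Degree-15 polynomial on 30 points: powerful enough to hallucinate patterns.
    model = make_pipeline(PolynomialFeatures(degree=15), reg)
    model.fit(X, y)
    test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: test MSE = {test_mse:.3f}")
```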
5. Intuition Fails in High Dimensions
- Curse of dimensionality
- Sparseness worsens exponentially with the number of features
- Irrelevant features ruin similarity
- In high dimensions, all examples look alike (see the sketch below)
- 3-D intuitions do not apply in high dimensions
- Blessing of non-uniformity
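A small numerical sketch of "all examples look alike": on random uniform data, the ratio of the nearest to the farthest pairwise distance approaches 1 as the dimension grows, so distances stop discriminating.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(200, d))     # 200 random points in the unit cube
    dists = pdist(X)                   # all pairwise Euclidean distances
    print(f"d={d:4d}  nearest/farthest distance ratio = "
          f"{dists.min() / dists.max():.2f}")
```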
6. Theoretical Guarantees Are Not What They Seem
- Bounds on the number of examples needed to ensure good generalization
- Extremely loose
- Low training error does not imply low test error
- Asymptotic guarantees may be misleading
- Theory is useful for algorithm design, not evaluation
7. Feature Engineering Is the Key
- Most of the effort in ML projects goes into constructing features (see the sketch below)
- Black art: intuition and creativity required
- ML is an iterative process
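A sketch of hand-constructed features for the spam-filter example: turning raw email text into a feature vector. The particular features are illustrative guesses, not from the slides.

```python
import re

def email_features(text: str) -> list:
    """Construct a small, hand-designed feature vector from raw email text."""
    words = text.lower().split()
    return [
        len(words),                                              # message length
        sum(w in {"free", "winner", "prize"} for w in words),    # spammy words
        text.count("!"),                                         # exclamation marks
        len(re.findall(r"https?://", text)),                     # embedded links
        sum(w.isupper() for w in text.split()),                  # SHOUTED words
    ]

print(email_features("FREE entry!! Click http://example.com to claim your prize"))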
8. More Data Beats a Cleverer Algorithm
- Easiest way to improve: more data (learning-curve sketch below)
- Then: data was the bottleneck
- Now: scalability is the bottleneck
- ML algorithms are more similar than they appear
- Clever algorithms require more effort, but can pay off in the end
- The biggest bottleneck is human time
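A sketch of a learning curve, showing how accuracy improves as the same simple learner gets more training data; the dataset and model are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Cross-validated accuracy of a simple learner at increasing training-set sizes.
sizes, train_scores, val_scores = learning_curve(
    GaussianNB(), X, y, train_sizes=[0.1, 0.3, 0.5, 1.0], cv=5)

for n, s in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:4d} training examples -> CV accuracy {s:.2f}")
```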
9. Learn Many Models, Not Just One
- Three stages of machine learning:
  1. Try variations of one algorithm, choose the best
  2. Try variations of many algorithms, choose the best
  3. Combine many algorithms and variations
- Ensemble techniques (sketched below):
  - Bagging
  - Boosting
  - Stacking
  - Etc.
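A sketch of bagging, boosting, and stacking with scikit-learn; the base learners and dataset are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)

X, y = load_breast_cancer(return_X_y=True)

ensembles = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    "boosting": GradientBoostingClassifier(n_estimators=100),
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("lr", LogisticRegression(max_iter=5000))],
        final_estimator=LogisticRegression(max_iter=5000)),
}

for name, model in ensembles.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: CV accuracy {score:.3f}")
```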
10. Simplicity Does Not Imply Accuracy
- Occam's razor
- Common misconception: simpler classifiers are more accurate
- Contradicts the "no free lunch" theorems
- Counterexamples: ensembles, SVMs, etc.
- We can make preferred hypotheses shorter
11. Representable Does Not Imply Learnable
- Standard claim: "My language can represent/approximate any function"
- Not an excuse for ignoring other representations
- Causes of non-learnability:
  - Not enough data
  - Not enough components
  - Not enough search
- Some representations are exponentially more compact than others
12. Correlation Does Not Imply Causation
- Predictive models are guides to action
- They are often interpreted causally
- Observational vs. experimental data
- Correlation → further investigation (see the sketch below)
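A tiny synthetic sketch of why correlation alone warrants further investigation rather than causal claims: a hidden confounder makes two unrelated variables correlate.

```python
import numpy as np

rng = np.random.default_rng(0)
confounder = rng.normal(size=10_000)            # e.g., season, demographics
a = confounder + rng.normal(size=10_000)        # driven by the confounder
b = confounder + rng.normal(size=10_000)        # also driven by the confounder

# a and b are clearly correlated even though neither causes the other.
print("corr(a, b) =", np.corrcoef(a, b)[0, 1])
```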
To Learn More
- Article: P. Domingos, "A Few Useful Things to Know About Machine Learning," Communications of the ACM, October 2012 (free version on my Web page)
- Online course: https://www.coursera.org/course/machlearning