Introduction to Machine Learning
Rob Schapire, Princeton University
www.cs.princeton.edu/~schapire
Machine Learning
• studies how to automatically learn to make accurate predictions based on past observations
• classification problems:
  • classify examples into given set of categories
  • [diagram: labeled training examples → machine learning algorithm → classification rule; new example + rule → predicted classification]
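A minimal sketch of that pipeline in code; the 1-nearest-neighbor learner and the toy data below are illustrative assumptions, not part of the talk:

```python
# Tiny, self-contained illustration of the pipeline on the slide:
# labeled training examples -> learning algorithm -> classification rule,
# which is then applied to a new example.

def learn_nearest_neighbor(examples, labels):
    """Learning algorithm: returns a classification rule (a function)."""
    def rule(new_example):
        # predict the label of the closest training example
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        closest = min(range(len(examples)), key=lambda i: dist(examples[i], new_example))
        return labels[closest]
    return rule

# past observations (feature vectors) with known categories
train_x = [(1.0, 1.0), (0.9, 1.2), (3.0, 3.1), (3.2, 2.9)]
train_y = ["A", "A", "B", "B"]

rule = learn_nearest_neighbor(train_x, train_y)   # training
print(rule((3.1, 3.0)))                           # predicted classification: "B"
```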
Examples of Classification Problems
• bioinformatics
  • classify proteins according to their function
  • predict if patient will respond to particular drug/therapy based on microarray profiles
  • predict if molecular structure is a small-molecule binding site
• text categorization (e.g., spam filtering)
• fraud detection
• optical character recognition
• machine vision (e.g., face detection)
• natural-language processing (e.g., spoken language understanding)
• market segmentation (e.g.: predict if customer will respond to promotion)
Characteristics of Modern Machine Learning
• primary goal: highly accurate predictions on test data
• goal is not to uncover underlying “truth”
• methods should be general purpose, fully automatic and “off-the-shelf”
• however, in practice, incorporation of prior, human knowledge is crucial
• rich interplay between theory and practice
• emphasis on methods that can handle large datasets
Why Use Machine Learning?
• advantages:
  • often much more accurate than human-crafted rules (since data driven)
  • humans often incapable of expressing what they know (e.g., rules of English, or how to recognize letters), but can easily classify examples
  • automatic method to search for hypotheses explaining data
  • cheap and flexible: can apply to any learning task
• disadvantages:
  • need a lot of labeled data
  • error prone: usually impossible to get perfect accuracy
  • often difficult to discern what was learned
This Talk
• conditions for accurate learning
• two state-of-the-art algorithms:
  • boosting
  • support-vector machines
Conditions for Accurate Learning
Example: Good versus Evil
• problem: identify people as good or bad from their appearance

  training data    sex     mask  cape  tie  ears  smokes  class
    batman         male    yes   yes   no   yes   no      Good
    robin          male    yes   yes   no   no    no      Good
    alfred         male    no    no    yes  no    no      Good
    penguin        male    no    no    yes  no    yes     Bad
    catwoman       female  yes   no    no   yes   no      Bad
    joker          male    no    no    no   no    no      Bad

  test data
    batgirl        female  yes   yes   no   yes   no      ??
    riddler        male    yes   no    no   no    no      ??
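The same table can be fed to an off-the-shelf learner; the 0/1 encoding and the use of scikit-learn's decision tree are my own illustrative choices (the slide names no implementation):

```python
# Good-versus-Evil table encoded as 0/1 features; fit a decision tree on the
# six training rows and query it on the two test rows.
from sklearn.tree import DecisionTreeClassifier

features = ["sex", "mask", "cape", "tie", "ears", "smokes"]  # sex: male=1, female=0

#            sex mask cape tie ears smokes
X_train = [[1, 1, 1, 0, 1, 0],   # batman
           [1, 1, 1, 0, 0, 0],   # robin
           [1, 0, 0, 1, 0, 0],   # alfred
           [1, 0, 0, 1, 0, 1],   # penguin
           [0, 1, 0, 0, 1, 0],   # catwoman
           [1, 0, 0, 0, 0, 0]]   # joker
y_train = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

X_test = [[0, 1, 1, 0, 1, 0],    # batgirl
          [1, 1, 0, 0, 0, 0]]    # riddler

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.predict(X_test))     # the learned rule's guesses for batgirl and riddler
```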
An Example Classifier

  tie?
    yes → smokes?
            yes → bad
            no  → good
    no  → cape?
            yes → good
            no  → bad
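Written out as code, this rule is just nested tests on three attributes; the transcription below (branch assignments inferred from the training table) is for illustration and is not part of the original slide:

```python
# Direct transcription of the example decision tree: test "tie", then either
# "smokes" or "cape", and output a class at the leaf.
def classify(person):
    if person["tie"]:
        return "Bad" if person["smokes"] else "Good"
    else:
        return "Good" if person["cape"] else "Bad"

# e.g. alfred from the training table: wears a tie, no cape, doesn't smoke
print(classify({"tie": True, "cape": False, "smokes": False}))   # "Good"
```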
Another Possible Classifier

  [large decision tree testing mask, smokes, cape, ears, sex and tie, with many leaves labeled good/bad]

• perfectly classifies training data
• BUT: intuitively, overly complex
Yet Another Possible Classifier

  sex?
    female → good
    male   → bad

• overly simple
• doesn’t even fit available data
Complexity versus Accuracy on An Actual Dataset

  [plot: training and test error (%) versus classifier complexity (tree size)]

• classifiers must be expressive enough to fit training data (so that “true” patterns are fully captured)
• BUT: classifiers that are too complex may overfit (capture noise or spurious patterns in the data)
• problem: can’t tell best classifier complexity from training error
• controlling overfitting is the central problem of machine learning
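This effect can be reproduced on synthetic data by growing trees of increasing size and comparing training and test error; everything below (scikit-learn, the synthetic dataset, the range of tree sizes) is an illustrative assumption, not the dataset behind the plot:

```python
# Sketch of the complexity-vs-accuracy experiment: fit decision trees of
# increasing size on noisy synthetic data and track training vs. test error.
# Training error keeps falling as trees grow, while test error typically
# levels off and then rises -- the overfitting pattern described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for leaves in [2, 4, 8, 16, 32, 64, 128]:          # tree size = complexity
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0).fit(X_tr, y_tr)
    train_err = 1 - tree.score(X_tr, y_tr)
    test_err = 1 - tree.score(X_te, y_te)
    print(f"{leaves:4d} leaves   train error {train_err:.2f}   test error {test_err:.2f}")
```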