Introduction to Machine Learning
Rob Schapire, Princeton University
www.cs.princeton.edu/~schapire
Machine Learning
• studies how to automatically learn to make accurate predictions based on past observations
• classification problems:
  • classify examples into given set of categories
  • [diagram: labeled training examples → machine learning algorithm → classification rule; new example + rule → predicted classification]
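A minimal sketch of that pipeline in code; the 1-nearest-neighbor learner and the toy data below are illustrative assumptions, not part of the talk:

```python
# Tiny, self-contained illustration of the pipeline on the slide:
# labeled training examples -> learning algorithm -> classification rule,
# which is then applied to a new example.

def learn_nearest_neighbor(examples, labels):
    """Learning algorithm: returns a classification rule (a function)."""
    def rule(new_example):
        # predict the label of the closest training example
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        closest = min(range(len(examples)), key=lambda i: dist(examples[i], new_example))
        return labels[closest]
    return rule

# past observations (feature vectors) with known categories
train_x = [(1.0, 1.0), (0.9, 1.2), (3.0, 3.1), (3.2, 2.9)]
train_y = ["A", "A", "B", "B"]

rule = learn_nearest_neighbor(train_x, train_y)   # training
print(rule((3.1, 3.0)))                           # predicted classification: "B"
```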
Examples of Classification Problems
• bioinformatics
  • classify proteins according to their function
  • predict if patient will respond to particular drug/therapy based on microarray profiles
  • predict if molecular structure is a small-molecule binding site
• text categorization (e.g., spam filtering)
• fraud detection
• optical character recognition
• machine vision (e.g., face detection)
• natural-language processing (e.g., spoken language understanding)
• market segmentation (e.g.: predict if customer will respond to promotion)
Characteristics of Modern Machine Learning
• primary goal: highly accurate predictions on test data
• goal is not to uncover underlying “truth”
• methods should be general purpose, fully automatic and “off-the-shelf”
• however, in practice, incorporation of prior, human knowledge is crucial
• rich interplay between theory and practice
• emphasis on methods that can handle large datasets
Why Use Machine Learning?
• advantages:
  • often much more accurate than human-crafted rules (since data driven)
  • humans often incapable of expressing what they know (e.g., rules of English, or how to recognize letters), but can easily classify examples
  • automatic method to search for hypotheses explaining data
  • cheap and flexible: can apply to any learning task
• disadvantages:
  • need a lot of labeled data
  • error prone: usually impossible to get perfect accuracy
  • often difficult to discern what was learned
This Talk
• conditions for accurate learning
• two state-of-the-art algorithms:
  • boosting
  • support-vector machines
Conditions for Accurate Learning
Example: Good versus Evil
• problem: identify people as good or bad from their appearance

  training data    sex     mask  cape  tie  ears  smokes  class
    batman         male    yes   yes   no   yes   no      Good
    robin          male    yes   yes   no   no    no      Good
    alfred         male    no    no    yes  no    no      Good
    penguin        male    no    no    yes  no    yes     Bad
    catwoman       female  yes   no    no   yes   no      Bad
    joker          male    no    no    no   no    no      Bad

  test data
    batgirl        female  yes   yes   no   yes   no      ??
    riddler        male    yes   no    no   no    no      ??
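The same table can be fed to an off-the-shelf learner; the 0/1 encoding and the use of scikit-learn's decision tree are my own illustrative choices (the slide names no implementation):

```python
# Good-versus-Evil table encoded as 0/1 features; fit a decision tree on the
# six training rows and query it on the two test rows.
from sklearn.tree import DecisionTreeClassifier

features = ["sex", "mask", "cape", "tie", "ears", "smokes"]  # sex: male=1, female=0

#            sex mask cape tie ears smokes
X_train = [[1, 1, 1, 0, 1, 0],   # batman
           [1, 1, 1, 0, 0, 0],   # robin
           [1, 0, 0, 1, 0, 0],   # alfred
           [1, 0, 0, 1, 0, 1],   # penguin
           [0, 1, 0, 0, 1, 0],   # catwoman
           [1, 0, 0, 0, 0, 0]]   # joker
y_train = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

X_test = [[0, 1, 1, 0, 1, 0],    # batgirl
          [1, 1, 0, 0, 0, 0]]    # riddler

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.predict(X_test))     # the learned rule's guesses for batgirl and riddler
```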
An Example Classifier

  tie?
    yes → smokes?
            yes → bad
            no  → good
    no  → cape?
            yes → good
            no  → bad
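Written out as code, this rule is just nested tests on three attributes; the transcription below (branch assignments inferred from the training table) is for illustration and is not part of the original slide:

```python
# Direct transcription of the example decision tree: test "tie", then either
# "smokes" or "cape", and output a class at the leaf.
def classify(person):
    if person["tie"]:
        return "Bad" if person["smokes"] else "Good"
    else:
        return "Good" if person["cape"] else "Bad"

# e.g. alfred from the training table: wears a tie, no cape, doesn't smoke
print(classify({"tie": True, "cape": False, "smokes": False}))   # "Good"
```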
Another Possible Classifier

  [large decision tree testing mask, smokes, cape, ears, sex and tie, with many leaves labeled good/bad]

• perfectly classifies training data
• BUT: intuitively, overly complex
Yet Another Possible Classifier

  sex?
    female → good
    male   → bad

• overly simple
• doesn’t even fit available data
Complexity versus Accuracy on An Actual Dataset

  [plot: training and test error (%) versus classifier complexity (tree size)]

• classifiers must be expressive enough to fit training data (so that “true” patterns are fully captured)
• BUT: classifiers that are too complex may overfit (capture noise or spurious patterns in the data)
• problem: can’t tell best classifier complexity from training error
• controlling overfitting is the central problem of machine learning
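This effect can be reproduced on synthetic data by growing trees of increasing size and comparing training and test error; everything below (scikit-learn, the synthetic dataset, the range of tree sizes) is an illustrative assumption, not the dataset behind the plot:

```python
# Sketch of the complexity-vs-accuracy experiment: fit decision trees of
# increasing size on noisy synthetic data and track training vs. test error.
# Training error keeps falling as trees grow, while test error typically
# levels off and then rises -- the overfitting pattern described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for leaves in [2, 4, 8, 16, 32, 64, 128]:          # tree size = complexity
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0).fit(X_tr, y_tr)
    train_err = 1 - tree.score(X_tr, y_tr)
    test_err = 1 - tree.score(X_te, y_te)
    print(f"{leaves:4d} leaves   train error {train_err:.2f}   test error {test_err:.2f}")
```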