Learning theory and Decision trees Lecture 10 David - PowerPoint PPT Presentation

Learning theory and Decision trees
Lecture 10
David Sontag, New York University
Slides adapted from Carlos Guestrin & Luke Zettlemoyer

What about continuous hypothesis spaces?
• Continuous hypothesis space:
  – |H| = ∞
  – Infinite variance???
• Only care about the maximum number of points that can be classified exactly!

How many points can a linear boundary classify exactly? (1-D)
2 points: Yes!! 3 points: No… etc. (8 total)

Shattering and Vapnik–Chervonenkis Dimension
A set of points is shattered by a hypothesis space H iff:
  – For all ways of splitting the examples into positive and negative subsets
  – There exists some consistent hypothesis h
The VC Dimension of H over input space X:
  – The size of the largest finite subset of X shattered by H
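The definition above can be checked by brute force for the 1-D linear boundary. The sketch below is only an illustration, assuming distinct points and classifiers of the form sign(w·x + b), which in 1-D reduce to a threshold with either orientation; the function names are hypothetical.

```python
from itertools import product

def threshold_predictions(points):
    """All labelings of `points` achievable by a 1-D threshold in either orientation."""
    xs = sorted(points)
    # Candidate thresholds: below all points, between adjacent points, above all points.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    achievable = set()
    for t in cuts:
        for sign in (+1, -1):  # positive side may be to the right or to the left
            achievable.add(tuple(sign if x > t else -sign for x in points))
    return achievable

def is_shattered(points):
    """True if every +/-1 labeling of `points` is achievable."""
    achievable = threshold_predictions(points)
    return all(lab in achievable for lab in product((+1, -1), repeat=len(points)))

print(is_shattered([0.0, 1.0]))       # two points
print(is_shattered([0.0, 1.0, 2.0]))  # three points
```

For three points, the two labelings that alternate (+, −, +) and (−, +, −) are the ones no threshold can produce.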

How many points can a linear boundary classify exactly? (2-D)
3 points: Yes!! 4 points: No… etc.
[Figure from Chris Burges]

How many points can a linear boundary classify exactly? (d-D)
• A linear classifier $\sum_{j=1}^{d} w_j x_j + b$ can represent all assignments of possible labels to d+1 points
  – But not d+2!
  – Thus, VC-dimension of d-dimensional linear classifiers is d+1
  – Bias term b required
  – Rule of Thumb: number of parameters in model often (but not always) matches max number of points
• Question: Can we get a bound for error as a function of the VC-dimension?
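The "d+1 points" half of the claim can be illustrated constructively. The sketch below assumes one particular point set (the origin plus the d unit vectors, chosen here for convenience and not taken from the slides) and shows that every labeling is realized by an explicit choice of w and b. It does not demonstrate the "not d+2" half.

```python
from itertools import product

def shatters_simplex(d):
    """Check that the origin plus the d unit vectors are shattered by sign(w.x + b)."""
    points = [[0.0] * d] + [[1.0 if j == i else 0.0 for j in range(d)] for i in range(d)]
    for labels in product((-1.0, 1.0), repeat=d + 1):
        b = labels[0]                              # the origin gives w.0 + b = b = y_0
        w = [labels[i + 1] - b for i in range(d)]  # unit vector e_i gives w_i + b = y_i
        for x, y in zip(points, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if score * y <= 0:                     # misclassified or on the boundary
                return False
    return True

print(all(shatters_simplex(d) for d in range(1, 6)))
```

Note that the construction uses the bias term: with b fixed at 0 the origin could never receive a positive or negative label.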

PAC bound using VC dimension
• VC dimension: number of training points that can be classified exactly (shattered) by hypothesis space H!!!
  – Measures relevant size of hypothesis space
• Same bias / variance tradeoff as always
  – Now, just a function of VC(H)
• Note: all of this theory is for binary classification
  – Can be generalized to multi-class and also regression
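The bound itself does not appear in this transcript. For reference, a commonly quoted form of the VC generalization bound (the one given in Burges's SVM tutorial; that the slide showed this form is an assumption) states that, with probability at least $1-\delta$ over $m$ i.i.d. training examples, for all $h \in H$:

```latex
\mathrm{error}_{\mathrm{true}}(h) \;\le\; \mathrm{error}_{\mathrm{train}}(h)
  + \sqrt{\frac{VC(H)\left(\ln\frac{2m}{VC(H)} + 1\right) + \ln\frac{4}{\delta}}{m}}
```

Here $m$ and $\delta$ are symbols introduced for this note. The second term grows with VC(H) and shrinks with $m$, which is the bias / variance tradeoff the slide refers to.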

What is the VC-dimension of rectangle classifiers?
• First, show that there are 4 points that can be shattered:
• Then, show that no set of 5 points can be shattered:
[Figures from Anand Bhaskar, Ilya Sukhar]
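Both steps can be explored by brute force. The sketch below assumes axis-aligned rectangles that label points inside as positive (the slide's figures are missing, so this convention is an assumption) and uses a hypothetical diamond-shaped point set. A labeling is consistent exactly when the bounding box of the positive points contains no negative point.

```python
from itertools import product

def rectangle_consistent(points, labels):
    """True if some axis-aligned rectangle contains exactly the positive points."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    if not pos:
        return True  # a rectangle placed away from all points labels everything negative
    x_lo, x_hi = min(p[0] for p in pos), max(p[0] for p in pos)
    y_lo, y_hi = min(p[1] for p in pos), max(p[1] for p in pos)
    # The tightest box around the positives is the best candidate:
    # if it captures a negative point, every larger rectangle does too.
    return not any(
        y == 0 and x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
        for p, y in zip(points, labels)
    )

def is_shattered(points):
    """True if every 0/1 labeling of `points` is consistent with some rectangle."""
    return all(rectangle_consistent(points, labels)
               for labels in product((0, 1), repeat=len(points)))

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(is_shattered(diamond))             # four points in a diamond
print(is_shattered(diamond + [(0, 0)]))  # add a fifth point in the middle
```

The code only tests particular point sets. The general argument for 5 points is that the leftmost, rightmost, topmost, and bottommost points span a bounding box containing the remaining point, so labeling those extreme points positive and the remaining one negative cannot be realized.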

Generalization bounds using VC dimension
• Linear classifiers:
  – VC(H) = d+1, for d features plus constant term b
• Classifiers using Gaussian Kernel
  – VC(H) = ∞
[Annotation on the kernel formula: Euclidean distance, squared]
[Figure from Chris Burges] [Figure from mblondel.org]

Gap tolerant classifiers
• Suppose data lies in $\mathbb{R}^d$ in a ball of diameter D
• Consider a hypothesis class H of linear classifiers that can only classify point sets with margin at least M
• What is the largest set of points that H can shatter?
Cannot shatter these points: [figure: points separated by less than M]
[Figure from Chris Burges: ball with D = 2 and margin M = 3/2; regions labeled Φ = 1 (Y = +1), Φ = −1 (Y = −1), and Φ = 0 (Y = 0)]
$\text{VC dimension} = \min\left(d, \frac{D^2}{M^2}\right)$
$M = 2\gamma = 2 \cdot \frac{1}{\|w\|}$
SVM attempts to minimize $\|w\|^2$, which minimizes VC-dimension!!!

Gap tolerant classifiers
• Suppose data lies in $\mathbb{R}^d$ in a ball of diameter D
• Consider a hypothesis class H of linear classifiers that can only classify point sets with margin at least M
• What is the largest set of points that H can shatter?
[Figure from Chris Burges: ball with D = 2 and margin M = 3/2; regions labeled Φ = 1 (Y = +1), Φ = −1 (Y = −1), and Φ = 0 (Y = 0)]
$\text{VC dimension} = \min\left(d, \frac{D^2}{M^2}\right)$
What is R = D/2 for the Gaussian kernel?
$R = \max_x \|\phi(x)\| = \max_x \sqrt{\phi(x) \cdot \phi(x)} = \max_x \sqrt{K(x, x)} = 1$!
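The two quantities on this slide are easy to evaluate numerically. The sketch below assumes the usual Gaussian kernel form exp(−‖x − z‖² / (2σ²)) with a bandwidth parameter `sigma` (the kernel formula itself is not in the transcript), and plugs the figure's D = 2 and M = 3/2 into the slide's expression with d = 2 chosen here as an illustrative value.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def gap_tolerant_vc(d, D, M):
    """The slide's expression min(d, D^2 / M^2)."""
    return min(d, D ** 2 / M ** 2)

x = (3.0, -1.5, 7.0)                   # any point gives the same result
R = math.sqrt(gaussian_kernel(x, x))   # ||phi(x)|| = sqrt(K(x, x))
print(R)
print(gap_tolerant_vc(d=2, D=2.0, M=1.5))
```

Because K(x, x) = exp(0) = 1 for every x, the feature-space radius does not depend on the data or on `sigma`, so D = 2 and the expression depends only on the margin.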

What you need to know
• Finite hypothesis space
  – Derive results
  – Counting number of hypotheses
• Complexity of the classifier depends on number of points that can be classified exactly
  – Finite case – number of hypotheses considered
  – Infinite case – VC dimension
  – VC dimension of gap tolerant classifiers to justify SVM
• Bias-Variance tradeoff in learning theory

Decision Trees

Machine Learning in the ER
[Figure: timeline of an ER visit marked at T = 0, 30 min, and 2 hrs, ending at disposition. Data sources shown: triage information (blood pressure, heart rate, temperature, …); MD comments (free text); specialist consults; physician documentation; repeated vital signs (continuous values, measured every 30 s); lab results (continuous valued)]

Can we predict infection?
[Figure: the same ER timeline, with data sources: triage information (blood pressure, heart rate, temperature, …); MD comments (free text); specialist consults; physician documentation; repeated vital signs (continuous values, measured every 30 s); lab results (continuous valued). Callout: Many crucial decisions about a patient's care are made here!]

Can we predict infection?
• Previous automatic approaches based on simple criteria:
  – Temperature < 96.8 °F or > 100.4 °F
  – Heart rate > 90 beats/min
  – Respiratory rate > 20 breaths/min
• Too simplified… e.g., heart rate depends on age!
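The simple criteria above amount to three fixed threshold tests. A minimal sketch follows, with a hypothetical function name and argument names. The slide does not say how the criteria are combined, so the function only reports which ones a patient meets.

```python
def simple_criteria_flags(temp_f, heart_rate, resp_rate):
    """Return which of the slide's three threshold criteria a patient meets."""
    return {
        "temperature": temp_f < 96.8 or temp_f > 100.4,  # degrees Fahrenheit
        "heart_rate": heart_rate > 90,                   # beats/min
        "respiratory_rate": resp_rate > 20,              # breaths/min
    }

# Hypothetical patient values, for illustration only.
flags = simple_criteria_flags(temp_f=101.2, heart_rate=88, resp_rate=22)
print(flags)
```

The thresholds are the same for every patient, which is the limitation the slide points out: a heart rate of 88 is treated identically regardless of age.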

Can we predict infection?
• These are the attributes we have for each patient:
  – Temperature
  – Heart rate (HR)
  – Respiratory rate (RR)
  – Age
  – Acuity and pain level
  – Diastolic and systolic blood pressure (DBP, SBP)
  – Oxygen Saturation (SaO2)
• We have these attributes + label (infection) for 200,000 patients!
• Let's learn to classify infection

Predicting infection using decision trees
