10-701 Machine Learning: Decision trees


  1. 10-701 Machine Learning: Decision trees. Optional additional reading: Mitchell, Chapter 3

  2. Types of classifiers
     • We can divide the large variety of classification approaches into roughly three main types:
       1. Instance-based classifiers: use the observations directly (no model), e.g., K nearest neighbors
       2. Generative: build a generative statistical model, e.g., Bayesian networks
       3. Discriminative: directly estimate a decision rule/boundary, e.g., decision trees

  3. Decision trees
     • One of the most intuitive classifiers
     • Easy to understand and construct
     • Surprisingly, it also works very (very) well*
     Let's build a decision tree!
     * More on this in future lectures

  4. Structure of a decision tree
     • Internal nodes correspond to attributes (features)
     • Leaves correspond to the classification outcome
     • Edges denote attribute assignments
     [Figure: example tree over the binary attributes S (school), D (degree), C (citizen) and G (gender); each edge is labeled 1 (yes) or 0 (no), and each leaf holds the predicted class]

  5. Netflix

  6. Dataset
     Attributes (features), with Liked? as the label:
     Movie  Type      Length  Director  Famous actors  Liked?
     m1     Comedy    Short   Adamson   No             Yes
     m2     Animated  Short   Lasseter  No             No
     m3     Drama     Medium  Adamson   No             Yes
     m4     Animated  Long    Lasseter  Yes            No
     m5     Comedy    Long    Lasseter  Yes            No
     m6     Drama     Medium  Singer    Yes            Yes
     m7     Animated  Short   Singer    No             Yes
     m8     Comedy    Long    Adamson   Yes            Yes
     m9     Drama     Medium  Lasseter  No             Yes

  7. Building a decision tree
     Function BuildTree(n, A)    // n: samples (rows), A: attributes
       If empty(A) or all labels n(L) are the same
         status = leaf
         class = most common class in n(L)
       else
         status = internal
         a ← bestAttribute(n, A)
         LeftNode  = BuildTree(n(a=1), A \ {a})
         RightNode = BuildTree(n(a=0), A \ {a})
       end
     end

  8. Building a decision tree
     Function BuildTree(n, A)    // n: samples (rows), A: attributes
       If empty(A) or all labels n(L) are the same    // n(L): labels of the samples in this set
         status = leaf
         class = most common class in n(L)
       else
         status = internal
         a ← bestAttribute(n, A)                      // we will discuss this function next
         LeftNode  = BuildTree(n(a=1), A \ {a})       // recursive calls create the left and right
         RightNode = BuildTree(n(a=0), A \ {a})       // subtrees; n(a=1) is the set of samples in n
       end                                            // for which attribute a is 1
     end
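
As a concrete illustration of the pseudocode above, here is a minimal Python sketch of BuildTree. It is an assumption-laden sketch, not the course's code: it assumes binary attributes, represents each sample as a dict from attribute name to 0/1, and takes the best_attribute function (defined later via information gain) as a parameter.

    from collections import Counter

    def build_tree(samples, labels, attributes, best_attribute):
        # Majority class of the current node (used for leaves and for empty splits)
        majority = Counter(labels).most_common(1)[0][0]
        # Leaf: no attributes left, or all labels are the same
        if not attributes or len(set(labels)) == 1:
            return {"status": "leaf", "class": majority}
        # Internal node: pick the best attribute and recurse on each branch
        a = best_attribute(samples, labels, attributes)
        remaining = [x for x in attributes if x != a]
        node = {"status": "internal", "attribute": a}
        for branch, value in (("left", 1), ("right", 0)):
            idx = [i for i, s in enumerate(samples) if s[a] == value]
            if idx:
                node[branch] = build_tree([samples[i] for i in idx],
                                          [labels[i] for i in idx],
                                          remaining, best_attribute)
            else:
                # Empty split: fall back to the parent's majority class
                node[branch] = {"status": "leaf", "class": majority}
        return node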

  9. Identifying 'bestAttribute'
     • There are many possible ways to select the best attribute for a given set
     • We will discuss one approach that is based on information theory and generalizes well to non-binary variables

  10. Entropy
      • Quantifies the amount of uncertainty associated with a specific probability distribution
      • The higher the entropy, the less confident we are in the outcome
      • Definition: H(X) = -Σ_c p(X=c) log2 p(X=c)
      Claude Shannon (1916–2001); most of this work was done at Bell Labs

  11. Entropy H(X)
      • Definition: H(X) = -Σ_i p(X=i) log2 p(X=i)
      • So, if P(X=1) = 1 then
        H(X) = -p(X=1) log2 p(X=1) - p(X=0) log2 p(X=0)
             = -1·log2(1) - 0·log2(0) = 0
      • If P(X=1) = .5 then
        H(X) = -p(X=1) log2 p(X=1) - p(X=0) log2 p(X=0)
             = -.5·log2(.5) - .5·log2(.5) = -log2(.5) = 1
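
The entropy definition above is easy to check numerically. Below is a small illustrative helper (the function and its name are assumptions of this sketch, not from the slides) that estimates p(X=c) by counting and reproduces the two worked cases:

    import math
    from collections import Counter

    def entropy(values):
        # H(X) = -sum_c p(X=c) * log2 p(X=c), with probabilities estimated by counting
        counts = Counter(values)
        n = len(values)
        return sum(-(c / n) * math.log2(c / n) for c in counts.values())

    print(entropy([1, 1, 1, 1]))   # P(X=1) = 1  -> 0.0
    print(entropy([1, 0, 1, 0]))   # P(X=1) = .5 -> 1.0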

  12. Interpreting entropy
      • Entropy can be interpreted from an information standpoint
      • Assume both sender and receiver know the distribution. How many bits, on average, would it take to transmit one value?
      • If P(X=1) = 1 the answer is 0 (we don't need to transmit anything)
      • If P(X=1) = .5 the answer is 1 (either value is equally likely)
      • If 0 < P(X=1) < .5 or .5 < P(X=1) < 1 the answer is between 0 and 1. Why?

  13. Expected bits per symbol
      • Assume P(X=1) = 0.8
      • Then, sending symbols in pairs, P(11) = 0.64, P(10) = P(01) = .16 and P(00) = .04
      • Let's define the following code:
        - For 11 we send 0
        - For 10 we send 10
        - For 01 we send 110
        - For 00 we send 1110

  14. Expected bits per symbol
      • Assume P(X=1) = 0.8, so P(11) = 0.64, P(10) = P(01) = .16 and P(00) = .04
      • Using the code from the previous slide (11 → 0, 10 → 10, 01 → 110, 00 → 1110):
        the stream 01001101110001101110 can be broken into the pairs
        01 00 11 01 11 00 01 10 11 10, which are encoded as
        110 1110 0 110 0 1110 110 10 0 10
      • What is the expected number of bits per symbol?
        (.64*1 + .16*2 + .16*3 + .04*4) / 2 = 0.8
      • Entropy (lower bound): H(X) = 0.7219
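
A quick sanity check of the arithmetic on this slide, using the pair probabilities and code-word lengths given above (the variable names are just for illustration):

    import math

    prob     = {"11": 0.64, "10": 0.16, "01": 0.16, "00": 0.04}
    code_len = {"11": 1, "10": 2, "01": 3, "00": 4}   # lengths of 0, 10, 110, 1110

    bits_per_symbol = sum(prob[p] * code_len[p] for p in prob) / 2   # two symbols per pair
    print(bits_per_symbol)                                  # 0.8
    print(-(0.8 * math.log2(0.8) + 0.2 * math.log2(0.2)))   # entropy lower bound, ~0.7219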

  15. Conditional entropy
      • Entropy measures the uncertainty in a specific distribution
      • What if both sender and receiver know something about the transmission?
      • For example, say I want to send the label (Liked) when the length is known
      • This becomes a conditional entropy problem: H(Li | Le=v) is the entropy of Liked among movies with length v
      Movie length   Liked?
      Short          Yes
      Short          No
      Medium         Yes
      Long           No
      Long           No
      Medium         Yes
      Short          Yes
      Long           Yes
      Medium         Yes

  16. Conditional entropy: examples for specific values
      Using the Movie length / Liked? table above, let's compute H(Li | Le=v):
      1. H(Li | Le = S) = .92

  17. Conditional entropy: examples for specific values
      Continuing with the same table:
      1. H(Li | Le = S) = .92
      2. H(Li | Le = M) = 0
      3. H(Li | Le = L) = .92

  18. Conditional entropy
      • We can generalize the conditional entropy idea to determine H(Li | Le)
      • That is, what is the expected number of bits we need to transmit if both sides know the value of Le for each of the records (samples)?
      • Definition: H(Y | X) = Σ_i P(X=i) H(Y | X=i)
        (we explained how to compute each H(Y | X=i) term in the previous slides)

  19. Conditional entropy: example
      H(Y | X) = Σ_i P(X=i) H(Y | X=i)
      • Let's compute H(Li | Le):
        H(Li | Le) = P(Le=S) H(Li | Le=S) + P(Le=M) H(Li | Le=M) + P(Le=L) H(Li | Le=L)
                   = 1/3 * .92 + 1/3 * 0 + 1/3 * .92 = 0.61
      • We already computed: H(Li | Le=S) = .92, H(Li | Le=M) = 0, H(Li | Le=L) = .92
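
A minimal sketch of the H(Y|X) computation, reusing the illustrative entropy helper from above; the data is the Length/Liked table from these slides, with assumed one-letter encodings:

    from collections import Counter

    def conditional_entropy(xs, ys):
        # H(Y|X) = sum_i P(X=i) * H(Y | X=i)
        n = len(xs)
        return sum((count / n) * entropy([y for x, y in zip(xs, ys) if x == value])
                   for value, count in Counter(xs).items())

    length = ["S", "S", "M", "L", "L", "M", "S", "L", "M"]
    liked  = ["Y", "N", "Y", "N", "N", "Y", "Y", "Y", "Y"]
    print(conditional_entropy(length, liked))   # ~0.61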

  20. Information gain
      • How much do we gain (in terms of reduction in entropy) from knowing one of the attributes?
      • In other words, what is the reduction in entropy from this knowledge?
      • Definition: IG(Y|X) = H(Y) - H(Y|X)
      • IG(Y|X) is always ≥ 0 (proof: Jensen's inequality)
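
Given the two illustrative helpers above, information gain is a one-line difference (again a sketch under the same assumptions, not the course's code):

    def information_gain(xs, ys):
        # IG(Y|X) = H(Y) - H(Y|X)
        return entropy(ys) - conditional_entropy(xs, ys)

    print(information_gain(length, liked))   # ~0.92 - 0.61 = 0.31 for Liked given Length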

  21. Where we are
      • We were looking for a good criterion for selecting the best attribute for a node split
      • We defined entropy, conditional entropy and information gain
      • We will now use information gain as our criterion for a good split
      • That is, bestAttribute will return the attribute that maximizes the information gain at each node

  22. Building a decision tree
      Function BuildTree(n, A)    // n: samples (rows), A: attributes
        If empty(A) or all labels n(L) are the same
          status = leaf
          class = most common class in n(L)
        else
          status = internal
          a ← bestAttribute(n, A)    // based on information gain
          LeftNode  = BuildTree(n(a=1), A \ {a})
          RightNode = BuildTree(n(a=0), A \ {a})
        end
      end

  23. Example: Root attribute
      Using the full dataset from slide 6:
      P(Li=yes) = 2/3, so H(Li) = .91
      H(Li | T)  = ?
      H(Li | Le) = ?
      H(Li | D)  = ?
      H(Li | F)  = ?

  24. Example: Root attribute
      P(Li=yes) = 2/3, H(Li) = .91
      H(Li | T)  = 0.61
      H(Li | Le) = 0.61
      H(Li | D)  = 0.36
      H(Li | F)  = 0.85

  25. Example: Root attribute
      P(Li=yes) = 2/3, H(Li) = .91
      H(Li | T)  = 0.61    IG(Li | T)  = .91 - .61 = 0.3
      H(Li | Le) = 0.61    IG(Li | Le) = .91 - .61 = 0.3
      H(Li | D)  = 0.36    IG(Li | D)  = .91 - .36 = 0.55
      H(Li | F)  = 0.85    IG(Li | F)  = .91 - .85 = 0.06
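
The root-split computation above can be reproduced with the same illustrative helpers on the dataset from slide 6 (the column layout below is an assumption of this sketch):

    movies = [
        # (Type, Length, Director, Famous actors, Liked?)
        ("Comedy",   "Short",  "Adamson",  "No",  "Yes"),
        ("Animated", "Short",  "Lasseter", "No",  "No"),
        ("Drama",    "Medium", "Adamson",  "No",  "Yes"),
        ("Animated", "Long",   "Lasseter", "Yes", "No"),
        ("Comedy",   "Long",   "Lasseter", "Yes", "No"),
        ("Drama",    "Medium", "Singer",   "Yes", "Yes"),
        ("Animated", "Short",  "Singer",   "No",  "Yes"),
        ("Comedy",   "Long",   "Adamson",  "Yes", "Yes"),
        ("Drama",    "Medium", "Lasseter", "No",  "Yes"),
    ]
    labels = [m[-1] for m in movies]
    for name, col in [("Type", 0), ("Length", 1), ("Director", 2), ("Famous actors", 3)]:
        print(name, round(information_gain([m[col] for m in movies], labels), 2))
    # Prints roughly 0.31, 0.31, 0.56 and 0.07; with the slide's rounded intermediate
    # values these are the 0.3, 0.3, 0.55 and 0.06 shown above, so Director gives the
    # largest information gain and becomes the root split.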
