Administrative notes
• Labs this week: project time. Remember, you need to pass the project in order to pass the course! (See course syllabus.)
• We are still unable to upload clicker grades and are waiting for help from UBC IT on this. (We can upload them manually if necessary but hope to avoid this.)
Computational Thinking ct.cs.ubc.ca
Administrative notes
• March 3: Data mining reading quiz
• March 14: Midterm 2
• March 17: In the News call #3
• March 30: Project deliverables and individual report due
Data mining: finding patterns in data
Part 1: Building decision tree classifiers from data
Learning goals
• [CT Building Block] Students will be able to build a simple decision tree
• [CT Building Block] Students will be able to describe what considerations are important in building a decision tree
Why data mining?
• The world is awash with digital data; trillions of gigabytes and growing
• How many bytes in a gigabyte?
Clicker question
A. 1 000 000
B. 1 000 000 000
C. 1 000 000 000 000
Why data mining?
• The world is awash with digital data; trillions of gigabytes and growing
• A trillion gigabytes is a zettabyte, or 1 000 000 000 000 000 000 000 bytes
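The arithmetic above is easy to check: a gigabyte is 10^9 bytes, so a trillion (10^12) gigabytes gives 10^21 bytes. A quick sanity check (not from the slides):

```python
# A trillion gigabytes is a zettabyte: 10**12 * 10**9 = 10**21 bytes.
gigabyte = 10**9
zettabyte = 10**12 * gigabyte
print(zettabyte)  # 1000000000000000000000 (a 1 followed by 21 zeros)
```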
Why data mining?
• More and more, businesses and institutions are using data mining to make decisions, classifications, diagnoses, and recommendations that affect our lives
Data mining for classification
Recall our loan application example
Data mining for classification
• In the loan strategy example, we focused on the fairness of different classifiers, but we didn't focus much on how to build a classifier
• Today you'll learn how to build decision tree classifiers for simple data mining scenarios
A rooted tree in computer science
• Before we get to decision trees, we'll define what a tree is
A rooted tree in computer science
A collection of nodes such that
• one node is the designated root
• a node can have zero or more children; a node with zero children is a leaf
• all non-root nodes have a single parent
• edges denote parent-child relationships
• nodes and/or edges may be labeled by data
Computational Thinking ct.cs.ubc.ca
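The definition above can be sketched directly in code. This is a minimal illustration (the `Node` class and its names are my own, not from the slides): each node holds a label and a list of children, a node with no children is a leaf, and parent-child edges are implied by the `children` lists.

```python
# A minimal sketch of a rooted tree: each node has a label and
# zero or more children; a node with zero children is a leaf.
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children if children is not None else []

    def is_leaf(self):
        return len(self.children) == 0

# A small tree, drawn with the root on top:
#        root
#       /    \
#      a      b
#     / \
#    c   d
tree = Node("root", [Node("a", [Node("c"), Node("d")]), Node("b")])

print(tree.is_leaf())              # False: the root has two children
print(tree.children[1].is_leaf())  # True: "b" has no children
```

Every non-root node appears in exactly one `children` list, which is how this representation enforces the single-parent rule.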
A rooted tree in computer science
Often, but not always, drawn with the root on top
Are these rooted trees?
Clicker question (two diagrams shown, labeled 1 and 2)
A. 1 but not 2
B. 2 but not 1
C. Both 1 and 2
D. Neither 1 nor 2
http://jerome.boulinguez.free.fr/english/file/hotpotatoes/familytree.htm
Is this a rooted tree?
Clicker question
A. Yes
B. No
C. I'm not sure
http://jerome.boulinguez.free.fr/english/file/hotpotatoes/familytree.htm
Decision trees: trees whose node labels are attributes, edge labels are conditions
Decision trees: trees whose node labels are attributes, edge labels are conditions
https://gbr.pepperdine.edu/2010/08/how-gerber-used-a-decision-tree-in-strategic-decision-making/
Decision trees: trees whose node labels are attributes, edge labels are conditions
A decision tree for the max profit loan strategy:
colour?
• orange → credit rating?
    • > 61 → approve
    • < 61 → deny
• blue → credit rating?
    • > 50 → approve
    • < 50 → deny
(Note that some worthy applicants are denied loans, while other unworthy ones get loans)
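A decision tree like this one is just nested conditionals when written as code. Here is a sketch, assuming (as the slide's diagram shows) that orange applicants are approved above a credit rating of 61 and blue applicants above 50; the function name is my own.

```python
# The max-profit loan decision tree as nested conditionals:
# the first split is on colour, the second on credit rating,
# with a different threshold down each branch.
def loan_decision(colour, credit_rating):
    if colour == "orange":
        return "approve" if credit_rating > 61 else "deny"
    else:  # blue
        return "approve" if credit_rating > 50 else "deny"

print(loan_decision("orange", 70))  # approve
print(loan_decision("blue", 55))    # approve
print(loan_decision("orange", 55))  # deny: an orange applicant with
                                    # rating 55 is denied, while a blue
                                    # applicant with the same rating is not
```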
Exercise: Construct the decision tree for the “Group Unaware” loan strategy
Building decision trees from training data
• Should you get an ice cream?
• You might start out with the following data (Weather and Wallet are attributes; their values are conditions):

Weather  Wallet  Ice Cream?
Great    Empty   No
Nasty    Empty   No
Great    Full    Yes
Okay     Full    Yes
Nasty    Full    No
Building decision trees from training data
• Should you get an ice cream?
• From that data, you might build a decision tree that looks like this:
Wallet?
• Empty → No
• Full → Weather?
    • Nasty → No
    • Great → Yes
    • Okay → Yes
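The tree above can be written as a short function and checked against the five training rows. This is a sketch (the function name and data layout are my own); the point is that the tree classifies every training example correctly.

```python
# The ice-cream decision tree: first split on Wallet, then on Weather.
def should_get_ice_cream(weather, wallet):
    if wallet == "Empty":
        return "No"
    # Wallet is Full: second split on Weather
    return "No" if weather == "Nasty" else "Yes"

# The five training rows from the slide: (Weather, Wallet, Ice Cream?)
training_data = [
    ("Great", "Empty", "No"),
    ("Nasty", "Empty", "No"),
    ("Great", "Full",  "Yes"),
    ("Okay",  "Full",  "Yes"),
    ("Nasty", "Full",  "No"),
]

# The tree agrees with every training example.
for weather, wallet, label in training_data:
    assert should_get_ice_cream(weather, wallet) == label
print("tree matches all 5 training rows")
```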
Shall we play a game?
Suppose we want to help a soccer league decide whether or not to cancel games. We have some data. Our goal is a decision tree to help officials make decisions. Assume that decisions are the same given the same information.

Outlook   Temperature  Humidity  Windy  Play?
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No

Example adapted from http://www.kdnuggets.com/data_mining_course/index.html#materials
Create a decision tree
Group exercise: Using the data on the previous slide, create a decision tree that decides whether the game should be played or not.
• The leaf nodes should be whether or not to play
• The non-leaf nodes should be questions
• The edges should be values
(The Outlook/Temperature/Humidity/Windy data table is repeated on this slide.)
Some example potential starts to the decision tree:
Windy?
• true → …
• false → …

Outlook?
• Overcast → …
• Sunny → …
• Rainy → …
(with second-level splits on Temperature?, Windy?, or Humidity? down each branch)
How did you split up your tree and why?
Here’s that example again
Create a decision tree that decides whether the game should be played or not.
• The leaf nodes should be whether or not to play
• The non-leaf nodes should be questions
• The edges should be values
(The Outlook/Temperature/Humidity/Windy data table is repeated on this slide.)
Deciding which nodes go where: A decision tree construction algorithm
• Top-down tree construction
• At the start, all examples are at the root.
• Partition the examples recursively by choosing one attribute each time.
• In deciding which attribute to split on, one common method is to try to reduce entropy: each time you split, you should make the resulting groups more homogeneous. The more you reduce entropy, the higher the information gain.
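The entropy-reduction idea above can be made concrete on the soccer data. This is a sketch (function and variable names are my own): entropy measures how mixed the Yes/No labels are in a group, and information gain is the entropy before a split minus the weighted entropy of the groups after it. Choosing the attribute with the highest gain picks the first split.

```python
# Entropy-based attribute choice on the soccer/"Play?" data.
from collections import Counter
from math import log2

# Rows are (Outlook, Temperature, Humidity, Windy, Play?).
data = [
    ("sunny","hot","high","false","No"),    ("sunny","hot","high","true","No"),
    ("overcast","hot","high","false","Yes"),("rain","mild","high","false","Yes"),
    ("rain","cool","normal","false","Yes"), ("rain","cool","normal","true","No"),
    ("overcast","cool","normal","true","Yes"),("sunny","mild","high","false","No"),
    ("sunny","cool","normal","false","Yes"),("rain","mild","normal","false","Yes"),
    ("sunny","mild","normal","true","Yes"), ("overcast","mild","high","true","Yes"),
    ("overcast","hot","normal","false","Yes"),("rain","mild","high","true","No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(rows):
    """Entropy of the Play? labels: 0 if all the same, 1 if a 50/50 mix."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    """Entropy before the split minus weighted entropy of the groups after."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row)
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - after

gains = {attr: information_gain(data, i) for i, attr in enumerate(ATTRS)}
best = max(gains, key=gains.get)
print(best)  # Outlook: it gives the highest information gain on this data
```

This is why Outlook? is a better first split than Windy?: the Overcast branch becomes perfectly homogeneous (all Yes), so more entropy is removed in one step.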