Administrative Notes, October 20, 2016
• If you’ve been using a clicker and don’t see your scores on Connect, send me email with your clicker (hex!) ID – there are some unclaimed/unregistered clickers
• Project proposal grades should be back by Friday
• Optional project proposal resubmission due next Friday
Computational Thinking ct.cs.ubc.ca
Data Mining
Learning Goals
• CT Building Block: Students will be able to demonstrate that they understand the Apriori algorithm by describing what the output would be for a small input.
• CT Building Block: Students will be able to create English-language descriptions of algorithms to analyze data and show how their algorithms would work on an input data set.
• CT Application: Students will be able to use computing to examine datasets and facilitate exploration in order to gain insight and knowledge (data and information).
• CT Impact: Students will be able to give examples of privacy and security issues that arise as a result of data mining.
A quote from the NY Times article:
“We have the capacity to send every customer an ad booklet, specifically designed for them, that says, ‘Here’s everything you bought last week and a coupon for it,’ ” one Target executive told me. “We do that for grocery products all the time.” But for pregnant women, Target’s goal was selling them baby items they didn’t even know they needed yet.
Target can identify pregnant women and send them individual mailings.
In a group of 3-4, discuss whether you think this is cool, creepy, or both.
Target: Cool, creepy, or both?
Cool: They're making my life easier
Creepy: They're watching us (unsettling), predicting our needs
Target can identify pregnant women and send them individual mailings.
In a group of 3-4, discuss whether you think this is cool, creepy, or both.
A. Cool
B. Creepy
C. Both
Things for Target to figure out: cool or creepy?
• If I'm going to die: sure, why not
• If you have a certain medical condition, based on what drugs you buy
• What movies and video games you like
• Your one-month anniversary is coming up: you didn't say you had an SO
• Lots of demographic information
Group Discussion: Loyalty Card Pros and Cons
In a group of 3-4, list pros and cons of loyalty cards.
Pros:
• Most loyalty cards give you discounts/points; free stuff is good!
• Once you get one, they stop bugging you
• You can build a relationship with the store; sometimes you can avoid needing receipts
Cons:
• Sometimes you have to remember them
• Stupid e-mails
• They can make it really hard for people to shop without loyalty cards
• Share information across different stores
Group Discussion: Loyalty cards and credit cards
After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases? Why?
Clicker question: Loyalty cards and credit cards
After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases?
A. More likely
B. Less likely
C. The same
As we discussed, cookies reveal information about you. But how can the pages that you’ve visited predict what you will do in the future?
Data Mining
• Data mining is the process of looking for patterns in large data sets
• There are many different kinds of data mining, for many different purposes
• We’ll do an in-depth exploration of one of them
Association Rules
• One type of data mining rule is the association rule
• An example rule is “people who buy diapers tend to buy beer”
• This is useful for stores because it helps them decide what to stock
• Association rules have also been used in many other areas, including medical diagnosis, protein sequence composition, health insurance claim analysis, and census data
Group exercise: list examples of what association rules could be used for
• Stores could use them to prey on addictions: chips around weed locations
• People with spinal cord injuries tend to get pneumonia
• Insurance companies: sports cars → accidents
Here’s the plan
• Stores keep track of all the items that each person bought in a single purchase
• By looking at all of the different purchases, we can figure out which items were bought at the same time
• Then we can figure out which one was the “cause” and which one was the “effect”
Let’s look at some sample data
Each row is a transaction – one person’s grocery order:
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
So in T2 the person bought Sushi and Bread.
Now we need to decide whether there are any items that people tend to buy when they buy other items. We refer to this as a rule (e.g., diapers → beer is one rule (not supported by this data!)).
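To make the plan concrete, here is a minimal Python sketch (our own illustration, not part of the course materials) that stores each transaction as a set of items and counts how many orders contain each pair of items. The names `TRANSACTIONS` and `pair_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The seven grocery transactions from the slide, one set of items per order.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def pair_counts(transactions):
    """Count, for every pair of items, how many transactions contain both."""
    counts = Counter()
    for items in transactions:
        # Sorting first makes each pair come out in alphabetical order,
        # so ("Milk", "Ramen") and ("Ramen", "Milk") are the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Show the pairs that were bought together in at least 3 orders.
    for pair, n in sorted(pair_counts(TRANSACTIONS).items()):
        if n >= 3:
            print(pair, n)
```

Counting pairs only tells us which items appear together; deciding which item is the "cause" needs the support and confidence measures that follow.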
Group discussion
Looking at this example, intuitively (not algorithmically), are there any rules that you think hold? If so, what? Why?
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
• Sushi → milk or bread: seems to hold in all cases
• Ramen → milk: everyone who bought ramen also bought milk
• Chicken → sushi: grocery store layout
Support
Informally: support measures whether items appear together many times.
Formally: a rule X → Y holds with support sup if sup% of transactions contain X AND Y.
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
For example, {Chicken, Ramen, Milk} occurs with 3/7 ≈ 43% support.
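The definition of support translates almost word for word into code. This minimal Python sketch (an illustration only; `support` and `TRANSACTIONS` are names we made up) counts the transactions that contain every item of the itemset and returns the result as an exact fraction, matching the "express as a fraction" style of the slides. For a rule X → Y, you would pass in X and Y together.

```python
from fractions import Fraction

# The seven grocery transactions from the slide (T1..T7).
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`.

    For a rule X -> Y, pass X and Y combined (X AND Y must both appear).
    """
    itemset = set(itemset)
    # `itemset <= t` is Python's subset test: all items of itemset are in t.
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


if __name__ == "__main__":
    print(support({"Chicken", "Ramen", "Milk"}, TRANSACTIONS))
```

Run on the slide's example, `support({"Chicken", "Ramen", "Milk"}, TRANSACTIONS)` returns `Fraction(3, 7)`, the 3/7 shown above.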
Support question
What is the support of Sushi → Bread (express as a fraction – no need for math)?
(Reminder: a rule X → Y holds with support sup if sup% of transactions contain X AND Y.)
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
A. 3/7
B. 3/4
C. 4/7
D. None of the above
Confidence
Informally: confidence measures how strongly some items suggest that the others will be there, too.
Formally: a rule X → Y holds with confidence conf% if conf% of transactions that contain X also contain Y.
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
Ramen → Milk, Chicken [conf = 3/3 = 100%]
Ramen, Chicken → Milk [conf = 3/3 = 100%]
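Confidence differs from support only in what we divide by: instead of all transactions, we look only at the transactions that contain X. A minimal Python sketch of that definition follows; the function name `confidence` is our own, and raising an error when no transaction contains X is an assumption, since the slides do not say what happens in that case.

```python
from fractions import Fraction

# The seven grocery transactions from the slide (T1..T7).
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def confidence(x, y, transactions):
    """Of the transactions that contain all of X, the fraction that also contain all of Y."""
    x = set(x)
    xy = x | set(y)
    # Restrict attention to the transactions where X was bought.
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        # Assumption: with no transaction containing X, confidence is undefined.
        raise ValueError("no transaction contains X")
    with_both = sum(1 for t in with_x if xy <= t)
    return Fraction(with_both, len(with_x))


if __name__ == "__main__":
    print(confidence({"Ramen"}, {"Milk", "Chicken"}, TRANSACTIONS))
    print(confidence({"Ramen", "Chicken"}, {"Milk"}, TRANSACTIONS))
```

Both calls in the main block reproduce the slide's examples and return `Fraction(1, 1)`, i.e. 3/3 = 100%. Note that swapping X and Y generally changes the answer, which is how confidence separates the "cause" side of a rule from the "effect" side.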
Confidence question
What is the confidence of Sushi → Chicken (express as a fraction – no need for math)?
(Reminder: a rule X → Y holds with confidence conf% if conf% of transactions that contain X also contain Y.)
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
A. 3/7
B. 3/4
C. 3/5
D. None of the above
So when is a rule valid?
A rule is valid if its support is at least a given threshold (minimum support) and its confidence is at least another given threshold (minimum confidence).
A frequent itemset is a set of items that has at least minimum support.
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
In this example, {Chicken, Milk, Ramen} is a frequent itemset if the minimum support is at most 3/7.
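The two validity conditions can be checked directly in code. In this minimal Python sketch, `is_frequent` and `is_valid_rule` are hypothetical helper names. We assume that "minimum" means a value equal to the threshold still counts, so we compare with `>=`. Confidence is computed here as support(X and Y) / support(X), which gives the same number as counting "transactions with X and Y" over "transactions with X", because both supports share the same denominator.

```python
from fractions import Fraction

# The seven grocery transactions from the slide (T1..T7).
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def is_frequent(itemset, transactions, min_support):
    """A frequent itemset has at least the minimum support (assumed >=)."""
    return support(itemset, transactions) >= min_support


def is_valid_rule(x, y, transactions, min_support, min_confidence):
    """X -> Y is valid if both its support and its confidence meet their thresholds."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        return False  # no transaction contains X, so the rule is never tested
    sup_xy = support(set(x) | set(y), transactions)
    conf = sup_xy / sup_x  # same value as the count-based definition
    return sup_xy >= min_support and conf >= min_confidence


if __name__ == "__main__":
    print(is_frequent({"Chicken", "Milk", "Ramen"}, TRANSACTIONS, Fraction(3, 7)))
    print(is_frequent({"Chicken", "Milk", "Ramen"}, TRANSACTIONS, Fraction(1, 2)))
```

The first call returns `True` (3/7 meets a 3/7 threshold) and the second returns `False` (3/7 is below 1/2), which is the "at most 3/7" boundary described on the slide.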
Group exercise on a piece of paper: create an algorithm to find itemsets with a minimum support of 3/7.
Sample data to check your algorithm:
T1 Sushi, Chicken, Milk
T2 Sushi, Bread
T3 Bread, Vegetables
T4 Sushi, Chicken, Bread
T5 Sushi, Chicken, Ramen, Bread, Milk
T6 Chicken, Ramen, Milk
T7 Chicken, Milk, Ramen
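For comparison with your group's algorithm, here is one possible approach as a minimal Python sketch. The learning goals name the Apriori algorithm, and this code follows its usual level-wise structure: count itemsets of size k, keep the frequent ones, join them into candidates of size k + 1, and prune any candidate that has an infrequent subset. The code is our own illustration, not the course's reference solution, and the name `frequent_itemsets` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# The seven grocery transactions from the slide (T1..T7).
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for every itemset with support >= min_support.

    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    """
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return Fraction(sum(1 for t in transactions if itemset <= t), n)

    # Level 1 candidates: every single item that appears in some transaction.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        # Count step: keep the size-k candidates that meet minimum support.
        frequent = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
        result.update(frequent)
        k += 1
        # Join step: union two frequent (k-1)-itemsets when that gives exactly k items.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a frequent k-itemset must itself be
        # frequent, so drop candidates that have an infrequent subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
    return result


if __name__ == "__main__":
    found = frequent_itemsets(TRANSACTIONS, Fraction(3, 7))
    # Print smaller itemsets first, each with its support as a fraction.
    for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), sup)
```

The prune step is what keeps the search small: for example, {Sushi, Chicken, Milk} is never counted, because its subset {Sushi, Milk} appears in only 2 of the 7 orders. The same function also works on any other list of transactions, such as the data in the swap exercise that follows.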
Swap algorithms with a group near you
Use the new algorithm to find all the frequent itemsets with a minimum support of 2/4 = 50% in the following data:
T1 apple, dates, rice, corn
T2 corn, dates, tuna
T3 apple, corn, dates, tuna
T4 corn, tuna
Which are frequent itemsets?
A. apple, corn
B. apple, dates
C. corn, dates
D. apple, corn, dates
E. All of the above
Did you get the other team’s algorithm to work?
A. Yes
B. No