administrative notes

  1. Administrative notes
     • Proposal resubmissions have been graded and feedback has been sent. If you resubmitted your proposal and didn’t receive an email, please contact your TA.
     • The Connect grade centre has a column called “Project Rubric”. This tells you which rubric we will be using to grade your project. Find your rubric at http://www.ugrad.cs.ubc.ca/~cs100/2016W2/project-grading.html#projectMarkingScheme. If you have any questions about the rubric, please email your project TA (also listed on Connect).
     Computational Thinking ct.cs.ubc.ca

  2. Administrative notes
     • Sometime within the next two weeks, we will email you which projects you should review. Please make sure you have a working CS ID and that email forwarding for your CS email (CS_ID@ugrad.cs.ubc.ca) works (you should have set this up in Lab 0).

  3. Administrative notes
     • March 14: Midterm 2. This will cover all lectures, labs and readings between Tue Jan 31 and Thu Mar 9 inclusive
     • March 17: In the News call #3
     • March 30: Project deliverables and individual report due

  4. Data Mining 3
     Mining by Association: The Apriori algorithm

  5. Learning Goals
     • [CT Building Block] Students will be able to demonstrate that they understand the Apriori algorithm by describing what the output would be for a small input.
     • [CT Building Block] Students will be able to create English language descriptions of algorithms to analyze data and show how their algorithms would work on an input data set.

  6. A quote from the NY Times article
     “We have the capacity to send every customer an ad booklet, specifically designed for them, that says, ‘Here’s everything you bought last week and a coupon for it,’ ” one Target executive told me. “We do that for grocery products all the time.” But for pregnant women, Target’s goal was selling them baby items they didn’t even know they needed yet.

  7. Target can identify pregnant women and send them individual mailings
     In a group of 3-4, discuss whether, and why, you think this is cool, creepy, or both.

  8. Target can identify pregnant women and send them individual mailings
     In a group of 3-4, discuss whether, and why, you think this is cool, creepy, or both.
     A. Cool
     B. Creepy
     C. Both

  9. Loyalty cards pros and cons
     Group discussion | Student responses
     In a group of 3-4, list pros and cons of loyalty cards.
     Pros:
     • Discounts
     • Being able to return stuff, reprint receipts
     Cons:
     • The degree to which information is tracked – how will it be used?

  10. Loyalty cards and credit cards
      Clicker question
      After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases?
      A. More likely
      B. Less likely
      C. The same

  11. How to predict the future? One way: Association rules
      • An association rule X → Y links two sets of items X and Y, if the people who buy the items in X (cause) also tend to buy the items in Y (effect)
      • Example: Diapers → Beer

  12. How to predict the future? One way: Association rules
      • Association rules are useful for stores because they can improve stock
      • They’ve also been used in many areas, including medical diagnoses, protein sequence composition, health insurance claim analysis and census data

  13. Suggest other uses for association rules
      Group exercise | Student responses
      • When buying a computer, you get suggestions as to what else to buy; this can be helpful if you are a rational buyer. Not if you are a compulsive buyer!
      • Amazon can display what else people with tastes similar to yours bought
      • Association rules enable sellers to provide a bundled deal. That can be a win for both the seller and the consumer.

  14. How are association rules derived?
      • Stores keep track of all the items that people buy at a time
      • By looking at all of the different purchases, we can figure out which items were bought at the same time
      • Then we can figure out which one was the “cause” and which one was the “effect”

  15. Let’s look at some sample data
      Each row is a transaction – one person’s grocery order. In T2 the person bought Sushi and Bread.
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen

  16. What association rules can you find? Why?
      Group discussion
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen

  17. Where we’re headed
      • We’ll identify two key properties of items in transaction data that will enable us to identify valid association rules
      • These properties are called support and confidence
      • We’ll first look at support

  18. Support: The degree to which items appear together
      The support of a set of items is the fraction of transactions that contain all items in the set.
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen
      Here, the set {Chicken, Ramen, Milk} has support 3/7.
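
The definition of support above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the names `TRANSACTIONS` and `support` are illustrative assumptions, and each transaction is modelled as a set of item names.

```python
from fractions import Fraction

# Sample data from the slide; each transaction is a set of items.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset.

    `support` is a hypothetical helper name chosen for this sketch.
    """
    itemset = set(itemset)
    # A transaction counts when the itemset is a subset of it.
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


print(support({"Chicken", "Ramen", "Milk"}, TRANSACTIONS))  # prints 3/7
```

Using `Fraction` keeps the result in the same "count over total" form the slides use, rather than a rounded decimal.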

  19. Support
      Clicker question
      What is the support of {Sushi, Bread}? (Reminder: The support of a set is the fraction of transactions that contain all items in the set.)
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen
      A. 3/7
      B. 3/4
      C. 4/7
      D. None of the above

  20. A frequent itemset
      A frequent itemset is a set of items whose support is at least some specified minimum threshold.
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen
      Example: If the minimum threshold is 3/7, then {Chicken, Milk, Ramen} is a frequent itemset.
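
As a rough illustration of the threshold test, the sketch below compares an itemset's support against a minimum threshold. It is an assumption-laden example rather than anything from the slides: `is_frequent` and `TRANSACTIONS` are hypothetical names.

```python
from fractions import Fraction

# Sample data from the slide; each transaction is a set of items.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def is_frequent(itemset, transactions, min_support):
    """True when the itemset's support is at least min_support.

    `is_frequent` is a hypothetical helper name chosen for this sketch.
    """
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions)) >= min_support


threshold = Fraction(3, 7)
print(is_frequent({"Chicken", "Milk", "Ramen"}, TRANSACTIONS, threshold))  # True
print(is_frequent({"Sushi", "Milk"}, TRANSACTIONS, threshold))             # False
```

{Sushi, Milk} appears together only in T1 and T5, so its support of 2/7 falls below the 3/7 threshold.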

  21. Back to support
      Group exercise
      What is the support of {Apple, Corn}? (Reminder: The support of a set is the fraction of transactions that contain all items in the set.)
      Transaction  Items
      T1           apple, dates, rice, corn
      T2           corn, dates, tuna
      T3           apple, corn, dates, tuna
      T4           corn, tuna
      A. 1/4
      B. 2/4
      C. 3/4
      D. 4/4

  22. Group exercise written down: Create an algorithm that, given as input a list of t transactions, finds all itemsets with a minimum support of s/t.
      Sample data to check your algorithm, with the threshold set to s/t = 3/7:
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen
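
One possible answer to this exercise, sketched in Python under stated assumptions: it is a plain brute-force enumeration that tries every combination of items, not the Apriori algorithm and not necessarily the approach intended in class. The names `frequent_itemsets` and `TRANSACTIONS` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Sample data from the slide; each transaction is a set of items.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty combination of items.

    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    """
    items = sorted(set().union(*transactions))
    total = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            # Count the transactions that contain every item in the candidate.
            count = sum(1 for t in transactions if cand <= t)
            sup = Fraction(count, total)
            if sup >= min_support:
                frequent[cand] = sup
    return frequent


result = frequent_itemsets(TRANSACTIONS, Fraction(3, 7))
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With n distinct items this checks 2^n - 1 candidate sets, so it is only practical for small examples like this one; that cost is worth keeping in mind when comparing how algorithms scale.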

  23. Swap algorithms with a group near you
      Use the new algorithm to find all the frequent itemsets with a minimum support of 2/4 in the following data.
      Transaction  Items
      T1           apple, dates, rice, corn
      T2           corn, dates, tuna
      T3           apple, corn, dates, tuna
      T4           corn, tuna
      Which are the frequent itemsets?
      A. apple, corn
      B. apple, dates
      C. corn, dates
      D. apple, corn, dates
      E. all of the above

  24. Did you get the other team’s algorithm to work?
      A. Yes
      B. No

  25. Comparing algorithms
      Get together with the group that you swapped algorithms with. Which algorithm would scale better as you add more items/transactions/items per transaction? Why?

  26. As a whole class: what are some things that could help algorithms to scale well?

  27. Recall: Where we’re headed
      • We’ll identify two key properties of items in transaction data that will enable us to identify valid association rules
      • These properties are called support and confidence
      • We’ll next look at confidence

  28. Confidence: Which items suggest that others will be there too (cause → effect)
      Formally: The confidence of rule X → Y is the fraction of transactions containing all items in X that also contain all items in Y.
      T1  Sushi, Chicken, Milk
      T2  Sushi, Bread
      T3  Bread, Vegetables
      T4  Sushi, Chicken, Bread
      T5  Sushi, Chicken, Ramen, Bread, Milk
      T6  Chicken, Ramen, Milk
      T7  Chicken, Milk, Ramen
      Both Ramen → {Milk, Chicken} and {Ramen, Chicken} → Milk have confidence 3/3 = 1.
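
The formal definition of confidence can be sketched the same way. This is an illustrative Python example, not part of the slides: `confidence` and `TRANSACTIONS` are hypothetical names, and raising an error when X never occurs is a design assumption, since the definition would otherwise divide by zero.

```python
from fractions import Fraction

# Sample data from the slide; each transaction is a set of items.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def confidence(x, y, transactions):
    """Confidence of the rule X -> Y.

    Of the transactions that contain all items in X, the fraction
    that also contain all items in Y.
    """
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        # Assumption for this sketch: the rule is undefined if X never occurs.
        raise ValueError("no transaction contains all items in X")
    with_both = sum(1 for t in with_x if y <= t)
    return Fraction(with_both, len(with_x))


print(confidence({"Ramen"}, {"Milk", "Chicken"}, TRANSACTIONS))  # prints 1
print(confidence({"Ramen", "Chicken"}, {"Milk"}, TRANSACTIONS))  # prints 1
print(confidence({"Sushi"}, {"Bread"}, TRANSACTIONS))            # prints 3/4
```

Note that the denominator is the number of transactions containing X, not the total number of transactions, which is what distinguishes confidence from support.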
