Market basket introduction
MARKET BASKET ANALYSIS IN R
Christopher Bruffaerts, Statistician

Overview

Market Basket course
Chapter 1: Introduction to market basket analysis
Chapter 2: Metrics and techniques in market basket analysis
Chapter 3: Visualization in market basket analysis
Chapter 4: Case study: Movie recommendations @ movieLens

What is a basket?

Basket = collection of items

Examples of baskets:
1. Your basket @ the grocery store
2. Your Amazon shopping cart
3. Your courses @ DataCamp
4. The movies you watched on Netflix

Items:
1. Products at the supermarket
2. Products on an online website
3. DataCamp courses
4. Movies watched by users

Grocery store example

What's in the store?

What are you up for today?
One bread
Three pieces of cheese

Grocery store example in R

What's in the store?

    store = c("Bread", "Butter", "Cheese", "Wine")
    set.seed(1234)
    n_items = 4
    my_basket = data.frame(
      TID = rep(1, n_items),
      Product = sample(store, n_items, replace = TRUE))

R output

    my_basket

      TID Product
    1   1   Bread
    2   1  Cheese
    3   1  Cheese
    4   1  Cheese

What's in my basket?

My original basket: one record per item purchased

      TID Product
    1   1   Bread
    2   1  Cheese
    3   1  Cheese
    4   1  Cheese

My adjusted basket: one record per distinct item purchased

    # A tibble: 2 x 3
        TID Product Quantity
      <dbl> <fct>      <int>
    1     1 Bread          1
    2     1 Cheese         3

What's in my R basket?

Reshaping the basket data

    library(dplyr)  # provides %>%, add_count(), rename(), n_distinct(), summarize()

    # Adjusting my basket: one row per distinct item, with its quantity
    my_basket = my_basket %>%
      add_count(Product) %>%
      unique() %>%
      rename(Quantity = n)

    # Number of distinct items
    n_distinct(my_basket$Product)

    2

    # Total basket size
    my_basket %>% summarize(sum(Quantity))

    4

Visualizing items in my basket

    library(ggplot2)

    # Plotting items, ordered by quantity, as horizontal bars
    ggplot(my_basket, aes(x = reorder(Product, Quantity), y = Quantity)) +
      geom_col() +
      coord_flip() +
      xlab("Items") +
      ggtitle("Summary of items in my basket")

Why are we looking at my basket?

Question: Is there any relationship between items within a basket?

Back to examples
1. Your basket @ the grocery store, e.g. Spaghetti and Tomato sauce
2. Your Amazon shopping cart, e.g. Phone and a phone case
3. Your courses @ DataCamp, e.g. "Introduction to R" and "Intermediate R"

Happy shopping!

Item combinations

Back to the grocery store

What's in the store?

What are you up for today?
{"Bread", "Cheese", "Cheese", "Cheese"}

Focus of market basket analysis
{"Bread", "Cheese"}

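Going from the basket as purchased to the set of distinct items can be sketched in base R with unique(). The variable names below are illustrative assumptions, not taken from the course.

```r
# The basket as purchased: one entry per item, duplicates included
my_basket_items = c("Bread", "Cheese", "Cheese", "Cheese")

# Market basket analysis looks at which items occur, not how many of each
distinct_items = unique(my_basket_items)
distinct_items
```

Quantities are dropped on purpose here; they stay available in the adjusted basket if they are needed later.
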
Subsets and supersets

My store - set
X = {"Bread", "Butter", "Cheese", "Wine"}

Subsets of X - itemsets
Size 0: ∅ (the empty set)
Size 1: {"Bread"}, {"Wine"}, ...
Size 2: {"Bread", "Wine"}, ...

Supersets
{"Bread", "Butter"} is a superset of {"Bread"}
{"Bread", "Butter", "Cheese", "Wine"} is a superset of {"Bread", "Butter"}

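A subset check can be sketched in base R with %in%. The helper is_subset() and the item "Milk" are hypothetical, introduced only for illustration.

```r
X = c("Bread", "Butter", "Cheese", "Wine")

# Hypothetical helper: TRUE when every element of a is also in b
is_subset = function(a, b) all(a %in% b)

# {"Bread"} is a subset of {"Bread", "Butter"},
# so {"Bread", "Butter"} is a superset of {"Bread"}
is_subset(c("Bread"), c("Bread", "Butter"))

# Every itemset built from the store is a subset of X
is_subset(c("Bread", "Butter"), X)

# The empty set is a subset of any set
is_subset(character(0), X)

# "Milk" (made-up item) is not sold in the store
is_subset(c("Bread", "Milk"), X)
```
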
Itemset graph

Question: What is the set of all possible subsets of X?

X = {A, B, C, D}

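As a minimal sketch, the subsets of X can be enumerated in base R by calling combn() once per subset size. The variable name subsets is an assumption for illustration.

```r
X = c("A", "B", "C", "D")

# Start with the empty set, then add all subsets of size 1, 2, 3 and 4
subsets = list(character(0))
for (k in seq_along(X)) {
  subsets = c(subsets, combn(X, k, simplify = FALSE))
}

# Number of subsets found
length(subsets)

# Print each subset in set notation, e.g. {A, B}
sapply(subsets, function(s) paste0("{", paste(s, collapse = ", "), "}"))
```

Each level of the itemset graph corresponds to one subset size, from the empty set at one end to X itself at the other.
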
Intersections and unions

Intersection
{"Bread"} ∩ {"Butter"} = ∅
{"Bread", "Butter"} ∩ {"Butter", "Wine"} = {"Butter"}

Union
{"Bread"} ∪ {"Butter"} = {"Bread", "Butter"}

    library(dplyr)
    A = c("Bread", "Butter")
    B = c("Bread", "Wine")

    intersect(A, B)

    [1] "Bread"

    union(A, B)

    [1] "Bread"  "Butter" "Wine"

How many baskets of size k?

Question: How many possible subsets of size k are there from a set of size n?

"n choose k":

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}, \quad \text{where } n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$$

Example: Number of baskets with 2 distinct items from the store:

$$\binom{4}{2} = \frac{4!}{(4-2)!\,2!} = 6$$

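The "n choose k" example can be checked in base R, a sketch that writes the formula out with factorial() and lists the baskets with combn(). The names by_formula and pairs are illustrative.

```r
store = c("Bread", "Butter", "Cheese", "Wine")
n = length(store)
k = 2

# "n choose k" written out with factorials
by_formula = factorial(n) / (factorial(n - k) * factorial(k))
by_formula

# The baskets of 2 distinct items themselves, one per list element
pairs = combn(store, k, simplify = FALSE)
length(pairs)
pairs
```
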
How many possible baskets?

Question: How many possible baskets can be created from a set of size n?

Newton's binomial theorem:

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n$$

Example: Total number of baskets: $2^4 = 16$

    2^(n_items)

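The identity can be checked numerically for the store of 4 items. This is a short sketch; total_by_sum and total_by_power are illustrative names.

```r
n_items = 4

# Sum the number of baskets of each size k = 0, 1, ..., n
total_by_sum = sum(choose(n_items, 0:n_items))

# Closed form: each item is either in the basket or not
total_by_power = 2^n_items

total_by_sum
total_by_power
```

Both counts include the empty basket (size 0).
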
How many baskets in R?

Combinations in R

    n_items = 4
    basket_size = 2
    choose(n_items, basket_size)

    [1] 6

    # Looping through all possible values
    store = matrix(NA, nrow = 5, ncol = 2)
    for (i in 0:n_items){
      store[i+1,] = c(i, choose(n_items, i))}

Output

    colnames(store) = c("size", "nb_combi")
    store

         size nb_combi
    [1,]    0        1
    [2,]    1        4
    [3,]    2        6
    [4,]    3        4
    [5,]    4        1

Plotting number of combinations

Get an idea of how fast the number of combinations grows

    library(ggplot2)

    n_items = 50
    fun_nk = function(x) choose(n_items, x)

    # Plotting the number of subsets as a function of subset size
    ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
      stat_function(fun = fun_nk) +
      xlim(0, n_items) +
      xlab("Subset size") +
      ylab("Number of subsets")

Are you ready to count?

What is market basket analysis?

Multiple baskets @ grocery store

What's in the store?

Multiple baskets
If 100 customers visit the grocery store, can we find associations of items that occur together?
Basket 1: {"Bread", "Cheese"}
Basket 2: {"Bread", "Wine", "Cheese"}

Example: Bread and Cheese
Outcome: “if this, then that”

Market basket applications

Learning from multiple baskets

Different applications
E-commerce: “customers who bought this also bought this”
Retail: items which are “bundled or placed together”
Social media: friends and connections recommendation
Videos and movies recommendation

Multiple baskets in R

Create a dataset containing multiple baskets!

    my_baskets = data.frame(
      "Basket" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
      "Product" = c("Bread", "Cheese", "Cheese", "Cheese",
                    "Bread", "Butter", "Wine",
                    "Butter", "Butter",
                    "Butter", "Wine", "Wine",
                    "Butter", "Cheese",
                    "Cheese", "Wine",
                    "Wine", "Wine")
    )

A glimpse at my baskets

    head(my_baskets)

      Basket Product
    1      1   Bread
    2      1  Cheese
    3      1  Cheese
    4      1  Cheese
    5      2   Bread
    6      2  Butter

What's in our baskets?

Questions

How many distinct items are there?

    n_distinct(my_baskets$Product)

    [1] 4

How many baskets are there?

    n_distinct(my_baskets$Basket)

    [1] 7

How many items are there in each basket?

    df_basket =
      my_baskets %>%
      group_by(Basket) %>%
      summarize(
        n_total = n(),
        n_items = n_distinct(Product))

First rows of df_basket:

      Basket n_total n_items
       <dbl>   <int>   <int>
    1      1       4       2
    2      2       3       3

How big are baskets?

Average basket sizes

    # df_basket is the per-basket summary built on the previous slide
    df_basket %>%
      summarize(
        avg_total_items = mean(n_total),
        avg_dist_items = mean(n_items))

    # A tibble: 1 x 2
      avg_total_items avg_dist_items
                <dbl>          <dbl>
    1            2.57           1.86

Distribution of basket size

    # Distribution of distinct items
    ggplot(df_basket, aes(n_items)) +
      geom_bar()

Specific products in the baskets

Which item are you looking at?
How many times does an item appear across all baskets?
How many baskets contain that item?

Example: Filtering for Cheese in R

    # Number of Cheese items and number of baskets containing Cheese
    my_baskets %>%
      filter(Product == "Cheese") %>%
      summarize(
        n_tot_items = n(),
        n_basket_item = n_distinct(Basket))

      n_tot_items n_basket_item
    1           5             3

Association rule mining

Association rule mining: finding frequent co-occurring associations among a collection of items.

Example of rule extraction:
{Bread} → {Butter}
{Bread, Cheese} → {Wine}

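Before any rule is extracted, it helps to count how often the items of a candidate rule appear in the same basket. The sketch below uses base R on the baskets from the "Multiple baskets in R" slide; the helper n_baskets_with() is hypothetical and is not part of the course or of any package.

```r
# Same baskets as in the "Multiple baskets in R" slide
my_baskets = data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Hypothetical helper: number of baskets that contain every item in `items`
n_baskets_with = function(data, items) {
  items_per_basket = split(data$Product, data$Basket)
  sum(sapply(items_per_basket, function(b) all(items %in% b)))
}

# Baskets containing Bread
n_baskets_with(my_baskets, "Bread")

# Baskets containing both Bread and Butter, the items of {Bread} -> {Butter}
n_baskets_with(my_baskets, c("Bread", "Butter"))
```

These are raw counts only; turning them into measures of rule strength is the subject of Chapter 2.
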
So what's coming next?

Agenda for the rest of the course:
Chapter 2: Metrics & techniques in market basket analysis
Chapter 3: Visualization in market basket analysis
Chapter 4: Case study: Movie recommendations @ movieLens

Let's play with baskets!