Cluster Analysis in R (DataCamp)

Introduction to K-means
Dmitriy (Dima) Gorenshteyn, Sr. Data Scientist, Memorial Sloan Kettering Cancer Center


kmeans()

    print(lineup)

       x  y
    1 -1  1
    2 -2 -3
    3  8  6
    4  7 -8
    ... ... ...

    model <- kmeans(lineup, centers = 2)
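As a minimal sketch of the call above, the following runs `kmeans()` on a small made-up data frame. The coordinates are hypothetical stand-ins for the course's `lineup` data, and the `set.seed()` call and the `nstart` argument are additions not shown on the slide; they are there because k-means starts from random centers.

```r
# Hypothetical stand-in for the course's `lineup` data: two separated groups.
lineup <- data.frame(
  x = c(-1, -2, -3, -2, -1, -3, 8, 7, 9, 8, 7, 9),
  y = c( 1, -3,  2,  0, -1,  1, 6, -8, 0, 3, -2, -5)
)

set.seed(42)  # k-means picks random starting centers, so fix the seed
model <- kmeans(lineup, centers = 2, nstart = 10)  # keep the best of 10 random starts

model$centers  # one row of coordinates per cluster centroid
model$cluster  # cluster label (1 or 2) for each row of lineup
model$size     # number of observations in each cluster
```

The cluster numbers themselves are arbitrary labels: a different seed may swap which group is called 1 and which is called 2.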

Assigning Clusters

    print(model$cluster)

    [1] 1 1 2 2 1 1 1 2 2 2 1 2

    library(dplyr)  # provides mutate()
    lineup_clustered <- mutate(lineup, cluster = model$cluster)
    print(lineup_clustered)

          x     y cluster
      <dbl> <dbl>   <int>
    1    -1     1       1
    2    -2    -3       1
    3     8     6       2
    4     7    -8       2
    ...   ...   ...     ...

Let's practice!

Evaluating Different Values of K by Eye

Total Within-Cluster Sum of Squares: k = 1
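To make the quantity concrete, here is a small sketch that computes the total within-cluster sum of squares for k = 1 by hand and compares it with what `kmeans()` reports. The four points are hypothetical, and the names `centroid`, `sq_dist`, and `tot_withinss_by_hand` are introduced only for this illustration.

```r
# Hypothetical coordinates, not the course data.
lineup <- data.frame(
  x = c(-1, -2, 8, 7),
  y = c( 1, -3, 6, -8)
)

# With k = 1 the single centroid is the column-wise mean of all points.
centroid <- colMeans(lineup)

# Squared Euclidean distance from each point to the centroid.
sq_dist <- rowSums(sweep(as.matrix(lineup), 2, centroid)^2)
tot_withinss_by_hand <- sum(sq_dist)

# kmeans() reports the same quantity as tot.withinss.
model_k1 <- kmeans(lineup, centers = 1)
all.equal(tot_withinss_by_hand, model_k1$tot.withinss)
```

For k greater than 1 the same sum is taken separately within each cluster, using that cluster's own centroid, and the per-cluster sums are added together.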

Elbow Plot

Generating the Elbow Plot

    model <- kmeans(x = lineup, centers = 2)
    model$tot.withinss

    [1] 1434.5

    library(purrr)

    # Total within-cluster sum of squares for each k from 1 to 10
    tot_withinss <- map_dbl(1:10, function(k){
      model <- kmeans(x = lineup, centers = k)
      model$tot.withinss
    })

    elbow_df <- data.frame(
      k = 1:10,
      tot_withinss = tot_withinss
    )

    print(elbow_df)

       k tot_withinss
    1  1    3489.9167
    2  2    1434.5000
    3  3     881.2500
    4  4     637.2500
    ... ... ...

    library(ggplot2)

    ggplot(elbow_df, aes(x = k, y = tot_withinss)) +
      geom_line() +
      scale_x_continuous(breaks = 1:10)
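The same elbow workflow can be sketched without purrr or ggplot2. The version below is a base R variant on simulated data, not the course's code: the data frame `pts`, the range `ks`, the `nstart` value, and the use of `vapply()` and `plot()` are all assumptions made for the illustration.

```r
# Simulated data with three groups; hypothetical, not the course's lineup data.
set.seed(1)
pts <- data.frame(
  x = c(rnorm(20, mean = 0), rnorm(20, mean = 6), rnorm(20, mean = 12)),
  y = c(rnorm(20, mean = 0), rnorm(20, mean = 6), rnorm(20, mean = 0))
)

# Total within-cluster sum of squares for k = 1..8 (base R stand-in for map_dbl()).
ks <- 1:8
tot_withinss <- vapply(ks, function(k) {
  kmeans(pts, centers = k, nstart = 20)$tot.withinss
}, numeric(1))

elbow_df <- data.frame(k = ks, tot_withinss = tot_withinss)
print(elbow_df)

# Base-graphics elbow plot: look for the k after which the curve flattens.
plot(elbow_df$k, elbow_df$tot_withinss, type = "b",
     xlab = "k", ylab = "Total within-cluster sum of squares")
```

Because the data were simulated with three groups, the curve should drop steeply up to k = 3 and flatten afterwards; on real data the elbow is often less clear-cut.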

Silhouette Analysis

Soccer Lineup with K = 3

Silhouette Width

- Within Cluster Distance: C(i)
- Closest Neighbor Distance: N(i)

Silhouette Width: S(i)

-  1: Well matched to cluster
-  0: On border between two clusters
- -1: Better fit in neighboring cluster
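The formula for S(i) did not survive as text in these slides. The standard silhouette definition, written with the slide's symbols and assumed here to match what was shown, takes C(i) as the average distance from observation i to the other members of its own cluster and N(i) as the average distance from i to the members of the closest neighboring cluster:

```latex
S(i) = \frac{N(i) - C(i)}{\max\{C(i),\, N(i)\}}
     = \begin{cases}
         1 - \dfrac{C(i)}{N(i)}, & C(i) < N(i) \\[6pt]
         0,                      & C(i) = N(i) \\[6pt]
         \dfrac{N(i)}{C(i)} - 1, & C(i) > N(i)
       \end{cases}
```

This is consistent with the interpretation above: S(i) approaches 1 when C(i) is much smaller than N(i), equals 0 when the two distances match, and approaches -1 when the neighboring cluster is much closer than the observation's own.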

Calculating S(i)

    library(cluster)

    pam_k3 <- pam(lineup, k = 3)
    pam_k3$silinfo$widths

       cluster neighbor   sil_width
    4        1        2 0.465320054
    2        1        3 0.321729341
    10       1        2 0.311385893
    1        1        3 0.271890169
    9        2        1 0.443606497
    ...    ...      ...         ...
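To see where a single `sil_width` value comes from, the sketch below recomputes it by hand for one observation and compares it with the value stored by `pam()`. The six points are hypothetical, k = 2 is chosen so that there is only one candidate neighbor cluster, and the names `c_i`, `n_i`, and `s_i` are introduced only for this illustration.

```r
library(cluster)  # provides pam()

# Hypothetical coordinates, not the course data.
pts <- data.frame(
  x = c(0, 0, 1, 10, 10, 11),
  y = c(0, 1, 0, 10, 11, 10)
)

pam_k2 <- pam(pts, k = 2)
d  <- as.matrix(dist(pts))   # pairwise Euclidean distances
cl <- pam_k2$clustering      # cluster label for each row

i     <- 1                                   # observation to inspect
own   <- setdiff(which(cl == cl[i]), i)      # other members of i's cluster
other <- which(cl != cl[i])                  # members of the only other cluster

c_i <- mean(d[i, own])     # within-cluster distance C(i)
n_i <- mean(d[i, other])   # closest-neighbor distance N(i)
s_i <- (n_i - c_i) / max(c_i, n_i)

# Compare with the width pam() stores for the same observation.
widths <- pam_k2$silinfo$widths
c(by_hand = s_i, from_pam = unname(widths["1", "sil_width"]))
```

With more than two clusters, N(i) would be the smallest of the average distances to each of the other clusters, which is what the `neighbor` column in the output above identifies.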

Silhouette Plot

    sil_plot <- silhouette(pam_k3)
    plot(sil_plot)

Average Silhouette Width

    pam_k3$silinfo$avg.width

    [1] 0.353414

-  1: Well matched to each cluster
-  0: On border between clusters
- -1: Poorly matched to each cluster

Highest Average Silhouette Width

    library(purrr)

    sil_width <- map_dbl(2:10, function(k){
      model <- pam(x = lineup, k = k)
      model$silinfo$avg.width
    })

    sil_df <- data.frame(
      k = 2:10,
      sil_width = sil_width
    )

    print(sil_df)

      k sil_width
    1 2 0.4164141
    2 3 0.3534140
    3 4 0.3535534
    4 5 0.3724115
    ... ... ...

Choosing K Using Average Silhouette Width

    library(ggplot2)

    ggplot(sil_df, aes(x = k, y = sil_width)) +
      geom_line() +
      scale_x_continuous(breaks = 2:10)

Making Sense of the K-Means Clusters

Wholesale Dataset

45 observations
3 features:
- Milk Spending
- Grocery Spending
- Frozen Food Spending

    print(customers_spend)

       Milk Grocery Frozen
    1 11103   12469    902
    2  2013    6550    909
    3  1897    5234    417
    4  1304    3643   3045
    5  3199    6986   1455
    ...   ...     ...    ...

Segmenting with Hierarchical Clustering

    cluster   Milk  Grocery  Frozen  cluster size
          1  16950    12891     991             5
          2   2512     5228    1795            29
          3  10452    22550    1354             5
          4   1249     3916   10888             6

Segmenting with K-means

- Estimate the "best" k using average silhouette width
- Run k-means with the suggested k
- Characterize the spending habits of these clusters of customers
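The three steps above can be sketched as follows. This is one possible base R outline under stated assumptions, not the course's solution: the spending values are simulated stand-ins for `customers_spend`, the candidate range `ks` and the `nstart` value are arbitrary choices, and `best_k`, `segments`, and `cluster_means` are names introduced for the illustration.

```r
library(cluster)  # pam() for average silhouette width

# Simulated spending data; hypothetical values, not the course's customers_spend.
set.seed(7)
customers_spend <- data.frame(
  Milk    = c(rnorm(15, 2000, 300), rnorm(15, 12000, 800)),
  Grocery = c(rnorm(15, 5000, 500), rnorm(15, 15000, 900)),
  Frozen  = c(rnorm(15, 1500, 200), rnorm(15, 1000, 200))
)

# Step 1: average silhouette width for k = 2..6; the largest value suggests k.
ks <- 2:6
sil_width <- vapply(ks, function(k) {
  pam(customers_spend, k = k)$silinfo$avg.width
}, numeric(1))
best_k <- ks[which.max(sil_width)]

# Step 2: run k-means with the suggested k.
model <- kmeans(customers_spend, centers = best_k, nstart = 20)

# Step 3: characterize each cluster by its mean spending and its size.
segments <- cbind(customers_spend, cluster = model$cluster)
cluster_means <- aggregate(cbind(Milk, Grocery, Frozen) ~ cluster,
                           data = segments, FUN = mean)
cluster_means$size <- as.vector(table(segments$cluster))
print(cluster_means)
```

The resulting table has the same shape as the hierarchical clustering summary shown earlier (mean spending per feature plus cluster size), which makes the two segmentations easy to compare side by side.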

Let's cluster!
