Data Mining Learning from Large Data Sets Lecture 8 - PowerPoint PPT Presentation

Data Mining
Learning from Large Data Sets
Lecture 8: Clustering large data sets
263-5200-00L
Andreas Krause

Announcements
- Homework 4 out tomorrow

Course organization
- Retrieval
  - Given a query, find “most similar” item in a large data set
  - Determine relevance of search results
  - Applications: Google Goggles, Shazam, …
- Supervised learning (classification, regression)
  - Learn a concept (function mapping queries to labels)
  - Applications: spam filtering, predicting price changes, …
- Unsupervised learning (clustering, dimension reduction)
  - Identify clusters, “common patterns”; anomaly detection
  - Applications: recommender systems, fraud detection, …
- Learning with limited feedback
  - Learn to optimize a function that’s expensive to evaluate
  - Applications: online advertising, UI optimization, learning rankings, …

Unsupervised learning
- “Learning without labels”
- Typically useful for exploratory data analysis (“find patterns”; visualization; …)
- Most common methods:
  - Clustering (unsupervised classification)
  - Dimension reduction (unsupervised regression)

What is clustering?
- Given data points, group into clusters such that
  - Similar points are in the same cluster
  - Dissimilar points are in different clusters
- Points are typically represented either
  - in (high-dimensional) Euclidean space
  - in a metric space, given in terms of pairwise distances (Jaccard, cosine, …)
- Anomaly / outlier detection: identification of points that “don’t fit well in any of the clusters”
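As a small illustration of the pairwise distances named on this slide, here is a hedged sketch in plain Python. The function names are made up for this example, and "cosine distance" is taken here to mean 1 minus cosine similarity, which is an assumption (the slide only names the measure).

```python
import math

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets; taken as 0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def cosine_distance(u, v):
    """1 - cosine similarity of two nonzero vectors of equal length (assumed definition)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)
```

For example, `jaccard_distance({"a", "b"}, {"b", "c"})` compares two documents by the words they share, without ever placing them in Euclidean space.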

Examples of clustering
- Cluster
  - Documents based on the words they contain
  - Images based on image features
  - DNA sequences based on edit distance
  - Products based on which customers bought them
  - Customers based on their purchase history
  - Web surfers based on their queries / sites they visit
  - …

Standard approaches to clustering
- Hierarchical clustering
  - Build a tree (either bottom-up or top-down), representing the distances among the data points
  - Example: single-linkage and average-linkage agglomerative clustering
- Partitional approaches
  - Define and optimize a notion of “goodness” defined over partitions
  - Example: spectral clustering, graph-cut based approaches
- Model-based approaches
  - Maintain cluster “models” and infer cluster membership (e.g., assign each point to closest center)
  - Example: k-means, Gaussian mixture models, …

We will
- Review standard clustering algorithms
  - k-means
  - Probabilistic mixture models
- Discuss how to scale them to massive data sets and data streams

Clustering example

k-means
- Assumes points are in Euclidean space: $x_i \in \mathbb{R}^d$
- Represent clusters as centers $\mu_j \in \mathbb{R}^d$
- Each point is assigned to closest center
- Goal: Pick centers to minimize average squared distance
  $$\sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$
- Non-convex optimization!
- NP-hard → can’t solve optimally in general
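To make the objective concrete, the following minimal sketch evaluates the sum of squared distances from each point to its closest center. The names `squared_distance` and `kmeans_loss` are chosen for this example only; points and centers are assumed to be equal-length tuples of floats.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_loss(points, centers):
    """Sum over all points of the squared distance to the closest center."""
    return sum(min(squared_distance(x, mu) for mu in centers) for x in points)
```

Dividing the result by the number of points gives the average squared distance; the constant factor does not change which centers are optimal.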

Classical k-means algorithm
- Initialize cluster centers
  - E.g., pick one point at random, the other ones with maximum distance
- While not converged
  - Assign each point $x_i$ to closest center
  - Update center as mean of assigned data points
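The two alternating steps can be sketched in plain Python as follows. This is one possible reading of the slide, not a reference implementation: the initialization is interpreted as farthest-first traversal (each new center is the point farthest from the centers chosen so far), "converged" is taken to mean that the assignment stops changing, and `max_iters` and `seed` are parameters added for this example.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_init(points, k, rng):
    """One random point, then repeatedly the point farthest from the chosen centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(sq_dist(x, c) for c in centers)))
    return centers

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centers = farthest_first_init(points, k, rng)
    assignment = None
    for _ in range(max_iters):
        # Assignment step: index of the closest center for every point.
        new_assignment = [min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers would not move either
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignment
```

Every iteration touches every point, which is the cost that motivates the streaming variants later in the lecture.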

k-means

Properties of k-means
- Guaranteed to monotonically decrease average squared distance in each iteration
  $$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$
- Converges to a local optimum
- Complexity:
  - Per iteration
  - Have to process entire data set in each iteration

k-means for large data sets / streams
- What if data set does not fit in main memory?
  - In principle not a problem (why?)
  - But each iteration still requires an entire pass through the data set
- Recall supervised learning (online SVM, etc.)
  - There we were able to process one data point at a time
  - Get (provably) good solutions from a single pass through the data
  - Could even do it in parallel!
- Can we do the same thing for clustering?
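One plausible answer to the "(why?)" above, sketched under assumptions: a k-means iteration only needs sequential access to the data, so it can read points one at a time from disk while keeping just a running sum and a count per cluster. The function name `kmeans_pass` is made up for this example, and `stream` stands for any iterable of points.

```python
def kmeans_pass(stream, centers):
    """One k-means iteration over an iterable of points, using O(k * d) memory."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x in stream:
        # Closest center for this point; the point is discarded afterwards.
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        counts[j] += 1
        for i, a in enumerate(x):
            sums[j][i] += a
    # New center is the mean of its points; empty clusters keep the old center.
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centers[j])
        for j in range(k)
    ]
```

Memory is no longer the bottleneck, but the function still has to be called once per iteration, which means one full pass over the data each time.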

Streaming clustering
- How should we maintain clusters as new data arrives?

Recall online SVM
- Recall online SVMs (& stochastic gradient descent)
- Loss function decomposes additively over data set
  $$L(w) = \sum_i \mathrm{hinge}(x_i; y_i, w)$$
- Can take a (sub-)gradient step for each data point
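A single per-point (sub-)gradient step can be sketched as below. This assumes the standard hinge loss max(0, 1 - y w·x) with labels y in {-1, +1}, and it deliberately leaves out any regularization or projection step that an online SVM may also use; the name `hinge_sgd_step` and the step size `eta` are illustrative.

```python
def hinge_sgd_step(w, x, y, eta):
    """One subgradient step on hinge(x; y, w) = max(0, 1 - y * <w, x>)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1:
        # Subgradient of the hinge term is -y * x, so step in direction +y * x.
        return [wi + eta * y * xi for wi, xi in zip(w, x)]
    # Point is classified with margin at least 1: the subgradient is zero.
    return list(w)
```

Because the loss is a sum over data points, each step needs only the current point, which is what makes single-pass processing possible.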

Online k-means
- For k-means, loss function also decomposes additively over data set
  $$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$
- Let’s try taking a (sub-)gradient step for each data point

Calculating the gradient
$$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$
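A short worked derivation, offered as a sketch of the step this slide sets up (the extracted slide shows only the loss). Write the loss as a sum of per-point terms $\ell_i$ and let $c$ be the index of the center closest to $x_i$. The closest center is assumed to be unique, since the minimum is not differentiable at ties.

$$\ell_i(\mu) = \min_j \|\mu_j - x_i\|_2^2, \qquad c = \arg\min_j \|\mu_j - x_i\|_2$$

$$\frac{\partial \ell_i}{\partial \mu_j} = \begin{cases} 2(\mu_c - x_i) & \text{if } j = c \\ 0 & \text{if } j \neq c \end{cases}$$

$$\mu_c \leftarrow \mu_c - \eta \cdot 2(\mu_c - x_i) = \mu_c + 2\eta\,(x_i - \mu_c)$$

Only the closest center moves, and it moves toward $x_i$. The factor 2 can be absorbed into the step size $\eta$.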

Online k-means algorithm
- Initialize centers randomly
- For t = 1:N
  - Find $c = \arg\min_j \|\mu_j - x_t\|_2$
  - Set $\mu_c \leftarrow \mu_c + \eta_t (x_t - \mu_c)$
- To converge to local optimum, need that
  $$\sum_t \eta_t = \infty, \qquad \sum_t \eta_t^2 < \infty$$
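The loop above translates almost line by line into Python. In this sketch the initial centers are passed in by the caller (who may pick them randomly, as the slide suggests), and the default step size $\eta_t = 1/t$ is an assumption chosen because it satisfies both conditions: the harmonic series diverges while the sum of $1/t^2$ is finite.

```python
def online_kmeans(stream, init_centers, step=lambda t: 1.0 / t):
    """Single pass over `stream`; moves the closest center toward each point."""
    centers = [list(c) for c in init_centers]
    for t, x in enumerate(stream, start=1):
        # Find the index c of the center closest to x_t.
        c = min(
            range(len(centers)),
            key=lambda j: sum((m - a) ** 2 for m, a in zip(centers[j], x)),
        )
        eta = step(t)
        # mu_c <- mu_c + eta_t * (x_t - mu_c)
        centers[c] = [m + eta * (a - m) for m, a in zip(centers[c], x)]
    return centers
```

Each point is seen once and then dropped, so memory use depends only on the number of centers and the dimension, not on the length of the stream.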

Practical aspects
- Generally works best if data is “randomly” ordered (like stochastic gradient descent)
- Typically, want to choose larger value for k
- How can one implement multiple random restarts in one pass?

Problems with online k-means
- Have to commit to “k” in advance
- Objective function non-convex (and problem NP-hard) → guarantees for online convex programming / SGD do not apply!
- Not clear how to parallelize

Alternative: Summarizing large data sets
- Idea:
  - Efficiently construct a compact version C of the data set D such that solving k-means on C gives a good solution to D
- Approach:
  - First construct C such that it allows us to approximately answer “k-means queries”, i.e., approximately evaluate
    $$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$
  - Then solve k-means using the approximate loss function
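To illustrate what "approximately answering a k-means query" could look like, here is a hedged sketch in which C is a weighted set of points and the query is answered by a weighted loss. The weighted-set representation and the uniform sampling used to build C are assumptions made for this example only; this section does not specify the construction, and uniform sampling in particular can miss small, far-away clusters.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_loss(points, centers, weights=None):
    """Weighted k-means loss; with no weights this is the exact loss on `points`."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(x, mu) for mu in centers) for x, w in zip(points, weights))

def uniform_summary(points, m, seed=0):
    """Hypothetical summary: m points sampled without replacement, each weighted N / m."""
    rng = random.Random(seed)
    sample = rng.sample(points, m)
    weight = len(points) / m
    return sample, [weight] * m
```

Given `C, w = uniform_summary(D, m)`, the value `kmeans_loss(C, centers, w)` serves as an estimate of `kmeans_loss(D, centers)` for any candidate centers, at a cost that depends on m rather than on the size of D.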

k-means queries

Data set summarization for k-means
