Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 8 Jan-Willem van de Meent (credit: Yijun Zhao, Carla Brodley, Eamonn Keogh)
Classification Wrap-up
Classifier Comparison (figure: decision boundaries on example datasets for Nearest Neighbors, Linear SVM, RBF SVM, Random Forest, AdaBoost, Naive Bayes, and QDA)
Confusion Matrix

                     True: email    True: spam
  Predicted: email   57.3% (TP)      4.0% (FP)
  Predicted: spam     5.3% (FN)     33.4% (TN)

True Positive (TP): hit (show the e-mail)
True Negative (TN): correct rejection
False Positive (FP): false alarm, type I error
False Negative (FN): miss, type II error
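The four cells come from comparing predictions with ground truth. A minimal sketch (the labels below are made up for illustration; as on the slide, "email" is treated as the positive class):

```python
import numpy as np

# Hypothetical labels: 1 = email (positive class), 0 = spam
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))   # hits (show e-mail)
tn = np.sum((y_pred == 0) & (y_true == 0))   # correct rejections
fp = np.sum((y_pred == 1) & (y_true == 0))   # false alarms (type I errors)
fn = np.sum((y_pred == 0) & (y_true == 1))   # misses (type II errors)

print(np.array([[tp, fp],
                [fn, tn]]) / len(y_true))    # confusion matrix as fractions
```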
Decision Theory

Let \lambda_{ij} denote the loss for predicting class i when the true class is j (rows: predicted email/spam; columns: true email/spam, with the cell probabilities 57.3%, 4.0%, 5.3%, 33.4% as above). We predict email (action \alpha_1) when the conditional risk of predicting spam exceeds that of predicting email:

R(\alpha_2 \mid x) > R(\alpha_1 \mid x)
\lambda_{21}\, p(Y=1 \mid x) + \lambda_{22}\, p(Y=2 \mid x) > \lambda_{11}\, p(Y=1 \mid x) + \lambda_{12}\, p(Y=2 \mid x)
(\lambda_{21} - \lambda_{11})\, p(Y=1 \mid x) > (\lambda_{12} - \lambda_{22})\, p(Y=2 \mid x)
\frac{p(Y=1 \mid x)}{p(Y=2 \mid x)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}}

where we have assumed \lambda_{21} > \lambda_{11}, i.e. the loss of a false negative exceeds that of a true positive.
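A minimal sketch of this decision rule with a made-up loss matrix (the λ values are illustrative, not taken from the slides):

```python
import numpy as np

# Loss matrix: lam[i, j] = loss of predicting class i+1 when the true class is j+1.
# lam[1, 0] > lam[0, 0] encodes the assumption above: a false negative
# (rejecting a real e-mail) costs more than a true positive.
lam = np.array([[0.0, 1.0],    # predict email: (true email, true spam)
                [5.0, 0.0]])   # predict spam:  (true email, true spam)

def decide(p_email):
    """Predict 'email' when the posterior odds exceed the loss-ratio threshold."""
    p_spam = 1.0 - p_email
    threshold = (lam[0, 1] - lam[1, 1]) / (lam[1, 0] - lam[0, 0])
    return "email" if p_email / p_spam > threshold else "spam"

print(decide(0.30))  # 'email': the asymmetric losses favour showing the e-mail even at p = 0.30
```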
Precision and Recall

Precision or Positive Predictive Value (PPV): \mathrm{PPV} = \frac{TP}{TP + FP}

Recall or Sensitivity, True Positive Rate (TPR): \mathrm{TPR} = \frac{TP}{TP + FN}

F1 score, the harmonic mean of Precision and Recall: F_1 = \frac{2\,TP}{2\,TP + FP + FN}

Specificity (SPC) or True Negative Rate (TNR): \mathrm{SPC} = \frac{TN}{FP + TN}
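These four definitions translate directly into code. A small sketch, evaluated on the counts from the e-mail/spam confusion matrix above (using the percentages as counts):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and specificity from confusion-matrix counts."""
    ppv = tp / (tp + fp)                 # precision / positive predictive value
    tpr = tp / (tp + fn)                 # recall / sensitivity / true positive rate
    f1  = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and recall
    spc = tn / (fp + tn)                 # specificity / true negative rate
    return ppv, tpr, f1, spc

print(classification_metrics(tp=57.3, fp=4.0, fn=5.3, tn=33.4))
```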
Precision-Recall Curve (figure: precision plotted against recall). The curve is traced out by varying the detection threshold in the decision rule \frac{p(Y=1 \mid x)}{p(Y=2 \mid x)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}}.
ROC Curve (figure: True Positive Rate, i.e. recall, plotted against False Positive Rate). As with the precision-recall curve, each point corresponds to one setting of the detection threshold \frac{p(Y=1 \mid x)}{p(Y=2 \mid x)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}}.
ROC Curve (figures: True Positive Rate vs. False Positive Rate; for multi-class problems the per-class curves are summarized by the macro-average and the micro-average of the True Positive Rate).
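Both curves are traced by sweeping the threshold over the classifier's scores. A minimal binary sketch using scikit-learn (the labels and scores are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve

# Hypothetical data: y_score is the classifier's estimate of p(Y = 1 | x)
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3, 0.7, 0.5])

fpr, tpr, roc_thresh = roc_curve(y_true, y_score)               # ROC: TPR vs FPR
prec, rec, pr_thresh = precision_recall_curve(y_true, y_score)  # PR: precision vs recall

# Each point on either curve corresponds to one detection threshold.
for t, x, y in zip(roc_thresh, fpr, tpr):
    print(f"threshold={t:.2f}  FPR={x:.2f}  TPR={y:.2f}")
```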
Clustering (a.k.a. unsupervised classification) with slides from Eamonn Keogh (UC Riverside)
Clustering
• Unsupervised learning (no labels for training)
• Group data into similar classes that
  • Maximize intra-cluster similarity
  • Minimize inter-cluster similarity
Two Types of Clustering
• Partitional: construct partitions and evaluate them using "some criterion"
• Hierarchical: create a hierarchical decomposition using "some criterion"
What is a natural grouping? The choice of clustering criterion can be task-dependent (figure: Simpsons characters grouped either as Simpson's Family vs. School Employees, or as Females vs. Males).
What is Similarity? Can be hard to define, but we know it when we see it.
Defining Distance Measures (figure: "Peter" and "Piotr" with candidate distances 3, 0.2, and 342.7). Need: some function D(x_1, x_2) that represents the degree of dissimilarity.
Example: Distance Measures
• Euclidean distance: \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}
• Manhattan distance: \sum_{i=1}^{k} |x_i - y_i|
• Minkowski distance: \left( \sum_{i=1}^{k} |x_i - y_i|^q \right)^{1/q}
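A direct numpy sketch of the three distances (the example vectors are arbitrary):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, q):
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
# Minkowski recovers Manhattan at q = 1 and Euclidean at q = 2
print(euclidean(x, y), manhattan(x, y), minkowski(x, y, q=2), minkowski(x, y, q=1))
```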
Example: Kernels
• Polynomial
• Radial Basis Function (RBF)
• Squared Exponential (SE)
• Automatic Relevance Determination (ARD)
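The slide lists these kernels by name only; the sketch below uses their standard textbook forms, with illustrative parameter names (c, d, lengthscale) that are not taken from the slides:

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=3):
    """Polynomial kernel (x . y + c)^d."""
    return (np.dot(x, y) + c) ** d

def rbf_kernel(x, y, lengthscale=1.0):
    """RBF / squared-exponential kernel with a single shared lengthscale."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))

def ard_kernel(x, y, lengthscales):
    """Automatic relevance determination: one lengthscale per input dimension."""
    return np.exp(-np.sum(((x - y) / lengthscales) ** 2) / 2.0)

x, y = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(polynomial_kernel(x, y), rbf_kernel(x, y), ard_kernel(x, y, np.array([1.0, 2.0])))
```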
Inner Product vs Distance Measure

Inner product:
• ⟨A, B⟩ = ⟨B, A⟩ (symmetry)
• ⟨αA, B⟩ = α⟨A, B⟩ (linearity)
• ⟨A, A⟩ ≥ 0, with ⟨A, A⟩ = 0 iff A = 0 (positive-definiteness)

Distance measure:
• D(A, B) = D(B, A) (symmetry)
• D(A, A) = 0 (constancy of self-similarity)
• D(A, B) = 0 iff A = B (positivity / separation)
• D(A, B) ≤ D(A, C) + D(B, C) (triangle inequality)

An inner product ⟨A, B⟩ induces a distance measure D(A, B) = ⟨A − B, A − B⟩^{1/2}.
Inner Product vs Distance Measure: Is the reverse also true? Why?
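As a quick numerical check of the forward direction, the distance induced by the standard dot product is the Euclidean distance (the vectors below are arbitrary):

```python
import numpy as np

A, B = np.array([1.0, 2.0]), np.array([4.0, 6.0])

# D(A, B) = <A - B, A - B>^(1/2), the distance induced by the inner product
D = np.sqrt(np.dot(A - B, A - B))
print(D, np.linalg.norm(A - B))   # both print 5.0, the Euclidean distance
```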
Hierarchical Clustering
Dendrogram (a.k.a. a similarity tree). The similarity of A and B, D(A, B), is represented as the height of the lowest shared internal node.
(Bovine: 0.69395, (Spider Monkey: 0.390, (Gibbon: 0.36079, (Orang: 0.33636, (Gorilla: 0.17147, (Chimp: 0.19268, Human: 0.11927): 0.08386): 0.06124): 0.15057): 0.54939);
Dendrogram (a.k.a. a similarity tree). This representation is natural when measuring genetic similarity: D(A, B) reflects the distance to the common ancestor.
Example: Iris data (photos: Iris setosa, Iris versicolor, Iris virginica). https://en.wikipedia.org/wiki/Iris_flower_data_set
Hierarchical Clustering (Euclidean Distance) (figure: dendrogram of the Iris data). https://en.wikipedia.org/wiki/Iris_flower_data_set
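A dendrogram like the one on the slide can be reproduced with scipy; the sketch below uses average linkage, which is an assumption on my part since the slide does not state the linkage criterion:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris

X = load_iris().data                                   # 150 flowers, 4 measurements each
Z = linkage(X, method="average", metric="euclidean")   # bottom-up merge tree

dendrogram(Z, no_labels=True)
plt.ylabel("Euclidean distance at merge")
plt.show()
```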
Edit Distance

Distance between Patty and Selma:
• Change dress color, 1 point
• Change earring shape, 1 point
• Change hair part, 1 point
D(Patty, Selma) = 3

Distance between Marge and Selma:
• Change dress color, 1 point
• Add earrings, 1 point
• Decrease height, 1 point
• Take up smoking, 1 point
• Lose weight, 1 point
D(Marge, Selma) = 5

Edit distance can be defined for any set of discrete features.
Edit Distance for Strings
• Transform string Q into string C using only Substitution, Insertion, and Deletion.
• Assume that each of these operators has a cost associated with it (here: Substitution 1 unit, Insertion 1 unit, Deletion 1 unit).
• The similarity between two strings can be defined as the cost of the cheapest transformation from Q to C.

Similarity of "Peter" and "Piotr"? D(Peter, Piotr) = 3:
Peter → Piter (substitution, i for e) → Pioter (insertion, o) → Piotr (deletion, e)

(figure: related names: Pedro, Peter, Piotr, Piero, Pyotr, Petros, Pietro, Pierre)
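The cheapest transformation can be computed with the standard dynamic-programming recurrence (a sketch with unit costs, matching the slide's example):

```python
def edit_distance(q, c):
    """Minimum number of substitutions, insertions, and deletions (1 unit each)
    needed to transform string q into string c."""
    n, m = len(q), len(c)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                               # delete all of q[:i]
    for j in range(m + 1):
        D[0][j] = j                               # insert all of c[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if q[i - 1] == c[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution (or match)
    return D[n][m]

print(edit_distance("Peter", "Piotr"))   # 3, as on the slide
```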
Hierarchical Clustering (Edit Distance)
• Pedro (Portuguese): Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian)
• Cristovao (Portuguese): Christoph (German), Christophe (French), Cristobal (Spanish), Cristoforo (Italian), Kristoffer (Scandinavian), Krystof (Czech), Christopher (English)
• Miguel (Portuguese): Michalis (Greek), Michael (English), Mick (Irish)
(figure: dendrogram of these names under edit distance)
Meaningful Patterns
Edit distance yields a clustering according to geography: Pedro (Portuguese/Spanish), Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian).
(Slide from Eamonn Keogh)
Spurious Patterns
In general, clusterings will only be as meaningful as your distance metric.
(figure: flags clustered by visual similarity: South Georgia & South Sandwich Islands, St. Helena & Dependencies, U.K., AUSTRALIA, and ANGUILLA group together as former UK colonies; FRANCE, NIGER, INDIA, IRELAND, BRAZIL, and Serbia & Montenegro (Yugoslavia) group together spuriously, with no connection between them)
“Correct” Number of Clusters
“Correct” Number of Clusters: determine the number of clusters by looking at the distance at which clusters merge in the dendrogram.
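One way to apply this rule is to cut the dendrogram at a chosen merge distance; scipy's fcluster does this directly (the data and threshold below are made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.RandomState(0).randn(20, 2)   # toy 2-D data
Z = linkage(X, method="average")

# Undo every merge whose distance exceeds the threshold; the pieces that
# remain are the clusters, so the threshold determines their number.
labels = fcluster(Z, t=1.5, criterion="distance")
print(np.unique(labels))
```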
Detecting Outliers: a single isolated branch in the dendrogram is suggestive of a data point that is very different from all others (figure: dendrogram with one outlier branch).
Bottom-up vs Top-down

The number of dendrograms with n leaves is (2n − 3)! / [2^(n−2) (n − 2)!]:

  Number of leaves    Number of possible dendrograms
  2                   1
  3                   3
  4                   15
  5                   105
  ...                 ...
  10                  34,459,425

Since we cannot test all possible trees, we will have to do a heuristic search over the space of possible trees. We could do this two ways:

Bottom-up (agglomerative): starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

Top-down (divisive): starting with all the data in a single cluster, consider every possible way to divide the cluster into two. Choose the best division and recursively operate on both sides.
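The count grows explosively with n; a short check reproduces the table above:

```python
from math import factorial

def num_dendrograms(n):
    """Number of dendrograms with n labelled leaves: (2n-3)! / (2^(n-2) (n-2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in [2, 3, 4, 5, 10]:
    print(n, num_dendrograms(n))   # 1, 3, 15, 105, 34459425
```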
Distance Matrix
We begin with a distance matrix which contains the distances between every pair of objects in our database.

  0  8  8  7  7
     0  2  4  4
        0  3  3
           0  1
              0

Examples from the slide (the arguments are objects pictured there): D( , ) = 8, D( , ) = 1.
Bottom-up (Agglomerative Clustering): consider all possible merges, choose the best one, and repeat.
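A naive sketch of this bottom-up loop on the 5 x 5 distance matrix above; single linkage (merging on the minimum pairwise distance between clusters) is my choice here, since the slides do not specify how the "best" merge is scored:

```python
import numpy as np

def agglomerate(D):
    """Repeatedly merge the closest pair of clusters (single linkage)."""
    clusters = [{i} for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((clusters[a], clusters[b], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# The distance matrix from the slide (made symmetric)
D = np.array([[0, 8, 8, 7, 7],
              [8, 0, 2, 4, 4],
              [8, 2, 0, 3, 3],
              [7, 4, 3, 0, 1],
              [7, 4, 3, 1, 0]])
for a, b, d in agglomerate(D):
    print(f"merge {sorted(a)} and {sorted(b)} at distance {d}")
```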