Learning From Data, Lecture 19
A Peek At Unsupervised Learning: k-Means Clustering, Probability Density Estimation, Gaussian Mixture Models
M. Magdon-Ismail, CSCI 4100/6100
Recap: Radial Basis Functions

Nonparametric RBF (a bump on every data point x_n):
g(x) = \frac{\sum_{n=1}^{N} \alpha_n(x)\, y_n}{\sum_{m=1}^{N} \alpha_m(x)}, \qquad \alpha_n(x) = \phi\!\left(\frac{\|x - x_n\|}{r}\right)

Parametric k-RBF-Network (a bump on each center \mu_j):
h(x) = w_0 + \sum_{j=1}^{k} w_j\, \phi\!\left(\frac{\|x - \mu_j\|}{r}\right) = w^t \Phi(x)

The nonparametric form needs no training; the parametric form is a linear model once the \mu_j are given. Choose the \mu_j as the centers of k clusters of the data.

[Figures: nonparametric RBF fit with r = 0.05; k-RBF-network fits with k = 4, r = 1 and with k = 10, regularized.]
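A minimal NumPy sketch of the two recap formulas, assuming a Gaussian bump \phi(z) = e^{-z^2/2} (any normalization constant is absorbed into the weights); the function and variable names (rbf_nonparametric, rbf_network, centers) are illustrative, not from the lecture.

import numpy as np

def gaussian_bump(z):
    """phi(z) = exp(-z^2 / 2), the bump used in both RBF forms."""
    return np.exp(-0.5 * z ** 2)

def rbf_nonparametric(x, X, y, r):
    """Nonparametric RBF: g(x) = sum_n alpha_n(x) y_n / sum_m alpha_m(x),
    with alpha_n(x) = phi(||x - x_n|| / r), one bump per data point."""
    alpha = gaussian_bump(np.linalg.norm(X - x, axis=1) / r)
    return alpha @ y / alpha.sum()

def rbf_network(x, w0, w, centers, r):
    """Parametric k-RBF-network: h(x) = w0 + sum_j w_j phi(||x - mu_j|| / r),
    a linear model once the centers mu_j are fixed."""
    return w0 + w @ gaussian_bump(np.linalg.norm(centers - x, axis=1) / r)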
Unsupervised Learning

• Preprocessor to organize the data for supervised learning:
  - organize data for faster nearest-neighbor search;
  - determine centers for RBF bumps.
• Important to be able to organize the data to identify patterns: learn the patterns in the data (e.g. the patterns in a language) before getting into a supervised setting.

(amazon.com organizes books into categories.)
Clustering Digits

[Figures: the digits data plotted by average intensity and symmetry; left: the 21-NN rule with 10 classes; right: a 10-clustering of the data.]
Clustering

A cluster is a collection of points S. A k-clustering is a partition of the data into k clusters S_1, ..., S_k:
\cup_{j=1}^{k} S_j = D, \qquad S_i \cap S_j = \emptyset \ \text{for}\ i \neq j.
Each cluster has a center \mu_j.
How Good Is a Clustering?

Points in a cluster should be similar (close to each other, and to the center).

Error in cluster j:
E_j = \sum_{x_n \in S_j} \|x_n - \mu_j\|^2.

k-Means clustering error:
E_{in}(S_1, ..., S_k; \mu_1, ..., \mu_k) = \sum_{j=1}^{k} E_j = \sum_{n=1}^{N} \|x_n - \mu(x_n)\|^2,
where \mu(x_n) is the center of the cluster to which x_n belongs.
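A minimal sketch of how this error might be computed with NumPy, assuming the data points are the rows of X, the centers are the rows of centers, and labels[n] is the index of the cluster containing x_n (all names are illustrative):

import numpy as np

def kmeans_error(X, centers, labels):
    """E_in = sum_n ||x_n - mu(x_n)||^2, where mu(x_n) = centers[labels[n]]
    is the center of the cluster to which x_n belongs."""
    return float(np.sum((X - centers[labels]) ** 2))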
k-Means Clustering

You get to pick S_1, ..., S_k and \mu_1, ..., \mu_k to minimize E_{in}(S_1, ..., S_k; \mu_1, ..., \mu_k).

If the centers \mu_j are known, picking the sets is easy: add to S_j all points closest to \mu_j.

If the clusters S_j are known, picking the centers is easy: the center \mu_j is the centroid of cluster S_j,
\mu_j = \frac{1}{|S_j|} \sum_{x_n \in S_j} x_n.
Lloyd's Algorithm for k-Means Clustering

E_{in}(S_1, ..., S_k; \mu_1, ..., \mu_k) = \sum_{n=1}^{N} \|x_n - \mu(x_n)\|^2

1: Initialize: pick well-separated centers \mu_j.
2: Update S_j to be all points closest to \mu_j: S_j \leftarrow \{x_n : \|x_n - \mu_j\| \leq \|x_n - \mu_\ell\| \ \text{for}\ \ell = 1, ..., k\}.
3: Update \mu_j to the centroid of S_j: \mu_j \leftarrow \frac{1}{|S_j|} \sum_{x_n \in S_j} x_n.
4: Repeat steps 2 and 3 until E_{in} stops decreasing.
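A minimal NumPy sketch of Lloyd's algorithm as stated above, assuming the data points are the rows of X; the names lloyd_kmeans and labels are illustrative, and random initialization stands in for the 'well separated centers' heuristic of step 1.

import numpy as np

def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate the cluster update and the center update
    until the in-sample error E_in stops decreasing."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centers (here: k distinct data points chosen at random).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    prev_error = np.inf
    for _ in range(max_iter):
        # Step 2: assign every point to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the centroid of its cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        # Step 4: stop once E_in no longer decreases.
        error = np.sum((X - centers[labels]) ** 2)
        if error >= prev_error:
            break
        prev_error = error
    return centers, labels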
Application to the k-RBF-Network

[Figures: a 10-center RBF-network and a 300-center RBF-network on the digits data.]

Choosing k: knowledge of the problem (10 digits) or cross-validation.
Probability Density Estimation P(x)

P(x) measures how likely it is to generate inputs similar to x. Estimating P(x) results in a 'softer/finer' representation than clustering: clusters are regions of high probability.
Parzen Windows: RBF Density Estimation

Basic idea: put a bump of 'size' (volume) 1/N on each data point:
\hat{P}(x) = \frac{1}{N r^d} \sum_{i=1}^{N} \phi\!\left(\frac{\|x - x_i\|}{r}\right), \qquad \phi(z) = \frac{1}{(2\pi)^{d/2}} e^{-\frac{1}{2} z^2}.

[Figure: the estimate \hat{P}(x) as a sum of bumps placed on the data points.]
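A minimal sketch of this Parzen-window estimate in NumPy, assuming the data points are the rows of X and x is a single query point (the function name parzen_density is illustrative):

import numpy as np

def parzen_density(x, X, r):
    """P_hat(x) = (1 / (N r^d)) * sum_i phi(||x - x_i|| / r), with the
    spherical Gaussian kernel phi(z) = (2*pi)^(-d/2) * exp(-z^2 / 2)."""
    N, d = X.shape
    z = np.linalg.norm(X - x, axis=1) / r                  # ||x - x_i|| / r for each x_i
    phi = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * z ** 2)
    return phi.sum() / (N * r ** d)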
Digits Data

[Figures: the RBF (Parzen window) density estimate for the digits data, and its density contours.]
The Gaussian Mixture Model (GMM)

Instead of N bumps, use k ≪ N bumps (similar to going from the nonparametric RBF to the parametric k-RBF-network).
Instead of uniform spherical bumps, each bump has its own shape.

Bump centers: \mu_1, ..., \mu_k. Bump shapes: \Sigma_1, ..., \Sigma_k.

Gaussian formula for the bump:
N(x; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} e^{-\frac{1}{2}(x - \mu_j)^t \Sigma_j^{-1}(x - \mu_j)}.
GMM Density Estimate

\hat{P}(x) = \sum_{j=1}^{k} w_j\, N(x; \mu_j, \Sigma_j) \quad \text{(a sum of } k \text{ weighted bumps)}, \qquad w_j > 0, \quad \sum_{j=1}^{k} w_j = 1.

You get to pick \{w_j, \mu_j, \Sigma_j\}_{j=1,...,k}.
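A minimal NumPy sketch that evaluates this mixture density at a single point x; the names gmm_density, weights, mus, and Sigmas are illustrative, not from the lecture:

import numpy as np

def gmm_density(x, weights, mus, Sigmas):
    """P_hat(x) = sum_j w_j * N(x; mu_j, Sigma_j): a sum of k weighted Gaussian bumps."""
    d = len(x)
    total = 0.0
    for w_j, mu_j, Sigma_j in zip(weights, mus, Sigmas):
        diff = x - mu_j
        quad = diff @ np.linalg.solve(Sigma_j, diff)       # (x - mu)^t Sigma^{-1} (x - mu)
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma_j))
        total += w_j * np.exp(-0.5 * quad) / norm
    return total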
Maximum Likelihood Estimation

Pick \{w_j, \mu_j, \Sigma_j\}_{j=1,...,k} to best explain the data: maximize the likelihood of the data given \{w_j, \mu_j, \Sigma_j\}_{j=1,...,k}.

(We saw this idea when we derived the cross-entropy error for logistic regression.)
Expectation-Maximization: The E-M Algorithm

A simple algorithm to get to a local maximum of the likelihood.
Partition the variables into two sets; given one set, you can estimate the other.
'Bootstrap' your way to a decent solution.
Lloyd's algorithm for k-means is an example, for 'hard clustering'.
Bump Memberships

\gamma_{nj}: the fraction of x_n belonging to bump j (a 'hidden variable').

Given the memberships, the parameters are:
N_j = \sum_{n=1}^{N} \gamma_{nj} \quad \text{('number' of points in bump } j)
w_j = \frac{N_j}{N} \quad \text{(probability of bump } j)
\mu_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_{nj}\, x_n \quad \text{(centroid of bump } j)
\Sigma_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma_{nj}\, x_n x_n^t - \mu_j \mu_j^t \quad \text{(covariance matrix of bump } j)
Re-Estimating Bump Memberships

\gamma_{nj} = \frac{w_j\, N(x_n; \mu_j, \Sigma_j)}{\sum_{\ell=1}^{k} w_\ell\, N(x_n; \mu_\ell, \Sigma_\ell)}

\gamma_{nj} is the probability that x_n came from bump j:
probability of bump j: w_j;
probability density of x_n given bump j: N(x_n; \mu_j, \Sigma_j).
E-M Algorithm

E-M Algorithm for GMMs:
1: Start with estimates for the bump memberships \gamma_{nj}.
2: Estimate w_j, \mu_j, \Sigma_j given the bump memberships.
3: Update the bump memberships given w_j, \mu_j, \Sigma_j.
4: Iterate to step 2 until convergence.
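A minimal sketch of this E-M loop for a GMM, assuming the data points are the rows of X; it uses scipy.stats.multivariate_normal for the Gaussian density, adds a tiny ridge to each covariance for numerical stability, and the names em_gmm and gamma are illustrative:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, seed=0):
    """E-M for a Gaussian mixture: start with memberships gamma_nj, then
    alternate estimating (w_j, mu_j, Sigma_j) from gamma and re-estimating
    gamma from (w_j, mu_j, Sigma_j)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Step 1: start with random membership estimates, each row summing to 1.
    gamma = rng.random((N, k))
    gamma /= gamma.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Step 2 (M-step): w_j, mu_j, Sigma_j from the memberships.
        Nj = gamma.sum(axis=0)                       # 'number' of points in each bump
        w = Nj / N                                   # bump probabilities
        mus = (gamma.T @ X) / Nj[:, None]            # bump centroids
        Sigmas = [((gamma[:, j, None] * X).T @ X) / Nj[j]
                  - np.outer(mus[j], mus[j])
                  + 1e-6 * np.eye(d)                 # tiny ridge for stability
                  for j in range(k)]
        # Step 3 (E-step): gamma_nj proportional to w_j * N(x_n; mu_j, Sigma_j).
        dens = np.column_stack([w[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
                                for j in range(k)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
    # Step 4: in practice, iterate until the likelihood converges.
    return w, mus, Sigmas, gamma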
GMM on Digits Data

[Figures: a 10-center GMM density estimate for the digits data, and its density contours.]