Unsupervised Learning George Konidaris gdk@cs.brown.edu Fall 2019
Machine Learning Subfield of AI concerned with learning from data. Broadly, using: • Experience • To Improve Performance • On Some Task (Tom Mitchell, 1997)
Unsupervised Learning Input: unlabeled inputs X = {x_1, …, x_n}. Try to understand the structure of the data. E.g., how many types of cars are there? How can they vary?
Clustering One particular type of unsupervised learning: • Split the data into discrete clusters. • Assign new data points to a cluster. • Clusters can be thought of as types. Formal definition Given: • Data points X = {x_1, …, x_n}. Find: • Number of clusters k • Assignment function f(x) ∈ {1, …, k}
k-Means One approach: • Pick k. • Place k points (“means”) in the data. • Assign a new point to the i-th cluster if it is nearest to the i-th “mean”.
k-Means Major question: • Where to put the “means”? Very simple algorithm: • Place k “means” {µ_1, …, µ_k} at random. • Assign each data point to its nearest “mean”: f(x_j) = i such that d(x_j, µ_i) ≤ d(x_j, µ_l) for all l ≠ i. • Move each “mean” to the mean of its assigned data: µ_i = (1 / |C_i|) Σ_{x_v ∈ C_i} x_v. • Repeat the last two steps until the assignments stop changing.
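A minimal k-means sketch in Python/NumPy, following the steps above; the function name `kmeans` and the choice of initializing the means at k random data points are illustrative, not prescribed by the slides.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Place k "means" at random: here, k distinct data points.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest mean (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each mean to the mean of its assigned points.
        new_means = np.array([X[assign == i].mean(axis=0) if np.any(assign == i)
                              else means[i] for i in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, assign
```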
k-Means Remaining questions… How to choose k? What about bad initializations? How to measure distance? Broadly: • Use a quality metric. • Loop over k. • Random-restart the initial positions. • Use a distance metric D.
Density Estimation Clustering can answer which cluster a point belongs to, but not whether it belongs at all.
Density Estimation Estimate the distribution the data is drawn from. This allows us to evaluate the probability that a new point is drawn from the same distribution as the old data. Formal definition Given: • Data points X = {x_1, …, x_n}. Find: • A PDF P(x)
GMM Simple approach: • Model the data as a mixture of Gaussians. Each Gaussian has its own mean and variance. Each has its own weight (the weights sum to 1). A weighted sum of Gaussians is still a PDF.
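Written out, the mixture density is P(x) = Σ_{i=1}^{k} w_i N(x | µ_i, σ_i²), with Σ_i w_i = 1, so P still integrates to one.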
GMM Algorithm, broadly as before: • Place k “means” {µ_1, …, µ_k} at random. • Set the variances to be high. • Assign each point to its highest-probability component: C_i = {x_v | N(x_v | µ_i, σ_i²) > N(x_v | µ_j, σ_j²), ∀ j ≠ i}. • Set each component’s mean, variance, and weight to match its assigned data: µ_i = (1 / |C_i|) Σ_{x_v ∈ C_i} x_v, σ_i² = variance(C_i), w_i = |C_i| / Σ_j |C_j|.
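A minimal 1-D sketch of the hard-assignment fit described above (not full EM); the function names and the choice of initializing each variance to the overall data variance are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def fit_gmm(x, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)   # place k "means" at random
    var = np.full(k, x.var())                   # start the variances high
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # Assign each point to the component under which it is most probable.
        dens = np.stack([gaussian_pdf(x, mu[i], var[i]) for i in range(k)])
        assign = dens.argmax(axis=0)
        # Refit mean, variance, and weight of each component from its points.
        for i in range(k):
            pts = x[assign == i]
            if len(pts) > 1:
                mu[i], var[i] = pts.mean(), pts.var()
            w[i] = len(pts) / len(x)
    return mu, var, w
```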
GMM Major issue: • How to decide between two GMMs? • How to choose k? General statistical question: model selection. Several good answers for this. Simple example: the Bayesian information criterion (BIC). It trades off model complexity with fit (likelihood): BIC = −2 log L + k log n, where L is the likelihood, k is the number of parameters in the model, and n is the number of data points.
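A small sketch of scoring a fitted model with BIC; the log-likelihood and parameter count are assumed to come from whatever model was fitted.

```python
import numpy as np

def bic(log_likelihood, n_params, n_points):
    # Lower is better: the n_params * log(n) term penalizes complexity,
    # the -2 * log-likelihood term rewards fit.
    return -2.0 * log_likelihood + n_params * np.log(n_points)

# E.g., a 1-D GMM with c components has c means, c variances,
# and (c - 1) free weights, so n_params = 3 * c - 1.
```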
Nonparametric Density Estimation Parametric: • Define a parametrized model (e.g., a Gaussian). • Fit the parameters. • Done! Key assumptions: • The data is distributed according to the parametrized form. • We know the parametrized form in advance. What is the shape of the distribution over images representing flowers?
Nonparametric Density Estimation Nonparametric alternative: • Avoid a fixed parametrized form. • Compute the density estimate directly from the data. Kernel density estimator: PDF(x) = (1 / (nb)) Σ_{i=1}^{n} D((x_i − x) / b), where: • D is a special kind of distance metric called a kernel. • It falls away from zero and integrates to one. • b is the bandwidth: it controls how fast the kernel falls away.
Nonparametric Density Estimation PDF(x) = (1 / (nb)) Σ_{i=1}^{n} D((x_i − x) / b) Kernel: • Lots of choices; a Gaussian often works well in practice. Bandwidth: • High: distant points make a larger “contribution” to the sum. • Low: distant points make a smaller one.
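A minimal 1-D kernel density estimator sketch using a Gaussian kernel; the default bandwidth is an arbitrary illustrative value.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x_query, data, b=0.5):
    # PDF(x) = 1/(n*b) * sum_i D((x_i - x) / b)
    x_query = np.atleast_1d(x_query)
    u = (data[None, :] - x_query[:, None]) / b
    return gaussian_kernel(u).sum(axis=1) / (len(data) * b)
```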
[Figure: kernel density estimation example (Wikipedia).]
Dimensionality Reduction X = {x_1, …, x_n}, where each x_i has m dimensions: x_i = [x_{i1}, …, x_{im}]. If m is high, the data can be hard to deal with: • High-dimensional decision boundaries. • Need more data. • But the data is often not really high-dimensional. Dimensionality reduction: • Reduce or compress the data. • Try not to lose too much! • Find the intrinsic dimensionality.
Dimensionality Reduction For example, imagine that x_1 and x_2 are meaningful features, and x_3, …, x_m are random noise. What happens to k-nearest neighbors? What happens to a decision tree? What happens to the perceptron algorithm? What happens if you want to do clustering?
Dimensionality Reduction Often this can be phrased as a projection f: X → X′, where: • |X′| << |X| (far fewer dimensions). • Our goal: retain as much sample variance as possible. Variance captures what varies within the data.
PCA Principal Components Analysis. Project the data into a new space: • Dimensions are linearly uncorrelated. • We have a measure of importance for each dimension.
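A minimal PCA sketch via the SVD of the centered data matrix; the interface (returning the projected data and per-axis variances) is an illustrative choice, not the only formulation.

```python
import numpy as np

def pca(X, n_components):
    # Centre the data, then take the SVD; rows of Vt are the principal axes.
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (len(X) - 1)    # "importance" of each axis
    projected = X_centered @ Vt[:n_components].T  # project into the new space
    return projected, explained_variance[:n_components]
```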