Choosing Representatives and Being Represented in Unsupervised Learning – AP & LLE
张响亮 Xiangliang Zhang
King Abdullah University of Science and Technology
CNCC, Oct 25, 2018, Hangzhou, China
Outline
• Affinity Propagation (AP) [Frey and Dueck, Science, 2007]
• Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]
Affinity Propagation [Frey and Dueck, Science 2007]
Affinity Propagation [Frey and Dueck, NIPS 2005]
"We describe a new method that, for the first time to our knowledge, combines the advantages of model-based clustering and affinity-based clustering."
[Figure: a mixture model, annotated with its components and mixing coefficients]
Clustering: group similar points together

K-medoids — minimize
$$\sum_{m=1}^{k} \sum_{x_i \in C_m} (x_i - \mu_m)^2, \quad \text{where } \mu_m \in \{x_i \mid x_i \in C_m\}$$
(the center must be an actual data point)

K-medians — minimize
$$\sum_{m=1}^{k} \sum_{x_i \in C_m} (x_i - \mu_m)^2, \quad \text{where } \mu_m = \frac{1}{|C_m|} \sum_{x_i \in C_m} x_i$$
(the center is the cluster mean)
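As a concrete contrast between the two "where" clauses above, here is a minimal Python sketch of the two center-update rules (the function names are illustrative, not from the slides):

```python
import numpy as np

def mean_center(cluster):
    """Center as the cluster mean (k-means / k-medians style)."""
    return cluster.mean(axis=0)

def medoid_center(cluster):
    """Center restricted to an actual data point (k-medoids):
    pick the member minimizing total squared distance to the others."""
    sq_dists = ((cluster[:, None, :] - cluster[None, :, :]) ** 2).sum(-1)
    return cluster[np.argmin(sq_dists.sum(axis=1))]
```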
Inspired by greedy k-medoids

Data to cluster: $x_1, \dots, x_N$
• The likelihood of $x_i$ belonging to the cluster with a given center, together with the Bayesian prior probability that a point is a cluster center, gives the responsibility of the k-th component for generating $x_i$
• Assign $x_i$ to center $s_i$
• Choose a new center, and repeat
Understanding the process
• A message is sent from each $x_i$ to each candidate center/exemplar, expressing $x_i$'s preference to be with that exemplar
• This alone amounts to a hard decision for cluster centers/exemplars
• Introduce "availabilities": messages sent from exemplars back to the other points, providing soft evidence of the preference of each exemplar to be available as a center for each point
The method presented in NIPS '05
• Responsibilities are computed using likelihoods and availabilities
• Availabilities are computed using responsibilities, recursively
• Affinities (the pairwise similarities) are the input driving both updates
Interpretation by factor graph
• $c_i$ is the index of the exemplar for $x_i$
• Constraints: a cluster must not be left without its exemplar; an exemplar must select itself as exemplar:
$$\delta_k(\mathbf{c}) = \begin{cases} -\infty & \text{if } c_k \neq k \text{ but } \exists i: c_i = k \\ 0 & \text{otherwise} \end{cases}$$
• The objective function is
$$S(\mathbf{c}) = \sum_{i=1}^{N} s(x_i, x_{c_i}) + \sum_{k=1}^{N} \delta_k(\mathbf{c})$$
Input and output of AP in Science '07
• Input: pairwise similarities $s(i,k)$, plus a preference $s(k,k)$ for each point (a prior on how suitable point $k$ is as an exemplar)
• Output: a set of exemplars and the assignment of every point to one exemplar
AP: a message passing algorithm
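A minimal numpy sketch of the Science '07 message updates, assuming a precomputed similarity matrix S with preferences s(k,k) on its diagonal (the damping factor and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def affinity_propagation(S, damping=0.9, max_iter=200):
    """Minimal AP sketch: S is an (n, n) similarity matrix
    with preferences s(k, k) on the diagonal."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new

        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())    # keep r(k,k) itself in the column sum
        A_new = Rp.sum(axis=0)[None, :] - Rp  # exclude the i' = i term
        diag = A_new.diagonal().copy()        # self-availability skips the min(0, .)
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new

    # exemplar of point i: argmax_k [a(i,k) + r(i,k)]
    return np.argmax(A + R, axis=1)
```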
Iterations of message passing in AP
[Figure: successive animation frames showing the messages being exchanged until the exemplars emerge]
Summary of AP
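For reference, scikit-learn ships an implementation of AP; a small usage sketch (the blob data and parameter values are illustrative):

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# damping stabilizes the message updates; the preference (left at its
# default, the median similarity) controls how many exemplars emerge
ap = AffinityPropagation(damping=0.9, random_state=0).fit(X)

print("exemplar indices:", ap.cluster_centers_indices_)
print("number of clusters:", len(ap.cluster_centers_indices_))
```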
Extensive study of AP
Outline
• Affinity Propagation (AP) [Frey and Dueck, Science, 2007]
• Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]
Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]
Saul and Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. JMLR 2003
LLE – motivations
Inspired by MDS
Multidimensional Scaling (MDS): find an embedding of objects in a low-dimensional space that preserves pairwise distances.
• Given: pairwise similarities/distances
• To find: an embedding
Can we eliminate the need to estimate pairwise distances between widely separated data points?
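A minimal sketch of classical MDS, assuming a Euclidean distance matrix D (double-centering recovers a Gram matrix whose top eigenvectors give the embedding):

```python
import numpy as np

def classical_mds(D, d=2):
    """Classical MDS sketch: D is an (n, n) matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix from squared distances
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]      # take the d largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
```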
LLE – general idea
• Locally, on a fine enough scale, everything looks linear
• Represent each object as a linear combination of its neighbors
• Assumption: the same linear representation will hold in the low-dimensional space
• Find a low-dimensional embedding that minimizes the reconstruction loss
LLE – matrix representation
1. Select the k nearest neighbors of each point
2. Reconstruct $x_i$ from its k nearest neighbors: find W by minimizing
$$\varepsilon(W) = \sum_i \Big\| x_i - \sum_j w_{ij} x_j \Big\|^2, \quad \text{s.t. } \sum_j w_{ij} = 1$$
LLE – matrix representation
3. Need to solve the system $Y = WY$: find the embedding vectors $Y_i$ by minimizing
$$\varepsilon(Y) = \sum_{i=1}^{N} \Big\| Y_i - \sum_{j=1}^{N} W_{ij} Y_j \Big\|^2 = \sum_i \sum_j M_{ij} \, (Y_i \cdot Y_j)$$
where $M = (I-W)^T (I-W)$,
s.t. $\sum_{i=1}^{N} Y_i = 0$ (centered on the origin)
and $\frac{1}{N} \sum_{i=1}^{N} Y_i Y_i^T = I$ (with unit covariance)
LLE – algorithm summary
1. Find the k nearest neighbors of each point in X space
2. Solve for the reconstruction weights W:
$$w = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}}, \quad \text{where } C_{jk} = (x - \eta_j)^T (x - \eta_k)$$
and $\eta_j$ is one of x's K nearest neighbors
3. Compute the embedding coordinates Y using the weights W:
• Create the sparse matrix $M = (I-W)^T (I-W)$
• Set Y to the eigenvectors of M corresponding to its bottom d non-zero eigenvalues
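A minimal numpy sketch of the three steps above (the brute-force neighbor search and the regularization constant are illustrative simplifications):

```python
import numpy as np

def lle(X, k=10, d=2, reg=1e-3):
    """Minimal LLE sketch. X: (n, D) data; k: neighbors; d: target dim."""
    n = X.shape[0]
    # 1. k nearest neighbors (brute force on pairwise distances)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]

    # 2. reconstruction weights: w = C^{-1} 1 / (1^T C^{-1} 1)
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]               # neighbors shifted to the origin
        C = Z @ Z.T                         # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)  # regularize for stability (k > D)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()         # enforce sum-to-one

    # 3. embedding from the bottom eigenvectors of M = (I-W)^T (I-W)
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)          # ascending eigenvalues
    return vecs[:, 1:d + 1]                 # skip the constant eigenvector
```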
Continuing, SNE
SNE allows "many-to-one" mappings in which a single ambiguous object really belongs in several disparate locations in the low-dimensional space, while LLE makes a one-to-one mapping.
• $p_{j|i}$ is the asymmetric probability that point $i$ would pick point $j$ as its neighbor, given by a Gaussian neighborhood in the original space:
$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$
• $q_{j|i}$ is the induced probability that point $i$ picks point $j$ as its neighbor, given by a Gaussian neighborhood in the low-dimensional space:
$$q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$$
Continuing, t-SNE
• uses a Student-t distribution (heavier tails) rather than a Gaussian to compute the similarity between two points in the low-dimensional space
• uses a symmetrized version of the SNE cost function, with simpler gradients
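A small usage sketch with scikit-learn's TSNE (the digits dataset and the perplexity value are illustrative choices):

```python
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

# perplexity roughly sets the effective neighborhood size;
# values around 5-50 are the usual range
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(Y.shape)  # (1797, 2)
```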
LLE – follow-up work [Chen and Liu, 2011]
LLE output strongly depends on the selection of k.
Jing Chen and Yang Liu. Locally linear embedding: a survey. Artificial Intelligence Review (2011)
[Ting and Jordan, 2018]
Thank you for your attention! Lab of Machine Intelligence and kNowledge Engineering (MINE): http://mine.kaust.edu.sa/