Entropic Affinities: Properties and Efficient Numerical Computation


  1. Entropic Affinities: Properties and Efficient Numerical Computation Max Vladymyrov and Miguel Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu June 18, 2013

  2. Summary
  • The entropic affinities define affinities so that each point has an effective number of neighbors equal to K.
  • First introduced in: G. E. Hinton & S. Roweis, "Stochastic Neighbor Embedding", NIPS 2002.
  • Not in widespread use, even though they work well in a range of problems.
  • We study some properties of entropic affinities and give fast algorithms to compute them.

  3. Affinity matrix
  Defines a measure of similarity between points in the dataset. Used in:
  • Dimensionality reduction: Stochastic Neighbor Embedding, t-SNE, Elastic Embedding, Laplacian Eigenmaps.
  • Clustering: Mean-Shift, Spectral clustering.
  • Semi-supervised learning.
  • and others.
  The performance of these algorithms depends crucially on the affinity construction, which is governed by the bandwidth σ. Common practice is to set σ:
  • constant, or
  • by a rule of thumb (e.g. the distance to the 7th nearest neighbor; Zelnik & Perona, '05).
  [Figure: a dataset and its affinity matrix.]

  4. Motivation: choice of σ
  COIL-20: rotations of objects every 5°; the input are 128 × 128 greyscale images.
  [Figure: affinity matrices for three choices of σ — rule of thumb (distance to the 7th nn, Zelnik & Perona, '05), constant σ, and entropic affinities.]

  5. Motivation: choice of σ
  COIL-20: rotations of objects every 5°; the input are 128 × 128 greyscale images.
  [Figure: dimensionality reduction with the Elastic Embedding algorithm for the same three choices of σ.]

  6. Search for a good σ
  A good σ should be:
  • set separately for every data point,
  • set taking into account the whole distribution of distances.
  [Figure: two datasets with the same n points but different distance distributions around x_1 and x_2.]

  7. Entropic affinities
  In the entropic affinities, the bandwidth σ is set individually for each point so that the point has a distribution over its neighbors with a fixed perplexity K (Hinton & Roweis, 2002).
  • Consider a distribution of the neighbors x_1, …, x_N ∈ R^D for a point x ∈ R^D:
      p_n(x; σ) = K(‖(x − x_n)/σ‖²) / Σ_{k=1}^N K(‖(x − x_k)/σ‖²),
    i.e. the posterior distribution of a kernel density estimate.
  • The entropy of the distribution is defined as H(x, σ) = −Σ_{n=1}^N p_n(x, σ) log p_n(x, σ).
  • Consider the bandwidth (or precision) β = 1/(2σ²) given the perplexity K: H(x, β) = log K.
  • A perplexity of K in a distribution p over N neighbors provides the same surprise as if we were to choose among K equiprobable neighbors.
  • We define the entropic affinities as the probabilities p = (p_1, …, p_N) for x with respect to β. Those affinities define a random-walk matrix.
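As a concrete sketch of these definitions (assuming the Gaussian kernel, so that p_n ∝ exp(−β‖x − x_n‖²); the function names are illustrative, not the authors' code):

```python
import numpy as np

def neighbor_distribution(d2, beta):
    """Gaussian affinities p_n(x; beta) of one point over its N neighbors.

    d2   : array of squared distances from x to each neighbor x_n
    beta : precision (beta = 1/(2 sigma^2))
    """
    logits = -beta * d2
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()

def perplexity(d2, beta):
    """exp(H(x, beta)): the effective number of neighbors at this precision."""
    p = neighbor_distribution(d2, beta)
    p = p[p > 0]                       # avoid 0 * log(0) = nan
    return np.exp(-np.sum(p * np.log(p)))
```

With all distances equal the distribution is uniform and the perplexity equals N; as β grows the distribution concentrates on the closest neighbors and the perplexity falls toward 1.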

  8. Entropic affinities: example
  [Figure: entropic affinities computed on an example dataset.]

  9. Entropic affinities: properties
  H(x_n, β_n) ≡ −Σ_{m=1}^N p_m(x_n, β_n) log p_m(x_n, β_n) = log K.
  • This is a root-finding problem, or a 1D inversion problem: β_n = H_{x_n}^{−1}(log K).
  • It should be solved for each x_n ∈ {x_1, …, x_N}.
  • We can prove that:
  ‣ the root-finding problem is well defined for a Gaussian kernel: it has a unique root β_n > 0 for any K ∈ (0, N);
  ‣ the inverse is a uniquely defined, continuously differentiable function for all x_n ∈ R^D and K ∈ (0, N).
  [Figure: H(x, β) as a function of log β, crossing log K (here K = 30) at the unique root β*.]
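The uniqueness claim can be checked numerically: for a Gaussian kernel, H(x, β) decreases strictly from log N (as β → 0) toward 0 (as β → ∞), so it crosses log K exactly once for any K ∈ (0, N). A small sketch (the `entropy` helper and the random data are illustrative assumptions):

```python
import numpy as np

def entropy(d2, beta):
    """Entropy H(x, beta) of the Gaussian neighbor distribution p_n ~ exp(-beta*d2)."""
    logits = -beta * d2
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p = p[p > 0]                        # drop underflowed terms (0 * log 0 -> 0)
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
d2 = rng.uniform(0.1, 4.0, size=50)     # squared distances for one point, N = 50
betas = np.logspace(-3, 3, 200)
H = np.array([entropy(d2, b) for b in betas])

assert np.all(np.diff(H) < 0)           # H is strictly decreasing in beta
assert abs(H[0] - np.log(50)) < 1e-2    # beta -> 0 recovers the uniform entropy log N
```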

  10. Entropic affinities: bounds
  The bounds β_n ∈ [β_L, β_U] hold for every x_n ∈ R^D and K ∈ (0, N):
      β_L = max( (N/(N−1)) · log(N/K)/Δ²_N , sqrt( log(N/K)/(d⁴_N − d⁴_1) ) ),
      β_U = (1/Δ²_2) · log( (N−1) p₁/(1−p₁) ),
  where Δ²_2 = d²_2 − d²_1, Δ²_N = d²_N − d²_1 (with d_1 ≤ … ≤ d_N the sorted distances from x_n), and p₁ is the unique solution of the equation
      2(1 − p₁) log( N/(2(1−p₁)) ) = log( min(√(2N), K) ).
  The bounds are computed in O(1) for each point.
  [Figure: H(x, β) vs. log β, with the root β* bracketed by β_L and β_U.]

  11. Entropic affinities: computation
  For every x_n ∈ {x_1, …, x_N}, solve H(x_n, β_n) = log K:
  1. Initialize β_n as close to the root as possible.
  2. Compute the root β_n.

  12. 1. Computation of β_n: the root-finding

  Method    | Type             | Convergence order | Derivatives | O(N) evaluations
  Bisection | derivative-free  | linear            | 0           | 1
  Brent     | derivative-free  | linear            | 0           | 1
  Ridder    | derivative-free  | quadratic         | 0           | 2
  Newton    | derivative-based | quadratic         | 1           | 2
  Halley    | derivative-based | cubic             | 2           | 3
  Euler     | derivative-based | cubic             | 2           | 3

  • Each evaluation of the objective function and of each derivative costs O(N).
  • The derivative-free methods above generally converge globally: they work by iteratively shrinking an interval that brackets the root.
  • The derivative-based methods have a higher convergence order, but may diverge.

  13. Robustified root-finding algorithm
  • We embed the derivative-based algorithm into a bisection loop for global convergence.
  • We run the following algorithm for each x_n ∈ {x_1, …, x_N}:

  Input: initial β, perplexity K, distances d²_1, …, d²_N, bounds B.
  while true do
      for k = 1 to maxit do
          compute β using a derivative-based method
          if tolerance achieved, return β
          if β ∉ B, exit the for loop
          update B
      end for
      compute β using bisection iterations
      update B
  end while

  [Figure: H(β) vs. log β with the target log K and the bracketing bounds.]
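A minimal sketch of this loop in Python (assuming the Gaussian kernel, and using the identity dH/dβ = −β · Var_p[d²] for the Newton step; an illustration, not the authors' implementation):

```python
import numpy as np

def entropy_and_grad(d2, beta):
    """H(x, beta) and dH/dbeta for the Gaussian neighbor distribution."""
    logits = -beta * d2
    p = np.exp(logits - logits.max())
    p /= p.sum()
    nz = p > 0
    H = -np.sum(p[nz] * np.log(p[nz]))
    Ed2 = p @ d2
    dH = -beta * (p @ (d2 - Ed2) ** 2)   # dH/dbeta = -beta * Var_p[d^2]
    return H, dH

def solve_beta(d2, K, beta0=1.0, lo=1e-10, hi=1e10, tol=1e-10, maxit=20):
    """Find beta with H(x, beta) = log K: Newton steps inside a bisection bracket."""
    target = np.log(K)
    beta = beta0
    while True:
        for _ in range(maxit):
            H, dH = entropy_and_grad(d2, beta)
            if abs(H - target) < tol:
                return beta
            # H decreases in beta, so the root lies right of beta iff H > target.
            if H > target:
                lo = beta
            else:
                hi = beta
            step = beta - (H - target) / dH  # Newton step
            if not (lo < step < hi):
                break                        # step left the bracket
            beta = step
        beta = 0.5 * (lo + hi)               # fall back to one bisection step
```

For example, `solve_beta(np.arange(1.0, 21.0), K=5.0)` returns a precision whose distribution has about 5 effective neighbors.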

  14. Robustified root-finding algorithm [animation frame: the Newton step falls outside the brackets, so a bisection step is taken].
  15. Robustified root-finding algorithm [animation frame: a normal Newton step].
  16. Robustified root-finding algorithm [animation frame: another normal Newton step].
  17. Robustified root-finding algorithm [animation frame: the iterates converge to the root].

  18. 2. Initialization of β_n
  1. Simple initializations:
  • the midpoint of the bounds,
  • the distance to the k-th nearest neighbor.
  These are typically far from the root and require more iterations.
  2. Each new β_n is initialized from the solution of its predecessor:
  • sequential order;
  • tree order.
  We need to find orders that are correlated with the behavior of β.
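The two simple initializations can be sketched as follows (assuming the convention β = 1/(2σ²); both helper names are illustrative):

```python
import numpy as np

def init_from_knn(d2, k):
    """Initialize from the distance to the k-th nearest neighbor:
    take sigma ~ d_k, i.e. beta0 = 1/(2 d_k^2)."""
    dk2 = np.sort(d2)[k - 1]   # squared distance to the k-th nearest neighbor
    return 1.0 / (2.0 * dk2)

def init_midpoint(beta_L, beta_U):
    """Midpoint of the bounds [beta_L, beta_U]."""
    return 0.5 * (beta_L + beta_U)
```

Warm-starting instead from the β_n of a nearby, already-solved point (sequential or tree order) typically lands much closer to the root than either of these.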

  19. 2. Initialization of β_n [animation frame repeating the previous slide, with the initialization orders illustrated].
