Dimensionality Reduction: Theoretical Analysis of Practical Measures

Nova Fandina, Hebrew University
Joint work with Yair Bartal, Hebrew University, and Ofer Neiman, Ben Gurion University
Outline

• Measuring the quality of embedding
  - in theory: worst-case distortion analysis
  - in practice: average-case distortion measures
  - in between: theoretical analysis of practical measures (for dimensionality reduction methods)
• Our results
  - upper bounds
  - lower bounds
  - approximating the optimal embedding
Measuring the Quality of Embedding: in theory

Basic question in metric embedding theory (informally): given metric spaces X and Y, embed X into Y with small error on the distances. How well can it be done?

In theory, "well" traditionally means minimizing the distortion of the worst pair.

Definition. For an embedding f: X → Y and a pair of points v ≠ w ∈ X:
• expand_f(v, w) = d_Y(f(v), f(w)) / d_X(v, w),  contract_f(v, w) = d_X(v, w) / d_Y(f(v), f(w))
• distortion(f) = max_{v≠w∈X} {expand_f(v, w)} · max_{v≠w∈X} {contract_f(v, w)}
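To make the definition concrete, here is a small hypothetical sketch (not from the talk) that computes the worst-case distortion of an explicit embedding, given as a map from points to images:

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def worst_case_distortion(X, f):
    """distortion(f) = (max expansion over pairs) * (max contraction over pairs)."""
    pairs = [(v, w) for i, v in enumerate(X) for w in X[i + 1:]]
    expansions = [dist(f[v], f[w]) / dist(v, w) for v, w in pairs]
    contractions = [dist(v, w) / dist(f[v], f[w]) for v, w in pairs]
    return max(expansions) * max(contractions)

# Toy example: project four points of the plane onto the x-axis.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
f = {v: (v[0],) for v in X}  # a hypothetical embedding into 1 dimension
print(worst_case_distortion(X, f))
```

Here no pair expands (a projection is 1-Lipschitz), and the most-contracted pair is (2, 1)–(3, 3), whose distance √5 shrinks to 1, so the distortion equals √5.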
Measuring the Quality of Embedding: in practice

The demand for a worst-case guarantee is too strong: the quality of a method in practical applications is its average performance over all pairs.

• Yuval Shavitt and Tomer Tankel. Big-bang simulation for embedding network distances in Euclidean space. IEEE/ACM Trans. Netw., 12(6), 2004.
• P. Sharma, Z. Xu, S. Banerjee, and S. Lee. Estimating network proximity and latency. Computer Communication Review, 36(3), 2006.
• P. J. F. Groenen, R. Mathar, and W. J. Heiser. The majorization approach to multidimensional scaling for Minkowski distances. Journal of Classification, 12(1), 1995.
• J. F. Vera, W. J. Heiser, and A. Murillo. Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. Journal of Classification, 24(2), 2007.
• A. Censi and D. Scaramuzza. Calibration by correlation using metric embedding from nonmetric similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10), 2013.
• C. Lumezanu and N. Spring. Measurement manipulation and space selection in network coordinates. The 28th International Conference on Distributed Computing Systems, 2008.
• S. Chatterjee, B. Neff, and P. Kumar. Instant approximate 1-center on road networks via embeddings. In Proceedings of the 19th International Conference on Advances in Geographic Information Systems, GIS '11, 2011.
• S. Lee, Z. Zhang, S. Sahu, and D. Saha. On suitability of Euclidean embedding for host-based network coordinate systems. IEEE/ACM Trans. Netw., 18(1), 2010.
• L. Chennuru Vankadara and U. von Luxburg. Measures of distortion for machine learning. Advances in Neural Information Processing Systems, Curran Associates, Inc., 2018.

Just a small sample from a googolplex of such studies.
Measuring the Quality of Embedding: in practice

Moments of Distortion and Relative Error

For f: X → Y and a pair v ≠ w ∈ X:
  dist_f(v, w) = max{expand_f(v, w), contract_f(v, w)}

ℓ_q-distortion [ABN06]. For f: X → Y, a distribution Π over pairs of X, and q ≥ 1:
  ℓ_q-dist(f) = (E_Π[dist_f(v, w)^q])^{1/q}

Relative Error Measure [in many papers]:
  REM_q(f) = (E_Π[|dist_f(v, w) − 1|^q])^{1/q}
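A minimal sketch of these two measures under the uniform distribution over pairs (a hypothetical illustration; `X` is a list of tuples, `f` a point-to-image map):

```python
import math
from itertools import combinations

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pair_dist_f(v, w, f):
    """dist_f(v, w) = max{expansion, contraction} for one pair."""
    e = euclid(f[v], f[w]) / euclid(v, w)
    return max(e, 1.0 / e)

def lq_dist(X, f, q):
    """l_q-distortion under the uniform distribution over pairs."""
    vals = [pair_dist_f(v, w, f) ** q for v, w in combinations(X, 2)]
    return (sum(vals) / len(vals)) ** (1.0 / q)

def rem_q(X, f, q):
    """Relative error measure REM_q under the uniform distribution."""
    vals = [abs(pair_dist_f(v, w, f) - 1.0) ** q for v, w in combinations(X, 2)]
    return (sum(vals) / len(vals)) ** (1.0 / q)

# Demo: scaling every point by 2 makes dist_f = 2 on every pair,
# so the l_q-distortion is 2 and REM_q is 1, for any q.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
f = {v: (2 * v[0], 2 * v[1]) for v in X}
print(lq_dist(X, f, 2), rem_q(X, f, 2))  # -> 2.0 1.0
```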
Measuring the Quality of Embedding: in practice

Additive Distortion Measures [MDS: optimally embed a given finite X into a k-dim Euclidean space, for a given k]

For a pair v ≠ w ∈ X, let d_vw = d_X(v, w) and d̂_vw = d_Y(f(v), f(w)). Then:

  Stress_q(f) = (E_Π[|d_vw − d̂_vw|^q] / E_Π[d_vw^q])^{1/q}
  Stress*_q(f) = (E_Π[|d_vw − d̂_vw|^q] / E_Π[d̂_vw^q])^{1/q}
  REM_q(f) = (E_Π[(|d̂_vw − d_vw| / min{d_vw, d̂_vw})^q])^{1/q}
  Energy_q(f) = (E_Π[(|d̂_vw − d_vw| / d_vw)^q])^{1/q}

(Note that |dist_f(v, w) − 1| = |d̂_vw − d_vw| / min{d_vw, d̂_vw}, so this REM_q agrees with the definition on the previous slide.)
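A sketch of two of the additive measures under the uniform distribution over pairs (my own illustration, with the per-pair denominators as reconstructed above):

```python
import math
from itertools import combinations

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stress_q(X, f, q):
    """Stress_q: q-norm of the additive errors |d - d_hat|, normalized by the
    q-norm of the original distances (the 1/#pairs factors cancel)."""
    num = sum(abs(euclid(v, w) - euclid(f[v], f[w])) ** q
              for v, w in combinations(X, 2))
    den = sum(euclid(v, w) ** q for v, w in combinations(X, 2))
    return (num / den) ** (1.0 / q)

def energy_q(X, f, q):
    """Energy_q: per-pair relative error |d_hat - d| / d, q-averaged."""
    vals = [(abs(euclid(f[v], f[w]) - euclid(v, w)) / euclid(v, w)) ** q
            for v, w in combinations(X, 2)]
    return (sum(vals) / len(vals)) ** (1.0 / q)

# An isometry has zero Stress and Energy; halving all distances gives
# Stress_q = Energy_q = 0.5 for every q.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
half = {v: (0.5 * v[0], 0.5 * v[1]) for v in X}
print(stress_q(X, half, 2), energy_q(X, half, 2))  # -> 0.5 0.5
```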
Measuring the Quality of Embedding: in practice

σ-distortion: ML-motivated, in [VvL18]

  σ-dist_q(f) = (E_Π[|expand_f(v, w) / ℓ_1-expand_f(U) − 1|^q])^{1/q}

where ℓ_1-expand_f(U) = E_U[expand_f(v, w)] and ℓ_1-contract_f(U) = E_U[contract_f(v, w)] are the averages under the uniform distribution U over pairs.

• Many heuristics exist for optimizing these measures
• Almost nothing is known in terms of rigorous analysis

"Necessary properties for ML applications":
• translation invariance
• scale invariance
• monotonicity
• robustness (outliers, noise)
• incorporation of probability
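A minimal sketch of the σ-distortion as reconstructed above (the exact normalization is my reading of the slide): the q-th moment of the deviation of the mean-normalized expansion from 1. The normalization makes it scale-invariant, one of the listed properties:

```python
import math
from itertools import combinations

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sigma_dist_q(X, f, q):
    """sigma-distortion under the uniform distribution over pairs:
    q-th moment of |expand_f / mean(expand_f) - 1|."""
    exps = [euclid(f[v], f[w]) / euclid(v, w) for v, w in combinations(X, 2)]
    mean = sum(exps) / len(exps)
    vals = [abs(e / mean - 1.0) ** q for e in exps]
    return (sum(vals) / len(vals)) ** (1.0 / q)

# Scale invariance: a similarity has all expansions equal, so sigma-dist is
# (up to floating-point noise) zero, even though every distance is tripled.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
f = {v: (3.0 * v[0], 3.0 * v[1]) for v in X}
print(sigma_dist_q(X, f, 2))
```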
Measuring the Quality of Embedding: in between

Bridging the gap between theory and practice: outlook

α(k, q)-Dimension Reduction. Given a dimension bound k ≥ 1 and q ≥ 1, what is the least α(k, q) such that every finite subset of Euclidean space embeds into k dimensions with Measure_q ≤ α(k, q)?

General Metrics (MDS). For a given finite X and q ≥ 1, compute the optimal embedding of X into k-dim Euclidean space, minimizing a particular Measure_q.

General Metrics: Approximating the Optimal Embedding. For a given finite X and q ≥ 1, compute an embedding of X into k-dim Euclidean space that approximates the best possible embedding, for a given Measure_q.

[CD06] Optimizing is NP-hard for Stress_q with q = 1.
Our Results: upper bounds — previous results

α(k, q)-Dimension Reduction. Given a dimension bound k ≥ 1 and q ≥ 1, what is the least α(k, q) such that every finite subset of Euclidean space embeds into k dimensions with Measure_q ≤ α(k, q)?

Previous results: worst-case distortion

JL[84]: Every n-point X ⊂ ℓ_2 embeds into ℓ_2^k with distortion 2^{O((log n)/k)}.
Mat[90]: There is X ⊂ ℓ_2^{k+1} such that any f: X → ℓ_2^k must have distortion n^{Ω(1/k)}.

distortion(f) ≤ (ℓ_∞-dist(f))², and for every f: X → Y there is a rescaled g: X → Y with (ℓ_∞-dist(g))² = distortion(g) = distortion(f).

What about the Measure_q guarantees for q < ∞?
Our Results: upper bounds — JL transform: the IM implementation

The answer to the α(k, q)-Dimension Reduction question is, essentially, the JL transform.

[JL84] Projection onto a random subspace of dim k = O(log n / ε²); with constant probability, distortion = 1 + ε [tight, LN16].

[IM98] T is a k × n matrix with independent entries sampled from N(0, 1). The embedding is f: X → ℓ_2^k, f(x) = (1/√k) · T(x).

• The JL transform of IM98 provides constant upper bounds for all Measure_q; the bounds are almost optimal
• Other popular implementations of JL do not work for ℓ_q-dist and for REM_q
• PCA may produce an embedding of extremely poor quality for all the measures (this does not happen to the JL)
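The IM98 construction described above can be sketched in a few lines of plain Python (a hypothetical illustration, not the authors' code): draw a k × n matrix T of independent N(0, 1) entries and map each point x to (1/√k)·Tx:

```python
import math
import random

def gaussian_jl(points, k, seed=0):
    """Map each n-dim point x to (1/sqrt(k)) * T x, where T is a k x n
    matrix with independent standard Gaussian entries."""
    rng = random.Random(seed)
    n = len(points[0])
    T = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    s = 1.0 / math.sqrt(k)
    return [tuple(s * sum(T[i][j] * x[j] for j in range(n)) for i in range(k))
            for x in points]

# Embed the standard basis of R^100 into 32 dimensions; because the map is
# linear, pairwise distances are those of the difference vectors under T/sqrt(k).
n, k = 100, 32
basis = [tuple(1.0 if i == j else 0.0 for j in range(n)) for i in range(n)]
Y = gaussian_jl(basis, k)
print(len(Y), len(Y[0]))  # -> 100 32
```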
Our Results: upper bounds — other implementations of JL

[Achl03] The entries of T are uniform independent from {±1}
[DKS10, KN10, AL10] Sparse/Fast: particular distributions over {±1, 0}

Constant bounds cannot be achieved using the above implementations.

Observation. If a linear transformation T: R^n → R^k samples its entries from a discrete set of values of size s < n^{1/k}, then applying it to the standard basis of R^n results in ℓ_q-dist, REM_q = ∞.

Indeed, T(e_1), …, T(e_n) are the columns of T, and the number of different columns is at most s^k < n, so two basis vectors are mapped to the same point, making the contraction of that pair infinite.

Reminder: ℓ_q-dist(f) = (E_Π[dist_f(v, w)^q])^{1/q} and REM_q(f) = (E_Π[|dist_f(v, w) − 1|^q])^{1/q}, where dist_f(v, w) = max{expand_f(v, w), contract_f(v, w)}.
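The observation can be seen directly with a toy sign matrix (my own illustration): a k × n matrix with entries in {±1} has at most 2^k distinct columns, and T(e_j) is exactly column j, so for n > 2^k two standard basis vectors must collide by pigeonhole:

```python
import random

def sign_matrix(k, n, seed=1):
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(k)]

k, n = 3, 20              # n > 2**k, so a collision is forced by pigeonhole
T = sign_matrix(k, n)
cols = [tuple(T[i][j] for i in range(k)) for j in range(n)]  # cols[j] = T(e_j)

seen, collision = {}, None
for j, c in enumerate(cols):
    if c in seen:
        collision = (seen[c], j)   # e_i and e_j map to the same point, so the
        break                      # contraction (hence l_q-dist, REM_q) is infinite
    seen[c] = j

print(collision)
```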
Our Results: upper bounds — limitation of PCA

PCA/c-MDS. For a given finite X ⊂ ℓ_2^n and a given integer k ≥ 1, computes the best rank-k approximation to X: a projection P onto the k-dim subspace spanned by the largest eigenvectors of the covariance matrix, with the smallest Σ_{v∈X} ||v − P(v)||².

Equivalently, P: X → ℓ_2^k is optimal for Σ_{v≠w∈X} (d_vw² − d̂_vw²) over all projections.

Often misused: "minimizing Stress_2 over all embeddings into k dim". Actually, PCA does not minimize any of the mentioned measures.
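A small sketch illustrating the last point (my own toy example, not from the talk): a PCA projection never expands distances, so rescaling its output by the optimal factor c = Σ d·d̂ / Σ d̂² ≥ 1 achieves a Stress_2 no larger than PCA's, and strictly smaller whenever the data is not already flat — so the projection itself cannot be the Stress_2 minimizer:

```python
import math
from itertools import combinations

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def top_pc_2d(points):
    """Leading eigenvector of the 2x2 covariance matrix, in closed form."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    lam = 0.5 * (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

def stress2(X, Y):
    num = den = 0.0
    for i, j in combinations(range(len(X)), 2):
        d, dh = euclid(X[i], X[j]), euclid(Y[i], Y[j])
        num += (d - dh) ** 2
        den += d ** 2
    return math.sqrt(num / den)

X = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.4)]
u = top_pc_2d(X)
pca = [(p[0] * u[0] + p[1] * u[1],) for p in X]       # 1-dim PCA embedding

pairs = list(combinations(range(len(X)), 2))
d = [euclid(X[i], X[j]) for i, j in pairs]
dh = [abs(pca[i][0] - pca[j][0]) for i, j in pairs]
c = sum(a * b for a, b in zip(d, dh)) / sum(b * b for b in dh)  # optimal rescale
rescaled = [(c * y[0],) for y in pca]

print(stress2(X, pca), stress2(X, rescaled), c)
```

Since the points are not collinear, some pair is strictly contracted, forcing c > 1 and a strictly smaller Stress_2 for the rescaled embedding.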