
Dimensionality Reduction: Theoretical Analysis of Practical Measures - PowerPoint PPT Presentation



  1. Dimensionality Reduction: Theoretical Analysis of Practical Measures
Nova Fandina, Hebrew University. Joint work with Yair Bartal (Hebrew University) and Ofer Neiman (Ben Gurion University).

  2. Outline
• Measuring the Quality of Embedding
  - in theory: worst case distortion analysis
  - in practice: average case distortion measures
  - in between: theoretical analysis of practical measures (for dimensionality reduction methods)
• Our Results
  - upper bounds
  - lower bounds
  - approximating the optimal embedding

  3. Measuring the Quality of Embedding: in theory
Basic question in metric embedding theory (informally): given metric spaces X and Y, embed X into Y with small error on the distances. How well can it be done? In theory, "well" traditionally means minimizing the distortion of the worst pair.
Definition. For an embedding f: X → Y and a pair of points u ≠ v ∈ X:
• expans_f(u, v) = d_Y(f(u), f(v)) / d_X(u, v),  contract_f(u, v) = d_X(u, v) / d_Y(f(u), f(v))
• distortion(f) = max_{u≠v∈X} {expans_f(u, v)} · max_{u≠v∈X} {contract_f(u, v)}
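As a sanity check of these definitions, here is a minimal sketch in Python. The point set and the embedding are invented for illustration (they are not from the talk): four points on a line, embedded back into the line by x ↦ log₂(x).

```python
import itertools
import math

# Toy example: X = {1, 2, 4, 8} on the real line, embedded into the line
# by f(x) = log2(x). Distances on both sides are absolute differences.
X = [1.0, 2.0, 4.0, 8.0]
f = {x: math.log2(x) for x in X}

def expans(u, v):
    return abs(f[u] - f[v]) / abs(u - v)

def contract(u, v):
    return abs(u - v) / abs(f[u] - f[v])

pairs = list(itertools.combinations(X, 2))
# distortion(f) = (max expansion over pairs) * (max contraction over pairs)
distortion = max(expans(u, v) for u, v in pairs) * \
             max(contract(u, v) for u, v in pairs)
print(distortion)  # pair (1,2) expands most (ratio 1), pair (4,8) contracts most (ratio 4)
```

Note how a single badly contracted pair, (4, 8), drives the worst-case distortion, even though most pairs are handled well; this is exactly the objection the next slides raise.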

  4. Measuring the Quality of Embedding: in practice
The demand for a worst case guarantee is too strong: the quality of a method in practical applications is its average performance over all pairs.
• Yuval Shavitt and Tomer Tankel. Big-bang simulation for embedding network distances in Euclidean space. IEEE/ACM Trans. Netw., 12(6), 2004.
• P. Sharma, Z. Xu, S. Banerjee, and S. Lee. Estimating network proximity and latency. Computer Communication Review, 36(3), 2006.
• P. J. F. Groenen, R. Mathar, and W. J. Heiser. The majorization approach to multidimensional scaling for Minkowski distances. Journal of Classification, 12(1), 1995.
• J. F. Vera, W. J. Heiser, and A. Murillo. Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. Journal of Classification, 24(2), 2007.
• A. Censi and D. Scaramuzza. Calibration by correlation using metric embedding from nonmetric similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10), 2013.
• C. Lumezanu and N. Spring. Measurement manipulation and space selection in network coordinates. The 28th International Conference on Distributed Computing Systems, 2008.
• S. Chatterjee, B. Neff, and P. Kumar. Instant approximate 1-center on road networks via embeddings. In Proceedings of the 19th International Conference on Advances in Geographic Information Systems, GIS '11, 2011.
• S. Lee, Z. Zhang, S. Sahu, and D. Saha. On suitability of Euclidean embedding for host-based network coordinate systems. IEEE/ACM Trans. Netw., 18(1), 2010.
• L. Chennuru Vankadara and U. von Luxburg. Measures of distortion for machine learning. Advances in Neural Information Processing Systems, Curran Associates, Inc., 2018.
Just a small sample from the googolplex of such studies.

  5. Measuring the Quality of Embedding: in practice
Moments of Distortion and Relative Error. For f: X → Y and a pair u ≠ v ∈ X:
dist_f(u, v) := max{expans_f(u, v), contract_f(u, v)}
ℓ_r-distortion [ABN06]. For f: X → Y, a distribution Π over the pairs of X, and r ≥ 1:
ℓ_r-dist^(Π)(f) = (E_Π[dist_f(u, v)^r])^{1/r}
Relative Error Measure [in many papers]:
REM_r^(Π)(f) = (E_Π[|dist_f(u, v) − 1|^r])^{1/r}
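The two moment measures are straightforward to compute. A minimal sketch, reusing the same invented toy embedding (X = {1, 2, 4, 8}, f(x) = log₂(x)) with Π the uniform distribution over pairs:

```python
import itertools
import math

# Toy data (not from the talk): X = {1, 2, 4, 8}, f(x) = log2(x),
# Pi = uniform distribution over the 6 pairs.
X = [1.0, 2.0, 4.0, 8.0]
f = {x: math.log2(x) for x in X}
pairs = list(itertools.combinations(X, 2))

def dist_f(u, v):
    e = abs(f[u] - f[v]) / abs(u - v)  # expansion of the pair
    return max(e, 1.0 / e)             # max{expans, contract}

def lr_dist(r):
    """l_r-dist(f) = (E_Pi[dist_f(u,v)^r])^(1/r)."""
    return (sum(dist_f(u, v) ** r for u, v in pairs) / len(pairs)) ** (1 / r)

def rem(r):
    """REM_r(f) = (E_Pi[|dist_f(u,v) - 1|^r])^(1/r)."""
    return (sum(abs(dist_f(u, v) - 1) ** r for u, v in pairs) / len(pairs)) ** (1 / r)

print(lr_dist(1), rem(1))
```

For this embedding ℓ_1-dist(f) ≈ 2.31 although the worst-case distortion is 4: the average-case measures discount the single bad pair, which is the point of using them.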

  6. Measuring the Quality of Embedding: in practice
Additive Distortion Measures [MDS: optimally embed a given finite X into a k-dim Euclidean space, for a given k]. For a pair u ≠ v ∈ X, let d_uv = d_X(u, v) and d̂_uv = d_Y(f(u), f(v)):
Stress_r(f) = (E_Π[|d_uv − d̂_uv|^r] / E_Π[d_uv^r])^{1/r}
Stress*_r(f) = (E_Π[|d_uv − d̂_uv|^r] / E_Π[d̂_uv^r])^{1/r}
REM_r(f) = (E_Π[(|d̂_uv − d_uv| / min{d_uv, d̂_uv})^r])^{1/r}
Energy_r(f) = (E_Π[(|d̂_uv − d_uv| / d_uv)^r])^{1/r}
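These four additive measures differ only in the normalization of the residual |d̂_uv − d_uv|. A sketch on the same invented toy data (treat the exact normalizations as reconstructed from this transcript):

```python
import itertools
import math

# Toy data (not from the talk): original distances d_uv on the line,
# embedded distances d^_uv under f(x) = log2(x); Pi uniform over pairs.
X = [1.0, 2.0, 4.0, 8.0]
f = {x: math.log2(x) for x in X}
pairs = list(itertools.combinations(X, 2))
d     = {p: abs(p[0] - p[1]) for p in pairs}        # d_uv
d_hat = {p: abs(f[p[0]] - f[p[1]]) for p in pairs}  # d^_uv

def stress(r):       # residuals normalized by the original distances
    num = sum(abs(d[p] - d_hat[p]) ** r for p in pairs)
    return (num / sum(d[p] ** r for p in pairs)) ** (1 / r)

def stress_star(r):  # residuals normalized by the embedded distances
    num = sum(abs(d[p] - d_hat[p]) ** r for p in pairs)
    return (num / sum(d_hat[p] ** r for p in pairs)) ** (1 / r)

def energy(r):       # per-pair relative residual, against d_uv
    return (sum((abs(d_hat[p] - d[p]) / d[p]) ** r
                for p in pairs) / len(pairs)) ** (1 / r)

def rem_r(r):        # per-pair relative residual, against min{d_uv, d^_uv}
    return (sum((abs(d_hat[p] - d[p]) / min(d[p], d_hat[p])) ** r
                for p in pairs) / len(pairs)) ** (1 / r)

print(stress(2), stress_star(2), energy(1), rem_r(1))
```

A consistency check worth noticing: |d̂ − d| / min{d, d̂} equals dist_f(u, v) − 1 for every pair, so this additive REM_r coincides with the relative error measure of the previous slide.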

  7. Measuring the Quality of Embedding: in practice
σ-distortion, ML motivated, in [VvL18]:
σ-dist_{r,s}^(Π)(f) = (E_Π[|expans_f(u, v) / ℓ_s-expans^(U)(f) − 1|^r])^{1/r}, where
• ℓ_s-expans^(U)(f) = (E_U[expans_f(u, v)^s])^{1/s}
• ℓ_s-contract^(U)(f) = (E_U[contract_f(u, v)^s])^{1/s}
"Necessary properties for ML applications":
• translation invariance
• scale invariance
• monotonicity
• robustness (outliers, noise)
• incorporation of probability
➢ Many heuristics for optimizing these measures.
➢ Almost nothing is known in terms of rigorous analysis.
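A sketch of σ-distortion on the same invented toy data, under the assumption that it takes the normalized-expansion form σ-dist_{r,s}(f) = (E[|expans / ℓ_s-expans − 1|^r])^{1/r}. The check at the end illustrates the scale-invariance property claimed for ML-suitable measures: rescaling the embedding rescales every expansion and the normalizer alike.

```python
import itertools
import math

# Toy data (not from the talk): expansions of f(x) = log2(x) on X = {1,2,4,8}.
X = [1.0, 2.0, 4.0, 8.0]
f = {x: math.log2(x) for x in X}
pairs = list(itertools.combinations(X, 2))
expansions = [abs(f[u] - f[v]) / abs(u - v) for u, v in pairs]

def l_s_expans(es, s):
    """l_s-expans(f) = (E_U[expans^s])^(1/s), U uniform over pairs."""
    return (sum(e ** s for e in es) / len(es)) ** (1 / s)

def sigma_dist(es, r, s):
    """Assumed form: r-th moment of |expans / l_s-expans - 1|."""
    phi = l_s_expans(es, s)
    return (sum(abs(e / phi - 1) ** r for e in es) / len(es)) ** (1 / r)

# Scale invariance: multiplying the embedding by 10 multiplies each
# expansion and the normalizer by 10, so sigma-dist is unchanged.
print(sigma_dist(expansions, 2, 1))
print(sigma_dist([10 * e for e in expansions], 2, 1))
```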

  8. Measuring the Quality of Embedding: in between
Bridging the gap between theory and practice: an outlook.
β(k, r)-Dimension Reduction. Given a dimension bound k ≥ 1 and r ≥ 1, what is the least β(k, r) such that every finite subset of Euclidean space embeds into k dimensions with Measure_r ≤ β(k, r)?
General Metrics (MDS). For a given finite X and k ≥ 1, compute the optimal embedding of X into k-dim Euclidean space, minimizing a particular Measure_r.
General Metrics: Approximating the Optimal Embedding. For a given finite X and k ≥ 1, compute an embedding of X into k-dim Euclidean space that approximates the best possible embedding, for a given Measure_r.
[CD06] Optimizing is NP-hard for Stress_r and k = 1.

  9. Our Results: upper bounds — previous results
Recall the β(k, r)-Dimension Reduction question: the least β(k, r) such that every finite subset of Euclidean space embeds into k dimensions with Measure_r ≤ β(k, r).
Previous results: worst case distortion.
▪ [JL84] Every n-point X ⊂ ℓ_2 embeds into ℓ_2^k with distortion O(n^{2/k} · √((log n)/k)).
▪ [Mat90] There is V ∈ ℓ_2^{k+1} such that any f: V → ℓ_2^k must have distortion n^{Ω(1/k)}.
▪ distortion(f) ≤ (ℓ_∞-dist(f))²; for every f: X → Y with a scalable target there is a rescaling g: X → Y with (ℓ_∞-dist(g))² = distortion(g).
What about the Measure_r guarantees for r < ∞?

  10. Our Results: upper bounds — JL transform: the IM implementation
The answer to the β(k, r)-Dimension Reduction question is, essentially, the JL transform.
[JL84] Projection onto a random subspace of dimension k = O((log n)/ε²); with constant probability, dist(f) = 1 + ε [tight, LN16].
[IM98] T is a k × d matrix with independent entries sampled from N(0, 1). The embedding f: X → ℓ_2^k is f(x) = (1/√k) · T(x).
• The JL transform of [IM98] provides constant upper bounds for all Measure_r, and the bounds are almost optimal.
• Other popular implementations of JL do not work for ℓ_r-dist and for REM_r.
• PCA may produce an embedding of extremely poor quality for all the measures (this does not happen with the JL transform).
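The IM-style Gaussian map is a one-liner per point. A stdlib-only sketch with invented dimensions and point counts (the empirical check at the end is mine, not a bound from the talk):

```python
import math
import random

random.seed(0)

def jl_map(points, k):
    """IM-style JL sketch: f(x) = (1/sqrt(k)) * T x, with T ~ N(0,1)^{k x d}."""
    d = len(points[0])
    T = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    s = 1.0 / math.sqrt(k)
    return [[s * sum(T[i][j] * x[j] for j in range(d)) for i in range(k)]
            for x in points]

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# n random points in d = 64 dimensions, projected down to k = 16.
n, d, k = 30, 64, 16
pts = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
low = jl_map(pts, k)

# Ratio of embedded to original distance, per pair.
ratios = [euclid(low[i], low[j]) / euclid(pts[i], pts[j])
          for i in range(n) for j in range(i + 1, n)]
worst_case = max(ratios) / min(ratios)  # max expansion * max contraction
```

With these parameters the per-pair ratios concentrate around 1, which is the qualitative content of the JL guarantee; the constant-moment bounds for Measure_r are a separate (and stronger) statement proved in the paper.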

  11. Our Results: upper bounds — other implementations of JL
[Achl03] The entries of T are uniform independent from {±1}.
[DKS10, KN10, AL10] Sparse/Fast JL: particular distributions over {±1, 0}.
Constant bounds cannot be achieved using the above implementations:
Observation. If a linear transformation T: R^d → R^k samples its entries from a discrete set of values of size s with s^k < d, then applying it to the standard basis of R^d results in ℓ_r-dist(f) = REM_r(f) = ∞ (with ℓ_r-dist and REM_r as defined above).
➢ T(e_1), …, T(e_d) are the columns of T, and the number of distinct columns is at most s^k < d; two equal columns mean two basis vectors map to the same point, so the contraction is unbounded.
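The pigeonhole argument can be verified exhaustively for a tiny instance of my choosing: with k = 2, d = 5 and entries in {+1, −1} (so s = 2 and s^k = 4 < 5), every such matrix has two identical columns.

```python
import itertools

# Pigeonhole behind the observation: a k x d matrix with entries from a set
# of size s has at most s^k distinct columns. With k = 2, d = 5, s = 2,
# s^k = 4 < 5, so some two columns coincide: the matrix maps two standard
# basis vectors e_i != e_j to the same point, and the contraction (hence
# l_r-dist and REM_r) is unbounded.
k, d = 2, 5
vals = (1, -1)

def has_duplicate_columns(T):
    cols = [tuple(row[j] for row in T) for j in range(d)]
    return len(set(cols)) < d

# Enumerate all 2^(k*d) = 1024 sign matrices and check every one of them.
all_collide = all(
    has_duplicate_columns([entries[i * d:(i + 1) * d] for i in range(k)])
    for entries in itertools.product(vals, repeat=k * d)
)
print(all_collide)
```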

  12. Our Results: upper bounds — limitation of PCA
PCA/cMDS. For a given finite X ∈ ℓ_2^d and a given integer k ≥ 1, computes the best rank-k approximation to X: the projection P onto the k-dim subspace spanned by the k largest eigenvectors of the covariance matrix, i.e. the projection with the smallest Σ_{u∈X} ‖u − P(u)‖².
▪ Equivalently, f: X → ℓ_2^k with optimal Σ_{u≠v∈X} (d_uv² − d̂_uv²) over all projections.
▪ Often misused as "minimizing Stress_2 over all embeddings into k dimensions".
▪ Actually, PCA does not minimize any of the mentioned measures.
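To make the distinction concrete, here is a tiny stdlib-only PCA sketch on invented 2-D data (power iteration on the 2×2 covariance matrix). It verifies the quantity PCA actually optimizes, the projection residual Σ‖u − P(u)‖², by comparing the principal direction against the two coordinate axes; it says nothing about Stress_2, which PCA does not minimize.

```python
import math

# Toy 2-D data (invented), centered, then projected onto the top principal
# direction found by power iteration.
pts = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]
n = len(pts)
mx = sum(x for x, _ in pts) / n
my = sum(y for _, y in pts) / n
c = [(x - mx, y - my) for x, y in pts]  # centered data

# Entries of the (unnormalized) 2x2 covariance matrix.
sxx = sum(x * x for x, _ in c)
sxy = sum(x * y for x, y in c)
syy = sum(y * y for _, y in c)

# Power iteration for the top eigenvector (vx, vy).
vx, vy = 1.0, 0.0
for _ in range(200):
    nx, ny = sxx * vx + sxy * vy, sxy * vx + syy * vy
    norm = math.hypot(nx, ny)
    vx, vy = nx / norm, ny / norm

def residual(ux, uy):
    """Sum of squared distances to the projection onto unit direction (ux, uy)."""
    return sum(x * x + y * y - (x * ux + y * uy) ** 2 for x, y in c)

res_pca = residual(vx, vy)
# PCA picks the direction with the smallest residual; both axes do worse here.
print(res_pca, residual(1.0, 0.0), residual(0.0, 1.0))
```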
