Dimensionality Reduction
Anil Maheshwari (anil@scs.carleton.ca)
School of Computer Science, Carleton University, Canada
Outline: Metric Space, Isometric Embedding, Distortion, $L_\infty$ Norm, Corollaries, Euclidean Norm
Metric Space $\langle X, d \rangle$
Let $X$ be a set of $n$ points and let $d$ be a distance measure on pairs of elements of $X$. We say that $\langle X, d \rangle$ is a finite metric space if the function $d$ satisfies the metric properties, i.e.
(a) $\forall x \in X$, $d(x, x) = 0$,
(b) $\forall x, y \in X$ with $x \neq y$, $d(x, y) > 0$,
(c) $\forall x, y \in X$, $d(x, y) = d(y, x)$ (symmetry), and
(d) $\forall x, y, z \in X$, $d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality).
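The four properties can be checked mechanically on a small finite set. The sketch below is only an illustration: the function name `is_metric`, the tolerance `tol`, and the example points are assumptions, not part of the slides.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check properties (a)-(d) for a distance function d on a finite set."""
    for x in points:
        if abs(d(x, x)) > tol:                    # (a) d(x, x) = 0
            return False
    for x, y in product(points, repeat=2):
        if x != y and d(x, y) <= 0:               # (b) positivity
            return False
        if abs(d(x, y) - d(y, x)) > tol:          # (c) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:     # (d) triangle inequality
            return False
    return True

# Hypothetical example: three points on the real line.
line_points = [0.0, 1.0, 3.5]
abs_dist = lambda x, y: abs(x - y)        # a metric
squared = lambda x, y: (x - y) ** 2       # violates the triangle inequality
```

Here `abs_dist` passes all four checks, while `squared` fails (d) because $12.25 > 1 + 6.25$.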
Embeddings
Let $\langle X, d \rangle$ and $\langle X', d' \rangle$ be two metric spaces.
Embedding: A map $f : X \to X'$ is called an embedding.
Isometric embedding: The embedding $f$ is isometric (i.e., distance preserving) if for all $x, y \in X$, $d(x, y) = d'(f(x), f(y))$.
Motivating Problem
Input: $X$ = set of $n$ points in $k$-dimensional space, where $n \gg 2^k$.
Output: A pair of points that maximizes the $L_1$-distance.
Naive solution: $O\left(k \binom{n}{2}\right) = O(kn^2)$ time.
Better algorithm via the isometric embedding $L_1^k \to L_\infty^{2^k}$, running in $O(2^k n)$ time.
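One common way to realize such an embedding uses the identity $\|x - y\|_1 = \max_{\sigma \in \{-1,+1\}^k} \sigma \cdot (x - y)$: map each point to its $2^k$ signed coordinate sums, then take the largest per-coordinate spread. The sketch below assumes this construction (the slide does not spell it out), and the function names are illustrative. It recomputes each dot product from scratch, so it spends $O(2^k k n)$ time rather than the $O(2^k n)$ quoted above.

```python
from itertools import combinations, product

def l1_diameter_naive(points):
    """Largest L1 distance over all pairs, O(k n^2) time."""
    return max(sum(abs(a - b) for a, b in zip(p, q))
               for p, q in combinations(points, 2))

def l1_farthest_pair(points):
    """Farthest pair under L1 via the sign-vector embedding into L_inf."""
    k = len(points[0])
    best = (-1.0, None, None)
    for signs in product((-1, 1), repeat=k):
        # One coordinate of the embedding: the signed sum sigma . p.
        coords = [sum(s * c for s, c in zip(signs, p)) for p in points]
        hi = max(range(len(points)), key=coords.__getitem__)
        lo = min(range(len(points)), key=coords.__getitem__)
        gap = coords[hi] - coords[lo]
        if gap > best[0]:
            best = (gap, points[lo], points[hi])
    return best

# Hypothetical sample input in the plane (k = 2).
pts = [(0, 0), (1, 2), (3, -1), (-2, 1)]
```

On `pts`, both routines agree that the farthest pair is `(-2, 1)` and `(3, -1)`.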
Universality of the $L_\infty$-metric
Let $\langle X, d \rangle$ be any finite metric space, where $n = |X|$. Then $X$ can be isometrically embedded into an $L_\infty$-metric space of appropriate dimension.
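A standard construction that achieves this (often called the Fréchet embedding; the slide does not name it, so treat this as an assumption about the intended proof) maps each point to the vector of its distances to all $n$ points. The triangle inequality gives $|d(x, z) - d(y, z)| \le d(x, y)$ for every coordinate $z$, and the coordinate $z = y$ attains $d(x, y)$. A minimal sketch on the 4-cycle with shortest-path distances, with illustrative names:

```python
from itertools import combinations

def frechet_embedding(points, d):
    """Map each x to (d(x, z) for every z in X): n coordinates per point."""
    return {x: tuple(d(x, z) for z in points) for x in points}

def linf(u, v):
    """L_infinity distance between two coordinate tuples."""
    return max(abs(a - b) for a, b in zip(u, v))

# Hypothetical example: the 4-cycle 0-1-2-3-0 with shortest-path distances.
cycle = [0, 1, 2, 3]
cycle_dist = lambda i, j: min(abs(i - j), 4 - abs(i - j))
image = frechet_embedding(cycle, cycle_dist)
isometric = all(linf(image[x], image[y]) == cycle_dist(x, y)
                for x, y in combinations(cycle, 2))
```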
Euclidean Metric
Input: The metric spaces defined by $K_4$, $C_4$, and the star $Y$ with respect to unweighted shortest-path (SP) distances.
Question: Can one embed these 4 points in Euclidean space isometrically?
Distortion
Contraction: The maximum factor by which distances shrink; it equals $\max_{x,y \in X} \frac{d(x,y)}{d'(f(x),f(y))}$.
Expansion: The maximum factor by which distances are stretched; it equals $\max_{x,y \in X} \frac{d'(f(x),f(y))}{d(x,y)}$.
Distortion: The distortion of an embedding is the product of its expansion and contraction factors.
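As an illustration of these three quantities, the sketch below evaluates them for one hypothetical map: the 4-cycle with shortest-path distances sent to the corners of a unit square in the Euclidean plane. The function name and the example are assumptions, and the code assumes $f$ is injective so that no image distance is zero.

```python
import math
from itertools import combinations

def distortion(points, d, d_prime, f):
    """Return (expansion, contraction, distortion) of the map f."""
    ratios = [d_prime(f(x), f(y)) / d(x, y)
              for x, y in combinations(points, 2)]
    expansion = max(ratios)                      # largest stretch
    contraction = max(1.0 / r for r in ratios)   # largest shrink
    return expansion, contraction, expansion * contraction

# Hypothetical example: 4-cycle mapped to the corners of a unit square.
cycle = [0, 1, 2, 3]
cycle_dist = lambda i, j: min(abs(i - j), 4 - abs(i - j))
corners = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
expansion, contraction, total = distortion(cycle, cycle_dist, math.dist, corners.get)
```

Adjacent vertices keep distance 1, while opposite vertices shrink from 2 to $\sqrt{2}$, so the distortion of this particular map is $\sqrt{2}$.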
$\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$
Input: A metric space $\langle X, d \rangle$, where $X$ is a set of $n$ points and $d$ satisfies the metric properties.
Output: An embedding of $X$ in a $k = O(D n^{2/D} \log n)$ dimensional space such that distances get distorted (actually contracted) by a factor of at most $D$ under the $L_\infty$ norm.
$\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$ (contd.)
Let $x, y \in X$ and let $f(x), f(y)$ be their respective images in the $k$-dimensional space.
Property: Distances get contracted by a factor of at most $D \ge 1$. Formally, $\max_{x,y \in X} \frac{d(x,y)}{\|f(x) - f(y)\|_\infty} \le D$.
Example: If $D = \Theta(\log n)$, then $k = O(\log^2 n)$, i.e. $\langle X, d \rangle \xrightarrow{O(\log n)} L_\infty^{O(\log^2 n)}$.
Meaning: Any metric space $\langle X, d \rangle$ can be embedded in an $O(\log^2 n)$-dimensional space, and distances may distort (contract) by a factor of at most $O(\log n)$.
Applications?
Proof of $\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$
Constructive proof via a randomized algorithm.
Definition: Let $S \subseteq X$. For $x \in X$, define the distance of $x$ from $S$ as $d(x, S) = \min_{z \in S} d(x, z)$.
Claim: Let $x, y \in X$. For all $S \subseteq X$, $|d(x, S) - d(y, S)| \le d(x, y)$. (If $z \in S$ is nearest to $y$, then $d(x, S) \le d(x, z) \le d(x, y) + d(y, S)$; the other direction is symmetric.)
Proof Contd.
Definition (Mapping): Let $x \in X$ and let $S_1, S_2, \ldots, S_k \subseteq X$. The mapping $f$ maps $x$ to the point $f(x) = (d(x, S_1), d(x, S_2), \ldots, d(x, S_k))$.
Observation: Let $S_1, S_2, \ldots, S_k \subseteq X$. For $x, y \in X$, $\|f(x) - f(y)\|_\infty \le d(x, y)$.
Proof Contd. (2020-10-19)
Proof. The Observation $\|f(x) - f(y)\|_\infty \le d(x, y)$ follows from the Claim, since for each $1 \le i \le k$, $|d(x, S_i) - d(y, S_i)| \le d(x, y)$.
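The non-expansion property can be checked numerically for any choice of non-empty subsets. The sketch below is an illustration only: the helper names, the line metric, and the three subsets are hypothetical.

```python
from itertools import combinations

def dist_to_set(x, S, d):
    """d(x, S) = min over z in S of d(x, z); S must be non-empty."""
    return min(d(x, z) for z in S)

def embed(x, subsets, d):
    """f(x) = (d(x, S_1), ..., d(x, S_k))."""
    return tuple(dist_to_set(x, S, d) for S in subsets)

def linf(u, v):
    """L_infinity distance between two coordinate tuples."""
    return max(abs(a - b) for a, b in zip(u, v))

# Hypothetical example: six points on a line and three arbitrary subsets.
points = list(range(6))
line = lambda a, b: abs(a - b)
subsets = [{0}, {2, 5}, {1, 3, 4}]
non_expanding = all(
    linf(embed(x, subsets, line), embed(y, subsets, line)) <= line(x, y)
    for x, y in combinations(points, 2)
)
```

For instance, `embed(0, subsets, line)` is `(0, 2, 1)`: point 0 lies in the first subset, at distance 2 from the second, and at distance 1 from the third.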
Randomized Algorithm
Input: Metric space $X$ and parameter $D$.
Output: A set of $O(Dm)$ subsets of $X$.
1. $p \leftarrow \min\left(\frac{1}{2}, n^{-2/D}\right)$
2. $m \leftarrow O(n^{2/D} \log n)$
3. For $j \leftarrow 1$ to $\lceil D/2 \rceil$ and for $i \leftarrow 1$ to $m$: choose the set $S_{ij}$ by sampling each element of $X$ independently with probability $p^j$.
4. For each $x \in X$, return $f(x) = [d(x, S_{11}), \ldots, d(x, S_{m1}), d(x, S_{12}), \ldots, d(x, S_{m2}), \ldots, d(x, S_{1\lceil D/2 \rceil}), \ldots, d(x, S_{m\lceil D/2 \rceil})]$.
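The four steps can be sketched as follows. Several details are assumptions not fixed by the slide: the constant hidden in $m = O(n^{2/D} \log n)$ is exposed as a hypothetical parameter `c`, the natural logarithm is used, a seeded generator makes runs repeatable, and an empty sample (for which $d(x, S)$ is undefined) contributes coordinate 0 to every point.

```python
import math
import random

def sample_subsets(points, D, c=1.0, rng=None):
    """Steps 1-3: draw m subsets for each scale j = 1..ceil(D/2)."""
    rng = rng or random.Random(0)
    n = len(points)
    p = min(0.5, n ** (-2.0 / D))
    m = max(1, math.ceil(c * n ** (2.0 / D) * math.log(n)))
    subsets = []
    for j in range(1, math.ceil(D / 2) + 1):
        for _ in range(m):
            # Keep each element independently with probability p^j.
            subsets.append([x for x in points if rng.random() < p ** j])
    return subsets

def embed(x, subsets, d):
    """Step 4: one coordinate d(x, S) per sampled subset."""
    # Assumption: an empty subset yields 0, so it separates no pair of points.
    return tuple(min((d(x, z) for z in S), default=0.0) for S in subsets)

# Hypothetical example: 16 points on a line, D = 4.
points = list(range(16))
line = lambda a, b: abs(a - b)
subsets = sample_subsets(points, D=4)
image = {x: embed(x, subsets, line) for x in points}
```

With $n = 16$ and $D = 4$ this gives $p = 1/4$, $m = 12$, and $2 \cdot 12 = 24$ coordinates per point.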
An Observation
Let $x, y$ be two distinct points of $X$. Let $B(x, r)$ be the set of points of $X$ that are within distance $r$ of $x$ (think of $B(x, r)$ as a ball of radius $r$ centred at $x$). Similarly, let $B(y, r + \Delta)$ be the set of points of $X$ that are within distance $r + \Delta$ of $y$. Consider a subset $S \subset X$ such that $S \cap B(x, r) \neq \emptyset$ and $S \cap B(y, r + \Delta) = \emptyset$. Then $|d(x, S) - d(y, S)| \ge \Delta$, because $d(x, S) \le r$ while $d(y, S) > r + \Delta$.
Key Lemma
Lemma: Let $x, y$ be two distinct points of $X$. There exists an index $j \in \{1, \ldots, \lceil D/2 \rceil\}$ such that if $S_{ij}$ is as chosen in the Algorithm, then $\Pr\left[\|f(x) - f(y)\|_\infty \ge \frac{d(x,y)}{D}\right] \ge \frac{p}{12}$.
Ball Properties
Set $\Delta = \frac{d(x,y)}{D}$.
For $i = 0, \ldots, \lceil D/2 \rceil$, define balls of radius $i\Delta$ as follows.
$B_0 = \{x\}$.
$B_1$ is the ball of radius $\Delta$ centred at $y$.
$B_2$ is the ball of radius $2\Delta$ centred at $x$.
$B_3$ is the ball of radius $3\Delta$ centred at $y$, and so on.
Property I: No even ball overlaps with an odd ball.
Ball Properties (contd.)
For even (odd) $i$, let $|B_i|$ denote the number of points of $X$ that are within distance at most $i\Delta$ of $x$ (respectively, $y$).
Property II: There is an index $t \in \{0, \ldots, \lceil D/2 \rceil - 1\}$ such that $|B_t| \ge n^{2t/D}$ and $|B_{t+1}| \le n^{2(t+1)/D}$.
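Such an index always exists: $|B_0| = 1 = n^0$, and the last ball has at most $n \le n^{2\lceil D/2 \rceil / D}$ points, so the first $t$ with $|B_{t+1}| \le n^{2(t+1)/D}$ satisfies both conditions. The sketch below searches for it directly. The function names, the small tolerance `eps` guarding floating-point comparisons, and the example are assumptions.

```python
import math

def ball(center, radius, points, d):
    """Points of X within distance `radius` of `center`."""
    return [z for z in points if d(center, z) <= radius]

def find_index_t(x, y, points, d, D, eps=1e-9):
    """Return (t, ball sizes) with |B_t| >= n^(2t/D) and |B_{t+1}| <= n^(2(t+1)/D)."""
    n = len(points)
    delta = d(x, y) / D
    top = math.ceil(D / 2)
    sizes = []
    for i in range(top + 1):
        center = x if i % 2 == 0 else y      # even balls at x, odd balls at y
        sizes.append(len(ball(center, i * delta, points, d)))
    for t in range(top):
        lower = n ** (2 * t / D)
        upper = n ** (2 * (t + 1) / D)
        if sizes[t] + eps >= lower and sizes[t + 1] <= upper + eps:
            return t, sizes
    return None, sizes

# Hypothetical example: 16 points on a line, x = 0, y = 15, D = 4.
points = list(range(16))
line = lambda a, b: abs(a - b)
t, sizes = find_index_t(0, 15, points, line, D=4)
```

Here $\Delta = 3.75$, the ball sizes are 1, 4, and 8, and $t = 0$ qualifies because $|B_1| = 4 \le 16^{1/2}$.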
Ball Properties (contd.)
Let $t$ be the index such that $|B_t| \ge n^{2t/D}$ and $|B_{t+1}| \le n^{2(t+1)/D}$.
Consider the case $j = t + 1$ in the Algorithm.
Property III: The set $S_{ij}$ chosen by the algorithm has non-empty intersection with $B_t$ with probability at least $p/3$, and it avoids $B_{t+1}$ with probability at least $1/4$.
Define:
Event $E_1$: $S_{ij} \cap B_t \neq \emptyset$.
Event $E_2$: $S_{ij} \cap B_{t+1} = \emptyset$.
Event $E_1$
$\Pr(S_{ij} \cap B_t \neq \emptyset) \ge p/3$
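The slide states this bound without proof. One possible derivation, given here as a sketch under the assumption that $p = n^{-2/D} \le 1/2$ (so that $n^{2t/D} = p^{-t}$), with $j = t + 1$ and each element sampled independently with probability $p^{t+1}$:

```latex
\begin{align*}
\Pr[S_{ij} \cap B_t = \emptyset]
  &= \left(1 - p^{t+1}\right)^{|B_t|}
   \le \exp\!\left(-p^{t+1}\,|B_t|\right)
   \le \exp\!\left(-p^{t+1}\, n^{2t/D}\right)
   = e^{-p}, \\
\Pr[E_1]
  &\ge 1 - e^{-p}
   \ge p - \frac{p^2}{2}
   \ge \frac{3p}{4}
   \ge \frac{p}{3}.
\end{align*}
```

The first line uses $1 - q \le e^{-q}$ and $|B_t| \ge n^{2t/D}$; the second uses $e^{-p} \le 1 - p + p^2/2$ and $p \le 1/2$.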
Event $E_2$
$\Pr(S_{ij} \cap B_{t+1} = \emptyset) \ge 1/4$
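This bound is also stated without proof. One possible derivation, again a sketch assuming $p = n^{-2/D} \le 1/2$, writes $q = p^{t+1}$ for the sampling probability, so that $n^{2(t+1)/D} = 1/q$:

```latex
\begin{align*}
\Pr[E_2]
  = (1 - q)^{|B_{t+1}|}
  \ge (1 - q)^{n^{2(t+1)/D}}
  = (1 - q)^{1/q}
  \ge \frac{1}{4},
  \qquad q = p^{t+1} \le \frac{1}{2}.
\end{align*}
```

The last step holds because $(1 - q)^{1/q}$ is decreasing in $q$ and equals $1/4$ at $q = 1/2$. Since $B_t$ and $B_{t+1}$ are disjoint (Property I) and elements are sampled independently, $E_1$ and $E_2$ are independent, so both occur with probability at least $\frac{p}{3} \cdot \frac{1}{4} = \frac{p}{12}$, which is the probability appearing in the Key Lemma.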
Main Theorem
$\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$
Corollary 1: $\langle X, d \rangle \xrightarrow{\Theta(\log n)} L_\infty^{O(\log^2 n)}$
Set $D = \Theta(\log n)$ in the Theorem $\langle X, d \rangle \xrightarrow{D} L_\infty^{k = O(D n^{2/D} \log n)}$ and we obtain $\langle X, d \rangle \xrightarrow{\Theta(\log n)} L_\infty^{O(\log^2 n)}$.
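The substitution can be spelled out as follows, taking $D = \log_2 n$ as one concrete choice (any $D = c \log_2 n$ with constant $c > 0$ works the same way, with $n^{2/D} = 2^{2/c}$):

```latex
\begin{align*}
n^{2/D} &= \left(2^{\log_2 n}\right)^{2/\log_2 n} = 2^2 = 4, \\
k &= O\!\left(D\, n^{2/D} \log n\right)
   = O\!\left(\log n \cdot 4 \cdot \log n\right)
   = O\!\left(\log^2 n\right).
\end{align*}
```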