Near Optimal Dimensionality Reductions that Preserve Volumes
RANDOM/APPROX 2008
Avner Magen, Anastasios Zouzias
University of Toronto
August 2008
A. Zouzias (University of Toronto) Dimensionality Reductions for Volumes RANDOM/APPROX 2008 1 / 23
Dimension Reduction
$P \subseteq \mathbb{R}^t$: a set of $n$ points.
Goal: find $f : P \to \mathbb{R}^d$ ($d \ll n, t$) such that some property is preserved.
Measure of quality (distance): $f$ has distortion $1+\varepsilon$ if for all $p, q \in P$: $\|p - q\| \le \|f(p) - f(q)\| \le (1+\varepsilon)\|p - q\|$.
Measure of quality (volume) (this talk): $f$ has volume distortion $1+\varepsilon$ if for all $S \subset P$ with $2 \le |S| \le k$: $1 \le \left(\frac{\mathrm{vol}(f(S))}{\mathrm{vol}(S)}\right)^{1/(|S|-1)} \le 1+\varepsilon$. Here $\mathrm{vol}(S)$ is the $(|S|-1)$-dimensional volume of the simplex spanned by $S$.
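The Python sketch below makes both definitions concrete. It is our own brute-force checker, not code from the talk, and the function names are hypothetical. The volume of the simplex spanned by a point set is computed from the Gram determinant of its edge vectors, $\sqrt{\det(VV^\top)}/m!$. The checker then reports the smallest and largest normalized ratio $(\mathrm{vol}(f(S))/\mathrm{vol}(S))^{1/(|S|-1)}$ over all subsets with $2 \le |S| \le k$. For $|S| = 2$ the ratio is just $\|f(p)-f(q)\|/\|p-q\|$, so distance distortion is the special case $k = 2$. The checker enumerates every subset, so it is only meant for tiny inputs.

```python
import math
from itertools import combinations

import numpy as np


def simplex_volume(points: np.ndarray) -> float:
    """Volume of the simplex spanned by the rows of `points` (m+1 points -> m-dim volume)."""
    edges = points[1:] - points[0]               # m x t matrix of edge vectors
    m = edges.shape[0]
    gram = edges @ edges.T                       # m x m Gram matrix
    det = max(float(np.linalg.det(gram)), 0.0)   # clip tiny negative round-off
    return math.sqrt(det) / math.factorial(m)


def volume_distortion(P: np.ndarray, fP: np.ndarray, k: int) -> tuple[float, float]:
    """Min and max of (vol(f(S)) / vol(S))^(1/(|S|-1)) over all S with 2 <= |S| <= k.

    Row i of fP is assumed to be the image f(P[i]).
    """
    lo, hi = math.inf, 0.0
    for size in range(2, k + 1):
        for idx in combinations(range(P.shape[0]), size):
            rows = list(idx)
            ratio = simplex_volume(fP[rows]) / simplex_volume(P[rows])
            ratio **= 1.0 / (size - 1)           # normalize by the simplex dimension
            lo, hi = min(lo, ratio), max(hi, ratio)
    return lo, hi
```

For example, `volume_distortion(P, 2 * P, 4)` on the four vertices $\{0, e_1, e_2, e_3\}$ returns $(2, 2)$. Uniform scaling by 2 multiplies every $m$-dimensional volume by $2^m$, and the exponent $1/(|S|-1)$ undoes that power.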
Johnson-Lindenstrauss Lemma
Lemma (Distances). Let $P$ be an $n$-point subset of Euclidean space. There exists a mapping $f : P \to \mathbb{R}^d$, $d = O(\varepsilon^{-2}\log n)$, such that for all $x, y \in P$: $(1-\varepsilon)\|x - y\| \le \|f(x) - f(y)\| \le (1+\varepsilon)\|x - y\|$.
Almost tight: there is a lower bound of $\Omega(\varepsilon^{-2}\log n / \log(1/\varepsilon))$ [Alon, 2003].
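Here is a minimal numerical sketch of the lemma, assuming NumPy and a Gaussian projection scaled by $1/\sqrt{d}$. The constant `C` in $d = \lceil C\,\varepsilon^{-2}\ln n \rceil$ is an assumption, because the lemma only fixes $d$ up to a constant. The helper names are hypothetical.

```python
import math

import numpy as np


def jl_project(P: np.ndarray, eps: float, C: float = 8.0, seed: int = 0) -> np.ndarray:
    """Map the n rows of P (n x t) into R^d with d = ceil(C * ln(n) / eps^2).

    C is an assumed constant; the lemma only gives d = O(eps^-2 log n).
    """
    n, t = P.shape
    d = math.ceil(C * math.log(n) / eps**2)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((t, d))          # X_ij ~ N(0, 1)
    return P @ X / math.sqrt(d)              # 1/sqrt(d) keeps E||f(x)||^2 = ||x||^2


def pairwise_distances(Q: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix of the rows of Q, via ||a-b||^2 = |a|^2 + |b|^2 - 2<a,b>."""
    sq = np.einsum("ij,ij->i", Q, Q)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Q @ Q.T)
    return np.sqrt(np.maximum(d2, 0.0))      # clip round-off below zero


def distance_distortion(P: np.ndarray, fP: np.ndarray) -> tuple[float, float]:
    """Min and max of ||f(p) - f(q)|| / ||p - q|| over all pairs p != q."""
    i, j = np.triu_indices(P.shape[0], k=1)
    ratios = pairwise_distances(fP)[i, j] / pairwise_distances(P)[i, j]
    return float(ratios.min()), float(ratios.max())
```

Take 50 random points in $\mathbb{R}^{500}$ with $\varepsilon = 0.5$. `jl_project` picks $d = 126$, and `distance_distortion` reports the empirical range of distance ratios. With high probability that range lies inside $[1-\varepsilon, 1+\varepsilon]$.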
Random Projections
There are many ways to generate such a (linear) mapping, encoded by a matrix $X \in \mathbb{R}^{n \times d}$:
- $X_{i,j} \sim N(0,1)$ (this talk)
- $X_{i,j} = \pm 1$, each with probability $1/2$
- Sparse Gaussian matrix (with preprocessing)
- Entries with subgaussian tails
- ECC and Rademacher random variables
- Lean Walsh Transform (next talk, [Liberty et al., 2008])
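The sketch below generates the first two constructions on the list. It assumes NumPy, and `projection_matrix` is a hypothetical helper. The sparse, subgaussian, ECC-based and Lean Walsh variants are not covered. Both kinds are scaled by $1/\sqrt{d}$, which makes the squared norm of a projected vector correct in expectation.

```python
import numpy as np


def projection_matrix(rows: int, d: int, kind: str = "gaussian", seed=None) -> np.ndarray:
    """Random rows x d projection matrix, scaled by 1/sqrt(d).

    kind="gaussian": X_ij ~ N(0, 1)
    kind="sign":     X_ij = +1 or -1, each with probability 1/2
    """
    rng = np.random.default_rng(seed)
    if kind == "gaussian":
        X = rng.standard_normal((rows, d))
    elif kind == "sign":
        X = rng.choice([-1.0, 1.0], size=(rows, d))
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Before scaling, each coordinate of x @ X has variance ||x||^2,
    # so dividing by sqrt(d) gives E||x @ X||^2 = ||x||^2.
    return X / np.sqrt(d)
```

The sign matrix uses only $\pm 1/\sqrt{d}$ entries, which are cheap to store and multiply. Both choices have independent, zero-mean, unit-variance entries before scaling, and that is what the norm-preservation argument needs.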
Related Work: Extensions of JL to Other Cases
- [Magen, 2002] Preserve volumes of subsets of size up to $k$, and affine distances, using $O(k\varepsilon^{-2}\log n)$ dimensions.
- [Sarlos, 2006] Preserve distances of all points lying in any $k$-dimensional linear subspace by projecting into $O(k\varepsilon^{-2}\log(k/\varepsilon))$ dimensions.
- [Wakin and Baraniuk, 2006; Agarwal et al., 2007; Clarkson, 2008] Moving points, curves, surfaces, manifolds, etc.
Our Contribution
- Improve Magen's result for volumes by showing that $O(\max\{k/\varepsilon, \varepsilon^{-2}\log n\})$ dimensions are enough.
- The JL Lemma preserves more than distances: it preserves volumes of subsets of size up to $\log n/\varepsilon$.
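The $\log n/\varepsilon$ threshold follows from comparing the two terms of the maximum. The same comparison, ignoring constant factors, shows that the new bound saves a factor of $\min\{k, \log n/\varepsilon\}$ over the $O(k\varepsilon^{-2}\log n)$ bound of [Magen, 2002]:

```latex
\[
k \le \frac{\log n}{\varepsilon}
\;\Longrightarrow\;
\frac{k}{\varepsilon} \le \varepsilon^{-2}\log n
\;\Longrightarrow\;
\max\Big\{\frac{k}{\varepsilon},\ \varepsilon^{-2}\log n\Big\} = \varepsilon^{-2}\log n
\]
\[
\frac{k\,\varepsilon^{-2}\log n}{\max\{k/\varepsilon,\ \varepsilon^{-2}\log n\}}
= \min\Big\{\frac{\log n}{\varepsilon},\ k\Big\}
\]
```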
Our Result
Theorem. Let $P \subset \mathbb{R}^n$. There exists $f : P \to \mathbb{R}^d$, $d = O(\max\{k/\varepsilon, \varepsilon^{-2}\log n\})$, such that for every subset $S$ of $P$ with $1 < |S| \le k$: $1 - \varepsilon \le \left(\frac{\mathrm{vol}(f(S))}{\mathrm{vol}(S)}\right)^{1/(|S|-1)} \le 1 + \varepsilon$.
Overview of proof:
- There are roughly $O(n^s)$ subsets of size $s$.
- It suffices to prove that the failure probability for a single subset of size $s$ is roughly $e^{-\Omega(s d \varepsilon^2)}$. (Core of the talk.)
- A union bound over all subsets then shows that a volume-preserving mapping exists.
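The union-bound step can be spelled out as follows. We assume the per-subset failure probability is at most $e^{-c\,s d\varepsilon^2}$ for some constant $c > 0$; the slide states this only up to $\Omega(\cdot)$, so $c$ is a placeholder. We also bound the number of size-$s$ subsets by $\binom{n}{s} \le n^s$. This accounts only for the $\varepsilon^{-2}\log n$ term of the dimension bound.

```latex
\[
\Pr\big[\exists S \subseteq P,\ 2 \le |S| \le k:\ S \text{ is not preserved}\big]
\le \sum_{s=2}^{k} \binom{n}{s} e^{-c\,s d \varepsilon^{2}}
\le \sum_{s=2}^{k} e^{\,s(\ln n - c\,d\varepsilon^{2})}
\]
\[
d \ge \frac{2\ln n}{c\,\varepsilon^{2}}
\;\Longrightarrow\;
\sum_{s=2}^{k} e^{\,s(\ln n - c\,d\varepsilon^{2})}
\le \sum_{s=2}^{k} n^{-s}
< \frac{1}{n(n-1)} \le \frac{1}{2} \qquad (n \ge 2)
\]
```

With probability at least $1/2$ every subset is preserved, so a good $f$ exists.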
Proof
Two steps:
1. Prove it for the regular $n$-simplex.
2. Reduce the general case to the above case.
The $n$-simplex
Assume the input points are $\{e_1, \dots, e_n\}$.
Form a matrix whose $i$-th row is $e_i$, i.e., the $n \times n$ identity matrix $I_n$.
Random projection (without normalization): multiply $I_n$ ($n \times n$) by $X$ ($n \times d$), where $X_{ij} \sim N(0,1)$.
The $n$-simplex
The projected points are random Gaussian vectors in $\mathbb{R}^d$: the image of $e_i$ is the $i$-th row of $X$.
Pick any subset $S$ of rows with $|S| = s$, and let $X_S$ be the corresponding $s \times d$ submatrix of $X$, with $(X_S)_{ij} \sim N(0,1)$.
The simplex spanned by the origin and the projected points of $S$ has volume $\mathrm{vol}(S \cup \{0\}) = \sqrt{\det(X_S X_S^\top)}/s!$.
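Here is a short NumPy sketch of this formula and of the quantity the proof has to control. It assumes the scaled map $f(x) = xX/\sqrt{d}$, whereas the slide works without normalization. The function names and the chosen subset are illustrative. Before projection the points are standard basis vectors, so $\mathrm{vol}(S \cup \{0\}) = 1/s!$. The normalized volume ratio for $S \cup \{0\}$, which has $s+1$ points and hence exponent $1/s$, therefore reduces to $\det(X_S X_S^\top/d)^{1/(2s)}$.

```python
import math

import numpy as np


def volume_with_origin(XS: np.ndarray) -> float:
    """vol(S union {0}) = sqrt(det(X_S X_S^T)) / s! for the s points stored as rows of XS."""
    s = XS.shape[0]
    sign, logdet = np.linalg.slogdet(XS @ XS.T)  # log-det avoids overflow for larger s
    if sign <= 0:                                # rows linearly dependent: zero volume
        return 0.0
    return math.exp(0.5 * logdet) / math.factorial(s)


def simplex_volume_ratio(XS: np.ndarray) -> float:
    """Normalized ratio (vol(f(S) union {0}) / vol(S union {0}))^(1/s) for S = {e_i : i in S}.

    Assumes f(x) = x X / sqrt(d); since vol(S union {0}) = 1/s! for basis vectors,
    the ratio reduces to det(X_S X_S^T / d)^(1/(2s)).
    """
    s, d = XS.shape
    sign, logdet = np.linalg.slogdet(XS @ XS.T / d)
    if sign <= 0:
        return 0.0
    return math.exp(logdet / (2 * s))


rng = np.random.default_rng(0)
n, d = 100, 2000
X = rng.standard_normal((n, d))        # the projected e_i is row i of X
S = [3, 17, 42, 58, 99]                # an arbitrary subset, s = 5
print(volume_with_origin(X[S]), simplex_volume_ratio(X[S]))
```

When $d$ is much larger than $s$, the printed ratio should be close to 1. `slogdet` is used instead of `det` so the determinant does not overflow as $s$ grows.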