Chebyshev bound

Let $y := \|\vec{x}\|_2^2$, so that $E(y) = k$ and $\operatorname{Var}(y) = 2k$. Then
$$
\begin{aligned}
P(|y - k| \geq k\epsilon) &= P\big((y - E(y))^2 \geq k^2\epsilon^2\big) \\
&\leq \frac{E\big[(y - E(y))^2\big]}{k^2\epsilon^2} \qquad \text{by Markov's inequality} \\
&= \frac{\operatorname{Var}(y)}{k^2\epsilon^2} \\
&= \frac{2}{k\epsilon^2}
\end{aligned}
$$
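As a quick sanity check (not part of the original slides), a short simulation can compare the empirical probability that $\|\vec{x}\|_2^2$ deviates from $k$ by more than $k\epsilon$ with the Chebyshev bound $2/(k\epsilon^2)$; the values of $k$, $\epsilon$, and the number of trials below are arbitrary illustrative choices.

```python
import numpy as np

# Hedged sanity check: empirical tail probability of ||x||_2^2 for an iid
# standard Gaussian vector versus the Chebyshev bound 2 / (k * eps^2).
# k, eps, and n_trials are arbitrary illustrative choices.
rng = np.random.default_rng(0)
k, eps, n_trials = 100, 0.5, 100_000

x = rng.standard_normal((n_trials, k))
y = np.sum(x**2, axis=1)                    # squared l2 norms, chi^2 with k dof

empirical = np.mean(np.abs(y - k) >= k * eps)
chebyshev = 2.0 / (k * eps**2)
print(f"empirical P(|y - k| >= k*eps) = {empirical:.4f}")
print(f"Chebyshev bound               = {chebyshev:.4f}")
```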
Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$
$$
P\Big( k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon) \Big) \geq 1 - 2\exp\Big( \frac{-k\epsilon^2}{8} \Big)
$$
Proof

Let $y := \|\vec{x}\|_2^2$. The result is implied by
$$
P\big(y > k(1+\epsilon)\big) \leq \exp\Big(\frac{-k\epsilon^2}{8}\Big), \qquad
P\big(y < k(1-\epsilon)\big) \leq \exp\Big(\frac{-k\epsilon^2}{8}\Big)
$$
Proof

Fix $t > 0$:
$$
\begin{aligned}
P(y > a) &= P\big(\exp(ty) > \exp(at)\big) \\
&\leq \exp(-at)\, E\big(\exp(ty)\big) \qquad \text{by Markov's inequality} \\
&= \exp(-at)\, E\Big(\exp\Big(t \sum_{i=1}^{k} x_i^2\Big)\Big) \\
&= \exp(-at) \prod_{i=1}^{k} E\big(\exp\big(t x_i^2\big)\big) \qquad \text{by independence of } x_1, \ldots, x_k
\end{aligned}
$$
Proof

Lemma (by direct integration)
$$
E\big(\exp\big(t x^2\big)\big) = \frac{1}{\sqrt{1 - 2t}}
$$
Equivalent to controlling higher-order moments, since
$$
E\big(\exp\big(t x^2\big)\big) = \sum_{i=0}^{\infty} \frac{E\big((t x^2)^i\big)}{i!} = \sum_{i=0}^{\infty} \frac{t^i\, E\big(x^{2i}\big)}{i!}
$$
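A minimal numerical check of the lemma (not part of the original argument): for a few values of $t < 1/2$, a Monte Carlo estimate of $E(\exp(t x^2))$ should be close to $1/\sqrt{1-2t}$. The sample size and the values of $t$ are arbitrary.

```python
import numpy as np

# Hedged numerical check of the lemma E[exp(t * x^2)] = 1 / sqrt(1 - 2t)
# for a standard Gaussian x and t < 1/2. Sample size and t values are
# arbitrary illustrative choices.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

for t in [0.05, 0.1, 0.15, 0.2]:
    mc = np.mean(np.exp(t * x**2))          # Monte Carlo estimate of the MGF
    exact = 1.0 / np.sqrt(1.0 - 2.0 * t)
    print(f"t = {t:.2f}: Monte Carlo = {mc:.4f}, 1/sqrt(1-2t) = {exact:.4f}")
```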
Proof

Fix $t > 0$:
$$
P(y > a) \leq \exp(-at) \prod_{i=1}^{k} E\big(\exp\big(t x_i^2\big)\big) = \frac{\exp(-at)}{(1 - 2t)^{k/2}}
$$
Proof

Setting $a := k(1+\epsilon)$ and $t := \frac{1}{2} - \frac{1}{2(1+\epsilon)}$, we conclude
$$
P\big(y > k(1+\epsilon)\big) \leq (1+\epsilon)^{k/2} \exp\Big(\frac{-k\epsilon}{2}\Big) \leq \exp\Big(\frac{-k\epsilon^2}{8}\Big)
$$
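To see the bound in action, a hedged simulation can compare the empirical probability that $\|\vec{x}\|_2^2$ falls outside $(k(1-\epsilon),\, k(1+\epsilon))$ with $2\exp(-k\epsilon^2/8)$; $k$, $\epsilon$, and the number of trials are arbitrary choices.

```python
import numpy as np

# Hedged empirical check of the Chernoff tail bound: the probability that
# ||x||_2^2 falls outside (k(1-eps), k(1+eps)) should be at most
# 2 * exp(-k * eps^2 / 8). Parameters are arbitrary illustrative choices.
rng = np.random.default_rng(0)
k, n_trials = 50, 200_000

y = np.sum(rng.standard_normal((n_trials, k))**2, axis=1)

for eps in [0.3, 0.5, 0.8]:
    empirical = np.mean((y <= k * (1 - eps)) | (y >= k * (1 + eps)))
    bound = 2.0 * np.exp(-k * eps**2 / 8.0)
    print(f"eps = {eps}: empirical = {empirical:.4f}, bound = {bound:.4f}")
```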
Projection onto a fixed subspace

$$
0.007 = \frac{\|\mathcal{P}_{S_1}\vec{z}\|_2^2}{\|\vec{z}\|_2^2} < \frac{\|\mathcal{P}_{S_2}\vec{z}\|_2^2}{\|\vec{z}\|_2^2} = 0.043,
\qquad \frac{0.043}{0.007} = 6.14 \approx \frac{\dim(S_2)}{\dim(S_1)} \quad \text{(not a coincidence)}
$$
Projection onto a fixed subspace

Let $S$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $\vec{z} \in \mathbb{R}^n$ a vector of iid standard Gaussian noise.

$\|\mathcal{P}_S \vec{z}\|_2^2$ is a $\chi^2$ random variable with $k$ degrees of freedom: it has the same distribution as
$$
y := \sum_{i=1}^{k} x_i^2
$$
where $x_1, \ldots, x_k$ are iid standard Gaussians.
Proof

Let $UU^T$ be a projection matrix for $S$, where the columns of $U \in \mathbb{R}^{n \times k}$ are orthonormal:
$$
\begin{aligned}
\|\mathcal{P}_S \vec{z}\|_2^2 &= \big\|UU^T \vec{z}\big\|_2^2 \\
&= \vec{z}^T U U^T U U^T \vec{z} \\
&= \vec{z}^T U U^T \vec{z} \\
&= \vec{w}^T \vec{w} \\
&= \sum_{i=1}^{k} \vec{w}[i]^2
\end{aligned}
$$
where $\vec{w} := U^T \vec{z}$ is Gaussian with mean zero and covariance matrix
$$
\Sigma_{\vec{w}} = U^T \Sigma_{\vec{z}}\, U = U^T U = I
$$
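A small sketch (assuming an orthonormal basis built via a QR factorization of a random matrix, which is not part of the proof) can check the result empirically: the mean and variance of $\|\mathcal{P}_S\vec{z}\|_2^2$ should match those of a $\chi^2$ random variable with $k$ degrees of freedom ($k$ and $2k$, respectively). All dimensions are illustrative.

```python
import numpy as np

# Hedged sketch: for a fixed k-dimensional subspace S of R^n (spanned by the
# orthonormal columns of U), ||P_S z||_2^2 should behave like a chi-squared
# random variable with k degrees of freedom. n, k, n_trials are arbitrary.
rng = np.random.default_rng(0)
n, k, n_trials = 200, 10, 50_000

U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of S

z = rng.standard_normal((n_trials, n))
proj_sq_norms = np.sum((z @ U)**2, axis=1)         # ||P_S z||_2^2 = ||U^T z||_2^2

print(f"mean     : empirical = {proj_sq_norms.mean():.2f}, chi2_k = {k}")
print(f"variance : empirical = {proj_sq_norms.var():.2f}, chi2_k = {2 * k}")
```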
Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$
$$
P\Big( k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon) \Big) \geq 1 - 2\exp\Big( \frac{-k\epsilon^2}{8} \Big)
$$
Projection onto a fixed subspace

Let $S$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and $\vec{z} \in \mathbb{R}^n$ a vector of iid standard Gaussian noise. For any $\epsilon > 0$
$$
P\Big( k(1-\epsilon) < \|\mathcal{P}_S \vec{z}\|_2^2 < k(1+\epsilon) \Big) \geq 1 - 2\exp\Big( \frac{-k\epsilon^2}{8} \Big)
$$
Gaussian random variables
Gaussian random vectors
Randomized projections
SVD of a random matrix
Randomized SVD
Dimensionality reduction

◮ PCA preserves the most energy ($\ell_2$ norm)
◮ Problem 1: Computationally expensive
◮ Problem 2: Depends on all of the data
◮ (Possible) Solution: Just project randomly!
◮ For a data set $\vec{x}_1, \vec{x}_2, \ldots \in \mathbb{R}^n$ compute $A\vec{x}_1, A\vec{x}_2, \ldots \in \mathbb{R}^k$, where $A \in \mathbb{R}^{k \times n}$ ($k < n$) has iid standard Gaussian entries (a small sketch follows below)
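The sketch below illustrates the projection step; the dimensions and the synthetic data are placeholders, not an example from the slides.

```python
import numpy as np

# Hedged sketch of randomized dimensionality reduction: project a data set of
# vectors in R^n down to R^k with an iid standard Gaussian matrix A.
# The data here are synthetic and purely illustrative.
rng = np.random.default_rng(0)
n, k, n_points = 1000, 20, 500

X = rng.standard_normal((n_points, n))      # rows are the data vectors x_i
A = rng.standard_normal((k, n))             # random projection matrix

X_proj = X @ A.T                            # rows are A x_i in R^k
print(X.shape, "->", X_proj.shape)          # (500, 1000) -> (500, 20)
```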
Fixed vector

Let $A$ be an $a \times b$ matrix with iid standard Gaussian entries. If $\vec{v} \in \mathbb{R}^b$ is a deterministic vector with unit $\ell_2$ norm, then $A\vec{v}$ is an $a$-dimensional iid standard Gaussian vector.

Proof: $(A\vec{v})[i]$, $1 \leq i \leq a$, is Gaussian with mean zero and variance
$$
\operatorname{Var}\big(A_{i,:}\, \vec{v}\big) = \vec{v}^T \Sigma_{A_{i,:}} \vec{v} = \vec{v}^T I \vec{v} = \|\vec{v}\|_2^2 = 1
$$
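As a hedged check of this fact, one can fix a unit-norm $\vec{v}$, draw many independent matrices $A$, and verify that the entries of $A\vec{v}$ have mean close to 0 and variance close to 1. Dimensions and the number of trials are arbitrary.

```python
import numpy as np

# Hedged check: for a fixed unit-norm vector v, the entries of A v should be
# standard Gaussian when A has iid standard Gaussian entries.
# Dimensions and the number of trials are arbitrary illustrative choices.
rng = np.random.default_rng(0)
a, b, n_trials = 5, 40, 20_000

v = rng.standard_normal(b)
v /= np.linalg.norm(v)                           # deterministic unit-norm vector

Av = rng.standard_normal((n_trials, a, b)) @ v   # one draw of A per trial

print("entrywise means    :", np.round(Av.mean(axis=0), 3))   # ~ 0
print("entrywise variances:", np.round(Av.var(axis=0), 3))    # ~ 1
```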
Non-asymptotic Chernoff tail bound

Let $\vec{x}$ be an iid standard Gaussian random vector of dimension $k$. For any $\epsilon > 0$
$$
P\Big( k(1-\epsilon) < \|\vec{x}\|_2^2 < k(1+\epsilon) \Big) \geq 1 - 2\exp\Big( \frac{-k\epsilon^2}{8} \Big)
$$
Fixed vector

Let $A$ be an $a \times b$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0,1)$
$$
a(1-\epsilon) \leq \|A\vec{v}\|_2^2 \leq a(1+\epsilon)
$$
with probability at least $1 - 2\exp\big(-a\epsilon^2/8\big)$.
Johnson-Lindenstrauss lemma

Let $A$ be a $k \times n$ matrix with iid standard Gaussian entries, and let $\vec{x}_1, \ldots, \vec{x}_p \in \mathbb{R}^n$ be any fixed set of $p$ deterministic vectors. For any pair $\vec{x}_i, \vec{x}_j$ and any $\epsilon \in (0,1)$
$$
(1-\epsilon)\, \|\vec{x}_i - \vec{x}_j\|_2^2 \leq \Big\|\tfrac{1}{\sqrt{k}} A\vec{x}_i - \tfrac{1}{\sqrt{k}} A\vec{x}_j\Big\|_2^2 \leq (1+\epsilon)\, \|\vec{x}_i - \vec{x}_j\|_2^2
$$
with probability at least $1/p$, as long as
$$
k \geq \frac{16 \log(p)}{\epsilon^2}
$$
Proof

Aim: control the action of $A$ on the normalized differences
$$
\vec{v}_{ij} := \frac{\vec{x}_i - \vec{x}_j}{\|\vec{x}_i - \vec{x}_j\|_2}
$$
Our event of interest is the intersection of the events
$$
\mathcal{E}_{ij} = \Big\{ k(1-\epsilon) < \|A\vec{v}_{ij}\|_2^2 < k(1+\epsilon) \Big\}, \qquad 1 \leq i < j \leq p
$$
Fixed vector

Let $A$ be an $a \times b$ matrix with iid standard Gaussian entries. For any $\vec{v} \in \mathbb{R}^b$ with unit norm and any $\epsilon \in (0,1)$
$$
a(1-\epsilon) \leq \|A\vec{v}\|_2^2 \leq a(1+\epsilon)
$$
with probability at least $1 - 2\exp\big(-a\epsilon^2/8\big)$.

This implies
$$
P\big(\mathcal{E}_{ij}^c\big) \leq \frac{2}{p^2} \qquad \text{if } k \geq \frac{16\log(p)}{\epsilon^2}
$$
Union bound

For any events $S_1, S_2, \ldots, S_n$ in a probability space
$$
P(\cup_i S_i) \leq \sum_{i=1}^{n} P(S_i)
$$
Proof

The number of events $\mathcal{E}_{ij}$ equals $\binom{p}{2} = p(p-1)/2$. By the union bound
$$
\begin{aligned}
P\Big( \bigcap_{i,j} \mathcal{E}_{ij} \Big) &= 1 - P\Big( \bigcup_{i,j} \mathcal{E}_{ij}^c \Big) \\
&\geq 1 - \sum_{i,j} P\big(\mathcal{E}_{ij}^c\big) \\
&\geq 1 - \frac{p(p-1)}{2} \cdot \frac{2}{p^2} \\
&= \frac{1}{p}
\end{aligned}
$$
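A hedged empirical check of the lemma (synthetic data; $n$, $p$, and $\epsilon$ are arbitrary): project $p$ random points with $\tfrac{1}{\sqrt{k}}A$ for $k \geq 16\log(p)/\epsilon^2$ and verify that the pairwise squared distances are preserved up to a factor $1 \pm \epsilon$.

```python
import numpy as np

# Hedged empirical check of the Johnson-Lindenstrauss lemma: pairwise squared
# distances should be preserved up to a factor (1 +/- eps) after projecting
# with (1/sqrt(k)) * A. Parameters and data are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, p, eps = 2000, 50, 0.5
k = int(np.ceil(16 * np.log(p) / eps**2))   # k >= 16 log(p) / eps^2

X = rng.standard_normal((p, n))             # p fixed vectors in R^n
A = rng.standard_normal((k, n))
Y = (X @ A.T) / np.sqrt(k)                  # rows are (1/sqrt(k)) A x_i

ratios = []
for i in range(p):
    for j in range(i + 1, p):
        d_orig = np.sum((X[i] - X[j])**2)
        d_proj = np.sum((Y[i] - Y[j])**2)
        ratios.append(d_proj / d_orig)

ratios = np.array(ratios)
print(f"k = {k}")
print(f"min ratio = {ratios.min():.3f}, max ratio = {ratios.max():.3f}")
print("all within (1 - eps, 1 + eps)?",
      np.all((ratios > 1 - eps) & (ratios < 1 + eps)))
```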
Dimensionality reduction for visualization

Motivation: Visualize high-dimensional features projected onto 2D or 3D

Example: Seeds from three different varieties of wheat: Kama, Rosa and Canadian

Features:
◮ Area
◮ Perimeter
◮ Compactness
◮ Length of kernel
◮ Width of kernel
◮ Asymmetry coefficient
◮ Length of kernel groove
Dimensionality reduction for visualization

[2D scatter plots of the wheat-seed data: randomized projection vs. PCA]
Nearest neighbors in random subspace

Nearest neighbors classification (Algorithm 4.2 in Lecture Notes 1) computes $n$ distances in $\mathbb{R}^m$ for each new example

Cost: $O(nmp)$ for $p$ examples

Idea: Use a $k \times m$ iid standard Gaussian matrix to project onto a $k$-dimensional space beforehand (a minimal sketch follows below)

Cost:
◮ $kmn$ operations to project the training set
◮ $kmp$ operations to project the test set
◮ $knp$ to perform nearest-neighbor classification

Much faster!
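The following is a minimal sketch of the projected nearest-neighbor pipeline; X_train, y_train, and X_test are synthetic placeholders standing in for the actual feature vectors and labels, and k is an arbitrary choice.

```python
import numpy as np

# Hedged sketch of nearest-neighbor classification after a random projection.
# X_train, y_train, X_test are synthetic placeholders; in practice they would
# hold the actual feature vectors (rows in R^m) and class labels.
rng = np.random.default_rng(0)
m, k = 4096, 50
n_train, n_test = 360, 40

X_train = rng.standard_normal((n_train, m))
y_train = rng.integers(0, 40, size=n_train)
X_test = rng.standard_normal((n_test, m))

A = rng.standard_normal((k, m))             # k x m iid standard Gaussian matrix
P_train = X_train @ A.T                     # project training set (~ k*m*n ops)
P_test = X_test @ A.T                       # project test set (~ k*m*p ops)

# Nearest-neighbor classification in R^k using l2 distances (~ k*n*p ops)
dists = np.linalg.norm(P_test[:, None, :] - P_train[None, :, :], axis=2)
y_pred = y_train[np.argmin(dists, axis=1)]
print(y_pred[:10])
```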
Face recognition

Training set: 360 images of size 64 × 64 from 40 different subjects (9 each)

Test set: 1 new image from each subject

We model each image as a vector in $\mathbb{R}^{4096}$ ($m = 4096$)

To classify we:
1. Project onto a random $k$-dimensional subspace
2. Apply nearest-neighbor classification using the $\ell_2$-norm distance in $\mathbb{R}^k$
Performance

[Plot: number of errors (average, maximum, and minimum) versus the projection dimension $k$, for $k$ up to 200]
Nearest neighbor in $\mathbb{R}^{50}$

[Figure: test image, its projection, the closest projection, and the corresponding training image]
Gaussian random variables
Gaussian random vectors
Randomized projections
SVD of a random matrix
Randomized SVD
Singular values of an $n \times k$ matrix, $k = 100$

[Plot: $\sigma_i / \sqrt{n}$ for $i = 1, \ldots, 100$, for aspect ratios $n/k \in \{2, 5, 10, 20, 50, 100, 200\}$]
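A hedged sketch of the kind of experiment behind this plot (the exact setup is an assumption): compute the singular values of $n \times k$ iid standard Gaussian matrices for several aspect ratios $n/k$ and scale them by $1/\sqrt{n}$; as $n/k$ grows, the scaled singular values cluster around 1.

```python
import numpy as np

# Hedged sketch: singular values of an n x k iid standard Gaussian matrix,
# scaled by 1/sqrt(n), for several aspect ratios n/k. The ratios mirror the
# figure, but the code itself is illustrative.
rng = np.random.default_rng(0)
k = 100

for ratio in [2, 5, 10, 20, 50, 100, 200]:
    n = ratio * k
    M = rng.standard_normal((n, k))
    s = np.linalg.svd(M, compute_uv=False)  # singular values, largest first
    print(f"n/k = {ratio:3d}: sigma_1/sqrt(n) = {s[0]/np.sqrt(n):.3f}, "
          f"sigma_k/sqrt(n) = {s[-1]/np.sqrt(n):.3f}")
```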