kernel properties convexity

Kernel Properties - Convexity. Leila Wehbe, October 1st 2013.



  1. Kernel Properties - Convexity. Leila Wehbe, October 1st 2013.

  2. Kernel Properties. The data is not linearly separable → use a feature vector of the data, $\Phi(x)$, in another space. We can even use infinite feature vectors: because of the kernel trick, you will not have to explicitly compute the feature vectors $\Phi(x)$ (you will kernelize an algorithm in HW2).

  3. Kernels. A kernel is a dot product in feature space: $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$.
     We can write the kernel in matrix form over the data sample: $K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j)$. This is called a Gram matrix.
     $K$ is positive semi-definite, i.e. $\alpha^\top K \alpha \ge 0$ for all $\alpha \in \mathbb{R}^m$ and all kernel matrices $K \in \mathbb{R}^{m \times m}$.
     Proof (from class):
     $\sum_{i,j}^{m} \alpha_i \alpha_j K_{ij} = \sum_{i,j}^{m} \alpha_i \alpha_j \langle \Phi(x_i), \Phi(x_j) \rangle = \left\langle \sum_{i}^{m} \alpha_i \Phi(x_i), \sum_{j}^{m} \alpha_j \Phi(x_j) \right\rangle = \left\| \sum_{i}^{m} \alpha_i \Phi(x_i) \right\|^2 \ge 0$
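As a numerical sanity check of the proof on this slide, the sketch below builds a Gram matrix from an explicit feature map and compares $\alpha^\top K \alpha$ with $\|\sum_i \alpha_i \Phi(x_i)\|^2$. The feature map `phi`, the sample points, and all function names are illustrative assumptions, not taken from the slides.

```python
import random

def phi(x):
    # Hypothetical explicit feature map for a scalar x (illustrative choice).
    return [1.0, x, x * x]

def k(x, y):
    # Kernel defined as the dot product of the two feature vectors.
    return sum(a * b for a, b in zip(phi(x), phi(y)))

def gram(xs):
    # Gram matrix K_ij = k(x_i, x_j) over the sample.
    return [[k(xi, xj) for xj in xs] for xi in xs]

def quad_form(K, alpha):
    # alpha^T K alpha written as the double sum from the proof.
    m = len(alpha)
    return sum(alpha[i] * alpha[j] * K[i][j] for i in range(m) for j in range(m))

def norm_sq_of_combination(xs, alpha):
    # || sum_i alpha_i phi(x_i) ||^2, the last expression in the proof.
    dim = len(phi(xs[0]))
    combo = [sum(a * phi(x)[d] for a, x in zip(alpha, xs)) for d in range(dim)]
    return sum(c * c for c in combo)

rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(5)]
alpha = [rng.uniform(-1, 1) for _ in range(5)]

q = quad_form(gram(xs), alpha)
n = norm_sq_of_combination(xs, alpha)
print(q, n)  # the two values should agree up to rounding and be non-negative
```

A check like this only tests one random $\alpha$; the proof is what guarantees the inequality for all $\alpha$.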

  4. Kernels. By Mercer's theorem, for any symmetric, square integrable function $k : X \times X \to \mathbb{R}$ that satisfies
     $\int_{X \times X} k(x, x') f(x) f(x') \, dx \, dx' \ge 0$
     there exist a feature space $\Phi(x)$ and $\lambda \ge 0$ such that $k(x, x') = \sum_i \lambda_i \phi_i(x) \phi_i(x')$ (we have $k(x, x') = \langle \Phi'(x), \Phi'(x') \rangle$).
     In discrete space: $\sum_i \sum_j K(x_i, x_j) c_i c_j \ge 0$.
     Any Gram matrix derived from a kernel $k$ is positive semi-definite $\Leftrightarrow$ $k$ is a valid kernel (dot product).

  5. Exercises. $k(x, x')$ is a valid kernel; show that $f(x) f(x') k(x, x')$ is a kernel.

  6. Exercises. Answer:
     $f(x) f(y) k(x, y) = f(x) f(y) \langle \phi(x), \phi(y) \rangle = \langle f(x) \phi(x), f(y) \phi(y) \rangle = \langle \phi'(x), \phi'(y) \rangle$
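The answer on this slide can be checked numerically: scaling the feature map by $f(x)$ should reproduce $f(x) f(y) k(x, y)$. In this minimal sketch the feature map `phi` and the function `f` are arbitrary assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    # Hypothetical feature map for the base kernel k.
    return [1.0, x, x * x]

def k(x, y):
    return dot(phi(x), phi(y))

def f(x):
    # Arbitrary real-valued function (assumed for the example).
    return math.sin(x) + 2.0

def phi_scaled(x):
    # phi'(x) = f(x) * phi(x), as in the answer.
    return [f(x) * c for c in phi(x)]

def k_scaled(x, y):
    return f(x) * f(y) * k(x, y)

pairs = [(-1.0, 0.5), (0.0, 2.0), (1.5, 1.5)]
for x, y in pairs:
    print(k_scaled(x, y), dot(phi_scaled(x), phi_scaled(y)))
```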

  7. Exercises. $k_1(x, x')$ and $k_2(x, x')$ are valid kernels; show that $c_1 k_1(x, x') + c_2 k_2(x, x')$, where $c_1, c_2 \ge 0$, is a valid kernel (there are multiple ways to show it).

  8. Exercises. Answer 1: For any function $f(\cdot)$:
     $\int_{x, x'} f(x) f(x') [c_1 k_1(x, x') + c_2 k_2(x, x')] \, dx \, dx' = c_1 \int_{x, x'} f(x) f(x') k_1(x, x') \, dx \, dx' + c_2 \int_{x, x'} f(x) f(x') k_2(x, x') \, dx \, dx' \ge 0$
     since $\int_{x, x'} f(x) f(x') k_1(x, x') \, dx \, dx' \ge 0$ and $\int_{x, x'} f(x) f(x') k_2(x, x') \, dx \, dx' \ge 0$, because $k_1$ and $k_2$ are valid kernels.

  9. Exercises. Answer 2: Here is another way to prove it. Given any finite set of instances $\{x_1, \ldots, x_n\}$, let $K_1$ (resp. $K_2$) be the $n \times n$ Gram matrix associated with $k_1$ (resp. $k_2$). The Gram matrix associated with $c_1 k_1 + c_2 k_2$ is just $K = c_1 K_1 + c_2 K_2$. $K$ is PSD because, for any $v \in \mathbb{R}^n$,
     $v^\top (c_1 K_1 + c_2 K_2) v = c_1 (v^\top K_1 v) + c_2 (v^\top K_2 v) \ge 0$
     as $v^\top K_1 v \ge 0$ and $v^\top K_2 v \ge 0$ follow from $K_1$ and $K_2$ being positive semi-definite. Therefore $k$ is a valid kernel.

  10. Exercises. Answer 3: Let $\Phi_1$ and $\Phi_2$ be the feature vectors associated with $k_1$ and $k_2$ respectively. Take the vector $\Phi$ which is the concatenation of $\sqrt{c_1} \Phi_1$ and $\sqrt{c_2} \Phi_2$, i.e.
      $\Phi(x) = [\sqrt{c_1} \phi^1_1(x), \sqrt{c_1} \phi^1_2(x), \ldots, \sqrt{c_1} \phi^1_m(x), \sqrt{c_2} \phi^2_1(x), \sqrt{c_2} \phi^2_2(x), \ldots, \sqrt{c_2} \phi^2_m(x)]$.
      It is easy to check that
      $\langle \Phi(x), \Phi(x') \rangle = \sum_{i=1}^{N} \phi_i(x) \phi_i(x') = c_1 \sum_{i=1}^{m} \phi^1_i(x) \phi^1_i(x') + c_2 \sum_{i=1}^{m} \phi^2_i(x) \phi^2_i(x') = c_1 \langle \Phi_1(x), \Phi_1(x') \rangle + c_2 \langle \Phi_2(x), \Phi_2(x') \rangle = c_1 k_1(x, x') + c_2 k_2(x, x') = k(x, x')$
      where $N = 2m$ is the length of $\Phi$. Therefore $k$ is a valid kernel.
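The concatenation argument of Answer 3 can be sketched in a few lines. The two feature maps `phi1` and `phi2` and the weights `C1`, `C2` below are assumptions made up for the example; any feature maps and any non-negative weights would do.

```python
import math

C1, C2 = 2.0, 3.0  # assumed non-negative weights c_1, c_2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi1(x):
    # Hypothetical feature map for k1.
    return [x]

def phi2(x):
    # Hypothetical feature map for k2.
    return [1.0, x * x]

def k1(x, y):
    return dot(phi1(x), phi1(y))

def k2(x, y):
    return dot(phi2(x), phi2(y))

def phi(x):
    # Concatenation of sqrt(c1) * phi1(x) and sqrt(c2) * phi2(x).
    return ([math.sqrt(C1) * a for a in phi1(x)]
            + [math.sqrt(C2) * a for a in phi2(x)])

def k_combined(x, y):
    return C1 * k1(x, y) + C2 * k2(x, y)

for x, y in [(0.5, -1.0), (2.0, 2.0), (-3.0, 0.25)]:
    print(dot(phi(x), phi(y)), k_combined(x, y))  # should match up to rounding
```

The square roots matter: each weight appears once per factor in the dot product, so $\sqrt{c} \cdot \sqrt{c} = c$ recovers the intended coefficient.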

  11. Exercises. $k_1$ and $k_2$ are valid kernels; show that $k_1(x, x') - k_2(x, x')$ is not necessarily a kernel.

  12. Exercises. Proof by counterexample: Consider the kernel $k_1$ being the identity ($k_1(x, x') = 1$ if $x = x'$ and $0$ otherwise), and $k_2$ being twice the identity ($k_2(x, x') = 2$ if $x = x'$ and $0$ otherwise). Let $K_1 = I_p$ be the $p \times p$ identity matrix and $K_2 = 2 I_p$ be 2 times that identity matrix. $K_1$ and $K_2$ are the Gram matrices associated with $k_1$ and $k_2$ respectively. Clearly both $K_1$ and $K_2$ are positive semi-definite; however, $K_1 - K_2 = -I$ is not, as its eigenvalues are all $-1$. Therefore $k = k_1 - k_2$ is not a valid kernel.
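The counterexample is small enough to run directly. This sketch assumes three distinct sample points (so the Gram matrices are $I_3$ and $2 I_3$) and uses hypothetical helper names `gram` and `quad_form`.

```python
def k1(x, y):
    # Identity kernel: 1 when the points coincide, 0 otherwise.
    return 1.0 if x == y else 0.0

def k2(x, y):
    # Twice the identity kernel.
    return 2.0 if x == y else 0.0

def k_diff(x, y):
    return k1(x, y) - k2(x, y)

def gram(kernel, xs):
    return [[kernel(xi, xj) for xj in xs] for xi in xs]

def quad_form(K, v):
    n = len(v)
    return sum(v[i] * v[j] * K[i][j] for i in range(n) for j in range(n))

xs = [0, 1, 2]            # three distinct points (assumed sample)
K = gram(k_diff, xs)      # equals -I
v = [1.0, 0.0, 0.0]
value = quad_form(K, v)   # v^T K v
print(K, value)
```

A single vector $v$ with $v^\top K v < 0$ is enough to rule out positive semi-definiteness.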

  13. Exercises. $A$ and $B$ are PSD matrices; show that $AB$ is not necessarily PSD.

  14. Exercises. For PSD matrices $A$ and $B$, it suffices to show that $AB$ is not symmetric, so just use $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$; here $AB = \begin{pmatrix} 2 & 1 \\ 2 & 4 \end{pmatrix}$, which is not symmetric.
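The product on this slide can be verified with a plain matrix multiplication. The sketch takes $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (the matrices as reconstructed from the slide) and relies on the convention that a PSD matrix must be symmetric; the helper names are hypothetical.

```python
def matmul(A, B):
    # Plain triple-loop matrix product for small list-of-lists matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

A = [[1, 0], [0, 2]]   # PSD: diagonal with non-negative entries
B = [[2, 1], [1, 2]]   # PSD: symmetric with eigenvalues 1 and 3

AB = matmul(A, B)
is_symmetric = AB == transpose(AB)
print(AB, is_symmetric)
```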

  15. Exercises. $k_1$ and $k_2$ are valid kernels; show that the element-wise product $k(x_i, x_j) = k_1(x_i, x_j) \times k_2(x_i, x_j)$ is a valid kernel. Start by showing that if matrices $A$ and $B$ are PSD, then $C_{ij} = A_{ij} \times B_{ij}$ is PSD.

  16. Exercises. Answer: First show that $C$ such that $C_{ij} = A_{ij} \times B_{ij}$ is PSD. One way to show it:
      (1) Any PSD matrix $Q$ is a covariance matrix. To see this, think of a $p$-dimensional random variable $x$ with covariance matrix $I_p$, the identity matrix ($Q$ is $p \times p$). Because $Q$ is PSD, it admits a non-negative symmetric square root $Q^{1/2}$. Then:
      $\mathrm{cov}(Q^{1/2} x) = Q^{1/2} \, \mathrm{cov}(x) \, Q^{1/2} = Q^{1/2} I Q^{1/2} = Q$
      and therefore $Q$ is a covariance matrix.
      (2) We also know that any covariance matrix is PSD. So, given $A$ and $B$ PSD, we know that they are covariance matrices. We want to show that $C$ is also a covariance matrix and therefore PSD.

  17. Exercises.
      (3) Let $u = (u_1, \ldots, u_p)^\top \sim N(0_p, A)$ and $v = (v_1, \ldots, v_p)^\top \sim N(0_p, B)$ be independent, where $0_p$ is a $p$-dimensional vector of zeros.
      (4) Define the vector $w = (u_1 v_1, \ldots, u_p v_p)^\top$. Then
      $\mathrm{cov}(w) = E[(w - \mu_w)(w - \mu_w)^\top] = E[w w^\top]$
      because $\mu_{w_i} = 0$ for all $i$: $u$ and $v$ are independent, so $\mu_w = \mu_u \times \mu_v = 0_p$. Entry by entry,
      $\mathrm{cov}(w)_{i,j} = E[w_i w_j] = E[(u_i v_i)(u_j v_j)] = E[(u_i u_j)(v_i v_j)] = E[u_i u_j] \, E[v_i v_j]$
      again because $u$ and $v$ are independent. Hence
      $\mathrm{cov}(w)_{i,j} = E[u_i u_j] \, E[v_i v_j] = A_{i,j} \times B_{i,j} = C_{i,j}$

  18. Exercises.
      (5) Therefore $C$ is a covariance matrix and therefore PSD.
      (6) Since any kernel matrix created from $k(x_i, x_j) = k_1(x_i, x_j) \times k_2(x_i, x_j)$ is PSD, $k$ is a valid kernel.
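The element-wise product result can be spot-checked numerically. This is a minimal sketch, not a proof: it assumes two random PSD matrices built as $X X^\top$, forms their element-wise product $C$, and evaluates $v^\top C v$ on random vectors. The matrix size, seed, and helper names are arbitrary choices for illustration.

```python
import random

def outer_gram(X):
    # X X^T is PSD by construction (it is a Gram matrix of the rows of X).
    n = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]

def hadamard(A, B):
    # Element-wise product C_ij = A_ij * B_ij.
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def quad_form(M, v):
    n = len(v)
    return sum(v[i] * v[j] * M[i][j] for i in range(n) for j in range(n))

rng = random.Random(0)
p = 4

def random_matrix():
    return [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(p)]

A = outer_gram(random_matrix())
B = outer_gram(random_matrix())
C = hadamard(A, B)

# Smallest quadratic form value seen over many random test vectors.
smallest = min(
    quad_form(C, [rng.uniform(-1, 1) for _ in range(p)]) for _ in range(1000)
)
print(smallest)  # expected to be non-negative up to rounding
```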

  19. Exercises. $A$ is PSD; show that $A^m$ is PSD.

  20. Exercises. Answer: Recall $A = U D U^\top$ (with $U$ orthogonal and $D$ diagonal). First we show that $A^m = U D^m U^\top$. Proof by induction: this is trivially true for $m = 1$, and
      $A^{m+1} = A A^m = U D U^\top (U D^m U^\top) = U D (U^\top U) D^m U^\top = U D D^m U^\top = U D^{m+1} U^\top$
      Hence, the eigenvalues of $A^m$ are the diagonal elements of $D^m$, which are $\lambda_i^m$ (where $\{\lambda_i\}$ are the diagonal elements of $D$). Since $\lambda_i \ge 0$, these eigenvalues $\lambda_i^m$ are also $\ge 0$. This means $A^m$ is PSD.
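A small worked instance of this result: for an assumed symmetric $2 \times 2$ PSD matrix, the eigenvalues of $A^m$ should be the $m$-th powers of the eigenvalues of $A$. The matrix, the exponent, and the closed-form $2 \times 2$ eigenvalue helper are choices made for the example only.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matrix_power(A, m):
    # Repeated multiplication, mirroring A^{m+1} = A A^m (for m >= 1).
    result = A
    for _ in range(m - 1):
        result = matmul(A, result)
    return result

def eigenvalues_sym2(M):
    # Closed-form eigenvalues of a symmetric 2x2 matrix, in ascending order.
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return [(tr - root) / 2, (tr + root) / 2]

A = [[2, 1], [1, 2]]   # assumed example; eigenvalues 1 and 3
m = 3

eig_A = eigenvalues_sym2(A)
eig_Am = eigenvalues_sym2(matrix_power(A, m))
print(eig_A, eig_Am)   # eigenvalues of A^m are the m-th powers of those of A
```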

  21. Exercises. $k(x, x')$ is a valid kernel; show that $k(x, y)^2 \le k(x, x) \, k(y, y)$.

  22. Exercises. Answer:
      $k(x, y)^2 = \langle \phi(x), \phi(y) \rangle^2 = \|\phi(x)\|^2 \|\phi(y)\|^2 (\cos \theta_{\phi(x), \phi(y)})^2 \le \|\phi(x)\|^2 \|\phi(y)\|^2 = k(x, x) \, k(y, y)$
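The inequality can be spot-checked on a concrete kernel. This sketch assumes the degree-2 polynomial kernel on scalars, $k(x, y) = (1 + xy)^2$, as the example; the kernel choice and the test points are not from the slides.

```python
def k(x, y):
    # Degree-2 polynomial kernel on scalars (assumed example of a valid kernel).
    return (1.0 + x * y) ** 2

points = [-3.0, -1.5, 0.0, 0.5, 2.0]

# gap = k(x, x) k(y, y) - k(x, y)^2, which the inequality says is >= 0.
gaps = [k(x, x) * k(y, y) - k(x, y) ** 2 for x in points for y in points]
print(min(gaps))  # expected to be non-negative up to rounding
```

The gap is zero when $x = y$, which corresponds to $\cos \theta = 1$ in the answer above.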

  23. Introduction to Convex Optimization. Xuezhi Wang, Computer Science Department, Carnegie Mellon University. 10701-recitation, Jan 29.

  24. Outline. (1) Convexity: Convex Sets, Convex Functions. (2) Unconstrained Convex Optimization: First-order Methods, Newton's Method.

  25. Outline. (1) Convexity: Convex Sets, Convex Functions. (2) Unconstrained Convex Optimization: First-order Methods, Newton's Method.

  26. Convex Sets.
      Definition: a set $X$ is convex if for $x, x' \in X$ it follows that $\lambda x + (1 - \lambda) x' \in X$ for $\lambda \in [0, 1]$.
      Examples:
      Empty set $\emptyset$, single point $\{x_0\}$, the whole space $\mathbb{R}^n$.
      Hyperplane: $\{x \mid a^\top x = b\}$, halfspaces $\{x \mid a^\top x \le b\}$.
      Euclidean balls: $\{x \mid \|x - x_c\|_2 \le r\}$.
      Positive semidefinite matrices: $S^n_+ = \{A \in S^n \mid A \succeq 0\}$ ($S^n$ is the set of symmetric $n \times n$ matrices).
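The definition of a convex set can be illustrated on the Euclidean ball example. This minimal sketch samples pairs of points inside an assumed ball (center, radius, seed, and helper names are all illustrative) and checks that random convex combinations stay inside it.

```python
import math
import random

def in_ball(x, center, r, tol=1e-12):
    # Membership test for the Euclidean ball {x : ||x - center||_2 <= r}.
    return math.dist(x, center) <= r + tol

def convex_combination(x, y, lam):
    # lam * x + (1 - lam) * y for lam in [0, 1].
    return [lam * a + (1 - lam) * b for a, b in zip(x, y)]

def sample_in_ball(rng, center, r):
    # Rejection sampling from the bounding box of the ball.
    while True:
        x = [c + rng.uniform(-r, r) for c in center]
        if in_ball(x, center, r):
            return x

rng = random.Random(0)
center, r = [1.0, -2.0], 3.0   # assumed example ball

all_inside = True
for _ in range(1000):
    x = sample_in_ball(rng, center, r)
    y = sample_in_ball(rng, center, r)
    lam = rng.random()
    if not in_ball(convex_combination(x, y, lam), center, r, tol=1e-9):
        all_inside = False

print(all_inside)
```

Random sampling can only fail to find a violation; the triangle inequality is what proves the ball is convex.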
