Vector spaces
DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html
Carlos Fernandez-Granda
Vector space

Consists of:
◮ A set $V$
◮ A scalar field (usually $\mathbb{R}$ or $\mathbb{C}$)
◮ Two operations: $+$ and $\cdot$
Properties

◮ For any $\vec{x}, \vec{y} \in V$, $\vec{x} + \vec{y}$ belongs to $V$
◮ For any $\vec{x} \in V$ and any scalar $\alpha$, $\alpha \cdot \vec{x}$ belongs to $V$
◮ There exists a zero vector $\vec{0}$ such that $\vec{x} + \vec{0} = \vec{x}$ for any $\vec{x} \in V$
◮ For any $\vec{x} \in V$ there exists an additive inverse $\vec{y}$ such that $\vec{x} + \vec{y} = \vec{0}$, usually denoted by $-\vec{x}$
Properties

◮ The vector sum is commutative and associative, i.e. for all $\vec{x}, \vec{y}, \vec{z} \in V$
$$\vec{x} + \vec{y} = \vec{y} + \vec{x}, \qquad (\vec{x} + \vec{y}) + \vec{z} = \vec{x} + (\vec{y} + \vec{z})$$
◮ Scalar multiplication is associative: for any scalars $\alpha$ and $\beta$ and any $\vec{x} \in V$
$$\alpha \, (\beta \cdot \vec{x}) = (\alpha \beta) \cdot \vec{x}$$
◮ Scalar and vector sums are both distributive, i.e. for any scalars $\alpha$ and $\beta$ and any $\vec{x}, \vec{y} \in V$
$$(\alpha + \beta) \cdot \vec{x} = \alpha \cdot \vec{x} + \beta \cdot \vec{x}, \qquad \alpha \cdot (\vec{x} + \vec{y}) = \alpha \cdot \vec{x} + \alpha \cdot \vec{y}$$
Subspaces

A subspace of a vector space $V$ is any subset of $V$ that is also itself a vector space
Linear dependence/independence

A set of $m$ vectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_m$ is linearly dependent if there exist $m$ scalar coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all equal to zero, such that
$$\sum_{i=1}^{m} \alpha_i \vec{x}_i = \vec{0}$$
Equivalently, at least one vector in a linearly dependent set can be expressed as a linear combination of the rest
Span

The span of $\{\vec{x}_1, \ldots, \vec{x}_m\}$ is the set of all possible linear combinations:
$$\operatorname{span}(\vec{x}_1, \ldots, \vec{x}_m) := \left\{ \vec{y} \;\middle|\; \vec{y} = \sum_{i=1}^{m} \alpha_i \vec{x}_i \text{ for some scalars } \alpha_1, \alpha_2, \ldots, \alpha_m \right\}$$
The span of any set of vectors in $V$ is a subspace of $V$
Basis and dimension

A basis of a vector space $V$ is a set of linearly independent vectors $\{\vec{x}_1, \ldots, \vec{x}_m\}$ such that
$$V = \operatorname{span}(\vec{x}_1, \ldots, \vec{x}_m)$$
If $V$ has a basis with finite cardinality, then every basis contains the same number of vectors
The dimension $\dim(V)$ of $V$ is the cardinality of any of its bases
Equivalently, the dimension is the number of linearly independent vectors that span $V$
Standard basis

$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$
The dimension of $\mathbb{R}^n$ is $n$
Inner product

An operation $\langle \cdot, \cdot \rangle$ that maps a pair of vectors to a scalar
Properties

◮ If the scalar field is $\mathbb{R}$, it is symmetric: for any $\vec{x}, \vec{y} \in V$
$$\langle \vec{x}, \vec{y} \rangle = \langle \vec{y}, \vec{x} \rangle$$
If the scalar field is $\mathbb{C}$, then for any $\vec{x}, \vec{y} \in V$
$$\langle \vec{x}, \vec{y} \rangle = \overline{\langle \vec{y}, \vec{x} \rangle},$$
where $\overline{\alpha}$ denotes the complex conjugate of $\alpha \in \mathbb{C}$
Properties

◮ It is linear in the first argument, i.e. for any $\alpha \in \mathbb{R}$ and any $\vec{x}, \vec{y}, \vec{z} \in V$
$$\langle \alpha \vec{x}, \vec{y} \rangle = \alpha \langle \vec{x}, \vec{y} \rangle, \qquad \langle \vec{x} + \vec{y}, \vec{z} \rangle = \langle \vec{x}, \vec{z} \rangle + \langle \vec{y}, \vec{z} \rangle.$$
If the scalar field is $\mathbb{R}$, it is also linear in the second argument
◮ It is positive definite: $\langle \vec{x}, \vec{x} \rangle$ is nonnegative for all $\vec{x} \in V$, and if $\langle \vec{x}, \vec{x} \rangle = 0$ then $\vec{x} = \vec{0}$
Dot product

Inner product between $\vec{x}, \vec{y} \in \mathbb{R}^n$:
$$\vec{x} \cdot \vec{y} := \sum_i \vec{x}[i] \, \vec{y}[i]$$
$\mathbb{R}^n$ endowed with the dot product is usually called a Euclidean space of dimension $n$
If $\vec{x}, \vec{y} \in \mathbb{C}^n$,
$$\vec{x} \cdot \vec{y} := \sum_i \vec{x}[i] \, \overline{\vec{y}[i]}$$
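A minimal numpy sketch of both cases follows; placing the conjugate on the second argument is an assumed convention here, chosen to match linearity in the first argument as stated on the earlier slide.

```python
import numpy as np

# Real dot product: sum of entrywise products
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])
print(np.dot(x, y))  # 1*4 + 2*(-1) + 3*0.5 = 3.5

# Complex dot product; the conjugate falls on the second argument
# (assumed convention, consistent with linearity in the first argument)
xc = np.array([1 + 1j, 2 - 1j])
yc = np.array([3j, 1 + 2j])
print(np.sum(xc * np.conj(yc)))
```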
Sample covariance

Quantifies joint fluctuations of two quantities or features
For a data set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
$$\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n)) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \operatorname{av}(x_1, \ldots, x_n)) (y_i - \operatorname{av}(y_1, \ldots, y_n))$$
where the average or sample mean is defined by
$$\operatorname{av}(a_1, \ldots, a_n) := \frac{1}{n} \sum_{i=1}^{n} a_i$$
If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are iid samples from $x$ and $y$,
$$\operatorname{E}(\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n))) = \operatorname{Cov}(x, y) := \operatorname{E}((x - \operatorname{E}(x))(y - \operatorname{E}(y)))$$
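As a concrete check, here is a small Python implementation of the definition with the $n-1$ normalization; the function name is illustrative, not part of any library.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance with the n-1 normalization, as defined above."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
print(sample_covariance(x, y))  # 1.0, matches np.cov(x, y)[0, 1]
```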
Matrix inner product

The inner product between two $m \times n$ matrices $A$ and $B$ is
$$\langle A, B \rangle := \operatorname{tr}\left(A^T B\right) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij}$$
where the trace of an $n \times n$ matrix is defined as the sum of its diagonal entries:
$$\operatorname{tr}(M) := \sum_{i=1}^{n} M_{ii}$$
For any pair of $m \times n$ matrices $A$ and $B$,
$$\operatorname{tr}\left(B^T A\right) = \operatorname{tr}\left(A B^T\right)$$
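A short numpy verification of the identities above on randomly generated matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))

# <A, B> = tr(A^T B) equals the sum of entrywise products
print(np.trace(A.T @ B))
print(np.sum(A * B))

# tr(B^T A) = tr(A B^T), even though the two products have different sizes
print(np.trace(B.T @ A), np.trace(A @ B.T))
```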
Function inner product

The inner product between two complex-valued square-integrable functions $f, g$ defined on an interval $[a, b]$ of the real line is
$$f \cdot g := \int_a^b f(x) \, \overline{g(x)} \, dx$$
Norm

Let $V$ be a vector space. A norm is a function $\|\cdot\|$ from $V$ to $\mathbb{R}$ with the following properties:
◮ It is homogeneous: for any scalar $\alpha$ and any $\vec{x} \in V$
$$\|\alpha \vec{x}\| = |\alpha| \, \|\vec{x}\|.$$
◮ It satisfies the triangle inequality
$$\|\vec{x} + \vec{y}\| \leq \|\vec{x}\| + \|\vec{y}\|.$$
In particular, $\|\vec{x}\| \geq 0$
◮ $\|\vec{x}\| = 0$ implies $\vec{x} = \vec{0}$
Inner-product norm

Square root of the inner product of a vector with itself:
$$\|\vec{x}\|_{\langle \cdot, \cdot \rangle} := \sqrt{\langle \vec{x}, \vec{x} \rangle}$$
Inner-product norm

◮ Vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$: $\ell_2$ norm
$$\|\vec{x}\|_2 := \sqrt{\vec{x} \cdot \vec{x}} = \sqrt{\sum_{i=1}^{n} \vec{x}[i]^2}$$
◮ Matrices in $\mathbb{R}^{m \times n}$ or $\mathbb{C}^{m \times n}$: Frobenius norm
$$\|A\|_F := \sqrt{\operatorname{tr}(A^T A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}$$
◮ Square-integrable complex-valued functions: $L_2$ norm
$$\|f\|_{L_2} := \sqrt{\langle f, f \rangle} = \sqrt{\int_a^b |f(x)|^2 \, dx}$$
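The vector and matrix cases can be computed directly with numpy; this sketch also checks that the Frobenius norm agrees with its trace formula:

```python
import numpy as np

x = np.array([3.0, 4.0])
print(np.linalg.norm(x))           # ell-2 norm: sqrt(9 + 16) = 5

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.norm(A, 'fro'))    # Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.sqrt(np.trace(A.T @ A)))  # same value via the trace formula
```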
Cauchy-Schwarz inequality

For any two vectors $\vec{x}$ and $\vec{y}$ in an inner-product space,
$$|\langle \vec{x}, \vec{y} \rangle| \leq \|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle}$$
Assume $\|\vec{x}\|_{\langle \cdot, \cdot \rangle} \neq 0$. Then
$$\langle \vec{x}, \vec{y} \rangle = -\|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle} \iff \vec{y} = -\frac{\|\vec{y}\|_{\langle \cdot, \cdot \rangle}}{\|\vec{x}\|_{\langle \cdot, \cdot \rangle}} \, \vec{x}$$
$$\langle \vec{x}, \vec{y} \rangle = \|\vec{x}\|_{\langle \cdot, \cdot \rangle} \, \|\vec{y}\|_{\langle \cdot, \cdot \rangle} \iff \vec{y} = \frac{\|\vec{y}\|_{\langle \cdot, \cdot \rangle}}{\|\vec{x}\|_{\langle \cdot, \cdot \rangle}} \, \vec{x}$$
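A quick numerical illustration with the dot product: the inequality holds for arbitrary vectors, with equality when one vector is a scalar multiple of the other.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = abs(np.dot(x, y))
rhs = np.linalg.norm(x) * np.linalg.norm(y)
print(lhs <= rhs + 1e-12)  # True for any x, y

# Equality holds exactly when y is a scalar multiple of x
y_aligned = 2.5 * x
print(np.isclose(abs(np.dot(x, y_aligned)),
                 np.linalg.norm(x) * np.linalg.norm(y_aligned)))
```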
Sample variance and standard deviation

The sample variance quantifies fluctuations around the average:
$$\operatorname{var}(x_1, x_2, \ldots, x_n) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \operatorname{av}(x_1, x_2, \ldots, x_n))^2$$
If $x_1, x_2, \ldots, x_n$ are iid samples from $x$,
$$\operatorname{E}(\operatorname{var}(x_1, x_2, \ldots, x_n)) = \operatorname{Var}(x) := \operatorname{E}\left((x - \operatorname{E}(x))^2\right)$$
The sample standard deviation is
$$\operatorname{std}(x_1, x_2, \ldots, x_n) := \sqrt{\operatorname{var}(x_1, x_2, \ldots, x_n)}$$
Correlation coefficient

Normalized covariance:
$$\rho_{(x_1, y_1), \ldots, (x_n, y_n)} := \frac{\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n))}{\operatorname{std}(x_1, \ldots, x_n) \operatorname{std}(y_1, \ldots, y_n)}$$
As a corollary of Cauchy-Schwarz,
$$-1 \leq \rho_{(x_1, y_1), \ldots, (x_n, y_n)} \leq 1$$
and
$$\rho_{\vec{x}, \vec{y}} = -1 \iff y_i = \operatorname{av}(y_1, \ldots, y_n) - \frac{\operatorname{std}(y_1, \ldots, y_n)}{\operatorname{std}(x_1, \ldots, x_n)} \, (x_i - \operatorname{av}(x_1, \ldots, x_n))$$
$$\rho_{\vec{x}, \vec{y}} = 1 \iff y_i = \operatorname{av}(y_1, \ldots, y_n) + \frac{\operatorname{std}(y_1, \ldots, y_n)}{\operatorname{std}(x_1, \ldots, x_n)} \, (x_i - \operatorname{av}(x_1, \ldots, x_n))$$
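A minimal implementation: centering both samples and normalizing by the product of the centered $\ell_2$ norms (the $n-1$ factors cancel), so perfectly linear data attains $\rho = \pm 1$.

```python
import numpy as np

def correlation(x, y):
    """Sample correlation coefficient: normalized sample covariance."""
    x, y = np.asarray(x), np.asarray(y)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

x = np.arange(10.0)
print(correlation(x, 3 * x + 1))  # 1.0: exact increasing linear relation
print(correlation(x, -x))         # -1.0: exact decreasing linear relation
```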
Correlation coefficient

[Figure: scatter plots of data sets with correlation coefficients $\rho_{\vec{x}, \vec{y}}$ = 0.50, 0.90, 0.99 (top row) and 0.00, -0.90, -0.99 (bottom row)]
Temperature data

Temperature in Oxford over 150 years
◮ Feature 1: Temperature in April
◮ Feature 2: Temperature in August
$\rho = 0.269$
[Figure: scatter plot of April temperature against August temperature]
Temperature data

Temperature in Oxford over 150 years (monthly)
◮ Feature 1: Maximum temperature
◮ Feature 2: Minimum temperature
$\rho = 0.962$
[Figure: scatter plot of minimum temperature against maximum temperature]
Parallelogram law

A norm $\|\cdot\|$ on a vector space $V$ is an inner-product norm if and only if
$$2\|\vec{x}\|^2 + 2\|\vec{y}\|^2 = \|\vec{x} - \vec{y}\|^2 + \|\vec{x} + \vec{y}\|^2$$
for any $\vec{x}, \vec{y} \in V$
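This gives a simple numerical test for whether a norm can come from an inner product; the sketch below confirms the identity for the $\ell_2$ norm and shows it failing for the $\ell_1$ norm on random vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

l2 = lambda v: np.linalg.norm(v)  # induced by the dot product
l1 = lambda v: np.sum(np.abs(v))  # not induced by any inner product

for norm in (l2, l1):
    lhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2
    rhs = norm(x + y) ** 2 + norm(x - y) ** 2
    print(np.isclose(lhs, rhs))   # True for l2, generally False for l1
```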
$\ell_1$ and $\ell_\infty$ norms

Norms in $\mathbb{R}^n$ or $\mathbb{C}^n$ not induced by an inner product:
$$\|\vec{x}\|_1 := \sum_{i=1}^{n} |\vec{x}[i]|$$
$$\|\vec{x}\|_\infty := \max_i |\vec{x}[i]|$$
Hölder's inequality:
$$|\langle \vec{x}, \vec{y} \rangle| \leq \|\vec{x}\|_1 \, \|\vec{y}\|_\infty$$
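A quick check of Hölder's inequality for the dot product on random vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(6)
y = rng.standard_normal(6)

lhs = abs(np.dot(x, y))
rhs = np.sum(np.abs(x)) * np.max(np.abs(y))  # ||x||_1 * ||y||_inf
print(lhs <= rhs + 1e-12)                    # True for any x, y
```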
Norm balls

[Figure: unit balls of the $\ell_1$ (diamond), $\ell_2$ (circle), and $\ell_\infty$ (square) norms in $\mathbb{R}^2$]
Distance

The distance between two vectors $\vec{x}$ and $\vec{y}$ induced by a norm $\|\cdot\|$ is
$$d(\vec{x}, \vec{y}) := \|\vec{x} - \vec{y}\|$$
Classification

Aim: Assign a signal to one of $k$ predefined classes
Training data: $n$ pairs of signals (represented as vectors) and labels: $\{\vec{x}_1, l_1\}, \ldots, \{\vec{x}_n, l_n\}$
Nearest-neighbor classification

Assign to a new signal the label of the closest training vector under the chosen distance
[Figure: nearest-neighbor diagram]
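A minimal sketch of the rule for vectors in $\mathbb{R}^d$ with the $\ell_2$ distance; the function name and data below are illustrative only.

```python
import numpy as np

def nearest_neighbor_classify(X_train, labels, x):
    """Assign x the label of the closest training vector in ell-2 distance.

    X_train: (n, d) array of training vectors; labels: length-n sequence.
    """
    distances = np.linalg.norm(X_train - x, axis=1)
    return labels[np.argmin(distances)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = ["a", "a", "b"]
print(nearest_neighbor_classify(X_train, labels, np.array([4.0, 4.5])))  # "b"
```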
Face recognition

Training set: 360 images (64 × 64 pixels) from 40 different subjects (9 each)
Test set: 1 new image from each subject
We model each image as a vector in $\mathbb{R}^{4096}$ and use the $\ell_2$-norm distance
Face recognition

[Figure: montage of the training-set face images]
Nearest-neighbor classification

Errors: 4 / 40
[Figure: test images alongside their closest training-set images]
Orthogonality

Two vectors $\vec{x}$ and $\vec{y}$ are orthogonal if and only if
$$\langle \vec{x}, \vec{y} \rangle = 0$$
A vector $\vec{x}$ is orthogonal to a set $\mathcal{S}$ if
$$\langle \vec{x}, \vec{s} \rangle = 0 \quad \text{for all } \vec{s} \in \mathcal{S}$$
Two sets $\mathcal{S}_1, \mathcal{S}_2$ are orthogonal if for any $\vec{x} \in \mathcal{S}_1$, $\vec{y} \in \mathcal{S}_2$
$$\langle \vec{x}, \vec{y} \rangle = 0$$
The orthogonal complement of a subspace $\mathcal{S}$ is
$$\mathcal{S}^\perp := \{\vec{x} \mid \langle \vec{x}, \vec{y} \rangle = 0 \text{ for all } \vec{y} \in \mathcal{S}\}$$
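A small numerical illustration with the dot product on $\mathbb{R}^3$:

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 0.0])
print(np.isclose(np.dot(x, y), 0.0))  # True: x and y are orthogonal

# Any vector along the third coordinate axis is orthogonal to both x
# and y, hence to span(x, y): it lies in that subspace's complement
z = np.array([0.0, 0.0, 2.0])
print(np.dot(x, z), np.dot(y, z))     # both 0
```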