  1. NLA Reading Group Spring’13 by İsmail Arı

  2. 𝑏 = 𝐴𝑥 is a linear combination of the columns of 𝐴

  3. Let us re-write the matrix-vector multiplication: 𝑏 = 𝐴𝑥 = Σ_{𝑗=1}^{𝑛} 𝑥_𝑗 𝑎_𝑗. “As mathematicians, we are used to viewing the formula 𝐴𝑥 = 𝑏 as a statement that 𝐴 acts on 𝑥 to produce 𝑏. The new formula, by contrast, suggests the interpretation that 𝑥 acts on 𝐴 to produce 𝑏.”
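A minimal NumPy sketch of the two readings; the matrix and vector are illustrative values, not from the slides:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

# Row view: each entry of b is a separate scalar sum, b_i = sum_j a_ij * x_j
b_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: b is a single vector sum, a linear combination of the columns
b_cols = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.allclose(b_rows, A @ x) and np.allclose(b_cols, A @ x)
```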

  4. The map from vectors of coefficients of polynomials 𝑝 of degree < 𝑛 to vectors (𝑝(𝑥_1), 𝑝(𝑥_2), …, 𝑝(𝑥_𝑚)) of sampled polynomial values is linear. The product 𝐴𝑐 gives the sampled polynomial values, where 𝐴 is the Vandermonde matrix with entries 𝑎_{𝑖𝑗} = 𝑥_𝑖^{𝑗−1}.
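A short sketch of this example (sample points and coefficients are made up): each column of the Vandermonde matrix samples one monomial, and 𝐴𝑐 evaluates the polynomial at every point at once.

```python
import numpy as np

# Sample points x_i and coefficients c of p(t) = 2 - t + 3t^2 (illustrative)
x = np.array([0.0, 1.0, 2.0, 3.0])
c = np.array([2.0, -1.0, 3.0])

# Vandermonde matrix: A[i, j] = x_i ** j, so its columns sample 1, t, t^2
A = np.vander(x, N=len(c), increasing=True)

# A @ c gives all sampled polynomial values in one vector summation
assert np.allclose(A @ c, 2 - x + 3 * x**2)
```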

  5. Do not see 𝐴𝑐 as 𝑚 distinct scalar summations. Instead, see 𝐴 as a matrix of columns, each giving sampled values of a monomial*. Thus, 𝐴𝑐 is a single vector summation that at once gives a linear combination of these monomials. *In mathematics, a monomial is, roughly speaking, a polynomial which has only one term.

  6. In the matrix product 𝐵 = 𝐴𝐶, each column of 𝐵 is a linear combination of the columns of 𝐴. Thus 𝑏_𝑘 is a linear combination of the columns 𝑎_𝑙 with coefficients 𝑐_{𝑙𝑘}.
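The same fact checked numerically (random illustrative matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 2))
B = A @ C

# Column k of B is A times column k of C: a linear combination of the
# columns of A with coefficients C[:, k]
for k in range(C.shape[1]):
    assert np.allclose(B[:, k], A @ C[:, k])
    assert np.allclose(B[:, k],
                       sum(C[l, k] * A[:, l] for l in range(A.shape[1])))
```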

  7. (figure-only slide; no text captured)

  8. The matrix 𝑅, the upper-triangular matrix with 𝑟_{𝑖𝑗} = 1 for 𝑖 ≤ 𝑗, is a discrete analogue of an indefinite integral operator.
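A sketch of this interpretation, assuming 𝑅 is the upper-triangular matrix of ones from the running-sum example:

```python
import numpy as np

n = 5
R = np.triu(np.ones((n, n)))  # r_ij = 1 for i <= j, else 0

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# (R^T x)_i = x_1 + ... + x_i: a running (indefinite) sum of x
assert np.allclose(R.T @ x, np.cumsum(x))

# Column view: in B = A R, column j of B is the sum of the first j columns of A
A = np.arange(20.0).reshape(4, 5)
B = A @ R
assert np.allclose(B[:, 2], A[:, 0] + A[:, 1] + A[:, 2])
```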

  9. range(𝐴) is the space spanned by the columns of 𝐴. null(𝐴) is the set of vectors that satisfy 𝐴𝑥 = 0, where 0 is the 0-vector in ℂ^𝑚. The column/row rank of a matrix is the dimension of its column/row space. Column rank always equals row rank, so we simply call this the rank of the matrix. A matrix 𝐴 of size 𝑚-by-𝑛 with 𝑚 ≥ 𝑛 has full rank iff it maps no two distinct vectors to the same vector.
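A rank-deficiency sketch (illustrative matrix): when the columns are dependent, two distinct inputs can map to the same output.

```python
import numpy as np

# The third column is the sum of the first two, so rank(A) = 2, not 3
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
assert np.linalg.matrix_rank(A) == 2

# x1 != x2, yet A x1 == A x2, because x1 - x2 = (1, 1, -1) lies in null(A)
x1 = np.array([1.0, 1.0, 0.0])
x2 = np.array([0.0, 0.0, 1.0])
assert np.allclose(A @ x1, A @ x2)
```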

  10. A nonsingular or invertible matrix is a square matrix of full rank. If 𝐴𝑍 = 𝐼, where 𝐼 is the 𝑚-by-𝑚 identity, then the matrix 𝑍 is the inverse of 𝐴, written 𝐴⁻¹.

  11. For an 𝑚-by-𝑚 matrix 𝐴, the following conditions are equivalent: 𝐴 has an inverse 𝐴⁻¹; rank(𝐴) = 𝑚; range(𝐴) = ℂ^𝑚; null(𝐴) = {0}; 0 is not an eigenvalue of 𝐴; 0 is not a singular value of 𝐴; det(𝐴) ≠ 0. We mention the determinant because, though a convenient notion theoretically, it rarely finds a useful role in numerical algorithms.

  12. Do not think of 𝑥 as the result of applying 𝐴⁻¹ to 𝑏. Instead, think of it as the unique vector that satisfies the equation 𝐴𝑥 = 𝑏. Multiplication by 𝐴⁻¹ is a change of basis operation: 𝐴⁻¹𝑏 is the vector of coefficients of the expansion of 𝑏 in the basis of columns of 𝐴.
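A sketch in NumPy (illustrative matrix and right-hand side): np.linalg.solve finds the unique 𝑥 without ever forming 𝐴⁻¹, and 𝑥 is exactly the coefficient vector of 𝑏 in the column basis of 𝐴.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # columns form a basis of R^2
b = np.array([4.0, 7.0])

# Solve A x = b directly rather than computing A^{-1} (cheaper, more stable)
x = np.linalg.solve(A, b)

# x holds the coefficients of the expansion of b in the columns of A
assert np.allclose(x[0] * A[:, 0] + x[1] * A[:, 1], b)
```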

  13. NLA Reading Group Spring’13 by İsmail Arı

  14. The complex conjugate of a scalar 𝑧, written 𝑧̄ or 𝑧*, is obtained by negating its imaginary part. The hermitian conjugate or adjoint of an 𝑚-by-𝑛 matrix 𝐴, written 𝐴*, is the 𝑛-by-𝑚 matrix whose 𝑖,𝑗 entry is the complex conjugate of the 𝑗,𝑖 entry of 𝐴. If 𝐴 = 𝐴*, 𝐴 is hermitian. For real 𝐴, the adjoint is known as the transpose and written 𝐴ᵀ. If 𝐴 = 𝐴ᵀ, then 𝐴 is symmetric.

  15. The Euclidean length of 𝑥 is ‖𝑥‖ = (𝑥*𝑥)^{1/2} = (Σ_{𝑖=1}^{𝑚} |𝑥_𝑖|²)^{1/2}. The inner product is bilinear, i.e. linear in each vector separately: (𝑥_1 + 𝑥_2)*𝑦 = 𝑥_1*𝑦 + 𝑥_2*𝑦, 𝑥*(𝑦_1 + 𝑦_2) = 𝑥*𝑦_1 + 𝑥*𝑦_2, and (𝛼𝑥)*(𝛽𝑦) = 𝛼̄𝛽(𝑥*𝑦).

  16. A pair of vectors 𝑥 and 𝑦 are orthogonal if 𝑥*𝑦 = 0. Two sets of vectors 𝑋 and 𝑌 are orthogonal if every 𝑥 ∈ 𝑋 is orthogonal to every 𝑦 ∈ 𝑌. A set of nonzero vectors 𝑆 is orthogonal if its elements are pairwise orthogonal. A set of nonzero vectors 𝑆 is orthonormal if it is orthogonal and, in addition, every 𝑥 ∈ 𝑆 has ‖𝑥‖ = 1.

  17. The vectors in an orthogonal set 𝑆 are linearly independent. Sketch of the proof:  Assume that they were not independent, so a nonzero vector could be written as a linear combination of the members of 𝑆  Observe that its length must be greater than 0  Use the bilinearity of inner products and the orthogonality of 𝑆 to contradict the assumption ⇒ If an orthogonal set 𝑆 ⊆ ℂ^𝑚 contains 𝑚 vectors, then it is a basis for ℂ^𝑚.

  18. Inner products can be used to decompose arbitrary vectors into orthogonal components. Assume {𝑞_1, 𝑞_2, …, 𝑞_𝑛} is an orthonormal set and 𝑣 is an arbitrary vector. Utilizing the scalars 𝑞_𝑖*𝑣 as coordinates in an expansion, we find that 𝑟 = 𝑣 − (𝑞_1*𝑣)𝑞_1 − (𝑞_2*𝑣)𝑞_2 − ⋯ − (𝑞_𝑛*𝑣)𝑞_𝑛 is orthogonal to 𝑞_1, 𝑞_2, …, 𝑞_𝑛. Thus we see that 𝑣 can be decomposed into 𝑛 + 1 orthogonal components: 𝑣 = 𝑟 + Σ_{𝑖=1}^{𝑛} (𝑞_𝑖*𝑣)𝑞_𝑖.
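A sketch of the decomposition, including the rank-one projectors of the next slide; the orthonormal set comes from a QR factorization of a random matrix (an illustrative construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal set {q_1, q_2} in R^4: the columns of Q from a reduced QR
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
v = rng.standard_normal(4)

# Coefficients q_i^* v and the residual r of the decomposition
coeffs = Q.T @ v
r = v - Q @ coeffs

# r is orthogonal to every q_i, and the n+1 components reassemble v
assert np.allclose(Q.T @ r, 0)
assert np.allclose(r + Q @ coeffs, v)

# The i-th projection is applied by the rank-one matrix q_i q_i^*
P0 = np.outer(Q[:, 0], Q[:, 0])
assert np.allclose(P0 @ v, coeffs[0] * Q[:, 0])
```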

  19. (1) We view 𝑣 as a sum of coefficients 𝑞_𝑖*𝑣 times vectors 𝑞_𝑖. (2) We view 𝑣 as a sum of orthogonal projections of 𝑣 onto the various directions 𝑞_𝑖. The 𝑖th projection operation is achieved by the very special rank-one matrix 𝑞_𝑖𝑞_𝑖*.

  20. If 𝑄* = 𝑄⁻¹, 𝑄 is unitary.

  21. 𝑄*𝑏 is the vector of coefficients of the expansion of 𝑏 in the basis of columns of 𝑄.

  22. Multiplication by a unitary matrix or its adjoint preserves geometric structure in the Euclidean sense, because inner products are preserved: (𝑄𝑥)*(𝑄𝑦) = 𝑥*𝑦. The invariance of inner products means that angles between vectors are preserved, and so are their lengths: ‖𝑄𝑥‖ = ‖𝑥‖. In the real case, multiplication by an orthogonal matrix 𝑄 corresponds to a rigid rotation (if det 𝑄 = 1) or reflection (if det 𝑄 = −1) of the vector space.
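A quick numerical check (random orthogonal 𝑄 via QR, an illustrative construction):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal Q

x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Inner products, hence angles and lengths, are unchanged by Q
assert np.isclose((Q @ x) @ (Q @ y), x @ y)
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```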

  23. NLA Reading Group Spring’13 by İsmail Arı

  24. The essential notions of size and distance in a vector space are captured by norms. In order to conform to a reasonable notion of length, a norm must satisfy (1) ‖𝑥‖ ≥ 0, with ‖𝑥‖ = 0 only if 𝑥 = 0, (2) ‖𝑥 + 𝑦‖ ≤ ‖𝑥‖ + ‖𝑦‖, and (3) ‖𝛼𝑥‖ = |𝛼| ‖𝑥‖, for all vectors 𝑥 and 𝑦 and for all scalars 𝛼 ∈ ℂ.

  25. The closed unit ball {𝑥 ∈ ℂ^𝑚 : ‖𝑥‖ ≤ 1} corresponding to each norm is illustrated to the right for the case 𝑚 = 2.

  26. Introduce the diagonal matrix 𝑊 whose 𝑖th diagonal entry is the weight 𝑤_𝑖 ≠ 0. Example: a weighted 2-norm, ‖𝑥‖_𝑊 = ‖𝑊𝑥‖_2. The most important norms in this book are the unweighted 2-norm and its induced matrix form.
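A two-line sketch of the weighted 2-norm (the weights are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0])
w = np.array([3.0, 0.5])          # nonzero weights
W = np.diag(w)

# ||x||_W = ||W x||_2 = sqrt(sum_j (w_j x_j)^2)
assert np.isclose(np.linalg.norm(W @ x),
                  np.sqrt((3.0 * 1.0)**2 + (0.5 * 2.0)**2))
```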

  27. An 𝑚 × 𝑛 matrix can be viewed as a vector in an 𝑚𝑛-dimensional space: each of the 𝑚𝑛 entries of the matrix is an independent coordinate. ⇒ Any 𝑚𝑛-dimensional norm can be used for measuring the “size” of such a matrix. However, certain special matrix norms are more useful than the vector norms. These are the induced matrix norms, defined in terms of the behavior of a matrix as an operator between its normed domain and range spaces.

  28. Given vector norms ‖·‖_(𝑛) and ‖·‖_(𝑚) on the domain and range of 𝐴 ∈ ℂ^{𝑚×𝑛}, respectively, the induced matrix norm ‖𝐴‖_(𝑚,𝑛) is the smallest number 𝐶 for which ‖𝐴𝑥‖_(𝑚) ≤ 𝐶‖𝑥‖_(𝑛) for all 𝑥 ∈ ℂ^𝑛. In other words, it is the maximum factor by which 𝐴 can stretch a vector 𝑥.
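A Monte Carlo sketch of the definition for the 2-norm (random illustrative matrix): sampled stretch factors approach, and never exceed, the induced norm, which for ‖·‖_2 is the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))

# Sample the stretch factor ||Ax|| / ||x|| over many random directions x
stretch = max(
    np.linalg.norm(A @ x) / np.linalg.norm(x)
    for x in rng.standard_normal((10000, 4))
)

# The supremum is the induced 2-norm; sampling can only approach it from below
assert stretch <= np.linalg.norm(A, 2) + 1e-12
```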

  29. (figure-only slide; no text captured)

  30. (figure-only slide; no text captured)

  31. For any 𝑚 × 𝑛 matrix 𝐴, ‖𝐴‖_1 is equal to the maximum column sum of 𝐴: ‖𝐴‖_1 = max_{1≤𝑗≤𝑛} ‖𝑎_𝑗‖_1. Consider any 𝑥 with ‖𝑥‖_1 ≤ 1. Then ‖𝐴𝑥‖_1 = ‖Σ_𝑗 𝑥_𝑗𝑎_𝑗‖_1 ≤ Σ_𝑗 |𝑥_𝑗| ‖𝑎_𝑗‖_1 ≤ max_𝑗 ‖𝑎_𝑗‖_1. By choosing 𝑥 = 𝑒_𝑗, where 𝑗 maximizes ‖𝑎_𝑗‖_1, we attain this bound.

  32. For any 𝑚 × 𝑛 matrix 𝐴, ‖𝐴‖_∞ is equal to the maximum row sum of 𝐴: ‖𝐴‖_∞ = max_{1≤𝑖≤𝑚} ‖𝑎_𝑖*‖_1, where 𝑎_𝑖* denotes the 𝑖th row of 𝐴.
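Both identities of slides 31 and 32 checked against NumPy's built-in norms (illustrative matrix):

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  5.0, -6.0]])

# ||A||_1 = maximum absolute column sum; ||A||_inf = maximum absolute row sum
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())
```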

  33. Let 𝑝 and 𝑞 satisfy 1/𝑝 + 1/𝑞 = 1, with 1 ≤ 𝑝, 𝑞 ≤ ∞. Then the Hölder inequality states that, for any vectors 𝑥 and 𝑦, |𝑥*𝑦| ≤ ‖𝑥‖_𝑝 ‖𝑦‖_𝑞. The Cauchy–Schwarz inequality is the special case 𝑝 = 𝑞 = 2: |𝑥*𝑦| ≤ ‖𝑥‖_2 ‖𝑦‖_2.
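A quick numerical check of both inequalities on random vectors (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = abs(x @ y)

# Hölder with p = 1, q = inf, then the Cauchy-Schwarz case p = q = 2
assert lhs <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf)
assert lhs <= np.linalg.norm(x) * np.linalg.norm(y)
```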

  34. Consider 𝐴 = 𝑎*, a matrix consisting of a single row, where 𝑎 is a column vector. For any 𝑥, we have ‖𝐴𝑥‖_2 = |𝑎*𝑥| ≤ ‖𝑎‖_2‖𝑥‖_2 by the Cauchy–Schwarz inequality. This bound is tight: observe that choosing 𝑥 = 𝑎 gives ‖𝐴𝑎‖_2 = ‖𝑎‖_2². Therefore, we have ‖𝐴‖_2 = ‖𝑎‖_2.

  35. Consider the rank-one outer product 𝐴 = 𝑢𝑣*, where 𝑢 is an 𝑚-vector and 𝑣 is an 𝑛-vector. For any 𝑛-vector 𝑥, we can bound ‖𝐴𝑥‖_2 = ‖𝑢(𝑣*𝑥)‖_2 = |𝑣*𝑥| ‖𝑢‖_2 ≤ ‖𝑢‖_2‖𝑣‖_2‖𝑥‖_2. Therefore, we have ‖𝐴‖_2 ≤ ‖𝑢‖_2‖𝑣‖_2. This inequality is an equality for the case 𝑥 = 𝑣.
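A check that the bound is attained (random illustrative vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(4)
v = rng.standard_normal(3)

A = np.outer(u, v)   # rank-one matrix u v^T

# ||u v^*||_2 equals ||u||_2 ||v||_2, and x = v attains the stretch bound
assert np.isclose(np.linalg.norm(A, 2),
                  np.linalg.norm(u) * np.linalg.norm(v))
assert np.isclose(np.linalg.norm(A @ v),
                  np.linalg.norm(u) * np.linalg.norm(v) * np.linalg.norm(v))
```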

  36. Since ‖𝐴𝐵𝑥‖_(𝑙) ≤ ‖𝐴‖_(𝑙,𝑚)‖𝐵𝑥‖_(𝑚) ≤ ‖𝐴‖_(𝑙,𝑚)‖𝐵‖_(𝑚,𝑛)‖𝑥‖_(𝑛), the induced norm of 𝐴𝐵 must satisfy ‖𝐴𝐵‖_(𝑙,𝑛) ≤ ‖𝐴‖_(𝑙,𝑚)‖𝐵‖_(𝑚,𝑛).

  37. (figure-only slide; no text captured)

  38. The most important matrix norm which is not induced by a vector norm is the Hilbert-Schmidt or Frobenius norm, defined by ‖𝐴‖_F = (Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛} |𝑎_{𝑖𝑗}|²)^{1/2}. Observe that this is the same as the 2-norm of the matrix when viewed as an 𝑚𝑛-dimensional vector. Alternatively, we can write ‖𝐴‖_F = (tr(𝐴*𝐴))^{1/2} = (tr(𝐴𝐴*))^{1/2}.

  39. Let 𝐶 = 𝐴𝐵. Bounding each entry 𝑐_{𝑖𝑗} by the Cauchy–Schwarz inequality, we get ‖𝐶‖_F ≤ ‖𝐴‖_F‖𝐵‖_F.
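A sketch verifying the vector view, the trace formula, and the submultiplicativity bound (random illustrative matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

fro = np.linalg.norm(A, 'fro')

# Frobenius norm = 2-norm of the flattened matrix = sqrt(tr(A^* A))
assert np.isclose(fro, np.linalg.norm(A.ravel()))
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))

# Submultiplicativity: ||AB||_F <= ||A||_F ||B||_F
assert np.linalg.norm(A @ B, 'fro') <= fro * np.linalg.norm(B, 'fro')
```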

  40. The matrix 2-norm and Frobenius norm are invariant under multiplication by unitary matrices: ‖𝑄𝐴‖_2 = ‖𝐴‖_2 and ‖𝑄𝐴‖_F = ‖𝐴‖_F. This fact is still valid if 𝑄 is generalized to a rectangular matrix with orthonormal columns. Recall the transformation used in PCA.
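A final check of both invariances (random orthogonal 𝑄 from QR, an illustrative construction):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal Q

# Both the 2-norm and the Frobenius norm are unchanged by Q
for ord_ in (2, 'fro'):
    assert np.isclose(np.linalg.norm(Q @ A, ord_), np.linalg.norm(A, ord_))
```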
