
Numerical Linear Algebra
EECS 442, David Fouhey, Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/

Administrivia: HW 1 is out and due in two weeks. Follow the submission format (wrong format = 0).


  1. Matrices. Horizontally concatenate n m-dimensional column vectors and you get an m×n matrix A (here 2×3):

      A = [ v_1, ⋯, v_n ] = [ v_1^(1)  v_2^(1)  v_3^(1) ]
                            [ v_1^(2)  v_2^(2)  v_3^(2) ]

  Notation: a scalar is an undecorated lowercase letter (a), a vector is a bold or arrow-decorated lowercase letter (a), and a matrix is a bold uppercase letter (A).

  2. Matrices. Transpose: flip rows and columns, so a 3×1 column becomes a 1×3 row:

      [ a ]^T
      [ b ]   = [ a  b  c ]
      [ c ]

  Vertically concatenate m n-dimensional row vectors and you get an m×n matrix A (here 2×3):

      A = [ u_1^T; ⋮; u_m^T ],   e.g.   [ u_1^(1)  u_1^(2)  u_1^(3) ]
                                        [ u_2^(1)  u_2^(2)  u_2^(3) ]

  3. Matrix-Vector Product. y_(2×1) = A_(2×3) x_(3×1):

      [ y_1 ]                      [ x_1 ]
      [ y_2 ] = [ c_1  c_2  c_3 ]  [ x_2 ]  = x_1 c_1 + x_2 c_2 + x_3 c_3
                                   [ x_3 ]

  This is a linear combination of the columns of A.

  4. Matrix-Vector Product. y_(2×1) = A_(2×3) x_(3×1), viewed by rows:

      [ y_1 ]   [ r_1^T ]
      [ y_2 ] = [ r_2^T ] x          y_1 = r_1^T x,   y_2 = r_2^T x

  Each entry of y is a dot product between a row of A and x.

  5. Matrix Multiplication. Generally: A (m×n) and B (n×p) yield a product (AB) that is m×p:

      AB = [ - a_1^T - ]
           [     ⋮     ]  [ b_1  ⋯  b_p ]
           [ - a_m^T - ]

  Yes: in A, I'm referring to the rows, and in B, I'm referring to the columns.

  6. Matrix Multiplication. Generally: A (m×n) and B (n×p) yield a product (AB) that is m×p, whose entries are all the row-by-column dot products:

      (AB)_ij = a_i^T b_j

  Entry (i, j) is the dot product of row i of A with column j of B.

  7. Matrix Multiplication.
  • Dimensions must match. Dimensions must match. Dimensions must match.
  • (Yes, it's associative): ABx = (A)(Bx) = (AB)x
  • (No, it's not commutative): ABx ≠ (BA)x ≠ (BxA)
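
  A minimal NumPy sketch (not from the slides) of these rules, using small random matrices: the inner dimensions must agree, the product is associative, and it is not commutative.

      import numpy as np

      A = np.random.rand(2, 3)   # 2x3
      B = np.random.rand(3, 4)   # 3x4: inner dimensions (3 and 3) must match
      x = np.random.rand(4)      # 4-vector

      print(np.allclose(A @ (B @ x), (A @ B) @ x))   # True: associative

      C, D = np.random.rand(3, 3), np.random.rand(3, 3)
      print(np.allclose(C @ D, D @ C))               # almost surely False: not commutative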

  8. Operations They Don't Teach You. You probably saw matrix addition:

      [ a  b ]   [ e  f ]   [ a+e  b+f ]
      [ c  d ] + [ g  h ] = [ c+g  d+h ]

  What is this? (FYI: e is a scalar)

      [ a  b ]       [ a+e  b+e ]
      [ c  d ] + e = [ c+e  d+e ]

  9. Broadcasting. If you want to be pedantic and proper, you expand e by multiplying a matrix of 1s (denoted 1):

      [ a  b ]       [ a  b ]                [ a  b ]   [ e  e ]
      [ c  d ] + e = [ c  d ] + e·1_(2×2)  = [ c  d ] + [ e  e ]

  Many smart matrix libraries do this automatically. This is the source of many bugs.

  10. Broadcasting Example. Given an n×2 matrix P and a 2D column vector v, we want the n×2 difference matrix D:

      P = [ x_1  y_1 ]     v = [ a ]     D = [ x_1-a  y_1-b ]
          [  ⋮    ⋮  ]         [ b ]         [   ⋮      ⋮   ]
          [ x_n  y_n ]                       [ x_n-a  y_n-b ]

      P - v^T = [ x_1  y_1 ]   [ a  b ]
                [  ⋮    ⋮  ] - [ ⋮  ⋮ ]       (the repeated rows of v^T are assumed / broadcast)
                [ x_n  y_n ]   [ a  b ]
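
  A small NumPy sketch of this slide (the points and vector are made-up values): subtracting the 1×2 row v.T from the n×2 matrix P broadcasts the subtraction across every row.

      import numpy as np

      P = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])   # n x 2 matrix of points
      v = np.array([[0.5],
                    [1.5]])        # 2D column vector

      D = P - v.T                  # v.T has shape (1, 2) and is broadcast over the rows of P
      print(D.shape)               # (3, 2)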

  11. Two Uses for Matrices.
  1. Storing things in a rectangular array (images, maps)
     • Typical operations: element-wise operations, convolution (which we'll cover next)
     • Atypical operations: almost anything you learned in a math linear algebra class
  2. A linear operator that maps vectors to another space (Ax)
     • Typical/atypical: the reverse of the above

  12. Images as Matrices Suppose someone hands you this matrix. Whatโ€™s wrong with it?

  13. Contrast - Gamma curve. The typical way to change the contrast is to apply a nonlinear correction:

      new pixel value = (pixel value)^γ

  The quantity γ controls how much contrast gets added.

  14. Contrast - Gamma curve. Now the darkest regions (10th percentile) are much darker than the moderately dark regions (50th percentile). (The slide plots the gamma curve, marking where the old 10%/50%/90% pixel values map to their new values.)

  15. Implementation. Python + NumPy (right way):

      imNew = im**0.25

  Python + NumPy (slow way - why?):

      imNew = np.zeros(im.shape)
      for y in range(im.shape[0]):
          for x in range(im.shape[1]):
              imNew[y, x] = im[y, x]**expFactor

  16. Results Phew! Much Better.

  17. Element-wise Operations.
  Element-wise power (beware the notation):  (A^p)_ij = (A_ij)^p
  "Hadamard product" / element-wise multiplication:  (A ⊙ B)_ij = A_ij · B_ij
  Element-wise division:  (A / B)_ij = A_ij / B_ij

  18. Sums Across Axes. Suppose we have an N×2 matrix A:

      A = [ x_1  y_1 ]
          [  ⋮    ⋮  ]
          [ x_N  y_N ]

      Σ(A, 1) = [ x_1+y_1; ⋮; x_N+y_N ]        an N-dimensional column vector

      Σ(A, 0) = [ Σ_i x_i,  Σ_i y_i ]          a 2D row vector

  Note: libraries distinguish between an N-D column vector and an N×1 matrix.
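
  In NumPy, the Σ(A, axis) notation above corresponds to np.sum with an axis argument; a quick sketch with made-up data:

      import numpy as np

      A = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])                     # N x 2

      print(np.sum(A, axis=1))                        # [11. 22. 33.]  shape (3,): an N-dim vector, not N x 1
      print(np.sum(A, axis=0))                        # [ 6. 60.]      sums down each column
      print(np.sum(A, axis=1, keepdims=True).shape)   # (3, 1): keepdims gives the N x 1 matrix instead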

  19. Vectorizing Example.
  • Suppose I represent each image as a 128-dimensional vector.
  • I want to compute all the pairwise distances between {x_1, ..., x_N} and {y_1, ..., y_M} so I can find, for every x_i, the nearest y_j.
  • Identity: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x^T y
  • Or: ||x - y|| = ( ||x||^2 + ||y||^2 - 2 x^T y )^(1/2)

  20. Vectorizing Example. Stack the vectors into matrices:

      X = [ - x_1 - ]      Y = [ - y_1 - ]      Y^T = [ y_1  ⋯  y_M ]
          [    ⋮    ]          [    ⋮    ]
          [ - x_N - ]          [ - y_M - ]

  Compute an N×1 vector of squared norms (can also do M×1):

      Σ(X^2, 1) = [ ||x_1||^2; ⋮; ||x_N||^2 ]

  Compute an N×M matrix of dot products:

      (XY^T)_ij = x_i^T y_j

  21. Vectorizing Example.

      D = ( Σ(X^2, 1) + Σ(Y^2, 1)^T - 2XY^T )^(1/2)

  Why? Broadcasting the N×1 column [ ||x_1||^2; ⋮; ||x_N||^2 ] against the 1×M row [ ||y_1||^2  ⋯  ||y_M||^2 ] gives an N×M matrix:

      ( Σ(X^2, 1) + Σ(Y^2, 1)^T )_ij = ||x_i||^2 + ||y_j||^2

  22. Vectorizing Example.

      D = ( Σ(X^2, 1) + Σ(Y^2, 1)^T - 2XY^T )^(1/2),   i.e.   D_ij = ( ||x_i||^2 + ||y_j||^2 - 2 x_i^T y_j )^(1/2)

  NumPy code:

      XNorm = np.sum(X**2, axis=1, keepdims=True)
      YNorm = np.sum(Y**2, axis=1, keepdims=True)
      D = (XNorm + YNorm.T - 2*np.dot(X, Y.T))**0.5

  *May have to make sure the argument is at least 0 (sometimes roundoff issues happen).

  23. Does it Make a Difference? Computing pairwise distances between 300 and 400 128-dimensional vectors:
  1. for x in X, for y in Y, using native Python: 9s
  2. for x in X, for y in Y, using numpy to compute the distance: 0.8s
  3. vectorized: 0.0045s (~2000x faster than 1, ~175x faster than 2)
  Expressing things in primitives that are optimized is usually faster.
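
  A hedged sketch of how one might reproduce this kind of comparison (exact timings depend on hardware; this just times a doubly-nested loop against the vectorized version):

      import time
      import numpy as np

      X, Y = np.random.rand(300, 128), np.random.rand(400, 128)

      t0 = time.perf_counter()
      D_loop = np.array([[np.sqrt(np.sum((x - y)**2)) for y in Y] for x in X])
      t1 = time.perf_counter()
      D_vec = (np.sum(X**2, 1, keepdims=True) + np.sum(Y**2, 1, keepdims=True).T
               - 2 * X @ Y.T).clip(min=0)**0.5
      t2 = time.perf_counter()

      print(np.allclose(D_loop, D_vec), "loop:", t1 - t0, "vectorized:", t2 - t1)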

  24. Linear Independence. A set of vectors is linearly independent if you can't write one as a linear combination of the others. Suppose:

      a = [ 0 ]    b = [ 0 ]    c = [ 5 ]
          [ 0 ]        [ 6 ]        [ 0 ]
          [ 2 ]        [ 0 ]        [ 0 ]

      x = [ 0 ]  = 2a          y = [  0 ]  = (1/2)a - (1/3)b
          [ 0 ]                    [ -2 ]
          [ 4 ]                    [  1 ]

  • Is the set {a, b, c} linearly independent?
  • Is the set {a, b, x} linearly independent?
  • Max # of independent 3D vectors?
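
  A quick NumPy check of these questions (a sketch; np.linalg.matrix_rank counts the number of linearly independent columns):

      import numpy as np

      a = np.array([0., 0., 2.])
      b = np.array([0., 6., 0.])
      c = np.array([5., 0., 0.])
      x = 2 * a                                                   # [0, 0, 4]

      print(np.linalg.matrix_rank(np.column_stack([a, b, c])))   # 3: {a, b, c} is independent
      print(np.linalg.matrix_rank(np.column_stack([a, b, x])))   # 2: x = 2a, so {a, b, x} is not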

  25. Span. Span: all linear combinations of a set of vectors.

      Span({[0, 1]}) = ?   The vertical line through the origin:  { λ[0, 1] : λ ∈ R }

  Is the blue vector in the red vector's span? (vectors shown on the slide)

  26. Span. Span: all linear combinations of a set of vectors. Span({ , }) = ? (two vectors shown on the slide)

  27. Span. Span: all linear combinations of a set of vectors. Span({ , }) = ? (two vectors shown on the slide)

  28. Matrix-Vector Product. Right-multiplying A by x mixes the columns of A according to the entries of x:

      Ax = [ c_1  ⋯  c_n ] x

  • The output space of f(x) = Ax is constrained to be the span of the columns of A.
  • You can't output things you can't construct out of your columns.

  29. An Intuition.

      y = Ax = [ c_1  c_2  c_3 ] [ x_1 ]
                                 [ x_2 ]
                                 [ x_3 ]

  Picture a machine: knobs x_1, x_2, x_3 go in, outputs y_1, y_2, y_3 come out.
  • x - knobs on the machine (e.g., fuel, brakes)
  • y - state of the world (e.g., where you are)
  • A - the machine (e.g., your car)

  30. Linear Independence. Suppose the columns of a 3×3 matrix A are not linearly independent (c_1, αc_1, c_2 for instance):

      y = Ax = [ c_1  αc_1  c_2 ] [ x_1 ]
                                  [ x_2 ]
                                  [ x_3 ]

      y = x_1 c_1 + αx_2 c_1 + x_3 c_2
      y = (x_1 + αx_2) c_1 + x_3 c_2

  31. Linear Independence Intuition. The knobs of x are redundant: even if y has 3 outputs, you can only control it in two directions.

      y = (x_1 + αx_2) c_1 + x_3 c_2

  32. Linear Independence. Recall: Ax = (x_1 + αx_2) c_1 + x_3 c_2

      y = A [ x_1 + γ   ]
            [ x_2 - γ/α ] = (x_1 + γ + αx_2 - γ) c_1 + x_3 c_2 = (x_1 + αx_2) c_1 + x_3 c_2
            [ x_3       ]

  • Can write y an infinite number of ways by adding γ to x_1 and subtracting γ/α from x_2.
  • Or: given a vector y, there's not a unique vector x s.t. y = Ax.
  • Not all y have a corresponding x s.t. y = Ax.

  33. Linear Independence. Ax = (x_1 + αx_2) c_1 + x_3 c_2

      y = A [  γ   ]
            [ -γ/α ] = (γ - γ) c_1 + 0·c_2 = 0
            [  0   ]

  • What else can we cancel out?
  • An infinite number of non-zero vectors x can map to a zero vector y.
  • This set is called the right null-space of A.

  34. Rank.
  • The rank of an n×n matrix A is the number of linearly independent columns (or rows) of A / the dimension of the span of the columns.
  • Matrices with full rank (n×n, rank n) behave nicely: they can be inverted, span the full output space, and are one-to-one.
  • Matrices with full rank are machines where every knob is useful and every output state can be made by the machine.

  35. Inverses.
  • Given y = Ax, y is a linear combination of the columns of A with weights given by x. If A is full-rank, we should be able to invert this mapping.
  • Given some y (output) and A, what x (inputs) produced it?
  • x = A^(-1) y
  • Note: if you don't need to compute the inverse, never ever compute it. Solving for x is much faster and more stable than obtaining A^(-1).
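
  A small sketch of that last point: prefer np.linalg.solve (or a factorization) over forming A^(-1) explicitly.

      import numpy as np

      A = np.random.rand(4, 4) + 4 * np.eye(4)   # a well-conditioned, invertible matrix
      y = np.random.rand(4)

      x_solve = np.linalg.solve(A, y)            # preferred: solves Ax = y directly
      x_inv = np.linalg.inv(A) @ y               # works, but slower and less numerically stable
      print(np.allclose(x_solve, x_inv))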

  36. Symmetric Matrices.
  • Symmetric: A^T = A, or A_ij = A_ji.
  • Symmetric matrices have lots of special properties.

  Any matrix of the form A = X^T X is symmetric. Quick check:

      A^T = (X^T X)^T = X^T (X^T)^T = X^T X = A

  37. Special Matrices - Rotations. (Think of a 3×3 matrix R with entries r_11, ..., r_33.)
  • Rotation matrices R rotate vectors and do not change vector L2 norms ( ||Rx||_2 = ||x||_2 ).
  • Every row/column is unit norm.
  • Every row is linearly independent.
  • The transpose is the inverse: RR^T = R^T R = I.
  • The determinant is 1 (otherwise it's also a coordinate flip/reflection), and the eigenvalues have magnitude 1.
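
  A 2D sketch of these properties (the angle below is an arbitrary made-up value):

      import numpy as np

      t = 0.3                                       # an arbitrary angle in radians
      R = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])

      x = np.random.rand(2)
      print(np.allclose(np.linalg.norm(R @ x), np.linalg.norm(x)))   # norms are preserved
      print(np.allclose(R.T @ R, np.eye(2)))                         # transpose is the inverse
      print(np.isclose(np.linalg.det(R), 1.0))                       # determinant is 1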

  38. Eigensystems.
  • An eigenvector v_i and eigenvalue λ_i of a matrix A satisfy Av_i = λ_i v_i (Av_i is v_i scaled by λ_i).
  • Vectors and values are always paired, and typically you assume ||v_i||_2 = 1.
  • The biggest eigenvalue of A gives bounds on how much f(x) = Ax stretches a vector x.
  • Hints of what people really mean:
    • "Largest eigenvector" = the eigenvector with the largest eigenvalue
    • "Spectral" just means there's eigenvectors involved
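
  A short NumPy sketch: np.linalg.eig returns the paired eigenvalues and (unit-norm) eigenvectors, and each pair satisfies Av = λv.

      import numpy as np

      A = np.random.rand(3, 3)
      vals, vecs = np.linalg.eig(A)               # vecs[:, i] pairs with vals[i]

      for i in range(3):
          v, lam = vecs[:, i], vals[i]
          print(np.allclose(A @ v, lam * v),      # True: A v_i = lambda_i v_i
                np.isclose(np.linalg.norm(v), 1.0))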

  39. Suppose I have points in a grid

  40. Now I apply f( x ) = Ax to these points Pointy-end: Ax . Non-Pointy-End: x

  41. A = [ 1.1  0   ]
          [ 0    1.1 ]
  Red box - unit square, blue box - after f(x) = Ax. What are the yellow lines and why?

  42. A = [ 0.8  0    ]
          [ 0    1.25 ]
  Now I apply f(x) = Ax to these points. Pointy end: Ax. Non-pointy end: x.

  43. A = [ 0.8  0    ]
          [ 0    1.25 ]
  Red box - unit square, blue box - after f(x) = Ax. What are the yellow lines and why?

  44. A = [ cos(t)  -sin(t) ]
          [ sin(t)   cos(t) ]
  Red box - unit square, blue box - after f(x) = Ax. Can we draw any yellow lines?

  45. Eigenvectors of Symmetric Matrices.
  • There are always n mutually orthogonal eigenvectors, with n (not necessarily distinct) eigenvalues.
  • For symmetric A, the eigenvector with the largest eigenvalue maximizes x^T A x / x^T x (the one with the smallest eigenvalue minimizes it).
  • So for unit vectors (where x^T x = 1), that eigenvector maximizes x^T A x.
  • A surprisingly large number of optimization problems rely on (max/min)imizing this.
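
  A sketch of that claim for a random symmetric matrix: no random unit vector beats the top eigenvector's value of x^T A x.

      import numpy as np

      B = np.random.rand(4, 4)
      A = B.T @ B                                     # symmetric, as in slide 36

      vals, vecs = np.linalg.eigh(A)                  # eigh: for symmetric matrices, eigenvalues sorted ascending
      top = vecs[:, -1]                               # eigenvector with the largest eigenvalue

      xs = np.random.randn(1000, 4)
      xs /= np.linalg.norm(xs, axis=1, keepdims=True)         # random unit vectors
      rayleigh = np.einsum('ij,jk,ik->i', xs, A, xs)          # x^T A x for each x
      print(rayleigh.max() <= top @ A @ top + 1e-9)           # True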

  46. The Singular Value Decomposition. You can always write an m×n matrix A as:

      A = U Σ V^T

      U: rotation (the eigenvectors of AA^T)
      Σ: scale, a diagonal matrix diag(σ_1, σ_2, σ_3, ...) (the square roots of the eigenvalues of A^T A)

  47. The Singular Value Decomposition. You can always write an m×n matrix A as:

      A = U Σ V^T

      U: rotation (the eigenvectors of AA^T)
      Σ: scale (the square roots of the eigenvalues of A^T A)
      V^T: rotation (the eigenvectors of A^T A)

  48. Singular Value Decomposition.
  • Every matrix is a rotation, a scaling, and a rotation.
  • The number of non-zero singular values = the rank / the number of linearly independent vectors.
  • Gives the "closest" matrix to A with a lower rank.

      A = U diag(σ_1, σ_2, σ_3) V^T

  49. Singular Value Decomposition.
  • Every matrix is a rotation, a scaling, and a rotation.
  • The number of non-zero singular values = the rank / the number of linearly independent vectors.
  • Gives the "closest" matrix to A with a lower rank: zeroing the smallest singular value gives a lower-rank approximation Â.

      Â = U diag(σ_1, σ_2, 0) V^T

  50. Singular Value Decomposition.
  • Every matrix is a rotation, a scaling, and a rotation.
  • The number of non-zero singular values = the rank / the number of linearly independent vectors.
  • Gives the "closest" matrix to A with a lower rank.
  • Secretly behind many of the things you do with matrices.
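
  A sketch of the "closest lower-rank matrix" point: zero out the smallest singular values and rebuild (the rank k below is an arbitrary choice).

      import numpy as np

      A = np.random.rand(5, 4)
      U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

      k = 2                                              # target rank
      A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # best rank-k approximation (in the spectral / Frobenius norm)

      print(np.linalg.matrix_rank(A_hat))                # 2
      print(np.linalg.norm(A - A_hat))                   # small when the discarded singular values are small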

  51. Solving Least-Squares. Start with two points (x_i, y_i):

      y = Av:    [ y_1 ]   [ x_1  1 ] [ m ]          [ y_1 ]   [ m x_1 + b ]
                 [ y_2 ] = [ x_2  1 ] [ b ]          [ y_2 ] = [ m x_2 + b ]

  We know how to solve this: invert A and find v (i.e., the (m, b) that fits the points).

  52. Solving Least-Squares. Start with two points (x_i, y_i):

      y = Av:    [ y_1 ]   [ x_1  1 ] [ m ]
                 [ y_2 ] = [ x_2  1 ] [ b ]

      ||y - Av||^2 = ( y_1 - (m x_1 + b) )^2 + ( y_2 - (m x_2 + b) )^2

  The sum of squared differences between the actual value of y and what the model says y should be.

  53. Solving Least-Squares. Suppose there are n > 2 points:

      y = Av:    [ y_1 ]   [ x_1  1 ]
                 [  ⋮  ] = [  ⋮   ⋮ ] [ m ]
                 [ y_n ]   [ x_n  1 ] [ b ]

  Compute ||y - Av||^2 again:

      ||y - Av||^2 = Σ_{i=1}^{n} ( y_i - (m x_i + b) )^2

  54. Solving Least-Squares. Given y, A, and v with y = Av overdetermined (A is tall / there are more equations than unknowns), we want to minimize ||y - Av||^2, i.e., find:

      arg min_v ||y - Av||^2      (the value of v that makes the expression smallest)

  The solution satisfies

      (A^T A) v* = A^T y    or    v* = (A^T A)^(-1) A^T y

  (Don't actually compute the inverse!)
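
  A sketch of this overdetermined line fit (the line and noise below are made-up values); in practice np.linalg.lstsq, or solving the normal equations, is used rather than forming an inverse.

      import numpy as np

      x = np.linspace(0, 1, 20)
      y = 3.0 * x + 1.0 + 0.01 * np.random.randn(20)     # noisy points near the line y = 3x + 1

      A = np.column_stack([x, np.ones_like(x)])          # the tall [x_i, 1] matrix from the slides
      v, *_ = np.linalg.lstsq(A, y, rcond=None)          # minimizes ||y - Av||^2
      v_normal = np.linalg.solve(A.T @ A, A.T @ y)       # same answer via the normal equations

      print(v, np.allclose(v, v_normal))                 # approximately [3, 1]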

  55. When is Least-Squares Possible? Given y, A, and v, we want y = Av.
  • Square A: we want n outputs and have n knobs to fiddle with; every knob is useful if A is full rank.
  • Tall A: more rows (outputs) than columns (knobs). Thus we can't get the precise output we want (not enough knobs), so we settle for the "closest" knob setting.

  56. When is Least-Squares Possible? Given y, A, and v, we want y = Av.
  • Square A: we want n outputs and have n knobs to fiddle with; every knob is useful if A is full rank.
  • Wide A: more columns (knobs) than rows (outputs). Thus any output can be expressed in infinitely many ways.

  57. Homogeneous Least-Squares. Given a set of unit vectors (aka directions) x_1, ..., x_n, I want the vector v that is as orthogonal to all the x_i as possible (for some definition of orthogonal).

  Stack the x_i into A and compute Av:

      Av = [ - x_1^T - ]       [ x_1^T v ]
           [     ⋮     ] v  =  [    ⋮    ]        (each entry is 0 if v is orthogonal to that x_i)
           [ - x_n^T - ]       [ x_n^T v ]

  Compute ||Av||^2 = Σ_i ( x_i^T v )^2, the sum of how orthogonal v is to each x.

  58. Homogeneous Least-Squares.
  • A lot of the time, given a matrix A, we want to find the v that minimizes ||Av||^2.
  • I.e., we want v* = arg min_v ||Av||_2^2.
  • What's a trivial solution? Set v = 0, so Av = 0.
  • Exclude this by forcing v to have unit norm.

  59. Homogeneous Least-Squares. Let's look at ||Av||_2^2:

      Rewrite as a dot product:     ||Av||_2^2 = (Av)^T (Av)
      Distribute the transpose:     ||Av||_2^2 = v^T A^T A v

  We want the vector minimizing this quadratic form. Where have we seen this?
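
  A sketch of where this leads (cf. slide 45): over unit vectors, v^T (A^T A) v is minimized by the eigenvector of A^T A with the smallest eigenvalue, which is the last right singular vector of A.

      import numpy as np

      A = np.random.rand(6, 3)                    # rows are the stacked x_i^T

      vals, vecs = np.linalg.eigh(A.T @ A)        # eigenvalues sorted ascending
      v_eig = vecs[:, 0]                          # eigenvector with the smallest eigenvalue

      U, s, Vt = np.linalg.svd(A)
      v_svd = Vt[-1]                              # last right singular vector of A

      print(np.isclose(abs(v_eig @ v_svd), 1.0))                   # same direction up to sign
      print(np.isclose(np.linalg.norm(A @ v_svd)**2, s[-1]**2))    # the minimum of ||Av||^2 over unit v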
