Matrices

Horizontally concatenate n m-dimensional column vectors and you get an m x n matrix A (here 2x3):

$$\mathbf{A} = [\mathbf{c}_1, \cdots, \mathbf{c}_n] = \begin{bmatrix} c_1^1 & c_2^1 & c_3^1 \\ c_1^2 & c_2^2 & c_3^2 \end{bmatrix}$$

Notation: a scalar is an undecorated lowercase letter ($a$), a vector is bold or arrow lowercase ($\mathbf{a}$), and a matrix is bold uppercase ($\mathbf{A}$).
Matrices

Transpose: flip rows and columns, e.g. a 3x1 column vector transposed is a 1x3 row vector:

$$\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \qquad \mathbf{v}^T = [a, b, c]$$

Vertically concatenate m n-dimensional row vectors and you get an m x n matrix B (here 2x3):

$$\mathbf{B} = \begin{bmatrix} \mathbf{r}_1^T \\ \vdots \\ \mathbf{r}_m^T \end{bmatrix} = \begin{bmatrix} r_1^1 & r_1^2 & r_1^3 \\ r_2^1 & r_2^2 & r_2^3 \end{bmatrix}$$
Matrix-Vector Product

$$\mathbf{y}_{2\times 1} = \mathbf{A}_{2\times 3}\,\mathbf{x}_{3\times 1}$$

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} | & | & | \\ \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad \mathbf{y} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + x_3\mathbf{c}_3$$

Linear combination of the columns of A.
Matrix-Vector Product

$$\mathbf{y}_{2\times 1} = \mathbf{A}_{2\times 3}\,\mathbf{x}_{3\times 1}$$

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{r}_1^T \\ \mathbf{r}_2^T \end{bmatrix} \mathbf{x}, \qquad y_1 = \mathbf{r}_1^T\mathbf{x}, \quad y_2 = \mathbf{r}_2^T\mathbf{x}$$

Dot product between the rows of A and x.
Matrix Multiplication

Generally: A is m x n and B is n x p; the product AB is m x p.

$$\mathbf{AB} = \begin{bmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_m^T \end{bmatrix} \begin{bmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_p \\ | & & | \end{bmatrix}$$

Yes: in A, I'm referring to the rows, and in B, I'm referring to the columns.
Matrix Multiplication

Generally: A is m x n and B is n x p; the product AB is m x p.

$$\mathbf{AB} = \begin{bmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_m^T \end{bmatrix} \begin{bmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_p \\ | & & | \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1^T\mathbf{b}_1 & \cdots & \mathbf{a}_1^T\mathbf{b}_p \\ \vdots & \ddots & \vdots \\ \mathbf{a}_m^T\mathbf{b}_1 & \cdots & \mathbf{a}_m^T\mathbf{b}_p \end{bmatrix}, \qquad (\mathbf{AB})_{ij} = \mathbf{a}_i^T\mathbf{b}_j$$
Matrix Multiplication

• Dimensions must match
• Dimensions must match
• Dimensions must match
• (Yes, it's associative): ABx = (A)(Bx) = (AB)x
• (No, it's not commutative): ABx ≠ (BA)x ≠ (BxA)
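A quick numerical sanity check of the last two bullets; this snippet is my own illustration (not from the slides), using small random matrices and NumPy's @ operator.

import numpy as np
A = np.random.randn(2, 3)   # 2 x 3
B = np.random.randn(3, 3)   # 3 x 3
x = np.random.randn(3)
# Associative: grouping doesn't matter
print(np.allclose(A @ (B @ x), (A @ B) @ x))   # True
# Not commutative: even for square matrices, AB != BA in general
C = np.random.randn(3, 3)
print(np.allclose(B @ C, C @ B))               # almost always False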
Operations They Don't Teach

You probably saw matrix addition:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}$$

What is this? (FYI: e is a scalar)

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a+e & b+e \\ c+e & d+e \end{bmatrix}$$
Broadcasting

If you want to be pedantic and proper, you expand e by multiplying by a matrix of 1s (denoted 1):

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + e\,\mathbf{1}_{2\times 2} = \begin{bmatrix} a+e & b+e \\ c+e & d+e \end{bmatrix}$$

Many smart matrix libraries do this automatically. This is the source of many bugs.
Broadcasting Example

Given: an n x 2 matrix P and a 2D column vector v. Want: the n x 2 difference matrix D.

$$\mathbf{P} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \mathbf{D} = \begin{bmatrix} x_1 - a & y_1 - b \\ \vdots & \vdots \\ x_n - a & y_n - b \end{bmatrix}$$

$$\mathbf{D} = \mathbf{P} - \mathbf{v}^T = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix} - \begin{bmatrix} a & b \\ \vdots & \vdots \\ a & b \end{bmatrix}$$

The blue stuff (the stacked copies of $\mathbf{v}^T$) is assumed / broadcast.
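A minimal NumPy sketch of this example (my own illustration; the specific P and v below are made up). Note how the subtraction can go wrong if v is kept as a 2x1 column instead of being transposed to a 1x2 row, which is exactly the kind of broadcasting bug the previous slide warns about.

import numpy as np
P = np.array([[0., 1.], [2., 3.], [4., 5.]])  # n x 2, here n = 3
v = np.array([[10.], [20.]])                  # 2 x 1 column vector
D = P - v.T        # v.T has shape (1, 2); it is broadcast down the rows -> (3, 2)
print(D)           # [[-10. -19.] [ -8. -17.] [ -6. -15.]]
# P - v would raise a shape error here, but if n happened to be 2 it would
# broadcast (2, 2) against (2, 1) silently and give the wrong answer: a classic bug.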
Two Uses for Matrices

1. Storing things in a rectangular array (images, maps)
   • Typical operations: element-wise operations, convolution (which we'll cover next)
   • Atypical operations: almost anything you learned in a math linear algebra class
2. A linear operator that maps vectors to another space (Ax)
   • Typical/Atypical: reverse of the above
Images as Matrices

Suppose someone hands you this matrix (displayed as an image on the slide). What's wrong with it?
Contrast – Gamma Curve

A typical way to change the contrast is to apply a nonlinear correction:

$$\text{new pixel value} = (\text{pixel value})^{\gamma}$$

The quantity $\gamma$ controls how much contrast gets added.
Contrast – Gamma Curve

(Plot: old vs. new pixel values under the gamma curve, with the 10th, 50th, and 90th percentiles marked.)

Now the darkest regions (10th percentile) are much darker than the moderately dark regions (50th percentile).
Implementation

Python + NumPy (right way):

imNew = im**0.25

Python + NumPy (slow way – why?):

imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y, x] = im[y, x]**0.25
Results

Phew! Much better.
Element-wise Operations

Element-wise power (beware notation):

$$(\mathbf{A}^k)_{ij} = A_{ij}^k$$

"Hadamard product" / element-wise multiplication:

$$(\mathbf{A} \odot \mathbf{B})_{ij} = A_{ij} \cdot B_{ij}$$

Element-wise division:

$$(\mathbf{A}/\mathbf{B})_{ij} = \frac{A_{ij}}{B_{ij}}$$
Sums Across Axes

Suppose we have an N x 2 matrix A:

$$\mathbf{A} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{bmatrix}$$

Summing across axis 1 gives an N-D column vector:

$$\Sigma(\mathbf{A}, 1) = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_N + y_N \end{bmatrix}$$

Summing across axis 0 gives a 2D row vector:

$$\Sigma(\mathbf{A}, 0) = \left[ \sum_{i=1}^{N} x_i, \; \sum_{i=1}^{N} y_i \right]$$

Note: libraries distinguish between an N-D column vector and an N x 1 matrix.
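A short NumPy illustration of these two sums (my own sketch, not from the slides); keepdims controls whether you get a flat vector or an N x 1 / 1 x 2 matrix, which is exactly the distinction the note is about.

import numpy as np
A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # N x 2 with N = 3
rowSums = np.sum(A, axis=1)                    # shape (3,)  -> [3., 7., 11.]
colSums = np.sum(A, axis=0)                    # shape (2,)  -> [9., 12.]
rowSumsCol = np.sum(A, axis=1, keepdims=True)  # shape (3, 1), an N x 1 matrix
colSumsRow = np.sum(A, axis=0, keepdims=True)  # shape (1, 2), a 1 x 2 matrix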
Vectorizing Example

• Suppose I represent each image as a 128-dimensional vector
• I want to compute all the pairwise distances between {x_1, ..., x_N} and {y_1, ..., y_M} so I can find, for every x_i, the nearest y_j
• Identity: $\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y}$
• Or: $\|\mathbf{x} - \mathbf{y}\| = \left(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y}\right)^{1/2}$
Vectorizing Example

Stack the vectors as rows of matrices X (N x 128) and Y (M x 128):

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_N^T \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} \mathbf{y}_1^T \\ \vdots \\ \mathbf{y}_M^T \end{bmatrix}, \quad \mathbf{Y}^T = \begin{bmatrix} | & & | \\ \mathbf{y}_1 & \cdots & \mathbf{y}_M \\ | & & | \end{bmatrix}$$

Compute an N x 1 vector of squared norms (can also do M x 1):

$$\Sigma(\mathbf{X}^2, 1) = \begin{bmatrix} \|\mathbf{x}_1\|^2 \\ \vdots \\ \|\mathbf{x}_N\|^2 \end{bmatrix}$$

Compute an N x M matrix of dot products:

$$(\mathbf{X}\mathbf{Y}^T)_{ij} = \mathbf{x}_i^T\mathbf{y}_j$$
Vectorizing Example

$$\mathbf{D} = \left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T \right)^{1/2}$$

Why? Broadcasting the two norm vectors against each other gives the N x M matrix of all sums of squared norms:

$$\left(\Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T\right)_{ij} = \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2, \qquad \begin{bmatrix} \|\mathbf{x}_1\|^2 + \|\mathbf{y}_1\|^2 & \cdots & \|\mathbf{x}_1\|^2 + \|\mathbf{y}_M\|^2 \\ \vdots & \ddots & \vdots \\ \|\mathbf{x}_N\|^2 + \|\mathbf{y}_1\|^2 & \cdots & \|\mathbf{x}_N\|^2 + \|\mathbf{y}_M\|^2 \end{bmatrix}$$
Vectorizing Example

$$\mathbf{D} = \left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T \right)^{1/2}, \qquad D_{ij} = \left( \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2 - 2\mathbf{x}_i^T\mathbf{y}_j \right)^{1/2}$$

Numpy code:

XNorm = np.sum(X**2, axis=1, keepdims=True)
YNorm = np.sum(Y**2, axis=1, keepdims=True)
D = (XNorm + YNorm.T - 2*np.dot(X, Y.T))**0.5

*May have to make sure the quantity under the square root is at least 0 (sometimes roundoff issues happen).
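One way to handle that roundoff footnote, as a sketch (my own addition, not from the slides; it reuses X, Y, XNorm, YNorm from the snippet above): clamp the squared distances at zero before taking the square root.

Dsq = XNorm + YNorm.T - 2*np.dot(X, Y.T)
D = np.maximum(Dsq, 0)**0.5   # tiny negative values from roundoff become 0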
Does it Make a Difference?

Computing pairwise distances between 300 and 400 128-dimensional vectors:

1. for x in X, for y in Y, using native Python: 9s
2. for x in X, for y in Y, using numpy to compute the distance: 0.8s
3. vectorized: 0.0045s (~2000x faster than 1, ~175x faster than 2)

Expressing things in primitives that are optimized is usually faster.
Linear Independence

A set of vectors is linearly independent if you can't write one as a linear combination of the others.

Suppose:

$$\mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 0 \\ 6 \\ 0 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 0 \\ -2 \\ 4 \end{bmatrix} = 2\mathbf{a} - \tfrac{1}{3}\mathbf{b}$$

• Is the set {a, b, c} linearly independent?
• Is the set {a, b, x} linearly independent?
• Max # of independent 3D vectors?
Span

Span: all linear combinations of a set of vectors.

Span({[0, 1]}) = ? The vertical line through the origin:

$$\text{Span}(\{[0,1]\}) = \{ \alpha [0,1] : \alpha \in \mathbb{R} \}$$

(Figure: is the blue vector in the red vector's span?)
Span

Span: all linear combinations of a set of vectors.

Span of the two vectors shown = ? (Figure: first pair of example vectors.)
Span

Span: all linear combinations of a set of vectors.

Span of the two vectors shown = ? (Figure: second pair of example vectors.)
Matrix-Vector Product

Right-multiplying A by x mixes the columns of A according to the entries of x:

$$\mathbf{A}\mathbf{x} = \begin{bmatrix} | & & | \\ \mathbf{c}_1 & \cdots & \mathbf{c}_n \\ | & & | \end{bmatrix} \mathbf{x}$$

• The output space of f(x) = Ax is constrained to be the span of the columns of A.
• You can't output things you can't construct out of your columns.
An Intuition

$$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

(Diagram: knobs x_1, x_2, x_3 feed a machine A that produces outputs y_1, y_2, y_3.)

x: knobs on the machine (e.g., fuel, brakes)
y: state of the world (e.g., where you are)
A: the machine (e.g., your car)
Linear Independence

Suppose the columns of a 3x3 matrix A are not linearly independent (c_1, αc_1, c_2 for instance):

$$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c}_1 & \alpha\mathbf{c}_1 & \mathbf{c}_2 \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

$$\mathbf{y} = x_1\mathbf{c}_1 + \alpha x_2\mathbf{c}_1 + x_3\mathbf{c}_2 = (x_1 + \alpha x_2)\mathbf{c}_1 + x_3\mathbf{c}_2$$
Linear Independence Intuition

The knobs of x are redundant. Even if y has 3 outputs, you can only control it in two directions:

$$\mathbf{y} = (x_1 + \alpha x_2)\mathbf{c}_1 + x_3\mathbf{c}_2$$

(Diagram: knobs x_1, x_2, x_3 feeding machine A, outputs y_1, y_2, y_3.)
Linear Independence

Recall: $\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\mathbf{c}_1 + x_3\mathbf{c}_2$

$$\mathbf{A}\begin{bmatrix} x_1 + \gamma \\ x_2 - \gamma/\alpha \\ x_3 \end{bmatrix} = \left( x_1 + \gamma + \alpha x_2 - \alpha\tfrac{\gamma}{\alpha} \right)\mathbf{c}_1 + x_3\mathbf{c}_2 = (x_1 + \alpha x_2)\mathbf{c}_1 + x_3\mathbf{c}_2$$

• Can write y an infinite number of ways by adding γ to x_1 and subtracting γ/α from x_2
• Or: given a vector y, there's not a unique vector x s.t. y = Ax
• Not all y have a corresponding x s.t. y = Ax
Linear Independence

$$\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\mathbf{c}_1 + x_3\mathbf{c}_2$$

$$\mathbf{A}\begin{bmatrix} \gamma \\ -\gamma/\alpha \\ 0 \end{bmatrix} = \left( \gamma - \alpha\tfrac{\gamma}{\alpha} \right)\mathbf{c}_1 + 0\,\mathbf{c}_2 = \mathbf{0}$$

• What else can we cancel out?
• An infinite number of non-zero vectors x can map to the zero vector y
• This set is called the right null-space of A.
Rank

• Rank of an n x n matrix A: the number of linearly independent columns (or rows) of A / the dimension of the span of the columns
• Matrices with full rank (n x n, rank n) behave nicely: they can be inverted, span the full output space, and are one-to-one
• Matrices with full rank are machines where every knob is useful and every output state can be made by the machine
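A small NumPy check of this idea (my own sketch): the second matrix below has a column that is a multiple of another, so its rank drops and it cannot be reliably inverted.

import numpy as np
A = np.array([[1., 0., 0.], [0., 2., 0.], [0., 0., 3.]])
B = np.array([[1., 2., 0.], [2., 4., 0.], [0., 0., 3.]])   # column 2 = 2 * column 1
print(np.linalg.matrix_rank(A))   # 3 -> full rank, invertible
print(np.linalg.matrix_rank(B))   # 2 -> rank-deficient, not invertible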
Inverses

• Given y = Ax, y is a linear combination of the columns of A with coefficients given by x. If A is full-rank, we should be able to invert this mapping.
• Given some y (output) and A, what x (inputs) produced it?
• x = A^-1 y
• Note: if you don't need the inverse itself, never ever compute it. Solving for x directly is much faster and more stable than obtaining A^-1.
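The standard way to follow that advice in NumPy, as a sketch (my own example): use a linear solver rather than forming the inverse.

import numpy as np
A = np.array([[3., 1.], [1., 2.]])
y = np.array([9., 8.])
x = np.linalg.solve(A, y)        # preferred: solves Ax = y directly
x_bad = np.linalg.inv(A) @ y     # works, but slower and less numerically stable
print(np.allclose(A @ x, y))     # True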
Symmetric Matrices

• Symmetric: $\mathbf{A}^T = \mathbf{A}$, or equivalently $\mathbf{A}_{ij} = \mathbf{A}_{ji}$

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

• Symmetric matrices have lots of special properties

Any matrix of the form $\mathbf{A} = \mathbf{B}^T\mathbf{B}$ is symmetric. Quick check:

$$\mathbf{A}^T = (\mathbf{B}^T\mathbf{B})^T = \mathbf{B}^T(\mathbf{B}^T)^T = \mathbf{B}^T\mathbf{B} = \mathbf{A}$$
Special Matrices – Rotations

$$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$$

• Rotation matrices R rotate vectors and do not change vector L2 norms ($\|\mathbf{R}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$)
• Every row/column is unit norm
• Every row is linearly independent
• Transpose is inverse: $\mathbf{R}\mathbf{R}^T = \mathbf{R}^T\mathbf{R} = \mathbf{I}$
• Determinant is 1 (otherwise it's also a coordinate flip/reflection); eigenvalues have magnitude 1
Eigensystems

• An eigenvector $\mathbf{v}_i$ and eigenvalue $\lambda_i$ of a matrix $\mathbf{A}$ satisfy $\mathbf{A}\mathbf{v}_i = \lambda_i\mathbf{v}_i$ (so $\mathbf{A}\mathbf{v}_i$ is just $\mathbf{v}_i$ scaled by $\lambda_i$)
• Vectors and values are always paired, and typically you assume $\|\mathbf{v}_i\|_2 = 1$
• The biggest eigenvalue of A gives a bound on how much $\mathbf{y} = \mathbf{A}\mathbf{x}$ stretches a vector x
• Hints of what people really mean:
  • "Largest eigenvector" = the eigenvector whose eigenvalue is largest
  • "Spectral" just means eigenvectors are involved
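A quick NumPy sketch of the definition (my own example): np.linalg.eig returns the paired values and unit-norm vectors, and we can check Av = λv directly.

import numpy as np
A = np.array([[2., 1.], [1., 2.]])
vals, vecs = np.linalg.eig(A)      # vals[i] pairs with the column vecs[:, i]
v0, lam0 = vecs[:, 0], vals[0]
print(np.allclose(A @ v0, lam0 * v0))   # True: A v = lambda v
print(np.linalg.norm(v0))               # ~1.0: eigenvectors come back unit-norm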
Suppose I have points in a grid
Now I apply f(x) = Ax to these points. (Figure: each point x drawn as an arrow; the pointy end is Ax, the non-pointy end is x.)
$$\mathbf{A} = \begin{bmatrix} 1.1 & 0 \\ 0 & 1.1 \end{bmatrix}$$

Red box: the unit square. Blue box: after f(x) = Ax. What are the yellow lines and why?
$$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$$

Now I apply f(x) = Ax to these points. Pointy end: Ax. Non-pointy end: x.
$$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$$

Red box: the unit square. Blue box: after f(x) = Ax. What are the yellow lines and why?
$$\mathbf{A} = \begin{bmatrix} \cos(t) & -\sin(t) \\ \sin(t) & \cos(t) \end{bmatrix}$$

Red box: the unit square. Blue box: after f(x) = Ax. Can we draw any yellow lines?
Eigenvectors of Symmetric Matrices

• A symmetric matrix always has n mutually orthogonal eigenvectors with n (not necessarily distinct) eigenvalues
• For symmetric $\mathbf{A}$, the eigenvector with the largest eigenvalue maximizes $\frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$ (and the one with the smallest eigenvalue minimizes it)
• So for unit vectors (where $\mathbf{x}^T\mathbf{x} = 1$), that eigenvector maximizes $\mathbf{x}^T\mathbf{A}\mathbf{x}$
• A surprisingly large number of optimization problems rely on (max/min)imizing this
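A small numerical check of that claim (my own sketch, with a made-up symmetric matrix): no random unit vector beats the top eigenvector's value of the quadratic form.

import numpy as np
A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]])   # symmetric
vals, vecs = np.linalg.eigh(A)     # eigh: for symmetric matrices, eigenvalues ascending
vTop = vecs[:, -1]                 # eigenvector with the largest eigenvalue
best = vTop @ A @ vTop             # equals vals[-1]
for _ in range(1000):
    x = np.random.randn(3)
    x /= np.linalg.norm(x)         # random unit vector
    assert x @ A @ x <= best + 1e-9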
The Singular Value Decomposition

Can always write an m x n matrix A as $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$:

$$\mathbf{A} = \mathbf{U}\begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix}\cdots$$

U: rotation (the eigenvectors of $\mathbf{A}\mathbf{A}^T$). $\mathbf{\Sigma}$: scale (the square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$).
The Singular Value Decomposition

Can always write an m x n matrix A as $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$:

$$\mathbf{A} = \mathbf{U}\,\mathbf{\Sigma}\,\mathbf{V}^T$$

U: rotation (eigenvectors of $\mathbf{A}\mathbf{A}^T$). $\mathbf{\Sigma}$: scale (square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$). $\mathbf{V}^T$: rotation (eigenvectors of $\mathbf{A}^T\mathbf{A}$).
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank / number of linearly independent vectors
• "Closest" matrix to A with a lower rank

$$\mathbf{A} = \mathbf{U}\begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix}\mathbf{V}^T$$
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank / number of linearly independent vectors
• "Closest" matrix to A with a lower rank: zero out the smallest singular value

$$\tilde{\mathbf{A}} = \mathbf{U}\begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix}\mathbf{V}^T$$
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank / number of linearly independent vectors
• "Closest" matrix to A with a lower rank
• Secretly behind many of the things you do with matrices
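A compact NumPy sketch of these bullets (my own illustration): take the SVD, count the non-zero singular values, and truncate them to build a lower-rank approximation.

import numpy as np
A = np.random.randn(5, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
print(np.sum(s > 1e-10))                           # number of non-zero singular values = rank
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation of A
print(np.allclose(A, U @ np.diag(s) @ Vt))         # True: full reconstruction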
Solving Least-Squares

Start with two points (x_i, y_i):

$$\mathbf{y} = \mathbf{A}\mathbf{v}, \qquad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} mx_1 + b \\ mx_2 + b \end{bmatrix}$$

We know how to solve this: invert A and find v (i.e., the (m, b) that fits the points).
Solving Least-Squares

Start with two points (x_i, y_i):

$$\mathbf{y} = \mathbf{A}\mathbf{v}, \qquad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix}$$

$$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2 = \left\| \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - \begin{bmatrix} mx_1 + b \\ mx_2 + b \end{bmatrix} \right\|_2^2 = \left(y_1 - (mx_1 + b)\right)^2 + \left(y_2 - (mx_2 + b)\right)^2$$

The sum of squared differences between the actual value of y and what the model says y should be.
Solving Least-Squares

Suppose there are n > 2 points:

$$\mathbf{y} = \mathbf{A}\mathbf{v}, \qquad \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix}$$

Compute $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2$ again:

$$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n}\left(y_i - (mx_i + b)\right)^2$$
Solving Least-Squares

Given y, A, and v with y = Av overdetermined (A tall / more equations than unknowns), we want to minimize $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2$, or find:

$$\mathbf{v}^* = \arg\min_{\mathbf{v}} \|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2$$

(The value of v that makes the expression smallest.)

The solution satisfies $\mathbf{A}^T\mathbf{A}\mathbf{v}^* = \mathbf{A}^T\mathbf{y}$, or

$$\mathbf{v}^* = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$$

(Don't actually compute the inverse!)
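One way to do this in practice, as a sketch (my own example with made-up points): np.linalg.lstsq solves the least-squares problem without ever forming the inverse.

import numpy as np
xs = np.array([0., 1., 2., 3.])
ys = np.array([1.1, 2.9, 5.2, 6.8])           # roughly y = 2x + 1 with noise
A = np.stack([xs, np.ones_like(xs)], axis=1)  # n x 2 matrix of rows [x_i, 1]
v, residuals, rank, svals = np.linalg.lstsq(A, ys, rcond=None)
m, b = v
print(m, b)   # slope and intercept minimizing the sum of squared errors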
When is Least-Squares Possible?

Given y, A, and v. Want y = Av.

Square A (outputs = knobs): want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.

Tall A (rows / outputs > columns / knobs): can't get precisely the output you want (not enough knobs), so settle for the "closest" knob setting.
When is Least-Squares Possible?

Given y, A, and v. Want y = Av.

Square A (outputs = knobs): want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.

Wide A (columns / knobs > rows / outputs): any output can be expressed in infinitely many ways.
Homogeneous Least-Squares

Given a set of unit vectors (aka directions) $\mathbf{x}_1, \ldots, \mathbf{x}_n$, I want the vector $\mathbf{v}$ that is as orthogonal to all the $\mathbf{x}_i$ as possible (for some definition of orthogonal).

Stack the $\mathbf{x}_i^T$ into A and compute Av:

$$\mathbf{A}\mathbf{v} = \begin{bmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_n^T \end{bmatrix}\mathbf{v} = \begin{bmatrix} \mathbf{x}_1^T\mathbf{v} \\ \vdots \\ \mathbf{x}_n^T\mathbf{v} \end{bmatrix} \approx \mathbf{0} \text{ if orthogonal}$$

$$\|\mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n}\left(\mathbf{x}_i^T\mathbf{v}\right)^2$$

A sum of how orthogonal v is to each x.
Homogeneous Least-Squares

• A lot of the time, given a matrix A, we want to find the v that minimizes $\|\mathbf{A}\mathbf{v}\|_2^2$
• I.e., we want $\mathbf{v}^* = \arg\min_{\mathbf{v}} \|\mathbf{A}\mathbf{v}\|_2^2$
• What's a trivial solution?
• Set v = 0 → Av = 0
• Exclude this by forcing v to have unit norm
Homogeneous Least-Squares

Let's look at $\|\mathbf{A}\mathbf{v}\|_2^2$:

$$\|\mathbf{A}\mathbf{v}\|_2^2 = (\mathbf{A}\mathbf{v})^T(\mathbf{A}\mathbf{v}) \quad \text{(rewrite as a dot product)}$$
$$= \mathbf{v}^T\mathbf{A}^T\mathbf{A}\mathbf{v} \quad \text{(distribute the transpose)}$$

We want the unit vector minimizing this quadratic form. Where have we seen this? ($\mathbf{A}^T\mathbf{A}$ is symmetric, so the minimizer is its eigenvector with the smallest eigenvalue.)
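A minimal sketch of how this is usually solved numerically (my own example): the minimizing unit vector is the eigenvector of A^T A with the smallest eigenvalue, equivalently the last right singular vector of A.

import numpy as np
A = np.random.randn(10, 3)               # rows are the directions x_i^T
# Option 1: eigenvector of A^T A with the smallest eigenvalue
vals, vecs = np.linalg.eigh(A.T @ A)     # eigenvalues returned in ascending order
v1 = vecs[:, 0]
# Option 2: last right singular vector of A
U, s, Vt = np.linalg.svd(A)
v2 = Vt[-1]
print(np.allclose(abs(v1 @ v2), 1.0))    # same direction (up to sign)
# Both achieve the minimum possible ||Av|| over unit vectors; the two values match
print(np.linalg.norm(A @ v1), np.linalg.norm(A @ v2))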