Proof
• Let P = [ P_1 ; P_2 ] with P_1 the first r rows and P_2 the last m − r rows, and write P^{-1} = [ Q_1 | Q_2 ], so that
  PA = [ P_1 ; P_2 ] A = U = [ U_r ; 0 ]
• From P^{-1} P = I we get Q_1 P_1 + Q_2 P_2 = I , i.e. Q_1 P_1 = I − Q_2 P_2
• Proof ( ⇒ )
  PA = [ P_1 A ; P_2 A ] = [ U_r ; 0 ] ⇒ P_2 A = 0
• Proof ( ⇐ ): y^T A = 0 ⇒ y^T is a combination of the rows of P_2
  y^T A = 0 ⇒ y^T P^{-1} U = 0 ⇒ y^T [ Q_1 | Q_2 ] [ U_r ; 0 ] = 0 ⇒ y^T Q_1 U_r = 0
  ⇒ y^T Q_1 = 0 (the rows of U_r are linearly independent)
  ⇒ y^T Q_1 P_1 = 0 ⇒ y^T ( I − Q_2 P_2 ) = 0 ⇒ y^T = y^T Q_2 P_2 = ( y^T Q_2 ) P_2
Example
• Using Gauss-Jordan on ( A | I ):
  [ 1 2 2 3 | 1 0 0 ]
  [ 2 4 1 3 | 0 1 0 ]
  [ 3 6 1 4 | 0 0 1 ]
• Reducing gives ( E_A | P ):
  [ 1 2 0 1 | −1/3  2/3 0 ]
  [ 0 0 1 1 |  2/3 −1/3 0 ]
  [ 0 0 0 0 |  1/3 −5/3 1 ]
• So
  P = [ −1/3 2/3 0 ; 2/3 −1/3 0 ; 1/3 −5/3 1 ]
• From the last (zero) row of E_A,
  N ( A^T ) = span{ ( 1/3 , −5/3 , 1 )^T }
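As a quick numerical sanity check of this example (a minimal sketch assuming Python with NumPy; the matrices are the ones computed above):

```python
import numpy as np

# A and the P obtained by Gauss-Jordan on (A | I) in the example above.
A = np.array([[1., 2., 2., 3.],
              [2., 4., 1., 3.],
              [3., 6., 1., 4.]])
P = np.array([[-1/3,  2/3, 0.],
              [ 2/3, -1/3, 0.],
              [ 1/3, -5/3, 1.]])

print(P @ A)        # rows of E_A; the last row is zero
p2 = P[2]           # last row of P
print(p2 @ A)       # ~[0 0 0 0], so p2 spans N(A^T)
```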
Additional insights
• We have shown that N ( A^T ) = R ( P_2^T )
• It turns out that R ( A ) = N ( P_2 )
• Proof of R ( A ) ⊆ N ( P_2 ): if y = Ax , then
  P_2 y = P_2 A x = ( P_2 A ) x = 0 x = 0
• Proof of R ( A ) ⊇ N ( P_2 ): if P_2 y = 0 , then ∃ x with Ax = y , because the augmented system is consistent:
  P ( A | y ) = ( PA | Py ) = [ U_r  P_1 y ; 0  P_2 y ] = [ U_r  P_1 y ; 0  0 ]
Equal Nullspaces #1
• We already know how to test for equality of range spaces: row and column equivalence
• How do we test for nullspace equality?
• Use equivalence again
• For two matrices A and B of the same shape
  N ( A ) = N ( B ) ⇔ A ∼row B
• Similarly,
  N ( A^T ) = N ( B^T ) ⇔ A ∼col B
Equal Nullspaces #2
• Let's prove one of them: N ( A^T ) = N ( B^T ) ⇔ A ∼col B
• Proof ( ⇒ )
  N ( A^T ) = N ( B^T ) ⇒ R ( A ) = R ( B ) ⇔ A ∼col B
• Take z = B x_2 in R ( B ) and show z ∈ R ( A ): since y^T A = 0 ⇔ y^T B = 0 , the rows of P_2 also annihilate B , i.e. P_2 B = 0 , so
  ( A | B x_2 ) → ( PA | P B x_2 ) = [ P_1 A  P_1 B x_2 ; P_2 A  P_2 B x_2 ] = [ P_1 A  P_1 B x_2 ; 0  0 ]
• The augmented system is consistent, so B x_2 ∈ R ( A ); by symmetry R ( A ) = R ( B )
Equal Nullspaces #3
• Let's prove one of them: N ( A^T ) = N ( B^T ) ⇔ A ∼col B
• Proof ( ⇐ ): A ∼col B means A = BQ with Q nonsingular
  P_2 A = 0 ⇒ P_2 B Q = 0 ⇒ P_2 B = 0 ⇒ N ( A^T ) ⊆ N ( B^T )
• Conversely, N ( B^T ) ⊆ N ( A^T ) (swap the roles of A and B in the proof)
Summary #1
• The four fundamental subspaces associated with a matrix A ( m × n ) are
• The range or column space R ( A ) = { Ax } ⊆ R^m
• The row space or left-hand range R ( A^T ) = { A^T y } ⊆ R^n
• The nullspace N ( A ) = { x | Ax = 0 } ⊆ R^n
• The left-hand nullspace N ( A^T ) = { y | y^T A = 0 } ⊆ R^m
Summary #2
• Let P be a nonsingular matrix such that PA = U , where U is in echelon form, and let rank ( A ) = r
• Spanning sets for
• R ( A ) : the basic columns of A
• R ( A^T ) : the non-zero rows of U (transposed)
• N ( A ) : the vectors h_i multiplying the free variables in the general solution of Ax = 0
• N ( A^T ) : the last m − r rows of P (transposed)
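For a numerical illustration of the four subspaces (a sketch assuming Python with NumPy/SciPy; note these routines return orthonormal bases rather than the symbolic spanning sets above, but the dimensions match):

```python
import numpy as np
from scipy.linalg import orth, null_space

A = np.array([[1., 2., 2., 3.],
              [2., 4., 1., 3.],
              [3., 6., 1., 4.]])
r = np.linalg.matrix_rank(A)       # r = 2

col_space = orth(A)                # basis of R(A):   r columns
row_space = orth(A.T)              # basis of R(A^T): r columns
nullsp    = null_space(A)          # basis of N(A):   n - r columns
left_null = null_space(A.T)        # basis of N(A^T): m - r columns
print(r, col_space.shape[1], row_space.shape[1],
      nullsp.shape[1], left_null.shape[1])
```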
Summary #3
• If A and B are matrices of the same shape
  A ∼col B ⇔ R ( A ) = R ( B ) ⇔ N ( A^T ) = N ( B^T )
  A ∼row B ⇔ R ( A^T ) = R ( B^T ) ⇔ N ( A ) = N ( B )
LINEAR INDEPENDENCE, BASIS, AND DIMENSION
Linear independence • Matrix dimensions give an incomplete picture of the true size of a linear system • The important number is the rank • Number of pivots • Number of non-zero rows in echelon form • Better interpretation • Number of genuinely independent rows in matrix • Other rows are redundant
Formally
• Take a set of vectors S = { v_1 , v_2 , . . . , v_r }
• Look at linear combinations α_1 v_1 + α_2 v_2 + · · · + α_r v_r
• The vectors v_i are linearly independent (l.i.) iff the only linear combination that produces 0 is the trivial one, α_i = 0 for all i
• Otherwise they are linearly dependent (l.d.)
• In that case one of them is a linear combination of the others
Easy to visualize in R^3
• 2 vectors are dependent iff they lie on a common line through the origin
• 3 vectors are dependent iff they lie on a common plane through the origin (or a line)
• 4 vectors are always dependent
• 3 random vectors should be independent
Example
• Determine if the set of vectors
  S = { ( 1 , 2 , 1 )^T , ( 1 , 0 , 2 )^T , ( 5 , 6 , 7 )^T }
  is l.i.
• Look for a non-trivial solution to α_1 v_1 + α_2 v_2 + α_3 v_3 = 0 , i.e. to the homogeneous linear system
  [ 1 1 5 ; 2 0 6 ; 1 2 7 ] ( α_1 , α_2 , α_3 )^T = 0
• From Gauss-Jordan
  E_A = [ 1 0 3 ; 0 1 2 ; 0 0 0 ]
• So they are l.d., e.g. α_1 = −3 , α_2 = −2 , α_3 = 1
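The same check can be done numerically (a minimal sketch assuming NumPy): the set is dependent exactly when the rank of the matrix with the vectors as columns is smaller than the number of vectors.

```python
import numpy as np

# Columns are the vectors of S.
V = np.array([[1., 1., 5.],
              [2., 0., 6.],
              [1., 2., 7.]])

print(np.linalg.matrix_rank(V) < V.shape[1])   # True: linearly dependent

alpha = np.array([-3., -2., 1.])               # the non-trivial combination found above
print(V @ alpha)                               # ~[0 0 0]
```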
Linear independence and Matrices
• Let A be an m × n matrix
• These are equivalent to saying the columns of A form a linearly independent set
  N ( A ) = { 0 }    rank ( A ) = n
• These are equivalent to saying the rows of A form a linearly independent set
  N ( A^T ) = { 0 }    rank ( A ) = m
• If A is square, these are equivalent to saying A is non-singular
• Columns of A form a linearly independent set
• Rows of A form a linearly independent set
Diagonal dominance
• An n × n matrix A = [ a_ij ] is diagonally dominant whenever
  | a_kk | > Σ_{ j ≠ k } | a_kj |   for every k ∈ { 1 , 2 , . . . , n }
• I.e., each diagonal element is larger in magnitude than the sum of the magnitudes of the other elements in its row
• These matrices appear frequently in practical applications. Two important properties are
• They are never singular
• Partial pivoting is not needed when eliminating them
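A diagonal-dominance test is short to write; below is a minimal sketch (assuming NumPy, with a made-up example matrix), together with a check that the dominant example is indeed non-singular:

```python
import numpy as np

def is_diagonally_dominant(A):
    """Strict row dominance: |a_kk| > sum of |a_kj| over j != k, for every k."""
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag > off))

A = np.array([[ 4., -1.,  1.],
              [ 2.,  5., -1.],
              [ 0.,  1., -3.]])
print(is_diagonally_dominant(A))               # True
print(np.linalg.matrix_rank(A) == A.shape[0])  # True: non-singular, as claimed
```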
Diagonal dominance
• Diagonally dominant matrices are non-singular
• Proof by contradiction
• Assume there is a non-zero vector in N ( A ) and find a contradiction
• Let Ax = 0 with x ≠ 0 , and let x_k be the entry of largest magnitude in x
  [ Ax ]_k = 0 ⇒ a_kk x_k = − Σ_{ j ≠ k } a_kj x_j
  ⇒ | a_kk | | x_k | = | Σ_{ j ≠ k } a_kj x_j | ≤ Σ_{ j ≠ k } | a_kj | | x_j | ≤ | x_k | Σ_{ j ≠ k } | a_kj |
  ⇒ | a_kk | ≤ Σ_{ j ≠ k } | a_kj | , contradicting diagonal dominance
Polynomial interpolation
• Given a set of m points S = { ( x_1 , y_1 ) , . . . , ( x_m , y_m ) } where the x_i are distinct, there is a unique polynomial
  ℓ ( t ) = α_0 + α_1 t + · · · + α_{m−1} t^{m−1}
  of degree at most m − 1 that goes through each point in S
  α_0 + α_1 x_1 + α_2 x_1^2 + · · · + α_{m−1} x_1^{m−1} = ℓ ( x_1 ) = y_1
  α_0 + α_1 x_2 + α_2 x_2^2 + · · · + α_{m−1} x_2^{m−1} = ℓ ( x_2 ) = y_2
  . . .
  α_0 + α_1 x_m + α_2 x_m^2 + · · · + α_{m−1} x_m^{m−1} = ℓ ( x_m ) = y_m
Polynomial interpolation
• Same as saying the following system has a unique solution for any right-hand side y_i
  [ 1 x_1 x_1^2 · · · x_1^{m−1} ; 1 x_2 x_2^2 · · · x_2^{m−1} ; . . . ; 1 x_m x_m^2 · · · x_m^{m−1} ] ( α_0 , α_1 , . . . , α_{m−1} )^T = ( y_1 , y_2 , . . . , y_m )^T
• The matrix is non-singular whenever the x_i are distinct
• Such matrices are called Vandermonde matrices
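For a small illustration (a sketch assuming NumPy, with made-up data points), np.vander builds this matrix, and the system can be solved directly when the x_i are distinct:

```python
import numpy as np

xs = np.array([1., 2., 4.])          # distinct abscissae (made-up example data)
ys = np.array([2., 3., 1.])

V = np.vander(xs, increasing=True)   # columns 1, x, x^2
alpha = np.linalg.solve(V, ys)       # unique solution since the x_i are distinct
print(np.allclose(V @ alpha, ys))    # True: the polynomial passes through every point
```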
Vandermonde Matrices
• Vandermonde matrices have independent columns whenever n ≤ m
  [ 1 x_1 x_1^2 · · · x_1^{n−1} ; 1 x_2 x_2^2 · · · x_2^{n−1} ; . . . ; 1 x_m x_m^2 · · · x_m^{n−1} ] ( α_0 , α_1 , . . . , α_{n−1} )^T = 0
• Proof: each row says
  p ( x_i ) = α_0 + α_1 x_i + α_2 x_i^2 + · · · + α_{n−1} x_i^{n−1} = 0
• So p ( x ) has m distinct roots but degree at most n − 1 < m
• The fundamental theorem of algebra implies p is the zero polynomial, i.e. all α_j = 0
Lagrange interpolator
• In particular, when n = m we have that
  [ 1 x_1 x_1^2 · · · x_1^{m−1} ; . . . ; 1 x_m x_m^2 · · · x_m^{m−1} ] ( α_0 , . . . , α_{m−1} )^T = ( y_1 , . . . , y_m )^T
  has a unique solution
• The solution is the Lagrange interpolator
  ℓ ( t ) = Σ_{ i = 1 }^{ m } y_i ( Π_{ j ≠ i } ( t − x_j ) / Π_{ j ≠ i } ( x_i − x_j ) )
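A direct implementation of this formula is short; below is a minimal sketch (assuming NumPy, reusing the made-up points from the previous snippet). It should agree with the coefficients obtained by solving the Vandermonde system.

```python
import numpy as np

def lagrange(points):
    """Return the Lagrange interpolator l(t) through the (x_i, y_i) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    def l(t):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            num = np.prod([t - xj for j, xj in enumerate(xs) if j != i])
            den = np.prod([xi - xj for j, xj in enumerate(xs) if j != i])
            total += yi * num / den
        return total
    return l

pts = [(1., 2.), (2., 3.), (4., 1.)]
l = lagrange(pts)
print([l(x) for x, _ in pts])   # reproduces [2.0, 3.0, 1.0]
```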
Example of interpolation
• [Plots of interpolating polynomials, e.g. P_4 ( x ) and P_5 ( x ) , passing through the sample points]
Maximal independent subsets #1
• We know that if rank ( A_{m×n} ) < n then the columns of A must form a dependent set
• In such cases, we often want to extract a maximal independent subset of columns
• An l.i. set with as many columns of A as possible
• Such columns are sufficient to span R ( A )
Maximal independent subsets #2
• If rank ( A_{m×n} ) = r , then the following hold
• Any maximal independent subset of the columns of A contains exactly r columns
• Any maximal independent subset of the rows of A contains exactly r rows
• In particular, the r basic columns of A constitute a maximal independent subset of the columns of A
Maximal independent subsets #3
• Any maximal independent subset of the columns of A contains exactly r columns
• Proof
• Every column of A can be written as a linear combination of the r basic columns of A
• Pick k > r columns of A and show they are l.d.
  ( A_{∗s_1} A_{∗s_2} · · · A_{∗s_k} ) = ( A_{∗b_1} A_{∗b_2} · · · A_{∗b_r} ) B ,   B = [ β_{11} β_{12} · · · β_{1k} ; β_{21} β_{22} · · · β_{2k} ; . . . ; β_{r1} β_{r2} · · · β_{rk} ]
• Since rank ( B ) ≤ r < k , there is α = ( α_1 , . . . , α_k )^T ≠ 0 with B α = 0 , and then
  ( A_{∗s_1} A_{∗s_2} · · · A_{∗s_k} ) α = ( A_{∗b_1} · · · A_{∗b_r} ) B α = 0
Basic facts about Independence
• The following hold for a set of vectors S = { u_1 , u_2 , . . . , u_n } in a space V
• If S contains an l.d. subset, then S itself must be l.d.
• If S is l.i., then every subset of S is also l.i.
• If S is l.i. and v ∈ V , then S ∪ { v } is l.i. iff v ∉ span ( S )
• Proof ( ⇐ )
  α_1 u_1 + α_2 u_2 + · · · + α_n u_n + α_{n+1} v = 0 ⇒ α_{n+1} = 0 (otherwise v ∈ span ( S ))
  ⇒ α_1 u_1 + α_2 u_2 + · · · + α_n u_n = 0 ⇒ α_1 = α_2 = · · · = α_n = 0
• If S ⊆ R^m and n > m , then S is l.d.
BASIS AND DIMENSION
Bases • A basis for a vector space V is a set S that • Spans V • Is linearly independent • Spanning sets can contain redundant vectors • Bases, on the other hand, contain only necessary and sufficient information • Every vector space V has a basis • General proof depends on the axiom of choice • Bases are not unique
Examples
• The unit vectors S = { e_1 , e_2 , . . . , e_n } are a basis for R^n : the standard or canonic basis of R^n
• If A is an n × n non-singular matrix, then the set of rows and the set of columns of A each constitute a basis of R^n
• What about Z = { 0 } ?
• The set S = { 1 , x , x^2 , . . . , x^n } is a basis for all polynomials of degree n or less
• What about the vector space of all polynomials?
Characterizations of a Basis
• With V a subspace of R^m and B = { b_1 , b_2 , . . . , b_n } ⊆ V , the following are equivalent
• (1) B is a basis for V
• (2) B is a minimal spanning set for V
• (3) B is a maximal l.i. subset of V
Proof #1
• (basis) ⇒ (minimal spanning set)
• Assume B is a basis and X = { x_1 , . . . , x_k } is a smaller spanning set, k < n
• Then each b_i is a combination of the x_j :
  ( b_1 b_2 · · · b_n ) = ( x_1 x_2 · · · x_k ) A ,   A is k × n , so rank ( A ) ≤ k < n
• Hence ∃ y ≠ 0 with A y = 0 ⇒ B y = 0 ⇒ B is l.d. (contradiction)
• (minimal spanning set) ⇒ (basis)
• A minimal spanning set must be l.i.
• Otherwise remove a dependent vector to get a smaller spanning set
• So it wasn't minimal (contradiction)
Proof #2
• (maximal l.i. set) ⇒ (basis)
• If a maximal l.i. set B of V is not a basis for V , then there is v ∈ V with v ∉ span ( B )
• So B ∪ { v } is l.i. and B is not maximal (contradiction)
• (basis) ⇒ (maximal l.i. set)
• If basis B is not a maximal l.i. set, then take a larger set Y that is maximal l.i.
• We know Y is a basis
• But a basis is a minimal spanning set and B is a smaller spanning set
• So B must also be maximal
Dimension • We have just proven that, although there are many bases for V , each of them has the same number of vectors • The dimension of a space V , dim V , is the number of vectors in • Any basis of V • Any minimal spanning set for V • Any maximal independent set for V
Examples
• Z = { 0 } , then dim Z = 0
• The basis is the empty set
• L a line through the origin in R^3 : dim L = 1
• Any non-zero vector along L forms a basis for L
• P a plane through the origin in R^3 : dim P = 2
• How would we find a basis?
• dim R^n = n
• The canonic vectors form a basis
Further insights • Dimension measures the “amount of stuff” in a subspace • Point < Line < Plane < R 3 • Also measures the number of degrees of freedom in the subspace • Z : no freedom, Line: 1 degree, Plane: 2 etc • Do not confuse with number of components in a vector! Related, but not equal!
Subspace dimension
• Let M and N be vector spaces with M ⊆ N
• (1) dim M ≤ dim N
• (2) If dim M = dim N , then M = N
• Proof
• (1) Assume dim M > dim N
• A basis of M (a set of l.i. elements of N ) would have more vectors than a maximal independent set of N
• (2) Assume M ⊂ N
• Augment a basis of M with some v ∈ N \ M
• This gives an independent set with more than dim N vectors!
Four Fundamental Subspaces: Dimension • For an m × n matrix A with rank ( A ) = r • dim R ( A ) = r • dim N ( A ) = n – r • dim R ( A T ) = r • dim N ( A T ) = m – r
Rank Plus Nullity Theorem • For all m × n matrices A dim R ( A ) + dim N ( A ) = n • As the “amount of stuff” in R ( A ) grows, the “amount of stuff” in N ( A ) shrinks • ( dim N ( A ) was traditionally known as nullity)
Completing a Basis
• If S_r = { v_1 , v_2 , . . . , v_r } is an l.i. subset of an n-dimensional space V , where r < n , show how to extend S_r with { v_{r+1} , . . . , v_n } so that S_n = { v_1 , . . . , v_r , v_{r+1} , . . . , v_n } forms a basis for V
• Solution
• Create a matrix A with S_r as columns
• Augment by the identity matrix to form ( A | I )
• Reduce to echelon form to find the basic columns
• Return the n basic columns of ( A | I )
Example
• Take two l.i. vectors in R^4 and complete them to a basis for R^4
  S_2 = { ( 1 , 0 , −1 , 2 )^T , ( 0 , 0 , 1 , −2 )^T }
• Solution
  ( A | I ) = [ 1 0 | 1 0 0 0 ; 0 0 | 0 1 0 0 ; −1 1 | 0 0 1 0 ; 2 −2 | 0 0 0 1 ]
  E_{( A | I )} = [ 1 0 | 1 0 0 0 ; 0 1 | 1 0 0 −1/2 ; 0 0 | 0 1 0 0 ; 0 0 | 0 0 1 1/2 ]
• The basic columns are columns 1, 2, 4, and 5, so
  S_4 = { ( 1 , 0 , −1 , 2 )^T , ( 0 , 0 , 1 , −2 )^T , ( 0 , 1 , 0 , 0 )^T , ( 0 , 0 , 1 , 0 )^T }
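The same recipe can be coded by scanning the columns of ( A | I ) and keeping those that increase the rank (a minimal sketch assuming NumPy; the greedy scan picks exactly the basic columns):

```python
import numpy as np

def complete_basis(vectors, n):
    """Extend an l.i. set to a basis of R^n by scanning the columns of (A | I)."""
    cols = list(vectors) + [np.eye(n)[:, i] for i in range(n)]
    basis, M = [], np.empty((n, 0))
    for c in cols:
        cand = np.hstack([M, c.reshape(-1, 1)])
        if np.linalg.matrix_rank(cand) > M.shape[1]:   # c is a basic column
            basis.append(c)
            M = cand
        if len(basis) == n:
            break
    return basis

v1 = np.array([1., 0., -1., 2.])
v2 = np.array([0., 0., 1., -2.])
print(complete_basis([v1, v2], 4))   # v1, v2, e2, e3 -- as in the example above
```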
Graphs
• A graph G is a pair ( V , E ), where V is a set of vertices and E is a set of edges
• Each edge connects two vertices
• So E ⊆ V × V
• Example (directed edges): e_1 = ( v_2 , v_1 ) , e_2 = ( v_1 , v_4 ) , . . .
• [Figure: a directed graph with vertices v_1 , . . . , v_4 and edges e_1 , . . . , e_6]
Incidence Matrices
• For a graph G with m vertices and n edges, associate an m × n matrix E such that
  [ E ]_{ij} = 1 if e_j = ( ∗ , v_i ) ,  −1 if e_j = ( v_i , ∗ ) ,  0 otherwise
• For the example graph (edges e_1 , . . . , e_6 as before):
         e_1 e_2 e_3 e_4 e_5 e_6
  v_1 [   1  −1   0   0  −1   0 ]
  v_2 [  −1   0  −1   1   0   0 ]
  v_3 [   0   0   1   0   1   1 ]
  v_4 [   0   1   0  −1   0  −1 ]
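Building the incidence matrix from an edge list is straightforward; a minimal sketch (assuming NumPy, with the example graph's edges read off the matrix above, 0-indexed) also lets us verify the rank statement on the next slide:

```python
import numpy as np

def incidence_matrix(m, edges):
    """Column j has -1 at the tail of edge j and +1 at its head."""
    E = np.zeros((m, len(edges)))
    for j, (tail, head) in enumerate(edges):
        E[tail, j] = -1.0
        E[head, j] = 1.0
    return E

# e1=(v2,v1), e2=(v1,v4), e3=(v2,v3), e4=(v4,v2), e5=(v1,v3), e6=(v4,v3), 0-indexed.
edges = [(1, 0), (0, 3), (1, 2), (3, 1), (0, 2), (3, 2)]
E = incidence_matrix(4, edges)
print(np.allclose(E.sum(axis=0), 0))   # True: every column sums to zero
print(np.linalg.matrix_rank(E))        # 3 = m - 1: the graph is connected
```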
Rank and Connectivity
• Each edge is associated with two vertices
• Each column contains exactly two non-zero entries (a 1 and a −1 )
• All columns add up to zero
• In other words, if e^T = ( 1 1 … 1 ) , then e^T E = 0 and therefore e ∈ N ( E^T )
• So rank ( E ) = rank ( E^T ) = m − dim N ( E^T ) ≤ m − 1
• Equality holds iff the graph is connected!
• I.e., when there is a sequence of edges connecting any pair of vertices
Proof of Rank and Connectivity #1
• Proof ( ⇒ )
• Assume G is connected, prove dim N ( E^T ) = 1
• I.e., prove e = ( 1 1 … 1 )^T spans N ( E^T )
• Let x ∈ N ( E^T ) and take any two entries x_i and x_k of x
• There is a path from v_i to v_k
• Take the sequence of vertices visited along the way { v_{j_1} = v_i , v_{j_2} , . . . , v_{j_r} = v_k }
• There is an edge q linking v_{j_p} and v_{j_{p+1}}
Proof of Rank and Connectivity #2
• There is an edge q linking v_{j_p} and v_{j_{p+1}}
• So column q of E is −1 at row j_p and 1 at row j_{p+1} (up to sign, depending on the edge's direction)
• But x^T E = 0 , so x^T E_{∗q} = 0 = x_{j_{p+1}} − x_{j_p}
• Since this is true for all p , it turns out x_i = x_k
• But i and k were arbitrary
• And so finally we reach x = α e
• So dim N ( E^T ) = 1
• Which leads to rank ( E ) = m − 1
Proof of Rank and Connectivity #3
• Proof ( ⇐ )
• If the graph is not connected, we can partition it into two disconnected subgraphs G_1 and G_2
• Reorder vertices so that vertices/edges of G_1 appear before vertices/edges of G_2 in E :
  E = [ E_1 0 ; 0 E_2 ]
• Now compute the rank
  rank ( E ) = rank [ E_1 0 ; 0 E_2 ] = rank ( E_1 ) + rank ( E_2 ) ≤ ( m_1 − 1 ) + ( m_2 − 1 ) = m − 2
Application of Rank and Connectivity
• Nodes (current balance)
  1 : I_1 − I_2 − I_5 = 0
  2 : − I_1 − I_3 + I_4 = 0
  3 : I_3 + I_5 + I_6 = 0
  4 : I_2 − I_4 − I_6 = 0
• Loops (voltage balance)
  A : I_1 R_1 − I_3 R_3 + I_5 R_5 = E_1 − E_3
  B : I_2 R_2 − I_5 R_5 + I_6 R_6 = E_2
  C : I_3 R_3 + I_4 R_4 − I_6 R_6 = E_3 + E_4
Rank of a product
• Equivalent matrices have the same rank
• Recall the rank normal form
• Multiplication by invertible matrices preserves rank
• Multiplication by rectangular or singular matrices can reduce the rank
• If A is m × n and B is n × p then
  rank ( AB ) = rank ( B ) − dim ( N ( A ) ∩ R ( B ) )
Proof #1
• Start with a basis S = { x_1 , x_2 , . . . , x_s } for N ( A ) ∩ R ( B )
• Augment it to form a basis for R ( B ) :
  S_ext = { x_1 , . . . , x_s , z_1 , . . . , z_t }
• Let us prove that dim R ( AB ) = t , so that
  rank ( AB ) = rank ( B ) − dim ( N ( A ) ∩ R ( B ) )
• It is sufficient to prove that T = { A z_1 , . . . , A z_t } is a basis for R ( AB )
Proof #2
• T spans R ( AB )
  b ∈ R ( AB ) ⇒ b = A B y
  B y ∈ R ( B ) ⇒ B y = Σ_i ξ_i x_i + Σ_i η_i z_i
  b = A ( Σ_i ξ_i x_i ) + A ( Σ_i η_i z_i ) = Σ_i η_i A z_i
• T is l.i.
  Σ_i α_i A z_i = 0 ⇒ A ( Σ_i α_i z_i ) = 0
  ⇒ Σ_i α_i z_i ∈ N ( A ) ∩ R ( B ) ⇒ Σ_i α_i z_i = Σ_i β_i x_i
  ⇒ Σ_i α_i z_i − Σ_i β_i x_i = 0 ⇒ α_i = β_i = 0
Small perturbations can’t reduce rank • We already know that we can’t increase rank by means of matrix product rank ( AB ) ≤ rank ( B ) • We now show it is impossible to reduce rank by adding a matrix that is “small enough” rank ( A + E ) ≥ rank ( A ) • “Small” in a sense that will be clarified later, but for now here is some intuition
Proof
• Suppose rank ( A ) = r and let P and Q reduce A to rank normal form
  P A Q = [ I_r 0 ; 0 0 ]
• Apply P and Q to A + E
  P ( A + E ) Q = P A Q + P E Q = [ I_r + E_11  E_12 ; E_21  E_22 ]
• But I_r + E_11 is invertible (for E small enough). Keep eliminating:
  P_2 P ( A + E ) Q Q_2 = [ I_r + E_11  0 ; 0  S ]
• From which rank ( A + E ) = r + rank ( S ) ≥ rank ( A )
Pitfall solving singular systems
• Due to floating-point precision, we do not really solve Ax = b
• We solve some perturbed system ( A + E ) x = b
• If A is non-singular, so is A + E and we are fine
• If A is singular, A + E may have higher rank!
• All we need is rank ( S ) > 0
• But S = E_22 − E_21 ( I + E_11 )^{−1} E_12
• So the perturbed system has fewer free variables than the actual system
• Significant loss of information
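A tiny numerical experiment makes the pitfall concrete (a sketch assuming NumPy, with a made-up perturbation of roughly floating-point size): the perturbed matrix is reported as full rank and the solver happily returns a huge, meaningless answer.

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])              # singular: rank(A) = 1
b = np.array([1., 1.])                # Ax = b is actually inconsistent

E = 1e-13 * np.array([[ 1., -2.],     # a tiny perturbation, e.g. from rounding
                      [-3.,  1.]])

print(np.linalg.matrix_rank(A))       # 1
print(np.linalg.matrix_rank(A + E))   # 2: the perturbation raised the rank
print(np.linalg.solve(A + E, b))      # "solution" with entries on the order of 1e12
```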
Products A^T A and A A^T
• For A in R^{m×n} , the following statements hold
• rank ( A^T A ) = rank ( A ) = rank ( A A^T )
• R ( A^T A ) = R ( A^T ) and N ( A^T A ) = N ( A )
• R ( A A^T ) = R ( A ) and N ( A A^T ) = N ( A^T )
• For A in C^{m×n} , replace transposition by the conjugate transpose
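These rank equalities are easy to confirm numerically (a sketch assuming NumPy, reusing the rank-2 matrix from the earlier example):

```python
import numpy as np

A = np.array([[1., 2., 2., 3.],
              [2., 4., 1., 3.],
              [3., 6., 1., 4.]])

print(np.linalg.matrix_rank(A))         # 2
print(np.linalg.matrix_rank(A.T @ A))   # 2
print(np.linalg.matrix_rank(A @ A.T))   # 2
```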
Proof #1
• rank ( A^T A ) = rank ( A )
• We know that rank ( A^T A ) = rank ( A ) − dim ( N ( A^T ) ∩ R ( A ) )
• So prove N ( A^T ) ∩ R ( A ) = { 0 }
  x ∈ N ( A^T ) ∩ R ( A ) ⇒ A^T x = 0 and x = A y
  ⇒ A^T A y = 0 ⇒ y^T A^T A y = 0 ⇒ x^T x = 0 ⇒ Σ_i x_i^2 = 0 ⇒ x = 0
Proof #2
• R ( A^T A ) = R ( A^T )
  R ( BC ) ⊆ R ( B ) ⇒ R ( A^T A ) ⊆ R ( A^T )
  dim R ( A^T A ) = rank ( A^T A ) = rank ( A ) = rank ( A^T ) = dim R ( A^T )
• N ( A^T A ) = N ( A )
  N ( B ) ⊆ N ( CB ) ⇒ N ( A ) ⊆ N ( A^T A )
  dim N ( A ) = n − rank ( A ) = n − rank ( A^T A ) = dim N ( A^T A )
Application for A T A • Consider an m × n system Ax = b that may or may not be consistent • Multiply on the left by A T to reach A T Ax = A T b • This is known as the associated system of normal equations • It has many nice properties
Application for A^T A
• A^T A x = A^T b is always consistent!
  A^T b ∈ R ( A^T ) = R ( A^T A )
• If Ax = b is consistent, then both systems have the same solution set
• Take a particular solution p for Ax = b
• If A p = b , then A^T A p = A^T b
• The general solution is p + N ( A ) = p + N ( A^T A )
• If Ax = b has a unique solution, then N ( A ) = { 0 } = N ( A^T A ) and the solution is
  x = ( A^T A )^{−1} A^T b
  (warning: A itself may not even be square, so it need not be invertible!)
Normal equations
• For an m × n system Ax = b , the associated system of normal equations is the n × n system A^T A x = A^T b
• A^T A x = A^T b is always consistent, even when Ax = b is not
• When both are consistent, the solution sets agree
• Otherwise, A^T A x = A^T b gives the least-squares solutions of Ax = b
• When Ax = b is consistent and has a unique solution, so does A^T A x = A^T b , and that solution is x = ( A^T A )^{−1} A^T b
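A small numerical illustration (a sketch assuming NumPy, with made-up data): the original system is inconsistent, yet the normal equations solve cleanly and give the least-squares solution.

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([0., 1., 1.])               # Ax = b has no solution

x = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations: always consistent
print(x)                                 # least-squares solution
print(A @ x - b)                         # non-zero residual: Ax = b itself is inconsistent
```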
LEAST SQUARES
Motivating problem
• Assume we observe a phenomenon that varies with time and record observations
  D = { ( t_1 , b_1 ) , ( t_2 , b_2 ) , . . . , ( t_m , b_m ) }
• We want to be able to infer the value of an observation at an arbitrary point in time: f ( t̂ ) = b̂
• Assume we have a sensible model for f , e.g. f ( t ) = α + β t
• Find "good" values for α and β given D
Proposed solution
• Want to find the "best" f ( t ) = α + β t
• Find values for α and β that minimize
  Σ_{ i = 1 }^{ m } ε_i^2 = Σ_{ i = 1 }^{ m } ( f ( t_i ) − b_i )^2
• Turns out this reduces to a linear problem
• Let us express it in vector form and generalize
Changing to vector form
• In our example, define
  A = [ 1 t_1 ; 1 t_2 ; . . . ; 1 t_m ] ,   x = ( α , β )^T ,   b = ( b_1 , b_2 , . . . , b_m )^T ,   ε = A x − b
• Then [ ε ]_i = α + β t_i − b_i = ε_i and
  Σ_{ i = 1 }^{ m } ε_i^2 = ε^T ε = ( A x − b )^T ( A x − b )
  = x^T A^T A x − x^T A^T b − b^T A x + b^T b
  = x^T A^T A x − 2 x^T A^T b + b^T b
The minimization problem
• Our goal is to find argmin_x ε ( x ) , where the scalar function is
  ε ( x ) = x^T A^T A x − 2 x^T A^T b + b^T b
• From calculus, at the minimum ∇ ε ( x ) = 0 , with
  [ ∇ ε ( x ) ]_i = ∂ ε ( x ) / ∂ x_i
• Both A x and x^T A^T can be seen as matrix functions of each x_i
• We can use our rules for differentiation of matrix functions
Finding the minimum
• Differentiating ε ( x ) = x^T A^T A x − 2 x^T A^T b + b^T b w.r.t. each component of x we get
  [ ∇ ε ( x ) ]_i = ∂ ε ( x ) / ∂ x_i = ( ∂ x^T / ∂ x_i ) A^T A x + x^T A^T A ( ∂ x / ∂ x_i ) − 2 ( ∂ x^T / ∂ x_i ) A^T b
• Since ∂ x / ∂ x_i = e_i and x^T A^T A e_i = e_i^T A^T A x (a scalar equals its transpose),
  [ ∇ ε ( x ) ]_i = 2 e_i^T A^T A x − 2 e_i^T A^T b = 2 [ A^T A x ]_i − 2 [ A^T b ]_i
• Equating to zero and grouping all rows:
  A^T A x = A^T b
Is there a favorite solution?
• Calculus tells us that the minimum of ε ( x ) can only happen at a solution of the normal equations A^T A x = A^T b
• Are all solutions equally good?
  ε ( x ) = x^T A^T A x − 2 x^T A^T b + b^T b
• Take any two solutions z_1 and z_2 = z_1 + u
  ε ( z_1 ) = b^T b − z_1^T A^T b
  ε ( z_2 ) = ε ( z_1 + u ) = ε ( z_1 ) + 2 u^T ( A^T A z_1 − A^T b ) + u^T A^T A u = ε ( z_1 ) + u^T A^T A u = ε ( z_1 )
  (the middle term vanishes because z_1 solves the normal equations, and u^T A^T A u = ‖ A u ‖^2 = 0 because u ∈ N ( A^T A ) = N ( A ))
• The same expansion with an arbitrary u gives ε ( z_1 + u ) = ε ( z_1 ) + ‖ A u ‖^2 ≥ ε ( z_1 ) , so no other vector can produce a lower value for ε ( x )
General Least Squares
• For A in R^{m×n} and b in R^m , let ε = A x − b
• The general least squares problem is to find a vector x that minimizes the quantity
  Σ_{ i = 1 }^{ m } ε_i^2 = ε^T ε = ( A x − b )^T ( A x − b )
• Any such vector is a least-squares solution
• The solution set is the same as that of A^T A x = A^T b
• The solution is unique iff rank ( A ) = n , in which case x = ( A^T A )^{−1} A^T b
• If Ax = b is consistent, the solution sets are the same
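In practice one rarely forms A^T A explicitly; library least-squares routines work directly on A. A minimal sketch (assuming NumPy, with made-up data for the straight-line model f ( t ) = α + β t ), showing that the two routes agree when rank ( A ) = n:

```python
import numpy as np

t = np.array([0., 1., 2., 3.])
b = np.array([1.1, 1.9, 3.2, 3.9])
A = np.column_stack([np.ones_like(t), t])                 # columns: 1, t

x_ls, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)   # least-squares solution
x_ne = np.linalg.solve(A.T @ A, A.T @ b)                  # via the normal equations
print(np.allclose(x_ls, x_ne))                            # True: same solution
```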
Example of Linear Regression
• Predict the amount of weight that a pint of ice-cream loses when stored at low temperatures
• Assume a linear model for the phenomenon
  y = α_0 + α_1 t_1 + α_2 t_2 + ε
  with t_1 (time), t_2 (temperature), ε (random noise)
• Assume the random noise "averages out"
• Use measurements to find a least-squares solution for the parameters in
  E ( t_1 , t_2 ) = α_0 + α_1 t_1 + α_2 t_2
Result of experiments
• Assume the following measurements

  Time (weeks)  |   1     1     1     2     2     2     3     3     3
  Temp (°C)     | -10    -5     0   -10    -5     0   -10    -5     0
  Loss (grams)  | 0.15  0.18  0.20  0.17  0.19  0.22  0.20  0.23  0.25

• In vector form, we get
  A = [ 1 1 −10 ; 1 1 −5 ; 1 1 0 ; 1 2 −10 ; 1 2 −5 ; 1 2 0 ; 1 3 −10 ; 1 3 −5 ; 1 3 0 ]
  x = ( α_0 , α_1 , α_2 )^T
  b = ( 0.15 , 0.18 , 0.20 , 0.17 , 0.19 , 0.22 , 0.20 , 0.23 , 0.25 )^T
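Solving this regression numerically takes a few lines (a sketch assuming NumPy; the data are exactly the measurements above):

```python
import numpy as np

time = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)
temp = np.array([-10, -5, 0, -10, -5, 0, -10, -5, 0], dtype=float)
loss = np.array([0.15, 0.18, 0.20, 0.17, 0.19, 0.22, 0.20, 0.23, 0.25])

A = np.column_stack([np.ones_like(time), time, temp])   # columns: 1, t1, t2
x, res, rank, sv = np.linalg.lstsq(A, loss, rcond=None)
print(x)   # estimates of (alpha_0, alpha_1, alpha_2)
```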