Where to put a facility?
Given locations p₁, ..., pₘ in Rⁿ of m houses, we want to choose a location c in Rⁿ for the fire station. We want c to be as close as possible to all the houses. We know how to measure the distance between a proposed location c and a single point, but different houses have different ideas about where to put the firehouse. How do we combine their preferences into a single location?
◮ Choose the point that minimizes the average station-to-house distance. This is the same as minimizing the sum of station-to-house distances. This could be really bad for houses outside the town center.
◮ Choose the point that minimizes the maximum station-to-house distance. This could be really bad for most of the houses!
◮ Choose the point that minimizes the sum of squared station-to-house distances ‖p₁ − c‖² + ‖p₂ − c‖² + ··· + ‖pₘ − c‖². This is a sort of compromise: like the average, but if some house is very far away its squared distance is very large.
(These three measures are called L₁, L∞, and L₂.)
Putting a facility in the location that minimizes sum of squared distances
Given locations p₁, ..., pₘ in Rⁿ of m houses, we want to choose a location c in Rⁿ for the fire station so as to minimize the sum of squared distances ‖p₁ − c‖² + ‖p₂ − c‖² + ··· + ‖pₘ − c‖².
◮ Question: How do we find this location?
◮ Answer: c = (1/m)(p₁ + p₂ + ··· + pₘ), called the centroid of p₁, ..., pₘ. It is the average for vectors. In fact, for i = 1, ..., n, entry i of the centroid is the average of entry i over all the points.
The centroid p̄ satisfies the equation m p̄ = Σᵢ pᵢ. Therefore Σᵢ (pᵢ − p̄) equals the zero vector.
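The centroid and the zero-sum property above can be checked directly with plain Python lists standing in for vectors; this is a minimal sketch, not code from the slides:

```python
def centroid(points):
    """Entry i of the centroid is the average of entry i over all points."""
    m = len(points)
    return [sum(p[i] for p in points) / m for i in range(len(points[0]))]

points = [[8.0, -2.0], [4.0, 2.0], [0.0, 6.0]]
c = centroid(points)          # [4.0, 2.0]
# The sum of (p_i - centroid) over all points is the zero vector:
residual = [sum(p[i] - c[i] for p in points) for i in range(2)]
```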
Proving that the centroid minimizes the sum of squared distances
Let q be any point. We show that the sum of squared q-to-datapoint distances is at least the sum of squared p̄-to-datapoint distances.
For i = 1, ..., m,
‖pᵢ − q‖² = ‖pᵢ − p̄ + p̄ − q‖²
= ⟨pᵢ − p̄ + p̄ − q, pᵢ − p̄ + p̄ − q⟩
= ⟨pᵢ − p̄, pᵢ − p̄⟩ + ⟨pᵢ − p̄, p̄ − q⟩ + ⟨p̄ − q, pᵢ − p̄⟩ + ⟨p̄ − q, p̄ − q⟩
= ‖pᵢ − p̄‖² + ⟨pᵢ − p̄, p̄ − q⟩ + ⟨p̄ − q, pᵢ − p̄⟩ + ‖p̄ − q‖²
Summing over i = 1, ..., m,
Σᵢ ‖pᵢ − q‖² = Σᵢ ‖pᵢ − p̄‖² + Σᵢ ⟨pᵢ − p̄, p̄ − q⟩ + Σᵢ ⟨p̄ − q, pᵢ − p̄⟩ + Σᵢ ‖p̄ − q‖²
= Σᵢ ‖pᵢ − p̄‖² + ⟨Σᵢ (pᵢ − p̄), p̄ − q⟩ + ⟨p̄ − q, Σᵢ (pᵢ − p̄)⟩ + Σᵢ ‖p̄ − q‖²
Proving that the centroid minimizes the sum of squared distances (continued)
Let q be any point. We show that the sum of squared q-to-datapoint distances is at least the sum of squared p̄-to-datapoint distances.
Summing over i = 1, ..., m,
Σᵢ ‖pᵢ − q‖² = Σᵢ ‖pᵢ − p̄‖² + ⟨Σᵢ (pᵢ − p̄), p̄ − q⟩ + ⟨p̄ − q, Σᵢ (pᵢ − p̄)⟩ + Σᵢ ‖p̄ − q‖²
= Σᵢ ‖pᵢ − p̄‖² + ⟨0, p̄ − q⟩ + ⟨p̄ − q, 0⟩ + Σᵢ ‖p̄ − q‖²
= Σᵢ ‖pᵢ − p̄‖² + 0 + 0 + m ‖p̄ − q‖²
= sum of squared p̄-to-datapoint distances + m times the squared p̄-to-q distance
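The identity the proof ends with, Σᵢ ‖pᵢ − q‖² = Σᵢ ‖pᵢ − p̄‖² + m ‖p̄ − q‖², can be checked numerically for a small example (an illustrative sketch; the data points are made up):

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors (as lists)."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

points = [[8.0, -2.0], [4.0, 2.0], [0.0, 6.0]]
m = len(points)
pbar = [sum(p[i] for p in points) / m for i in range(2)]   # centroid
q = [1.0, 5.0]                                             # arbitrary point

lhs = sum(sq_dist(p, q) for p in points)
rhs = sum(sq_dist(p, pbar) for p in points) + m * sq_dist(pbar, q)
# lhs == rhs, and the extra m*sq_dist(pbar, q) term is nonnegative,
# so q can never beat the centroid.
```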
k-means clustering using Lloyd's Algorithm
k-means clustering: Given data points (vectors) p₁, ..., pₘ in Rⁿ, select k centers c₁, ..., cₖ so as to minimize the sum of squared distances of the data points to their nearest centers.
That is, define the function f(x, [c₁, ..., cₖ]) = min{‖x − cᵢ‖² : i ∈ {1, ..., k}}. This function returns the squared distance from x to whichever of c₁, ..., cₖ is nearest.
The goal of k-means clustering is to select points c₁, ..., cₖ so as to minimize
f(p₁, [c₁, ..., cₖ]) + f(p₂, [c₁, ..., cₖ]) + ··· + f(pₘ, [c₁, ..., cₖ])
The purpose is to partition the data points into k groups (called clusters).
k-means clustering using Lloyd's Algorithm
Select k centers to minimize the sum of squared distances of data points to the nearest centers. This combines two ideas:
1. Assign each data point to the nearest center.
2. Choose the centers so as to be close to the data points.
This suggests an algorithm. Start with k centers somewhere, perhaps randomly chosen. Then repeatedly perform the following steps:
1. Assign each data point to the nearest center.
2. Move each center to be as close as possible to the data points assigned to it. This means letting the new location of the center be the centroid of the assigned points.
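The two repeated steps can be sketched in plain Python; this is one possible rendering, not the course's code, and the optional `centers` argument (for deterministic initialization) is an addition for illustration:

```python
import random

def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def centroid(cluster):
    m = len(cluster)
    return [sum(p[i] for p in cluster) / m for i in range(len(cluster[0]))]

def lloyd(points, k, iters=20, centers=None):
    # Start with k centers somewhere, perhaps randomly chosen.
    if centers is None:
        centers = random.sample(points, k)
    for _ in range(iters):
        # Step 1: assign each data point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Step 2: move each center to the centroid of its assigned points
        # (a center with no assigned points stays where it is).
        centers = [centroid(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers
```

Note that Lloyd's algorithm only converges to a local optimum; the final clustering can depend on the initial centers.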
[9] Orthogonalization
Finding the closest point in a plane Goal: Given a point b and a plane, find the point in the plane closest to b .
Finding the closest point in a plane
Goal: Given a point b and a plane, find the point in the plane closest to b. By translation, we can assume the plane includes the origin. The plane is then a vector space V. Let {v₁, v₂} be a basis for V.
Goal restated: Given a point b, find the point in Span{v₁, v₂} closest to b.
Example: v₁ = [8, −2, 2], v₂ = [4, 2, 4], b = [5, −5, 2]. The point in the plane closest to b is [6, −3, 0].
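For a two-dimensional span, the example can be worked out by solving the 2×2 system ⟨vᵢ, v₁⟩x₁ + ⟨vᵢ, v₂⟩x₂ = ⟨vᵢ, b⟩ (the normal equations, one standard approach that these slides have not yet introduced); a sketch:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

v1 = [8.0, -2.0, 2.0]
v2 = [4.0, 2.0, 4.0]
b  = [5.0, -5.0, 2.0]

# 2x2 system: [v1.v1  v1.v2] [x1]   [v1.b]
#             [v2.v1  v2.v2] [x2] = [v2.b]
a11, a12, a22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
r1, r2 = dot(b, v1), dot(b, v2)
det = a11 * a22 - a12 * a12
x1 = (r1 * a22 - r2 * a12) / det
x2 = (a11 * r2 - a12 * r1) / det
closest = [x1 * v1[i] + x2 * v2[i] for i in range(3)]
# The residual b - closest is orthogonal to both v1 and v2.
residual = [b[i] - closest[i] for i in range(3)]
```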
Closest-point problem in higher dimensions
Goal: An algorithm that, given a vector b and vectors v₁, ..., vₙ, finds the vector in Span{v₁, ..., vₙ} that is closest to b.
Special case: We can use the algorithm to determine whether b lies in Span{v₁, ..., vₙ}: if the vector in Span{v₁, ..., vₙ} closest to b is b itself, then clearly b is in the span; if not, then b is not in the span.
Let A be the matrix with columns v₁, ..., vₙ. Using the linear-combinations interpretation of matrix-vector multiplication, a vector in Span{v₁, ..., vₙ} can be written Ax. Thus testing whether b is in Span{v₁, ..., vₙ} is equivalent to testing whether the equation Ax = b has a solution.
More generally: Even if Ax = b has no solution, we can use the algorithm to find the point in {Ax : x ∈ Rⁿ} closest to b.
Moreover: We hope to extend the algorithm to also find the best solution x.
High-dimensional projection onto/orthogonal to
For any vector b and any vector a, define vectors b||a and b⊥a so that b = b||a + b⊥a, where there is a scalar σ ∈ R such that b||a = σa, and b⊥a is orthogonal to a.
Definition: For a vector b and a vector space V, we define the projection of b onto V (written b||V) and the projection of b orthogonal to V (written b⊥V) so that b = b||V + b⊥V, where b||V is in V and b⊥V is orthogonal to every vector in V.
[Figure: b drawn as the sum of its projection onto V and its projection orthogonal to V.]
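The one-vector case is concrete: σ = ⟨b, a⟩/⟨a, a⟩ gives b||a = σa, and b⊥a is what remains. A plain-list sketch (assuming a is nonzero):

```python
def project_onto_and_orthogonal(b, a):
    """Split b into b_par (a scalar multiple of a) and b_perp (orthogonal to a).
    Assumes a is nonzero; sigma = <b,a>/<a,a> as in the definition."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    sigma = dot(b, a) / dot(a, a)
    b_par = [sigma * ai for ai in a]
    b_perp = [bi - pi for bi, pi in zip(b, b_par)]
    return b_par, b_perp

b_par, b_perp = project_onto_and_orthogonal([3.0, 4.0], [1.0, 0.0])
# b_par + b_perp == b, and b_perp is orthogonal to a
```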
High-Dimensional Fire Engine Lemma
Definition: For a vector b and a vector space V, we define the projection of b onto V (written b||V) and the projection of b orthogonal to V (written b⊥V) so that b = b||V + b⊥V, where b||V is in V and b⊥V is orthogonal to every vector in V.
One-dimensional Fire Engine Lemma: The point in Span{a} closest to b is b||a, and the distance is ‖b⊥a‖.
High-Dimensional Fire Engine Lemma: The point in a vector space V closest to b is b||V, and the distance is ‖b⊥V‖.
Finding the projection of b orthogonal to Span{a₁, ..., aₙ}
High-Dimensional Fire Engine Lemma: Let b be a vector and let V be a vector space. The vector in V closest to b is b||V. The distance is ‖b⊥V‖.
Suppose V is specified by generators v₁, ..., vₙ.
Goal: An algorithm for computing b||V in this case.
◮ input: vector b, vectors v₁, ..., vₙ
◮ output: projection of b onto Span{v₁, ..., vₙ}
We already know how to solve this when n = 1:
def project_along(b, v): return (0 if v.is_almost_zero() else (b*v)/(v*v))*v
Let's try to generalize....
project_onto(b, vlist)
def project_along(b, v): return (0 if v.is_almost_zero() else (b*v)/(v*v))*v
⇓
def project_onto(b, vlist): return sum([project_along(b, v) for v in vlist])
Reviews are in.... "Short, elegant, .... and flawed" "Beautiful—if only it worked!" "A tragic failure."
Failure of project_onto
Try it out on vector b and vlist = [v₁, v₂] in R², so V = Span{v₁, v₂}. In this case, b is in Span{v₁, v₂}, so b||V = b. The algorithm tells us to find the projection of b along v₁ and the projection of b along v₂. The sum of these projections should be equal to b||V... but it's not.
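The failure is easy to see numerically. Below is a plain-list rendering of the slides' `project_along` and `project_onto` (the Vec-class operations are replaced by explicit loops): with a non-orthogonal pair of vectors spanning R², the sum of the one-dimensional projections does not recover b, while with an orthogonal pair it does.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project_along(b, v):
    sigma = 0 if dot(v, v) < 1e-20 else dot(b, v) / dot(v, v)
    return [sigma * vi for vi in v]

def project_onto(b, vlist):
    # The flawed procedure: just sum the one-dimensional projections.
    total = [0.0] * len(b)
    for v in vlist:
        p = project_along(b, v)
        total = [ti + pi for ti, pi in zip(total, p)]
    return total

b = [1.0, 1.0]
# Non-orthogonal basis of R^2: b is in the span, so the answer should be b itself.
bad = project_onto(b, [[1.0, 0.0], [1.0, 1.0]])   # not equal to b
# Orthogonal basis: the sum of projections does recover b.
good = project_onto(b, [[1.0, 0.0], [0.0, 1.0]])
```

This hints at the fix developed next: the procedure is correct when the vectors in vlist are mutually orthogonal, which is what orthogonalization provides.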