Introduction to Data Science: Data Analysis with Geometry
Héctor Corrada Bravo
University of Maryland, College Park, USA
2020-04-05
Data Analysis with Geometry

A common situation:

- an outcome attribute (variable) $Y$, and
- one or more independent covariate or predictor attributes $X_1, \ldots, X_p$.

One usually observes these variables for multiple "instances" (or entities).
Data Analysis with Geometry

One may be interested in various things:

- What effects do the covariates $X_i$ have on the outcome $Y$?
- How well can we quantify these effects?
- Can we predict outcome $Y$ using covariates $X_i$? etc.
Data Analysis with Geometry

Motivating Example: Credit Analysis

default  student  balance    income
No       No        729.5265  44361.625
No       Yes       817.1804  12106.135
No       No       1073.5492  31767.139
No       No        529.2506  35704.494
No       No        785.6559  38463.496
No       Yes       919.5885   7491.559
Data Analysis with Geometry

Task: predict account default.

- What is the outcome $Y$?
- What are the predictors $X_j$?
From data to feature vectors

The vast majority of ML algorithms we see in class treat instances as "feature vectors".

We can represent each instance as a vector in Euclidean space: $\langle x_1, \ldots, x_p, y \rangle$.

- Every measurement is represented as a continuous value.
- In particular, categorical variables become numeric (e.g., one-hot encoding), as sketched below.
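As a concrete illustration, here is a minimal Python sketch (not the course's own code) of turning the categorical credit columns into numeric features. It assumes a pandas DataFrame with the column names from the table above, and uses the −1/1 and 0/1 codings suggested by the feature matrix on the next slide.

```python
# Illustrative sketch: encoding the credit table's categorical columns numerically.
import pandas as pd

credit = pd.DataFrame({
    "default": ["No", "No", "No"],
    "student": ["No", "Yes", "No"],
    "balance": [729.5265, 817.1804, 1073.5492],
    "income":  [44361.625, 12106.135, 31767.139],
})

# Binary categories can be mapped directly; the matrix on the next slide
# appears to use -1/1 for default and 0/1 for student.
credit["default"] = credit["default"].map({"No": -1, "Yes": 1})
credit["student"] = credit["student"].map({"No": 0, "Yes": 1})

# For categories with more than two levels, one-hot encoding would create
# one 0/1 column per level, e.g.:
# credit = pd.get_dummies(credit, columns=["some_multilevel_column"])

print(credit.values)  # the instances as a numeric feature matrix
```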
From data to feature vectors

Here is the same credit data represented as a matrix of feature vectors:

default  student  balance    income
 1       0        1717.0716  38408.89
 1       1        1983.2345  25687.93
-1       1         883.1573  18213.08
 1       0        1975.6530  38221.84
-1       0           0.0000  32809.33
-1       0         528.0893  46389.34
Technical notation

- Observed values will be denoted in lower case. So $x_i$ means the $i$-th observation of the random variable $X$.
- Matrices are represented with bold face upper case. For example, $\mathbf{X}$ will represent all observed predictors.
- $N$ (or $n$) will usually mean the number of observations, or the length of $Y$.
- $i$ will be used to denote which observation and $j$ to denote which covariate or predictor.
Technical notation

- Vectors will not be bold, for example $x_i$ may mean all predictors for subject $i$, unless it is the vector of a particular predictor, $x_j$.
- All vectors are assumed to be column vectors, so the $i$-th row of $\mathbf{X}$ will be $x_i'$, i.e., the transpose of $x_i$.
Geometry and Distances

Now that we think of instances as vectors we can do some interesting operations.

Let's try a first one: define a distance between two instances using Euclidean distance:

$$d(x_1, x_2) = \sqrt{\sum_{j=1}^p (x_{1j} - x_{2j})^2}$$
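A minimal sketch of this distance in Python, assuming NumPy; the function name and the example vectors are illustrative choices, not part of the slides.

```python
import numpy as np

def euclidean_distance(x1, x2):
    """d(x1, x2) = sqrt(sum_j (x1j - x2j)^2)"""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return np.sqrt(np.sum((x1 - x2) ** 2))

# Example: distance between two (balance, income) feature vectors
print(euclidean_distance([729.5265, 44361.625], [817.1804, 12106.135]))
```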
Geometry and Distances

K-nearest neighbor classification

Now that we have a distance between instances we can create a classifier.

Suppose we want to predict the class for an instance $x$. K-nearest neighbors uses the $k$ closest points in predictor space to predict $Y$:

$$\hat{Y} = \frac{1}{k} \sum_{x_k \in N_k(x)} y_k$$

$N_k(x)$ represents the $k$-nearest points to $x$. How would you use $\hat{Y}$ to make a prediction?
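A from-scratch sketch of this rule (not the course's own implementation). It assumes class labels coded as −1/1, as in the feature-matrix slide, so thresholding the neighborhood average $\hat{Y}$ at 0 gives a class prediction; the function name and the toy data are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # distances from x to every training instance
    dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    # indices of the k nearest neighbors, N_k(x)
    neighbors = np.argsort(dists)[:k]
    y_hat = y_train[neighbors].mean()   # average of neighbor labels
    return 1 if y_hat > 0 else -1       # threshold to get a class

# Tiny illustrative example (made-up points, not the credit data)
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([-1, -1, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 4.8]), k=3))  # -> 1
```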
Geometry and Distances

Inductive bias

The assumptions we make about our data that allow us to make predictions.

In KNN, our inductive bias is that points that are nearby will be of the same class.
Geometry and Distances

Parameter

$K$ is a hyper-parameter; its value may affect prediction accuracy significantly.

Question: which situation may lead to overfitting, high or low values of $K$? Why?
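One way to probe this question is to compare several values of $K$ by cross-validated accuracy: very small $K$ fits the training data extremely closely (risking overfitting), while large $K$ smooths more. The sketch below is my own illustration, not from the slides; scikit-learn's KNeighborsClassifier, the synthetic dataset, and the grid of $K$ values are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data, purely for illustration
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for k in [1, 5, 15, 51]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k:3d}  mean CV accuracy = {scores.mean():.3f}")
```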
The importance of transformations

Feature scaling is an important issue in distance-based methods.

Which of these two features will affect distance the most?
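To make the question concrete, here is an illustrative sketch (mine, not the slides') using the balance and income values from the credit table: income is on a much larger scale than balance, so it dominates Euclidean distance until the features are standardized.

```python
import numpy as np

X = np.array([[729.5265, 44361.625],    # columns: balance, income
              [817.1804, 12106.135],
              [1073.5492, 31767.139]])

def dist(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

print(dist(X[0], X[1]))    # dominated by the income difference (~32000)

# Standardize each column to mean 0, standard deviation 1
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(dist(Xs[0], Xs[1]))  # now both features contribute comparably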
Quick vector algebra review

A (real-valued) vector is just an array of real values; for instance, $x = \langle 1, 2.5, -6 \rangle$ is a three-dimensional vector.

Vector sums are computed pointwise, and are only defined when dimensions match, so

$$\langle 1, 2.5, -6 \rangle + \langle 2, -2.5, 3 \rangle = \langle 3, 0, -3 \rangle$$

Vector addition can be viewed geometrically as taking one vector, then tacking the other on to the end of it; the new end point is exactly their sum.

Scalar multiplication: vectors can be scaled by real values, $ax = \langle ax_1, ax_2, \ldots, ax_p \rangle$. For example, $2 \langle 1, 2.5, -6 \rangle = \langle 2, 5, -12 \rangle$.

The norm of a vector $x$, written $\|x\|$, is its length. Unless otherwise specified, this is its Euclidean length, namely

$$\|x\| = \sqrt{\sum_{j=1}^p x_j^2}$$

We can write the Euclidean distance of vectors $u$ and $v$ as a vector norm: $d(u, v) = \|u - v\|$.

The dot product, or inner product, of two vectors $u$ and $v$ is defined as

$$u'v = \sum_i u_i v_i = v'u$$

A useful geometric interpretation of the inner product is that it gives the projection of $v$ onto $u$ (when $\|u\| = 1$). In general, if $c = a + b$ then $c'd = a'd + b'd$ for all vectors $d$.

The curse of dimensionality

Distance-based methods like KNN can be problematic in high-dimensional problems.

Consider the case where we have many covariates and want to use nearest neighbor methods. Basically, we need to define a distance and look for small multi-dimensional "balls" around the target points. With many covariates this becomes difficult.
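A numerical sketch of this point (my own illustration, not course material): for random points, as the number of dimensions grows, a query point's nearest and farthest neighbors become almost equally far away, so small "balls" around a point contain very little data. The sample sizes and dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

for p in [2, 10, 100, 1000]:
    X = rng.uniform(size=(500, p))   # 500 random points in the unit cube [0, 1]^p
    q = rng.uniform(size=p)          # a random query point
    d = np.sqrt(np.sum((X - q) ** 2, axis=1))
    print(f"p={p:5d}  nearest/farthest distance ratio = {d.min() / d.max():.3f}")
```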
Summary