  1. Machine Learning - Regressions Amir H. Payberah payberah@kth.se 07/11/2018

  2. The Course Web Page https://id2223kth.github.io 1 / 81

  3. Where Are We? 2 / 81

  4. Where Are We? 3 / 81

  5. Let’s Start with an Example 4 / 81

  6. The Housing Price Example (1/3)
     ◮ Given a dataset of m houses:

     Living area | No. of bedrooms | Price
     2104        | 3               | 400
     1600        | 3               | 330
     2400        | 3               | 369
     ...         | ...             | ...

     ◮ Can we predict the prices of other houses as a function of the size of the living area and the number of bedrooms? 5 / 81

  7. The Housing Price Example (2/3)

     Living area | No. of bedrooms | Price
     2104        | 3               | 400
     1600        | 3               | 330
     2400        | 3               | 369
     ...         | ...             | ...

     x^(1) = [2104, 3]⊺, y^(1) = 400    x^(2) = [1600, 3]⊺, y^(2) = 330    x^(3) = [2400, 3]⊺, y^(3) = 369

         [ x^(1)⊺ ]   [ 2104  3 ]        [ 400 ]
     X = [ x^(2)⊺ ] = [ 1600  3 ]    y = [ 330 ]
         [ x^(3)⊺ ]   [ 2400  3 ]        [ 369 ]
         [   ⋮    ]   [   ⋮   ⋮ ]        [  ⋮  ]

     ◮ x^(i) ∈ R^2: x1^(i) is the living area and x2^(i) is the number of bedrooms of the i-th house in the training set. 6 / 81

  8. The Housing Price Example (3/3)

     Living area | No. of bedrooms | Price
     2104        | 3               | 400
     1600        | 3               | 330
     2400        | 3               | 369
     ...         | ...             | ...

     ◮ Predict the prices of other houses, ŷ, as a function of the size of their living areas x1 and the number of bedrooms x2, i.e., ŷ = f(x1, x2).
     ◮ E.g., what is ŷ if x1 = 4000 and x2 = 4?
     ◮ As an initial choice: ŷ = f_w(x) = w1x1 + w2x2 7 / 81
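
     A minimal Scala sketch of this initial choice; the weights here are hypothetical placeholders, only meant to show the shape of the model (the actual weights are computed with the normal equation later in the deck):

        // initial model: yHat = w1*x1 + w2*x2
        def predict(w1: Double, w2: Double)(x1: Double, x2: Double): Double =
          w1 * x1 + w2 * x2

        // hypothetical weights, for illustration only
        val yHat = predict(0.1, 100.0)(4000.0, 4.0)   // 0.1*4000 + 100*4 = 800.0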

  9. Linear Regression 8 / 81

  10. Linear Regression (1/2)
     ◮ Our goal: build a system that takes an input x ∈ R^n and predicts an output ŷ ∈ R.
     ◮ In linear regression, the output ŷ is a linear function of the input x:
        ŷ = f_w(x) = w1x1 + w2x2 + ··· + wnxn
        ŷ = w⊺x
     • ŷ: the predicted value
     • n: the number of features
     • xi: the i-th feature value
     • wj: the j-th model parameter (w ∈ R^n) 9 / 81

  11. Linear Regression (2/2)
     ◮ Linear regression often has one additional parameter, called the intercept b:
        ŷ = w⊺x + b
     ◮ Instead of adding the bias parameter b, we can augment x with an extra entry that is always set to 1:
        ŷ = f_w(x) = w0x0 + w1x1 + w2x2 + ··· + wnxn, where x0 = 1 10 / 81
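
     A minimal Breeze sketch of the augmentation trick above; the parameter values are made up purely for illustration:

        import breeze.linalg._

        // made-up parameters (w0, w1, w2); w0 plays the role of the intercept b
        val w = DenseVector(-70.0, 0.064, 103.0)
        // raw features (x1, x2) of one house
        val x = DenseVector(4000.0, 4.0)
        // augment x with an extra entry x0 = 1, so that w0 * x0 acts as the intercept
        val xAug = DenseVector.vertcat(DenseVector(1.0), x)
        val yHat = w dot xAug   // w0 + w1*x1 + w2*x2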

  12. Linear Regression - Model Parameters
     ◮ The parameters w ∈ R^n are values that control the behavior of the model.
     ◮ w is a set of weights that determine how each feature affects the prediction.
     • wi > 0: increasing the value of the feature xi increases the value of our prediction ŷ.
     • wi < 0: increasing the value of the feature xi decreases the value of our prediction ŷ.
     • wi = 0: the value of the feature xi has no effect on the prediction ŷ. 11 / 81

  13. How to Learn Model Parameters w ? 12 / 81

  14. Linear Regression - Cost Function (1/2)
     ◮ A reasonable model should make ŷ close to y, at least for the training dataset.
     ◮ Residual: the difference between the dependent variable y and the predicted value ŷ:
        r^(i) = y^(i) − ŷ^(i) 13 / 81

  15. Linear Regression - Cost Function (2/2)
     ◮ Cost function J(w):
     • For each value of w, it measures how close each ŷ^(i) is to the corresponding y^(i).
     • We can define J(w) as the mean squared error (MSE):
        J(w) = MSE(w) = (1/m) Σ_i (ŷ^(i) − y^(i))² = E[(ŷ − y)²] = (1/m) ||ŷ − y||₂² 14 / 81
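
     A small Breeze sketch of the residual and MSE computations from the last two slides; the labels and predictions below are made-up numbers:

        import breeze.linalg._

        // made-up true labels y and predictions yHat, just to show the computation
        val y    = DenseVector(400.0, 330.0, 369.0)
        val yHat = DenseVector(410.0, 320.0, 365.0)

        val r   = y - yHat                          // residuals r^(i) = y^(i) - yHat^(i)
        val mse = (r dot r) / y.length.toDouble     // J(w) = (1/m) * sum of squared residuals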

  16. How to Learn Model Parameters? ◮ We want to choose w so as to minimize J(w). ◮ Two approaches to find w: • Normal equation • Gradient descent 15 / 81

  17. Normal Equation 16 / 81

  18. Derivatives and Gradient (1/3)
     ◮ The first derivative of f(x), written f′(x), gives the slope of the tangent line to the function at the point x.
     ◮ f(x) = x² ⇒ f′(x) = 2x
     ◮ If f(x) is increasing, then f′(x) > 0
     ◮ If f(x) is decreasing, then f′(x) < 0
     ◮ If f(x) is at a local minimum/maximum, then f′(x) = 0 17 / 81

  19. Derivatives and Gradient (2/3)
     ◮ What if a function has multiple arguments, e.g., f(x1, x2, ···, xn)?
     ◮ Partial derivative: the derivative with respect to a particular argument.
     • ∂f/∂x1: the derivative with respect to x1
     • ∂f/∂x2: the derivative with respect to x2
     ◮ ∂f/∂xi shows how much the function f will change if we change xi.
     ◮ Gradient: the vector of all partial derivatives of a function f:
        ∇x f(x) = [∂f/∂x1, ∂f/∂x2, ···, ∂f/∂xn]⊺ 18 / 81

  20. Derivatives and Gradient (3/3)
     ◮ What is the gradient of f(x1, x2, x3) = x1 − x1x2 + x3²?
        ∇x f(x) = [∂(x1 − x1x2 + x3²)/∂x1, ∂(x1 − x1x2 + x3²)/∂x2, ∂(x1 − x1x2 + x3²)/∂x3]⊺ = [1 − x2, −x1, 2x3]⊺ 19 / 81
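
     A short Breeze sanity check of the gradient above: the analytic gradient from the slide is compared against central finite differences at an arbitrarily chosen point.

        import breeze.linalg._

        def f(x: DenseVector[Double]): Double = x(0) - x(0) * x(1) + x(2) * x(2)

        // analytic gradient from the slide: (1 - x2, -x1, 2*x3)
        def grad(x: DenseVector[Double]): DenseVector[Double] =
          DenseVector(1.0 - x(1), -x(0), 2.0 * x(2))

        // central finite differences at an arbitrary point
        val x0 = DenseVector(1.0, 2.0, 3.0)
        val h  = 1e-6
        val numeric = DenseVector.tabulate(3) { i =>
          val e = DenseVector.zeros[Double](3)
          e(i) = h
          (f(x0 + e) - f(x0 - e)) / (2 * h)
        }
        // numeric is approximately grad(x0) = DenseVector(-1.0, -1.0, 6.0)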

  21. Normal Equation (1/2)
     ◮ To minimize J(w), we can simply solve for where its gradient is 0: ∇w J(w) = 0
        ŷ = w⊺x

            [ x^(1)⊺ ]   [ x1^(1)  x2^(1)  ···  xn^(1) ]        [ ŷ^(1) ]
        X = [ x^(2)⊺ ] = [ x1^(2)  x2^(2)  ···  xn^(2) ]    ŷ = [ ŷ^(2) ]
            [   ⋮    ]   [   ⋮       ⋮     ···    ⋮    ]        [   ⋮   ]
            [ x^(m)⊺ ]   [ x1^(m)  x2^(m)  ···  xn^(m) ]        [ ŷ^(m) ]

        ŷ⊺ = w⊺X⊺, or equivalently ŷ = Xw 20 / 81

  22. Normal Equation (2/2)
     ◮ To minimize J(w), we can simply solve for where its gradient is 0: ∇w J(w) = 0
        J(w) = (1/m) ||ŷ − y||₂², ∇w J(w) = 0
        ⇒ ∇w (1/m) ||ŷ − y||₂² = 0
        ⇒ ∇w (1/m) ||Xw − y||₂² = 0
        ⇒ ∇w (Xw − y)⊺(Xw − y) = 0
        ⇒ ∇w (w⊺X⊺Xw − 2w⊺X⊺y + y⊺y) = 0
        ⇒ 2X⊺Xw − 2X⊺y = 0
        ⇒ w = (X⊺X)⁻¹X⊺y 21 / 81
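
     In practice the inverse (X⊺X)⁻¹ is rarely formed explicitly for numerical reasons; a Breeze sketch of the usual alternative, assuming a design matrix X and a label vector y as built in the worked example that follows, is to solve the linear system directly:

        import breeze.linalg._

        // assumes X: DenseMatrix[Double] and y: DenseVector[Double] as on the next slides;
        // solves (X^T X) w = X^T y without computing the inverse explicitly
        val w = (X.t * X) \ (X.t * y)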

  23. Normal Equation - Example (1/7)

     Living area | No. of bedrooms | Price
     2104        | 3               | 400
     1600        | 3               | 330
     2400        | 3               | 369
     1416        | 2               | 232
     3000        | 4               | 540

     ◮ Predict the value of ŷ when x1 = 4000 and x2 = 4.
     ◮ We should find w0, w1, and w2 in ŷ = w0 + w1x1 + w2x2.
     ◮ w = (X⊺X)⁻¹X⊺y. 22 / 81

  24. Normal Equation - Example (2/7)

     Living area | No. of bedrooms | Price
     2104        | 3               | 400
     1600        | 3               | 330
     2400        | 3               | 369
     1416        | 2               | 232
     3000        | 4               | 540

         [ 1  2104  3 ]        [ 400 ]
         [ 1  1600  3 ]        [ 330 ]
     X = [ 1  2400  3 ]    y = [ 369 ]
         [ 1  1416  2 ]        [ 232 ]
         [ 1  3000  4 ]        [ 540 ]

     import breeze.linalg._
     // DenseMatrix is filled in column-major order: the column of 1s, then x1, then x2
     val X = new DenseMatrix(5, 3, Array(1.0, 1.0, 1.0, 1.0, 1.0,
                                         2104.0, 1600.0, 2400.0, 1416.0, 3000.0,
                                         3.0, 3.0, 3.0, 2.0, 4.0))
     val y = new DenseVector(Array(400.0, 330.0, 369.0, 232.0, 540.0))
     23 / 81

  25. Normal Equation - Example (3/7)

            [ 1     1     1     1     1    ]   [ 1  2104  3 ]   [ 5      10520     15    ]
     X⊺X =  [ 2104  1600  2400  1416  3000 ] × [ 1  1600  3 ] = [ 10520  23751872  33144 ]
            [ 3     3     3     2     4    ]   [ 1  2400  3 ]   [ 15     33144     47    ]
                                               [ 1  1416  2 ]
                                               [ 1  3000  4 ]

     val Xt = X.t
     val XtX = Xt * X
     24 / 81

  26. Normal Equation - Example (4/7)

                 [  4.90366455e+00   7.48766737e-04  -2.09302326e+00 ]
     (X⊺X)⁻¹ =   [  7.48766737e-04   2.75281889e-06  -2.18023256e-03 ]
                 [ -2.09302326e+00  -2.18023256e-03   2.22674419e+00 ]

     val XtXInv = inv(XtX)
     25 / 81

  27. Normal Equation - Example (5/7)

            [ 1     1     1     1     1    ]   [ 400 ]   [ 1871    ]
     X⊺y =  [ 2104  1600  2400  1416  3000 ] × [ 330 ] = [ 4203712 ]
            [ 3     3     3     2     4    ]   [ 369 ]   [ 5921    ]
                                               [ 232 ]
                                               [ 540 ]

     val Xty = Xt * y
     26 / 81

  28. Normal Equation - Example (6/7)

                        [  4.90366455e+00   7.48766737e-04  -2.09302326e+00 ]   [ 1871    ]   [ -7.04346018e+01 ]
     w = (X⊺X)⁻¹X⊺y =   [  7.48766737e-04   2.75281889e-06  -2.18023256e-03 ] × [ 4203712 ] = [  6.38433756e-02 ]
                        [ -2.09302326e+00  -2.18023256e-03   2.22674419e+00 ]   [ 5921    ]   [  1.03436047e+02 ]

     val w = XtXInv * Xty
     27 / 81

  29. Normal Equation - Example (7/7)
     ◮ Predict the value of ŷ when x1 = 4000 and x2 = 4.
        ŷ = -7.04346018e+01 + 6.38433756e-02 × 4000 + 1.03436047e+02 × 4 ≈ 599

     val test = new DenseVector(Array(1.0, 4000.0, 4.0))
     val yHat = w dot test
     28 / 81

  30. Normal Equation in Spark

     case class house(x1: Long, x2: Long, y: Long)

     val trainData = Seq(house(2104, 3, 400), house(1600, 3, 330), house(2400, 3, 369),
                         house(1416, 2, 232), house(3000, 4, 540)).toDF
     val testData = Seq(house(4000, 4, 0)).toDF

     import org.apache.spark.ml.feature.VectorAssembler
     val va = new VectorAssembler().setInputCols(Array("x1", "x2")).setOutputCol("features")
     val train = va.transform(trainData)
     val test = va.transform(testData)

     import org.apache.spark.ml.regression.LinearRegression
     val lr = new LinearRegression().setFeaturesCol("features").setLabelCol("y").setSolver("normal")
     val lrModel = lr.fit(train)
     lrModel.transform(test).show
     29 / 81
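
     As a quick check that Spark's "normal" solver recovers the same parameters as the hand-computed Breeze solution above, the fitted model can be inspected through the standard LinearRegressionModel accessors:

        println(lrModel.intercept)                      // about -70.43
        println(lrModel.coefficients)                   // about [0.0638, 103.44]
        println(lrModel.summary.rootMeanSquaredError)   // training-set RMSE of the fit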
