Lecture 08: Ridge Regression, Equivalent Formulations and KKT Conditions
Instructor: Prof. Ganesh Ramakrishnan
February 4, 2016
Recap: Duality and KKT conditions

For the previously mentioned formulation of the problem, the KKT conditions, for all differentiable functions (i.e. $f$, $g_i$, $h_j$), with $\hat{w}$ primal optimal and $(\hat{\lambda}, \hat{\mu})$ dual optimal, are:

$$\nabla f(\hat{w}) + \sum_{i=1}^{m} \hat{\lambda}_i \nabla g_i(\hat{w}) + \sum_{j=1}^{p} \hat{\mu}_j \nabla h_j(\hat{w}) = 0$$
$$g_i(\hat{w}) \leq 0, \quad 1 \leq i \leq m$$
$$\hat{\lambda}_i \geq 0, \quad 1 \leq i \leq m$$
$$\hat{\lambda}_i g_i(\hat{w}) = 0, \quad 1 \leq i \leq m$$
$$h_j(\hat{w}) = 0, \quad 1 \leq j \leq p$$
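As a minimal worked illustration (added here, not part of the original slides): minimize $f(w) = w^2$ subject to $g(w) = 1 - w \leq 0$. Stationarity gives $2\hat{w} - \hat{\lambda} = 0$ and complementary slackness gives $\hat{\lambda}(1 - \hat{w}) = 0$. Taking $\hat{\lambda} = 0$ would force $\hat{w} = 0$, which is infeasible, so the constraint must be active: $\hat{w} = 1$, $\hat{\lambda} = 2 \geq 0$, and all four conditions hold.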
Bound on λ in the regularized least squares solution

To minimize the error function subject to the constraint $\|w\|^2 \leq \xi$, we apply the KKT conditions at the point of optimality $w^*$. Here, $f(w) = (\Phi w - y)^T (\Phi w - y)$ and $g(w) = \|w\|^2 - \xi$.

From the first KKT condition, $\nabla_{w^*}\left(f(w) + \lambda g(w)\right) = 0$. Solving, we get
$$w^* = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y$$
From the second KKT condition, $\|w^*\|^2 \leq \xi$.
From the third KKT condition, $\lambda \geq 0$.
From the fourth condition (complementary slackness), $\lambda \|w^*\|^2 = \lambda \xi$.
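A minimal numerical sketch of the first condition (not from the slides; the design matrix, targets, and the value of $\lambda$ below are assumed toy choices): the closed form $w^* = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T y$ should make the stationarity condition $2\Phi^T(\Phi w - y) + 2\lambda w = 0$ hold.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 3))   # assumed toy feature matrix
y = rng.standard_normal(20)          # assumed toy targets
lam = 0.5                            # assumed regularization weight

# Closed-form ridge solution (solve is preferred over forming the inverse)
w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)

# Gradient of f(w) + lambda * g(w) at w_star; should vanish
stationarity = 2 * Phi.T @ (Phi @ w_star - y) + 2 * lam * w_star
print(np.allclose(stationarity, 0))  # expected: True
```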
Bound on λ in the regularized least squares solution

Values of $w^*$ and $\lambda$ that satisfy all these equations would yield an optimal solution. Consider
$$(\Phi^T \Phi + \lambda I)^{-1} \Phi^T y = w^*$$
We multiply both sides by $(\Phi^T \Phi + \lambda I)$ and take norms to obtain
$$\|(\Phi^T \Phi) w^* + (\lambda I) w^*\| = \|\Phi^T y\|$$
Using the triangle inequality we obtain
$$\|(\Phi^T \Phi) w^*\| + \lambda \|w^*\| \geq \|(\Phi^T \Phi) w^* + (\lambda I) w^*\| = \|\Phi^T y\|$$
Bound on λ in the regularized least squares solution

Now, $\|(\Phi^T \Phi) w^*\| \leq \alpha \|w^*\|$ for some finite $\alpha$ (for instance, the operator norm of $\Phi^T \Phi$). Substituting in the previous inequality,
$$(\alpha + \lambda) \|w^*\| \geq \|\Phi^T y\|$$
$$\lambda \geq \frac{\|\Phi^T y\|}{\|w^*\|} - \alpha$$
Note that when $\|w^*\| \to 0$, $\lambda \to \infty$. (Any intuition?)
Using $\|w^*\|^2 \leq \xi$ we get
$$\lambda \geq \frac{\|\Phi^T y\|}{\sqrt{\xi}} - \alpha$$
This is not the exact value of $\lambda$, but the bound proves the existence of such a $\lambda$ for a given $\xi$ and $\Phi$.
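A quick numerical check of this bound (a sketch, not from the slides; it reuses the assumed toy $\Phi$, $y$, and $\lambda$ from the earlier snippet and takes $\alpha$ to be the spectral norm of $\Phi^T\Phi$, which guarantees $\|(\Phi^T\Phi)w\| \leq \alpha\|w\|$):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
lam = 0.5

w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)
alpha = np.linalg.norm(Phi.T @ Phi, 2)   # operator (spectral) norm

# lambda should exceed ||Phi^T y|| / ||w*|| - alpha
lower_bound = np.linalg.norm(Phi.T @ y) / np.linalg.norm(w_star) - alpha
print(lam >= lower_bound)                # expected: True
```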
Alternative objective function

Substituting $g(w) = \|w\|^2 - \xi$ in the first KKT equation considered earlier:
$$\nabla_{w^*}\left(f(w) + \lambda \cdot (\|w\|^2 - \xi)\right) = 0$$
This is equivalent to solving
$$\min_{w} \left(\|\Phi w - y\|^2 + \lambda \|w\|^2\right)$$
for the same choice of $\lambda$. This form of regularized regression is often referred to as Ridge regression.
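To illustrate the equivalence, here is a small sketch (assumed toy data again): minimizing the penalized objective with a generic numerical optimizer recovers the same $w^*$ as the closed form derived from the constrained formulation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
lam = 0.5

# Ridge objective: ||Phi w - y||^2 + lambda ||w||^2
ridge_obj = lambda w: np.sum((Phi @ w - y) ** 2) + lam * np.sum(w ** 2)

w_numeric = minimize(ridge_obj, x0=np.zeros(3)).x
w_closed = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)
print(np.allclose(w_numeric, w_closed, atol=1e-4))  # expected: True
```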
Regression so far

Linear Regression:
▶ $y_i = w^\top \phi(x_i) + b + \epsilon_i$, where $y_i \in \mathbb{R}$ and $\epsilon_i$ is the error term
▶ Objective: $\min_{w, b} \sum_{i=1}^{n} \left(y_i - w^\top \phi(x_i) - b\right)^2$

Ridge Regression:
▶ $\min_{w, b} \sum_{i=1}^{n} \left(y_i - w^\top \phi(x_i) - b\right)^2 + \lambda \|w\|^2$
▶ Here, regularization is applied on the linear regression objective to reduce overfitting on the training examples (we penalize model complexity); a sketch of handling the bias term $b$ in practice follows below.
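One common way to handle the bias term $b$ (a sketch under assumptions, not prescribed by the slides; the toy data and coefficient values below are made up): append a constant-1 feature and leave its weight unpenalized, so the $\lambda$ penalty acts only on $w$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 + 0.1 * rng.standard_normal(30)
lam = 0.1

Phi = np.hstack([X, np.ones((30, 1))])     # last column carries the bias b
penalty = lam * np.diag([1.0, 1.0, 0.0])   # do not penalize the bias weight

w_b = np.linalg.solve(Phi.T @ Phi + penalty, Phi.T @ y)
print(w_b)  # roughly [3, -2, 1.5]
```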
Closed-form solutions to regression

Linear regression and Ridge regression both have closed-form solutions:
▶ For linear regression, $w^* = (\Phi^\top \Phi)^{-1} \Phi^\top y$
▶ For ridge regression, $w^* = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y$ (linear regression corresponds to $\lambda = 0$)
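A short sketch of the two closed forms side by side (assumed toy data): as $\lambda \to 0$ the ridge solution approaches the linear-regression solution, consistent with the note above.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

# Ordinary least squares: lambda = 0
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

for lam in (1.0, 1e-3, 1e-6):
    w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y)
    print(lam, np.linalg.norm(w_ridge - w_ols))  # gap shrinks toward 0
```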
Claim: The error obtained on the training data after minimizing the ridge regression objective is $\geq$ the error obtained on the training data after minimizing the linear regression objective.

Goal: Do well on unseen (test) data as well. Therefore, a higher training error might be acceptable if the test error can be lower.
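A minimal sketch of the claim (assumed toy data and an assumed $\lambda = 10$): on the training set, the ridge fit can never attain a lower squared error than the unregularized least-squares fit, since the latter minimizes exactly that error.

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.standard_normal((40, 5))
y = rng.standard_normal(40)

w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
w_ridge = np.linalg.solve(Phi.T @ Phi + 10.0 * np.eye(5), Phi.T @ y)

train_err = lambda w: np.sum((Phi @ w - y) ** 2)
print(train_err(w_ridge) >= train_err(w_ols))  # expected: True
```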