Exponentially weighted aggregation, Laplace prior for linear regression


  1. Exponentially weighted aggregation, Laplace prior for linear regression. Arnak Dalalyan, Edwin Grappin & Quentin Paris (edwin.grappin@ensae.fr). JPS - Les Houches - 2016.

  2. Linear regression: goals & settings. We observe $n$ labels $(Y_i)_{i \in \{1,\dots,n\}}$, and the labels are linked to the $p$ features $(X_{i,j})_{j \in \{1,\dots,p\}}$ by the linear relation

$$Y = X\beta^\star + \xi,$$

where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\beta^\star \in \mathbb{R}^p$, and $\xi \in \mathbb{R}^n$ is a random vector with $\xi_i \sim \mathcal{N}(0, \sigma^2)$. Our interests are:
- a low prediction loss $\|X(\beta^\star - \hat{\beta})\|_2^2$ (fitting $\beta^\star$ itself is less important),
- good quality when $p$ is large ($p \gg n$),
- efficient use of the sparsity of $\beta^\star$ ($\beta^\star$ is $s$-sparse if at most $s$ of its elements are nonzero).
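A minimal sketch of this setting in Python (assuming numpy; the sizes n, p, s and sigma below are illustrative, not taken from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, s, sigma = 50, 200, 5, 1.0           # illustrative: high dimension, p >> n

    X = rng.standard_normal((n, p))            # design matrix, n x p
    beta_star = np.zeros(p)                    # s-sparse true coefficient vector
    beta_star[:s] = rng.standard_normal(s)     # only s entries are nonzero
    xi = sigma * rng.standard_normal(n)        # noise with xi_i ~ N(0, sigma^2)
    Y = X @ beta_star + xi                     # observed labels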

  3. Least squares method. The ordinary least squares (OLS) estimator is defined by

$$\hat{\beta}^{OLS} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2.$$

OLS minimizes the sum of the squares of the residuals. Overfitting: if $p$ is very large, OLS gives poor predictions (see the sketch below):
- the solution is not unique when $p > n$,
- it does not detect the meaningful features among all the features,
- the criterion focuses on fitting the observed data, not on predicting new labels.
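A short sketch of this failure mode, continuing the simulated data above (np.linalg.lstsq returns one minimum-norm minimizer, since with p > n the minimizer is not unique):

    # OLS: minimize ||Y - X beta||_2^2 over beta
    beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

    fit_loss = np.sum((Y - X @ beta_ols) ** 2)             # ~0 when p > n: the data are interpolated
    pred_loss = np.sum((X @ (beta_star - beta_ols)) ** 2)  # prediction loss stays large
    print(fit_loss, pred_loss)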

  4. Penalized regression. In our setting, a good estimator has the following properties:
- guarantees on its prediction error,
- uses the sparsity assumption to handle $p > n$,
- is computationally fast (of paramount importance when $p$ is large).

Penalized regression combines the usual fitting term with a penalty term (a sketch of the generic criterion follows):

$$\hat{\beta}^{pen} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda P(\beta) \right\},$$

where $P$ is the penalty function and $\lambda \geq 0$ controls the trade-off between the two terms.
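The criterion is easy to state as a generic function of the penalty; a minimal sketch (the function and penalty names are mine, not the authors'):

    def penalized_objective(beta, X, Y, lam, penalty):
        # fitting term + lambda * penalty term P(beta)
        return np.sum((Y - X @ beta) ** 2) + lam * penalty(beta)

    # Two classical choices of P:
    l1_penalty = lambda b: np.sum(np.abs(b))   # ||b||_1 (Lasso)
    l2_penalty = lambda b: np.sum(b ** 2)      # ||b||_2^2 (ridge)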

  5. Subset selection with an $\ell_0$ penalization. An intuitive candidate is a penalization based on the $\ell_0$ pseudo-norm (the sparsity level):

$$\|\beta\|_0 = \sum_{i=1}^{p} \mathbb{1}\{\beta_i \neq 0\},$$

$$\hat{\beta}^{\ell_0} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_0 \right\}.$$

The penalty forces many elements of $\hat{\beta}$ to be null; it selects the most important features. However, due to the $\ell_0$ pseudo-norm, the objective function is nonconvex, so the computational time grows exponentially with $p$ (see the brute-force sketch below).
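Exact $\ell_0$ minimization amounts to a search over supports, which is where the exponential cost comes from; a brute-force sketch, only feasible for tiny p (the function name and the size cap k_max are mine):

    from itertools import combinations

    def best_subset(X, Y, lam, k_max):
        # arg min ||Y - X beta||_2^2 + lam * ||beta||_0 over supports of size <= k_max
        n, p = X.shape
        best_val, best_beta = np.sum(Y ** 2), np.zeros(p)   # baseline: empty support
        for k in range(1, k_max + 1):
            for support in combinations(range(p), k):       # C(p, k) supports: exponential in p
                idx = list(support)
                b, *_ = np.linalg.lstsq(X[:, idx], Y, rcond=None)
                val = np.sum((Y - X[:, idx] @ b) ** 2) + lam * k
                if val < best_val:
                    best_val = val
                    best_beta = np.zeros(p)
                    best_beta[idx] = b
        return best_beta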

  6. Choice of the penalization term. Let $q > 0$; we consider the estimators

$$\hat{\beta}_q = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_q^q \right\}.$$

- If $q < 1$, the solution is sparse but the problem is nonconvex.
- If $q > 1$, the problem is convex but the solution is not sparse.
- If $q = 1$, the solution is sparse and the problem is convex.
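The $q = 1$ case is the Lasso; a minimal sketch using scikit-learn on the simulated data above (the alpha value is illustrative, and sklearn scales the fitting term by 1/(2n), so alpha matches $\lambda$ only up to that factor):

    from sklearn.linear_model import Lasso

    # q = 1: convex problem whose solution is sparse
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, Y)
    print(np.count_nonzero(lasso.coef_), "nonzero coefficients out of", p)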
