  1. Learning Optimal Linear Regularizers (Matthew Streeter)

  2-3. Setup
  ● Want to produce a model θ
  ● Will minimize training loss + regularizer: L_train(θ) + R(θ)
  ● Ultimately, we care about test loss: L_test(θ)
  ● An optimal regularizer: R(θ) = L_test(θ) - L_train(θ)
    ○ suggests that a good regularizer should upper bound the generalization gap (a toy sketch of this setup follows below)
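
A toy sketch of the setup above, assuming a linear-regression model and an L2 penalty for R (the data, sizes, and choice of regularizer are illustrative assumptions, not from the slides): fit θ by minimizing L_train(θ) + R(θ), then report L_test(θ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data (illustrative assumption, not from the slides).
d = 10
theta_true = rng.normal(size=d)
X_train = rng.normal(size=(40, d))
y_train = X_train @ theta_true + rng.normal(scale=0.5, size=40)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ theta_true + rng.normal(scale=0.5, size=2000)

def L_train(theta):   # training loss (mean squared error)
    return np.mean((X_train @ theta - y_train) ** 2)

def L_test(theta):    # test loss: what we ultimately care about
    return np.mean((X_test @ theta - y_test) ** 2)

def fit(lam):
    """Minimize L_train(theta) + R(theta) with R(theta) = lam * ||theta||^2.

    For this quadratic objective the minimizer has a closed form (ridge regression).
    """
    n = len(y_train)
    A = X_train.T @ X_train / n + lam * np.eye(d)
    return np.linalg.solve(A, X_train.T @ y_train / n)

for lam in [0.0, 0.1, 1.0]:
    theta = fit(lam)
    print(f"lam={lam:>4}: L_train={L_train(theta):.3f}  L_test={L_test(theta):.3f}")
```

The "optimal regularizer" remark is visible in this setup: if we could set R(θ) = L_test(θ) - L_train(θ), then minimizing L_train(θ) + R(θ) would be exactly minimizing L_test(θ).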

  4-7. What makes a good regularizer?
  ● Want to find the regularizer R that minimizes L_test(θ_R), where θ_R is the model obtained by minimizing L_train(θ) + R(θ)
  ● Approximate by maximizing over a small set of models, estimating test loss using a validation set (see the sketch below)
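
One way to realize this finite-set approximation (my reading of the slide, with made-up data and an L2 family standing in for R): for each candidate regularizer, find the model in the set that minimizes L_train + R, and score the regularizer by that model's validation loss.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite candidate set: models with known parameters, training
# loss, and validation loss (validation loss stands in for test loss).
models = []
for train_loss in rng.uniform(0.1, 1.0, size=30):
    models.append({
        "theta": rng.normal(size=5),
        "train": float(train_loss),
        "val": float(train_loss + rng.uniform(0.0, 0.5)),  # crude generalization gap
    })

def argmin_regularized(R):
    """Model the learner would pick when minimizing L_train(theta) + R(theta)."""
    return min(models, key=lambda m: m["train"] + R(m["theta"]))

def make_R(lam):
    """Candidate regularizer family: an L2 penalty of strength lam (illustrative)."""
    return lambda theta: lam * float(np.sum(theta ** 2))

# Score each candidate R by the validation loss of the model it selects,
# and keep the best one -- an estimate of the R minimizing L_test(theta_R).
best_lam = min(np.linspace(0.0, 1.0, 21),
               key=lambda lam: argmin_regularized(make_R(lam))["val"])
print("selected regularization strength:", round(float(best_lam), 2))
```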

  8-12. Learning linear regularizers
  ● Linear regularizer: R(θ) = λ · feature_vector(θ)
  ● LearnReg: given models with known training & validation loss, finds the best λ (in terms of the approximation on the previous slide)
    ○ Solves a sequence of linear programs (a single-LP sketch of the idea appears below)
    ○ Under certain assumptions, can “jump” to the optimal λ given data from just 1 + |λ| models
  ● TuneReg: uses LearnReg iteratively to do hyperparameter tuning (a loop sketch appears after the experiment section)
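
A minimal sketch of the linear-programming idea (a single LP, not the paper's exact sequence of LPs; the random data, the nonnegativity constraint on λ, and the slack formulation are my assumptions): find λ so that the model with the best validation loss also minimizes the regularized training loss over the candidate set, with slack variables absorbing any violations.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

# Hypothetical inputs: n models, each with a training loss, a validation loss,
# and a k-dimensional feature vector phi(theta) (e.g. various norms of theta).
n, k = 30, 4
train = rng.uniform(0.1, 1.0, size=n)
val = train + rng.uniform(0.0, 0.5, size=n)
phi = rng.uniform(0.0, 2.0, size=(n, k))

star = int(np.argmin(val))  # the model we want the learned regularizer to select

# LP over variables x = (lambda, s): minimize the sum of slacks s_i subject to
#   train[star] + lambda . phi[star] <= train[i] + lambda . phi[i] + s_i
# for every other model i, with lambda >= 0 and s >= 0.
others = [i for i in range(n) if i != star]
A_ub = np.zeros((len(others), k + len(others)))
b_ub = np.zeros(len(others))
for row, i in enumerate(others):
    A_ub[row, :k] = phi[star] - phi[i]   # coefficient of lambda
    A_ub[row, k + row] = -1.0            # minus the slack for constraint i
    b_ub[row] = train[i] - train[star]
c = np.concatenate([np.zeros(k), np.ones(len(others))])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (k + len(others)))
lam = res.x[:k]
print("learned lambda:", np.round(lam, 3))
print("model picked by learned regularizer:", int(np.argmin(train + phi @ lam)),
      "| best-validation model:", star)
```

One way to read the "1 + |λ| models" remark in this picture: λ has |λ| unknowns, so on the order of |λ| active constraints, plus a reference model, can pin it down.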

  13-16. Hyperparameter tuning experiment
  ● Inception-v3 transfer learning problem, linear combination of 4 regularizers
  ● “LearnReg kicks in here” (annotation on the experiment figure)
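
For completeness, a rough sketch of an iterative tuning loop in the spirit of TuneReg (my paraphrase of "uses LearnReg iteratively", not the paper's exact procedure; the three callables are user-supplied and hypothetical):

```python
import numpy as np

def tune_reg(train_model, evaluate, learn_reg,
             num_features, num_random=5, num_rounds=10, seed=0):
    """Iterative hyperparameter tuning in the spirit of TuneReg (a sketch).

    The three callables are hypothetical and supplied by the user:
      train_model(lam)   -> a model trained with regularizer lam . phi(theta)
      evaluate(model)    -> (train_loss, val_loss, phi) for that model
      learn_reg(history) -> a new lam fit from the (train, val, phi) triples,
                            e.g. the LP sketch shown earlier
    """
    rng = np.random.default_rng(seed)
    history = []                                       # (train, val, phi) per model
    lam = rng.uniform(0.0, 1.0, size=num_features)     # start from a random lambda
    for round_ in range(num_rounds):
        history.append(evaluate(train_model(lam)))
        if round_ + 1 >= num_random:                   # "LearnReg kicks in here"
            lam = learn_reg(history)                   # refit from all models so far
        else:
            lam = rng.uniform(0.0, 1.0, size=num_features)  # random exploration first
    return lam, history
```

In this sketch, the branch inside the loop is where LearnReg "kicks in": the first few rounds try hyperparameters at random, and once enough models have been evaluated, each new λ is refit from all models seen so far.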
