Short Course: Robust Optimization and Machine Learning. Lecture 6: Robust Optimization in Supervised Learning

  1. Short Course: Robust Optimization and Machine Learning. Lecture 6: Robust Optimization in Supervised Learning. Laurent El Ghaoui, EECS and IEOR Departments, UC Berkeley. Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012.

  2. Outline
     ◮ Robust Supervised Learning: Motivations; Examples; Thresholding and robustness; Boolean data.
     ◮ Theory: Preliminaries; Main results; Special cases; Globalized robustness; Chance constraints.
     ◮ References.

  4. Supervised learning problems. Many supervised learning problems (e.g., classification, regression) can be written as
     $$\min_{w} \ L(X^T w),$$
     where $L$ is convex and $X$ contains the data.

  5. Penalty approach. The optimal value and solutions of optimization problems are often sensitive to the data. A common way to deal with this sensitivity is penalization, e.g.:
     $$\min_{w} \ L(X^T w) + \|W w\|_2^2 \quad (W = \text{weighting matrix}).$$
     ◮ How do we choose the penalty?
     ◮ Can we choose it in a way that reflects knowledge about problem structure, or how uncertainty affects the data?
     ◮ Does it lead to better solutions from a machine learning viewpoint?

  6. Support Vector Machine. Support Vector Machine (SVM) classification problem:
     $$\min_{w,b} \ \sum_{i=1}^m (1 - y_i(z_i^T w + b))_+$$
     ◮ $Z := [z_1, \dots, z_m] \in \mathbb{R}^{n \times m}$ contains the data points.
     ◮ $y \in \{-1, 1\}^m$ contains the labels.
     ◮ $x := (w, b)$ contains the classifier parameters, which classify a new point $z$ via the rule $y = \mathrm{sgn}(z^T w + b)$.
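As a minimal sketch of the SVM objective and decision rule on this slide, the following evaluates the hinge-loss sum and classifies a new point. The toy data, the helper names, and the choice to map a zero score to +1 are assumptions made for illustration; they are not taken from the lecture.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge_objective(w, b, points, labels):
    """Sum over i of (1 - y_i (z_i^T w + b))_+ ."""
    return sum(max(0.0, 1.0 - y * (dot(z, w) + b))
               for z, y in zip(points, labels))

def classify(w, b, z):
    """Decision rule y = sgn(z^T w + b); a zero score is mapped to +1 here."""
    return 1 if dot(z, w) + b >= 0 else -1

# Toy data, made up for illustration: each entry of `points` is one z_i.
points = [(2.0, 1.0), (1.5, 2.0), (-1.0, -1.5), (-2.0, -1.0)]
labels = [1, 1, -1, -1]

# A classifier that separates the data with margin >= 1 has zero loss.
loss_separating = hinge_objective((1.0, 1.0), 0.0, points, labels)
# Shrinking w keeps the same decision boundary but violates the margins.
loss_small_w = hinge_objective((0.1, 0.1), 0.0, points, labels)
predicted = classify((1.0, 1.0), 0.0, (0.5, 0.25))
```

Both choices of w give the same separating line, but only the first meets every margin constraint, which is why the hinge loss alone does not pin down the scale of w.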

  7. Robustness to data uncertainty. Assume the data matrix is only partially known, and address the robust optimization problem:
     $$\min_{w,b} \ \max_{U \in \mathcal{U}} \ \sum_{i=1}^m (1 - y_i((z_i + u_i)^T w + b))_+ ,$$
     where $U = [u_1, \dots, u_m]$ and $\mathcal{U} \subseteq \mathbb{R}^{n \times m}$ is a set that describes additive uncertainty in the data matrix.

  8. Measurement-wise, spherical uncertainty. Assume
     $$\mathcal{U} = \{ U = [u_1, \dots, u_m] \in \mathbb{R}^{n \times m} : \|u_i\|_2 \le \rho \},$$
     where $\rho > 0$ is given. Robust SVM reduces to
     $$\min_{w,b} \ \sum_{i=1}^m (1 - y_i(z_i^T w + b) + \rho \|w\|_2)_+ .$$
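The reduction on this slide can be checked numerically for a single data point. The sketch below assumes that the worst-case perturbation is u = -ρ y w / ||w||_2 (the Cauchy-Schwarz maximizer of -y uᵀw over the ball) and compares the resulting hinge loss with the closed form; the numbers and helper names are made up for illustration.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge(w, b, z, y):
    return max(0.0, 1.0 - y * (dot(z, w) + b))

def robust_hinge_sphere(w, b, z, y, rho):
    """Closed form (1 - y (z^T w + b) + rho ||w||_2)_+ from the slide."""
    return max(0.0, 1.0 - y * (dot(z, w) + b) + rho * math.hypot(*w))

w, b = (3.0, -4.0), 0.5          # ||w||_2 = 5
z, y, rho = (0.2, 0.1), 1, 0.1

# Candidate worst case: move z against its label along the direction of w.
norm_w = math.hypot(*w)
u_star = tuple(-rho * y * wj / norm_w for wj in w)
worst = hinge(w, b, tuple(zj + uj for zj, uj in zip(z, u_star)), y)
closed = robust_hinge_sphere(w, b, z, y, rho)

# Random perturbations of norm rho should never exceed the closed form.
random.seed(0)
sampled_max = 0.0
for _ in range(1000):
    angle = random.uniform(0.0, 2.0 * math.pi)
    u = (rho * math.cos(angle), rho * math.sin(angle))
    sampled_max = max(sampled_max, hinge(w, b, (z[0] + u[0], z[1] + u[1]), y))
```

Because the perturbations u_i are constrained independently, the same argument applies to each term of the sum, which is what makes this uncertainty model measurement-wise.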

  9. Link with classical SVM. Classical SVM contains an $\ell_2$-norm regularization term:
     $$\min_{w,b} \ \sum_{i=1}^m (1 - y_i(z_i^T w + b))_+ + \lambda \|w\|_2^2 ,$$
     where $\lambda > 0$ is a penalty parameter. With spherical uncertainty, robust SVM is similar to classical SVM. When the data is separable, the two models are equivalent...

  10. Separable data. Figure: maximally robust classifier for separable data, with spherical uncertainties around each data point. In this case, the robust counterpart reduces to the classical maximum-margin classifier problem.

  11. Interval uncertainty. Assume
     $$\mathcal{U} = \{ U \in \mathbb{R}^{n \times m} : \forall (i,j), \ |U_{ij}| \le \rho \},$$
     where $\rho > 0$ is given. Robust SVM reduces to
     $$\min_{w,b} \ \sum_{i=1}^m (1 - y_i(z_i^T w + b) + \rho \|w\|_1)_+ .$$
     The $\ell_1$-norm term encourages sparsity, and may not regularize the solution.
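The interval-uncertainty reduction can be checked by brute force on a small example. The hinge loss is convex in the perturbation, so its maximum over the box is attained at a corner; the sketch below enumerates all corners and compares the result with the closed form. The values of w, z, y, and ρ are made up for illustration.

```python
import itertools

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge(w, b, z, y):
    return max(0.0, 1.0 - y * (dot(z, w) + b))

def robust_hinge_box(w, b, z, y, rho):
    """Closed form (1 - y (z^T w + b) + rho ||w||_1)_+ from the slide."""
    return max(0.0, 1.0 - y * (dot(z, w) + b) + rho * sum(abs(wj) for wj in w))

w, b = (1.0, -2.0, 0.0), 0.0
z, y, rho = (0.5, 0.1, 3.0), -1, 0.2

# Enumerate the 2^n corners of the box |u_j| <= rho and keep the worst loss.
corners = itertools.product((-rho, rho), repeat=len(w))
brute = max(hinge(w, b, tuple(zj + uj for zj, uj in zip(z, u)), y)
            for u in corners)
closed = robust_hinge_box(w, b, z, y, rho)
```

The third coordinate of w is zero, so perturbing the third feature costs nothing; coordinates with nonzero weight are the ones that pay the ρ|w_j| penalty, which is the mechanism behind the sparsity remark.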

  12. Separable data. Figure: maximally robust classifier for separable data, with box uncertainties around each data point. This uncertainty model encourages sparsity of the solution.

  13. Other uncertainty models. We may generalize the approach to other uncertainty models while retaining tractability:
     ◮ "Measurement-wise" uncertainty models: perturbations affect each data point independently of the others.
     ◮ Other models couple the way uncertainties affect each measurement; for example, we may control the number of errors across all the measurements.
     ◮ Norm-bound models allow for uncertainty in the data matrix that is bounded in matrix norm.
     ◮ A whole theory is presented in [1].

  14. Thresholding and robustness. Consider the standard $\ell_1$-penalized SVM:
     $$\phi_\lambda(X) := \min_{w,b} \ \frac{1}{m} \sum_{i=1}^m (1 - y_i(w^T x_i + b))_+ + \lambda \|w\|_1 .$$
     Constrained counterpart:
     $$\psi_c(X) := \min_{w,b} \ \frac{1}{m} \sum_{i=1}^m (1 - y_i(x_i^T w + b))_+ \ : \ \|w\|_1 \le c .$$
     ◮ Basic goal: solve these problems in the large-scale case.
     ◮ Approach: use robustness to sparsify the data matrix in a controlled way.

  15. Thresholding data. We threshold the data using an absolute level $t$:
     $$(x_i(t))_j := \begin{cases} 0 & \text{if } |x_{i,j}| \le t, \\ 1 & \text{otherwise.} \end{cases}$$
     This will make the data sparser, resulting in memory and time savings.
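A minimal sketch of the thresholding step, on a made-up data matrix stored as nested lists. `threshold_indicator` follows the rule exactly as written on the slide. `threshold_keep` is an assumed variant, not taken from the slide, that keeps the surviving entries at their original values; it is included because it changes every entry by at most t.

```python
def threshold_indicator(X, t):
    """Rule as written on the slide: 0 if |x_ij| <= t, 1 otherwise."""
    return [[0.0 if abs(x) <= t else 1.0 for x in row] for row in X]

def threshold_keep(X, t):
    """Assumed variant (not from the slide): zero small entries, keep the rest."""
    return [[0.0 if abs(x) <= t else x for x in row] for row in X]

def nonzeros(X):
    """Number of nonzero entries, a rough proxy for storage cost."""
    return sum(1 for row in X for x in row if x != 0.0)

def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Made-up data for illustration.
X = [[0.02, -1.30, 0.00, 0.40],
     [-0.05, 0.07, 2.10, -0.01]]
t = 0.1

X_ind = threshold_indicator(X, t)
X_keep = threshold_keep(X, t)
```

On this example the number of nonzeros drops from 7 to 3 under either rule, and `max_abs_diff(X, X_keep)` stays below t.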

  16. Handling thresholding errors. Handle thresholding errors via the robust counterpart:
     $$(w(t), b(t)) := \arg\min_{w,b} \ \max_{\|Z - X\|_\infty \le t} \ \frac{1}{m} \sum_{i=1}^m (1 - y_i(w^T z_i + b))_+ + \lambda \|w\|_1 .$$
     The above problem is tractable. The solution $w(t)$ at threshold level $t$ satisfies
     $$0 \le \frac{1}{m} \sum_{i=1}^m (1 - y_i(x_i^T w(t) + b(t)))_+ + \lambda \|w(t)\|_1 - \phi_\lambda(X) \le \frac{2t}{\lambda} .$$
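One way to see why the robust counterpart is tractable is to apply the interval-uncertainty result from slide 11 with ρ = t. The sketch below assumes that $\|Z - X\|_\infty$ denotes the largest entry of $Z - X$ in absolute value, so that each column $z_i$ ranges over a box of half-width $t$ around $x_i$.

```latex
\begin{align*}
\max_{\|Z - X\|_\infty \le t} \ \sum_{i=1}^m \bigl(1 - y_i(w^T z_i + b)\bigr)_+
  &= \sum_{i=1}^m \ \max_{\|u_i\|_\infty \le t} \bigl(1 - y_i(w^T x_i + b) - y_i\, w^T u_i\bigr)_+ \\
  &= \sum_{i=1}^m \bigl(1 - y_i(w^T x_i + b) + t\,\|w\|_1\bigr)_+ .
\end{align*}
```

The first step uses the fact that the constraint decouples across columns. The second uses that the hinge is nondecreasing and that the maximum of $-y_i\, w^T u_i$ over the box is $t\,\|w\|_1$, since $|y_i| = 1$. Each term is then a convex function of $(w, b)$, so adding $\lambda \|w\|_1$ gives a convex problem that can be written as a linear program.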
