Calibrated Surrogate Losses for Adversarially Robust Classification
Han Bao (1,2), Clayton Scott (3), Masashi Sugiyama (2,1)
(1) The University of Tokyo, (2) RIKEN AIP, (3) University of Michigan
COLT 2020, July 9th-12th
Adversarial Attacks
Adding imperceptibly small noise can fool classifiers! [Goodfellow+ 2015]
[Figure: original data vs. perturbed data]
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR, 2015.
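Penalize Vulnerable Prediction
A prediction too close to the decision boundary should be penalized.
Usual classification (usual 0-1 loss): $\ell_{01}(x, y, f) = \begin{cases} 1 & \text{if } y f(x) \le 0 \\ 0 & \text{otherwise} \end{cases}$
Robust classification (robust 0-1 loss): $\ell_\gamma(x, y, f) = \begin{cases} 1 & \text{if } \exists \Delta \in \mathbb{B}_2(\gamma).\ y f(x + \Delta) \le 0 \\ 0 & \text{otherwise} \end{cases}$
$\mathbb{B}_2(\gamma) = \{ x \in \mathbb{R}^d \mid \|x\|_2 \le \gamma \}$: $\gamma$-ball
[Figure: under the usual 0-1 loss, correctly classified points receive no penalty; under the robust 0-1 loss, a correctly classified point whose $\gamma$-ball crosses the boundary is penalized.]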
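In Case of Linear Predictors
Linear predictors: $\mathcal{F}_{\mathrm{lin}} = \{ x \mapsto \theta^\top x \mid \|\theta\|_2 = 1 \}$
Robust 0-1 loss: $\ell_\gamma(x, y, f) = \begin{cases} 1 & \text{if } \exists \Delta \in \mathbb{B}_2(\gamma).\ y f(x + \Delta) \le 0 \\ 0 & \text{otherwise} \end{cases} = \mathbb{1}\{ y f(x) \le \gamma \} := \phi_\gamma(y f(x))$
[Figure: margin of $x$ is $\theta^\top x$; $\theta^\top x > \gamma$: no penalty; $\theta^\top x \le \gamma$: penalized!]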
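The closed form on this slide follows from Cauchy-Schwarz: for a unit-norm $\theta$, the perturbation $\Delta = -\gamma y \theta$ lowers the margin $y\theta^\top x$ by exactly $\gamma$, and no perturbation in the $\gamma$-ball lowers it further. The sketch below checks this on one data point. The helper names and the numbers are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def robust_01_loss_linear(theta, x, y, gamma):
    """Closed form 1{y * theta^T x <= gamma}; assumes ||theta||_2 = 1."""
    return 1 if y * dot(theta, x) <= gamma else 0

def worst_case_margin(theta, x, y, gamma):
    """Margin after the perturbation Delta = -gamma * y * theta (norm gamma)."""
    x_adv = [xi - gamma * y * ti for xi, ti in zip(x, theta)]
    return y * dot(theta, x_adv)

# Hypothetical example: unit-norm weights and a point with margin 1.0.
theta = [0.6, 0.8]
x, y = [1.0, 0.5], +1

loss_small_gamma = robust_01_loss_linear(theta, x, y, gamma=0.5)  # margin > gamma
loss_large_gamma = robust_01_loss_linear(theta, x, y, gamma=1.5)  # margin <= gamma
```

With these values the point is safe for $\gamma = 0.5$ and penalized for $\gamma = 1.5$, and the worst-case margin equals the clean margin minus $\gamma$.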
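Formulation of Classification
Usual classification: minimize the 0-1 risk $R_{\phi_{01}}(f) = \mathbb{E}[\phi_{01}(Y f(X))]$, with 0-1 loss $\phi_{01}(\alpha) = \mathbb{1}\{\alpha \le 0\}$.
Robust classification (restricted to linear predictors): minimize the $\gamma$-robust 0-1 risk $R_{\phi_\gamma}(f) = \mathbb{E}[\phi_\gamma(Y f(X))]$, with robust 0-1 loss $\phi_\gamma(\alpha) = \mathbb{1}\{\alpha \le \gamma\}$.
$\phi_{01}$ and $\phi_\gamma$ are not easy to optimize!
[Figure: plots of $\phi_{01}$ (regions: wrong / correct) and $\phi_\gamma$ (regions: wrong / non-robust / correct).]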
What surrogate is desirable?
Target loss (0-1 loss) $\phi_{01}$: the final learning criterion.
Surrogate loss $\phi$: easily optimizable.
Calibrated surrogate: as $f_m \to f_\infty$, convergence of the surrogate risk $R_\phi(f_m) \to R^*_\phi$ implies convergence of the target risk $R_\psi(f_m) \to R^*_\psi$.
What surrogate is calibrated?
Usual classification (0-1 loss $\phi_{01}$): a surrogate $\phi$ is calibrated if it is convex and $\phi'(0) < 0$ [Bartlett+ 2006].
Robust classification (robust 0-1 loss $\phi_\gamma$): which surrogate $\phi$ is calibrated?
[Figure: a surrogate $\phi$ plotted over $\phi_{01}$ (regions: wrong / correct) and over $\phi_\gamma$ (regions: wrong / non-robust / correct).]
P. L. Bartlett, M. I. Jordan, & J. D. McAuliffe. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138-156.
Short Course on Calibration Analysis: how to analyze loss calibration property
Ingo Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 2007.
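Conditional Risk and Calibration
Conditional risk = risk at a single $x$.
$R_\phi(f) = \mathbb{E}_X\left[ \mathbb{P}(Y = +1 \mid X)\, \phi(f(X)) + \mathbb{P}(Y = -1 \mid X)\, \phi(-f(X)) \right]$
With $\eta := \mathbb{P}(Y = +1 \mid X)$ (class prob.) and $\alpha := f(X)$ (prediction):
$C_\phi(\alpha, \eta) := \eta \phi(\alpha) + (1 - \eta) \phi(-\alpha)$
Definition. A surrogate $\phi$ is $(\psi, \mathcal{F})$-calibrated for a target loss $\psi$ if for any $\varepsilon > 0$, there exists $\delta > 0$ such that for all $\alpha \in A_\mathcal{F}$ and $\eta \in [0, 1]$,
$C_\phi(\alpha, \eta) - C^*_{\phi, \mathcal{F}}(\eta) < \delta \implies C_\psi(\alpha, \eta) - C^*_{\psi, \mathcal{F}}(\eta) < \varepsilon$
(surrogate excess conditional risk on the left, target excess conditional risk on the right), where $A_\mathcal{F} := \{ f(x) \mid f \in \mathcal{F}, x \in \mathcal{X} \}$ and $C^*_{\phi, \mathcal{F}}(\eta) := \inf_{\alpha \in A_\mathcal{F}} C_\phi(\alpha, \eta)$.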
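To make the conditional risk concrete, here is a minimal sketch that evaluates $C_\phi(\alpha, \eta) = \eta\phi(\alpha) + (1-\eta)\phi(-\alpha)$ for the hinge loss and locates its minimizer on a grid. The grid range, the grid resolution, and the value $\eta = 0.8$ are assumptions made for illustration, and the grid search only approximates the infimum over all real $\alpha$.

```python
def hinge(a):
    """Hinge loss [1 - a]_+."""
    return max(0.0, 1.0 - a)

def conditional_risk(phi, alpha, eta):
    """C_phi(alpha, eta) = eta * phi(alpha) + (1 - eta) * phi(-alpha)."""
    return eta * phi(alpha) + (1.0 - eta) * phi(-alpha)

def grid_minimizer(phi, eta, lo=-3.0, hi=3.0, steps=600):
    """Approximate argmin of the conditional risk over an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: conditional_risk(phi, a, eta))

eta = 0.8                                   # hypothetical class probability
alpha_star = grid_minimizer(hinge, eta)     # minimizer of the hinge conditional risk
risk_star = conditional_risk(hinge, alpha_star, eta)
```

For $\eta = 0.8$ the hinge conditional risk equals $1 - 0.6\alpha$ on $[-1, 1]$ and increases beyond $\alpha = 1$, so the grid minimizer sits at $\alpha = 1$ with value $2(1 - \eta) = 0.4$.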
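Main Tool: Calibration Function
Definition. (calibration function)
$\delta(\varepsilon) = \inf_{\eta \in [0, 1]} \inf_{\alpha \in A_\mathcal{F}} C_\phi(\alpha, \eta) - C^*_{\phi, \mathcal{F}}(\eta) \quad \text{s.t.} \quad C_\psi(\alpha, \eta) - C^*_{\psi, \mathcal{F}}(\eta) \ge \varepsilon$
(the objective is the surrogate excess conditional risk; the constraint is on the target excess conditional risk), where $A_\mathcal{F} := \{ f(x) \mid f \in \mathcal{F}, x \in \mathcal{X} \}$.
■ Provides iff condition
▶ $(\psi, \mathcal{F})$-calibrated $\iff$ $\delta(\varepsilon) > 0$ for all $\varepsilon > 0$
■ Provides excess risk bound
▶ $(\psi, \mathcal{F})$-calibrated $\implies$ $R_\psi(f) - R^*_\psi \le (\delta^{**})^{-1}\left( R_\phi(f) - R^*_\phi \right)$ (target excess risk bounded by surrogate excess risk)
$\delta^{**}$: biconjugate of $\delta$, monotonically increasing.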
Example: Binary Classification ($\phi_{01}$, $\mathcal{F}_{\mathrm{all}}$)
$\mathcal{F}_{\mathrm{all}}$: all measurable functions
Theorem. [Bartlett+ 2006] If a surrogate $\phi$ is convex, it is $(\phi_{01}, \mathcal{F}_{\mathrm{all}})$-calibrated iff
▶ it is differentiable at 0
▶ $\phi'(0) < 0$
Hinge loss: $\phi(\alpha) = [1 - \alpha]_+$, $\delta(\varepsilon) = \varepsilon$
Squared loss: $\phi(\alpha) = (1 - \alpha)^2$, $\delta(\varepsilon) = \varepsilon^2$
[Figure: plots of $\delta$ against $\varepsilon$ on $[0, 1]$ for each loss.]
P. L. Bartlett, M. I. Jordan, & J. D. McAuliffe. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138-156.
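The slope condition $\phi'(0) < 0$ can be sanity-checked numerically for the two losses on this slide. The sketch below uses a central finite difference, which is an assumption of this example rather than part of the theorem: it estimates the slope at 0 but does not itself verify convexity or differentiability. The step size `h` is an arbitrary choice.

```python
def hinge(a):
    """Hinge loss [1 - a]_+."""
    return max(0.0, 1.0 - a)

def squared(a):
    """Squared loss (1 - a)^2."""
    return (1.0 - a) ** 2

def central_diff(phi, at=0.0, h=1e-6):
    """Central finite-difference estimate of phi'(at)."""
    return (phi(at + h) - phi(at - h)) / (2.0 * h)

hinge_slope = central_diff(hinge)      # analytic value: -1
squared_slope = central_diff(squared)  # analytic value: -2
```

Both estimated slopes are negative, consistent with both losses being calibrated for the usual 0-1 loss.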
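Analysis of Robust Classification
Are there any convex calibrated surrogates $\phi$ for the robust 0-1 loss $\phi_\gamma$ (restricted to linear predictors)?
[Figure: a surrogate $\phi$ plotted over $\phi_\gamma$ (regions: wrong / non-robust / correct).]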
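No convex calibrated surrogate
Theorem. Any convex surrogate is not $(\phi_\gamma, \mathcal{F}_{\mathrm{lin}})$-calibrated.
Proof Sketch. The surrogate conditional risk is convex in $\alpha$, so for $\eta \approx 1/2$ its minimizer is non-robust ($|\alpha| \le \gamma$). Calibration function:
$\delta(\varepsilon) = \inf_{\eta \in [0, 1]} \inf_{\alpha \in A_\mathcal{F}} C_\phi(\alpha, \eta) - C^*_{\phi, \mathcal{F}}(\eta) \quad \text{s.t.} \quad C_{\phi_\gamma}(\alpha, \eta) - C^*_{\phi_\gamma, \mathcal{F}}(\eta) \ge \varepsilon$
[Figure: surrogate conditional risk plotted against $\alpha$ for $\eta \approx 1$, $\eta \approx 0$, and $\eta \approx 1/2$, with regions wrong / non-robust ($-\gamma \le \alpha \le \gamma$) / correct; for $\eta \approx 1/2$ the minimizer is non-robust.]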
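A small numerical illustration of the proof sketch, using the squared loss as one convex surrogate. The values $\gamma = 0.3$ and $\eta = 0.55$ and the grid are hypothetical, and $\alpha$ is treated as a free real number. The squared-loss conditional risk is minimized at $\alpha = 2\eta - 1$, which falls inside the non-robust region $|\alpha| \le \gamma$ when $\eta$ is close to $1/2$, so the surrogate minimizer has a large target excess conditional risk.

```python
def squared(a):
    """Squared loss (1 - a)^2, a convex surrogate."""
    return (1.0 - a) ** 2

def robust_01(a, gamma):
    """Robust 0-1 loss 1{a <= gamma}."""
    return 1.0 if a <= gamma else 0.0

def conditional_risk(phi, alpha, eta):
    """C_phi(alpha, eta) = eta * phi(alpha) + (1 - eta) * phi(-alpha)."""
    return eta * phi(alpha) + (1.0 - eta) * phi(-alpha)

gamma, eta = 0.3, 0.55                         # hypothetical values
grid = [i / 100 for i in range(-300, 301)]     # alpha in [-3, 3]

# Minimizer of the convex surrogate's conditional risk (analytically 2*eta - 1).
alpha_star = min(grid, key=lambda a: conditional_risk(squared, a, eta))

# Target (robust 0-1) conditional risk at that minimizer vs. the best on the grid.
target = lambda a: robust_01(a, gamma)
target_at_star = conditional_risk(target, alpha_star, eta)
target_best = min(conditional_risk(target, a, eta) for a in grid)
target_excess = target_at_star - target_best
```

Here the surrogate minimizer is $\alpha = 0.1$, inside $[-\gamma, \gamma]$, and its target excess conditional risk is about $0.55$ even though its surrogate excess is zero.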
How to find calibrated surrogate?
Idea. To make the conditional risk not minimized in the non-robust area, consider a surrogate $\phi$ such that the conditional risk is quasiconcave (all superlevels are convex).
[Figure: surrogate conditional risk plotted against $\alpha$ for $\eta \approx 1$, $\eta \approx 0$, and $\eta \approx 1/2$, with regions wrong / non-robust ($-\gamma \le \alpha \le \gamma$) / correct.]
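Example: Shifted Ramp Loss
Ramp loss: $\phi(\alpha) = \mathrm{clip}_{[0,1]}\left( \frac{1 - \alpha}{2} \right)$ (breakpoints at $\alpha = -1$ and $\alpha = 1$)
Shifted ramp loss: $\phi_\beta(\alpha) = \mathrm{clip}_{[0,1]}\left( \frac{1 - \alpha + \beta}{2} \right)$ (breakpoints at $\alpha = -1 + \beta$ and $\alpha = 1 + \beta$)
Assume $0 < \beta < 1 - \gamma$.
[Figure: conditional risk (for $\eta > 1/2$) and calibration function of the shifted ramp loss.]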
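The two losses on this slide can be written down directly. The sketch below also evaluates the conditional risk of the shifted ramp loss on a grid to show where its minimizer falls for one setting with $\eta > 1/2$. The values $\gamma = 0.3$, $\beta = 0.5$, $\eta = 0.55$ and the grid are hypothetical, and a single grid search illustrates the idea without proving calibration.

```python
def clip01(v):
    """Clip a value to the interval [0, 1]."""
    return min(1.0, max(0.0, v))

def ramp(a):
    """Ramp loss clip_[0,1]((1 - a) / 2)."""
    return clip01((1.0 - a) / 2.0)

def shifted_ramp(a, beta):
    """Shifted ramp loss clip_[0,1]((1 - a + beta) / 2), i.e. ramp(a - beta)."""
    return clip01((1.0 - a + beta) / 2.0)

def conditional_risk(phi, alpha, eta):
    """C_phi(alpha, eta) = eta * phi(alpha) + (1 - eta) * phi(-alpha)."""
    return eta * phi(alpha) + (1.0 - eta) * phi(-alpha)

gamma, beta, eta = 0.3, 0.5, 0.55              # hypothetical; 0 < beta < 1 - gamma
grid = [i / 100 for i in range(-300, 301)]     # alpha in [-3, 3]

phi_beta = lambda a: shifted_ramp(a, beta)
alpha_star = min(grid, key=lambda a: conditional_risk(phi_beta, a, eta))
```

In this setting the conditional risk is larger throughout $[-\gamma, \gamma]$ than at large positive $\alpha$, so the grid minimizer lands in the robust region $\alpha > \gamma$, unlike the minimizer of a convex surrogate near $\eta = 1/2$.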
Calibrated Surrogate Losses for Adversarially Robust Classification
Robust classification = minimize robust 0-1 loss.
Calibrated surrogate loss: minimizing surrogate $\implies$ minimizing target.
No convex calibrated surrogate under restriction to linear predictors, because the minimizer lies in the non-robust area.
Example: shifted ramp loss. Quasiconcavity is important.
[Figure: conditional risk under linear predictors at $\mathbb{P}(Y = +1 \mid X) = 1/2$, with regions wrong / non-robust / correct.]