Polynomial Optimization for Bounding Lipschitz Constants of Deep Networks

Victor Magron, MAC Team, CNRS–LAAS
Joint work with T. Chen, J.-B. Lasserre and E. Pauwels
IPAM, UCLA, 28 February 2020

[Figure: layered neural network with input, hidden and output layers.]
Lipschitz constant of neural networks

Applications: WGAN, certification.
Existing works: [Latorre et al.'18], based on linear programming (LP).
Network setting: $K$-classifier, ReLU network, $1 + m$ layers ($1$ input layer + $m$ hidden layers), weights $A_i$, biases $b_i$.
Score of label $k \le K$: $c_k^T x_m$, with $x_m$ the last activation vector.
Lipschitz constant of neural networks

Layers $x_0 \in \mathbb{R}^{p_0}$, $z_1 \in \mathbb{R}^{p_1}, \dots, z_m \in \mathbb{R}^{p_m}$, with $z_i = A_i x_{i-1} + b_i$ and $x_i = \mathrm{ReLU}(z_i)$.

Lipschitz constant:
$$L^f_{\|\cdot\|} = \inf \{ L : \forall x, y \in X, \; |f(x) - f(y)| \le L \|x - y\| \} = \sup \{ \|\nabla f(x)\|_* : x \in X \} = \sup \{ t^T \nabla f(x) : x \in X, \; \|t\| \le 1 \}$$

Gradient for a fixed label $k$:
$$\nabla f(x_0) = \Big( \prod_{i=1}^{m} A_i^T \, \mathrm{diag}\big(\mathrm{ReLU}'(z_i)\big) \Big) c_k$$
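The chain-rule product formula is easy to sanity-check numerically. Below is a minimal sketch (toy sizes and variable names of my own choosing, not the authors' code) that accumulates $\prod_i A_i^T \mathrm{diag}(\mathrm{ReLU}'(z_i))$ for a two-hidden-layer ReLU network and compares the resulting gradient against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [3, 5, 4]                                   # p0, p1, p2 (toy sizes)
A = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(2)]
b = [rng.standard_normal(dims[i + 1]) for i in range(2)]
c = rng.standard_normal(dims[-1])                  # score vector c_k

def f_and_grad(x0):
    """Return f(x0) = c^T x_m and grad f(x0) via the chain-rule product."""
    x, J = x0, np.eye(len(x0))                     # J accumulates prod A_i^T diag(ReLU'(z_i))
    for Ai, bi in zip(A, b):
        z = Ai @ x + bi
        J = J @ Ai.T @ np.diag(np.where(z >= 0, 1.0, 0.0))
        x = np.maximum(z, 0.0)                     # ReLU activation
    return c @ x, J @ c

x0 = rng.standard_normal(dims[0])
val, g = f_and_grad(x0)

# Finite-difference check (exact up to rounding, away from ReLU kinks).
eps = 1e-6
fd = np.array([(f_and_grad(x0 + eps * e)[0] - val) / eps for e in np.eye(dims[0])])
np.testing.assert_allclose(g, fd, rtol=1e-4, atol=1e-6)
print("chain-rule gradient matches finite differences")
```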
A polynomial optimization formulation

ReLU (left) & its semialgebraicity (right): [Figure: graph of $u = \max\{x, 0\}$ and of the set below, both over $x \in [-1, 1]$.]
$$u = \max\{x, 0\} \iff u(u - x) = 0, \; u \ge x, \; u \ge 0$$
A polynomial optimization formulation

ReLU' (left) & its semialgebraicity (right): [Figure: graph of $u = \mathbf{1}_{\{x \ge 0\}}$ and of the set below, both over $x \in [-1, 1]$.]
$$u = \mathbf{1}_{\{x \ge 0\}} \iff u(u - 1) = 0, \; (u - 1/2)\, x \ge 0$$
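Both encodings are exact, which a few lines of code can confirm on a grid. This is a quick sanity check of my own, not part of the method:

```python
import numpy as np

for x in np.linspace(-1.0, 1.0, 201):
    u = max(x, 0.0)                       # ReLU
    assert u * (u - x) == 0 and u >= x and u >= 0
    v = 1.0 if x >= 0 else 0.0            # ReLU' (subgradient choice at 0)
    assert v * (v - 1) == 0 and (v - 0.5) * x >= 0
print("both semialgebraic encodings hold on the grid")
```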
A polynomial optimization formulation

Local Lipschitz constant: $x_0 \in$ ball of center $\bar{x}_0$ and radius $\varepsilon$.

One single hidden layer ($m = 1$):
$$\sup_{x, u, z, t} \; t^T A^T \mathrm{diag}(u)\, c \quad \text{s.t.} \quad (z - Ax - b)^2 = 0, \;\; t^2 \le 1, \;\; (x - \bar{x}_0 + \varepsilon)(x - \bar{x}_0 - \varepsilon) \le 0, \;\; u(u - 1) = 0, \;\; (u - 1/2)\, z \ge 0$$

"Cheap" and "tight" upper bound?
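For tiny sizes, the value of this POP can be bracketed by brute force over activation patterns. The sketch below (my own code, with the $\ell_2$ norm chosen for concreteness) enumerates $u \in \{0,1\}^h$, keeps the patterns whose per-neuron sign constraint $(u_j - 1/2)(a_j^T x + b_j) \ge 0$ is satisfiable somewhere in the box $\|x - \bar{x}_0\|_\infty \le \varepsilon$ (checked neuron by neuron, so itself a relaxation), and maximizes $\|A^T \mathrm{diag}(u)\, c\|_2$, which upper-bounds the POP value.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
p, h, eps = 2, 4, 0.1
A = rng.standard_normal((h, p))
b, c = rng.standard_normal(h), rng.standard_normal(h)
x_bar = rng.standard_normal(p)

best = 0.0
for bits in product([0.0, 1.0], repeat=h):
    u = np.array(bits)
    # max over the box of (u_j - 1/2)(a_j . x + b_j), per neuron j:
    slack = (u - 0.5) * (A @ x_bar + b) + 0.5 * eps * np.abs(A).sum(axis=1)
    if np.all(slack >= 0):                               # pattern not ruled out
        best = max(best, np.linalg.norm(A.T @ (u * c)))  # sup over ||t||_2 <= 1
print(f"combinatorial upper bound on the local Lipschitz constant: {best:.3f}")
```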
The moment-sums of squares hierarchy

NP-hard nonconvex problem: $f^\star = \sup f(x)$.

Theory (infinite-dimensional LP):
(Primal) $\sup \int f \, d\mu$ over probability measures $\mu$ $\;\Longleftrightarrow\;$ (Dual) $\inf \lambda$ with $\lambda - f \ge 0$.

Practice (finite number, fixed degree, SDP):
(Primal relaxation) moments $\int x^\alpha \, d\mu$ $\;\Longleftrightarrow\;$ (Dual strengthening) $\lambda - f =$ sum of squares.

Lasserre's hierarchy of convex problems, with values $\uparrow f^\star$ [Lasserre/Parrilo 01].
Degree $d$ & $n$ variables $\Rightarrow \binom{n + 2d}{n}$ SDP variables.
Numeric solvers $\Rightarrow$ approximate certificate.
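To make the primal relaxation concrete, here is a minimal sketch (a toy example of my own, assuming cvxpy with an SDP-capable solver such as SCS is available) of the first step of the hierarchy for a degree-2 problem: the rank-1 matrix $xx^T$ is replaced by a PSD moment matrix $X$, giving Shor's SDP upper bound, compared with a sampling lower bound.

```python
import numpy as np
import cvxpy as cp

n = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2                      # indefinite symmetric objective matrix

# Nonconvex problem: max x^T Q x  s.t.  x_i^2 <= 1  (NP-hard in general).
# First-step moment relaxation: replace x x^T by a PSD matrix X.
X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.diag(X) <= 1]
prob = cp.Problem(cp.Maximize(cp.trace(Q @ X)), constraints)
upper = prob.solve()                   # certified upper bound on f*

# Cheap lower bound: evaluate at random feasible points.
samples = rng.uniform(-1, 1, size=(10_000, n))
lower = max(s @ Q @ s for s in samples)
print(f"lower {lower:.3f} <= f* <= upper {upper:.3f}")
```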
The sparse hierarchy [Waki, Lasserre 06]

Correlative sparsity pattern:
$$f = x_2 x_5 + x_3 x_6 - x_2 x_3 - x_5 x_6 + x_1 (-x_1 + x_2 + x_3 - x_4 + x_5 + x_6)$$

[Figure: chordal graph on vertices $1, \dots, 6$.]

1. Subsets (maximal cliques) $C_1 = \{1, 4\}$, $C_2 = \{1, 2, 3, 5\}$, $C_3 = \{1, 3, 5, 6\}$.
2. Average size $\kappa$ $\leadsto$ $\binom{\kappa + 2d}{\kappa}$ variables.

Dense SDP: 210 variables. Sparse SDP: 115 variables (see the count check below).
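The two variable counts can be reproduced by enumerating moments directly, a small check of my own for $n = 6$ and relaxation order $d = 2$ (so degree $2d = 4$):

```python
from itertools import combinations_with_replacement

def moments(vars_, two_d):
    """Monomials x^alpha with |alpha| <= 2d over the given variables."""
    return {m for deg in range(two_d + 1)
              for m in combinations_with_replacement(sorted(vars_), deg)}

dense = moments(range(1, 7), 4)                      # all 6 variables
cliques = [{1, 4}, {1, 2, 3, 5}, {1, 3, 5, 6}]
sparse = set().union(*(moments(C, 4) for C in cliques))
print(len(dense), len(sparse))                       # 210 115
```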
Our "heuristic relaxation" method: HR-2

Go between the 1st & 2nd steps of the sparse hierarchy (i.e., between relaxation orders 1 and 2), applied to the single-hidden-layer POP above:

Pick SDP variables for products in $\{x, t\}$, $\{u, z\}$ up to degree 4.
Pick SDP variables for products in $\{x, z\}$, $\{t, u\}$ up to degree 2.

(A monomial-selection sketch follows below.)
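Here is a sketch of the kind of mixed-degree monomial basis this selection produces (toy sizes and naming of my own; the actual implementation details are the authors'), compared against the full order-2 basis:

```python
from itertools import combinations_with_replacement

def monomials(vars_, max_deg):
    """Monomials over vars_ of total degree <= max_deg, as sorted tuples."""
    return {m for deg in range(max_deg + 1)
              for m in combinations_with_replacement(sorted(vars_), deg)}

# Toy sizes: 2 inputs (x, t) and 2 hidden neurons (u, z).
x = [f"x{i}" for i in range(2)]; t = [f"t{i}" for i in range(2)]
u = [f"u{j}" for j in range(2)]; z = [f"z{j}" for j in range(2)]

basis  = monomials(x + t, 4) | monomials(u + z, 4)   # degree-4 groups
basis |= monomials(x + z, 2) | monomials(t + u, 2)   # degree-2 groups
full2 = monomials(x + t + u + z, 4)                  # full 2nd step of the hierarchy
print(len(basis), "moment variables instead of", len(full2))
```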
HR-2 on random (80, 80) networks

Weight matrix $A$ with band structure of width $s$.
SHOR: Shor's relaxation, given by the 1st step in the hierarchy.
LipOpt-3: LP-based method.
LBS: lower bound given by $10^4$ random samples.

[Figure: upper bound (left) and running time (right) versus bandwidth $s \in [10, 40]$ for HR-2, SHOR, LipOpt-3 and LBS.]
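The LBS baseline is simple to reproduce in spirit. Below is a minimal sketch (my own network and sampling code, ignoring the band structure for brevity): sample $10^4$ random inputs, evaluate the gradient formula, and keep the largest norm as a valid lower bound on the Lipschitz constant.

```python
import numpy as np

rng = np.random.default_rng(0)
p, h = 80, 80                       # input and hidden widths, as on the slide
A = rng.standard_normal((h, p)) / np.sqrt(p)
b = rng.standard_normal(h)
c = rng.standard_normal(h)          # score vector for the chosen label

def grad(x0):
    """Gradient of f(x0) = c^T ReLU(A x0 + b), one hidden layer."""
    z = A @ x0 + b
    return A.T @ (np.where(z >= 0, 1.0, 0.0) * c)   # A^T diag(ReLU'(z)) c

# Lower bound on the global l2 Lipschitz constant from 10^4 samples.
lbs = max(np.linalg.norm(grad(rng.standard_normal(p))) for _ in range(10_000))
print(f"LBS lower bound: {lbs:.3f}")
```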
HR-2 on trained (784, 500) network

MNIST classifier (SDP-NN) from Raghunathan et al., Certified defenses against adversarial examples, ICLR'18.

                      HR-2     SHOR       LipOpt-3     LBS
Global Lipschitz
  Bound               14.56    < 17.85    Out of RAM   9.69
  Time                12246    > 2869     Out of RAM   -
Local Lipschitz
  Bound               12.70    < 16.07    -            8.20
  Time                20596    > 4217     -            -
What's next?

More layers $\Rightarrow$ higher degree polynomials.
TSSOS hierarchy: exploit term sparsity [Wang-M.-Lasserre 19].

Term sparsity pattern graph: [Figure: vertices $x$, $y$, $z$ with edges $xy$ and $yz$, together with its chordal extension.] Link with Jared Miller's poster!

Certified bounds $\leadsto$ embed ML into "critical" dynamical systems.
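As a toy illustration of the term sparsity pattern (my simplified reading of the idea; TSSOS itself does more, e.g. iterative chordal extensions and block closures): vertices are basis monomials, and two monomials are linked when their product appears in the support of $f$. Blocks of the moment matrix are then indexed by cliques of a chordal extension of this graph.

```python
from itertools import combinations

basis = ["1", "x", "y", "z"]
support = {"1", "x*y", "y*z", "x*x", "y*y", "z*z"}   # terms of a toy f

def prod(a, b):
    """Commutative product of two monomial strings (toy bookkeeping)."""
    factors = sorted(f for f in (a, b) if f != "1")
    return "*".join(factors) if factors else "1"

edges = [(a, b) for a, b in combinations(basis, 2) if prod(a, b) in support]
print(edges)   # [('x', 'y'), ('y', 'z')]: a path graph, already chordal here
```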