Polynomial Optimization for Bounding Lipschitz Constants of Deep Networks

Victor Magron, MAC Team, CNRS–LAAS
Joint work with T. Chen, J.-B. Lasserre and E. Pauwels
IPAM, UCLA, 28 February 2020

[Figure: layered neural network with input, hidden and output layers.]
Lipschitz constant of neural networks

Applications: WGAN, certification.
Existing works: [Latorre et al.'18], based on linear programming (LP).
Network setting: $K$-classifier, ReLU network, $1 + m$ layers ($1$ input layer + $m$ hidden layers), weights $A_i$, biases $b_i$.
Score of label $k \le K$: $c_k^T x_m$, with $x_m$ the last activation vector.
Lipschitz constant of neural networks

Layers $x_0 \in \mathbb{R}^{p_0}$, $z_1 \in \mathbb{R}^{p_1}, \dots, z_m \in \mathbb{R}^{p_m}$, with $z_i = A_i x_{i-1} + b_i$ and $x_i = \mathrm{ReLU}(z_i)$.

Lipschitz constant:
$$L^f_{\|\cdot\|} = \inf \{ L : \forall x, y \in X, \; |f(x) - f(y)| \le L \|x - y\| \} = \sup \{ \|\nabla f(x)\|_* : x \in X \} = \sup \{ t^T \nabla f(x) : x \in X, \; \|t\| \le 1 \}$$

Gradient for a fixed label $k$:
$$\nabla f(x_0) = \Big( \prod_{i=1}^{m} A_i^T \, \mathrm{diag}\big(\mathrm{ReLU}'(z_i)\big) \Big) c_k$$
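The chain-rule product formula is easy to sanity-check numerically. Below is a minimal sketch (toy sizes and variable names of my own choosing, not the authors' code) that accumulates $\prod_i A_i^T \mathrm{diag}(\mathrm{ReLU}'(z_i))$ for a two-hidden-layer ReLU network and compares the resulting gradient against finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [3, 5, 4]                                   # p0, p1, p2 (toy sizes)
A = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(2)]
b = [rng.standard_normal(dims[i + 1]) for i in range(2)]
c = rng.standard_normal(dims[-1])                  # score vector c_k

def f_and_grad(x0):
    """Return f(x0) = c^T x_m and grad f(x0) via the chain-rule product."""
    x, J = x0, np.eye(len(x0))                     # J accumulates prod A_i^T diag(ReLU'(z_i))
    for Ai, bi in zip(A, b):
        z = Ai @ x + bi
        J = J @ Ai.T @ np.diag(np.where(z >= 0, 1.0, 0.0))
        x = np.maximum(z, 0.0)                     # ReLU activation
    return c @ x, J @ c

x0 = rng.standard_normal(dims[0])
val, g = f_and_grad(x0)

# Finite-difference check (exact up to rounding, away from ReLU kinks).
eps = 1e-6
fd = np.array([(f_and_grad(x0 + eps * e)[0] - val) / eps for e in np.eye(dims[0])])
np.testing.assert_allclose(g, fd, rtol=1e-4, atol=1e-6)
print("chain-rule gradient matches finite differences")
```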
A polynomial optimization formulation

ReLU (left) & its semialgebraicity (right): [Figure: graph of $u = \max\{x, 0\}$ and of the set below, both over $x \in [-1, 1]$.]
$$u = \max\{x, 0\} \iff u(u - x) = 0, \; u \ge x, \; u \ge 0$$
A polynomial optimization formulation

ReLU' (left) & its semialgebraicity (right): [Figure: graph of $u = \mathbf{1}_{\{x \ge 0\}}$ and of the set below, both over $x \in [-1, 1]$.]
$$u = \mathbf{1}_{\{x \ge 0\}} \iff u(u - 1) = 0, \; (u - 1/2)\, x \ge 0$$
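Both encodings are exact, which a few lines of code can confirm on a grid. This is a quick sanity check of my own, not part of the method:

```python
import numpy as np

for x in np.linspace(-1.0, 1.0, 201):
    u = max(x, 0.0)                       # ReLU
    assert u * (u - x) == 0 and u >= x and u >= 0
    v = 1.0 if x >= 0 else 0.0            # ReLU' (subgradient choice at 0)
    assert v * (v - 1) == 0 and (v - 0.5) * x >= 0
print("both semialgebraic encodings hold on the grid")
```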
A polynomial optimization formulation

Local Lipschitz constant: $x_0 \in$ ball of center $\bar{x}_0$ and radius $\varepsilon$.

One single hidden layer ($m = 1$):
$$\sup_{x, u, z, t} \; t^T A^T \mathrm{diag}(u)\, c \quad \text{s.t.} \quad (z - Ax - b)^2 = 0, \;\; t^2 \le 1, \;\; (x - \bar{x}_0 + \varepsilon)(x - \bar{x}_0 - \varepsilon) \le 0, \;\; u(u - 1) = 0, \;\; (u - 1/2)\, z \ge 0$$

"Cheap" and "tight" upper bound?
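For tiny sizes, the value of this POP can be bracketed by brute force over activation patterns. The sketch below (my own code, with the $\ell_2$ norm chosen for concreteness) enumerates $u \in \{0,1\}^h$, keeps the patterns whose per-neuron sign constraint $(u_j - 1/2)(a_j^T x + b_j) \ge 0$ is satisfiable somewhere in the box $\|x - \bar{x}_0\|_\infty \le \varepsilon$ (checked neuron by neuron, so itself a relaxation), and maximizes $\|A^T \mathrm{diag}(u)\, c\|_2$, which upper-bounds the POP value.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
p, h, eps = 2, 4, 0.1
A = rng.standard_normal((h, p))
b, c = rng.standard_normal(h), rng.standard_normal(h)
x_bar = rng.standard_normal(p)

best = 0.0
for bits in product([0.0, 1.0], repeat=h):
    u = np.array(bits)
    # max over the box of (u_j - 1/2)(a_j . x + b_j), per neuron j:
    slack = (u - 0.5) * (A @ x_bar + b) + 0.5 * eps * np.abs(A).sum(axis=1)
    if np.all(slack >= 0):                               # pattern not ruled out
        best = max(best, np.linalg.norm(A.T @ (u * c)))  # sup over ||t||_2 <= 1
print(f"combinatorial upper bound on the local Lipschitz constant: {best:.3f}")
```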
The moment-sums of squares hierarchy

NP-hard nonconvex problem: $f^\star = \sup f(x)$.

Theory (infinite-dimensional LP):
(Primal) $\sup \int f \, d\mu$ over probability measures $\mu$ $\;\Longleftrightarrow\;$ (Dual) $\inf \lambda$ with $\lambda - f \ge 0$.

Practice (finite number, fixed degree, SDP):
(Primal relaxation) moments $\int x^\alpha \, d\mu$ $\;\Longleftrightarrow\;$ (Dual strengthening) $\lambda - f =$ sum of squares.

Lasserre's hierarchy of convex problems, with values $\uparrow f^\star$ [Lasserre/Parrilo 01].
Degree $d$ & $n$ variables $\Rightarrow \binom{n + 2d}{n}$ SDP variables.
Numeric solvers $\Rightarrow$ approximate certificate.
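To make the primal relaxation concrete, here is a minimal sketch (a toy example of my own, assuming cvxpy with an SDP-capable solver such as SCS is available) of the first step of the hierarchy for a degree-2 problem: the rank-1 matrix $xx^T$ is replaced by a PSD moment matrix $X$, giving Shor's SDP upper bound, compared with a sampling lower bound.

```python
import numpy as np
import cvxpy as cp

n = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2                      # indefinite symmetric objective matrix

# Nonconvex problem: max x^T Q x  s.t.  x_i^2 <= 1  (NP-hard in general).
# First-step moment relaxation: replace x x^T by a PSD matrix X.
X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.diag(X) <= 1]
prob = cp.Problem(cp.Maximize(cp.trace(Q @ X)), constraints)
upper = prob.solve()                   # certified upper bound on f*

# Cheap lower bound: evaluate at random feasible points.
samples = rng.uniform(-1, 1, size=(10_000, n))
lower = max(s @ Q @ s for s in samples)
print(f"lower {lower:.3f} <= f* <= upper {upper:.3f}")
```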
The sparse hierarchy [Waki, Lasserre 06]

Correlative sparsity pattern:
$$f = x_2 x_5 + x_3 x_6 - x_2 x_3 - x_5 x_6 + x_1 (-x_1 + x_2 + x_3 - x_4 + x_5 + x_6)$$

[Figure: chordal graph on vertices $1, \dots, 6$.]

1. Subsets (maximal cliques) $C_1 = \{1, 4\}$, $C_2 = \{1, 2, 3, 5\}$, $C_3 = \{1, 3, 5, 6\}$.
2. Average size $\kappa$ $\leadsto$ $\binom{\kappa + 2d}{\kappa}$ variables.

Dense SDP: 210 variables. Sparse SDP: 115 variables (see the count check below).
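The two variable counts can be reproduced by enumerating moments directly, a small check of my own for $n = 6$ and relaxation order $d = 2$ (so degree $2d = 4$):

```python
from itertools import combinations_with_replacement

def moments(vars_, two_d):
    """Monomials x^alpha with |alpha| <= 2d over the given variables."""
    return {m for deg in range(two_d + 1)
              for m in combinations_with_replacement(sorted(vars_), deg)}

dense = moments(range(1, 7), 4)                      # all 6 variables
cliques = [{1, 4}, {1, 2, 3, 5}, {1, 3, 5, 6}]
sparse = set().union(*(moments(C, 4) for C in cliques))
print(len(dense), len(sparse))                       # 210 115
```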
Our "heuristic relaxation" method: HR-2

Go between the 1st & 2nd steps of the sparse hierarchy (i.e., between relaxation orders 1 and 2), applied to the single-hidden-layer POP above:

Pick SDP variables for products in $\{x, t\}$, $\{u, z\}$ up to degree 4.
Pick SDP variables for products in $\{x, z\}$, $\{t, u\}$ up to degree 2.

(A monomial-selection sketch follows below.)
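Here is a sketch of the kind of mixed-degree monomial basis this selection produces (toy sizes and naming of my own; the actual implementation details are the authors'), compared against the full order-2 basis:

```python
from itertools import combinations_with_replacement

def monomials(vars_, max_deg):
    """Monomials over vars_ of total degree <= max_deg, as sorted tuples."""
    return {m for deg in range(max_deg + 1)
              for m in combinations_with_replacement(sorted(vars_), deg)}

# Toy sizes: 2 inputs (x, t) and 2 hidden neurons (u, z).
x = [f"x{i}" for i in range(2)]; t = [f"t{i}" for i in range(2)]
u = [f"u{j}" for j in range(2)]; z = [f"z{j}" for j in range(2)]

basis  = monomials(x + t, 4) | monomials(u + z, 4)   # degree-4 groups
basis |= monomials(x + z, 2) | monomials(t + u, 2)   # degree-2 groups
full2 = monomials(x + t + u + z, 4)                  # full 2nd step of the hierarchy
print(len(basis), "moment variables instead of", len(full2))
```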
HR-2 on random (80, 80) networks

Weight matrix $A$ with band structure of width $s$.
SHOR: Shor's relaxation, given by the 1st step in the hierarchy.
LipOpt-3: LP-based method.
LBS: lower bound given by $10^4$ random samples.

[Figure: upper bound (left) and running time (right) versus bandwidth $s \in [10, 40]$ for HR-2, SHOR, LipOpt-3 and LBS.]
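The LBS baseline is simple to reproduce in spirit. Below is a minimal sketch (my own network and sampling code, ignoring the band structure for brevity): sample $10^4$ random inputs, evaluate the gradient formula, and keep the largest norm as a valid lower bound on the Lipschitz constant.

```python
import numpy as np

rng = np.random.default_rng(0)
p, h = 80, 80                       # input and hidden widths, as on the slide
A = rng.standard_normal((h, p)) / np.sqrt(p)
b = rng.standard_normal(h)
c = rng.standard_normal(h)          # score vector for the chosen label

def grad(x0):
    """Gradient of f(x0) = c^T ReLU(A x0 + b), one hidden layer."""
    z = A @ x0 + b
    return A.T @ (np.where(z >= 0, 1.0, 0.0) * c)   # A^T diag(ReLU'(z)) c

# Lower bound on the global l2 Lipschitz constant from 10^4 samples.
lbs = max(np.linalg.norm(grad(rng.standard_normal(p))) for _ in range(10_000))
print(f"LBS lower bound: {lbs:.3f}")
```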
HR-2 on trained (784, 500) network

MNIST classifier (SDP-NN) from Raghunathan et al., Certified defenses against adversarial examples, ICLR'18.

                      HR-2     SHOR       LipOpt-3     LBS
Global Lipschitz
  Bound               14.56    < 17.85    Out of RAM   9.69
  Time                12246    > 2869     Out of RAM   -
Local Lipschitz
  Bound               12.70    < 16.07    -            8.20
  Time                20596    > 4217     -            -
What's next?

More layers $\Rightarrow$ higher degree polynomials.
TSSOS hierarchy: exploit term sparsity [Wang-M.-Lasserre 19].

Term sparsity pattern graph: [Figure: vertices $x$, $y$, $z$ with edges $xy$ and $yz$, together with its chordal extension.] Link with Jared Miller's poster!

Certified bounds $\leadsto$ embed ML into "critical" dynamical systems.
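As a toy illustration of the term sparsity pattern (my simplified reading of the idea; TSSOS itself does more, e.g. iterative chordal extensions and block closures): vertices are basis monomials, and two monomials are linked when their product appears in the support of $f$. Blocks of the moment matrix are then indexed by cliques of a chordal extension of this graph.

```python
from itertools import combinations

basis = ["1", "x", "y", "z"]
support = {"1", "x*y", "y*z", "x*x", "y*y", "z*z"}   # terms of a toy f

def prod(a, b):
    """Commutative product of two monomial strings (toy bookkeeping)."""
    factors = sorted(f for f in (a, b) if f != "1")
    return "*".join(factors) if factors else "1"

edges = [(a, b) for a, b in combinations(basis, 2) if prod(a, b) in support]
print(edges)   # [('x', 'y'), ('y', 'z')]: a path graph, already chordal here
```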