Approximate Verification of Deep Neural Networks with Provable Guarantees
Xiaowei Huang, University of Liverpool


  1. Approximate Verification of Deep Neural Networks with Provable Guarantees Xiaowei Huang, University of Liverpool

  2. Outline Background and Challenges Safety Definition and Layer-by-Layer Refinement Game-based Approach for a Single Layer Verification Experimental Results

  3. Human-Level Intelligence

  4. Robotics and Autonomous Systems

  5. Deep neural networks all implemented with

  6. Major problems and critiques ◮ unsafe, e.g., lack of robustness (this talk) ◮ hard to explain to human users ◮ ethics, trustworthiness, accountability, etc.

  7. Figure: safety in image classification networks

  8. Figure: safety in natural language processing networks

  9. Figure: safety in voice recognition networks

  10. Figure: safety in security systems

  11. Outline Background and Challenges Safety Definition and Layer-by-Layer Refinement Safety Definition Challenges Approaches Game-based Approach for a Single Layer Verification Experimental Results

  12. Certification of DNN

  13. Safety Requirements ◮ Pointwise robustness (this talk): whether the decision of a pair (input, network) is invariant under perturbations of the input ◮ Network robustness ◮ or, more fundamentally, Lipschitz continuity, mutual information, etc. ◮ model interpretability

  14. Safety Definition: Human Driving vs. Autonomous Driving Traffic image from “The German Traffic Sign Recognition Benchmark”

  15. Safety Definition: Human Driving vs. Autonomous Driving Image generated from our tool

  16. Safety Problem: Incidents

  17. Safety Definition: Illustration

  18. Safety Definition: Deep Neural Networks ◮ R^n is a vector space of inputs (points) ◮ f : R^n → C, where C is a (finite) set of class labels, models the human perception capability ◮ a neural network classifier is a function f̂(x) which approximates f(x)

  19. Safety Definition: Deep Neural Networks A (feed-forward) neural network N is a tuple (L, T, Φ), where ◮ L = {L_k | k ∈ {0, ..., n}} is a set of layers ◮ T ⊆ L × L is a set of sequential connections between layers ◮ Φ = {φ_k | k ∈ {1, ..., n}} is a set of activation functions φ_k : D_{L_{k−1}} → D_{L_k}, one for each non-input layer.
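The tuple definition above can be sketched in code. This is a minimal illustration, not the authors' implementation: the layer sizes and the ReLU/softmax activation choices are assumptions made for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class FeedForwardNet:
    """Sketch of N = (L, T, Phi): layers L_0..L_n connected sequentially,
    with an activation function phi_k mapping D_{L_{k-1}} to D_{L_k}."""

    def __init__(self, weights, biases):
        # weights[k], biases[k] parameterise phi_{k+1}
        self.weights = weights
        self.biases = biases

    def activations(self, x):
        """Return the activation vector at every layer L_0..L_n."""
        acts = [x]
        for k, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = W @ acts[-1] + b
            # assumption: hidden layers use ReLU, the output layer softmax
            acts.append(softmax(z) if k == len(self.weights) - 1 else relu(z))
        return acts

    def classify(self, x):
        return int(np.argmax(self.activations(x)[-1]))
```

The activations at intermediate layers are exposed deliberately, since the layer-by-layer refinement discussed next operates on them.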

  20. Safety Definition: Traffic Sign Example

  21. Maximum Safe Radius Definition: The maximum safe radius problem is to compute the minimum distance from the original input α to an adversarial example, i.e.,

  MSR(α) = min { ||α − α'||_k | α' ∈ D, α' is an adversarial example }   (1)
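Equation (1) minimises over a continuous space; as a hedged sketch only, it can be approximated by searching a finite pool of candidate inputs. The `classify` function and the candidate pool are assumptions of this example, not part of the method on the slides.

```python
import numpy as np

def approx_msr(alpha, candidates, classify, k=2):
    """Approximate MSR(alpha): the minimum L_k distance from alpha to any
    candidate that the classifier labels differently from alpha."""
    label = classify(alpha)
    adversarial = [a for a in candidates if classify(a) != label]
    if not adversarial:
        return float("inf")  # no adversarial example found in the pool
    return min(np.linalg.norm(alpha - a, ord=k) for a in adversarial)
```

A finite pool can only overestimate the true radius, which is exactly the gap the finite approximation FMSR and its error bounds address later in the talk.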

  22. Challenges Challenge 1: continuous space, i.e., there are infinitely many points to be tested Challenge 2: the spaces are high dimensional Challenge 3: the functions f and f̂ are highly non-linear, i.e., safety risks may exist in pockets of the space Challenge 4: not only heuristic search but also verification

  23. Approach 1: Single Layer – Discretisation Define manipulations δ_k : D_{L_k} → D_{L_k} over the activations in the vector space of layer k. Figure: example of a set {δ_1, δ_2, δ_3, δ_4} of valid manipulations in a 2-dimensional space

  24. Exploring a Finite Number of Points Figure: repeatedly applying manipulations δ_k from α_{x,k} yields a finite sequence of points α_{x_1,k}, α_{x_2,k}, ..., α_{x_{j+1},k} within the region η_k(α_{x,k})

  25. Finite Approximation Definition: Let τ ∈ (0, 1] be a manipulation magnitude. The finite maximum safe radius problem FMSR(τ, α) is defined over the manipulation magnitude τ (details to be given later). Lemma: For any τ ∈ (0, 1], we have that MSR(α) ≤ FMSR(τ, α).

  26. Approach 2: Single Layer – Exhaustive Search Figure: exhaustive search (verification) vs. heuristic search over the manipulations δ_k within η_k(α_{x,k})

  27. Approach 3: Single Layer – Anytime Algorithms

  28. Approach 4: Layer-by-Layer Refinement We will explain how to determine τ*_0 later.

  29. Approach 2: Layer-by-Layer Refinement

  30. Approach 2: Layer-by-Layer Refinement

  31. Outline Background and Challenges Safety Definition and Layer-by-Layer Refinement Game-based Approach for a Single Layer Verification Experimental Results

  32. Preliminaries: Lipschitz Network Definition: Network N is a Lipschitz network with respect to the distance function ||·||_k if there exists a constant ℓ_c > 0 for every class c ∈ C such that, for all α, α' ∈ D, we have

  |N(α', c) − N(α, c)| ≤ ℓ_c · ||α' − α||_k.   (2)

  Most known types of layers, including fully-connected, convolutional, ReLU, max-pooling, sigmoid, softmax, etc., are Lipschitz continuous [4].
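The Lipschitz inequality of Definition (2) can be probed empirically by sampling input pairs and taking the largest observed slope of a class score. This is only an illustrative lower-bound estimate of the constant, not the certified constant the method requires; the scoring function here is an assumption of the sketch.

```python
import numpy as np

def empirical_lipschitz(score, dim, n_pairs=1000, k=2, seed=0):
    """Largest observed |score(a') - score(a)| / ||a' - a||_k over random
    pairs: a lower bound on the true per-class Lipschitz constant."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_pairs):
        a, b = rng.standard_normal(dim), rng.standard_normal(dim)
        dist = np.linalg.norm(a - b, ord=k)
        if dist > 1e-12:
            best = max(best, abs(score(a) - score(b)) / dist)
    return best
```

For provable guarantees the talk relies on certified constants for each layer type, not on sampling; this check is useful only as a sanity test.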

  33. Preliminaries: Feature-Based Partitioning Partition the input dimensions with respect to a set of features. Here, features in the simplest case can be a uniform partition, i.e., do not necessarily follow a particular method. Useful for the reduction to two-player game, in which player One chooses a feature and player Two chooses how to manipulate the selected feature.

  34. Preliminaries: Input Manipulation Let τ > 0 be a positive real number representing the manipulation magnitude. We define input manipulation operations δ_{τ,X,i} : D → D, for X ⊆ P_0 a subset of input dimensions and i : P_0 → N an instruction function, by

  δ_{τ,X,i}(α)(j) = α(j) + i(j) · τ  if j ∈ X,  and  α(j)  otherwise,

  for all j ∈ P_0.
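The manipulation operator above transcribes directly into code; this sketch treats the input as a flat vector and the instruction function as a Python callable.

```python
import numpy as np

def manipulate(alpha, tau, X, instruction):
    """Apply delta_{tau,X,i}: add instruction(j)*tau on each chosen
    dimension j in X and leave every other dimension unchanged."""
    out = alpha.copy()
    for j in X:
        out[j] = alpha[j] + instruction(j) * tau
    return out
```

Note that the original input is copied rather than mutated, since the search explores many manipulations of the same α.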

  35. Approximation Based on Finite Optimisation Definition: Let τ ∈ (0, 1] be a manipulation magnitude. The finite maximum safe radius problem FMSR(τ, α) based on input manipulation is as follows:

  FMSR(τ, α) = min_{Λ' ⊆ Λ(α)} min_{X ⊆ ∪_{λ ∈ Λ'} P_λ} min_{i ∈ I} { ||α − δ_{τ,X,i}(α)||_k | δ_{τ,X,i}(α) is an adversarial example }   (3)

  Lemma: For any τ ∈ (0, 1], we have that MSR(α) ≤ FMSR(τ, α). We need to determine the condition that τ must satisfy so that FMSR(τ, α) = MSR(α).

  36. Grid Space Definition: An image α' ∈ η(α, L_k, d) is a τ-grid input if for all dimensions p ∈ P_0 we have |α'(p) − α(p)| = n · τ for some n ≥ 0. Let G(α, k, d) be the set of τ-grid inputs in η(α, L_k, d).
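The τ-grid membership condition is easy to check mechanically; a small sketch, with a tolerance added because floating-point differences are rarely exact multiples of τ:

```python
import numpy as np

def is_tau_grid(alpha, alpha_prime, tau, tol=1e-9):
    """True iff every dimension of alpha_prime differs from alpha by an
    integer multiple of tau (the tau-grid condition)."""
    diff = np.abs(alpha_prime - alpha)
    ratio = diff / tau
    return bool(np.all(np.abs(ratio - np.round(ratio)) < tol))
```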

  37. Misclassification Aggregator Definition: An input α_1 ∈ η(α, L_k, d) is a misclassification aggregator with respect to a number β > 0 if, for any α_2 ∈ η(α_1, L_k, β), we have that N(α_2) ≠ N(α) implies N(α_1) ≠ N(α). Lemma: If all τ-grid inputs are misclassification aggregators with respect to (1/2) d(k, τ), then MSR(k, d, α, c) ≥ FMSR(τ, k, d, α, c) − (1/2) d(k, τ).

  38. Conditions for Achieving Misclassification Aggregation Given a class label c, we let

  g(α', c) = min_{c' ∈ C, c' ≠ c} { N(α', c) − N(α', c') }   (4)

  be a function maintaining, for an input α', the minimum confidence margin between the class c and any other class c' ≠ c. Lemma: Let N be a Lipschitz network with a Lipschitz constant ℓ_c for every class c ∈ C. If

  d(k, τ) ≤ 2 g(α', N(α')) / max_{c ∈ C, c ≠ N(α')} (ℓ_{N(α')} + ℓ_c)   (5)

  for all τ-grid inputs α' ∈ G(α, k, d), then all τ-grid inputs are misclassification aggregators with respect to (1/2) d(k, τ).

  39. Main Theorem Theorem: Let N be a Lipschitz network with a Lipschitz constant ℓ_c for every class c ∈ C. If

  d(k, τ) ≤ 2 g(α', N(α')) / max_{c' ∈ C, c' ≠ N(α')} (ℓ_{N(α')} + ℓ_{c'})

  for all τ-grid inputs α' ∈ G(α, k, d), then we can use FMSR(τ, k, d, α, c) to estimate MSR(k, d, α, c) with an error bound of (1/2) d(k, τ).
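Putting the two lemmas together clarifies where the error bound comes from: the finite problem always overestimates the radius, while the aggregator lemma limits the overestimate. A sketch of the combined two-sided estimate:

```latex
% Upper bound: finite optimisation can only overestimate the radius.
%   MSR(k,d,\alpha,c) \le \mathrm{FMSR}(\tau,k,d,\alpha,c)
% Lower bound: if every \tau-grid input is a misclassification aggregator
% with respect to \tfrac{1}{2} d(k,\tau), then
%   MSR(k,d,\alpha,c) \ge \mathrm{FMSR}(\tau,k,d,\alpha,c) - \tfrac{1}{2} d(k,\tau)
% Together:
\mathrm{FMSR}(\tau,k,d,\alpha,c) - \tfrac{1}{2}\, d(k,\tau)
  \;\le\; \mathrm{MSR}(k,d,\alpha,c)
  \;\le\; \mathrm{FMSR}(\tau,k,d,\alpha,c)
```

So solving the finite problem pins down the true maximum safe radius to within (1/2) d(k, τ), and shrinking τ tightens the estimate.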

  40. Two-Player Game Figure: game tree with alternating Player-I and Player-II moves. MCTS: random simulation. Admissible A* / alpha-beta pruning: more tree expansion.

  41. Flow of Reductions MSR or FR problem → (via Lipschitz constants) → finite MSR or finite FR problem → two-player turn-based game (rewards of Player I) → upper bound via Monte-Carlo Tree Search; lower bound via admissible A* or alpha-beta pruning.

  42. Outline Background and Challenges Safety Definition and Layer-by-Layer Refinement Game-based Approach for a Single Layer Verification Experimental Results

  43. Convergence of Lower and Upper Bounds

  44. Experimental Results: GTSRB Image Classification Network for The German Traffic Sign Recognition Benchmark Total params: 571,723

  45. Experimental Results: GTSRB

  46. Experimental Results: ImageNet Image Classification Network for the ImageNet dataset, a large visual database designed for use in visual object recognition software research. Total params: 138,357,544

  47. Experimental Results: ImageNet
