
Not all Neurons are created equal: Towards a feature level Deep Neural Network Test Coverage Metric. Nils Wenzler, CSC2125: Topics in Software Engineering, Winter 2019.


  1. Not all Neurons are created equal: Towards a feature level Deep Neural Network Test Coverage Metric Nils Wenzler - CSC2125: Topics in Software Engineering Winter 2019

  2. Problem DNN

  3. Problem Does it work? Does it really work? DNN

  4. Problem Steer left! DNN

  5. Problem Steer right! DNN

  6. Problem Go straight! DNN

  7. Problem Did I test it enough? Did I test it in the right way? DNN

  8. Structure 1. Problem 2. Current DNN Test Coverage Metrics 3. α-Bin Coverage 4. Practical Evaluation

  9. General Approach Use a test coverage metric for building test suites that cover all significant behaviours of a deep neural network. Not a proof of correctness, but evidence towards correctness!

  10. Current DNN Test Coverage Metrics

  11. Current DNN Test Coverage Metrics • High research interest • White-box testing • Focused on single neurons

  12. Current DNN Test Coverage Metrics $low_n$: lowest output value of neuron $n$ during training; $high_n$: highest output value of neuron $n$ during training

  13. Current DNN Test Coverage Metrics [Figure: metrics drawn over a neuron's output range from $low_n$ to $high_n$: Neuron Coverage (0.2), k-multisection Neuron Coverage (k=6), Neuron Boundary Coverage, Strong Neuron Activation Coverage]
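The four existing metrics named on this slide can be sketched in a few lines of NumPy. This is a minimal sketch that follows the definitions commonly given in the literature, not code from the talk: the function names, the `acts` array of shape (test inputs, neurons), and the use of a raw 0.2 threshold without any per-layer scaling are assumptions made for illustration.

```python
import numpy as np

def neuron_coverage(acts, threshold=0.2):
    """Fraction of neurons whose output exceeds `threshold` for at least one test input."""
    return float((acts > threshold).any(axis=0).mean())

def k_multisection_coverage(acts, low, high, k=6):
    """Fraction of the k equal-width sections of [low, high] that are hit, over all neurons."""
    width = np.where(high > low, high - low, 1.0)   # avoid division by zero
    in_range = (acts >= low) & (acts <= high)
    idx = np.clip(np.floor((acts - low) / width * k).astype(int), 0, k - 1)
    hit = np.zeros((acts.shape[1], k), dtype=bool)
    rows, cols = np.nonzero(in_range)               # only in-range outputs count
    hit[cols, idx[rows, cols]] = True
    return float(hit.mean())

def neuron_boundary_coverage(acts, low, high):
    """Fraction of corner regions (below low or above high) reached, two per neuron."""
    upper = (acts > high).any(axis=0)
    lower = (acts < low).any(axis=0)
    return float((upper.sum() + lower.sum()) / (2 * acts.shape[1]))

def strong_neuron_activation_coverage(acts, high):
    """Fraction of neurons whose output exceeds the training maximum at least once."""
    return float((acts > high).any(axis=0).mean())
```

All four functions treat every neuron the same way, regardless of which layer it belongs to.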

  14. Structure 1. Problem 2. Current DNN Test Coverage Metrics 3. α-Bin Coverage 4. Practical Evaluation

  15. Yet another metric? Less than 1 ‰ of the total coverage metric! [Figure: Number of neurons per layer in AlexNet]

  16. Not all Neurons are created equal Current metrics put equal emphasis on each neuron, but is a first-layer neuron as important as an output-layer neuron? Make use of domain-specific knowledge concerning layer architectures!

  17. [Figure: metrics drawn over a neuron's output range from $low_n$ to $high_n$: Neuron Coverage (0.2), k-multisection Neuron Coverage (k=6), Neuron Boundary Coverage, Strong Neuron Activation Coverage, Bin Coverage (number of bins dependent on layer)]

  18. α-Bin Coverage Equally distribute so-called bins throughout the layers. Each layer contributes approximately the same share to the coverage metric. [Figure: k-multisection Neuron Coverage (k=6) next to Bin Coverage (number of bins dependent on layer), both over the range from $low_n$ to $high_n$]

  19. α-Bin Coverage Let $L_i$ denote the number of neurons in layer $i$. Let $L_{max}$ be the maximum of all $L_i$. Let $\alpha \in (0, \infty]$. The minimum number of bins per layer for α-Bin Coverage is defined as: $Bins = L_{max} \cdot \alpha$. The number of bins per neuron in layer $i$ is defined as: $k_i = \frac{Bins}{L_i}$.
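The definition on this slide can be turned into a small sketch that derives the number of bins per neuron from the layer sizes and α, and then counts which bins a set of test inputs hits. Several points are assumptions rather than statements from the slides: the quotient is rounded up and floored at one bin per neuron (which is one way to read "minimum number of bins per layer"), each neuron's bins are taken as equal-width sections of its training output range, and the coverage value is the share of hit bins over all layers. The function names are hypothetical.

```python
import math
import numpy as np

def bins_per_neuron(layer_sizes, alpha):
    """Bins per neuron for each layer: (largest layer size * alpha) / layer size."""
    bins = max(layer_sizes) * alpha              # minimum number of bins per layer
    # Assumption: round up and use at least one bin per neuron.
    return [max(1, math.ceil(bins / size)) for size in layer_sizes]

def alpha_bin_coverage(layer_acts, layer_lows, layer_highs, alpha):
    """Share of hit bins over all layers (assumed definition).

    layer_acts[i] has shape (test inputs, neurons in layer i);
    layer_lows[i] and layer_highs[i] hold each neuron's training output range.
    """
    sizes = [acts.shape[1] for acts in layer_acts]
    ks = bins_per_neuron(sizes, alpha)
    hit_total, bin_total = 0, 0
    for acts, low, high, k in zip(layer_acts, layer_lows, layer_highs, ks):
        width = np.where(high > low, high - low, 1.0)
        in_range = (acts >= low) & (acts <= high)
        idx = np.clip(np.floor((acts - low) / width * k).astype(int), 0, k - 1)
        hit = np.zeros((acts.shape[1], k), dtype=bool)
        rows, cols = np.nonzero(in_range)
        hit[cols, idx[rows, cols]] = True
        hit_total += int(hit.sum())
        bin_total += hit.size
    return hit_total / bin_total
```

Under these assumptions, `bins_per_neuron([8, 4, 2, 1], 0.5)` gives one bin per neuron in the two larger layers and 2 and 4 bins per neuron in the two smaller ones, so every layer ends up with at least 4 bins.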

  20. Structure 1. Problem 2. Current DNN Test Coverage Metrics 3. α-Bin Coverage 4. Practical Evaluation

  21. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  22. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  23. Practically feasible? Test setup (1/2): • 10-layer DNN with ReLU activations, inspired by NVIDIA's end-to-end approach • Trained on 45,500 publicly available labeled images • Implemented in Python using TensorFlow

  24. Practically feasible? Test setup (2/2): • Created a greedy optimizer that uses image transforms to optimize the coverage metric • Compare the behaviour of α-Bin Coverage and Neuron Coverage

  25. Performance [Flowchart: Determine $low_n$ and $high_n$ → Select random image → Add transforms to image → Evaluate coverage → Add image to test suite; the greedy search iterates on transforms] Determining $low_n$ and $high_n$ only needs to be done once and can be approximated through random sampling. Calculating α-Bin Coverage incrementally: constant time (dependent on network size).
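The two performance points on this slide can be illustrated with a small tracker: the per-neuron output range is estimated once from a random sample of activations, and each new image then updates a table of hit bins at a cost that depends only on the number of neurons, not on how many images are already in the suite. This is a sketch for a single layer with the same number of bins `k` for every neuron; the class and function names are hypothetical and not taken from the talk.

```python
import numpy as np

def estimate_ranges(sample_acts):
    """Approximate each neuron's lowest and highest output from a random sample.

    sample_acts has shape (sampled inputs, neurons).
    """
    return sample_acts.min(axis=0), sample_acts.max(axis=0)

class IncrementalBinCoverage:
    """Keeps a table of hit bins for one layer and updates it one image at a time."""

    def __init__(self, low, high, k):
        self.low, self.high, self.k = low, high, k
        self.width = np.where(high > low, high - low, 1.0)
        self.hit = np.zeros((low.size, k), dtype=bool)

    def _bins(self, act):
        # Map one activation vector to (neuron, bin) pairs; out-of-range outputs are skipped.
        in_range = (act >= self.low) & (act <= self.high)
        idx = np.clip(np.floor((act - self.low) / self.width * self.k).astype(int),
                      0, self.k - 1)
        neurons = np.nonzero(in_range)[0]
        return neurons, idx[neurons]

    def gain(self, act):
        """Number of bins this image would newly cover (the table is not changed)."""
        neurons, idx = self._bins(act)
        return int((~self.hit[neurons, idx]).sum())

    def add(self, act):
        """Mark the bins hit by this image as covered."""
        neurons, idx = self._bins(act)
        self.hit[neurons, idx] = True

    def coverage(self):
        return float(self.hit.mean())
```

Separating `gain` from `add` lets a search procedure score a candidate image without committing it to the test suite.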

  26. Greedy search: Transforms Transformations: Translation, Brightness, Contrast, Blur

  27. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  28. Greedy Optimization: Bin Coverage

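  29. Greedy Optimization: Bin Coverage With ReLU activations, Neuron Boundary Coverage is practically limited to 50% (a ReLU output cannot drop below 0, so the lower boundary region of a neuron whose lowest training output is 0 cannot be reached)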

  30. Greedy Optimization: Bin Coverage Obtain 74% 0.05-Bin Coverage with ~220 images

  31. Greedy Optimization: Neuron Coverage

  32. Neuron Coverage Optimization: Layer View

  33. Neuron Coverage Optimization: Layer View Output layer is "fully tested" for an image with a steering angle > 11.5°

  34. Bin Coverage Optimization: Layer View

  35. Bin Coverage Optimization: Layer View Output layer is "fully tested" after testing 3656 images, which correspond to 0.2° steps from -360° to +360°

  36. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  37. Deviation from target labels in test suite Example: transformed image with output 234° and target 160°

  38. Conclusions • Current DNN test coverage metrics treat all neurons equally • This introduces an intrinsic focus on the neurons of low layers in modern architectures • α-Bin Coverage is a practically feasible approach to equally distribute a test coverage metric over all layers • First evidence shows that α-Bin Coverage can be used for finding erroneous behaviours and creating test suites automatically

  39. Let's discuss! Some points to consider: • Only one model in evaluation • Limited number of test runs • Only one domain • Why greedy search? • What is this strange α value? Why do we need it? • How about classification tasks?

  40. Greedy search Stack transformations on randomly selected images to optimize the coverage metric. Add an image to the test suite if it significantly increases the coverage metric. Transformations: Translation, Brightness, Contrast, Blur
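The greedy search described on this slide could look roughly like the sketch below. It is an illustration under stated assumptions, not the optimizer used in the talk: images are 2-D float arrays in [0, 1], the four transforms are simple NumPy stand-ins with arbitrary parameter ranges, and `gain_fn` and `add_fn` are hypothetical callbacks that report how much an image would add to the coverage metric and then commit it. "Significantly increases" is modelled with an assumed `min_gain` threshold.

```python
import numpy as np

def translate(img, rng):
    # Shift horizontally by a few pixels (wraps around at the border).
    return np.roll(img, int(rng.integers(-3, 4)), axis=1)

def brightness(img, rng):
    return np.clip(img + rng.uniform(-0.2, 0.2), 0.0, 1.0)

def contrast(img, rng):
    mean = img.mean()
    return np.clip((img - mean) * rng.uniform(0.5, 1.5) + mean, 0.0, 1.0)

def blur(img, rng):
    # 3-pixel horizontal box blur; rng is unused but keeps the signature uniform.
    return (np.roll(img, 1, axis=1) + img + np.roll(img, -1, axis=1)) / 3.0

TRANSFORMS = [translate, brightness, contrast, blur]

def greedy_search(images, gain_fn, add_fn, rounds=50, max_stack=4, min_gain=1, seed=0):
    """Stack random transforms on random images, keeping a transform only if it helps."""
    rng = np.random.default_rng(seed)
    suite = []
    for _ in range(rounds):
        best = images[int(rng.integers(len(images)))]
        best_gain = gain_fn(best)
        for _ in range(max_stack):
            transform = TRANSFORMS[int(rng.integers(len(TRANSFORMS)))]
            candidate = transform(best, rng)
            candidate_gain = gain_fn(candidate)
            if candidate_gain > best_gain:        # greedy step: keep only improvements
                best, best_gain = candidate, candidate_gain
        if best_gain >= min_gain:                 # assumed meaning of "significantly"
            add_fn(best)
            suite.append(best)
    return suite
```

In practice `gain_fn` would run the image through the network and query a coverage tracker, and `add_fn` would update that tracker.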
