  1. Band-limited Training and Inference for Convolutional Neural Networks

  2. (figure-only slide; no text captured)

  3. FFT → IFFT (convolution pipeline diagram)

  4. (figure-only slide; no text captured)

  5. Mathieu et al.: “Fast Training of Convolutional Networks through FFTs”; “Fast Convolutional Nets With fbfft: A GPU Performance Evaluation”. FFT-based convolution: Data x → xfft = FFT(x); Filter y → yfft = FFT(y); offt = xfft ⊙ yfft; Out: o = IFFT(offt)

  6.–8. (Step-by-step animation of the same FFT convolution diagram)

  9. cuDNN: substantial memory workspace needed for intermediate results (same FFT convolution diagram)
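The FFT convolution pipeline on these slides (FFT both operands, multiply elementwise, inverse-FFT the product) can be sketched in a few lines of numpy; the function name and toy signals below are illustrative, not from the slides:

```python
import numpy as np

def fft_conv(x, y):
    """Circular convolution of two 1-D signals via the FFT."""
    n = len(x)
    xfft = np.fft.fft(x, n)             # Data:   x -> xfft = FFT(x)
    yfft = np.fft.fft(y, n)             # Filter: y -> yfft = FFT(y)
    offt = xfft * yfft                  # offt = xfft ⊙ yfft (elementwise)
    return np.real(np.fft.ifft(offt))   # Out: o = IFFT(offt)

x = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1.0, 0.0, 0.0, 0.0])  # identity filter
print(fft_conv(x, delta))               # returns x unchanged
```

For large filters this costs O(n log n) instead of the O(n²) of direct convolution, which is the speedup Mathieu et al. exploit.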

  10. Band-limiting = masking out high frequencies: Data x → xfft = FFT(x) → xCfft = band-limited xfft; Filter y → yfft = FFT(y) → yCfft = band-limited yfft; offt = xCfft ⊙ yCfft; Out: o = IFFT(offt)

  11. Same diagram: less memory used

  12. Same diagram: less memory used, faster computation

  13. Preserve enough of the spectrum to retain high accuracy of models.
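A minimal sketch of the band-limited variant, assuming the mask simply zeroes all but the c lowest frequencies (plus their conjugate partners, so the output stays real); the names and cutoff are illustrative:

```python
import numpy as np

def band_limited_conv(x, y, c):
    """Circular FFT convolution keeping only the c lowest frequencies."""
    n = len(x)
    mask = np.zeros(n)
    mask[:c] = 1.0                  # keep DC and the c-1 lowest frequencies
    mask[n - c + 1:] = 1.0          # ...and their conjugate-symmetric partners
    xcfft = np.fft.fft(x) * mask    # xCfft = band-limited FFT(x)
    ycfft = np.fft.fft(y) * mask    # yCfft = band-limited FFT(y)
    return np.real(np.fft.ifft(xcfft * ycfft))

x = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1.0, 0.0, 0.0, 0.0])
print(band_limited_conv(x, delta, 2))   # low-pass: a smoothed copy of x
```

In a real implementation the masked coefficients would not just be zeroed but dropped entirely, which is where the memory and compute savings come from.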

  14. (figure-only slide; no text captured)

  15.–16. 2. Conjugate symmetry (annotated spectrum map; example coefficients 1−j, 1+j)

  17.–18. Adds annotations: DC component; 3. Real values; 4. No constraints

  19.–20. Adds annotation: 5. 1st compression

  21.–22. Adds annotation: 6. 2nd compression
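The two compression steps annotated on these slides can be checked numerically: for a real input the FFT is conjugate-symmetric, so roughly half the coefficients are redundant (numpy's rfft stores only N//2 + 1 of the N values), and band-limiting then discards the highest of the remaining frequencies. A small sketch, with an arbitrary cutoff:

```python
import numpy as np

x = np.random.default_rng(0).normal(size=8)   # real input signal
X = np.fft.fft(x)

# 2./3. Real input => conjugate symmetry: X[k] == conj(X[N-k]) for k >= 1
for k in range(1, len(x)):
    assert np.isclose(X[k], np.conj(X[-k]))

# 5. 1st compression: rfft keeps only the non-redundant half
Xr = np.fft.rfft(x)
assert len(Xr) == len(x) // 2 + 1             # 5 of 8 coefficients

# 6. 2nd compression: additionally discard the highest frequencies
c = 3                                          # arbitrary cutoff
Xc = Xr[:c]
```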

  23.–25. Plot: test accuracy (%) vs. compression rate (%) for ResNet-18 on CIFAR-10 (y-axis 85–95%; marked points at 93.5% and 92%)

  26. Adds a second plot: DenseNet-121 on CIFAR-100 (marked points at 75.3% and 71.2%)

  27.–28. (bullet-list slides; text not captured)

  29. Cross-correlate input data and filter: x ∗c y
      Fx(ω) = F(x[n]),  Fy(ω) = F(y[n])
      x ∗c y = F⁻¹(Fx(ω) ⊙ Fy(ω))
      Spectrum of convolution: S(ω) = Fx(ω) ⊙ Fy(ω)
      Band-limiting mask: Mc(ω) = 1 if ω ≤ c, 0 if ω > c
      x ∗c y = F⁻¹[(Fx(ω) ⊙ Mc(ω)) ⊙ (Fy(ω) ⊙ Mc(ω))] = F⁻¹(S(ω) ⊙ Mc(ω))
      Energy (Parseval’s theorem): Σₙ₌₀ᴺ⁻¹ |y[n]|² = (1/N) Σ_ω₌₀ᴺ⁻¹ |Fy(ω)|²
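Both identities on this slide can be verified numerically: masking each factor once equals masking the product once (since Mc takes only values 0 and 1, Mc² = Mc), and Parseval's theorem relates signal energy to spectrum energy. The signal length and cutoff below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
x, y = rng.normal(size=N), rng.normal(size=N)
X, Y = np.fft.fft(x), np.fft.fft(y)

# Parseval: sum_n |y[n]|^2 == (1/N) * sum_w |Fy(w)|^2
assert np.isclose(np.sum(np.abs(y) ** 2), np.sum(np.abs(Y) ** 2) / N)

# 0/1 mask Mc(w): since Mc^2 == Mc, masking both spectra and masking the
# product S(w) = X ⊙ Y give the same band-limited result
Mc = (np.abs(np.fft.fftfreq(N)) <= 0.2).astype(float)   # example cutoff
assert np.allclose((X * Mc) * (Y * Mc), (X * Y) * Mc)
```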

  30. (figure-only slide; no text captured)

  31.–32. Plot: test accuracy (%) vs. inference compression rate (%) for DenseNet-121 on CIFAR-100, with curves for train compression rates C = 0, 50, 75, 85

  33. Plots for ResNet-18 on CIFAR-10: normalized GPU memory allocated (%) and normalized epoch time (%) vs. compression rate (%)

  34.–35. Plot: test accuracy (%) vs. inference compression rate (%) for ResNet-18 on CIFAR-10, with curves for train compression rates C = 0 and C = 85

  36. Same plot: smooth degradation of accuracy during inference

  37. Same plot with C = 0, 30, 50, 85: apply the same compression rate to training and inference

  38. Plots: test accuracy (%), GPU memory allocated, and epoch time vs. compression rate (%)

  39.–43. (figure-only slides; no text captured)

  44. “Speaking of longer term, it would be nice if the community migrated to a fully open sourced implementation for all of this [convolution operations, etc.]. This stuff is just too important to the progress of the field for it to be locked away in proprietary implementations. The more people working together on this the better for everyone. There's plenty of room to compete on the hardware implementation side.” — Scott Gray, https://github.com/soumith/convnet-benchmarks/issues/93
