Band-limited Training and Inference for Convolutional Neural Networks
FFT-based convolution (Mathieu et al., “Fast Training of Convolutional Networks through FFTs”; Vasilache et al., “Fast Convolutional Nets With fbfft: A GPU Performance Evaluation”):

Data: x → xfft = FFT(x)
Filter: y → yfft = FFT(y)
offt = xfft ⊙ yfft
Out: o = IFFT(offt)

cuDNN: substantial memory workspace needed for intermediate results.
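The pipeline above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation; the name `fft_conv2d` is my own):

```python
import numpy as np

def fft_conv2d(x, y):
    """Circular 2-D convolution via pointwise product in the frequency domain.

    Mirrors the slide's pipeline: xfft = FFT(x), yfft = FFT(y),
    offt = xfft (*) yfft, o = IFFT(offt).
    (A cross-correlation would use np.conj(yfft) instead of yfft.)
    """
    xfft = np.fft.fft2(x)           # Data: x -> xfft
    yfft = np.fft.fft2(y)           # Filter: y -> yfft
    offt = xfft * yfft              # pointwise (Hadamard) product
    return np.fft.ifft2(offt).real  # Out: o = IFFT(offt)
```

For this to match spatial convolution, the filter must first be zero-padded to the input size; the padded FFT maps are exactly the "intermediate results" whose workspace cost the cuDNN remark refers to.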
Band-limiting = masking out high frequencies:

Data: x → xfft = FFT(x) → xCfft = band-limited(xfft)
Filter: y → yfft = FFT(y) → yCfft = band-limited(yfft)
offt = xCfft ⊙ yCfft
Out: o = IFFT(offt)

Benefits: less memory used, faster computation.
Goal: preserve enough of the spectrum to retain high accuracy of models.
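The band-limiting step can be sketched with a low-pass mask in NumPy (the function names and the `keep_frac` knob are my own illustration of the compression rate; the paper's masking operates inside the network layers):

```python
import numpy as np

def band_limited_fft2(x, keep_frac=0.5):
    """FFT of x with the highest frequencies masked out (set to zero).

    keep_frac is roughly the fraction of frequencies kept per axis,
    a hypothetical stand-in for the talk's compression rate.
    """
    X = np.fft.fft2(x)
    fy = np.abs(np.fft.fftfreq(X.shape[0]))[:, None]  # per-row frequencies
    fx = np.abs(np.fft.fftfreq(X.shape[1]))[None, :]  # per-column frequencies
    mask = (fy <= keep_frac / 2) & (fx <= keep_frac / 2)  # low-pass mask
    return X * mask

def band_limited_conv2d(x, y, keep_frac=0.5):
    """Convolution on band-limited spectra: o = IFFT(xCfft (*) yCfft)."""
    xCfft = band_limited_fft2(x, keep_frac)
    yCfft = band_limited_fft2(y, keep_frac)
    return np.fft.ifft2(xCfft * yCfft).real
```

Note that the memory and speed benefits come from physically dropping the masked coefficients rather than zeroing them as done here; this sketch only illustrates the arithmetic.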
Structure of the FFT map of a real-valued input (slide build):
2. Conjugate symmetry: coefficients come in conjugate pairs such as 1−j and 1+j, so half of the map is redundant.
3. Real values: the DC component and the other symmetry points are purely real.
4. No constraints on the remaining coefficients.
5. 1st compression: discard the conjugate-symmetric half.
6. 2nd compression: discard the highest frequencies (band-limiting).
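The 1st compression (conjugate symmetry) is exactly what NumPy's real-input FFT already exploits; a small sketch:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal((8, 8))

full = np.fft.fft2(x)    # 8 x 8 complex coefficients
half = np.fft.rfft2(x)   # 8 x 5: conjugate symmetry makes the rest redundant
assert half.shape == (8, 8 // 2 + 1)

# For real x the discarded coefficients satisfy X[-u, -v] == conj(X[u, v]),
# so the signal is fully recoverable from the half-size map.
assert np.allclose(np.fft.irfft2(half, s=x.shape), x)
```

The 2nd compression (band-limiting) then drops high frequencies from this already-halved map, which is where the lossy savings come from.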
[Figure: test accuracy (%) vs. compression rate (%). ResNet-18 on CIFAR-10: 93.5% uncompressed vs. 92% at high compression. DenseNet-121 on CIFAR-100: 75.3% uncompressed vs. 71.2% at high compression.]
Cross-correlate input data and filter: x ∗c y

F(x)(ω) = F(x[n]),  F(y)(ω) = F(y[n])
x ∗c y = F⁻¹( F(x)(ω) ⊙ F(y)(ω) )

Spectrum of the convolution: S(ω) = F(x)(ω) ⊙ F(y)(ω)

Band-limiting mask: Mc(ω) = 1 if ω ≤ c, else 0

x ∗c y = F⁻¹[ (F(x)(ω) ⊙ Mc(ω)) ⊙ (F(y)(ω) ⊙ Mc(ω)) ] = F⁻¹( S(ω) ⊙ Mc(ω) )

Energy (Parseval's theorem): Σₙ₌₀^{N−1} |y[n]|² = (1/N) Σ_ω₌₀^{N−1} |Y(ω)|²
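Parseval's theorem, which licenses measuring how much signal energy band-limiting preserves in the frequency domain, can be checked numerically (note that NumPy's unnormalized FFT puts the 1/N factor on the frequency side):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(16)
Y = np.fft.fft(y)

time_energy = np.sum(np.abs(y) ** 2)
freq_energy = np.sum(np.abs(Y) ** 2) / len(y)  # 1/N from NumPy's convention
assert np.allclose(time_energy, freq_energy)

# Band-limiting keeps a quantifiable fraction of this energy:
kept = np.sum(np.abs(Y[:len(Y) // 2]) ** 2) / len(y)
assert kept <= time_energy
```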
[Figure: DenseNet-121 on CIFAR-100, test accuracy (%) vs. inference compression rate (%) for train compression rates C = 0, 50, 75, 85.]
[Figure: ResNet-18 on CIFAR-10, normalized GPU memory allocated (%) and epoch time (%) vs. compression rate (%).]
[Figure: ResNet-18 on CIFAR-10, test accuracy (%) vs. inference compression rate (%) for train compression rates C = 0, 30, 50, 85.]
Accuracy degrades smoothly as the inference compression rate increases.
Best practice: apply the same compression rate to training and inference.
[Figure: test accuracy (%), normalized GPU memory allocated (%), and epoch time (%) vs. compression rate (%).]
“Speaking of longer term, it would be nice if the community migrated to a fully open sourced implementation for all of this [convolution operations, etc.]. This stuff is just too important to the progress of the field for it to be locked away in proprietary implementations. The more people working together on this the better for everyone. There's plenty of room to compete on the hardware implementation side.” — Scott Gray, https://github.com/soumith/convnet-benchmarks/issues/93