Exploring Sparsity in Recurrent Neural Networks
Sharan Narang
May 9, 2017
Silicon Valley AI Lab
Speech Recognition with Deep Learning (English)
Scaling with Data: Comparison of Speech Recognition Approaches
[Chart: Accuracy vs. Data + Model Size (Speed), comparing Deep Learning with Traditional methods]
Model Sizes (Baidu Speech Models)

Model               | Number of Parameters (millions) | Size (MB)
Deep Speech 1       | 8.14                            | 32.56
Deep Speech 2 (RNN) | 67.70                           | 270.79
Deep Speech 2 (GRU) | 115.47                          | 461.87
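The size column follows directly from the parameter counts, assuming 4 bytes per single-precision weight. A quick sketch of that arithmetic:

```python
# Rough check: model size in MB from parameter count,
# assuming single-precision (4-byte) floating-point weights.
MODELS = {
    "Deep Speech 1": 8.14e6,
    "Deep Speech 2 (RNN)": 67.70e6,
    "Deep Speech 2 (GRU)": 115.47e6,
}

for name, params in MODELS.items():
    size_mb = params * 4 / 1e6  # bytes -> megabytes
    print(f"{name}: {params / 1e6:.2f}M params -> {size_mb:.2f} MB")
```

Running this reproduces the 32.56 MB, 270.79 MB, and 461.87 MB figures in the table above.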
Future Vision
Sparse Neural Networks
Pruning Weights
[Diagram: a dense network at the start of training has weights progressively pruned during training, leaving a sparse network at the end of training; x-axis: epochs]
Pruning Approach
[Chart: prune threshold (0 to 0.5) vs. epoch (0 to 20); the threshold ramps up over training, with separate curves for recurrent and linear layers]
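A minimal sketch of the threshold-based pruning idea shown above: the threshold ramps up over training, weights whose magnitude falls below it are zeroed, and a mask keeps them at zero afterwards. The linear ramp and the start_epoch / end_epoch / final_threshold values here are illustrative placeholders, not the exact per-layer schedule used in the work.

```python
import numpy as np

def prune_threshold(epoch, start_epoch=1, end_epoch=20, final_threshold=0.4):
    """Illustrative monotonically increasing threshold schedule:
    zero before pruning starts, then a linear ramp to final_threshold."""
    if epoch < start_epoch:
        return 0.0
    frac = min(1.0, (epoch - start_epoch) / (end_epoch - start_epoch))
    return final_threshold * frac

def apply_pruning(weights, mask, threshold):
    """Zero out weights whose magnitude is below the current threshold.
    Once a weight is pruned (mask == False) it stays pruned."""
    mask &= np.abs(weights) >= threshold
    return weights * mask, mask

# Toy usage: one recurrent weight matrix pruned over 20 epochs.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.2, size=(1760, 1760))
mask = np.ones_like(W, dtype=bool)

for epoch in range(20):
    # ... run a training epoch that updates W ...
    W, mask = apply_pruning(W, mask, prune_threshold(epoch))

print("final sparsity:", 1.0 - mask.mean())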
Pruning Layers
[Chart: sparsity (pruned percent, 70% to 100%) for each of the 14 layers]
Results

Model      | Layer Size | # of Params  | CER   | Relative Perf
RNN Dense  | 1760       | 67 million   | 10.67 | 0.0%
RNN Sparse | 1760       | 8.3 million  | 12.88 | -20.71%
RNN Sparse | 2560       | 11.1 million | 10.59 | 0.75%
RNN Sparse | 3072       | 16.7 million | 10.25 | 3.95%
GRU Dense  | 2560       | 115 million  | 9.55  | 0.0%
GRU Sparse | 2560       | 13 million   | 10.87 | -13.82%
GRU Sparse | 3568       | 17.8 million | 9.76  | -2.2%
Equal Parameter Networks
[Chart: CTC cost (20 to 60) vs. epoch (0 to 20) for small_dense_train, small_dense_dev0, large_sparse_train, and large_sparse_dev0]
Sparsity vs. Accuracy
[Chart: relative accuracy vs. sparsity (0% to 100%), with points annotated at 10.89 CER (near the baseline), 13.0 CER (around -20% relative accuracy), and 17.4 CER (around -60%)]
Models don’t need to be retrained
Compression (RNN Model)
[Chart: compression factor for the sparse RNN models: 8.11x for 1760 Sparse, 6.06x for 2560 Sparse, 4.04x for 3072 Sparse]
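The compression factors are simply the dense-to-sparse parameter ratios from the results table; a quick check (the small differences from the chart values come from rounding in the table):

```python
# Compression factor = dense parameter count / sparse parameter count,
# using the rounded counts from the results table.
dense_params = 67e6
sparse_models = {"1760 Sparse": 8.3e6, "2560 Sparse": 11.1e6, "3072 Sparse": 16.7e6}

for name, params in sparse_models.items():
    print(f"{name}: {dense_params / params:.2f}x")
```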
Speedup
[Chart: measured vs. expected speedup for the 1760, 2560, and 3072 sparse models; the measured speedup is consistently lower than the expected speedup]
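A rough way to see where the measured/expected gap comes from is to time a dense matrix-vector multiply against a CSR sparse one at roughly 90% sparsity. The library (SciPy), sizes, and density below are assumptions for illustration, not the benchmark behind the chart.

```python
import time
import numpy as np
import scipy.sparse as sp

n = 1760          # layer size of the smallest sparse model
density = 0.1     # ~90% sparsity, in line with the pruning results

W_dense = np.random.randn(n, n).astype(np.float32)
W_sparse = sp.random(n, n, density=density, format="csr", dtype=np.float32)
x = np.random.randn(n).astype(np.float32)

def bench(fn, reps=1000):
    """Average wall-clock time per call over `reps` repetitions."""
    start = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - start) / reps

t_dense = bench(lambda: W_dense @ x)
t_sparse = bench(lambda: W_sparse @ x)
print(f"measured speedup: {t_dense / t_sparse:.2f}x (ideal ~{1 / density:.0f}x)")
```

Indexing overhead and irregular memory access typically keep the measured speedup well below the 1/density ideal, which is the gap the chart illustrates.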
Conclusion
• Sparse neural networks can achieve good accuracy while significantly reducing the number of parameters
• The threshold-based pruning approach works for fully connected, recurrent, and GRU layers
• Improvements in sparse matrix-vector libraries can yield higher speedups for sparse neural networks
Thank You!
Sharan Narang sharan@baidu.com http://research.baidu.com Silicon Valley AI Lab