  1. Exploring Sparsity in Recurrent Neural Networks Sharan Narang May 9, 2017 Silicon Valley AI Lab

  2. Speech Recognition with Deep Learning (English)

  3. Scaling with Data: Comparison of Speech Recognition Approaches [Chart: accuracy vs. data + model size (speed); deep learning keeps improving with more data and larger models, while traditional methods plateau]

  4. Model Sizes of Baidu Speech Models
     Model                 # of Parameters (millions)   Size (MB)
     Deep Speech 1         8.14                         32.56
     Deep Speech 2 (RNN)   67.70                        270.79
     Deep Speech 2 (GRU)   115.47                       461.87

  5. Future Vision

  6. Sparse Neural Networks

  7. Pruning Weights [Diagram: training starts with a dense initial network; weights are pruned during training, yielding a sparse final network by the end of training]
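Note: the pruning step sketched on this slide amounts to masking small-magnitude weights and keeping them at zero for the rest of training. Below is a minimal, hypothetical NumPy sketch of that idea; the function name `prune_weights` and the threshold value are illustrative, not taken from the talk.

```python
import numpy as np

def prune_weights(weights, mask, threshold):
    """Zero out weights whose magnitude is below `threshold`.

    Once a weight is pruned (mask entry == 0) it stays pruned,
    because the cumulative mask is applied on every call.
    """
    # Mark newly pruned weights in the cumulative mask.
    mask = mask * (np.abs(weights) >= threshold)
    # Apply the mask to the weight matrix.
    return weights * mask, mask

# Toy usage: a 4x4 recurrent weight matrix pruned at threshold 0.1.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.2, size=(4, 4))
mask = np.ones_like(W)
W, mask = prune_weights(W, mask, threshold=0.1)
print(f"sparsity: {1.0 - mask.mean():.2f}")
```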

  8. Pruning Approach [Plot: prune threshold (0 to 0.5) vs. epoch (0 to 20), with separate curves for recurrent and linear layers; the threshold increases as training progresses]
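Note: a minimal sketch of a monotonically increasing threshold schedule like the one plotted on this slide, assuming a simple piecewise-linear ramp. The start/end epochs and the final threshold here are placeholder values, not the talk's actual hyperparameters.

```python
def prune_threshold(epoch, start_epoch=1, end_epoch=18, max_threshold=0.4):
    """Piecewise-linear ramp: 0 before start_epoch, increasing linearly
    until end_epoch, then held constant at max_threshold."""
    if epoch < start_epoch:
        return 0.0
    if epoch >= end_epoch:
        return max_threshold
    frac = (epoch - start_epoch) / (end_epoch - start_epoch)
    return frac * max_threshold

# Example: the threshold at each epoch of a 20-epoch run.
schedule = [round(prune_threshold(e), 3) for e in range(21)]
```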

  9. Pruning Layers [Bar chart: pruned percent (sparsity) per layer for layers 1 through 14; y-axis spans 70% to 100%]

  10. Results
      Model        Layer Size   # of Params    CER     Relative Perf
      RNN Dense    1760         67 million     10.67   0.0%
      RNN Sparse   1760         8.3 million    12.88   -20.71%
      RNN Sparse   2560         11.1 million   10.59   0.75%
      RNN Sparse   3072         16.7 million   10.25   3.95%
      GRU Dense    2560         115 million    9.55    0.0%
      GRU Sparse   2560         13 million     10.87   -13.82%
      GRU Sparse   3568         17.8 million   9.76    -2.2%

  11. Equal Parameter Networks [Plot: CTC cost (roughly 20 to 60) vs. epoch number (0 to 20) for small_dense_train, small_dense_dev0, large_sparse_train, and large_sparse_dev0]

  12. Sparsity vs. Accuracy [Plot: relative accuracy (+10% to -70%) vs. sparsity (0% to 100%); labeled points at 10.89 CER near the baseline, then 13.0 CER and 17.4 CER as sparsity increases toward 100%]

  13. Models don’t need to be retrained

  14. Compression of the RNN Model [Bar chart: compression factor of 8.11x for the 1760 sparse model, 6.06x for the 2560 sparse model, and 4.04x for the 3072 sparse model]
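Note: these compression factors line up with the parameter counts in the results table (slide 10); a quick check using those rounded counts, assuming compression is the ratio of dense to sparse parameters:

```python
dense_rnn_params = 67e6            # dense RNN baseline (slide 10)
sparse_params = {"1760 Sparse": 8.3e6,
                 "2560 Sparse": 11.1e6,
                 "3072 Sparse": 16.7e6}

for name, p in sparse_params.items():
    print(f"{name}: {dense_rnn_params / p:.2f}x compression")
# Prints roughly 8.07x, 6.04x, and 4.01x -- in line with the 8.11x,
# 6.06x, and 4.04x on the slide, which presumably uses exact rather
# than rounded parameter counts.
```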

  15. Speedup [Bar chart: measured vs. expected speedup for the 1760, 2560, and 3072 sparse models; labeled bar values include 1.16x, 1.93x, 2.90x, 3.89x, 5.33x, and 10x, with measured speedup falling short of expected]
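Note: the gap between measured and expected speedup reflects how sparse matrix-vector (SpMV) kernels lag behind dense GEMV at these sparsity levels. A minimal sketch of the core operation being compared, using scipy.sparse on CPU; this is an illustrative stand-in, not the GPU kernels benchmarked in the talk, and the density value is a placeholder.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
hidden = 1760                      # layer size of the baseline RNN
density = 0.10                     # roughly 90% sparsity

# Dense recurrent weight matrix and a sparse (CSR) counterpart.
W_dense = rng.normal(size=(hidden, hidden)).astype(np.float32)
W_sparse = sparse.random(hidden, hidden, density=density,
                         format="csr", dtype=np.float32, random_state=0)
h = rng.normal(size=hidden).astype(np.float32)

# The recurrent step's core operation: a matrix-vector product.
y_dense = W_dense @ h
y_sparse = W_sparse @ h
```

The SpMV does roughly `density` times the floating-point work of the dense product, but irregular memory access keeps the measured speedup well below 1/density, which is the gap the slide highlights.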

  16. Conclusion
      • Sparse neural networks can achieve good accuracy while significantly reducing the number of parameters
      • The threshold-based pruning approach works for fully connected, recurrent, and GRU layers
      • Improvements in sparse matrix-vector libraries can yield higher speedups for sparse neural networks

  17. Thank You!

  18. Sharan Narang sharan@baidu.com http://research.baidu.com Silicon Valley AI Lab
