Neural Computing Journal Club
14/10/2020
Presented by Edward Jones
Overview of the Review
● Introduction of the paper and authors
● Central idea of the Lottery Ticket Hypothesis
● Discussion of the experiments
● Relation to other work
The Authors
The Lottery Ticket Hypothesis
“A randomly-initialized, dense neural network contains a subnetwork that is initialized such that—when trained in isolation—it can match the test accuracy of the original network after training for at most the same number of iterations.”
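The original slides step through this claim clause by clause, ending on the training-time condition (the “≤”). The paper also states the hypothesis formally; the following LaTeX rendering follows the notation of Frankle & Carbin (2019):

```latex
% Formal statement, following Frankle & Carbin (2019)
Let $f(x; \theta)$ be a dense network with initialization
$\theta = \theta_0 \sim \mathcal{D}_\theta$ that, when trained with SGD,
reaches test accuracy $a$ at iteration $j$. The hypothesis asserts that
there exists a mask $m \in \{0, 1\}^{|\theta|}$ such that the subnetwork
$f(x;\, m \odot \theta_0)$, trained in isolation, reaches accuracy
$a' \geq a$ in $j' \leq j$ iterations, with $\|m\|_0 \ll |\theta|$.
```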
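The paper finds such subnetworks ("winning tickets") by iterative magnitude pruning: train the dense network, prune the smallest-magnitude weights, rewind the surviving weights to their original initialization, and repeat. Below is a minimal NumPy sketch of that loop, assuming weights are stored as a dict of arrays; `train` is a hypothetical stand-in for the full SGD training loop, and a real implementation must also keep masked weights at zero during training.

```python
import numpy as np

def prune_smallest(weights, mask, prune_frac=0.2):
    """Zero out the smallest-magnitude surviving weights in each layer."""
    new_mask = {}
    for name, w in weights.items():
        alive = w[mask[name]]                    # currently unpruned weights
        k = int(prune_frac * alive.size)         # how many to remove this round
        if k == 0:
            new_mask[name] = mask[name]
            continue
        threshold = np.sort(np.abs(alive))[k]    # magnitude cutoff
        new_mask[name] = mask[name] & (np.abs(w) >= threshold)
    return new_mask

def find_winning_ticket(init_weights, train, rounds=5, prune_frac=0.2):
    """Iterative magnitude pruning with rewinding to the original init."""
    mask = {n: np.ones_like(w, dtype=bool) for n, w in init_weights.items()}
    for _ in range(rounds):
        # train() is a placeholder for the full training loop; it should
        # return trained weights while holding masked entries at zero
        trained = train({n: w * mask[n] for n, w in init_weights.items()})
        mask = prune_smallest(trained, mask, prune_frac)
    # the winning ticket: the ORIGINAL initialization under the final mask
    return {n: w * mask[n] for n, w in init_weights.items()}
```

Each round removes roughly `prune_frac` of the surviving weights, so after n rounds about (1 − prune_frac)^n of the original parameters remain.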
Results - LeNet
Results - ResNet-18
Other findings
● Dropout works synergistically with pruning
● These sparse networks appear to generalise better
● Winning tickets are robust to weight noise
Follow-up work
Summary
● Winning tickets (subnetworks) exist
● Both structure and initialisation matter for winning tickets
● This could enable more efficient training
References

Bellec, G., Kappel, D., Maass, W., and Legenstein, R. (2017). Deep Rewiring: Training very sparse deep networks. arXiv:1711.05136 [cs, stat]. Available at: http://arxiv.org/abs/1711.05136 [Accessed October 14, 2020].

Bogdan, P. A., Rowley, A. G. D., Rhodes, O., and Furber, S. B. (2018). Structural Plasticity on the SpiNNaker Many-Core Neuromorphic System. Front. Neurosci. 12. doi:10.3389/fnins.2018.00434.

Frankle, J., and Carbin, M. (2019). The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv:1803.03635 [cs]. Available at: http://arxiv.org/abs/1803.03635 [Accessed October 13, 2020].

Frankle, J., Dziugaite, G. K., Roy, D. M., and Carbin, M. (2020a). Linear Mode Connectivity and the Lottery Ticket Hypothesis. arXiv:1912.05671 [cs, stat]. Available at: http://arxiv.org/abs/1912.05671 [Accessed October 14, 2020].

Frankle, J., Schwab, D. J., and Morcos, A. S. (2020b). The Early Phase of Neural Network Training. arXiv:2002.10365 [cs, stat]. Available at: http://arxiv.org/abs/2002.10365 [Accessed October 14, 2020].

Lange, R. (2020). The Lottery Ticket Hypothesis: A Survey. Medium. Available at: https://towardsdatascience.com/the-lottery-ticket-hypothesis-a-survey-d1f0f62f8884 [Accessed October 14, 2020].

LeCun, Y., Denker, J. S., and Solla, S. A. (1990). Optimal Brain Damage. Advances in Neural Information Processing Systems 2.

The Lottery Ticket Hypothesis with Jonathan Frankle (2020). Available at: https://www.youtube.com/watch?v=SfjJoevBbjU [Accessed October 14, 2020].

Yu, H., Edunov, S., Tian, Y., and Morcos, A. S. (2020). Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP. arXiv:1906.02768 [cs, stat]. Available at: http://arxiv.org/abs/1906.02768 [Accessed October 14, 2020].

Thank you