Jumpout: Improved Dropout for Deep Neural Networks with ReLUs
Shengjie Wang*, Tianyi Zhou*, Jeff A. Bilmes (University of Washington, Seattle)
Dropout has a few Drawbacks...
• Dropout encourages DNNs to apply the same linear model to different data points but does not enforce local smoothness.
• Dropping units that are already zero has no effect, yet such drops still count toward the drop rate.
• Dropout does not work well with BatchNorm.
• Jumpout improves dropout with three modifications, at (almost) no extra computation/memory cost.
Jumpout Modification I – Encourage Local Smoothness
• Instead of applying a constant dropout rate, the dropout rate is sampled from the positive part of a Gaussian distribution, and the standard deviation is used to control the strength of regularization.
[Figure: illustration comparing a monotone (sampled) dropout-rate distribution with a constant dropout rate, shown with data points and rows of W.]
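A minimal NumPy sketch of this modification, assuming an inverted-dropout formulation; the function names (sample_jumpout_rate, jumpout_mod1) and the clipping at max_rate are illustrative assumptions, not the authors' reference code.

# Sketch of Jumpout Modification I (assumed names): each forward pass samples
# the dropout rate from the positive half of N(0, sigma^2); sigma controls
# the strength of the regularization.
import numpy as np

def sample_jumpout_rate(sigma=0.2, max_rate=0.9, rng=None):
    """Draw a dropout rate from the positive half of a Gaussian N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    p = abs(rng.normal(0.0, sigma))
    return min(p, max_rate)  # clip so the layer never drops (almost) everything

def jumpout_mod1(x, sigma=0.2, training=True, rng=None):
    """Inverted dropout with a freshly sampled rate on every forward pass."""
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    p = sample_jumpout_rate(sigma, rng=rng)
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

Because the rate is resampled per forward pass, nearby inputs are effectively perturbed by many different drop levels, which is what encourages the local smoothness described above.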
Jumpout Modification II - Better Control of Regularization
• The dropout rate is normalized by the proportion of active neurons of the input layer, so that we can better control the regularization for different layers and for different training stages.
[Figure: ReLU activation portion vs. training epoch for different layers (conv 1, conv 2, conv 3, fc 1), showing that the portion of active neurons varies across layers and epochs.]
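A sketch of this normalization, assuming the applied rate is the target rate divided by the measured fraction of active (post-ReLU) units; names such as jumpout_mod2 and target_rate are illustrative assumptions.

# Sketch of Jumpout Modification II (assumed names): since dropping units that
# ReLU already set to zero has no effect, divide the target rate by the
# fraction of active units so the effective drop rate on active units matches
# the target across layers and training stages.
import numpy as np

def jumpout_mod2(x, target_rate=0.1, max_rate=0.9, training=True, rng=None):
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    active_frac = max(float(np.mean(x > 0)), 1e-8)  # avoid division by zero
    p = min(target_rate / active_frac, max_rate)    # rate w.r.t. active units
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)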
Jumpout Modification III - Synergize well with Batchnorm
• The rescaling factor for training is changed to (1 − q)^(−0.75), instead of the standard (1 − q)^(−1), to account for both the changes of the mean and the variance.
[Figure: changes of Mean (left) and Variance (right) ratios across training epochs when applying the rescaling factors (1 − p)^(−1), (1 − p)^(−0.5), and (1 − p)^(−0.75). Blue: Dropout, Grey: Jumpout.]
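A sketch of the modified rescaling, again under assumed names; the only change relative to standard inverted dropout is the exponent of the rescaling factor.

# Sketch of Jumpout Modification III (assumed names): rescale by
# (1 - p)^(-0.75) instead of (1 - p)^(-1). An exponent of -1 preserves the
# mean but inflates the second moment seen by a following BatchNorm layer,
# while -0.5 preserves the second moment but shrinks the mean; -0.75 is the
# compromise shown in the figure legend above.
import numpy as np

def jumpout_mod3(x, p=0.5, training=True, rng=None):
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask * (1.0 - p) ** (-0.75)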
Results: STL-10
Thank you! • For more details, please come to our poster session Tuesday 06:30 - 09:00 PM Pacific Ballroom #29