Dropout as a Structured Shrinkage Prior
Eric Nalisnick, José Miguel Hernández-Lobato, Padhraic Smyth
University of Cambridge · University of California, Irvine
Dropout & Multiplicative Noise (2012)

Implementation as multiplicative noise:

    h_{n,l} = f_l( h_{n,l-1} Λ_l W_l )

where h_{n,l-1} are the hidden units, W_l are the weights, and Λ_l is a diagonal matrix of random variables with λ_{i,i} ∼ p(λ).

• Dropout corresponds to p(λ) being Bernoulli.
• Gaussian, beta, and uniform noise have been shown to work as well.

[Figure: a standard neural network, before and after applying dropout]
Dropout as a Gaussian Scale Mixture
Dropout as a Gaussian Scale Mixture

Gaussian Scale Mixtures: A random variable θ is a Gaussian scale mixture iff it can be expressed as the product of a Gaussian random variable and an independent scalar random variable [Beale & Mallows, 1959]:

    θ = λ z,   z ∼ N(0, σ₀²),   λ ∼ p(λ)

This can be reparametrized into a hierarchical form:

    θ | λ ∼ N(0, λ² σ₀²),   λ ∼ p(λ)
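The product form and the hierarchical form define the same marginal distribution over θ, which is easy to sanity-check by simulation. The sketch below samples θ both ways and compares even moments (odd moments vanish by symmetry); the Gamma choice for p(λ) and all parameter values are illustrative assumptions, as the equivalence holds for any p(λ).

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma0 = 1_000_000, 1.0

# An arbitrary scale distribution p(λ); any choice works.
lam = rng.gamma(shape=2.0, scale=0.5, size=n)

# Product form: θ = λ z with z ~ N(0, σ₀²), λ independent of z.
theta_prod = lam * rng.normal(0.0, sigma0, size=n)

# Hierarchical form: θ | λ ~ N(0, λ² σ₀²).
theta_hier = rng.normal(0.0, lam * sigma0)

# Same marginal: both variances ≈ E[λ²] σ₀², both 4th moments ≈ 3 σ₀⁴ E[λ⁴].
print(np.var(theta_prod), np.var(theta_hier))
print(np.mean(theta_prod**4), np.mean(theta_hier**4))
```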
<latexit sha1_base64="US8pWfsk+4hnLFkFwbrxIjiSxQ=">ACEnicbVDLSgNBEJyNrxhfUY9eBoOQAi7UdBjwIsniWAekMRldjJxszsLjO9alj2G7z4K148KOLVkzf/xsnjoIkFDUVN91dXi4Btv+tlJLyura+n1zMbm1vZOdnevroNIUVajgQhU0yOaCe6zGnAQrBkqRqQnWMbno/9xh1Tmgf+NYxC1pGk7/MepwSM5GYL927Mi7cJbmsucRvYA8SXSd4u4rHSl+QmLidubCcFN5uzS/YEeJE4M5JDM1Td7Fe7G9BIMh+oIFq3HDuETkwUcCpYkmlHmoWEDkmftQz1iWS6E09eSvCRUbq4FyhTPuCJ+nsiJlLrkfRMpyQw0PeWPzPa0XQO+vE3A8jYD6dLupFAkOAx/ngLleMghgZQqji5lZMB0QRCibFjAnBmX95kdTLJe4VL46yVXILI40OkCHKI8cdIoq6AJVUQ1R9Iie0St6s56sF+vd+pi2pqzZzD76A+vzB6ZAnOo=</latexit> <latexit sha1_base64="WqtOMyEvjU7vCAHV+56b9SYRNo=">ACKnicbVDLSsNAFJ34rPVdelmsAgVtCRV0GXFjQsXFewDmhAmk0k7dDIJMxOhHyPG3/FTRdKceuHOGkjaOuFYc6ce+6de48XMyqVaU6NldW19Y3N0lZ5e2d3b79ycNiRUSIwaeOIRaLnIUkY5aStqGKkFwuCQo+Rrje6y/PdZyIkjfiTGsfECdGA04BipDTlVm4DN2VZzQ6RGnpBOszclJ+zCyuDthcxX45DfaX2g+7oyzXwh9td/Y8cytVs27OAi4DqwBVUETLrUxsP8JSLjCDEnZt8xYOSkSimJGsrKdSBIjPEID0teQo5BIJ52tmsFTzfgwiIQ+XMEZ+7siRaHMh9bKfEy5mMvJ/3L9RAU3Tkp5nCjC8fyjIGFQRTD3DfpUEKzYWAOEBdWzQjxEAmGl3S1rE6zFlZdBp1G3LuNx6tqExV2lMAxOAE1YIFr0AT3oAXaAIMX8AbewYfxakyMqfE5l64YRc0R+BPG1ze4V6jG</latexit> <latexit sha1_base64="kEf5cwRXb6DTemwrlLiDzlO+Lg=">ACB3icbVDLSgMxFM3UV62vUZeCBItQcpMFXRZcOygn1AZxgymUwbmSGJCOUoTs3/obF4q49Rfc+Tem7Sy09UDgcM653NwTpowq7TjfVmldW19o7xZ2dre2d2z9w86KskJm2csET2QqQIo4K0NdWM9FJEA8Z6Yajm6nfSBS0UTc63FKfI4GgsYUI2kwD72mAlHKMjpOaQT6CnKYVor1LPArjp1Zwa4TNyCVEGBVmB/eVGCM06Exgwp1XedVPs5kpiRiYVL1MkRXiEBqRvqECcKD+f3TGBp0aJYJxI84SGM/X3RI64UmMemiRHeqgWvan4n9fPdHzt51SkmSYCzxfFGYM6gdNSYEQlwZqNDUFYUvNXiIdIqxNdRVTgrt48jLpNOruRb1xd1ltoqKOMjgCJ6AGXHAFmuAWtEAbYPAInsEreLOerBfr3fqYR0tWMXMI/sD6/AGOQZiP</latexit> Dropout as a Gaussian Scale Mixture Let’s assume a Gaussian prior on Gaussian Scale Mixtures the NN weights … A random variable θ is a Gaussian scale mixture f l ( h n,l − 1 Λ l W l ) iff it can be expressed as the product of a Gaussian random variable and an independent scalar random variable [Beale & Mallows, 1959] : Noise Weights λ i,i ∼ p ( λ ) w i,j ∼ N(0 , σ 2 0 ) Can be reparametrized into a hierarchical form : 9
<latexit sha1_base64="WqtOMyEvjU7vCAHV+56b9SYRNo=">ACKnicbVDLSsNAFJ34rPVdelmsAgVtCRV0GXFjQsXFewDmhAmk0k7dDIJMxOhHyPG3/FTRdKceuHOGkjaOuFYc6ce+6de48XMyqVaU6NldW19Y3N0lZ5e2d3b79ycNiRUSIwaeOIRaLnIUkY5aStqGKkFwuCQo+Rrje6y/PdZyIkjfiTGsfECdGA04BipDTlVm4DN2VZzQ6RGnpBOszclJ+zCyuDthcxX45DfaX2g+7oyzXwh9td/Y8cytVs27OAi4DqwBVUETLrUxsP8JSLjCDEnZt8xYOSkSimJGsrKdSBIjPEID0teQo5BIJ52tmsFTzfgwiIQ+XMEZ+7siRaHMh9bKfEy5mMvJ/3L9RAU3Tkp5nCjC8fyjIGFQRTD3DfpUEKzYWAOEBdWzQjxEAmGl3S1rE6zFlZdBp1G3LuNx6tqExV2lMAxOAE1YIFr0AT3oAXaAIMX8AbewYfxakyMqfE5l64YRc0R+BPG1ze4V6jG</latexit> Dropout as a Gaussian Scale Mixture Let’s assume a Gaussian prior on Gaussian Scale Mixtures the NN weights … A random variable θ is a Gaussian scale mixture f l ( h n,l − 1 Λ l W l ) iff it can be expressed as the product of a Gaussian random variable and an independent scalar random variable [Beale & Mallows, 1959] : Definition of a Gaussian Scale Mixture Can be reparametrized into a hierarchical form : 10
<latexit sha1_base64="WqtOMyEvjU7vCAHV+56b9SYRNo=">ACKnicbVDLSsNAFJ34rPVdelmsAgVtCRV0GXFjQsXFewDmhAmk0k7dDIJMxOhHyPG3/FTRdKceuHOGkjaOuFYc6ce+6de48XMyqVaU6NldW19Y3N0lZ5e2d3b79ycNiRUSIwaeOIRaLnIUkY5aStqGKkFwuCQo+Rrje6y/PdZyIkjfiTGsfECdGA04BipDTlVm4DN2VZzQ6RGnpBOszclJ+zCyuDthcxX45DfaX2g+7oyzXwh9td/Y8cytVs27OAi4DqwBVUETLrUxsP8JSLjCDEnZt8xYOSkSimJGsrKdSBIjPEID0teQo5BIJ52tmsFTzfgwiIQ+XMEZ+7siRaHMh9bKfEy5mMvJ/3L9RAU3Tkp5nCjC8fyjIGFQRTD3DfpUEKzYWAOEBdWzQjxEAmGl3S1rE6zFlZdBp1G3LuNx6tqExV2lMAxOAE1YIFr0AT3oAXaAIMX8AbewYfxakyMqfE5l64YRc0R+BPG1ze4V6jG</latexit> Dropout as a Gaussian Scale Mixture Let’s assume a Gaussian prior on Gaussian Scale Mixtures the NN weights … A random variable θ is a Gaussian scale mixture f l ( h n,l − 1 Λ l W l ) iff it can be expressed as the product of a Gaussian random variable and an independent scalar random variable [Beale & Mallows, 1959] : Definition of a Gaussian Scale Mixture SWITCH TO HIERARCHICAL PARAMETRIZATION Can be reparametrized into a hierarchical form : 11