Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun, Suvrit Sra, Ali Jadbabaie
Laboratory for Information and Decision Systems, MIT
Given a ReLU fully-connected network, how many hidden nodes are required to memorize arbitrary N data points?

1-hidden-layer, scalar regression: N hidden nodes.
[Diagram: d_x inputs → N hidden ReLU units → scalar output]
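One standard way to see the 1-hidden-layer baseline (a sketch in numpy, not taken from the slides; all variable names are illustrative): project the N inputs onto a random direction, give each point its own ReLU threshold, and solve the resulting triangular linear system for the output weights.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d_x = 50, 10
    X = rng.standard_normal((N, d_x))    # N inputs in R^{d_x}
    y = rng.standard_normal(N)           # scalar regression targets

    a = rng.standard_normal(d_x)         # random projection direction
    order = np.argsort(X @ a)
    z = (X @ a)[order]                   # sorted projections, a.s. distinct

    # Hidden unit j fires exactly on points j, ..., N-1: its threshold b_j
    # sits just below z_j (midpoint with z_{j-1}; anything below z_0 for j = 0).
    b = np.concatenate(([z[0] - 1.0], (z[:-1] + z[1:]) / 2))
    H = np.maximum(z[:, None] - b[None, :], 0.0)  # (N, N), lower-triangular,
                                                  # positive diagonal
    v = np.linalg.solve(H, y[order])     # output weights: exact interpolation
    assert np.allclose(H @ v, y[order])  # N hidden ReLUs memorize N points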
We prove that for 2-hidden-layer networks, Θ(√(Nd_y)) neurons are sufficient. If d_y = 1, Θ(√N) neurons are also necessary.
[Diagram: d_x inputs → 2√(Nd_y) → 2√(Nd_y) hidden units → d_y outputs]
Depth-width trade-off: going from one hidden layer to two drops the neuron requirement from N to Θ(√(Nd_y)).
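To make the trade-off concrete, a tiny arithmetic sketch (the values are illustrative; the width formulas are the ones stated above):

    import math

    N, d_y = 10**6, 1
    print(N)                        # depth 1: ~N = 1,000,000 hidden nodes
    print(4 * math.isqrt(N * d_y))  # depth 2: two layers of width 2*sqrt(N*d_y)
                                    # -> only 4,000 hidden nodes in total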
Regression: d_x inputs → 2√(Nd_y) → 2√(Nd_y) hidden units → d_y outputs.
Classification: d_x inputs → 2√N → 2√N → 4d_y hidden units → d_y outputs.
ImageNet (N = 1M, d_y = 1k) can be memorized with hidden layer sizes 2k-2k-4k.
Depth-width trade-off: the extra hidden layer again reduces the required widths.
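Plugging the ImageNet numbers into the classification widths above reproduces the slide's 2k-2k-4k claim (a sketch; math.isqrt is exact here because N is a perfect square):

    import math

    N, d_y = 10**6, 1000                 # ImageNet: 1M points, 1k classes
    widths = (2 * math.isqrt(N), 2 * math.isqrt(N), 4 * d_y)
    print(widths)                        # (2000, 2000, 4000) -> "2k-2k-4k"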
2 hidden layers: d_x inputs → 2√(Nd_y) → 2√(Nd_y) → d_y outputs.
L hidden layers: d_x inputs → ≈ 8√(Nd_y/L) → … → ≈ 8√(Nd_y/L) → d_y outputs.
A network with W parameters can memorize N data points if W = Ω(N).
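A sketch of why this construction stays at W = O(N) parameters at any depth: with L hidden layers of width w ≈ 8√(Nd_y/L), the dominant hidden-to-hidden weight count is (L-1)·w² ≈ 64·Nd_y·(L-1)/L, i.e., linear in N regardless of L. (The width formula is the one reconstructed above; the constant is illustrative.)

    import math

    def hidden_weight_count(N, d_y, L):
        w = round(8 * math.sqrt(N * d_y / L))  # width of each hidden layer
        return (L - 1) * w * w                 # hidden-to-hidden weights only

    N, d_y = 10**6, 1
    for L in (2, 4, 8, 16):
        # prints ~ 64 * N * (L-1) / L: Theta(N) at every depth L
        print(L, hidden_weight_count(N, d_y, L))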
Given a network, we define its memorization capacity C as

C = max{ N | the network can memorize arbitrary N data points with d_y = 1 }.

Θ(√N) neurons necessary and sufficient for 2-hidden-layer networks ⟹ C = Θ(W). (Tight.)
W = Ω(N) sufficient for L-hidden-layer networks ⟹ C = Ω(W), and C ≤ VCdim = O(WL log W). (Nearly tight.)
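A one-step derivation (a sketch, assuming a fixed input dimension d_x) of how the 2-hidden-layer bound turns into C = Θ(W):

    % With d_y = 1 and hidden widths d_1 = d_2 = \Theta(\sqrt{N}),
    % the parameter count is
    \[
      W = \underbrace{d_x d_1}_{\text{input layer}}
        + \underbrace{d_1 d_2}_{\text{hidden layer}}
        + \underbrace{d_2}_{\text{output layer}}
        = \Theta(\sqrt{N})\, d_x + \Theta(N) = \Theta(N),
    \]
    % so memorizing N points with \Theta(N) parameters gives C = \Theta(W).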
Other results:
- A tighter sufficient condition for memorization in residual networks
- An SGD trajectory analysis near a memorizing global minimum

Poster #233, Wed Dec 11th, 5PM-7PM @ East Exhibition Hall B + C