Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity


  1. Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Laboratory for Information and Decision Systems, MIT. (NeurIPS 2019)

  2. Given a ReLU fully-connected network, how many hidden nodes are required to memorize N arbitrary data points?

  3. Given a ReLU fully-connected network, how many hidden nodes are required to memorize N arbitrary data points? 1-hidden-layer, scalar regression: N hidden nodes. [Diagram: input d_x → hidden layer of width N → scalar output]
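
A minimal sketch (illustrative, not the paper's code) of the width-N baseline this slide refers to: project the N inputs to distinct scalars, place one ReLU breakpoint just below each sorted projection, and solve the resulting lower-triangular system for the output weights. All variable names are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d_x = 50, 10                         # N data points in R^{d_x}
    X = rng.standard_normal((N, d_x))
    y = rng.standard_normal(N)              # arbitrary scalar targets (d_y = 1)

    a = rng.standard_normal(d_x)            # random projection; z_i distinct a.s.
    z = X @ a
    order = np.argsort(z)
    X, y, z = X[order], y[order], z[order]  # sort so that z_1 < ... < z_N

    # Bias b_j sits just below z_j, so hidden unit j is inactive on points i < j:
    # the activation matrix M is lower triangular with a positive diagonal.
    b = np.concatenate(([z[0] - 1.0], (z[:-1] + z[1:]) / 2.0))
    M = np.maximum(z[:, None] - b[None, :], 0.0)   # N x N hidden activations
    c = np.linalg.solve(M, y)                      # output-layer weights

    print(np.max(np.abs(M @ c - y)))   # ~1e-12: all N points fit exactly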

  4. We prove that for 2-hidden-layer networks, Θ(√(N d_y)) neurons are sufficient. If d_y = 1, Θ(√N) neurons are also necessary.

  5. We prove that for 2-hidden-layer networks, Θ(√(N d_y)) neurons are sufficient. If d_y = 1, Θ(√N) neurons are also necessary. [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y]

  6. We prove that for 2-hidden-layer networks, Θ(√(N d_y)) neurons are sufficient. If d_y = 1, Θ(√N) neurons are also necessary. Depth-width trade-off. [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y]
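
For concreteness, a sketch of the arithmetic behind this diagram, with illustrative constants (the paper's theorem uses slightly different, floor-based constants): two hidden layers of width about 2√(N d_y) use only Θ(√(N d_y)) neurons in total, while their width product is about 4 N d_y, enough to cover all N targets in R^{d_y}.

    import math

    def two_hidden_widths(N: int, d_y: int) -> tuple[int, int]:
        # Order-of-magnitude widths for the 2-hidden-layer construction.
        w = math.ceil(2 * math.sqrt(N * d_y))
        return w, w

    N, d_y = 10_000, 1
    w1, w2 = two_hidden_widths(N, d_y)
    print(w1, w2, w1 + w2)          # 200 200 400: Theta(sqrt(N)) neurons, not Theta(N)
    print(w1 * w2 >= 4 * N * d_y)   # True: width product covers 4*N*d_y targets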

  7. Regression: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y]

  8. Regression: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y] Classification: [Diagram: input d_x → width 2√N → width 2√N → width 4 d_y → output d_y]

  9. Regression: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y] Classification: [Diagram: input d_x → width 2√N → width 2√N → width 4 d_y → output d_y] ImageNet (N = 1M, d_y = 1k) memorized with widths 2k-2k-4k.

  10. Regression: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y] Classification: [Diagram: input d_x → width 2√N → width 2√N → width 4 d_y → output d_y] ImageNet (N = 1M, d_y = 1k) memorized with widths 2k-2k-4k. Depth-width trade-off.
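
The ImageNet figure on this slide is plug-in arithmetic: with N = 10^6 examples and d_y = 1000 classes, the classification widths 2√N, 2√N, 4 d_y come out to 2000-2000-4000. A quick check (illustrative script):

    import math

    N, d_y = 10**6, 1000          # ImageNet: ~1M examples, 1k classes
    widths = (2 * math.isqrt(N), 2 * math.isqrt(N), 4 * d_y)
    print(widths)                                  # (2000, 2000, 4000), i.e. "2k-2k-4k"
    print(sum(widths), "hidden neurons for", N, "examples")   # 8000 vs 1,000,000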

  11. 2 hidden layers: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y]

  12. 2 hidden layers: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y] L hidden layers: [Diagram: input d_x → L hidden layers, each of width ≈ 8√(N d_y / L) → output d_y]

  13. 2 hidden layers: [Diagram: input d_x → width 2√(N d_y) → width 2√(N d_y) → output d_y] L hidden layers: [Diagram: input d_x → L hidden layers, each of width ≈ 8√(N d_y / L) → output d_y] A network with W parameters can memorize if W = Ω(N).
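
A sketch of the depth-width trade-off depicted above, assuming each of the L hidden layers has width about 8√(N d_y / L) (constants illustrative): the count W of hidden-layer weights then stays on the order of N d_y regardless of depth, matching the claim that W = Ω(N) parameters suffice.

    import math

    def hidden_param_count(N: int, d_y: int, L: int) -> int:
        # Weights between consecutive hidden layers, for L hidden layers
        # of width ~8*sqrt(N*d_y/L).
        w = math.ceil(8 * math.sqrt(N * d_y / L))
        return (L - 1) * w * w

    N, d_y = 10_000, 1
    for L in (2, 4, 8, 16):
        print(L, hidden_param_count(N, d_y, L))   # W stays Theta(N*d_y) as L varies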

  14. Given a network, we define memorization capacity C as C = max{ N ∣ the network can memorize arbitrary N data points with d_y = 1 }.

  15. Given a network, we define memorization capacity C as C = max{ N ∣ the network can memorize arbitrary N data points with d_y = 1 }. Θ(√N) neurons necessary and sufficient for 2-hidden-layer ⟹ C = Θ(W).

  16. Given a network, we define memorization capacity C as C = max{ N ∣ the network can memorize arbitrary N data points with d_y = 1 }. Θ(√N) neurons necessary and sufficient for 2-hidden-layer ⟹ C = Θ(W). [Tight]

  17. Given a network, we define memorization capacity C as C = max{ N ∣ the network can memorize arbitrary N data points with d_y = 1 }. Θ(√N) neurons necessary and sufficient for 2-hidden-layer ⟹ C = Θ(W). [Tight] W = Ω(N) sufficient for L-hidden-layer ⟹ C = Ω(W). C ≤ VCdim = O(W L log W).

  18. Given a network, we define memorization capacity C as C = max{ N ∣ the network can memorize arbitrary N data points with d_y = 1 }. Θ(√N) neurons necessary and sufficient for 2-hidden-layer ⟹ C = Θ(W). [Tight] W = Ω(N) sufficient for L-hidden-layer ⟹ C = Ω(W). C ≤ VCdim = O(W L log W). [Nearly tight]
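
To see why this is only "nearly" tight: the construction gives C = Ω(W) while the VC-dimension bound gives C = O(W L log W), so for fixed depth the two sides differ by just the L log W factor. A toy comparison (constants are placeholders, purely illustrative):

    import math

    def capacity_lower(W: int) -> int:
        return W                       # construction: C = Omega(W)

    def capacity_upper(W: int, L: int) -> float:
        return W * L * math.log(W)     # VC dimension: C = O(W * L * log W)

    for W in (10**4, 10**6):
        lo, hi = capacity_lower(W), capacity_upper(W, L=3)
        print(W, lo, round(hi), round(hi / lo, 1))   # multiplicative gap: L*log(W)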

  19. Other results: a tighter sufficient condition for memorization in residual networks; an SGD trajectory analysis near a memorizing global minimum. Poster #233, Wed Dec 11, 5:00-7:00 PM @ East Exhibition Hall B + C.
