
Neural Architecture Search: CS 4803 / 7643 Deep Learning, Erik Wijmans, 10/29/2020


  1. Neural Architecture Search CS 4803 / 7643 Deep Learning Erik Wijmans, 10/29/2020

  2. Background

  3-4. Background: $\min_{\theta} \mathbb{E}_{(x,y) \sim \mathcal{D}} \left[ \mathcal{L}(f(x; \theta), y) \right]$

  5. Background

  6-7. Background: $\min_{f \in \mathcal{F}} \min_{\theta} \mathbb{E}_{(x,y) \sim \mathcal{D}} \left[ \mathcal{L}(f(x; \theta), y) \right]$, where $\mathcal{F}$ is the set of networks
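The nested objective on slides 6-7 can be illustrated with a very small sketch. Everything below is an assumption made for illustration: the family F has just two hypothetical members (a constant and an affine regressor), the inner minimization over theta is a closed-form least-squares fit on training data, and the expectation over D is approximated by a mean over a few held-out samples.

```python
# Toy instance of  min_{f in F} min_theta E_{(x,y)~D}[ L(f(x; theta), y) ].
# F is a hypothetical two-member family of 1-D regressors; each member has a
# closed-form least-squares fit, which plays the role of the inner min.

def fit_constant(xs, ys):
    """theta = (c,) minimizing squared error of f(x) = c."""
    return (sum(ys) / len(ys),)

def fit_affine(xs, ys):
    """theta = (a, b) minimizing squared error of f(x) = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return (a, mean_y - a * mean_x)

# Each entry of F pairs a fitting routine with a prediction function.
FAMILY = {
    "constant": (fit_constant, lambda x, theta: theta[0]),
    "affine": (fit_affine, lambda x, theta: theta[0] * x + theta[1]),
}

def mean_squared_error(predict, theta, xs, ys):
    return sum((predict(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def search(train, heldout):
    """Outer min over f in F; the held-out mean stands in for the expectation."""
    results = {}
    for name, (fit, predict) in FAMILY.items():
        theta = fit(*train)  # inner min over theta
        results[name] = mean_squared_error(predict, theta, *heldout)
    best = min(results, key=results.get)
    return best, results

train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # samples of y = 2x + 1
heldout = ([4.0, 5.0], [9.0, 11.0])
best_name, losses = search(train, heldout)
print(best_name, losses)
```

With real networks the inner fit is a full training run, which is why the outer search is the expensive part.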

  8. Neural Architecture Search

  9-16. Neural Architecture Search: High Level Overview
     [Diagram, built up over these slides. Labels: Search Space, Search Method, Proposed Architecture, Evaluation Method, Best Model]
     Slide 11 shows the objective $\min_{f \in \mathcal{F}} \min_{\theta} \mathbb{E}_{(x,y) \sim \mathcal{D}} \left[ \mathcal{L}(f(x; \theta), y) \right]$ next to the Search Space label, with $\mathcal{F}$ annotated as the set of networks.
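The overview diagram can be read as a loop: the search method proposes an architecture from the search space, the evaluation method scores it, and the best model seen so far is kept. The sketch below is one possible rendering under loud assumptions: the search space is a hypothetical dictionary of choices, the search method is plain random search (not a method from the slides), and `evaluate` is a made-up scoring function standing in for training and held-out evaluation.

```python
import random

# Hypothetical search space: each key is a design decision with a few options.
SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64], "kernel": [3, 5]}

def propose(rng):
    """Search method (here: random search) picks one option per decision."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for 'train the candidate and measure held-out performance'.

    The formula is invented so the example runs without any training.
    """
    return (-abs(arch["depth"] - 4)
            - abs(arch["width"] - 32) / 16
            - abs(arch["kernel"] - 3) / 2)

def run_search(num_trials, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = propose(rng)        # Search Method -> Proposed Architecture
        score = evaluate(arch)     # Evaluation Method
        if score > best_score:     # keep the Best Model so far
            best_arch, best_score = arch, score
    return best_arch, best_score

arch, score = run_search(num_trials=50)
print(arch, score)
```

The RL and gradient-based methods later in the deck differ mainly in how `propose` uses the scores of earlier candidates.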

  17-19. Neural Architecture Search: Evaluation Method
     • Generally, the evaluation signal is performance on held-out data.
     • Evaluation is typically done by (partially) training the network and then measuring its performance on held-out data.
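As a rough sketch of "partially train, then score on held-out data": below, a candidate is a polynomial degree (a hypothetical stand-in for an architecture), training is a fixed budget of gradient-descent steps rather than training to convergence, and the returned score is held-out mean squared error. The data, step budget, and learning rate are all illustrative assumptions.

```python
def features(x, degree):
    """Polynomial features [1, x, x^2, ...] up to the given degree."""
    return [x ** k for k in range(degree + 1)]

def predict(theta, x):
    return sum(t * p for t, p in zip(theta, features(x, len(theta) - 1)))

def evaluate(degree, train, heldout, steps=200, lr=0.05):
    """Train for a fixed step budget, then return held-out mean squared error."""
    theta = [0.0] * (degree + 1)
    for _ in range(steps):  # limited budget: training may stop before convergence
        grad = [0.0] * (degree + 1)
        for x, y in train:
            err = predict(theta, x) - y
            for k, p in enumerate(features(x, degree)):
                grad[k] += 2.0 * err * p / len(train)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return sum((predict(theta, x) - y) ** 2 for x, y in heldout) / len(heldout)

# Samples of y = x^2; the held-out points are not used for training.
train = [(x, x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
heldout = [(x, x * x) for x in (-0.75, 0.25, 0.75)]
scores = {degree: evaluate(degree, train, heldout) for degree in (1, 2)}
print(scores)
```

The quadratic candidate scores better than the linear one even though it has not fully converged, which is the bet partial training makes: early held-out performance is assumed to rank candidates in roughly the right order.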

  20-21. Neural Architecture Search: High Level Overview
     [Diagram labels: Search Space, Search Method, Proposed Architecture, Evaluation Method]

  22. Search via Reinforcement Learning

  23-25. Search via Reinforcement Learning: NAS-RL
     • Motivated by the observation that a DNN architecture can be specified by a variable-length string (i.e., a breadth-first traversal of its DAG).
     • Use reinforcement learning to train an RNN that builds the network.
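To make the "architecture as a string" observation concrete, here is a minimal sketch that serializes a small DAG of layers by breadth-first traversal. The DAG, node names, and operation names are hypothetical, and the encoding is deliberately simplified: it emits only operation names, whereas a usable encoding would also need tokens describing connectivity.

```python
from collections import deque

def encode_bfs(dag, root):
    """Serialize a DAG of layers into a token list by breadth-first traversal.

    `dag` maps node name -> (operation name, list of child node names).
    """
    tokens, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        op, children = dag[node]
        tokens.append(op)
        for child in children:
            if child not in seen:  # visit each node once, even with two parents
                seen.add(child)
                queue.append(child)
    return tokens

# Hypothetical network: two parallel convolutions merged by an add node.
dag = {
    "in": ("input", ["a", "b"]),
    "a": ("conv3x3", ["c"]),
    "b": ("conv5x5", ["c"]),
    "c": ("add", ["out"]),
    "out": ("softmax", []),
}
tokens = encode_bfs(dag, "in")
print(" ".join(tokens))  # input conv3x3 conv5x5 add softmax
```

Because deeper networks simply yield longer token lists, a sequence model such as an RNN is a natural fit for generating them.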

  26-27. Search via Reinforcement Learning: NAS-RL
     [Diagram labels: Input, Op 1, Op 2, Op N, Softmax]

  28-30. Search via Reinforcement Learning: NAS-RL
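The idea of training a controller with reinforcement learning can be sketched with a policy-gradient (REINFORCE) update. This is a simplified stand-in, not the NAS-RL implementation: the controller here is a table of per-layer logits instead of an RNN, the operation vocabulary is hypothetical, and `reward` is an invented function that replaces "train the child network and read off validation accuracy".

```python
import math
import random

OPS = ["conv3x3", "conv5x5", "maxpool3x3"]  # hypothetical op vocabulary
NUM_LAYERS = 3

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture(logits, rng):
    """Sample one op index per layer from the controller's distributions."""
    return [rng.choices(range(len(OPS)), weights=softmax(row))[0] for row in logits]

def reward(arch):
    # Invented stand-in for the validation accuracy of the trained child network.
    return sum(1 for op in arch if OPS[op] == "conv3x3") / len(arch)

def reinforce_update(logits, arch, advantage, lr):
    """One REINFORCE step: logits += lr * advantage * grad log pi(arch)."""
    for layer, choice in enumerate(arch):
        probs = softmax(logits[layer])
        for k in range(len(OPS)):
            grad_log_prob = (1.0 if k == choice else 0.0) - probs[k]
            logits[layer][k] += lr * advantage * grad_log_prob

def train_controller(steps=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(OPS) for _ in range(NUM_LAYERS)]
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        arch = sample_architecture(logits, rng)
        r = reward(arch)
        reinforce_update(logits, arch, r - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * r
    return logits

trained_logits = train_controller()
print([[round(p, 2) for p in softmax(row)] for row in trained_logits])
```

Each call to `reward` corresponds to training one candidate network, so the number of controller updates translates directly into compute.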

  31. Search via Reinforcement Learning: NAS-RL
     • Performance is on par with other CNNs of the time.

  32-33. Search via Reinforcement Learning: NAS-RL
     • This is a very general method.
     • The cost of that generality is compute: the search used 800 GPUs (for an unspecified amount of time) and trained more than 12,000 candidate architectures.

  34-39. Search via Reinforcement Learning: NASNet
     • Instead of searching over entire networks, limit the search space to “blocks”.
     • This is similar to “Human Neural Architecture Search”.
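Restricting the search to blocks means the search only decides what goes inside a block, while the overall network is a fixed stacking of copies of it. The sketch below shows such a stacking under an assumed layout (runs of normal cells separated by a reduction cell); the function name, the layout rule, and the parameter values are illustrative assumptions rather than details taken from the slides.

```python
def stack_cells(num_stages, normal_per_stage):
    """Return the cell layout of a network built from two searched cell types.

    Assumed layout: each stage is a run of normal cells, and consecutive
    stages are separated by one reduction cell.
    """
    layout = []
    for stage in range(num_stages):
        layout.extend(["normal"] * normal_per_stage)
        if stage < num_stages - 1:
            layout.append("reduction")
    return layout

layout = stack_cells(num_stages=3, normal_per_stage=2)
print(layout)
```

Making the network deeper or shallower then only changes `normal_per_stage`, with no need to rerun the search.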

  40. Search via Reinforcement Learning: NASNet
     [Figure: Normal Cell. Labels: inputs $h_{i-1}$ and $h_i$, output $h_{i+1}$, one concat node, five add nodes. Operations, in figure order: sep 3x3, identity, sep 3x3, sep 5x5, avg 3x3, identity, avg 3x3, avg 3x3, sep 5x5, sep 3x3]

  41. Search via Reinforcement Learning: NASNet
     [Figure: Reduction Cell. Labels: inputs $h_{i-1}$ and $h_i$, output $h_{i+1}$, one concat node, five add nodes. Operations, in figure order: max 3x3, sep 3x3, avg 3x3, identity (upper row); sep 7x7, sep 5x5, max 3x3, sep 7x7, avg 3x3, sep 5x5 (lower row)]
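The cell figures share one pattern: pairs of operations are combined by add nodes, and the results are concatenated into the next hidden state. A minimal sketch of that pattern follows, working on 1-D lists of numbers. The wiring in `CELL` is hypothetical (the transcript does not preserve which input feeds which operation), and only identity and a 1-D average pool are implemented because separable convolutions would need learned weights.

```python
def identity(xs):
    return list(xs)

def avg_pool3(xs):
    """1-D average pooling, window 3, stride 1; edges use the available neighbours."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

OPS = {"identity": identity, "avg3": avg_pool3}

# Hypothetical two-block cell: each block is (op_a, input_a, op_b, input_b).
CELL = [
    ("identity", "h_i", "avg3", "h_prev"),
    ("avg3", "h_i", "avg3", "h_i"),
]

def run_cell(cell, h_prev, h_i):
    states = {"h_prev": h_prev, "h_i": h_i}
    block_outputs = []
    for op_a, in_a, op_b, in_b in cell:
        a = OPS[op_a](states[in_a])
        b = OPS[op_b](states[in_b])
        block_outputs.append([u + v for u, v in zip(a, b)])  # add node
    return [v for block in block_outputs for v in block]     # concat node

h_next = run_cell(CELL, h_prev=[0.0, 0.0, 0.0], h_i=[1.0, 2.0, 3.0])
print(h_next)  # [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
```

Under this representation, searching for a cell amounts to choosing the entries of `CELL`.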

  42. Search via Reinforcement Learning: NASNet
     • Performance is on par with other CNNs of the time, but with fewer parameters and less compute.

  43. Application: Efficient Neural Networks (MnasNet)

  44-47. Application: Efficient Neural Networks (MnasNet)
     • One benefit of search via RL is that validation performance need not be the only metric.
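Since an RL reward is just a number and need not be differentiable, other measurements can be folded into it. The sketch below shows one assumed form that trades accuracy against measured latency with a soft target; the function name, the 80 ms target, and the exponent value are illustrative assumptions, not values taken from the slides.

```python
def multi_objective_reward(accuracy, latency_ms, target_ms=80.0, w=-0.07):
    """Accuracy scaled by a soft latency penalty.

    With a negative exponent w, models slower than the target are scaled
    down and models faster than the target are scaled up slightly.
    """
    return accuracy * (latency_ms / target_ms) ** w

on_target = multi_objective_reward(0.75, latency_ms=80.0)
too_slow = multi_objective_reward(0.75, latency_ms=160.0)
faster = multi_objective_reward(0.75, latency_ms=40.0)
print(on_target, too_slow, faster)
```

A reward of this shape can replace the validation-only reward in the controller update without changing anything else in the search loop.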

  48-52. Search via Gradient Optimization: Differentiable Architecture Search (DARTS)
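Only the title of these slides survives, so the following is a sketch of the general idea behind differentiable search as it is commonly described, not a reconstruction of the slides: the discrete choice among operations is relaxed into a softmax-weighted mixture, the mixture weights (called `alpha` here) are learned by gradient descent, and the final architecture keeps the highest-weighted operation. The candidate operations and data are toy assumptions, and the full method also alternates these updates with updates to the network weights, which this sketch omits.

```python
import math

# Toy candidate operations on scalars (hypothetical stand-ins for layers).
CANDIDATE_OPS = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
}
NAMES = list(CANDIDATE_OPS)

def softmax(alpha):
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(alpha, x):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    weights = softmax(alpha)
    return sum(w * CANDIDATE_OPS[name](x) for w, name in zip(weights, NAMES))

def alpha_gradient(alpha, data):
    """Analytic gradient of the mean squared error with respect to alpha."""
    weights = softmax(alpha)
    grad = [0.0] * len(alpha)
    for x, y in data:
        out = mixed_op(alpha, x)
        for j, name in enumerate(NAMES):
            # d(out)/d(alpha_j) = w_j * (o_j(x) - out) for a softmax mixture
            d_out = weights[j] * (CANDIDATE_OPS[name](x) - out)
            grad[j] += 2.0 * (out - y) * d_out / len(data)
    return grad

def search(data, steps=100, lr=0.5):
    alpha = [0.0] * len(NAMES)
    for _ in range(steps):
        grad = alpha_gradient(alpha, data)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha

data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]  # target function is y = 2x
alpha = search(data)
chosen = NAMES[max(range(len(alpha)), key=alpha.__getitem__)]  # discretize
print(chosen, [round(w, 3) for w in softmax(alpha)])
```

Compared with the RL approach, no candidate is trained separately here: all candidate operations share one model, and the search reduces to ordinary gradient descent.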
