Neural Architecture Search
CS 4803 / 7643 Deep Learning
Erik Wijmans, 10/29/2020
Background

The standard supervised-learning objective fits the parameters θ of a fixed network f:

$\min_{\theta} \; \mathbb{E}_{(x,y) \sim \mathcal{D}}\big[ \mathcal{L}(f(x; \theta), y) \big]$

Neural architecture search also optimizes over the network itself, searching a set of networks $\mathcal{F}$:

$\min_{f \in \mathcal{F}} \; \min_{\theta} \; \mathbb{E}_{(x,y) \sim \mathcal{D}}\big[ \mathcal{L}(f(x; \theta), y) \big]$
Neural Architecture Search: High Level Overview

• Search Space: the set of networks $\mathcal{F}$ to search over
• Search Method: proposes an architecture from the search space
• Evaluation Method: scores the proposed architecture, and the score is fed back to the search method
• Best Model: the output of the loop
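The loop above can be sketched as a generic search procedure. `RandomSearch`, the toy search space, and the evaluation function below are hypothetical stand-ins, not any paper's implementation:

```python
import random

random.seed(0)

def nas_loop(search_space, search_method, evaluation_method, n_trials):
    """Generic NAS loop: the search method proposes architectures, the
    evaluation method scores them, and the score is fed back to guide
    the next proposal. Returns the best architecture found."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = search_method.propose(search_space)
        score = evaluation_method(arch)
        search_method.update(arch, score)  # feedback to the search method
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

# Hypothetical example: random search over network depth.
class RandomSearch:
    def propose(self, space):
        return random.choice(space)

    def update(self, arch, score):
        pass  # random search ignores the feedback

search_space = [{"depth": d} for d in (2, 4, 8, 16)]
evaluate = lambda arch: -abs(arch["depth"] - 8)  # pretend depth 8 is best
best, score = nas_loop(search_space, RandomSearch(), evaluate, n_trials=50)
```

Swapping `RandomSearch` for an RL controller or a gradient-based method changes only the `propose`/`update` pair; the overall loop stays the same.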
Neural Architecture Search: Evaluation Method

• Generally, this is performance on held-out data.
• Evaluation is typically done by (partially) training the network and evaluating its performance on held-out data.
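A minimal sketch of such a proxy evaluation, assuming a hypothetical model interface (`train_one_epoch`, `accuracy`); the toy model simply saturates toward a ceiling set by the architecture:

```python
def evaluate_architecture(build_model, train_data, val_data, n_epochs=5):
    """Proxy evaluation: train the candidate only partially (a few epochs,
    not to convergence) and return held-out performance as its score."""
    model = build_model()
    for _ in range(n_epochs):            # partial training
        model.train_one_epoch(train_data)
    return model.accuracy(val_data)       # score on held-out data

class ToyModel:
    """Hypothetical stand-in: accuracy improves each epoch, approaching a
    ceiling that represents the architecture's capacity."""
    def __init__(self, ceiling):
        self.ceiling, self.acc = ceiling, 0.0

    def train_one_epoch(self, data):
        self.acc += 0.5 * (self.ceiling - self.acc)  # move halfway to ceiling

    def accuracy(self, data):
        return self.acc

score = evaluate_architecture(lambda: ToyModel(ceiling=0.9), None, None)
```

The trade-off is that a partially trained score is only a noisy proxy for final performance, but it makes evaluating thousands of candidates feasible.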
Search via Reinforcement Learning: NAS-RL

• Motivated by the observation that a DNN architecture can be specified by a variable-length string (e.g., a breadth-first traversal of its DAG)
• Use reinforcement learning to train an RNN controller that generates this string, building the network
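A minimal sketch of the idea, reduced to a single architectural choice: a REINFORCE-trained controller (a plain logit table here; an RNN in the paper) whose reward function stands in for "train the child network and measure validation accuracy". All names and the reward are hypothetical:

```python
import math
import random

random.seed(0)
OPS = ["conv3x3", "conv5x5", "maxpool"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, c = random.random(), 0.0
    for i, p in enumerate(probs):
        c += p
        if r < c:
            return i
    return len(probs) - 1

def reward(op):
    # Hypothetical stand-in for child-network validation accuracy.
    return 1.0 if op == "conv3x3" else 0.0

logits, lr = [0.0, 0.0, 0.0], 0.5
for _ in range(200):
    probs = softmax(logits)
    a = sample(probs)
    R = reward(OPS[a])
    # REINFORCE: grad of log pi(a) is one_hot(a) - probs; ascend R * grad.
    for j in range(len(OPS)):
        logits[j] += lr * R * ((1.0 if j == a else 0.0) - probs[j])

best_op = OPS[max(range(len(OPS)), key=lambda j: logits[j])]
```

Because the reward (validation accuracy) is not differentiable with respect to the controller's choices, the policy-gradient estimator is what makes training possible.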
[Figure: the NAS-RL RNN controller, unrolled over the choices Input, Op 1, Op 2, ..., Op N, with a softmax output]
• Performance is on par with other CNNs of the time
• This is a very general method
• The cost of that generality is compute: NAS-RL used 800 GPUs (for an unspecified amount of time) and trained over 12,000 candidate architectures
Search via Reinforcement Learning: NASNet

• Instead, limit the search space with "blocks" (cells)
• This is similar to "Human Neural Architecture Search", i.e., how hand-designed CNNs are built from repeated modules
[Figure: NASNet Normal Cell: h_{i-1} and h_i feed pairs of ops (sep 3x3, identity, sep 5x5, avg 3x3, ...) whose outputs are combined by add; the block outputs are concatenated into h_{i+1}]
[Figure: NASNet Reduction Cell: same structure, with ops such as max 3x3, sep 7x7, sep 5x5, avg 3x3; the block outputs are concatenated into h_{i+1}]
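The cell structure in the figures above can be sketched as follows. The ops here are hypothetical scalar stand-ins (acting on lists of numbers), not real convolutions; only the wiring pattern (pairs of ops, added, then concatenated) is the point:

```python
# Stand-ins for the candidate ops; a real cell would use conv/pool layers.
def identity(x): return x
def sep3x3(x):   return [2.0 * v for v in x]  # stand-in for separable 3x3 conv
def avg3x3(x):   return [v / 2.0 for v in x]  # stand-in for 3x3 average pool

def cell(h_prev, h_prev2, blocks):
    """Each block applies one op to each of two chosen inputs and adds the
    results; the cell output concatenates all block outputs.
    blocks: list of (op_a, input_a, op_b, input_b); inputs index into states."""
    states = [h_prev2, h_prev]
    outputs = []
    for op_a, ia, op_b, ib in blocks:
        outputs.append([a + b for a, b in zip(op_a(states[ia]), op_b(states[ib]))])
    # "concat" along the channel dimension is list concatenation here
    return [v for out in outputs for v in out]

h = cell([1.0, 2.0], [3.0, 4.0],
         [(sep3x3, 1, identity, 0),   # sep3x3(h_prev) + identity(h_prev2)
          (avg3x3, 0, avg3x3, 1)])    # avg3x3(h_prev2) + avg3x3(h_prev)
```

The controller then only has to choose the ops and input indices for each block, which is a far smaller search space than an entire network graph.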
• Performance is on par with other CNNs of the time, but with fewer parameters and less compute
Application: Efficient Neural Networks (MnasNet)

• One benefit of search via RL is that validation performance need not be the only metric: MnasNet, for example, also rewards low measured on-device latency
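MnasNet folds measured latency into the RL reward, roughly accuracy scaled by a soft latency penalty of the form ACC · (LAT/T)^w. A sketch with hypothetical parameter values (the target latency and exponent below are illustrative, not the paper's exact settings):

```python
def mnas_reward(accuracy, latency_ms, target_ms=80.0, w=-0.07):
    """Multi-objective reward in the spirit of MnasNet: scale validation
    accuracy by a soft latency penalty. With w < 0, models slower than the
    target are penalized and faster ones are mildly rewarded."""
    return accuracy * (latency_ms / target_ms) ** w

# Same accuracy, different latency: the faster model earns a higher reward.
fast = mnas_reward(0.75, latency_ms=40.0)
slow = mnas_reward(0.75, latency_ms=160.0)
```

Because the RL reward is just a scalar, any measurable objective (latency, energy, memory) can be folded in this way, which is hard to do with purely accuracy-driven selection.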
Search via Gradient Optimization: Differentiable Architecture Search (DARTS)
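DARTS makes the choice of operation differentiable by replacing the discrete pick with a softmax(α)-weighted mixture of all candidate ops, so the architecture parameters α can be trained by gradient descent alongside the weights. A minimal sketch with scalar stand-in ops (hypothetical, not the paper's implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Candidate ops: hypothetical scalar stand-ins for conv / pool / etc.
OPS = [
    lambda x: x,        # identity
    lambda x: 2.0 * x,  # stand-in for a conv
    lambda x: 0.0,      # "zero" op (effectively drops the edge)
]

def mixed_op(x, alphas):
    """DARTS continuous relaxation: the edge outputs a softmax(alpha)-weighted
    sum of every candidate op, so the output is differentiable in alpha."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, OPS))

y = mixed_op(3.0, [0.0, 0.0, 0.0])  # equal alphas: (3 + 6 + 0) / 3
```

After training, the final discrete architecture is read off by keeping, on each edge, the op with the largest α.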