Neural Architecture Optimization
Presenter: 赵鉴
CONTENTS
1. AutoML
2. NAS
3. NAO
4. Experiments
5. Conclusion
01 AutoML: Automated Machine Learning
Typical Machine Learning
• Fixed data order
• Fixed model space
• Fixed loss function
Automated Machine Learning
• Automatic data selection/processing
• Automatic model selection and training
• Automatic hyperparameter tuning
02 NAS: Neural Architecture Search
The Architecture of a Neural Network is Crucial to its Performance
ImageNet-winning neural architectures: AlexNet (2012), ZFNet (2013), Inception (2014), ResNet (2015)
NAS: Neural Architecture Search
• Given: a dataset (i.e., CIFAR-10, CIFAR-100, PTB, WikiText-2) and a target task (i.e., image classification, language modeling)
• Output: a network architecture that fits the given dataset well on the target task
• Goal: automatic, with few human efforts; alleviate the pain of human architecture engineering
General Framework
A Controller generates candidate architectures; each generated Child Network is trained, and its validation performance is fed back to the Controller.
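To make the loop concrete, here is a toy Python instance of it. The random-sampling "controller" and the stand-in scoring function are illustrative placeholders, not code from any NAS system.

```python
import random

# Toy instance of the controller-child loop. The "controller" here is plain
# random search and the scoring function is a stand-in; a real system would
# train the child network and return its validation performance.

OPS = ["conv 1x1", "conv 3x3", "max pooling", "avg pooling"]

def sample_architecture(num_slots=6):
    """Controller step: propose a candidate architecture."""
    return [(random.choice([1, 2]), random.choice(OPS)) for _ in range(num_slots)]

def train_and_evaluate(arch):
    """Stand-in for training the child network and measuring valid performance."""
    return random.random()

best_arch, best_perf = None, -1.0
for _ in range(100):
    arch = sample_architecture()        # Controller -> generate architecture
    perf = train_and_evaluate(arch)     # Child network -> valid performance
    if perf > best_perf:                # feedback closes the loop
        best_arch, best_perf = arch, perf
```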
Typical Search Methods/Algorithms
• Reinforcement Learning
  • Take each architecture choice (i.e., sub-architecture) as an action
  • Take the validation performance as the reward
  • Use policy gradient to search for the best action
  • Examples: NAS-RL (Google, 2017), NASNet (Google, 2017), ENAS (CMU & Google, 2018), …
• Evolutionary Computing
  • Treat changing the architecture as mutation and selection
  • Take the validation performance as fitness
  • Evolve the architectures
  • Examples: AmoebaNet, … (see the toy sketch below)
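As a contrast to the loop above, a toy sketch of the evolutionary variant: mutate one operation token, keep the fitter half, repeat. The fitness function is again a stand-in for training a child and measuring validation performance.

```python
import random

OPS = ["conv 1x1", "conv 3x3", "max pooling", "avg pooling"]

def mutate(arch):
    """Mutation: randomly replace one operation token."""
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

def fitness(arch):
    """Placeholder for the child network's validation performance."""
    return random.random()

population = [[random.choice(OPS) for _ in range(6)] for _ in range(10)]
for _ in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                        # selection
    population = survivors + [mutate(a) for a in survivors]
best = max(population, key=fitness)
```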
Results of Previous NAS Works
• In terms of pushing SOTA results: e.g., on ImageNet
• In terms of building products with AutoML: Microsoft, Google, …; startups focusing on AutoML
03 Neural Architecture Optimization
Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu. NIPS 2018
Are Previous NAS Works Perfect Enough?
• Why search in discrete space? It is exponentially large and thus hard to search.
• How about optimizing in continuous space? It is compact and easy to optimize, and it brings gradient(-based optimization) back!
Basic Methods
• Use a string to represent an architecture
• Search based on data pairs (y, z), where y is the architecture string and z is its validation performance
• Example: "node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1"
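The flat string maps naturally to (input node, operation) pairs. A small helper of my own (names are illustrative) that parses the slide's example string:

```python
# Parse the flat architecture string into (input node, operation) pairs.
# Tokens alternate: "node <i>", "<operation>", "node <j>", "<operation>", ...

def parse_arch(arch_string):
    tokens = [t.strip() for t in arch_string.split(",")]
    pairs = []
    for node_tok, op_tok in zip(tokens[0::2], tokens[1::2]):
        node_id = int(node_tok.split()[1])
        pairs.append((node_id, op_tok))
    return pairs

y = ("node 2, conv 1x1, node 1, max pooling, node 1, max pooling, "
     "node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1")
print(parse_arch(y))
# [(2, 'conv 1x1'), (1, 'max pooling'), (1, 'max pooling'),
#  (1, 'conv 3x3'), (2, 'conv 3x3'), (2, 'conv 1x1')]
```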
Neural Architecture Optimization (NAO)
01 Encoder (LSTM)
• Encodes the discrete string tokens y into an embedding vector f(y) in continuous space
02 Performance Predictor (FCN)
• Maps f(y) to its validation performance
• New embeddings are obtained by moving in the direction of the predictor's gradient
03 Decoder (LSTM)
• Decodes the embedding vector f(y′) back to discrete tokens y′
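A minimal PyTorch sketch of the three components, assuming small arbitrary sizes; the official implementation at https://github.com/renqianluo/NAO differs in detail.

```python
import torch
import torch.nn as nn

class NAO(nn.Module):
    """Sketch of NAO's encoder, performance predictor, and decoder."""

    def __init__(self, vocab_size, emb_dim=32, hidden=96):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)   # 01 Encoder
        self.predictor = nn.Sequential(                             # 02 Predictor (FCN)
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)    # 03 Decoder
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        # Encode discrete tokens y into a continuous embedding f(y).
        enc_out, _ = self.encoder(self.embed(tokens))
        f_y = enc_out[:, -1, :]                       # last hidden state as f(y)
        # Predict validation performance g(f(y)).
        pred_perf = self.predictor(f_y).squeeze(-1)
        # Decode f(y) back to token logits for reconstruction.
        dec_in = f_y.unsqueeze(1).expand(-1, tokens.size(1), -1)
        dec_out, _ = self.decoder(dec_in)
        logits = self.out(dec_out)
        return f_y, pred_perf, logits
```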
Gradient-Based Search in Continuous Space
Training & Inference
• Train the Encoder-Predictor-Decoder on an architecture pool of hundreds of (y, z) pairs
• Data augmentation: symmetric architectures, obtained by swapping two branches
  • e.g., "node1 conv 1x1 node2 conv 3x3" -> "node2 conv 3x3 node1 conv 1x1"
• Encoder maps architecture y into f(y)
• Performance-Predictor loss: squared error
  • L_pp = Σ_{y∈Y} (z_y − g(f(y)))²
• Decoder loss: reconstruction (negative log-likelihood) loss
  • L_rec = Σ_{y∈Y} −log P_D(y | f(y))
• Jointly train the three components: L = λ·L_pp + (1 − λ)·L_rec
• Generate new architectures:
  • Move the embedding along the predictor's gradient with step size η: f(y′) = f(y) + η·∇_{f(y)} g(f(y))
  • Decoder maps f(y′) back into y′
• Iterate: train and evaluate the newly generated architectures, then repeat the steps above
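A sketch of the joint objective and the embedding-space gradient step, reusing the NAO class from the previous snippet; λ, η, and the optimizer settings here are illustrative choices, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

def train_nao(model, data, lam=0.8, epochs=50):
    """Jointly train encoder, predictor, and decoder: L = lam*L_pp + (1-lam)*L_rec."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for tokens, z in data:               # (arch tokens y, valid performance z)
            f_y, pred, logits = model(tokens)
            loss_pp = F.mse_loss(pred, z)    # L_pp: squared error
            loss_rec = F.cross_entropy(      # L_rec: -log P_D(y | f(y))
                logits.flatten(0, 1), tokens.flatten())
            loss = lam * loss_pp + (1 - lam) * loss_rec
            opt.zero_grad()
            loss.backward()
            opt.step()

def move_embedding(model, tokens, eta=10.0):
    """Gradient step in continuous space: f(y') = f(y) + eta * grad g(f(y))."""
    f_y, _, _ = model(tokens)
    f_y = f_y.detach().requires_grad_(True)
    model.predictor(f_y).sum().backward()    # gradient of predicted performance
    return f_y + eta * f_y.grad              # decode f(y') back to y' afterwards
```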
Weight Sharing
[Cell diagram: inputs h[i−1] and h[i]; each node selects among shared candidate operations (conv 1x1, conv 3x3, max pooling, avg pooling); node outputs are combined by add, and the cell output by concat. The same operation weights are reused by every sampled child architecture.]
Architecture 1: "node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1"
Architecture 2: "node 1, conv 3x3, node 2, max pooling, node 2, conv 1x1, node 2, conv 1x1, node 1, conv 3x3, node 1, max pooling"
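A toy illustration of the sharing mechanism: every child architecture that applies a given operation at a given node reuses the same module, so weights trained under one child carry over to the next. The key scheme and the flat concat are simplifications of the actual cell.

```python
import torch
import torch.nn as nn

class SharedOps(nn.Module):
    """One pool of operation modules, reused by all child architectures."""

    def __init__(self, channels, num_nodes=2):
        super().__init__()
        self.ops = nn.ModuleDict()
        for node in range(1, num_nodes + 1):
            self.ops[f"{node}_conv1x1"] = nn.Conv2d(channels, channels, 1)
            self.ops[f"{node}_conv3x3"] = nn.Conv2d(channels, channels, 3, padding=1)
            self.ops[f"{node}_maxpool"] = nn.MaxPool2d(3, stride=1, padding=1)
            self.ops[f"{node}_avgpool"] = nn.AvgPool2d(3, stride=1, padding=1)

    def forward(self, x, arch):
        # arch: list of (node_id, op_key) pairs, e.g. [(2, "conv1x1"), (1, "maxpool")].
        outs = [self.ops[f"{node}_{op}"](x) for node, op in arch]
        return torch.cat(outs, dim=1)        # cell output: concat of node outputs

cell = SharedOps(channels=16)
x = torch.randn(1, 16, 8, 8)
y1 = cell(x, [(2, "conv1x1"), (1, "maxpool")])   # Architecture 1 (prefix)
y2 = cell(x, [(1, "conv3x3"), (2, "maxpool")])   # Architecture 2 reuses the same weights
```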
04 Experiments and Results
Tasks
• Image Classification: classify the images
  • CIFAR-10: 10 classes, 50000 images for training, 10000 images for testing
  • CIFAR-100: 100 classes, 50000 images for training, 10000 images for testing
• Language Modeling: model the probability distribution over sequences of words in natural language
  • PTB (Penn Tree Bank)
  • WT2 (WikiText-2)
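For reference, the CIFAR datasets can be loaded with standard torchvision; the normalization statistics below are the commonly used CIFAR-10 values, an assumption about the preprocessing rather than the paper's exact pipeline.

```python
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transform)
print(len(train_set), len(test_set))   # 50000 10000
```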
Results: CIFAR-10 (table not recovered)
Results: Transfer to CIFAR-100 (table not recovered)
Results: PTB (table not recovered)
Results: Transfer to WikiText-2 (table not recovered)
05 Conclusion
Conclusion
A new automatic architecture design algorithm:
• Encodes the discrete architecture description into a continuous embedding
• Performs the optimization within the continuous space
• Uses a gradient-based method rather than searching over discrete decisions
Project Links
• Paper: https://arxiv.org/abs/1808.07233
• Code: https://github.com/renqianluo/NAO
Thanks. Q&A