Device Placement Optimization with Reinforcement Learning
Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean
What is device placement?
● Consider a TensorFlow computational graph G, which consists of M operations {o_1, o_2, …, o_M}, and a list of D available devices.
● A placement P = {p_1, p_2, …, p_M} is an assignment of each operation o_i to a device p_i (see the sketch below).
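To make the definition concrete, here is a minimal sketch of a placement as a plain mapping from operations to devices; the op and device names are illustrative only, not taken from the paper:

```python
# Illustrative only: the operations o_1..o_M of a small graph G
ops = ["embedding", "lstm_layer_1", "lstm_layer_2", "softmax"]

# The D available devices
devices = ["/cpu:0", "/gpu:0", "/gpu:1"]

# One possible placement P = {p_1, ..., p_M}: operation o_i is assigned to device p_i
placement = {
    "embedding":    "/cpu:0",
    "lstm_layer_1": "/gpu:0",
    "lstm_layer_2": "/gpu:1",
    "softmax":      "/gpu:1",
}
```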
Why device placement?
● Trend toward many-device training, bigger models, and larger batch sizes
● Growth in the size and computational requirements of training and inference
Typical approaches
● Use a heterogeneous distributed environment with a mixture of many CPUs and GPUs
● Often based on greedy heuristics
● Require a deep understanding of device behavior: bandwidth, latency
● Are not flexible enough and do not generalize well
ML for device placement
● ML is increasingly replacing rule-based heuristics
● RL can be applied to device placement
  – Effective search across large state and action spaces to find an optimal solution
  – Automatic learning from the underlying environment, based only on a reward function
RL-based device placement
[Diagram: the RL policy takes the neural model and the available devices (CPU, GPU) as input and outputs an assignment of the model's ops to devices; the resulting runtime is evaluated and fed back to the policy.]
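The feedback loop in the diagram can be sketched as follows. This is a hedged sketch only: `sample_placement`, `measure_runtime`, and `update` are hypothetical names chosen to illustrate the cycle, not the authors' API.

```python
# Hedged sketch of the RL feedback loop from the diagram above.
def placement_search(policy, graph, devices, measure_runtime, num_steps=1000):
    best_placement, best_runtime = None, float("inf")
    for _ in range(num_steps):
        placement = policy.sample_placement(graph, devices)  # assign each op to a device
        runtime = measure_runtime(graph, placement)          # execute the model under this placement
        policy.update(placement, reward=-runtime)            # shorter runtime -> higher reward
        if runtime < best_runtime:
            best_placement, best_runtime = placement, runtime
    return best_placement
```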
Problem formulation
Minimize the expected runtime J(θ) = E_{P ~ π(P | G; θ)}[ r(P) ], where
● J(θ): expected runtime
● θ: trainable parameters of the policy
● r(P): runtime of placement P
● π: policy
● P: output placements
Training with REINFORCE
● Learn the network parameters with the Adam optimizer, using policy gradients computed via the REINFORCE equation:
  ∇_θ J(θ) = E_{P ~ π(P | G; θ)}[ r(P) ∇_θ log π(P | G; θ) ]
● Use K placement samples to estimate the policy gradient, and a baseline term B to reduce variance:
  ∇_θ J(θ) ≈ (1/K) Σ_{i=1}^{K} ( r(P_i) − B ) ∇_θ log π(P_i | G; θ)
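A small numerical sketch of the sampled-gradient estimator above, assuming the per-sample score-function gradients ∇_θ log π(P_i | G; θ) are already available; the function name and NumPy usage are illustrative, not the paper's TensorFlow implementation.

```python
import numpy as np

def reinforce_gradient(runtimes, grad_log_p, baseline):
    """Estimate the gradient of expected runtime from K placement samples.

    runtimes:   length-K list of measured runtimes r(P_1), ..., r(P_K)
    grad_log_p: array of shape (K, num_params); row i is grad_theta log pi(P_i | G; theta)
    baseline:   scalar B, e.g. a moving average of previous runtimes
    """
    advantages = np.asarray(runtimes) - baseline            # (r(P_i) - B) reduces variance
    return (advantages[:, None] * grad_log_p).mean(axis=0)  # average over the K samples

# Illustrative usage with made-up numbers (K = 4 samples, 3 parameters):
K, num_params = 4, 3
grad_estimate = reinforce_gradient(
    runtimes=[1.8, 2.1, 1.5, 2.4],
    grad_log_p=np.random.randn(K, num_params),
    baseline=1.95,
)
# An optimizer such as Adam would then step theta against grad_estimate
# to reduce the expected runtime.
```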
Model architecture
Challenges
● Vanishing gradient issue
● Exploding gradient issue
● Large memory footprints
Distributed training
Experiments
● Recurrent Neural Language Model (RNNLM)
● Neural Machine Translation with attention mechanism (NMT)
● Inception-V3
Learned placement on NMT
NMT end-to-end runtime
Learned placement on Inception-V3
Inception-V3 end-to-end runtime
Profiling on NMT
Profiling on Inception-V3
Profiling on Inception-V3
Running times (in seconds)
Summary
● Proposes an RL model to optimize device placements for neural networks
● Uses policy gradients to learn the parameters
● The policy finds non-trivial assignments of operations to devices that outperform heuristic approaches
● Profiling of the results shows the policy learns implicit trade-offs between computation and communication in hardware