  1. Device Placement Optimization with Reinforcement Learning. Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

  2. What is device placement
     ● Consider a TensorFlow computational graph G, which consists of M operations {o_1, o_2, …, o_M}, and a list of D available devices.
     ● A placement P = {p_1, p_2, …, p_M} is an assignment of an operation o_i to a device p_i.
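In code, a placement is simply a map from operations to devices; the operation and device names below are illustrative, not from the paper:

```python
# M = 4 operations of some model, D = 3 available devices (names made up).
operations = ["embed", "lstm_fw", "lstm_bw", "softmax"]
devices = ["/cpu:0", "/gpu:0", "/gpu:1"]

# A placement P assigns each operation o_i to a device p_i.
placement = {
    "embed": "/cpu:0",
    "lstm_fw": "/gpu:0",
    "lstm_bw": "/gpu:1",
    "softmax": "/gpu:0",
}

# Every operation must be assigned, and only to an available device.
assert set(placement) == set(operations)
assert all(dev in devices for dev in placement.values())
```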

  3. Why device placement
     ● Trend toward many-device training, bigger models, larger batch sizes
     ● Growth in size and computational requirements of training and inference

  4. Typical approaches
     ● Use a heterogeneous distributed environment with a mixture of many CPUs and GPUs
     ● Often based on greedy heuristics
     ● Require deep understanding of devices: bandwidth, latency behavior
     ● Are not flexible enough and do not generalize well
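For contrast, a greedy heuristic of the kind this slide refers to might assign each op, largest first, to the currently least-loaded device. A toy sketch, with made-up op costs and device names:

```python
# Illustrative op costs (arbitrary units) and devices; not from the paper.
op_costs = {"embed": 2.0, "lstm": 8.0, "attention": 4.0, "softmax": 1.0}
devices = ["/gpu:0", "/gpu:1"]

load = {d: 0.0 for d in devices}
placement = {}
for op, cost in sorted(op_costs.items(), key=lambda kv: -kv[1]):
    # Greedily place each op (heaviest first) on the least-loaded device.
    dev = min(load, key=load.get)
    placement[op] = dev
    load[dev] += cost
```

Such heuristics balance a static cost model but, as the slide notes, ignore bandwidth and latency behavior and do not generalize across models or hardware.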

  5. ML for device placement
     ● ML is repeatedly replacing rule-based heuristics
     ● RL can be applied to device placement
       – Effective search across large state and action spaces to find optimal solutions
       – Automatic learning from the underlying environment, based only on a reward function

  6. RL-based device placement
     [Diagram: the neural model and the available devices (CPU, GPU) are the input to an RL policy; the policy outputs an assignment of the model's ops to devices, whose runtime is evaluated and fed back to the policy.]
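The feedback loop on this slide can be sketched as follows; the op list, device names, and toy runtime function are illustrative stand-ins (a real system executes the model under each placement and measures wall-clock time):

```python
import random

random.seed(0)

OPS = ["embed", "lstm", "attention", "softmax"]   # ops of the input model
DEVS = ["/cpu:0", "/gpu:0", "/gpu:1"]             # available devices

def sample_placement(probs):
    # The policy proposes a placement: one device drawn per operation.
    return {op: random.choices(DEVS, weights=probs[op])[0] for op in OPS}

def evaluate_runtime(placement):
    # Stand-in for executing the model under `placement` and timing it;
    # this toy cost just penalizes putting everything on one device.
    return 1.0 if len(set(placement.values())) > 1 else 2.0

# Start with a uniform policy, sample placements, and keep the fastest.
probs = {op: [1.0, 1.0, 1.0] for op in OPS}
candidates = [sample_placement(probs) for _ in range(20)]
best = min(candidates, key=evaluate_runtime)
```

In the actual method the measured runtime is the reward used to update the policy's parameters, rather than a simple best-of-K selection.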

  7. Problem formulation
     Minimize the expected runtime of the placements sampled from the policy:

         J(θ) = E_{P ∼ π(P|G; θ)}[ r(P) ]

     J(θ): expected runtime
     θ: trainable parameters of the policy
     r(P): runtime of placement P
     π(P|G; θ): policy
     P: output placements

  8. Training with REINFORCE
     ● Learn the network parameters with the Adam optimizer, using policy gradients computed via the REINFORCE equation:

           ∇_θ J(θ) = E_{P ∼ π(P|G; θ)}[ r(P) · ∇_θ log p(P|G; θ) ]

     ● Use K placement samples to estimate the policy gradient, and a baseline term B to reduce its variance:

           ∇_θ J(θ) ≈ (1/K) Σ_{k=1..K} (r(P_k) − B) · ∇_θ log p(P_k|G; θ)
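A minimal numerical sketch of this update, assuming a toy factorized softmax policy (one independent distribution over devices per op), plain gradient descent instead of Adam, and a stand-in runtime function; the paper uses a sequence-to-sequence policy network and measured runtimes:

```python
import numpy as np

rng = np.random.default_rng(0)

M, D = 4, 3                      # M operations, D devices
theta = np.zeros((M, D))         # policy logits (trainable parameters)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample(theta):
    # Sample a placement P and the gradient of log p(P | theta) w.r.t. logits.
    probs = softmax(theta)
    P = [rng.choice(D, p=probs[i]) for i in range(M)]
    grad = -probs.copy()
    for i, d in enumerate(P):
        grad[i, d] += 1.0        # d(log softmax)/d(logits) = one_hot - probs
    return P, grad

def runtime(P):
    # Stand-in reward: toy cost that favors spreading ops across devices.
    return 2.0 - len(set(P)) / D

K, lr = 10, 0.5
for _ in range(100):
    batch = [sample(theta) for _ in range(K)]
    r = np.array([runtime(P) for P, _ in batch])
    B = r.mean()                 # baseline term to reduce variance
    # REINFORCE estimate of grad J(theta) from K samples:
    g = sum((ri - B) * gi for (_, gi), ri in zip(batch, r)) / K
    theta -= lr * g              # descend: we minimize expected runtime
```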

  9. Model architecture

  10. Challenges
      ● Vanishing/exploding gradient issues
      ● Large memory footprints

  11. Distributed training

  12. Experiments
      ● Recurrent Neural Language Model (RNNLM)
      ● Neural Machine Translation with attention mechanism (NMT)
      ● Inception-V3

  13. Learned placement on NMT

  14. NMT end-to-end runtime

  15. Learned placement on Inception-V3

  16. Inception-V3 end-to-end runtime

  17. Profiling on NMT

  18. Profiling on Inception-V3

  19. Profiling on Inception-V3

  20. Running times (in seconds)

  21. Summary
      ● Propose an RL model to optimize device placement for neural networks
      ● Use policy gradients to learn the parameters
      ● The policy finds non-trivial assignments of operations to devices that outperform heuristic approaches
      ● Profiling of the results shows the policy learns implicit trade-offs between computation and communication in hardware
