  1. TENSORFLOW: A SYSTEM FOR LARGE-SCALE MACHINE LEARNING AUTHORS: MARTÍN ABADI, PAUL BARHAM, JIANMIN CHEN, ZHIFENG CHEN, ANDY DAVIS, JEFFREY DEAN, MATTHIEU DEVIN, SANJAY GHEMAWAT, GEOFFREY IRVING, MICHAEL ISARD, MANJUNATH KUDLUR, JOSH LEVENBERG, RAJAT MONGA, SHERRY MOORE, DEREK G. MURRAY, BENOIT STEINER, PAUL TUCKER, VIJAY VASUDEVAN, PETE WARDEN, MARTIN WICKE, YUAN YU, AND XIAOQIANG ZHENG

  2. OVERVIEW • Large Scale ML System • Distributed Compute and Training • Multi-node • Heterogeneous Environments • Dataflow Graphs • Open Source • Mathematically Flexible • Bespoke Losses & Kernels • Fault Tolerant

  3. DATAFLOW GRAPHS [Diagram: Input 1 and Input 2 feed a Multiply node; its result and Input 3 feed an Add node that produces the Output. Annotated "Mutability!"]
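A minimal sketch of the diagrammed graph, assuming the TensorFlow 1.x graph API the paper describes (the placeholder names are illustrative):

    import tensorflow as tf  # assumes the TF 1.x graph-construction API

    # Operations are the graph's vertices; tensors flow along its edges.
    input_1 = tf.placeholder(tf.float32, name="input_1")
    input_2 = tf.placeholder(tf.float32, name="input_2")
    input_3 = tf.placeholder(tf.float32, name="input_3")

    product = tf.multiply(input_1, input_2)  # Multiply node
    output = tf.add(product, input_3)        # Add node -> Output

    # "Mutability": state is itself a graph node (a Variable) changed by an assign op.
    total = tf.Variable(0.0)
    accumulate = tf.assign_add(total, output)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        feed = {input_1: 2.0, input_2: 3.0, input_3: 4.0}
        print(sess.run(output, feed))      # 10.0
        print(sess.run(accumulate, feed))  # running total: 10.0

Nothing runs at construction time; the session call is what triggers (possibly distributed) execution of the requested subgraph.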

  4. PRIOR WORK • DistBelief • Architecture: a parameter server connected to multiple workers [diagram] • Inflexible layers • Inflexible training algorithms • RNNs, LSTMs, GCNs challenging • Optimized for large clusters • Caffe & Theano • Similar limitations • TensorFlow is designed to improve flexibility!

  5. DISTBELIEF/KERAS/ETC. VS. TENSORFLOW [Diagram: in DistBelief/Keras-style systems, Input 1 and Input 2 feed a single atomic Dense layer that produces the Output; in TensorFlow the same Output is composed from primitive Multiply and Add operations.]
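A sketch of the contrast, assuming the TF 1.x API: what a DistBelief/Keras-style system treats as an atomic Dense layer is composed in TensorFlow from primitive ops, so each piece can be replaced independently (shapes and initializers here are illustrative):

    import tensorflow as tf  # TF 1.x-style API; sizes are made up for illustration

    x = tf.placeholder(tf.float32, shape=[None, 4])  # batch of inputs
    W = tf.Variable(tf.random_normal([4, 3]))        # weights
    b = tf.Variable(tf.zeros([3]))                   # bias

    # "Dense" is not atomic: it is a matmul, an add, and a nonlinearity,
    # any of which can be swapped for a bespoke kernel or loss component.
    dense = tf.nn.sigmoid(tf.add(tf.matmul(x, W), b))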

  6. ACCELERATOR ABSTRACTION CPU GPU TPU
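One way the accelerator abstraction shows up in user code, as a minimal sketch (device strings and config flags are standard TF 1.x options; the GPU name assumes one is present):

    import tensorflow as tf  # TF 1.x

    # The same graph targets different devices via placement annotations.
    with tf.device("/cpu:0"):
        a = tf.random_normal([1000, 1000])

    with tf.device("/gpu:0"):      # use "/cpu:0" on machines without a GPU
        b = tf.matmul(a, a)        # the cross-device transfer is inserted automatically

    config = tf.ConfigProto(allow_soft_placement=True,   # fall back if a device is missing
                            log_device_placement=True)   # log where each op ran
    with tf.Session(config=config) as sess:
        sess.run(b)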

  7. UNITS OF TENSORFLOW • Graph / Subgraph • Partitioned subgraphs are distributed to individual compute devices • Edges = Tensors • Multidimensional arrays • Vertices = Operations • Add, Multiply, Sigmoid • Automatic Partitioning • Subgraph distribution maximizes compute efficiency
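A small sketch of these units, again assuming the TF 1.x graph API: vertices are operations, and each operation's output tensors are the edges that connect it to downstream ops:

    import tensorflow as tf  # TF 1.x

    g = tf.Graph()
    with g.as_default():
        x = tf.constant(2.0, name="x")
        y = tf.constant(3.0, name="y")
        z = tf.multiply(x, y, name="z")

    # Each vertex is an operation; its output tensors are the graph's edges.
    for op in g.get_operations():
        print(op.name, op.type, [t.name for t in op.outputs])
    # e.g.  x Const ['x:0']   y Const ['y:0']   z Mul ['z:0']

(The partitioning of such a graph across devices is handled by the runtime, not by user code.)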

  8. CONTROL FLOW EXECUTION • Graph partitioned and distributed • Send + Recv replace split edges • Send: pushes a value from one device to another • Recv: blocks until the value is available ("deferred execution") • Synchronous execution • Classically frowned upon, but GPUs make it appealing • All workers forced to take the same parameters • Backup workers stochastically eliminate straggling processes
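A hedged sketch of the synchronous-with-backup-workers idea using tf.train.SyncReplicasOptimizer (the toy model and worker counts are illustrative; a real deployment also needs the cluster and session plumbing):

    import tensorflow as tf  # TF 1.x

    # Toy model: a single parameter trained with SGD.
    w = tf.Variable(0.0)
    loss = tf.square(w - 3.0)

    base_opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

    # Synchronous replication with a backup worker: gradients from any 4 of the
    # 5 replicas are aggregated each step, so one straggler cannot stall training.
    sync_opt = tf.train.SyncReplicasOptimizer(
        base_opt,
        replicas_to_aggregate=4,   # gradients averaged per step
        total_num_replicas=5)      # total workers, including 1 backup

    train_op = sync_opt.minimize(loss)
    # Each worker would additionally install sync_opt.make_session_run_hook(is_chief)
    # in its training session.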

  9. DIFFERENTIATION & BACKPROP • Symbolic representation • Automatically computes backprop code • Like PS architectures, enables distributed training via +/- write operations
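A minimal sketch of the symbolic differentiation in the TF 1.x API: tf.gradients traverses the graph and adds the backpropagation ops automatically (the tiny model here is illustrative):

    import tensorflow as tf  # TF 1.x

    x = tf.placeholder(tf.float32)
    w = tf.Variable(2.0)
    y = w * x + 1.0
    loss = tf.square(y)

    # No hand-written derivative code: the backprop subgraph is generated.
    grads = tf.gradients(loss, [w])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(grads, {x: 3.0}))   # d(loss)/dw = 2*(w*x+1)*x = 42.0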

  10. IMPLEMENTATION

  11. SINGLE MACHINE BENCHMARKS

  12. SPARSE AND DENSE FETCHES FOR SYNC

  13. CNN IMPLEMENTATIONS

  14. SYNC AND NON-SYNCED PROCESSES

  15. TRAINING LARGE MODELS

  16. CRITICISM • No actual accuracy comparisons • Convergence comparisons missing from the synchrony analysis • Lacks higher-level abstractions for computation • One reason Keras runs on top of TF

  17. CONCLUSION • Built a ML system that is: • Robust • Distributable • Extensible • Fast • In the ensuing years • Used extensively • Extended

  18. REFERENCES • M. Abadi, P. Barham, J. Chen, et al. TensorFlow: A System for Large-Scale Machine Learning. 2016.
