
End-to-end Deep Learning of Optimization Heuristics (http://chriscummins.cc/pact17). Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh).


  1. End-to-end Deep Learning of Optimization Heuristics http://chriscummins.cc/pact17

  2. Chris Cummins (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Hugh Leather (University of Edinburgh)

  3. Compilers are very complex: hundreds, thousands, millions of choices. Source such as int main(int argc, char **argv) { ... becomes _main: .cfi_startproc ## BB#0: pushq %rbp ... Hand-coded heuristics (out of date by time of release).

  4. Machine learning in compilers: features (derived from IR) → model → optimization decision, i.e. y = f(x).
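The y = f(x) framing on this slide can be sketched in a few lines: a feature vector derived from the program goes in, an optimization decision comes out. Everything below (the feature names, the threshold rule) is invented for illustration; it stands in for a trained model, not for the heuristic in any of the papers discussed.

```python
# Illustrative only: a hand-rolled stand-in for a trained model f(x).
# Feature names and the threshold rule are invented for this sketch.

def extract_features(kernel_stats):
    """Derive a feature vector x from (hypothetical) IR statistics."""
    return [
        kernel_stats["compute_ops"] / max(kernel_stats["memory_ops"], 1),
        kernel_stats["data_transfer_bytes"],
    ]

def f(x):
    """The learned heuristic y = f(x): map features to a device decision."""
    compute_to_memory_ratio, transfer_bytes = x
    return "GPU" if compute_to_memory_ratio > 2.0 and transfer_bytes < 1e6 else "CPU"

decision = f(extract_features({"compute_ops": 900, "memory_ops": 100,
                               "data_transfer_bytes": 4096}))
print(decision)  # ratio 9.0 with a small transfer -> "GPU"
```

In a real system the thresholds are not hand-picked like this; they are fit from training data, which is exactly the pipeline the next slides describe.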

  5. Machine learning in compilers: Training Programs → Feature Extractor → Feature Vectors; combined with the Best Decisions these form the Training Data; Training yields the Optimization Heuristic used by the Driver.

  6. Machine learning in compilers: the same pipeline, but the Feature Extractor is the human bit! 1. hard to get right; 2. time consuming; 3. repetitious.

  7. Feature space: a plot of Feature "X" against Feature "Y", with "Use a CPU" and "Use a GPU" regions separated by the Learned Heuristic.

  8. Feature space: the same plot ("Use a CPU" vs "Use a GPU", Learned Heuristic boundary). You need good features!

  9. Ways to fail: incomplete (e.g. missing critical information); unsuitable (e.g. wrong combination of features / model); irrelevant (e.g. not capturing the right information).

  10. What we have: Training Programs → Feature Extractor → Feature Vectors; combined with the Best Decisions these form the Training Data; Training yields the Predictive Model used by the Driver.

  11. What we need: Training Programs and Best Decisions form the Training Data directly; Training yields the Predictive Model used by the Driver, with no feature extraction step.

  12. Contributions: heuristics without features; beats the expert approach; learning across heuristics.

  13. Our approach: Program Code (int main(int argc, char **argv) { ...) → Deep Learning → Optimization Decision.

  14. Our approach: preprocessing. Rewriter: normalize identifiers & code style: 1. var/fun names 'foo', 'bar', ... become 'a', 'b', ...; 2. sanitize whitespace; 3. consistent use of optional braces. Encoder: encode as a sequence of vocabulary indices, using a vocabulary table of characters + language keywords. Pipeline: Program Code → Rewriter → Encoder → Deep Learning → Optimization Decision.
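The rewriter and encoder stages on this slide can be sketched as below. This is a simplified stand-in: the real rewriter operates on the compiler's view of the code rather than regexes, and the vocabulary here is truncated to a handful of keywords for illustration.

```python
import re

# Simplified sketch of the source rewriter + encoder stages.
# A toy regex pass illustrates the normalization steps named on the
# slide; the keyword list is a truncated, illustrative vocabulary.

KEYWORDS = ["int", "char", "return", "if", "else", "for", "while"]

def rewrite(src):
    """Normalize identifiers and whitespace."""
    names = []
    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        if word not in names:
            names.append(word)
        # 'foo', 'bar', ... become 'a', 'b', ...
        return chr(ord("a") + names.index(word))
    src = re.sub(r"[A-Za-z_]\w*", rename, src)
    return " ".join(src.split())  # sanitize whitespace

def encode(src):
    """Encode as a sequence of vocabulary indices (keywords + characters)."""
    vocab = {kw: i for i, kw in enumerate(KEYWORDS)}
    indices = []
    for token in re.findall(r"[A-Za-z_]\w*|.", src):
        if token not in vocab:
            vocab[token] = len(vocab)
        indices.append(vocab[token])
    return indices, vocab

normalized = rewrite("int main( int argc,   char **argv )")
indices, vocab = encode(normalized)
print(normalized)  # "int a( int b, char **c )"
```

Normalizing names and style means two programs that differ only in formatting or identifier choice map to the same index sequence, which is what lets the model generalize across code bases.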

  15. Our approach: the Language Model (2-layer LSTM network) maps vocab indices into real space via an Embedding and summarizes the sequence as a vector; the Heuristic Model (2-layer DNN) predicts the optimization on that vector. Pipeline: Code in → Rewriter → Encoder → Embedding → Language Model → Heuristic Model → Optimization Decision.
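The two-stage architecture can be sketched with NumPy as follows. All shapes are invented for illustration, the weights are random rather than trained, and for brevity a mean-pool over embeddings stands in for the 2-layer LSTM; the point is the data flow, not the exact network.

```python
import numpy as np

# Sketch of the two-stage architecture (shapes invented for
# illustration). A mean-pool over embeddings stands in for the 2-layer
# LSTM, and all weights are random rather than trained.

rng = np.random.default_rng(0)
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, NUM_CLASSES = 128, 64, 32, 2

embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))  # vocab indices -> real space
W1 = rng.normal(size=(EMBED_DIM, HIDDEN_DIM))         # heuristic model, layer 1
W2 = rng.normal(size=(HIDDEN_DIM, NUM_CLASSES))       # heuristic model, layer 2

def language_model(indices):
    """Summarize the token sequence as a single vector."""
    vectors = embedding[np.asarray(indices)]  # (seq_len, EMBED_DIM)
    return vectors.mean(axis=0)               # stand-in for the 2-layer LSTM

def heuristic_model(summary):
    """Predict the optimization decision from the summary vector."""
    hidden = np.maximum(0, summary @ W1)  # ReLU
    logits = hidden @ W2
    return int(np.argmax(logits))         # e.g. 0 = CPU, 1 = GPU

decision = heuristic_model(language_model([3, 17, 42, 5]))
```

The key property this sketch shows: the heuristic model never sees the program, only the fixed-length summary vector, which is why the same design can be reused for different heuristics by changing only the output size.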

  16. Our approach, end to end: Code in → Rewriter → Encoder → Embedding → Language Model → Heuristic Model; i.e. Program Code → Deep Learning → Optimization Decision.

  17. How does it work?

  18. How well does it work?

  19. Prior art: Heterogeneous Mapping (Grewe et al., CGO'13) and Thread Coarsening (Magni et al., PACT'14).

  20. Prior art, decision space: Heterogeneous Mapping is binary classification over {CPU, GPU} with a Decision Tree model (CGO'13); Thread Coarsening is one-of-six classification over {1, 2, 4, 8, 16, 32} with Cascading Neural Networks (PACT'14).

  21. Prior art, features: Heterogeneous Mapping uses 4 features combined from 7 raw values (instruction counts / ratios, CGO'13); Thread Coarsening uses 7 features, the Principal Components of 34 raw values (instruction counts / ratios / relative deltas, PACT'14). Two papers!

  22. Our approach: both Heterogeneous Mapping and Thread Coarsening take the raw code (int main(int argc ...). 1. Use the same model design for both. 2. No tweaking of parameters. 3. Minimum change: a 3-line diff.

  23. Prior art, experimental setup: Heterogeneous Mapping: 2x CPU-GPU architectures, 7 benchmark suites (CGO'13); Thread Coarsening: 4x GPU architectures, 3 benchmark suites (PACT'14).

  24. results

  25. 14% and 5% improvements over state-of-the-art. Heterogeneous Mapping: 2.09x speedup (state-of-the-art) vs 2.38x (DeepTune); Thread Coarsening: 1.01x speedup (state-of-the-art) vs 1.06x (DeepTune).
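The headline percentages follow directly from the speedups on the slide; a quick check of the arithmetic:

```python
# Improvement over state-of-the-art, computed from the slide's speedups.
state_of_the_art = {"heterogeneous_mapping": 2.09, "thread_coarsening": 1.01}
deeptune = {"heterogeneous_mapping": 2.38, "thread_coarsening": 1.06}

improvement = {k: deeptune[k] / state_of_the_art[k] - 1 for k in deeptune}
for task, value in improvement.items():
    print(f"{task}: {value:.0%}")
# heterogeneous_mapping: 14%
# thread_coarsening: 5%
```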

  26. 14% and 5% improvements over state-of-the-art, across 256 benchmarks (Heterogeneous Mapping: 2.09x vs 2.38x) and 17 benchmarks (Thread Coarsening: 1.01x vs 1.06x).

  27. Transfer Learning: each task's model is Embedding → Language Model → Heuristic Model; the Embedding and Language Model are general, the Heuristic Model is specialized (Heterogeneous Mapping and Thread Coarsening).

  28. Transfer Learning: the general layers (Embedding and Language Model) of one heuristic's model are initialized with values learned for the other.
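The "initialize with values" step can be sketched as plain parameter copying. Parameter names and shapes here are invented for illustration: the general layers are seeded from the already-trained model, while the task-specific heuristic head starts fresh.

```python
import numpy as np

# Sketch of the transfer-learning step (parameter names and shapes
# invented). The general layers are copied from a previously trained
# model; only the heuristic head starts from scratch.

rng = np.random.default_rng(0)

def fresh_model():
    return {
        "embedding": rng.normal(size=(128, 64)),
        "language_model": rng.normal(size=(64, 64)),
        "heuristic_head": rng.normal(size=(64, 6)),  # e.g. one-of-six coarsening factors
    }

general = fresh_model()      # stands in for a model trained on one heuristic
specialized = fresh_model()  # to be trained on the other heuristic

# Initialize the general layers with values from the trained model:
for name in ("embedding", "language_model"):
    specialized[name] = general[name].copy()
```

Because only the small heuristic head must be learned from scratch, the specialized task can get away with far less training data, which matters when, as here, one task has an order of magnitude fewer benchmarks than the other.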

  29. 14% and 5% improvements over state-of-the-art (recap): Heterogeneous Mapping 2.09x vs 2.38x; Thread Coarsening 1.01x vs 1.06x.

  30. 14% and 11% improvements over state-of-the-art: Heterogeneous Mapping 2.09x (state-of-the-art) vs 2.38x (DeepTune); Thread Coarsening 1.01x (state-of-the-art), 1.06x (DeepTune), 1.12x (DeepTune w. Transfer Learning).

  31. Try it for yourself! Code and data on GitHub; runs in the browser. ACM Artifact Evaluated. http://chriscummins.cc/pact17

  32. End-to-end Deep Learning of Optimisation Heuristics. Problem: feature design is hard. Featureless heuristics. First cross-domain learning. 11-14% speedups. http://chriscummins.cc/pact17
