End-to-end Deep Learning of Optimization Heuristics http://chriscummins.cc/pact17
Chris Cummins University of Edinburgh Pavlos Petoumenos University of Edinburgh Zheng Wang Lancaster University Hugh Leather University of Edinburgh
Compilers are very complex: millions of lines of code, hundreds of optimizations, thousands of choices. Hand-coded heuristics are out of date by time of release.
[Slide art: C source — int main(int argc, char **argv) { ... } — compiled to x86 assembly — _main: .cfi_startproc ## BB#0: pushq %rbp ...]
Machine learning in compilers: features x (derived from IR) feed a model y = f(x) that outputs an optimization decision.
Machine learning in compilers: Programs → Feature Extractor → Feature Vectors; together with Best Decisions as training data, Training produces an Optimization Heuristic used by the Driver.
The human bit — feature design — is: 1. hard to get right, 2. time consuming, 3. repetitious.
Feature space: programs plotted by Feature "X" and Feature "Y"; a learned heuristic divides the space into "Use a CPU" and "Use a GPU" regions. For this to work, you need good features!
Ways to fail: incomplete features — e.g. missing critical information; irrelevant features — e.g. not capturing the right information; unsuitable — e.g. wrong combination of features / model.
What we have: Programs → Feature Extractor → Feature Vectors → Training (with Best Decisions as training data) → Predictive Model → Driver.
What we need: the same pipeline without the feature extractor — Programs fed directly into Training to produce the Predictive Model.
Contributions Heuristics without features Beats expert approach Learning across heuristics
Our approach: Program Code (int main(int argc, char **argv) { ...) → Deep Learning → Optimization Decision.
Our approach — preprocessing. A Rewriter normalizes identifiers and code style: 1. variable/function names ('foo', 'bar', …) become 'a', 'b', …; 2. sanitize whitespace; 3. consistent use of optional braces. An Encoder then encodes the code as a sequence of vocabulary indices, using a vocabulary table of characters plus language keywords.
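The two preprocessing stages can be sketched in a few lines of Python. This is only an illustration of the idea — the actual DeepTune rewriter is compiler-based, and the function names, keyword list, and greedy tokenizer here are assumptions for the sketch:

```python
import re
import string

# Illustrative keyword set; the real vocabulary covers the full language.
C_KEYWORDS = {"int", "char", "void", "float", "return",
              "if", "else", "for", "while"}

def normalize_identifiers(src: str) -> str:
    """Rename variables/functions to 'a', 'b', ... in order of appearance."""
    names = {}
    def rename(m):
        word = m.group(0)
        if word in C_KEYWORDS:
            return word
        if word not in names:
            # Wraps after 26 names; fine for a sketch.
            names[word] = string.ascii_lowercase[len(names) % 26]
        return names[word]
    return re.sub(r"\b[A-Za-z_]\w*\b", rename, src)

def sanitize_whitespace(src: str) -> str:
    """Collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", src).strip()

def encode(src: str, vocab: dict) -> list:
    """Encode source as a sequence of vocabulary indices:
    greedily match multi-character keywords first, then fall
    back to single characters (added to the vocab on demand)."""
    tokens = []
    i = 0
    keywords = sorted((k for k in vocab if len(k) > 1),
                      key=len, reverse=True)
    while i < len(src):
        for kw in keywords:
            if src.startswith(kw, i):
                tokens.append(vocab[kw])
                i += len(kw)
                break
        else:
            tokens.append(vocab.setdefault(src[i], len(vocab)))
            i += 1
    return tokens

src = "int main(int argc, char **argv) { return 0; }"
norm = sanitize_whitespace(normalize_identifiers(src))
vocab = {"int": 0, "char": 1, "return": 2}
print(norm)               # identifiers renamed to a, b, c, ...
print(encode(norm, vocab))
```

The point of the normalization is that two programs differing only in naming or formatting map to the same token sequence, so the model never has to learn that 'foo' and 'bar' are interchangeable.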
Our approach — models. A Language Model (2-layer LSTM network) maps vocabulary indices into real space (an embedding) and summarizes the sequence as a vector. A Heuristic Model (2-layer DNN) predicts the optimization from that vector.
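How an LSTM "summarizes a sequence as a vector" can be shown with a toy cell. The real language model is a 2-layer LSTM with learned weights over learned embeddings; this single scalar cell with fixed, shared gate weights is only a sketch of the recurrence:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w=0.5, u=0.5, b=0.0):
    """One LSTM step. For simplicity all four gates share the
    same scalar weights; a real cell learns separate matrices."""
    i = sigmoid(w * x + u * h + b)    # input gate
    f = sigmoid(w * x + u * h + b)    # forget gate
    o = sigmoid(w * x + u * h + b)    # output gate
    g = math.tanh(w * x + u * h + b)  # candidate cell state
    c = f * c + i * g                 # new cell state
    h = o * math.tanh(c)              # new hidden state
    return h, c

def summarize(sequence):
    """Run the cell over a sequence of embedded token values and
    return the final hidden state: a fixed-size program summary,
    regardless of how long the input program was."""
    h = c = 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c)
    return h

# e.g. token indices already mapped into real space by the embedding
print(summarize([0.1, 0.9, 0.3]))
```

The key property is that the output has a fixed size for any input length, which is what lets a downstream DNN consume whole programs without hand-built feature vectors.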
Our approach — full pipeline: Code in → Rewriter → Encoder → Language Model (Embedding) → Heuristic Model → Optimization Decision.
How does it work?
How well does it work?
Prior art: Heterogeneous Mapping (Grewe et al., CGO'13) and Thread Coarsening (Magni et al., PACT'14).
Prior art — decision spaces. Heterogeneous Mapping: binary classification {CPU, GPU}, using a decision tree (CGO'13). Thread Coarsening: one-of-six classification {1, 2, 4, 8, 16, 32}, using cascading neural networks (PACT'14).
Prior art — features (2 papers!). Heterogeneous Mapping: 4 features combined from 7 raw values; instruction counts / ratios (CGO'13). Thread Coarsening: 7 features, principal components of 34 raw values; instruction counts / ratios / relative deltas (PACT'14).
Our approach handles both heuristics from the same raw code (int main(int argc ...): 1. the same model design for both; 2. no tweaking of parameters; 3. minimum change — a 3-line diff.
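The "same model, 3-line diff" claim can be sketched as a configuration: only the decision space differs between the two tasks. The field names and sizes here are illustrative, not the actual DeepTune code:

```python
def make_heuristic(decision_space):
    """Build a model configuration; everything except the output
    layer is shared between the two heuristics."""
    return {
        "embedding_dim": 64,   # illustrative sizes
        "lstm_layers": 2,
        "dnn_layers": 2,
        "outputs": decision_space,
    }

# Heterogeneous mapping: binary decision.
hetero = make_heuristic(["CPU", "GPU"])
# Thread coarsening: one-of-six decision.
coarsen = make_heuristic([1, 2, 4, 8, 16, 32])

# Everything but the output layer is identical.
shared_h = {k: v for k, v in hetero.items() if k != "outputs"}
shared_c = {k: v for k, v in coarsen.items() if k != "outputs"}
assert shared_h == shared_c
```

Contrast this with the prior art, where moving between the two problems meant designing a new feature set and a new model family from scratch.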
Prior art — evaluation. Heterogeneous Mapping: 2 CPU-GPU architectures, 7 benchmark suites (CGO'13). Thread Coarsening: 4 GPU architectures, 3 benchmark suites (PACT'14).
Results
14% and 5% improvements over state-of-the-art.
Heterogeneous Mapping (256 benchmarks): state-of-the-art 2.09x speedup, DeepTune 2.38x.
Thread Coarsening (17 benchmarks): state-of-the-art 1.01x speedup, DeepTune 1.06x.
Transfer learning. In both models the Language Model (Embedding + LSTM) is general, while the Heuristic Model is specialized. Train on Heterogeneous Mapping, then initialize the Thread Coarsening model with the learned language-model values.
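The transfer step amounts to copying the general layers' weights into the new model and leaving only the specialized heuristic layers to train from scratch. A minimal sketch, with illustrative layer names and random stand-in weights:

```python
import random

def init_model(seed):
    """Stand-in for a freshly initialized model: three named
    layers with (here, random) weight vectors."""
    rng = random.Random(seed)
    return {
        "embedding": [rng.random() for _ in range(4)],
        "lstm":      [rng.random() for _ in range(4)],
        "heuristic": [rng.random() for _ in range(4)],
    }

# The language-model layers are general across heuristics;
# the heuristic layers are task-specific.
GENERAL_LAYERS = ("embedding", "lstm")

def transfer(src_model, dst_model):
    """Initialize dst's general layers with src's trained values,
    keeping dst's own (untrained) specialized layers."""
    for name in GENERAL_LAYERS:
        dst_model[name] = list(src_model[name])
    return dst_model

mapping_model = init_model(0)     # "trained" on heterogeneous mapping
coarsening_model = init_model(1)  # fresh thread-coarsening model
transfer(mapping_model, coarsening_model)

assert coarsening_model["lstm"] == mapping_model["lstm"]
assert coarsening_model["heuristic"] != mapping_model["heuristic"]
```

Because the transferred layers already know how to read source code, the thread-coarsening model can learn its decision from far fewer examples — which is what lifts its improvement from 5% to 11%.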
14% and 11% improvements over state-of-the-art.
Heterogeneous Mapping: state-of-the-art 2.09x speedup, DeepTune 2.38x.
Thread Coarsening: state-of-the-art 1.01x speedup, DeepTune 1.06x; with transfer learning, 1.12x.
Try it for yourself! Code and data on GitHub; runs in the browser. ACM artifact evaluated. http://chriscummins.cc/pact17
End-to-end Deep Learning of Optimization Heuristics. Problem: feature design is hard. Featureless heuristics; first cross-domain learning; 11-14% speedups. http://chriscummins.cc/pact17