Building openSUSE with link-time optimizations



  1. Building openSUSE with link-time optimizations Jan Hubička and Martin Liška SUSElabs jh@suse.cz, mliska@suse.cz

  2. Outline ● What is link-time optimization? ● Link-time optimization and GCC ● Benchmarks ● Can we build openSUSE with link-time optimization by default?

  3. What is link-time optimization?

  4. Per-file compilation (diagram): GCC compiles each source file (File1.c to File4.c) separately into an object file (File1.o to File4.o); ld / gold links the object files into a.out

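  5. Link-time optimization (diagram): GCC compiles each source file (File1.c to File4.c) into an object file (File1.o to File4.o) that carries the intermediate language (IL); at link time ld / gold, through the LTO plug-in, hands the IL to the link-time compiler (GCC), whose output is linked into a.out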

  6. Benefits of LTO ● Symbol promotion (from linker’s resolution data most symbols become “static”) ● Cross-module inlining, constant propagation ● Aggressive unreachable code removal ● Profile propagation ● EH optimization (propagation of “nothrow”) ● Identical code folding ● Optimized code layout ● and more :)
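The first and third benefits above can be illustrated with a toy model. This is a conceptual sketch only, not GCC's implementation: the call graph, the function names, and the helpers `reachable_functions` and `promote_and_prune` are all hypothetical. It assumes the linker reports that only `main` must remain visible, so every other reachable function can be treated as local ("static") and anything unreachable can be dropped.

```python
from collections import deque


def reachable_functions(call_graph, entry_points):
    """Return the set of functions reachable from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def promote_and_prune(call_graph, externally_visible):
    """Toy model: keep reachable functions, mark non-exported ones local."""
    live = reachable_functions(call_graph, externally_visible)
    return {
        fn: ("global" if fn in externally_visible else "local")
        for fn in sorted(live)
    }


# Hypothetical whole program: only main must stay visible to the linker.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["helper"],
    "run": ["helper"],
    "helper": [],
    "debug_dump": ["helper"],  # never called from main, so it is removed
}

result = promote_and_prune(call_graph, {"main"})
print(result)
```

In this model `debug_dump` disappears and `parse`, `run`, and `helper` become local. A per-file compiler cannot reach either conclusion, because it cannot see whether another object file uses those symbols.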

  7. Problems of LTO ● Whole toolchain has to be restructured ● Slow compile-edit cycle ● Harder bug reports (often the whole program is needed to reproduce an issue) ● Not 100% transparent to the user, but in most cases all one needs to do is add -flto

  8. Link-time optimization and GCC

  9. Modernizing GCC (LTO perspective) ● 1999 (GCC 2.95): Function-at-a-time ● 2001 (GCC 3.0): New inliner (first high-level optimization in GCC) ● 2004 (GCC 3.4): Unit-at-a-time; intermodule compilation for C; inter-procedural optimization framework ● 2005 (GCC 4.0): New SSA optimization framework ● 2006 (GCC 4.1): Inter-procedural optimizations: profile-guided inlining, pure/const discovery, mod/ref, inter-procedural constant propagation, -fwhole-program --combine ● 2008 (GCC 4.4): Inter-procedural optimization on SSA; early optimization and inlining ● 2010 (GCC 4.5): Basic LTO framework (5 years in development) ● 2011 (GCC 4.6): WHOPR (parallel link-time optimization); Firefox builds

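  10. Link-time optimization (diagram): GCC compiles each source file (File1.c to File4.c) into an object file (File1.o to File4.o) that carries the intermediate language (IL); at link time ld / gold, through the LTO plug-in, hands the IL to the link-time compiler (GCC), whose output is linked into a.out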

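  11. Parallelized link-time optimization (WHOPR) (diagram): GCC compiles each source file (File1.c to File4.c) into an object file (File1.o to File4.o) that carries IL; at link time ld / gold, through the LTO plugin, hands the IL to a whole-program analysis step, which is followed by several local optimization processes running in parallel; their output is linked into a.out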
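The WHOPR flow above (one serial whole-program step, then independent local optimization jobs) can be sketched as follows. This is a simplified illustration, not GCC's partitioning algorithm: it balances partitions greedily by size, whereas a real compiler also has to consider which functions reference each other. The function sizes and the helpers `partition_by_size` and `optimize_partition` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_size(function_sizes, n_partitions):
    """Greedy balancing: largest function first, into the lightest partition."""
    partitions = [[] for _ in range(n_partitions)]
    loads = [0] * n_partitions
    ordered = sorted(function_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        target = loads.index(min(loads))  # first partition with the least work
        partitions[target].append(name)
        loads[target] += size
    return partitions


def optimize_partition(partition):
    """Stand-in for the local optimization job run on one partition."""
    return tuple(partition)


# Hypothetical function sizes (for example, statement counts).
function_sizes = {"main": 10, "parse": 40, "run": 30, "helper": 20}

# Serial step: decide the partitioning once, with the whole program in view.
partitions = partition_by_size(function_sizes, n_partitions=2)

# Parallel step: each partition is processed independently of the others.
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(optimize_partition, partitions))

print(partitions)
print(outputs)
```

The point of the split is that only the partitioning decision needs the whole program in memory; the expensive per-function work then scales with the number of available cores.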

  12. Modernizing GCC (LTO perspective) ● 2012 (GCC 4.7.0): Memory use optimizations, new inliner heuristics, new inter-procedural constant propagation with cloning ● 2013 (GCC 4.8.0): Symbol table; propagation of values passed through aggregates ● 2014 (GCC 4.9.0): Slim LTO objects by default; on-demand loading of functions; devirtualization pass; feedback-directed code layout ● 2015 (GCC 5): Identical Code Folding; COMDAT optimization; One Definition Rule for C++; alignment propagation; correct command line options handling with LTO ● 2016 (GCC 6): Linker plugin now detects type of output binary; C and Fortran type merging; better alias analysis ● 2017 (GCC 7): Inter-procedural value range propagation; bitwise propagation ● 2018 (GCC 8): Early debug info; profile representation rewrite; function splitting now by default; reworked runtime estimation; malloc attribute propagation

  13. GCC optimization pipeline (diagram with three stages) ● Compile time: parser; IL generation; early opts (early inliner, constant prop., forward prop., jump threading, scalar repl. of aggr., alias analysis, redundancy elim., dead store elim., dead code elim., tail recursion, switch conversion, pure/const/nothrow, EH optimization, profile guessing); IP analysis; streaming out ● Link-time serial: streaming in symbols, types and declarations; symbol and type merging and linking; inter-procedural (whole program) opts (dead symbol elim., symbol promotion, profile analysis, identical code folding, devirtualization, constant propagation, const/destr merging, inlining, pure/const/nothrow, mod/ref, comdat); partitioning; streaming out ● Link-time parallel: stream in and apply transformations; high level opts (constant prop., complete unroll, forward prop., alias analysis, return slot opt., redundancy elim., jump threading, dead store elim., dead code elim., conditional store elim., copy prop., if combine, tail recursion, copy loop headers, scalar repl. of aggr., reassociation, sincos and bswap opt., loop invariant motion, partial redundancy elim., loop splitting, unroll and jam, loop distribution, loop interchange, ...); low level opts (common subexpression elim., forward propagation, copy propagation, partial redundancy elim., code hoisting, store motion, if conversion, loop invariant motion, loop unrolling, doloop optimization, web construction, instruction combine, function partitioning, instruction splitting, live range shrinking, scheduling, register allocation, global common subexpr. elim., shrink wrapping, stack adjustment opt., register renaming, code reordering, x87 register stack, code/data alignment, machine dependent reorg.); code output

  14. Benchmarks

  15. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2)

  16. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast)

  17. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast). Overlaid chart: GCC -Ofast relative to GCC 6 -O2 (series: GCC 6, GCC 7, GCC 8; per benchmark: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean)
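The per-benchmark chart on this slide ends with a Geomean entry. The slides do not say how it was computed; the sketch below assumes the usual SPEC convention of taking the geometric mean of per-benchmark ratios, and assumes the plotted values are percentage changes. The helper `geomean_change` and the input numbers are hypothetical, not data from the talk.

```python
import math


def geomean_change(percent_changes):
    """Geometric mean of relative changes, returned as a percentage."""
    # Convert each percentage change to a ratio (for example +25% -> 1.25).
    ratios = [1.0 + p / 100.0 for p in percent_changes]
    # Average in log space, which is the definition of the geometric mean.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


# Hypothetical inputs: one benchmark twice as fast, one half as fast.
hypothetical = [100.0, -50.0]
summary = geomean_change(hypothetical)
arithmetic = sum(hypothetical) / len(hypothetical)

print(round(summary, 6), arithmetic)
```

With these inputs the geometric mean reports no net change, while the arithmetic mean would report +25%. That is why ratio-based benchmark suites summarize with a geometric mean.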

  18. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast)

  19. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast)

  20. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast). Overlaid chart: Clang & ICC -Ofast relative to GCC 8 (series: clang/flang 6, ICC 18; per benchmark: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean)

  21. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast)

  22. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto)

  23. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto). Overlaid chart: GCC -Ofast -flto relative to -Ofast (per benchmark: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean)

  24. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto)

  25. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, clang/flang 6 -Ofast -flto, ICC 18 -Ofast -flto)

  26. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, clang/flang 6 -Ofast -flto, ICC 18 -Ofast -flto). Overlaid chart: Clang/flang 6 and ICC 18 relative to GCC 8 (-O2 -flto) (series: clang/flang 6, ICC 18; per benchmark: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk, Geomean)

  27. SPECint 2006 performance (relative to GCC 6) (bar chart; series: generic, native; configurations: GCC 7 -O2, GCC 8 -O2, GCC 6 -Ofast, GCC 7 -Ofast, GCC 8 -Ofast, clang/flang 6 -Ofast, ICC 18 -Ofast, GCC 8 -O2 -flto, GCC 8 -Ofast -flto, clang/flang 6 -Ofast -flto, ICC 18 -Ofast -flto)
