
ThinLTO: A Fine-Grained Demand-Driven Infrastructure



  1. ThinLTO: A Fine-Grained Demand-Driven Infrastructure
     Teresa Johnson, Xinliang David Li
     tejohnson,davidxl@google.com
     EuroLLVM 2015

  2. Outline
     ● CMO Background
     ● ThinLTO Motivation and Overview
     ● ThinLTO Details
     ● Build System Integration
     ● LLVM Prototype Status
     ● Preliminary Experimental Results
     ● Next Steps

  3. Cross-Module Optimization (CMO) Background
     Monolithic LTO (Link-Time Optimization):
     ● Frontend parsing and IR generation are all done in parallel
     ● Cross-module optimization is done at link time via a linker plugin
     ● The CMO pass consumes lots of memory and is generally not parallel
     ● CMO is done in the linker process
     ● Doesn’t scale to very large applications
     [Diagram: x.c, y.c, z.c → FE (×3) → x.ir, y.ir, z.ir → Cross-Module Optimizations and Real Linking → lto.o → a.out]

  4. CMO Background (cont)
     Monolithic LTO with Parallel BE:
     ● CMO still performed in serial
     ● Intramodule optimization and code generation done in parallel (thread or process level)
     ● Example: HPUX Itanium compiler (SYZYGY framework)
     [Diagram labels: x.c, y.c, z.c; FE (×3); x.ir, y.ir, z.ir; Cross-Module Optimizations and Real Linking; BE (×2); lto.o; a.out]

  5. CMO Background (cont)
     LTO with WHOPR (gcc):
     ● Frontend parsing and IR generation are all done in parallel
     ● Backend compilations are done in parallel
     ● Inline decisions/analysis made in serial IPA
     ● Inline transformations in parallel within partitions during backend compilations
     ● Requires materializing clones into partitions, increasing serial IR I/O overhead
     ● Summary-based IPA is done in serial
     ● Partitioning is done in serial
     ● IR reading and rewriting is done in serial
     [Diagram labels: x.c, y.c, z.c; FE (×3); x.ir, y.ir, z.ir; IPA, Partition, IR rewrite, BE invocation and real link; temp1.ir, temp2.ir; BE (×2); temp1.o, temp2.o; a.out]

  6. Parallel CMO: What if we can...
     Fully Parallel CMO:
     ● End-2-end compilation in parallel
     ● Cross-module optimization is enabled for each compilation
     ● Each module is extended with other modules as if they were from headers
     (Note: most of the performance comes from cross-module inlining)
     [Diagram: x.c, y.c, z.c, with Module Importing between them → IPO + BE (×3) → x.o, y.o, z.o → linking → a.out]

  7. Lightweight IPO (LIPO) approach
     Fully Parallel CMO in LIPO mode (gcc/google):
     ● Source importing is pre-computed (using dynamic call graph from profile)
     ● Source compilations are extended with auxiliary modules during parsing
     ● Module groups are usually small or capped
     ● Captures most of the full LTO gain
     ● Compilations are fully parallel!
     [Diagram: x.c, y.c, z.c, plus Profile Data → IPO + BE (×3) → x.o, y.o, z.o → linking → a.out]
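
The slide says module groups are pre-computed from the dynamic call graph and kept small or capped. One way to picture that is the sketch below. It is not the actual gcc/google LIPO algorithm: the function `compute_module_groups`, the `max_aux` cap, and the sample call counts are all hypothetical, chosen only to illustrate ranking auxiliary modules by profiled call frequency.

```python
from collections import defaultdict


def compute_module_groups(call_edges, func_to_module, max_aux=2):
    """For each module, pick the modules it calls into most often (hypothetical helper).

    call_edges maps (caller, callee) -> dynamic call count from a profile.
    """
    # weight[primary][auxiliary] = summed call counts crossing that module boundary
    weight = defaultdict(lambda: defaultdict(int))
    for (caller, callee), count in call_edges.items():
        src, dst = func_to_module[caller], func_to_module[callee]
        if src != dst:  # only cross-module edges matter for importing
            weight[src][dst] += count

    groups = {}
    for module, targets in weight.items():
        # Hottest auxiliary modules first; ties broken by name for determinism
        ranked = sorted(targets, key=lambda m: (-targets[m], m))
        groups[module] = ranked[:max_aux]  # assumed cap on group size
    return groups


# Made-up profile data for illustration only
call_edges = {
    ("main", "foo"): 1000,
    ("main", "bar"): 10,
    ("main", "baz"): 500,
    ("foo", "bar"): 5,
}
func_to_module = {"main": "x.c", "foo": "y.c", "bar": "z.c", "baz": "w.c"}
groups = compute_module_groups(call_edges, func_to_module, max_aux=2)
```

Because whole modules are the unit here, a single hot callee drags in everything else defined beside it, which is the granularity limit the next slide lists as a problem.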

  8. Problems with LIPO mode
     ● Need profile data to compute module group
     ● Importing done at module level limits the scalability and performance headroom
     ● Duplicated parsing/compilation of auxiliary modules
     ● Doesn’t tolerate stale profile

  9. Full Parallel CMO: A new model
     ● Delay importing until IR files have been generated
     ● Allows fine-grained importing at function level, greatly increasing the number of useful functions that can be imported and liberating more performance
     ● No more duplicated parsing in FE
     (But how do they synchronize & communicate?)
     [Diagram: x.c, y.c, z.c → FE (×3) → x.ir, y.ir, z.ir → BE + demand-driven IPO (×3) → x.o, y.o, z.o → linking → a.out]

  10. ThinLTO: An Implementation of the New Model
      [Diagram: x.c, y.c, z.c ... (plus optional Profile Data) → FE (×3) → x.ir, y.ir, z.ir, each with a func index → Super Thin Linker Plugin + Linker, with a Func Map and an optional Global Analysis Summary → BE + demand-driven IPO (×3) → x.o, y.o, z.o → a.out]

  11. ThinLTO: An Implementation of the New Model
      ● IR includes summary data and function position index (can be in ELF symtab)
      [Diagram: ThinLTO pipeline, as on slide 10]

  12. ThinLTO: An Implementation of the New Model
      ● IR includes summary data and function position index (can be in ELF symtab)
      ● To enable demand-driven IPO in backend compilation, the ThinLTO plugin simply aggregates a global function map
      [Diagram: ThinLTO pipeline, as on slide 10]
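
As a rough illustration of the plugin "simply aggregating a global function map", the sketch below merges per-module function indexes into one lookup table. The name `build_global_function_map`, the dictionary layout, and the first-definition-wins rule are assumptions made for the example, not the format or resolution policy used by the LLVM prototype.

```python
def build_global_function_map(module_indexes):
    """Merge per-module indexes into one map: function name -> (module, offset).

    module_indexes maps an IR file path to {function name: offset in that file}.
    """
    global_map = {}
    for module_path, index in module_indexes.items():
        for func_name, offset in index.items():
            # Assumption: the first definition seen wins. A real linker would
            # apply its own symbol resolution rules here.
            global_map.setdefault(func_name, (module_path, offset))
    return global_map


# Hypothetical per-module indexes with made-up offsets
module_indexes = {
    "x.ir": {"main": 128},
    "y.ir": {"foo": 64, "helper": 512},
    "z.ir": {"bar": 96},
}
func_map = build_global_function_map(module_indexes)
```

A backend compiling `x.ir` could then look up `func_map["foo"]` to learn which file holds `foo` and where, without the plugin ever parsing any function bodies.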

  13. ThinLTO: An Implementation of the New Model
      ● IR includes summary data and function position index (can be in ELF symtab)
      ● To enable demand-driven IPO in backend compilation, the ThinLTO plugin simply aggregates a global function map
      ● Enables backend to do importing at function granularity: minimizing memory footprint, IO/networking overhead
      [Diagram: ThinLTO pipeline, as on slide 10]

  14. ThinLTO: An Implementation of the New Model
      ● IR includes summary data and function position index (can be in ELF symtab)
      ● To enable demand-driven IPO in backend compilation, the ThinLTO plugin simply aggregates a global function map
      ● Enables backend to do importing at function granularity: minimizing memory footprint, IO/networking overhead
      ● Function importing is based on function summary, optional global analysis summary, and profile data, with a priority queue to maximize benefits
      [Diagram: ThinLTO pipeline, as on slide 10]
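
The priority-queue idea in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's heuristic: `select_imports`, the hotness values, the `size` field, and the size budget are all hypothetical, and a real importer would weigh more than call hotness and callee size.

```python
import heapq


def select_imports(call_sites, summaries, size_budget):
    """Choose callees to import, hottest first, within a total size budget.

    call_sites: list of (callee name, hotness) pairs seen in the importing module.
    summaries:  callee name -> {"size": ...} taken from the function summary.
    """
    # heapq is a min-heap, so hotness is negated to pop the hottest callee first
    heap = [(-hotness, callee) for callee, hotness in call_sites]
    heapq.heapify(heap)

    chosen, used, seen = [], 0, set()
    while heap:
        _, callee = heapq.heappop(heap)
        if callee in seen:
            continue  # already considered via a hotter call site
        seen.add(callee)
        size = summaries[callee]["size"]
        if used + size <= size_budget:  # skip callees that would exceed the budget
            chosen.append(callee)
            used += size
    return chosen


# Made-up call sites and summaries for illustration
call_sites = [("foo", 900), ("bar", 50), ("big", 400)]
summaries = {"foo": {"size": 20}, "bar": {"size": 10}, "big": {"size": 200}}
imports = select_imports(call_sites, summaries, size_budget=40)
```

Here `big` is hotter than `bar` but is skipped because its summary size alone exceeds the budget, so the decision needs only summary data and never the function body.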

  15. ThinLTO: An Implementation of the New Model
      ● ThinLTO plugin does very minimal work
        ○ No IPA by default
        ○ No IR reading, partitioning, and IR rewriting, so minimal I/O
      ● It can scale to programs of any size, and allow IPO on machines with tiny memory footprints and without significantly increasing time (requirements for enabling by default)
      ● For single node ThinLTO build, the BE parallel processes will be launched in the linker process by the plugin
      [Diagram: ThinLTO pipeline, as on slide 10]
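
For the single-node case, launching the backends in parallel might look roughly like the sketch below. It uses a thread pool as a stand-in for the separate backend processes the slide describes, and `run_backend` is a hypothetical placeholder that only derives an output name; no real compiler is invoked.

```python
from concurrent.futures import ThreadPoolExecutor


def run_backend(module_path):
    """Hypothetical stand-in for one backend job (import, optimize, codegen)."""
    # A real job would compile the module; here we only compute the object name.
    return module_path.rsplit(".", 1)[0] + ".o"


def run_backends(modules, jobs=4):
    """Run one backend job per module concurrently and collect the outputs."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map returns results in the same order as the input modules
        return list(pool.map(run_backend, modules))


objects = run_backends(["x.ir", "y.ir", "z.ir"])
```

Each job depends only on its own module plus the shared function map, which is what lets the jobs run independently of one another.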

  16. ThinLTO Advantages
      ● Highly Scalable
        ○ Thin plugin layer does not require large memory and is extremely fast
        ○ Fully parallelizable backend
      ● Transparent
        ○ Similar to classic LTO, via linker plugin
        ○ Doesn’t require profile (unlike LIPO)
      ● High Performance
        ○ Close to full LTO
        ○ Peak optimization can use profile and/or more heavyweight IPA
      ● Flexible
        ○ Friendly to both single build machine and distributed build systems
      ➢ Possible to enable by default at -O2!

  17. ThinLTO Phase 1: IR and Function Summary Generation
      [Diagram: ThinLTO pipeline, as on slide 10]

  18. ThinLTO Phase 1: IR and Function Summary Generation
      ● Generate IR files in parallel
        ○ E.g. bitcode as in a normal LLVM -flto -c compile
      ● Generate function index table
        ○ Maps functions to their offsets in the bitcode file
      ● Generate function summary data to guide later import decisions
        ○ Included in the function index table
        ○ E.g. function attributes such as size, branch count, etc.
        ○ Optional profile data summary (e.g. function hotness)
      ● How to represent function index/summary table?
        ○ Metadata? LLVM IR?
        ○ ELF section? Discussed later...
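
To make the shape of a function index entry concrete, here is a small sketch of what one record could hold, based only on the fields the slide names (offset, size, branch count, optional hotness). The class `FunctionIndexEntry`, its field names, and the sample values are hypothetical. The slide leaves the real on-disk representation open, so this models the logical content only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionIndexEntry:
    """One hypothetical row of a per-module function index/summary table."""
    name: str
    bitcode_offset: int            # where the function body starts in the bitcode file
    size: int                      # summary attribute: function size
    branch_count: int              # summary attribute: number of branches
    hotness: Optional[int] = None  # optional profile summary; None without a profile


def build_index(entries):
    """Key the entries by function name for direct lookup."""
    return {entry.name: entry for entry in entries}


# Made-up entries for illustration
index = build_index([
    FunctionIndexEntry("foo", bitcode_offset=64, size=12, branch_count=2, hotness=900),
    FunctionIndexEntry("bar", bitcode_offset=256, size=80, branch_count=9),
])
```

Keeping the summary next to the offset means an import decision and the subsequent seek into the bitcode file can both be served from the same lookup.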

  19. ThinLTO Phase 2: Thin Linker Plugin Layer
      [Diagram: ThinLTO pipeline, as on slide 10]
