
Accelerating MySQL with JIT Compilers, David Yeager, Percona Live



  1. Accelerating MySQL with JIT Compilers
     David Yeager, Percona Live Santa Clara, April 2018

  2. What is a Just-In-Time Compiler?
     [Diagram: two compilation pipelines]
     ● Java source code → Java Compiler → Bytecode → Java JIT Compiler → Machine code
     ● C/C++ source code → C Compiler → Machine code → Dynimizer JIT Compiler → Machine code
     ● Both JIT compilers are fed profiling information

  3. How MySQL benefits from JITs
     [Charts: three plots of performance over time]
     OVH public cloud, 2 vCores x 2.3 GHz (Broadwell Xeon), template B2-7, BHS1 datacenter

  4. How MySQL benefits from JITs
     [Chart: MySQL 5.7, tpcc-mysql / WordPress]
     * OVH public cloud, 2 vCores x 2.3 GHz (Broadwell Xeon), template EG-7-SSD, BHS1 datacenter
     * tpcc-mysql is not validated or certified by the TPC (Transaction Processing Performance Council), so this is not an official TPC-C result

  5. Dynimizer Usage In a Nutshell
     Installation
       $ sudo bash -c 'bash <(wget -O - https://dynimize.com/install) -default'
     Usage
       $ sudo dyni -start
       Dynimizer started
       $ sudo dyni -status
       Dynimizer is running
       mysqld, pid: 20722, dynimizing
       $ sudo dyni -status
       Dynimizer is running
       mysqld, pid: 20722, dynimized
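The status lines on slide 5 follow a simple `exe, pid: N, phase` pattern, so a monitoring script can tell when mysqld has reached the dynimized phase. A minimal sketch, assuming `dyni -status` prints exactly the lines shown on the slide (real output may differ between Dynimizer versions); `parse_status` and `ProcessStatus` are hypothetical names, not part of Dynimizer.

```python
import re
from dataclasses import dataclass
from typing import List


# Hypothetical record for one target-process line of `dyni -status` output.
@dataclass
class ProcessStatus:
    exe: str
    pid: int
    phase: str


# Assumed line format, copied from the slide: "mysqld, pid: 20722, dynimizing"
_LINE = re.compile(r"^(?P<exe>[^,]+),\s*pid:\s*(?P<pid>\d+),\s*(?P<phase>\w+)\s*$")


def parse_status(output: str) -> List[ProcessStatus]:
    """Return one ProcessStatus per target-process line; other lines are ignored."""
    results = []
    for line in output.splitlines():
        m = _LINE.match(line.strip())
        if m:
            results.append(ProcessStatus(m["exe"], int(m["pid"]), m["phase"]))
    return results


sample = """Dynimizer is running
mysqld, pid: 20722, dynimized"""

for proc in parse_status(sample):
    print(proc.exe, proc.pid, proc.phase)  # mysqld 20722 dynimized
```

Lines without a `pid:` field, such as "Dynimizer is running", are skipped rather than treated as errors.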

  6. Dynimizer Usage
     [Diagram: Dynimizer lifecycle]
     1 Start         $ sudo dyni -start
                     Dynimizer started
     2 Monitoring    $ sudo dyni -status
                     Dynimizer is running
     3 Profiling     $ sudo dyni -status
                     Dynimizer is running
                     mysqld, pid: 20722, profiling
     4 Dynimizing    $ sudo dyni -status
                     Dynimizer is running
                     mysqld, pid: 20722, dynimizing
     5 Dynimized     $ sudo dyni -status
                     Dynimizer is running
                     mysqld, pid: 20722, dynimized
     Once dynimized: does the workload drastically change phase? N → stay dynimized; Y → reoptimize (can be disabled)

  7. Hardening Dynimizer For Production
     $ dyni -optimizeOnce:y
     ● The default is to reoptimize after large changes in workload; this setting disables it
     ● Prevents the temporary performance overhead of reoptimizing in the middle of a workload
     ● No changes to machine code == more stable
     ● More conservative
     ● If the workload changes drastically, the Dynimizer improvement will be reduced
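The lifecycle on slide 6, together with the optimizeOnce switch on slide 7, can be modelled as a small state machine. This is a hypothetical sketch: `Phase` and `next_phase` are illustrative names, and it assumes each phase simply hands off to the next and that a reoptimization restarts from profiling, which the slides do not state.

```python
from enum import Enum


class Phase(Enum):
    MONITORING = "monitoring"   # Dynimizer is running, waiting for a target
    PROFILING = "profiling"
    DYNIMIZING = "dynimizing"
    DYNIMIZED = "dynimized"


def next_phase(phase: Phase, workload_changed: bool, optimize_once: bool) -> Phase:
    """Advance one step through the lifecycle (assumed transitions, see lead-in)."""
    if phase is Phase.MONITORING:
        return Phase.PROFILING
    if phase is Phase.PROFILING:
        return Phase.DYNIMIZING
    if phase is Phase.DYNIMIZING:
        return Phase.DYNIMIZED
    # DYNIMIZED: reoptimize only on a drastic phase change, and only when
    # optimizeOnce is off (the default behaviour is to reoptimize).
    if workload_changed and not optimize_once:
        return Phase.PROFILING
    return Phase.DYNIMIZED


p = Phase.MONITORING
for _ in range(3):
    p = next_phase(p, workload_changed=False, optimize_once=False)
print(p)  # Phase.DYNIMIZED
print(next_phase(p, workload_changed=True, optimize_once=False))  # Phase.PROFILING
print(next_phase(p, workload_changed=True, optimize_once=True))   # Phase.DYNIMIZED
```

The last two calls show the trade-off from slide 7: with optimizeOnce the machine code never changes again, at the cost of a smaller improvement if the workload shifts.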

  8. Hardening Dynimizer For Production
     $ dyni -secureCodeCache:y
     ● The default code cache is executable, readable and writable at the same time
     ● This setting makes the code cache executable and read-only
     ● Enabled automatically on SELinux for extra security
     ● You may want this enabled regardless
     $ dyni -pid <number>
     ● You may want to limit Dynimizer to a specific mysqld process

  9. Configuring with /etc/dyni.conf
     ● This is dyni.conf after default installation:

         [options]
         log:/var/log/dyni.log
         maxLogSize:1MB
         optimizeOnce: n
         fastCompile: n
         initdService: n
         secureCodeCache: n

         [exeList]
         mysqld
         #sysbench
         #tpcc_start
         #[users]
         #mysql

     ● Settings are overridden by command-line options
       – For example, $ dyni -optimizeOnce:y will override dyni.conf
     ● Can target other programs by adding exe names under [exeList]
       – Non-mysqld targets are not supported yet, so test thoroughly!
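The precedence rule on slide 9 (command-line options override dyni.conf) can be sketched with Python's standard `configparser`, which already accepts `key:value` lines, bare keys and `#` comments. This is a hypothetical illustration, not Dynimizer's own parser; `load_settings` is an invented name, and it assumes overrides are written as `-name:value`, as in `-optimizeOnce:y`.

```python
import configparser

# Default dyni.conf contents as shown on the slide.
DEFAULT_CONF = """\
[options]
log:/var/log/dyni.log
maxLogSize:1MB
optimizeOnce: n
fastCompile: n
initdService: n
secureCodeCache: n

[exeList]
mysqld
#sysbench
#tpcc_start
#[users]
#mysql
"""


def load_settings(conf_text, cli_args):
    """Merge dyni.conf-style options with `-name:value` command-line overrides."""
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=(":",))
    parser.optionxform = str  # keep camelCase names such as optimizeOnce
    parser.read_string(conf_text)
    options = dict(parser["options"])
    for arg in cli_args:
        name, sep, value = arg.lstrip("-").partition(":")
        if sep:  # flags without ":value" (e.g. -start) are not settings
            options[name] = value  # the command line wins over dyni.conf
    targets = list(parser["exeList"])  # commented-out exe names are skipped
    return options, targets


opts, exes = load_settings(DEFAULT_CONF, ["-optimizeOnce:y"])
print(opts["optimizeOnce"], opts["secureCodeCache"], exes)  # y n ['mysqld']
```

Setting `optionxform = str` matters here: by default `configparser` lowercases keys, which would turn `optimizeOnce` into `optimizeonce`.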

  10. Sources of performance gain
      ● OLTP workloads suffer mostly from front-end CPU stalls
        – Instruction cache misses, branch mispredictions, ITLB misses
      ● Use profiling information to better lay out the machine code and reduce branching
      ● Other profile-guided optimizations
        – Hot call-site inlining, sparse conditional constant propagation
        – Dead code elimination, copy propagation
        – Loop unrolling, branch target alignment
      ● Other optimizations
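To illustrate what "use profiling information to better lay out the machine code" can mean, here is a generic hot/cold splitting pass, not Dynimizer's algorithm (the slides do not describe it). Blocks are ranked by sample count, and the blocks covering most samples are packed together so the hot path touches fewer I-cache lines and ITLB pages. The block names and counts are invented. Ordering hot blocks purely by count is a simplification; real layout passes also chain blocks along their most frequent fall-through edges.

```python
from typing import Dict, List, Tuple


def hot_cold_layout(samples: Dict[str, int],
                    hot_fraction: float = 0.99) -> Tuple[List[str], List[str]]:
    """Split basic blocks into a hot and a cold section using sample counts.

    Blocks are sorted by descending count (ties keep their input order); the
    shortest prefix whose cumulative count reaches hot_fraction of all samples
    becomes the hot section, and the remaining blocks become the cold section.
    """
    total = sum(samples.values())
    ordered = sorted(samples, key=lambda b: samples[b], reverse=True)
    hot, cold, covered = [], [], 0
    for block in ordered:
        if covered < hot_fraction * total:
            hot.append(block)
            covered += samples[block]
        else:
            cold.append(block)
    return hot, cold


# Invented sample counts for one function's basic blocks.
samples = {"entry": 1000, "loop_body": 9000, "error_path": 3,
           "rare_branch": 40, "exit": 1000}
hot, cold = hot_cold_layout(samples)
print(hot)   # ['loop_body', 'entry', 'exit']
print(cold)  # ['rare_branch', 'error_path']
```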

  11. When can Dynimizer help?
      Most beneficial
      ● High CPU usage
      ● Long-running workloads
      ● Well-indexed queries
      ● You have fully optimized MySQL and want even more performance
      ● Read-heavy workloads
        – SELECT: lots of front-end CPU stalls
      ● Working set fits into the buffer pool
      Least beneficial
      ● Low CPU usage scenarios
      ● Lots of writes to slow disks
      ● IO bottleneck
        – Working set doesn't fit in the buffer pool
      ● Full table scans
      ● Short mysqld process lifetime
      ● > 5k threads
        – Current ptrace scales poorly

  12. When can Dynimizer help?
      $ perf stat -e r0280:u,r0380 -p 30041 sleep 30
       Performance counter stats for process id '30041':
           3,224,918,396      r0280:u        [100.00%]
          39,530,772,359      r0380
      3,224,918,396 / 39,530,772,359 ≈ 8%
      ● I-cache misses are a good indicator
        – > 5% indicates instruction bandwidth is a serious bottleneck
      ● r0280 means I-cache misses on the last several generations of Intel CPUs
        – :u is user mode; r0380 is instruction fetches
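The arithmetic on slide 12 is easy to script when checking many servers. A minimal sketch, assuming perf prints comma-grouped counts in the layout shown on the slide (spacing and annotations vary with the perf version and locale); `icache_miss_ratio` is a hypothetical helper name, and the event codes are the slide's Intel-specific ones.

```python
import re

THRESHOLD = 0.05  # slide 12: > 5% indicates a serious instruction bandwidth bottleneck

PERF_OUTPUT = """
 Performance counter stats for process id '30041':

     3,224,918,396      r0280:u        [100.00%]
    39,530,772,359      r0380
"""


def icache_miss_ratio(perf_output: str) -> float:
    """Return r0280:u (I-cache misses) divided by r0380 (instruction fetches).

    Raises KeyError if either counter line is missing from the output.
    """
    counts = {}
    for line in perf_output.splitlines():
        m = re.match(r"\s*([\d,]+)\s+(r0280:u|r0380)\b", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts["r0280:u"] / counts["r0380"]


ratio = icache_miss_ratio(PERF_OUTPUT)
print(f"{ratio:.1%}")       # 8.2%
print(ratio > THRESHOLD)    # True
```

The exact ratio is about 8.16%, which the slide rounds to 8%; either way it is above the 5% rule of thumb.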

  13. MySQL + Dynimizer Architecture
      [Diagram: the Dynimizer process and the target process being optimized]
      ● Linux perf_events subsystem: collect sample-based profiling data from the target process
      ● Linux ptrace: read process state (machine code, data) from the target process
      ● Original program machine code + profiling data → IR
      ● High-level optimizations (IR) → microarchitecture-specific optimizations (IR) → convert to machine code in the code cache
      ● Linux ptrace: commit the optimized machine code to the target process

  14. Dynimizer is the Everyman's PGO
      Profile Guided Optimization
      ● Available in GCC
        – Compile with instrumentation
        – Training run with profiling
        – Recompile
      ● Difficult to find a representative workload that will stand up over time
      ● Labour intensive
      ● For large-scale MySQL deployments that can amortize the labour
      Dynimizer JIT
      ● Orders of magnitude easier
        – Trivial usage: $ dyni -start
        – Not required to build from source
        – 1-5 minutes to optimize
      ● Zero downtime
      ● Includes shared libraries
      ● Way more flexible
        – Can optimize code for each run

  15. Supported Targets
      ● Linux x86-64
      ● mysqld, which means:
          Version       Optimization Target
          5.5 – 5.7     MySQL Server
          5.5 – 10.2    MariaDB Server
          5.5 – 5.7     Percona Server

  16. Sysbench: MySQL 5.7 OLTP-RO
      [Chart: Transactions/Second vs. Threads (1–128), WITH vs. WITHOUT Dynimizer]
      CPU: Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz, 4 cores, 8 threads (Kaby Lake)
      RAM: 32 GB of 2400 MHz DDR4
      * This is a dedicated server rented from OVH, model: SP-32 Server, data center BHS 5
      * Relative speedups are similar across various table sizes or numbers of tables, as long as the data fits into memory

  17. Sysbench: MySQL 5.7 OLTP Simple
      [Charts: Transactions/Second vs. Threads (1–32 and 1–128), WITH vs. WITHOUT Dynimizer]

  18. Sysbench: TPS Increase
      [Charts: TPS increase (%) vs. Threads (1–32 and 1–128) for oltp read-only, oltp-simple, select, select-random-ranges]

  19. Reduction in Branch Mispredictions
      [Charts: reduction in branch mispredictions (%) vs. Threads (1–32 and 1–128) for oltp read-only, oltp-simple, select, select-random-ranges]

  20. Reduction in ITLB Misses
      [Charts: reduction in ITLB misses (%) vs. Threads (1–32 and 1–128) for oltp read-only, oltp-simple, select, select-random-ranges]

  21. Reduction in I-Cache Misses
      [Charts: reduction in I-cache misses (%) vs. Threads (1–32 and 1–128) for oltp read-only, oltp-simple, select, select-random-ranges]

  22. Increase in Instructions Per Cycle
      [Charts: increase in instructions per cycle (%) vs. Threads (1–32 and 1–128) for oltp read-only, oltp-simple, select, select-random-ranges]

  23. Caveats: Steep warmup curve
      ● Will be reduced in the next major release

  24. Caveats: Memory Usage
      ● 4 GB per process, during the dynimizing phase only
        – Freed once optimized
        – Extra RAM is not necessary; just increase swap by 4 GB
        – May not be appropriate for some micro cloud instances
      ● Will be reduced in the next major release

  25. Noteworthy attributes
      ● Exploiting run-time information
      ● Zero downtime
      ● Optimize in minutes
      ● Target app source code not required
      ● Optimize across shared libraries
      ● Simple usage
      ● Little to no configuration necessary

  26. Coming soon...
      ● Cache compilation for instant optimized restart of target processes (mysqld)
      ● Lower profiling and memory overheads
      ● Improved phase change detection
      ● More optimizations
      ● Toggle between code cache versions depending on program phase
      ● Many more target programs to optimize
        – Have observed similar improvements with MongoDB
      ● Many new optimizations and speedups along the way

  27. Questions?
      To learn more, visit dynimize.com

  28. Rate My Session
