Hardware and JVM Design Trends? Those Don’t Affect Me! Or Do They? Dan Heidinga Eclipse OpenJ9 Project Lead Interpreter Lead, IBM Runtimes @danheidinga DanHeidinga
Important disclaimers § THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. § WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. § ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES. § ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE. § IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE. § IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. § NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: – CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
Who am I? § One of the Project Leads for the Eclipse OpenJ9 JVM. § I've been involved with virtual machine development at IBM since 2007 and lead IBM’s Interpreter team working on OpenJ9. § I've represented IBM on both the JSR 292 ('invokedynamic') and JSR 335 ('lambda') expert groups and led J9's implementation of both JSRs. I am now involved in Projects Valhalla and Panama. § I also maintain the bytecode verifier and deal with various other parts of the runtime.
Trends § Moore’s law and single-threaded performance § The multicore explosion § Multi-tenant: bare metal, virtualization & containers § The deployment strategy shift § Spectre: Caught passing bad checks § Cloud as an opportunity
Moore’s Law § The number of transistors on a chip doubles roughly every two years § Amazing single-threaded performance improvements https://commons.wikimedia.org/wiki/File:Moore%27s_Law_Transistor_Count_1971-2016.png
Benchmark wars: Peak performance is #1 § Cooperative suspend (1999) § Adaptive compilation (1999) § Escape analysis and stack allocating objects (2001) § Real Time Specification for Java (2005 - 2011) § Dynamic Ahead of Time compiled code in production (2006) § Hot code replacement (2007) § Compressed references (2007) § Metronome soft real-time GC (2008) Callout: Resource sharing? Not so much!
Long lead times affect JVM design Time between ordering and racking a server? Often 6+ months Buy the biggest server you can afford The JVM better use all of it! 10
Trends § Moore’s law and single-threaded performance § The multicore explosion § Multi-tenant: bare metal, virtualization & containers § The deployment strategy shift § Spectre: Caught passing bad checks § Cloud as an opportunity
Welcome to the multicore revolution § ~2006 RIP Moore’s Law? § Single-threaded performance no longer growing by leaps and bounds § Welcome to the multicore revolution https://commons.wikimedia.org/wiki/File:Moore%27s_Law_Transistor_Count_1971-2016.png
java.util.concurrent brings ForkJoin to the masses § ForkJoinPool brought the concept of work stealing to user code in Java 7 – The same technique that Java’s GCs use to parallelize garbage collection § Java 8 continued the trend towards parallelism with the introduction of lambdas, parallel streams and Spliterators § The focus is still on peak performance – but now it’s up to you (and the platform) to write good multithreaded code
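As a rough illustration of the two styles named on this slide, the sketch below sums an array once with a hand-written `RecursiveTask` on the common `ForkJoinPool` and once with a parallel stream. The class name `ForkJoinSum`, the `SumTask` helper and the `THRESHOLD` cutoff are assumptions made for this example, not something taken from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ForkJoinSum {
    // Hypothetical cutoff: ranges at or below this size are summed sequentially.
    static final int THRESHOLD = 10_000;

    // Divide-and-conquer task; idle workers in the pool can steal forked halves.
    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                      // queue the left half for another worker
            long rightSum = right.compute();  // work on the right half in this thread
            return left.join() + rightSum;    // wait for (or run) the left half
        }
    }

    // Java 7 style: explicit fork/join on the common pool.
    static long forkJoinSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    // Java 8 style: the stream library does the splitting.
    static long streamSum(long[] data) {
        return LongStream.of(data).parallel().sum();
    }

    public static void main(String[] args) {
        long[] data = LongStream.rangeClosed(1, 1_000_000).toArray();
        System.out.println(forkJoinSum(data)); // 500000500000
        System.out.println(streamSum(data));   // 500000500000
    }
}
```

Both variants split the work recursively; the stream version simply hides the task class and the threshold choice behind the library.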
Common Characteristics Deployments: infrequent Startup: Small fraction of uptime Focus on peak performance
Aside: Development and Debugging https://xkcd.com/303/ 15
Trade peak performance for faster startup -Xquickstart (OpenJ9) -client (HotSpot)
Adaptive JIT Compilation § Methods start out in the interpreter, which runs their bytecode directly § After many invocations (or via sampling) code gets compiled at ‘cold’ or ‘warm’ level § Low overhead sampling thread is used to identify hot methods § Methods may get recompiled at ‘hot’ or ‘scorching’ levels (for more profiling optimizations) § Transition to ‘scorching’ goes through a temporary profiling step § Levels shown in the diagram: interpreter, cold, warm, hot, scorching
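The progression on this slide can be pictured as a small state machine. The sketch below is only a toy model: the class `TieredCompilationSketch`, the `MethodState` helper and both thresholds are hypothetical, the ‘cold’ level is left out for brevity, and none of it reflects how OpenJ9 actually counts invocations or samples.

```java
public class TieredCompilationSketch {
    // Levels from the slide, plus the temporary profiling step before 'scorching'.
    enum Level { INTERPRETED, WARM, HOT, PROFILING, SCORCHING }

    // Hypothetical thresholds, chosen only to make the example easy to follow.
    static final int COMPILE_AFTER_INVOCATIONS = 1_000;
    static final int RECOMPILE_AFTER_SAMPLES = 30;

    // Next level up; 'scorching' is the top of the ladder.
    static Level next(Level level) {
        switch (level) {
            case INTERPRETED: return Level.WARM;
            case WARM:        return Level.HOT;
            case HOT:         return Level.PROFILING;
            default:          return Level.SCORCHING;
        }
    }

    // Per-method bookkeeping in this toy model.
    static final class MethodState {
        Level level = Level.INTERPRETED;
        int invocations;
        int samples;

        // Called on every invocation: triggers the first compilation.
        void onInvoke() {
            invocations++;
            if (level == Level.INTERPRETED && invocations >= COMPILE_AFTER_INVOCATIONS) {
                level = Level.WARM;
            }
        }

        // Called when the sampling thread finds this method on a stack.
        void onSample() {
            samples++;
            if (samples >= RECOMPILE_AFTER_SAMPLES) {
                samples = 0;
                level = next(level);
            }
        }
    }

    public static void main(String[] args) {
        MethodState m = new MethodState();
        for (int i = 0; i < COMPILE_AFTER_INVOCATIONS; i++) {
            m.onInvoke();
        }
        System.out.println(m.level); // WARM
        for (int i = 0; i < 3 * RECOMPILE_AFTER_SAMPLES; i++) {
            m.onSample();
        }
        System.out.println(m.level); // SCORCHING
    }
}
```

The point of the model is the shape of the ladder: cheap counting gets a method compiled once, and sampling decides whether it is worth the cost of recompiling at a higher level.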
Trends § Moore’s law and single-threaded performance § The multicore explosion § Multi-tenant: bare metal, virtualization & containers § The deployment strategy shift § Spectre: Caught passing bad checks § Cloud as an opportunity
http://www.fanpop.com/clubs/home-improvement-tv-show/images/33144922/title/wilson-wallpaper 19
Footprint and Startup SharedClasses cache 20
Footprint and Startup Classfile ROMClass J9RAMClass 21
ShareClasses: ROM pays off JVM 1 JVM 2 JVM 3 22
ShareClasses: ROM pays off JVM 1 JVM 2 JVM 3 Faster startup, Smaller footprint Shared Classes Cache 24
Balloon drivers
-XX:+IdleTuningGcOnIdle -XX:+IdleTuningCompactOnIdle “…around 30 percent of data center servers have not delivered information or computing services in the last six months.” https://developer.ibm.com/javasdk/2017/09/25/still-paying-unused-memory-java-app-idle/
Container Support § Container detection mechanism (-XX:+UseContainerSupport) – This allows the JVM to tune itself based on limits imposed by the container. § Increase default heap size when running in a container – The JVM is typically the only application in the container – Reserve a larger chunk of container memory for the heap § Better utilization of memory available to the container § -XX:MaxRAMPercentage and -XX:InitialRAMPercentage – The existing heap options -Xmx/-Xms accept only absolute values. – Cumbersome to adjust -Xmx/-Xms when the container memory limit can vary – Adjust the heap based on a percentage of the physical memory rather than an absolute value. § Adjust Runtime.availableProcessors() based on cgroup limits § Add container limits to the Javacore file
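One way to see these settings take effect is to print what the running JVM reports from inside a container. The sketch below uses only the standard `Runtime` API; the class name `ContainerLimits`, the helper `heapBytesForPercentage` and the 512 MiB / 75% figures are assumptions for illustration, and a real JVM may round or align the heap size it actually picks.

```java
public class ContainerLimits {
    // Plain percentage arithmetic: the share of a memory limit given to the heap.
    // Illustrative only; the JVM's own sizing may round or align the result.
    static long heapBytesForPercentage(long memoryLimitBytes, double percent) {
        return (long) (memoryLimitBytes * (percent / 100.0));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // With container support enabled, these values are expected to follow
        // the cgroup CPU and memory limits rather than the host's resources.
        System.out.println("availableProcessors = " + rt.availableProcessors());
        System.out.println("maxMemory (bytes)   = " + rt.maxMemory());

        // Hypothetical container: 512 MiB limit with 75% given to the heap.
        long limit = 512L * 1024 * 1024;
        System.out.println("75% of 512 MiB      = " + heapBytesForPercentage(limit, 75.0));
    }
}
```

Running the same class in containers with different memory limits, and with a fixed percentage option, shows why a percentage is easier to manage than a hard-coded -Xmx value.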
OpenJ9 JIT memory auto-scaling based on container limits • Docker container without swap space configured -> Exceed limit? Kaboom! • Docker container with swap space configured -> Exceed limit? Swap! Performance cost • JIT must be aware of container memory limits to avoid causing performance or functional instability! • JIT compiler scratch memory usage is auto-scaled based on container memory limit • Reduce number of compilation threads as container limit is approached • Lower optimization level of compiled methods as container limit is approached • Different sized containers running same application on OpenJ9 could behave differently • Ramp up to peak throughput could be slower in smaller containers • Peak memory usage could be lower in smaller containers
Trends § Moore’s law and single-threaded performance § The multicore explosion § Multi-tenant: bare metal, virtualization & containers § The deployment strategy shift § Spectre: Caught passing bad checks § Cloud as an opportunity
The new world 30
Continuous integration 31
Continuous deployment Checkout Compile Test Deploy 32
Key idea of cloud Making it faster and cheaper to get ideas into production with less risk 33
Cloud: horizontal scaling 34
Microservices 35
Serverless / Function as a Service https://github.com/apache/incubator-openwhisk/blob/master/docs/images/OpenWhisk.png 36
New World Characteristics Deployments: frequent, multiple times a day Startup: Larger fraction of uptime Lots of “first” runs: fewer opportunities to reuse work from previous runs 37
The JVM needs to change 38
“Dynamic” AOT through ShareClasses Shared Classes Cache ROM Classes AOT $ java -Xshareclasses ...
Shared cache enablement Enabled with -Xshareclasses • Population of the cache happens naturally & transparently at runtime • Applicable to boot, extension, & application loaders & all URLClassLoader subclasses • Helper APIs to enable non-URLClassLoaders (manual) • Disclaim the JIT and meta data after startup • Reduce footprint further • Debug data is only paged in as needed
New AOT ramp up and throughput on Liberty DayTrader 7: OpenJ9 default JIT vs -Xtune:virtualized with old AOT vs -Xtune:virtualized with new AOT • New AOT ramps up faster than the old AOT and stabilizes at higher throughput • New AOT cuts AOT vs JIT compiled code throughput delta roughly in half