H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon School of Electrical Engineering and Computer Science Seoul National University, Korea

Android apps are programmed in Java
Android uses the Dalvik VM (DVM) instead of the JVM to run Java
Some people believe that Android is successful partly due to DVM; is this really true?
How does DVM perform compared to JVM?
  • Evaluate both on the same board using the same benchmarks
How does DVM affect the performance of Android apps?
  • Analyze the runtime profile
Virtual Machine & Optimization Lab

Outline
  • Comparison of DVM and JVM
  • Evaluation of DVM and JVM
  • Evaluation of Android apps
  • Conclusion

DVM is the VM for executing Java on the Android platform
  • Runs the Java code in applications, the framework, and the core libraries
  • Executes dex files instead of the class files of the Java VM (JVM)
  • The dx tool converts class files to dex files
  • A dex file uses a different bytecode ISA

DVM has a register-based bytecode, while JVM has a stack-based bytecode

Java source code:
    public static int add(int a, int b) {
        int c = a + b;
        return c;
    }

JVM                 DVM
0: iload_0          0000: add-int v0, v1, v2
1: iload_1          0002: return v0
2: iadd
3: istore_2
4: iload_2
5: ireturn

The DVM interpreter is expected to be faster than the JVM's, because it executes fewer bytecodes and makes fewer operand accesses
  • According to Shi's "stack vs. register" paper [TACO'08]
  • DVM has two interpreters (an assembly version and a C version), while our JVM has a C version only
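To make the bytecode-count argument concrete, here is a minimal sketch of two toy interpreters running the add(a, b) example in both forms. The opcode names follow the listings above; the class name, the instruction encodings, and the dispatch counter are assumptions made for illustration and are not how either VM is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreters for add(a, b); encodings are hypothetical, not real JVM/DVM formats. */
final class StackVsRegisterDemo {
    static final int ADD_INT = 0; // hypothetical opcode numbers
    static final int RETURN = 1;

    static int dispatches; // bytecodes dispatched by the most recent run

    /** Stack-based form: six bytecodes that move operands through an operand stack. */
    static int runStack(int a, int b) {
        String[] code = {"iload_0", "iload_1", "iadd", "istore_2", "iload_2", "ireturn"};
        int[] locals = {a, b, 0};
        Deque<Integer> stack = new ArrayDeque<>();
        dispatches = 0;
        for (String op : code) {
            dispatches++;
            switch (op) {
                case "iload_0": stack.push(locals[0]); break;
                case "iload_1": stack.push(locals[1]); break;
                case "iload_2": stack.push(locals[2]); break;
                case "iadd": stack.push(stack.pop() + stack.pop()); break;
                case "istore_2": locals[2] = stack.pop(); break;
                case "ireturn": return stack.pop();
                default: throw new IllegalStateException(op);
            }
        }
        throw new IllegalStateException("no return executed");
    }

    /** Register-based form: add-int v0, v1, v2 followed by return v0. */
    static int runRegister(int a, int b) {
        // Each instruction is {opcode, dst, src1, src2}; the arguments arrive in v1 and v2.
        int[][] code = {{ADD_INT, 0, 1, 2}, {RETURN, 0, 0, 0}};
        int[] regs = {0, a, b};
        dispatches = 0;
        for (int[] insn : code) {
            dispatches++;
            switch (insn[0]) {
                case ADD_INT: regs[insn[1]] = regs[insn[2]] + regs[insn[3]]; break;
                case RETURN: return regs[insn[1]];
                default: throw new IllegalStateException("bad opcode " + insn[0]);
            }
        }
        throw new IllegalStateException("no return executed");
    }

    public static void main(String[] args) {
        int s = runStack(2, 3);
        System.out.println("stack:    result " + s + ", dispatches " + dispatches);
        int r = runRegister(2, 3);
        System.out.println("register: result " + r + ", dispatches " + dispatches);
    }
}
```

Both runs return the same sum, but the stack form dispatches six bytecodes and the register form two. Each register instruction carries more operand fields, which is the usual trade-off between the two designs.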

Higher performance requires just-in-time compilation (JITC), which translates bytecode to native code at runtime
Both VMs employ adaptive compilation
  • Interpret initially; when a hot spot is found, compile it
DVM's JIT compilation unit is a hot path called a trace, while JVM's is a hot method
  • Intended to give a lower memory footprint, yet competitive performance
  • But, the reality is …

[Figure: control-flow graph of basic blocks 1 to 7 containing a loop]
Interpret initially, counting at each trace entry
  • Trace entry: the target of a jump, or the bytecode following a trace
If counter > threshold, trace recording starts
Trace recording stops at a branch or a method call; the trace is then enqueued for JITC
A join basic block (BB) can be compiled multiple times
Chaining is used for control transfer at the end of a trace: chaining cells are added
  • [Jump to a VM internal function + address cache]
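The counting and recording steps above can be sketched roughly as follows. This is an illustration only: the class and method names, the threshold value, and the string representation of bytecodes are hypothetical and are not taken from the Dalvik sources. Chaining cells and code generation are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of hot-trace detection and recording; all names and values are hypothetical. */
final class HotTraceSketch {
    static final int THRESHOLD = 3; // assumed value, chosen only for the example

    private final Map<Integer, Integer> counters = new HashMap<>();
    private final List<List<String>> compileQueue = new ArrayList<>();

    /** Called by the interpreter at every trace entry; true means "start recording here". */
    boolean atTraceEntry(int pc) {
        int count = counters.merge(pc, 1, Integer::sum);
        return count > THRESHOLD;
    }

    /** Records bytecodes from pc up to and including the first branch or method call. */
    List<String> record(String[] code, int pc) {
        List<String> trace = new ArrayList<>();
        for (int i = pc; i < code.length; i++) {
            trace.add(code[i]);
            if (endsTrace(code[i])) {
                break;
            }
        }
        compileQueue.add(trace); // handed to the JIT compiler later
        return trace;
    }

    static boolean endsTrace(String op) {
        return op.startsWith("if-") || op.startsWith("goto") || op.startsWith("invoke");
    }

    int queued() {
        return compileQueue.size();
    }

    public static void main(String[] args) {
        // Loop body of a simple counting loop, one array element per bytecode.
        String[] loop = {"const/16", "if-ge", "add-int/2addr", "add-int/lit8", "goto"};
        HotTraceSketch jit = new HotTraceSketch();
        for (int iteration = 0; iteration < 10; iteration++) {
            if (jit.atTraceEntry(0) && jit.queued() == 0) {
                System.out.println("trace at 0: " + jit.record(loop, 0));
            }
        }
        System.out.println("trace at 2: " + jit.record(loop, 2));
    }
}
```

Because recording stops at every branch, even a tight loop is split into traces of two or three bytecodes in this sketch, which leaves little room for optimization inside a single trace.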

Code quality: traces are too short (~3 bytecodes)
  • Fewer optimizations, higher overhead from chaining cells
Preciseness of hot trace detection
  • Counters are shared among traces to reduce space
Register allocation
  • Virtual registers cannot be mapped to physical registers globally
    – v0=v0+v1 requires two loads (of v0 and v1) and a store to v0
These can affect both performance and memory negatively

Java source code:
    public static int factorial( ) {
        int result = 1;
        for(int i = 1 ; i < 10000 ; i++) {
            result = result * i;
        }
        return result;
    }

Dalvik bytecode:
    0000: const/4 v0, #int 1 // #1
    0001: move v1, v0
    0002: const/16 v2, #int 10000 // #2710
    0004: if-ge v0, v2, 000a // +0006
    0006: add-int/2addr v1, v0
    0007: add-int/lit8 v0, v0, #int 1 // #01
    0009: goto 0002 // -0007
    000a: return v1

Generated machine code (12 instructions generated):
    label1:
        // if-ge v0, v2, 000a
        LDR  R3, [RFP, #0]
        CMP  R3, R2
        BGE  label2
        // add-int/2addr v1, v0
        LDR  R0, [RFP, #4]
        LDR  R1, [RFP, #0]
        ADDS R0, R0, R1
        STR  R0, [RFP, #4]
        // add-int/lit8 v0, v0, #int 1
        ADDS R1, R1, #1
        // goto 0002
        STR  R2, [RFP, #8]
        STR  R0, [RFP, #4]
        STR  R1, [RFP, #0]
        B    label1
    label2: ……

Java source code:
    public static int factorial( ) {
        int result = 1;
        for(int i = 1 ; i < 10000 ; i++) {
            result = result * i;
        }
        return result;
    }

Java bytecode:
    0000: iconst_1
    0001: istore_0
    0002: iconst_1
    0003: istore_1
    0004: iload_1
    0005: sipush 10000
    0008: if_icmpge <21>
    0011: iload_0
    0012: iload_1
    0013: iadd
    0014: istore_0
    0015: iinc 1 1
    0018: goto <4>
    0021: iload_0
    0022: ireturn

Generated machine code (8 instructions generated):
    L2:
        // sipush 10000
        LDR v8, [pc, #+0]    @const 10000
        // if_icmpge <21>
        CMP v4, v8 LSL #0
        BGE L1
        // iload_0, iload_1, iadd
        ADD v3, v3, v4 LSL #0
        // istore_0
        STR v3, [rJFP, #-8]
        // iinc 1 1
        ADD v4, v4, #1
        STR v4, [rJFP, #-4]
        // goto <4>
        B L2
    L1: ……

Experimental setup
  • Tablet PC with ARM Cortex-A8 and 1GB memory
  • Android 2.3 Gingerbread on Linux 2.6.35
  • PhoneME advanced JVM (HotSpot) on Linux 2.6.32
  • EEMBC GrinderBench
DVM JITC generates Thumb2 code, while JVM JITC generates ARM code
  • Thumb2 reduces code size by 15%, performance by 6%

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Interpreter, DVM Interpreter, DVM C Interpreter]
The DVM assembly interpreter is faster than the JVM's, but its C interpreter is similar

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Dynamic Bytecode Count, DVM Dynamic Bytecode Count]
DVM executes 40% fewer bytecode instructions

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Dynamic Bytecode Size, DVM Dynamic Bytecode Size]
DVM requires a 60% larger program than the JVM to do the same job
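As a rough cross-check, the two dynamic measurements can be combined into an average size per executed bytecode instruction. The symbols below are introduced here for illustration, and the calculation assumes that the 40% and 60% figures are geometric means over the same benchmark runs.

```latex
% N = dynamic bytecode count, S = dynamic bytecode size (both relative to the JVM)
\frac{N_{\mathrm{DVM}}}{N_{\mathrm{JVM}}} \approx 0.6,
\qquad
\frac{S_{\mathrm{DVM}}}{S_{\mathrm{JVM}}} \approx 1.6
\quad\Longrightarrow\quad
\frac{S_{\mathrm{DVM}} / N_{\mathrm{DVM}}}{S_{\mathrm{JVM}} / N_{\mathrm{JVM}}}
\approx \frac{1.6}{0.6} \approx 2.7
```

Under that assumption, an executed DVM instruction is on average about 2.7 times as large as an executed JVM instruction.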

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM JITC, DVM JITC]
DVM with JITC is three times slower than JVM with JITC

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Compiled Bytecode Size, DVM Compiled Bytecode Size]
DVM compiles a smaller amount of bytecode because of its trace-based JITC

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: JVM Generated Code Size, DVM Generated Code Size]
DVM generates 35% more machine code than the JVM

How many times is a Dalvik bytecode translated redundantly?

Benchmark   Chess   kXML   Parallel   PNG    RegEx   Avg.
Ratio       1.18    1.08   1.15       1.15   1.13    1.13

How many instructions are generated for 1 byte of bytecode?
[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean, with the chaining cell overhead marked]
  • JVM: ~1.3 instructions per byte of JVM bytecode
  • DVM: ~2.7 instructions per byte of DVM bytecode = ~4.5 instructions per byte of JVM bytecode
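The slide does not say which size ratio was used to convert between the two units. Dividing the two DVM figures gives the implied factor, and dividing by the JVM figure gives the resulting gap. The symbols below are introduced here for illustration only.

```latex
% I = generated instructions; B = bytes of bytecode in the stated format
\frac{B_{\mathrm{DVM}}}{B_{\mathrm{JVM}}}
  \approx \frac{4.5}{2.7} \approx 1.7,
\qquad
\frac{(I/B_{\mathrm{JVM}})_{\mathrm{DVM}}}{(I/B_{\mathrm{JVM}})_{\mathrm{JVM}}}
  \approx \frac{4.5}{1.3} \approx 3.5
```

So, measured per byte of equivalent JVM bytecode, the DVM JITC emits roughly 3.5 times as many instructions as the JVM JITC.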

[Figure: two bar charts over Chess, kXML, Parallel, PNG, RegEx, and Geomean; left series: JVM Compile Time, DVM Compile Time; right series (in percent): JVM Compile Overhead, DVM Compile Overhead]
DVM compilation time is 4 times longer

[Figure: bar chart over Chess, kXML, Parallel, PNG, RegEx, and Geomean; series: DVM Original, DVM Trace Extension, DVM Trace Extension (Opt)]
Even if we extend the trace and add more optimizations, the impact is not high

Low code quality due to short traces and little optimization
  • Extending the trace would not help much
Little difference for the Jelly Bean JITC
  • A preliminary implementation of a naïve method-based JITC is included (but currently disabled)
One question: how come Android apps work fine?

Profile results based on OProfile
  • DVM portion (interpreter and JITC code)
  • Native portion (kernel+library and native app)
Run the apps for ~5 sec (since EEMBC runs ~5 sec)

Applications         Category         Running Details
AngryBirds           Game             Load the stage 1-1
DoodleJump           Game             Play for 5 seconds
Seesmic              SNS              Refresh facebook feed
Twitter              SNS              Refresh timeline
Astro File Manager   File Navigator   Search file system
Google Sky Map       Navigation       Navigate constellations

[Figure: stacked bar chart of execution time share, 0% to 100%; series: Native, Native app, DVM]
Fortunately, the DVM portion is much smaller, so a slower DVM affects the apps much less

[Figure: stacked bar chart, 0% to 100%; series: Interpreter (except GC), GC, JITC]

The garbage collection (GC) portion is way too high
  • GC for the benchmarks takes less than 2%
  • GC might be too frequent or might take too long
The JITC portion is much smaller than the interpreter's: why?
  • Fewer hot spots than in the benchmarks?
  • Lower reuse of JITC-generated code?

[Figure: chart on a log scale, axis from 1 to 1,000,000]
App loops iterate far fewer times than benchmark loops

[Figure: chart on a log scale, axis from 1,000 to 10,000,000]
App methods are called far fewer times than benchmark methods
