Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine

Ross McIlroy and Joe Sventek, University of Glasgow, Department of Computing Science. Carnegie Trust for the Universities of Scotland.


  1. Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine. Ross McIlroy and Joe Sventek, University of Glasgow, Department of Computing Science. Carnegie Trust for the Universities of Scotland.

  2. Heterogeneous Multi-Core Architectures
     • CPUs are becoming increasingly multi-core
     • Should these cores all be identical?
       - Specialise cores for particular workloads
       - Large core for sequential code, many small cores for parallel code
     • Found in specialist niches currently
       - e.g. network processors (Intel IXP), games consoles (Cell)
     • Likely to become more common
       - On-chip GPUs (AMD Fusion), Intel Larrabee

  3. Developing for HMAs [diagram: Application Threads]

  4. Developing for HMAs [diagram: Main Arch Code; Secondary Arch Code; Application Threads]

  5. Developing for HMAs [diagram: Main Arch Code; Secondary Arch Code; Main Core; Secondary Cores]

  6. Developing for HMAs [diagram: Main Arch Code; Secondary Arch Code; Support Code; Main Core; Secondary Cores]

  9. Developing for HMAs [diagram: Main Arch Code; Secondary Arch Code; Support Code; Libraries; main.o; secondary.o; Main Core; Secondary Cores]

  10. Hera-JVM
      • Hide this heterogeneity from the application developer
        - Present the illusion of a homogeneous multi-threaded virtual machine
        - The same code will run on either core type
      • Runtime system is aware of heterogeneous resources
        - Can transparently migrate threads between core types based upon this knowledge
      • Provide portable application behaviour hints to enable the runtime system to infer the application’s heterogeneity
        - Explicit code annotations
        - Static code analysis / typing information
        - Runtime monitoring / profiling
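As an illustration of the "explicit code annotations" hint, here is a minimal Java sketch. The `BehaviourHint` annotation, the `Behaviour` enum, and the `HintReader` helper are all hypothetical names invented for this example; the slides do not show Hera-JVM's actual annotation API. The point is only that a hint describes what the code does, not which core it should run on, which is what keeps it portable.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical behaviour categories; the names are illustrative only.
enum Behaviour { FLOATING_POINT, INTEGER, BRANCHING, SEQUENTIAL_MEMORY, RANDOM_MEMORY }

// Hypothetical hint annotation (an assumption, not Hera-JVM's real API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface BehaviourHint {
    Behaviour[] value();
}

class Mandelbrot {
    // The hint says what the method does, not where it should run.
    @BehaviourHint({Behaviour.FLOATING_POINT, Behaviour.SEQUENTIAL_MEMORY})
    static int iterations(double cr, double ci, int max) {
        double zr = 0.0, zi = 0.0;
        int n = 0;
        while (n < max && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            n++;
        }
        return n;
    }
}

class HintReader {
    // A runtime could look the hints up before deciding where to place a thread.
    // Returns an empty array when the method is missing or carries no hint.
    static Behaviour[] hintsFor(Class<?> owner, String methodName) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                BehaviourHint hint = m.getAnnotation(BehaviourHint.class);
                return hint == null ? new Behaviour[0] : hint.value();
            }
        }
        return new Behaviour[0];
    }
}
```

The same annotated method stays valid on a machine with different core types, because the mapping from behaviour to core is left to the runtime system.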

  11. Developing for Hera-JVM [diagram: Application Threads; Main Core; Secondary Cores]

  12. Developing for Hera-JVM [diagram: Application Threads labelled Branching Code, Sequential Memory Access, Integer, Float, Random Memory Access; Main Core; Secondary Cores]

  13. Developing for Hera-JVM [diagram: Application Threads labelled Branching Code, Sequential Memory Access, Integer, Float, Random Memory Access; Runtime System with Main Core Costs and Sec. Core Costs over Rand, Int, Float, Seq; Main Core; Secondary Cores]
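One way to read the diagram is that the runtime system keeps a cost table per core type and places each thread where its mix of operations is cheapest. The Java sketch below illustrates that reading only; the `OpKind`, `CoreType`, and `CorePlacement` names, the weighted-sum formula, and every cost value are assumptions made for this example, not figures or logic taken from Hera-JVM.

```java
import java.util.EnumMap;
import java.util.Map;

// Operation categories, loosely following the thread labels in the diagram.
enum OpKind { INTEGER, FLOAT, BRANCH, SEQ_MEMORY, RAND_MEMORY }

enum CoreType { MAIN, SECONDARY }

class CorePlacement {
    // Relative cost of each operation kind on each core type.
    // All numbers are invented for illustration; they are not measurements.
    static final Map<CoreType, Map<OpKind, Double>> COSTS = new EnumMap<>(CoreType.class);
    static {
        COSTS.put(CoreType.MAIN, costs(1.0, 1.0, 1.0, 1.0, 1.0));
        COSTS.put(CoreType.SECONDARY, costs(0.8, 0.5, 2.0, 1.0, 4.0));
    }

    private static Map<OpKind, Double> costs(double integer, double flt,
                                             double branch, double seq, double rand) {
        Map<OpKind, Double> m = new EnumMap<>(OpKind.class);
        m.put(OpKind.INTEGER, integer);
        m.put(OpKind.FLOAT, flt);
        m.put(OpKind.BRANCH, branch);
        m.put(OpKind.SEQ_MEMORY, seq);
        m.put(OpKind.RAND_MEMORY, rand);
        return m;
    }

    // Estimated cost: sum of (fraction of work of a kind) * (cost of that kind on the core).
    static double estimate(CoreType core, Map<OpKind, Double> profile) {
        double total = 0.0;
        for (Map.Entry<OpKind, Double> e : profile.entrySet()) {
            total += e.getValue() * COSTS.get(core).get(e.getKey());
        }
        return total;
    }

    // Pick the cheaper core type; ties stay on the main core.
    static CoreType choose(Map<OpKind, Double> profile) {
        return estimate(CoreType.SECONDARY, profile) < estimate(CoreType.MAIN, profile)
                ? CoreType.SECONDARY
                : CoreType.MAIN;
    }
}
```

With these made-up costs, a profile dominated by floating point work is placed on a secondary core, while one dominated by random memory access stays on the main core. The profile itself could come from any of the hint sources listed on slide 10.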

  21. Cell Processor

  22. A JVM for Two Architectures
      • Built upon JikesRVM
        - Java in Java
        - PowerPC and x86 support

  23. A JVM for Two Architectures [diagram: Application; Java Library; Runtime System; Low Level Assembly; PPE Compiler; PPE Assembler]

  24. A JVM for Two Architectures [diagram: Application; Java Library; Runtime System; Low Level Assembly; PPE Compiler; PPE Assembler; SPE Assembler]

  25. A JVM for Two Architectures [diagram: Application; Java Library; Runtime System; Low Level Assembly; PPE Compiler; Low Level Assembly; PPE Assembler; SPE Assembler]

  26. A JVM for Two Architectures [diagram: Application; Java Library; Runtime System; Low Level Assembly; PPE Compiler; Low Level Assembly; SPE Compiler; PPE Assembler; SPE Assembler]

  31. Migration
      • A thread can migrate between the PPE and SPE cores at any method invocation
        - Migration is triggered either by an explicit annotation or is signalled dynamically by the scheduler
        - Syscalls and native methods always migrate back to PPE
      • Migration from core type A to B:
        - Thread “traps” to support code on core A, which saves arguments
        - Method JITed for core type B if required
        - Migration marker and migration support frame pushed onto stack
        - Thread placed on ready queue of core type B
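The four migration steps above can be sketched as a small Java simulation. Everything here is a stand-in: `SimThread`, `MigrationSim`, the string-based stack frames, and the `"method@CORE"` compiled-code set are hypothetical names chosen for this example and do not reflect Hera-JVM's internal data structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

enum Core { PPE, SPE }

// Simplified thread: a current core, a stack of frame labels, and saved arguments.
class SimThread {
    final String name;
    Core current;
    final Deque<String> stack = new ArrayDeque<>();
    Object[] savedArgs;

    SimThread(String name, Core start) {
        this.name = name;
        this.current = start;
    }
}

class MigrationSim {
    final Map<Core, Deque<SimThread>> readyQueues = new EnumMap<>(Core.class);
    final Set<String> compiled = new HashSet<>(); // entries look like "method@CORE"

    MigrationSim() {
        for (Core c : Core.values()) {
            readyQueues.put(c, new ArrayDeque<>());
        }
    }

    // Migrate thread t to core type `target` at the invocation of `method`.
    void migrate(SimThread t, String method, Object[] args, Core target) {
        // 1. Thread traps to support code on its current core, which saves the arguments.
        t.savedArgs = args.clone();
        // 2. Method is compiled for the target core type if required
        //    (Set.add does nothing when the entry already exists).
        compiled.add(method + "@" + target);
        // 3. Migration marker and migration support frame are pushed onto the stack.
        //    Recording the origin core in the marker is an assumption of this sketch.
        t.stack.push("migration-marker:" + t.current);
        t.stack.push("migration-support-frame:" + method);
        // 4. Thread is placed on the ready queue of the target core type.
        t.current = target;
        readyQueues.get(target).addLast(t);
    }
}
```

A real implementation would also have to handle the return path, where the marker lets the thread migrate back when the method returns; that part is left out here.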

  32. SPE Local Memory
      • Instead of a cache, SPEs have 256 KB of explicitly accessible local memory
      • Main memory accessed through DMA using the MFC (Memory Flow Controller)
      • Setting up many small DMA transfers is costly
      [diagram: Main Memory; MFC; SPE; Local Memory]

  33. Software Caching in a High Level Language
      • Java bytecodes are typed; therefore, we have high level knowledge of what’s being cached
        - Cache an object completely when it is accessed
        - Cache arrays in 1 KB blocks
      • Java memory model only requires coherency operations at synchronisation points
      • Methods are cached in their entirety when invoked
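To make the array caching policy concrete, here is a minimal Java sketch of a software cache that fetches array data in 1 KB blocks. The `ArrayBlockCache` class is hypothetical: a plain `byte[]` stands in for main memory, a counter stands in for DMA transfers, and there is no eviction or write-back, so it shows the idea rather than how Hera-JVM implements it. Invalidating the whole cache at a synchronisation point is likewise a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

class ArrayBlockCache {
    static final int BLOCK_BYTES = 1024;        // block size from the slide

    private final byte[] mainMemory;            // stands in for main memory reached via DMA
    private final Map<Integer, byte[]> blocks = new HashMap<>();
    int dmaTransfers = 0;                       // counts simulated block fetches

    ArrayBlockCache(byte[] mainMemory) {
        this.mainMemory = mainMemory;
    }

    // Read one byte, fetching the enclosing 1 KB block on a miss.
    byte read(int address) {
        int blockIndex = address / BLOCK_BYTES;
        byte[] block = blocks.get(blockIndex);
        if (block == null) {
            block = fetchBlock(blockIndex);
            blocks.put(blockIndex, block);
        }
        return block[address % BLOCK_BYTES];
    }

    // One large transfer instead of many small ones.
    private byte[] fetchBlock(int blockIndex) {
        int start = blockIndex * BLOCK_BYTES;
        int length = Math.min(BLOCK_BYTES, mainMemory.length - start);
        byte[] block = new byte[BLOCK_BYTES];
        System.arraycopy(mainMemory, start, block, 0, length);
        dmaTransfers++;
        return block;
    }

    // At a synchronisation point, drop cached blocks so later reads see main memory again.
    void invalidate() {
        blocks.clear();
    }
}
```

Neighbouring reads within the same block cost a single simulated transfer, which is the motivation given on the previous slide: setting up many small DMA transfers is costly. Reads may return stale data until `invalidate()` is called, mirroring the point that coherency is only required at synchronisation points.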

  34. Hera-JVM Performance: Single Threaded [chart: SPE vs. PPE speedup]

  36. Hera-JVM Performance: Multi-Threaded (6 threads) [chart: 6 SPEs vs. PPE speedup]

  37. Proportion of Execution Time by Operation [chart]

  38. Data Cache Hit-Rate [chart]
