Automatic Generation of OpenCL Code for ARM Architectures


  1. Automatic Generation of OpenCL Code for ARM Architectures
     Sergio Afonso (safonsof@ull.es), Alejandro Acosta (aacostad@ull.es), Francisco Almeida (falmeida@ull.es)
     High Performance Computing Group: http://cap.pcg.ull.es/

  2. Introduction
     • Systems on Chip have become more powerful, driven by the growth of the smartphone market
     • Heterogeneity exists both between different devices and between the processors within each device
     • It is still difficult to write portable high-performance code for such heterogeneous platforms
     • There are tools that automatically produce accelerated code (OpenMP, OpenACC), but they are not designed for mobile application development
     • A wide range of computer vision and image processing applications on mobile devices benefit from parallel processing

  3. Android
     • Java: Main programming language of Android and easiest to program, for which an extensive range of tools is provided
     • Renderscript: A language for high performance computing in Android, which allows data parallelism
     • Native (JNI): Useful for reusing C/C++ code and for accessing vendor-specific native libraries
     [Figure: Android build toolchain. Renderscript source code (.rs) is compiled by llvm-rs-cc to LLVM bytecode (.bc); Java source code (.java) is compiled by the Java compiler to Java bytecode (.class) and by the Dex compiler to a Dalvik executable (.dex); native source code (.c) is compiled by GCC to JNI native libraries. Everything is bundled in the Android application package (.APK), which runs on LIBBCC (LLVM) and the Android Run Time (ART) on top of the hardware (CPUs, GPU, memory).]

  4. Paralldroid
     • It unifies all Android programming models: it can generate Renderscript, OpenCL and native code
     • It defines a set of annotations that fit a Java program more naturally than, e.g., OpenMP
     • It makes developing parallel methods substantially easier than writing plain OpenCL plus all the code needed to run it from Java
     • It is implemented as an extension to the OpenJDK compiler
     [Figure: Paralldroid in the Android toolchain. Java source code (.java) enters Paralldroid (Renderscript generator, native / OpenCL generators, code refactoring), which emits Renderscript (.rs), Java (.java) and native (.c) source code. These feed the llvm-rs-cc, Java plus Dex, and GCC compilers, and the results are packaged in the Android application package (.APK) running on LIBBCC (LLVM) and the Android Run Time (ART) over the hardware (CPUs, GPU, memory).]

  5. Paralldroid
     [Figure: Paralldroid internal pipeline. The OpenJDK Java parser turns Java code into a Java AST, and an annotations detector routes it to one of three back ends. OpenCL: an OCL tree translator yields a native AST (native code) and an OpenCL AST (OCL kernel code), while a Java AST translator yields the Java code. Native: a native tree translator yields a native AST (native code), while a Java AST translator yields the Java code. Renderscript: an RS tree translator yields an RS AST (RS code), while a Java AST translator yields the Java code.]
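
The annotation-detection step in the pipeline above can be illustrated in isolation. The slides state that Paralldroid works at compile time on the OpenJDK AST; the sketch below swaps in runtime reflection instead, only to show what "finding annotated methods and index parameters" means. The annotation declarations `Parallel` and `Index`, the `Kernels` class and the `AnnotationDetector` helper are all hypothetical stand-ins, not Paralldroid's actual definitions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a "run this method in parallel" marker.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Parallel {}

// Hypothetical stand-in for a "this parameter is an iteration index" marker.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Index {}

// Hypothetical class with one annotated method and one plain method.
class Kernels {
    @Parallel
    public void scale(float[] data, @Index int i) {
        data[i] *= 2.0f;
    }

    public void helper() {}
}

class AnnotationDetector {
    /** Lists each @Parallel method of cls with its number of @Index parameters. */
    static List<String> findParallelMethods(Class<?> cls) {
        List<String> found = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.isAnnotationPresent(Parallel.class)) {
                continue; // not a candidate for code generation
            }
            int indexParams = 0;
            for (Parameter p : m.getParameters()) {
                if (p.isAnnotationPresent(Index.class)) {
                    indexParams++;
                }
            }
            found.add(m.getName() + " indexParams=" + indexParams);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findParallelMethods(Kernels.class));
    }
}
```

A compile-time tool would collect the same information from the syntax tree rather than from loaded classes, which is what lets it emit kernel and glue code before the application is built.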

  6. Example

         @Target(OPENCL)
         public class GrayScale {
             @Declare
             private float gMonoMult[] = {0.299f, 0.587f, 0.114f};

             @Map(TO) private int width;
             @Map(TO) private int height;

             public GrayScale(int width, int height) {
                 this.width = width;
                 this.height = height;
             }

             @Parallel
             public void run(@Input Bitmap src, @Output Bitmap out,
                             @Index int x, @Index int y) {
                 int pixel = src.getPixel(x, y);
                 int acc;
                 acc = (int)(Color.red(pixel) * gMonoMult[0]);
                 acc += (int)(Color.green(pixel) * gMonoMult[1]);
                 acc += (int)(Color.blue(pixel) * gMonoMult[2]);
                 out.setPixel(x, y, Color.argb(Color.alpha(pixel), acc, acc, acc));
             }
         }
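
To make the per-pixel semantics of the annotated `run` method concrete, here is a minimal plain-Java sketch of the same computation with no Android or Paralldroid dependency. It assumes pixels are packed as 32-bit ARGB integers in a row-major array, which replaces the `Bitmap` and `Color` calls; the class `GrayScaleReference` and its methods are hypothetical names, and the parallel stream over rows only stands in for whatever the generated OpenCL kernel actually does.

```java
import java.util.stream.IntStream;

/** Plain-Java reference for the GrayScale example (illustrative, not generated code). */
class GrayScaleReference {
    // Same luminance weights as gMonoMult in the slide.
    static final float[] G_MONO_MULT = {0.299f, 0.587f, 0.114f};

    /** Mirrors the body of run(...) for one packed ARGB pixel. */
    static int grayPixel(int pixel) {
        int a = (pixel >>> 24) & 0xFF;
        int r = (pixel >> 16) & 0xFF;
        int g = (pixel >> 8) & 0xFF;
        int b = pixel & 0xFF;
        // Each weighted channel is truncated separately, as in the slide.
        int acc = (int) (r * G_MONO_MULT[0]);
        acc += (int) (g * G_MONO_MULT[1]);
        acc += (int) (b * G_MONO_MULT[2]);
        return (a << 24) | (acc << 16) | (acc << 8) | acc;
    }

    /** Applies grayPixel to every (x, y); rows are independent, so they run in parallel. */
    static int[] apply(int[] src, int width, int height) {
        int[] out = new int[width * height];
        IntStream.range(0, height).parallel().forEach(y -> {
            for (int x = 0; x < width; x++) {
                out[y * width + x] = grayPixel(src[y * width + x]);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        int[] src = {0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0x80FFFFFF};
        for (int p : apply(src, 2, 2)) {
            System.out.printf("%08X%n", p);
        }
    }
}
```

Each output pixel depends only on the input pixel at the same coordinates, which is why the two `@Index` parameters are enough to describe the whole iteration space.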

  7. Computational results
     [Figure: four speed-up plots (speed-up axis from 0 to 20) comparing generated OpenCL (Gen. OCL) and generated Renderscript (Gen. RS) code on the SXZ and XU3 devices. Top row: speed-up per algorithm (GrayScale, Levels, Convolve3x3, Convolve5x5) for 640x480 and 1920x1080 images. Bottom row: speed-up of GrayScale and of Convolve5x5 over image dimensions (640x480, 1024x768, 1280x720, 1366x768, 1600x900, 1920x1080 px).]

  8. Conclusion
     • Paralldroid eases the acceleration of Android applications by means of Java annotations and AST (Abstract Syntax Tree) transformations
     • Our annotations are familiar to developers who know OpenMP, but they are also adapted to the object-oriented programming paradigm
     • Each Java class in an application can be implemented using a different programming language, so that each algorithm can transparently run using the one that provides the best results
     • We are currently working on improving the performance of the OpenCL backend
     Acknowledgement: This work was supported by the Spanish Ministry of Education and Science through the TIN2011-24598 and TIN2016-78919-R projects, the CAPAP-H network, the NESUS IC1315 COST Action and the EC (ERDF)
