Accelerating NNEF Framework on OpenCL Devices Using clDNN
Meng-Shiun Yu, Tai-Liang Chen, and Jenq-Kuen Lee
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
{msyu, tlchen}@pllab.cs.nthu.edu.tw, jklee@cs.nthu.edu.tw
IWOCL 2020 - The 8th International Workshop on OpenCL
Agenda
• Overview
• Design of the Software Stack
• Experimental Results
Background
• NNEF - Neural Network Exchange Format: an open-specification intermediate representation with well-defined semantics for exchanging trained neural networks between vision/AI training frameworks and inferencing runtimes.
[Figure: trained networks from vision/AI applications are exported to NNEF and consumed by inferencing runtimes on CPU and GPU devices]
Overview
[Figure: software stack — training frameworks export models through the NNEF Converter into NNEF; our Translator maps the NNEF graph onto clDNN, which executes on Intel HD Graphics]
The Flow for NNEF Enabled in clDNN with OpenCL
1. A model (e.g., Mobilenet_v1) is exported from an AI framework (TensorFlow, Caffe, PyTorch, …) into NNEF files (graph.nnef, kernel.dat).
2. The NNEF-Tools parser walks the graph through the beginGraph(…), operation(…), and endGraph(…) callbacks.
3. clDNN constructs the topology: initialize the engine, add each operator into the topology, build the network, and set up the input for inference.
4. The neural network is compiled, distributed to OpenCL kernels, and executed to produce the inferencing results.
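Put together, a minimal driver for this flow might look like the sketch below. It assumes the Operation records produced by the NNEF-Tools parser callbacks and the cldnn_add_operation / cldnn_execute helpers shown on the following slides; the clDNN header paths vary across releases.

#include <vector>
#include <api/CPP/engine.hpp>    // clDNN C++ API (header layout differs per clDNN release)
#include <api/CPP/topology.hpp>

// Sketch: one Operation record per NNEF op, collected by the parser's
// beginGraph(...) / operation(...) / endGraph(...) callbacks.
void run_nnef_on_cldnn(std::vector<Operation> &ops)
{
    cldnn::engine engine;       // creates the OpenCL context on the target GPU
    cldnn::topology topology;   // clDNN primitives are accumulated here

    for (auto &op : ops)                           // operation(...) callback, one per NNEF op
        cldnn_add_operation(engine, topology, op); // translate into clDNN primitives

    cldnn_execute(engine, topology);               // build the network and run inference
}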
NNEF Interpreter

void cldnn_add_operation(cldnn::engine &engine, cldnn::topology &topology,
                         Operation operation)
{
    auto id = operation.outputs.get(0).identifier();

    // Remember every operation by its output identifier so that later
    // primitives (e.g. convolution) can look up their producers.
    static map<string, Operation> op_dict;
    op_dict[id] = operation;

    if ("external" == operation.name) {            /* input node */
        add_input_node(engine, topology, operation);
    } else if ("variable" == operation.name) {     /* weight / bias data */
        add_data_node(engine, topology, operation);
    } else if ("conv" == operation.name) {
        add_op_conv(engine, topology, operation, op_dict);
    } else if ("add" == operation.name) {
        add_op_add(engine, topology, operation);
    }
    /* … */
    else {
        std::cout << "unsupported op: " << operation.name << std::endl;
    }
}
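The add_input_node helper dispatched above is not shown on the slides; a minimal sketch of what it could look like with clDNN's input_layout primitive follows. The fixed 1x3x224x224 bfyx shape for Mobilenet_v1 is an assumption; a full translator would read it from the NNEF external op's shape attribute.

// Hedged sketch: register an NNEF "external" tensor as a clDNN network input.
static void add_input_node(cldnn::engine &engine, cldnn::topology &topology,
                           Operation &operation)
{
    // The external op's output identifier becomes the clDNN primitive id,
    // so later primitives (conv, add, ...) can reference it by name.
    string id = operation.outputs.get(0).identifier();

    // Assumed fixed Mobilenet_v1 input shape: batch=1, feature=3, y=224, x=224.
    layout in_layout(data_types::f32, format::bfyx, {1, 3, 224, 224});

    // input_layout is a placeholder that is bound to real data later via
    // network.set_input_data("input", ...), as in cldnn_execute below.
    topology.add(input_layout(id, in_layout));

    (void)engine;  // unused here; kept to match the other add_* helpers
}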
NNEF Interpreter

static void add_op_conv(cldnn::engine &engine, cldnn::topology &topology,
                        Operation &operation, map<string, Operation> op_dict,
                        struct op_shape &shape_info)
{
    // Identifiers of the NNEF tensors involved in this convolution.
    string output = operation.outputs.get(0).identifier();
    string input  = operation.inputs.get(0).identifier();
    string weight = operation.inputs.get(1).identifier();

    // Read the NNEF attributes (stride, padding, dilation, …).
    auto stride_shape = operation.attribs.get("stride"). …

    // Repack the attributes into clDNN tensors (bfyx ordering).
    vector<int> dia_v{dia_h, dia_w};                tensor dia_ts(dia_v);
    vector<int> stride{1, 1, stride_h, stride_w};   tensor stride_ts(stride);
    vector<int> pad_v{0, 0, padding_h, padding_w};  tensor pad_ts(pad_v);
    ...
    auto conv_op = convolution(name, input, {weight}, {bias_name},
                               stride_ts, pad_ts, dia_ts, false, 1.0, last_pad_ts);
    topology.add(conv_op);
}
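Similarly, the add_op_add helper is not shown. A plausible sketch maps NNEF's elementwise add onto clDNN's eltwise primitive in sum mode; the exact eltwise constructor signature depends on the clDNN release, so treat this as an assumption.

// Hedged sketch: NNEF "add" (elementwise addition) -> clDNN eltwise(sum).
static void add_op_add(cldnn::engine &engine, cldnn::topology &topology,
                       Operation &operation)
{
    string output = operation.outputs.get(0).identifier();
    string lhs    = operation.inputs.get(0).identifier();
    string rhs    = operation.inputs.get(1).identifier();

    // eltwise_mode::sum adds the two named input tensors element by element.
    topology.add(eltwise(output, lhs, rhs, eltwise_mode::sum));

    (void)engine;  // unused here; kept for a uniform add_* helper signature
}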
NNEF Interpreter

void cldnn_execute(cldnn::engine &engine, cldnn::topology &topology)
{
    // Load and preprocess the input image into a flat float tensor.
    vector<float> ftensor;
    load_image(input_img, ftensor);

    // Building the network compiles the topology into OpenCL kernels.
    network network(engine, topology);

    // Copy the input tensor into device memory and bind it to the "input" node.
    layout in_layout(data_types::f32, format::bfyx, {1, 3, 224, 224});
    memory input_mem = memory::allocate(engine, in_layout);
    set_values(input_mem, move(ftensor));
    network.set_input_data("input", input_mem);

    // Execute the network and map the "output" buffer back to the host.
    auto outputs = network.execute();
    auto output_ptr = outputs.at("output").get_memory().pointer<float>();
    ...
}
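The output_ptr obtained above maps the OpenCL result buffer into host memory. For a classifier such as Mobilenet_v1, a simple argmax over the score vector gives the predicted label; a small sketch, assuming a 1000-class output:

#include <cstddef>
#include <iostream>

// Hedged sketch: pick the top-1 class from the mapped output buffer.
// Works with any indexable pointer-like type, e.g. the float pointer above.
template <typename Ptr>
void print_top1(const Ptr &output_ptr, std::size_t num_classes = 1000)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < num_classes; ++i)
        if (output_ptr[i] > output_ptr[best])
            best = i;
    std::cout << "top-1 class: " << best
              << ", score: " << output_ptr[best] << std::endl;
}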
Experimental Environment
Hardware:
• Intel Core i7-7700 CPU @ 3.60 GHz
• Intel HD Graphics 630 (integrated GPU)
Software:
• clDNN 2019 R2
• OpenCL 2.1
• NNEF parser v1.0
Experimental Results
[Figure: performance comparison chart from the slides; not reproduced here]
Conclusion
• We proposed a translator that accelerates NNEF models on OpenCL devices via clDNN.
• The experimental results show that execution efficiency improves by about six times.