



Deep Learning on FPGAs: Past, Present, and Future

Griffin Lacey, Graham Taylor, Shawki Areibi
University of Guelph, 50 Stone Rd E, Guelph, Ontario
laceyg@uoguelph.ca, gwtaylor@uoguelph.ca, sareibi@uoguelph.ca

arXiv:1602.04283v1 [cs.DC] 13 Feb 2016

ABSTRACT

The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Instead of engineering algorithms by hand, the ability to learn composable systems automatically from massive amounts of data has led to ground-breaking performance in important domains such as computer vision, speech recognition, and natural language processing. The most popular class of techniques used in these domains is called deep learning, and is seeing significant attention from industry. However, these models require incredible amounts of data and compute power to train, and are limited by the need for better hardware acceleration to accommodate scaling beyond current data and model sizes. While the current solution has been to use clusters of graphics processing units (GPU) as general purpose processors (GPGPU), the use of field programmable gate arrays (FPGA) provides an interesting alternative. Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typically used in the deep learning community, making FPGAs more accessible to those who build and deploy models. Since FPGA architectures are flexible, this could also allow researchers the ability to explore model-level optimizations beyond what is possible on fixed architectures such as GPUs. As well, FPGAs tend to provide high performance per watt of power consumption, which is of particular importance for application scientists interested in large scale server-based deployment or resource-limited embedded applications. This review takes a look at deep learning and FPGAs from a hardware acceleration perspective, identifying trends and innovations that make these technologies a natural fit, and motivates a discussion on how FPGAs may best serve the needs of the deep learning community moving forward.

1. INTRODUCTION

The effects of machine learning on our everyday life are far-reaching. Whether you are clicking through personalized recommendations on websites, using speech to communicate with your smart-phone, or using face-detection to get the perfect picture on your digital camera, some form of artificial intelligence is involved. This new wave of artificial intelligence is accompanied by a shift in philosophy for algorithm design. Where past attempts at learning from data involved much "feature engineering" by hand using expert domain-specific knowledge, the ability to learn composable feature extraction systems automatically from massive amounts of example data has led to ground-breaking performance in important domains such as computer vision, speech recognition, and natural language processing. The study of these data-driven techniques is called deep learning, and is seeing significant attention from two important groups of the technology community: researchers, who are interested in exploring and training these models to achieve top performance across tasks, and application scientists, who are interested in deploying these models for novel, real world applications. However, both of these groups are limited by the need for better hardware acceleration to accommodate scaling beyond current data and algorithm sizes.

The current state of hardware acceleration for deep learning is largely dominated by using clusters of graphics processing units (GPU) as general purpose processors (GPGPU) [18]. GPUs have orders of magnitude more computational cores compared to traditional general purpose processors (GPP), and allow a greater ability to perform parallel computations. In particular, the NVIDIA CUDA platform for GPGPU programming is most dominant, with major deep learning tools utilizing this platform to access GPU acceleration [16, 26, 13, 19]. More recently, the open parallel programming standard OpenCL has gained traction as an alternative tool for heterogeneous hardware programming, with interest from these popular tools gaining momentum. OpenCL, while trailing CUDA in terms of support in the deep learning community, has two unique features that distinguish it from CUDA. The first is the open source, royalty-free standard for development, as opposed to the single vendor support of CUDA. The second is the support for a wide variety of alternative hardware including GPUs, GPPs, field programmable gate arrays (FPGA), and digital signal processors (DSP).

1.1 The Case for FPGAs

The imminent support for alternative hardware is especially important for FPGAs, a strong competitor to GPUs for algorithm acceleration. Unlike GPUs, these devices have a flexible hardware configuration, and often provide better performance per watt than GPUs for subroutines important to deep learning, such as sliding-windows computation [24]. However, programming of these devices requires hardware specific knowledge that many researchers and application scientists may not possess, and as such, FPGAs have often been considered a specialist architecture. Recently, FPGA tools have adopted software-level programming models, including OpenCL, which has made them a more attractive option for users trained in mainstream software development practices.
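
To make the sliding-windows computation mentioned in Section 1.1 more concrete, the following is a minimal Python sketch of one common instance in deep learning, a 2D convolution-style operation. It is an illustration only and is not taken from the paper or from [24]; the function name `sliding_window_2d` and the list-of-lists data layout are assumptions chosen for readability.

```python
def sliding_window_2d(image, kernel):
    """Slide `kernel` over `image` and return the sum of products at each
    position (valid-mode cross-correlation, no padding, stride 1).

    Hypothetical helper for illustration; inputs are lists of lists of numbers.
    """
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])

    output = []
    # Each (row, col) pair selects one window of the image.
    for row in range(image_h - kernel_h + 1):
        out_row = []
        for col in range(image_w - kernel_w + 1):
            acc = 0
            # Multiply-accumulate over the window.
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[row + i][col + j] * kernel[i][j]
            out_row.append(acc)
        output.append(out_row)
    return output


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 0],
              [0, 1]]
    print(sliding_window_2d(image, kernel))
```

Each output value depends only on its own window of the input, so the windows can in principle be computed independently. That independence is what makes this kind of subroutine a candidate for parallel hardware, whether GPU or FPGA.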
