Towards General-Purpose Neural Network Computing
Schuyler Eldridge (1), Amos Waterland (2), Margo Seltzer (2), Jonathan Appavoo (3), Ajay Joshi (1)
(1) Boston University, Department of Electrical and Computer Engineering
(2) Harvard University, School of Engineering and Applied Sciences
(3) Boston University, Department of Computer Science
24th International Conference on Parallel Architectures and Compilation Techniques (PACT '15)
Why Do We Care About Neural Networks?
- “Good” solutions for hard problems
- Capable of learning
Neural networks, again?
- The neural network hype cycle has been a bumpy ride
- Modern, resurgent interest in neural networks is driven by:
  - Big, real-world data sets
  - “Free” availability of transistors
  - Use of accelerators
  - The need for continued performance improvements
(Figure: a feedforward neural network with input, hidden, and output layers and bias nodes)
Neural Network Computing is Hot (Again)
Existing approaches
- Dedicated neural network/vector processors from the 1990s [1]
- Ongoing NPU work for approximate computing [2, 3, 4]
- High-performance deep neural network architectures [5, 6]
Neural networks as primitives
- We treat neural networks as an application primitive

[1] J. Wawrzynek et al., "Spert-II: A vector microprocessor system," Computer, vol. 29, no. 3, pp. 79-86, Mar. 1996.
[2] H. Esmaeilzadeh et al., "Neural acceleration for general-purpose approximate programs," in MICRO, 2012.
[3] R. St. Amant et al., "General-purpose code acceleration with limited-precision analog computation," in ISCA, 2014.
[4] T. Moreau et al., "SNNAP: Approximate computing on programmable SoCs via neural acceleration," in HPCA, 2015.
[5] T. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in ASPLOS, 2014.
[6] Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," in ISCA, 2015.
Our Vision of the Future of Neural Network Computing
- Driving applications: approximate computing [1], automatic parallelization [2], and machine learning
(Figure: processes 1..N running on an operating system share a multicontext/multithreaded neural network accelerator through a user/supervisor interface)

[1] H. Esmaeilzadeh et al., "Neural acceleration for general-purpose approximate programs," in MICRO, 2012.
[2] A. Waterland et al., "ASC: Automatically scalable computation," in ASPLOS, 2014.
Our Contributions Towards this Vision
X-FILES: Hardware/Software Extensions
- Extensions for the Integration of Machine Learning in Everyday Systems
- A defined user and supervisor interface for neural networks
- This includes supervisor architectural state (hardware)
DANA: A Possible Multi-Transaction Accelerator
- Dynamically Allocated Neural Network Accelerator
- An accelerator aligning with our multi-transaction vision
I apologize for the names
- There is no association with files or filesystems
- X-FILES is plural (like "extensions")
An Overview of X-FILES/DANA Hardware
(Figure: N general-purpose cores, each with an ASID register and L1 data cache, connect to the X-FILES Arbiter, which holds a Transaction Table/Queue with ASID, TID, NNID, and State fields, the ASID-NNID Table Pointer and Num ASIDs registers, and an ASID-NNID Table Walker with a memory interface to the L2 cache; the Arbiter drives DANA, which contains control logic, a Register File, a PE Table with processing elements PE-1..PE-N, and an NN Configuration Cache)
Components
- General-purpose cores
- Transaction storage
- A backend accelerator that “executes” transactions
- Supervisor resources for memory safety
- Dedicated memory interface
At the User Level We Deal With “Transactions”
Neural Network Transactions
- A transaction encapsulates a request by a process to compute the output of a specific neural network for a provided input
User Transaction API (core hardware to X-FILES Arbiter): newWriteRequest, writeData, readDataPoll (a sketch follows below)
Identifiers
- NNID: Neural Network ID
- TID: Transaction ID
Core/Accelerator Interface
- We use the RoCC interface of the Rocket RISC-V microprocessor [1, 2]

[1] A. Waterman et al., "The RISC-V instruction set manual, volume I: User-level ISA, version 2.0," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2014-54, May 2014.
[2] A. Waterman et al., "The RISC-V instruction set manual, volume II: Privileged architecture, version 1.7," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-49, May 2015.
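To make the transaction flow concrete, here is a minimal C sketch of how a user process might drive the three API calls named above. The wrapper signatures, types, and polling convention are assumptions for illustration; only the call names newWriteRequest, writeData, and readDataPoll come from the slide, and in practice these map to RoCC custom instructions rather than portable C functions.

```c
#include <stdint.h>
#include <stddef.h>

typedef int32_t nnid_t;  /* Neural Network ID */
typedef int32_t tid_t;   /* Transaction ID    */

/* Hypothetical wrappers around the RoCC instructions exposed by the X-FILES Arbiter. */
extern tid_t newWriteRequest(nnid_t nnid);                       /* start a transaction, returns a TID */
extern void  writeData(tid_t tid, const int32_t *in, size_t n);  /* stream the input vector */
extern int   readDataPoll(tid_t tid, int32_t *out, size_t n);    /* poll; nonzero once outputs are ready */

/* Compute one feedforward pass of network `nnid` on `in`, writing `out`. */
static void nn_infer(nnid_t nnid, const int32_t *in, size_t n_in,
                     int32_t *out, size_t n_out) {
  tid_t tid = newWriteRequest(nnid);   /* Arbiter allocates a Transaction Table entry */
  writeData(tid, in, n_in);            /* inputs land in the transaction's IO memory */
  while (!readDataPoll(tid, out, n_out))
    ;                                  /* spin until the accelerator finishes the output layer */
}
```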
At the Supervisor Level We Deal With Address Spaces
Use cases:
- Single transaction
- Multiple transactions
- Sharing of networks
- Multiple networks
(Figure: application processes 1..N running on the operating system share the multicontext/multithreaded NN accelerator through the user/supervisor interface)
- We maintain the state of executing transactions
- We group transactions into Address Spaces
- Address Spaces are identified by an OS-managed ASID
- Each ASID defines the set of accessible networks
- Networks can be shared transparently if the OS allows this
An ASID-NNID Table Enables NNID Dereferencing
(Figure: the ASID-NNID Table Pointer and Num ASIDs registers locate a per-ASID table; each entry holds a pointer to that ASID's NNID table, the number of NNIDs, and a pointer to an IO queue of ring buffers with status/header, *input, and *output fields; each NNID entry points to an NN configuration with header, layers, neurons, and weights regions)
ASID-NNID Table
- The OS establishes and maintains the ASID-NNID Table
- We assign ASIDs and NNIDs sequentially
- The ASID-NNID Table contains an optional asynchronous memory interface
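The following C sketch shows one plausible in-memory layout for the ASID-NNID Table as described on this slide. Field names, widths, and the ring-buffer layout are assumptions; only the overall shape (per-ASID entries holding an NNID-indexed array of configuration pointers, an NNID count, and an IO queue of ring buffers) comes from the slide.

```c
#include <stdint.h>

/* Hypothetical ring buffer used by the optional asynchronous memory interface. */
struct io_ring_buffer {
  uint64_t status_header;   /* status/header word */
  uint64_t *input;          /* pointer to input data */
  uint64_t *output;         /* pointer to output data */
};

/* One entry per ASID, indexed by the OS-assigned ASID. */
struct asid_nnid_entry {
  void **nn_configs;               /* nn_configs[nnid] points to a packed NN configuration */
  uint64_t num_nnids;              /* number of valid NNIDs for this ASID */
  struct io_ring_buffer *io_queue; /* asynchronous IO queue (optional) */
};

/* Supervisor state: the OS writes the table's base address and size into the
 * accelerator's ASID-NNID Table Pointer and Num ASIDs registers. */
struct asid_nnid_table {
  struct asid_nnid_entry *entries;
  uint64_t num_asids;
};
```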
A Compact Binary Neural Network Configuration
(Figure: the packed configuration has an Info region (binaryPoint, totalLayers, totalNeurons, totalEdges), a Layers region (per layer: a pointer to its first neuron, neuronsInLayer, neuronsInNextLayer), a Neurons region (per neuron: weightsPtr, numberOfWeights, activationFunction, steepness, bias), and a Weights region holding each neuron's weight list)
- We condense the normal FANN neural network data structure
- We use a reduced configuration from the Fast Artificial Neural Network (FANN) library [1] containing:
  - Global information
  - Per-layer information
  - Per-neuron information
  - Per-neuron weights
A sketch of one possible encoding follows below.

[1] S. Nissen, "Implementation of a fast artificial neural network library (FANN)," Department of Computer Science, University of Copenhagen (DIKU), Tech. Rep., 2003.
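A minimal C sketch of how such a packed configuration might be laid out, using the field names visible in the slide's figure. Field widths, ordering, and the fixed-point encoding implied by binaryPoint are assumptions for illustration, not the paper's actual binary format.

```c
#include <stdint.h>

/* Global information ("Info" region). */
struct nn_info {
  uint16_t binary_point;   /* fixed-point position shared by weights and activations */
  uint16_t total_layers;
  uint32_t total_neurons;
  uint32_t total_edges;
};

/* Per-layer information ("Layers" region). */
struct nn_layer {
  uint32_t neuron0_ptr;          /* offset of this layer's first neuron record */
  uint16_t neurons_in_layer;
  uint16_t neurons_in_next_layer;
};

/* Per-neuron information ("Neurons" region). */
struct nn_neuron {
  uint32_t weights_ptr;          /* offset of this neuron's weight list */
  uint16_t number_of_weights;
  uint8_t  activation_function;
  uint8_t  steepness;
  int32_t  bias;
};

/* The "Weights" region is a flat array of fixed-point weights, e.g.
 * int32_t weights[total_edges], reached through each neuron's weights_ptr. */
```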
DANA: An Example Multi-Transaction Accelerator
(Figure: the X-FILES Arbiter's Transaction Table and per-transaction IO memories feed DANA, which contains control logic, a Register File, a PE Table with processing elements PE-1..PE-N, and an NN Configuration Cache whose entries are backed by cache memories)
Components
- Control logic determines actions given transaction state
- Network configurations are stored in a Configuration Cache
- Per-transaction IO Memory stores inputs and outputs
- A Register File stores intermediate outputs
- Logical neurons are mapped to Processing Elements
DANA: Single Transaction Execution
(Figure: a single transaction's network, identified by one ASID/NNID, is mapped onto DANA; its configuration occupies one cache memory, its neurons are assigned to PE1-PE4, and intermediate results flow through the Register File and the per-transaction IO memory)
DANA: Multi-Transaction Execution
(Figure: two transactions, TID-1 and TID-2, execute concurrently; each has its own ASID/NNID, configuration cache memory, IO memory entries (I-1, I-2), and register file entries (R-1..R-3), while their neurons share the pool of processing elements PE1-PE4)
We Organize All Data in Blocks of Elements
(Figure: a 4-elements-per-block layout packs elements 1-4 into one block; an 8-elements-per-block layout packs elements 1-8)
Blocks for temporal locality
- We exploit the temporal locality of neural network data
- Here, data refers to inputs or weights
- Larger block widths reduce inter-module communication
- Block width is an architectural parameter
A sketch of this packing follows below.
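A small C sketch of the block abstraction: a compile-time block-width parameter and a helper that copies a contiguous run of elements (inputs or weights) into one block. The names and the 32-bit element type are assumptions; the slide only states that block width is an architectural parameter and that wider blocks mean less inter-module communication.

```c
#include <stdint.h>
#include <string.h>

/* Architectural parameter: elements per block (e.g., 4 or 8 in the slide). */
#ifndef ELEMENTS_PER_BLOCK
#define ELEMENTS_PER_BLOCK 4
#endif

typedef int32_t element_t;                     /* fixed-point input or weight */
typedef element_t block_t[ELEMENTS_PER_BLOCK]; /* one block moved between modules */

/* Pack up to `count` consecutive elements into a block, zero-padding the rest.
 * Wider blocks mean fewer of these transfers between DANA's modules. */
static void pack_block(block_t dst, const element_t *src, unsigned count) {
  unsigned n = count < ELEMENTS_PER_BLOCK ? count : ELEMENTS_PER_BLOCK;
  memset(dst, 0, sizeof(block_t));
  memcpy(dst, src, n * sizeof(element_t));
}
```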