BRAIN-INSPIRED COMPUTING FOR ADVANCED IMAGE AND PATTERN RECOGNITION

  1. BRAIN-INSPIRED COMPUTING FOR ADVANCED IMAGE AND PATTERN RECOGNITION Leti Devices Workshop | Marc DURANTON | December 4, 2016

  2. IMAGE RECOGNITION: KEY FOR FUTURE APPLICATIONS
     [Figure: street scene with recognition labels: Assemblée Nationale; Obélisque de Louxor; "= Rue Royale, near rue Saint-Honoré"; Bus turning; Truck; Car; Car]


  4. COMPETITION ON IMAGENET: SINCE 2012, CONVOLUTIONAL NEURAL NETWORKS (CNN) ARE LEADING!
     Team/algorithm                                   Date (DD/MM/YYYY)   Test error
     SuperVision                                      2012                15.3%
     Clarifai                                         2013                11.7%
     GoogLeNet                                        2014                6.66%
     Microsoft                                        05/02/2015          4.94%
     Google                                           02/03/2015          4.82%
     Baidu / Deep Image                               10/05/2015          4.58%
     Shenzhen Institutes of Advanced Technology,      10/12/2015          3.57% (the CNN has 152 layers)
       Chinese Academy of Sciences
     Now                                                                  ?
     (From NVIDIA)


  6. DEEP LEARNING AND NEUROMORPHIC SYSTEMS AT LETI AND LIST
     [Diagram labels: EXPLORATION & EXPLOITATION; IMPLEMENTATION; Neuromorphic; MATERIALS & DEVICES]

  7. DEEP LEARNING AND NEUROMORPHIC SYSTEMS AT LETI AND LIST
     EXPLORATION & EXPLOITATION
     Exploitation of Deep Neural Networks
       • Image recognition, annotation and indexing
     Tools for fast and accurate Neural Network (NN) exploration & architecture benchmarking: N2D2
       • Neural Network exploration (including with spike coding and new materials)

  8. N2D2: PLATFORM FOR DEVELOPING DEEP NEURAL NETWORK APPLICATIONS
     DEEP LEARNING WITH N2-D2 PLATFORM
     • N2D2 is a platform to design and generate deep neural networks (DNN) and to select the computing platform that best fits the application needs
     • Fast benchmarking of Components Off the Shelf and exports to dedicated ASIC:
         • Parallel processors (OpenCL, OpenMP)
         • GPU (OpenCL, CUDA, cuDNN)
         • FPGA (RTL, HLS)
         • Leti & List specific processors (like P-Neuro)
     [Figure: axes "Technology Accessibility" and "Energy Efficiency"; labels in extraction order: Emulated, MPSOC, DSP, NN, GPU, FPGA, Digital IC, NeuroDSP, Spike NN, Spider, Signal IC, Reptile, Mix, NVRAM + Spike]

  9. FAST AND ACCURATE NN EXPLORATION
     Automated architecture mapping and benchmarking tool flow (N2D2 software framework):
     1) Deep network builder
     2) Learning a database
     3) Analysis of network performances
     4) CPU, GPU and FPGA-based real-time implementation (inference phase)
         • OpenMP
         • OpenCL
         • HLS FPGA
         • Wide range of targets, with performance and power metrics

     Network description shown on the slide (excerpt):

         ; Environment
         [env]
         SizeX=8
         SizeY=8
         ConfigSection=env.config
         [env.config]
         ImageScale=0
         ; First layer (convolutional)
         [conv1]
         Input=env
         Type=Conv
         KernelWidth=3
         KernelHeight=3
         NbChannels=32
         Stride=1
         ; Second layer (pooling)
         [pool1]
         Input=conv1
         Type=Pool
         PoolWidth=2
         PoolHeight=2
         NbChannels=32
         Stride=2
         ; Third layer (fully connected)
         [fc1]
         Input=conv2
         Type=Fc
         NbOutputs=100
         ; Output layer (fully connected)
         [fc2]
         Input=fc1
         Type=Fc
         NbOutputs=10

     [Figure labels: Output categories and localization; Learning recognition rate; Test recognition rate]
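As an illustration of what the network description on this slide encodes, the following Python sketch parses the same INI text with the standard-library `configparser` and derives the feature-map size after each layer. It is not N2D2 code and does not use any N2D2 API. It rests on assumptions that are not on the slide: a single input channel, no padding, and `fc1` wired to `pool1` (the slide's excerpt reads `Input=conv2`, a layer the excerpt does not define). The helper name `layer_shapes` is hypothetical.

```python
import configparser

# INI excerpt transcribed from the slide. ASSUMPTION: fc1 is wired to pool1
# here because the slide's "Input=conv2" refers to a layer not in the excerpt.
CONFIG = """
; Environment
[env]
SizeX=8
SizeY=8
ConfigSection=env.config
[env.config]
ImageScale=0
; First layer (convolutional)
[conv1]
Input=env
Type=Conv
KernelWidth=3
KernelHeight=3
NbChannels=32
Stride=1
; Second layer (pooling)
[pool1]
Input=conv1
Type=Pool
PoolWidth=2
PoolHeight=2
NbChannels=32
Stride=2
; Third layer (fully connected)
[fc1]
Input=pool1
Type=Fc
NbOutputs=100
; Output layer (fully connected)
[fc2]
Input=fc1
Type=Fc
NbOutputs=10
"""


def layer_shapes(text):
    """Return {layer: (channels, height, width)}.

    ASSUMPTIONS: one input channel and no padding, so a window of size k with
    stride s over n positions yields (n - k) // s + 1 outputs.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the CamelCase option names as written
    cfg.read_string(text)
    env = cfg["env"]
    shapes = {"env": (1, int(env["SizeY"]), int(env["SizeX"]))}
    for name in cfg.sections():  # sections come back in file order
        sec = cfg[name]
        if "Type" not in sec:  # skips [env] and [env.config]
            continue
        _, h, w = shapes[sec["Input"]]
        if sec["Type"] in ("Conv", "Pool"):
            prefix = "Kernel" if sec["Type"] == "Conv" else "Pool"
            kw = int(sec[prefix + "Width"])
            kh = int(sec[prefix + "Height"])
            s = int(sec["Stride"])
            shapes[name] = (int(sec["NbChannels"]),
                            (h - kh) // s + 1,
                            (w - kw) // s + 1)
        elif sec["Type"] == "Fc":
            shapes[name] = (int(sec["NbOutputs"]), 1, 1)
    return shapes


if __name__ == "__main__":
    for layer, shape in layer_shapes(CONFIG).items():
        print(layer, shape)
```

Under these assumptions the 8x8 input shrinks to 6x6 after the 3x3 convolution and to 3x3 after the 2x2 pooling, before the two fully connected layers reduce it to 100 and then 10 outputs.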

  10. EXAMPLE OF INDUSTRIAL APPLICATION OF N2D2: ROLLING MILL
      CONSTRAINTS
      • Real time with very high throughput (20 m/s)
      • Tiny defects (~ mm) with low contrast
      • Complex environment (oil vapor, little space for inspection)
      SOLUTION
      • Database labelling and processing
      • Fast NN topology exploration
      • Performance vs complexity analysis
      • Real-time performance achievable on FPGA (direct code generation)
      From-scratch exploration (database and NN construction) to industrial application:
      1) Defect labelling and visualization
      2) NN exploration and benchmarking
      3) Defect identification after NN learning
      [Figure: learning recognition rate, test recognition rate and computing complexity for the explored topologies; tick labels 40/60, 3x3/5x5 and 8/16/32]

  11. DEEP LEARNING AND NEUROMORPHIC SYSTEMS AT LETI AND LIST
      EXPLORATION & EXPLOITATION
      Exploitation of Deep Neural Networks
        • Image recognition, annotation and indexing
      Tools for fast and accurate Neural Network (NN) exploration & architecture benchmarking: N2D2
        • Neural Network exploration (including with spike coding and new materials)
      IMPLEMENTATION
      Diversity of implementations:
        • Software solution / GPU
        • Reconfigurable devices / FPGA
        • Dedicated implementations
            • Full CMOS and binary coding: P-NEURO
            • Full CMOS and “spike coding”
            • Using new materials

  12. N2D2 and P-Neuro: complete solution for Deep Learning in smart nodes
      • Fast benchmarking of Components Off The Shelf:
          • Parallel processors
          • GPU
          • FPGA (HLS)
        [Figure labels: OpenMP, OpenCL, CUDA, HLS FPGA; Parallel CPU, GPU, FPGA]
      • Performance of the P-Neuro neural network processing unit
          • Example on faces extraction
          • Database of 18,000 images
          • Comparison of 5 different architectures
          • Focus on energy efficiency

        Target            Frequency   Energy efficiency
        Quad ARM A7       900 MHz     380 images/W
        Quad ARM A15      2000 MHz    350 images/W
        Tegra K1          850 MHz     600 images/W
        Intel I7          3400 MHz    160 images/W
        P-Neuro (FPGA)    100 MHz     2,000 images/W
        P-Neuro (ASIC)    500 MHz     125,000 images/W

      • Expected performance of P-Neuro:
          • FDSOI 28nm, 1 GHz
          • 1.8 TOPs/W, <0.5 mm² (4 cores)
          • Fully scalable from 1 to 1024 cores
          • Ready for integration in smart nodes
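To make the energy-efficiency comparison on this slide easier to read, the short Python sketch below computes ratios between the quoted images/W figures. The numbers are copied from the slide; the names `EFFICIENCY`, `gain` and `ranked` are hypothetical helpers introduced only for illustration.

```python
# Energy efficiency in images/W, as quoted on the slide.
EFFICIENCY = {
    "Quad ARM A7": 380,
    "Quad ARM A15": 350,
    "Tegra K1": 600,
    "Intel I7": 160,
    "P-Neuro (FPGA)": 2000,
    "P-Neuro (ASIC)": 125000,
}


def gain(baseline, target):
    """Ratio of the target's images/W to the baseline's images/W."""
    return EFFICIENCY[target] / EFFICIENCY[baseline]


def ranked():
    """Target names sorted from most to least energy efficient."""
    return sorted(EFFICIENCY, key=EFFICIENCY.get, reverse=True)


if __name__ == "__main__":
    for name in ranked():
        print(f"{name:16s} {gain('Intel I7', name):8.1f}x vs Intel I7")
```

With the slide's figures, the FPGA version of P-Neuro comes out at roughly 3.3 times the Tegra K1's images/W, and the ASIC figure is about 780 times that of the Intel I7.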

  13. SPIKE-BASED CODING
      • Rate-based input coding: pixel brightness sets the spiking frequency, between f_MIN and f_MAX
      • 29x29 pixels = 841 addresses
      [Figure: spike trains (V versus t) propagating over time through layer 1, layer 2, layer 3 and layer 4 to the correct output]
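The rate-based input coding on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the coding scheme actually used by the authors: the linear brightness-to-frequency mapping, the regular (non-random) spike trains, the default `f_min`/`f_max` values and all function names are assumptions.

```python
def brightness_to_rate(brightness, f_min=1.0, f_max=100.0):
    """Linearly map a brightness in [0, 1] to a spiking frequency in Hz.

    ASSUMPTION: linear mapping; f_min and f_max defaults are arbitrary.
    """
    if not 0.0 <= brightness <= 1.0:
        raise ValueError("brightness must be in [0, 1]")
    return f_min + brightness * (f_max - f_min)


def spike_times(brightness, duration, f_min=1.0, f_max=100.0):
    """Regular spike train: one spike every 1/rate seconds within duration."""
    rate = brightness_to_rate(brightness, f_min, f_max)
    count = int(duration * rate)
    return [(k + 1) / rate for k in range(count)]


def encode_image(image, duration, f_min=1.0, f_max=100.0):
    """Turn a 2D list of brightness values into time-sorted (time, address) events.

    Each pixel gets one address: address = row * width + column.
    """
    width = len(image[0])
    events = []
    for r, row in enumerate(image):
        for c, brightness in enumerate(row):
            address = r * width + c
            for t in spike_times(brightness, duration, f_min, f_max):
                events.append((t, address))
    events.sort()
    return events


if __name__ == "__main__":
    # A 29x29 image yields 841 distinct addresses, as on the slide.
    image = [[(r + c) / 56 for c in range(29)] for r in range(29)]
    events = encode_image(image, duration=0.1)
    print(len({address for _, address in events}), "addresses,",
          len(events), "spikes")
```

Brighter pixels emit more spikes over the same time window, so downstream layers receive intensity information as event rates per address rather than as numeric values.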

  14. THE PROMISES OF SPIKE-CODING NN
      • Reduced computing complexity and natural temporal and spatial parallelism
      • Simple and efficient performance tunability capabilities
      • Spiking NN best exploit NVMs such as RRAM, for massively parallel synaptic memory

                             Formal neurons                  Spiking neurons
      Base operation         - Multiply-Accumulate (MAC)     + Accumulate only
      Activation function    - Non-linear function           + Simple threshold
      Parallelism            - Spatial multiplexing          + Spatial and temporal multiplexing

      Two test chips implemented in 65nm:
      • Reptile: 3 tiles of 12 neurons
      • Spider: 25 tiles of 12 neurons
      Advanced technology nodes: comparison of analog and digital neurons
      • Gain of analog neuron (less area) reduces → curves cross at 22nm node
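The contrast between the two neuron types on this slide (multiply-accumulate plus non-linear function versus accumulate-only plus threshold) can be sketched as follows. This is a simplified illustration under stated assumptions: `tanh` stands in for the non-linear activation, and the spiking neuron is a plain non-leaky integrate-and-fire model with reset to zero. Neither choice comes from the slide, and the names are hypothetical.

```python
import math


def formal_neuron(inputs, weights):
    """Formal neuron: one multiply-accumulate per input, then a non-linear
    activation (tanh is an assumed example)."""
    acc = sum(x * w for x, w in zip(inputs, weights))
    return math.tanh(acc)


class IntegrateAndFireNeuron:
    """Spiking neuron: each incoming spike only adds a weight (no multiply);
    the 'activation' is a comparison against a threshold.

    ASSUMPTION: non-leaky integration and reset to zero after firing.
    """

    def __init__(self, weights, threshold):
        self.weights = list(weights)
        self.threshold = threshold
        self.potential = 0.0

    def receive(self, address):
        """Process a spike arriving on synapse `address`; return True if the
        neuron fires."""
        self.potential += self.weights[address]
        if self.potential >= self.threshold:
            self.potential = 0.0
            return True
        return False


if __name__ == "__main__":
    print(formal_neuron([1.0, 2.0], [0.5, 0.25]))
    neuron = IntegrateAndFireNeuron([0.5, 0.25], threshold=1.0)
    print([neuron.receive(a) for a in (0, 1, 1)])
```

Because the spiking neuron's input is an event rather than a value, the per-event work is a single addition and a comparison, which is the "accumulate only" and "simple threshold" advantage listed in the table.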

  15. DEEP LEARNING AND NEUROMORPHIC SYSTEMS AT LETI AND LIST
      EXPLORATION & EXPLOITATION
      Exploitation of Deep Neural Networks
        • Image recognition, annotation and indexing
      Tools for fast and accurate Neural Network (NN) exploration & architecture benchmarking: N2D2
        • Neural Network exploration (including with spike coding and new materials)
      IMPLEMENTATION
      Diversity of implementations:
        • Software solution / GPU
        • Reconfigurable devices / FPGA
        • Dedicated implementations
            • Full CMOS and binary coding: P-NEURO
            • Full CMOS and “spike coding”
            • Using new materials
      MATERIALS & DEVICES
      Take full advantage of advanced devices to break the density and power issues:
        • 3D integration, CoolCube™
        • RRAM, PCM and new devices

  16. 3D SPIKING NEURAL NETWORK
      • Neural networks are naturally 3D for 2D inputs: layers optimally distributed in stacked dies
      • Vertical connections between layers: minimize interconnect length, avoid routing congestion
      NEMESIS 3D two-layer SNN test chip (ALTIS 130nm, CuCu bonding)
      • 1st layer: 48 macro-block neurons, 1024 synapses per neuron (49,152 total)
      • 2nd layer: 50 fully connected neurons, 2,400 synapses

      Two-layer SNN circuit    2D      3D
      Total area (mm²)         7.97    3.63 (-54%)
      Power (mW)               428     354 (-17%)
      Critical path (ns)       9.00    6.63 (-26%)
      [B. Belhadj, R. Heliot, P. Vivet, CASSES’2014]

      • 3D offers 2x better total area and 25% better power efficiency vs 2D
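The synapse counts and the percentage changes quoted on this slide can be re-derived from the raw figures with a few lines of Python. The values are copied from the slide; the function name `relative_change` is a hypothetical helper.

```python
def relative_change(before, after):
    """Change from `before` to `after`, in percent, rounded to an integer."""
    return round(100 * (after - before) / before)


# Synapse counts of the two layers of the test chip.
LAYER1_SYNAPSES = 48 * 1024   # 48 neurons x 1024 synapses each
LAYER2_SYNAPSES = 50 * 48     # 50 neurons fully connected to the 48 of layer 1

# (2D value, 3D value) pairs from the slide's table.
MEASUREMENTS = {
    "Total area (mm2)": (7.97, 3.63),
    "Power (mW)": (428, 354),
    "Critical path (ns)": (9.00, 6.63),
}

if __name__ == "__main__":
    print(LAYER1_SYNAPSES, LAYER2_SYNAPSES)
    for name, (flat, stacked) in MEASUREMENTS.items():
        print(f"{name}: {relative_change(flat, stacked)}%")
```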

  17. LEARNING FROM NEUROSCIENCE: AN STDP (SPIKE TIMING DEPENDENT PLASTICITY) PRIMER
      • STDP = correlation detector
      • Possible learning model of the brain?
      [Figure: electrical signal travelling from the pre-synaptic neuron along the axon, through the synapse, to the dendrite of the post-synaptic neuron]
      [Plot: synaptic weight modification (%) versus Δt = t_post - t_pre]
      • Causality (t_pre < t_post): potentiation (LTP)
      • Anti-causality (t_post < t_pre): depression (LTD)
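A common textbook way to model the STDP curve described on this slide is a pair of exponentials in Δt = t_post - t_pre. The Python sketch below uses that form as an assumption: the slide shows the curve but gives no formula, and the amplitudes, time constants and function names here are illustrative values, not the authors' parameters.

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms).

    ASSUMPTION: exponential STDP window with illustrative constants.
    Causal pairs (pre before post, dt > 0) are potentiated (LTP);
    anti-causal pairs (post before pre, dt < 0) are depressed (LTD).
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(weight, t_pre, t_post, w_min=0.0, w_max=1.0):
    """Update a synaptic weight and clip it to [w_min, w_max]."""
    return min(w_max, max(w_min, weight + stdp_delta_w(t_pre, t_post)))


if __name__ == "__main__":
    for dt in (-40, -10, 10, 40):
        print(dt, stdp_delta_w(0.0, float(dt)))
```

The closer the two spikes are in time, the larger the weight change, which is why the rule behaves as a correlation detector: synapses whose input spikes repeatedly precede the output spike are strengthened.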
