Self-Tuning Bio-Inspired Massively-Parallel Computing

Self-Tuning Bio-Inspired Massively-Parallel Computing
Steve Furber
The University of Manchester
steve.furber@manchester.ac.uk
EXADAPT Mar 2012

Outline
• 63 years of progress
• Many cores make light work
• Building brains
• The SpiNNaker project
• The networking challenge
• A generic neural modelling platform
• Plans & conclusions

Manchester Baby (1948)

SpiNNaker CPU (2011)

63 years of progress
• Baby:
  – filled a medium-sized room
  – used 3.5 kW of electrical power
  – executed 700 instructions per second
• SpiNNaker ARM968 CPU node:
  – fills ~3.5 mm² of silicon (130 nm)
  – uses 40 mW of electrical power
  – executes 200,000,000 instructions per second

Energy efficiency
• Baby:
  – 5 Joules per instruction
• SpiNNaker ARM968:
  – 0.000 000 000 2 Joules per instruction
25,000,000,000 times better than Baby!
(James Prescott Joule, born Salford, 1818)
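The energy-per-instruction figures can be checked from the power and instruction-rate numbers quoted on the "63 years of progress" slide. The short calculation below is a sketch of that arithmetic only; the constant and function names are illustrative and do not come from the slides.

```python
# Power and throughput figures as quoted on the "63 years of progress" slide.
BABY_POWER_W = 3.5e3            # 3.5 kW
BABY_IPS = 700                  # instructions per second
SPINNAKER_POWER_W = 40e-3       # 40 mW per ARM968 CPU node
SPINNAKER_IPS = 200_000_000     # instructions per second


def joules_per_instruction(power_watts, instructions_per_second):
    """Energy per instruction = power (J/s) divided by instruction rate (1/s)."""
    return power_watts / instructions_per_second


baby_jpi = joules_per_instruction(BABY_POWER_W, BABY_IPS)
spinnaker_jpi = joules_per_instruction(SPINNAKER_POWER_W, SPINNAKER_IPS)
improvement = baby_jpi / spinnaker_jpi

print(f"Baby:      {baby_jpi:g} J/instruction")
print(f"SpiNNaker: {spinnaker_jpi:g} J/instruction")
print(f"Ratio:     {improvement:.3g}")
```

The ratio comes out at about 2.5 × 10^10, which matches the 25,000,000,000 quoted on the slide.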

Moore’s Law
[Figure: transistors per Intel chip (millions of transistors per chip, log scale from 0.001 to 100) against year, 1970 to 2000; chips plotted: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III, Pentium 4]

…the Bad News
• atomic scales
• less predictable
• less reliable

Multi-core CPUs
• High-end uniprocessors
  – diminishing returns from complexity
  – wire vs transistor delays
• Multi-core processors
  – cut-and-paste
  – simple way to deliver more MIPS
• Moore’s Law
  – more transistors
  – more cores
… but what about the software?

Multi-core CPUs
• General-purpose parallelization
  – an unsolved problem
  – the ‘Holy Grail’ of computer science for half a century?
  – but imperative in the many-core world
• Once solved
  – few complex cores, or many simple cores?
  – simple cores win hands-down on power-efficiency!

Back to the future
• Imagine…
  – a limitless supply of (free) processors
  – load-balancing is irrelevant
  – all that matters is:
    • the energy used to perform a computation
    • formulating the problem to avoid synchronisation
    • abandoning determinism
• How might such systems work?

Building brains
• Brains demonstrate
  – massive parallelism (10¹¹ neurons)
  – massive connectivity (10¹⁵ synapses)
  – excellent power-efficiency
    • much better than today’s microchips
  – low-performance components (~100 Hz)
  – low-speed communication (~metres/sec)
  – adaptivity
  – tolerant of component failure
  – autonomous learning

Bio-inspiration
• How can massively parallel computing resources accelerate our understanding of brain function?
• How can our growing understanding of brain function point the way to more efficient parallel, fault-tolerant computation?

Building brains
• Neurons
  – multiple inputs, single output (cf. logic gate)
  – useful across multiple scales (10² to 10¹¹)
• Brain structure
  – regularity
  – e.g. 6-layer cortical ‘microarchitecture’

Spike Timing Dependent Plasticity

Learning patterns
• Spot the pattern?
[Figure: spike raster, neuron ID against simulation time (msec)]

Learning patterns
• Now you see it!
[Figure: spike raster, neuron ID against simulation time (msec)]

Learning patterns
[Figure: delay after pattern input (ms) against simulation time]

Self-tuning: in brains
• With STDP, and no other reinforcement
  – neurons learn the statistics of their inputs
• and, with just a little mutual inhibition
  – populations distribute themselves across the range of presented inputs.
• New inputs are interpreted against these learnt statistics.
• Bayes would be very proud!
Masquelier & Thorpe, 2007
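The slides name STDP without giving the update rule. As a rough illustration only, the sketch below uses the common textbook pair-based form with exponential windows: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike, and weakened otherwise. The function name and every constant (`a_plus`, `a_minus`, `tau_plus_ms`, `tau_minus_ms`) are assumptions chosen for illustration; they are not taken from the talk or from Masquelier & Thorpe.

```python
import math


def stdp_delta_w(t_pre_ms, t_post_ms,
                 a_plus=0.01, a_minus=0.012,
                 tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Weight change for one pre/post spike pair (pair-based STDP sketch).

    All default parameter values are hypothetical.
    """
    dt = t_post_ms - t_pre_ms
    if dt > 0:
        # Pre fired before post: potentiation, decaying with the gap.
        return a_plus * math.exp(-dt / tau_plus_ms)
    if dt < 0:
        # Post fired before pre: depression, decaying with the gap.
        return -a_minus * math.exp(dt / tau_minus_ms)
    return 0.0


# Inputs that reliably arrive just before the neuron fires gain weight,
# so the neuron becomes selective for patterns that recur in its input.
print(stdp_delta_w(t_pre_ms=10.0, t_post_ms=15.0))   # positive
print(stdp_delta_w(t_pre_ms=15.0, t_post_ms=10.0))   # negative
```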

SpiNNaker project
• Multi-core CPU node
  – 18 ARM968 processors
  – to model large-scale systems of spiking neurons
• Scalable up to systems with 10,000s of nodes
  – over a million processors
  – >10⁸ MIPS total
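The scale figures are consistent with one another, as a short calculation suggests. It assumes each ARM968 delivers the 200,000,000 instructions per second (200 MIPS) quoted earlier in the talk, and that all 18 processors per node are counted; the names are illustrative.

```python
import math

CORES_PER_NODE = 18     # ARM968 processors per node (from the slide)
MIPS_PER_CORE = 200     # assumption: 200,000,000 instructions/s per core


def total_processors(nodes):
    """Number of processors in a system of the given number of nodes."""
    return nodes * CORES_PER_NODE


def total_mips(nodes):
    """Aggregate MIPS if every processor runs at MIPS_PER_CORE."""
    return total_processors(nodes) * MIPS_PER_CORE


# Smallest node count that gives at least a million processors.
nodes_for_million = math.ceil(1_000_000 / CORES_PER_NODE)

print(nodes_for_million)                    # 55556
print(total_processors(nodes_for_million))  # 1000008
print(total_mips(nodes_for_million))        # 200001600
```

Roughly 56,000 nodes ("10,000s of nodes") give just over a million processors and about 2 × 10^8 MIPS, in line with the ">10⁸ MIPS total" claim.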

Design principles
• Virtualised topology
  – physical and logical connectivity are decoupled
• Bounded asynchrony
  – time models itself
• Energy frugality
  – processors are free
  – the real cost of computation is energy

SpiNNaker system

CMP node

SpiNNaker chip
[Figure label: Mobile DDR SDRAM interface]

SpiNNaker SiP
Multi-chip packaging by UNISEM Europe

Self-tuning: fault-tolerance
• Strategy: for all components consider:
  – fault insertion: how do we test the FT feature?
  – fault detection: we have a problem!
  – fault isolation: contain the damage
  – reconfiguration: repair the damage
• Goal: minimize performance deficit × time
  – real-time system, so checkpoint & restart inapplicable

Circuit-level fault-tolerance
• Delay-insensitive comms
  – 3-of-6 RTZ on chip
  – 2-of-7 NRZ off chip
• Deadlock resistance
  – Tx & Rx circuits have high deadlock immunity
  – Tx & Rx can be reset independently
    • each injects a token at reset
    • true transition detector filters surplus token
[Figure: Tx/Rx link circuit; labels: Tx, Rx, data, ack, din, dout, (2 phase), (4 phase), ¬reset, ¬ack]
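In an m-of-n delay-insensitive code, every valid symbol activates exactly m of the n wires, so the receiver can tell a symbol is complete without a clock. The sketch below enumerates the codewords of the two codes named above. How symbols are assigned to codewords is not stated on the slide; the observation that both codes have at least 16 codewords, enough to carry a 4-bit value per symbol, is offered as an illustration rather than as the chip's actual mapping.

```python
from itertools import combinations
from math import comb


def m_of_n_codewords(m, n):
    """Return all n-bit words with exactly m bits set, as integers."""
    words = []
    for wires in combinations(range(n), m):
        word = 0
        for wire in wires:
            word |= 1 << wire
        words.append(word)
    return words


def is_complete(word, m):
    """Completion detection: a symbol has arrived once m wires are active."""
    return bin(word).count("1") == m


on_chip = m_of_n_codewords(3, 6)    # 3-of-6 code
off_chip = m_of_n_codewords(2, 7)   # 2-of-7 code

print(len(on_chip), comb(6, 3))     # 20 20
print(len(off_chip), comb(7, 2))    # 21 21
```

A word with fewer than m active wires is simply a symbol still in flight, which is what makes the scheme insensitive to wire delays.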

System-level fault-tolerance
• Breaking symmetry
  – any processor can be Monitor Processor
    • local ‘election’ on each chip, after self-test
  – all nodes are identical at start-up
    • addresses are computed relative to node with host connection (0,0)
  – system initialised using flood-fill
    • nearest-neighbour packet type
    • boot time (almost) independent of system scale

Application-level fault-tolerance
• Cross-system delay << 1 ms
  – hardware routing
  – ‘emergency’ routing
    • failed links
    • congestion
  – permanent fault
    • reroute (s/w)

The networking challenge
• Emulate the very high connectivity of real neurons
• A spike generated by a neuron firing must be conveyed efficiently to >1,000 inputs
• On-chip and inter-chip spike communication should use the same delivery mechanism

Network – packets
• Four packet types
  – MC (multicast): source routed; carry events (spikes)
  – P2P (point-to-point): used for bootstrap, debug, monitoring, etc
  – NN (nearest neighbour): build address map, flood-fill code
  – FR (fixed route): carry 64-bit debug data to host
• Timestamp mechanism removes errant packets
  – which could otherwise circulate forever
[Figure: packet formats. First format: Header (8 bits; fields T, ER, TS, 0, -, P), Event ID (32 bits). Second format: Header (8 bits; fields T, SQ, TS, 1, -, P), Address (16+16 bits; Dest, Srce), Payload (32 bits)]

Network – MC Router
• All MC spike event packets are sent to a router
• Ternary CAM keeps router size manageable at 1024 entries (but careful network mapping also essential)
• CAM ‘hit’ yields a set of destinations for this spike event
  – automatic multicasting
• CAM ‘miss’ routes event to a ‘default’ output link
[Figure: Event ID matched against ternary pattern 0 0 1 0 X 1 0 1 X; output vector 000000010000010000 (on-chip) 001001 (inter-chip)]
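A ternary CAM entry can be modelled in software as a key plus a mask: bits marked X are masked out, and an event matches when its remaining bits equal the key. The sketch below uses the 9-bit pattern and output vector shown in the slide's figure. The type and function names, the first-match priority, the bit ordering of the output vector, and the default route value are all assumptions made for illustration; the hardware compares all entries in parallel rather than looping.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    key: int     # required bit values (X positions hold 0)
    mask: int    # 1 where the bit must match, 0 where it is X (don't care)
    route: int   # destination bit vector returned on a hit


def entry_from_pattern(pattern: str, route: int) -> Entry:
    """Build an entry from a ternary string such as '0010X101X'."""
    key = mask = 0
    for ch in pattern:
        key <<= 1
        mask <<= 1
        if ch in "01":
            mask |= 1
            key |= int(ch)
        elif ch != "X":
            raise ValueError(f"unexpected pattern character {ch!r}")
    return Entry(key, mask, route)


def lookup(table: List[Entry], event_id: int, default_route: int) -> int:
    """Return the route of the first matching entry, else the default."""
    for entry in table:
        if (event_id & entry.mask) == entry.key:
            return entry.route          # CAM hit: multicast destination set
    return default_route                # CAM miss: default output link


# Output vector from the figure: 18 on-chip bits followed by 6 inter-chip bits.
ROUTE = int("000000010000010000" + "001001", 2)
DEFAULT = 0b000001                      # hypothetical default link
table = [entry_from_pattern("0010X101X", ROUTE)]

print(lookup(table, 0b001011011, DEFAULT) == ROUTE)    # True (hit)
print(lookup(table, 0b101001010, DEFAULT) == DEFAULT)  # True (miss)
```

Because one masked entry covers every event ID that differs only in the X positions, a whole population of neurons can share a single table entry, which is how a 1024-entry table can serve a much larger key space when the mapping is chosen carefully.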

Topology mapping
[Figure: a problem graph (circuit) mapped onto nodes and cores, with a fragment of the MC routing table; labels: Topology, Node, Core, Synapse, Fragment of MC table, Problem graph (circuit)]

Problem mapping
• Problem: represented as a network of nodes with a certain behaviour...
• ...problem is split into two parts...
  – ...abstract problem topology... ...problem topology loaded into firmware routing tables...
  – ...behaviour of each node embodied as an interrupt handler in code... ...compile, link... ...binary files loaded into core instruction memory...
• The code says "send message" but has no control where the output message goes
• Our job is to make the model behaviour reflect reality

Bisection performance
• 1,024 links
  – in each direction
• ~10 billion packets/s
• 10 Hz mean firing rate
• 250 Gbps bisection bandwidth
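The slide does not show how these four figures relate, so the calculation below is one possible reading and not the author's derivation. It assumes the 250 Gbps is shared evenly across the 1,024 links, that packets are 40 bits long (8-bit header plus 32-bit event ID, with no payload), and that the ~10 billion packets/s is total spike traffic at the 10 Hz mean rate. All names are illustrative.

```python
LINKS_PER_DIRECTION = 1024
BISECTION_BANDWIDTH_BPS = 250e9     # 250 Gbps (from the slide)
PACKETS_PER_SECOND = 10e9           # ~10 billion packets/s (from the slide)
MEAN_FIRING_RATE_HZ = 10            # from the slide
PACKET_BITS = 40                    # assumption: 8-bit header + 32-bit event ID


def per_link_bandwidth_bps():
    """Bandwidth per link if the bisection bandwidth is shared evenly."""
    return BISECTION_BANDWIDTH_BPS / LINKS_PER_DIRECTION


def bisection_packet_capacity():
    """Packets/s the bisection could carry at PACKET_BITS bits per packet."""
    return BISECTION_BANDWIDTH_BPS / PACKET_BITS


def implied_neuron_count():
    """Neurons needed to generate the quoted packet rate at the mean rate."""
    return PACKETS_PER_SECOND / MEAN_FIRING_RATE_HZ


print(f"{per_link_bandwidth_bps() / 1e6:.0f} Mbps per link")
print(f"{bisection_packet_capacity():.3g} packets/s across the bisection")
print(f"{implied_neuron_count():.3g} neurons at 10 Hz")
```

Under these assumptions each link carries roughly 244 Mbps, the bisection could pass about 6 × 10^9 short packets per second, and 10^10 packets/s corresponds to about 10^9 neurons firing at 10 Hz.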
