Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks
Hiroki Matsutani (Keio Univ, Japan)
Michihiro Koibuchi (NII, Japan)
Daihan Wang (Keio Univ, Japan)
Hideharu Amano (Keio Univ, Japan)
I am very sorry…
• My flight was canceled on April 6.
• I waited at the airport for seven hours for rebooking, but I couldn’t get a ticket. I got a fever.
• I arrived at Newcastle on April 7.
• I couldn’t find my baggage; I wore only a shirt.
• My hotel reservation was canceled without asking; I didn’t have a place to sleep…
• I went to another hotel to book a room in my shirt sleeves in the rain. My fever went up.
• Ms. Jerder kindly did her presentation on Apr 8.
• I would like to thank her and the ASYNC/NOCS program committee.
Adding Slow-Silent Virtual Channels for Low-Power On-Chip Networks
Voltage and frequency scaling · Power gating
Introduction: Area and power
• Due to the finer process technology,
– Area constraint is relaxed
– But power density becomes more serious
• Adding extra hardware resources (e.g., VCs)
– We can get a performance margin; so
– We can reduce voltage and frequency to reduce power
[Figure: Routers (a), (b), and (c), each with VC#0, VC#1, and VC#2]
Issues to be tackled in this presentation
• Adding extra hardware increases the leakage power
• How much resource is required to minimize total power
Outline: Slow-silent virtual channels
• Network-on-Chip (NoC)
• On-Chip Router
– Architecture and its power consumption
• Slow-silent virtual channels
– Voltage and frequency scaling
– Run-time power gating of virtual channels
– Adaptive VC activation
• Evaluations (1VC, 2VC, 3VC, and 4VC)
– Throughput
– Power consumption (with PG & voltage freq scaling)
– How many VCs are required to minimize power
Network-on-Chip (NoC)
• Processor core
– Largest component
– Various low-power techniques are used, e.g., standby current 11uA [Ishikawa,IEICE’05]
• On-chip router
– Area is not so large
– Always preparing (active) for packet injection
[Figure: An example tile architecture (processor core and router)]
The next slides show “Router architecture” and “Its power” (ASPLA 90nm CMOS)
On-Chip Router: Architecture
• 5-input 5-output router (data width is 64-bit)
• Each physical channel has 2 VCs
• Each VC has a FIFO buffer (4 x 64 bits)
[Figure: Router block diagram: input channels X+, X-, Y+, Y-, and CORE, each with a FIFO; an ARBITER; a 5x5 XBAR to output channels X+, X-, Y+, Y-, and CORE]
HW amount is 34 kilo gates and 64% of area is used for FIFO
On-Chip Router: Pipeline
• A header flit goes through a router in 3 cycles
– RC (Routing computation)
– VSA (Virtual channel / Switch allocation)
– ST (Switch traversal)
• A packet consists of a header and 3 data flits
• E.g., Packet transfer from router A to C
[Figure: Pipeline timing chart at routers A, B, and C: the HEAD flit goes through RC, VSA, and ST at each router; DATA 1 to DATA 3 follow with ST only; x-axis: elapsed time [cycle], 1 to 12]
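The timing chart on this slide can be checked with a small calculation. The sketch below assumes, as the chart suggests, that the header flit spends 3 cycles (RC, VSA, ST) at every router, that each data flit follows one cycle behind the previous flit in the ST stage, and that there is no contention. The function name and its parameters are illustrative and do not come from the slides.

```python
def packet_latency(routers: int, data_flits: int, header_cycles: int = 3) -> int:
    """Cycles until the last flit leaves the last router, assuming no contention."""
    # The header flit needs RC, VSA, and ST at every router on the path.
    header_done = routers * header_cycles
    # Each data flit traverses the final switch one cycle after the flit before it.
    return header_done + data_flits


# Router A -> B -> C with a header and 3 data flits, as in the example above.
print(packet_latency(routers=3, data_flits=3))  # 12
```

The result of 12 cycles agrees with the last tick of the elapsed-time axis in the chart.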
On-Chip Router: Power consumption
• Place-and-routed with 90nm CMOS
• Post layout simulation at 200MHz
[Figure: Power consumption of a router when n ports are used [mW]]
A router consumes more power as the router processes more packets
Packet switching power is large → Voltage freq scaling
On-Chip Router: Power consumption
Packet switching power is large → Voltage freq scaling
Power consumption when no port is used → standby power
[Figure: Standby power of the on-chip router: Leakage (55.0%), Dynamic (45.0%); Channels (49.4%) is the largest]
Leakage of channel buf → Runtime power gating
Slow-Silent Virtual Channels
• Adding extra VCs
– Performance improves
– Performance margin
– We can reduce voltage and frequency
[Figure: Latency vs. accepted traffic for 1-VC, 2-VC, 3-VC, and 4-VC]
• Voltage & frequency scaling (VFS)
– Set the reduced voltage and frequency
– In response to the performance margin
$$ f \propto \frac{(V - V_{th})^{\alpha}}{CV}, \qquad P_{switching} = a \cdot C \cdot f \cdot V^{2} $$
• Problem
– Adding extra VCs increases leakage power
– It may overwhelm VFS
We focus on run-time power gating of VCs to reduce leakage
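To illustrate how a performance margin turns into a power saving under the two relations on this slide, here is a minimal sketch. It assumes a 1.0 V baseline (the baseline voltage used in the evaluation), while the threshold voltage `v_th = 0.3` and exponent `alpha = 1.3` are hypothetical placeholder values, not figures from the presentation. All function names are illustrative.

```python
def max_frequency(v: float, v_th: float = 0.3, alpha: float = 1.3) -> float:
    """Relative maximum frequency, f ∝ (V - V_th)^alpha / V (C treated as constant)."""
    return (v - v_th) ** alpha / v


def scaled_voltage(freq_ratio: float, v_base: float = 1.0,
                   v_th: float = 0.3, alpha: float = 1.3) -> float:
    """Lowest voltage whose maximum frequency is freq_ratio times the baseline's."""
    target = freq_ratio * max_frequency(v_base, v_th, alpha)
    lo, hi = v_th, v_base
    # Bisection works because max_frequency increases with V for alpha >= 1.
    for _ in range(60):
        mid = (lo + hi) / 2
        if max_frequency(mid, v_th, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi


def relative_switching_power(freq_ratio: float, v_base: float = 1.0,
                             v_th: float = 0.3, alpha: float = 1.3) -> float:
    """P_switching = a * C * f * V^2, normalized to the baseline operating point."""
    v = scaled_voltage(freq_ratio, v_base, v_th, alpha)
    return freq_ratio * (v / v_base) ** 2


# Example: extra VCs give enough margin to run at 70% of the original frequency.
print(scaled_voltage(0.7), relative_switching_power(0.7))
```

Because the voltage drops together with the frequency, the switching power falls faster than the frequency alone; leakage power is not modeled in this sketch.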
Power Gating of virtual channels
• Run-time power gating of virtual channels
– No packets in a VC → Sleep (turn off the power supply)
– Packet arrives at the VC → Wakeup (turn on the power)
[Figure: Router block diagram with the FIFOs of input channels X+, X-, Y+, Y-, and CORE put to sleep]
Link shutdown has been studied for on- & off-chip networks, but prior work uses SRAM buffers [Chen,ISLPED’03] [Soteriou,TPDS’07]
→ We use small registered FIFOs for light-weight NoC routers
Power Gating: Various overheads
• Area overhead
– Power switches
• Performance overhead
– Wakeup delay
– Pipeline stall is caused
→ Frequent on/off should be avoided
• Power overhead
– Driving power switches
– Short sleeps adversely increase dynamic power
→ Frequent on/off should be avoided
[Figure: A sleeping FIFO next to an active FIFO: a pipeline stall of a router occurs while waiting for channel wakeup]
[Figure: Power switch: the sleep signal gates Vdd to the virtual Vdd of a circuit block, which connects to GND]
Control that gradually activates VCs in response to workload
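The power overhead listed on this slide implies a break-even sleep length: a sleep pays off only if the leakage it saves exceeds the energy spent driving the power switches. The sketch below shows that arithmetic under a deliberately simple model (constant leakage saving per sleeping cycle, one fixed transition cost per sleep/wakeup pair). The function names, the model, and the numbers are assumptions for illustration, not values from the presentation.

```python
def breakeven_cycles(leakage_saved_per_cycle: float, transition_energy: float) -> float:
    """Sleep length (in cycles) at which saved leakage equals the on/off overhead."""
    return transition_energy / leakage_saved_per_cycle


def net_energy_saving(sleep_cycles: int, leakage_saved_per_cycle: float,
                      transition_energy: float) -> float:
    """Positive when the sleep was long enough to be worthwhile, negative otherwise."""
    return sleep_cycles * leakage_saved_per_cycle - transition_energy


# Hypothetical numbers in arbitrary energy units.
saved, overhead = 2.0, 20.0
print(breakeven_cycles(saved, overhead))       # 10.0
print(net_energy_saving(4, saved, overhead))   # -12.0 (a short sleep costs energy)
print(net_energy_saving(50, saved, overhead))  # 80.0
```

Under this model, sleeps shorter than the break-even length increase total energy, which is why frequent on/off switching should be avoided.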
Power Gating: VC activation policy
• Virtual channel (VC) level power gating
• Virtual-channel selection:
– All packets use VC#0 when they are injected to NoC
– VC number is increased when the packet conflicts
[Figure: Routers (a), (b), and (c), each with VC#0, VC#1, and VC#2; only VC#0 is used if workload is low]
Power Gating: VC activation policy
• Virtual channel (VC) level power gating
• Virtual-channel selection:
– All packets use VC#0 when they are injected to NoC
– VC number is increased when the packet conflicts
[Figure: Routers (a), (b), and (c), each with VC#0, VC#1, and VC#2; all VCs are activated if workload is high]
High peak performance of VCs with the least leakage power
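The selection rule on this slide can be sketched in a few lines. This is a simplified model, not the router's actual allocation logic: it assumes a VC counts as "in conflict" when it is occupied, that a packet never moves back to a lower VC number, and that a VC is awake only while it holds a packet. The names `select_vc` and `awake_vcs` are hypothetical.

```python
from typing import List, Optional


def select_vc(busy: List[bool], current_vc: int = 0) -> Optional[int]:
    """Return the lowest free VC number >= current_vc, or None if all are occupied.

    Injected packets start at VC#0 (current_vc=0); the VC number only grows
    when the preferred VC is already taken by another packet.
    """
    for vc in range(current_vc, len(busy)):
        if not busy[vc]:
            return vc
    return None


def awake_vcs(busy: List[bool]) -> List[int]:
    """VCs that must be powered on; empty VCs can stay asleep."""
    return [vc for vc, occupied in enumerate(busy) if occupied]


# Low workload: VC#0 is free, so it is chosen and higher VCs stay asleep.
print(select_vc([False, False, False]))  # 0
# Conflict on VC#0: the packet moves up to VC#1.
print(select_vc([True, False, False]))   # 1
print(awake_vcs([True, True, False]))    # [0, 1]
```

With this rule, higher-numbered VCs are woken only when lower ones are contended, so the number of powered-on VCs follows the workload.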
Power Gating: Routing design
• A virtual-channel layer
– A virtual network consisting of VCs with the same VC#
• Deadlock-freedom [Duato,TPDS’93] [Koibuchi,ICPP’03]
– Moving upper to lower layers: VC#0 → VC#1 → VC#2 → VC#3
– Only bottom layer must guarantee deadlock-freedom
[Figure: Routers (a), (b), and (c), each with VC#0 to VC#3, forming VC Layer #0 to VC Layer #3]
All VC layers except for the bottom can employ any routing, as long as the bottom layer guarantees deadlock-freedom by itself
Evaluations of slow-silent VCs
• Preliminary
– Leakage modeling of PG
– Breakeven point of PG
• Simulation parameters
– Topology: 2-D Mesh (8x8)
– Routing: DOR (XY routing)
– Buffer size: 4-flit (WH switching)
– # of VCs: 1VC, 2VC, 3VC, 4VC
– Latency: 3-cycle per 1-hop
• Traffic patterns
– Uniform + NPB traces (BT, SP, CG, MG, IS)
• Process technology
– ASPLA 90nm CMOS
– 1.00V (baseline)
• Evaluation items
– Original throughput
– Power consumption w/o PG and VFS
– Power consumption w/ PG and VFS
• Which is the best?
– 1VC, 2VC, 3VC, and 4VC