CS137: Electronic Design Automation
Day 9: January 30, 2006
Parallel Prefix
CALTECH CS137 Winter2006 -- DeHon

Today
• Bit-Level
  – Addition
  – LUT Cascades
• For Sums
  – Applications
• FSMs
• SATADD
• Data Forwarding
• Pointer Jumping
  – Applications

Introduction / Reminder: Addition in Log Time

Ripple Carry Addition
• Simple "definition" of addition
• Serially resolve carry at each bit

CLA
• Think about each adder bit as computing a function on the carry in
  – c[i] = g(c[i-1])
  – Particular function g will depend on a[i], b[i]
  – g = f(a[i], b[i])

Functions
• What functions can g(c[i-1]) be?
  – g(x)=1
    • a[i]=b[i]=1
  – g(x)=x
    • a[i] xor b[i]=1
  – g(x)=0
    • a[i]=b[i]=0
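As a concrete reading of the CLA view above, the sketch below classifies each bit pair as one of the three carry functions and ripples the carry serially, one bit at a time. Python is used only because the slides contain no code. The helper names `carry_function`, `apply` and `ripple_add` are hypothetical. This is the O(N) ripple baseline, not a log-time adder.

```python
def carry_function(a_bit: int, b_bit: int) -> str:
    """Classify bit i's action on the carry: 'G', 'P', or 'S'."""
    if a_bit and b_bit:
        return "G"   # g(x) = 1
    if a_bit ^ b_bit:
        return "P"   # g(x) = x
    return "S"       # g(x) = 0


def apply(func: str, carry_in: int) -> int:
    """Evaluate g(carry_in) for carry function 'G', 'P', or 'S'."""
    return {"G": 1, "P": carry_in, "S": 0}[func]


def ripple_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Add two width-bit unsigned integers by rippling the carry LSB to MSB.

    Returns (sum mod 2**width, carry out); needs `width` sequential steps.
    """
    carry = 0          # carry into bit 0
    total = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i                # sum bit i
        carry = apply(carry_function(ai, bi), carry)   # c[i] = g_i(c[i-1])
    return total, carry


print(ripple_add(11, 6, 4))  # prints (1, 1), since 11 + 6 = 17 = 0b10001
```

Each loop iteration depends on the previous carry, which is exactly the serial chain the next slides break up.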
Functions
• What functions can g(c[i-1]) be?
  – g(x)=1: Generate
    • a[i]=b[i]=1
  – g(x)=x: Propagate
    • a[i] xor b[i]=1
  – g(x)=0: Squash
    • a[i]=b[i]=0

Combining
• Want to combine functions
  – Compute $c[i] = g_i(g_{i-1}(c[i-2]))$
  – Compute the composition of two functions
• What functions will the composition of two of these functions be?
  – Same as before
    • Propagate, generate, squash

Compose Rules (LSB MSB)
  Compose   Result
  GG        G
  GP        G
  GS        S
  PG        G
  PP        P
  PS        S
  SG        G
  SP        S
  SS        S

Reduce Tree
[Figure: reduce tree]

Combining
• Do it again…
• Combine g[i-3,i-2] and g[i-1,i]
• What do we get?
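The compose rules can be checked mechanically. This minimal sketch assumes the same G/P/S encoding, with the LSB function applied first. It defines `compose` (a hypothetical name), verifies it against brute-force composition of the two truth tables, and combines a list of bit functions in a reduce tree with ⌈log2(N)⌉ levels.

```python
from functools import reduce
from itertools import product

# Carry functions as truth tables (g(0), g(1)).
TABLE = {"G": (1, 1), "P": (0, 1), "S": (0, 0)}


def compose(lsb: str, msb: str) -> str:
    """Combine adjacent carry functions; the LSB function is applied first."""
    return lsb if msb == "P" else msb   # P passes its input through; G/S ignore it


def compose_by_tables(lsb: str, msb: str) -> str:
    """Brute-force check: compose the truth tables and look up the name."""
    lo, hi = TABLE[lsb]
    result = (TABLE[msb][lo], TABLE[msb][hi])
    return next(name for name, t in TABLE.items() if t == result)


def reduce_tree(funcs: list[str]) -> str:
    """Combine carry functions (index 0 = LSB) pairwise, one tree level at a time.

    Valid because compose is associative; uses ceil(log2(n)) levels.
    """
    level = list(funcs)
    while len(level) > 1:
        nxt = [compose(level[j], level[j + 1]) for j in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired top element moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


for lsb, msb in product("GPS", repeat=2):
    print(lsb + msb, compose(lsb, msb))
```

The printed table should match the Compose Rules slide row for row.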
Associative Reduce → Prefix
[Figure: Prefix Tree]

Prefix Tree
• Shows us how to compute the Nth value in O(log(N)) time
• Can actually produce all intermediate values in this time
  – with only a constant factor more hardware

Parallel Prefix
• Important pattern
• Applicable any time the operation is associative
• Function composition is always associative

Generalizing: LUT Cascade

Cascaded LUT Delay Model
• Tcascade = T(3LUT) + T(mux)
• Don't pay
  – General interconnect
  – Full 4-LUT delay

Parallel Prefix LUT Cascade?
• Can we do better than N×Tmux?
• Can we compute LUT cascade in O(log(N)) time?
• Can we compute mux cascade using parallel prefix?
• Can we make mux cascade associative?
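Here is a minimal sketch of turning an associative reduce into a prefix computation, assuming a recursive tree. The steps are: combine neighbouring pairs, solve the half-size prefix problem, then fill in each remaining position with one more operation. This is one of several prefix-network shapes, similar in spirit to a Brent–Kung tree. The name `parallel_prefix` is hypothetical. Python evaluates each level sequentially, where hardware would evaluate it in parallel.

```python
import operator
from typing import Callable, TypeVar

T = TypeVar("T")


def parallel_prefix(xs: list[T], op: Callable[[T, T], T]) -> list[T]:
    """Inclusive prefix of xs under an associative op, computed as a recursive tree.

    out[k] = xs[0] op xs[1] op ... op xs[k]. Operand order is preserved, so op
    need not be commutative (e.g. function composition). Each recursion level
    is one parallel step going up and one coming down, giving O(log n) depth and
    O(n) total op applications.
    """
    n = len(xs)
    if n <= 1:
        return list(xs)
    # Up the tree: combine adjacent pairs (all independent, so parallel in hardware).
    pairs = [op(xs[2 * k], xs[2 * k + 1]) for k in range(n // 2)]
    # Solve the half-size problem: pair_prefix[k] = xs[0] op ... op xs[2k+1].
    pair_prefix = parallel_prefix(pairs, op)
    out = [xs[0]] * n
    for k, value in enumerate(pair_prefix):
        out[2 * k + 1] = value
    # Down the tree: each remaining even position needs one more op.
    for j in range(2, n, 2):
        out[j] = op(pair_prefix[j // 2 - 1], xs[j])
    return out


print(parallel_prefix([3, 1, 4, 1, 5], operator.add))   # [3, 4, 8, 9, 14]
print(parallel_prefix(list("abcd"), operator.add))      # ['a', 'ab', 'abc', 'abcd']
```

The string example shows that order is kept. That is what lets the same routine run carry-function or mux-transform composition.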
Parallel Prefix Mux Cascade
• How can a mux transform S → mux-out?
  – A=0, B=0 → mux-out=0: Stop = S
  – A=1, B=1 → mux-out=1: Generate = G
  – A=0, B=1 → mux-out=S: Buffer = B
  – A=1, B=0 → mux-out=/S: Invert = I

Parallel Prefix Mux Cascade
(first letter = earlier mux in the cascade)
  SS → S   GS → S   BS → S   IS → S
  SG → G   GG → G   BG → G   IG → G
  SB → S   GB → G   BB → B   IB → I
  SI → G   GI → S   BI → I   II → B

Two-mux transforms
• How can 2 muxes transform the input?
• Can I compute 2-mux transforms from 1-mux transforms?

Generalizing mux-cascade
• How can N muxes transform the input?
• Is mux transform composition associative?

Associative Reduce Mux-Cascade
• Can be hardwired, no general interconnect
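The four mux transforms and their composition table can be encoded as truth tables and checked exhaustively. This sketch assumes a 2:1 mux whose select is the cascade signal, so out = B when the select is 1, else A. The helper names are hypothetical. The code confirms two things. Composition is associative. Collapsing a cascade into one transform gives the same output as pushing the signal through mux by mux.

```python
from functools import reduce
from itertools import product

# A 2:1 mux whose select is the cascade input: out = B if sel else A.
# Its transform is the truth table (out when input=0, out when input=1) = (A, B).
NAMES = {(0, 0): "S", (1, 1): "G", (0, 1): "B", (1, 0): "I"}  # Stop, Generate, Buffer, Invert
TABLES = {name: table for table, name in NAMES.items()}


def compose(first: str, second: str) -> str:
    """Transform of `first` followed by `second` (first letter = earlier mux)."""
    t1, t2 = TABLES[first], TABLES[second]
    return NAMES[(t2[t1[0]], t2[t1[1]])]


def cascade_serial(stages: list[tuple[int, int]], s_in: int) -> int:
    """Reference: push the input through each mux (A, B) in order."""
    x = s_in
    for a, b in stages:
        x = b if x else a
    return x


def cascade_composed(stages: list[tuple[int, int]], s_in: int) -> int:
    """Collapse the cascade to one transform, then apply it once."""
    transform = reduce(compose, (NAMES[stage] for stage in stages))
    return TABLES[transform][s_in]


# Composition of transforms is associative, so a prefix/reduce tree is legal.
assert all(compose(compose(x, y), z) == compose(x, compose(y, z))
           for x, y, z in product("SGBI", repeat=3))
print(compose("S", "I"), compose("I", "I"))   # G B
```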
For Sums

Prefix Sum
• Common operation:
  – Want B[x] such that B[x]=A[0]+A[1]+…+A[x]
  – B[0]=A[0]; for x=1 to N-1:
    • B[x]=B[x-1]+A[x]

Prefix Sum
• Compute in tree fashion
  – A[I]+A[I+1]
  – A[I]+A[I+1]+A[I+2]+A[I+3]
  – …
• Combine partial sums back down tree
  – S(0:7)+S(8:9)+S(10)=S(0:10)

Other simple operators
• Prefix-OR
• Prefix-AND
• Prefix-MAX
• Prefix-MIN

Find-First One
• Useful for arbitration
  – Finds first (highest-priority) requestor
  – Also magnitude finding in numbers
• How:
  – Prefix-OR, giving X
  – Locally compute X[I-1] xor X[I] (with X[-1]=0)
  – Flags the first one

Arbitration
• Often want to find first M requestors
  – E.g. assign unique memory ports to first M processors requesting
• Prefix-sum across all potential requesters
• Counts requesters, giving unique number to each
• Know if one of first M
  – Perhaps which resource assigned
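The scans on this page can be sketched with Python's `itertools.accumulate`, which computes them sequentially. In hardware the same associative operator would run on a prefix tree. `prefix_sum`, `find_first_one` and `arbitrate` are hypothetical helper names. The arbitration encoding is an assumption for illustration: resource number = count minus one, and `None` when the requester is not among the first M.

```python
import operator
from itertools import accumulate
from typing import Optional


def prefix_sum(a: list[int]) -> list[int]:
    """B[x] = A[0] + ... + A[x]."""
    return list(accumulate(a))


def find_first_one(req: list[int]) -> list[int]:
    """One-hot flag of the first 1: prefix-OR, then X[I-1] xor X[I] locally."""
    x = list(accumulate(req, operator.or_))
    return [(x[i - 1] if i else 0) ^ x[i] for i in range(len(x))]


def arbitrate(req: list[int], m: int) -> list[Optional[int]]:
    """Give each of the first m requesters a unique resource number 0..m-1.

    The prefix sum counts requesters up to and including position i, so a
    requester's count minus one is its unique number; others get None.
    """
    counts = prefix_sum(req)
    return [counts[i] - 1 if req[i] and counts[i] <= m else None
            for i in range(len(req))]


print(prefix_sum([3, 1, 4, 1, 5]))          # [3, 4, 8, 9, 14]
print(find_first_one([0, 0, 1, 0, 1]))      # [0, 0, 1, 0, 0]
print(arbitrate([0, 1, 1, 0, 1, 1], 2))     # [None, 0, 1, None, None, None]
```

The other simple operators drop in directly. For example, `accumulate(a, max)` gives prefix-MAX.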
Partitioning
• Use something to order
  – E.g. spectral linear ordering
  – …or 1D cellular swap to produce linear order
• Parallel prefix on area of units
  – If not all same area
• Know where the midpoint is

Channel Width
• Prefix sum on delta wires at each node
  – To compute net channel widths at all points along channel
  – E.g. 1D ordered
• Maybe use with cellular placement scheme

Rank Finding
• Looking for I'th ordered element
• Do a prefix-sum on high bit only
  – Know m = number of elements with high bit 0 (i.e. ≤ 01111111…)
• High-low search on result
  – I.e. if m ≥ I, recurse on half with leading zero
  – If m < I, search for (I−m)'th element in half with high bit true
• Find median in O(log²(N)) time
  – One O(log(N)) prefix sum per bit, for O(log(N))-bit keys

FA/FSM Evaluation
(regular expression recognition)

Finite Automata
• Machine has finite state: S
• On each cycle
  – Input I
  – Compute output and new state
    • Based on inputs and current state
• $O_i, S_{i+1} = f(S_i, I_i)$
• Intuitively, a sequential process
  – Must know previous state to compute next
  – Must know state to compute output

Function Specialization
• But, this is just functions
  – …and function composition is associative
• Given that we know input sequence:
  – $I_0, I_1, I_2, \ldots$
• Can compute specialized functions:
  – $f_i(s) = f(s, I_i)$
• What is $f_i(s)$?
  – Worst-case, a translation table:
    • S=0 → NS0, S=1 → NS1, …
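Here is a sketch of the high-low rank search, assuming non-negative integers of a known bit width. At each bit, count the candidates with a 1 in that bit (the last entry of the prefix sum), derive m, and keep the half that holds the answer. `rank_select` is a hypothetical name. The list filtering stands in for whatever compaction the hardware would do.

```python
from itertools import accumulate


def rank_select(values: list[int], k: int, width: int) -> int:
    """Return the k-th smallest (1-based) of non-negative `width`-bit integers."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    candidates = list(values)
    for bit in reversed(range(width)):               # high bit first
        highs = [(v >> bit) & 1 for v in candidates]
        ones = list(accumulate(highs))[-1]           # last entry of the prefix sum
        m = len(candidates) - ones                   # candidates with this bit = 0
        if k <= m:
            # The answer has a 0 here: recurse on the half with leading zero.
            candidates = [v for v, h in zip(candidates, highs) if not h]
        else:
            # The answer has a 1 here: find the (k - m)-th in the other half.
            k -= m
            candidates = [v for v, h in zip(candidates, highs) if h]
    return candidates[0]


data = [5, 1, 7, 3, 2]
print(rank_select(data, (len(data) + 1) // 2, width=3))   # median: 3
```

There is one prefix sum per bit position, which is where the per-bit O(log(N)) factor in the median bound comes from.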
Function Composition
• Now: $O_{i+m}, S_{i+m+1} = f_{i+m}(f_{i+m-1}(f_{i+m-2}(\cdots f_i(S_i))))$
• Can we compute the function composition?
  – $f_{i+1,i}(s) = f_{i+1}(f_i(s))$
  – What is $f_{i+1,i}(s)$?
    • A translation table just like $f_i(s)$ and $f_{i+1}(s)$
    • Table of size |S|, can fill in O(|S|) time

Recursive Function Composition
• Now: $O_{i+m}, S_{i+m+1} = f_{i+m}(f_{i+m-1}(f_{i+m-2}(\cdots f_i(S_i))))$
• We can compute the composition
  – $f_{i+1,i}(s) = f_{i+1}(f_i(s))$
• Repeat to compute
  – $f_{i+3,i}(s) = f_{i+3,i+2}(f_{i+1,i}(s))$
  – Etc. until have computed $f_{i+m,i}(s)$ in O(log(m)) steps

Implications
• If can get input stream,
  – Any FA can be evaluated in O(log(N)) time
  – Regular expression recognition in O(log(N))
• Any streaming operator with finite state
  – Where the input stream is independent of the output stream
  – Can be run arbitrarily fast (given enough hardware) by using parallel prefix on FSM evaluation

Saturated Addition
• $S_{i+1} = \max(\min(I_i + S_i, \mathit{maxval}), \mathit{minval})$
• Could model as FSM with:
  – $|S| = \mathit{maxval} - \mathit{minval} + 1$
• So, in theory, FSM result applies
• …but |S| might be $2^{16}$, $2^{24}$

SATADD Composition
• Can compute composition efficiently [Papadantonakis et al. FPT2005]
[Figure: SATADD composition]
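Below are two sketches of composing specialized step functions, both assuming a pairwise reduce tree (`tree_reduce`, a hypothetical helper).

For a general FSM, each $f_i$ is a translation table, and composition is |S| independent lookups. The example DFA, which computes a binary number mod 3, is made up for illustration.

For saturated addition, the sketch keeps each step as a triple (a, lo, hi) meaning clamp(s + a, lo, hi), so the table never has |S| entries. The closed-form `compose_sat` is derived here from clamp(clamp(x, l1, h1), l2, h2) = clamp(x, clamp(l1, l2, h2), clamp(h1, l2, h2)). It is not necessarily the representation used by Papadantonakis et al. The range -8..7 is an arbitrary choice.

```python
from itertools import accumulate


def tree_reduce(items, combine):
    """Combine a nonempty sequence pairwise, level by level (ceil(log2 n) levels)."""
    level = list(items)
    while len(level) > 1:
        nxt = [combine(level[j], level[j + 1]) for j in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


# --- FSM: specialize f(s, I_i) into a translation table f_i, indexed by state ---
def specialize(delta, n_states, symbol):
    return tuple(delta(s, symbol) for s in range(n_states))


def compose_tables(first, second):
    """Table for second(first(s)): |S| independent lookups."""
    return tuple(second[s] for s in first)


def mod3(s, bit):
    """Example DFA (hypothetical): remainder mod 3 of a binary number read MSB first."""
    return (2 * s + bit) % 3


bits = [1, 0, 1, 0, 1]                          # 21 in binary
tables = [specialize(mod3, 3, b) for b in bits]
print(tree_reduce(tables, compose_tables)[0])   # 0, since 21 % 3 == 0


# --- SATADD: f(s) = clamp(s + a, lo, hi) kept as a triple (a, lo, hi), lo <= hi ---
def clamp(v, lo, hi):
    return max(min(v, hi), lo)


def compose_sat(first, second):
    """Triple for second(first(s)); the form is closed under composition."""
    a1, lo1, hi1 = first
    a2, lo2, hi2 = second
    return (a1 + a2, clamp(lo1 + a2, lo2, hi2), clamp(hi1 + a2, lo2, hi2))


def apply_sat(f, s):
    a, lo, hi = f
    return clamp(s + a, lo, hi)


MINVAL, MAXVAL = -8, 7
xs = [5, 6, -20, 3, 4]
steps = [(x, MINVAL, MAXVAL) for x in xs]       # S_{i+1} = max(min(I_i + S_i, maxval), minval)
serial = list(accumulate(xs, lambda s, x: clamp(s + x, MINVAL, MAXVAL), initial=0))[-1]
print(serial, apply_sat(tree_reduce(steps, compose_sat), 0))   # -1 -1
```

The triple stays three numbers no matter how many steps are folded in, which avoids the $2^{16}$-entry tables the general FSM approach would need.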
SATADD Reduce Tree
[Figure: SATADD reduce tree]

Data Forwarding: UltraScalar
From Henry, Kuszmaul, et al. ARVLSI'99, SPAA'99, ISCA'00

Ultrascalar: concept model
[Figure: Ultrascalar concept model]

Consider Machine
• Each FU has a full RF
  – FU = Functional Unit
  – RF = Register File
• Build network between FUs
  – Use network to connect producers to consumers
  – Use register names to configure interconnect
• Signal data ready along network

Ultrascalar Concept
• Linear delay
• O(1) register cost / FU
• Complete renaming at each FU
  – Different set of registers
  – So "complete RF at each FU" means only the logical registers

Ultrascalar: cyclic prefix
[Figure: Ultrascalar cyclic prefix]
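The forwarding idea reduces to a prefix with an associative "later write wins" merge: instruction i should read each source from the most recent earlier writer of that register. This is a software sketch of that prefix only, and it makes three assumptions. Instructions are given as hypothetical (dest, srcs) pairs. The prefix is linear over a straight-line window, whereas the Ultrascalar slides describe a cyclic prefix over the hardware's instruction window. Data-ready signalling is not modelled.

```python
from itertools import accumulate


def combine(earlier: dict, later: dict) -> dict:
    """Merge two segments' 'last writer' maps; the later segment's writes win.

    Override-merge is associative, so a prefix network may evaluate it.
    """
    merged = dict(earlier)
    merged.update(later)
    return merged


def forwarding_sources(program, registers):
    """For each instruction (dest, srcs), name which earlier instruction
    produces each source register, or 'RF' for the committed register file."""
    base = {r: "RF" for r in registers}
    writes = [{dest: i} for i, (dest, _) in enumerate(program)]
    # views[i] = register-to-producer map seen by instruction i (exclusive prefix).
    views = list(accumulate(writes, combine, initial=base))
    return [{r: views[i][r] for r in srcs} for i, (_, srcs) in enumerate(program)]


program = [("r1", []), ("r2", ["r1"]), ("r1", ["r1", "r2"]), ("r3", ["r1", "r3"])]
for i, sources in enumerate(forwarding_sources(program, ["r1", "r2", "r3"])):
    print(i, sources)
```

Here the fourth instruction reads r1 from instruction 2, the most recent writer, and r3 from the register file. `accumulate` runs sequentially, but because `combine` is associative the same views could come from a log-depth prefix tree.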