Lava 4 (relevant to the take-home exam)
  Stepping back to see the bigger picture
  Where can more information be found?
  What are the hot research topics?
Prefix
  Given inputs x1, x2, x3, …, xn
  Compute x1, x1*x2, x1*x2*x3, …, x1*x2*…*xn
  where * is an arbitrary associative (but not necessarily commutative) operator
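As a reference point, the definition above coincides with the standard Prelude function scanl1. The following is a minimal sketch of a specification to compare circuit descriptions against; the name prefixSpec is an assumption made for this example and does not come from Lava.

```haskell
-- Reference specification of prefix: output i combines inputs 1..i with op.
-- prefixSpec is a hypothetical name; scanl1 is from the standard Prelude.
prefixSpec :: (a -> a -> a) -> [a] -> [a]
prefixSpec = scanl1

-- Addition is associative and commutative.
sums :: [Int]
sums = prefixSpec (+) [1 .. 10]

-- String concatenation is associative but not commutative,
-- so it exposes any network that combines operands in the wrong order.
strings :: [String]
strings = prefixSpec (++) ["a", "b", "c"]
```

Testing a network with a non-commutative operator such as (++) is a cheap way to check that the operand order is preserved.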
Why interesting?
  Microprocessors contain LOTS of parallel prefix circuits:
    not only binary and FP adders
    address calculation
    priority encoding
    etc.
  Overall performance depends on making them fast
  But they should also have low power consumption...
  Parallel prefix is a good example of a connection pattern for which it is interesting to do better synthesis
Serial prefix
  [Figure: serial prefix network; inputs n=8, depth d=7, size s=7 (number of ops); wires labelled from least to most significant]
  Pictures generated by symbolic evaluation of Lava descriptions
  Style is specific to parallel prefix
serr _  [a]      = [a]
serr op (a:b:bs) = a:cs
  where
    c  = op(a,b)
    cs = serr op (c:bs)

*Main> simulate (serr plus) [1..10]
[1,3,6,10,15,21,28,36,45,55]
Sklansky
  32 inputs, depth 5, 80 operators
skl _  [a] = [a]
skl op as  = init los ++ ros'
  where
    (los,ros) = (skl op las, skl op ras)
    ros'      = fan op (last los : ros)
    (las,ras) = halveList as
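The Sklansky description can be tried outside Lava on ordinary lists. The sketch below is an assumption-laden stand-in: halveList and fan are hypothetical list versions of the helpers used on the slide, the operator is curried rather than tupled, and an empty-list case has been added. sklSize and depthOp are names invented here to count operators and measure depth.

```haskell
-- Hypothetical stand-in: split a list into two halves.
halveList :: [a] -> ([a], [a])
halveList xs = splitAt (length xs `div` 2) xs

-- Hypothetical stand-in: combine the first wire with every other wire.
fan :: (a -> a -> a) -> [a] -> [a]
fan op (x : ys) = x : map (op x) ys
fan _ [] = []

-- Sklansky pattern on plain lists (curried operator, extra empty case).
skl :: (a -> a -> a) -> [a] -> [a]
skl _ [] = []
skl _ [a] = [a]
skl op xs = init los ++ ros'
  where
    (las, ras) = halveList xs
    (los, ros) = (skl op las, skl op ras)
    ros' = fan op (last los : ros)

-- Number of operators, following the same recursion:
-- two half-size networks plus one operator per wire in the right half.
sklSize :: Int -> Int
sklSize n
  | n <= 1 = 0
  | otherwise = sklSize l + sklSize r + r
  where
    l = n `div` 2
    r = n - l

-- Running the network on delays instead of values gives the depth per output.
depthOp :: Int -> Int -> Int
depthOp a b = 1 + max a b

sklDepth :: Int -> Int
sklDepth n = maximum (skl depthOp (replicate n 0))
```

For 32 inputs, sklSize and sklDepth reproduce the figures quoted on the slide (80 operators, depth 5).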
Brent-Kung
  Fewer ops, at the cost of being deeper
  Fanout only 2
BK recursive pattern
  P is another half-size network operating on only the thick wires
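The recursive pattern can be sketched on plain lists: pair up neighbouring wires, run a half-size network P on the combined ("thick") wires, then fill in the remaining outputs with one operator each. This is an illustration under assumptions, not the Lava description from the lecture; bk, pairUp and interleave are names chosen for this example.

```haskell
-- Split a list into its even-indexed and odd-indexed elements (0-based).
pairUp :: [a] -> ([a], [a])
pairUp (a : b : rest) = let (es, os) = pairUp rest in (a : es, b : os)
pairUp rest = (rest, [])

-- Alternate elements of two lists, keeping whatever is left over.
interleave :: [a] -> [a] -> [a]
interleave (p : ps) (f : fs) = p : f : interleave ps fs
interleave ps [] = ps
interleave [] fs = fs

-- Brent-Kung style recursion on lists.
bk :: (a -> a -> a) -> [a] -> [a]
bk _ [] = []
bk _ [a] = [a]
bk op xs = head xs : interleave ps filled
  where
    (evens, odds) = pairUp xs
    thick = zipWith op evens odds        -- top layer: one op per pair
    ps = bk op thick                     -- P: half-size network on the thick wires
    filled = zipWith op ps (tail evens)  -- bottom layer: fanout 2 from each P output
```

Each output of P feeds at most one operator in the bottom layer plus its own output wire, which is where the fanout of 2 comes from.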
Ladner-Fischer
  NOT the same as Sklansky; many books and papers are wrong about this (including slides from the Digital Circuit Design course)
Question
  How do we design fast, low-power prefix networks?
Answer
  Generalise the above recursive constructions
  Use dynamic programming to search for a good solution
  Use Wired to increase the accuracy of power and delay estimates (see later lecture by Emil)
BK recursive pattern
  P is another half-size network operating on only the thick wires
  This is an alternative view to the "forwards and backwards trees" that some of you saw in Jeppson's course
BK recursive pattern generalised
  Each S is a serial network like that shown earlier
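One possible reading of the generalised pattern, sketched on plain lists: the inputs are cut into groups, each group gets a serial network S, the last output of every group goes through the inner network P, and each P output is then fanned across the next group. The names genPrefix and chunks are hypothetical, and the sketch assumes every group width is at least 1 and that the widths sum to the number of inputs.

```haskell
-- Cut a list into consecutive groups of the given widths.
chunks :: [Int] -> [a] -> [[a]]
chunks [] _ = []
chunks (d : ds) xs = let (g, rest) = splitAt d xs in g : chunks ds rest

-- Generalised pattern: widths of the S networks, the inner network P,
-- the operator, and the inputs.
-- Assumes all widths >= 1 and sum widths == length inputs.
genPrefix :: [Int] -> ((a -> a -> a) -> [a] -> [a]) -> (a -> a -> a) -> [a] -> [a]
genPrefix ds inner op xs = concat (zipWith3 finish (Nothing : map Just ps) ss ps)
  where
    ss = map (scanl1 op) (chunks ds xs)   -- one serial network S per group
    ps = inner op (map last ss)           -- P on the last wire of each group
    -- The previous P output (if any) is fanned over the group's other wires;
    -- the group's last wire is taken directly from P.
    finish carry s p = maybe id (\c -> map (op c)) carry (init s) ++ [p]
```

With all widths equal to 2 and genPrefix itself used recursively as the inner network, this would give a Brent-Kung style network; here scanl1 can serve as a placeholder for P when experimenting.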
4 2 3 … 4
  This sequence of numbers determines how the outer "layer" looks
4 2 3 … 4
  The sequence for the widths of the fans at the bottom is closely related: subtract 1 from the first entry and add 1 to the last
  3 2 3 … 5
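The relation between the two sequences can be written as a small function. This is a reading of the "-1 +1" annotation on the slide rather than code from the lecture; fanWidths is a hypothetical name.

```haskell
-- Widths at the bottom layer from the widths at the top layer:
-- the first entry shrinks by one and the last grows by one,
-- so the total number of wires is unchanged.
fanWidths :: [Int] -> [Int]
fanWidths [] = []
fanWidths [d] = [d]
fanWidths (d : ds) = (d - 1) : init ds ++ [last ds + 1]
```

For the top sequence 4 2 3 4 this gives 3 2 3 5, matching the slide.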
4 2 3 … 4
  So just look at all possibilities for this sequence, and for each one find the best possibility for the smaller P
  Then pick the best overall!
  Dynamic programming
Search!
  Need a measure function (e.g. number of operators)
  Very similar to a "shortest paths" algorithm
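The search idea can be illustrated with a deliberately simplified sketch. It is not the lecture's code: it assumes all inputs arrive at the same time (so its depth estimate is pessimistic compared with tracking per-wire delays), it uses the number of operators as the only measure, and it memoises through a lazily built table. The names compositions and minOps are hypothetical.

```haskell
import Data.Maybe (catMaybes)

-- All ways to write n as an ordered sum of parts in 1..g
-- (candidate width sequences for the top layer).
compositions :: Int -> Int -> [[Int]]
compositions _ 0 = [[]]
compositions g n = [d : ds | d <- [1 .. min g n], ds <- compositions g (n - d)]

-- Fewest operators for n inputs within depth h, with group width at most g,
-- or Nothing if this simplified model finds no fit.
minOps :: Int -> Int -> Int -> Maybe Int
minOps g n h = look n h
  where
    -- Lazy table: each entry is computed at most once (memoisation).
    table = [[go m d | d <- [0 .. h]] | m <- [0 .. n]]
    look m d
      | d < 0 = Nothing
      | otherwise = table !! m !! d
    go m d
      | m <= 1 = Just 0
      | 2 ^ d < m = Nothing            -- cannot possibly fit in this depth
      | otherwise = best (catMaybes (map (candidate m d) (compositions g m)))
    candidate m d ds
      | k == 1 = if m - 1 <= d then Just (m - 1) else Nothing  -- one serial network
      | k == m = Nothing               -- all widths 1: P would be the same problem
      | otherwise =
          -- ops in the S networks + ops in P + ops in the bottom fans
          fmap (\p -> (m - k) + p + (m - k - (head ds - 1))) (look k (d - maximum ds))
      where
        k = length ds
    best [] = Nothing
    best xs = Just (minimum xs)
```

With width 2 and 8 inputs, this model finds 11 operators at depth 5 and nothing shallower; the fact that a Brent-Kung network for 8 inputs actually has depth 4 shows how much is lost by ignoring per-wire delays.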
The real code!
-- f1:  the measure function being optimised for
-- g:   max width of the small S and F networks; controls fanout
-- ctx: the context (is,xs,w) = (delays in, wire numbers (positions) in, allowed depth)
-- (the final slide of this sequence names the function parpre instead of wsoE)
wsoE f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    -- use memoisation to avoid expensive recomputation
    prefix f = memo pm
      where
        -- base case: single wire
        pm ([d],_,w) = trywire ([d],w)
        -- Fail if it is simply impossible to fit a prefix network in the available depth
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        -- For each candidate sequence: build the resulting network
        -- (where the call of (prefix f) gives the best network for the recursive call inside).
        -- (Needed to think hard about controlling the size of the search space.)
        -- Finally, pick the best among all these candidates.
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC ds (prefix f) | ds <- topds g h (length is)]
          where . . . .
Result when minimising number of ops
  Depth 6, 33 inputs, fanout 7
  This network is Depth Size Optimal (DSO):
    depth + number of ops = 2(number of inputs) - 2
    (known to be the smallest possible number of ops for a given depth and number of inputs)
    6 + 58 = 2*33 - 2
64 inputs, depth 8, size 118 (also DSO)
  BUT not min. depth
  We need to move away from DSO if we want shallow networks
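The DSO condition quoted above is easy to check mechanically. The helper names dsoSlack and isDSO are made up for this sketch; the figures fed to them are the ones given on the slides.

```haskell
-- How far a network is above the bound depth + size = 2n - 2.
-- Zero slack means the network is Depth Size Optimal.
dsoSlack :: Int -> Int -> Int -> Int
dsoSlack n depth size = depth + size - (2 * n - 2)

isDSO :: Int -> Int -> Int -> Bool
isDSO n depth size = dsoSlack n depth size == 0

-- Figures taken from the slides: (inputs, depth, size).
examples :: [(String, Bool)]
examples =
  [ ("33 inputs, depth 6, 58 ops", isDSO 33 6 58)
  , ("64 inputs, depth 8, 118 ops", isDSO 64 8 118)
  , ("serial, 8 inputs, depth 7, 7 ops", isDSO 8 7 7)
  , ("Sklansky, 32 inputs, depth 5, 80 ops", isDSO 32 5 80)
  ]
```

The first three entries satisfy the equation; the Sklansky network does not, which illustrates the trade-off: it reaches minimum depth by spending many more operators than the bound requires.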
A further generalisation
parpre1 f1 f2 g m ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        -- extra base case for 0 inputs
        pm ([],_,w)  = trywire ([],w)
        pm ([i],_,w) = trywire ([i],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        -- now there are 2 recursive calls
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC1 ds (prefix f) (prefix f2) | ds <- topds1 g h m lis]
Result
  When minimising no. of ops:
    gives the same as Ladner-Fischer for 2^n inputs, depth n
    considerably fewer ops and lower fanout elsewhere (non power of 2, deeper)
  Translates into low power plus decent speed when exported to Design Compiler
Link to Wired allows more accurate estimates
  Can then explore the design space
Can also export to Cadence SoC Encounter
Wired
  Start with a Lava-like description and then gradually add placement info. + wiring "guides"
  Can still use our bag of programming tricks (still embedded in Haskell)
  Quick but relatively accurate design exploration
  See lecture by Emil on Thursday
Obvious questions
  This is very low level. What about higher up, earlier in the design?
  (Tentative assertion: these were general programming idioms with possible application at other levels of abstraction.)
  What about the cases when such a structural approach is inappropriate?
  Can we make refinement work?
  Can we design appropriate GENERIC verification methods?