CS137: Electronic Design Automation
Day 12: October 28, 2005
Covering and Retiming
CALTECH CS137 Fall2005 -- DeHon

Previously
• Cover (map) LUTs for minimum delay
  – solve optimally
• Retiming for minimum clock period
  – solve optimally

Today
• Solving cover/retime separately not optimal
• Cover+retime

Example
[figure]

Example: Retimed
[figure]

Example
[figure]
Example: Retimed
[figure]
Note: only 4 signals here (2 w/ 2 delays each)

Example 2
[figure]

Example 2
[figure]

Example 2: retimed
[figure]
Cycle Bound: 2

Example 2: retimed
[figure]
Cycle Bound: 1

Basic Observation
• Registers break up circuit, limiting coverage
  – fragmentation
  – prevent grouping
Phase Ordering Problem
• General problem
  – don’t know effect/results of other mapping step
  – Will see this many places
• Here
  – don’t know delay (what can be packed into LUT) if retime first
  – If we do not retime first
    • fragmentation: forced breaks at bad places

Observation #1
• Retiming flops to input of (fanout free) subgraph is trivial (and always doable)

Observation #1: Consequence
• Can cover ignoring flop placement
• Then retime flops to input

Fanout Problem?
[figure]
Can I use the same trick here?

Fanout Problem?
[figure]
Cannot retime without replicating.
Replicating increases I/O (so cut size).

Different Replication Problem
[figure]
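Observation #1 can be illustrated with a minimal sketch. It assumes a hypothetical representation in which a fanout-free cone is a dictionary mapping each node to its list of (input, registers on that edge) pairs; the function name and the sample cone are illustrative and not taken from the slides. Because every node has exactly one path to the root, pushing all flops to the inputs just means each input ends up with the register count of its path.

```python
def retime_to_inputs(tree, root):
    """Return {input: registers on its edge} after pushing all flops to the cone inputs.

    tree maps node -> [(child, registers_on_edge), ...]; nodes without an
    entry are treated as inputs of the cone (assumed representation).
    """
    leaf_regs = {}
    stack = [(root, 0)]                      # (node, registers between node and root)
    while stack:
        node, regs_to_root = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaf_regs[node] = regs_to_root   # whole path count lands on the input
        for child, w in children:
            stack.append((child, regs_to_root + w))
    return leaf_regs


# Hypothetical cone: c = f(a, b), a = g(i, j); one flop on b->c and one on j->a.
cone = {"c": [("a", 0), ("b", 1)], "a": [("i", 0), ("j", 1)]}
print(retime_to_inputs(cone, "c"))
```

The sketch relies on the cone being fanout free; with fanout a node could have paths of different register counts to the root, which is the situation the following slides address.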
Different Replication Problem
[figure]

Different Replication Problem
[figure]
Can now retime and cover with single LUT.

Replication
• Once add registers
  – can’t just grab max flow and get replication
    • (compare flowmap)
• Or, can’t just ignore flop placement when have reconvergent fanout through flop

Replication
• Key idea:
  – represent timing paths in graph
  – differentiating based on number of registers in path
  – new graph: all paths from node to output have same number of flip-flops
  – label nodes u^d where d is flip-flops to output

Deal with Replication
• Expanded Graph:
  – start with target output node
  – for each input u to current expanded graph
    • grab its input edge (x → u) with weight (w(e))
    • add node x^(d+w(e)) to graph (if necessary)
    • add edge x^(d+w(e)) → u^d with weight (w(e))
  – continue breadth first until have enough
    • enough for flow cut
    • at most |E| = k × n node depth required

Example
[figure: original graph with nodes a, b, c, i, j; expanded graph starts at c^0]
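The expanded-graph construction from "Deal with Replication" can be sketched as follows. This is a minimal sketch under stated assumptions: `fanin`, `expand`, and `max_levels` are hypothetical names, expanded nodes u^d are stored as `(name, d)` tuples, and the small graph is chosen only to produce labels of the kind seen in the example slides (c^0, a^0, b^1, c^1); it is not a transcription of the figure.

```python
from collections import deque


def expand(fanin, target, max_levels):
    """Breadth-first expansion from target^0.

    fanin maps u -> [(x, w), ...] for each edge x -> u carrying w registers.
    Returns (nodes, edges): nodes are (name, d) pairs, edges are
    ((x, d + w), (u, d), w) triples.
    """
    root = (target, 0)
    nodes = {root}
    edges = []
    frontier = deque([(root, 0)])            # (expanded node, BFS level)
    while frontier:
        (u, d), level = frontier.popleft()
        if level == max_levels:              # depth limit supplied by the caller
            continue
        for x, w in fanin.get(u, []):
            xd = (x, d + w)                  # node x^(d+w(e))
            edges.append((xd, (u, d), w))    # edge x^(d+w(e)) -> u^d
            if xd not in nodes:              # "add node (if necessary)"
                nodes.add(xd)
                frontier.append((xd, level + 1))
    return nodes, edges


# Assumed graph: c reads a (0 registers) and b (1 register); a reads i, j; b reads c.
fanin = {"c": [("a", 0), ("b", 1)], "a": [("i", 0), ("j", 0)], "b": [("c", 0)]}
nodes, edges = expand(fanin, "c", 2)
print(sorted(nodes))
```

Keying nodes by `(name, d)` is what separates paths with different register counts: the same original node appears once per distinct flip-flop depth, so c shows up as both c^0 and c^1 here.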
Example
[figure: original graph (a, b, c, i, j) and expanded graph so far: c^0; a^0, b^1]

Example
[figure: expanded graph grows to c^0; a^0, b^1; i^0, j^0, c^1]

Example
[figure: expanded graph grows further; labels include a^1, b^2, i^1, j^1]

Example 2
[figure: graph with nodes a, b, c, d, e; expanded-graph labels include e^0; c^0, d^0; a^0, a^1, b^0, b^1]

Expanded Graph
• Expanded graph does not have fanout of different flip-flop depths from the same node.
• Can now cover ignoring flip-flops and trivially retime.

Labeling
• Key idea #1:
  – compute distances/delay like flowmap
    • dynamic programming
• Key idea #2:
  – count distance from register
    • like G-1/c graph
Labeling: Edge Weights
• To target clock period c
  – use graph G-1/c
  – paper:
    • assign weight -c*w(e)+1
    • (same thing scaled by c and negated)

Labeling: Edge Weight Idea
• same idea:
  – will need register every c LUT delays
  – credit with registers as encounter
  – charge a fraction (1/c) every LUT delay
  – know net distance at each point
  – if negative (delays > c*registers)
    • cannot distribute to achieve c
  – otherwise
    • labeling tells where to distribute

Labeling: Flow cut
• Label node as before (flowmap)
  – L(v)=min{l(u)+w(e) | ∃ u → v}
  – trivially can be L(v)-1/c == new LUT
    • Correspond to flowmap case: L(v)+1
    • note min vs. max and -1/c vs. +1 due to rescaling to match retiming formulation and G-1/c graph
    • in this formulation, a combinational circuit of depth 4 would have L(v)=-4/c
  – if can put this and all L(v)’s in one LUT
    • this can be L(v)
    • construct and compute flow cut to test

LUT Map and Retime
• Start with outputs
• Cover with LUT based on cut
  – move flip-flops to inputs of LUT
• Recursively cover inputs
• Use label to retime
  – r(v) = ⌈l(v)⌉ + 1/c

Target Clock Period c
• As before (retiming)
  – binary search to find optimal c

Variations
• Relaxation/Iteration
  – original computed labels iteratively
• Flow cover
  – Cong+Wu/ICCAD96 showed can use flowmap-style min-cut
• Find all k-cuts first
  – Pan+Liu/FPGA’98
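The edge-weight idea and the binary search over c can be sketched together. This is a minimal sketch, not the full cover+retime labeling: it only tests the cycle condition (delays ≤ c × registers around every cycle), it assumes unit LUT delay charged once per edge, and it uses the integer weights c·w(e) − 1 (the G-1/c weights scaled by c, not negated) so that a negative cycle means c cannot be achieved. The names `cycles_ok` and `min_cycle_period` and the sample graph are hypothetical.

```python
def cycles_ok(nodes, edges, c):
    """True if no cycle has more unit LUT delays than c * registers.

    edges are (u, v, w) triples: edge u -> v carrying w registers.
    Bellman-Ford style relaxation with every node starting at distance 0.
    """
    dist = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            cand = dist[u] + c * w - 1       # credit c per register, charge 1 per LUT delay
            if cand < dist[v]:
                dist[v] = cand
                changed = True
        if not changed:
            return True                      # distances settled: no negative cycle
    return False                             # still relaxing after |V| passes


def min_cycle_period(nodes, edges):
    """Smallest integer c passing cycles_ok, or None if no c works."""
    lo, hi = 1, max(1, len(nodes))
    if not cycles_ok(nodes, edges, hi):
        return None                          # e.g. a loop with no registers
    while lo < hi:                           # binary search: feasibility is monotone in c
        mid = (lo + hi) // 2
        if cycles_ok(nodes, edges, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Assumed graph: loop c -> b -> c with one register, fed by a, which reads i and j.
nodes = ["a", "b", "c", "i", "j"]
edges = [("i", "a", 0), ("j", "a", 0), ("a", "c", 0), ("b", "c", 1), ("c", "b", 0)]
print(min_cycle_period(nodes, edges))
```

The loop in the sample graph has two LUT delays and one register, so c = 1 leaves a negative cycle while c = 2 does not. Scaling by c keeps all arithmetic in integers and avoids comparing floating-point multiples of 1/c.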
Summary
• Can optimally solve
  – LUT map for delay
  – retiming for minimum clock period
• But, solving separately does not give optimal solution to problem
• Account for registers on paths
• Label based on register placement and (flow) cover ignoring registers
• Labeling gives delay, covering, retiming

Admin
• Assignment 3A due today
  – Anyone looked at SAT programming?
• Class meets all next week (MWF)
  – Monday reading online
  – Wed. reading (hardcopy handout)
  – Fri. reading in email last night

Today’s Big Ideas
• Exploit freedom
• Cost of decomposition
  – benefit of composite solution
• Technique:
  – dynamic programming
  – network flow