Synthesis and optimization of domino logic


Min Zhao and Sachin Sapatnekar
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455

Outline
• Introduction to domino logic
• Domino logic synthesis flow
• Technology mapping of domino logic
• Timing-driven static-domino partitioning

Basics of domino logic
[Figure: domino gate with clock input clk, inputs x, y, z, dynamic node d and output out; timing diagram of the precharge and evaluation phases with timing labels $T_{c,f}$, $T_{c,r}$ and $T_{c,f} + P$]

Advantages of domino logic
• Speed advantages
  – Reduced fighting (contention) during transitions
  – Fewer transistors per gate, hence a lower capacitive load
• Area advantages
  – Consists mainly of NMOS transistors
  – N + 4 transistors per gate instead of 2N (for a gate with N inputs)
• Therefore, domino logic is widely used in high-performance circuit design.
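The N + 4 versus 2N comparison is simple arithmetic. The sketch below assumes, as the node cost functions later in the deck state, that the 4 extra transistors are the two clock-control transistors (taken here to be the precharge PMOS and the evaluation NMOS) plus the two-transistor output inverter. It ignores the fact that a static CMOS gate is inverting, so a non-inverting static function would need an additional inverter.

```python
def domino_transistors(n: int) -> int:
    """Transistor count of a domino gate whose pull-down has n NMOS transistors.

    n pull-down NMOS + 1 precharge PMOS + 1 evaluation (foot) NMOS
    + 2 transistors in the output inverter.
    """
    return n + 4


def static_cmos_transistors(n: int) -> int:
    """Static CMOS gate with n inputs: n NMOS (pull-down) + n PMOS (pull-up)."""
    return 2 * n


for n in range(2, 11, 2):
    d, s = domino_transistors(n), static_cmos_transistors(n)
    print(f"N={n:2d}  domino={d:2d}  static={s:2d}")
# The counts are equal at N = 4; domino uses fewer transistors for N > 4.
```

The area advantage therefore depends on gate size, which is one reason the mapper favors large pull-down networks.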

Disadvantages of domino logic
• Disadvantages
  – Non-inverting nature may require logic duplication
  – Strict timing constraints
  – Charge sharing and noise susceptibility
  – High clock-routing overhead
• Automated techniques that address these issues are needed for domino circuit design.

Domino logic synthesis flow
• Inputs: logic description (BLIF, Verilog) and timing constraints
• Technology-independent optimization
• Clocking strategy
• Partitioning: static-domino, and between clock phases
• Parameterized-library technology mapping
• Timing verification and optimization
• Library layout synthesizer
• Noise verification and optimization
• Physical design

Technology mapping of domino logic

What is technology mapping?
• Implementing the input network with gates from a library.
[Figure: example network, labels a through h]

Parameterized library
• Domino gates can use large NMOS pull-down networks:
  – Short-circuit current is small, and the driven load is small.
  – There is no complementary (pull-up) part.
  – In small gates, the delay overhead of the output inverter may offset the advantage of fast switching.
• The number of gates in the library increases dramatically with the gate's length s and width p:
  – (s, p) = (3, 6): 6,877 gates; (4, 4): 3,503 gates; (4, 6): 222,943 gates
• A parameterized library is therefore used for technology mapping of domino logic.

Problem definition
• A parameterized library
  – A collection of gates whose pull-down (pull-up) implementation satisfies the constraints on width and height
  – Cell layouts are produced on the fly
• Technology mapping of domino logic
  – Given: an optimized Boolean network, and a constraint on the width and height of domino gates
  – Find: a minimum-cost implementation of the network's nodes in domino logic

General technology mapping algorithm
• A dynamic programming algorithm is applied.
• At each network node:
  – pattern matching
  – cost calculation for each possible match
• The computational cost becomes large when the library is large.

Parameterized library mapping algorithm
• Starting point
  – An arbitrarily optimized network is given.
  – It is first made unate,
  – then mapped into a two-input AND-OR DAG,
  – and the DAG is then decomposed into trees.
• Complexity
  – Space: $O(WHN)$
  – Time: $O(W^2 H^2 N)$
  – W: maximum number of parallel chains; H: maximum number of series transistors; N: number of nodes in the tree
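The slides do not say how the network is made unate. One common approach, sketched here as an assumption, pushes every inversion down to the inputs with De Morgan's laws, treats each complemented input x' as a separate primary input, and splits n-input AND/OR nodes into chains of 2-input nodes. The tuple expression format and the name `unate_binary` are hypothetical, and the sketch works on trees; in a shared DAG, a node needed in both polarities would have to be duplicated, which is the logic duplication listed among the disadvantages.

```python
from typing import Tuple, Union

# Hypothetical expression format: a variable name (str), or a tuple
# ("NOT", e), ("AND", e1, e2, ...), ("OR", e1, e2, ...).
Expr = Union[str, Tuple]


def unate_binary(e: Expr, negate: bool = False) -> Expr:
    """Push inversions down to the inputs (De Morgan) and build 2-input AND/OR nodes.

    A complemented input x is returned as the separate literal "x'", so the
    result contains only AND/OR nodes over literals, i.e. it is unate.
    """
    if isinstance(e, str):
        return e + "'" if negate else e
    op, *args = e
    if op == "NOT":
        return unate_binary(args[0], not negate)
    if negate:  # De Morgan: NOT(AND(...)) = OR(NOT ...), and vice versa
        op = "OR" if op == "AND" else "AND"
    children = [unate_binary(a, negate) for a in args]
    # Decompose an n-input node into a left-deep chain of 2-input nodes.
    tree = children[0]
    for child in children[1:]:
        tree = (op, tree, child)
    return tree


print(unate_binary(("NOT", ("AND", "a", ("OR", "b", ("NOT", "c"))))))
# ('OR', "a'", ('AND', "b'", 'c'))
```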

Subsolutions
[Figure: subsolution space at each node, a grid of {S, P} entries bounded by H and W, with an example entry S = 2 ≤ H, P = 3 ≤ W]
• Each stored subsolution is optimal for its subtree under the specified constraints.
• Physically,
  – $\{S,P\}$ (with $S, P \ge 1$, other than $\{1,1\}$) represents a segment of a domino pull-down network whose height and width are S and P;
  – $\{1,1\}$ represents a complete domino gate or a PI.

Basic operations
• OR operation: $S = \max(S_l, S_r)$, $P = P_l + P_r$
• AND operation: $S = S_l + S_r$, $P = \max(P_l, P_r)$
• PI / gate formation operation: $S = 1$, $P = 1$
  – A gate formation operation converts the structure collected so far into a domino gate whose output is at that network node.
[Figure: two PIs combined by an AND operation, followed by gate formation with clock transistors]
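Applied recursively, the AND and OR operations give the height S and width P of the pull-down network when a whole unate AND/OR tree is implemented as a single domino gate. The sketch below assumes a hypothetical tuple tree format; `shape` and `fits` are illustrative names, not part of the original tool.

```python
from typing import Tuple, Union

# Hypothetical unate tree: a literal name (str) or ("AND" | "OR", left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def shape(t: Tree) -> Tuple[int, int]:
    """Return (S, P): height (series transistors) and width (parallel chains)
    of the pull-down network that implements t as a single domino gate."""
    if isinstance(t, str):
        return 1, 1  # a literal is one NMOS transistor
    op, left, right = t
    (sl, pl), (sr, pr) = shape(left), shape(right)
    if op == "AND":  # series connection
        return sl + sr, max(pl, pr)
    if op == "OR":   # parallel connection
        return max(sl, sr), pl + pr
    raise ValueError(f"unknown operator {op!r}")


def fits(t: Tree, H: int, W: int) -> bool:
    """Check the height/width constraint of the parameterized library."""
    s, p = shape(t)
    return s <= H and p <= W


print(shape(("OR", ("AND", "a", "b"), "c")))  # (2, 2)
```

A tree that does not fit must be split by gate formation, which resets the shape to (1, 1) at the chosen node.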

Node data structure
• Stores the optimal subsolution for every [height, width] combination from [1,1] to [H,W].
• Each optimal subsolution is represented as $\{S, P, C, \{S_l, P_l\}, \{S_r, P_r\}\}$:
  – S ($1 \le S \le H$) is the maximum height of the current solution.
  – P ($1 \le P \le W$) is the maximum width of the current solution.
  – C is the cost.
  – $\{S_l, P_l\}$ and $\{S_r, P_r\}$ are the subsolutions of the left and right children whose combination gives the minimum cost for $\{S,P\}$.

Node data calculations
• An $\{S,P\}$ subsolution (other than $\{1,1\}$) at a parent node is obtained by combining optimal subsolutions at the child nodes.
• The $\{1,1\}$ subsolution at a node is obtained from the minimum-cost subsolution of the same node.
• The procedure consists of
  – node constraint functions
  – node cost functions

Node cost functions
• Here, cost is area, measured as the number of transistors.
• Literal operation: $C = C + 1$
  – A literal operation corresponds to a primary input, or to starting a new domino structure after a gate formation operation.
• OR/AND operation: $C = \mathrm{Literal}(C_l) + \mathrm{Literal}(C_r)$
• Gate formation operation: $C = C_{\min} + 4$
  – $C_{\min}$ is the minimum cost among all $H \times W$ optimal subsolutions.
  – The 4 accounts for the two clock-control transistors and the output inverter.

Node mapping algorithm (all minima start at infinity)

    for each valid [height, width] subsolution {S_l, P_l} of the left child:
        for each valid [height, width] subsolution {S_r, P_r} of the right child:
            {S, P} = NodeConstraintFunctions({S_l, P_l}, {S_r, P_r})
            if {S, P} is within the constraints (H, W):
                C = NodeCostFunctions(C_l, C_r)
                if C < C_min[S, P]: C_min[S, P] = C
                if C < C_min: C_min = C
    C[1, 1] = GateFormation(C_min)
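The node mapping algorithm can be sketched as a bottom-up dynamic program over a tree of 2-input AND/OR nodes. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` class and function names are hypothetical, and `Literal()` is assumed to add one transistor exactly when a child contributes its {1,1} subsolution (a PI or a completed gate), because that input needs a new NMOS, while other subsolutions already count their transistors. Under this assumption, a two-input domino AND costs 2 + 4 = 6 transistors.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]                               # (S, P) = (height, width)
Entry = Tuple[int, Optional[Key], Optional[Key]]    # (C, {S_l,P_l}, {S_r,P_r})

GATE_OVERHEAD = 4  # two clock-control transistors + output inverter


@dataclass
class Node:
    op: str                                  # "PI", "AND" or "OR"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    subs: Dict[Key, Entry] = field(default_factory=dict)  # optimal subsolutions


def constraint(op: str, kl: Key, kr: Key) -> Key:
    """Node constraint function: AND stacks in series, OR in parallel."""
    (sl, pl), (sr, pr) = kl, kr
    if op == "AND":
        return sl + sr, max(pl, pr)
    return max(sl, sr), pl + pr


def literal(key: Key, cost: int) -> int:
    """Assumed Literal(): a {1,1} child (PI or finished gate) needs one new NMOS."""
    return cost + 1 if key == (1, 1) else cost


def map_node(node: Node, H: int, W: int) -> None:
    """Fill node.subs bottom-up for a tree of 2-input AND/OR nodes."""
    if node.op == "PI":
        node.subs = {(1, 1): (0, None, None)}
        return
    map_node(node.left, H, W)
    map_node(node.right, H, W)
    best: Dict[Key, Entry] = {}
    for kl, (cl, _, _) in node.left.subs.items():
        for kr, (cr, _, _) in node.right.subs.items():
            s, p = constraint(node.op, kl, kr)
            if s > H or p > W:
                continue                      # outside the (H, W) constraint
            c = literal(kl, cl) + literal(kr, cr)
            if (s, p) not in best or c < best[(s, p)][0]:
                best[(s, p)] = (c, kl, kr)    # keep back-pointers to children
    c_min = min(c for c, _, _ in best.values())
    best[(1, 1)] = (c_min + GATE_OVERHEAD, None, None)  # gate formation
    node.subs = best


def map_tree(root: Node, H: int, W: int) -> int:
    """Minimum transistor count of the tree implemented in domino gates."""
    if H < 2 or W < 2:
        raise ValueError("need H >= 2 and W >= 2 so every 2-input node fits")
    map_node(root, H, W)
    return root.subs[(1, 1)][0]
```

The root's {1,1} entry is the answer because the root must end in a complete gate. Each table holds at most H·W entries and each node loops over pairs of child entries, which matches the $O(W^2 H^2 N)$ time bound on the earlier slide.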

An example
• Of all {S, P} mapping subsolutions for the children, only those with minimal cost are stored.
[Figure: worked example of {S, P, C} subsolutions at a PI, an OR node and an AND node, computed with:
  AND node: $C = C_l + C_r$, $S = S_l + S_r$, $P = \max(P_l, P_r)$
  OR node: $C = C_l + C_r$, $S = \max(S_l, S_r)$, $P = P_l + P_r$
  Gate formation: $C = C_{\min} + 4$, $S = 1$, $P = 1$
 At the AND node, subsolution {4, 3} with cost 8 combines child subsolutions {2, 2} (cost 3) and {2, 3} (cost 5); with $C_{\min} = 8$, gate formation gives {1, 1} with cost 12.]

Wide domino gate
• A NAND or NOR gate can replace the output inverter.
  – This breaks up large stacks of series transistors into parallel chains.
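Why a NAND or NOR can replace the inverter follows from the fact that, after evaluation, each dynamic node holds the complement of its pull-down function. The truth-table check below models only these static logic values (no timing). A NOR of two dynamic nodes yields the AND of the two pull-down functions, and a NAND yields their OR, so a wide gate can presumably allow roughly twice the height (NOR) or twice the width (NAND) of a single pull-down. Function names are hypothetical.

```python
from itertools import product


def dynamic_node(f: bool) -> bool:
    """After evaluation, the dynamic node is the complement of the pull-down function."""
    return not f


def inverter_output(f1: bool) -> bool:
    return not dynamic_node(f1)                          # standard domino: out = f1


def nor_output(f1: bool, f2: bool) -> bool:
    return not (dynamic_node(f1) or dynamic_node(f2))    # out = f1 AND f2


def nand_output(f1: bool, f2: bool) -> bool:
    return not (dynamic_node(f1) and dynamic_node(f2))   # out = f1 OR f2


for f1, f2 in product([False, True], repeat=2):
    assert inverter_output(f1) == f1
    assert nor_output(f1, f2) == (f1 and f2)
    assert nand_output(f1, f2) == (f1 or f2)
print("NOR output stage = wide AND, NAND output stage = wide OR")
```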

Wide AND/OR domino gate mapping
• An enlarged subsolution space is used.
[Figure: enlarged subsolution space with height up to 2H and width up to 2W, divided into regions a, b and c]
• Region a: standard domino gate mapping
• Region b: wide AND domino gate mapping
• Region c: wide OR domino gate mapping

Dual-monotonic gate
• A common dual-monotonic XOR gate.
[Figure: clocked dual-monotonic gate producing O = a XOR b and O = a XNOR b from the true and complemented inputs of a and b]
• The presence of an XOR/XNOR function decomposes the input network into small mapping trees, which increases the area and delay cost.
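The gate in the figure is not recoverable from the text, so the sketch below only shows the Boolean functions such a dual-monotonic gate typically implements. With both polarities of each input available, XOR and XNOR can each be written as an AND/OR (sum-of-products) expression that is monotonic in the four input signals, which is what allows each output to be built as a domino pull-down network. The function name is hypothetical.

```python
from itertools import product


def dual_monotonic_xor(a: bool, a_n: bool, b: bool, b_n: bool):
    """Return (xor, xnor) using only AND/OR of true and complemented inputs.

    Both outputs are monotonic (non-inverting) in the four input signals,
    so each can be built as a domino pull-down network.
    """
    xor = (a and b_n) or (a_n and b)
    xnor = (a and b) or (a_n and b_n)
    return xor, xnor


# With consistent complements (a_n = not a, b_n = not b) the outputs are XOR/XNOR.
for a, b in product([False, True], repeat=2):
    x, xn = dual_monotonic_xor(a, not a, b, not b)
    assert x == (a != b) and xn == (a == b)
```

Because the gate both consumes and produces complementary pairs, it fits into an otherwise non-inverting domino network without extra inverters.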

Dual-monotonic gate mapping
• Recognize the XOR/XNOR logic in the network by pattern matching.
• Perform technology mapping on the resulting AND/OR/XOR/XNOR subject network, mapping AND/OR nodes to standard domino gates and XOR/XNOR nodes to dual-monotonic gates.
• Permitted mapping scheme
[Figure: permitted connections between XOR/XNOR nodes, AND/OR nodes and other nodes]

Implementation and results (1)
• Execution time: under 10 seconds
• Comparison with another domino mapper (#trans/#levels)

  Circuit   Our approach   Prasad et al.   Reduction (%)
  c8        289/6          328/7           13.5
  i6        890/2          890/3           0
  C880      1056/9         1499/7          42.0

• Comparison of various mapping methods (#trans/#levels)

  Circuit   Basic mapping   Wide AND/OR gate   Dual-monotonic gate
  C1355     1824/9          1824/9             1360/7
  C1908     1978/18         1965/18            1588/14
  k2        2884/16         2738/15            2884/16
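The slides state that XOR/XNOR logic is recognized by pattern matching but do not give the patterns. As an illustration only, the sketch below matches the two-level form $pq + \bar{p}\bar{q}$ over literals in a unate 2-input tree: it is XOR when exactly one of p, q is complemented and XNOR otherwise. The tree format (complemented literals written as "x'") and all names are hypothetical; a practical matcher would also need to handle multi-level and shared structures.

```python
from typing import Optional, Tuple, Union

# Hypothetical unate 2-input tree: literal "x" / "x'" or ("AND" | "OR", left, right).
Tree = Union[str, Tuple]


def is_neg(lit: str) -> bool:
    return lit.endswith("'")


def base(lit: str) -> str:
    return lit[:-1] if is_neg(lit) else lit


def complement(lit: str) -> str:
    return base(lit) if is_neg(lit) else lit + "'"


def match_xor(t: Tree) -> Optional[Tuple[str, str, str]]:
    """Recognize OR(AND(p, q), AND(p', q')) over literals.

    p q + p' q' is XOR(x, y) when exactly one of p, q is complemented
    (e.g. x y' + x' y), and XNOR(x, y) otherwise (e.g. x y + x' y').
    Returns (kind, x, y) or None if the pattern does not match.
    """
    if isinstance(t, str) or t[0] != "OR":
        return None
    left, right = t[1], t[2]
    if any(isinstance(n, str) or n[0] != "AND" for n in (left, right)):
        return None
    p, q = left[1], left[2]
    if not all(isinstance(z, str) for z in (p, q, right[1], right[2])):
        return None
    if base(p) == base(q):
        return None
    # The second product must complement both literals, in either order.
    if {right[1], right[2]} != {complement(p), complement(q)}:
        return None
    kind = "XOR" if is_neg(p) != is_neg(q) else "XNOR"
    return kind, base(p), base(q)


print(match_xor(("OR", ("AND", "a", "b'"), ("AND", "a'", "b"))))  # ('XOR', 'a', 'b')
```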

Experimental results
• Domino mapping vs. static mapping

  Circuit   Domino (#trans/#levels)   SIS, 44-3.genlib (#trans/#levels)   Reduction (%)   Dup-ratio (%)
  i6        761/3                     1194/5                              36.3            13
  C1355     1360/7                    1378/20                             1.3             77
  C3540     4002/20                   3140/34                             -27.5           92

Partitioning: Motivation
• Use domino gates to speed up parts of the circuit; the remainder is implemented in static CMOS.
• Domino logic is typically multiphase.
• General clocking strategy
[Figure: general clocking strategy with CLK: latches on ph1 and ph2 alternate, each followed by a domino chain and then static logic; one domino chain is precharged in ph1 and evaluated in ph2, the other is precharged in ph2 and evaluated in ph1]

Another consideration
• Observation: duplication cost can be reduced by proper partitioning.
• An example
[Figure: example network of nodes marked * and + with two alternative static/domino cuts, CUT A and CUT B]
• In addition to the partitioning cost, the implementation cost varies with the partition.

Problem definition
• Static-domino partitioning problem
  – Given: an optimized combinational circuit and the delay specification at the outputs of the network
  – Implement the nodes with domino and static logic
  – Minimize the cost while meeting the delay specifications
  – Satisfy the precedence constraint: no static logic gate may fan out to a domino gate
• Two-way domino partitioning
  – Partition the domino implementation into two clock phases, with inverters permitted between the phases.
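The precedence constraint can be checked edge by edge: a partition is legal only if no static gate drives a domino gate, consistent with the clocking strategy in which each domino chain precedes the static logic in its phase. The sketch below is a hypothetical checker, not the partitioning algorithm itself; the edge-list representation and the treatment of primary inputs and latch outputs as unconstrained are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def precedence_violations(edges: Iterable[Tuple[str, str]],
                          impl: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return every edge u -> v where a static gate u drives a domino gate v.

    impl maps each gate to "static" or "domino"; nodes missing from impl
    (e.g. primary inputs or latch outputs) are treated as unconstrained.
    """
    return {(u, v) for u, v in edges
            if impl.get(u) == "static" and impl.get(v) == "domino"}


edges = [("a", "g1"), ("g1", "g2"), ("g2", "g3")]
print(precedence_violations(edges, {"g1": "domino", "g2": "static", "g3": "domino"}))
# {('g2', 'g3')}
```

An empty result means every domino gate is fed only by inputs or other domino gates, so the domino part forms a fan-in-closed region at the start of the phase.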
