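Sequential Timing Optimization

Long path timing constraints
• Data must not reach the destination FF too late
• For a combinational path from FF i to FF j with maximum delay $d_{\max}(i,j)$, clock arrival times $s_i$, $s_j$ and clock period $P$:
  $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
[Figure: timing diagram for FFs i and j showing $s_i$, $s_j$, $d(i,j)$ and $T_{\text{setup}}$]

Short path timing constraints
• A FF should not get more than one data set per clock period
• For the same path with minimum delay $d_{\min}(i,j)$:
  $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
[Figure: timing diagram for FFs i and j showing $s_i$, $s_j$, $d_{\min}(i,j)$ and $T_{\text{hold}}$]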
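The two inequalities above can be checked mechanically for a given set of clock arrival times. The following is a minimal sketch, not part of the original slides: the function name `timing_violations`, the dictionary layout and the numeric values are assumptions chosen for illustration.

```python
def timing_violations(skew, paths, period, t_setup, t_hold):
    """Return the violated constraints as (kind, i, j) tuples.

    skew  : dict FF name -> clock arrival time s_i (hypothetical layout)
    paths : dict (i, j) -> (d_min, d_max) for each combinational path i -> j
    """
    violations = []
    for (i, j), (d_min, d_max) in paths.items():
        # Long path (setup): s_i + d_max + T_setup <= s_j + P
        if skew[i] + d_max + t_setup > skew[j] + period:
            violations.append(("setup", i, j))
        # Short path (hold): s_i + d_min >= s_j + T_hold
        if skew[i] + d_min < skew[j] + t_hold:
            violations.append(("hold", i, j))
    return violations


# Illustrative numbers only: one path with d_min = 1, d_max = 3.
paths = {("ff1", "ff2"): (1.0, 3.0)}
zero_skew = {"ff1": 0.0, "ff2": 0.0}
late_clock = {"ff1": 0.0, "ff2": 0.5}  # clock reaches ff2 0.5 time units later

print(timing_violations(zero_skew, paths, period=3.0, t_setup=0.5, t_hold=0.25))
print(timing_violations(late_clock, paths, period=3.0, t_setup=0.5, t_hold=0.25))
```

With these assumed numbers the zero-skew circuit misses the setup constraint at P = 3, while delaying the clock at `ff2` by 0.5 satisfies both constraints.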
Clock skew optimization
• Another approach for sequential timing optimization
• Deliberately change the arrival times of the clock at various memory elements in a circuit for cycle borrowing
– For zero skew, delay from clock source to all FF's = T
– Positive skew of δ at FF k
  • Change delay from clock source to FF k to T + δ
– Negative skew of δ at FF k
  • Change delay from clock source to FF k to T − δ
• Problem statement: set skews for optimized performance

Sequential timing optimization
• Two “true” sequential timing optimization methods
– Retiming: moving latches around in a design
[Figure: pipeline FF, Comb Block 1, FF, Comb Block 2, FF, all clocked by Clk, before and after moving a FF]
– Clock skew optimization: deliberately changing clock arrival times so that the circuit is not truly “synchronous”
[Figure: the same pipeline with a delay element inserted in the clock line of one FF]

Finding the optimal clock period using skews
• Represented by the optimization problem below: solve for P and the optimal skews
  minimize $P$
  subject to (for all pairs of FF's (i,j) connected by a combinational path)
    $s_i + d_{\min}(i,j) \ge s_j + T_{\text{hold}}$
    $s_i + d_{\max}(i,j) + T_{\text{setup}} \le s_j + P$
• If $d_{\max}(i,j)$ and $d_{\min}(i,j)$ are constant, this is a linear program in the variables $s_i$ and $P$
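One way to solve this program without an LP solver is to fix P, treat both constraint families as difference constraints, test feasibility with Bellman-Ford, and binary search on P. The sketch below assumes constant delays and assumes that the zero-skew circuit is feasible at its own period; the function names and the two-FF loop used as data are hypothetical.

```python
def solve_difference_constraints(nodes, constraints):
    """constraints: list of (u, v, c) meaning x_u - x_v <= c.
    Returns a dict of values, or None if a negative cycle makes it infeasible."""
    dist = {n: 0.0 for n in nodes}  # virtual source at distance 0 to every node
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in constraints:  # constraint graph edge v -> u, weight c
            if dist[v] + c < dist[u] - 1e-12:
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after |V| passes: negative cycle


def min_period_with_skew(ffs, paths, t_setup, t_hold, tol=1e-6):
    """paths: dict (i, j) -> (d_min, d_max). Returns (period, skews)."""
    def skews_for(period):
        cons = []
        for (i, j), (d_min, d_max) in paths.items():
            cons.append((i, j, period - d_max - t_setup))  # s_i - s_j <= P - d_max - T_setup
            cons.append((j, i, d_min - t_hold))            # s_j - s_i <= d_min - T_hold
        return solve_difference_constraints(ffs, cons)

    lo = 0.0
    hi = max(d_max for _, d_max in paths.values()) + t_setup  # zero-skew period
    best = skews_for(hi)
    if best is None:
        raise ValueError("assumed starting period is infeasible (hold violation)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        skews = skews_for(mid)
        if skews is None:
            lo = mid          # negative cycle: period too small
        else:
            hi, best = mid, skews
    return hi, best


# Hypothetical loop: A -> (delay 3) -> B -> (delay 1) -> A, zero setup/hold times.
paths = {("A", "B"): (3.0, 3.0), ("B", "A"): (1.0, 1.0)}
period, skew = min_period_with_skew(["A", "B"], paths, t_setup=0.0, t_hold=0.0)
print(round(period, 3), round(skew["B"] - skew["A"], 3))
```

For this loop the zero-skew period is 3, and the search converges to a period of about 2 with the clock at B arriving about one time unit after the clock at A, which is the cycle borrowing described above.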
Graph-based approaches
• For a constant clock period P, the linear program becomes a system of difference constraints $s_p - s_q \le \text{constant}$
• As before, perform a binary search on P
• For each value of P, build an equivalent constraint graph
[Figure: constraint graph edge between i and j with weight f(P)]
• Shortest path in the constraint graph gives a set of skews for a given value of P
• If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations

Retiming
• Assume unit gate delays, no setup times
[Figure: initial circuit FF, Comb Block 1, FF, Comb Block 2, FF, all clocked by Clk: P = 3]
[Figure: retimed circuit with the middle FF relocated: P = 2]

Retiming: Definition
• Relocation of flip-flops (FF's) and latches (usually to achieve lower clock periods)
• Maintain the latency of all paths in the circuit, i.e., the number of FF stages on any input-output path must remain unchanged
Graph Notation of Circuit
• Each gate is a vertex with its delay as weight: delay = d(u), delay = d(v)
• $w(e_{uv})$ = # latencies between u and v
[Figure: edge from u to v with $w(e_{uv}) = 2$]
• r(u) is the # latencies moved across gate u
• r(PI) = r(PO) = 0: merge them both into a “host” node h with r(h) = 0
• $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$
[Figure: edge from u to v with $w(e_{uv}) = 1$, $r(u) = 1$, $r(v) = 2$, giving $w_r(e_{uv}) = 2$]

For a path from $v_1$ to $v_k$
• Consider a path of vertices $v_1, v_2, v_3, \dots, v_k$ with edge weights $w_{12}, w_{23}, w_{34}, \dots, w_{k-1,k}$
– Define $w(v_1 \text{ to } v_k) = w_{12} + w_{23} + \dots + w_{k-1,k}$
– After retiming,
  $w_r(v_1 \text{ to } v_k) = [w_{12} + r(v_2) - r(v_1)] + [w_{23} + r(v_3) - r(v_2)] + \dots + [w_{k-1,k} + r(v_k) - r(v_{k-1})]$
  $= w(v_1 \text{ to } v_k) + r(v_k) - r(v_1)$
– For a cycle, $v_1 = v_k$, which implies that $w_r = w$ for a cycle
– In other words, retiming leaves the # latencies unchanged on any cycle

Constraints for retiming
• Non-negativity constraints (cannot have negative latencies)
– $w_r$ on each edge must be non-negative
– For any edge from vertex u to vertex v, $w_r(u,v) = w(u,v) + r(v) - r(u) \ge 0$, i.e., $r(u) - r(v) \le w(u,v)$
• Period constraints (need a latency if path delay > period)
– (or more precisely, if path delay + $T_{\text{setup}}$ > period)
– For any path from vertex $v_1$ to vertex $v_k$, under clock period P,
  $w_r(v_1 \text{ to } v_k) = w(v_1 \text{ to } v_k) + r(v_k) - r(v_1) \ge 1$ if delay($v_1$ to $v_k$) > P
  i.e., $r(v_1) - r(v_k) \le w(v_1 \text{ to } v_k) - 1$ if delay($v_1$ to $v_k$) > P
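The edge-weight update and the non-negativity constraint above can be checked directly on a small graph. This is a minimal sketch; the three-vertex cycle and the function names are assumptions made for illustration, not taken from the slides.

```python
def retimed_weights(edges, r):
    """Apply w_r(u,v) = w(u,v) + r(v) - r(u) to every edge.

    edges : dict (u, v) -> number of latencies on the edge
    r     : dict vertex -> latencies moved across that vertex
    """
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def is_legal(edges, r):
    """A retiming is legal only if no edge ends up with a negative weight."""
    return all(w >= 0 for w in retimed_weights(edges, r).values())


# Hypothetical 3-vertex cycle a -> b -> c -> a carrying two latencies in total.
edges = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
r = {"a": 0, "b": 1, "c": 1}

w_r = retimed_weights(edges, r)
print(w_r)
print(is_legal(edges, r), sum(edges.values()), sum(w_r.values()))
```

The total weight around the cycle is 2 before and after retiming, as the telescoping sum predicts, while the individual edge weights change.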
Example
[Figure: circuit FF, Comb Block 1, FF, Comb Block 2, FF, all clocked by Clk, built from gates G1, G2, G3, G4]
• Circuit graph:
– Vertex weights = gate delays
– Edge weights = # latencies
[Figure: circuit graph on the vertices h, G1, G2, G3, G4 with the vertex and edge weights marked]
• Non-negativity constraints
  1. $r(h) - r(G1) \le 0$
  2. $r(G1) - r(G2) \le 0$
  3. $r(G2) - r(G3) \le 0$
  4. $r(G3) - r(G4) \le 1$
  5. $r(G4) - r(h) \le 0$
• Period constraints for P = 2
  6. $r(h) - r(G3) \le -1$
  7. $r(G1) - r(G3) \le -1$
  8. $r(G2) - r(G4) \le 0$
  9. $r(G2) - r(h) \le 0$

Graph-based approaches
• System of difference constraints $r(u) - r(v) \le c$
• Equivalent constraint graph: an edge from v to u with weight c
• Shortest path in the constraint graph gives a set of valid r values for a given value of P (note that period constraints change for different values of P)
• If P is infeasible, there will be a negative cycle in the graph that will be detected during shortest-path calculations

Corresponding shortest path problem
[Figure: constraint graph on h, G1, G2, G3, G4 with edge weights 0, 1 and −1]
• Find shortest path from host to get
– r(h) = 0
– r(G1) = 0
– r(G2) = 0
– r(G3) = 1
– r(G4) = 0
• This gives the solution
[Figure: original and retimed circuits, each drawn as FF, Comb Block 1, FF, Comb Block 2, FF clocked by Clk]
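The shortest path computation for this example can be reproduced in a few lines. The sketch below encodes the nine constraints listed above as constraint-graph edges and runs a plain Bellman-Ford from the host; the circuit edge weights in `circuit` are inferred from constraints 1 to 5 (each bound equals the edge weight), and all names are illustrative.

```python
import math


def bellman_ford(nodes, edges, source):
    """edges: list of (v, u, c) for an edge v -> u of weight c.
    Returns shortest distances from source, or None on a negative cycle."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        for v, u, c in edges:
            if dist[v] + c < dist[u]:
                dist[u] = dist[v] + c
    for v, u, c in edges:
        if dist[v] + c < dist[u]:
            return None  # an edge can still be relaxed: negative cycle
    return dist


# (u, v, c) means r(u) - r(v) <= c, numbered as in the example.
constraints = [
    ("h", "G1", 0), ("G1", "G2", 0), ("G2", "G3", 0),    # 1-3
    ("G3", "G4", 1), ("G4", "h", 0),                     # 4-5
    ("h", "G3", -1), ("G1", "G3", -1),                   # 6-7 (P = 2)
    ("G2", "G4", 0), ("G2", "h", 0),                     # 8-9 (P = 2)
]
nodes = ["h", "G1", "G2", "G3", "G4"]
# Each constraint r(u) - r(v) <= c becomes a graph edge v -> u with weight c.
graph = [(v, u, c) for (u, v, c) in constraints]
r = bellman_ford(nodes, graph, "h")
print(r)

# Circuit edge weights inferred from constraints 1-5 (assumption).
circuit = {("h", "G1"): 0, ("G1", "G2"): 0, ("G2", "G3"): 0,
           ("G3", "G4"): 1, ("G4", "h"): 0}
retimed = {(u, v): w + r[v] - r[u] for (u, v), w in circuit.items()}
print(retimed)
```

The distances match the r values listed above, and the retimed weights show the single latency moving from the edge (G3, G4) to the edge (G2, G3).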
Overall scheme for minimum period retiming
• Objective: to find a retiming that minimizes the clock period (the assignment of r values may not be unique due to slack in the shortest path graph!)
– Binary search over P in $[0, P_{\text{unretimed}}]$
– $P_{\text{unretimed}}$ = period of unretimed circuit = upper bound on optimal P
– Range in some iteration of the search = $[P_{\min}, P_{\max}]$
– Build shortest path graph with non-negativity constraints (independent of P)
– At each value of P
  • Add period constraints to shortest path graph (related to the W, D matrices discussed in class; not described here)
  • Solve shortest path problem
  • If negative cycle found, set $P_{\min} = P$; else set $P_{\max} = P$
– Iterate until the range of P is sufficiently small

Finding shortest paths
• Dijkstra's algorithm
– O(V log V + E) for a graph with V vertices and E edges
– Applicable only if all edge weights are non-negative
– The latter condition does not hold in our case!
• Bellman-Ford algorithm
– O(VE) for a graph with V vertices and E edges
– Outline
    for I = 1 to V – 1
      for each edge (u,v) ∈ E
        update neighbor's weight as r(v) = min[r(u) + d(u,v), r(v)]
    for each edge (u,v) ∈ E
      if r(u) + d(u,v) < r(v) then a negative cycle exists
• Basic idea: in iteration I, update the lowest cost path with at most I edges
• After V – 1 iterations, if any update is still required, a negative cycle exists

“Relaxation” algorithm for retiming
• Perform a binary search on clock period P as before
• At each value of P, check feasibility as follows
– Repeat V – 1 times (where V = # vertices)
  1. Set r(u) = 0 for each vertex
  2. Perform timing analysis to find the clock period of the circuit
  3. For any vertex u with delay > P, r(u)++
  4. If no such vertex exists, P is feasible
  5. Else, retime the circuit using these values of r; update the circuit and go to step 1
– If clock period > P after V – 1 iterations, then P is infeasible
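The relaxation check combined with the binary search on P can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: integer gate delays (so the search runs over integers), a host-free ring of four unit-delay gates as test data, and r values accumulated across iterations instead of rewriting the circuit each time, which has the same effect.

```python
def arrival_times(edges, delay, r):
    """Timing analysis: latest arrival at each gate output, where only
    zero-weight retimed edges are combinational connections."""
    preds = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w + r[v] - r[u] == 0:
            preds[v].append(u)
    memo = {}

    def arrive(v):  # assumes no zero-weight cycle, as in any legal circuit
        if v not in memo:
            memo[v] = delay[v] + max((arrive(u) for u in preds[v]), default=0)
        return memo[v]

    return {v: arrive(v) for v in delay}


def feasible(edges, delay, period):
    """Relaxation check: return r values achieving `period`, else None."""
    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        arr = arrival_times(edges, delay, r)
        late = [v for v in delay if arr[v] > period]
        if not late:
            return r
        for v in late:
            r[v] += 1  # move one latency across each late vertex
    arr = arrival_times(edges, delay, r)
    return r if max(arr.values()) <= period else None


def min_period(edges, delay):
    """Binary search over integer periods in [0, P_unretimed]."""
    best = {v: 0 for v in delay}
    hi = max(arrival_times(edges, delay, best).values())  # P_unretimed
    lo = 0                                                # infeasible for positive delays
    while hi - lo > 1:
        mid = (lo + hi) // 2
        r = feasible(edges, delay, mid)
        if r is None:
            lo = mid
        else:
            hi, best = mid, r
    return hi, best


# Hypothetical ring of four unit-delay gates holding two latencies.
delay = {"G1": 1, "G2": 1, "G3": 1, "G4": 1}
edges = {("G1", "G2"): 0, ("G2", "G3"): 0, ("G3", "G4"): 1, ("G4", "G1"): 1}
best_period, best_r = min_period(edges, delay)
print(best_period, best_r)
```

On this ring the unretimed period is 3; the search rejects P = 1 after V – 1 relaxation rounds and accepts P = 2 with a single latency moved across G3.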
The retiming-skew relationship
• Skew
[Figure: FF, Comb Block 1, FF, Comb Block 2, FF with a clock delay of 1 on the middle FF]
• Retiming
[Figure: the same circuit with the middle FF relocated instead]
• Both borrow one unit of time from Comb Block 2 and lend it to Comb Block 1
• Magnitude of optimal skew = amount of delay that the FF has to move across
• Can be generalized for another approach to retiming

Can move from skews to retiming
• Moving a flip-flop across a gate G
– left → right is equivalent to increasing its skew by delay(G)
– right → left is equivalent to reducing its skew by delay(G)
[Figure: FF with old skew = s moved across a gate of delay = d; new skew = s + d]
• More generally, when FFs with skews $s_1, \dots, s_4$ are replaced by FFs j and k:
– $s_j = \max_{1 \le i \le 4} (s_i + \text{MAX}(i,j))$
– $s_k = \max_{1 \le i \le 4} (s_i + \text{MAX}(i,k))$

Another approach to retiming
• Two-phase approach
– Phase A: Find optimal skews (complexity depends on the number of FF's, not the number of gates)
– Phase B: Relocate FF's to retime the circuit (since most FF movements are seen to be local in practice, this does not take too long)
– Not provably better than the earlier approach in terms of complexity, but works very well in practice
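The skew update for a left-to-right move can be written as a small helper. This sketch assumes that MAX(i,j) denotes the maximum combinational delay from input FF i to the new FF j, which the slides do not spell out; the function name and all numbers are hypothetical.

```python
def skews_after_forward_move(input_skews, max_delay):
    """Skews of the FFs created on the output side of a block.

    input_skews : dict input FF -> skew s_i
    max_delay   : dict (i, j) -> MAX(i, j), assumed to be the maximum
                  delay from input FF i to output FF j
    """
    outputs = sorted({j for (_, j) in max_delay})
    return {
        j: max(input_skews[i] + d for (i, jj), d in max_delay.items() if jj == j)
        for j in outputs
    }


# Single gate of delay d = 2: old skew s = 0.5 becomes s + d.
print(skews_after_forward_move({"ff": 0.5}, {("ff", "out"): 2.0}))

# Four input FFs feeding two output FFs j and k (illustrative numbers).
s = {1: 0.0, 2: 1.0, 3: -1.0, 4: 0.5}
max_d = {(1, "j"): 2.0, (2, "j"): 2.0, (3, "j"): 2.0, (4, "j"): 1.0,
         (1, "k"): 1.0, (2, "k"): 1.0, (3, "k"): 3.0, (4, "k"): 3.0}
moved = skews_after_forward_move(s, max_d)
print(moved)
```

A Phase B pass would apply moves of this kind, presumably until the remaining skews are small enough to discard; that stopping rule is not given in the slides and is left out of the sketch.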