Efficient Generation of Short and Fast Repeater Tree Topologies


  1. Efficient Generation of Short and Fast Repeater Tree Topologies
     Christoph Bartoschek, Dieter Rautenbach, Jens Vygen, Stephan Held
     Research Institute for Discrete Mathematics, University of Bonn
     Aussois, 2006

  2. The Repeater Tree Problem
     [Figure; labels: source, sinks]
     ◮ A signal has to be distributed from a source to a set of sinks.
     ◮ The delay on a source-sink path increases quadratically in the path length within the tree.

  4. The Repeater Tree Problem
     [Figure; labels: source, sinks]
     ◮ A signal has to be distributed from a source to a set of sinks.
     ◮ The delay on a source-sink path increases
       ◮ linearly in path length (assuming ideal repeater insertion),
       ◮ with every bifurcation on the path.

  5. Importance of Repeater Trees
     ◮ As feature sizes decrease, wire resistances increase.
     ◮ More and more repeaters are needed:
       ◮ 10-20% repeaters in 130 nm technology
       ◮ 20-30% repeaters in 90 nm technology
       ◮ 30-40% repeaters in 65 nm technology
     ◮ Speed, robustness, and power consumption depend heavily on the repeater insertion algorithms.
     ◮ Up to 30 million instances are solved during timing closure. ⇒ Routines must be fast.

  6. The Repeater Tree Problem
     Input
     ◮ Repeater tree root pin $r$ with location $Pl(r) \in \mathbb{R}^2$.
     ◮ Set $S$ of sink pins $s \in S$ with
       ◮ locations $Pl(s) \in \mathbb{R}^2$,
       ◮ required signal arrival times $RAT_s$ (w.l.o.g. $AT_r = 0$),
       ◮ required signal parities $+$ or $-$, and
       ◮ input pin capacitances.
     ◮ A library $L$ of repeaters (inverters and buffers of varying sizes).
     Output
     A repeater tree that connects $r$ with all $s \in S$ using wires and legally placed repeaters from $L$, such that the signal arrives with the correct parity at all $s \in S$.

  7. The Repeater Tree Problem
     Objectives
     ◮ Minimize power consumption
     ◮ Minimize wiring
     ◮ Maximize the worst slack $\sigma_r$, where $\sigma_r := \min_{s \in S} \{ RAT_s - \text{signal delay}(r, s) \}$

  9. Previous Work
     ◮ Repeater insertion into a given topology with a finite number of admissible locations $L$.
       ◮ Dynamic programming with $O(|L|^2)$ running time (van Ginneken 1990).
       ◮ Running time was improved to $O(|L| \log |L|)$ (Shi and Li 2003, 2005).
     ◮ No satisfying solution exists for topology generation:
       ◮ Steiner minimum trees: minimum power but poor delays due to long paths.
       ◮ Bounded-radius Steiner trees.
       ◮ Heuristic splitting into critical and non-critical subtrees.

  10. Our Contribution
      ◮ New topology generation:
        ◮ Balance between power and performance. A parameter $\xi \in [0, 1]$ scales between power ($\xi = 0$) and performance ($\xi = 1$).
        ◮ Extremely fast.
      ◮ A linear-time repeater insertion routine.
      Both parts are integrated into our delay optimization environment.

  11. Definition (Topology)
      A topology $T$ is an arborescence rooted at $r$ with $\delta^+(r) = 1$ and $\delta^+(u) = 2$ for all internal nodes $u$. The set of leaves is a subset of $S$. All internal nodes $u$ are assigned placement coordinates $Pl(u)$.
      Figure: Example of a topology

  12. Delay Model
      The delay from $r$ to a sink $s$ is modeled as:
      $$c_{node} \cdot \left( |E(T_{[r,s]})| - 1 \right) + \sum_{(u,v) \in E(T_{[r,s]})} c_{wire} \cdot \mathrm{dist}(Pl(u), Pl(v))$$
      ◮ $c_{node}$: delay penalty for a bifurcation
      ◮ $c_{wire}$: delay per unit length
      ◮ Typical values are $c_{node} = 20$ ps and $c_{wire} = 220$ ps/mm.
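
As a rough illustration of the delay model on this slide, the following Python sketch evaluates it for a single root-to-sink path. The function names, the choice of the l1 metric for dist, and the sample coordinates are assumptions made for illustration; only the formula and the typical constants come from the slide.

```python
def l1(p, q):
    """Rectilinear (l1) distance between two points; an assumed choice for dist."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def path_delay(path, c_node=20.0, c_wire=220.0):
    """Delay in ps along `path`, the node locations (in mm) from root r to sink s.

    Every edge after the first contributes one bifurcation penalty c_node,
    and every edge contributes c_wire times its length.
    """
    edges = list(zip(path, path[1:]))
    wire_length = sum(l1(u, v) for u, v in edges)
    return c_node * (len(edges) - 1) + c_wire * wire_length


# Hypothetical path r -> u -> s with a single bifurcation at u:
# 20 * 1 + 220 * (1.0 + 0.5) = 350 ps.
delay = path_delay([(0.0, 0.0), (1.0, 0.0), (1.0, 0.5)])
print(delay)
```

A direct root-to-sink connection has one edge and therefore no bifurcation penalty, which matches the factor $|E(T_{[r,s]})| - 1$ in the formula.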

  13. Delay Model - Example
      $c_{wire} = 1$, $c_{node} = 2$.

  14. Justification of Delay Model
      Relation between critical path delays in our model (estimated delay) and after repeater insertion and exact timing analysis.
      Figure: exact delay after buffering and sizing (ns) plotted against estimated delay (ns); both axes run from 0 to 2.

  15. Bounds on Slack & Wire Length
      Lower Wire Length Bound
      A lower bound on the wire length is given by an SMT.
      Upper Slack Bound - Theorem
      The maximum possible slack $\sigma_{max}$ with respect to our delay model is at most:
      $$-c_{node} \cdot \log_2 \left( \sum_{s \in S} 2^{-\frac{RAT_s - c_{wire} \cdot \mathrm{dist}(Pl(r), Pl(s))}{c_{node}}} \right)$$
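
The upper slack bound can be evaluated directly. The sketch below reads the theorem as $-c_{node} \cdot \log_2 \sum_{s \in S} 2^{-(RAT_s - c_{wire} \cdot \mathrm{dist}(Pl(r), Pl(s))) / c_{node}}$ and assumes the l1 metric for dist; the function names and the sample instance are hypothetical.

```python
import math


def l1(p, q):
    """Rectilinear (l1) distance; an assumed choice for dist."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def slack_upper_bound(root, sinks, c_node, c_wire):
    """Closed-form upper bound on the worst slack.

    `sinks` is a list of ((x, y), rat) pairs. Each sink contributes
    2 ** (-(rat - c_wire * dist(root, sink)) / c_node) to the sum.
    """
    total = sum(2.0 ** (-(rat - c_wire * l1(root, pos)) / c_node)
                for pos, rat in sinks)
    return -c_node * math.log2(total)


# Two hypothetical sinks located at the root, each with RAT = 10, c_node = 2:
# the sum is 2 * 2**-5 = 2**-4, so the bound is -2 * (-4) = 8.
bound = slack_upper_bound((0, 0), [((0, 0), 10), ((0, 0), 10)],
                          c_node=2.0, c_wire=1.0)
print(bound)
```

For large ratios of required arrival time to $c_{node}$ the powers of two underflow in floating point, which is presumably one of the numerical problems the slides attribute to the closed formula.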

  16. Proof.
      The maximum possible slack can be obtained by a topology $T$ in which all internal nodes share the root location: $Pl(u) = Pl(r)$ for all internal nodes $u$.
      [Figure; label: source]
      All distance delays are minimum: $c_{wire} \cdot \mathrm{dist}(Pl(r), Pl(s))$ for all $s \in S$.

  17. Proof. (continued)
      ◮ The problem reduces to: find a topology that maximizes the worst slack with
        ◮ new sink locations $Pl'(s) := Pl(r)$ ($\Leftrightarrow c_{wire} = 0$) and
        ◮ new required arrival times $RAT'_s := RAT_s - c_{wire} \cdot \mathrm{dist}(Pl(r), Pl(s))$ for all $s \in S$.

  18. Lemma
      For $c_{wire} = 0$, $c_{node} = 1$ and integer values for $RAT_s$, $s \in S$, the maximum possible slack with respect to our delay model is at most
      $$\left\lfloor -\log_2 \left( \sum_{s \in S} 2^{-RAT_s} \right) \right\rfloor.$$

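  19. Proof of Lemma.
      ◮ Kraft’s inequality: there exists a rooted binary tree with $n$ leaves at depths $l_1, l_2, \ldots, l_n$ $\Leftrightarrow$ $\sum_{i=1}^{n} 2^{-l_i} \le 1$.
      ◮ The slack at the root $\sigma_r$ is the minimum over all sink slacks $\Rightarrow$ $\mathrm{delay}(r, s) = c_{node} \cdot (|E(T_{[r,s]})| - 1) \le RAT_s - \sigma_r$ for all $s \in S$.
      $\Rightarrow$ The maximum slack achievable by any topology is bounded by
      $$\sigma_{max} = \max \left\{ \sigma \in \mathbb{N} \;\middle|\; \sum_{s \in S} 2^{\frac{-RAT_s + \sigma}{c_{node}}} \le 1 \right\} = \left\lfloor -c_{node} \log_2 \left( \sum_{s \in S} 2^{-\frac{RAT_s}{c_{node}}} \right) \right\rfloor.$$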
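
The step from Kraft's inequality to the bound can be written out as follows. This is a sketch of the argument under the slide's assumptions ($c_{wire} = 0$); the shorthand $l_s$ for the depth of sink $s$ below the single child of $r$ is introduced here for illustration.

```latex
% Depth of sink s below the root's only child (notation assumed here):
%   l_s := |E(T_{[r,s]})| - 1.
% A topology with worst slack \sigma satisfies, for every sink s,
\[
  c_{node} \cdot l_s \le RAT_s - \sigma
  \quad\Longrightarrow\quad
  l_s \le \frac{RAT_s - \sigma}{c_{node}} .
\]
% Kraft's inequality for the binary tree below the root's child then gives
\[
  1 \;\ge\; \sum_{s \in S} 2^{-l_s}
    \;\ge\; \sum_{s \in S} 2^{-\frac{RAT_s - \sigma}{c_{node}}}
    \;=\; 2^{\frac{\sigma}{c_{node}}} \sum_{s \in S} 2^{-\frac{RAT_s}{c_{node}}} ,
\]
% and taking logarithms yields
\[
  \sigma \;\le\; -c_{node} \log_2 \Bigl( \sum_{s \in S} 2^{-\frac{RAT_s}{c_{node}}} \Bigr) .
\]
```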

  20. Improving the Upper Slack Bound
      Drawbacks of closed formula
      ◮ The closed formula ignores the discrete structure of the problem.
      ◮ Its computation creates numerical problems.
      Huffman Coding
      ◮ No closed formula.
      ◮ Slightly better bounds.
      ◮ Numerically stable, linear-time computation.
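
The slides do not spell out the Huffman-style computation. One standard reading, sketched below as an assumption, repeatedly merges the two least critical sinks (largest adjusted required arrival times) into a new node whose required arrival time is the smaller of the two minus $c_{node}$. The function name is hypothetical, and the heap makes this version $O(|S| \log |S|)$ rather than the linear time stated on the slide.

```python
import heapq


def huffman_slack_bound(rats, c_node):
    """Slack bound by Huffman-style merging (assumed procedure, see lead-in).

    `rats` holds the adjusted required arrival times
    RAT_s - c_wire * dist(Pl(r), Pl(s)) of all sinks.
    """
    # heapq is a min-heap, so store negated values to pop the largest first.
    heap = [-r for r in rats]
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)  # least critical remaining node
        b = -heapq.heappop(heap)  # second least critical remaining node
        # Their common parent must receive the signal one bifurcation earlier.
        heapq.heappush(heap, -(min(a, b) - c_node))
    # The last node hangs directly below the root, which costs no penalty.
    return -heap[0]


# Three hypothetical sinks with adjusted RAT 3 and c_node = 1:
# merging two gives 2, merging {3, 2} gives 1.
print(huffman_slack_bound([3, 3, 3], 1))
```

For this instance the closed formula gives $3 - \log_2 3 \approx 1.415$ before rounding, so the merged value 1 is consistent with the slide's remark that the discrete computation yields slightly better bounds.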

  22. Topology Generation Algorithm
      Define the criticality of $s \in S$ by $RAT_s - c_{wire} \cdot \mathrm{dist}(Pl(r), Pl(s))$.
      Start with partial topology $T' = \{r, \emptyset\}$;
      Connect the most critical sink $s \in S$ to $r$;
      while unconnected sinks exist do
          Choose the most critical unconnected sink $s \in S \setminus V(T')$;
          Connect $s$ to an arc $e = (u, v) \in E(T')$ such that
              1. $\xi \cdot \sigma_e + (\xi - 1) \cdot c_{wire} \cdot \mathrm{dist}(Pl(s), Area(e))$ and
              2. $-c_{wire} \cdot \mathrm{dist}(Pl(s), Area(e))$ (iff $\xi = 1$)
          is maximized;
      end
      $\sigma_e$ is the slack at the root after connecting $s$ to $e$.
      $Area(e)$ is the area covered by the union of all shortest $u$-$v$-paths.
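
The greedy loop on this slide can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes the l1 metric, takes Area(e) to be the bounding box of the two arc endpoints, places each new internal node at the box point nearest to the sink, and recomputes the root slack from scratch for every candidate arc, so it is slower than the running time stated later in the slides. All function and variable names are hypothetical.

```python
def l1(p, q):
    """Rectilinear (l1) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def closest_in_box(p, a, b):
    """Point of the bounding box of a and b that is nearest to p."""
    x = min(max(p[0], min(a[0], b[0])), max(a[0], b[0]))
    y = min(max(p[1], min(a[1], b[1])), max(a[1], b[1]))
    return (x, y)


def root_slack(pos, parent, rat, c_wire, c_node):
    """Worst slack over all connected sinks under the slides' delay model."""
    worst = float("inf")
    for v, required in rat.items():
        edges, length = 0, 0.0
        while parent[v] is not None:  # walk from the sink up to the root
            length += l1(pos[v], pos[parent[v]])
            edges += 1
            v = parent[v]
        worst = min(worst, required - (c_node * (edges - 1) + c_wire * length))
    return worst


def build_topology(root, sinks, xi, c_wire=1.0, c_node=2.0):
    """Greedy topology for sinks given as ((x, y), rat) pairs.

    Returns node positions, parent indices (node 0 is the root), and a dict
    mapping sink node indices to their required arrival times.
    """
    # Most critical first: smallest RAT minus the direct wire delay.
    order = sorted(sinks, key=lambda s: s[1] - c_wire * l1(root, s[0]))
    pos, parent, rat = [root, order[0][0]], [None, 0], {1: order[0][1]}
    for p, required in order[1:]:
        best = None
        for v in range(1, len(pos)):  # candidate arc (parent[v], v)
            u = parent[v]
            w_pos = closest_in_box(p, pos[u], pos[v])
            d = l1(p, w_pos)
            # Tentatively split the arc at a new node w and hang the sink below w.
            w, s = len(pos), len(pos) + 1
            t_pos = pos + [w_pos, p]
            t_parent = parent + [u, w]
            t_parent[v] = w
            t_rat = {**rat, s: required}
            sigma = root_slack(t_pos, t_parent, t_rat, c_wire, c_node)
            # Primary criterion, then the wire-length tie-break used only for xi = 1.
            key = (xi * sigma + (xi - 1) * c_wire * d,
                   -c_wire * d if xi == 1 else 0.0)
            if best is None or key > best[0]:
                best = (key, t_pos, t_parent, t_rat)
        _, pos, parent, rat = best
    return pos, parent, rat


# Hypothetical instance: root at the origin, two sinks with RAT = 10.
pos, parent, rat = build_topology((0, 0), [((4, 0), 10), ((4, 2), 10)], xi=0.0)
print(pos, parent, root_slack(pos, parent, rat, 1.0, 2.0))
```

Splitting an arc at a point inside the bounding box of its endpoints does not lengthen the existing path under the l1 metric, which is why only the new sink's wire enters the distance term.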

  24. Lemma
      For $c_{wire} = 0$, $c_{node} = 1$, $\xi > 0$ and integer values for $RAT_s$, $s \in S$, the algorithm generates a topology that realizes the maximum possible slack.
      Proof. Assume the sinks in $S' \subset S$ are already connected optimally in $T'$. Let $s' \in S \setminus S'$.
      ◮ If all $s \in S'$ have the same slack $\sigma_{S'}$ in $T'$:
        ◮ They are connected at maximum possible slack.
        ◮ The best possible slack for the set $S' \cup \{s'\}$ equals $\sigma_{S'} + 1$.
        ◮ $s'$ can be connected to any existing edge in $T'$ such that its slack is $\le \sigma_{S'} + 1$.
      ◮ Otherwise $s'$ can be connected to any non-critical edge.

  25. Prim-Heuristic for Steiner Trees
      Wire Length Minimization $\xi = 0$:
      ◮ Instead of choosing the next critical sink:
        ◮ Choose the sink that is closest to the preliminary topology $T'$.
      ◮ Well-known heuristic that exists in many variants.
      Hwang $\Rightarrow$ $\frac{3}{2}$-approximation algorithm for SMT.
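
The Prim-style selection rule can be illustrated with a short Python sketch. It is a simplification made for illustration: the distance of a sink to the preliminary topology is approximated by its l1 distance to the nearest already connected pin, not to the nearest arc, and the function names are hypothetical.

```python
def l1(p, q):
    """Rectilinear (l1) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def prim_order(root, sinks):
    """Indices of `sinks` in the order a Prim-style rule would connect them."""
    remaining = set(range(len(sinks)))
    # dist[i]: distance from sink i to the nearest connected pin so far.
    dist = {i: l1(root, sinks[i]) for i in remaining}
    order = []
    while remaining:
        # Pick the closest unconnected sink; ties are broken by index.
        nxt = min(remaining, key=lambda i: (dist[i], i))
        remaining.remove(nxt)
        order.append(nxt)
        for i in remaining:
            dist[i] = min(dist[i], l1(sinks[nxt], sinks[i]))
    return order


# Hypothetical instance on a line: the far sink at x = 10 is connected last.
print(prim_order((0, 0), [(10, 0), (1, 0), (2, 0)]))
```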

  27. Running Time
      The running time is $O(|S|^2 \cdot \Psi)$, where $\Psi$ is the running time for computing all shortest paths between a sink and a union of paths ($\Psi = 1$ for $\ell_1$-distances).
      Handling Large Instances
      ◮ Pre-clustering if $|S| > 10\,000$
      ◮ Facility location approximation [Massberg, Vygen 2005]
      ◮ Runtime: $O(|S| \log |S|)$

  28. Experimental Results
      ◮ 2.3 million instances with up to 10 000 sinks were taken from current 90 nm designs.
      ◮ The extreme cases $\xi \in \{0, 1\}$ are compared against
        1. Length bound (SMT for $|S| \le 30$, heuristics for $|S| > 30$).
        2. Slack bound (Huffman Coding).
      ◮ 4.6 million topologies were computed in $\le 100$ seconds on a 2.6 GHz Opteron.
