| n    | [7]   | $O(n \log n)$ | $O(n \log \log n)$ | $O(n \log^* n)$ | O(n)    |
|------|-------|---------------|--------------------|-----------------|---------|
| 16   | 4.9   | 1.47          | 2.28               | 1.49            | 1.52    |
| 64   | 314.7 | 8.84          | 15.75              | 27.14           | 26.71   |
| 256  | 23255 | 45.96         | 76.55              | 169.58          | 169.42  |
| 4096 |       | 1041.90       | 2148.60            | 2597.75         | 2760.25 |

TABLE III Run Times of Equal-Width Width-Constrained Algorithms (Times Are in MS)

remaining algorithms. The  $O(n \log n)$  algorithm is recommended for use in practice unless the number of components in a stack is very much larger than 4096.

## VII. CONCLUSION

We have shown that while the equal-width height-constrained and equal-width width-constrained stack folding problems cannot be solved by applying the greedy method and parametric search, respectively, these methods can be successfully applied if the input is first normalized. Normalization can be done in linear time. Hence the overall complexity is determined by that of applying the greedy method or parametric search to the normalized data.

We have developed a linear time algorithm for the equal-width height-constrained problem. This compares very favorably (both analytically and experimentally) with the  $O(n^2)$  dynamic programming algorithm of [7].

For the equal-width width-constrained problem we have developed four algorithms of complexity  $O(n \log n)$ ,  $O(n \log \log n)$ ,  $O(n \log^* n)$ , and O(n), respectively. All compare very favorably with the  $O(n^3)$  dynamic programming algorithm of [7]. Experimental results indicate that the  $O(n \log n)$  algorithm performs best on practical size instances.

#### REFERENCES

- [1] G. N. Frederickson and D. B. Johnson, "Finding kth paths and p-centers by generating and searching good data structures," J. Algorithms, vol. 4, pp. 61–80, 1983.
- [2] \_\_\_\_, "Generalized selection and ranking: Sorted matrices," SIAM J. Computing, vol. 13, pp. 14–30, 1984.
- [3] G. N. Frederickson, "Optimal algorithms for tree partitioning," in Proc. 2nd ACM-SIAM Symp. Discrete Algorithms, San Francisco, CA, Jan. 1991, pp. 168–177.
- [4] \_\_\_\_\_, "Optimal parametric search algorithms in trees I: Tree partitioning," Tech. Rep., CSD-TR-1029, Purdue University, IN, 1992.
- [5] E. Horowitz and S. Sahni, Fundamentals of Computer Algorithms. Maryland: Computer Science, 1978.
- [6] L. Larmore, D. Gajski, and A. Wu, "Layout placement for sliced architecture," *IEEE Trans. Computer-Aided Design Integrat. Circuits Syst.*, vol. 11, no. 1, pp. 102–114, Jan. 1992.
- [7] D. Paik and S. Sahni, "Optimal folding of bit sliced stacks," *IEEE Trans. Computer-Aided Design Integrat. Circuits Syst.*, vol. 12, no. 11, pp. 1679–1685, Nov. 1993.
- [8] E. Shragowitz, L. Lin and S. Sahni, "Models and algorithms for structured layout," vol. 20, no. 5, *Computer Aided Design*. Norwood, NJ: Butterworth & Co., 1988, pp. 263–271.
- [9] E. Shragowitz, J. Lee, and S. Sahni, "Placer-router for sea-of-gates design style," in *Progress in Computer Aided VLSI Design*, vol. 2, G. Zobrist, Ed. London, U.K.: Ablex, 1990, pp. 43-92.
- [10] A. Wu and D. Gajski, "Partitioning algorithms for layout synthesis from register-transfer netlists," *IEEE Trans. Computer-Aided Design Integrat. Circuits Syst.*, vol. 11, no. 4, pp. 453–463, Apr. 1992.

## **Non-Tree Routing**

## Bernard A. McCoy and Gabriel Robins

Abstract—An implicit premise of existing routing methods is that the routing topology must correspond to a tree (i.e., it does not contain cycles). In this paper we investigate the consequences of abandoning this basic axiom, and instead we allow routing topologies that correspond to arbitrary graphs (i.e., where cycles are allowed). We show that non-tree routing can significantly improve signal propagation delay, reduce signal skew, and afford increased reliability with respect to open faults that may be caused by manufacturing defects and electro-migration. Simulations on uniformly distributed nets indicate that depending on net size and technology parameters, our non-tree routing construction reduces maximum source-sink SPICE delay by an average of up to 62%, and reduces signal skew by an average of up to 63%, as compared with Steiner routing. Moreover, up to 77% of the total wirelength in non-trees

### I. INTRODUCTION

Recent advances in VLSI technology have steadily improved chip packing densities. As feature sizes decrease, device switching speeds tend to increase; however, thinner wires have higher resistance, causing signal propagation delay through the interconnect to increase [1]. Thus, interconnection delay has a greater impact on circuit speed, being responsible for up to 70% of the clock cycle in the design of dense, high-performance circuits [19]. In light of this trend, performance-driven routing has become central to the design of leading-edge digital systems [14].

Minimum spanning trees with bounded source-sink pathlengths were proposed in [9]. Boese *et al.* [5] have developed a "critical sink" routing approach which significantly reduces delay to specified sinks, thereby exploiting the critical-path information that is implicitly available during iterative timing-driven layout. More recently, Boese *et al.* [4], [3] have identified and exploited a high-quality, algorithmically tractable model of interconnect delay, based on an upper bound [18] for Elmore delay.

An implicit premise of previous methods is that a routing topology must correspond to a tree (i.e., an acyclic topology). In retrospect, this assumption seems natural, since a tree topology spans a net and thus achieves electrical connectivity using a minimum number of edges/wires. In this paper, we question this seemingly basic axiom, and investigate the consequences of allowing cycles in the routing. Thus, we recast the routing problem into a new formulation where the interconnection topology may correspond to an arbitrary graph.

At this point, the reader may ask: how can adding extra wires to an existing routing tree improve signal propagation delay? The answer lies in the tradeoff between the overall circuit capacitance and the source-sink resistances. Clearly, adding extra wires to a routing tree increases the overall capacitance; however, the added wires may significantly lower certain source-sink resistance values, which may in turn reduce source-sink signal propagation delays. The key observation is that often the decrease in resistance more than compensates for the associated increase in capacitance, especially in sub-micron technologies; a simple example of this phenomenon is illustrated in Fig. 1. Based on these observations, we propose a

Manuscript received December 16, 1993; revised January 6, 1995. This work was supported by NSF Young Investigator Award MIP-9457412. This paper was recommended by Associate Editor M. Sarrafzadeh.

B. A. McCoy is with GE Fanuc, Charlottesville, VA 22901 USA.

G. Robins is with the Department of Computer Science, University of Virginia, Charlottesville, VA 22903 USA. IEEE Log Number 9410372.

0278-0070/95\$04.00 © 1995 IEEE

Fig. 1. An example of how adding an extra edge to the minimum spanning tree on the left (a) can yield the routing topology with reduced interconnect delay on the right (b); in this example, routing topology (a) has maximum source-sink SPICE delay of 1.3 ns, while the topology on the right has a SPICE delay of 1.0 ns, a 23% improvement (at a total wirelength penalty of only 9%). The interconnect parameters used correspond to a MOSIS 0.8  $\mu$ m CMOS process.

new non-tree routing scheme. Extensive SPICE simulations indicate that depending on net size and technology parameters, our non-tree routings reduce signal propagation delay by an average of up to 62% as compared to traditional Steiner routing.

With decreasing VLSI feature size, open faults in the interconnect (i.e., discontinuities in wires) tend to occur due to manufacturing defects and electro-migration, with both these problems becoming increasingly acute in the submicron regimes [11], [15]. Previous routing techniques do not address this issue of the *reliability* of the routing, i.e., the ability of the interconnect to tolerate open faults. In contrast, our non-tree routing techniques can tolerate an open fault along the majority of the wires. Another benefit of non-tree routings is that they improve signal skew by an average of up to 63% over Steiner routing, as well as reduce signal reflection [6]. Finally, our basic approach is amenable to numerous extensions such as routing with critical sinks [5].

The rest of our paper is organized as follows. Section II gives basic definitions, formalizes the problem of constructing optimal-delay interconnection graph topologies, and discusses the delay models. In Section III we present a non-tree routing heuristic. Section IV discusses the experimental results, and we conclude in Section V. A preliminary version of this work has appeared in [16].

### II. PROBLEM FORMULATION

A signal net $N = \{n_0, n_1, \dots, n_k\}$ is a fixed set of pins in the Manhattan plane to be connected by a routing graph $G = (N, E)$. Pin $n_0 \in N$ is a source (i.e., where the signal originates), and the remaining pins are sinks (i.e., where the signal propagates to). Each edge $e_{ij} \in E$ has an associated edge cost, $d_{ij}$, equal to the Manhattan distance between its two endpoints $n_i$ and $n_j$; the cost (or total wirelength) of $G$ is the sum of its edge costs. We use $t(n_i)$ to denote the signal propagation delay from the source to pin $n_i$. Our goal is to construct a routing topology which spans the net and which also minimizes the maximum source-sink delay.

Optimal Routing Graph (ORG) Problem: Given a signal net $N = \{n_0, n_1, \dots, n_k\}$ with source $n_0$, find a set $S$ of Steiner points and construct a routing graph $G = (N \cup S, E)$, $E \subseteq (N \cup S) \times (N \cup S)$, such that $t(G) = \max_{i=1}^{k} t(n_i)$ is minimized.

We note that the ORG problem is NP-complete by observing that if the *R/C* ratio is sufficiently small, the optimal solution to the ORG problem will be a *tree* with least wirelength—i.e., an optimal Steiner tree. The ORG formulation easily extends to address critical sinks, by associating a *criticality*  $\alpha_i \ge 0$  with each sink  $n_i$ , reflecting timing information obtained during the performance-driven placement phase [4], [5]. The goal would then be to construct a routing which minimizes the weighted sum of the sink delays  $\sum_{i=1}^{k} \alpha_i \cdot t(n_i)$ . The specific routing graph G that solves the ORG problem will depend on the model used to estimate the delay t(G), as well as on the particular technology parameters. Ideally, we would like to compute and optimize delay according to the complete physical attributes of the circuit. To this end, we could use the circuit simulator SPICE [17], which is generally regarded as the best available tool for obtaining a precise, complete measure of interconnect delay.

Unfortunately, SPICE delay is too computationally expensive to evaluate during the routing phase of layout, and we are thus compelled to seek other alternatives. Another delay model is the Elmore delay formula [10], which was shown in [2] to have both high accuracy and fidelity in comparison with SPICE. The Elmore delay is defined as follows. Given routing tree $T$ rooted at $n_0$, let $e_i$ denote the edge from pin $n_i$ to its parent. The resistance and capacitance of edge $e_i$ are denoted by $r_{e_i}$ and $c_{e_i}$, respectively. Let $T_i$ denote the subtree of $T$ rooted at $n_i$, and let $c_i$ denote the sink capacitance of $n_i$. We use $C_i$ to denote the *tree capacitance* of $T_i$, namely the sum of sink and edge capacitances in $T_i$. Using this notation, the Elmore delay along edge $e_i$ is equal to $r_{e_i} (c_{e_i}/2 + C_i)$. Let $r_d$ denote the output driver resistance at the net's source. Then the Elmore delay $t_{\rm ED}(n_i)$ from source $n_0$ to sink $n_i$ is given by: $t_{\rm ED}(n_i) = r_d \cdot C_{n_0} + \sum_{e_j \in \text{path}(n_0,n_i)} r_{e_j} \cdot (c_{e_j}/2 + C_j)$.

We can extend the  $t_{\rm ED}$  function to entire trees by defining  $t_{\rm ED}(T) = \max_{i=1}^{k} t_{\rm ED}(n_i)$ . Because of its relatively simple form, Elmore delay can be calculated in O(k) time [18]. However, while the basic Elmore delay model outlined above applies only to tree topologies, Chan and Karplus have extended it to *RC* meshes [7]. Their method partitions the graph into a spanning tree and a set of *m* additional edges, then adds the extra edges back, updating the Elmore delay at each step. This increases the time complexity of the Elmore delay calculation to  $O(k \cdot m)$ . We use this method of delay calculation for general *RC* meshes in our approximation heuristic for the ORG problem (note that the work of [7] proposes a delay estimator but does not give a specific routing method).
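The tree formula above can be evaluated in two linear passes, one bottom-up for the tree capacitances $C_i$ and one top-down for the delays. The following is a minimal sketch of that idea, not the authors' code; the function name `elmore_delays` and the array-based tree encoding are assumptions made for illustration.

```python
def elmore_delays(parent, r_edge, c_edge, c_sink, r_driver):
    """Elmore delay from the source (node 0) to every node of a routing tree.

    Hypothetical encoding: parent[i] is the parent of node i (parent[0] is None);
    r_edge[i] and c_edge[i] describe the edge from node i to its parent
    (index 0 is unused); c_sink[i] is the sink capacitance at node i.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for i in range(1, n):
        children[parent[i]].append(i)

    # Preorder traversal: every parent appears before its children.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])

    # Bottom-up pass: C[u] = sink cap of u plus, for each child v,
    # the capacitance of edge e_v and the tree capacitance below v.
    C = list(c_sink)
    for u in reversed(order):
        for v in children[u]:
            C[u] += c_edge[v] + C[v]

    # Top-down pass: add r_e * (c_e / 2 + C) along each source-sink path.
    t = [0.0] * n
    t[0] = r_driver * C[0]
    for u in order[1:]:
        t[u] = t[parent[u]] + r_edge[u] * (c_edge[u] / 2 + C[u])
    return t


# Chain 0 -> 1 -> 2 with arbitrary illustrative values (not from Table I).
delays = elmore_delays([None, 0, 1], [0.0, 2.0, 3.0], [0.0, 4.0, 6.0],
                       [0.0, 1.0, 2.0], r_driver=1.0)
print(max(delays[1:]))  # maximum source-sink delay: 50.0
```

Each node is visited a constant number of times, which matches the $O(k)$ bound stated above.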

Although the ORG problem formulation seeks to optimize delay, our solution below has additional benefits as well. For example, routings produced by our algorithm also have substantially reduced skew (i.e., maximum difference in signal propagation delay between any two terminals), which is important since a large signal skew diminishes the system performance [8]. Moreover, non-tree routings produced by our method afford circuit reliability in the sense that they are able to tolerate open faults due to manufacturing defects and electro-migration. This increase in reliability is quantified more precisely in Section IV.

# III. LOW DELAY ROUTING GRAPH HEURISTIC

After testing a number of candidate heuristics for the ORG problem, the most effective method that we found is as follows: starting with a Steiner tree topology, we search for a new edge to add, so that the maximum source-sink delay in the resulting routing graph will be minimized. This edge is then added to the routing graph, and the process is iterated (i.e., we look for yet another good edge to add). We terminate when no further delay improvement is possible; thus, the maximum source-sink delay of the routing produced by our algorithm is guaranteed to be no worse than that of the initial routing (and typically considerably better). Steiner points are allowed as junctures in the routing, in order to afford further opportunity for both delay and wirelength optimization, and we use the delay estimation method of [7] to guide our search inside the inner loop of this construction. An execution example of this method, called the Low Delay Routing Graph (LDRG) algorithm, is shown in Fig. 2, while the formal statement of the algorithm is given in Fig. 3.



Fig. 2. An execution of LDRG algorithm on a random 10-pin net. The Steiner tree shown on the left (a) has SPICE delay of 2.8 ns (Steiner points are square), while the LDRG routing on the right (b) has SPICE delay of 1.9 ns, a 32% improvement (the wirelength increase is 25%).

| Low Delay Routing Graph (LDRG) Algorithm                                        |
|---------------------------------------------------------------------------------|
| <b>Input:</b> signal net N with source $n_0 \in N$                              |
| <b>Output:</b> low-delay routing graph $G = (\hat{N}, E)$                       |
| <b>Compute</b> a Steiner routing $G = (\hat{N}, E)$ over $\hat{N} = N \cup S$ , |
| where $S$ are the possible Steiner points,                                      |
| and $E \subseteq \hat{N} \times \hat{N}$ is the set of Steiner tree edges       |
| <b>While</b> there is an edge $e_{ij} \in \hat{N} \times \hat{N}$ such that $t((\hat{N}, E \cup \{e_{ij}\})) < t(G)$ |
| <b>Do</b> select such an edge $e_{ij}$ that minimizes $t((\hat{N}, E \cup \{e_{ij}\}))$ |
| and set $G = (\hat{N}, E \cup \{e_{ij}\})$                                      |
| Output resulting routing topology G                                             |

Fig. 3. The low delay routing graph algorithm.
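The greedy loop of Fig. 3 can be sketched in a few lines of Python. This is an illustration under several assumptions, not the authors' implementation: it omits Steiner points, it takes the starting tree as input instead of computing it, and it replaces the tree/link partitioning method of [7] with a direct linear solve of the lumped RC network (each wire modeled with half of its capacitance at each endpoint), which gives the same Elmore delays on trees. All function names are hypothetical, and pins are assumed to have distinct locations.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def max_delay(pins, edges, r_d, r_w, c_w, c_sink):
    """Maximum source-sink Elmore delay of an RC graph (pin 0 is the source)."""
    n = len(pins)
    G = [[0.0] * n for _ in range(n)]      # conductance matrix
    cap = [0.0] + [c_sink] * (n - 1)       # assumption: no load at the source
    G[0][0] += 1.0 / r_d                   # driver resistance to the ideal source
    for i, j in edges:
        d = abs(pins[i][0] - pins[j][0]) + abs(pins[i][1] - pins[j][1])
        g = 1.0 / (r_w * d)
        G[i][i] += g
        G[j][j] += g
        G[i][j] -= g
        G[j][i] -= g
        cap[i] += c_w * d / 2              # half of the wire capacitance per end
        cap[j] += c_w * d / 2
    return max(solve(G, cap)[1:])

def ldrg(pins, tree_edges, r_d, r_w, c_w, c_sink):
    """Greedily add the edge that most reduces the maximum delay."""
    edges = {tuple(sorted(e)) for e in tree_edges}
    best = max_delay(pins, edges, r_d, r_w, c_w, c_sink)
    while True:
        scored = [(max_delay(pins, edges | {e}, r_d, r_w, c_w, c_sink), e)
                  for e in combinations(range(len(pins)), 2) if e not in edges]
        if not scored:
            return edges, best
        delay, e = min(scored)
        if delay >= best * (1 - 1e-9):     # no real improvement: stop
            return edges, best
        edges.add(e)
        best = delay
```

Because an edge is added only when it lowers the estimated maximum delay, the returned delay is never worse than that of the starting tree, mirroring the guarantee stated in the text.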

TABLE I TECHNOLOGY PARAMETERS FOR THREE COMMON CMOS IC PROCESSES AND A TYPICAL MCM PROCESS

| Technology                       | IC1    | IC2    | IC3    | MCM     |
|----------------------------------|--------|--------|--------|---------|
| feature size ($\mu$m)            | 2.0    | 1.2    | 0.8    |         |
| driver resistance ($\Omega$)     | 164    | 212    | 270    | 25      |
| wire resistance ($\Omega/\mu$m)  | 0.033  | 0.073  | 0.112  | 0.008   |
| wire capacitance (fF/$\mu$m)     | 0.019  | 0.022  | 0.039  | 0.06    |
| sink loading capacitance (fF)    | 5.70   | 7.06   | 1.00   | 1000    |
| layout area (mm$^2$)             | $10^2$ | $10^2$ | $10^2$ | $100^2$ |

### IV. EXPERIMENTAL RESULTS

We implemented the LDRG algorithm using C in the UNIX/Sun environment; code is available from the authors upon request. We ran trials on sets of 100 random nets for each of several net sizes, with pin locations uniformly distributed in a square layout region. Although internally the LDRG method uses the extension of the Elmore delay formula to graphs [7], for greater accuracy and realism, we used SPICE3e2 [17] to evaluate the performance of the actual routings produced by LDRG.

We used SPICE parameters that are representative of typical MOSIS 0.8  $\mu$ m, 1.2  $\mu$ m, and 2.0  $\mu$ m CMOS IC processes, as well as a typical MCM technology (see Table I). Our SPICE delay model assumes constant resistance and capacitance per unit length of interconnect (i.e., both resistance and capacitance are proportional to wirelength). In addition, sink loading capacitances were used at all the pins to model loads driven by the interconnect. In our LDRG implementation, the initial Steiner routing tree was computed using an efficient implementation of the Iterated 1-Steiner algorithm of Kahng and Robins [13], which is known to yield near-optimal Steiner trees [12].

Fig. 4(a) shows the percent improvement in maximum source-sink delay over Steiner routing. Significant improvement is observed, for example, in the IC3 (0.8 $\mu$m CMOS) technology for 20-pin nets, where on average LDRG improves on Steiner routing by 27% while incurring a 14% wirelength penalty. Even larger improvement is seen in the MCM technology, with a performance improvement of 44% for 10-pin nets and of 62% for 20-pin nets. Note that the percent improvement in delay is consistently greater than the percent increase

TABLE II A DETAILED SUMMARY OF THE PERFORMANCE OF LDRG. FOR EACH TECHNOLOGY AND NET SIZE, 100 RANDOM NETS WERE GENERATED USING A UNIFORM DISTRIBUTION; SHOWN ARE THE AVERAGE PERCENT IMPROVEMENTS OVER STEINER ROUTING FOR DELAY AND SKEW. FOR COST, THE AVERAGE PERCENT INCREASE IN WIRELENGTH IS SHOWN. FOR RELIABILITY, WE GIVE THE PERCENTAGE OF TOTAL WIRELENGTH THAT LIES ON CYCLES IN THE ROUTING TOPOLOGY, I.E., THAT CAN TOLERATE AN OPEN FAULT. "PERCENT WINNERS" IS THE PERCENT OF CASES WHERE LDRG IMPROVED UPON THE INITIAL STEINER ROUTING, AND THE "WINNERS ONLY" STATISTICS ARE AVERAGES OVER ONLY THE WINNERS

| Technology | Net size | Delay (all cases) | Cost (all cases) | Skew (all cases) | Reliability (all cases) | Percent winners | Delay (winners only) | Cost (winners only) | Skew (winners only) | Reliability (winners only) |
|------------|----------|-------------------|------------------|------------------|-------------------------|-----------------|----------------------|---------------------|---------------------|----------------------------|
| IC1        | 5        | 0                 | 0                | 0                | 0                       | 0               | 0                    | 0                   | 0                   | 0                          |
| IC1        | 10       | 9                 | 2                | 10               | 7                       | 15              | 53                   | 8                   | 60                  | 67                         |
| IC1        | 20       | 15                | 5                | 25               | 20                      | 45              | 32                   | 9                   | 54                  | 49                         |
| IC2        | 5        | 0                 | 0                | 0                | 0                       | 0               | 0                    | 0                   | 0                   | 0                          |
| IC2        | 10       | 12                | 7                | 13               | 17                      | 35              | 33                   | 18                  | 51                  | 58                         |
| IC2        | 20       | 40                | 17               | 40               | 45                      | 100             | 40                   | 17                  | 40                  | 45                         |
| IC3        | 5        | 8                 | 7                | 9                | 36                      | 35              | 22                   | 38                  | 8                   | 44                         |
| IC3        | 10       | 17                | 13               | 26               | 33                      | 85              | 20                   | 36                  | 16                  | 42                         |
| IC3        | 20       | 27                | 14               | 43               | 47                      | 100             | 27                   | 43                  | 14                  | 47                         |
| MCM        | 5        | 38                | 92               | 40               | 87                      | 100             | 38                   | 92                  | 40                  | 87                         |
| MCM        | 10       | 44                | 73               | 44               | 79                      | 100             | 44                   | 73                  | 44                  | 79                         |
| MCM        | 20       | 62                | 50               | 63               | 77                      | 100             | 62                   | 50                  | 63                  | 77                         |

TABLE III A SUMMARY OF HOW MANY EDGES WERE ADDED BY THE LDRG ALGORITHM, SHOWN AS A PERCENT OF THE TOTAL NUMBER OF THE 100 CASES TESTED

| Technology | Net size | 0 edges | 1 edge | 2 edges | 3 edges | 4 edges | 5 edges | 6 edges | 7 edges | 8 edges | 9 edges |
|------------|----------|---------|--------|---------|---------|---------|---------|---------|---------|---------|---------|
| IC1        | 5        | 100     |        |         |         |         |         |         |         |         |         |
| IC1        | 10       | 85      | 15     |         |         |         |         |         |         |         |         |
| IC1        | 20       | 55      | 45     |         |         |         |         |         |         |         |         |
| IC2        | 5        | 100     |        |         |         |         |         |         |         |         |         |
| IC2        | 10       | 65      | 35     |         |         |         |         |         |         |         |         |
| IC2        | 20       | 0       | 75     | 25      |         |         |         |         |         |         |         |
| IC3        | 5        | 65      | 35     |         |         |         |         |         |         |         |         |
| IC3        | 10       | 15      | 85     |         |         |         |         |         |         |         |         |
| IC3        | 20       | 0       | 75     | 15      | 10      |         |         |         |         |         |         |
| MCM        | 5        | 0       | 80     | 20      |         |         |         |         |         |         |         |
| MCM        | 10       | 0       | 5      | 15      | 25      | 55      |         |         |         |         |         |
| MCM        | 20       | 0       | 30     | 10      | 5       | 5       | 10      | 5       | 10      | 10      | 15      |

in wirelength across all IC technologies and net sizes. The detailed data is given in Table II, and Table III shows how many extra edges are typically added to the initial topology by the LDRG algorithm.

We tallied the number of cases where LDRG was able to improve upon the initial Steiner routing (see Fig. 4(b)). We observe that the number of improvable cases increases with the net size, reaching 100% for 20-pin nets in the IC2, IC3, and MCM technologies (45% for IC1; see Table II); for MCM routing, LDRG improved upon Steiner routing for all net sizes tested. Although the wirelength penalty may seem high at first glance, we note that only a small fraction of nets need to be routed as non-trees, i.e., only the nets that contain critical paths; other, noncritical nets may still be routed using traditional Steiner routing, and thus the *overall* wirelength penalty for the entire circuit when using non-tree routing is much smaller than that indicated by the data.

An additional benefit of non-tree routing is a significant reduction in signal skew (i.e., the maximum difference between signal arrival times at any two pins). Fig. 4(d) shows the average percent improvement in signal skew over Steiner routing. For example, for 20-pin nets, LDRG yields 63% skew reduction for MCM and 43% skew reduction for IC3.

To quantify the increase in reliability afforded by non-tree routing, we measured the average percentage of the total wirelength that lies on cycles in the non-tree topology. This corresponds to the percentage of the total routing wirelength that can tolerate an open fault due to manufacturing defects or electro-migration, and is illustrated in Fig. 4(e). For example, for 20-pin nets under the MCM technology, 77% of the wirelength can tolerate an open fault, and for IC3, 47% of the wirelength is fault-tolerant.
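The reliability measure described above can be computed directly from a routing graph: an edge lies on a cycle exactly when removing it leaves the graph connected. The sketch below uses that test on each edge in turn; it is an illustration under assumed names (`cycle_wirelength_percent`, the `(u, v, length)` edge encoding), not the code used for the experiments, and it favors simplicity over speed.

```python
def cycle_wirelength_percent(num_nodes, edges):
    """Percent of total wirelength lying on cycles of a connected graph.

    edges is a list of (u, v, length) tuples with nodes numbered from 0.
    """
    def connected_without(skip):
        # Build adjacency lists with edge number `skip` removed.
        adj = [[] for _ in range(num_nodes)]
        for k, (u, v, _) in enumerate(edges):
            if k != skip:
                adj[u].append(v)
                adj[v].append(u)
        # Depth-first search from node 0.
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == num_nodes

    total = sum(length for _, _, length in edges)
    on_cycles = sum(length for k, (_, _, length) in enumerate(edges)
                    if connected_without(k))
    return 100.0 * on_cycles / total


# A triangle (0, 1, 2) with one pendant edge to node 3, all of length 10:
# only the three triangle edges survive an open fault.
print(cycle_wirelength_percent(4, [(0, 1, 10), (1, 2, 10), (0, 2, 10), (2, 3, 10)]))  # 75.0
```

For a tree the function returns 0, since every tree edge is a bridge.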



Fig. 4. Averages for 100 uniformly distributed nets. (a) Maximum source-sink delay improvement over Steiner routing. (b) Percent of instances where LDRG improved upon the initial Steiner routing. (c) Wirelength increase over Steiner routing. (d) Signal skew improvement over Steiner routing. (e) Routing reliability afforded by LDRG (i.e., average percent of the total wirelength per net that can tolerate an open fault).

# V. CONCLUSION

We have explored the consequences of abandoning an implicit restriction common to previous routing formulations, namely the insistence on a strictly acyclic (tree) routing topology. Instead, we reformulated the routing problem as one of constructing a routing graph with low maximum source-sink delay. We have shown that adding a few extra wires to an initial routing tree can significantly improve signal propagation delay, by exploiting the tradeoff between circuit capacitance and path resistance. In particular, depending on net size and technology, non-tree routing can improve the average signal propagation delay by up to 62% over traditional Steiner routing. We have also shown that our non-tree routing technique improves signal skew and interconnect reliability, in the sense that non-tree routing can tolerate open faults due to manufacturing defects and electro-migration.

### ACKNOWLEDGMENT

We are grateful to Dr. B. Grafton of NSF for his support and encouragement. We would also like to thank Professors P. Chan and K. Karplus for valuable discussions and for the use of their code in computing Elmore delay for general *R/C* networks. Parasitics for the three IC technologies were provided by the MOSIS group, while the MCM interconnect parasitics are courtesy of Professor W. W.-M. Dai and the AT&T Microelectronics Division. Finally, we would like to thank the anonymous referees for their helpful comments. Our benchmarks are available on the World Wide Web at URL http://uvasc.cs.virginia.edu/ robins/.

### REFERENCES

- [1] H. Bakoglu, Circuits, Interconnections and Packaging for VLSI. Reading, MA: Addison-Wesley, 1990.
- [2] K. D. Boese, A. B. Kahng, B. A. McCoy, and G. Robins, "Fidelity and near-optimality of Elmore-based routing constructions," in *Proc. IEEE Int. Conf. Computer Design*, Cambridge, MA, Oct. 1993, pp. 81–84.
- [3] \_\_\_\_\_, "Rectilinear Steiner trees with minimum Elmore delay," in Proc. ACM/IEEE Design Automation Conf., San Diego, CA, June 1994, pp. 381–386.
- [4] \_\_\_\_\_, "Near-optimal critical sink routing tree constructions," accepted for publication in *IEEE Trans. Computer-Aided Design*, 1995.
- [5] K. D. Boese, A. B. Kahng, and G. Robins, "High-performance routing trees with identified critical sinks," in *Proc. ACM/IEEE Design Automation Conf.*, Dallas, TX, June 1993, pp. 182–187.
- [6] P. K. Chan, University of California, Santa Cruz, private communication, June 1993.

- [7] P. K. Chan and K. Karplus, "Computing signal delay in general RC networks by tree/link partitioning," *IEEE Trans. Computer-Aided Design*, vol. 9, no. 8, pp. 898–902, 1990.
- [8] J. Cong, A. B. Kahng, and G. Robins, "Matching-based methods for high-performance clock routing," *IEEE Trans. Computer-Aided Design*, vol. 12, no. 8, pp. 1157–1169, 1993.
- [9] J. Cong, A. B. Kahng, G. Robins, M. Sarrafzadeh, and C. K. Wong, "Provably good performance-driven global routing," *IEEE Trans. Computer-Aided Design*, vol. 11, no. 6, pp. 739-752, 1992.
- [10] W. C. Elmore, "The transient response of damped linear networks with particular regard to wide-band amplifiers," J. Appl. Phys., vol. 19, no. 1, pp. 55–63, 1948.
- [11] R. L. Geiger, P. E. Allen, and N. R. Strader, VLSI Design Techniques for Analog and Digital Circuits. New York: McGraw-Hill, 1990.
- [12] J. Griffith, G. Robins, J. S. Salowe, and T. Zhang, "Closing the gap: Near-optimal Steiner trees in polynomial time," *IEEE Trans. Computer-Aided Design*, vol. 13, no. 11, pp. 1351–1365, 1994.
- [13] A. B. Kahng and G. Robins, "A new class of iterative Steiner tree heuristics with good performance," *IEEE Trans. Computer-Aided Design*, vol. 11, no. 7, pp. 893–902, 1992.
- [14] \_\_\_\_, On Optimal Interconnections for VLSI Layout. Boston, MA: Kluwer, 1995.
- [15] S. L. Long and S. E. Butner, Gallium Arsenide Digital Integrated Circuits. New York: McGraw-Hill, 1990.
- [16] B. A. McCoy and G. Robins, "Non-tree routing," in Proc. European Design and Test Conf., Paris, France, Feb. 1994, pp. 430–434.
- [17] L. Nagel, "SPICE2: A computer program to simulate semiconductor circuits," May 1975.
- [18] J. Rubinstein, P. Penfield, and M. A. Horowitz, "Signal delay in RC tree networks," *IEEE Trans. Computer-Aided Design*, vol. CAD-2, no. 3, pp. 202-211, 1983.
- [19] S. Sutanthavibul and E. Shragowitz, "An adaptive timing-driven layout for high speed VLSI," in *Proc. ACM/IEEE Design Automation Conf.*, 1990, pp. 90–95.