# ArchFP: Rapid Prototyping of pre-RTL Floorplans

Gregory G. Faust\*, Runjie Zhang\*, Kevin Skadron\*, Mircea R. Stan<sup>†</sup> and Brett H. Meyer<sup>‡</sup>,

\*Department of Computer Science, University of Virginia, Charlottesville, VA, USA

Email: {gf4ea, rz3vg, skadron}@cs.virginia.edu

<sup>†</sup> Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA

Email: mircea@virginia.edu

<sup>‡</sup>Department of Electrical and Computer Engineering, McGill University, Montréal, QC, Canada

Email: brett.meyer@mcgill.ca

Abstract—There has been a fundamental shift from ever more complex single cores to single chip multi-core (CMP) designs. Along with this opportunity come major challenges; notably the sheer size of the CMP design space. An integrated suite of tools is needed that provides life-cycle support from early prototyping to final design. Here we present ArchFP, a floorplanning tool targeted towards prototyping of pre-RTL CMP design concepts. As such, it is complementary to traditional floorplanners that are more appropriate later in the design cycle.

An ArchFP floorplan is specified using a model similar to that supported by GUI toolkits such as Java Swing and Windows Presentation Foundation. The floorplan design is comprised of a hierarchy of components placed within containers, called Layout Managers (LMs), that provide a variety of layout algorithms. Current LMs include a generalized grid LM, one that supports geographic hints for component placement, and a fixed layout of imported subcomponents. ArchFP is easy to extend with additional LMs that leverage these initial layout algorithms to support more specific design elements such as NoC configurations, cache partitioning strategies, SIMD design, etc. To the best of our knowledge, no one has previous used this approach for floorplanning. ArchFP is written in C++ for UNIX systems, and is free for download from http://lava.cs.virginia.edu/archfp.

We demonstrate the utility of ArchFP in the study of power delivery and temperature constraints in likely CMP configurations of Penryn-like cores over four technology scales.

#### I. INTRODUCTION

The focus of chip design has shifted towards the inclusion of many cores on a chip leading to the production of Chip Multi-Processors (CMPs). The size and complexity of the design space for CMPs is staggering. It spans multiple dimensions such as the number, type, and complexity of the cores, the size of on-chip cache, the cache sharing model, the type of on-chip network interconnect between components, local vs. global time synchronization, the type and number of memory controllers, etc. While there is a combinatorial explosion of the possible system architectures to contemplate, there is also a number of increasingly hard to overcome constraints that must be dealt with. These include pin count, power density, temperature, and total die size. Each of these are worthy of study in their own right, but making system-wide architectural decisions while attending to one or two of these constraints at a time can often lead to suboptimal designs [1].

In order to investigate the large CMP design space in a comprehensive fashion, increased emphasis must be placed on integrated tool suites capable of modeling CMPs and their onchip support systems. To allow rapid early stage investigation, the suite should contain tools capable of modeling systems at different levels of detail. An example of such a tool is McPAT [2] which contains hierarchical power, area, and timing models for various HW components. McPAT has been used to investigate a number of core cluster configurations in terms of their total area, power, and NoC latency without considering the actual layout of the various configurations.

However, many other CMP design investigations do require layout information. For example, at the University of Virginia, the effect of CMP layout on peak chip temperature was investigated for a variety of CMP configurations and target uses [3], [4]. It was shown that the lack of temperature-aware floorplanning can force runtime throttling of the voltage and/or clock speed thereby affecting performance. Also, Kumar et al. [5] studied area, power, and performance of various NoC topologies for CMPs, and found that a hierarchical bus NoC can result in reduced area and power consumption.

These studies and others like them [6] motivate the creation of a pre-RTL floorplanner targeted at CMP design. While traditional floorplanners operate well at the three levels of description supported by McPAT (*Architectural*, *Circuit*, and *Technology*), ArchFP works well at the *Architectural* and a higher *CMP* level of design in which clusters of *Architectural* level components can be combined into reusable/repeatable elements. Traditional floorplanning tools typically operate with detailed post synthesis information about the components and wires which comprise the design. However, this level of detail is not appropriate for early pre-RTL CMP design space exploration. In addition, traditional floorplanners do not attend to the metrics of concern to layout at the CMP level which are different than those later in the design cycle.

ArchFP retargets the model of Graphical User Interface (GUI) design toolkits such as Java Swing [7] and Windows Presentation Foundation [8] to provide a novel framework for floorplanning. Components such as cores, caches, crossbars, etc. are placed within containers. Associated with each container is a layout algorithm called a "layout manager" (LM). Containers are themselves components; therefore the model is inherently hierarchical. Finally, the model is implemented as a class library, allowing for extensibility of the model with additional layout algorithms of arbitrary generality or specificity. In particular, a traditional floorplanning algorithm

This is the authors' final manuscript. The authoritative version will appear in 2012 IEEE/IFIP 20th International Conference on Very Large Scale Integration (VLSI-SoC 2012).

can be included smoothly within the architecture, as can very specific knowledge-based LMs. To the best of our knowledge, no one has previously proposed this model of hierarchical containment with differing LMs for hardware floorplanning. Previously, in a Technical Report, we retrospectively showed the utility of ArchFP for CMP design investigations like those above [9]. Below we present a more recent example involving the study of Power Delivery Networks (PDNs).

#### II. BACKGROUND AND RELATED WORK

#### A. Floorplanning

Traditional floorplanners input a set of HW components with a given area and aspect-ratio (AR=width/height), and the wires between them, and search for a non-overlapping 2D layout that minimizes an objective function of total area and total wire length. Because finding optimal floorplans is NP-Hard, practical systems use various approximation techniques [10], [11]. For example, floorplans are often represented as a (binary) tree of rectangular areas, with HW components as the leaves, and larger composite rectangles further up the tree. The space of possible layouts is searched via simulated annealing which randomly perturbs the tree by a series of "moves" such as rearranging a node's children, moving children between nodes, laying a block on its side, etc. After each move, the objective function is re-evaluated to decide if the new floorplan is better or worse than the previous one. One advantage of this approach is that all components are handled in a simple and consistent fashion. But this approach can be slow, and produce in any given run, a floorplan of uncertain quality.

The use of a tree structure in these algorithms does not imply a hierarchical specification of the HW design, and the hierarchy has neither stability nor semantic meaning. In contrast, in the ArchFP model the hierarchical inclusion of components in containers is stable and has direct impact on the resultant design. More importantly, different layout algorithms can be, and typically are, applied at different levels in the hierarchy. Furthermore, the semantic relevance of component placement in LMs is increased when domain-specific LMs use specialized knowledge about how to lay out their children,

The objective functions of traditional floorplanners are too inflexible for CMP level design. Therefore, many CMP studies wrap an additional evaluation function around a traditional floorplanner. For example, Meyer et al. [12] studied system reliability and cost in application-specific SoCs using various NoC configurations. It was necessary to generate a very large number of floorplans to find the few that scored well on the desired metrics. Meyer found that over 90% of the runtime used in design space exploration was spent floorplanning.

We believe ArchFP has several advantages over traditional floorplanners for these CMP studies. First, the current set of LMs are quite general purpose and easy to use, and as they do not examine a large search space, have trivial runtimes. Second, as discussed below, the requirements on an LM are minimal, making the system very easy to extend. Together these two features allow ArchFP to support the direct creation of layouts that have desirable features for the CMP design space under investigation. The investigator can either build domain-specific knowledge into a novel LM, or combine existing LMs using their intuition about layouts likely to have needed attributes. This can result in faster and more focused investigations of large CMP design spaces. Finally, the ArchFP system design is flexible enough to allow a traditional floorplanner to be directly included as an LM, thereby ensuring floorplanning capability no worse than the current standard.

Floorplanners are often evaluated using benchmarks such as GSRC [13]. These metrics relate to post-RTL floorplanning, thus we have not tested ArchFP against such benchmarks.

#### B. GUI Design Toolkits

Modern GUI toolkits such as Java Swing [7] and Windows Presentation Foundation (WPF) [8] are 3rd generation systems built upon the lessons learned in previous systems. Java Swing and WPF both have the same architecture of components, containers, and LMs that we propose to use for floorplanning. Both toolkits contain many different LMs supporting horizontal, vertical, grid, box-and-spring and other placement algorithms. In both systems, the contract for an LM is defined as an interface. Therefore, anyone can create their own LMs either on top of the base set or completely independently and have them participate in the overall architecture in a consistent fashion. ArchFP is written in C++ and offers the same level of extensibility through the use of an abstract base-class with virtual methods for component addition, layout, generating output to a floorplan file, etc. In WPF, layout is done using the same flow of information between LMs as in ArchFP.

#### **III. ARCHFP IMPLEMENTATION**

A key goal of the floorplanner is to integrate with the growing suite of tools used for CMP design space investigation. The suite already contains tools such as gem5 [14], McPAT [2], HotSpot [15], and ParquetFP [16]. McPAT supplies the area information for components needed by ArchFP to produce floorplans. ArchFP can both produce and consume HotSpot floorplan files. This format is extremely simple; each element specifies it name, location, width and height.

Users can integrate with ArchFP in many ways. They can modify and recompile ArchFP code to create each floorplan. However, most CMP design studies investigate many closely related designs whose floorplans can be produced from one or two parameterized specifications. The user can write these once, then generate various floorplans by using command line arguments or a configuration file specific to their design parameters. For tighter integration, we expect many users will include ArchFP as a class library into their own tool for generating and/or evaluating CMP designs. We chose not to support a "declarative" language for ArchFP floorplan specifications because this approach cannot support the specification reuse described above, nor computation of component features such as area based on configuration parameters.

#### A. ArchFP Components

All ArchFP components are derived from a C++ base class, which is very similar to a component in one of the GUI

frameworks. While LMs act as containers, they are also components. This allows the arbitrary containment hierarchy to be consistent. Components contain the following information:

- *Component type*. The current general purpose LMs only use the type to provide names during output. Future LMs that are specific to various CMP structures will take the type of the components into account when doing layout.
- *MinAR* and *MaxAR*. The component's allowed AR range.
- *Location*. Components store their (x, y) location information relative to their container, not the overall floorplan.
- Area. Basic components have an intrinsic area. Containers discover their area as part of the layout process.
- *Width* and *Height*. The actual width and height are often not known until after layout because AR specifications flows recursively from top to bottom during layout.

#### B. ArchFP Layout Managers

All ArchFP LMs are derived from a C++ abstract container class, which contains its list of inferiors, and the attributes it inherits from the component class. However, the methods of the LMs are where the work of the system is performed. There are two abstract methods on the container base class that all LMs must implement; one to perform the layout, and one to output the layout in HotSpot floorplan format.

Information flow during layout is as follows.

- 1) The top level LM is given a target AR as a layout goal.
- 2) Recursively, each LM, starting at the top of the hierarchy, requests the area of its inferiors. As noted above, leaf components are given an area when they are created, halting the recursive descent. LMs then calculate their own area as the sum of their inferiors' areas.
- 3) Next, again starting from the top, each LM uses its target AR and the number and size of its inferiors, to call the layout method on those inferiors dictating their target AR. Thus, during this phase, components discover their width and height from the bottom up. Leaf components are also able to lay themselves out by checking their minimum and maximum AR and complying as closely as possible with the AR request from above.
- Next, the LM checks to see if the inferior's resultant width and height is as requested. If not, it is the LMs responsibility to adjust accordingly.
- 5) As each LM finishes the layout for a given inferior, it then sets that inferior's relative location.
- 6) Finally, the LM calculates its own width and height based on the actual location and size of its inferiors.

#### C. Initial set of LMs

As proven in many other contexts, a very powerful tool paradigm is to have a few simple primitives that operate well in conjunction with one another. Thus, we have started by implementing the following small collection of such LMs.

• *GridLM*. A GridLM contains a single inferior (of arbitrary nested complexity), and a repeat count. During layout, the GridLM will determine the best dimensions based on its requested AR. For example, if the grid layout

method is given a target AR of 2 (twice as wide as high), and the grid contains 8 elements, the GridLM will set its grid dimensions to 2 rows by 4 columns. This allows layout of the inferior with a target AR closest to a square.

- GeoLM. This is the most flexible LM. To reduce explicit levels of hierarchy in an ArchFP specification, inferiors of a GeoLM can have a repeat count. The GeoLM will then implicitly create a GridLM as an inferior to contain the repeating group. In addition, inferiors to the GeoLM are specified with a Geographic Location Hint (GLH) that indicates where the component is to be placed. Currently supported GLHs include Left, Right, Top, Bottom, Center, LeftRight, and TopBottom. During layout, the GeoLM takes each of its inferiors in order, and allocates all remaining space along the specified border(s) to it. LeftRight and TopBottom can only be applied to a repeating group of even size. The group is cut in two and each half put in the specified location. In addition, the LeftRight and TopBottom hints have a Mirroring option and a  $180^{\circ}$  rotation option.
- *FixedLM*. This powerful LM allows any pre-existing layout in a HotSpot floorplan file to be reused in any ArchFP floorplan. It supports optional Denard scaling, which is often accurate enough for early stage CMP prototyping. However, if more accurate scaling is needed, the ArchFP floorplan specification can be reused with rescaled areas from McPAT. This LM does no layout; its inferiors know their size, shape, and location.
- *BagLM*. A BagLM contains an arbitrary collection of components. It lays them out from largest to smallest placing each in turn along the shortest remaining side.

As shown below, these simple, intuitive, easy to use LMs can be used in combination to produce surprisingly complex floorplans. In addition, such a set of primitives acts as the foundation upon which to build more sophisticated LMs, especially ones specific to important CMP patterns such as NoC topologies, cache sharing models, SIMD layouts, etc. ArchFP acts as an organizing framework into which such LMs can be added and used in combination in a consistent fashion.

### IV. CASE STUDY

As power density rises with each new generation of chip technology, temperature and power delivery will soon become design constraints on further CMP scaling. Here we focus on reliability issues associated with electromigration (EM) in C4 pads and IR drop resulting in increased latencies. The study of these physical constraints requires an integrated tool suite. We use a combination of new and pre-existing tools; McPAT for power and area information, HotSpot to calculate local temperatures, a novel PDN model specifically designed for this experiment to calculate local IR drop and C4 pad currents [17], and ArchFP for layout information.

#### A. ArchFP Floorplan Specification

Using a 45nm Intel Penryn-like core as a baseline (described in IV-C), we create a series of scaled CMPs down to 16nm and

study the resulting power delivery noise and chip temperature. We model 2, 4, 8, and 16 core configurations. Figure 1 shows the C++ specification for the 8-core floorplan shown in Figure 2. To create a hierarchical description of a floorplan, we must first create the components near the leaves of the hierarchy so that we can later include them in the higher levels.

The specification for this floorplan works as follows.

- First, the layout of the Penryn-like core is loaded as a "macro" from a HotSpot floorplan file which was also created in ArchFP from 22nm scale *Architectural* level components. Due to the subcomponent diversity, this required a few dozen lines of code. In Figure 2 the core forms the unlabled complex areas near the top and bottom of the CMP (with names elided for clarity).
- 2) Second, the cluster of L2 cache and NoC is created.
- 3) Third, the Memory Controllers, core, and L2 clusters are each added in implicit grids of 8 each to the CMP level of the LM hierarchy. Note the use of the TopBottom GLH and mirroring of the core component that lead to the symmetric structure of the final floorplan.
- 4) Fourth, the layout is performed and the result is output.



Figure 1. ArchFP C++ code to build the layout shown in Figure 2. The arguments to the addCluster method are component, repeat count, area, AR limits, and a GLH. The arguments to addComponent are a predefined cluster (thereby adding hierarchy to the layout), a repeat count, and a GLH. The layout method takes a target AR, in this case 1 (square).

### B. ArchFP Floorplan Layout

ArchFP calculates the layout for this floorplan as follows.

- 1) First all components calculate their total area as described in section III-B.
- 2) The CMP GeoLM is the top component in the hierarchy, and we request it lay itself out as a square. It takes its children *in order of specification* and lays them out in the requested locations. It starts with the 8 MCs, and due to the TopBottom GLH divides them into two implicitly generated 4-element GridLMs. Each GridLM is allocated the entire width of the CMP square for layout at their respective top and bottom locations. Given the size and AR constraints, the GridLMs choose 1x4 grids, and determine the resultant width of each MC. The MC then lays itself out to fit its width.
- Next the CMP GeoLM lays out the core components. Again, two 4-element GridLMs are formed, and each is



Figure 2. CMP floorplan of 8 Penryn-like cores with L2 cache, NoCs and Memory Controllers added.

allocated the entire width of the CMP square as above, but their y-locations are now inboard of the MCs. Due to the TopBottomMirror GLH, the bottom GridLM is told to reflect its component through the x-axis. The fixed core component need not lay itself out because it uses the information read from the Penryn22.fp file.

- Next the CMP GeoLM lays out the L2 cluster. There is one 8-element GridLM created. Given the shape of the Center space to fill, it chooses a 2x4 grid configuration.
- 5) Finally, the L2 cluster lays itself out. As a GeoLM, it lays out its components in order. First the L2 is allocated the full height of the right side of its grid location to lay itself out, followed similarly by the NoC on the left.

To validate the quality of the ArchFP floorplans, we compared their half perimeter wire length (HPWL) and area to floorplans generated by ParquetFP [16]. Based on the designer's placement of components and using flexible AR constraints, the ArchFP floorplans for the 22nm Penryn-like core and 8-core CMP have the minimum possible areas, and (hand calculated) HPWL of 38mm and 108mm respectively. Based on traditional wire specifications, and depending on the relative weight given to area and HPWL in its objective function, ParquetFP produced floorplans with a reduction in HPWL of 15-33% for the core and 18-35% for the CMP, but with concomitant increases in area of 8-37% and 4-58% respectively. Also, the ParquetFP CMP floorplans do not have the intuitive regularity of component placement shown in the ArchFP floorplan. We conclude that the floorplans produced by ArchFP represent a good trade-off between HPWL and area, and act as a valid starting point for the PDN study.

## C. Scaling and Power Modeling

Our scaling and power assumptions are as follows.

1) Multicore scaling: We choose an Intel 45nm Penryn-like processor [18] as our baseline design. It has two 32-bit 4-way out-of-order cores running at 3.7GHz. Each core contains a 32KB L1 instruction cache and a 32KB L1 data cache. Unified L2 caches are private to each core and are each 3MB. For each technology node, we hold the processor architecture constant but assume that the number of cores (and therefore the number of private L2 caches) doubles. We use mesh-based networkon-chip (NoC) structure across all technology nodes.

Table I Area and power of multicore processors with Penryn-like cores

| Tech Node(nm)          | 45     | 32     | 22     | 16     |
|------------------------|--------|--------|--------|--------|
| # of Cores             | 2      | 4      | 8      | 16     |
| Area(mm <sup>2</sup> ) | 116.44 | 124.78 | 131.48 | 149.25 |
| Supply Voltage(V)      | 1.0    | 0.9    | 0.8    | 0.7    |
| Peak Total Power(W)    | 74.62  | 100.48 | 116.76 | 148.49 |

2) Power modeling: To get chip-wide power consumption data for all the technology nodes, we use McPAT [2]. Table I shows the area and peak power (including leakage power) results for our Penryn-like multicore designs in each technology.

McPAT calculates theoretical peak power by assuming maximum switching activity, corresponding to functional blocks being fully active every cycle. For most structures, such as L2 cache or NoC, this is neither achievable nor sustainable. To estimate the realistic worst-case power consumption for each system, we conducted performance simulations and activity factor analyses to extract an empirical reasonable worst-case switching activity. Based on these studies, we use 80% of McPATs theoretical peak power as our best estimate for practical peak power consumption. Previous work on single core stressmark generation [19] and NoC throughput [6] suggest realistic peak power to be 80-85% of theoretical peak power.

3) *CMP floorplanning:* As discussed above, our CMP floorplans were generated by ArchFP. Chips at different technology nodes share the same single core structure and the area of each functional block is calculated by McPAT. During the construction of all the CMP floorplans, we place MCs on the chip periphery and NoCs between cores as long, thin blocks.

4) Power delivery and temperature modeling: Taking our floorplans as input, we use HotSpot [15] to model chip temperature and a steady-state power delivery model described in [17] to model on-chip power delivery network. Table II lists the major physical parameters used in these two models.

 Table II

 PDN and cooling system parameters selected for scaling study

| Power Delivery Network                  |         |
|-----------------------------------------|---------|
| Top Layer Metal Pitch $(\mu m)$         | 30      |
| Top Layer Metal Width $(\mu m)$         | 6       |
| Top Layer Metal Thickness $(\mu m)$     | 5       |
| Top Layer Metal Resistivity $(\rho)$    | 1.68e-8 |
| C4 Pad Diameter $(\mu m)$               | 130     |
| C4 Pad Pitch ( $\mu m$ )                | 285     |
| C4 Pad Resistivity $(\rho)$             | 1.46e-7 |
| Package Resistance $(m\Omega)$          | 0.03    |
| Cooling System Thermal Resistance (K/W) |         |
| High end air cooling                    | 0.25    |
| Liquid cooling                          | 0.1     |



Figure 3. Max pad current and max on-chip IR drop at each technology node. The upper range of the right Y-axis is the threshold current value for EM (at  $100^{\circ}$ C). For IR drop, we do not set an explicit threshold value but a 3.8% IR drop could cause as high as 51% delay increase [22]. IR drop therefore poses a more significant risk to failure than EM.

For power delivery, we choose on-chip metal parameters to approximate an Intel 45nm metal stack [20]. C4 pad spacing was selected so that our pad density matches ITRS projections. Package resistance comes from [21]. For temperature, we simulated both air cooling and liquid cooling systems and their effective thermal resistance can be found in product datasheets.

# D. Results

1) Electromigration on C4 Pads: EM is one of the major failure mechanisms that deserve designers' attention. According to [23], aluminum and copper metal wires, commonly used for on-chip interconnections, can carry two orders of magnitude higher current density than solder joints. This suggests that C4 solder bumps are more vulnerable to EM. For this reason, we calculate the max current density on C4 pads, illustrated by the line in Figure 3. In order to determine the upper bound of the PDN capacity (or the lower bound of PDN noise), we assume that all pads are used for power or ground (and that each type is distributed uniformly). While this is an unrealistic assumption for a real system, it allows us to determine the best-case trend in PDN behavior. In the event that the PDN imposes constraints on the rest of the design under this best case, clearly any design under more realistic assumptions will be constrained by the PDN as well.

In [23], the author gives an EM threshold current density for SnPb solder. At 100°C, the maximum current density that a solder joint can carry without EM damage is  $8.5 \times 10^3 A/cm^2$ . Combined with our pad diameter assumption, we calculate the per pad current limit as 1.13A, shown in Figure 3 as the maximum value of the right Y-Axis. It is obvious that even though the maximum pad current increases as the technology scales, the absolute value is still far from the EM threshold. This could be an indication that under ITRS's projections for total pad count, there may be enough guard band for EM in C4 pads for at least the near future.

2) Steady-State IR drop: IR drop is an important PDN metric because it is directly related to silicon delay increase and frequency degradation. As technology scales, the impact of IR drop would increase due to higher currents. Similar to



Figure 4. A comparison between chip max temperature and worst IR drop across different technologies

the previous section, we dedicate all potential pad locations to power and ground pads and no pads to I/O signals. We then use the model to find the maximum on-chip IR drop ratio for each technology. This gives a lower bound on IR drop and the results are shown in Figure 3. The reported IR drop value combines both voltage drop from power plane and ground bounce on ground plane.

IR drop, unlike EM, does not directly result in immediate failure when a threshold current has been crossed, but results in performance degradation instead. Previous work [22] suggests that a 0.05V voltage drop at  $0.13\mu m$  with 1.35V power supply would cause a 15% average and up to 51% maximum delay increase. The bars in Figure 3 show that the IR drop increases as the power density increases with technology scaling, and that the IR-drop ratio value reaches above 4% at 16nm—resulting in non-trivial performance degradation. For a more realistic scenario where not all pads were dedicated to power and ground, the problem would be even worse.

3) Temperature vs. IR drop: Both IR drop and temperature are physical design constraints that closely relate to chip power density. A robust system should be designed with both factors in mind. Figure 4 combines chip max temperature with max IR drop. The temperature results are based on both an air cooling system and a liquid cooling system. Our results to date indicate similar trends for both power delivery and thermal limits. Starting from 16nm, air cooling will be insufficient for the modeled CMP. Even though switching to liquid cooling can keep chip temperature under the limit, power delivery will become a new bottleneck. The platform we built provides an infrastructure for future studies.

### V. CONCLUSION

To facilitate investigation of the very large design space for CMPs, tools that can be used to rapidly prototype pre-RTL CMP designs are becoming increasingly important. We present ArchFP, a floorplanning tool specifically designed to operate with *CMP* and *Architectural* components before the *Circuit* and *Technology* details have been determined. ArchFP retargets the flexible and extensible component/container/LM model of GUI design tools to floorplanning. To the best of our knowledge, this approach to floorplanning is completely novel. ArchFP is complementary to traditional floorplanners; it is superior for early prototyping, while traditional floorplanning techniques are necessary later in the design life-cycle. We previously showed that ArchFP could have been useful to several CMP design investigations. Here, we show the utility of ArchFP as part of an integrated tool suite in the study of power delivery and temperature constraints in likely CMP configurations of Penryn-like cores for four technology nodes. We conclude that IR drop may pose a more significant risk to PDNs than EM in C4 pads. In addition, starting with 16nm chips, power delivery is likely to become a more limiting factor than temperature in applications using liquid cooling.

#### ACKNOWLEDGMENT

This work was supported in part by the NSF under grants CRI-0551630 and MCDA-0903471 and by the SRC under task 1972, as well as grants from AMD and Intel. We also thank the anonymous reviewers for their helpful feedback.

#### REFERENCES

- [1] Y. Li et al., "CMP design space exploration subject to physical constraints," in *HPCA*, Feb. 2006, pp. 17–28.
- [2] S. Li et al., "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in *MICRO*, Dec. 2009, pp. 469–480.
- [3] K. Sankaranarayanan et al., "A case for thermal-aware floorplanning at the microarchitectural level," JILP, vol. 7, 2005.
- [4] K.Sankaranarayanan et al., "Architectural implications of spatial thermal filtering," *Integration, the VLSI Journal*, Online, Dec. 2011.
- [5] R. Kumar et al., "Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling," in *ISCA*, 2005, pp. 408– 419.
- [6] P. P. Pande et al., "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," *IEEE Trans. Computers*, vol. 54, no. 8, pp. 1025–1040, Aug. 2005.
- [7] D. M. Geary, Graphic Java 2 : mastering the JFC, 3rd ed., ser. The Sun Microsystems Press Java series. Sun Microsystems, 1999, v. 2. Swing.
- [8] C. Anderson, Essential Windows Presentation Foundation, ser. Microsoft NET development series. Addison-Wesley, 2007.
- [9] G. Faust et al., "Rapid prototyping of CMP floorplans: A technical report," University of Virginia, Tech. Rep. CS-2012-02, March 2012.
- [10] S. H. Gerez, Algorithms for VLSI design automation. Wiley, 1999.
- [11] M. Sarrafzadeh et al., An Introduction to VLSI Physical Design. Mc-Graw Hill, 1996.
- [12] B. H. Meyer et al., "Cost-effective slack allocation for lifetime improvement in NoC-based MPSoCs," in DATE, March 2010, pp. 1596–1601.
- [13] "GSRC floorplan benchmark suite." [Online]. Available: http://www. cse.ucsc.edu/research/surf/GSRC/GSRCbench.html
- [14] N. Binkert et al., "The gem5 simulator," SIGARCH Comput. Archit. News, vol. 39, no. 2, pp. 1–7, Aug 2011.
- [15] K. Skadron et al., "Temperature-aware microarchitecture," in ISCA, June 2003, pp. 2–13.
- [16] S. N. Adya et al., "Fixed-outline floorplanning: enabling hierarchical design," *IEEE Trans. VLSI*, vol. 11, no. 6, pp. 1120–1135, Dec. 2003.
- [17] R. Zhang et al., "Some limits of power delivery in the multicore era," in WEED, Jun 2012.
- [18] V. George et al., "Penryn: 45-nm next generation Intel Core 2 processor," in ASSCC, Nov 2007, pp. 14–17.
- [19] M. Joshi et al., "Automated microprocessor stressmark generation," in HPCA, 2008, pp. 229–239.
- [20] N. H. Weste et al., CMOS VLSI Design A Circuit and Systems Perspective, 4th ed. Addison-Wesley, 2011.
- [21] Intel Pentium 4 Processor in the 423 pin package / Intel 850 Chipset Platform. Intel, 2002.
- [22] M. Shao et al., "IR drop and ground bounce awareness timing model," in *ISVLSI*, May 2005, pp. 226–231.
- [23] Y. T. Yeh et al., "Threshold current density of electromigration in eutectic SnPb solder," *Applied Physics Letters*, vol. 86, no. 20, May 2005.