UVa CS Technical Reports

Some of the reports listed below are not available on-line. To obtain them, send mail to techrep-librarian@cs.virginia.edu.

[Computer Science Technical Reports] [Institute for Parallel Computation Technical Reports] [Research Memos]

Computer Science

Huang, Wei; Sankaranarayanan, Karthik; Ribando, Robert J.; Stan, Mircea R.; Skadron, Kevin. An Improved Block-Based Thermal Model in HotSpot 4.0 with Granularity Considerations, April 23, 2007.

Abstract: This technical report describes our most recent improvements to the HotSpot thermal model. First, in response to a previous paper in WDDD 2006 that cites some accuracy shortcomings, we improve the treatment in the HotSpot block model of high-aspect-ratio blocks, high power densities, and convection boundary conditions. These improvements are included in the newest release of the HotSpot software, version 4.0. Second, to help guide users' modeling efforts and choice of modeling resolution, we propose an analytical approach to find the relationship between power-modeling granularity and the associated peak temperature rise at that granularity, i.e., the relationship between the size and peak temperature of a uniform heat source. We theoretically confirm the existence of a strong spatial low-pass filtering effect on temperature: a tiny heat source is much cooler than a large heat source with the same power density.

Walcott, Kristen R.; Humphreys, Greg; Gurumurthi, Sudhanva. Dynamic Prediction of Architectural Vulnerability from Microarchitectural State, November 17, 2006.

Abstract: Transient faults due to particle strikes are a key challenge in microprocessor design. Driven by exponentially increasing transistor counts, per-chip faults are a growing burden. To protect against soft errors, redundancy techniques such as redundant multithreading (RMT) are often used. However, these techniques assume that the probability that a structural fault will result in a soft error (i.e., the Architectural Vulnerability Factor (AVF)) is 100 percent, unnecessarily draining processor resources. Due to the high cost of redundancy, there have been efforts to throttle RMT at runtime. To date, these methods have not incorporated an AVF model and therefore tend to be ad hoc. Unfortunately, computing the AVF of complex microprocessor structures (e.g., the ISQ) can be quite involved. To provide probabilistic guarantees about fault tolerance, we have created a rigorous characterization of AVF behavior that can be easily implemented in hardware. We experimentally demonstrate AVF variability within and across the SPEC2000 benchmarks and identify strong correlations between structural AVF values and a small set of processor metrics. Using these simple indicators as predictors, we create a proof-of-concept RMT implementation that demonstrates that AVF prediction can be used to maintain a low fault tolerance level without significant performance impact.

Rajan, Hridesh; Sullivan, Kevin J.. On the Expressive Power of Classpects, July 18, 2005.

Abstract: The most prominent feature of AspectJ and derivative languages is the aspect. The aspect is a module-like construct that supports data abstraction but that is distinct from the class in several ways. First, aspects support new mechanisms: notably join points, pointcut descriptors and advice. At the same time, they lack some key features of classes, namely the ability to manage aspect instantiation using new. In earlier work, we showed that it was possible and valuable to simplify the design of such languages without loss of expressiveness by unifying aspects and classes in a new construct that we called the classpect. The contribution of this paper is the formal demonstration, using Felleisen's notion of macro-eliminable programming language extensions, that the classpect language model, realized in the Eos language and compiler, is strictly more expressive than AspectJ in the important case in which aspect-like modules must advise other aspect modules.

Jin, X.; French, J. C.; Michel, J.. Query Formulation for Information Retrieval by Intelligence Analysts, July 2005.

K. Dale, J. Sheaffer, D. Luebke, G. Humphreys, and K. Skadron. Applications of Small Scale Reconfigurability to Graphics Processors, June 2005 (revised 8 July 2005).

Abstract: We explore the application of Small-Scale Reconfigurability (SSR) to graphics hardware. SSR is a relatively new architectural technique wherein functionality common to multiple subunits is reused rather than replicated, yielding high-performance reconfigurable hardware with reduced area requirements. We show that SSR can be used effectively in programmable graphics architectures to allow double-precision computation without affecting the performance of single-precision calculations and to increase fragment shader performance with a minimal impact on chip area.

Lu, Zhijian; Lach, John; Stan, Mircea; Skadron, Kevin. Temperature-Aware Modeling and Banking of IC Lifetime Reliability, July 1, 2005.

Abstract: Most existing integrated circuit (IC) reliability models assume a uniform, typically worst-case, operating temperature, but temporal and spatial temperature variations affect expected device lifetime. As a result, design decisions and dynamic thermal management (DTM) techniques using worst-case models are pessimistic and result in excessive design margins and unnecessary runtime engagement of cooling mechanisms (and associated performance penalties). By leveraging a reliability model that accounts for temperature gradients (dramatically improving interconnect lifetime prediction accuracy) and modeling expected lifetime as a resource that is consumed over time at a temperature- and voltage-dependent rate, substantial design margin can be reclaimed and runtime penalties avoided while meeting expected lifetime requirements. In this paper, we evaluate the potential benefits and implementations of this technique by tracking the expected lifetime of a system under different workloads while accounting for the impact of dynamic voltage and temperature variations. Simulation results show that our dynamic reliability management (DRM) techniques provide a 40% performance penalty reduction over that incurred by pessimistic DTM in general-purpose computing and a 10% increase in quality of service (QoS) for servers, all while preserving the expected IC lifetime reliability.
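The lifetime-banking idea described above can be sketched in a few lines: expected lifetime is treated as a budget consumed at a temperature-dependent rate, so time spent below the rated temperature banks slack that can later be spent running hotter. This is an illustrative model only; the exponential rate function and all constants are invented for the example, not taken from the paper.

```python
# Sketch of dynamic reliability management (DRM) lifetime banking.
# The rate model and constants below are illustrative, not the paper's.
import math

RATED_T = 358.0      # rated junction temperature, K (illustrative)
EA_OVER_K = 9000.0   # activation energy / Boltzmann constant, K (illustrative)

def consumption_rate(temp_k):
    # Normalized to 1.0 at the rated temperature: hotter operation
    # consumes lifetime faster than nominal, cooler operation slower.
    return math.exp(EA_OVER_K * (1.0 / RATED_T - 1.0 / temp_k))

def lifetime_spent(temperature_trace):
    # Each sample is (temperature in K, duration in hours); lifetime
    # consumed is duration weighted by the temperature-dependent rate.
    return sum(dt * consumption_rate(t) for t, dt in temperature_trace)

# Running 10 h cool then 10 h hot can consume less than 20 h of rated
# lifetime: the margin a DRM policy reclaims relative to worst-case DTM.
trace = [(348.0, 10.0), (363.0, 10.0)]
```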

Sankaranarayanan, Karthik; Velusamy, Sivakumar; Skadron, Kevin; Stan, Mircea. Microarchitectural Floorplanning for Thermal Management: A Technical Report, May 20, 2005.

Abstract: In current-day microprocessors, exponentially increasing power densities, leakage, cooling costs, and reliability concerns have resulted in temperature becoming a first-class design constraint, like performance and power. Hence, virtually every high-performance microprocessor uses a combination of an elaborate thermal package and some form of Dynamic Thermal Management (DTM) scheme that adaptively controls its temperature. While DTM schemes exploit the important variable of power density to control temperature, this paper attempts to show that there is significant peak-temperature reduction potential in managing lateral heat spreading through floorplanning. It argues that this potential warrants consideration of the temperature-performance trade-off early in the design stage, at the microarchitectural level, using floorplanning. As a demonstration, it uses a previously proposed wire-delay model and a floorplanning algorithm based on simulated annealing to present a profile-driven, thermal-aware floorplanning scheme that significantly reduces peak processor temperature with minimal performance impact and is quite competitive with DTM.

Greenwell, William S.; Holloway, C. Michael; Knight, John C.. A Taxonomy of Fallacies in System Safety Arguments, December 10, 2004 (posted 19 July 2005).

Abstract: A system's safety argument is intended to show that the system is acceptably safe to operate in a given environment. If that argument is fallacious, the system may be prone to hazardous modes of operation that could contribute to accidents. We conducted a case study of three industrial safety cases to determine the frequency and nature of fallacious reasoning in system safety arguments. Our results suggest that the frequency of logical fallacies in these arguments is significant and that they follow common themes. To avoid these fallacies, developers must be aware of them when they create safety arguments, and regulators and investigators must know how to discover them when they review those arguments. We present a taxonomy of logical fallacies tailored to system safety cases to assist developers and regulators in these tasks and then demonstrate the taxonomy by applying it to the three safety cases from our case study.

Li, Y.; Skadron, K.; Brooks, D.; Hempstead, M.; Mauro, P.; Hu, Z.. Power and Thermal Effects of SRAM vs. Latch-Mux Design Styles and Clock Gating Choices, February 1, 2005.

Abstract: This paper studies the impact on energy efficiency and thermal behavior of design style and clock-gating style in queue and array structures. These structures are major sources of power dissipation, and both design styles and various clock gating schemes can be found in modern, high-performance processors. Although some work in the circuits domain has explored these issues from a power perspective, thermal treatments are less common, and we are not aware of any work in the architecture domain.

Velusamy, Siva; Huang, Wei; Lach, John; Skadron, Kevin. Monitoring Temperature in FPGA-based SoCs, December 20, 2004 (posted 4 March 2005).

Abstract: FPGA logic densities continue to increase at a tremendous rate. This has had the undesired consequence of increased power density, which manifests itself as higher on-die temperatures and local hotspots. Sophisticated packaging techniques have become essential to maintain the health of the chip. In addition to static techniques to reduce temperature, dynamic thermal management techniques are essential. In this paper, we present the design of a system that monitors the temperatures at various locations on the FPGA. This system is composed of a controller interfacing to an array of temperature sensors that are implemented in the FPGA fabric. We discuss the issues involved in designing and interfacing to the sensors. We compare the sensor readings with values obtained from HotSpot, an architectural-level thermal modeling tool. Finally, we show how such a system can be used in two microarchitectural studies: determining the effect of splitting one high-power-density unit into components that can be spread over the chip, and analyzing the response of a system to thermal emergencies.

Tarjan, David; Skadron, Kevin. Merging Path and Gshare Indexing in Perceptron Branch Prediction, December 1, 2004.

Abstract: We introduce the hashed perceptron predictor, which merges the concepts behind the gshare, path-based and perceptron branch predictors. This predictor can achieve superior accuracy to a path-based and a global perceptron predictor, previously the most accurate dynamic branch predictors known in the literature. We also show how such a predictor can be ahead pipelined to yield one cycle effective latency. On 11 programs from the SPECint2000 set of benchmarks, the hashed perceptron predictor improves accuracy by up to 22 percent over a path-based perceptron and improves IPC by up to 6.5 percent.

Nguyen-Tuong, Anh; Guarnieri, Salvatore; Greene, Doug; Evans, David. Automatically Hardening Web Applications Using Precise Tainting, December 14, 2004.

Abstract: Most web applications contain security vulnerabilities. The simple and natural ways of creating a web application are prone to SQL injection attacks and cross-site scripting attacks (among other less common vulnerabilities). In response, many tools have been developed for detecting or mitigating common web application vulnerabilities. Existing techniques either require effort from the site developer or are prone to false positives. This paper presents a fully automated approach to securely hardening web applications. It is based on precisely tracking the taintedness of data and checking for dangerous content only in the parts of commands and output that came from untrustworthy sources. Unlike previous work, in which everything derived from tainted input is tainted, our approach precisely tracks taintedness within data values. We describe our results and prototype implementation on the predominant LAMP (Linux, Apache, MySQL, PHP) platform.
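The per-character ("precise") taint idea in the abstract can be illustrated with a short sketch: taint is kept for each character, so only the characters that came from untrusted input are checked for structure-changing content in the final SQL command. The class and the dangerous-token check below are illustrative, not the authors' implementation.

```python
# Sketch of precise (per-character) taint tracking for SQL commands.
# Names and the dangerous-character set are illustrative only.

class TStr:
    def __init__(self, text, tainted):
        self.text = text
        self.taint = [tainted] * len(text)   # one taint bit per character

    def __add__(self, other):
        r = TStr(self.text + other.text, False)
        r.taint = self.taint + other.taint   # taint follows the characters
        return r

def safe_sql(cmd):
    # Reject only if a *tainted* character could change command structure,
    # e.g. a quote or comment marker supplied by the user.
    dangerous = set("'\";-")
    return not any(t and c in dangerous for c, t in zip(cmd.text, cmd.taint))

user = TStr("alice' OR '1'='1", True)   # untrusted request parameter
query = (TStr("SELECT * FROM users WHERE name='", False)
         + user + TStr("'", False))
print(safe_sql(query))   # injected quotes are tainted, so this is rejected
```

The untainted quotes written by the programmer pass the check; only the attacker-supplied quotes trigger rejection, which is what distinguishes precise tainting from whole-value tainting.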

Vicaire, Pascal A.; Stankovic, John A.. Elastic Localization, November 1, 2004.

Abstract: Numerous wireless sensor network algorithms assume that individual sensors possess location information. However, state-of-the-art localization algorithms often achieve acceptable performance only under restrictive assumptions. For instance, some algorithms necessitate regular sensor deployment or centralized computations. Other algorithms require a high proportion of position-aware nodes or the ability to accurately infer the emission distance or emission direction of received radio signals. We propose the Elastic Localization Algorithm (ELA), a distributed, scalable, robust, and efficient localization algorithm. ELA only presumes that a few percent of the sensors know their location and that an estimate of the maximum communication range is available. We provide extensive simulation data describing the precision, convergence speed, and communication load of ELA, using networks composed of thousands of sensors. In addition, we test ELA under maximum-range and beacon-position misestimation, irregular radio patterns, asynchronous nodes, packet losses, particular topologies, and sensor mobility.

Horvath, Tibor; Abdelzaher, Tarek; Skadron, Kevin. Dynamic Voltage Scaling in Multi-tier Web Servers with End-to-end Delay Control, November 1, 2004.

Abstract: Energy and cooling costs of web server farms are among their main financial expenditures. This paper explores the benefits of dynamic voltage scaling (DVS) for power management in server farms. Unlike previous work, which addressed DVS on individual servers and on load-balanced server replicas, this paper addresses DVS in multi-stage service pipelines. Contemporary Web server installations typically adopt a three-tier architecture in which the first tier presents a Web interface, the second executes scripts that implement business logic, and the third serves database accesses. From a user's perspective, only the end-to-end response across the entire pipeline is relevant. This paper presents an algorithm for minimizing the total energy expenditure of the multi-stage pipeline subject to soft end-to-end response-time constraints. A distributed power management service is designed and evaluated on a real 3-tier server prototype for coordinating DVS settings in a way that minimizes global energy consumption while meeting end-to-end delay constraints. The service is shown to consume as much as 30 percent less energy compared to the default (Linux) energy saving policy.

Natrajan, Anand; Grimshaw, Andrew S.; Humphrey, Marty A.; Nguyen-Tuong, Anh. Dispelling Seven Myths about Grid Resource Management, August 30, 2004.

Abstract: Grid resource management is often viewed as scheduling for large, long-running, compute-intensive, parallel applications. This view is narrow, as we will argue in this article. Grids today encompass diverse resources including machines, users and applications. Moreover, grid users make different demands on different resources. Therefore, grid infrastructures must adopt a greater responsibility than before for managing resources. Grid resource management cannot mean just scheduling jobs on the fastest machines, but must also include scheduling special jobs on matched machines, preserving site autonomy, determining usage policies, respecting permissions for use, and so on. In this article, we will present seven "myths" or common beliefs about grid resource management, and dispel each myth by presenting counter-examples or "observations". These observations have been culled from our own experience as well as from work done by several experts in major grid-related projects. In order to relate these observations to a concrete implementation, we will also present grid resource management in Legion, a grid infrastructure project initiated at the University of Virginia.

Co, Michele; Skadron, Kevin. Evaluating Trace Cache Energy-Efficiency, October 10, 2004.

Abstract: Future fetch engines need to be energy-efficient. Therefore, a thorough evaluation and comparison of fetch-engine designs is necessary for future processors. Our work compares the energy-efficiency of concurrent trace caches (CTCs), sequential trace caches (STCs), block-based trace caches (BBTCs), and instruction caches (ICs). We compare: CTCs and STCs with a path-based next trace predictor (NTP), ICs with a branch predictor, and BBTCs with a trace table. To separate out predictor organization and prediction effects, we also evaluate ICs with an NTP and BBTCs with an NTP. In our experiments, we first evaluate the fetch engines with no area-budget restrictions. Then, to consider higher clock rates, we evaluate the fetch engines when restricting the area budget for each component. To consider future process technologies, we also evaluate the effect of increased leakage. We find that branch prediction (whether explicit or implicit) is a key component in the energy-efficiency of the fetch-engine designs evaluated. We then eliminate branch-prediction effects by artificially equalizing the effective branch-prediction accuracy across the designs and re-evaluate the results. We find that access delay limits the theoretical performance of the fetch engines evaluated. We propose a novel ahead-pipelined NTP that performs nearly as well as the single-cycle-access NTP.

Rajan, Hridesh; Sullivan, Kevin. Generalizing AOP for Aspect-Oriented Testing, September 30, 2004.

Abstract: In profiling program execution to assess the adequacy of a test set, a challenge is to select the code to be included in the coverage assessment. Current mechanisms for doing this are coarse-grained, assume traditional concepts of modularity, require tedious and error-prone manual selection, and leave the tester's intent implicit in the input provided to the testing tools. The aspect-oriented constructs of languages such as AspectJ promise to help ease and extend our ability to select the code to be included in a test adequacy criterion and to document the tester's intent within the source code itself. Our contribution is a language-centric approach to automated test adequacy analysis that we call concern coverage. We claim that our approach enables explicit, precise, abstract, and machine-readable representation of the tester's intent and that it can ease testing by eliminating the need for manual selection and explicit maintenance of test adequacy criteria.

Paul, Nathanael; Evans, David. .NET Security: Lessons Learned and Missed from Java, October 9, 2004.

Abstract: Many systems execute untrusted programs in virtual machines (VMs) to limit their access to system resources. Sun introduced the Java VM in 1995, primarily intended as a lightweight platform for execution of untrusted code inside web pages. More recently, Microsoft developed the .NET platform with similar goals. Both platforms share many design and implementation properties, but there are key differences between Java and .NET that have an impact on their security. This paper examines how .NET's design avoids vulnerabilities and limitations discovered in Java and discusses lessons learned (and missed) from Java's experience with security.

Tarjan, David; Skadron, Kevin. Revisiting the Perceptron Predictor Again, September 23, 2004.

Abstract: We introduce a new kind of branch predictor, the hashed perceptron predictor, which merges the concepts behind the gshare and perceptron branch predictors. This is done by fetching the perceptron weights using the exclusive-or of branch addresses and branch history. This predictor can achieve superior accuracy to a path-based and a global perceptron predictor, previously the most accurate fully dynamic branch predictors known in the literature, at the same storage budgets. Additionally, it reduces the number of adders by a factor of four compared to a path-based perceptron. We also show how such a predictor can be ahead pipelined to yield one cycle effective latency, making it the first standalone perceptron predictor. On the SPEC integer set of benchmarks, the hashed ahead-pipelined path-based perceptron predictor (hashed perceptron for short) improves accuracy by 20% over a path-based perceptron and improves IPC by 5.8%. We believe these improvements make the perceptron predictor a promising choice as a branch predictor for a future high-performance microprocessor.
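The core mechanism stated above, selecting perceptron weights with the XOR of branch address and history, can be sketched briefly. Table count, table size, history length, and threshold below are illustrative choices, not the paper's configuration.

```python
# Sketch of a hashed perceptron predictor: each weight table is indexed
# gshare-style by XOR-ing the branch address with a slice of the global
# history; the prediction is the sign of the summed weights.
NUM_TABLES = 4
HIST_LEN = 16
SEG = HIST_LEN // NUM_TABLES   # history bits hashed into each table
TABLE_SIZE = 1024
THETA = 30                     # training threshold (illustrative)

tables = [[0] * TABLE_SIZE for _ in range(NUM_TABLES)]
history = 0                    # global branch-history register

def indices(pc):
    # Each table is indexed by the XOR of the branch address with a
    # different segment of the global history.
    return [(pc ^ ((history >> (t * SEG)) & ((1 << SEG) - 1))) % TABLE_SIZE
            for t in range(NUM_TABLES)]

def predict(pc):
    # Perceptron output: sum of selected weights; the sign is the direction.
    y = sum(tables[t][i] for t, i in enumerate(indices(pc)))
    return y >= 0, y

def train(pc, taken):
    global history
    pred, y = predict(pc)
    if pred != taken or abs(y) <= THETA:   # perceptron update rule
        delta = 1 if taken else -1
        for t, i in enumerate(indices(pc)):
            tables[t][i] = max(-128, min(127, tables[t][i] + delta))
    history = ((history << 1) | (1 if taken else 0)) & ((1 << HIST_LEN) - 1)
```

Because each table lookup needs only an XOR and the final sum replaces a long chain of per-weight adders, this organization is what lets the paper cut the adder count relative to a path-based perceptron.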

Rajan, Hridesh; Sullivan, Kevin. Classpects: Unifying Aspect- and Object-Oriented Language Design, Sept 1, 2004.

Abstract: The contribution of this work is a compositional, modular building block for program design that unifies classes and aspects as they appear in today's most successful aspect-oriented languages. We call it the classpect. We make three basic claims for this idea: first, it can be realized cleanly in an industrial-strength language; second, it improves the conceptual integrity of the programming model; third, its compositionality significantly expands the useful program design space. To test these claims we designed and implemented a language supporting this model. Eos-U extends C#, has all of the aspect-oriented capabilities of AspectJ, but also unifies classes and aspects. We then exhibit a layered architectural style for modular design of integrated systems, and we show that, unlike earlier aspects, classpects cleanly support this design style.

Cai, Yuanfang; Sullivan, Kevin J.. Software Design Spaces: Logical Modeling and Formal Dependence Analysis, September 1, 2004 (Revised 24 September 2004).

Abstract: We lack a useful, formal theory of modularity in abstract software design. A missing key is a framework for the abstract representation of software design spaces that supports analysis of design-decision coupling structures. We contribute such a framework. We represent design spaces as constraint networks and develop a concept of design-decision coupling based on the minimal change sets of a variable. This work supports the derivation, from logical models, of design structure matrices (DSMs), for which we have a promising but inadequate theory of modularity. We present complexity results and a brute-force algorithm. To test for potential software engineering utility, we analyzed the design spaces of Parnas's 1972 information hiding paper, with positive results that were surprising in several ways.

John C. Giordano, Paul F. Reynolds Jr., David C. Brogan. Exploring the Constraints of Human Behavior Representation, March 12, 2004.

Abstract: Human behavior representation (HBR) is an elusive, yet critical goal for many in the simulation community. Requirement specifications related to HBR often exceed current capabilities. There exist a number of tools, techniques and frameworks to model and simulate HBR, but they are constrained and do not generalize well. Even with a vibrant research community, certain HBR characteristics remain beyond our grasp, unless some unforeseen disruptive technologies emerge. We survey the state of the practice for HBR, discuss ongoing research, and identify what appear to be insurmountable challenges. Along with exposing the essential characteristics of HBR and their current level of maturity, we propose a generational framework for considering HBR capabilities. While a number of HBR issues have been addressed in the literature, there is no published discussion explicitly detailing its constraints and limitations. (NOTE: This work will appear in the proceedings of the 2004 ACM/IEEE Winter Simulation Conference.)

Vieira, Sibelius Lellis; Liebeherr, Jorg. An Algorithmic Approach to Topological Design of Service Overlay Networks, May 2004.

Abstract: The Internet still lacks adequate support for QoS applications with real-time requirements. In large part, this is because provisioning end-to-end QoS to traffic that traverses multiple autonomous systems (ASs) requires a level of cooperation between ASs that is difficult to achieve in the current architecture. Recently, service overlay networks have been considered as an approach to QoS deployment that avoids these difficulties. In this study, we address the problem of the topological synthesis of a service overlay network, where end systems and nodes of the overlay network (provider nodes) are connected through ISPs that support bandwidth reservations. We express the topology design problem as an optimization problem. Even though the design problem is related to the (in general NP-hard) quadratic assignment problem, we are able to show that relatively simple heuristic algorithms can deliver results that are sometimes close to the optimal solution.

Huang, Chengdu; Abdelzaher, Tarek F.. Towards Content Distribution Networks with Latency Guarantees, April 15, 2004.

Abstract: This paper investigates the performance of a content distribution network designed to provide bounded content access latency. Content can be divided into multiple classes with different configurable per-class delay bounds. The network uses a simple distributed algorithm to dynamically select a subset of its proxy servers for different classes such that a global per-class delay bound is achieved on content access. The content distribution algorithm is implemented and tested on PlanetLab, a world-wide distributed Internet testbed. Evaluation results demonstrate that despite Internet delay variability, subsecond delay bounds (of 200-500 ms) can be guaranteed with a very high probability at only a moderate content replication cost. The distribution algorithm achieves a 4- to 5-fold reduction in the number of response-time violations compared to prior content distribution approaches that attempt to minimize average latency. To the authors' knowledge, this paper presents the first wide-area performance evaluation of an algorithm designed to bound maximum content access latency, as opposed to optimizing an average performance metric.

Sankaranarayanan, Karthik; Skadron, Kevin. Profile-Based Adaptation for Cache Decay: A Technical Report, April 8, 2004.

Abstract: Cache decay is a set of leakage-reduction mechanisms that put cache lines that have not been accessed for a specific duration into a low-leakage standby mode. This duration is called the decay interval, and its optimal value varies across applications. This paper provides an extended discussion of the results previously presented in our journal paper. It describes an adaptation technique that analytically finds the optimal decay interval through profiling, and shows that the most important variables required for finding the optimal decay interval can be estimated with a reasonable degree of accuracy using profiling. This work explicitly trades off the leakage power saved in putting both the live and dead lines into standby mode, against its performance and energy costs. It achieves energy savings close to what can be obtained with an omniscient choice of per-benchmark optimal decay interval.
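The decay mechanism the abstract builds on can be sketched directly: each line carries an idle counter that is reset on access, and a line whose counter exceeds the decay interval is put into low-leakage standby (modeled here as dropping its contents). Cache size and interval are made up for the example.

```python
# Sketch of cache decay: idle lines are switched to a low-leakage
# standby mode, losing their state. Parameters are illustrative.
DECAY_INTERVAL = 8   # idle cycles before a line is decayed

class DecayCache:
    def __init__(self, nlines):
        self.data = [None] * nlines
        self.idle = [0] * nlines     # cycles since last access, per line

    def tick(self):
        # Called once per cycle: decay any line idle past the interval.
        for i in range(len(self.data)):
            self.idle[i] += 1
            if self.idle[i] > DECAY_INTERVAL:
                self.data[i] = None  # standby: state lost, leakage saved

    def access(self, i, value=None):
        self.idle[i] = 0
        if value is not None:
            self.data[i] = value
        return self.data[i] is not None, self.data[i]
```

The trade-off the paper quantifies is visible here: a short interval saves more leakage but turns some would-be hits into misses (a decayed live line), which is why the optimal interval is per-application and worth finding by profiling.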

Huang, Wei; Stan, Mircea R.; Skadron, Kevin; Sankaranarayanan, Karthik; Ghosh, Shougata; Velusamy, Sivakumar. Compact Thermal Modeling for Temperature-Aware Design, April 2004.

Abstract: Thermal design in sub-100nm technologies has become one of the major challenges to the CAD community. Thermal effects on design aspects such as performance, power and reliability have to be thoroughly and seriously considered during the entire design flow in order to achieve faster design convergence and optimal design. This paper expands the discussion in our conference paper (DAC 2004). We first introduce the idea of temperature-aware design. We then propose a compact thermal model which can be integrated with modern CAD tools to achieve a temperature-aware design methodology. Finally, we use the compact thermal model in a case study of microprocessor design to show the importance of using temperature as a guideline for the design. The results from our thermal model show that a temperature-aware design approach can provide more accurate estimations, and therefore better decisions and faster design convergence.

Nguyen-Tuong, A.; Grimshaw, A. S.; Wasson, G.; Humphrey, M.; Knight, J. C.. Towards Dependable Grids, March 26, 2004.

Abstract: To date, grids (a form of distributed system) have been used to aggregate resources for performance-starved applications, typically arising from scientific inquiry. Grids should not just be facilitating advances in science and engineering; rather, they should also be making an impact on our daily lives by enabling sophisticated applications such as new consumer services and support for homeland defense. For example, grids providing financial services should be seamlessly interconnected with grids that support telecommunications, and grids providing health care services should be seamlessly interconnected with both financial services and telecommunications. This is not possible today because the poor grid dependability that is tolerated by scientific users would be unacceptable in critical infrastructure applications. This project aims to correct this problem by developing technology that will allow grids to provide services upon which society can depend. Grids must be engineered both to achieve high dependability and to permit assurance that high dependability has been achieved.

Nagaraddi, Prashant; Yu, Z.; Stankovic, J.; He, Z.. VEST User's Manual, June 17, 2004.

Abstract: VEST (Virginia Embedded Systems Toolkit) provides an environment for constructing and analyzing component-based distributed real-time embedded systems. In this manual, we describe the process of installing and running VEST. It includes descriptions of all the features of the tool including the various analyses that can be performed on VEST designs. A comprehensive tutorial on learning the basics of using VEST is also provided.

Christin, Nicolas; Liebeherr, Jorg; Abdelzaher, Tarek F.. Enhancing Class-Based Service Architectures with Adaptive Rate Allocation and Dropping Mechanisms, March 25, 2004.

Abstract: Class-based service differentiation can be realized without resource reservation, admission control and traffic policing. However, the resulting service guarantees are only relative, in the sense that guarantees given to a flow class at any time are expressed with reference to the service given to other flow classes. While it is, in principle, not feasible to provision for absolute guarantees (i.e., to assure lower bounds on service metrics at all times) without admission control and/or traffic policing, we will show in this paper that such a service can be reasonably well emulated using adaptive rate allocation and dropping mechanisms at the link schedulers of routers. We name the resulting type of guarantees best-effort bounds. We propose mechanisms for link schedulers of routers that achieve these and other guarantees by adjusting the drop rates and the service rate allocations of traffic classes to current load conditions. The mechanisms are rooted in control theory and employ adaptive feedback loops. We demonstrate that these mechanisms can realize many recently proposed approaches to class-based service differentiation. The effectiveness of the proposed mechanisms is evaluated in simulation experiments as well as in measurement experiments of a kernel-level implementation on FreeBSD PC routers with multiple 100 Mbps Ethernet interfaces.
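The feedback idea in the abstract can be sketched with a toy proportional controller: at each control interval the scheduler measures per-class delays and nudges the service-rate shares so that the measured delay ratio tracks a configured target. The gain, target ratio, and bounds are invented for the example; they are not the paper's control laws.

```python
# Toy proportional controller for relative (proportional) delay
# differentiation between two traffic classes. Constants are illustrative.
K_P = 0.1            # controller gain
TARGET_RATIO = 4.0   # class-1 delay should be 4x class-0 delay

rates = [0.5, 0.5]   # service-rate shares for classes 0 and 1

def control_step(delay0, delay1):
    # Error > 0 means class 1 is too slow relative to the target,
    # so shift some service-rate share from class 0 to class 1.
    error = delay1 / delay0 - TARGET_RATIO
    shift = K_P * error * rates[0]
    rates[0] = min(0.9, max(0.1, rates[0] - shift))
    rates[1] = 1.0 - rates[0]
    return rates
```

A real scheduler would also adapt per-class drop rates and guard the best-effort bounds, but the closed loop above captures the basic adjust-measure-adjust cycle.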

Lu, Zhijian; Huang, Wei; Ghosh, Shougata; Lach, John; Stan, Mircea; Skadron, Kevin. Analysis of Temporal and Spatial Temperature Gradients for IC Reliability, March 22, 2004 (withheld until March 4, 2005).

Abstract: One of the most common causes of IC failure is interconnect electromigration (EM), which exhibits a rate that is exponentially dependent on temperature. As a result, EM rate is one of the major determinants of the maximum tolerable operating temperature for an IC and of resulting cooling costs. Previous EM models have assumed a uniform, typically worst-case, temperature. This paper presents a model that accounts for temporal and spatial variations in temperature, and shows that accounting for these variations can dramatically improve interconnect lifetime prediction accuracy. We also show that the same modeling approach applies to temperature-related gate-oxide breakdown, another common cause of IC failure. We then propose that by modeling expected lifetime as a resource that is consumed over time at a rate dependent on temperature, substantial design margin can be reclaimed. For example, for a fixed target lifetime, intermittent higher operating temperatures and performance can be tolerated if compensated by lower temperatures at other times during the product's lifetime. This approach offers higher overall performance and/or lower cooling costs than a standard design methodology that uses a worst-case temperature criterion for reliability analysis. This report supersedes TR CS-2003-21 and TR CS-2004-07.
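The lifetime-as-a-consumed-resource idea described in this abstract can be sketched numerically. The snippet below uses a generic Arrhenius acceleration factor with invented constants, not the paper's calibrated electromigration model:

```python
import math

# Illustrative constants (hypothetical values, not from the paper)
EA_OVER_K = 9000.0       # activation energy / Boltzmann constant, in kelvin
T_REF = 373.0            # reference (worst-case) temperature, kelvin
LIFETIME_AT_REF = 10.0   # years of lifetime budget at the reference temperature

def consumption_rate(temp_k):
    """Fraction of the lifetime budget consumed per year at temp_k.

    Uses an Arrhenius acceleration factor relative to T_REF, so the rate
    is exactly 1/LIFETIME_AT_REF at the reference temperature.
    """
    accel = math.exp(EA_OVER_K * (1.0 / T_REF - 1.0 / temp_k))
    return accel / LIFETIME_AT_REF

def lifetime_consumed(temp_trace_k, years_per_sample):
    """Integrate the consumption rate over a temperature trace."""
    return sum(consumption_rate(t) * years_per_sample for t in temp_trace_k)

# A chip that runs at the worst-case temperature half the time and
# 30 K cooler the other half consumes far less than its full budget:
trace = [373.0] * 5 + [343.0] * 5
print(lifetime_consumed(trace, years_per_sample=1.0))  # well under 1.0
```

Because the rate is exponential in temperature, the cooler intervals consume almost nothing, which is exactly the margin the abstract proposes to reclaim.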

Huang, Wei; Lu, Zhijian; Ghosh, Shougata; Lach, John; Stan, Mircea; Skadron, Kevin. The Importance of Temporal and Spatial Temperature Gradients in IC Reliability Analysis, January 6, 2004 (withheld until March 23, 2004).

Abstract: Existing IC reliability models assume a uniform, typically worst-case, operating temperature, but temporal and spatial temperature variations affect expected device lifetime. This paper presents a model that accounts for temperature gradients, dramatically improving interconnect and gate-oxide lifetime prediction accuracy. By modeling expected lifetime as a resource that is consumed over time at a temperature-dependent rate, substantial design margin can be reclaimed and/or less expensive cooling systems may be used.

Frost, Christopher; Peck, Michael; Evans, David. Pancakes, Puzzles, and Polynomials: Cracking the Cracker Barrel, March 2004.

Abstract: The Cracker Barrel peg game is a simple, one-player game commonly found on tables at pancake restaurants. In this paper, we consider the computational complexity of the Cracker Barrel problem. We show that a variation of a generalization of the problem is NP-complete.

Evans, David; Peck, Michael. Simulating Critical Software Engineering, February 2004.

Abstract: One goal of many introductory software engineering courses is to simulate realistic software engineering. Unfortunately, many of the practical constraints of typical courses are antithetical to that goal: instead of working in large teams on large projects, dealing with changing requirements and maintaining programs over many years, courses generally involve students working alone or in small teams with short projects that end the first time the program works correctly on some selected input. Of course, it is impossible (and undesirable) to carry out full industrial software development within the context of a typical university course. However, it is possible to simulate some aspects of safety critical software engineering in an introductory software engineering course. This paper describes an approach to teaching introductory software engineering that focuses on using lightweight analysis tools to aid in producing secure, robust and maintainable programs. We describe how assignments were designed that integrate analysis tools with typical software engineering material and report on results from an experiment measuring students' understanding of program invariants.

Lu, Zhijian; Stan, Mircea; Lach, John; Skadron, Kevin. Interconnect Lifetime Prediction for Temperature-Aware Design, November 26, 2003.

Abstract: Thermal effects are becoming a limiting factor in high-performance circuit design due to the strong temperature-dependence of leakage power, circuit performance, IC package cost and reliability. Temperature-aware design tries to maximize performance under a given thermal envelope through various static and dynamic approaches. While existing interconnect reliability models assume a constant temperature, this paper presents a technique for probabilistically estimating interconnect lifetime for any time-varying temperature profile. With this formulation, interconnect lifetime can be modeled as a resource that is consumed over time, with the rate of consumption being a function of temperature. As a result, designers may be more aggressive in the temperature profiles that are allowed on a chip instead of using static worst-case assumptions. For example, performance (hence power and temperature) may be increased beyond what is allowed by worst-case restrictions for short periods as long as the increase is compensated for later by lower activity. With this model, temperature-aware designs will achieve higher overall performance while satisfying lifetime requirements.

Li, Chengzhi; Burchard, Almut; Liebeherr, Jorg. A Network Calculus with Effective Bandwidth, November 2003.

Abstract: This paper establishes a link between two principal tools for the analysis of network traffic, namely, effective bandwidth and network calculus. It is shown that a general formulation of effective bandwidth can be expressed within the framework of a probabilistic version of the network calculus, where both arrivals and service are specified in terms of probabilistic bounds. By formulating well-known effective bandwidth expressions in terms of probabilistic envelope functions, the developed network calculus can be applied to a wide range of traffic types, including traffic that has self-similar characteristics. As applications, probabilistic lower bounds are presented on the service given by three different scheduling algorithms: Static Priority (SP), Earliest Deadline First (EDF), and Generalized Processor Sharing (GPS). Numerical examples show the impact of the traffic models and the scheduling algorithm on the multiplexing gain in a network.

CS-2003-19 (Not available on-line)
Co, Michele; Skadron, K.. Evaluating the Energy Efficiency of Trace Caches, October 28, 2003.

Abstract: This paper presents the preliminary results of a sequential trace cache design space study as well as the results of a path-based next trace predictor design space study. Findings show that sequential trace caches are highly energy- and power-efficient. Fetch engines that include a sequential trace cache provide higher performance for approximately equal area at a significant energy and power savings. We find that when examining performance and average fetch power, fetch engines with trace caches may not seem appealing, but when examining energy-delay and energy-delay-squared, the benefits of a trace cache become clear. Even if average fetch power increases due to the larger fetch engine area, energy efficiency still improves with a trace cache due to faster execution and more opportunities for clock gating, making the trace cache superior in terms of energy-delay and energy-delay-squared products. Our exploration of the Jacobson et al. path-based next trace predictor design space indicates that a hybrid next trace predictor consisting of a 1KB correlating table, 8KB secondary table, and 4-entry return history stack would be the best choice in terms of power, performance, and energy. [This report is temporarily unavailable. Please contact micheleco@cs.virginia.edu for further details.]
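The distinction the abstract draws between average power and the energy-delay products can be made concrete in a few lines (the numbers below are invented for illustration, not the paper's data):

```python
def efficiency_metrics(energy_j, delay_s):
    """Return (energy, energy-delay, energy-delay-squared) for a run."""
    return energy_j, energy_j * delay_s, energy_j * delay_s ** 2

# Hypothetical fetch engines: a trace cache burns slightly more energy
# but finishes faster, so its ED and ED^2 products are lower.
baseline = efficiency_metrics(energy_j=10.0, delay_s=2.0)
tcache = efficiency_metrics(energy_j=11.0, delay_s=1.5)
print(baseline, tcache)
```

A design that looks worse on raw energy or average power can still win on ED and ED^2, which is the effect the study reports.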

Hirst, Kevin R.; Haskins, John W. Jr.; Skadron, Kevin. dMT: Inexpensive Throughput Enhancement in Small-Scale Embedded Microprocessors with Differential Multithreading: Extended Results, Oct. 2003.

Abstract: This paper examines differential multithreading (dMT) as an attractive organization for increasing throughput in simple, small-scale, pipelined processors like those used in embedded environments. dMT copes with pipeline stalls due to hazards and data- and instruction-cache misses by using duplicated pipeline registers to run instructions from an alternate thread. Results show that dMT boosts throughput substantially and can in fact replace dynamic branch prediction or can be used to reduce the sizes of the instruction and data caches. This report expands upon [1] by presenting extended results for additional dMT configurations.

Li, Yingmin; Skadron, Kevin; Stan, Mircea R.. New Findings on Using Queue Occupancy to Integrate Runtime Power-Saving Techniques Across the Pipeline, July 20, 2003.

Abstract: This paper provides new insights on how to integrate power-saving techniques by using queue occupancies to dynamically match the power-saving modes of various pipeline stages with the current instruction throughput. (This paper focuses on fetch, decode, integer execution, and data cache.) Architects have proposed many runtime power-saving techniques, most of which reduce power dissipation in a single microarchitectural unit. But very little work has been done to integrate these disparate techniques to ensure that they cooperate rather than interfere with each other. We use both queuing theory and experimental results to justify the use of superscalar decoupling queues to guide dynamic control of power settings. This permits integrated power control for multiple units across the pipeline, with minimal negative interaction, by matching the throughput of each stage to the application's current instruction-level parallelism. Our findings verify but also improve upon those in previous work by Semeraro et al. In particular, our approach is robust at escaping bad power-mode configurations that incur radical performance degradation, and our approach allows the fetch stage (a significant source of power dissipation) to realize power savings, something that prior integrated, queue-based techniques have not been able to accomplish.
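The occupancy-guided control loop described above can be sketched as a simple threshold policy. This is a hypothetical illustration of the general mechanism, not the paper's actual controller:

```python
def power_mode(occupancy, capacity, low=0.25, high=0.75):
    """Map a decoupling queue's occupancy to a power mode for the
    upstream stage. A nearly full queue means the producer stage is
    running ahead of the consumer and can afford a low-power mode;
    a nearly empty queue means the consumer is starving.
    The thresholds here are assumed values for illustration."""
    frac = occupancy / capacity
    if frac > high:
        return "low-power"    # producer ahead: throttle it
    if frac < low:
        return "full-power"   # consumer starving: speed producer up
    return "nominal"

print(power_mode(30, 32))  # queue almost full
print(power_mode(2, 32))   # queue almost empty
```

Because each stage is controlled by its own downstream queue, the per-stage decisions compose without explicit coordination, which is the integration property the paper argues for.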

Hill, Jonathan; Knight, John C.. Selective Notification: Combining Forms of Decoupled Addressing for Internet-Scale Command and Alert Dissemination, April 11, 2003.

Abstract: By using an information survivability control system, the survivability of critical networked information systems can be enhanced using a variety of fault-tolerance mechanisms. Essential to the effective implementation of such mechanisms is communication from the error detection component to the various application nodes in the network. In this paper, we introduce a technique called Selective Notification for the communication of commands and alerts in very large distributed systems. The technique combines intentional addressing, content addressing and sender qualification in a single decoupled event-delivery mechanism. We show that effective targeted command and alert dissemination is achievable, and that Selective Notification allows systems to apply a wide range of event connectivity policies. We present details of an implementation of Selective Notification and the results of performance assessment experiments. Based on our preliminary performance data, we conclude that Selective Notification can be used to support survivability architectures in Internet-sized systems.

Larochelle, David; Rosasco, Nicholas. Towards a Model of the Costs of Security, June 2003.

Abstract: We present a simple information security model to determine why, historically, the level of security has not increased despite numerous technical advances. In our model, the software design process involves trade-offs between security and functionality. Developers choose points in the design space corresponding to certain levels of security and functionality. If development resources, such as number of developers, time for completion, etc., are fixed, there is an implicit trade-off between security and functionality. We refer to the set of points that represent the maximum possible security given a certain level of functionality as the protection possibilities frontier (PPF). Technical advances push back the PPF, expanding the set of accessible points in the design space and potentially allowing both increased security and increased functionality. But historically this has not been sufficient to result in increased security. Instead, almost all of the technical advancement is used to increase functionality. We examine how technical advances affect the marginal cost of security in terms of sacrificed functionality and classify technical advances into three categories: security neutral, security hostile, and security enhancing. In order for the level of security to increase, security enhancing technical advances must offset security hostile technical advances. We also observe that producing security enhancing technologies is surprisingly difficult. Even advances in information security technologies often result in the ability to take additional risks rather than increased security. Additionally, we briefly examine user preferences, which to a large extent drive the actions of developers. We suggest that the lack of security cannot be explained purely by consumer apathy and that limited product availability and network externalities also contribute.

Greenwell, William S.; Knight, John C.. What Should Aviation Safety Incidents Teach Us?, March 20, 2003.

Abstract: Accidents and incidents involving safety-critical software systems often provide lessons to the systems' users and designers, to industry, and to the software engineering community at large. Proper identification and documentation of these lessons is critical in order to prevent the recurrence of an untoward event. In this paper we examine two commercial aviation incidents involving failures of safety-critical software systems. Based on our analysis of the incidents and the official investigations that followed, we conclude that the aviation community is missing important lessons regarding safety-critical software systems, especially concerning the broad role these systems play in preserving the safety of commercial air travel. This is primarily because incidents involving such systems are not being investigated and documented with sufficient rigor to identify these lessons and disseminate them throughout the aviation community effectively.

Skadron, K.; Stan, M. R.; Huang, W.; Velusamy, S.; Sankaranarayanan, K.; Tarjan, D.. Temperature-Aware Microarchitecture: Extended Discussion and Results, April 2003.

Abstract: With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package's capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical for architectural studies. This paper expands upon the discussion and results that were presented in our conference paper. It describes "HotSpot", an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. Validation was performed using finite-element simulation. The paper also introduces several effective methods for dynamic thermal management (DTM): "temperature-tracking" frequency scaling, localized toggling, and migrating computation to spare hardware units. Modeling temperature at the microarchitecture level also shows that power metrics are poor predictors of temperature, that sensor imprecision has a substantial impact on the performance of DTM, and that the inclusion of lateral resistances for thermal diffusion is important for accuracy.

Zhang, Y.; Parikh, D.; Sankaranarayanan, K.; Skadron, K.; Stan, M. R.. HotLeakage: A Temperature-Aware Model of Subthreshold and Gate Leakage for Architects, March 2003.

Abstract: This report introduces HotLeakage, an architectural model for subthreshold and gate leakage that we have developed here at the University of Virginia. The most important features of HotLeakage are the explicit inclusion of temperature, voltage, gate leakage, and parameter variations, and the ability to recalculate leakage currents dynamically as temperature and voltage change due to operating conditions, DVS techniques, etc. HotLeakage provides default settings for 180nm through 70nm technologies for modeling caches and register files, and provides a simple interface for selecting alternate parameter values and for modeling alternative microarchitecture structures. It also provides models for several extant cache leakage control techniques, with an interface for adding further techniques. HotLeakage is currently a semi-independent module for use with SimpleScalar, but is sufficiently modular that it should be fairly easy to port to other simulators. Because sub-threshold leakage currents are exponentially dependent on temperature and voltage, because gate leakage is growing so rapidly, and because parameter variations can have a profound effect on simulation accuracy, we hope that HotLeakage will serve as a useful tool for microarchitects to more accurately evaluate issues related to leakage power. HotLeakage is available for download at http://lava.cs.virginia.edu/HotLeakage.
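The exponential temperature dependence that motivates recomputing leakage dynamically can be sketched as follows. This is a heavily simplified textbook-style form with assumed constants, not HotLeakage's calibrated equations:

```python
import math

def subthreshold_leakage(temp_k, v_th, ref_temp_k=300.0, i_ref=1.0):
    """Very simplified subthreshold leakage model (illustration only).

    Leakage scales roughly with T^2 and exponentially with
    -Vth / (n * kT/q); real models add voltage and process terms.
    The swing factor n = 1.5 is an assumed value.
    """
    thermal_voltage = 8.617e-5 * temp_k  # kT/q in volts
    n = 1.5
    scale = (temp_k / ref_temp_k) ** 2
    return i_ref * scale * math.exp(-v_th / (n * thermal_voltage))

# Heating a structure from 300 K to 380 K multiplies leakage several-fold:
print(subthreshold_leakage(380.0, 0.3) / subthreshold_leakage(300.0, 0.3))
```

Even this crude form shows why a leakage estimate made at a fixed temperature can be badly wrong once DVS or thermal management changes the operating point.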

Blum, Brian; He, Tian; Son, Sang; Stankovic, John. IGF: A State-Free Robust Communication Protocol for Wireless Sensor Networks, April 21, 2003.

Abstract: Wireless Sensor Networks (WSNs) are being designed to solve a gamut of interesting real-world problems. Limitations on available energy and bandwidth, message loss, high rates of node failure, and communication restrictions pose challenging requirements for these systems. Beyond these inherent limitations, both the possibility of node mobility and energy conserving sleep protocols that power down nodes introduce additional complexity to routing protocols that depend on up-to-date routing or neighborhood tables. Such state-based protocols suffer excessive delay or message loss, as system dynamics require expensive upkeep of these tables. Utilizing characteristics of high node density and location awareness, we introduce IGF, a location-aware routing protocol that is robust and works without knowledge of the existence of neighboring nodes (state-free). We compare our work against established routing protocols to demonstrate the efficacy of our solution when nodes are mobile or periodically sleep to conserve energy. We show that IGF far outperforms these protocols, in some cases delivering close to 100% of the packets transmitted while alternate solutions fail to even find a path between a source and destination. Specifically, we show that our protocol demonstrates a vast improvement over prior work using metrics of delivery ratio, control overhead, and end-to-end delay.

French, James C.; Watson, J. V. S.; Jin, X.; Martin, W. N.. Using Multiple Image Representations to Improve the Quality of Content-Based Image Retrieval, March 2003.

Abstract: Content-based image retrieval (CBIR) has been the object of considerable study since the early 90s. Much effort has gone into characterizing the "content" of an image for the purpose of subsequent retrieval. The present study seeks to capitalize on this work and to extend it by employing content-analysis of multiple representations of an image, which we term multiple viewpoints or channels. The idea is to place each image in multiple feature spaces and then effect retrieval by querying each of these spaces and merging the several responses. We show that a simple realization of this strategy can be used to boost the retrieval effectiveness of conventional CBIR.

He, Tian; Huang, Chengdu; Blum, Brian; Stankovic, John A.; Abdelzaher, Tarek. Range-Free Localization Schemes for Large Scale Sensor Networks, April 21, 2003.

Abstract: Wireless Sensor Networks have been proposed for a multitude of location-dependent applications. For such systems, the cost and limitations of the hardware on sensing nodes prevent the use of range-based localization schemes that depend on absolute point-to-point distance estimates. Because coarse accuracy is sufficient for most sensor network applications, solutions in range-free localization are being pursued as a cost-effective alternative to more expensive range-based approaches. In this paper, we present APIT, a novel localization algorithm that is range-free. We show that our APIT scheme performs best when an irregular radio pattern and random node placement are considered, and low communication overhead is desired. We compare our work, via extensive simulation, with three state-of-the-art range-free localization schemes to identify the preferable system configurations of each. In addition, we study the effect of location error on routing and tracking performance. We show that routing performance and tracking accuracy are not significantly affected by localization error when the error is less than 0.4 times the communication radio radius.

Christin, N.; Liebeherr, J.. Marking Algorithms for Service Differentiation of TCP Traffic, February 21, 2003.

Abstract: Class-based service architectures for quality-of-service (QoS) differentiation typically provide loss, throughput, and delay differentiation. However, proposals for class-based QoS differentiation generally do not account for the needs of TCP traffic, which are characterized by a coupling of packet losses and achievable throughput. Ignoring this coupling may result in poor service differentiation at the microflow level. This paper shows how Explicit Congestion Notification (ECN) can be used to achieve service differentiation for TCP traffic classes at the microflow level. We present a traffic marking algorithm for routers, which, if used in conjunction with ECN, regulates the transmission rate of TCP sources in such a way that packet drops due to buffer overflows are avoided. We demonstrate how the algorithm can be integrated in a service architecture with absolute and proportional QoS guarantees. Simulation results illustrate the effectiveness of the presented algorithms at avoiding packet losses and regulating traffic for meeting service guarantees, and provide a comparison with other algorithms proposed in the literature.
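The early-notification mechanism that ECN marking relies on can be illustrated with a RED-style sketch. This is only the general idea; the paper's marking algorithm ties marking decisions to class-level rate allocations rather than a fixed threshold:

```python
import random

def maybe_mark(queue_len, mark_threshold, queue_limit):
    """Probabilistically set the ECN Congestion Experienced bit as the
    backlog approaches the buffer limit, so TCP sources slow down
    before any packet must be dropped. Thresholds are assumed values
    for illustration (RED-style, not the paper's algorithm)."""
    if queue_len <= mark_threshold:
        return False
    p = (queue_len - mark_threshold) / (queue_limit - mark_threshold)
    return random.random() < min(p, 1.0)

print(maybe_mark(5, 10, 100))    # below threshold: never marked
print(maybe_mark(100, 10, 100))  # at the limit: always marked
```

Marking instead of dropping preserves throughput for the flow while still signaling congestion, which is what makes loss-free rate regulation of TCP classes possible.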

He, Tian; Blum, Brian M.; Stankovic, John A.; Abdelzaher, Tarek. AIDA: Application Independent Data Aggregation in Wireless Sensor Networks, January 12, 2003.

Abstract: Sensor networks, a novel paradigm in distributed wireless communication technology, have been proposed for use in various applications including military surveillance and environmental monitoring. These systems could deploy heterogeneous collections of sensors capable of observing and reporting on various dynamic properties of their surroundings in a time sensitive manner. Such systems suffer bandwidth, energy, and throughput constraints that limit the quantity of information transferred from end to end. These factors coupled with unpredictable traffic patterns and dynamic network topologies make the task of designing optimal protocols for such networks difficult. Mechanisms to perform data-centric aggregation utilizing application specific knowledge provide a means of augmenting throughput, but have limitations due to their lack of adaptation and reliance on application specific decisions. We therefore propose a novel aggregation scheme that adaptively performs application independent data aggregation in a time sensitive manner. Our work isolates aggregation decisions into a module that resides between the network and the data link layer and does not require any modifications to the currently existing MAC and network layer protocols. We take advantage of queuing delay and the broadcast nature of wireless communication to concatenate network units into an aggregate using a novel adaptive feedback scheme to schedule the delivery of this aggregate to the MAC layer for transmission. In our evaluation we show that end-to-end transmission delay is reduced by as much as 80% under heavy traffic loads. Additionally, we show as much as a 50% reduction in transmission energy consumption with the addition of only 2 bytes of header overhead per network unit. Theoretical analysis, simulation, and a test-bed implementation on Berkeley's MICA motes are provided to validate our claims.
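The layer the abstract describes, sitting between the network and data link layers, can be sketched as a small buffering module. This fixed-threshold version is a simplification; the paper's feedback scheme adapts the degree of aggregation to traffic load:

```python
import time

class Aggregator:
    """Sketch of application-independent aggregation: network units are
    concatenated into one MAC-layer frame when enough have queued up or
    the oldest unit has waited too long. Parameter values are assumed
    for illustration."""

    def __init__(self, send, max_units=4, max_wait_s=0.05):
        self.send = send          # callback delivering a frame to the MAC layer
        self.max_units = max_units
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.first_enqueued = None

    def enqueue(self, unit):
        if not self.buffer:
            self.first_enqueued = time.monotonic()
        self.buffer.append(unit)
        if len(self.buffer) >= self.max_units:
            self.flush()

    def poll(self):
        """Call periodically: flush if the deadline has expired."""
        if self.buffer and time.monotonic() - self.first_enqueued >= self.max_wait_s:
            self.flush()

    def flush(self):
        self.send(b"".join(self.buffer))
        self.buffer = []

frames = []
agg = Aggregator(frames.append, max_units=2)
agg.enqueue(b"u1")
agg.enqueue(b"u2")   # second unit triggers a flush
print(frames)        # one aggregate frame containing both units
```

Because the module only touches queued units, no change to the MAC or network layer protocols is needed, matching the isolation property the abstract claims.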

Highley, Timothy; Reynolds, Paul. Marginal Cost-Benefit Analysis for Predictive File Prefetching, January, 2003.

Abstract: File prefetching can reduce file access latencies and improve overall performance. Prefetching can be especially important in on-line software on demand, where network latencies create unacceptable delays. Prefetching involves predicting future accesses and establishing when/whether to prefetch, based on future access predictions. Cost-benefit analysis (CBA) addresses when/whether to prefetch and it addresses the interaction between prefetching and caching. CBA weighs the expected benefits of file prefetching against the cost of expected buffer usage. We describe 1-Marginal CBA, an approach that employs probabilistic predictions, as opposed to deterministic hints. We present a probabilistically optimal, though intractable, algorithm, Opt, for a representative prediction model, and demonstrate that any other optimal algorithm under that model will also be intractable. We argue that in many circumstances 1-Marginal and Opt will make the same decisions. Finally, we present simulation results in which 1-Marginal reduced I/O time by an average of 19% and a maximum of 49% over other prefetching schemes in the literature, even when using the same predictor. Since the cost of 1-Marginal is comparable to that of other published algorithms, the improvement is real.
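The core marginal decision, weighing a prefetch's expected benefit against the buffer it would occupy, can be sketched as below. This is a hypothetical simplification for illustration, not the 1-Marginal algorithm itself:

```python
def pick_prefetch(candidates, eviction_cost):
    """candidates: list of (name, access_probability, fetch_latency).

    Return the candidate with the highest expected latency saving,
    but only if that saving beats the marginal cost of displacing a
    cached block (eviction_cost, in the same latency units)."""
    best = max(candidates, key=lambda c: c[1] * c[2], default=None)
    if best and best[1] * best[2] > eviction_cost:
        return best[0]
    return None

# A 60% chance of needing "a" (fetch latency 10) justifies evicting
# a buffer whose expected reuse is worth 4, but "b" alone would not:
print(pick_prefetch([("a", 0.6, 10.0), ("b", 0.2, 10.0)], 4.0))
```

The key difference from hint-based schemes is visible in the signature: the inputs are probabilities, so the same comparison also decides *whether* to prefetch at all, not just what.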

Kang, K.D.; Son, S.H.; Stankovic, J.A.. Specifying and Managing Quality of Real-Time Data Services, September 18, 2002.

Abstract: The demand for real-time data services is increasing in many applications including e-commerce, agile manufacturing, and telecommunication network management. In these applications, it is desirable to execute transactions within their deadlines, i.e., before the real-world status changes, using fresh (temporally consistent) data. However, it is challenging to meet these fundamental requirements due to dynamic workloads and data access patterns in these applications and potential conflicts between transaction timeliness and data freshness requirements. In this paper, we define average/transient deadline miss ratio and new data freshness metrics to specify the required quality of real-time data services. We also present a novel QoS management architecture to support the required QoS even in the presence of unpredictable workloads and access patterns. To prevent overload and support the required QoS, the presented architecture applies feedback control, admission control, and flexible freshness management schemes. A simulation study shows that our QoS-aware approach can support the required QoS, whereas baseline approaches fail to support the required miss ratio and/or freshness. Further, our approach shows a comparable performance to the theoretical oracle that is privileged by a complete future knowledge of data accesses.

Liebeherr, Jorg; Patek, Stephen D.; Burchard, Almut. Statistical Per-Flow Service Bounds in a Network with Aggregate Provisioning, July 2002.

Abstract: Scalability concerns of QoS implementations have motivated service architectures where QoS is not provisioned separately to each flow, but instead to aggregates of flows. This paper determines stochastic bounds for the service experienced by a single flow when resources are managed for aggregates of flows and when the scheduling algorithms used in the network are not known. Using a recently developed statistical network calculus, per-flow bounds can be calculated for backlog, delay, and the burstiness of output traffic.

Zhong, Weilin; Evans, David. When Ants Attack: Security Issues for Stigmergic Systems, April, 2002.

Abstract: Stigmergic systems solve global problems by using indirect communication mediated by an environment. Because they are localized and dynamic, stigmergic systems are self-organizing, robust and adaptive. These properties are useful for creating survivable systems, but stigmergic systems also raise new security concerns. Indirect communication makes systems more vulnerable in an open and hostile environment, and feedback mechanisms common to stigmergic algorithms can be exploited by attackers. In this paper we use AntNet, an adaptive routing algorithm inspired by biological ant foraging, to explore some of the security issues for stigmergic systems. We identify possible attacks and analyze their potency. We propose and evaluate mechanisms for defending against these attacks.

Larochelle, David; Scheidt, Karl; Sullivan, Kevin. Join Point Encapsulation, January 2003 (posted 16 June 2004).

Abstract: At the heart of aspect-oriented programming is the exposure of certain phenomena in the execution of one set of program elements to behavioral modifications specified by other elements. The phenomena are join points. The modifying elements are aspects. The problem that we address is that current aspect-oriented languages do not provide adequate means to control the exposure of join points for behavioral modification by aspects. Rather, these languages define certain classes of join points (e.g., method calls) and expose all instances thereof. In a nutshell, then, current languages lack join point encapsulation mechanisms. We map a solution space and describe one proof-of-concept design that we have implemented for AspectJ. A key feature of our design is that it is, itself, aspect-oriented: We use AspectJ's pointcut language to identify cross-cutting sets of join points to be encapsulated. The potential benefits of such a modularity-supporting mechanism include improved ease of reasoning about program behavior, the ability to assure the absence of side-effects through enforced non-interference by aspect code, and improved ease of design evolution.

Hill, Jonathan C.; Knight, John C.; Crickenberger, Aaron M.; Honhard, Richard. Publish and Subscribe with Reply, October 23, 2002.

Abstract: Reply for Publish and Subscribe allows receivers of a publication to reply to the publisher. We demonstrate that Reply is a natural, efficient, and useful component of Publish/Subscribe. It is natural because it maintains the weakest possible coupling between senders and receivers. It is efficient because it stores computations discarded in publication forwarding, later applying them to channel replies. Most importantly, it is useful because it increases the domain of applications for which Publish/Subscribe is suited. This paper includes discussion of Reply’s utility, introduction of two algorithms with differing state storage and capabilities, their analysis for worst-case conditions, modeling of required resources, and presentation of a modular implementation of Reply for distributed Publish/Subscribe systems.

French, James C.; Martin, Worthy M.; Watson, James V. S.. A Qualitative Examination of Content-Based Image Retrieval Behavior using Systematically Modified Test Images, August, 2002.

Abstract: We describe the outcome of an effort to understand the behavior of content-based image retrieval (CBIR) technology by examining the behavior of a CBIR system in response to carefully constructed input query images. This work is preliminary and only considers a single CBIR system, SIMPLIcity. We chose this particular system for expediency. It is our intention to develop a methodology suitable for examining any CBIR system.

Kang, K. D.; Son, S. H.; Stankovic, J. A.. STAR: Secure Real-Time Transaction Processing with Timeliness Guarantees, August 19, 2002.

Abstract: Real-time databases are needed in security-critical applications, e.g., e-commerce, agile manufacturing, and military applications. In these applications, transactions and data items can be classified into several security levels according to their clearance and sensitivity levels. It is essential for real-time databases to prevent illegal direct/indirect transfer of sensitive data, e.g., secret trade, manufacturing, or operational data, between transactions belonging to different security levels. Further, transactions should be committed within their deadlines, i.e., before the market, manufacturing, or battlefield status changes. In this paper, we present a novel real-time database architecture, in which illegal direct/indirect inter-level information flows are prevented while controlling the deadline miss ratio for admitted transactions to remain below a certain threshold. In our approach, mandatory access control mechanisms are applied for security purposes. QoS management, admission control, and feedback control schemes are applied to support certain guarantees on miss ratio against potential overload and data conflicts. A detailed simulation study shows that our approach can support the specified miss ratio while preventing illegal information flows even in the presence of unpredictable workloads and varying degrees of data contention, whereas baseline approaches fail.

Barcella, M.; Huang, W.; Skadron, K.; Stan, M. R.. Architecture-Level Compact Thermal R-C Modeling, July 2002.

Abstract: Architectural thermal modeling of high-performance VLSI systems is especially important for achieving reliable and power-saving system designs. In this paper, we present a novel, computationally efficient methodology for architectural-level compact thermal modeling. The RC thermal models derived from this methodology are based on the layout of micro-architectural-level functional blocks such as caches, execution units, and register files. We utilize the duality between thermal and electrical circuits to perform electro-thermal simulations of both steady-state and transient thermal responses of different blocks that dissipate different amounts of power. Temperature profiles derived from this modeling technique are compared with those derived from computation-intensive FEM/FDM tools. The model is both extremely fast and accurate, with less than 4% error. This modeling methodology can then be adopted for architectural-level VLSI design tools to perform fast electro-thermal simulations.
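
The thermal-electrical duality the abstract relies on (temperature as voltage, power as current, thermal resistance and capacitance as electrical R and C) can be illustrated with a one-node transient sketch. This is not the paper's model; the R, C, and power values below are arbitrary:

```python
def simulate_thermal_rc(power_w, r_th, c_th, t_ambient=45.0, dt=1e-4, steps=20000):
    """Forward-Euler integration of the single-node thermal RC equation
    dT/dt = (P - (T - T_amb) / R_th) / C_th, the thermal dual of an
    electrical RC circuit driven by a constant current source."""
    temp = t_ambient
    for _ in range(steps):
        temp += dt * (power_w - (temp - t_ambient) / r_th) / c_th
    return temp
```

As time grows, the temperature approaches the steady state T_amb + P * R_th with time constant R_th * C_th; an architectural model composes many such nodes, one per functional block, plus coupling resistances between them.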

Lu, Zhijian; Lach, John; Stan, Mircea R.; Skadron, Kevin. Alloyed Branch History: Combining Global and Local Branch History for Robust Performance, July 2002.

Abstract: This paper introduces "alloyed" prediction, a new two-level predictor organization that combines global and local history in the same structure, combining the advantages of two-level predictors and hybrid predictors. The alloyed organization is motivated by measurements showing that "wrong-history mispredictions" are even more important than conflict-induced mispredictions. Wrong-history mispredictions arise because current two-level, history-based predictors provide only global or only local history. The contribution of wrong-history to the overall misprediction rate is substantial because most programs have some branches that require global history and others that require local history. This paper explores several ways to implement alloyed prediction, including the previously proposed bi-mode organization. Simulations show that "mshare" is the best alloyed organization among those we examine, and that mshare gives reliably good prediction compared to bimodal ("two-bit"), two-level, and hybrid predictors. The robust performance of alloying across a range of predictor sizes stems from its ability to attack wrong-history mispredictions at even very small sizes without subdividing the branch predictor into smaller and less effective components.
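
A minimal sketch of the alloying idea: index a single pattern-history table with a concatenation of global history, per-branch local history, and branch-address bits, so one structure sees both kinds of history. Field widths and counter initialization here are arbitrary illustrations, not those of the paper's mshare design:

```python
class AlloyedPredictor:
    """Illustrative alloyed two-level predictor: one pattern-history table
    indexed by a mix of global history, per-branch local history, and PC bits."""
    def __init__(self, g_bits=4, l_bits=4, pc_bits=4):
        self.g_bits, self.l_bits, self.pc_bits = g_bits, l_bits, pc_bits
        self.ghr = 0                               # global history register
        self.local = [0] * (1 << pc_bits)          # per-branch local histories
        # 2-bit saturating counters, initialized weakly taken
        self.pht = [2] * (1 << (g_bits + l_bits + pc_bits))

    def _index(self, pc):
        pc_part = pc & ((1 << self.pc_bits) - 1)
        return ((self.ghr << (self.l_bits + self.pc_bits))
                | (self.local[pc_part] << self.pc_bits)
                | pc_part)

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2      # counter >= 2 predicts taken

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.ghr = ((self.ghr << 1) | taken) & ((1 << self.g_bits) - 1)
        p = pc & ((1 << self.pc_bits) - 1)
        self.local[p] = ((self.local[p] << 1) | taken) & ((1 << self.l_bits) - 1)
```

Because both histories feed the same index, a branch whose behavior correlates with either global or local history can find the pattern it needs, which is the wrong-history problem the abstract describes.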

Haskins, Jr., J. W.; Skadron, K.. Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation, July 12, 2002.

Abstract: This paper explores techniques for speeding up sampled microprocessor simulations by exploiting the observation that of the memory references that precede a sample, references that occur nearest to the sample are more likely to be germane during the sample itself. This means that accurately warming up simulated cache and branch predictor state only requires modeling a subset of the memory references and control-flow instructions immediately preceding a simulation sample. Our technique takes measurements of the memory reference reuse latencies (MRRLs) and uses these data to choose a point prior to each sample at which to engage cache hierarchy and branch predictor modeling. By starting cache and branch predictor modeling late in the pre-sample instruction stream, rather than modeling cache and branch predictor interactions for all pre-sample instructions, we are able to save the time cost of modeling them. This savings reduces overall simulation running times by an average of 25%, while generating an average error in IPC of less than 0.7%.
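
The MRRL idea can be sketched as follows (an illustration of the concept, not the authors' exact algorithm): record, for each pre-sample reference, the distance back to its previous use of the same address, then pick a warmup window long enough to cover most of those reuse latencies:

```python
def mrrl_warmup_length(address_trace, sample_start, coverage=0.99):
    """Illustrative MRRL-style calculation: returns how many references
    before the sample to model in order to cover `coverage` of the
    observed reuse latencies (a hypothetical percentile rule)."""
    last_use = {}
    latencies = []
    for i, addr in enumerate(address_trace[:sample_start]):
        if addr in last_use:
            latencies.append(i - last_use[addr])   # reuse distance in references
        last_use[addr] = i
    if not latencies:
        return 0
    latencies.sort()
    idx = min(len(latencies) - 1, int(coverage * len(latencies)))
    return latencies[idx]   # begin warmup this many references before the sample
```

References older than the chosen window can be skipped entirely, which is where the simulation-time savings come from.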

Drewry, Darren T.; Gu, Lin; Hocking, A. Benjamin; Kang, Kyoung-Don; Schutt, Robert C.; Taylor, Christopher M.; Pfaltz, John L.. Current State of Data Mining, May 9, 2002.

Abstract: This report is a compendium of results uncovered in CS 851, spring semester 2002. Our goal is to provide direction to other graduate students who want to explore the fascinating field of "data mining." Be aware that this represents the topics and the papers that we found most interesting. We made no attempt to be either `fair' or `comprehensive' as in a Computing Reviews article. Instead we tried to pick just one or two references in each of our interest areas that were both relatively recent and, in our opinion, worth reading. Their bibliographies can direct a reader to other sources.

Kang, K. D.; Son, S. H.; Stankovic, J. A.; Abdelzaher, T. F.. FLUTE: A Flexible Real-Time Data Management Architecture for Performance Guarantees, June 14, 2002.

Abstract: Efficient real-time data management has become increasingly important as real-time applications become more sophisticated and data-intensive. In data-intensive real-time applications, e.g., online stock trading, agile manufacturing, sensor data fusion, and telecommunication network management, it is essential to execute transactions within their deadlines using fresh (temporally consistent) sensor data, which reflect the current real-world states. However, it is very challenging to meet this fundamental requirement due to potentially time-varying workloads and data access patterns in these applications. Also, user service requests and sensor updates can compete for system resources, thereby making it difficult to achieve guaranteed real-time database services in terms of both deadline miss ratio and data freshness. In this paper, we present a novel real-time data management architecture, which includes feedback control, admission control, and flexible update schemes, to enforce the deadline miss ratio of admitted user transactions to be below a certain threshold, e.g., 1%. At the same time, we maintain data freshness even in the presence of unpredictable workloads and access patterns. In a simulation study, we illustrate the applicability of our approach by showing that stringent performance specifications can be satisfied over a wide range of workloads and access patterns. We also show that our approach can significantly improve performance compared to baselines.

Drewry, Darren T.; Reynolds, Paul F.; Emanuel, William R.. Optimization as a Tool for Consistency Maintenance in Multi-resolution Simulation, May 16, 2002.

Abstract: The need for new approaches to the consistent simulation of related phenomena at multiple levels of resolution is great. While many fields of application would benefit from a complete and approachable solution to this problem, such solutions have proven extremely difficult. We present a multi-resolution simulation methodology that uses numerical optimization as a tool for maintaining external consistency between models of the same phenomena operating at different levels of temporal and/or spatial resolution. Our approach follows from previous work in the disparate fields of inverse modeling and spacetime constraint-based animation. As a case study, our methodology is applied to two environmental models of forest canopy processes that make overlapping predictions under unique sets of operating assumptions, and which execute at different temporal resolutions. Experimental results are presented and future directions are addressed.

Highley, Timothy; Reynolds, Paul F.; Vellanki, Vivekanand. Absolute Cost-Benefit Analysis for Predictive File Prefetching, April 18, 2002.

Abstract: Cost-benefit analysis attempts to balance the benefits of file prefetching against the costs of giving up a buffer from the cache. This is a comprehensive approach to prefetching. It simultaneously answers the questions of what and when/whether to prefetch, and also addresses the interaction between prefetching and caching. Original research on cost-benefit analysis relied on hints about prefetching that were supplied by the program, though other research has considered the more general case of programs without hints. In this paper, we revisit previous research on cost-benefit analysis without hints, present new estimators for the costs and benefits of prefetching under a system model without hints, and generalize the system model to consider probability trees that may have the same block lie along multiple paths of the tree. We present a distinction between absolute and marginal cost-benefit analysis, two different approaches to file prefetching that previous researchers have used. The difference between the two is a subtlety that has not been previously noted or investigated. Our new estimators are based on absolute cost-benefit analysis, with modifications to address observed weaknesses in the absolute approach.

Lu, Chenyang; Blum, Brian M.; Abdelzaher, Tarek F.; Stankovic, John A.; He, Tian. RAP: A Real-Time Communication Architecture for Large-Scale Wireless Sensor Networks, March 25, 2002.

Abstract: Large-scale wireless sensor networks represent a new generation of real-time embedded systems with significantly different communication constraints from traditional networked systems. This paper presents RAP, a new real-time communication architecture for large-scale sensor networks. RAP provides convenient, high-level query and event services for distributed micro-sensing applications. Novel location-addressed communication models are supported by a scalable and light-weight network stack. We present and evaluate a new packet scheduling policy called velocity monotonic scheduling that inherently accounts for both time and distance constraints. We show that this policy is particularly suitable for communication scheduling in sensor networks in which a large number of wireless devices are seamlessly integrated into a physical space to perform real-time monitoring and control. Detailed simulations of representative sensor network environments demonstrate that RAP significantly reduces the end-to-end deadline miss ratio in the sensor network.
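
Velocity monotonic scheduling, as described, prioritizes each packet by the velocity it must sustain to cover its remaining distance before its deadline. A minimal sketch of that rule (parameter names are ours, not the paper's):

```python
import heapq

def requested_velocity(distance_m, deadline_s, now_s):
    """Velocity a packet must sustain to cover the remaining distance
    by its end-to-end deadline; infinite if the deadline has passed."""
    slack = deadline_s - now_s
    return float('inf') if slack <= 0 else distance_m / slack

class VelocityMonotonicQueue:
    """Illustrative per-hop scheduler: dequeue the packet with the highest
    requested velocity first (heapq is a min-heap, so velocities are negated)."""
    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker so packets stay comparable
    def push(self, packet, distance_m, deadline_s, now_s):
        v = requested_velocity(distance_m, deadline_s, now_s)
        heapq.heappush(self._heap, (-v, self._seq, packet))
        self._seq += 1
    def pop(self):
        return heapq.heappop(self._heap)[2]
```

A packet that is far from its destination with little slack thus overtakes a nearby packet with a relaxed deadline, which is how the policy folds both time and distance into one priority.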

He, Tian; Stankovic, John A.; Lu, Chenyang; Abdelzaher, Tarek. SPEED: A Real-Time Protocol for Sensor Networks, March 1, 2002.

Abstract: In this paper, we present a real-time communication protocol, called SPEED, for sensor networks. The protocol provides three types of real-time communication services, namely, real-time unicast, real-time area-multicast and real-time area-anycast. SPEED is specifically tailored to be a stateless, localized algorithm with minimal control overhead. End-to-end real-time communication guarantees are achieved using a novel combination of feedback control and non-deterministic QoS-aware geographic forwarding with a bounded hop count. SPEED is a highly efficient and scalable protocol for sensor networks where node density is high while the resources of each node are scarce. Theoretical analysis and simulation experiments are provided to validate our claims.

Sang, S. H.; Kang, K. D.. QoS Management in Web-based Real-Time Data Services, April 4, 2002.

Abstract: The demand for real-time data services has been increasing recently. Many e-commerce applications and web-based information services are becoming very sophisticated in their data needs. They span the spectrum from low level status data, e.g., stock prices, to high level aggregated data, e.g., recommended selling/buying point. In these applications, it is desired to process user requests within their deadlines using fresh data, which reflect the current market status. Current web-based data services are poor at processing user requests in a timely manner using fresh data. To address this problem, we present a framework for guaranteed real-time data services in unpredictable environments. We also introduce a possible application of our approach in the distributed environment.

Hill, Jonathan C.. Site-Select Messaging for Distributed Systems, April 2, 2002.

Abstract: This report provides a technical description of the expressiveness and cost of Site-Select Messaging implemented on content-driven Publish/Subscribe.

French, J. C.; Martin, W. N.; Olsen, L. M.. Extending the Vocabulary Available for Cross-Disciplinary Searching of Earth Science Data, March, 2002.

Abstract: This paper discusses the development of a prototype search assistant designed to aid cross-disciplinary searching of Earth science data. The goal of the project is to provide aids to help searchers overcome the vocabulary problem: the vocabulary used by the searchers is not the same as that used by the indexers. The work was motivated by vocabulary issues existing in the EOS Data Gateway (EDG) at the time this work began. (The EDG is the interface for searching and ordering Earth science data products from NASA and affiliated centers.) We describe the status of the project and give examples of the techniques that we are using. We also discuss the way in which we are implementing a new search paradigm, multiple viewpoints, within our prototype.

Kang, K. D.; Son, Sang H.; Stankovic, John A.. Service Differentiation in Real-Time Main Memory Databases, January 29, 2002.

Abstract: The demand for real-time database services has been increasing recently. Examples include sensor data fusion, decision support applications, web information services, e-commerce, and data-intensive smart spaces. In these systems, it is essential to execute transactions in time using fresh (temporally consistent) data. Due to the high service demand and stringent timing/data temporal consistency constraints, real-time databases can be overloaded. As a result, users may suffer poor services. Many transaction deadlines can be missed or transactions may have to use stale data. To address these problems, we present a service differentiation architecture. Transactions are classified into several service classes based on their importance. Under overload, different degrees of deadline miss ratio guarantees are provided among the service classes according to their importance. A certain data freshness guarantee is also provided for the data accessed by timely transactions which finish within their deadlines. Feedback control is applied to support the miss ratio and freshness guarantees. In a simulation study, our service differentiation approach shows a significant performance improvement compared to the baseline approaches. The specified miss ratio and freshness are supported even in the presence of unpredictable workloads and data access patterns. Our approach also achieves a relatively low miss ratio for the less privileged service classes, thereby reducing potential starvation.

Haskins, Jr., John W.; Skadron, K.; KleinOsowski, AJ; Lilja, D. J.. Techniques for Accurate, Accelerated Processor Simulation: Analysis of Reduced Inputs and Sampling, January 22, 2002.

Abstract: Detailed execution-driven simulation is an important tool for computer architecture research. It is desirable to drive these simulations with standard benchmark programs that are commonly used to evaluate existing computer systems, such as the SPEC2000 suite. Unfortunately, simulating these benchmark programs to completion using full-detail, cycle-accurate simulation on the designated reference input sets results in intractably long simulation durations. This study evaluates and compares two techniques for combating long simulation times: reduced inputs and sampling. Our objective is to assess the ability of each to reduce simulation running times, while simultaneously minimizing the difference in the results generated by using these techniques relative to the results generated by simulating the benchmark programs to completion using the reference inputs. With the reduced input technique, new input sets are carefully generated by hand to produce run-time characteristics of the benchmark programs that are comparable to the overall characteristics produced when the programs are run with the standard inputs. Sampling, on the other hand, simulates only a small fraction of the program's overall execution in full, cycle-accurate detail using the reference inputs.

Scott, Kevin; Davidson, Jack. Software Security Using Software Dynamic Translation, November 2001.

Abstract: Software dynamic translation (SDT) is a technology that allows programs to be modified as they are running. Researchers have used SDT with good success to build a variety of useful software tools (e.g., binary translators, operating system simulators, low-overhead profilers, and dynamic optimizers). In this paper, we describe how SDT can be used to address the critical problem of providing software security. The paper shows how SDT can simply and effectively implement arbitrary user-specified software safety policies. Unlike static analysis techniques which typically process source code, SDT is applied to binary code. Consequently, SDT can handle untrusted binaries and unsecured libraries from any source. To demonstrate and validate that SDT provides additional security, we have implemented a software security API for Strata, our software dynamic translation infrastructure. The API, while simple, allows clients to implement powerful policies to prevent potential security violations. To illustrate the use of Strata and the security API, the paper provides implementations of several interesting and useful security policies.

Scott, Kevin; Davidson, Jack; Skadron, Kevin. Low-overhead Software Dynamic Translation, July 2001.

Abstract: Software dynamic translation (SDT) is a technology that allows programs to be modified as they are running. The overhead of monitoring and modifying a running program's instructions is often substantial in SDT. As a result, SDT can be impractically slow, especially in SDT systems that do not or cannot employ dynamic optimization to offset overhead. This is unfortunate since SDT has obvious advantages in modern computing environments and interesting applications of SDT continue to emerge. In this paper we introduce two novel overhead reduction techniques that can improve SDT performance by a factor of three even when no dynamic optimization is performed. To demonstrate the effectiveness of our overhead reduction techniques, and to show the type of useful tasks to which low-overhead, non-optimizing SDT systems might be put, we implemented two dynamic safety checkers with SDT. These dynamic safety checkers perform useful tasks, preventing buffer-overrun exploits and restricting system call usage in untrusted binaries. Further, their performance is similar to, and in some cases better than, state-of-the-art tools that perform the same functions without SDT.

Scott, Kevin; Davidson, Jack. Strata: A Software Dynamic Translation Infrastructure, July 2001.

Abstract: Software dynamic translation is the alteration of a running program to achieve a specific objective. For example, a dynamic optimizer uses software dynamic translation to modify a running program with the objective of making the program run faster. In addition to its demonstrated utility in dynamic optimizers, software dynamic translation also shows promise for producing applications that are adaptable, secure, and robust. In this paper, we describe the design and implementation of an extensible, retargetable infrastructure to facilitate research in applications of software dynamic translation technology. The infrastructure, called Strata, provides the software dynamic translator implementor with a virtual machine model that can be extended to implement specific software dynamic translation functionality. To illustrate the use of Strata to build client applications, the paper describes the Strata implementation of two dynamic safety checkers and a dynamic instruction scheduler.

Skadron, Kevin; Abdelzaher, Tarek; Stan, Mircea R.. Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management, Nov. 2001.

Abstract: This paper proposes the use of formal feedback control theory as a way to implement adaptive techniques in the processor architecture. Dynamic thermal management (DTM) is used as a test vehicle, and variations of a PID controller (Proportional-Integral-Differential) are developed and tested for adaptive control of fetch "toggling." To accurately test the DTM mechanism being proposed, this paper also develops a thermal model based on lumped thermal resistances and thermal capacitances. This model is computationally efficient and tracks temperature at the granularity of individual functional blocks within the processor. Because localized heating occurs much faster than chip-wide heating, some parts of the processor are more likely to be "hot spots" than others. Experiments using Wattch and the SPEC2000 benchmarks show that the thermal trigger threshold can be set within 0.2 degrees of the maximum temperature and yet never enter thermal emergency. This cuts the performance loss of DTM by 65% compared to the previously described fetch toggling technique that uses a response of fixed magnitude.
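
The controller the report develops can be pictured with a textbook discrete PID loop whose output scales the fetch duty cycle; the gains and setpoint below are arbitrary placeholders, not values from the report:

```python
class PIDController:
    """Textbook discrete PID controller; an illustrative stand-in for the
    DTM controller described in the report."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measured, dt=1.0):
        """One control interval: returns a correction to apply to the
        actuator (here, the fraction by which to adjust fetch toggling)."""
        error = self.setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

When the measured block temperature is above the setpoint the output is negative (throttle fetch harder); below it the output is positive (restore fetch bandwidth). The integral term is what lets the trigger threshold sit so close to the emergency temperature without overshooting into it.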

Liebeherr, J.; Nahas, M.; Si, W.. Application-Layer Multicasting with Delaunay Triangulation Overlays, December 5, 2001.

Abstract: Application-layer multicast supports group applications without the need for a network-layer multicast protocol. Here, applications arrange themselves in a logical overlay network and transfer data within the overlay. In this paper, we present an application-layer multicast solution that uses a Delaunay triangulation as an overlay network topology. An advantage of using a Delaunay triangulation is that it allows each application to locally derive next-hop routing information without requiring a routing protocol in the overlay. A disadvantage of using a Delaunay triangulation is that the mapping of the overlay to the network topology at the network and data link layer may be suboptimal. We present a protocol, called the "Delaunay Triangulation (DT)" protocol, which constructs Delaunay triangulation overlay networks. We present measurement experiments of the DT protocol for overlay networks with up to 10,000 members running on a local PC cluster with 100 Linux PCs. The results show that the protocol stabilizes quickly, e.g., an overlay network with 10,000 nodes can be built in just over 30 seconds. The traffic measurements indicate that the average overhead of a node is only a few kilobits per second if the overlay network is in a steady state. Results of throughput experiments of multicast transmissions (using TCP unicast connections between neighbors in the overlay network) show an achievable throughput of approximately 15 Mbps in an overlay with 100 nodes and 2 Mbps in an overlay with 1,000 nodes.
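
The local next-hop derivation the abstract mentions can be sketched with a simple geometric forwarding rule. Greedy forwarding, shown below, is the simplest rule of this family and is known to succeed on Delaunay triangulations (the DT protocol itself uses a compass-routing variant, which this sketch does not implement):

```python
import math

def next_hop(node_pos, neighbor_pos, dest_pos):
    """Illustrative greedy geographic forwarding on a coordinate-based
    overlay: hand the message to the neighbor geometrically closest to
    the destination, or None if no neighbor makes progress."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    best = min(neighbor_pos, key=lambda p: dist(p, dest_pos))
    return best if dist(best, dest_pos) < dist(node_pos, dest_pos) else None
```

The point of the example is that the decision uses only the node's own coordinates, its neighbors' coordinates, and the destination, so no overlay-wide routing protocol is needed.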

Scott, Kevin; Ramsey, Norman. When Do Match-compilation Heuristics Matter?, February, 2000.

Abstract: Modern, statically typed, functional languages define functions by pattern matching. Although pattern matching is defined in terms of sequential checking of a value against one pattern after another, real implementations translate patterns into automata that can test a value against many patterns at once. Decision trees are popular automata. The cost of using a decision tree is related to its size and shape. The only method guaranteed to produce decision trees of minimum cost requires exponential match-compilation time, so a number of heuristics have been proposed in the literature or used in actual compilers. This paper presents an experimental evaluation of such heuristics, using the Standard ML of New Jersey compiler. The principal finding is that for most benchmark programs, all heuristics produce trees with identical sizes. For a few programs, choosing one heuristic over another may change the size of a decision tree, but seldom by more than a few percent. There are, however, machine-generated programs for which the right or wrong heuristic can make enormous differences: factors of 2-20.

Scott, Kevin. On Proebsting's Law, March 1, 2001.

Abstract: In 1965 Gordon Moore observed that the capacity of semiconductor ICs doubled every 18 to 24 months. This trend, now known as Moore's Law, has held for over 25 years and is responsible for the exponential increase in microprocessor performance over this period. In 1998 Todd Proebsting made a similar-in-spirit, but altogether less optimistic observation about optimizing compilers. His observation, henceforth known as Proebsting's Law, is that improvements to compiler technology double the performance of typical programs every 18 years. Proebsting has suggested an experiment to evaluate the veracity of his observation. This paper presents the results of this experiment and some comments on what Proebsting's Law portends for compiler research.

Christin, Nicolas; Liebeherr, Jorg. The QoSbox: A PC-Router for Quantitative Service Differentiation in IP Networks, November 20, 2001.

Abstract: We describe the design and implementation in UNIX-based PCs of the QoSbox, a configurable IP router that provides per-hop service guarantees on loss, delays and throughput to classes of traffic. There is no restriction on the number of classes or the specific service guarantees each class obtains. The novel aspects of the QoSbox are that (1) the QoSbox does not rely on any external component (e.g., no traffic shaping and no admission control) to enforce the desired service guarantees, but instead, (2) dynamically adapts packet forwarding and dropping decisions as a function of the instantaneous traffic arrivals; also, (3) the QoSbox can enforce both absolute bounds and proportional service guarantees on queueing delays, loss rates, and throughput at the same time. We evaluate the QoSbox in a testbed of PC-routers over a FastEthernet network, and show that the QoSbox is a possible solution allowing for incremental deployment to the problem of providing service differentiation in a scalable manner.

Parikh, Dharmesh; Skadron, Kevin; Zhang, Yan; Barcella, Marco; Stan, Mircea R.. Power Issues Related To Branch Prediction, November 2, 2001.

Abstract: This paper explores the role of branch predictor organization in power/energy/performance tradeoffs for processor design. We find that as a general rule, to reduce overall energy consumption in the processor it is worthwhile to spend more power in the branch predictor if this results in more accurate predictions that improve running time. Two techniques, however, provide substantial reductions in power dissipation without harming accuracy. Banking reduces the portion of the branch predictor that is active at any one time. And a new on-chip structure, the prediction probe detector (PPD), can use pre-decode bits to entirely eliminate unnecessary predictor and BTB accesses. Despite the extra power that must be spent accessing the PPD, it reduces local predictor power and energy dissipation by about 45% and overall processor power and energy dissipation by 5--6%.

Hu, Zhigang; Juang, Philo; Skadron, Kevin; Clark, Douglas W.; Martonosi, Margaret. Applying Decay Strategies to Branch Predictors for Leakage Energy Savings, Oct. 2001.

Abstract: This paper shows that substantial reductions in leakage energy can be obtained by deactivating groups of branch-predictor entries if they lie idle for a sufficiently long time. Decay techniques, first introduced by Kaxiras et al. for caches, work by tracking accesses to cache lines and turning off power to those that lie idle for a sufficiently long period of time (the decay interval). Once deactivated, these lines essentially draw no leakage current. The key trick is in identifying opportunities where an item can be turned off without incurring significant performance or power cost. Branch predictors are, like caches, large array structures with significant leakage; as such, it is natural to consider applying decay techniques to them as well. Applying decay techniques to branch predictors is, however, not straightforward. The overhead for applying decay to individual counters in the predictor is prohibitive, so decay must be applied to groups of predictor entries. The most natural grouping is a row in the square data array used to implement the branch predictor, but then decay will only be successful if entire rows lie idle for sufficiently long periods of time. This paper shows that branch predictors do exhibit sufficient spatial and temporal locality to make decay effective for bimodal, gshare, and hybrid predictors, as well as the branch target buffer. In particular, decay is quite effective when applied intelligently to hybrid predictors, which use two predictors in parallel and are among the most accurate predictor organizations. Hybrid predictors are also especially amenable to decay, because inactive entries in one component can be left inactive if the other component is able to provide a prediction. Overall, this paper demonstrates that decay techniques apply more broadly than just to caches, but that careful policy and implementation make the difference between success and failure in building decay-based branch predictors.
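
Row-based decay can be sketched as follows (an illustrative model with hypothetical parameters, not the paper's circuit): each row carries a small idle counter; rows untouched for a full decay interval are powered off, and a later access reactivates them with contents reset:

```python
class DecayingPredictorArray:
    """Illustrative row-based decay for a predictor array: inactive rows
    draw essentially no leakage, at the cost of losing their contents."""
    def __init__(self, rows, decay_interval):
        self.decay_interval = decay_interval
        self.idle = [0] * rows
        self.active = [True] * rows
        self.data = [[2] * 8 for _ in range(rows)]   # 8 two-bit counters per row

    def access(self, row):
        if not self.active[row]:        # reactivation: contents were lost
            self.active[row] = True
            self.data[row] = [2] * 8    # reinitialize to weakly taken
        self.idle[row] = 0
        return self.data[row]

    def tick(self):
        """Called once per decay-clock period."""
        for r in range(len(self.idle)):
            if self.active[r]:
                self.idle[r] += 1
                if self.idle[r] >= self.decay_interval:
                    self.active[r] = False   # power off: leakage saved
```

The policy question the paper studies is visible even in this sketch: decay only wins if whole rows really do lie idle for long stretches, and if the mispredictions caused by reinitialized counters cost less than the leakage saved.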

Parikh, Stavan M.; Elder, Matthew C.. Air Traffic Management Overview, October 12, 2001.

Abstract: The Air Traffic Control (ATC) system manages air traffic from various sources throughout the United States National Airspace System (NAS). The mission of air traffic control is to promote the safe, orderly, and expeditious movement of aircraft. The Federal Aviation Administration (FAA) is responsible for providing the NAS infrastructure to enable air operations in the United States, 24 hours a day, 365 days a year. Air traffic control, the primary component of the NAS architecture, is a complex system comprising many types of facilities, a plethora of hardware and software systems, and complicated interactions between humans and machines. There are two goals that must be met by the Air Traffic Control system: safety and efficiency. To ensure safety, minimum separation must be maintained between aircraft as they pass through the national airspace. Efficient air traffic control, however, requires that separation not be so great as to adversely affect the flow of traffic through the airspace. This document provides a summary of the air traffic management system together with pointers to other sources of more detailed information.

Kang, K.-D.; Son, S. H.; Stankovic, J. A.; Abdelzaher, T. F.. A QoS-Sensitive Approach for Timeliness and Freshness Guarantees in Real-Time Databases, October 9, 2001.

Abstract: The demand for real-time database services has been increasing recently. Examples include sensor data fusion, decision support, web information services, e-commerce, online trading, and data-intensive smart spaces. A real-time database, a core component of many information systems for real-time applications, can be a main service bottleneck. To address this problem, we present a real-time database QoS management scheme that provides guarantees on deadline miss ratio and data freshness, which are considered two fundamental performance metrics for real-time database services. Using our approach, admitted user transactions can be processed in a timely manner and data freshness can be guaranteed even in the presence of unpredictable workloads and data access patterns. A simulation study shows that our QoS-sensitive approach can achieve a significant performance improvement, in terms of deadline miss ratio and data freshness, compared to several baseline approaches. Furthermore, our approach shows a comparable performance to the theoretical oracle that is privileged by the complete future knowledge of data accesses.

Liebeherr, Jorg; Patek, Stephen; Burchard, Almut. A Calculus for End-to-end Statistical Service Guarantees, August 6, 2001 [Revised December 2003].

Abstract: The deterministic network calculus offers an elegant framework for determining delays and backlog in a network with deterministic service guarantees to individual traffic flows. This paper addresses the problem of extending the network calculus to a probabilistic framework with statistical service guarantees. Here, the key difficulty relates to expressing, in a statistical setting, an end-to-end (network) service curve as a concatenation of per-node service curves. The notion of an effective service curve is developed as a probabilistic bound on the service received by an individual flow. It is shown that per-node effective service curves can be concatenated to yield a network effective service curve.

Christin, Nicolas; Liebeherr, Jorg; Abdelzaher, Tarek F.. A Quantitative Assured Forwarding Service, August 8, 2001.

Abstract: The Assured Forwarding (AF) service of the IETF DiffServ architecture provides a qualitative service differentiation between classes of traffic, in the sense that a low-priority class experiences higher loss rates and higher delays than a high-priority class. However, the AF service does not quantify the difference in the service given to classes. In an effort to strengthen the service guarantees of the AF service, we propose a Quantitative Assured Forwarding service with absolute and proportional differentiation of loss, service rates, and packet delays. We present a feedback-based algorithm which enforces the desired class-level differentiation on a per-hop basis, without the need for admission control or signaling. Measurement results from a testbed of FreeBSD PC-routers on a 100 Mbps Ethernet network show the effectiveness of the proposed service, and indicate that our implementation is suitable for networks with high data rates.

Sankaranarayanan, Karthik; Skadron, Kevin. A Scheme for Selective Squash and Re-issue for Single-Sided Branch Hammocks, July 10, 2001.

Abstract: This abstract describes work to minimize re-execution of control independent instructions. This technique differs from prior work in its emphasis on compiler scheduling in order to minimize changes to the hardware of an out-of-order processor. Work so far has focused on single-sided branch hammocks.

Srinivasa, Rashmi; Phan, Tram; Mohanraj, Nisanti; Powell, Allison; French, Jim. Database Selection Using Document and Collection Term Frequencies, May 16, 2000.

Abstract: We examine the impact of two types of information - document frequency (df) and collection term frequency (ctf) - on the effectiveness of database selection. We introduce a family of database selection algorithms based on this information, and compare their effectiveness to two existing database selection approaches, CORI and gGlOSS. We demonstrate that a simple selection algorithm that uses only document frequency information is more effective than gGlOSS, and achieves effectiveness that is very close to that of CORI.
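A df-only selection heuristic of the kind studied can be sketched as follows (hypothetical code; the report's algorithms and the CORI and gGlOSS baselines differ in their exact scoring):

```python
def rank_databases(query_terms, db_stats):
    """Rank candidate databases by the summed document frequency (df)
    of the query terms; db_stats maps database name -> {term: df}."""
    scores = {db: sum(dfs.get(t, 0) for t in query_terms)
              for db, dfs in db_stats.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Databases holding more documents that match the query terms rank first.
```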

Sullivan, Kevin; Griswold, William G.; Cai, Yuanfang; Hallen, Ben. The Structure and Value of Modularity in Software Design, March 18, 2001.

Abstract: The concept of information hiding modularity is a cornerstone of modern software design thought, but its formulation remains casual and its emphasis on changeability is imperfectly related to the goal of creating value in a given context. We need better models of the structure and value of information hiding, for both their explanatory power and prescriptive utility. We evaluate the potential of a new theory, developed to account for the influence of modularity on the evolution of the computer industry, to inform software design. The theory uses design structure matrices to model designs and real options techniques to value them. To test the potential utility of the theory for software, we represent a model software system (Parnas's KWIC) in its terms and evaluate the results. We contribute an extension to design structure matrices and show that the options results are consistent with Parnas's conclusions. Our results suggest that such a theory does have potential to help inform software design.

Srinivasa, Rashmi; Williams, Craig; Reynolds, Paul F.. Distributed Transaction Processing on an Ordering Network, February 28, 2001.

Abstract: The increasing demand for high throughputs in transaction processing systems leads to high degrees of transaction concurrency and hence high data contention. The conventional dynamic two-phase locking (2PL) concurrency control (CC) technique causes system thrashing at high data contention levels, restricting transaction throughput. Optimistic concurrency control (OCC) is an alternative strategy, but OCC techniques suffer from wasted resources caused by repeated transaction restarts. We propose a new technique, ORDER, that enlists the aid of the interconnection network in a distributed database system in order to coordinate transactions. The network in an ORDER system provides total ordering of messages at a low cost, enabling efficient CC. We compare the performance of dynamic 2PL and ORDER, using both an analytical model and a simulation. Unlike previously-proposed models for 2PL, our analytical model predicts performance accurately even under high data contention. We study the effects of various parameters on performance, and demonstrate that ORDER outperforms dynamic 2PL for a wide range of workloads.

Prey, Kevin; French, James C.; Powell, Allison L.; Viles, Charles L.. Inverse Document Frequency and Web Search Engines, February 5, 2001.

Abstract: Full text searching over a database of moderate size often uses the inverse document frequency, idf = log(N/df), as a component in term weighting functions used for document indexing and retrieval. However, in very large databases (e.g. internet search engines), there is the potential that the collection size (N) could dominate the idf value, decreasing the usefulness of idf as a term weighting component. In this short paper we examine the properties of idf in the context of internet search engines. The observed idf values may also shed light upon the indexed content of the WWW. For example, if the internet search engines we survey index random samples of the WWW, we would expect similar idf values for the same term across the different search engines.
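The idf definition above is easy to reproduce; this sketch shows how, for a term with a fixed document frequency, the collection size N drives the value (illustrative numbers, not figures from the survey):

```python
import math

def idf(N, df):
    """Inverse document frequency: idf = log(N / df)."""
    return math.log(N / df)

# For a term appearing in 500 documents, idf grows with collection size,
# so in very large collections N can come to dominate the term weight.
for N in (10_000, 1_000_000, 100_000_000):
    print(f"N={N:>11,}  idf={idf(N, 500):.2f}")
```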

O'Neil, Edward K.; French, James C.. A Description of the LAMB Web-Derived Language Model Builder, May 16, 2000.

Abstract: This paper describes the language modeling script constructed for the Spring 2000 Information Retrieval seminar. The script builds on Callan's work of automatically generating language models for text databases by creating language models for internet search engines. An overview of the work, principles and observed properties of language models, the language modeling script (LAMB) and its shortcomings are described.

Monroe, Gary A.; Mikesell, David R.; French, James C.. Determining Stopping Criteria in the Generation of Web-Derived Language Models, May 10, 2000.

Abstract: In this work, we present a small-scale evaluation of two query-based sampling techniques for building language models, using a database comprised of world-wide web documents. We propose a metric by which it is possible to determine when to cease sampling a given web database, and we compare this new metric to other metrics that have been used in previous work to determine the fidelity of sampled language models.

Lu, Chenyang; Abdelzaher, Tarek F.; Stankovic, John A.; Son, Sang H.. A Feedback Control Architecture and Design Methodology for Service Delay Guarantees in Web Servers, January 25, 2001.

Abstract: This paper presents the design and implementation of an adaptive architecture to provide relative, absolute and hybrid service delay guarantees for different service classes on web servers under HTTP 1.1. The first contribution of this paper is the architecture based on feedback control loops that enforce delay guarantees for classes via dynamic connection scheduling and process reallocation. The second contribution is our use of feedback control theory to design the feedback loop with proven performance guarantees. In contrast with ad hoc approaches that often rely on laborious tuning and design iterations, our control theory approach enables us to systematically design an adaptive web server with established analytical methods. The design methodology includes using system identification to establish dynamic models for a web server, and using the Root Locus method to design feedback controllers to satisfy performance specifications. The adaptive architecture has been implemented by modifying an Apache web server. Experimental results demonstrate that our adaptive server provides robust delay guarantees even when workload varies significantly. Properties of our adaptive web server also include guaranteed stability, and satisfactory efficiency and accuracy in achieving desired delay or delay differentiation.
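Stripped to its simplest form, the feedback idea is a controller that turns a delay error into a resource-reallocation decision. A minimal proportional-control sketch (hypothetical; the report designs its actual controllers with system identification and the Root Locus method, not hand-picked gains):

```python
def feedback_step(measured_delay, target_delay, processes, k=1.0,
                  min_procs=1, max_procs=64):
    """One controller iteration: if a class's measured delay exceeds its
    target, allocate it more server processes, and vice versa."""
    error = measured_delay - target_delay
    desired = processes + k * error          # proportional control law
    return int(min(max(desired, min_procs), max_procs))
```

Run once per sampling period per class, this closes the loop between observed delay and the connection-scheduling/process-allocation actuators the abstract describes.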

Lu, Chenyang; Stankovic, John A.; Tao, Gang; Son, Sang H.. Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms, January 15, 2001.

Abstract: This paper presents a feedback control real-time scheduling (FCS) framework for adaptive real-time systems. An advantage of the FCS framework is its use of feedback control theory (rather than ad hoc solutions) as a scientific underpinning. We apply a control theory based design methodology to systematically design FCS algorithms to satisfy their transient and steady state performance specifications. In particular, we establish a dynamic model and performance analysis of several feedback control scheduling algorithms, which is a major challenge and key step for the control design of adaptive real-time systems. We also generalize an FCS architecture that allows plug-ins of different real-time scheduling policies and QoS optimization algorithms. Based on our model, we identify different types of real-time applications where each FCS algorithm can be applied. Performance evaluation results demonstrate that our analytically tuned FCS algorithms provide robust steady and transient state performance guarantees for periodic and aperiodic tasks even when task execution times vary considerably from their estimates.

Patek, Stephen D.; Venkateswaran, Raja; Liebeherr, Jorg. Simple Alternate Routing for Differentiated Services Networks, September 29, 2000.

Abstract: Recent work on differentiated services in the Internet has defined new notions of Quality of Service (QoS) that apply to aggregates of traffic in networks with coarse spatial granularity. Most proposals for differentiated services involve traffic control algorithms for aggregate service levels, packet marking and policing, and preferential treatment of marked packets in the network core. The issue of routing for enhancing aggregate QoS has not received a lot of attention. This study investigates the potential benefit of using alternate routing strategies in support of differentiated services. We propose a traffic control scheme, called Simple Alternate Routing, wherein portions of unmarked packet flows can be assigned to alternate paths through a Service Provider Network (SPN) in response to congestion feedback information. The scheme is simple, requiring only minor changes to the SPN border routers so that alternately routed packets can be tunneled via conventional paths to an intermediate border node and then tunneled from there to the original egress border node. We present distributed algorithms for (1) discovering congestion within the SPN, and (2) allocating traffic to alternate paths that are uncongested. We have implemented the scheme in a packet-level simulation, and we have examined the transient response of the algorithm to perturbations in the nominal traffic levels experienced by the SPN. The experimental study of this paper provides some understanding of the scheme's ability to adapt in routing packets around congestion. Our results indicate that the alternate routing framework shows promise and warrants further consideration.

Liebeherr, Jorg; Christin, Nicolas. Buffer Management and Scheduling for Enhanced Differentiated Services, August 22, 2000.

Abstract: A novel framework, called JoBS (Joint Buffer Management and Scheduling), is presented for reasoning about relative and absolute per-class service differentiation in a packet network without information on traffic arrivals. JoBS has two unique capabilities: (1) JoBS makes scheduling and buffer management decisions in a single step, and (2) JoBS supports both relative and absolute QoS requirements of classes. JoBS is presented in terms of the solution to an optimization problem. Numerical simulation examples, including results for a heuristic approximation of JoBS, are presented to illustrate the effectiveness of the approach and to compare JoBS to existing methods for loss and delay differentiation.

Weikle, D. A. B.; Skadron, Kevin; McKee, Sally; Wulf, William A.. TSpec: A Notation for Describing Memory Reference Traces, August 14, 2000.

Abstract: Interpreting reference patterns in the output of a processor is complicated by the lack of a succinct notation for humans to use when communicating about them. Since an actual trace is simply an incredibly long list of numbers, it is difficult to see the underlying patterns inherent in it. The source code, while simpler to look at, does not include the effects of compiler optimizations such as loop unrolling, and so can be misleading as to the actual references and order seen by the memory system. To simplify communication of traces between researchers and to understand them more completely, we have developed a notation for representing them that is easy for humans to read, write, and analyze. We call this notation TSpec, for trace specification notation. It has been designed for use in cache design with four goals in mind. First, it is intended to assist in communication between people, especially with respect to understanding the patterns inherent in memory reference traces. Second, it is the object on which the cache filter model operates. Specifically, the trace and state of the cache are represented in TSpec, these are then the inputs for a function that models the cache, and the result of that function is a modified trace and state that are also represented in TSpec. Third, it supports the future creation of a machine readable version that could be used to generate traces to drive simulators, or for use in tools (such as translators from assembly language to TSpec). Finally, it can be used to represent different levels of abstraction in benchmark analysis.

Highley, Timothy; Reynolds, Paul; Williams, Craig. Testing the Functional Requirements of the Switch Interface Unit, August 4, 2000.

Abstract: Isotach networks provide special guarantees about message ordering and delivery times in a parallel or distributed computing environment. Although isotach networks can be implemented entirely in software running on commercial off-the-shelf hardware, an isotach network that uses custom hardware to maintain logical time may be particularly efficient. A hardware-based implementation of an isotach network has been designed and built using custom components, one of which is the switch interface unit (SIU). An SIU handles isotach functions for a host. These functions include the maintenance of logical time, the timestamping of outgoing isotach messages, and the exchange of tokens with another custom hardware device, the token manager. We have designed and performed tests to analyze the switch interface units and increase confidence that the SIUs are functioning correctly. The analysis has indicated that the new units are working properly.

Abdelzaher, Tarek F.. A Schedulable Utilization Bound for Aperiodic Tasks, August 2, 2000.

Abstract: In this report, we derive a utilization bound on schedulability of aperiodic tasks with arbitrary arrival times, execution times, and deadlines. To the author's knowledge, this is the first time a utilization bound is derived for the aperiodic task model. It allows constructing an O(1) admission test for aperiodic tasks. Earlier admission tests are at best O(n). We show that deadline-monotonic scheduling is the optimal fixed-priority scheduling policy for aperiodic tasks in the sense of maximizing the schedulable utilization bound. We prove that the optimal bound is 5/8. Our result is an extension of the well-known Liu and Layland's bound of ln 2 (derived for periodic tasks). The new bound is shown to be tight. We briefly generalize our results to tasks with multiple resource requirements and multiple processors. Dynamic priority scheduling (EDF) of aperiodic tasks is shown to have the same schedulability bound as for periodic tasks.

Our findings are especially useful for an emerging category of soft real-time applications, such as online trading and e-commerce, where task (request) arrival times are arbitrary, task service times are unknown, and service has to be performed within a given deadline. Our result provides theoretical grounds for guaranteeing deadlines of individual aperiodic requests by observing only the aggregate utilization conditions which simplifies achieving real-time assurances in such applications.
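An O(1) admission test built on such a bound can be sketched as follows (hypothetical code using the report's 5/8 figure; the report's precise definition of aperiodic utilization may differ from this simple execution-time/deadline ratio):

```python
BOUND = 5 / 8   # schedulable utilization bound for aperiodic tasks

class AdmissionController:
    """Admit an aperiodic request only if aggregate utilization stays
    at or below the bound -- a constant-time test per request."""

    def __init__(self, bound=BOUND):
        self.bound = bound
        self.utilization = 0.0

    def admit(self, exec_time, relative_deadline):
        u = exec_time / relative_deadline   # this request's utilization share
        if self.utilization + u <= self.bound:
            self.utilization += u
            return True
        return False

    def complete(self, exec_time, relative_deadline):
        # Release the request's share when it finishes.
        self.utilization -= exec_time / relative_deadline
```

The point of the bound is exactly this shape of test: one comparison against an aggregate, with no per-task schedulability analysis at admission time.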

Williams, C.; Reynolds, P.F., Jr.; de Supinski, B.R.. Data coherence protocols: the home update protocol, March 2000.

Abstract: We describe a class of directory coherence protocols called delta coherence protocols that use network guarantees to support a new and highly concurrent approach to maintain a consistent shared memory. Delta coherence protocols are more concurrent than other coherence protocols in that they allow processes to pipeline memory accesses without violating sequential consistency; support multiple concurrent readers and writers to the same cache block; and allow processes to access multiple shared variables atomically without invalidating copies held by other processes or otherwise obtaining exclusive access to the referenced variables. Delta protocols include both update and invalidate protocols. In this paper we describe the simplest, most basic delta protocol, an update protocol called the home update protocol. Delta protocols are based on isotach network guarantees. An isotach network maintains a logical time system that allows each process to predict and control the logical time at which its messages are received. We prove the home update protocol is correct using logical time to reason about the order in which requests are executed.

John A. Stankovic. VEST: A Toolset for Constructing and Analyzing Component Based Operating Systems For Embedded and Real-Time Systems, July 5, 2000.

Weikle, Dee A. B.; Skadron, Kevin; McKee, Sally A.; Wulf, Wm. A. Caches As Filters: A Unifying Model for Memory Hierarchy Analysis, June 1, 2000.

Abstract: This paper outlines the new caches-as-filters framework for the analysis of caching systems, describing the functional filter model in detail. This model is more general than those introduced previously, allowing designers and compiler writers to understand why a cache exhibits a particular behavior, and in some cases indicating what compiler or hardware techniques must be employed to improve a cache hierarchy's performance. Three components of the framework, the trace-specification notation, equivalence class concept, and new measures of cache performance, are described in previous publications. This paper extends the framework with a formal definition of the functional filter model and augments the trace-specification notation with additional constructs to describe conditionals and the effects of cache filtering. We then give detailed examples demonstrating the application of the model to a set of examples of a copy kernel, where each example represents a different equivalence class of traces.

Lack, Michael N.; Myers, Perry N. M.; Reynolds, Paul F.; Williams, Craig C.. The Isotach Messaging Layer: Ironman Design, May 9, 2000.

Abstract: In traditional parallel and/or distributed systems, synchronization among nodes requires relatively expensive mechanisms such as locks and barriers. An Isotach network inexpensively provides strong guarantees regarding message delivery order, thus allowing for sequential consistency and atomicity at a much lower cost.

The previous implementation of Isotach was built upon a third-party messaging layer, which resulted in unnecessary overhead and complications. This paper describes a complete design and implementation of a new messaging layer to support Isotach in a Linux/Myrinet network environment. The new system is fully integrated with previously designed Isotach custom hardware devices.

Philo Juang. Classification-Based Hybrid Branch Prediction (Thesis), March 24, 2000.

Abstract: One of the largest limiting factors in microprocessor instruction execution throughput is the existence of conditional branches in the instruction stream. Conditional branches disrupt the normal control flow of a program by introducing irregularities; instead of a natural, steady progression, conditional branches order the processor to "jump" to different places in the instruction stream. These irregularities force the processor to stall while the target to jump to is resolved.

The most common solution utilizes a branch predictor in conjunction with speculative execution to accelerate processing. First, the branch predictor attempts to forecast where the target of the branch is; it then immediately begins processing the instruction stream at that target. When the real target is finally resolved, if the processor was correct, then the effects of the disruption have been mitigated. Unfortunately, branch mis-predictions do occur, and the penalty for mis-predicting may be very high.

Current branch predictors invest quite a bit of logic in sophisticated, general purpose constructs designed to handle all branches in the instruction stream. While effective, a more efficient method of attack is to split the branches into distinct sets and direct them to a specialized branch predictor most suited to that particular type. This paper proposes to divide branches into sets based on complexity. Instead of one general purpose predictor, it is expected that this hybrid branch predictor will produce higher accuracy with the same amount of resources.

Furthermore, this thesis will show the impact of several classes of branches on the overall accuracy of the branch predictor. These branches fall into several categories: constant and almost constant, infrequent, repeating pattern, and iterative. These branches can be accurately predicted with a smaller branch predictor than typical general purpose predictors.

Each branch class will be removed from the instruction stream to examine its impact on overall accuracy. It is anticipated that removing one or more of these classes will improve overall accuracy. These types of branches will then be assigned a specialized predictor. The combination of multiple specialized predictors should contribute not only to improving the accuracy of their own branches but also to reducing the pollution of the "normal" branch predictor.

Wang, Chenxi; Hill, Jonathan; Knight, John; Davidson, Jack. Software Tamper Resistance: Obstructing Static Analysis of Programs, May 12, 2000.

Abstract: Reliable execution of software on untrustworthy platforms is a difficult problem. On the one hand, the underlying system services cannot be relied upon to provide execution assurance, while on the other hand, the effect of a tampered execution can be disastrous -- consider intrusion detection programs. What is needed, in this case, is tamper resistant software. Code obfuscation has been an area of development, in part, to enhance software tamper resistance. However, most obfuscation techniques are ad hoc, without the support of a sound theoretical basis or provable results. In this paper, we address one aspect of software protection by obstructing static analysis of programs. Our techniques are based, fundamentally, on the difficulty of resolving aliases in programs. The presence of aliases has been proven to greatly restrict the precision of static data-flow analysis. Meanwhile, effective alias detection has been shown to be NP-hard. While this represents a significant hurdle for code optimization, it provides a theoretical basis for structuring tamper-resistant programs -- systematic introduction of nontrivial aliases transforms programs to a form that yields data flow information very slowly and/or with little precision. Precise alias analysis relies on the collection of static control flow information. We further hinder the analysis by a systematic "break-down" of the program control flow, transforming high-level control transfers to indirect addressing through aliased pointers. By doing so, we have made the basic control-flow analysis into a general alias analysis problem, and the data-flow analysis and control-flow analysis are made co-dependent. We present a theoretical result which shows that a precise analysis of the transformed program, in the general case, is NP-hard and demonstrate the applicability of our techniques with empirical results.

Chenxi Wang; John Knight; Matt Elder. On Computer Viral Infection and the Effect of Immunization, November 12, 1999.

Abstract: Viruses remain a significant threat to modern computer systems. Despite the best efforts of those who develop anti-virus systems, new viruses and new types of virus that are not dealt with by existing protection schemes appear regularly. In addition, the rate at which a virus can spread has risen dramatically with the increase in connectivity. Defenses against infections by known viruses rely at present on immunization yet, for a variety of reasons, immunization is often only effective on a subset of the nodes in a network and many nodes remain unprotected. Little is known about either the way in which a viral infection proceeds in general or the way that immunization affects the infection process. In this paper we present the results of a simulation study of the way in which virus infections propagate through certain types of network and of the effect that partial immunization has on the infection. The key result is that relatively low levels of immunization can slow an infection significantly.
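The kind of simulation described can be sketched in miniature (a hypothetical toy model; the report's study uses particular network topologies and infection parameters not reproduced here):

```python
import random

def simulate_infection(adj, immune, start, p=0.5, steps=10, seed=0):
    """Propagate an infection over a network: each step, every infected
    node infects each susceptible neighbor with probability p.
    Immunized nodes (the set `immune`) are never infected."""
    rng = random.Random(seed)
    infected = set() if start in immune else {start}
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and v not in immune and rng.random() < p:
                    newly.add(v)
        infected |= newly
    return infected
```

Even in this toy form, immunized nodes act as firebreaks: infection paths that must pass through them are cut, which is the mechanism behind the report's key result.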

Scott, Kevin; Skadron, Kevin. BLP: Applying ILP Techniques to Bytecode Execution, February 2000.

Abstract: The popularity of Java has resulted in a flurry of engineering and research activity to improve performance of Java Virtual Machine (JVM) implementations. This paper introduces the concept of bytecode-level parallelism (BLP)--data- and control-independent bytecodes that can be executed concurrently--as a vehicle for achieving substantial performance improvements in implementations of JVMs, and describes a JVM architecture--JVM-BLP--that uses threads to exploit BLP. Measurements for several large Java programs show levels of BLP can be as high as 14.564 independent instructions, with an average of 6.768.

Regehr, John; Stankovic, Jack; Humphrey, Marty. The Case for Hierarchical Schedulers with Performance Guarantees, March 14, 2000.

Abstract: Audio and video applications, process control, agile manufacturing and even defense systems are using commodity hardware and operating systems to run combinations of real-time and non-real-time tasks. We propose an architecture that will allow a general-purpose operating system to schedule conventional and real-time tasks with diverse requirements, to provide flexible load isolation between applications, users, and accounting domains, and to enforce high-level policies about the allocation of CPU time. This is accomplished by implementing a dynamic, hierarchical scheduling infrastructure. The infrastructure is integrated with a resource manager that provides a level of indirection between resource requests and the scheduling hierarchy. A scheduling infrastructure separates scheduler code from the rest of the operating system. To demonstrate the utility of our architecture, we describe its application to three existing real-time schedulers. For each of the three, we show added flexibility while retaining the original scheduling guarantees.

White, Brian S.; Skadron, Kevin. Path-Based Target Prediction for File System Prefetching, February 1, 2000.

Abstract: Prefetching is a well-known technique for mitigating the von Neumann bottleneck. In its most rudimentary form, prefetching simplifies to sequential lookahead. Unfortunately, large classes of applications exhibit file access patterns that are not amenable to sequential prefetching. More general purpose approaches often use models to develop an appropriate prefetching strategy. Such models tend to be large, thus preventing a kernel implementation which would lead to user transparency and more efficient execution. This work applies the target cache approach of path-based branch target prediction to file system prefetching to combat these deficiencies. The feasibility and worth of such a design are evaluated against a number of parallel applications popular in the scientific community.

Schulman, Eric; French, James C.; Powell, Allison L.. Evaluating Astronomical Institutional Productivity Using the Astrophysics Data System Database, January 4, 2000.

Abstract: We used the Astrophysics Data System (ADS) to measure the productivity of the 38 institutions studied by Abt during the period 1985 to 1994. The ADS database contains 84,822 astronomical papers published in Astronomy and Astrophysics, The Astronomical Journal, The Astrophysical Journal, Monthly Notices of the Royal Astronomical Society, and The Publications of the Astronomical Society of the Pacific during this period. For each of these papers we compared the affiliation of each author to a canonical list of ADS affiliations that we had created using automated and manual clustering strategies. We assumed--as did Abt--that each of the n authors on a paper should result in his or her institution getting credit for 1/n papers. We compared our results to those of Abt and determined that his results were not strongly affected by having neglected the European journals or by having used papers from only one six-month period in the decade 1985-1994.
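The fractional counting rule (1/n credit per author) can be sketched directly (hypothetical code and data, not the ADS affiliation-matching pipeline):

```python
from collections import defaultdict

def institution_credit(papers):
    """Each of a paper's n authors earns 1/n of a paper for his or her
    institution; `papers` is a list of per-paper affiliation lists,
    one entry per author."""
    credit = defaultdict(float)
    for affiliations in papers:
        share = 1.0 / len(affiliations)
        for inst in affiliations:
            credit[inst] += share
    return dict(credit)
```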

Robert Boorstyn; Almut Burchard; Jorg Liebeherr; Chaiwat Oottamakorn. Statistical Multiplexing Gain of Link Scheduling Algorithms in QoS Networks, July 1999.

Abstract: A statistical network service which allows a certain fraction of traffic to not meet its QoS guarantees can extract additional capacity from a network by exploiting statistical properties of traffic. Here we consider a statistical service which assumes statistical independence of flows, but does not make any assumptions on the statistics of traffic sources, other than that they are regulated, e.g., by a leaky bucket. Under these conditions, we present functions, so-called local effective envelopes and global effective envelopes, which are, with high certainty, upper bounds of multiplexed traffic. We show that these envelopes can be used to obtain bounds on the amount of traffic on a link that can be provisioned with statistical QoS. A key advantage of our bounds is that they can be applied with a variety of scheduling algorithms. In fact, we show that one can reuse existing admission control functions that are available for scheduling algorithms with a deterministic service. We present numerical examples which compare the number of flows with statistical QoS guarantees that can be admitted with our empirical envelope approach to those achieved with existing methods.

Robert Boorstyn; Almut Burchard; Jorg Liebeherr; Chaiwat Oottamakorn. Statistical Multiplexing Gain of Link Scheduling Algorithms in QoS Networks (Short Version), July 1999.

Abstract: This report is an abbreviated version of CS-99-21. A statistical network service which allows a certain fraction of traffic to not meet its QoS guarantees can extract additional capacity from a network by exploiting statistical properties of traffic. Here we consider a statistical service which assumes statistical independence of flows, but does not make any assumptions on the statistics of traffic sources, other than that they are regulated, e.g., by a leaky bucket. Under these conditions, we present functions, so-called local effective envelopes and global effective envelopes, which are, with high certainty, upper bounds of multiplexed traffic. We show that these envelopes can be used to obtain bounds on the amount of traffic on a link that can be provisioned with statistical QoS. A key advantage of our bounds is that they can be applied with a variety of scheduling algorithms. In fact, we show that one can reuse existing admission control functions that are available for scheduling algorithms with a deterministic service. We present numerical examples which compare the number of flows with statistical QoS guarantees that can be admitted with our empirical envelope approach to those achieved with existing methods.

Highley, Timothy; Lack, Michael; Myers, Perry. Aspect-Oriented Programming: A Critical Analysis of a New Programming Paradigm, May 5, 1999.

Abstract: Aspect-oriented programming is a paradigm which addresses crosscutting concerns in computer programs. A crosscutting concern is something that cannot be cleanly encapsulated in a generalized module. This paper serves several purposes. The first section attempts to give some background for AOP and describes some of the motivation in more detail. It also briefly describes the syntax and semantics of AspectJ, a Java-based implementation of AOP. We also present a design philosophy for using aspects on top of an object-oriented paradigm. The second section discusses some problems we have observed in the AspectJ implementation. The third section discusses alternative paradigms to which AOP might apply. While current research seems to favor OO languages as the basis for AOP, this does not necessarily need to be the case. Procedural languages are considered to see if they also might benefit from AOP. An implementation for C is briefly outlined here. Finally, we give some concluding remarks about AOP and its usefulness.
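AspectJ weaves advice into Java code; the crosscutting idea itself can be sketched in Python with a decorator that attaches a logging concern to a function without editing its body (an analogy only — the names and scenario are invented, and a decorator is a much weaker mechanism than AspectJ's pointcuts):

```python
import functools

def traced(fn):
    """A crosscutting 'tracing aspect': advice wrapped around a function
    without touching its body, a rough analogue of around advice."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(fn.__name__)  # the crosscutting concern
        return fn(*args, **kwargs)
    wrapper.calls = []
    return wrapper

@traced
def deposit(balance, amount):
    # Core business logic stays free of the tracing concern.
    return balance + amount
```

The tracing concern would otherwise be scattered through every function that needs it, which is exactly the encapsulation failure AOP targets.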

Barker, A.L.; Martin, W.N.. Population Diversity and Fitness Measures Based on Genomic Distances, August 24, 1999.

Abstract: We define a class of genetic algorithms where, at each time step, two parents are selected to produce a child which then replaces one member of the population at the next time step. We consider the finite-population case. We define a general crossover and mutation operation, as well as a genomic distance between individuals. We require a specific property to hold for such operations and distance functions, and present examples of crossover operations, mutation operations, and distance functions which meet the requirements. We then define the sum over all pairwise population distances as a measure of the diversity of a population and consider the time evolution of the expected diversity of a population. We show that under uniform, independent selection of parents and the individual to be replaced the expected diversity is strictly decreasing. For this case we calculate an explicit formula for the diversity at each time step, based only on the initial population diversity. We then consider and discuss the case where independent, fitness-based selection is used and show that the expected diversity is strictly decreasing whenever the same probability function is used to select both the parents and the individual to be replaced. We qualitatively discuss conditions where expected diversity will increase rather than decrease with fitness-based selection. Finally, we discuss fitness measures based on a distance function.
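The diversity measure described — the sum over all pairwise population distances — can be sketched directly, here using Hamming distance on bit-string individuals as the genomic distance function (one of the distance functions the abstract says meets the requirements):

```python
from itertools import combinations

def hamming(a, b):
    # A genomic distance between two fixed-length bit-string individuals.
    return sum(x != y for x, y in zip(a, b))

def diversity(population):
    # The diversity measure above: the sum over all pairwise distances.
    return sum(hamming(a, b) for a, b in combinations(population, 2))
```

For the population {000, 111, 010} the pairwise distances are 3, 1, and 2, giving a diversity of 6; replacing an individual with a copy of another strictly lowers this sum, the effect the expected-diversity analysis quantifies.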

Szajda, Doug. Testing the Isotach Prototype Hardware Switch Interface Unit, June 4, 1999.

Abstract: Although Isotach systems can be implemented entirely in software using commercial off-the-shelf hardware, they achieve greatest efficiency with the help of special hardware components, one of which is the switch interface unit (SIU). Located in-line between the network interface boards and the switches to which these boards are connected, the SIU is responsible for the maintenance of logical time, for timestamping outgoing isotach messages, and for exchanging tokens with another custom hardware device, the token manager (TM).

Testing the SIU requires modifying the isotach prototype code base to create two new system versions: Version 1.75, which emulates the functionality of the SIU and is intended to run in systems in which other nodes are connected to hardware SIUs, and Version 2, which runs on hosts connected directly to hardware SIUs. In addition to these prototypes, a suite of test programs has been designed to aid in lower-level debugging of the device and to verify that design requirements have been met.

This technical report describes the development of these prototype versions, the current state of the SIU debugging process, and the design of the test suite. In addition, detailed prototype code documentation and hints for programming the LANai (the Myrinet network interface processor) are included in the appendices.

Liebeherr, Jorg; Yilmaz, Erhan. Workconserving vs. Non-workconserving Packet Scheduling: An Issue Revisited, February 18, 1999.

Abstract: Many packet schedulers for QoS networks are equipped with a rate-control mechanism. The function of a rate-control mechanism (also called rate controller) is to buffer packets from flows which exceed their negotiated traffic profile. It has been established that rate controllers lead to reduced buffer requirements at packet switches, and do not increase the worst-case delays in a deterministic service. On the other hand, rate controllers make a scheduler non-workconserving, and, thus, may yield higher average end-to-end delays. In this study, we show that by properly modifying a rate-controller, one can design a scheduler which balances buffer requirements against average delays. We present a scheduler, called Earliness-based Earliest Deadline First (EEDF), which achieves such a balancing using a tunable rate control mechanism. In simulation experiments, we compare EEDF with a rate-controlled EDF scheduler and a workconserving version of EDF.

French, James C.; Powell, Allison L.. Metrics for Evaluating Database Selection Techniques, June 18, 1999.

Abstract: The increasing availability of online databases and other information resources in digital libraries has created the need for efficient and effective algorithms for selecting databases to search. A number of techniques have been proposed for query routing or database selection. We have developed a methodology and metrics that can be used to directly compare competing techniques. They can also be used to isolate factors that influence the performance of these techniques so that we can better understand performance issues. In this paper we describe the methodology we have used to examine the performance of database selection algorithms such as gGlOSS and CORI. In addition, we develop the theory behind a "random" database selection algorithm and show how it can be used to help analyze the behavior of realistic database selection algorithms.
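One way to make the metric-based comparison concrete is a recall-style measure: score a database ranking by how much of the best achievable merit its top-n prefix captures relative to a relevance-based baseline ranking. The function below is a simplified illustration of that idea, not the paper's exact metric definitions:

```python
def recall_at_n(estimated, merit, n):
    """Fraction of the best achievable merit captured by searching the
    top-n databases of an estimated ranking, relative to the baseline
    that orders databases by true merit (illustrative definition only).
    `merit` maps each database name to its relevance-based merit."""
    baseline = sorted(estimated, key=lambda db: merit[db], reverse=True)
    best = sum(merit[db] for db in baseline[:n])  # ideal top-n merit
    got = sum(merit[db] for db in estimated[:n])  # achieved top-n merit
    return got / best if best else 0.0
```

A selection algorithm that ranks a merit-5 database above a merit-10 one scores 0.5 at n=1 but recovers to 1.0 by n=2, which is the kind of degradation-with-depth behavior such metrics expose.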

Knight, John C.; Elder, Matthew C.; Du, Xing. Error Recovery in Critical Infrastructure Systems, June 4, 1999.

Abstract: Critical infrastructure applications provide services upon which society depends heavily; such applications require survivability in the face of faults that might cause a loss of service. These applications are themselves dependent on distributed information systems for all aspects of their operation and so survivability of the information systems is an important issue. Fault tolerance is a key mechanism by which survivability can be achieved in these information systems. Much of the literature on fault-tolerant distributed systems focuses on local error recovery by masking the effects of faults. We describe a direction for error recovery in the face of catastrophic faults, where the effects of the faults cannot be masked using available resources. The goal is to provide continued service that is either an alternate or degraded service by reconfiguring the system rather than masking faults. We outline the requirements for a reconfigurable system architecture and present an error recovery system that enables systematic structuring of error recovery specifications and implementations.

Sielken, Robert S.. Application Intrusion Detection, June 9, 1999.

Abstract: Intrusion detection has traditionally been performed at the operating system (OS) level by comparing expected and observed system resource usage. OS intrusion detection systems (OS IDS) can only detect intruders, internal or external, who perform specific system actions in a specific sequence or those intruders whose behavior pattern statistically varies from a norm. Internal intruders are said to comprise at least fifty percent of intruders [ODS99], but OS intrusion detection systems are frequently not sufficient to catch such intruders since they neither significantly deviate from expected behavior, nor perform the specific intrusive actions because they are already legitimate users of the system.

We hypothesize that application-specific intrusion detection systems can use the semantics of the application to detect more subtle, stealth-like attacks, such as those carried out by internal intruders who possess legitimate access to the system and its data and act within their bounds of normal behavior, but who are actually abusing the system. To test this hypothesis, we developed two extensive case studies to explore what opportunities exist for detecting intrusions at the application level, how effectively an application intrusion detection system (AppIDS) can detect the intrusions, and the possibility of cooperation between an AppIDS and an OS IDS to detect intrusions. From the case studies, we were able to discern some similarities and differences between the OS IDS and the AppIDS. In particular, an AppIDS can observe the monitored system with a higher resolution of observable entities than an OS IDS, allowing tighter thresholds to be set for the AppIDS' relations that differentiate normal and anomalous behavior, thereby improving the overall effectiveness of the IDS.

We also investigated the possibility of cooperation between an OS IDS and an AppIDS. From this exploration, we developed a high-level bi-directional communication interface in which one IDS could request information from the other IDS, which could respond accordingly. Finally, we explored a possible structure of an AppIDS to determine which components were generic enough to use for multiple AppIDS. Along with these generic components, we also explored possible tools to assist in the creation of an AppIDS.

Humphrey, Marty; Knabe, Frederick; Ferrari, Adam; Grimshaw, Andrew. Accountability and Control of Process Creation in the Legion Metasystem, May 24, 1999.

Abstract: A metacomputing environment, or metasystem, is a collection of geographically separated resources (people, computers, devices, databases) connected by one or more high-speed networks. The distinguishing feature of a metasystem is middleware that facilitates viewing the collection of resources as a single virtual machine. The traditional requirements of security mechanisms and policies in a single physical host are exacerbated in a metasystem, as the physical resources of the metasystem exist in multiple administrative domains, each with different local security requirements. This paper illustrates how the Legion metasystem both accommodates and augments local security policies specifically with regard to process creation. For example, Legion configurations for local sites with different access control mechanisms such as standard UNIX mechanisms and Kerberos are compared. Through analysis of these configurations, the inherent security trade-offs in each design are derived. These results have practical importance to sites considering any future inclusion of local resources in a global virtual computer.

Bartholet, Robert G.. A Performance Study of Isotach Version 1.0, April 29, 1999.

Abstract: The Isotach version 1.0 prototype implements isotach logical time on a cluster of Intel-based personal computers linked with a Myrinet local area network. A performance study was conducted on this unoptimized version of the prototype to determine Isotach's performance characteristics, look for problems in the software and hardware design, and discover ways to optimize the system. Tests were conducted using custom-developed programs and the Communications Analysis and Simulation Tool (CAST), a network benchmarking tool developed by the U.S. Navy and used at their request. In order to give us a baseline, we also performed tests on Fast Messages 1.1, a messaging layer known to provide good performance. CAST measurements were not as accurate as our custom-developed tests due to software overhead in the tool. We explain the reasons for this overhead, and show that CAST accuracy is degraded on networks with sub-millisecond latencies. Our performance study measured throughput, latency, server performance under contention, effects of isochron size, and effects of token traffic on non-isotach traffic. The all-software Isotach version 1.0 prototype exceeded many of its performance goals. We expect performance to improve even more when custom-built hardware components replace software. Limitations in the design of the Isotach prototype were discovered, and recommendations were made to improve performance in future Isotach systems.

Grimshaw, A.; Ferrari, A.; Knabe, F.; Humphrey, M.. Legion: An Operating System for Wide-Area Computing, March 23, 1999.

Abstract: Applications enabled by the increasing availability of high-performance networks require the ability to share resources that are spread over complex, large-scale, heterogeneous, distributed environments spanning multiple administrative domains. We call this the wide-area computing problem. We argue that the right way to solve this problem is to build an operating system for the network that can abstract over a complex set of resources and provide high-level means for sharing and managing them. We describe the design of one such wide-area operating system: Legion. Through discussion of application examples, we demonstrate the attractive features of the Legion approach to constructing a wide-area operating system using distributed object components.

Luebke, David. A Developer's Survey of Polygonal Simplification Algorithms, January 18, 1999.

Abstract: Polygonal simplification, a.k.a. level of detail, is an important tool for anyone doing interactive rendering, but how is a developer to choose among the dozens of published algorithms? This article surveys the field from a developer's point of view, attempting to identify the issues in picking an algorithm, relate the strengths and weaknesses of different approaches, and describe a number of published algorithms as examples.

Milner, Christopher W.. Pipeline Descriptions for Retargetable Compilers: A Decoupled Approach, June 1, 1998.

Abstract: A good optimizing compiler must have detailed information about the target processor's execution pipeline in order to generate and schedule code with high levels of instruction-level parallelism. Current state-of-the-art pipeline descriptions are tediously constructed on an instruction-by-instruction basis. These descriptions often fail to capture important instruction scheduling constraints so artificial resources are introduced to enforce these constraints. The result is a pipeline description that is difficult to maintain and reuse; retargeting the compiler means retargeting each instruction and rethinking the purpose of each artificial resource. To address the above problems, the proposed research will develop a new, powerful approach for describing modern instruction pipelines by separating the pipeline description from the instruction set description. The proposed approach uses a graphical description of the pipeline and an accompanying annotation language to describe the relevant behavior of the machine's execution pipeline. Using the descriptions of the pipeline and an existing description technique for instruction sets, it will be possible to generate instruction scheduler information automatically. Furthermore, this decoupling of the pipeline description from the instruction set description eases the burden of retargeting the compiler as new instruction set extensions and new pipeline implementations appear.

Pfaltz, John L.. Neighborhood Expansion Grammars, February 28, 1999.

Abstract: Phrase structure grammars, in which non-terminal symbols on the left side of a production can be rewritten by the string on the right side, together with their Chomsky hierarchy classification, are familiar to computer scientists. But these grammars are effective only for generating and parsing strings. In this report, we introduce a new kind of grammar in which the right side of the production is simply appended to the intermediate structure in such a way that the left side becomes its "neighborhood" in the new structure. This permits the grammatical definition of many different kinds of "n-dimensional" discrete structures. Several examples are given. Moreover, these grammars yield a formal theory grounded in antimatroid closure spaces. For example, we show that restricted neighborhood expansion grammars capture the essence of finite state and context-free phrase structure grammars.

Pfaltz, John L.. Closure Spaces in the Plane, February 25, 1999.

Abstract: When closure operators are defined over figures in the plane, they are normally defined with respect to convex closure in the Euclidean plane. This report concentrates on discrete closure operators defined over the discrete, rectilinear plane. Basic to geometric convexity is the concept of a geodesic, or shortest path. Such geodesics can be regarded as the closure of two points. But, given the usual definition of geodesic in the discrete plane, they are not unique. And therefore convex closure is not uniquely generated. We offer a different definition of geodesic that is uniquely generated, although not symmetric. It results in a different geometry; for example, two unbounded lines may be parallel to a third, yet still themselves intersect. It is a discrete geometry that appears to be relevant to VLSI design.

French, James C.; Powell, Allison L.; Callan, Jamie. Effective and Efficient Automatic Database Selection, February 22, 1999.

Abstract: We examine a class of database selection algorithms that require only document frequency information. The CORI algorithm is an instance of this class of algorithms. In previous work, we showed that CORI is more effective than gGlOSS when evaluated against a relevance-based standard. In this paper, we introduce a family of other algorithms in this class and examine components of these algorithms and of the CORI algorithm to begin identifying the factors responsible for their performance. We establish that the class of algorithms studied here is more effective and efficient than gGlOSS and is applicable to a wider variety of operational environments. In particular, this methodology is completely decoupled from the database indexing technology and so is as useful in heterogeneous environments as in homogeneous environments.
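A minimal sketch of a selection algorithm in this document-frequency-only class (the normalized-df scoring rule below is illustrative, not CORI itself; the database statistics are invented):

```python
def df_rank(databases, query):
    """Rank databases using only document-frequency information.
    `databases` maps a name to {'docs': collection size,
    'df': {term: document frequency}}. Score = summed query-term
    df, normalized by collection size (an illustrative rule)."""
    def score(stats):
        return sum(stats["df"].get(t, 0) for t in query) / stats["docs"]
    return sorted(databases, key=lambda name: score(databases[name]),
                  reverse=True)
```

Because only per-term document frequencies and collection sizes are consulted, such an algorithm never inspects the databases' own indexes, which is the decoupling property the abstract highlights.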

French, James C.; Powell, Allison L.; Callan, Jamie; Viles, Charles L.; Emmitt, Travis; Prey, Kevin J.. Comparing the Performance of Database Selection Algorithms, January 12, 1999.

Abstract: We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC/TIPSTER data into 236 subcollections. We present results of a recent investigation of the performance of the CORI algorithm and compare the performance with earlier work that examined the performance of gGlOSS. The databases from our testbed were ranked using both the gGlOSS and CORI techniques and compared to the RBR baseline, a baseline derived from TREC relevance judgements. We examined the degree to which CORI and gGlOSS approximate this baseline. Our results confirm our earlier observation that the gGlOSS Ideal(l) ranks do not estimate relevance-based ranks well. We also find that CORI is a uniformly better estimator of relevance-based ranks than gGlOSS for the test environment used in this study. Part of the advantage of the CORI algorithm can be explained by a strong correlation between gGlOSS and a size-based baseline (SBR). We also find that CORI produces consistently accurate rankings on testbeds ranging from 100--921 sites. However, for a given level of recall, search effort appears to scale linearly with the number of databases.

Ramsey, Norman; Cifuentes, Cristina. A Transformational Approach to Binary Translation of Delayed Branches, February, 1999.

Abstract: The fundamental steps in binary translation are distinguishing code from data, mapping data from source to target, and translating instructions. Translating instructions presents few problems, except when the source instruction set has features not present in typical compiler intermediate codes. The most common such feature is the delayed branch.

Standard code-generation technology can handle delayed branches in the target language, but not in the source. Translating delayed branches can involve tricky case analyses to figure out what happens if there is a branch instruction in a delay slot. This paper presents a disciplined method for deriving such case analyses. The method identifies problematic cases, shows the translations for the non-problematic cases, and gives confidence that all cases are considered. The method also applies to other tools that analyze machine instructions.

We begin by writing a very simple interpreter for the source machine. It specifies, at the register-transfer level, how the source machine executes instructions, including delayed branches. We then transform the interpreter into an interpreter for a target machine without delayed branches. To maintain the semantics of the program being interpreted, we simultaneously transform the sequence of source-machine instructions into a sequence of target-machine instructions. The transformation of the instructions becomes our algorithm for binary translation.

We show the translation is correct by using a correspondence between source and target states, and showing if the source and target machines begin execution in corresponding states, they reach new corresponding states in a few instructions.
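The delay-slot semantics that make this case analysis tricky can be seen in a toy register-transfer-style interpreter (the instruction set is invented for illustration; a branch sitting in the delay slot of another branch is precisely the kind of problematic case the paper's method enumerates):

```python
def run(program, steps=50):
    """Toy interpreter for a source machine with one delayed branch,
    ('DB', target): the instruction after a taken branch (the delay
    slot) executes before control transfers. acc is the only register;
    npc models the delayed next-PC register."""
    acc, pc, npc = 0, 0, 1
    for _ in range(steps):
        if pc >= len(program):
            break
        op, arg = program[pc]
        pc, npc = npc, npc + 1      # default: fall through
        if op == "ADD":
            acc += arg
        elif op == "DB":            # delay slot (old npc) runs first,
            npc = arg               # then control reaches the target
        elif op == "HALT":
            return acc
    return acc
```

In the program [DB 3; ADD 5; ADD 100; HALT], the ADD 5 in the delay slot executes even though the branch skips over ADD 100, so the result is 5 — the behavior a binary translator must reproduce on a target machine with no delay slots.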

Srinivasa, Rashmi. Parallel Rule-Based Isotach Systems, February 5, 1999.

Abstract: The rule-based system is an important tool used to build expert systems. Much research has gone into trying to speed up rule-based systems, especially by using parallelism, but limited success has been observed so far. Most of the attempts have been in the direction of improving the Match-Recognize-Act (MRA) cycle [BROW85], which is the technique used in the sequential execution of rule-based systems. We present a parallel algorithm (ISORULE) for the execution of rule-based programs, which is based on isotach time [REYN89], and which eliminates the MRA algorithm. We compare ISORULE to a conventional parallel technique based on MRA (PARAMRA). We describe the design and implementation of ISORULE, and analyze the ways in which ISORULE exploits parallelism. We describe a static model to analyze ISORULE performance, and use it to demonstrate that ISORULE exploits most of the statically determinable parallelism in a rule set. We detail our simulation of ISORULE and PARAMRA, and report results obtained by running synthesized as well as real rule-based programs on it. Our experiments with synthetic rule sets reveal an order-of-magnitude performance improvement of ISORULE over PARAMRA. Since our initial analysis suggests that the performance improvement increases with the amount of concurrency in the rule sets, multiple orders of magnitude of performance improvement seem likely with larger rule sets exhibiting a low degree of conflict. Further, we analyze the results obtained from our initial investigation of real rule sets, and explain the more limited performance improvement we observed with these sets.

Childers, Bruce R.; Davidson, Jack W.. Automatic Design of Custom Wide-Issue Counterflow Pipelines, January 21, 1999.

Abstract: Application-specific processor design is a promising approach for meeting the performance and cost goals of a system. Application-specific integrated processors (ASIP's) are especially promising for embedded systems (e.g., automobile control systems, avionics, cellular phones, etc.) where a small increase in performance and decrease in cost can have a large impact on a product's viability. Sutherland, Sproull, and Molnar have proposed a new pipeline organization called the Counterflow Pipeline (CFP) that may be appropriate for ASIP design. This paper extends the original CFP microarchitecture to a very long instruction word (VLIW) organization for the custom design of instruction-level parallel processors tailored to the requirements of computation-intensive inner loops. First, we describe our extensions to the CFP to support automatic customization of application-specific VLIW processors. Second, we present an automatic design system that customizes a wide-issue counterflow pipeline (WCFP) to the resource and data flow requirements of a software pipeline loop. Third, we show that custom asynchronous WCFP's achieve cycles per operation measurements that are competitive with custom VLIW organizations at a potentially low design complexity. Finally, the paper describes several enhancements that can be made to WCFP's to further improve performance of kernel loops.

Childers, Bruce R.; Davidson, Jack W.. Rapid Prototyping of Application-Specific Counterflow Pipelines, January 8, 1999.

Abstract: Application-specific processor (ASIP) design is a promising approach for meeting the performance and cost goals of an embedded system. We have developed a new microarchitecture for automatically constructing ASIP's. This new architecture, called a wide counterflow pipeline (WCFP), is based on the counterflow pipeline organization proposed by Sproull, Sutherland, and Molnar. Our ASIP synthesis technique uses software pipelining and design space exploration to generate a custom WCFP and instruction set for an embedded application. This type of architecture synthesis requires an infrastructure for rapidly prototyping ASIP's to evaluate design trade-offs. This paper presents the requirements and implementation of such an environment for automatic design of WCFP's. First, we describe a database for specifying design elements and architectural constraints. Second, we present an intermediate representation for WCFP synthesis and reconfigurable simulation. Finally, we describe a fast and reconfigurable simulation methodology for WCFP's.

Childers, Bruce R.; Davidson, Jack W.. Application-Specific Pipelines for Exploiting Instruction-Level Parallelism, May 1, 1998.

Abstract: Application-specific processor design is a promising approach for meeting the performance and cost goals of a system. Application-specific processors are especially promising for embedded systems (e.g., automobile control systems, avionics, cellular phones, etc.) where a small increase in performance and decrease in cost can have a large impact on a product's viability. Sutherland, Sproull, and Molnar have proposed a new pipeline organization called the Counterflow Pipeline (CFP). This paper shows that the CFP is an ideal architecture for fast, low-cost design of high-performance processors customized for computation-intensive embedded applications. First, we describe why CFP's are particularly well suited to realizing application-specific processors. Second, we describe how a CFP tailored to an application can be constructed automatically. Third, we present measurements showing that CFP's elegantly and simply provide speculative execution, out-of-order execution, and register renaming matched to the application. These measurements show that a CFP's speculative and out-of-order execution allow it to tolerate frequent control dependences and high-latency operations such as memory accesses. Finally, we show that asynchronous counterflow pipelines may achieve very high performance by reducing the average execution latency of instructions over synchronous implementations. Application speedups of up to 7.8 are achieved using custom counterflow pipelines for several well-known kernel loops.

Ferrari, Adam; Knabe, Frederick; Humphrey, Marty; Chapin, Steve; Grimshaw, Andrew. A Flexible Security System for Metacomputing Environments, December 1, 1998.

Abstract: A metacomputing environment is a collection of geographically distributed resources (people, computers, devices, databases) connected by one or more high-speed networks and potentially spanning multiple administrative domains. Security is an essential part of metasystem design -- high-level resources and services defined by the metacomputer must be protected from one another and from possibly corrupted underlying resources, while those underlying resources must minimize their vulnerability to attacks from the metacomputer level. We present the Legion security architecture, a flexible, adaptable framework for solving the metacomputing security problem. We demonstrate that this framework is flexible enough to implement a wide range of security mechanisms and high-level policies.

Stankovic, John A.; Lu, Chenyang; Son, Sang H.. The Case for Feedback Control Real-Time Scheduling, November 27, 1998.

Abstract: Despite the significant body of results in real-time scheduling, many real-world problems are not easily supported. While algorithms such as Earliest Deadline First, Rate Monotonic, and the Spring scheduling algorithm can support sophisticated task set characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.), they are all "open loop" scheduling algorithms. Open loop refers to the fact that once schedules are created they are not "adjusted" based on continuous feedback. While open-loop scheduling algorithms can perform well in static or dynamic systems in which the workloads can be accurately modeled, they can perform poorly in unpredictable dynamic systems, i.e., systems whose workloads cannot be accurately modeled. Unfortunately, many real-world complex systems are dynamic and unpredictable. In addition, most scheduling paradigms assume that timing requirements are known and fixed, while in real systems timing requirements are more flexible. For example, instead of a single fixed deadline for a task, a deadline range might be acceptable to the physical system. If this range were passed to the scheduling system, more robust scheduling would be possible. In this paper, we present a new scheduling paradigm, which we call feedback control real-time scheduling. Feedback control real-time scheduling defines error terms for schedules, monitors the error, and continuously adjusts the schedules to maintain stable performance. It can deal with dynamic systems that are both resource-insufficient and subject to unpredictable workloads, and it can be integrated with flexible real-time requirements. In this paradigm, the scheduling algorithms regard the computer system as a control system with the scheduler as the controller, and integrate practical feedback control techniques into scheduling algorithms.
This paper presents a practical feedback control real-time scheduling algorithm, FC-EDF, which is a starting point in the long-term endeavor of creating a theory and practice of feedback control scheduling. Research results will apply to many real world systems including real-time database applications, defense applications, agile manufacturing, and distributed multimedia.
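To make the feedback idea concrete, here is a minimal sketch (hypothetical code, not the paper's FC-EDF algorithm): a proportional controller samples the deadline miss ratio each period and nudges the admitted CPU utilization toward a miss-ratio set point. The function name, gain, and set point are illustrative assumptions.

```python
# Hypothetical sketch of feedback control scheduling (not FC-EDF itself):
# a proportional controller observes the deadline miss ratio each sampling
# period and adjusts the admitted CPU utilization toward a set point.

def feedback_admit(miss_ratio_samples, set_point=0.02, kp=0.5,
                   u_init=0.8, u_min=0.1, u_max=1.0):
    """Return the sequence of admitted-utilization levels over time."""
    u = u_init
    levels = []
    for observed in miss_ratio_samples:
        error = set_point - observed                 # negative => too many misses
        u = min(u_max, max(u_min, u + kp * error))   # proportional adjustment
        levels.append(round(u, 4))
    return levels

# An overloaded system: sustained misses drive admitted utilization down,
# then it stabilizes once the miss ratio reaches the set point.
print(feedback_admit([0.2, 0.15, 0.1, 0.02, 0.0]))
```

A real controller would also need an admission or service-level mechanism to enforce the chosen utilization; here only the control loop itself is shown.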

Nguyen-Tuong, Anh; Chapin, Steve J.; Grimshaw, Andrew S.; Viles, Charlie. Using Reflection for Flexibility and Extensibility in a Metacomputing Environment, November 19, 1998.

Abstract: We present system developers with a reflective model, the Reflective Graph and Event model (RGE), for building metacomputing applications, incorporating our design goals of flexibility, extensibility, reusability, and composability. The model uses graphs and events to specify computations and enables first-class program graphs as event handlers. We demonstrate the RGE model in several areas of interest to metacomputing, using Legion as our experimental testbed. We unify the concepts of exceptions and events by making exceptions a special case of events. Furthermore, using RGE, we demonstrate how to build generic, composable, and reusable components that can be shared across development environments such as MPI, PVM, NetSolve, C++, and Fortran.

Nahas, Michael D.; Wulf, William A.. Data Cache Performance When Vector-Like Accesses Bypass the Cache, July 5, 1997.

Abstract: A Stream Memory Controller, when added to a conventional memory hierarchy, routes vector-like accesses around the data cache. A memory system was simulated under these conditions and the data cache performance increased dramatically. The gain in performance was a result of the increased temporal locality of the access pattern. The access pattern also showed a decrease in spatial locality, making smaller cache lines nearly as effective as long ones.
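A toy illustration of the effect described above (hypothetical code, not the paper's simulator): with a small direct-mapped cache, interleaving a one-pass vector-like stream with a reused working set thrashes the cache, while routing the stream around the cache preserves the working set's hit rate. All sizes and address patterns here are made-up assumptions.

```python
# Hypothetical demo: a vector-like stream evicts a reused working set from a
# tiny direct-mapped cache; bypassing the stream restores the hit rate.
def hit_rate(refs, num_lines=8, line_size=16):
    """Simulate a direct-mapped cache; return the fraction of hits."""
    tags = [None] * num_lines
    hits = 0
    for addr in refs:
        block = addr // line_size
        idx = block % num_lines
        if tags[idx] == block:
            hits += 1
        tags[idx] = block          # fill on miss, refresh on hit
    return hits / len(refs)

reused = [0, 16, 32, 48] * 64                     # small, frequently reused set
stream = list(range(4096, 4096 + 16 * 256, 16))   # one-pass vector-like scan

# Interleaved, the stream's blocks continually evict the reused blocks...
mixed = [r for pair in zip(reused, stream) for r in pair]
with_stream = hit_rate(mixed)
# ...whereas bypassing the stream leaves only well-behaved references cached.
bypassed = hit_rate(reused)
```

The stream references themselves never hit in either case; the difference is entirely in the temporal locality left behind for the cached references, which is the paper's point.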

Nguyen-Tuong, Anh; Grimshaw, Andrew S.. Using Reflection for Incorporating Fault-Tolerance Techniques into Distributed Applications, November 5, 1998.

Abstract: As part of the Legion metacomputing project, we have developed a reflective model, the Reflective Graph & Event (RGE) model, for incorporating functionality into applications. In this paper we apply the RGE model to the problem of making applications more robust to failure. RGE encourages system developers to express fault-tolerance algorithms in terms of transformations on the data structures that represent computations--messages and methods--hence enabling the construction of generic and reusable fault-tolerance components. We illustrate the expressive power of the RGE by encapsulating the following fault-tolerance techniques into RGE components: two-phase commit distributed checkpointing, passive replication, pessimistic method logging, and forward recovery.

Knight, John C.; Elder, Matthew C.; Chapin, A.C.; Combs, Brownell K.; Geist, Steven; McCulloch, Sean; Nakano, Luis G.; Sielken, Robert S.. Topics in Survivable Systems, August 14, 1998.

Abstract: In the Spring of 1998, a special topics course in Information Survivability was taught at the University of Virginia. The attendees were graduate students in computer science and the instructor for the course was John C. Knight. This report is a compendium of papers written by students who attended the course. The topics covered in this report are not comprehensive by any means; the survivability area is too broad for that. The papers do, however, span quite a wide technical range. The first paper, by Robert S. Sielken, is entitled "The Public Switched Telephone Network". This is a short summary of many aspects of the telephone system and provides some background information about how one of the critical infrastructure systems works. The second paper, by Sean McCulloch, is entitled "An Analysis of Non-Security Failures of the Electric, Phone, and Air Traffic Systems". This is an examination of some of the incidents that have occurred in these infrastructure application domains. The limitation to non-security incidents is important because it helps to point out the many sources of failure to which these systems are subject. The general concern of the community about security is significant, and four of the papers cover security-related topics. The first, by Matthew C. Elder, is entitled "Major Security Attacks on Critical Infrastructure Systems". This paper discusses recent security attacks on critical information systems, and presents extensive information about the severity of the security problem. The second security paper, by Brownell K. Combs, is entitled "Hacking Information Available on the Internet". This paper examines the material available to hackers on the Internet. The author sought out sources of information so as to determine what is available and how easy it is to obtain. The results are surprising. The third security paper, by Steven Geist, is entitled "Firewalls". 
This paper presents a summary report on firewalls, an important technology designed to help improve the security of existing systems. The fourth paper, by Luis Nakano, is entitled "State of the Art in Computer Virus Prevention". Viruses are growing in number at an extraordinary rate, and many new techniques are being developed by their authors to combat existing detection systems. In this paper, the different types of viruses are summarized and approaches to their elimination are discussed. The final paper in this report is by A. C. Chapin and is entitled "Smart Cards: Security in the New Transaction Cards". The field of critical information systems is changing rapidly, and this final paper is about an emerging technology that will have a large impact on all aspects of the field in the future. These papers have been compiled into this report to provide a ready source of information on a variety of topics for the interested reader. In each case, the papers include an extensive bibliography that can be used to obtain more details about the subject of the paper. For more information about the field of survivability, please visit the Web site of the survivability architectures project at the University of Virginia: http://www.cs.virginia.edu/~survive

Katramatos, Dimitrios; Chapin, Steve J.; Hillman, Patricia; Fisk, Lee Ann; van Dresser, David. Cross-Operating System Process Migration on a Massively Parallel Processor, August 17, 1998.

Abstract: As part of the Uniform Kernel project at Sandia National Labs, we developed process migration mechanisms for the Intel Paragon and Teraflop (ASCI Red) machines. The computing paradigm on these massively parallel processors divides the nodes into multiple logical partitions which run differing operating systems. This provides a unique environment for process migration, with homogeneous hardware and heterogeneous systems software. In this paper, we describe mechanisms that allow processes to (1) run on multiple operating systems, and (2) migrate between partitions on demand. The result is the first step towards a system which will allow partition boundaries to change dynamically, thus responding to changes in load mix and resource usage on the machine.

Stankovic, J.; Son, S.. Architecture and Object Model for Distributed Object-Oriented Real-Time Databases, August 19, 1998.

Abstract: The confluence of computers, communications, and databases is quickly creating a global virtual database where many applications require real-time access to both temporally accurate and multimedia data. This is particularly true in military and intelligence applications, but these features are needed in many commercial applications as well. We are developing a distributed database, called BeeHive, that offers features spanning several types of requirements: real-time, fault tolerance, security, and quality of service for audio and video. Support of these features, and of the potential trade-offs between them, could provide a significant improvement in performance and functionality over current distributed database and object management systems. In this paper, we present a high-level design for the BeeHive architecture and sketch the design of the BeeHive Object Model (BOM), which extends object-oriented data models by incorporating time and other features into objects.

Lindahl, Greg; Chapin, Steve J.; Beekwilder, Norman; Grimshaw, Andrew. Experiences with Legion on the Centurion Cluster, August 17, 1998.

Abstract: The Legion Project spends most of its time building metacomputing environments and applications, but we are also actively building a clustered system, which we have named Centurion. Centurion is currently a cluster of 64 Alpha PCs, networked with low-latency gigabit networking hardware from Myricom, joined by a dozen x86-architecture PCs. We plan on doubling the size of the cluster in the near future. This shared-nothing, heterogeneous environment is an ideal fit for the Legion metacomputing system, which provides services such as naming, scheduling, heterogeneous communications, and parallel I/O. Centurion is roughly the size of "small" HPC systems in use today at supercomputing centers, giving us the opportunity to compare price and performance with traditional systems, and to investigate the hardware and software challenges of building large clusters with commodity hardware.

This paper will cover our experiences with Centurion and Legion. We begin with an overview of the Legion metacomputing system. We will then discuss the costs and hidden costs of assembling a cluster. We then describe the performance of Centurion, using microbenchmarks of the communications system, standard parallel application benchmarks, and user applications. User applications running on the system come from a variety of scientific disciplines, and range from traditional MPI codes taken straight from supercomputers to more novel applications using "bag of tasks" and macro-dataflow formalisms. Within this range of applications, we will discuss our successes and failures with hiding network latency, load balancing, and most importantly, usability of the system by researchers without extensive training in parallel computing.

Son, Sang H.; Chaney, Craig; Thomlinson, Norris P.. Partial Security Policies to Support Timeliness in Secure Real-Time Databases, August 19, 1998.

Abstract: Conflicts in database systems with both real-time and security requirements can be unresolvable. We address this issue by allowing a database system to provide partial security in order to improve real-time performance when necessary. Systems that are partially secure allow potential security violations, such as covert channel use, in certain situations. We present a requirement specification approach that enables the system designer to specify important properties of the database at an appropriate level. To help the designer, a tool can process the database specification to find unresolvable conflicts and to allow the designer to specify the rules to follow during execution when those conflicts arise. We discuss several partial security policies and compare their performance in terms of timeliness and potential security violations.

Stankovic, John A.; Son, Sang H.; Nguyen, Chi D.. The Cogency Monitor: An External Interface Architecture for a Distributed Object-Oriented Real-Time Database System, August 19, 1998.

Abstract: We are developing a distributed database, called BeeHive, that offers features spanning several types of requirements: real-time, fault tolerance, security, and quality of service for audio and video. Support of these features, and of the potential trade-offs between them, could provide a significant improvement in performance and functionality over current distributed database and object management systems. The BeeHive system, however, must be able to interact with existing databases and import their data. In this paper, we present a high-level design for this external interface and introduce the concept of a cogency monitor, which acts as a gateway between BeeHive and the external world. The cogency monitor filters, shapes, and controls the flow of information to guarantee specified properties along the real-time, fault tolerance, quality of service, and security dimensions. We also describe three implementations of the cogency monitor and present preliminary performance results that demonstrate its value.

Stankovic, John A.; Ramamritham, Krithi; Niehaus, Douglas; Humphrey, Marty; Wallace, Gary. The Spring System: Integrated Support for Complex Real-Time Systems, August 1, 1998.

Abstract: The Spring system is a highly integrated collection of software and hardware that synergistically operates to provide end-to-end support in building complex real-time applications. In this paper, we show how Spring's specification language, programming language, software generation system, and operating system kernel are applied to build a flexible manufacturing testbed. The same ingredients have also been used to realize a predictable version of a robot pick and place application used in industry. These applications are good examples of complex real-time systems that require flexibility. The goal of this paper is to demonstrate the integrated nature of the system and the benefits of integration; in particular, the use of reflective information and the value of function and time composition. The lessons learned from these applications and the project as a whole are also presented.

Chapin, Steve J.; Katramatos, Dimitrios; Karpovich, John; Grimshaw, Andrew S.. Resource Management in Legion, Feb 11, 1998.

Abstract: The recent development of gigabit networking technology, combined with the proliferation of low-cost, high-performance microprocessors, has given rise to metacomputing environments. These environments can combine many thousands of hosts, from hundreds of administrative domains, connected by transnational and world-wide networks. Managing the resources in such a system is a complex task, but it is necessary for efficient and economical execution of user programs. In this paper, we describe the resource management portions of the Legion metacomputing system, including the basic model and its implementation. These mechanisms are flexible both in their support for system-level resource management and in their adaptability to user-level scheduling policies. We show this by implementing a simple scheduling policy and demonstrating how it can be adapted to more complex algorithms.

Stankovic, John; Son, Sang H.; Hansson, Jorgen. Misconceptions About Real-Time Databases, August 13, 1998.

Abstract: More and more databases are being used in situations where real-time constraints exist. A set of misconceptions has arisen regarding what a real-time database is and the appropriateness of using conventional databases in real-time applications. Nine misconceptions are identified and discussed. Various research challenges are enumerated and explained. In total, the paper articulates the special nature of real-time databases.

Pfaltz, John L.; Mikesell, David R.; Emael, William R.. Representing Metadata in the ADAMS Database System, July 25, 1998.

Abstract: A major concern of environmental scientists, and others with long-term data requirements, has been the establishment of metadata standards so that data recorded today will be accessible 50 to 100 years hence. We contend that more important than the standards themselves will be the context in which they can be represented and can evolve as new requirements and technologies emerge. In this paper, we discuss an object-oriented language that facilitates both the representation of metadata and its graceful evolution.

Pfaltz, John L.; Karro, John E.. Uniform Antimatroid Closure Spaces, August 9, 1998.

Abstract: Often the structure of discrete sets can be described in terms of a closure operator. When each closed set has a unique minimal generating set (as in convex geometries in which the extreme points of a convex set generate the closed set), we have an antimatroid closure space. In this paper, we show there exist antimatroid closure spaces of any size, of which convex geometries are only a sub-family, all of whose closed sets are generated by precisely the same number of points. We call them uniform closure spaces.
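As background for readers unfamiliar with the terminology (these are the standard definitions, not taken from the paper itself, with closure written as an overline):

```latex
\[
\begin{aligned}
&\text{extensive:}  && X \subseteq \overline{X},\\
&\text{monotone:}   && X \subseteq Y \implies \overline{X} \subseteq \overline{Y},\\
&\text{idempotent:} && \overline{\overline{X}} = \overline{X}.
\end{aligned}
\]
An antimatroid closure space additionally satisfies the \emph{anti-exchange} axiom:
\[
p, q \notin \overline{X},\quad p \neq q,\quad q \in \overline{X \cup \{p\}}
\;\Longrightarrow\; p \notin \overline{X \cup \{q\}},
\]
which, in a convex geometry, captures the fact that adding a point to a convex
set cannot make a previously exterior point interior in both directions at once.
```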

Peyton Jones, Simon L.; Ramsey, Norman. Machine-independent support for garbage collection, debugging, exception handling, and concurrency (draft), August 8, 1998.

Abstract: For a compiler writer, generating good machine code for a variety of platforms is hard work. One might try to reuse a retargetable code generator from another compiler, but code generators are complex and difficult to use, and they limit one's choice of implementation language. One might try to use C as a portable assembly language, but C limits the compiler writer's flexibility and the performance of the resulting code. The wide use of C, despite these drawbacks, argues for a portable assembly language.

C-- is a new language designed expressly as a portable assembly language. C-- eliminates some of the performance problems associated with C, but in its originally-proposed form it does not provide adequate support for garbage collection, exception handling, and debugging. The problem is that neither the high-level compiler nor the C-- compiler has all of the information needed to support these run-time features. This paper proposes a three-part solution: new language constructs for C--, run-time support for C--, and restrictions on optimization of C-- programs.

The new C-- language constructs enable a high-level compiler to associate initialized data with spans of C-- source ranges and to specify "alternate continuations" for calls to procedures that might raise exceptions. The run-time support is an interface (specified in C) that the garbage collector, exception mechanism, and debugger can use to get access to both high-level and low-level information, provided that the C-- program is suspended at a safe point. High- and low-level information is coordinated by means of the C-- spans and a common numbering for variables. Finally, the C-- optimizer operates under the constraints that the debugger or garbage collector can change the values of local variables while execution is suspended, and that a procedure call with alternate continuations can return to more than one location.

This three-part solution also provides adequate support for concurrency, so the paper illustrates the problem and the proposed solution with examples from garbage collection, exception handling, debugging, and threads. The paper also includes a model of the dataflow behavior of C-- calls.

A number of open problems remain. The most serious have to do with apparent redundancies among spans and safe points, and with the interaction of debugging support with optimization.

This paper is very much work in progress. We are not yet satisfied with the solutions we've come up with. Perhaps you can help improve it.

Cohoon, James P.; Karro, John E.; Martin, W.N.; Niebel, William D.; Nagel, Klaus. Perturbation Method for Probabilistic Search for the Traveling Salesperson Problem, July 28, 1998.

Abstract: The Traveling Salesperson Problem (TSP) is an NP-complete combinatorial optimization problem of substantial importance in many scheduling applications. Here we show the viability of SPAN, a hybrid approach to solving the TSP that incorporates a perturbation method applied to a classic heuristic in the overall context of a probabilistic search control strategy. In particular, the heuristic for the TSP is based on the minimal spanning tree of the city locations, the perturbation method is a simple modification of the city locations, and the control strategy is a genetic algorithm (GA). The crucial concept here is that the perturbation of the problem (since the city locations specify the problem instance) allows variant solutions (to the perturbed problem) to be generated by the heuristic and applied to the original problem, thus providing the GA with capabilities for both exploration and exploitation in its search process. We demonstrate that SPAN outperforms, with regard to solution quality, one of the best GA systems reported in the literature.
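The perturbation idea can be sketched in a few lines (hypothetical code, not the authors' SPAN system, and with random restarts standing in for the GA control strategy): jitter the city coordinates, run the MST-based tour heuristic on the perturbed instance, and score the resulting tour on the original coordinates.

```python
# Hypothetical sketch of perturbation-driven search for the TSP:
# solve jittered instances with an MST heuristic, score on the original.
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mst_preorder_tour(cities):
    """Prim's MST + preorder walk: the classic MST-based tour heuristic."""
    n = len(cities)
    in_tree, parent = [0], {0: None}
    while len(in_tree) < n:
        u, v = min(((i, j) for i in in_tree for j in range(n) if j not in parent),
                   key=lambda e: dist(cities[e[0]], cities[e[1]]))
        parent[v] = u
        in_tree.append(v)
    children = {i: [] for i in range(n)}
    for v, u in parent.items():
        if u is not None:
            children[u].append(v)
    tour, stack = [], [0]
    while stack:                       # preorder (depth-first) walk of the MST
        node = stack.pop()
        tour.append(node)
        stack.extend(reversed(children[node]))
    return tour

def tour_length(tour, cities):
    return sum(dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def span_style_search(cities, trials=30, sigma=0.1, seed=1):
    """Keep the best tour found across randomly perturbed problem instances."""
    rng = random.Random(seed)
    best = mst_preorder_tour(cities)
    best_len = tour_length(best, cities)
    for _ in range(trials):
        jittered = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                    for x, y in cities]
        cand = mst_preorder_tour(jittered)     # solve the perturbed problem
        cand_len = tour_length(cand, cities)   # but score on the original one
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len
```

In SPAN proper, the perturbed instances are bred by a GA rather than drawn independently, which is what gives the search its balance of exploration and exploitation.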

Pfaltz, John L.; Karro, John E.. Transformations of Antimatroid Closure Spaces, July 23, 1998.

Abstract: Investigation of the transformations of vector spaces, whose most abstract formulations are called matroids, is basic in mathematics; but transformations of discrete spaces have received relatively little attention. This paper develops the concept of transformations of discrete spaces in the context of antimatroid closure spaces. The nature of these transformations is quite different from those encountered in linear algebra because the underlying spaces are strikingly different. The transformation properties of "closed", "continuous" and "order preserving" are defined and explored. The classic graph transformations, homomorphism and topological sort, are examined in the context of these properties. Then we define a deletion operation that we believe plays a central role in discrete transformations. Antimatroid closure spaces, when partially ordered, can be interpreted as lattices. We show that deletions induce lower semihomomorphisms between the corresponding lattices.

Grimshaw, Andrew S.; Lewis, Michael J.; Ferrari, Adam J.; Karpovich, John F.. Architectural Support for Extensibility and Autonomy in Wide-Area Distributed Object Systems, June 3, 1998.

Abstract: The Legion system defines a software architecture designed to support metacomputing, the use of large collections of heterogeneous computing resources distributed across local- and wide-area networks as a single, seamless virtual machine. Metasystems software must be extensible because no single system can meet all of the diverse, often conflicting, requirements of the entire present and future user community, nor can a system constructed today take best advantage of unanticipated future hardware advances. Metasystems software must also support complete site autonomy, as resource owners will not turn control of their resources (hosts, databases, devices, etc.) over to a dictatorial system. Legion is a metasystem designed to meet the challenges of managing and exploiting wide-area systems. The Legion virtual machine provides secure shared object and shared name spaces, application-adjustable fault tolerance, improved response time, and greater throughput. Legion tackles problems not solved by existing workstation-based parallel processing tools, such as fault tolerance, wide-area parallel processing, interoperability, heterogeneity, security, efficient scheduling, and comprehensive resource management. This paper describes the Legion run-time architecture, focusing in particular on the critical issues of extensibility and site autonomy.

Powell, Allison L.; French, James C.. The Potential to Improve Retrieval Effectiveness with Multiple Viewpoints, May 14, 1998.

Abstract: We propose that providing multiple viewpoints of a document collection and allowing users to move among these viewpoints during a search or browse session will facilitate the location of useful documents. In this paper, we present the results of preliminary experiments that illustrate the potential benefits of this approach. These experiments utilize a full text view and a manually assigned topic descriptor view. We propose a measure for determining the amount of agreement among topic descriptors assigned to documents and show results for the TREC SJMN and ZIFF collections. For those same collections, we then show the potential retrieval improvement when the two viewpoints are utilized.

Ferrari, Adam J.; Grimshaw, Andrew S.. Basic Fortran Support in Legion, March 4, 1998.

Abstract: Fortran is the most widely used programming language for high-performance scientific computing applications, yet in the past the Legion system has not supported objects implemented in Fortran. This paper describes the design and interface of the Legion Basic Fortran Support (BFS) system. This system consists of a compiler and a runtime library that allow the description of Legion object interfaces in a Fortran-like Interface Description Language (IDL), and the implementation of Legion objects using Fortran. The system also supports remote method invocations on Legion objects through the use of pseudo-comments: Legion BFS directives embedded in normal Fortran comment lines. These method invocations are processed using a macro-dataflow model similar to that provided by the Mentat Programming Language, thus allowing both inter- and intra-method parallelism.

Weikle, Dee A. B.; McKee, Sally A.; Wulf, Wm. A.. A Stake In The Ground: A New Approach To Cache Analysis, December 6, 1997.

Abstract: As the processor-memory performance gap continues to grow, so does the need for effective tools and metrics to guide the design of efficient memory hierarchies to bridge that gap. Aggregate statistics of cache performance can be useful for comparison, but they give us little insight into how to improve the design of a particular component. We propose a different approach to cache analysis -- viewing caches as filters -- and present two new metrics for analyzing cache behavior: instantaneous hit rate and instantaneous locality. We demonstrate how these measures can give us insight into the reference pattern of an executing program, and show an application of these measures in analyzing the effectiveness of the second level cache of a particular memory hierarchy.
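A hedged sketch of what an instantaneous (time-varying) hit-rate metric might look like; the paper's exact definitions are its own, and this stand-in simply applies an exponentially weighted moving average to the per-reference hit/miss outcomes of a tiny direct-mapped cache.

```python
# Illustrative stand-in for an "instantaneous hit rate" curve (assumed EWMA
# formulation, not the paper's definition): simulate a small direct-mapped
# cache and smooth the hit/miss sequence into a time-varying rate.
def instantaneous_hit_rate(addresses, num_lines=4, line_size=16, alpha=0.25):
    """Return the EWMA hit-rate value observed after each reference."""
    tags = [None] * num_lines
    curve, rate = [], 0.0
    for addr in addresses:
        block = addr // line_size
        idx = block % num_lines
        hit = tags[idx] == block
        tags[idx] = block                      # fill on miss, refresh on hit
        rate = (1 - alpha) * rate + alpha * (1.0 if hit else 0.0)
        curve.append(round(rate, 4))
    return curve

# A tight loop over one block warms up quickly; a strided one-pass scan
# never re-references a block, so its curve stays at zero.
loop = instantaneous_hit_rate([0, 4, 8, 12] * 8)
scan = instantaneous_hit_rate(range(0, 16 * 64, 16))
```

The point of such a curve, as in the abstract, is that it exposes phase behavior in the reference stream that an aggregate hit rate averages away.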

Beekwilder, Norm; Grimshaw, Andrew S.. Parallelization of an Axially Symmetric Steady Flow Program, May 29, 1998.

Abstract: This report describes the modifications made to an axially symmetric steady flow program to convert it from a sequential code to a parallel code. Some of the difficulties encountered, as well as performance results, are discussed.

Ramsey, Norman; Davidson, Jack W.. Machine Descriptions to Build Tools for Embedded Systems, February 19, 1998 (minor revisions May 15, 1998).

Abstract: Because of poor tools, developing embedded systems can be unnecessarily hard. Machine descriptions based on register-transfer lists (RTLs) have proven useful in building retargetable compilers, but not in building other retargetable tools. Simulators, assemblers, linkers, debuggers, and profilers are built by hand, if at all -- previous machine descriptions have lacked the detail and precision needed to generate them. This paper presents detailed and precise machine-description techniques that are based on a new formalization of RTLs. Unlike previous notations, these RTLs have a detailed, unambiguous, and machine-independent semantics, which makes them ideal for supporting automatic generation of retargetable tools. The paper also gives examples of Lambda-RTL, a notation that makes it possible for human beings to read and write RTLs without becoming overwhelmed by machine-dependent detail.

Coppit, David W.; Sullivan, Kevin J.. Unwrapping, May 4, 1998.

Abstract: A key driver of software obsolescence is change in hardware and system software standards. In the area of software tools, there is now great pressure to host on Intel/Windows platforms tool functions that in the past were typically found in Unix or mainframe environments. In such cases, there can be value in reusing core parts of such legacy systems. Two standard methods for doing this are reengineering and wrapping. Reengineering, being unrestricted in the changes allowed, permits the removal of obsolete parts of a system but creates the risk that changes will break the complex and brittle reused code. Wrapping involves the reuse of existing code without change, but at the cost of including obsolete elements into the new environment. In this paper we propose unwrapping as a new synthesis of these two approaches. To unwrap is to remove unwanted design elements, but with a strong emphasis on leaving core code unchanged. We discuss our preliminary use of this approach to reuse core elements of two Unix-based reliability engineering tools in a modern tool based on package-oriented programming.

Childers, Bruce R.; Davidson, Jack W.. A Design Environment for Counterflow Pipeline Synthesis, March 1, 1998.

Abstract: The Counterflow Pipeline (CFP) organization may be a good target for synthesis of application-specific microprocessors for embedded systems because it has a regular and simple structure. This paper describes a design environment for tailoring CFPs to an embedded application to improve performance. Our system allows exploring the design space of all possible CFPs for a given application to understand the impact of different design decisions on performance. We have used the environment to derive heuristics that help find the best CFP for an application. Preliminary results using our heuristics indicate that speedups for several small graphs range from 1.3 to 2.0 over a general-purpose CFP and that the heuristics find designs that are within 10% of optimal.

Wasson, Glenn S.; Natrajan, Anand; Gunderson, Jim P.; Ferrer, Gabriel J.; Martin, Worthy N.; Reynolds Jr., Paul F.. Consistency Maintenance in Autonomous Agent Representations, March 18, 1998.

Abstract: Multi-representation models are an attractive design solution for layered autonomous agent architectures in terms of managing the complexity of internal representations and the separation of design concerns. However, the representations maintained by the multiple layers can become inconsistent with one another due to the nature of the layers' concerns and capabilities. Consistency maintenance in multi-representation models is an emerging concern in many domains, such as military simulations. We present an approach to consistency maintenance in multi-tier agent architectures, drawing on experience and techniques from the multi-representation modeling community. The benefit of this approach for autonomous agent designers is a conceptual framework through which to organize systems that must deal with temporal and mapping inconsistencies.

CS-92-03 (Not available on-line)
Knight, J.C.; Kienzle, D.M.. Software Safety: A Formal Approach, January 20, 1992.

CS-93-56 (Not available on-line)
Lavinus, J.W.; Cohoon, J.P.. A Faster Dynamic Programming Algorithm For Exact Rectilinear Steiner Trees, July 6, 1993.

CS-94-31 (Not available on-line)
Olson, T.; Brill, F.; Wasson, G.; Wong, J.. Software for Advanced Vision Systems, December 1994.

CS-94-32 (Not available on-line)
Kienzle, Darrell. Experience in Safety-Critical Software Specification, August 31, 1994.

CS-94-33 (Not available on-line)
Kienzle, Darrell. Information Flow Analysis: Techniques for Rigorously Deriving Software Safety Requirements, August 31, 1994.

CS-94-34 (Not available on-line)
Kienzle, Darrell; Knight, John. The Case for a Safety Specification, August 31, 1994.

CS-94-47 (Not available on-line)
Ganley, J.L.; Cohoon, J.P.. FPGA Layout by Congestion-Driven Simultaneous Placement and Routing, October 5, 1994.

CS-94-49 (Not available on-line)
Alexander, Michael; Robins, Gabriel. New Performance-Driven FPGA Routing Algorithms, October 13, 1994.

CS-95-01 (Not available on-line)
Ganley, J.L.; Cohoon, J.P.. Thumbnail Rectilinear Steiner Trees, January 6, 1995.

CS-95-05 (Not available on-line)
Oh, Y.. Scheduling Periodic Tasks in a Hard Real-Time Environment, January 27, 1995.

CS-95-13 (Not available on-line)
Ganley, J.L.; Cohoon, J.P.. Provably Good Moat Routing, March 3, 1995.

CS-95-15 (Not available on-line)
Alexander, M.; Cohoon, J.; Colflesh, J.; Karro, J.; Robins, G.. Three-Dimensional Field-Programmable Gate Arrays, March 4, 1995.

CS-95-21 (Not available on-line)
Robins, Gabe; Robinson, Brian. Pattern Minefield Detection From Inexact Data, April 17, 1995.

CS-95-24 (Not available on-line)
Wika, Kevin G.. Safety Kernel Enforcement of Software Safety Policies, May 1995.

CS-95-25 (Not available on-line)
Robins, Gabe. Surfing the Internet, May 1995.

CS-95-30 (Not available on-line)
Elder, Matthew. Specification of User Interface for Safety-Critical Systems, July 7, 1995.

CS-95-31 (Not available on-line)
Viles, C.L.; French, J.C.. On the Update of Term Weights in Dynamic Information Retrieval Systems, August 15, 1995.

CS-95-33 (Not available on-line)
McKee, Sally; Wulf, Wm. A.. A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors, March 1, 1995.

CS-95-36 (Not available on-line)
Ferrer, A.; Grimshaw, A.. Persistent Object State Management in Legion, August 1995.

CS-95-40 (Not available on-line)
Knight, J.C.; Cass, A.. Achieving Software Quality Through Reuse, September 1995.

CS-95-41 (Not available on-line)
Wika, K.G.; Wrege, S.. Exhaustive Testing as a Verification Technique, September 1995.

CS-95-42 (Not available on-line)
Knight, J.C.; Nakano, L.G.; Sarkov, A.. Eliciting Background Information for Safety-Critical Software Specifications, September 1995.

CS-95-43 (Not available on-line)
Bailey, M.; Davidson, J.W.. Describing the Representation of Floating-Point Values, September 1995.

CS-95-47 (Not available on-line)
McKee, S.A.; Klenke, R.H.; Wulf, Wm. A.; Moyer, S.A.; Wright, K.L.; Oliver, C.W.; Aylor, J.H.. High-Speed Memory Access for Streamed Computations, November 1995.

CS-96-11 (Not available on-line)
Brill, Frank; Wasson, Glenn; Martin, Worthy. Sensing and Representing Dynamic Environments: Judicious Use of Representation Measurably Improves Performance, January 1996.

CS-96-18 (Not available on-line)
Sullivan, Kevin; Socha, John. Shaky Foundations? Using Formal Methods to Reason About Architectural Standards, November 1996.

CS-97-10 (Not available on-line)
Haddleton, Russell. Parallel Query Performance in ADAMS, May 2, 1997.

CS-97-14 (Not available on-line)
Sullivan, W.; Chalasani, P.; Jha, S.. Software Design Decisions As Real Options, May 1997.

CS-97-18 (Not available on-line)
Helvig, C.S.; Robins, G.; Zelikovsky, A.. An Approximation Scheme For The Group Steiner Tree Problem, July 1997.

CS-97-21 (Not available on-line)
Pfaltz, John. Object Equivalence, August 1997.

CS-97-30 (Not available on-line)
Kahng, A.; Robins, G.; Singh, C.; Wang, A.; Zelikovsky, A.. Filling and Slotting: Analysis and Algorithms, December 14, 1997.

CS-95-48 (Not available on-line)
Wulf, Wm.; Weikle, Dee; Bingler, R.W.; McKee, Sally. A New Approach to Cache Analysis, November 1995.

CS-96-01 (Not available on-line)
Bailey, Mark W.; Davidson, Jack W.. Reusable Application-Dependent Machine Descriptions, February 2, 1996.

CS-96-17 (Not available on-line)
Bailey, Mark W.; Davidson, Jack W.. Construction of Systems Software Using Specifications of Procedure Calling Conventions, September 30, 1996.

CS-97-07 (Not available on-line)
Bateman, C. O.; Helvig, C. H.; Robins, Gabe; Zelikovsky, A.. Provably Good Routing Tree Construction with Multi-Port Terminals, April 9, 1997.

Hopenwasser, Lance. An Isotach Network Simulator, March 11, 1998.

Abstract: We have developed a simulator of Isotach systems which run on a Myrinet network. The simulator allows us to study the performance characteristics of the system and provides a means for experimenting with alternative implementations of Isotach networks. The rate at which tokens flow through the network determines the rate at which Isotach logical time progresses; therefore, token behavior is of keen interest. The simulator gathers performance data that is used to examine the behavior of tokens within the network under several different conditions. The simulator is also capable of obtaining similar data on barriers that make use of the Isotach network. These barriers are of interest, because they can be used to establish network-wide checkpoints in addition to other application-level uses. The simulator can provide data on characteristics of the network that most affect barrier completion time. Furthermore, the simulator can be used to compare the Isotach barrier algorithm to a simple centralized non-Isotach barrier algorithm. We present some of the issues that influenced the design of the simulator.

Viega, John; Tutt, Bill; Behrends, Reimer. Automated Delegation is a Viable Alternative to Multiple Inheritance in Class Based Languages, March 2, 1998.

Abstract: Multiple inheritance is still a controversial feature in traditional object-oriented languages, as evidenced by its omission from such notable languages as Modula-3, Objective C, and Java. Nonetheless, users of such languages often complain about having to work around the absence of multiple inheritance. Automating delegation, in combination with a multiple subtyping mechanism, provides many of the same benefits as multiple inheritance, yet sidesteps most of the associated problems. This simple feature could satisfy both the designers and the users of class-based object-oriented languages. In this paper, we discuss why automated delegation is desirable. We also present Jamie, a freeware preprocessor-based extension to Java that offers such an alternative.

McKee, Sally A.; Weikle, D. A. B.; Wulf, Wm. A.. TSpec: A Specification of Memory Access Traces, August 15, 1997.

Abstract: This document describes TSpec, a "little language" for specifying memory traces, and Tint, an interpreter for TSpec. TSpec can be used to generate synthetic traces, or can be used to represent an actual trace in a human-readable, compressed format. We do not address here how to convert a trace into a TSpec specification.
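
The report itself defines the actual notation; as a flavor of what a trace-specification interpreter does, the sketch below expands a small list of strided reference patterns into a flat memory trace. The (op, base, stride, count) mini-format is invented for illustration and is not the real TSpec syntax.

```python
# Hypothetical sketch in the spirit of a trace-specification interpreter.
# The (op, base, stride, count) format below is invented, NOT TSpec notation.

def expand(spec):
    """Expand (op, base, stride, count) patterns into a flat trace of
    (op, address) pairs, one reference per loop iteration."""
    trace = []
    for op, base, stride, count in spec:
        for i in range(count):
            trace.append((op, base + i * stride))
    return trace

# A synthetic trace: three sequential 8-byte reads of a stream, then a write.
spec = [("R", 0x1000, 8, 3), ("W", 0x2000, 0, 1)]
print(expand(spec))
# [('R', 4096), ('R', 4104), ('R', 4112), ('W', 8192)]
```

A compact pattern like this is far smaller than the trace it denotes, which is the sense in which a specification can serve as a compressed, human-readable representation of an actual trace.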

Weikle, D.A.B.; Wulf, Wm. A.. TGen: Programmer's Reference Manual, August 15, 1997.

Abstract: The TGen (Trace Generation) Processor is a VLIW (Very Long Instruction Word) machine designed to generate general memory references to an arbitrary memory system. The first version, called STAG (Smc TrAce Generation) in the Electrical Engineering Department, is being implemented with an SMC (Stream Memory Controller) and a path for simulating memory accesses to a generic hierarchical caching system.

Barker, Allen L.. Self-Organizing Switchbox Routing on an Adaptive Resistor Grid, Dec. 13, 1997.

Abstract: The switchbox routing problem arises in the fabrication of computer chips, and is representative of a general class of routing problems. We describe a self-organizing algorithm for solving certain instances of the switchbox routing problem. The method is based on path formation via a positive feedback process, with competitive interactions. We define such a process on a grid of adaptive, variable resistors and simulate its dynamics. The method is applied to several problem instances.

Childers, Bruce R.; Davidson, Jack W.. Automatic Counterflow Pipeline Synthesis, January 19, 1998.

Abstract: The Counterflow Pipeline (CFP) organization may be a good target for synthesis of application-specific microprocessors because it has a regular and simple structure. This paper describes early work using CFP's to improve overall application performance by tailoring a CFP to the kernel loop of an application. A CFP is customized for an application using the kernel loop's data dependency graph to determine processor functionality and interconnection. Our technique builds the design space for a given data dependency graph and explores the space to find the design with the best performance. Preliminary results indicate that speed-up for several small graphs range from 1.3 to 2.0 and that our design space traversal heuristics find designs that are within 10% of optimal.

Ramsey, Norman; Davidson, Jack W.. Specifying Instructions' Semantics Using CSDL (Preliminary Report), November 24, 1997.

Abstract: The Zephyr project is part of an effort to build a National Compiler Infrastructure, which will support research in compiling techniques and high-performance computing. Compilers work with source code, abstract syntax, intermediate forms, and machine instructions. By using high-level descriptions of the representations and semantics of these forms, we expect to be able to create compiler components that will be usable with different source languages, front ends, and target machines. To help deal with multiple machines, we are developing a family of Computer Systems Description Languages (CSDL) to describe properties that are relevant to the construction of compilers and other systems software. The languages describe properties of a machine's instructions or its mutable state, or both. Of particular interest is the description of the semantics of instructions, i.e., their effects on the state of the machine. This report describes our preliminary design of Lambda-RTL, a CSDL language for specifying instructions' semantics. We describe the effects of instructions using register transfer lists (RTLs). A register transfer list is a collection of assignments to locations, which represent registers, memory, and all other mutable state. We prescribe a form of RTLs that makes it explicit how to compute the values assigned and on what state the computation depends. The form also makes byte order explicit and provides for instructions whose effects may be undefined. Because our form of RTLs contains so much information, it is convenient for use by tools, but it would be tedious to write RTLs by hand. Lambda-RTL, which is based on the lambda-calculus and on register transfer lists, is a metalanguage designed to make it easier for people to write RTLs and to associate them with machine instructions. 
It enables us to omit substantial information from hand-written specifications; the Lambda-RTL translator infers the missing information and puts the resulting RTLs into canonical form. Lambda-RTL also provides a "grouping" construct designed to help specify large groups of similar instructions. We are still designing Lambda-RTL. This report presents a short overview of Lambda-RTL, followed by examples. The examples include definitions of basic operators, which we believe will be useful for describing a wide variety of machines, as well as excerpts from descriptions of the SPARC and Pentium. We have chosen excerpts that illustrate features which are characteristic of these particular machines; for example, we show a model of SPARC register windows, and we show how Lambda-RTL can help manage the complexity of the Pentium instruction set. Both the machine descriptions and Lambda-RTL itself are under development, so this report is a snapshot of a work in progress. We issue it now to solicit feedback both on our overall approach and on the details of Lambda-RTL. Please send feedback by electronic mail to zephyr-investigators@virginia.edu.

Ferrari, Adam J.. JPVM: Network Parallel Computing in Java, December 8, 1997.

Abstract: The JPVM library is a software system for explicit message-passing based distributed memory MIMD parallel programming in Java. The library supports an interface similar to the C and Fortran interface provided by the Parallel Virtual Machine (PVM) system, but with syntax and semantics modifications afforded by Java and better matched to Java programming styles. The similarity between JPVM and the widely used PVM system supports a quick learning curve for experienced PVM programmers, thus making the JPVM system an accessible, low-investment target for migrating parallel applications to the Java platform. At the same time, JPVM offers novel features not found in standard PVM such as thread safety, multiple communication end-points per task, and default-case direct message routing. JPVM is implemented entirely in Java, and is thus highly portable among platforms supporting some version of the Java Virtual Machine. This feature opens up the possibility of utilizing resources commonly excluded from network parallel computing systems such as Macintosh and Windows-NT based systems. Initial applications performance results achieved with a prototype JPVM system indicate that the Java-implemented approach can offer good performance at appropriately coarse granularities.

Knight, J.; Elder, M.; Flinn, J.; Marx, P.. Summaries of Three Critical Infrastructure Applications, November 14, 1997.

Abstract: Modern society has grown to depend heavily upon a set of infrastructure services for day-to-day operations to proceed normally and efficiently. Examples of these infrastructure services include power distribution, telecommunications, transportation, and financial systems. In the event of widespread failure in any of these systems, normal activities would be severely disrupted; therefore, these systems are referred to as critical infrastructure applications. This report presents the results of a study of three particular critical infrastructure applications: the banking and finance domain, the rail transport industry, and the air traffic control system. An overview of each application area is provided, including major processes and information concerning system architecture and implementation. In addition, a detailed bibliography is provided for each application domain.

Note: Updated version available at http://www.cs.virginia.edu/~jck/publications/domainAnalysis.pdf.

Nguyen-Tuong, Anh; Grimshaw, Andrew S.. Building Robust Distributed Applications With Reflective Transformations, November 14, 1997.

Abstract: Several projects are currently underway to build the nation's next generation computing infrastructure. These projects are sometimes called metasystems projects and seek to provide the illusion of a single, unified, virtual computing environment to end users. We expect metasystems to consist of millions of hosts and billions of objects, and on this scale, resource failures will be the norm and no longer the exception. One of the technical challenges that must be solved before such systems become usable is the widespread adoption of fault-tolerance techniques for both system-level services and user applications. As part of the Legion metasystem project, we have developed an architecture for incorporating fault-tolerance techniques into user applications. Our approach is based on reflective dynamic transformation techniques that manipulate the control and data flow characteristics of an application to achieve the desired fault-tolerance policies. In this paper, we present our Reflective Graph & Event (RGE) computation model and illustrate sample reflective transformations for incorporating exception handling and replication techniques into applications.

Knight, John C.; Lubinsky, Raymond W.; McHugh, John; Sullivan, Kevin J.. Architectural Approaches to Information Survivability, September 2, 1997.

Abstract: Many large information systems have evolved to a point where the normal activities of society depend upon their continued operation. Significant concerns have been raised about the possible effects of failure in these systems. In this paper we discuss architectural approaches to improving the survivability of critical information systems and present a candidate architecture. The key features of the architecture are the use of a variety of shell structures (sometimes also known as wrappers) and the use of a network-wide approach to recovery and continued service. We discuss the design, implementation, and verification issues raised by the use of shells in complex distributed systems and introduce three types of shell: protection, enhancement, and correction. Combinations of these shells are used to ensure that the critical information system is protected against a wide variety of hazards ranging from software defects to malicious attacks. The implementation of shells is discussed and it is shown that the desirable characteristic of transparent implementation cannot generally be achieved, and that ensuring the correct operation of the shells is itself a significant issue. A demonstration system being developed for evaluation of the architectural concepts is presented.

DeJong, Colleen; Gibble, Matthew; Knight, John; Nakano, Luis. Formal Specifications: A Systematic Evaluation, May 1997.

Abstract: Industrial practitioners require constant improvements in the software development process and the quality of the resulting product in order to satisfactorily build larger and more complex systems. Academia praises formal specification techniques as a means to achieve these goals, yet formal specification has not been widely adopted by industry. The focus of this research is to study the disparity between industry and academia in their experience with formal specification methods. During the specification of a significant software system, a control system for a nuclear reactor, it became clear that the use of formal specification methods had potential benefits, but there were practical requirements that were not being met. Previous evaluations of formal specification failed to identify many of these flaws, and a new comprehensive approach based on the requirements of the current software development process is needed. A comprehensive approach to evaluation was developed as part of this research. The evaluation method presented here does not examine theoretical qualities of language form and structure; rather, it examines basic but vital practical issues involving the notation, tools, and methods for using them. There were two objectives of this research: to identify these practical requirements and create a list of criteria for formal specification methods, and to evaluate several formal specification methods based on these criteria. The criteria were systematically derived from current software development practice. This derivation links the criteria with specific activities in the software development process and supports their inclusion in the evaluation. Using this set of criteria, an evaluation of three formal specification methods, Z, PVS, and statecharts, was conducted by developing and examining specifications for a preliminary version of the reactor control system.

Gibble, Matthew; Knight, John C.; Nakano, Luis G.; DeJong, Colleen. Formal Verification: An Evaluation, May 1997.

Abstract: Despite extensive development over many years and significant demonstrated benefits, formal methods remain poorly accepted by industrial practitioners. Many reasons have been suggested for this situation, such as a claim that they extend the development cycle, that they require difficult mathematics, that inadequate tools exist, and that they are incompatible with other software packages. There is little empirical evidence that any of these reasons is valid. The research presented here addresses the question of why formal methods are not used more widely. The approach used was to develop a formal specification for a safety-critical application using PVS and assess the results in a comprehensive evaluation framework. The results of the experiment suggest that there remain many impediments to the routine use of formal methods.

Sullivan, Kevin J.; Marchukov, Mark. Interface Negotiation and Efficient Reuse: A Relaxed Theory of the Component Object Model, May 9, 1997.

Abstract: Reconciling requirements for (1) the efficient integration of independently developed and evolving components and (2) the evolution of systems built from such components requires novel architectural styles, standards and idioms. Traditional object-oriented approaches have proven inadequate. Two important new mechanisms supporting integration and evolution are dynamic interface negotiation and aggregation, an approach to efficient composition. Both feature prominently in the Component Object Model (COM), a de facto standard providing the architectural foundation for many important systems. Because these are important mechanisms in general, and because they are central to COM in particular, it is essential that engineers be able to reason effectively about them. In earlier work (Sullivan et al. 1997), we showed that reasoning about them is hard and that formal mathematical theories of such mechanisms can provide a foundation for effective reasoning. In this paper we present a new theory of interface negotiation and aggregation in COM. Our new theory is based on a relaxed interpretation of the COM specification. Our earlier theory reflected an interpretation of the specification in which components had to be designed to follow COM-specified rules for interface negotiation and aggregation under any possible usage. Our new, strictly weaker theory requires only that actual system executions not manifest any violations of the rules. Architectural styles using mediators that we showed to be untenable under the earlier theory are tenable under this one provided the designers follow certain rules. We derive these necessary and sufficient conditions for legal use of interface negotiation in the presence of aggregation. Our results provide a basis for documenting what engineers must not do to use aggregation and interface negotiation properly.

Ramsey, Norman. Unparsing Expressions With Prefix and Postfix Operators, August 25, 1997.

Abstract: Automatic generation of computer programs from high-level specifications is a well-known technique, which has been used to create scanners, parsers, code generators, tree rewriters, packet filters, dialogue handlers, machine-code recognizers, and other tools. To make it easier to debug an application generator, and to help convince prospective users that the generated code can be trusted, the generated code should be as idiomatic and readable as possible. It is therefore desirable, and sometimes essential, that the generated code use infix, prefix, and postfix operators, and that they be presented without unnecessary parentheses. This paper presents a method of automatically parenthesizing expressions that use prefix, postfix, and implicit operators; the method is compatible with automatic prettyprinting. The paper also shows how to parse expressions that use such operators. The parsing algorithm can be used even if operator precedences are not known at compile time, which means that it can be used with an arbitrary number of user-defined precedences.
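
The core of minimal parenthesization can be illustrated for infix operators alone: parenthesize a subexpression only when it binds more loosely than its context, or when it sits on the wrong side of an equal-precedence left-associative operator. The sketch below is a simplified illustration of that idea, not Ramsey's algorithm; its precedence table is invented, and the paper additionally handles prefix, postfix, and implicit operators as well as user-defined precedences.

```python
# Simplified illustration of precedence-driven unparsing for infix binary
# operators only. Precedence and associativity tables here are invented.

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
LEFT_ASSOC = {"+", "-", "*", "/"}

def unparse(e, parent_prec=0, right_side=False):
    """Unparse an expression tree (op, lhs, rhs) with strings as leaves,
    emitting parentheses only where the reading would otherwise change."""
    if isinstance(e, str):                       # leaf: variable or literal
        return e
    op, lhs, rhs = e
    p = PREC[op]
    s = unparse(lhs, p, False) + " " + op + " " + unparse(rhs, p, True)
    # Parenthesize if we bind looser than the context, or equally tight but
    # appear as the right child of a same-precedence left-assoc operator.
    if p < parent_prec or (p == parent_prec and right_side and op in LEFT_ASSOC):
        return "(" + s + ")"
    return s

print(unparse(("*", ("+", "a", "b"), "c")))   # (a + b) * c
print(unparse(("+", "a", ("*", "b", "c"))))   # a + b * c
print(unparse(("-", "a", ("-", "b", "c"))))   # a - (b - c)
```

The last example shows why associativity matters: dropping those parentheses would silently change `a - (b - c)` into `a - b - c`, a different expression.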

Natrajan, Anand. Authentication Based on Logical Time, May 10, 1996.

Abstract: Timestamp-based authentication protocols rely on real-time synchronisation between the principals involved. This synchronisation, which involves all concerned principals agreeing on the notion of their time, is often difficult to achieve, and hence nonce-based protocols were developed. However, the principals in a timestamp-based authentication protocol can be made to synchronise to logical time. Efficient logical time systems can guarantee that processors (or principals in this case) agree on a logical time. Using this property, new protocols can be developed for various communication paradigms. We show one such protocol for interactive communication between two parties using a trusted authentication server.

Natrajan, Anand; Powell, Allison L.; French, James C.. Using N-grams to Process Hindi Queries with Transliteration Variations, July 16, 1997.

Abstract: Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or differences in transliteration can defeat word-based systems. N-gram systems are more resistant to these problems. We present a retrieval system based on N-grams that uses a collection of Hindi songs. Within this retrieval system, we study the effect of varying N on retrievability. Additionally, we present an alternative spell-checking tool based on N-grams. We conclude with a discussion of the number of N-grams produced by different values of N for different languages and a discussion of the choice of N.
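
As a concrete illustration of why N-grams tolerate spelling and transliteration variation, the sketch below compares words by the overlap of their character trigrams using a Dice coefficient. The sample words and the choice N=3 are placeholders for exposition, not data or parameters from the report.

```python
# Illustrative sketch of N-gram matching for transliteration variants.
# Sample words and N=3 are placeholders, not the report's data.

def ngrams(word, n=3):
    """Character n-grams of a word, padded so word edges are represented."""
    padded = "_" + word.lower() + "_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b, n=3):
    """Dice coefficient over the two words' n-gram sets (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))

# Transliteration variants share most of their trigrams even though an
# exact word match fails; unrelated words share almost none.
print(similarity("chaudhvin", "chaudhvi"))   # high (about 0.82)
print(similarity("chaudhvin", "zindagi"))    # 0.0
```

Because matching happens at the sub-word level, no language-specific stemming or dictionary is needed, which is the language-independence the abstract refers to.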

Golmie, Nada; Corner, Mark; Liebeherr, Jorg; Su, David. Improving the Effectiveness of ATM Traffic Control over Hybrid Fiber-Coax Networks, March 26, 1997.

Abstract: The IEEE 802.14 working group is currently standardizing a new media access control (MAC) protocol for the emerging Hybrid Fiber Coax (HFC) networks. Crucial for the success of 802.14 will be its ability to support higher layer traffic services, namely, ATM Constant Bit Rate (CBR), Variable Bit Rate (VBR) and Available Bit Rate (ABR) traffic classes. In this study, we investigate the interoperation of the MAC protocol, defined by 802.14, with ABR transmissions. An important finding of our study is that the bandwidth contention on the upstream channel in the HFC network may interfere with the feedback congestion control mechanisms of ABR traffic control. This interference can result in unfairness between ABR sources, and decreased utilization of the upstream HFC channel. As a solution to the problem we propose a scheme whereby the headend station of the HFC network returns congestion information contained in resource management (RM) cells to the ABR sources. The proposed mechanism can be incorporated into the ABR rate control scheme without modifying the current traffic management specifications. Numerous simulation scenarios are presented to illustrate our findings. Parts of the results have been presented to the IEEE 802.14 standardization committee.

Regehr, John. An Isotach Implementation for Myrinet, May 16, 1997.

Abstract: An isotach network provides strong guarantees about message delivery order. We show that an isotach network can be efficiently implemented entirely in software, on off-the-shelf hardware. Parts of this implementation could be performed much more efficiently in hardware - we are currently developing hardware components to do this. The all-software version then serves several purposes: to rapidly develop a working isotach system to use as a platform for development of higher level software, to find potential problems in the hardware design, and to pre-test software components of the system so that they won't have to be debugged at the same time as hardware.

Ramsey, Norman. Eliminating Spurious Error Messages Using Exceptions, Polymorphism, and Higher-Order Functions, April 3, 1997.

Abstract: Many language processors make assumptions after detecting an error. If the assumptions are invalid, processors may issue a cascade of error messages in which only the first represents a true error in the input; later messages are side effects of the original error. Eliminating such spurious error messages requires keeping track of values within the compiler that are not available because of a previously detected error. Examples include symbol-table entries, types, and intermediate code. This paper presents a discipline for tracking unavailable values and avoiding cascading error messages. The discipline itself is unsurprising, but it is both formalized and implemented in terms of a type constructor and combinators expressed in Standard ML, and the ML type rules enforce the discipline. The type constructor distinguishes intermediate results that are unavailable because of a previously detected error. The combinators transform ordinary functions, which assume all intermediate results are available, into functions that silently propagate the unavailability of intermediate results. ML's type system guides the application of the combinators; if the compiler writer does not account for a potentially unavailable value, the source code of the compiler does not type-check. The techniques presented exploit several features of Standard ML, including exceptions, higher-order functions, parametric polymorphism, and static type checking. Using these features enables the ML type system to ensure that the error-tracking discipline is applied consistently, relieving the programmer of that burden.
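
The discipline can be sketched outside Standard ML as well, though without the static enforcement that ML's type system provides. In the hypothetical Python rendering below, a sentinel type marks values that are unavailable because an error was already reported, and a combinator lifts ordinary functions so that unavailability propagates silently instead of triggering a cascade of follow-on messages.

```python
# Hypothetical Python rendering of the error-tracking discipline the paper
# formalizes in Standard ML. Unlike ML, nothing here statically forces the
# programmer to apply the combinator; the names below are invented.

class Unavailable:
    """Marks a value missing because an error was already reported."""
    def __repr__(self):
        return "<unavailable>"

UNAVAILABLE = Unavailable()

def lift(f):
    """Turn f, which assumes all arguments are available, into a function
    that silently returns UNAVAILABLE if any argument is unavailable."""
    def wrapped(*args):
        if any(isinstance(a, Unavailable) for a in args):
            return UNAVAILABLE
        return f(*args)
    return wrapped

def lookup(symtab, name):
    """Symbol-table lookup: report undeclared names exactly once."""
    if name not in symtab:
        print("error: " + name + " is not declared")  # the one real message
        return UNAVAILABLE
    return symtab[name]

add_types = lift(lambda t1, t2: t1 + "+" + t2)

symtab = {"x": "int"}
t = add_types(lookup(symtab, "x"), lookup(symtab, "y"))
print(t)   # <unavailable>, with no cascading second error message
```

The one error message comes from `lookup`; every later computation that touches the missing value stays quiet, which is exactly the cascade suppression the abstract describes.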

Stankovic, John A.; Son, Sang H.; Liebeherr, Jorg. BeeHive: Global Multimedia Database Support for Dependable, Real-Time Applications, April 17, 1997.

Abstract: The confluence of computers, communications and databases is quickly creating a global virtual database where many applications require real-time access to both temporally accurate and multimedia data. We are developing a global virtual database, called BeeHive, which is enterprise specific and offers features along real-time, fault tolerance, quality of service for audio and video, and security dimensions. Support of all these features and tradeoffs between them will provide significant improvement in performance and functionality over browsers, browsers connected to databases, and, in general, today's distributed databases. We present a high level design for BeeHive and various novel component technologies that are to be incorporated into BeeHive.

Ferrari, Adam J.; Chapin, Stephen J.; Grimshaw, Andrew S.. Process Introspection: A Heterogeneous Checkpoint/Restart Mechanism Based on Automatic Code Modification, March 25, 1997.

Abstract: Process Introspection is a fundamentally new solution to the process checkpoint/restart problem suitable for use in high-performance heterogeneous distributed systems. A process checkpoint/restart mechanism for such an environment has the primary requirement that it must be platform-independent: process checkpoints produced on a computer system of one architecture or operating system platform must be restartable on a computer system of a different architecture or operating system platform. The central feature of the Process Introspection approach is automatic augmentation of program code to incorporate checkpoint and restart functionality. This program modification is performed at a platform-independent intermediate level of code representation, and preserves the original program semantics. This approach has attractive properties including portability, ease of use, customizability to application-specific requirements, and flexibility with respect to basic performance trade-offs. Our solution is novel in its true platform- and run-time system independence - no system support or non-portable code is required by our core mechanisms. Recent experimental results obtained using a prototype implementation of the Process Introspection system indicate the overheads introduced by the mechanisms are acceptable for computationally demanding applications.

Lucas, Matthew T.; Wrege, Dallas E.; Dempsey, Bert J.; Weaver, Alfred. Statistical Characterization of Wide-Area Self-Similar Network Traffic, October 9, 1996.

Abstract: Background traffic models are fundamental to packet-level network simulation since the background traffic impacts packet drop rates, queuing delays, end-to-end delay variation, and also determines available network bandwidth. In this paper, we present a statistical characterization of wide-area traffic based on a week-long trace of packets exchanged between a large campus network, a state-wide educational network, and a large Internet service provider. The results of this analysis can be used to provide a basis for modeling background load in simulations of wide-area packet-switched networks such as the Internet, contribute to understanding the fractal behavior of wide-area network utilization, and provide a benchmark to evaluate the accuracy of existing traffic models. The key findings of our study include the following: (1) both the aggregate and its component substreams exhibit significant long-range dependencies in agreement with other recent traffic studies, (2) the empirical probability distributions of packet arrivals are log-normally distributed, (3) packet sizes exhibit only short-term correlations, and (4) the packet size distribution and correlation structure are independent from both network utilization and time of day.

Stamoulis, A.; Liebeherr, J.. S2GPS: Slow-Start Generalized Processor Sharing, February 17, 1997.

Abstract: Packet scheduling methods that approximate Generalized Processor Sharing (GPS) are currently the focus of much research in the area of Quality-of-Service (QoS) networks. The ability of GPS schedulers to provide rate guarantees as well as delay guarantees meets the demand of many network applications. This paper addresses a shortcoming of GPS that can have a significant impact on the service it provides but has been given little attention. Since, with GPS, the service rate received by a session depends on the number of backlogged sessions in the system, the service rate of a session may change abruptly if some other session becomes active. This may result in abrupt increases of delay of consecutive packets. In this paper, we propose a new scheduler, called Slow-Start GPS (S2GPS), which alleviates the problem of abrupt decreases of service rates when new sessions start transmitting. S2GPS is a modification of GPS in which a session does not receive its guaranteed service rate immediately after it becomes active. Instead, the service rate of the session is gradually increased. We show that this prevents an abrupt delay increase for the other sessions. We derive delay bounds for sessions constrained by leaky buckets and express quantitatively the advantages of the S2GPS scheduling discipline.
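
The slow-start idea can be illustrated with a toy fluid model: scale each session's GPS weight by how long it has been active, so a newly active session ramps up to its guaranteed share rather than seizing it at once. The linear ramp, warm-up window, and weights below are invented for illustration; they are not the ramp function or parameters derived in the paper.

```python
# Toy fluid-model sketch of the slow-start idea behind S2GPS. The linear
# ramp and the warmup window are invented for illustration only.

def service_shares(weights, ages, warmup=4):
    """Instantaneous GPS service shares where each backlogged session's
    weight is scaled by min(age/warmup, 1), so sessions that just became
    active ramp up gradually instead of grabbing their full share."""
    eff = [w * min(age / warmup, 1.0) for w, age in zip(weights, ages)]
    total = sum(eff)
    return [e / total for e in eff]

# An established session (age 10) versus one that just became active (age 1):
print(service_shares([1.0, 1.0], [10, 1]))    # [0.8, 0.2]
# After both finish warm-up, shares settle at the plain GPS split:
print(service_shares([1.0, 1.0], [10, 10]))   # [0.5, 0.5]
```

In plain GPS the arrival of the second session would cut the first session's share from 1.0 to 0.5 instantly; the ramp spreads that drop over the warm-up window, which is the abrupt-delay effect the paper targets.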

French, James C.; Powell, Allison L.; Schulman, Eric. Automating the Construction of Authority Files in Digital Libraries: A Case Study, January 14, 1997.

Abstract: The issue of quality control has become increasingly important as more online databases are integrated into digital libraries, since data quality can have a dramatic effect on the search effectiveness of an online system. Authority work, the need to discover and reconcile variant forms of strings in bibliographic entries, will become more difficult. Spelling variants, misspellings, and translation and transliteration differences all increase the difficulty of retrieving information. This paper is a case study of our efforts to create an authority file for authors' institutional affiliations in the Astrophysics Data System. The techniques surveyed here for the detection and categorization of variant forms have broader applicability and may be used in authority work for other bibliographic fields.

French, James C.; Powell, Allison L.; Schulman, Eric. Applications of Approximate Word Matching in Information Retrieval, January 10, 1997.

Abstract: As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more difficult. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. Approximate string matching has traditionally been used to help with this problem. In this paper we introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms.
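Approximate string matching, the traditional technique the abstract mentions, can be sketched with a Levenshtein edit-distance function applied word by word. This is a generic illustration of the idea, not the paper's algorithm; the variant-detection threshold below is our own choice.

```python
def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,                 # deletion
                           row[j - 1] + 1,                  # insertion
                           prev_row[j - 1] + (ca != cb)))   # substitution
        prev_row = row
    return prev_row[-1]

def likely_variants(a, b, frac=0.25):
    """Treat two words as variant forms when their edit distance is a
    small fraction of the longer word's length (threshold is ours)."""
    return edit_distance(a, b) <= frac * max(len(a), len(b))
```

For example, `likely_variants("universitat", "university")` holds (distance 2 on words of length 11), catching a transliteration-style variant, while unrelated words of similar length are rejected.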

French, James C.; Viles, Charles L.. Exploiting Coauthorship to Infer Topicality in a Digital Library of Computer Science Technical Reports, December, 1996.

Abstract: We propose a method of mapping the topical content of distributed digital libraries and demonstrate the technique using data from the Networked Computer Science Technical Report Library (NCSTRL) digital library project. This method seeks to exploit information derived from document coauthorship to produce improved automatic subject classifications of the documents. In a distributed digital library, these subject classifications are useful in characterizing both intra-site and inter-site content. They are also helpful in providing secondary retrieval services. We present the method and describe an experiment and results showing that improved clusterings can be achieved relative to traditional document clustering.

Lewis, Michael J.; Grimshaw, Andrew S.. Using Dynamic Configurability to Support Object-Orientation in Legion, December 4, 1996.

Abstract: Wide area distributed object systems will require mechanisms for creating, describing, and managing objects. The mechanisms must be scalable and must not mandate particular policies or algorithms because users will have different cost, security, performance, and functionality demands. Legion is a wide area distributed object system that supports this requirement using a first-class active class system. A Legion class is an active Legion object that manages its instances, provides their system-level support, and defines their behavior. To enable encapsulation and code reuse, and to simplify the requirements of many Legion programmers, Legion classes can be shared among Legion objects. To support instances that outlive the program that creates them, classes are persistent. To enable scalability, classes are distributed throughout the system. Shared, persistent, active, distributed (SPAD) class systems have many benefits for wide area distributed object systems such as Legion. But supporting traditional class roles, including instantiation and inheritance, requires new and different implementation solutions. This paper describes one way to support object-oriented languages and systems in the Legion SPAD class system. Our method is based on dynamic configurability of objects and classes. We describe dynamic configurability, motivate the need for new implementation solutions for a SPAD class system, describe the implementation of our solution, and illustrate its functionality with an example.

Ferrari, Adam J.; Lewis, Michael J.; Viles, Charles L.; Nguyen-Tuong, Anh; Grimshaw, Andrew. Implementation of the Legion Library, November 26, 1996.

Abstract: Legion is a multi-year effort to design and build scalable meta-computer software. Legion provides the software "glue" for seamless integration of distributed, heterogeneous hardware and software components. Much of this "glue" is provided in Legion's run-time library. In this document we provide a detailed description of the implementation of the Legion run-time library. The library is designed and implemented as a layered, configurable software protocol stack. This design, coupled with an event-based mechanism for inter-layer communication, enables considerable flexibility and extensibility. In this document, we first present the library interface and show how the library can be used "as is", without internal modification. We then show how to modify or add functionality to the library. Finally, we provide qualitative descriptions of how the library could be extended to encompass programming styles and methods besides the object-oriented, macro data-flow style that is Legion's initial target.

Ferrari, Adam. Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems, October 10, 1996.

Abstract: The Process Introspection project is a design and implementation effort, the main goal of which is to construct a general purpose, flexible, efficient checkpoint/restart mechanism appropriate for use in high performance heterogeneous distributed systems. This checkpoint/restart mechanism has the primary constraint that it must be platform independent; that is, checkpoints produced on one architecture or operating system platform must be restartable on a different architecture or operating system platform. The Process Introspection mechanism is based on a design pattern for constructing interoperable checkpointable modules. Application of the design pattern is automated by two levels of software tools: a library of support routines that facilitate the use of the design pattern, and a source code translator that automatically applies the pattern to platform independent modules. A prototype implementation of the library has been constructed and used to demonstrate that the design pattern can be applied effectively to construct platform independent checkpointable programs that operate efficiently.
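The core of the design-pattern idea — modules that externalize their state in a platform-independent form and can rebuild from it — can be sketched as follows. The interface and names here are our invention for illustration, not the project's actual API.

```python
import json

class Checkpointable:
    """Hypothetical interface: a module exports its state as plain,
    platform-independent data and can reconstruct itself from it."""
    def capture(self):
        raise NotImplementedError
    def restore(self, state):
        raise NotImplementedError

class Counter(Checkpointable):
    def __init__(self):
        self.count = 0
    def tick(self):
        self.count += 1
    def capture(self):
        return {"count": self.count}   # no pointers, no memory layout
    def restore(self, state):
        self.count = state["count"]

# Checkpoint on one "platform", restart on another: the state travels
# as portable JSON rather than a raw memory image.
c = Counter()
for _ in range(3):
    c.tick()
blob = json.dumps(c.capture())
restarted = Counter()
restarted.restore(json.loads(blob))
```

Because the captured state is logical rather than a memory dump, the restart side is free to use a different architecture's representation.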

Wrege, Dallas E.; Liebeherr, Jorg. A Near-Optimal Packet Scheduler for QoS Networks, June 1996.

Abstract: A packet scheduler in a quality-of-service (QoS) network should be sophisticated enough to support stringent QoS constraints at high loads, but it must also have a simple implementation so that packets can be processed at the speed of the transmission link. The Earliest-Deadline-First (EDF) scheduler is the optimal scheduler for bounded-delay services in the sense that it provides the tightest delay guarantees of any scheduler, but an implementation of EDF requires the sorting of packets, a complex operation that is not practical for high-speed networks. In this study we present the design, implementation, and analysis of the novel Rotating-Priority-Queues+ (RPQ+) scheduler that is near-optimal in the sense that it can approximate EDF with arbitrary precision. The RPQ+ scheduler uses a set of prioritized FIFO queues whose priorities are rearranged (rotated) periodically to increase the priority of waiting packets. We show that RPQ+ has the following desirable properties: its implementation requires operations independent of the number of queued packets, it can provide worst-case delay guarantees, and it is always superior to a Static-Priority (SP) scheduler. For shared-memory architectures, we show that RPQ+ can be implemented with little computational overhead. We derive expressions for the worst-case delays in an RPQ+ scheduler and demonstrate that the achievable network utilization increases with the frequency of queue rotations, approaching that of EDF in the limit. We use numerical examples, including examples based on MPEG video, to show that in realistic scenarios RPQ+ can closely approximate EDF even for infrequent queue rotations.
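The rotating-queue idea can be sketched as a toy model (this is an illustration of the mechanism, not the paper's exact data structure or rotation rule): a fixed set of FIFO queues is served in priority order, and a periodic rotation promotes every queue one level so that waiting packets age toward the head, approximating earliest-deadline-first service without per-packet sorting.

```python
from collections import deque

class RPQPlus:
    """Toy rotating-priority-queues model: N FIFO queues; rotation
    merges the old highest-priority queue ahead of the new one so
    longer-waiting packets keep their precedence."""
    def __init__(self, n):
        self.queues = deque(deque() for _ in range(n))

    def enqueue(self, pkt, prio):
        self.queues[prio].append(pkt)        # prio 0 = most urgent

    def rotate(self):
        oldest = self.queues.popleft()       # packets that waited longest
        self.queues[0].extendleft(reversed(oldest))  # stay in front
        self.queues.append(deque())          # recycle an empty lowest queue

    def dequeue(self):
        for q in self.queues:                # serve highest non-empty queue
            if q:
                return q.popleft()
        return None
```

Note that the per-rotation work is independent of the number of queued packets (a constant number of deque operations plus one splice), which is the property that makes the scheme practical at link speed.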

Liebeherr, Jorg; Wrege, Dallas E.. An Efficient Solution to Traffic Characterization of VBR Video in Quality-of-Service Networks, May 1996.

Abstract: A network that offers deterministic, i.e., worst-case, quality-of-service (QoS) guarantees to variable-bit-rate (VBR) video must provide a resource reservation mechanism that allocates bandwidth, buffer space, and other resources for each video connection. Such a resource reservation scheme must be carefully designed, otherwise network resources are wasted. A key issue for the design of a resource reservation scheme is the selection of a traffic characterization that specifies the traffic arrivals on a video connection. The traffic characterization should accurately describe the actual arrivals so that a large number of connections can be supported, but it must also map directly into efficient traffic policing mechanisms that monitor arrivals on each connection. In this study, we present a fast and accurate traffic characterization method for stored VBR video in networks with a deterministic service. Our characterization approach approximates the so-called empirical envelope of a video sequence. We use this approximation to obtain a traffic characterization that can be efficiently policed by a small number of leaky buckets. We present a case study where we apply our characterization method to networks that employ a dynamic resource reservation scheme with renegotiation. We use traces from a set of 25-30 minute MPEG sequences to evaluate our method against other characterization schemes from the literature.
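A single leaky (token) bucket, the policing element the abstract builds on, can be sketched as follows: a bucket with depth sigma and rate rho admits any arrival pattern bounded by sigma + rho·t over every interval of length t. The parameters below are invented for illustration.

```python
class LeakyBucket:
    """Token-bucket policer: depth sigma (burst allowance, bytes) and
    fill rate rho (bytes/second).  Conforming traffic over any interval
    of length t is bounded by sigma + rho * t."""
    def __init__(self, sigma, rho):
        self.sigma, self.rho = sigma, rho
        self.tokens, self.last = float(sigma), 0.0

    def conforms(self, t, size):
        """Accept a packet of `size` bytes arriving at time t if enough
        tokens have accumulated; nonconforming packets are policed."""
        self.tokens = min(self.sigma, self.tokens + self.rho * (t - self.last))
        self.last = t
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

lb = LeakyBucket(sigma=500, rho=1000)   # 500-byte burst, 1000 B/s sustained
```

Characterizing a VBR source with a small number of such buckets means approximating its empirical envelope by the minimum of a few lines sigma_k + rho_k·t, which is what makes the policing hardware cheap.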

Sullivan, Kevin J.. Software Design: The Options Approach, June 17, 1996.

Abstract: Many software engineering principles and concepts that are critical to reasoning about problems in software design (of which software architecture is an important special case) remain ad hoc, idiosyncratic and poorly integrated. I argue that this is due to our lack of a clean theory about how to make software design decisions. In this paper I propose that we should view software design as a process of deciding how to make irreversible capital investment in software assets of uncertain value, and that financial options theory provides a firm, unifying, simplifying and well developed basis for such decision-making. To support this view, I interpret software architecture and other related concepts in options theoretic terms.

Wang, Chenxi; Wulf, William A.. A Distributed Key Generation Technique, March 29, 1996.

Abstract: In a public key cryptographic system, the uniqueness and authenticity of the keys are essential to the success of the system. Traditionally, a single, centralized key distribution/certification server has been used to generate and distribute keys. This approach requires a distinguished trusted entity, which could potentially become a single point of failure or penetration in a distributed environment. We present in this paper a new, simple way to handle distributed key generation. We assign a unique range of m-bit numbers to each key generator in the system. As a result, the lower-order m bits of each generated key form a unique number in the assigned range. Our scheme not only provides a way to generate globally unique keys in an independent, distributed fashion, it also enhances the security of public-key cryptosystems by eliminating the mapping between keys and entity names.
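The range-assignment idea can be sketched like this (a simplified illustration; M, the bit widths, and the helper name are our choices, and real key generation would of course involve the cryptosystem's own key-derivation rules):

```python
import secrets

M = 16  # low-order bits reserved for generator-unique suffixes

def make_key(suffix_lo, suffix_hi, total_bits=64):
    """Each key generator owns a disjoint range [suffix_lo, suffix_hi)
    of M-bit suffix values.  Every key it produces carries a suffix
    from its range in the low-order M bits, so keys from different
    generators can never collide, with no coordination at key time."""
    suffix = suffix_lo + secrets.randbelow(suffix_hi - suffix_lo)
    high = secrets.randbits(total_bits - M)
    return (high << M) | suffix

# Generator A owns suffixes [0, 256); generator B owns [256, 512).
key_a = make_key(0, 256)
key_b = make_key(256, 512)
```

Global uniqueness across generators follows purely from the disjoint suffix ranges: any two keys from different generators already differ in their low-order M bits.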

Zelikovsky, Alexander. Improved Approximation of Maximum Planar Subgraph, May 9, 1996.

Abstract: The maximum planar subgraph problem (MPSP) asks for a planar subgraph of a given graph with the maximum total cost. We suggest a new approximation algorithm for the weighted MPSP. We show that it has a performance ratio of 5/12 for graphs in which the maximum cost of an edge is at most twice the minimum cost of an edge. For the variant of MPSP that asks for an outerplanar subgraph, the suggested algorithm is the first with a performance ratio higher than 1/2 (7/12 instead of 1/2).

French, James C.; Bull, Glen L.; Tarrant, Martha R.; Powell, Allison L.. Library Access, Search and Retrieval (LASR) Pilot (Final Report), March 27, 1996.

Abstract: The Global Change Data and Information System (GCDIS) is a cooperative effort among eight United States government agencies and other organizations to provide public Internet access to their global change resources. The Library Access, Search and Retrieval (LASR) pilot project focused on in-library use of these resources, testing the GCDIS with the public from September 1994 to December 1995 in order to solicit information about the system's usefulness in its current state and suggestions that might be employed in its continued evolution and revision. This document is the final report of the LASR project.

Zelikovsky, Alexander. Better approximation bounds for the network and Euclidean Steiner tree problems, March 20, 1996.

Abstract: The network and Euclidean Steiner tree problems require a shortest tree spanning a given vertex subset within a network G=(V,E,d) and the Euclidean plane, respectively. For these problems, we present a series of heuristics that find approximate Steiner trees with performance guarantees coming arbitrarily close to 1 + ln 2 = 1.693... and 1 + ln(2/sqrt(3)) = 1.1438..., respectively. The best previously known corresponding values are close to 1.746 and 1.1546.

Lucas, Matthew T.; Dempsey, Bert J.; Weaver, Alfred C.. Distributed Error Recovery for Continuous Media Data in Wide-Area Multicast, July 18, 1995.

Abstract: This paper proposes new mechanisms to improve the quality of wide-area disseminations of time-sensitive streams, such as packet voice and video, from a single source to multiple receivers. Most current packet-switched networks provide no end-to-end quality-of-service guarantees for packet delay or loss, which can reduce the playback quality at receivers. Buffering at the multicast receivers can be used to minimize the playback degradation due to network delay variations. Addressing packet losses is more problematic. Solutions are constrained by the long latencies that may exist between the source and some of its receivers, the danger of simultaneous messages from a large number of receivers overwhelming network and end-system resources (multicast implosion), and the limited bandwidth available on most wide-area links. Adapting the transmission rate of the multicast source to the available network bandwidth prevents excessive packet loss and protects the network from severe congestion due to any single source. However, rate control does not recover packet losses. To recover packets dropped by the network, the transmission of redundant information in the data stream (forward error correction) has been shown effective, but redundancy may consume significant network bandwidth and is inefficient when receivers have heterogeneous error rates. In this paper we develop and evaluate a novel retransmission-based error recovery strategy. Unlike previous reliable multicast protocols, our protocol relies on retransmissions from one receiver to another, thus avoiding the limitations of long latencies and multicast implosion inherent to source-driven retransmissions. Local retransmission enables low-latency recovery, uses wide-area bandwidth sparingly, and isolates the impact of losses to one area of the network.
Simulation results presented in this paper demonstrate that this retransmission-based scheme can achieve distributed and effective recovery of packet losses for large receiver sets. The results show further that network resources are efficiently utilized and compare favorably with utilization under forward error correction.

de Supinski, Bronis R.; Williams, Craig; Reynolds Jr., Paul F.. Performance Evaluation of the Late Delta Cache Coherence Protocol, March 13, 1996.

Abstract: This paper presents the results of a simulation study designed to evaluate the performance of the late delta cache coherence protocol and a conventional directory-based invalidation protocol. Delta cache protocols are a highly concurrent, directory-based family of coherence protocols that exploit an isotach logical time system to provide support for sequential consistency and atomicity. The late delta protocol is an update-based member of this family of protocols. Our results demonstrate the efficacy of the late delta protocol across a wide range of workloads. We show that this protocol provides performance comparable to the conventional invalidation protocol under workloads with no atomicity requirements and little contention, but outperforms the conventional invalidation protocol as atomicity requirements and contention increase.

Nguyen-Tuong, A.; Grimshaw, A. S.. Exploiting Sequential Debuggers in a Parallel Environment: An Introduction to the Mentat Assistant Debugger, May 1, 1995.

Abstract: The Mentat Assistant Debugger (MAD) is a set of tools that enables programmers to debug their Mentat applications using the debugger of their choice. MAD is easy to use and requires no modifications to existing Mentat code. MAD leverages off the existing base of sequential debuggers available in both the public domain and commercial sector and can easily accommodate future improvements in debugging technology.

Karpovich, John F.. Support for Object Placement in Wide-Area Heterogeneous Distributed Systems, January 16, 1996.

Abstract: One of the open challenges in distributed computing systems is determining how to place tasks onto processors when they are needed (in the Legion project being developed at UVA the basic computational units are modelled as objects, so the problem is one of object placement). The placement decision is crucial because it determines the run-time behavior of an object, including performance, cost and whether it can run at all. Many approaches have been developed to address this problem in a distributed system environment, but it is our claim that these efforts do not take the proper approach for supporting the needs of the large wide area heterogeneous virtual computer systems we envision will exist in the future. In particular, the systems developed to date are inadequate because they 1) focus on solutions for a narrow set of application types, environments, or user objectives, and 2) often inadequately support the full complexity and features of large distributed systems. We propose to better support the placement process in distributed systems by employing a new approach. Our approach is different from previous ones in that we propose to design a framework for supporting a wide range of different placement problems, user objectives and placement algorithms, rather than building a system that supports a single placement technique. The goal of the framework is to provide programmers with the basic mechanisms to support each facet of the placement process which will enable them to implement the placement policies and techniques that meet their needs. At the same time, individuals will be competing for limited resources owned by different people or organizations. Therefore, the framework must also contain mechanisms to enforce the policies of resource owners and to resolve conflicts between users. The research effort proposed here will focus on developing the mechanisms needed to support flexible distributed object placement.
To identify the main components of the placement process and the key issues that must be resolved, we will first develop a general model of the placement process. Using this model, we will next develop and implement a framework to support object placement within the Legion system. Developing such a framework will lend insight into our placement model and will also provide a proof of concept for our approach. Finally, we will demonstrate the usefulness of our framework approach by mapping a range of placement algorithms to the Legion framework and evaluating the performance of several algorithms versus that provided by the system today.

French, James C.; Knight, John C.; Powell, Allison L.. Hypertext Structures and Software Documentation, February 6, 1996.

Abstract: Software documentation represents a critical resource to the successful functioning of many enterprises. However, because it is static, documentation often fails to meet the needs of the many diverse users who are required to consult it on a regular basis in the course of their daily work. Software documentation is a rich resource that has not been fully exploited. Treatment of software documentation presents a number of interesting problems that require a blend of information retrieval and hypertext techniques for their successful solution. The evolving nature of a software project and the diverse demands on its documentation present an especially challenging environment. This is made even more challenging by the variety of information resources, ranging from formal specification languages to source code, that must be integrated into a coherent whole for the purpose of querying. In this paper we discuss the issues involved with automating the management of software documentation to better increase its utility. We describe the mechanics of a prototype system, SLEUTH, currently under investigation at the University of Virginia as a vehicle for software documentation management. The prototype maintains software documentation as a hypertext with typed links for the purpose of browsing by users with varying needs. These links are synthesized by the system and kept accurate under update. Appropriate authoring tools provide the system with the information it needs for this maintenance function. Ad hoc querying is provided over the documentation and hypertext documents are synthesized in response to these queries.

Reilly, S. D.; Pfaltz, J. L.; French, J. C.. Increasing the Computational Potential of the World Wide Web, February 9, 1996.

Abstract: For many World Wide Web applications there is a need to provide session semantics so that users have the impression of a continuous interaction. This is true, for example, when one searches a database interactively. Because WWW servers are stateless, some extra mechanism is necessary to give the impression of session semantics. This report discusses a strategy for implementing session semantics over a WWW application. Apart from the need to maintain state during interactive sessions, there is also the need to control the application. Under any circumstances this is a tedious activity. This report also discusses a mechanism for modeling a WWW application as a finite state automaton and describes a tool, the Stateful Server Application Tool (SSAT), built to assist in the development of these applications.
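The finite-state-automaton view of a session over a stateless server can be sketched as follows. The states and actions here are invented for illustration (they are not SSAT's): the server encodes the current state in each response (e.g., a hidden form field or URL parameter), and the client returns it with the next request, so each request is self-describing.

```python
# Hypothetical session automaton for an interactive database search:
# (current state, user action) -> next state.
TRANSITIONS = {
    ("start",   "query"):  "results",
    ("results", "refine"): "results",
    ("results", "view"):   "record",
    ("record",  "back"):   "results",
}

def step(state, action):
    """Advance the automaton; actions invalid in the current state
    leave it unchanged (a real tool might report an error instead)."""
    return TRANSITIONS.get((state, action), state)

# A "session" is just the state token threaded through the requests.
session = "start"
for action in ("query", "view", "back", "refine"):
    session = step(session, action)
```

Because the whole session is captured by the state token, no server-side memory between requests is needed, which is exactly what a stateless WWW server requires.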

Haddleton, Russell F.. An Implementation of a Parallel Object Oriented Database System, December 20, 1995.

Abstract: In the following document we describe the current implementation of ADAMS, a parallel object oriented database system developed at the University of Virginia. The parallel data structures employed by ADAMS are discussed, as is the client/server architecture. We list a number of sources of parallel speed-up found in typical ADAMS programs, and explain how these opportunities are exploited. Several potential future research projects related to this work are given.

Wulf, Wm. A.; McKee, Sally A.. Hitting the Memory Wall: Implications of the Obvious, November 1, 1994.

McKee, Sally A.. Dynamic Access Ordering: Bounds on Memory Bandwidth, November 1, 1994.

Abstract: Memory bandwidth is becoming the limiting performance factor for many applications, particularly scientific computations. Access ordering is one technique that can help bridge the processor-memory performance gap. We are part of a team developing a combined hardware/software scheme for implementing access ordering dynamically at run-time. The hardware part of this solution is the Stream Memory Controller, or SMC. In order to validate the SMC concept, we have conducted numerous simulation experiments, the results of which are presented elsewhere. We have developed analytical models to bound asymptotic uniprocessor SMC performance, and have demonstrated that the simulation behavior of our dynamic access-ordering heuristics approaches those bounds. Here we introduce a model of SMC startup costs, and we extend the uniprocessor SMC models to describe performance for modest-sized symmetric multiprocessor (SMP) SMC systems.

McKee, Sally A.. Dynamic Access Ordering for Symmetric Multiprocessors, May 1, 1994.

Abstract: Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "Grand Challenge" scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components "on the other side of the cache" - they should not be treated as uniform access-time RAM. This paper describes the use of hardware-assisted access ordering in symmetric multiprocessor (SMP) systems. Our technique combines compile-time detection of memory access patterns with a memory subsystem (called a Stream Memory Controller, or SMC) that decouples the order of requests generated by the computational elements from that issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. Our simulations indicate that SMP SMC systems can consistently deliver nearly the full system bandwidth.

McKee, S. A.; Oliver, C. W.; Wulf, Wm. A.; Wright, K. L.; Aylor, J. H.. Evaluation of Dynamic Access Ordering Hardware, October 30, 1995.

Abstract: Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations - such as scientific vector processing or multimedia (de)compression - that lack the locality of reference that makes caching effective. We describe and evaluate a system that addresses the memory bandwidth problem for this class of computations by dynamically reordering stream accesses to exploit memory system architecture and device features. The technique is practical to implement, using existing compiler technology and requiring only a modest amount of special-purpose hardware. With our prototype system, we have observed performance improvements of over 200% over normal caching.

McKee, S. A.; Weikle, D. A. B. ; Wright, K. L.; Oliver, C. W.; Landon, T. C.; Voss, A. P.; Wulf, Wm. A.; Aylor, J. H.. Avoiding Irreproducible Results: Modeling the Stream Memory Controller, October 30, 1995.

Abstract: Recent discussions have acknowledged the danger of the Computer Architecture community becoming a "society for irreproducible results". Unfortunately, there is little incentive within our community to verify others' results. We have a tendency to experiment with a single instance of an artifact, but then to generalize about the properties of all related artifacts. A third consideration is that there is an inherent artificiality to our work that makes it even more difficult not to build our own assumptions into the structure of our experiments. We create the artifacts being studied in addition to the instruments and experiments used to study them. Finally, reproducing results usually requires reproducing the experimental environment as well, which is often difficult. We describe the research strategy we used to avoid these pitfalls when trying to demonstrate the effectiveness of a particular concept: dynamic access ordering. We present this case study to demonstrate that our results are reproducible several different ways, and to convey our experiences in hopes that a similar approach might prove useful to others.

Landon, T. C.; Salinas, M. H.; Klenke, R. H.; Aylor, J. H.; McKee, S. A.; Wright, K. L.. A Systematic Approach to Optimizing and Verifying Synthesized High-Speed ASICs, December 11, 1995.

Abstract: This paper describes the design process used in developing a Stream Memory Controller (SMC). The SMC can reorder processor-memory accesses dynamically to increase the effective memory bandwidth for vector operations. A 132-pin ASIC was implemented in static CMOS using a 0.75 micron process and has been tested at 36MHz.

McKee, Sally A.; Wulf, Wm. A.; Landon, Trevor C.. Bounds on Memory Bandwidth in Streamed Computations, March 1, 1995.

Abstract: The growing disparity between processor and memory speeds has caused memory bandwidth to become the performance bottleneck for many applications. In particular, this performance gap severely impacts stream-oriented computations such as (de)compression, encryption, text searching, and scientific (vector) processing. This paper looks at streaming computations and derives analytic upper bounds on the bandwidth attainable from a class of access reordering schemes. We compare these bounds to the simulated performance of a particular dynamic access ordering scheme, the Stream Memory Controller (SMC). We are building the SMC, and where possible we relate our analytic bounds and simulation data to the performance of the hardware. The results suggest that the SMC can deliver nearly the full attainable bandwidth with relatively modest hardware costs.

McKee, S. A.; Klenke, R. H.; Schwab, A. J.; Wulf, Wm. A.; Moyer, S. A.; Hitchcock, C.; Aylor, J. H.. Experimental Implementation of Dynamic Access Ordering, August 1, 1993.

Abstract: As microprocessor speeds increase, memory bandwidth is rapidly becoming the performance bottleneck in the execution of vector-like algorithms. Although caching provides adequate performance for many problems, caching alone is an insufficient solution for vector applications with poor temporal and spatial locality. Moreover, the nature of memories themselves has changed. Current DRAM components should not be treated as uniform access-time RAM: achieving greater bandwidth requires exploiting the characteristics of components at every level of the memory hierarchy. This paper describes hardware-assisted access ordering and our hardware development effort to build a Stream Memory Controller (SMC) that implements the technique for a commercially available high-performance microprocessor, the Intel i860. Our strategy augments caching by combining compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that issued to the memory system. This decoupling permits requests to be issued in an order that optimizes use of the memory system.

McKee, Sally A.. Access Ordering and Memory-Conscious Cache Utilization, March 1, 1994.

Abstract: As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance factor for many applications. Several approaches to bridging this performance gap have been suggested. This paper examines one approach, access ordering, and pushes its limits to determine bounds on memory performance. We present several access-ordering schemes, and compare their performance, developing analytic models and partially validating these with benchmark timings on the Intel i860XR.

McKee, Sally A.; Moyer, Steven A.; Wulf, Wm. A.; Hitchcock, Charles Y.. Increasing Memory Bandwidth for Vector Computations, August 01, 1993.

Abstract: Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "Grand Challenge" scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components "on the other side of the cache" - they should not be treated as uniform access-time RAM. This paper describes the use of hardware-assisted access ordering, a technique that combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. Our simulations show significant speedup on important scientific kernels.

Bailey, Mark W.; Davidson, Jack W.. Target-Sensitive Construction of Diagnostic Programs for Procedure Calling Sequence Generators, October 20, 1995.

Abstract: Building compilers that generate correct code is difficult. In this paper we present a compiler testing technique that closes the gap between actual compiler implementations and correct compilers. Using formal specifications of procedure calling conventions, we have built a target-sensitive test suite generator that builds test cases for a specific aspect of compiler code generators: the procedure calling sequence generator. By exercising compilers with these target-specific test suites, our automated testing tool has been able to expose and isolate 8 bugs in heavily used production-quality compilers. These bugs cause more than 700 test cases to fail.

Davidson, Jack W.; Jinturkar, Sanjay. An Aggressive Approach to Loop Unrolling, June 1995 (Revised October 1995).

Abstract: A well-known code transformation for improving the execution performance of a program is loop unrolling. The unrolled loop usually requires fewer instruction executions than the original loop. The reduction in instruction executions comes from two sources: the number of branch instructions executed is reduced, and the index variable is modified fewer times. In addition, for architectures with features designed to exploit instruction-level parallelism, loop unrolling can expose greater levels of instruction-level parallelism. Loop unrolling is an effective code transformation often improving the execution performance of programs that spend much of their execution time in loops by 10 to 30 percent. Unfortunately, it has not been studied as extensively as other code improvements such as register allocation or common subexpression elimination. The result is that many compilers employ simplistic loop unrolling algorithms that miss many opportunities for improving run-time performance. This paper describes how aggressive loop unrolling is done in a retargetable optimizing compiler. Using a set of 32 benchmark programs, the effectiveness of this more aggressive approach to loop unrolling is evaluated. Interactions between loop unrolling and other common optimizations are also discussed.
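
Although the report's transformation operates on machine-level loops inside a retargetable compiler, the basic unrolling idea can be illustrated with a small sketch in Python (function names are ours, not the paper's):

```python
def sum_rolled(a):
    # Original loop: one index update and one loop branch per element.
    total = 0
    i = 0
    while i < len(a):
        total += a[i]
        i += 1
    return total

def sum_unrolled4(a):
    # Unrolled by a factor of 4: the loop branch and the index update
    # now execute once per four elements instead of once per element.
    total = 0
    i = 0
    n = len(a)
    while i + 3 < n:
        total += a[i]
        total += a[i + 1]
        total += a[i + 2]
        total += a[i + 3]
        i += 4
    # Cleanup loop handles the leftover iterations when n % 4 != 0.
    while i < n:
        total += a[i]
        i += 1
    return total
```

The four independent additions in the unrolled body are also what exposes instruction-level parallelism to a superscalar scheduler.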

Davidson, Jack W.; Jinturkar, Sanjay. Improving Instruction-level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation, February 1995 (Revised October 1995).

Abstract: Exploitation of instruction-level parallelism is an effective mechanism for improving the performance of modern superscalar/VLIW processors. Various software techniques can be applied to increase instruction-level parallelism. This paper describes and evaluates a software technique, dynamic memory disambiguation, that permits loops containing loads and stores to be scheduled more aggressively, thereby exposing more instruction-level parallelism. The results of our evaluation show that when dynamic memory disambiguation is applied in conjunction with loop unrolling, register renaming, and static memory disambiguation, the ILP of memory-intensive benchmarks can be increased by as much as 300 percent over loops where dynamic memory disambiguation is not performed. Our measurements also indicate that for the programs that benefit the most from these optimizations, the register usage does not exceed the number of registers on most high-performance processors.

Bailey, Mark W.; Davidson, Jack W.. Computing System Descriptions for Systems Software, February 24, 1995 (revised September 19, 1995).

Abstract: The proliferation of high-performance microprocessors in recent years has made the development of systems software, such as compilers, assemblers, linkers, debuggers, simulators, and other related tools, more challenging than ever. When a new processor is introduced, each of these applications must be rewritten or retargeted to the new machine. This paper describes a description system, called CSDL, that permits the specification--in a concise, easily understood notation--of all aspects of a computing system that must be known in order to automate the construction of high-quality systems software. Unlike past machine description languages, and as the term computing system indicates, this new description system spans the boundary between hardware and software. CSDL descriptions are modular and extensible, providing a flexible system for specifying computing system information that can be shared among many different applications.

Ferrari, Adam; Filipi-Martin, Adrian; Viswanathan, Soumya. The NAS Parallel Benchmark Kernels in MPL, September 12, 1995.

Abstract: The Numerical Aerodynamic Simulation (NAS) Parallel Benchmarks are a set of algorithmically specified benchmarks indicative of the computation and communication needs of typical large-scale aerodynamics problems. Although a great deal of work has been done with respect to implementing the NAS Parallel Benchmark suite on high-end vector supercomputers, multiprocessors, and multicomputers, only recently has the possibility of running such demanding applications on workstation clusters begun to be explored. We implemented a subset of the NAS benchmarks using the Mentat Programming Language, and ran performance tests on a cluster of workstations using the Mentat system. We compared our performance results to previous NAS benchmark tests using the Parallel Virtual Machine (PVM) system in a similar hardware environment. Owing to algorithmic improvements, the efficient communications provided by the Mentat system, and the low overheads introduced even at Mentat's higher level of programming abstraction, we observed significantly improved performance in a number of cases.

Nguyen-Tuong, Anh; Grimshaw, Andrew S.; Karpovich, John F.. Fault-Tolerance in Coarse Grain Data Flow, August 28, 1995.

Abstract: Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. It is certain that host failures and other faults will be an everyday occurrence in these systems. Unfortunately, contemporary parallel processing systems were not designed with fault-tolerance as a design objective. The data-flow model, long a mainstay of parallel processing, offers hope. The model's functional nature, which makes it so amenable to parallel processing, also facilitates straightforward fault-tolerant implementations. It is the combination of ease of parallelization and fault-tolerance that we feel will increase the importance of the model in the future, and lead to the widespread use of functional components. Using Mentat, an object-oriented, data-flow based, parallel processing system, we demonstrate two orthogonal methods of providing application fault-tolerance. The first method provides transparent replication of actors and requires modification to the existing Mentat run-time system. Providing direct support for replicating actors enables the programmer to easily build fault-tolerant applications regardless of the complexity of their data-flow graph representation. The second method - the checkboard method - is applicable to applications that contain independent and restartable computations such as "bag of tasks" computations, Monte Carlo simulations, and pipelines, and involves some simple restructuring of code. While these methods are presented separately, they could in fact be combined. For both methods, we present experimental data to illustrate the trade-offs between fault-tolerance, performance, and resource consumption.

Lewis, Michael J.; Grimshaw, Andrew. The Core Legion Object Model, August 1995.

Abstract: This document describes the core Legion object model. The model specifies the composition and functionality of Legion's core objects - those objects that cooperate to create, locate, manage, and remove objects from the Legion system. The model reflects the underlying philosophy and objectives of the Legion project. In particular, the object model facilitates a flexible extensible implementation, provides a single persistent name space, grants site autonomy to participating organizations, and scales to millions of sites and trillions of objects. Further, it offers a framework that is well suited to providing mechanisms for high performance, security, fault tolerance, and commerce.

Sullivan, Kevin J.; Knight, John C.. Assessment of an Architectural Approach to Large-Scale Systematic Reuse, August 17, 1995.

Abstract: Large-scale systematic reuse promises rapid development of significant systems through straightforward composition of large-scale existing assets. The realization of this promise would provide major benefits in many areas. For example, sophisticated software-engineering tools could be developed rapidly and inexpensively to deliver promising software engineering research results into practice. To date, the promise of large-scale reuse remains largely unrealized. Although some successes have been achieved, barriers remain in a variety of areas: technical, managerial, cultural, and legal. In this paper we address an important technical barrier: architectural mismatch, which has been identified as a major obstacle to large-scale reuse. Recently, architectural frameworks that purport to enable large-scale reuse have been developed. Among them is Microsoft's OLE technology, comprising both an architectural framework and a suite of reusable component applications. The manufacturer presents this technology as a toolkit for the rapid development of applications from components. In reviewing these component applications, we observed that they offer rich features applicable to a number of domains, not merely management information systems or data processing. The components are designed for reuse and seem to span the spectrum of capabilities required to build a wide variety of applications. These capabilities include relational database management, graphical user interface construction, compound document design and storage, constraint-based structured interactive graphics, and diverse computational models, including spreadsheets and general-purpose imperative programming in languages such as C++. In this paper we evaluate the approach to large-scale systematic reuse represented by OLE.
We conclude that, although difficulties remain, such an approach is practical now in many domains, that it substantially overcomes the architectural impediments that have hindered some previous attempts at large-scale reuse, and that it represents significant progress towards realizing the promise of rapid development of sophisticated systems. We report on our prototyping approach to the evaluation of this technology. Our evaluation focused on the ability of the technology to support the development of software engineering tools. We define our evaluation framework, describe our experience developing a specific tool, and present conclusions of our evaluation.

Wulf, William A.; Wang, Chenxi; Kienzle, Darrell. A New Model of Security for Distributed Systems, August 10, 1995.

Abstract: With the rapid growth of the information age, electronic activities of many kinds are becoming more common. The need for protection and security in this environment has never been greater. The conventional approach to security has been to enforce a system-wide policy, but this approach will not work for large distributed systems where entirely new security issues and concerns are emerging. We argue that a new model is needed that shifts the emphasis from "system as enforcer" to user-definable policies in which the cost scales with the degree of security required. Users ought to be able to select the level of security they need and pay only the overhead necessary to achieve it. Moreover, ultimately, they must be responsible for their own security. This research is being carried out in the context of the Legion project. We start by describing the objectives and philosophy of the overall project and then present our conceptual model and tentative implementation plan. A set of technical challenges is also addressed.

Christie, Robert. Design and Analysis of a Multimedia Network Architecture, May 12, 1995.

Abstract: A paradigm shift is underway in how computer networks are used. The new breed of applications that incorporate multimedia add a new dimension of complexity to network management because their service requirements are quite different from those of older applications. The problem lies in the service discipline of the network, where routers generally use a single queue which operates in a first-come-first-served manner. As traffic through the router increases, packet delays become longer to the point where the router's internal queue overflows and packets are dropped. This project addresses the design of a packet-switched network that can support soft real-time applications such as multimedia. We focus on the communications protocol, the router design, and the resource reservation and admission control policies that will allow the overall system to operate. We provide a survey of related work on design issues concerning network service paradigms, traffic policing mechanisms, resource administration mechanisms, and resource reservation protocols. We then discuss the design and implementation of our router, end-systems, and admission control policies. We show that with the proper choice of protocols, traffic policing, and resource reservation within the router, a packet-switched network can support guaranteed quality-of-service for multimedia communications.

Sullivan, Kevin J.; Kalet, Ira J.; Notkin, David. Mediators in a Radiation Treatment Planning Environment, June 25, 1995.

Abstract: We describe the architecture of Prism, an integrated system for planning radiation treatments for cancer patients. This architecture is based on the mediator design approach. The problem addressed by this approach is that common methods of designing integrated systems lead to undue module coupling, significantly complicating software development and evolution. The mediator approach was devised to ameliorate the conflict between integration and ease of software development and evolution. It does this by enabling the composition of visible and independently defined components into behaviorally integrated systems. We present Prism as evidence for two claims: first, the mediator method can overcome the problem of coupling, easing the design and evolution of real integrated systems; second, the method can profitably be learned and used by practicing software engineers. Prism continues to evolve in clinical use at the University of Washington and Emory University Cancer Centers, and in trials at other institutions, including the University of Miami. It is an attractive subject for continuing research on integration, software structure, and the evolution of mediator-based architectures.

Karpovich, John F.; Grimshaw, Andrew S.; French, James C.. Breaking the I/O Bottleneck at the National Radio Astronomy Observatory (NRAO), September 1994.

Karpovich, John F.; Grimshaw, Andrew S.; French, James C.. Extensible File Systems (ELFS): An Object-Oriented Approach to High Performance File I/O, July 22, 1994.

Abstract: As CPU performance has rapidly improved, increased pressure has been placed on the performance of accessing external data in order to keep up with demand. Increasingly often, the I/O subsystem and related software are unable to meet this demand, and valuable CPU resources are left underutilized while users are forced to wait longer than necessary for results. This problem is especially acute for many scientific applications that use large data sets and require high performance. This paper presents our experiences working to alleviate an I/O bottleneck in radio astronomy applications at the National Radio Astronomy Observatory (NRAO). We first present our model, the ExtensibLe File Systems model (ELFS), for improving both the performance of data access and the usability of access software. We then present the current situation at NRAO, our solution, performance results, and our plans for future work.

Karpovich, John F.; French, James C.; Grimshaw, Andrew S.. High Performance Access to Radio Astronomy Data: A Case Study, July 11, 1994.

Abstract: Scientific applications often manipulate very large sets of persistent data. Over the past decade, advances in disk storage device performance have consistently been outpaced by advances in the performance of the rest of the computer system. As a result, many scientific applications have become I/O-bound, i.e. their run-times are dominated by the time spent performing I/O operations. Consequently, the performance of I/O operations has become critical for high performance in these applications. The ELFS approach is designed to address the issue of high performance I/O by treating files as typed objects. Typed file objects can exploit knowledge about the file structure and type of data. Typed file objects can selectively apply techniques such as prefetching, parallel asynchronous file access, and caching to improve performance. Also, by typing objects, the interface to the user can be improved in two ways. First, the interface can be made easier to use by presenting file operations in a more natural manner to the user. Second, the interface can allow the user to provide an "oracle" about access patterns that allows the file object to improve performance. By combining these concepts with the object-oriented paradigm, the goal of the ELFS methodology is to create flexible, extensible file classes that are easy to use while achieving high performance. In this paper we present the ELFS approach and our experiences with the design and implementation of two file classes: a two-dimensional dense matrix file class and a multidimensional range-searching file class.

Karpovich, John F.; Judd, Matthew; Strayer, W. Timothy; Grimshaw, Andrew S.. A Parallel Object-Oriented Framework for Stencil Algorithms, January 27, 1993.

Abstract: We present an object-oriented framework for constructing parallel implementations of stencil algorithms. This framework simplifies the development process by encapsulating the common aspects of stencil algorithms in a base stencil class so that application-specific derived classes can be easily defined via inheritance and overloading. In addition, the stencil base class contains mechanisms for parallel execution. The result is a high-performance, parallel, application-specific stencil class. We present the design rationale for the base class and illustrate the derivation process by defining two sub-classes, an image convolution class and a PDE solver. The classes have been implemented in Mentat, an object-oriented parallel programming system that is available on a variety of platforms. Performance results are given for a network of Sun SPARCstation IPCs.
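
The derivation pattern the abstract describes can be sketched compactly. The class and method names below are illustrative only; the actual framework is implemented in Mentat and adds the parallel-execution machinery:

```python
class Stencil:
    # Base class encapsulating the common traversal. Subclasses override
    # apply() with the per-point computation; the real framework would
    # also partition the grid across processors.
    def __init__(self, radius):
        self.radius = radius

    def apply(self, grid, i, j):
        raise NotImplementedError

    def run(self, grid):
        n, m, r = len(grid), len(grid[0]), self.radius
        out = [row[:] for row in grid]  # borders copied unchanged
        for i in range(r, n - r):
            for j in range(r, m - r):
                out[i][j] = self.apply(grid, i, j)
        return out

class Mean3x3(Stencil):
    # Derived class: only the point operation is defined here,
    # via inheritance and overriding.
    def __init__(self):
        super().__init__(radius=1)

    def apply(self, grid, i, j):
        s = sum(grid[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1))
        return s / 9.0
```

An image convolution or a PDE relaxation step would be further subclasses that override only `apply()`.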

Natrajan, Anand; Reynolds Jr., Paul F.; Srinivasan, Sudhir. Consistency Maintenance using UNIFY, June 16, 1995.

Abstract: Distributed simulations comprised of aggregated entities (AEs) and disaggregated entities (DEs) pose critical consistency issues when AEs and DEs interact. Usually, meaningful interaction cannot take place without one of the two representing itself at a level of resolution compatible with the level of the other. Any approach that employs dynamic transitions between aggregated and disaggregated resolution levels suffers from not only potential consistency problems, but also chain disaggregation, network flooding, transition latency, and mapping problems between levels. Solutions meant to solve some or all of these problems leave one critical issue unresolved: proper and efficient maintenance of consistency among the levels of resolution for the same set of objects. An alternative approach, where AE and DE levels of resolution are maintained concurrently for the same objects, encounters the same consistency issues: anything that happens to a DE must be reflected accurately in the AE, and vice versa. The fundamental problem is this: consistency cannot be guaranteed without a unified, coherent approach to correct, efficient consistency maintenance among levels of resolution for a set of simulated objects. In our approach, UNIFY, we contend that multiple levels of resolution should be addressed inside individual simulations, in order to ensure an efficient, coherent, verifiable solution. We propose the concept of MREs (Multiple Resolution Entities) in lieu of AEs and DEs. MREs are capable of representing simulated objects at specified levels of resolution in a consistent manner. An MRE should be able to provide, when requested, bindings for attributes at any desired level in a timely manner. A number of key issues must be addressed in order to provide this capability. The most critical issues are: identification of core data, temporal consistency, and mapping consistency.

Natrajan, Anand; Nguyen-Tuong, Anh. To disaggregate or not to disaggregate, that is not the question, March 23, 1995.

Abstract: The dichotomy between aggregated and disaggregated states is a false one. It is possible for an aggregated entity to be at many levels of aggregation by storing the relevant data of all levels. In this paper, we propose a scheme, UNIFY, wherein each unit either maintains state information at all allowed levels of aggregation or furnishes it on demand. We present problems with traditional approaches towards aggregation such as temporal inconsistency, chain disaggregation and network flooding. We also deal with issues that we envisage will beset the simulation world such as aggregation of dissimilar entities, dynamic aggregation and the perceiver problem. We describe a framework with which these problems could be solved or tackled better. We study the benefits and disadvantages of UNIFY and propose new directions for research. Finally, we analyze the demands made by our scheme on network, memory and CPU resources.

Harper, Roger R.. Interoperability of Parallel Systems: Running PVM Applications in the Legion Environment, May 3, 1995.

Abstract: Parallel programming systems based on loosely coupled networks of computers are becoming more and more popular. One of the more popular systems is Parallel Virtual Machine (PVM), which allows a user to think of a set of computers as a single parallel machine. The Legion project at the University of Virginia is another attempt to provide users with the illusion that a collection of computers is a single virtual machine. This report describes a software library that allows applications written for the PVM system to run in the Legion environment. The report describes the design and development of the library, and compares the performance of the library and PVM on established benchmark applications.

Oh, Yingfeng. Scheduling Periodic Tasks In a Hard Real-Time Environment, January 27, 1995.

Abstract: Consider the traditional problem of scheduling a set of periodic tasks on a multiprocessor system, where task deadlines must be guaranteed. We first derive a general schedulability condition for the Rate-Monotonic algorithm, which yields as special cases both the uniprocessor schedulability condition obtained by Liu and Layland and by Serlin, and the multiprocessor schedulability condition recently derived by Burchard, Liebeherr, Oh, and Son. Then a tight upper bound is obtained for the number of processors required in an optimal schedule for any given set of tasks with a fixed number of tasks and a fixed utilization. Finally, similar conditions are derived for Earliest Deadline First scheduling. These conditions shed new light on the periodic task scheduling problem.
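
The Liu and Layland uniprocessor condition mentioned above is simple enough to state directly; the following sketch checks it. Note the bound is sufficient but not necessary, so a task set failing the test may still be schedulable:

```python
def rm_schedulable(utilizations):
    # Liu & Layland sufficient condition for Rate-Monotonic scheduling
    # on one processor: total utilization <= n * (2**(1/n) - 1),
    # where n is the number of periodic tasks.
    n = len(utilizations)
    if n == 0:
        return True
    return sum(utilizations) <= n * (2 ** (1.0 / n) - 1)
```

As n grows, the bound n * (2**(1/n) - 1) decreases toward ln 2 (about 0.693), which is why multiprocessor partitioning results such as those cited above matter.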

Srinivasan, Sudhir; Reynolds Jr., Paul F.. Adaptive algorithms vs. Time Warp: An analytical comparison, April 6, 1995.

Abstract: Adaptive synchronization algorithms have been proposed to improve upon purely conservative and purely optimistic algorithms. Experimental studies have indeed provided encouraging results. In the spirit of previous analyses, we present the first known analytical comparison of adaptively optimistic algorithms with the Time Warp protocol. We define a class of adaptive protocols, the asynchronous adaptive waiting protocols (AAWPs), and identify several practical protocols that belong to this class. We show that Time Warp can outperform an AAWP arbitrarily. We describe NPSI adaptive protocols, a sub-class of AAWPs, and specify a member of this sub-class, the Elastic Time Algorithm. We show that this algorithm can outperform Time Warp arbitrarily.

Srinivasan, Sudhir; Supinski, Bronis de. Multicasting in DIS: A unified solution, March 21, 1995.

Abstract: Multicasting, as an alternative to broadcasting, has great potential to improve DIS scalability by reducing the demands on network bandwidth and computational resources. We take an eclectic approach to incorporating multicasting into DIS, blending new ideas based on insights discussed here with the best schemes proposed previously. To our knowledge, no previous work has provided such a unification. The simplicity and completeness of our approach make it more promising than previous ones. In general (and in DIS in particular), the multicast problem should be considered as consisting of two parts: i) definition and use of multicast groups, and ii) efficient implementation of these groups. We describe a method for defining and using multicast groups, present its implementation requirements and show these to be realizable. Further, we provide insights into the configuration of DIS networks in order to obtain the most benefit from multicasting. Finally, we note that management of static objects is inherently related to any multicast solution. The integration of static object management with our multicast solution is made possible by its static nature and use of a regular grid.
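
A regular grid of the kind the abstract relies on maps naturally onto multicast groups: each grid cell is one group, an entity transmits on its own cell's group, and it subscribes to the cells overlapped by its interest region. The cell size, the square interest region, and the function names below are our simplifying assumptions, not the paper's design:

```python
def cell_of(x, y, cell_size):
    # Map a position to its grid cell; each cell corresponds to one
    # multicast group (the cell-to-group naming is hypothetical).
    return (int(x // cell_size), int(y // cell_size))

def groups_of_interest(x, y, radius, cell_size):
    # Cells overlapped by a square interest region of the given radius.
    # An entity subscribes to these groups and transmits on its own cell,
    # so subscriptions change only when it crosses a cell boundary.
    x0, y0 = cell_of(x - radius, y - radius, cell_size)
    x1, y1 = cell_of(x + radius, y + radius, cell_size)
    return {(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)}
```

Compared with broadcasting, each receiver then processes only traffic from nearby cells, which is the bandwidth and CPU saving the abstract refers to.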

Grimshaw, Andrew S.; Nguyen-Tuong, Anh; Wulf, William A.. Campus-Wide Computing: Early Results Using Legion at The University of Virginia, March 27, 1995.

Abstract: The Legion project at the University of Virginia is an attempt to provide system services that create the illusion of a single virtual machine for users, one that offers both improved response time via parallel execution and greater throughput. Legion is targeted toward both workstation clusters and larger, wide-area assemblies of workstations, supercomputers, and parallel supercomputers. Rather than construct Legion from scratch, we are extending an existing object-oriented parallel processing system by aggressively incorporating lessons learned over twenty years by the heterogeneous distributed systems community. The campus-wide virtual computer (CWVC) is an early Legion prototype. In this paper we present challenges that had to be overcome to realize a working CWVC, as well as performance on a production biochemistry application.

Oh, Yingfeng; Son, Sang H.. Fixed-Priority Scheduling of Periodic Tasks on Multiprocessor Systems, March 15, 1995.

Abstract: Consider the problem of periodic task scheduling, in which we seek to minimize the total number of processors required to execute a set of tasks such that task deadlines are guaranteed by the Rate-Monotonic (or RM) algorithm on each processor. This problem was first investigated by Dhall and Liu, and the previously best known bound for the problem was 2.0. In this paper, an improved solution is given by designing a new algorithm. The algorithm, called RM-First-Fit-Decreasing-Utilization (or RM-FFDU), is shown to have a worst-case tight bound of 5/3 = 1.66..., the lowest upper bound yet derived for this scheduling problem. Simulation studies show that, on average, the new algorithm performs consistently better than those in the literature.
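
The first-fit-decreasing-utilization skeleton can be sketched as below. For illustration the per-processor admission test here is the Liu-Layland utilization bound; the paper's RM-FFDU uses a sharper schedulability condition, which is where its 5/3 bound comes from:

```python
def ffdu_partition(utilizations):
    # First-Fit Decreasing-Utilization: sort tasks by utilization in
    # decreasing order, then place each task on the first processor
    # that still passes a schedulability test.
    def fits(proc, u):
        # Liu-Layland bound used as the admission test (illustrative,
        # not the paper's exact RM-FFDU condition).
        n = len(proc) + 1
        return sum(proc) + u <= n * (2 ** (1.0 / n) - 1)

    procs = []
    for u in sorted(utilizations, reverse=True):
        for p in procs:
            if fits(p, u):
                p.append(u)
                break
        else:
            procs.append([u])  # open a new processor
    return procs
```

The number of returned processors is the quantity being minimized; the analysis bounds it relative to an optimal partition.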

Oh, Yingfeng; Son, Sang H.. On a Constrained Bin-packing Problem, March 3, 1995.

Abstract: We study a bin-packing problem that is one-dimensional and is constrained in the manner in which items are placed into bins. The problem is motivated by a practical real-time scheduling problem, in which redundant periodic tasks must be assigned to a multiprocessor system. The problem is stated in the traditional form: use as few bins as possible to pack a given list of items. It is a generalization of the classical bin-packing problem. We first propose a heuristic algorithm called First-Fit-K to solve the bin-packing problem, and then prove that First-Fit-K has an asymptotic worst-case tight bound of 1.7. We also study the average-case performance of the algorithm. The simulation results show that First-Fit-K performs within 10% of the optimal solution.
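
The constraint in question can be made concrete with a small sketch (our reading of the setting, not the paper's exact First-Fit-K): each item has k redundant copies, and for fault tolerance those k copies must land in k distinct bins:

```python
def first_fit_k(items, capacity=1.0):
    # Constrained first fit: items is a list of (size, k) pairs.
    # The k copies of an item go into k distinct bins, since each bin
    # is visited at most once per item in the placement loop.
    bins = []
    for size, k in items:
        placed = 0
        for b in bins:
            if placed == k:
                break
            if sum(b) + size <= capacity:
                b.append(size)
                placed += 1
        while placed < k:
            bins.append([size])  # open new bins for remaining copies
            placed += 1
    return bins
```

With k = 1 for every item this degenerates to classical first fit, which is why the problem generalizes classical bin packing.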

Srinivasan, Sudhir; Reynolds Jr., Paul F.. NPSI adaptive synchronization algorithms for PDES, Nov. 8, 1994 (revised April 6, 1995).

Abstract: Research in parallel discrete event simulation indicates that neither purely conservative nor purely optimistic synchronization algorithms will perform well consistently. We survey several new approaches that attempt to improve performance by limiting optimistic execution. In most of these, the criterion for limiting optimism is static or based on local information, which conflicts with the dynamic nature of discrete event simulations. We contend that an adaptive approach based on low-cost, near-perfect system state information is the most likely to yield a consistently efficient synchronization algorithm. We suggest a framework by which NPSI (near-perfect state information) adaptive protocols could be designed and describe the first such protocol, the Elastic Time Algorithm. We present performance results from an implementation of this algorithm which show that adaptive protocols based on the use of NPSI are promising. In particular, we show that NPSI adaptive protocols have the capacity to be more efficient than Time Warp in both time and space. We identify major issues in the design and usability of NPSI protocols and discuss ongoing research.

Oh, Yingfeng; Son, Sang H.. A Processor-Efficient Scheme for Supporting Fault-Tolerance in Rate-Monotonic Scheduling, February 28, 1995.

Abstract: We address the issue of supporting fault-tolerance in a real-time system, where the deadlines of periodic tasks are guaranteed by the Rate-Monotonic algorithm. The problem is stated as one of minimizing the total number of processors required to execute a set of periodic tasks, each of which, for fault-tolerance purposes, has multiple versions. A simple but effective algorithm is proposed to solve the task allocation problem. The algorithm is shown to have superior worst-case and average-case performance.

Reynolds Jr., Paul F.; Williams, Craig; Wagner Jr., Raymond R.. Isotach Networks, February 1995.

Abstract: We introduce a class of networks called isotach networks designed to reduce the cost of concurrency control in asynchronous computations. Isotach networks support several properties important to the correct execution of parallel and distributed computations: atomicity, causal message delivery, sequential consistency, and memory coherence in systems in which shared data can replicate and migrate. They allow processes to execute atomic actions without locks and to pipeline memory accesses without sacrificing sequential consistency. Isotach networks can be implemented in a wide variety of configurations, including NUMA (non-uniform memory access) multiprocessors and distributed as well as parallel systems. Networks that implement isotach time systems are characterized not by their topology, but by the guarantees they make about the relative order in which messages appear to be delivered. These guarantees are expressed in logical time, not physical time. Physical time guarantees would be prohibitively expensive, whereas logical time guarantees can be enforced cheaply, using purely local knowledge, and yet are powerful enough to support efficient techniques for coordinating asynchronously executing processes. Empirical and analytic studies of isotach systems show that they outperform conventional systems under realistic workloads, in some cases by an order of magnitude or more.

Varanelli, J.M.; Cohoon, J.P.. A Fast Method for Generalized Starting Temperature Determination in Monotonically Cooling Two-Stage Simulated Annealing Systems (supersedes CS-93-52), September 1993; revised February 1995.

Sarkar, Ambar; Waxman, Ronald; Cohoon, James P.. Specification-Modeling Methodologies for Reactive-System Design, January 1995.

Abstract: The goal of this paper is to investigate the state-of-the-art in specification-modeling methodologies applicable to reactive-system design. By combining the specification requirements of a reactive system and the desirable characteristics of a specification-modeling methodology, we develop a unified framework for evaluating any specification-modeling methodology applicable to reactive-system design. A unified framework allows the designer to look at the spectrum of choices available and quickly comprehend the suitability of a methodology to the specific application. Using the unified framework, we study a number of representative methodologies, identifying their respective strengths and weaknesses when evaluated for the desired characteristics. The differences and relationships between the various methodologies are highlighted. We find our framework to be quite useful in evaluating each methodology. A summary of our observations is presented, together with recommendations for areas needing further research in specification modeling for reactive systems. Two such areas are improving model continuity and providing better complexity control, especially across different abstraction levels and modeling domains. We also present a description of each methodology studied in the unified framework.

Barros, Julio; French, James; Martin, Worthy; Kelly, Patrick. System for indexing multi-spectral satellite images for efficient content-based retrieval, January 1995.

Abstract: Current feature-based image databases can typically perform efficient and effective searches on scalar feature information. However, many important features, such as graphs, histograms, and probability density functions, have more complex structure. Mechanisms to manipulate complex feature data are not currently well understood and must be further developed. The work we discuss in this paper explores techniques for the exploitation of spectral distribution information in a feature-based image database. A six band image was segmented into regions and spectral information for each region was maintained. A similarity measure for the spectral information is proposed and experiments are conducted to test its effectiveness. The objective of our current work is to determine if these techniques are effective and efficient at managing this type of image feature data.

Oh, Yingfeng; Son, Sang H.. Scheduling Real-Time Tasks for Dependability, January 17, 1995.

Abstract: Real-time systems are increasingly used in applications whose failure may result in large economic and human costs. Since many of these systems operate in environments that are non-deterministic, and even hazardous, it is extremely important that they be dependable, i.e., that the deadlines of tasks be met even in the presence of certain failures. In order to enhance the dependability of a real-time system, we study the problem of scheduling a set of real-time tasks to meet their deadlines even in the presence of processor failures. We first prove that the problem of scheduling a set of non-preemptive tasks on more than two processors to tolerate one arbitrary processor failure is NP-complete even when the tasks share a common deadline. A heuristic algorithm is then proposed to solve the problem. The schedule generated by the heuristic algorithm can tolerate one arbitrary processor failure in the worst case. The analysis and experimental data show that the performance of the heuristic algorithm is near-optimal.

Viles, Charles L.; French, James C.. Dissemination of Collection Wide Information in a Distributed Information Retrieval System, January 6, 1995.

Abstract: We find that dissemination of collection wide information (CWI) in a distributed collection of documents is needed to achieve retrieval effectiveness comparable to a centralized collection. Complete dissemination is unnecessary. The required dissemination level depends upon how documents are allocated among sites. Low dissemination is needed for random document allocation, but higher levels are needed when documents are allocated based on content. We define parameters to control dissemination and document allocation and present results from four test collections. We define the notion of iso-knowledge lines with respect to the number of sites and level of dissemination in the distributed archive, and show empirically that iso-knowledge lines are also iso-effectiveness lines when documents are randomly allocated.

Jones, John D.; French, James C.. An Archive Service with Persistent Naming for Objects, August 1994.

Abstract: Wide-area systems for information storage and retrieval are rapidly gaining in popularity. Examples include FTP (File Transfer Protocol), Gopher, and World Wide Web (WWW) archives of many types of information. Each system provides a means of naming a file or data object so that others may retrieve the information. Unfortunately, this name depends on the network address of the server and on the filename within that machine. Some flexibility is gained by using aliases for these names, but problems persist. Additionally, the use of aliases does not handle the replication of files at several sites, or the movement of a file from one site to another. The result is that these names frequently become invalid. A name that is good today may not work in a week or a month. If a name for a useful collection is passed on to others, it may not work by the time they try to use it. For these reasons, a better approach to naming is needed. In this paper we present a prototype distributed service for archiving files. Each file is given a permanent name, which can be used to retrieve the file from any site of the archive. If the contacted site does not have the data, it returns the current location, which is accessed automatically by the client. This allows files to be moved and replicated within the archive without invalidating any of the names. In a hypermedia environment such as the WWW, this means that the links will remain valid, avoiding the need to change each document that references this data. We have developed this system to demonstrate that this approach is feasible. We describe both a standalone service that is accessed via a command-line client, and a gateway to the service from the World Wide Web. The gateway allows the archive to be accessed from NCSA Mosaic, or any client that can use the Hypertext Transfer Protocol (HTTP). The service is assumed to be global in scope, so issues of scalability are critical. This influenced our design, and led us to experiment with caching of names and automatic replication of frequently accessed files. We have, however, omitted the implementation of a distributed name resolver, using a centralized one instead. A distributed implementation would be required for an actual service, and we describe how this might be built. The task of locating an archive site is also discussed.
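The location-forwarding idea in this abstract, where a permanent name survives a file's move because the old site redirects the client, can be sketched as a toy in-memory model (the `Archive` class and its method names are hypothetical illustrations, not the paper's API):

```python
class Archive:
    """Toy model of location-independent naming: a contacted site either
    serves the object or the client follows the redirect to its current
    home. A hypothetical simplification of the paper's design."""

    def __init__(self):
        self.sites = {}      # site -> {permanent name: data}
        self.locations = {}  # permanent name -> current site

    def store(self, site, name, data):
        self.sites.setdefault(site, {})[name] = data
        self.locations[name] = site

    def move(self, name, new_site):
        # Relocate the object; the permanent name stays valid.
        old_site = self.locations[name]
        self.sites.setdefault(new_site, {})[name] = self.sites[old_site].pop(name)
        self.locations[name] = new_site

    def fetch(self, site, name):
        # Contact a site; if the object has moved, follow the redirect.
        objects = self.sites.get(site, {})
        if name in objects:
            return objects[name]
        return self.fetch(self.locations[name], name)
```

After a `move`, a client still holding the old site's address retrieves the file transparently, which is the property that keeps hypermedia links valid.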

Varanelli, James M.; Cohoon, James P.. A Two-Stage Simulated Annealing Methodology, November, 1994.

West, Emily A.; Grimshaw, Andrew S.. Braid: Integrating Task and Data Parallelism, November 11, 1994.

Abstract: Archetypal data parallel or task parallel applications are well served by contemporary languages. However, for applications containing a balance of task and data parallelism the choice of language is less clear. While there are languages that enable both forms of parallelism, e.g., one can write data parallel programs using a task parallel language, there are few languages that directly support both. We present a set of data parallel extensions to the Mentat Programming Language (MPL) which allow us to integrate task parallelism, data parallelism, and nested task and data parallelism within a single language on top of a single run-time system. The result is an object-oriented language, Braid, that supports both task and data parallelism on MIMD machines. In addition, the data parallel extensions define a language in and of themselves, one that makes a number of contributions to the data parallel programming style. These include subset-level operations (a more general notion of element-level operations), compiler provided iteration within a data parallel data set, and the ability to define complex data parallel operations.

DeLong, M. A.; Ortega, J. M.. SOR as a Preconditioner, November 4, 1994.

Abstract: We show by experimental results on some convection-diffusion type equations that the SOR iteration may be a promising preconditioner in conjunction with the GMRES method. Our results indicate that it is critical to take several Gauss-Seidel or SOR iterations, rather than just one, and that at least a factor of two improvement over Gauss-Seidel can be expected with a reasonable approximation to the optimal omega. This approximation must be on the low side of the optimum, however, as an omega only slightly too large can lead to no convergence. We also show that the red/black ordering leads to no degradation and is usually slightly beneficial. Thus, we expect good parallel results although the current experiments are only on a serial machine. Limited experiments with BiCGSTAB do not show similar improvements for Gauss-Seidel preconditioning although a suitable omega can give a factor of two speedup.
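The preconditioning step this abstract evaluates, several SOR sweeps applied to a right-hand side, can be sketched for a small dense system (a minimal illustration; `sor_sweeps` is a hypothetical name, and a real GMRES preconditioner would operate on a sparse discretization):

```python
def sor_sweeps(A, b, omega, k):
    # Apply k SOR sweeps to A x = b starting from x = 0. The result
    # approximates the preconditioner application M^{-1} b; with
    # omega = 1 each sweep is a Gauss-Seidel iteration.
    n = len(b)
    x = [0.0] * n
    for _ in range(k):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs = (b[i] - s) / A[i][i]          # Gauss-Seidel value
            x[i] = x[i] + omega * (gs - x[i])  # over-relax toward it
    return x
```

With many sweeps on a symmetric positive definite system the iterate converges to the exact solution; as a preconditioner one would take only a few sweeps, and, per the abstract's caution, keep omega on the low side of the optimum.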

Lehr, Matthew R.; Son, Sang H.. StarBase Research Summary, Autumn 1994, October 31, 1994.

Abstract: This report describes the research activities pertaining to the StarBase project at the University of Virginia's Department of Computer Science. StarBase, a firm real-time database management system (RT-DBMS), is remarkable in that it is constructed on top of an actual real-time microkernel. StarBase seeks to provide a general RT-DBMS environment which can handle a variable transaction load, rather than statically scheduling a known set of transactions. This report presents an overview of the StarBase RT-DBMS as it operates today and describes its development environment. The accomplishments of the last two years of research are reviewed and the critical features needed to support such an RT-DBMS are examined.

Benitez, Manuel E.; Davidson, Jack W.. Target-specific Global Code Improvement: Principles and Applications, November 4, 1994.

Abstract: Current and future high-performance systems require language processors that can generate code that fully exploits the power of the underlying architecture. A key and necessary component of such language processors is a global code improver. This article describes the key principles behind the design and implementation of a global code improver that has been used to construct several high-quality compilers and other program transformation and analysis tools. The code improver, called vpo, employs a paradigm of compilation that has proven to be flexible and adaptable: all code-improving transformations are performed on a target-specific representation of the program. The aggressive use of this paradigm yields a code improver with several valuable properties. Four properties stand out. First, vpo is language and compiler independent. That is, it has been used to implement compilers for several different computer languages. For the C programming language, it has been used with several front ends, each of which generates a different intermediate language. Second, because all code improvements are applied to a single low-level intermediate representation, phase-ordering problems are minimized. Third, vpo is easily retargeted and handles a wide variety of architectures. In particular, vpo's structure allows new architectures and new implementations of existing architectures to be accommodated quickly and easily. Fourth and finally, because of its flexible structure, vpo has several other interesting uses in addition to its primary use in an optimizing compiler. This article describes the principles that have driven the design of vpo and the implications of these principles for vpo's implementation. The article concludes with a brief description of vpo's use as a back end with front ends for several different languages, and its use as a key component in the realization of several other applications.

Barros, Julio; French, James; Martin, Worthy; Kelly, Patrick; White, James M.. Indexing Multispectral Images for Content-Based Retrieval, 26 October 1994.

Abstract: This paper discusses our view of image databases, content-based retrieval, and our experiences with an experimental system. We present a methodology in which efficient representation and indexing are the basis for retrieval of images by content as well as associated external information. In the example system images are indexed and accessed based on properties of the individual regions in the image. Regions in each image are indexed by their spectral characteristics, as well as by their shape descriptors and position information. The goal of the system is to reduce the number of images that need to be inspected by a user by quickly excluding substantial parts of the database. The system avoids exhaustive searching through the image database when a query is submitted.

Bailey, Mark W.; Davidson, Jack W.. A Formal Model and Specification Language for Procedure Calling Conventions, Oct. 21, 1994.

Abstract: Procedure calling conventions are used to provide uniform procedure-call interfaces. Applications such as compilers and debuggers, which generate or process procedures at the machine-language level of abstraction, require knowledge of the calling convention. In this paper, we develop a formal model for procedure calling conventions called P-FSA's. Using this model, we are able to ensure a number of completeness and consistency properties of calling conventions. Currently, applications that manipulate procedures implement conventions in an ad-hoc manner. The resulting code is complicated with details, difficult to maintain, and often riddled with errors. To alleviate the situation, we introduce a calling convention specification language, called CCL. The combination of CCL and P-FSA's facilitates the accurate specification of conventions that can be shown to be both consistent and complete.

Ganley, J. L.; Cohoon, J. P.. Optimal Rectilinear Steiner Tree Routing in the Presence of Obstacles (supersedes CS-92-39, CS-93-15, and CS-93-19), January 1994.

Abstract: This paper presents a new model for VLSI routing in the presence of obstacles that transforms any routing instance from a geometric problem into a graph problem. It is the first model that allows computation of optimal obstacle-avoiding rectilinear Steiner trees in time corresponding to the instance size (the number of terminals and obstacle border segments) rather than the size of the routing area. For the most common multi-terminal critical nets---those with three or four terminals---we observe that optimal trees can be computed as efficiently as good heuristic trees, and present algorithms that do so. For nets with five or more terminals, we present algorithms that heuristically compute obstacle-avoiding Steiner trees. Analysis and experimental results demonstrate that the model and algorithms work well in both theory and practice. Also presented are several theoretical results: a derivation of the Steiner ratio for obstacle-avoiding rectilinear Steiner trees, and complexity results for two special cases of the problem.
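For the three-terminal nets this abstract singles out, the obstacle-free optimum is classical: the rectilinear Steiner minimal tree is a star centered at the coordinate-wise median of the terminals, with length equal to the half-perimeter of the bounding box. A minimal sketch of that base case (hypothetical function name; the paper's algorithms additionally route around obstacles):

```python
def steiner_3pin(terminals):
    # Optimal rectilinear Steiner tree for exactly 3 terminals with no
    # obstacles: a star at the coordinate-wise median. Its total wire
    # length is the half-perimeter of the terminals' bounding box.
    xs = sorted(p[0] for p in terminals)
    ys = sorted(p[1] for p in terminals)
    center = (xs[1], ys[1])                    # median x, median y
    length = (xs[2] - xs[0]) + (ys[2] - ys[0]) # bounding-box half-perimeter
    return center, length
```

This closed form is why optimal trees for such small nets cost no more to compute than heuristic ones.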

CS-94-18 (Not available on-line)
Dunn, M. F.; Knight, J. C.. The Role of Domain Analysis in Quality Assurance, June 1994.

Viles, Charles L.; French, James C.. Availability and Latency of World Wide Web Information Servers, September 1994 (revised October 31, 1994).

Abstract: During a 90 day period in 1994, we measured the availability and connection latency of HTTP (hypertext transfer protocol) information servers. These measurements were made from a site in the Eastern United States. The list of servers included 189 servers from Europe and 324 servers from North America. Our measurements indicate that on average, 5.0% of North American servers and 5.4% of European servers were unavailable from the measurement site on any given day. As seen from the measurement site, the day-to-day variation in availability was much greater for the European servers than for the North American servers. The measurements also show a wide variation in availability for individual information servers. For example, more than 80% of all North American servers were available at least 95% of the time, but 5% of the servers were available less than 80% of the time. The pattern of unavailability suggests a strong correlation between unavailability and geographic location. Median connection latency from the measurement site was in the 0.2 - 0.5 s range to other North American sites and the 0.4 - 2.5 s range to European sites, depending upon the day of the week. Latencies were much more variable to Europe than to North America. The magnitude of the latencies suggests the addition of an MGET method to HTTP to help alleviate the large TCP set-up times associated with the retrieval of web pages with embedded images. The data show that 97% and 99% of all successful connections from the measurement site to Europe and North America respectively were made within the first 10 s. This suggests the establishment of client-side time-out intervals much shorter than those used for normal TCP connection establishment.

CS-94-13 (Not available on-line)
Benitez, M. E.. Register Allocation and Phase Interactions in Retargetable Optimizing, April 18, 1994.

CS-93-70 (Not available on-line)
Dickens, P. M.. Analysis of the Aggressive Global Windowing Algorithm, January 1993.

CS-93-50 (Not available on-line)
Wulf, W. A.; Prey, J. C.. Laboratory Experience for an Introductory Computer Science Course, September 14, 1993.

CS-93-49 (Not available on-line)
Dunn, M. F.. Cellular Tools Extended Domain Analysis, August 31, 1993.

CS-93-47 (Not available on-line)
Weaver, A. C.. Teleradiology, Teleconferencing and Teleconsultation, June 1, 1993.

CS-93-44 (Not available on-line)
Street, F.; Weaver, A. C.. A Video Mail Distribution System for Networked Personal Computers, June 1, 1993.

Dunn, M. F.; Knight, J. C.. Certification of Reusable Software Parts, August 31, 1993.

CS-93-38 (Not available on-line)
Weissman, J. B.; Grimshaw, A. S.. Multigranular Scheduling of Data Parallel Programs, July 8, 1993.

CS-93-35 (Not available on-line)
Wulf, W. A.; Moyer, S. A.. National Science Foundation Workshop on High Performance Memory, June 18, 1993.

CS-93-21 (Not available on-line)
Street, F.; Weaver, A.. A Video Mail Distribution System for Networked Personal Computers, April 16, 1994.

CS-93-20 (Not available on-line)
Michel, J.; Waterman, A.; Weaver, A.. Performance Evaluation of an Off-Host Communications Architecture, April 16, 1993.

CS-93-19 (Not available on-line)
Lavinus, J. W.; Cohoon, J. P.. Routing a Multi-Terminal Critical Net: Steiner Tree Construction in the Presence, April 12, 1993.

CS-93-15 (Not available on-line)
Lavinus, J. W.; Cohoon, J. P.. Complexity Results for Special Cases of Obstacle-Avoiding Rectilinear, March 23, 1993.

CS-93-12 (Not available on-line)
Koeritz, C. A.; Knight, J. C.. Specifying Program Semantics Precisely and Hierarchically: Consequences, January 26, 1993.

CS-93-11 (Not available on-line)
Koeritz, C. A.; Knight, J. C.. Anomalies Encountered in Ada Exception Handling, January 26, 1993.

CS-93-10 (Not available on-line)
Weaver, A.. High Speed Communication for Distributed Applications, January 18, 1993.

CS-93-09 (Not available on-line)
Weaver, A.. XTP: A New Communications Protocol for Factory Automation, January 18, 1993.

CS-93-06 (Not available on-line)
Dickens, P.; Reynolds, P.. Analysis of the Aggressive Global Windowing Algorithm, January 8, 1993.

CS-93-04 (Not available on-line)
Strayer, T.; Weaver, A.. Xpress Transfer Protocol Tutorial, January 7, 1993.

CS-92-39 (Not available on-line)
Lavinus, J. W.; Cohoon, J. P.. An Analogue of Hanan's Theorem in the Presence of Obstacles, November 3, 1992.

CS-92-38 (Not available on-line)
Lavinus, J. W.; Cohoon, J. P.. Optimal Routing of Critical But Typical Nets, October 20, 1992.

CS-92-34 (Not available on-line)
Dunn, M. F.; Knight, J. C.. Automating The Detection of Reusable Parts in Existing Software, September 17, 1992.

CS-92-31 (Not available on-line)
Himmick, S.. A Semantic Base for Verifying the Security of a Computer System, October 5, 1992.

CS-92-29 (Not available on-line)
Bapat, S.; Cohoon, J. P.. A Parallel VLSI Circuit Layout Methodology, September 15, 1992.

CS-92-28 (Not available on-line)
Widener, P.; Briercheck, G. S.; Himmick, S.; McKee, S. A.; Peri, R. V.; Wulf, W. A.. The WM Protection Mechanism, August 1992.

CS-92-25 (Not available on-line)
Pausch, R.; Crea, T.. A Literature Survey for Virtual Environments: Military Flight, August 19, 1992.

CS-92-23 (Not available on-line)
Knight, J. C.; Kienzle, D.. Safety-Critical Computer Applications: The Role of Software Engineering, July 3, 1992.

CS-92-19 (Not available on-line)
Reynolds Jr., P. F.; Williams, C. C.; Wagner, R. R.. Empirical Analysis of Isotach Networks, June 1992.

CS-92-18 (Not available on-line)
Dickens, P. M.; Reynolds Jr., P. F.; Duva, J. M.. State Saving and Rollback Costs for an Aggressive Global Windowing, July 1, 1992.

CS-92-17 (Not available on-line)
Kienzle, D. M.. A Tourist's Guide to Formal Methods, May 29, 1992.

CS-92-16 (Not available on-line)
Ammann, P. E.; Brilliant, S. S.; Knight, J. C.. The Effect of Imperfect Error Detection on Reliability Assessment, May 29, 1992.

CS-92-15 (Not available on-line)
Myers, E. A.; Knight, J. C.. An Improved Software Inspection Technique and An Empirical Evaluation, May 28, 1992.

CS-92-14 (Not available on-line)
Knight, J. C.. Issues in the Certification of Reusable Parts, May 21, 1992.

CS-92-13 (Not available on-line)
Weaver, A. C.; McNabb, J. F.. Digitized Voice Distribution Using XTP and FDDI, May 15, 1992.

CS-92-10 (Not available on-line)
Dickens, P. M.; Reynolds Jr., P. F.; Duva, J. M.. Performance and Scalability Results for an Aggressive Global, April 22, 1992.

CS-92-09 (Not available on-line)
Oh, Yingfeng; Son, S. H.. Fault-Tolerant Real Time Multiprocessor Scheduling, April 9, 1992.

CS-92-07 (Not available on-line)
Son, S. H.; Lee, Juhnyoung; Lin, Yi. Hybrid Protocols Using Dynamic Adjustment of Serialization Order, March 12, 1992.

CS-92-06 (Not available on-line)
Yang, Yiechan; Hsu, Lifeng; Son, S. H.. Distributed Algorithms for Efficient Deadlock Detection and Resolution, February 19, 1992.

CS-92-05 (Not available on-line)
Weaver, A. C.. Briefing Slides for a Feasibility Study of Digitized Voice Distribution, February 11, 1992.

CS-92-04 (Not available on-line)
Weaver, A. C.. A Feasibility Study of Digitized Voice Distribution via the XPress, January 20, 1992.

CS-92-01 (Not available on-line)
Myers, E. A.; Knight, J. C.. Phased Inspections and Their Implementation, January 7, 1992.

CS-91-38 (Not available on-line)
Reynolds Jr., P. F.; Wagner Jr., R. R.. A Local Synchrony Implementation: Banyan Networks, December 20, 1991.

CS-91-37 (Not available on-line)
Bapat, S.; Cohoon, J. P.. Parfect: A Hierarchical VLSI Layout Methodology Based on Geometric Partitioning and Steiner Routing, December 19, 1991.

CS-91-36 (Not available on-line)
Bapat, S.; Cohoon, J. P.. Ruff: The Parfect Tool for Coarse and Global Routing, December 19, 1991.

CS-91-33 (Not available on-line)
Williams, Craig; Reynolds Jr., Paul F.. Combining Atomic Actions in a Recombining Network, November 1991.

CS-91-30 (Not available on-line)
Davidson, J. W.; Whalley, D. B.. An Environment for Integrating Architectural Design with Compiler Development, October 1991.

CS-91-27 (Not available on-line)
Pausch, Randy. A White Paper on Virtual Reality: Fall 1991, October 1991.

CS-91-26 (Not available on-line)
Conway, Matt; Pausch, Randy. An Introduction to External Control, October, 1991.

CS-91-25 (Not available on-line)
Conway, Matt; Pausch, Randy; DeLine, Robert; Shackelford, Anne. A Reference Manual for SUIT, the Simple User Interface Toolkit, Version 2.0, October, 1991.

CS-91-24 (Not available on-line)
Conway, Matt; Pausch, Randy; Passarella, Kimberly. A Tutorial for SUIT, the Simple User Interface Toolkit, Version 2.0, October, 1991.

CS-91-18 (Not available on-line)
Bapat, S.; Cohoon, J. P.; Heck, P. L.; Ju, A.; Randall, L. J.. Examining Routing Solutions, October 1991.

CS-91-17 (Not available on-line)
Wulf, Wm. A.. Development of a Set of 'Closed Laboratories' for an Undergraduate Computer Science Curriculum, September 1991.

CS-91-16 (Not available on-line)
Wulf, Wm. A.; Wad, Rohit. WM FIFOs: Size Analysis, July 1991.

CS-91-15 (Not available on-line)
Richards, Dana; Liestman, Arthur L.. Degree-Constrained Pyramid Spanners, June 1991.

CS-91-13 (Not available on-line)
Son, Sang H.; Beckinger, Robert. MRDB: A Multi-user Real-Time Database Manager for Time-Critical Applications, May 1991.

CS-91-12 (Not available on-line)
Son, Sang H.; Kouloumbis, Spiros. A Real-Time Synchronization Scheme for Replicated Data in Distributed Database Systems, May 1991.

CS-91-11 (Not available on-line)
Son, Sang H.; Kouloumbis, Spiros. Performance Evaluation of Replication Control Algorithms for Distributed Database Systems, May 1991.

CS-91-10 (Not available on-line)
Knight, John C.; Myers, Ethella Ann. Phased Inspections and Their Implementation, May 1991.

CS-91-09 (Not available on-line)
Knight, John C.. The Certification of Reusable Components, April 1991.

CS-91-08 (Not available on-line)
Knight, John C.; Ammann, Paul E.; Santos, Steven J.. The Effectiveness of Data Diversity As An Error Detection, April 1991.

CS-91-06 (Not available on-line)
Holler, Anne M.. A Study of the Effects of Subprogram Inlining, May 1991.

CS-91-05 (Not available on-line)
Benitez, M. E.; Chan, Paul; Davidson, Jack; Holler, Anne; Meloy, Sue; Santhanam, Vatsa. ANDF: Finally an UNCOL After 30 Years, March 1991.

CS-91-04 (Not available on-line)
Wulf, Wm. A.. A Proposal for WM Interprocess Communication, March 1991.

CS-91-03 (Not available on-line)
Wulf, Wm. A.; Jones, Anita K.. WM Protection: The Base Mechanism, March 1991.

CS-91-02 (Not available on-line)
Dempsey, Bert J.; Weaver, Alfred C.. A Reliable Multicast Using the Xpress Transfer Protocol, January 1991.

CS-91-01 (Not available on-line)
Salowe, J. S.. Shallow Interdistance Selection and Interdistance Enumeration, January 1991.

CS-90-31 (Not available on-line)
Davis, Timothy A.; Strayer, W. Timothy; Weaver, Alfred C.. Transmitting Graphics Images Over a Local Area Network, November 1990.

CS-90-30 (Not available on-line)
Simoncic, Robert; Weaver, Alfred C.; Colvin, Alex M.. Experience with the Xpress Transfer Protocol, October 1990.

CS-90-29 (Not available on-line)
Pausch, Randy. A Tutorial for SUIT: The Simple User Interface Toolkit, September 1990.

CS-90-28 (Not available on-line)
Salowe, Jeffrey. Construction of Multidimensional Spanner Graphs, with Applications to Minimum Spanning Trees, September 1990.

CS-90-27 (Not available on-line)
Aylor, J. H.; Cohoon, J. P.; Feldhousen, E. L.; Johnson, B. W.. Gate - A Genetic Algorithm for Compacting Randomly Generated Test Sets, September 1990.

CS-90-26 (Not available on-line)
Bapat, S.; Cohoon, J. P.. Sharp-Looking Geometric Partitioning, September 1990.

CS-90-25 (Not available on-line)
Deshpande, A. S.; Richards, D. S.. A Platform for Biological Sequence Comparison on Parallel Computers, September 1990.

CS-90-24 (Not available on-line)
Ortega, James M.. Ordering for Conjugate Gradient Preconditionings, August 1990.

CS-90-20 (Not available on-line)
Davidson, Jack W.; Whalley, David. Fast Function Calls and Returns, August 1990.

CS-90-19 (Not available on-line)
Jones, Anita K.; Wad, Rohit. The WM Computer Architectures: Military Standard Manual, July 1990.

CS-90-18 (Not available on-line)
Dempsey, Bert J.; Weaver, Alfred C.. Issues in Designing Transport Layer Multicast Facilities, July 1990.

CS-90-17 (Not available on-line)
Pausch, Randy; Leatherby, James H.. A Study Comparing Mouse-Only Input for a Graphical Editor, July 1990.

CS-90-16 (Not available on-line)
Richards, Dana; Liestman, Arthur. Degree-Constrained Pyramid Spanners, July 1990.

CS-90-15 (Not available on-line)
Jones, Anita K.; Aylor, James H.. End-to-End Experimental Systems Research, July 1990.

CS-90-14 (Not available on-line)
Son, Sang H.; Wagle, Prasad. Using Dynamic Priority in Real-Time Database Systems, July 1990.

CS-90-13 (Not available on-line)
Son, Sang H.; Lin, Yi. Concurrency Control Using Priority-Based Locking, June 1990.

CS-90-12 (Not available on-line)
Grimshaw, Andrew S.; Chisholm, D.; Segal, B.; Benitez, M.; Salinas, M.; Kester, P.; McCalla, S.. Implementation Independent Architectural Comparison, June 1990.

CS-90-11 (Not available on-line)
Richards, D. S.; Salowe, J. S.. A Linear-Time Algorithm to Construct a Rectilinear Steiner Minimal Tree for k-Extremal Point Sets, May 1990.

CS-90-10 (Not available on-line)
Davidson, J. W.; Whalley, D. B.. Methods for Saving and Restoring Register Values Across Function, May 8, 1990.

CS-90-09 (Not available on-line)
Grimshaw, A. S.. The Mentat Run-Time System, April 1990.

CS-90-08 (Not available on-line)
Grimshaw, A. S.; Loyot, E.. The Mentat Programming Language Users Manual and Tutorial, April 1990.

CS-90-07 (Not available on-line)
Pausch, Randy. Transactional User Interfaces, March 1990.

CS-90-06 (Not available on-line)
Pausch, Randy; Williams, Ronald D.. Tailor: Creating Custom User Interfaces Based on Gesture, March 1990.

CS-90-05 (Not available on-line)
Wulf, W. A.; Hitchcock, Charles. The WM Family of Computer Architectures, March 1990.

CS-90-04 (Not available on-line)
Dempsey, Bert J.; Weaver, Alfred C.. Requirements for Multicast in a Technical Workstation Environment, March 1990.

CS-90-03 (Not available on-line)
Wagle, Prasad; Son, Sang. Scheduling Using Dynamic Priority in Real-Time Database Systems, March 1990.

CS-90-02 (Not available on-line)
Wulf, Wm. A.. The WM Computer Architecture, January 1990.

CS-90-01 (Not available on-line)
Son, Sang H.; Chiang, Shi-Chin. Experimental Evaluation of a Concurrent Checkpointing Algorithm, January 1990.

CS-89-19 (Not available on-line)
Peden, Jeffery H.; Weaver, Alfred C.. The Utilization of Priorities on Token Ring Networks, August 1989.

CS-89-17 (Not available on-line)
Williams, Craig; Reynolds Jr., Paul F.. On Variables as Access Sequences in Parallel Asynchronous Computations, December 1989.

CS-89-16 (Not available on-line)
Reynolds Jr., Paul F.; Williams, Craig; Wagner Jr., Raymond R.. Parallel Operations, December 1989.

CS-89-15 (Not available on-line)
Dempsey, Bert J.; Strayer, W. Timothy; Weaver, Alfred. Issues in Providing a Reliable Multicast Facility, November 1989.

CS-89-14 (Not available on-line)
Davidson, Jack; Whalley, David. Reducing the Cost of Branches by Using Registers, November 1989.

CS-89-13 (Not available on-line)
Son, Sang H.; Chang, Chun-Hyon. Performance Evaluation of Real-Time Locking Protocols Using a , November 1989.

CS-89-12 (Not available on-line)
Liestman, Arthur; Richards, Dana. Edge-Labelled Gossip in Rands, July 1989.

CS-89-10 (Not available on-line)
Sanders, Robert M.. The Xpress Transfer Protocol (XTP) - A Tutorial, October 1989.

CS-89-09 (Not available on-line)
Wulf, Wm. A.. The WM Computer Architectures Principles of Operation, October 1989.

CS-89-08 (Not available on-line)
Davidson, Jack W.; Whalley, David B.. Ease: An Environment for Architecture Study and Experimentation , September 1989.

CS-89-05 (Not available on-line)
Son, Sang H.; Ratner, Jeremiah M.. Starlite: An Environment for Distributed Database Prototyping, August 1989.

CS-89-01 (Not available on-line)
Haghighi, Navid; Son, Sang H.. Multiple Data Versions in Database Systems, June 1989.

Liebeherr, Jorg; Wrege, Dallas E.. Design and Analysis of a High-Performance Packet Multiplexer for Multiservice Networks with Delay Guarantees, July 1994.

Liebeherr, Jorg; Wrege, Dallas E.; Ferrari, Domenico. Exact Admission Control for Networks with Bounded Delay Services, July 1994.

Bailey, Mark W.; Davidson, Jack W.. A Formal Model for Procedure Calling Conventions, July 22, 1994.

Abstract: Procedure calling conventions are used to provide uniform procedure-call interfaces. Applications, such as compilers and debuggers, that generate or process procedures at the machine-language abstraction level require knowledge of the calling convention. In this paper, we develop a formal model for procedure calling conventions called P-FSA's. Using this model, we are able to ensure a number of completeness and consistency properties of calling conventions. Currently, applications that manipulate procedures implement conventions in an ad-hoc manner. The resulting code is complicated with details, difficult to maintain, and often riddled with errors. To alleviate the situation, we introduce a calling convention specification language, called CCL. The combination of CCL and P-FSA's facilitates the accurate specification of conventions that can be shown to be both consistent and complete.
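
The automaton view can be made concrete with a toy example. The sketch below assumes an invented convention (registers r0/r1 for the first two integer arguments, 4-byte stack slots afterwards) that is not taken from CCL or any real ABI:

```python
# Toy calling convention modeled as a small finite-state machine, in the
# spirit of a P-FSA. The convention itself (r0/r1 for the first two integer
# arguments, then 4-byte stack slots) is hypothetical.

def assign_locations(arg_types):
    """Map each argument type to a register or stack slot."""
    regs_used = 0      # state: integer registers consumed so far
    spilled = False    # state: once an argument hits the stack, all later ones do
    offset = 0
    locations = []
    for t in arg_types:
        if t == "int" and regs_used < 2 and not spilled:
            locations.append("r%d" % regs_used)
            regs_used += 1
        else:
            spilled = True
            locations.append("stack+%d" % offset)
            offset += 4
    return locations

print(assign_locations(["int", "int", "double", "int"]))
```

Consistency checks of the kind the paper describes would then amount to verifying properties of this state machine, for instance that no two arguments ever map to the same location.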

Pancerella, Carmen. Reduction Operations in Parallel Discrete Event Simulation (PhD Dissertation), May 19, 1994.

Abstract: Building on Reynolds's hardware/software framework for parallel discrete event simulation (PDES), we establish a number of novel and best known results based on the use of reduction-based computing to support PDES. We demonstrate the utility of reduction-based computing to a spectrum of well-known PDES synchronization protocols, such as conservative techniques and Time Warp. We enhance the hardware portion of Reynolds's framework at three levels: 1) we define a virtual computation model, 2) we develop a functional design, and 3) we present a detailed implementation of this design. Each of the preceding steps is based on correctness criteria we establish here. We develop novel algorithms for performing reduction-based message acknowledgments. We prove the correctness of one of them, a single phase acknowledgment algorithm that takes advantage of the existence of global virtual time. Finally, we introduce target-specific reductions, a very promising strategy for disseminating near-perfect state information in PDES's. A target-specific reduction is one where each logical process receives synchronization information (reduced values) only from those logical processes on which it is logically dependent. We demonstrate that the computation of target-specific values can have a sub-quadratic sequential time complexity. Supporting empirical results clearly demonstrate that target-specific reductions will provide significant time and space savings in PDES's.

Liebeherr, Jorg; Akyildiz, Ian F.; Sarkar, Debapriya. Bandwidth Regulation of Real-Time Traffic Classes in Internetworks, June 20, 1994.

Dempsey, Bert J.. Retransmission-Based Error Control for Continuous Media Traffic in Packet-Switched Networks (PhD dissertation), June 19, 1994.

Pfaltz, John L.. Partitions of 2^n, June 16, 1994.

Grimshaw, Andrew S.; Wulf, William A.; French, James C.; Weaver, Alfred C.; Reynolds Jr., Paul F.. Legion: The Next Logical Step Toward a Nationwide Virtual Computer, June 08, 1994.

Grimshaw, Andrew S.; Wulf, William A.; French, James C.; Weaver, Alfred C.; Reynolds Jr., Paul F.. A Synopsis of the Legion Project, June 08, 1994.

Lehr, Matthew R.; Son, Sang H.. Managing Contention and Timing Constraints in a Real-Time Database System, June 07, 1994.

Abstract: This technical report discusses how current real-time technology has been applied to a database management system to support firm real-time transactions. The report reviews priority-based CPU- and resource scheduling concepts and shows how they are used to avoid the problem of priority inversion in transaction service order, transaction progress, and memory allocation. Next, the appropriateness of optimistic concurrency control to real-time data management is examined, and the implementation of previously proposed methods WAIT-X(S) and Precise Serialization is detailed. Finally, the enforcement of firm deadlines using asynchronous aborts is discussed.

West, Emily A.. Combining Control and Data Parallelism: Data Parallel Extensions to the Mentat Programming Language, May 18, 1994.

Weissman, Jon B.; Grimshaw, Andrew S.. Network Partitioning of Data Parallel Computations, May 25, 1994.

Son, Sang H.; Agarwal, Nipun. A Model for Specification and Communication of Data for Distributed Multimedia Systems, May 13, 1994.

Pfaltz, John L.. Closure Lattices, May 1, 1994 (Revised).

Abstract: Atomic, upper semi-modular lattices, or geometric lattices, have been well studied because of their connections with matroids, projective geometries, subspaces of vector spaces, and other mathematical areas of interest. Closure lattices, although also very regular, are neither atomic nor semi-modular and so have been largely ignored. Yet, they arise naturally in the study of partial order relations, or acyclic directed graphs.

Alexander, Michael J.; Robins, Gabriel. New Graph Arborescence and Steiner Constructions for High-Performance FPGA Routing, April 11, 1994.

Abstract: The flexibility and reusability of field-programmable gate arrays (FPGAs) enable significant speed and cost savings in the VLSI design/validation/simulation cycle. However, this is achieved at a substantial performance penalty due to signal delays through the programmable interconnect. This motivates a critical-net routing objective which seeks to minimize source-sink signal propagation delays; we formulate this objective as a graph Steiner arborescence (i.e., shortest-path tree with minimum wirelength) problem and propose an effective heuristic which produces routing trees with optimal source-sink pathlengths, and having wirelength on par with the best existing graph Steiner tree heuristics. Our second contribution is a new class of greedy Steiner tree constructions in weighted graphs, based on an iterated application of an arbitrary given graph Steiner heuristic; this construction significantly outperforms the best known graph Steiner tree heuristics of Kou, Markowsky and Berman, and of Zelikovsky. We incorporated our algorithms into an actual router, enabling the complete routing of several industrial designs while requiring a reduced maximum channel width. All of our methods are directly applicable to other graph-based routing regimes, such as building-block design, routing in the presence of obstacles, etc., as well as in non-CAD areas such as multicasting in communication networks.
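
One endpoint of the pathlength/wirelength tradeoff described above can be sketched directly: a Dijkstra shortest-path tree achieves optimal source-sink pathlengths while ignoring total wirelength. This is only a baseline illustration, not the paper's arborescence heuristic:

```python
# Dijkstra shortest-path tree over a weighted graph {node: {neighbor: weight}}.
# An SPT gives optimal source-sink pathlengths but may use far more wire than
# a Steiner arborescence; the paper's heuristic trades between the two.
import heapq

def shortest_path_tree(graph, source):
    """Return (parent, dist) describing the shortest-path tree from source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent, dist

g = {"s": {"a": 1, "b": 4}, "a": {"b": 2, "s": 1}, "b": {"a": 2, "s": 4}}
parent, dist = shortest_path_tree(g, "s")
```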

Davidson, Jack W.; Jinturkar, Sanjay. Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses, November 17, 1993.

Bailey, Mark W.; O'Bagy, Janalee. FLECS: A Tool for Rapid Prototyping of Mechanisms in Success/Failure Based Languages, July 1990.

Abstract: Our goal is to provide a prototyping tool that facilitates the addition of new language mechanisms in success/failure based languages. By this, we mean languages that have success and failure of expressions, generators, and backtracking. We have designed an interpreter, called FLECS, for a subset of Icon, a language which exhibits the above features. FLECS is written in Scheme using a technique called continuation-passing-style (CPS). This approach allows us to extend, modify, and experiment with new language mechanisms which range from datatyping issues to general control structures. In this report, we provide an overview of the base semantics that FLECS implements, followed by a brief description of the CPS implementation technique. FLECS's capability is then demonstrated by augmenting the base language with a general control abstraction that has been shown to be powerful in the context of traditional procedural languages. In conclusion, the semantics of this control abstraction in the presence of success and failure is discussed.

Liebeherr, Jorg; Akyildiz, Ian F.. Deadlock Properties of Queueing Networks with Finite Capacities and Multiple Routing Chains, March 21, 1994.

Abstract: Blocking in queueing network models with finite capacities can lead to deadlock situations. In this paper, deadlock properties are investigated in queueing networks with multiple routing chains. The necessary and sufficient conditions for deadlock-free queueing networks with blocking are provided. An optimization algorithm is presented for finding deadlock-free capacity assignments with the least total capacity. The optimization algorithm maps the queueing network into a directed graph and obtains the deadlock freedom conditions from a specified subset of cycles in the directed graph. In certain network topologies, the number of deadlock freedom conditions can be large, thus making any optimization computationally expensive. For a special class of topologies, so-called tandem networks, it is shown that a minimal capacity assignment can be directly obtained without running an optimization algorithm. Here, the solution to the minimal capacity assignment takes advantage of the regular topology of tandem networks.
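
The graph-side step can be illustrated in isolation. This sketch enumerates the simple cycles of a small directed graph, from which one deadlock-freedom condition per cycle would be derived; the capacity conditions themselves are not reproduced here:

```python
# Enumerate simple cycles of a small digraph {node: [successors]} by DFS.
# Nodes must be comparable (e.g., integers); each cycle is reported once,
# starting from its smallest node. Suitable only for small graphs.

def simple_cycles(graph):
    cycles = []
    def dfs(start, node, path, visited):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path[:])          # closed a cycle back to start
            elif nxt not in visited and nxt > start:
                visited.add(nxt)
                dfs(start, nxt, path + [nxt], visited)
                visited.remove(nxt)
    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles

print(simple_cycles({1: [2], 2: [3], 3: [1, 2]}))
```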

Boese, Kenneth D.; Kahng, Andrew B.; McCoy, Bernard A.; Robins, Gabriel. Rectilinear Steiner Trees with Minimum Elmore Delay, March 19, 1994.

Grimshaw, A.S.; Loyot Jr., E.C.; Smoot, S.; Weissman, J.B.. *** Please refer to CS-94-06 for current release *** Mentat User's Manual, November 3, 1991.

Grimshaw, A.S.; Loyot, E.C.; Weissman, J.B.. ***Please refer to CS-94-05 for the current release*** Mentat Programming Language (MPL) Reference Manual, November 3, 1991.

Knight, John C.; Cass, Aaron G.; Fernandez, Antonio M.; Wika, Kevin G.. Testing a Safety-Critical Application, February 18, 1994.

Wika, Kevin G.; Knight, John C.. A Safety Kernel Architecture, February 18, 1994.

Dempsey, Bert J.; Liebeherr, Jorg; Weaver, Alfred C.. On Retransmission-Based Error Control for Continuous Media Traffic in Packet-Switching Networks, February 18, 1994.

The Mentat Research Group. Mentat 2.5 Programming Language Reference Manual, February 17, 1994.

The Mentat Research Group. Mentat 2.6 Release Notes, February 17, 1994.

The Mentat Research Group. Mentat 2.5 User's Manual, February 17, 1994.

McKee, Sally A.. Uniprocessor SMC Performance on Vectors with Non-Unit Strides, January 31, 1994.

McKee, Sally A.. Hardware Support for Dynamic Access Ordering: Performance of Some Design Options, August 09, 1993.

Srinivasan, Sudhir; Reynolds Jr., Paul F.. Super-criticality Revisited, May 28, 1993.

Abstract: Critical path analysis has been suggested as a technique for establishing a lower bound on the completion times of all parallel discrete event simulations (PDES). A protocol is super-critical if there is at least one simulation that can complete in less than the critical path time using that protocol. Previous studies have shown that several practical protocols are super-critical while others are not. We present a sufficient condition to demonstrate that a protocol is super-critical. Also, we show that a condition used in a previous study is necessary but not sufficient for super-criticality. It has been claimed that super-criticality requires independence of one or more messages (or states) on events in the logical past of those messages (states). We present an example which contradicts this claim and examine the implications of this contradiction on lower bounds.

Burchard, Almut; Liebeherr, Jorg; Oh, Yingfeng; Son, Sang H.. Assigning Real-Time Tasks to Homogeneous Multiprocessor Systems, January 06, 1994.

Abstract: Optimal scheduling of real-time tasks on multiprocessor systems is known to be computationally intractable for large task sets. Any practical scheduling algorithm for assigning real-time tasks to a multiprocessor system presents a trade-off between its computational complexity and its performance. The performance of a scheduling algorithm is measured in terms of the additional number of processors required to arrive at a schedule without deadline violations as compared to an optimal algorithm. In this study, new schedulability conditions are presented for homogeneous multiprocessor systems where individual processors execute the rate-monotonic scheduling algorithm. The conditions are used to develop new strategies for assigning real-time tasks to processors. The performance of the new strategies is shown to be significantly better than suggested by the existing literature. Under the (realistic) assumption that the load of each real-time task is small compared to the processing speed of each processor, it is shown that all processors can be almost fully utilized. Task assignment strategies are proposed for scenarios where the task set is known a priori (off-line schemes), and where the real-time task set can change dynamically (on-line schemes).
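
A classical assignment strategy of the kind discussed above is first-fit partitioning under the Liu and Layland rate-monotonic utilization bound n(2^(1/n) - 1). The sketch below uses that well-known textbook bound, not the improved schedulability conditions proposed in the report:

```python
# First-fit partitioning of periodic tasks onto identical processors under
# the classical Liu & Layland rate-monotonic bound (a baseline, not the
# report's new conditions).

def rm_bound(n):
    """Liu & Layland schedulability bound for n tasks on one processor."""
    return n * (2 ** (1.0 / n) - 1)

def first_fit(tasks):
    """tasks: list of (computation_time, period) pairs.
    Returns a list of processors, each a list of assigned tasks."""
    processors = []
    for c, p in sorted(tasks, key=lambda t: t[1]):  # shorter period = higher priority
        for proc in processors:
            n = len(proc) + 1
            util = sum(ci / pi for ci, pi in proc) + c / p
            if util <= rm_bound(n):  # still schedulable on this processor
                proc.append((c, p))
                break
        else:
            processors.append([(c, p)])  # open a new processor
    return processors

tasks = [(1, 4), (1, 5), (2, 10), (3, 12), (1, 20)]
assignment = first_fit(tasks)
```

The off-line/on-line distinction in the abstract corresponds to whether the whole task set is available for sorting up front, as assumed here.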

Hodes, Todd D.; McCoy, Bernard A.; Robins, Gabriel. Dynamically-Wiresized Elmore-Based Routing Constructions, December 15, 1993.

Robins, Gabriel; Wrege, Dallas E.; Zhang, Tongtong; Pearson, William R.. On the Primer Selection Problem in Polymerase Chain Reaction Experiments, November 15, 1993.

Yasinsac, Alec; Wulf, William A.. Evaluating Cryptographic Protocols, December 22, 1993.

Agarwal, Nipun; Son, Sang H.. Experiences with the Development of a Multimedia Information System for Medical Applications, July 12, 1993 (Revised Dec 15, 1993).

McKee, Sally A.. An Analytic Model of SMC Performance, November 15, 1993.

CS-93-52 (Not available on-line)
Varanelli, James M.; Cohoon, James P.. A Fast Method for Generalized Starting Temperature Determination in Two-Stage Simulated Annealing Systems, September 23, 1993.

Davidson, Jack W.; Whalley, David B.. A Design Environment for Addressing Architecture and Compiler Interactions, November 24, 1993.

Davidson, Jack W.; Holler, Anne M.. Subprogram Inlining: A Study of its Effects on Program Execution Time, November 17, 1989.

Abstract: Equations representing the execution time performance on noninlined and inlined versions of a program have been developed. The accuracy of the equations' description of inlined program execution time behavior was demonstrated on four computer systems. Using the equations, understanding of how certain factors influence the speed of inlined code was gained. Contrary to a number of published reports in the literature, the increased size of inlined code was not found to affect its execution time performance on demand-paged virtual memory machines. On such systems, neither the use of an inlining algorithm that includes program size constraints nor the substitution of interprocedural data flow analysis for inlining is warranted. A modest improvement in the caching and paging behavior of test programs' inlined versions was also observed. Situations in which inlining was not beneficial, which were related to its interactions with register allocation, were characterized. Furthermore, investigation with the equations provided insights that can be of value to machine and compiler designers.

Benitez, Manuel E.; Davidson, Jack W.. Code Generation for Streaming: an Access/Execute Mechanism, December 7, 1990.

Davidson, Jack W.; Rabung, John R.; Whalley, David B.. Relating Static and Dynamic Machine Code Measurements, July 13, 1989.

Benitez, Manuel E.; Davidson, Jack W.. The Advantages of Machine-Dependent Global Optimization, November 17, 1993.

Benitez, Manuel E.; Davidson, Jack W.. Register Deprivation Measurements, November 15, 1993.

Abstract: The development of register deprivation measurements was motivated by the desire to study the effect that register demand has on code improvement and register allocation strategies. In addition to the obvious application of testing the spill mechanism used by the compiler's register allocator, the register deprivation strategy can be used to determine the relationship between the number of allocable registers and the effectiveness of a code improver as a whole or its optimization phases individually, enhance the coverage of validation suites, evaluate the register demands of benchmark suites and help machine designers determine the optimal number of registers needed to support existing compiler technologies. This paper contains a description of register deprivation techniques, presents some of their most useful applications, and discusses issues that must be addressed in order to incorporate this technique into a compiler. Also included are register deprivation measurement results obtained using a modest set of benchmarks that provide interesting and somewhat unexpected insights pertaining to optimizations, benchmark programs and architectures.

Benitez, Manuel E.; Davidson, Jack W.. A Retargetable Integrated Code Improver, November 15, 1993.

Abstract: We present a retargetable, machine-level framework that tightly integrates the three primary functions performed by an optimizer: code generation, register allocation and code improvements. This framework is highly retargetable, able to fully exploit unique architectural features and, unlike most integrated frameworks, easily extended to include a comprehensive set of local and global optimizations. A unique approach to performing transformations improves register allocation, reduces interactions between the code improvement phases and enhances retargetability by eliminating the need to determine an effective phase ordering for each target architecture.

Bailey, Mark W.; Davidson, Jack W.. A Formal Specification for Procedure Calling Conventions, November 5, 1993.

Abstract: Procedure calling conventions are used to provide uniform procedure-call interfaces. Applications, such as compilers and debuggers, that generate or process procedures at the machine-language level require an accurate description of the calling convention in use. Until now, the portion of such an application that concerns itself with the procedure call interface was implemented in an ad-hoc manner. The resulting code is complicated with details, difficult to maintain, and often incorrect. In this paper, we present a model and language, called CCL, for specifying procedure calling conventions. The language can be used to automatically generate the calling-convention-specific portions of applications. It also serves as accurate documentation for a given calling convention. CCL provides a concise and natural method of specifying calling conventions that are easy to read and modify. By using a convention specification to automatically generate code, one can enhance the retargetability of applications that make use of procedure calling conventions. Additionally, different calling conventions can easily be implemented, thereby simplifying experimentation with different calling conventions to increase the efficiency of the procedure call.

Michel, Jeffrey R.. The Design and Evaluation of an Off-Host Communications Protocol Architecture, August 1, 1993.

Wulf, William A.; Yasinsac, Alec; Oliver, Katie S.; Peri, Ramesh. A Technique for Remote Authentication, July 01, 1993.

Son, Sang H.; Agarwal, Nipun. Synchronization of Temporal Constructs in Distributed Multimedia Systems with Controlled Accuracy, October 28, 1993.

Abstract: With the inception of technologies such as ATM in communication networks, it will be possible to run multimedia applications on future integrated networks. Synchronization of the related media data is one of the key characteristics of a multimedia system. In this paper we present a scheme for synchronization of multimedia data across a network in which the accuracy of detecting asynchronization and predicting future asynchrony is variable and can be tailored to the intended application. The protocol has been designed with the characteristics of ATM networks in mind, such as the absence of globally synchronized clocks, while utilizing features such as the QoS guarantees they provide. Multimedia data sent across the network may also be stored at an intermediate node and later retrieved for display. We extend the scheme and present a mechanism wherein synchronization of all possible temporal constructs is supported, not just the "in-parallel" construct, which is only one of the thirteen possible temporal relations.

Alexander, Michael J.; Robins, Gabriel. An Architecture-Independent Unified Approach to FPGA Routing, October 8, 1993.

Abstract: Field-programmable gate arrays (FPGAs) are an inexpensive and flexible low risk design alternative to custom integrated circuits. While FPGA partitioning and technology mapping have been well-studied, FPGA routing has received much less attention. In this paper we propose a unified general framework for FPGA routing, which allows simultaneous optimization of multiple competing objectives under a smooth designer-controlled tradeoff. Our approach is based on a new and general multi-weighted graph formulation, enabling a good theoretical performance characterization, as well as an effective, practical implementation. Our router is architecture-independent, computationally efficient, and performs very well on industrial benchmarks. Finally, our multi-weighted graph formulation is quite general and is directly applicable to many other areas of combinatorial optimization.

Dempsey, Bert; Liebeherr, Jorg; Weaver, Alfred. A Delay Sensitive Error Control Scheme for Continuous Media Communications, October 05, 1993.

Yasinsac, Alec; Wulf, William A.. A Formal Semantics for Evaluating Cryptographic Protocols, September 29, 1993.

Lehr, Matthew. StarBase v2.2 Implementation Details, September 03, 1993.

Boese, K. D.; Kahng, A. B.; McCoy, B. A.; Robins, G.. Near-Optimal Critical Sink Routing Tree Constructions, August 22, 1993.

Prey, Jane C.; Cohoon, James P.; Fife, Greg. Software Engineering Beginning In The First Computer Science Course, June, 1993.

MacCallum, Laurie; Grimshaw, Andrew S.. Linear Algebra Library for Mentat Applications, August 06, 1993.

Srinivasan, Sudhir; Reynolds Jr., Paul F.. Non-interfering GVT Computation via Asynchronous Global Reductions, April 01, 1993.

Grimshaw, Andrew S.; Weissman, Jon B.; Strayer, W. Timothy. Portable Run-Time Support for Dynamic Object-Oriented Parallel Processing, July 14, 1993.

Grimshaw, Andrew S.. The Mentat Computation Model Data-Driven Support for Object-Oriented Parallel Processing, May 28, 1993.

Wagner Jr., Raymond R.. On the Implementation of Local Synchrony, June 01, 1993.

Abstract: Local synchrony is a distributed approach to providing logically synchronous capabilities in an asynchronous system. Local synchrony provides a logical timeline to memory access, and thus a total ordering to the steps of a distributed computation, without many of the inherent consequences of global synchronization. Local synchrony offers a framework for the implementation of various concurrency control mechanisms, including combining and isochrons (parallel operations). We show the implementability of local synchrony, and demonstrate its compatibility with existing protocols for fault-tolerance in general purpose multiprocessors. The results of our research are threefold. We show first that practical implementation of local synchrony is feasible in general, asynchronous, MIMD architectures. We show that such implementations are compatible with existing hardware fault tolerance systems and present a correct rollback algorithm for our general implementation. Finally, we present both analytic and empirical studies which illustrate the performance characteristics of such implementations. Our analytic modeling technique extends in novel ways probabilistic modelling methods often employed for interconnection networks. Our work establishes the practicality and feasibility of local synchrony as a means of providing support for concurrency control.

Michel, Jeffrey R.; Waterman, Alexander S.; Weaver, Alfred C.. Performance Evaluation of an Off-Host Communications Architecture, June 01, 1993.

Barrera, Tim; Griffith, Jeff; Robins, Gabriel; Zhang, Tongtong. Closing the Gap: Near-Optimal Steiner Trees in Polynomial Time, June 1, 1993.

Dempsey, Bert J.; Liebeherr, Jorg; Weaver, Alfred C.. A New Error Control Scheme for Packetized Voice over High-Speed Local Area Networks, May 24, 1993.

Oh, Yingfeng; Son, Sang H.. Scheduling Hard Real-Time Tasks with Tolerance of Multiple Processor Failures, May 24, 1993.

Oh, Yingfeng; Son, Sang H.. Scheduling Hard Real-Time Tasks with 1-Processor-Fault-Tolerance, May 24, 1993.

Oh, Yingfeng; Son, Sang H.. Preemptive Scheduling of Periodic Tasks on Multiprocessor: Dynamic Algorithms and Their Performance, May 24, 1993.

Oh, Yingfeng; Son, Sang H.. Preemptive Scheduling of Tasks with Reliability Requirements in Distributed Hard Real-Time Systems, May 24, 1993.

Oh, Yingfeng; Son, Sang H.. Tight Performance Bounds of Heuristics for a Real-Time Scheduling Problem, May 24, 1993.

Robins, Gabriel; Salowe, Jeffrey S.. On the Maximum Degree of Minimum Spanning Trees, May 12, 1993.

Williams, Craig; Reynolds Jr., Paul F.. Highly Concurrent Cache Coherence Protocols, December 1990.

Dempsey, Bert J.; Fenton, John C.; Michel, Jeffrey R.; Waterman, Alexander S.; Weaver, Alfred. Ada Binding Reference Manual - SAFENET Lightweight Application Services, January 7, 1993.

Dempsey, Bert; Fenton, John; Michel, Jeff; Waterman, Alex; Weaver, Alfred. SAFENET Internals, January 7, 1993.

Dempsey, Bert J.; Fenton, John C.; Michel, Jeffrey R.; Waterman, Alexander S.; Weaver, Alfred C.. A Performance Evaluation of the SAFENET Lightweight System, February 9, 1993.

Dempsey, Bert J.; Michel, Jeffrey R.; Weaver, Alfred. Tutorial on UVA SAFENET Lightweight Communications Architecture, January 7, 1993.

Boese, K. D.; Kahng, A. B.; McCoy, B. A.; Robins, G.. Fidelity and Near-Optimality of Elmore-Based Routing Constructions, April 06, 1993.

McCoy, Bernard A.; Robins, Gabriel. Non-Tree Routing, April 12, 1993.

Moyer, Steven A.. Access Ordering and Effective Memory Bandwidth, April 05, 1993.

French, James C.. Heuristic Clustering of Signature Files: A New Approach (Preliminary Results), September 29, 1992.

French, James C.; Viles, Charles L.. A Software Toolkit for Prototyping Distributed Applications (Preliminary Report), September 29, 1992.

Srinivasan, Sudhir; Reynolds Jr., Paul F.. Hardware Support for Aggressive Parallel Discrete Event Simulation, January 11, 1993.

Grimshaw, Andrew S.; Strayer, W. Timothy; Narayan, Padmini. The Good News About Dynamic Object-Oriented Parallel Processing, December 17, 1992.

Grimshaw, Andrew S.; Weissman, Jon B.; West, Emily A.; Loyot Jr., Ed. MetaSystems: An Approach Combining Parallel Processing and Heterogeneous Distributed Computing Systems, December 29, 1992.

Abstract: Large metasystems comprised of a variety of interconnected high-performance architectures are becoming available to researchers. To fully exploit these new systems, software must be provided that is easy to use, supports large degrees of parallelism in applications code, and manages the complexity of the underlying physical architecture for the user. This paper describes our approach to constructing and exploiting metasystems. Our approach inherits features of earlier work on parallel processing systems and heterogeneous distributed computing systems. In particular, we build on Mentat, an object-oriented parallel processing system developed at the University of Virginia that provides large amounts of easy-to-use parallelism for MIMD architectures.

Barrera, Tim; Griffith, Jeff; McKee, Sally A.; Robins, Gabriel; Zhang, Tongtong. Toward a Steiner Engine: Enhanced Serial and Parallel Implementations of the Iterated 1-Steiner MRST Heuristic, December 01, 1992.

Boese, Kenneth D.; Kahng, Andrew B.; Robins, Gabriel. High-Performance Routing Trees With Identified Critical Sinks, November 1, 1992.

Alpert, C. J.; Cong, J.; Kahng, A. B.; Robins, G.; Sarrafzadeh, M.. Minimum Density Interconnection Trees, October 1, 1992.

Grimshaw, Andrew S.. Easy-to-use Object-Oriented Parallel Processing with Mentat, October 12, 1992.

Loyot Jr., Edmond C.. VMPP: A Virtual Machine for Parallel Processing, September 29, 1992.

French, James C.. Electronic Distribution of Technical Reports and Working Papers: A Simple Cooperative Approach, September 29, 1992.

Narayan, Padmini; Smoot, Sherry; Sarkar, Ambar; West, Emily; Grimshaw, Andrew; Strayer, Timothy. Portability and Performance: Mentat Applications on Diverse Architectures, July 22, 1992.

Alexander, Michael J.; Bailey, Mark W.; Childers, Bruce R.; Davidson, Jack W.; Jinturkar, Sanjay. Memory Bandwidth Optimizations for Wide-Bus Machines, August 4, 1992.

French, J.C.; Jones, A.K.; Pfaltz, J.L.. Scientific Database Management (Final Report), August 1990.

Reynolds Jr., P.F.; Pancerella, Carmen M.; Srinivasan, Sudhir. Making Parallel Simulations Go Fast, June 17, 1992.

Reynolds Jr., Paul F.; Pancerella, Carmen M.; Srinivasan, Sudhir. Design and Performance Analysis of Hardware Support for Parallel Simulations, June 10, 1992.

Pausch, Randy; Young II, Nathaniel R.; DeLine, Robert. The Pascal of User Interface Toolkits, October 7, 1991.

Reynolds Jr., Paul F.; Pancerella, Carmen M.. Hardware Support for Parallel Discrete Event Simulations, April 20, 1992.

Weaver, Alfred C.; Strayer, W. Timothy; Dempsey, Bert J.. Specification for a Real-Time Transport Protocol, January 10, 1989.

Strayer, W. Timothy; Weaver, Alfred C.. Evaluation of Transport Protocols for Real-Time Communication, June 7, 1988.

Strayer, W. Timothy; Dempsey, Bert J.; Weaver, Alfred C.. Making XTP Responsive to Real-Time Needs, December 1, 1989.

Strayer, W. Timothy. A Study of Preemptable vs. Nonpreemptable Token Reservation Access Protocols, December 5, 1990.

Strayer, W. Timothy. Communication Services for Real-Time Systems: An Examination of ARTS over XTP, December 6, 1991.

Strayer, W. Timothy. Function-Based Scheduling, December 6, 1991.

Strayer, W. Timothy; Weaver, Alfred C.. Is XTP Suitable for Distributed Real-Time Systems?, January 17, 1992.

Strayer, W. Timothy. Analysis of Deadline-Driven Scheduling Algorithms Using Function-Driven Techniques, May 4, 1992.

Grimshaw, Andrew S.; Loyot Jr., Edmond C.. ELFS: Object-Oriented Extensible File Systems, July 8, 1991.

Grimshaw, Andrew S.. An Introduction to Parallel Object-Oriented Programming with Mentat, April 4, 1991.

Pausch, Randy; Williams, R. D.. Giving CANDY to Children: User-Tailored Gesture Input Driving An Articulator Based Speech Synthesizer, October 7, 1991.

Pausch, Randy; Vogtle, L.; Conway, M.. One Dimensional Motion Tailoring for the Disabled: A User Study, October 7, 1991.

Pausch, Randy; Leatherby, J. H.. Voice Input vs. Keyboard Accelerators: A User Study, October 7, 1991.

Pausch, Randy. Virtual Reality on Five Dollars a Day, October 7, 1991.

Strayer, W. Timothy. Function-Driven Scheduling: A General Framework for Expression and Analysis of Scheduling, May 14, 1992.

French, J.C.; Jones, A.K.; Pfaltz, J.L.. Scientific Database Management (Panel Reports and Supporting Material), August 1990.

Institute for Parallel Computation

Barker, Allen L.; Brown, Donald E.; Martin, Worthy N.. Static Data Association with a Terrain-Based Prior Density, August 05, 1994.
Abstract: We consider the problem of estimating the states of a static set of targets given a collection of densities, each representing the state of a single target. We assume there is no a priori knowledge of which of the given densities represent common targets, but that a prior density for the target locations is available. For a two-dimensional location estimation problem we construct a prior density model based on known features of the terrain. For a simple Gaussian association-estimation algorithm using a prior density we consider when the prior is most effective in data association, or correlation, and when it is most effective in state estimation. We present some simulation results and discuss some issues involved in measuring algorithm performance and in the algorithm implementation. We briefly discuss extensions to higher dimensional state spaces and non-static models.

IPC-93-01 (Not available on-line)
Rovnyak, Steven M.; Kretsinger, Stein E.; Thorp, James S.; Brown, Donald E.. Decision Trees for Real-Time Transient Stability Prediction, January 14, 1993.

IPC-92-11 (Not available on-line)
Brown, Donald E.; Huntley, Christopher L.. A Clustering Approach to Service District Selection, November 30, 1992.

IPC-92-10 (Not available on-line)
Elder IV, John F.. Perceptrons, Regression, and Global Network Optimization, September 4, 1992.

IPC-92-09 (Not available on-line)
Elder IV, John F.; Brown, Donald E.. Induction and Polynomial Networks, August 20, 1992.

IPC-92-07 (Not available on-line)
Freitag, Lori A.; Ortega, James M.. The RSCG Algorithm on Distributed Memory Architectures, August 10, 1992.

IPC-92-06 (Not available on-line)
Nayar, Narinder; Ortega, James M.. Computation of Selected Eigenvalues of Generalized Eigenvalue Problems, August 10, 1992.

IPC-92-03 (Not available on-line)
Huntley, Christopher; Brown, Donald E.; Markowicz, Bernard P.; Sappington, David E.. Freight Routing and Scheduling at CSX Transportation, June 2, 1992.

IPC-92-01 (Not available on-line)
Brown, Donald E.; Corruble, Vincent; Pittard, Clarence Louis. A Comparison of Decision Tree Classifiers with Backpropagation Neural Networks for Multi-Modal Classification Problems, March 13, 1992.

IPC-91-11 (Not available on-line)
McElrath, Rodney. A Look at Two Persistent Storage Models, December 10, 1991.

IPC-91-08 (Not available on-line)
Fulcher, George E.; Brown, Donald E.. A Polynomial Network for Predicting Temperature Distributions, June 26, 1991.

IPC-91-05 (Not available on-line)
Barker, Allen; Brown, Donald; Martin, Worthy. A Framework for the Compact Representation of Data Used by Data Fusion Algorithms, May, 1991.

IPC-91-04 (Not available on-line)
Brown, Donald E.; Huntley, Christopher L.; Garvey, Paul J.. Clustering of Homogeneous Subsets, February 25, 1991.

IPC-91-01 (Not available on-line)
Brown, Donald E.; Pomykalski, James J.. Reliability Estimation During Prototyping of Knowledge-Based Systems, January 11, 1991.

IPC-90-04 (Not available on-line)
Brill, F. Z.; Brown, D. E.; Martin, W. N.. Genetic Algorithms for Feature Selection for Counterpropagation Networks, April 9, 1990.

IPC-90-02 (Not available on-line)
Brown, D. E.; Pittard, C. L.; Spillane, A. R.. ASSET: A Simulation Test Bed for Evaluating Data Association Algorithms, January 22, 1990.

IPC-90-01 (Not available on-line)
Brown, D. E.; Markert, W. J.. Uncertainty Management with Imprecise Knowledge with Application to Design, January 4, 1990.

IPC-89-13 (Not available on-line)
French, James C.; Pratt, Terrence W.. Performance Measurement of Two Parallel File Systems, November, 1989.

IPC-89-07 (Not available on-line)
Son, S. H.; Haghighi, N.. Performance Evaluation of Multiversion Database Systems, July 25, 1989.

IPC-89-06 (Not available on-line)
Orlandic, R.. Design, Analysis and Applications of Compact 0-Complete Trees, May 22, 1989.

IPC-89-05 (Not available on-line)
Brown, D. E.. A Justification for Applying the Principle of Minimum Relative Entropy to Information Integration Problems, April 18, 1989.

IPC-89-04 (Not available on-line)
Spillane, A. R.; Pittard Jr., C. L.; Brown, D. E.. A Method for the Evaluation of Correlation Algorithms, April 1, 1989.

IPC-89-03 (Not available on-line)
Huntley, C. L.; Brown, D. E.. A Parallel Heuristic for Quadratic Assignment Problems, March 16, 1989.

IPC-89-01 (Not available on-line)
Stewart, B.S.. A Bibliography of Heuristic Search, January 30, 1989.

Barker, Allen L.; Brown, Donald E.; Martin, Worthy N.. Bayesian Estimation and the Kalman Filter, August 5, 1994 (revised Sept 19, 1994).
In this tutorial article we give a Bayesian derivation of a basic state estimation result for discrete-time Markov process models with independent process and measurement noise and measurements not affecting the state. We then list some properties of Gaussian random vectors and show how the Kalman filtering algorithm follows from the general state estimation result and a linear-Gaussian model definition. We give some illustrative examples including a probabilistic Turing machine, dynamic classification, and tracking a moving object.
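The linear-Gaussian filtering result the tutorial derives can be sketched in a few lines. The following is a minimal, illustrative 1-D Kalman filter (not code from the report), assuming a scalar model x_k = a·x_{k-1} + w and z_k = h·x_k + v with process noise variance q and measurement noise variance r; all names and parameter values are illustrative.

```python
# Minimal 1-D Kalman filter: predict with the process model,
# then update with each measurement via the Kalman gain.

def kalman_filter(zs, a=1.0, h=1.0, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Return the sequence of posterior (mean, variance) estimates.

    a, h : state transition and measurement coefficients
    q, r : process and measurement noise variances
    """
    x, p = x0, p0
    estimates = []
    for z in zs:
        # Predict: propagate mean and variance through the process model.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p_pred * h / (h * p_pred * h + r)
        x = x_pred + k * (z - h * x_pred)
        p = (1.0 - k * h) * p_pred
        estimates.append((x, p))
    return estimates

# Noisy measurements of a constant true state x = 1.0.
measurements = [1.2, 0.9, 1.1, 1.0, 0.8, 1.05]
est = kalman_filter(measurements)
print(est[-1])  # posterior mean approaches 1.0; variance shrinks
```

With each measurement the posterior variance decreases, so later measurements receive progressively smaller gains.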

Barros, J. E.; French, J. C.. Scalable Database Support for Correlation and Fusion Algorithms, October, 1993.
The process of extracting useful information from noisy sensor data is an important part of applications such as environmental analysis and military surveillance. Difficulties arise in analyzing the accuracy of the reports and in processing the total number of reports. Automated methods capable of handling noisy information from multiple sensors must be employed to assist the human analyst. This report investigates scalable database support for such systems. Region estimation and range retrieval techniques are employed to dramatically reduce retrieval and overall execution times.

Pfaltz, John L.. The ADAMS Language: A Tutorial and Reference Manual, April 1993.
This report describes the ADAMS language as it would be used by an applications programmer. It does not describe how ADAMS is implemented. The first three sections assume no knowledge of ADAMS whatsoever and are quite tutorial in nature; only basic, introductory concepts are covered. The remaining sections, although still tutorial, presume some familiarity, e.g. having coded simple programs of the same complexity as those in Section 3. The treatment in these sections is definitive, so that the report can also be used as a reference manual.

Moyer, Steven A.. Access Ordering Algorithms for a Multicopy Memory, December 18, 1992.

Moyer, Steven A.. Access Ordering Algorithms for an Interleaved Memory, December 18, 1992.

Moyer, Steven A.. Access Ordering Algorithms for a Single Module Memory, December 18, 1992 (Revised).

Orlandic, Ratko; Pfaltz, John L.. Q0-trees: A Dynamic Structure for Accessing Spatial Objects with Arbitrary Shapes, December 6, 1991.

Pfaltz, John L.; French, James C.; Grimshaw, Andrew. An Introduction to the ADAMS Interface Language: Part I, April 17, 1991.

Pfaltz, John L.. Programming over a Persistent Data Space, August 19, 1992.

Haddleton, Russell F.. Storage of Numeric Data in ADAMS, August 07, 1992.

Pfaltz, John L.; French, James C.. Multiple Inheritance and the Closure of Set Operators in Class Hierarchies, June 25, 1992.

Pfaltz, J.L.; French, J.C.; Whitlatch, J.L.. Scoping Persistent Name Spaces in ADAMS, June 28, 1988.

Reynolds Jr., P.F.; Weight, C.; Fidler II, J.R.. Comparative Analyses of Parallel Simulation Protocols, December 6, 1989.

Reynolds Jr., P.F.. A Spectrum of Options for Parallel Simulation, September 9, 1988.

French, J.C.; Pratt, T.W.; Das, M.. Performance Measurement of a Parallel Input/Output System for the Intel iPSC/2 Hypercube, January 21, 1991.

Brown, D.E.; Huntley, C.L.. A Practical Application of Simulated Annealing to Clustering, February 1990.

Pfaltz, J.L.; Son, S.H.; French, J.C.; Baron, P.K.; Kirks, D.J.; Orlandic, R.. Basic Data Concepts in ADAMS, November 30, 1987.

Orlandic, Ratko; Pfaltz, John L.. Compact 0-Complete Trees: A New Method for Searching Large Files, January 26, 1988.

Cleary, Thomas P.. A Relational Interface to an Object Based System or Translating SQL to ADAMS, August 8, 1991.

Moyer, Steven A.. Performance of the iPSC/860 Node Architecture, May 1991.

Charlesworth, Arthur. A Characterization of Associativity, November 1990.

Huntley, C.L.; Brown, D.E.. Parallel Genetic Algorithms with Local Search, revised July 11, 1991.

Charlesworth, Arthur. The Nondeterministic Divide, November 1990.

Dickens, P.M.; Reynolds Jr., P.F.. SRADS with Local Rollback, January 22, 1990.

Reynolds Jr., P.F.; Dickens, P.M.. Spectrum: A Parallel Simulation Testbed, March 12, 1989.

Pfaltz, J.L.; French, J.C.; Grimshaw, A.; Son, S.H.; Baron, P.; Janet, S.; Lin, Y.; Lloyd, L.; McElrath, R.. Implementation of the ADAMS Database System, December 11, 1989.

Baron, Paul K.. The ADAMS Preprocessor, December 4, 1989.

Janet Jr., Stanley A.. The ADAMS Storage Management System, August 10, 1989.

Pfaltz, J.L.; French, J.C.; Grimshaw, A.; Son, S.H.; Baron, P.; Janet, S.; Kim, A.; Klumpp, C.; Lin, Y.; Lloyd, L.. The ADAMS Database Language, February 28, 1989.

French, James C.. A Global Time Reference for Hypercube Multicomputers, October 10, 1988.

Pfaltz, John L.. Implementing Set Operators over Class Hierarchies, August 5, 1988.

Son, Sang H.; Pfaltz, John L.. Reliability Mechanisms for ADAMS, March 20, 1988.

Research Memos

Coleman, Clark L.; Davidson, Jack W.. Experiments in Data Placement, June 30, 1998.
Compilers currently place data objects at memory addresses that are determined by the order of the data objects' declarations. Experiments confirm that cache misses can be reduced by rearranging the order of data objects (reducing conflict misses) and by aligning array objects on cache line boundaries (slightly reducing compulsory misses).

Coleman, Clark L.; Davidson, Jack W.. Experiments in Data Colocation, June 30, 1998.
If two data objects have non-overlapping lifetimes during all possible executions of a program, they could be assigned to the same addresses in memory. Experiments confirm that this can reduce certain pressures throughout a memory hierarchy.
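The colocation idea summarized above can be illustrated with a small sketch (not the report's algorithm): objects whose lifetimes never overlap may share the same memory "slot". Here lifetimes are hypothetical (start, end) intervals over program points, and a greedy pass assigns each object to the first slot whose existing occupants it does not overlap.

```python
# Illustrative greedy colocation: objects with non-overlapping
# lifetimes are assigned the same slot (i.e., the same addresses).

def colocate(lifetimes):
    """Map each object name to a slot index; objects sharing a slot
    never have overlapping lifetime intervals."""
    slots = []        # each slot holds a list of non-overlapping intervals
    assignment = {}
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for i, intervals in enumerate(slots):
            # Intervals [start, end) and [s, e) are disjoint iff one
            # ends before the other begins.
            if all(end <= s or e <= start for s, e in intervals):
                intervals.append((start, end))
                assignment[name] = i
                break
        else:
            # No compatible slot: open a new one.
            slots.append([(start, end)])
            assignment[name] = len(slots) - 1
    return assignment

objs = {"buf_a": (0, 10), "buf_b": (12, 20), "buf_c": (5, 15)}
print(colocate(objs))  # buf_a and buf_b share a slot; buf_c does not
```

Here buf_a and buf_b never live at the same time and so share a slot, while buf_c overlaps both and gets its own.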

Benitez, Manuel E.. Register Transfer Standard, March 19, 1991.