Annotation-Assisted Lightweight Static Checking

Position Paper for The First International Workshop on Automated Program Analysis, Testing and Verification

David Evans
evans@virginia.edu
University of Virginia, Department of Computer Science

Also available as Postcript.

Abstract

While heavyweight formal methods have shown much promise in academia and remarkable success in industrial hardware projects, they are rarely used in industrial software projects. There are many reasons for this [DillRusby96, Hall96, HollowayButler96], of which one of the most important is the lack of a realistic adoption path between current development techniques and more formal approaches. Our research seeks to provide a few of the steps along that path. Our experience so far with LCLint [EGHT94, Evans96] indicates that lightweight static checking tools may provide an effective way to introduce formal methods into industrial environments.

Background

There is a huge gap between the amount of effort and expertise required to use traditional development tools (such as compilers, integrated development environments, and test scripts) and formal techniques such as Z specifications, model checking, and program verification. On the other hand, a large class of common programming errors can be detected using simpler techniques. Our research explores what can be done with minimal programmer effort, without requiring substantial changes to traditional development processes, and with a tool that requires no user interaction and runs about as fast as a typical compiler. To do this, we have developed LCLint, a tool for statically checking C programs. Our approach is general enough to use with other source languages (in fact, we have built a prototype for a similar tool for Java), but we focus primarily on C for pragmatic reasons: it is a small and simple language, and still widely used in industry in even security-critical applications.

LCLint provides a first step towards the adoption of formal techniques and mechanical analysis. If minimal effort is invested adding annotations to programs, LCLint can perform stronger checks than can be done by any compiler or standard lint. Adding these annotations is the first step on the path to using formal analysis techniques. LCLint checking is designed to provide a clear and cost-effective payoff for any effort spent adding annotations. Some of the problems that can be detected by LCLint include: violations of information hiding; inconsistent modifications of caller-visible state or uses of global variables; memory management errors including uses of dead storage and memory leaks; and undefined program behavior. LCLint checking is done using simple dataflow analyses. This means the checking is as fast as a compiler, and LCLint can easily be introduced into standard development cycles.

As one would expect, LCLint's performance and usability goals require certain compromises to be made regarding the checking. In particular, we believe that it is reasonable to sacrifice soundness and completeness towards these goals (see [Evans96] for a more complete justification). While this would not be acceptable in many environments, it is a desirable tradeoff in a typical industrial development environment where cost-effective detection of program bugs is the overriding goal. LCLint has been in active use for more than five years, and has been used by thousands of programmers in both industry and academia.

Current Directions

Our current work focuses on extending this approach in two directions: enhancing the functionality of LCLint without relaxing the usability and efficiency requirements by adding support for user-defined annotations, and providing the next step toward heavyweight formal methods by introducing more expressive annotations and automated run-time checking.

User-defined Annotations. Currently, LCLint users are limited to a pre-defined set of annotations and associated checking. This works well as long as their programming style is consistent with the methodology supported by LCLint (e.g., abstract data types implemented by separate modules, pointers used in a limited systematic way), but is problematic if they need to check a program that does not adhere to this methodology. For example, LCLint provides annotations for checking storage that is managed using reference counting. An annotation is used to denote an integer field of a structure as the reference count, and LCLint will report inconsistencies if new pointers to the structure are created without increasing the reference count, or if the storage associated with the referenced object is not deallocated when the reference count reaches zero. If a program implements reference counting in some other way (for example, by keeping the reference counts in a separate lookup table), however, LCLint provides no relevant annotations or checking. More generally, applications have application-specific constraints that should be checkable statically. These could include value consistency requirements, event ordering requirements, and global constraints.

We are investigating extensions to LCLint that address this problem by supporting user-defined annotations. Programmers will be able to invent new annotations, express syntactic constraints on their usage, and define checking associated with the user-defined annotation in different contexts. Our approach is to devise annotations for use in detecting buffer overflow errors, and then to generalize the checking semantics associated with those annotations and the annotations already provided by LCLint in such a way that allows all of the annotations to be expressed using a general meta-annotation language.

Towards Heavyweight Formal Techniques. The performance and usability requirements of LCLint inherently limit the kinds of checking that can be done as well as the claims that can be made about a checked program. We can consider overcoming these limitations by relaxing these requirements. One approach would be to require more complete specifications and use theorem-proving technology to perform more complex checking. This is similar to what has been done by the Extended Static Checking (ESC) project at Compaq SRC [DLNS98]. ESC uses a theorem proving technology, which enables it to detect a larger class of errors than can be done by the simple dataflow analyses done by LCLint. This, however, increases the costs of the analysis considerably, and makes it difficult to use ESC on large programs.

Another approach would be to use a combination of static and run-time checking. If LCLint is not able to prove statically that a specified property holds, it could insert run-time checks to ensure the property holds at run-time. This has the considerable disadvantage that an error may still occur at run-time, but it would allow more complex properties to be guaranteed without requiring the user expertise and effort typically required of a program verification system. Our experience using Naccio to transform programs to enforce a safety policy described using a general, high-level language [EvansTwyman99, Evans99] offers a possible approach to automatically inserting run-time checking in programs. A related project is the Assertion Definition Language (ADL) created by Sun Microsystems, X/Open and the Information-technology Promotion Agency (an agency of Japan's MITI) [Sankar93, Obayashi98]. The ADLT tool generates run- time assertions and test cases from ADL specifications and test data annotations that provide descriptions of test sets.

Static checking has numerous well-understood advantages over the combination of run-time checking and automated testing, but cannot enforce many useful properties that can be easily enforced using run-time checking. We believe that a well-integrated combination of run-time and static checking is likely to be successful in introducing heavyweight formal methods into industrial environments. Our current efforts are directed towards integrating the lightweight static checking done by LCLint with run-time checking.

Availability

More information on LCLint and source code and binary releases is available at http://lclint.cs.virginia.edu.

Acknowledgements

LCLint grew out of the Larch Project at the MIT Lab for Computer Science and DEC Systems Research Center the led by John Guttag and Jim Horning. The work described in this paper is being done by David Evans, John Knight and David Larochelle at the University of Virginia.

References

[DLNS98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, James B. Saxe. Extended Static Checking. Compaq SRC Research Report #159, December, 1998.

[DillRushby96] D. Dill and J. Rushby. Acceptance of Formal Methods: Lessons from Hardware Design. IEEE Computer, April 1996.

[EGHT94] David Evans, John Guttag, Jim Horning and Yang Meng Tan. LCLint: A Tool for Using Specifications to Check Code. In Proceedings of the SIGSOFT Symposium on the Foundations of Software Engineering, December 1994.

[Evans96] David Evans. Static Detection of Dynamic Memory Errors. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (PLDI '96), Philadelphia, PA, May 1996.

[EvansTwyman99] David Evans and Andrew Twyman. Policy-Directed Code Safety. In Proceedings of the 1999 IEEE Symposium on Security and Privacy, Oakland, California, May, 1999.

[Evans99] David Evans. Policy-Directed Code Safety. MIT PhD Thesis, October 1999.

[Hall96] Anthony Hall. What is the Formal Methods Debate About? IEEE Computer, 29(4):22-23, April 1996.

[HollowayButler96] C. Michael Holloway and Ricky W. Butler. Impediments to Industrial Use of Formal Methods. IEEE Computer, 29(4):25-26, April 1996.

[Obayashi98] Masaharu Obayashi, Hiroshi Kubota, Shane P. McCarron, Lionel Mallet. The Assertion Based Testing Tool for OOP: ADL2. International Conference on Software Engineering, Kyoto, April 1998.

[Sankar93] Sriram Sankar and Roger Hayes. Specifying and Testing Software Components using ADL. Sun Microsystems, 1993.

David Evans
Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904

evans@virginia.edu
Other Addresses