Overview of Build Systems

Build systems are software tools designed to automate the process of program compilation. They come in various forms and are used for many kinds of software build tasks. While their primary goal is to create executables efficiently, they often try to meet a number of secondary goals, some of which can conflict with each other.


Build System Philosophy

At their core, build systems are functional languages that map a set of source resources (in most cases, files) to a target (usually an executable). The primary assumption of a build system is that each build action is deterministic (often loosely described as idempotent): every invocation of a build command with the same inputs and options creates the same output. This assumption allows the build system to memoize the actions it has already performed and to rerun only the actions whose inputs have changed.
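To make the memoization idea concrete, here is a minimal Python sketch. It assumes every action is deterministic, so a hash of the command line plus the contents of its inputs can stand in for the action's result. The `Memo` class, `run_action`, and `fake_compile` are hypothetical names for illustration only; real build systems persist such a cache on disk and also verify that outputs still exist.

```python
import hashlib
from typing import Callable, Dict, List


class Memo:
    """Hypothetical in-memory cache for deterministic build actions."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.runs = 0  # how many times an action actually executed

    def key(self, command: str, inputs: List[bytes]) -> str:
        # The key covers the command line and the contents of every input,
        # so a change to either one produces a different key.
        h = hashlib.sha256(command.encode())
        for data in inputs:
            h.update(hashlib.sha256(data).digest())
        return h.hexdigest()

    def run_action(self, command: str, inputs: List[bytes],
                   action: Callable[[List[bytes]], str]) -> str:
        k = self.key(command, inputs)
        if k not in self.cache:
            # Reusing a cached result is only safe because we assume
            # the action always produces the same output for the same key.
            self.runs += 1
            self.cache[k] = action(inputs)
        return self.cache[k]


def fake_compile(inputs: List[bytes]) -> str:
    # Stand-in for a compiler: derives an "object" from its inputs.
    return "obj(" + "+".join(d.decode() for d in inputs) + ")"


memo = Memo()
memo.run_action("gcc -c hello.c", [b"int main(){}"], fake_compile)
memo.run_action("gcc -c hello.c", [b"int main(){}"], fake_compile)  # cache hit
memo.run_action("gcc -c hello.c", [b"int main(){return 0;}"], fake_compile)
print(memo.runs)  # prints 2
```

Hashing each input separately before feeding the digest into the key avoids accidental collisions where two different splits of the same bytes would otherwise produce the same key.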

To manage this memoization correctly, the build system must know exactly which resources each command uses. This can be difficult because many compilers and other tools invoked during a build can "include" secondary files while processing the primary file. In general, these relationships can be represented by a dependency DAG (directed acyclic graph).

[Figure: dependency graph of the hello example]

In this simple example there are three source files: hello.c, hello_io.c, and hello.h. The .h file is included by both .c files. The compiler takes each .c file as input (also reading hello.h during preprocessing) and produces the corresponding .o file as output. The .o files are then linked together to create the executable. The DAG tells the system what has to be rebuilt when a change occurs: if hello.c is altered, everything reachable from that node in the DAG (hello.o, and then hello) must be rebuilt.
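The rebuild rule in the last sentence is a graph-reachability query. The Python sketch below assumes the graph is stored as a hypothetical `DEPENDENTS` map from each file to the files built directly from it (the reverse of the usual "depends on" direction), and uses a breadth-first search to collect everything downstream of a changed file.

```python
from collections import deque
from typing import Dict, List, Set

# Edges point from an input to everything built directly from it,
# following the hello example: sources -> object files -> executable.
DEPENDENTS: Dict[str, List[str]] = {
    "hello.c": ["hello.o"],
    "hello_io.c": ["hello_io.o"],
    "hello.h": ["hello.o", "hello_io.o"],
    "hello.o": ["hello"],
    "hello_io.o": ["hello"],
    "hello": [],
}


def to_rebuild(changed: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every node reachable from `changed` (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque(graph.get(changed, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


print(sorted(to_rebuild("hello.c", DEPENDENTS)))  # ['hello', 'hello.o']
print(sorted(to_rebuild("hello.h", DEPENDENTS)))  # ['hello', 'hello.o', 'hello_io.o']
```

A change to the shared header reaches both object files, which is why editing hello.h forces both compilations while editing hello.c forces only one.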

Implicit vs. Explicit Dependencies

As described previously, for a build system to correctly perform partial rebuilds of only the components that have changed, it must fully understand the dependencies of the build. Generally, there are two ways to specify dependencies: implicitly or explicitly. Explicit dependencies are the most straightforward to implement, because the relationship is encoded directly in the build script. Most build systems can specify a dependency explicitly, even if they try to discover most dependencies implicitly.

Build systems find implicit dependencies in a number of ways. The first is by file association: most build tools use common file extensions to identify file types, so given the file hello.c, many build systems will infer a build dependency from hello.c to hello.o. Further dependencies can be inferred from the contents of a file. The build system can scan the .c file for #include directives and record each included header as an additional dependency. This technique does not always work, however, because include directives can be guarded by conditional preprocessor directives such as #ifdef. In general, a scanner cannot determine whether a source file will actually include a given header without evaluating the preprocessor under the same settings the compiler will use.
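Both inference techniques can be sketched in a few lines of Python. This is a hypothetical illustration, not how any particular tool implements scanning: `object_for` maps a .c file to its .o file by extension, and `scan_includes` uses a regular expression to find quoted #include directives without running the preprocessor. Because it ignores the `#ifdef`, it reports debug.h as a dependency even when USE_DEBUG is undefined.

```python
import re
from typing import List

# Matches quoted includes such as  #include "hello.h"  (not <stdio.h>).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def object_for(source: str) -> str:
    """Infer the object file from the source extension: hello.c -> hello.o."""
    if not source.endswith(".c"):
        raise ValueError(f"no rule for {source}")
    return source[:-2] + ".o"


def scan_includes(text: str) -> List[str]:
    """Naively list quoted #include targets, ignoring conditionals."""
    return INCLUDE_RE.findall(text)


source = '''
#include "hello.h"
#ifdef USE_DEBUG
#include "debug.h"
#endif
int main(void) { return 0; }
'''

print(object_for("hello.c"))   # hello.o
print(scan_includes(source))   # ['hello.h', 'debug.h']
```

Reporting too many dependencies only causes unnecessary rebuilds. The more dangerous failure is the reverse: an include whose file name comes from a macro (`#include HEADER_NAME`) is missed entirely by this kind of scanner.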

Parallelization and Articulation Points

As the discipline of software engineering has progressed, modular programs split across multiple files have become standard for large-scale applications. These modules lend themselves to separate compilation, which can be done in parallel. Because this parallelism can significantly reduce build times, many build systems can issue independent compilation steps in parallel.

However, the ability to build a system in parallel is limited by articulation points in the dependency graph. Traditionally, the articulation point for building is the final step, where all the object files are linked into an executable. Some platforms, such as Java, defer the link step until runtime, but that only delays the problem. Further, metadata associated with the program must be combined with the metadata from all the object files, which again requires every object file to be available.
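One way to see the effect of the link step is to group targets into "waves" whose members can all run concurrently once everything in earlier waves is done (a level-by-level topological sort). The Python sketch below uses a hypothetical `DEPS` map for the hello example: the two compilations share a wave, but the link sits alone in the final wave and cannot overlap with anything.

```python
from typing import Dict, List, Set

# Each target maps to the targets it depends on (hello example; source
# files are omitted because they are never built).
DEPS: Dict[str, Set[str]] = {
    "hello.o": set(),
    "hello_io.o": set(),
    "hello": {"hello.o", "hello_io.o"},
}


def waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group targets into waves; targets in one wave can build in parallel."""
    remaining = {t: set(d) for t, d in deps.items()}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        # A target is ready once all of its prerequisites are built.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown prerequisite")
        result.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return result


print(waves(DEPS))  # [['hello.o', 'hello_io.o'], ['hello']]
```

A real scheduler would start each target as soon as its own prerequisites finish rather than waiting for a whole wave, but the link step still has to wait for every object file.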

Popular Build Systems

For most modern build systems, practicality beats out most other concerns. This section describes some of the popular build systems in use today, along with the code for building the simple example described above. Note that the examples below don't necessarily have exactly the same functionality, especially with regard to dependency tracking. Rather, they are the simplest versions for each tool that successfully compiled the hello world program.

Make

Make is one of the original build systems, coming out of Bell Labs. Currently the most popular incarnation of Make is GNU Make. Make allows for explicit mappings between source and target (foo.o: foo.c), as well as general pattern mappings (%.o: %.c). Make also allows for "phony" targets, whose commands do not actually create a file with the target's name. This lets Makefiles perform tasks other than building, such as cleaning up or installing. Further, the GNU extensions to Make make it highly customizable, able to perform much of the deployment configuration usually reserved for configure scripts. Here is the simple hello example in GNU Make:

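OBJS=hello.o hello_io.o
CC=gcc

# Link the object files into the executable (recipe lines must start with a tab).
hello: $(OBJS)
	$(CC) -o $@ $(OBJS)

# Pattern rule: compile any .c file into the matching .o file.
%.o: %.c
	$(CC) -c -o $@ $<

# Both object files also depend on the shared header.
$(OBJS): hello.h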
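Make decides whether a target is out of date by comparing file modification times: a target is rebuilt if it does not exist or if any prerequisite is newer than it. The Python sketch below is a rough model of that rule, under the simplifying assumption that every prerequisite already exists (real Make first brings prerequisites up to date, recursively); `is_stale` is a hypothetical name, not part of any Make API.

```python
import os
import tempfile
from typing import List


def is_stale(target: str, prerequisites: List[str]) -> bool:
    """Make-style check: the target must be rebuilt if it is missing or if
    any prerequisite has a newer modification time. Prerequisites are
    assumed to exist already."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "hello.c")
    obj = os.path.join(d, "hello.o")
    open(src, "w").close()
    print(is_stale(obj, [src]))  # True: hello.o does not exist yet
    open(obj, "w").close()
    os.utime(src, (100, 100))    # pretend hello.c was last edited at t=100
    os.utime(obj, (200, 200))    # ...and hello.o was built at t=200
    print(is_stale(obj, [src]))  # False: hello.o is newer than hello.c
    os.utime(src, (300, 300))    # "edit" hello.c again at t=300
    print(is_stale(obj, [src]))  # True
```

Timestamps are cheap to check but can be fooled, for example by clock skew or by restoring an older copy of a file. SCons, described below, uses content signatures (MD5 hashes) by default instead.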

Ant

Ant is an XML-based tool focused on building Java programs. Ant uses XML to specify its source-to-target mapping, organizing a project into targets that are created by a series of tasks. It provides built-in tasks for invoking the javac, java, and jar tools. According to Ant's homepage, the guiding design philosophy behind Ant was to address the multiplatform problems of earlier shell-based build systems such as Make.

Here is the simple hello example in Ant:

<project name="Hello" default="hello">
    <taskdef resource="cpptasks.tasks" />
    <target name="hello">
        <cc outtype="executable" subsystem="console"
            outfile="hello">
            <fileset dir="." includes="*.c" />
        </cc>
    </target>
</project>

The XML makes the Ant build file somewhat more verbose than those of the other build systems. Because Ant was not originally designed for C programs, this example uses the cpptasks extension.

SCons

SCons is a Python-based build tool focused on flexibility and scripting. Its SConstruct files are executed as ordinary Python scripts. Here is the hello build in SCons:

env = Environment()
env.Program(target="hello", source=["hello.c", "hello_io.c"])

SCons emphasizes flexibility and platform independence at the cost of performance. Because its build files are written in a general-purpose language, it offers a wide range of extensible features, including custom builders and file scanning.

CMake

CMake is "cross platform make" and is designed with portability in mind. CMake specifies dependencies in its own configuration file, from which it generates native build files (such as Makefiles) for the target environment. Here is the CMakeLists.txt for the example:

add_executable (hello hello.c hello_io.c)

Jam

Jam is "Just Another Make," released by Perforce. Jam provides many of the features of Make but is designed to be fast and concise and to solve a number of Make's traditional problems. The following is the build file in Jam:

Main hello : hello.c hello_io.c ;

Jam is strictly whitespace-delimited: without the space between hello and :, Jam would interpret hello: as a single token.
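A simplified Python model of this rule shows why the space matters. The hypothetical `jam_tokens` function splits on whitespace only and ignores quoting and everything else Jam's real scanner handles; it is meant only to illustrate the tokenization point above.

```python
from typing import List


def jam_tokens(line: str) -> List[str]:
    """Split a statement on whitespace only, as the text describes for Jam."""
    return line.split()


print(jam_tokens("Main hello : hello.c hello_io.c ;"))
# ['Main', 'hello', ':', 'hello.c', 'hello_io.c', ';']
print(jam_tokens("Main hello: hello.c hello_io.c ;"))
# ['Main', 'hello:', 'hello.c', 'hello_io.c', ';']
```

In the second line there is no separate `:` token, so the target name and the separator are fused into one word.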

Other comparisons

This article is by no means comprehensive. Here are a few other build system comparisons available.

Author: Dan Williams (dan_williams@cs.virginia.edu)
Homepage: http://www.cs.virginia.edu/~dww4s
Credits: Html by docutils. Syntax highlighting by pygments. CSS from rst2a.
Last update: 05-01-09