The bits and pieces of a modern computer program.

My goal in this series of posts is to explore the challenges inherent in re-using solutions in programming. To do that, I need to describe how we get from source code to a running application in modern programming environments. That description is probably useful in its own right, and it is a post I should have written in my Unlocking Programming series.

From Code to Executable

When I write a program I actually write text called source code. I save that text to a file (almost universally an ASCII-encoded file with no header, though that is starting to change a little bit). I then run a program called a compiler that takes that file as input. The compiler usually pulls in several other source code files and translates the combination into a file in another programming language. Most compilers go through several intermediate languages and sometimes utilize several other programs along the way, but eventually I end up with machine code (along with some metadata) in a file called an object file.
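The staged nature of compilation can be seen in miniature with Python's built-in tools. This is only a sketch of the idea: CPython stops at bytecode rather than machine code, and the source string and the variable names here are made up for illustration.

```python
import ast

SOURCE = "answer = 6 * 7\n"

# Stage 1: source code is plain text; on disk it is a sequence of bytes.
source_bytes = SOURCE.encode("ascii")

# Stage 2: parse the text into a syntax tree, one intermediate representation.
tree = ast.parse(source_bytes.decode("ascii"), filename="<example>")

# Stage 3: translate the tree into a code object (bytecode plus metadata).
code = compile(tree, filename="<example>", mode="exec")

# Run the translated result and look at what it produced.
namespace = {}
exec(code, namespace)
print(type(tree).__name__)   # Module
print(namespace["answer"])   # 42
print(code.co_names)         # metadata: the names the code refers to
```

A compiler for C or a similar language follows the same text-to-tree-to-lower-level pattern, just with more stages and with machine code at the end.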

Machine code is suitable input for a computer chip, as it is in the format that tells the chip what to do with data. Object files, however, are not ready to be given to my chip. First I need to give them to a linker which uses the metadata to pull together several object files and related files called static libraries and creates an executable file. Executable files, like object files, contain both machine code and metadata.
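What the linker does with that metadata can be sketched with a toy model. Everything below is hypothetical: real object files hold binary machine code and much richer metadata, while here each "object file" is a dictionary listing the symbols it defines and the symbols it needs, with strings standing in for machine code.

```python
# Toy "object files": each defines some symbols and references others.
main_o = {"defines": {"main": "call square; ret"}, "needs": {"square"}}
math_o = {"defines": {"square": "mul r0, r0; ret"}, "needs": set()}


def link(object_files):
    """Merge object files into one 'executable', resolving every reference."""
    symbols = {}
    # First pass: collect every symbol that some object file defines.
    for obj in object_files:
        for name, code in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = code
    # Second pass: check that every referenced symbol was defined somewhere.
    for obj in object_files:
        missing = obj["needs"] - set(symbols)
        if missing:
            raise ValueError(f"undefined symbols: {sorted(missing)}")
    return symbols


executable = link([main_o, math_o])
print(sorted(executable))  # ['main', 'square']
```

Calling `link([main_o])` on its own raises an error about the undefined symbol `square`, which is the toy version of the "undefined reference" message a real linker prints.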

Executables are the files we think of as “programs.” For the most part, we can treat them like finished products ready to be shared with whoever wants them.

From Executable to Running Program

When I go to run an executable file there are still several steps that need to be handled. The exact details differ by operating system, but the basic outline is the same.

First, the operating system uses some of the metadata in the file to set up what’s called virtual memory. Virtual memory is an abstraction that allows a program to pretend that it has full access to all of your computer’s memory and completely ignore the fact that it is actually sharing memory with every other program you are running. Part of the executable metadata includes instructions on where within the virtual memory to put the machine code, where to place initial data, etc.
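One way to picture virtual memory is as a per-program lookup table from virtual pages to physical frames. The sketch below is a simplification under assumed values: the 4096-byte page size is common but not universal, and the frame numbers and table names are invented for the example.

```python
PAGE_SIZE = 4096  # bytes per page; a common size, assumed here

# Each program gets its own page table: virtual page number -> physical frame.
page_table_a = {0: 7, 1: 3}
page_table_b = {0: 2, 1: 9}


def translate(page_table, virtual_address):
    """Map a virtual address to a physical one using a program's page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise MemoryError(f"page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


# The same virtual address lands in different physical memory per program.
print(translate(page_table_a, 4100))  # 3 * 4096 + 4 = 12292
print(translate(page_table_b, 4100))  # 9 * 4096 + 4 = 36868
```

Because each program only ever sees its own table, neither one needs to know where the other's data actually lives.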

Next, the operating system pulls into the virtual memory machine code contained in some dynamic libraries. Typically this is done by playing with the virtual memory structure so that all of the programs that access the same dynamic libraries can have their virtual memory map to the same single copy in the physical memory of your computer. This is one of the places you can have problems running a program: if the user doesn’t have the same dynamic libraries as the programmer, this step will fail.
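The sharing trick, and the way it fails, can be modeled with page tables that map virtual pages to physical frames. This is a toy illustration only: the library name `libmath`, the frame numbers, and the `map_library` helper are all hypothetical.

```python
# Libraries already resident in physical memory: name -> physical frame.
resident_libraries = {"libmath": 5}


def map_library(page_table, virtual_page, name):
    """Point one virtual page of a program at a library's physical frame."""
    if name not in resident_libraries:
        # The toy version of "the user doesn't have that dynamic library".
        raise FileNotFoundError(f"dynamic library not found: {name}")
    page_table[virtual_page] = resident_libraries[name]


page_table_a = {0: 1}  # program A's own code lives in frame 1
page_table_b = {0: 2}  # program B's own code lives in frame 2

map_library(page_table_a, 8, "libmath")
map_library(page_table_b, 3, "libmath")

# Different virtual pages in each program, one shared physical copy.
print(page_table_a[8] == page_table_b[3])  # True
```

Asking for a library that is not in `resident_libraries` raises an error before the program ever starts, which mirrors the failure described above.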

Finally, the operating system tells the chip to switch to the program’s virtual memory and start running the program.

While Running

Programs never run in a vacuum. Every 10–100 milliseconds the operating system will suspend a running program, do some internal bookkeeping, and switch the processor to another program. In addition, the program can make system calls, switching back to the operating system to ask it to do something for it.
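System calls are easy to see from Python, whose `os` module exposes thin wrappers around several of them. As a small sketch, the snippet below asks the operating system to create a pipe, write bytes into one end, and read them back from the other; the message text is arbitrary.

```python
import os

# Each of these calls hands control to the operating system and back.
read_end, write_end = os.pipe()

message = b"hello from a system call\n"
written = os.write(write_end, message)  # number of bytes the OS accepted
os.close(write_end)

data = os.read(read_end, 1024)          # read up to 1024 bytes back
os.close(read_end)

print(written, data)
```

The program never touches the pipe's storage directly; it only makes requests and receives results.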

Programs can also change themselves as they run. The most common way this is done is with a system call that asks the operating system to load more dynamic libraries into the program’s virtual memory. Less commonly, programs can change their own machine code as they run.
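Loading a dynamic library after a program has started can be tried from Python with the standard `ctypes` module. This sketch assumes a C math library that `ctypes.util.find_library("m")` can locate, which holds on typical Linux and macOS systems but not everywhere, so the hypothetical helper returns `None` when the library cannot be found or loaded.

```python
import ctypes
import ctypes.util


def cosine_from_libm(x):
    """Load the C math library at run time and call its cos function.

    Returns None when no math library can be located or loaded.
    """
    path = ctypes.util.find_library("m")
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)  # the run-time load happens here
    except OSError:
        return None
    # Describe the C signature: double cos(double).
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double
    return libm.cos(x)


print(cosine_from_libm(0.0))
```

Until `ctypes.CDLL` runs, the library's machine code need not be part of the program's virtual memory at all.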

Bytecode and Interpreters

Some programs (increasingly many, it seems) are not, in fact, programs at all. Java and C#, for example, both compile only half-way down, stopping at an intermediate step called bytecode. This bytecode is then used to control the execution of a program called a bytecode interpreter (the JVM, part of the JRE, for Java; the CLR, part of .NET, for C#) that simulates many of the traditional tasks of both chip and operating system. Others, like Lua and Matlab, skip the separate compilation step altogether: the source code itself is used as an input to an interpreter program.
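To make "bytecode controls an interpreter" concrete, here is a minimal sketch of a stack-based bytecode interpreter. The three opcodes and the tuple format are invented for this example and are far simpler than anything Java or C# actually uses.

```python
def run(bytecode):
    """Interpret a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    for opcode, argument in bytecode:
        if opcode == "PUSH":
            stack.append(argument)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Hypothetical bytecode for the expression (2 + 3) * 4.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(program))  # 20
```

The chip only ever runs the machine code of `run` itself (or rather, of whatever executes it); the list in `program` is just input that steers it.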

At some point the distinction between programs and data becomes fuzzy. When I make something bold in a word processor I’m adding a piece to the file I’ll save that instructs the word processor to change the typeface weight. That kind of instruction differs only in complexity from the instructions in source code, yet it is almost universally seen as data, not programming.
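A tiny sketch shows how little separates the two. The document format below is hypothetical, as is the HTML-like output; the point is only that a saved document is a list of instructions that some program walks through and carries out.

```python
def render(document):
    """Carry out a list of (instruction, text) pairs, producing marked-up text."""
    pieces = []
    for instruction, text in document:
        if instruction == "plain":
            pieces.append(text)
        elif instruction == "bold":
            pieces.append(f"<b>{text}</b>")
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return "".join(pieces)


# A saved "file": plain text, then an instruction to change the weight.
document = [("plain", "Programs and "), ("bold", "data"), ("plain", " overlap.")]
print(render(document))  # Programs and <b>data</b> overlap.
```

Structurally, `render` is a small interpreter, and `document` is the "program" it runs.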