On safe testing of student submissions
Computer security is a big topic full of complicated problems, and one on which I am far from an expert. However, as is usually the case with big, complicated topics, restricting the problem space can both make the underlying issues easier to see and viable solutions easier to design.
Consider the following common scenario:
Students are asked to write programs that do X and upload them to a server
Teacher runs student-submitted programs, compares their behavior to X, and writes results to file A
Repeat with tasks Y, Z, etc.
Now, an unscrupulous student who cares about grades and not learning could attempt to rig the system by writing a program that doesn’t do X: instead, it writes fake results directly to file A. Or, if they can’t find file A, perhaps it emails the student a copy of all the other students’ submissions so they can copy and resubmit them. How can a teacher prevent these behaviors? The following are some of the techniques I have used as a teacher.
The first technique is to put files in locations where I do not expect the student to be able to find them. This is harder than just naming my directories nonsense like “KE42wDGpEeSTBote8WlazzlY8” because it is relatively easy for a program to discover the name of the directory in which it is located. The usual technique I’ve used is something like
copy one student’s submission to a temporary directory
run that student’s code
compare results to expectations and record grades
erase the contents of the temporary directory
repeat with next student’s submission
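The loop above can be sketched in Python. This is a minimal sketch, assuming for illustration that submissions are single-file Python scripts; the name grade_one and the timeout value are my own hypothetical choices:

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def grade_one(submission: Path, timeout: float = 10.0) -> str:
    """Copy one submission into a fresh temporary directory, run it
    there, and return its output.  The directory, and anything the
    student's code left in it, is erased when the with-block exits."""
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch) / submission.name
        shutil.copy(submission, target)
        result = subprocess.run(
            [sys.executable, str(target)],
            cwd=scratch,                 # run from inside the scratch dir
            capture_output=True, text=True, timeout=timeout)
        return result.stdout
```

Comparing the returned output to expectations and recording the grade happens outside this function, and each student’s code runs in a brand-new directory that the previous student never saw.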
I have yet to have a student break out of this system, but that does not mean it is actually secure. A determined student could probably find a way to list the entire directory structure, helping them seek out what they want.
One principle of hide-and-seek that applies also to other security situations: if you let the attacker try multiple times they have a significant advantage because they can use early tries to gain information and delay attempting to use that information until later.
The second technique is to disable various abilities prior to running the program. A simple example of this is to physically disconnect from the Internet: that way if the code tries to send information to the student there is no network connection over which to send it. Other examples include creating a user account that is not permitted by the operating system to access most directories and running the code as that unprivileged user. These and other privilege dropping practices can cause various categories of undesired behaviors to fail when attempted. Often that failure can also generate error messages which can be used to penalize the code that attempted the banned behavior.
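As one concrete, POSIX-only sketch of this idea that needs no special privileges: resource limits can be lowered by any user, and setting the maximum file size to zero makes every attempt to write a regular file fail. The function name and the specific limit values here are my own illustrative choices:

```python
import resource
import subprocess
import sys

def run_restricted(code: str) -> subprocess.CompletedProcess:
    """Run untrusted code in a child process with reduced abilities."""
    def drop_privileges():                # runs in the child, after fork
        # Any write to a regular file now fails (SIGXFSZ / EFBIG)...
        resource.setrlimit(resource.RLIMIT_FSIZE, (0, 0))
        # ...and the child gets at most 5 seconds of CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    return subprocess.run([sys.executable, "-c", code],
                          preexec_fn=drop_privileges,
                          capture_output=True, text=True, timeout=30)
```

Captured stdout is a pipe, not a regular file, so the program can still print its answers while being unable to scribble on the grade file. Truly switching to an unprivileged user (os.setuid) or cutting network access takes more machinery and, for setuid, root.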
There are two main problems with dropping privileges. The first is that it is hard to drop enough of them to ban everything that you don’t want to have happen. The second is that the granularity of control is sometimes too coarse to both allow the code to do what it is supposed to do and also prevent it from doing similar but undesirable things. Turning off the Internet is not an option when the student code is supposed to access the Internet; finding a way to turn off “the bad parts” is an active area of security research.
An additional practical issue facing some teachers is that they are often running on computers maintained by someone else and their accounts often do not have the privilege of dropping privileges. The reasons why a system administrator might want to limit who has privilege-dropping privileges are not germane to this post; the fact that they do is enough to add a challenge for some teachers that is independent of technical ability.
If you know that student code that does X has the potential to do bad things, you can either screen for or block X directly. You can do this with a full-text search of the submitted code for dangerous phrases, or by providing dummy implementations of prohibited tools, such as replacing the usual file opening routines with ones that do nothing or create an error message.
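The full-text screen can be sketched in a few lines. The particular patterns below are hypothetical; a real blacklist would be tuned to the assignment and the submission language:

```python
import re

# Hypothetical blacklist: phrases we never expect in an honest submission.
BANNED = [r"\bopen\s*\(", r"\bsocket\b", r"\b__import__\b", r"\bsubprocess\b"]

def scan_submission(source: str) -> list:
    """Return the banned patterns that appear in the submitted source."""
    return [pat for pat in BANNED if re.search(pat, source)]
```

A submission that trips the scan can be rejected outright, or flagged for a human to read before it is ever executed.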
Black-listing has roughly the complement of privilege-dropping’s problems. It is fine-grained, letting you block very specific actions, but being fine-grained it is hard to cover all important cases. It can also be tricky to implement, but tricky because of the technical nuances of implementation rather than because of external permissions. That said, I have frequently found it to be an effective way to block undesirable behavior.
Incidentally, I usually find that if I can blacklist I can also stub: replacing some actions not with error-producing alternatives but instead with faked-but-plausible reactions. Having “list directory” return a plausible directory structure that does not actually exist can be useful both to test corner-case behavior and to keep would-be attackers busy with bait instead of meat.
One of my go-to approaches when I need to run low-level code is a white-list: rather than picking a few actions student code can’t do, I pick a few it can do and block everything else. This makes it easy to reason about security: if the permitted actions can’t do it, it can’t be done. It is also surprisingly easy to implement in most languages; it generally suffices to replace all identifiers in their code with wrapper identifiers and then selectively bring back the few I want with things like typedef int sandboxed_int;. Functions can also be handled; for example, I can make a new function that checks the file name being opened first and only calls the real fopen if the file name is one I’m ok with them opening.
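The same name-checking-wrapper idea can be sketched in Python by swapping the builtin open for a guard; the whitelist contents here are hypothetical and would be set per assignment:

```python
import builtins
import os

ALLOWED_NAMES = {"input.txt", "output.txt"}   # hypothetical whitelist
_real_open = builtins.open

def guarded_open(name, mode="r", *args, **kwargs):
    """Call the real open only if the file's base name is whitelisted."""
    if os.path.basename(str(name)) not in ALLOWED_NAMES:
        raise PermissionError(f"opening {name!r} is not permitted")
    return _real_open(name, mode, *args, **kwargs)

builtins.open = guarded_open   # student code now goes through the wrapper
```

As with the C fopen wrapper, the reasoning is pleasantly short: if every path to the file system runs through this function, only whitelisted names can ever be opened.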
White-listing is powerful and can be made very secure, but it also requires you to know in advance the full set of things a student is permitted to do. For many small assignments this is fairly straightforward, but for some larger assignments it is not as simple. Additionally, there are non-trivial up-front costs to creating a whitelisting wrapper for each language.
Another technique I use in some cases is to not run code at all, but rather to write programs that inspect the code and reason about its behavior. Reasoning about code behavior is not always possible: a direct result of Rice’s Theorem is that for any given reasoning system there exists either broken code the system will identify as working, or working code it will identify as broken, or code it will fail to understand at all. But for many student assignments the scope of solutions that will be created is simple enough that effective analysis is, in fact, possible. I have used automated code inspection for assignments given to hundreds of students without even one of them finding solutions the analyzer could not handle.
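A tiny taste of what such inspection looks like, using Python’s standard ast module to parse a submission without ever running it; the banned set is illustrative, and, per Rice’s Theorem, this only catches direct by-name calls, so a real analyzer must do much more:

```python
import ast

BANNED_CALLS = frozenset({"open", "eval", "exec", "__import__"})

def banned_calls_in(source: str) -> list:
    """Parse a submission without running it; list banned calls it makes."""
    tree = ast.parse(source)
    found = {node.func.id for node in ast.walk(tree)
             if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return sorted(found & BANNED_CALLS)
```

Because the submission is never executed, even code that would wreak havoc when run is perfectly safe to analyze.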
The central challenge with code inspection is that designing and implementing an effective code analysis engine is intellectually demanding. Almost every kind of “this is hard” proof I’ve seen in computing applies to some aspect of this problem. Additionally, there are comparatively few toolsets available to help, and many of those that do exist are poorly documented or difficult to manage. That said, there is no more certain way to prevent untrusted code from doing things you don’t want it to do than to never run the code at all.