How to write better multiple-choice questions.

Good at taking Bad tests

At some point, many years before I was writing my own quizzes and exams, I was told that each multiple-choice question should have four answers: one that is right, one that is plausible but wrong, one that is pretty far-fetched, and one that is ludicrous. I have now forgotten the source of this suggestion and may well be recalling it incorrectly, but it does seem to describe many of the exams I had in college.

Looking at that structure, it becomes evident that someone with very little knowledge will still show a baseline performance of about 50%. In the traditional American grading scale, 50% is an F, but as it was explained to me, that was supposed to mean that students needed to know at least half of the material to pass a course. With the four-answer model, that percentage is closer to 10.

All of this is made worse by the mindset many question authors seem to have when creating their answers. I learned early in school that I could often deduce the correct answer by asking myself this meta-question: “If the correct answer was X, would my teacher have worded the question this way?” Very often, the plausible wrong answer would have suggested a different wording of the question. For example:

Q: What was Columbus trying to do when he discovered the Americas?

Discover corn and tomatoes

Escape from religious persecution

Find a new trade route to India

See what was out there

For most test writers, we can figure out the answer by the cognitive alignment between the question and each answer.

“Discover corn and tomatoes” is so unrelated to the other answers that it is almost certainly the joke answer.
“‍Escape from religious persecution‍” has a focus that does not match the question. If the answer was “‍escape‍” then the question writer would have been thinking about “‍why did he leave Europe‍” not “‍what was he trying to do‍”: the wording of the question is too destination-centric for this to be the answer.
“‍Find a new trade route to India‍” lines up nicely with the question: it is an objective, and one that the Americas would interrupt.
“‍See what was out there‍” is also objective-centric, but it isn’t fleshed out very well. If this was the answer then the instructor wouldn’t have had to be vague; we would have seen something like “‍settle a dispute about the size of the Atlantic‍” or the like, something that suggested the author knew some details of the answer.

But even more telling, if this was the answer then the question would never even have been asked. I don’t ask “what was Edison trying to do when he invented the light-bulb” because the answer is “invent a light-bulb”; similarly I wouldn’t ask “what was Columbus trying to do while exploring” if the answer was “explore”.

In other words, a meta-level thinker with a reasonable command of language can get a very good score on tests written in this style without any understanding of the tested topics at all.

Better tests

If your goal in writing a multiple-choice question is to determine if a student understands the material, there are at least two approaches that can help meet this end.

Randomness

Consider the “‍none of the above‍” answer. If you are going to include these, the surest technique I know of that includes them without creating none-of-the-above patterns is to write every question without them and then go back over the questions and for each one (a) add a “‍none of the above‍” answer and (b) roll a die to randomly pick one other answer to remove. If you happen to randomly remove the correct answer, then none of the above becomes the correct answer.
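The die-rolling procedure is easy to mechanize. Here is a minimal sketch in Python, assuming each question is stored as a list of answer strings plus the correct answer; the function name `add_none_of_the_above` and that representation are my own illustration, not a standard tool.

```python
import random

NONE = "None of the above"

def add_none_of_the_above(options, correct, rng=random):
    """Swap one randomly chosen option for "None of the above".

    options: answers written WITHOUT a none-of-the-above choice.
    correct: the correct answer, which must be one of the options.
    Returns (new_options, new_correct).
    """
    drop = rng.randrange(len(options))            # the die roll
    kept = options[:drop] + options[drop + 1:]    # remove the chosen answer
    # If the die removed the right answer, "None of the above" is now right.
    new_correct = NONE if options[drop] == correct else correct
    return kept + [NONE], new_correct

options = [
    "Discover corn and tomatoes",
    "Escape from religious persecution",
    "Find a new trade route to India",
    "See what was out there",
]
new_options, new_correct = add_none_of_the_above(options, options[2])
```

Because the choice comes from the random number generator rather than from the author, “None of the above” ends up correct about one time in four for four-answer questions, with no pattern a student could learn.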

Randomness, if created by some source external to the question author’s head, can remove the meta-thinking ability from exams. You can randomly pick an answer and re-write the question the way you would have if that answer were the answer (e.g. “why did Columbus leave Europe?”) without changing the answers. When there is a range of answers (e.g. 5, 6, 7, 8) you can randomly pick where in the range given the correct answer will be, adding enough before and after it to make that random choice happen. And so on.
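The range trick can be sketched the same way. This is an illustration under my own assumptions (evenly spaced numeric options, and a hypothetical helper named `place_in_range`), not a prescribed method.

```python
import random

def place_in_range(correct, n_options=4, step=1, rng=random):
    """Build n_options evenly spaced answers with `correct` at a random position."""
    position = rng.randrange(n_options)   # 0 = smallest option, n_options - 1 = largest
    start = correct - position * step     # pad below the correct value as needed
    return [start + i * step for i in range(n_options)]

# If the correct answer is 7, this yields one of
# [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9] or [7, 8, 9, 10].
answers = place_in_range(7)
```

Each position is equally likely, so “pick the middle value” or “avoid the extremes” stops being a useful guessing strategy.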

Randomness is a relatively universal way of inhibiting meta-testing, but it can also lead to student frustration and a sense of “trick question”. I postulate that if you gave one randomization-faired exam to a class that had had other traditional unfair exams, you’d see some of the top-performing students drop to the bottom because a useful skill they had learned over the years suddenly started leading them astray.

Likely Mistake Analysis

In software engineering we talk about how to write tests for software. There are many guidelines for test creation; one of them is “consider common mistakes programmers make and write tests that would notice each one.” This likely mistake analysis is a powerful tool in writing exam questions.

Consider an arithmetic exam question “‍2 + 3 × 4 − 1‍”. What kinds of mistakes might a student be expected to make?

- Make a mistake on precedence, such as
  - (2 + 3) × 4 − 1 = 19
  - 2 + 3 × (4 − 1) = 11
  - (2 + 3) × (4 − 1) = 15
- Make a mistake on operations, such as
  - 2 × 3 × 4 − 1 = 23
  - 2 + 3 + 4 − 1 = 8
  - 2 + 3 × 4 ÷ 1 = 14
- Guess based on patterns in the options. We now have {8, 11, 13, 14, 15, 19, 23}; the middle value and the only value with both of its neighbors in the set is 14, and that isn’t the answer so we are probably OK.
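The bookkeeping for this analysis can be sketched in a few lines of Python. The mistake labels and the `pattern_guesses` helper are my own illustration; the helper encodes only the two pattern heuristics mentioned above (the middle value, and a value whose neighbors are both present).

```python
correct = 2 + 3 * 4 - 1  # 13

# Each likely mistake, written as the expression the student actually computed.
mistakes = {
    "added before multiplying": (2 + 3) * 4 - 1,
    "subtracted before multiplying": 2 + 3 * (4 - 1),
    "did + and − before ×": (2 + 3) * (4 - 1),
    "treated + as ×": 2 * 3 * 4 - 1,
    "treated × as +": 2 + 3 + 4 - 1,
    "treated − as ÷": 2 + 3 * 4 // 1,  # // keeps the result an integer
}

def pattern_guesses(options):
    """Values a pattern-guesser might favor without doing any arithmetic."""
    ordered = sorted(options)
    picks = {ordered[len(ordered) // 2]}  # the middle value (upper middle if even)
    picks |= {v for v in ordered if v - 1 in options and v + 1 in options}
    return picks

# A mistake that happens to produce the right answer makes a useless distractor.
useless = [name for name, value in mistakes.items() if value == correct]

options = {correct, *mistakes.values()}
print(sorted(options))                      # [8, 11, 13, 14, 15, 19, 23]
print(pattern_guesses(options))             # {14}
print(correct in pattern_guesses(options))  # False
```

If the last line printed True, the option set itself would be leaking the answer, and I would add or swap a distractor before using the question.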

This kind of analysis is useful in other questions as well, but it is more work than coming up with random alternatives. What misconceptions might a student have about the purpose of Columbus’s sailing trip? Perhaps the student doesn’t know where he was hoping to reach, so we could have a question like “‍Where did Columbus expect to end up when he went sailing?‍” with answers like {“‍India‍”, “‍China‍”, “‍Japan‍”, “‍an island‍”, “‍the edge of the world‍”, “‍a previously undiscovered land‍”, “‍America‍”}. Identifying the misconception(s) before writing the question can greatly reduce the chance that the wording of the question itself gives away the answer.

Fair vs. Easy

A fair test is one where good performance on the test is correlated only with good understanding of course material. Common multiple-choice tests are not fair, because “‍test-taking skills‍” (meaning the ability to analyze the question author’s intentions) are also correlated with success.

An easy test is one where the level of understanding needed to perform well is significantly lower than the level of understanding that the average student has attained. This can be because the test is so poorly written that most students have the ability to meta-test through it, or it can be because the targeted misunderstandings are not ones the students have.

In my observation, many instructors believe that they are giving harder exams than they are in fact giving because they are giving unfair exams. If you take the concepts they thought they were testing and re-write the questions to fairly test those concepts with no ancillary answer-leakage then student performance drops significantly.

High-level Cognition

Often multiple-choice exams stay fairly low in the cognition spectrum. They test memory, association, recall, or fairly simple skills. But there is no need for them to do so.

One technique for building a higher-level-thinking question starts by identifying the cognitive task you want to test. Suppose I want to know if my students understand the idea that arithmetic is made up of numbers and operators, and that each operator needs a precedence and a definition. I could ask them to produce that text, but I could also see how well they can reason by putting it in a hypothetical. For example, I might ask the following:

We learned that × happens before + and that operations happen left-to-right; thus 2 + 3 × 4 × 5 = 2 + 12 × 5 = 2 + 60 = 62. If we had said that + happens first and that operations happen right-to-left the same expression would have given a different answer: 2 + 3 × 4 × 5 = 5 × 4 × 5 = 5 × 20 = 100. If I had failed to give rules, the expression would be ambiguous, meaning we wouldn’t know whether it was 62 or 100.

Suppose I said that × and ÷ happen first, then + and −, but did not give a left-right ordering. Which of the following expressions would be ambiguous?

2 + 3 × 4 − 5

3 + 4 − 5

2 ÷ 3 + 4 − 5

3 − 4 − 5
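An answer key for a question like this can be checked mechanically. The sketch below takes “ambiguous” to mean what the question says: different allowed evaluation orders give different values. It tries every order of applying operators within a precedence tier; the function names and the ASCII `*` and `/` standing in for × and ÷ are my own choices.

```python
from fractions import Fraction

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
TIERS = [{"*", "/"}, {"+", "-"}]  # earlier tiers are applied first

def possible_values(tokens):
    """All values reachable when same-tier operators may be applied in any order."""
    if len(tokens) == 1:
        return {tokens[0]}
    # The highest-precedence tier that still has an operator in the expression.
    tier = next(t for t in TIERS if any(op in t for op in tokens[1::2]))
    results = set()
    for i in range(1, len(tokens), 2):
        if tokens[i] in tier:
            value = OPS[tokens[i]](tokens[i - 1], tokens[i + 1])
            results |= possible_values(tokens[:i - 1] + [value] + tokens[i + 2:])
    return results

def parse(text):
    """Split '3 - 4 - 5' into exact Fractions and operator strings."""
    return [Fraction(t) if i % 2 == 0 else t for i, t in enumerate(text.split())]

def is_ambiguous(text):
    return len(possible_values(parse(text))) > 1

for expr in ["2 + 3 * 4 - 5", "3 + 4 - 5", "2 / 3 + 4 - 5", "3 - 4 - 5"]:
    print(expr, is_ambiguous(expr))
```

Under that reading only 3 − 4 − 5 has more than one possible value (−6 or 4), so a student has to reason about which operators are sensitive to grouping rather than recall a rule.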

Or I might ask

Suppose we introduced a new operator Ⓜ that does the maximum of its operands: thus 3 Ⓜ 4 = 4 Ⓜ 2 = 4. Assume that Ⓜ has left-to-right precedence and occurs after × but before +.

What is the value of 1 + 2 Ⓜ 5 ÷ (3 Ⓜ 2) − 1 Ⓜ 3?

−1

−1/3

0

3
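Likely mistake analysis applies here too: the same expression can be evaluated under the intended rules and under each precedence mistake a student might make. The sketch below is a small left-to-right evaluator of my own; it uses `M` as an ASCII stand-in for Ⓜ and nested lists for parentheses, both assumptions made for illustration.

```python
from fractions import Fraction

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "M": max,  # stand-in for the Ⓜ (maximum) operator
}

def evaluate(tokens, tiers):
    """Evaluate operands and operators left to right, one precedence tier at a time.

    tokens alternates operands and operator strings; an operand may be a
    nested list, which plays the role of a parenthesized sub-expression.
    """
    tokens = [evaluate(t, tiers) if isinstance(t, list) else t for t in tokens]
    for tier in tiers:
        reduced = [tokens[0]]
        for op, operand in zip(tokens[1::2], tokens[2::2]):
            if op in tier:
                reduced[-1] = OPS[op](reduced[-1], operand)  # apply now
            else:
                reduced += [op, operand]                     # leave for a later tier
        tokens = reduced
    return tokens[0]

F = Fraction
# 1 + 2 Ⓜ 5 ÷ (3 Ⓜ 2) − 1 Ⓜ 3
expr = [F(1), "+", F(2), "M", F(5), "/", [F(3), "M", F(2)], "-", F(1), "M", F(3)]

INTENDED = [{"*", "/"}, {"M"}, {"+", "-"}]  # Ⓜ after × but before +
M_FIRST = [{"M"}, {"*", "/"}, {"+", "-"}]   # mistake: Ⓜ applied before ÷
M_LAST = [{"*", "/"}, {"+", "-"}, {"M"}]    # mistake: Ⓜ applied after + and −

print(evaluate(expr, INTENDED))  # 0
print(evaluate(expr, M_FIRST))   # -1/3
print(evaluate(expr, M_LAST))    # 3
```

Each wrong precedence order produces a distinct wrong value, which is what I want from distractors: a student’s choice tells me which rule they misapplied.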

Hypotheticals are powerful tools for testing ideas independent of memorization because students have not seen them before; thus they have to generalize previously-learned principles and apply them in a new setting in order to answer the question provided.