Why program failure messages are so reliably bad.
The program passed every test I threw at it; it probably has some corner cases I didn’t handle properly, but it seems pretty complete.
That is to say, it is pretty complete when I give it good input. But if I give it something incorrect, like “2 + × 3”, it simply fails. It doesn’t crash or explode; it just gives back nothing at all. There’s no kind of error message or diagnostic. Helpful error messages, it turns out, are much more difficult.
The reason error messages are hard is fairly simple. There is one obvious unambiguous thing to do with a good piece of input, but give a parser (or any program) bad input and the “right” thing to do is difficult to define. Humans want error messages to tell them “how” they went wrong. The simple answer (“your input didn’t make sense”) doesn’t do that. For something like a parser we could pretty easily say “unexpected symbol in position 3: ‘×’” and maybe even add in “expected a number instead” for more clarity, but that doesn’t really convey the scope of the problem. Maybe I forgot a number between the “+” and the “×”, or maybe one of those two was an accident, or maybe I wanted “2+” to mean “two or more”, or…. The options are endless.
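To make that concrete, here is a minimal sketch in Python of a parser that produces exactly that kind of message. It is not the parser I was describing above; the names `ParseError`, `tokenize`, and `evaluate` are made up for illustration, and it only handles whole numbers joined by “+” and “×”. Positions count tokens, so the “×” in “2 + × 3” is at position 3.

```python
import re


class ParseError(Exception):
    """Raised when the input is not a valid sum of products."""


def tokenize(text):
    # A token is a run of digits or any single non-space character.
    return re.findall(r"\d+|\S", text)


def evaluate(text):
    """Evaluate input such as '2 + 3 × 4', with × binding tighter than +."""
    tokens = tokenize(text)
    pos = 0  # index of the next unread token

    def number():
        nonlocal pos
        if pos >= len(tokens):
            raise ParseError(
                f"unexpected end of input at position {pos + 1} "
                "(expected a number instead)"
            )
        tok = tokens[pos]
        if not tok.isdecimal():
            raise ParseError(
                f"unexpected symbol in position {pos + 1}: '{tok}' "
                "(expected a number instead)"
            )
        pos += 1
        return int(tok)

    def product():
        nonlocal pos
        value = number()
        while pos < len(tokens) and tokens[pos] == "×":
            pos += 1
            value *= number()
        return value

    total = product()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        total += product()
    if pos < len(tokens):
        raise ParseError(
            f"unexpected symbol in position {pos + 1}: '{tokens[pos]}' "
            "(expected '+' or '×')"
        )
    return total
```

Calling `evaluate("2 + × 3")` raises a `ParseError` that names the offending token and what was expected in its place. Note what the message does not do: it cannot say whether the “+” or the “×” was the mistake, which is the part a human actually wants to know.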
For parsers the problem of getting a sensible error message isn’t that bad. For most other programs it is a lot worse. Error messages are notoriously cryptic: most error messages I see amount to “error some cryptic number has occurred while trying to do the thing I was trying to do.”
Many error messages come from a conversation like this:
“What if x is negative here?”
“It shouldn’t be.”
“Yeah, but what if it is?”
“I don’t know. Something went wrong, I guess. Put in an error message.”
Many more come from not having this conversation at all. If at some point in the program I mistakenly assume x is positive but it might not be, who knows how much later it will be before the things I computed based on that assumption mess something up.
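A small Python sketch of the difference, under the assumption that the code in question takes a square root of x (the function names here are hypothetical, chosen only for this example). In Python, raising a negative number to the power 0.5 does not fail; it quietly returns a complex number, and the trouble only surfaces later when something tries to compare or round it.

```python
def root_unchecked(x):
    # Silently assumes x is non-negative. For negative x this returns a
    # complex number instead of failing, so the mistake travels onward.
    return x ** 0.5


def root_checked(x):
    # States the assumption where it is made, so a violation is reported
    # here rather than wherever the bad value eventually causes damage.
    if x < 0:
        raise ValueError(f"x must be non-negative, got {x}")
    return x ** 0.5
```

With the unchecked version, the eventual failure is a type error in some distant line that never mentions x at all. With the checked version the message is still only “something went wrong, I guess,” but at least it appears in the right place.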
In addition to the difficulty of discovering every possible error and the difficulty of describing what went wrong in an understandable fashion, there is a strong economic reason good error messages are rare. Why spend time (and thus money) working on making things fail nicely when you could spend that time working on making things work nicely? One might as well run a deli that specializes in flavorless sandwiches that never drip or spill stuff on your lap. Until failure becomes so pervasive that it is the common case, there’s little incentive to fix it.
This is one reason I spend an hour writing a parser instead of fifteen minutes downloading one someone else wrote. I know how mine works. I designed it with the thing I plan to parse in mind. If it fails, I know where to look for the failure. When I get a cryptic “error message 1138” it isn’t cryptic to me: I wrote that error message, so I know how to interpret it. Just this morning I had such an error: my scripture-reformatter deleted some spaces between short words when I asked it to give me a clean copy of the book of Moses to annotate in preparation for institute class this evening. Because I wrote the reformatter, I knew there were only three places where this error could happen; five minutes later I had it fixed. Yet another reason I am glad I know how to program.
I expect good error messages will remain scarce for the foreseeable future. Because I know how hard they are to create, I also take particular pleasure when I see a good one. It is nice to know that some programmers still put thought into building solid foundations.