Monday, October 11, 2010

Deutsch's criteria for fixing bugs

Peter Deutsch did a lot of things, such as coming up with a JIT Smalltalk VM. There are other, smaller things, like the 8 fallacies of distributed computing, or his criteria for evaluating bug fixes. I couldn't find a written reference for the bug criteria, though, so here is the gist. According to Deutsch, you have fixed a bug only when:

  • you can completely explain how the bug occurs, and
  • you can prove the change you make addresses the verifiable cause of the bug.
Unfortunately, it is common to hear someone claim a bug is fixed just because "I made a change and the bug went away, therefore I addressed the source of the bug". In other words, an instance of the post hoc ergo propter hoc fallacy.

This type of fallacy comes up quite often. For instance, in C, changing code that is far away from where a pointer aliasing problem manifests can make that problem appear (or disappear). I ran into a case of this not long ago. Adding code to one function, code that the optimizer reduced to nothing, changed register allocation in another function that had nothing to do with the first one. The bug showed up in that second function.

Should you claim that changing the first function fixes the bug in the second one? Or that the change in the first function somehow controls a compiler bug? Or is it better to find the actual cause of the bug, so that you can be sure the change you make really addresses the problem?

Alas, since these investigations sometimes take a long time, you often see things like "well, I changed the compiler flags and that made the bug go away, therefore it's a compiler bug". Maybe, but you have to prove it, not merely state it. In the case above, the real issue was that the source code violated the aliasing rules of the C99 spec, which defines what the C language is to begin with. Clearly, as far as the compiler is concerned, the "bug" was a case of garbage in => garbage out. But, of course, as soon as you fix the source code so it does not rely on pointer aliasing, the compiler magically produces the intended code regardless of the optimization level. Sigh...
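The following is a minimal, hypothetical sketch of the kind of aliasing violation described above. It is not the code from that incident (the post does not show it), and the function names are made up for illustration. It assumes float and uint32_t are both 32 bits wide, which the second function checks with assert. The first function is what "works at -O0, breaks at -O2" code can look like; the second does the same job without accessing a uint32_t object through a float lvalue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical non-conforming version, kept only for contrast.
   If called as overwrite_aliasing(&x, (float *)&x), the store through f
   modifies a uint32_t object through a float lvalue, which C99 6.5p7
   forbids. Because uint32_t and float are incompatible types, an optimizer
   may assume the store to *f cannot change *u and simply return the 1 it
   already knows, so the result can differ between optimization levels. */
uint32_t overwrite_aliasing(uint32_t *u, float *f)
{
    *u = 1;
    *f = 0.5f;
    return *u;   /* undefined behavior when f aliases u */
}

/* Conforming version with the same intent: copy the float's object
   representation into *u with memcpy instead of writing through a float
   pointer, so the compiler must assume *u has changed. */
uint32_t overwrite_memcpy(uint32_t *u, float value)
{
    assert(sizeof *u == sizeof value);   /* the copy assumes equal sizes */
    *u = 1;
    memcpy(u, &value, sizeof value);
    return *u;
}
```

With GCC, adding -fno-strict-aliasing would typically hide the symptom of the first function, which is exactly the "I changed the compiler flags and the bug went away" trap. The second function should behave the same at any optimization level because it never relies on pointer aliasing in the first place.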

6 comments:

Anonymous said...

Glad to see this post. As someone who dabbled in implementing a Smalltalk VM in the distant past, I was very much aware of Peter's contributions to Lisp and Smalltalk. Regretfully, I could never get hold of his paper on JIT.

I didn't know about his bug fix criteria. One may not believe this, but I have witnessed several bug fixes where exceptions were "fixed" either by catching and silently ignoring them or by commenting out the code that threw them. And this wasn't at some no-name company either, but at an industry leader where I worked in the past.

Unfortunately, programming is increasingly practiced as a fine way to earn a living by people who have little or no understanding of what it is about. Far too many people are merely skilled in the use of IDEs and design patterns and think they have mastered the craft of programming.

Andrés said...

> Unfortunately, programming is increasingly practiced as a fine way to earn a living by people who have little or no understanding of what it is about. Far too many people are merely skilled in the use of IDEs and design patterns and think they have mastered the craft of programming.

I agree... "oh, the browser has more refactorings in its context menu, consequently this is a superior tool and I write better code with it". :)... yeah right.

Andrés said...

Hey, what VM did you work with?

Eliot Miranda said...

Hi Andrés,

I think you've stated it slightly incorrectly. The criterion is that you haven't fixed the bug until you've explained what the bug is and how the proposed change addresses it. Here's an excerpt from a message in which Steve Dahl formulates the principle:

"Remember that Peter never expressed it (in my hearing) as a formal principle, but he just kept asking the engineer what bug caused the symptom until they went back and explained it to his satisfaction. Or ran off in a snit.

A symptom is not a bug. Fixing a symptom doesn't mean you've fixed the bug. Until you can explain what the bug is, and demonstrate that the hypothesized bug really is occurring, and is causing the observed symptom, you haven't fixed the problem."

You can shorten this to: until you've truly explained the bug, you can't fix it; you can only obscure its existence.

Andrés said...

Thanks, Eliot. I tried grepping around for the comment but couldn't find it, so I had to rely on oral tradition to figure it out. But yes, that is the intent I remember. It's totally not OK to just change stuff until the program "appears to work"...

Andrés said...

Hmmm... as an afterthought, we really need to write these kinds of things down. This oral tradition business sometimes makes me think we ended up losing a whole generation of Smalltalkers, in part because they were not there when we were talking, or we didn't say the right thing when they were nearby...

So, what else is there? What do you think we should write?