Best Practices vs Witch Hunts

 
Author: "No Bugs" Hare
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
 
 
The Road to a Witch Hunt is Paved with Good Intentions

In any field, software development included, there are lots of well- and less-known best practices. As a rule of thumb, best practices are very useful and in general should be followed. Unfortunately, way too often developers start to take best practices as gospel, and/or become obsessed with them. This turns best practices into outright witch hunts. In this article, we’ll discuss the mechanism behind witch hunts, and why they’re Really Bad Things.

Life cycle: from best practice to witch hunt

Let’s see how a best practice usually evolves (YMMV, but usually the picture is rather similar):

  1. A few teams start (usually independently) using a certain practice within their projects.
  2. The practice becomes more widespread.
  3. Somebody publishes the practice as a ‘rule of thumb’, usually with the rationale behind it, and with certain applicability exceptions.
  4. The practice is officially named a ‘best practice’ by some authoritative source. The rationale is usually still present, but some of the applicability exceptions are often lost for lack of space.
  5. The practice becomes pretty much universal. At the same time, both the rationale and the applicability exceptions are gradually forgotten.
  6. Then some people (‘zealots’) emerge, who think that the Earth will stop turning (read: the project is hopelessly deficient) if there is even one single case of the best practice being violated. Neither the rationale nor the applicability exceptions are taken into account.
  7. When such a ‘zealot’ lays his hands on a real-world project, he starts to enforce the practice mercilessly.
  8. At first, the ‘zealot’ normally starts by eliminating the least controversial of the best practice violations. Among other things, this means that at this stage he usually eliminates violations for which the rationale does apply and the applicability exceptions do not. As a result, his efforts at this stage tend to benefit the project in general, which is usually more or less obvious to the whole team. This triggers the natural feeling that the ‘zealot’ is right in his fight for the best practice ideals.
  9. As a natural consequence, only a few people dare to challenge the ‘zealot’ in his further fight for the best practice. Also, a substantial chunk of the team starts to believe that this specific best practice is an Absolute Good Thing. And this is the point where the best practice effectively becomes a witch hunt.
  10. Applying this specific best practice is never questioned, and both rationale and applicability are completely ignored. In the name of this best practice, pretty much anything can be done within this team. In particular, it includes violating any other best practices (especially those lacking their own ‘zealots’ in the team). All kinds of weird things can and will happen in such teams (for practical examples, see below). As an additional benefit, if there are two best practices with ‘zealots’ behind them in the same team, any kind of conflict between these best practices can result in a ‘holy war’.

Wait, but it is still a best practice, isn’t it?

A best practice is useful only if everything behind it (including both the rationale and the applicability exceptions) is carefully considered. Without taking these into account, applying the best practice often leads to conflicts with other best practices, which in turn can easily lead to grave consequences.

What can possibly go wrong when applying a best practice?

Naming something a best practice doesn’t really change its impact on the project. As with pretty much anything out there, and despite the name, best practices are not absolute virtues. Rather, they are ‘rules of thumb’, with all kinds of considerations which need to be taken into account. The very name ‘best practice’ is misleading – a more appropriate name would be ‘usually a best practice’. Below are a few examples of ‘best practices’ whose application went wrong at some point.

Example 1. Mild example – magic numbers

Use of ‘magic numbers’ (unnamed numerical constants) is generally frowned upon in the programming community. The associated best practice is to replace them with named constants. This best practice exists for two good reasons.

Rationale:

  1. Better readability
  2. Better maintainability in case the constant changes

So far, so good, but recently I ran into a discussion where the author asked whether, in the code

kbytes = bytes / 1024;

1024 is a ‘magic number’ and should be replaced with a named BYTES_PER_KBYTE. Moreover, there were quite a few people saying that 1024 is indeed a ‘magic number’ which should be replaced. However, I contend (and most of the practical programmers I’ve spoken to about it agree) that this is wrong, and that 1024 should be left as is (unless an exception applies, as described below).

To realize why this best practice is wrong in this specific case, one should take a look at the very reasons behind it. Note that in our case, neither of the two reasons behind the best practice applies. First, bytes / 1024 is more readable than bytes / BYTES_PER_KBYTE, and second, the chances of BYTES_PER_KBYTE ever changing are infinitesimally small (ok, unless you’re into nitpicking between kilobytes and kibibytes, and are planning to change the semantic meaning of the term KBYTE in the context of your project, which would be a Very Bad Thing per se).

This means that BYTES_PER_KBYTE is not a good thing to have, and the plain 1024 is generally preferable. However, as we are speaking about applicability, this observation is also not an absolute, and there are possible exceptions. One such exception arises if you need a consistent way of displaying your bytes to the end-user. In this case, a constant such as BYTES_TO_DISPLAY_KBYTES might make sense (and it may change later, which justifies its existence), but the generic BYTES_PER_KBYTE is still pretty much useless.
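
To make the contrast concrete, here is a minimal sketch in C (the program is purely illustrative; BYTES_PER_KBYTE and BYTES_TO_DISPLAY_KBYTES are the names discussed above):

#include <stdio.h>
#include <stddef.h>

/* Hypothetical display-oriented constant: unlike "bytes per kilobyte",
   the display convention might legitimately change later (say, 1000 vs 1024),
   which is what justifies naming it at all. */
#define BYTES_TO_DISPLAY_KBYTES 1024

int main(void) {
    size_t bytes = 5 * 1024 * 1024;

    /* Plain 1024 reads at least as well as bytes / BYTES_PER_KBYTE,
       and the value is never going to change, so no named constant is needed: */
    size_t kbytes = bytes / 1024;

    /* Applicability exception: one consistent, potentially changeable rule
       for displaying sizes to the end-user: */
    size_t display_kbytes = bytes / BYTES_TO_DISPLAY_KBYTES;

    printf("%zu KB raw, %zu KB for display\n", kbytes, display_kbytes);
    return 0;
}
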
Ok, this was a very mild example of a witch hunt, with very mild negative implications.

Example 2. Stronger example – memory leaks

Those dealing with C/C++ are familiar with memory leaks1. The common best practice is to avoid them at all costs.

Rationale:

  1. To avoid using unnecessary memory which can exhaust computer/process resources

Applicability Exception:

  1. All memory allocated via malloc() will be freed on program exit anyway, so calling free() right before program exit is not necessary.

One of the common methods of detecting memory leaks is to use some kind of tool (such as Valgrind for Linux, or the built-in debugger for Windows) which runs on program exit and lists all the blocks which were allocated via malloc() but were never deallocated via free(). It is indeed a very useful technique, and it is very good practice to run such a tool and eliminate as many of these memory leaks as possible. However, should we always aim to eliminate all of them?

More than once in my programming career, I have needed to allocate a once-per-program buffer (for example, for logging purposes) which was used by all the program’s threads. When I faced this task for the very first time, 20 or so years ago, I tried to deallocate this buffer properly on program exit. However, in some weird scenarios2 deallocating it caused race conditions, which once-in-100-or-so exits led to a program crash (a thread writing to an already-deallocated buffer) for no real reason.

That time, I spent almost two weeks trying to fix this elusive problem (which kept reappearing in a new form after each fix). It continued as a kind of weird ding-dong battle until I asked myself: what would change if I didn’t deallocate this buffer at all? The whole rationale for fighting memory leaks doesn’t really apply when we’ve already decided to exit the program, as in a few microseconds it will all be freed anyway (and in a much more efficient manner, BTW). As soon as I realized this, the problem went away for good, and I was able to throw away the few dozen rather weird lines of synchronization code which I had written while trying to fix the problem by other means.
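
For what it’s worth, here is a minimal sketch of the resulting pattern in C (names are hypothetical; the real code dealt with a logging buffer shared by multiple threads):

#include <stdlib.h>

/* Once-per-program buffer shared by all threads.
   Allocated at startup (before any threads are spawned) and deliberately
   NEVER freed: the OS reclaims the memory at exit anyway, and calling
   free() while some thread may still be writing to the buffer is exactly
   the kind of race described above. */
static char *g_log_buffer;

void init_log_buffer(size_t size) {
    g_log_buffer = malloc(size);   /* intentionally 'leaked' at exit */
}

Leak detectors can usually be told to ignore such deliberate once-per-program allocations (Valgrind, for one, supports suppression files), so keeping this ‘leak’ doesn’t have to clutter the tool’s report.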

In this example, I myself was guilty of witch hunting, though I’m humbly asking the sentencing judge to consider my inexperience at that point as a mitigating circumstance.

Our next example is more severe, and involves breaking the code as a result of a witch hunt.

Example 3. Breaking code while fighting -Wall

This one I’ve seen myself more than once. In some project, there is perfectly valid C code which involves some implicit casts (like unsigned-to-signed conversions of the same size, or well-defined integer truncations/promotions). All these casts are perfectly well-defined in C, and lead to perfectly well-defined results both in theory and in practice. The code is reasonably readable too. In short, there are no (zero, nada) problems with the code whatsoever.

Then, a ‘zealot’ comes in and adds -Wall to the compiler command line. Right away, the compiler starts to complain about these casts (why the compiler does this is beyond me, but that question is out of scope here). These warnings are overzealous, as the code is well-defined and standard-compliant.

Then, our ‘zealot’ starts to get rid of these overzealous warnings – not by disabling them, and not even by thoughtfully adjusting types so that there are fewer conflicts, but by merely inserting explicit casts as he sees fit. He needs to insert dozens of such casts, and he doesn’t pay much attention to what the code he changes really does (he’s on the job of eliminating warnings, and he has much more to do; anything else is merely an obstacle on his way to achieving the Greater Good of -Wall without warnings).

Sooner or later he makes a mistake when inserting a cast, and the previously perfectly valid code becomes broken. Even worse, the code might be broken only in fringe cases, which may go unnoticed for a while. Not to mention that code with tons of explicit casts becomes much less readable. I rest my case and ask the jury to sentence our ‘zealot’ to a life of abstinence from programming.
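
Exhibit A: a hypothetical, deliberately simplified example (not taken from any real project; which warning flags fire on such conversions varies by compiler):

#include <stdint.h>

/* Original, well-defined code: conversions to uint16_t simply keep the
   low 16 bits, which is exactly what this folding intends. */
uint16_t fold_checksum(uint32_t sum) {
    uint16_t low  = sum;         /* implicit, well-defined truncation */
    uint16_t high = sum >> 16;   /* implicit, well-defined truncation */
    return low + high;           /* truncated again, as intended */
}

/* After a hurried 'warning cleanup': the cast is wrapped around the wrong
   expression, so the high half is discarded before the shift and high is
   always 0. The warnings are gone, and so is the correctness. */
uint16_t fold_checksum_broken(uint32_t sum) {
    uint16_t low  = (uint16_t)sum;
    uint16_t high = (uint16_t)sum >> 16;   /* oops: cast applied before the shift */
    return (uint16_t)(low + high);
}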

Now to our next, and final, example of witch hunting.

Example 4. The ultimate example – Debian RNG disaster

Once upon a time, there was a library named OpenSSL. As a part of it, they used a random number generator which, in turn, had two lines of code that used uninitialized memory. It wasn’t a real problem, as the whole idea was to gather as much entropy as they could get their hands on, and uninitialized memory couldn’t possibly hurt them.

Then, an overzealous Debian supporter, in a holy fight to eliminate all Valgrind warnings, commented out these two lines of code [Mason08] [Schneier08] [Kroll13]. Nothing changed, except that all the supposedly private (and secret) keys generated by all the Debian-based distributions (yes, this includes the all-popular Ubuntu) became easily guessable until the bug was fixed (which took over two years). Worse than that, when the problem was discovered, it meant that all the SSL exchanges between such Debian-based distributions had suddenly become retrospectively vulnerable (i.e. if somebody had recorded them, he’d now be able to decrypt them, given the nature of this vulnerability).
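
To give a feel for the structure of the bug, here is a grossly simplified, purely illustrative sketch in C. This is NOT the actual OpenSSL code, and every name in it is invented:

#include <stddef.h>
#include <string.h>

static unsigned char pool[1024];   /* entropy pool */
static size_t pool_pos;

/* Mix bytes into the pool (a real implementation would hash them). */
static void add_to_pool(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; ++i)
        pool[pool_pos++ % sizeof pool] ^= buf[i];
}

void seed_rng(const unsigned char *real_entropy, size_t len) {
    unsigned char buf[256];
    /* buf deliberately starts out uninitialized: stack garbage can only ADD
       entropy, never remove any; a tool like Valgrind, however, flags
       reading it as use of uninitialized memory. */
    memcpy(buf, real_entropy, len < sizeof buf ? len : sizeof buf);

    add_to_pool(buf, sizeof buf);   /* <- roughly the kind of line that got
                                       commented out; silencing the warning
                                       also threw away the real entropy */
}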

The whole incident was at the time the worst security incident in the entire open source community, and it resulted precisely from an overzealous developer trusting tools and making changes to eliminate warnings without understanding the context or the potential implications.3

It is interesting to note that when the disaster was analyzed, all kinds of explanations (read: excuses) were given for it happening, including the outright ridiculous “The OpenSSL code was too clever by half” [Cox08]. Very few sources (if any) have mentioned the real culprit: an obsession with the enforcement of best practices, which unfortunately too easily becomes a witch hunt. BTW, I’m not trying to condemn the person who made the change – rather, the whole culture in which such witch hunts (conducted without taking the related circumstances into consideration) are considered the Right Thing To Do.

I’m scared, what should I do?

There are at least two lessons to be learned from this article. The first one is:

Whenever you’re in doubt about applying a best practice – think about the rationale behind it. If the rationale doesn’t apply in your case – forget about the best practice: it doesn’t apply here.

The second lesson applies to much more dangerous witch hunts within existing code:

Whenever there is the slightest risk that by enforcing a best practice on existing code you’ll break it – think more than twice before going ahead with your change. And no, the change being ‘just commenting out two lines’ with no apparent downsides is not guaranteed to be safe.

Or, re-stating the same thing in a shorter manner:

Merciless refactoring is good; merciless refactoring without having a clue about what you’re doing is not.

1 While those writing in garbage-collected languages may not be familiar with memory leaks, and may even think that there can be no memory leaks in a JVM, this is wrong, at least because of so-called semantic memory leaks [NoBugs12]
2 AFAIR, most of the trouble was about threads not being terminated for various reasons; this does happen when dealing with network stuff, where you don't want your user to wait forever for no apparent reason.
3 The argument that he tried to ask the OpenSSL team but asked in the wrong place, and the question of who is responsible for it happening (which resulted in lots of finger-pointing between Debian and OpenSSL at the time), are beyond the scope of this article.

 



Acknowledgements

This article was originally published in Overload Journal #125 in February 2015 and is also available separately on the ACCU web site. Re-posted here with the kind permission of Overload. The article has been re-formatted to fit your screen.

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.


Comments

  1. The_Assimilator says

    “The argument that he tried to ask OpenSSL team and asked in the wrong place, and who’s responsible for it happening (which has resulted in lots of fingerpointing between Debian and OpenSSL at the time), is beyond the scope of this article.”

    When you’re using the OpenSSL debacle as “The ultimate example” of “witch hunting”, the full backstory is certainly in scope. The truth is simple: the developer who originally wrote the randomising code in OpenSSL is responsible for it being broken because s/he didn’t do the minimum of due diligence by *adding a comment to explain why the code was written like that*.

    If that had been done, it’s almost certain the code wouldn’t have been changed, and OpenSSL wouldn’t have been broken for two whole years. (The fact that it was broken for two years, without anyone noticing, is an entirely different kettle of fish and a damning indictment of the OpenSSL development team and their reviewing and testing processes, or rather lack thereof.)

    I’ve been coding for a decade and I can tell you that if I had come across that same piece of uncommented code, asked the devs about it and got no reply, I would also have deleted it. I think most developers who are conscientious and refactor as they work would do the same. Does that make us all witch-hunters?

    I think not, which makes using OpenSSL as an example of witch-hunting invalid, which weakens your argument substantially.

    • "No Bugs" Hare says

      > The truth is simple:

      This line is already a very religious one. Whoever says that “there is only one single truth” – is very likely guilty of religious sentiments about the subject. And while religion doesn’t necessarily imply witch hunts – it does much more often than it doesn’t.

      > Does that make us all witch-hunters?

      If you’re changing the code without understanding what it is doing – yes, it certainly does.

  2. leammas says

    I’ve suspected C++ related developers to be the most clever ones and now I see it’s true. The article is real fresh air among dozens of “look ma, another best practice” out there. Thanks!

    • "No Bugs" Hare says

      We’re not the most clever ones – it is just these days, with most of COBOL and plain-C guys retired, we tend to be the most experienced ones 😉 .
