If they wrote it in a book, it MUST BE GOOD CODE! Or How many mistakes can fit into 100 lines of book tutorial code? Part 2

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

So, there is no Wyoming. And remember: if they said it on television, it MUST BE TRUE!

— Garfield the Cat, 'It MUST BE TRUE!' episode —

There is a common (mis)perception out there that whatever is written in a book, is to be trusted implicitly. As we’ve seen in the previous part of this article, it is not always the case. In particular, we were able to identify 8 different mistakes, one style issue and one “almost-mistake”, all within 42 lines of code from the book, and presented as tutorial (!).

Today we continue with the further 2 pages (containing 58 lines of code) from the same book. This will bring the total number of lines to 100, and we’ll be able to answer the question from the title of these 2 articles – “how many mistakes can fit into 100 lines of book tutorial code”.

So, let’s take a look at the next 2 pages out of the very same book [ProgrammingMultiPlayerGames]:

NB:To see the image more clearly, please click on it
NB2: No books have been harmed in the production of these scans (the effect of the pages being torn out is photoshopped)

Mistake #9: Missing const qualifier

void dreamMessage::WriteString(char *s) //NOBUGS: missing 'const' in declaration
{
  if(!s)
  {
    return; //NOBUGS: see Mistake #10
  }
  else
    Write(s, strlen(s) + 1);
}

In the declaration of the dreamMessage:writeString() above there is one thing missing: it is const qualifier (the function should have been declared as WriteString(const char *s)).

In general, const qualifiers should be used whenever semantically applicable (both in C and in C++), and WriteString() represents a very obvious case for it: parameter of WriteString() is not expected to be changed (neither it is changed), and marking its parameter as being const leads to the code being more clear both to compiler and to developer (and also prevents accidental changes which violate the const contract). Reasoning behind using const is widely available, and we won’t discuss it in detail here; those interested in the reasoning behind, can refer to [SutterAlexandrescu].

Actually, this is not the first place where this mistake (missing const qualifier) has occurred within the code: it has already been observed in dreamMessage::Write(void *d, int length).

Mistake #10: NULL parameter value causes Different Message Format

“While the detection of pointer parameter being NULL is almost universally a good idea, having it silently ignored might lead to difficult-to-find bugs.While the detection of pointer parameter being NULL is almost universally a good idea, having it silently ignored might lead to difficult-to-find bugs. This alone would qualify this code as an almost-mistake, but in the WriteString() function above, the situation is significantly worse.

If NULL value is passed to WriteString(), then a message is generated (with no indication of any error whatsoever), but the format of this message-when-WriteString()-parameter-is-NULL will be diffferent than the format-of-the-message-when-WriteString()-parameter is not NULL. More specifically, when parameter is NULL, there will be nothing written to the message, not even a ‘\0’ character, so when the parser on the receiving side calls ReadString() to parse the output of our WriteString() call, it will read the data beyond what-was-written-by-this-WriteString() call, with generally unpredictable results. In general, such a practice is extremely error-prone and will lead to very-difficult-to-find bugs.

Instead of silently returning here, good practice would be to throw an exception, or at least (in the extreme case, bordering with a mistake) to have an assert()¹ or (once again bordering with a mistake) – to write a single ‘\0’ char to the message, to keep the message format correct regardless of the function input values.

¹ Note that C assert() is not a good thing to use for conditions which can realistically happen in production, so exceptions are clearly preferred where possible. Stay tuned for a separate post on subtle relations between the two and suggested practices

Mistake #11: Implicit Prohibitions and Implicit Modes

void dreamMessage::BeginReading(int s)
{
  size = s;
  readCount = 0;
}

“It means that if I'm intermixing calls to Write*() and BeginReading() functions - which is intuitively ok from my perspective as a developer-using-dreamMessage - my code is likely to fail in an unexpected-to-me way.Within the dreamMessage::BeginReading(), there is a very subtle mistake, which originates from a mix-up between a message and its parsing (see Mistake #1 in the first part of the article). The problem here is that the data member size is re-used for both reading and writing the message; most of the time it has the semantics of “size of the message”; however, for the BeginReading(int) function this semantics becomes broken, and the message size begins to be controlled externally (and arbitrarily). It means that if I’m intermixing calls to Write*() and BeginReading() functions – which is intuitively ok from my perspective as a developer-using-dreamMessage² – my code is likely to fail in an unexpected-to-me way. The failure (from my intuitive perspective) will be resulting from an unexpected behaviour that after the call to BeginReading(int), not only position-until-which-Read-will-parse-the-message will change, but also a point where new data will be appended to the message, which is a counterintuitive side-effect of BeginReading(int) call.

Turning on my mind-reading machine, I can guess that the author of class dreamMessage didn’t think that such a usage scenario is allowed, i.e. his assumption was that the the caller may not intermingle Writes and Reads. Such a restriction would be fine as such, but then it needs to be enforced, so that the developer who is doing something wrong with dreamMessage, gets notified about The Horrible Mistake in His Ways as soon as possible. With this restriction in mind, we can say that our message can be in one of two states – “being read” or “being written”, with certain operations prohibited in each of these states. As I’ve said above, the problem here is not about having this restriction, but about having it implicit and non-enforced. To make the dreamMessage solid in this regard, these requirements need to be made explicit, by having a separate data member state, and by asserting (or by throwing an exception) whenever a function-inappropriate-for-this-state is called.

In practice, to avoid this problem in the case of dreamMessage, a much better approach would be to address Mistake #1, and to separate the parser from the message (adding a parameter parse_size to the parser to provide functionality of BeginReading(int)). However, in some other cases, making a previously-implicit state explicit (and asserting when the contract between the class user and the class itself is violated) can be a Really Good Idea (which will save a lot of trouble for developers-using-your class down the road).

² A LOT of developers will think “why shouldn’t I be able to parse my own message in the middle of it being composed?” at least until explicitly told otherwise

Mistake #12: Returning the Pointer to Static

char *dreamMessage::Read(int s)
{
  static char c[2048];
  if(readCount+s > size )
    return NULL;
  else
    memcpy(&c, &data[readCount], s);
      //NOBUGS: potential memory corruption (see Mistake #13)

  readCount += s;

  return c;//NOBUGS: returning pointer-to-static
}

Unlike the previous mistake, there is nothing subtle about this one: it is very obvious. As you can see, this function returns a pointer to the static array c. And returning a pointer to a static is a really bad coding practice which should be avoided (in most of the code, including network-related one).

One big reason to avoid returning a pointer to static, is that such code will break really badly as soon as you try to use it in a multi-threaded environment (and as the book is about multi-player(!) games, multi-threaded stuff might be needed, at least in the network part of the code). More generally, returning pointer-to-static means that using dreamMessage::Read() is not reentrant.

If the previous reason is not sufficient, another reason to avoid such practice is that it makes the code counter-intuitive and error-prone (for example, a second call to Read() will change the data-which-has-been-read by a previous call to Read()).

K&R C In 1978, Brian Kernighan and Dennis Ritchie published the first edition of The C Programming Language. This book, known to C programmers as 'K&R', served for many years as an informal specification of the language— Wikipedia —Bottom line: avoid returning pointers to the static data at all cost; while this practice has been used in times of K&R and since that point has found its place in the C standard library, it has been discouraged for at least the last 15 years or so (with reentrant versions of such functions, such as asctime_r() and strtok_r(), specified in more recent versions of POSIX).

In this particular case, the caller could have provided a buffer (and buffer size(!)) where the result should be placed, or a pointer to some kind of allocated buffer might be returned, or even could have used something like std::vector<uint8_t>.

This mistake will be repeated twice in the 100 lines of code under analysis.

Mistake #13: Potential Memory Corruption

In the dreamMessage:Read() above, if the parameter s is > 2048, a buffer overwrite (likely causing a memory corruption) will occur. To make things worse, this “s MUST be <= 2048” requirement is based on an arbitrary size of the buffer buried deep within function implementation.

If you really need to have a fixed-size buffer, have the decency to throw an exception or at least (bordering with a mistake) to assert()³ when its size is exceeded. Relying on the caller always being sane is a really bad idea (and makes life of the caller – when debugging her code – Much Much Worse than it should be).

³ see footnote about assert() above

Mistake #14: A Little Marvel: Abusing Return Values

char dreamMessage::ReadByte(void)
{
  char c;
  if(readCount+1 > size)
    c = -1;//NOBUGS: '-1' is a legal value for a char
  else
    memcpy(&c, &data[readCount], 1);

  readCount++;
  return c;
}

There is an error-handling code in this function, but there is a subtle but really bad problem with it. When the caller is trying to read beyond the size, the function detects it. So far, so good. But what does the function do then?

It returns -1, and the return type is char, which means that if there was a byte -1 written to the message by the sender, the return value of this function becomes indistinguishable from the error return value.

These return codes are normally used (elsewhere in the book) in a manner such as:

//from ReadString() function:
char c = ReadByte();
if(c == -1 || c ==0)//NOBUGS: See Also Mistake #14a
  break;

It means that if the string contains a perfectly valid character with ‘-1’ value, the parser will interpret it as an end of the string. Moreover, it will leave the parser in an inconsistent state, causing further parsing to read invalid data.

In the C standard libraries, there is a practice of returning ‘-1’ on error, and it is fine – as long as ‘-1’ is not a valid return value. For example, getc() function does return ‘-1’ (more precisely – EOF), but the return type of getc() is int (not char), and all valid chars are returned as positive values, which makes the function sane (unlike dreamMessage::ReadByte() above).

While expanding the return type of ReadByte() would fix the bug, in C++ it’s better to use exceptions and avoid thinking about this kind of stuff (while simplifying error handling for the caller too).

This mistake will be repeated 4 times on our 100 lines of code torn out of the book.

Mistake #14a: Honourable Mention: Relying on Char being Signed

Actually, the code right above (with a comment “from ReadString() function”), has one more bug (thanks to Tobi for mentioning signed/unsigned char issue, which has lead to discovering this mistake). The problem with this code is that comparison if(c==-1) will work properly only if char c is a signed char. Otherwise (if char is unsigned), it will be cast to int, and after this cast it will be equal to 255, so that comparison with -1 will never work.

As C standard doesn’t specify whether char should be signed or unsigned (and that even worse, there is a compile-time switch in most compilers to affect char being signed or unsigned with default behaviour differing from one compiler to another one⁴), such code is incorrect. While this bug is not, strictly speaking, in our 100 lines of code under review, it still clearly deserves a honourable mention in our bug parade; on the other hand, it can be seen as a code issue even while staying strictly within ReadByte() function.

Fortunately, avoiding such bugs is easy – by using int8_t (or uint8_t if you prefer your type being unsigned) instead of char. In practice, I usually prefer to keep my own typedef byte8 as uint8_t.

⁴ According to [StackOverflow], by default char in gcc is signed, unless it is Android NDK, where char is unsigned, and by default in MSVC char is signed; for all these compilers, behaviour can be changed via compile-time switch

Mistake #15: Life after Death

core dump In computing, a core dump (in Unix parlance), memory dump, or system dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally (crashed).— Wikipedia —In addition to the abuse-of-return-values described above, the very same dreamMessage::ReadByte() function has yet another potentially-coredumping mistake (though chances of this core dump happening are not too high in practice). More specifically, whenever the size is exceeded, ‘-1’ is returned. However, as the function doesn’t simply return ‘-1’, but also increments readCount. Then, if the caller repeats these calls to ReadByte() long enough, it will eventually cause readCount to wrap around. And after readCount wraps around, it will go into the negative area, so the error detection condition won’t fire, and the code will fall through to the line #7, likely causing a read from inaccessible memory (which in turn usually leads to core dump).

The best would be to throw a C++ exception in case of buffer overread.

This mistake will be repeated 4 times on our 4 book pages; note that the simple ‘return -1’ instead of ‘c = -1’ is not exactly correct for functions such as ReadShort() etc., as it might lead to reading the discarded data; for functions such as ReadShort(), the correct non-exception version would be something along the lines of { readCount = size; return -1; } – though I strongly recommend using exceptions instead.

Mistake #16: Magic Number

Magic Number The use of unnamed magic numbers in code obscures the developers' intent in choosing that number, increases opportunities for subtle errors, and makes it more difficult for the program to be adapted and extended in the future.— Wikipedia —In the dreamMessage:ReadByte() above, ‘-1’ is a typical “magic number”, and magic numbers are to be avoided. The reasons are numerous (see, for example, [Martin] and [Stroustrup]), and while I am myself particularly guilty of using magic numbers in my code, I admit that it is indeed a mistake (well, at least when the magic numbers go beyond the implementation details of one single class).

This mistake is repeated four times within 100 lines of code under review.

Conclusion

We’ve managed to find 16 different mistakes (with 39 instances of these mistakes), one style-issue and one almost-mistake (with 7 instances in total), all this within 100 lines of supposedly tutorial code. I think it perfectly illustrates my point that not everything from the books (or even my blog posts for that matter) should be taken as an example of “how to write software”.⁵

⁵ If you see a mistake in my code on this blog – please let me know, I’ll fix it. I am really interested in making this blog as mistake-free as possible (even more interested than in further stroking my own ego 😉 ). With the book, especially a printed one, your options are usually more limited 🙁 .

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Tobi says

September 2, 2015 at 5:40 am

#14b: char is not guaranteed to be signed leading to compiler warnings/errors

- "No Bugs" Hare says
  
  September 8, 2015 at 9:20 am
  
  Thanks a lot for mentioning signed/unsigned char issue! Actually, in the code above it is even worse than compiler warning – it will lead to incorrect behaviour (see description of Mistake #14a, which I’ve added).
  
ben says

September 4, 2015 at 3:57 pm

Nice post, and contains some good wisdom.

Of course “mistakes” can be subjective, but I’d argue that asserts shouldn’t be used for checking assumptions which are outside the control of the module they’re in (assuming that we’re talking about debug-only asserts which are compiled-out in release).

There were a couple of places where you suggested using asserts to protect against conditions which could realistically happen in production.

For example, in Mistake #13, you suggested using an assert to check that a fixed-size buffer hasn’t been exceeded. This is asking for trouble, because it will only get checked if the person using your code happens to have written a test case for this scenario, or tries it on their development machine one day.

The chances are they didn’t write this test, and using an assert makes their life Much Much Worse than it could be, because they’ll have a production memory corruption to diagnose.

Assumptions which are outside the control of the module they’re relied on should be checked in release as well as debug builds (unless there’s a very strong performance consideration). Asserts should be used for checking assumptions within the module boundary.

What do you think?

- "No Bugs" Hare says
  
  September 8, 2015 at 5:47 am
  
  While you’re right, and one shouldn’t use C-style assert() to detect conditions which could realistically happen in production – the Big Question is: “how to define ‘realistically’?” – and it is all about probabilities of such a call happening in a specific project (which in turn depend on the coding style). The reason why I’m not all for writing exceptions by hand – is because doing so is bulky and poorly readable (the code becomes too cluttered with error handling). What I generally prefer – is to have PROJECTASSERT() macro with syntax which is identical to C assert(), but which throws an exception instead of aborting – this way the code becomes very readable and self-documented on the one hand, and doesn’t carry assert( )/NDEBUG caveats. Probably I should write a separate post on asserts and exceptions 🙂 .
  
  As for this post – I will add a footnote to this effect, though I hope you agree that having assert() is still much better than original code without any checking 🙂 .
  
Dennis Brown says

September 11, 2015 at 12:07 am

WriteFloat and ReadFloat assume sizeof(float) == 4. This might be a usual case for modern processors, but there is nothing in the C standard that says a float is 4 bytes long. If sizeof(float) happens to be 8 bytes, then you have saved half a float, which is probably a wrong answer when you take it out of the buffer, but if the extra 4 bytes happen to be zeros, and are the least significant bytes, it might be only a slightly wrong answer, which is probably worse than an obviously wrong answer.

Likewise, ReadShort and ReadLong assume a short is 2 bytes and a long is 4 bytes. Not correct for 64 bit processors.

- "No Bugs" Hare says
  
  September 11, 2015 at 4:22 am
  
  You’re perfectly right, and this issue has been discussed in some detail in Part I of the article (Mistake #7). I’ve avoided mentioning mistakes every time they appear in the code, as it would make the whole thing much longer than necessary 🙁 .