If they wrote it in a book, it MUST BE GOOD CODE! Or How many mistakes can fit into 100 lines of book tutorial code. Part 1

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

So, there is no Wyoming. And remember: if they said it on television, it MUST BE TRUE!

— Garfield the Cat, 'It MUST BE TRUE!' episode —

There is a common (mis)perception out there that whatever is written in a book, is to be trusted implicitly. While there are books and authors to be trusted, generalizing it to any book to contain only good code, is about as prudent as relying on “If they said it on TV, it MUST be true” statement¹

Unfortunately, the code in some of the books shouldn’t be taken as an example of “how to write code”. Quite recently, I ran into a book with so many different and subtle mistakes within a few pages, that I decided to write about them – both as an illustration of the idea “take everything with a pinch of salt, books included” and as an illustration of subtle mistakes to be avoided.

¹ For those having any doubt about this statement, see the incredible episode “It Must Be True” by Garfield the Cat, with a piece of it available on YouTube: [Garfield]).

I will try to avoid nitpicking too much (not that I need it for the code which we will be discussing), with the main criteria to classify something as a mistake, being “if you see something like this in your code, you should fix it”. It doesn’t include purely stylistic disagreements ², and other situations where the same result can be achieved in several different ways, with no clear advantage with any one of them. However, if it is something which may hurt you in the long run – I will treat it as a mistake, with a few issues earning special recognition qualifying as “style issues” or “almost-mistakes”. The balance between the two is not obvious, and you might disagree about the severity of some of these – but in general these are things to be avoided.

It should be noted that the mistakes in this book are subtle enough to get past testing, and if you’re lucky enough (and your code is not used enough), any of them can hide without manifesting itself, for a while.

As with any mistakes which are able to hide from testing, it is even more important to spot and remove them ASAP, before they bring your system down at the very worst moment

And a necessary disclaimer: this article doesn’t imply that my own code is free from this kind of stuff (except for the most egregious things); nevertheless, these are things to be avoided, and when there are too many of them per square inch, the book turns from “how to write code” into “how NOT to write code”, which in turn may make book’s value to the reader negative 🙁 .

² such as “what is the right way to put square brackets” or “whether to use f(void) or f() for function declarations”

Let the (un)fun Begin!

There are three hundred and sixty-four days when you might get un-birthday presents,
and only one for birthday presents, you know.

— Lewis Carroll —

Due to the number of mistakes involved, this article has been split into two parts, one part of the article per each of two book pages. Together, we will consider four pages (two sheets) out of the book; these pages contain exactly 100 lines of code (not counting empty lines, comment-only lines, and lines which consist of a lone curly bracket). Here are the scans of the first two pages of interest from the book [ProgrammingMultiPlayerGames] (these two pages containing 42 lines of code):

NB:To see the image more clearly, please click on it
NB2: The book is about games, and (at least when it comes to socket stuff) covers both Windows and Unix, so reader can expect that cross-platform issues are taken into account
NB3: No books have been harmed in the production of these scans (the effect of the pages being torn out is photoshopped)

So, let’s take a look at the class dreamMessage³

³ IMHO given the number of mistakes, the body of class dreamMessage would be more suitable for class nightmareMessage : private dreamMessage

Mistake #1: Violating the Single Responsibility Principle

private:
  bool overFlow; //NOBUGS: see Mistake #4
  int maxSize;
  int size;
  int readCount; //NOBUGS: see Mistake #1

public:
  char *data;
  char outgoingData[1400]; //NOBUGS: see Mistake #2

Single Responsibility Principle states that every class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility— Wikipedia —The code fragment above lists data members of the class dreamMessage. When I see this code, I immediately feel that it contains way too many things to be within a single class. More detailed analysis reveals that indeed, there are two separate concepts which are improperly intermingled here. The first one is the message itself, and the second one is a process of parsing of this message.

Parsing the message is a separate thing from the message itself (to illustrate this fact, it is sufficient to note that nothing – except for the code above – prevents us from having two or more parsers for the same message, running simultaneously). And if not for parsing included into the class dreamMessage, we wouldn’t need that readCount variable.

Therefore, according to a Single Responsibility Principle, it would be a significantly better design decision to separate class dreamMessage and class dreamMessageParser, moving readCount to the dreamMessageParser object. It would make the code significantly more readable, using the class much more straightforward, etc. etc. Moreover, we’ll see an example why it was a Bad Idea to violate this principle in this very specific case, in the second part of the article in Mistake #11.

Mistake #2: 1400-byte Optional Field as a Struct Member? Gimme a break!

IMNSHO Initialism of In my not-so-humble opinion— Wiktionary —A second thing which makes me uneasy about the dreamMessage member data, is 1400-byte outgoingData buffer as a struct member. Moreover, it is accompanied with a data pointer, which may or may not point to outgoingData, and if data doesn’t point to outgoingData, we don’t use these 1400 bytes at all.

“Throwing away 1400 bytes for no reason for every single network message is an outrageous wasteWhile someone may argue that this in the days of multi-gigabyte RAM 1400 bytes is a minor thing, I have a very different point of view. IMNSHO, the waste of memory on the scale of kilobyte-per-object is one big reason why we still have swapping on our multi-gigabyte-RAM boxes (despite the fact that these boxes are not doing 1000x more than multi-megabyte-RAM boxes were 15 years ago). While I agree that going after every single byte is not necessary for modern PCs, but throwing away 1400 bytes for no reason for every single network message (and in an average game you can have a lot of these messages flying around simultaneously), is an outrageous waste, and should be fixed.

Mistake #3: Lack of Encapsulation

From a very different perspective, this arrangement of data pointing out to outgoingData or elsewhere (with both of them being public!), is very unclear, difficult to read, and error-prone. It lacks proper encapsulation both formally and substantively. Proper encapsulation is violated formally, because size of data is maxSize, with data being public, and maxSize being private. It is also violated substantively, because keeping track of “where does data point to for this instance of the dreamMessage” is a Really Bad Idea.

To fix mistakes #2 and #3 in one shot, I’d suggest to have a perfectly encapsulated “owning” pointer to a dynamically allocated data (how to implement it – is more or less a matter of taste, what is important is that the ~dreamMessage() destructor deletes any memory allocated for this purpose).

Mistake #4: using a Data Member to indicate Error Condition

The overFlow data member is used for the purposes of marking the dreamMessage as “having had an overflow”, making it essentially invalid. This is quite poor design practice, especially for C++ (and class dreamMessage is undeniably C++). In C++, instead of this technique, exceptions should be used for exactly this purpose. The only excuse not to use exceptions for such an obvious error condition is when your compiler doesn’t support exceptions at all, but this was a very rare thing even back in 2004, when the book was published.

Style-Issue: Declarations Separate from Initialization

char *dreamMessage::GetNewPoint(int length)
{
  char *tempData;//NOBUGS: Declaration without initialization

  //Check for overflow
  if(size + length > maxSize)
  {
    Clear();
    overFlow = true;
      //NOBUGS: Falling down in case of overFlow is DANGEROUS! see Mistake #5
  }

  tempData = data + size;
  size += length;
  return tempData;
}

“In C++, combining declarations with initializations is long considered to be a Good Thing.There is no reason whatsoever to have the declaration of tempData to be separate from its initialization in line #13 (ok, except if you need to inflate the number of lines in your code, or the number of pages in your book). While this qualifies as a style issue, in general separating declaration and initialization is less readable and is therefore undesirable. In C++, combining declarations with initializations is long considered to be a Good Thing. To make it entirely clear – I am not arguing for initializing tempData in line #3; I am arguing for moving tempData declaration down to line #13, where it is first used.

This style issue will be repeated at least 5 times in 100 lines of code under consideration.

Mistake #5: Potential Memory Corruption

The erroneous design decision to use overFlow to indicate the error condition (see Mistake #4 above), has led to an outright bug in the dreamMessage::GetNewPoint() function. In particular, if the length parameter in a GetNewPoint(length) call is larger than dreamMessage::maxSize, then GetNewPoint() will return a pointer to an insufficient buffer. As typical calling pattern for GetNewPoint() in the rest of the book (and also what is to be expected by developer-using-class-dreamMessage) is along the lines of

memcpy(GetNewPoint(length),data,length)

, it means that in this case we have outright memory corruption.

Having this kind of stuff in your code is like having a time bomb: it will explode, and for any sizeable project if will usually happen sooner rather than later (unless, of course, “later” is the worst possible time for the explosion to happen)

Now to the question “how should it have been done?”. The answer is simple: without the overFlow data member, and simply throwing a C++ exception in case of insufficient buffer.

Mistake #6: Unnecessary Dependency leading to Tight Coupling

void dreamMessage::AddSequences(dreamClient *client)
{
  WriteShort(client->GetOutgoingSequence());
  WriteShort(client->GetIncomingSequence());
}

The AddSequences() function introduces a mysterious class dreamClient to the scope of class dreamMessage, with some of its details (such as the prototype of GetOutgoingSequence()) being exposed to dreamMessage. This creates a dependency of class dreamMessage on class dreamClient, which in turn (given that class dreamClient also has a dependency on class dreamMessage) leads to rather tight coupling between these two classes for no reason.

The AddSequences() function doesn’t need access to the dreamMessage internals, and should be moved to class dreamClient. As class dreamClient needs to know about class dreamMessage anyway, moving this function to class dreamClient doesn’t create any new dependencies, while removing it from class dreamMessage does remove an unnecessary dependency (and reduces unnecessary coupling).

Almost-Mistake: memcpy() of a single byte?

void dreamMessage::WriteByte(char c)
{
  char* buf;
  buf = GetNewPoint(1);
  memcpy(buf, &c, 1);
}

While some might say that memcpy(buf, &c,1) is perfectly legal code, and it indeed is, I still insist that it is substantially less readable, and – depending on the compiler – might be less efficient than the simple and straightforward

*buf = c;

so I still suggest to replace memcpy() with it.

However, this kind of usage is not of the nature of “you should fix it regardless of your feelings about it”, and there MIGHT be some reasoning behind using it, so I’ve classified it as an “Almost-Mistake”; in any case, we’ll have much more important things to worry about down the road. Also note that replacement *buf = c; is valid only for a single byte (and for shorts, longs etc. it would cause potential alignment issues, see, for example, [GameDev]); this is one more reason to treat the original memcpy() as an (arguable) matter of choice.

This almost-mistake will be repeated twice in our 100 lines of code.

Mistake #7: There is No Guarantee that sizeof(short) is Always 2

We’ve come to a 3-line (essentially 2-line, if we don’t count the unnecessary separate buf declaration) function, which has more than one serious mistake. Let’s look at these 2 lines in detail.

void dreamMessage::writeShort(short c)
{
  char *buf;
  buf = GetNewPoint(2);
  memcpy(buf,&c,2);//NOBUGS: Assuming that sizeof(short) == 2
                   //NOBUGS: Also assuming that endianness of data on the wire, and
                   //        that of the current platform is the same
}

The first big problem with this function is that it assumes that sizeof(short) is always 2 (the same thing happens with assuming that sizeof(long) is always 4, sizeof(float) is always 4, etc.). This is not guaranteed at all, and might change depending on the platform and compiler.

“On 64-bit Unixes, the so-called LP64 model is very common, with sizeof(long) being 8Having this mistake means that if this code is to be ported somewhere else (even for a Windows-only program such ports may include x64/another compiler/XBox) – you may well be out of luck (and proving that all the sizes are always exactly the same for all the compilers for all the platforms is much more difficult than following the simple advice below). It becomes even worse as the book specifically discusses Unix machines, where compilers and data sizes vary greatly (as one example, on 64-bit Unixes, the so-called LP64 model is very common, with sizeof(long) being 8).

Avoiding this mistake is simple – use int16_t instead of short (and appropriate types in other such places, including using uint8_t instead of char in WriteByte()).

This mistake will be repeated 8 times in our 100 lines of code (including 4 times in dreamMessage::Read*() functions)

Mistake #8: Disregarding Endianness

Endianness Endianness is the ordering... of bytes of a word of digital data in computer memory storage or during transmission.— Wikipedia —While using memcpy() within WriteShort() does avoid issues with alignment (as noted above, having *buf=c instead of memcpy() except for WriteByte() would be Yet Another Mistake due to alignment), it doesn’t avoid problems related to endianness. Even programs intended purely for Win32 may run on the platforms with a different endianness (in particular, both Xbox and Windows RT have been reported to be big-endian, and Who Knows what will happen in the future). If this happens, if one system communicating is little-endian, and another one is big-endian, the code above will return wrong values. To avoid issues with endianness, a simple approach would be to write endianness-agnostic code such as:

*buf++ = (uint8_t)c;
*buf = (uint8_t)(c>>8);

For further discussion on endianness (and on more efficient implementations for specific platforms), see [GameDev].

This mistake will be repeated 6 times within our 100 lines of code (including 3 times in dreamMessage::Read*() functions)

To be Continued…

We’ve found quite a few mistakes (to put it mildly) in first 42 lines of tutorial(!) code taken out of the book; we’ll continue with analysis of the rest of our 100 lines of code in the next post. Stay tuned!

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Nicholas "LB" Braden says

August 27, 2015 at 4:48 am

Nicely written, I await the later parts! Also, thank you for suggesting use of int16_t and related types – I wish this was a more well-known thing. I almost never use raw primitive types, instead using size_t and other related typedefs to ensure I always use the correct type for the situation.