Avoiding ugly afterthoughts. Part b. Coding for Security, Coding for i18n, Testing as a Part of Development - Page 3 of 3

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Testing as a Part of Development Process

On unit-testing and TDD

Well, now we’re coming to a really contentious part of our discussion about coding – namely to the role of testing as a part of development. In this area, currently there are two Big Camps: let’s name them “Old School Camp” and “TDD Camp” (a.k.a. “Test-Driven Development Camp”).

Within the “Old School Camp”, you don’t care about testing at all – that is, until your QA files a bug against you. Within the “TDD Camp” (at least with true blue TDD folks), the whole development is all about testing; in fact, with true blue TDD you don’t write any real code until you have written a test for it first (known as the “test-first” paradigm). It is widely argued in the literature that TDD projects tend to be developed with fewer bugs and faster (as opposed to the intuitive expectation of them being slower).

I won’t go into a detailed discussion on pros and cons of TDD here, but will note that on TDD, I’m very much with [Hansson]; in short – he loves testing but doesn’t like TDD (when taken literally as a gospel). For argumentation in this regard – you can refer to him, I will just tell a short real-world story.

Once upon a time, there was a company whose core business was very much about running an online system (i.e. running inherently distributed software). And it so happened that the software was MUCH more reliable than the rest of the industry; 1 hour of unplanned downtime per year under a release-every-4-weeks regime was so good for the industry in question that technical auditors questioned the number until full logs were provided.

And then a new developer came to the company, and was obsessed with TDD ideals to the point that they became a religion. And he wrote some new code, faithfully following all the TDD teachings (adding quite a few unit tests in the process). And after he wrote the code, he gave a long lecture about the advantages of TDD and of the test-first approach in particular. His speech finished with something along the lines of “Now you can see how TDD helps to deploy programs guaranteed to be bug-free”.

And then his code was deployed, and caused one of those once-in-a-year downtimes :-((.

The reason for the failure was a very typical one for a distributed system, and hopelessly out of scope for unit testing. It was

an unexpected sequence of otherwise valid inputs¹

The moral of this story is certainly not that the guy was stupid (he wasn’t), and not even that TDD is useless. The moral of the story is all about

unit tests being utterly insufficient for debugging of a distributed system

(and yes, it makes unit-test-driven development pretty much useless for distributed systems – at least those with more than two parties involved).
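
To make the “unexpected sequence of otherwise valid inputs” a bit more tangible, here is a minimal Python sketch. Everything in it (the `Match` class, its events, and the brute-force search) is made up for illustration and is not taken from the story above. Each handler is easy to cover with passing unit tests in isolation, and yet a certain ordering of perfectly valid events crashes the object.

```python
from itertools import product


class Match:
    """Hypothetical turn-based match; every handler looks fine in isolation."""

    def __init__(self):
        self.players = []
        self.turn = 0  # index into self.players

    def join(self, name):
        if name not in self.players:
            self.players.append(name)

    def leave(self, name):
        if name in self.players:
            self.players.remove(name)  # bug: self.turn is not adjusted

    def end_turn(self):
        if self.players:
            self.turn = (self.turn + 1) % len(self.players)

    def current_player(self):
        return self.players[self.turn] if self.players else None


# Every one of these events is a valid input on its own.
EVENTS = [
    ("join", "alice"), ("join", "bob"),
    ("leave", "alice"), ("leave", "bob"),
    ("end_turn",),
]


def apply_event(match, event):
    getattr(match, event[0])(*event[1:])


def find_failing_sequence(max_len=4):
    """Try every event sequence up to max_len; return the first one that crashes."""
    for length in range(1, max_len + 1):
        for seq in product(EVENTS, repeat=length):
            match = Match()
            try:
                for event in seq:
                    apply_event(match, event)
                    match.current_player()
            except IndexError:
                return seq
    return None


if __name__ == "__main__":
    print(find_failing_sequence())
```

Exhaustive enumeration like this is feasible only for a toy with a handful of events; with real-world event spaces it is hopeless, which is exactly why the techniques discussed further down rely on recorded or simulated sequences instead.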

After you iron out all your obvious bugs (using unit testing, whether it is TDD or QA or whatever-else) – these unexpected-sequence bugs become the main and the most annoying source of bugs in pretty much any distributed system with more than two actors. And it applies to multiplayer games in spades.

To deal with these unexpected-sequence bugs, I know two options. The first one is to think about this potential problem in advance (and with a bit of experience it allows you to save lots of time debugging and refactoring). In this respect, TDD (when taken literally) “leads to an overly complex web of intermediary objects and indirection in order to avoid doing anything that’s “slow”… It’s given birth to some truly horrendous monstrosities of architecture. A dense jungle of service objects, command patterns, and worse.”[Hansson] In turn, this artificial complexity leads to the code being less readable, which actually inhibits the thinking process (the one necessary to predict an unexpected-sequence bug in advance).

The second option to figure out an unexpected-sequence bug is indeed testing. However, unit testing won’t help here, not at all. To detect these bugs, we need to test for a thing which-you-were-not-able-to-think-about (!).

However crazy it sounds, it is possible to (try to) test for these things; two techniques which come to mind in this regard, are “replay-based regression testing” (replaying sequences from a real-world app), and “simulation testing” (we’ll discuss both in more detail below).

TL;DR about TDD and unit testing in general:

- Unit testing is by far not sufficient to test a distributed system with more-than-two actors. This applies to multiplayer games in spades.
- More or less the same goes for user-experience-driven “acceptance testing” (at least if it’s run only once or twice). With a distributed system as a whole, behavior is not deterministic, so a test which runs ok 99 times can fail on the 100th run 🙁
- Two things which help in this regard are replay-based regression testing and simulation testing.
- If “TDD” is understood more widely than “unit-test-driven development” (and calls for replay-based testing and simulation testing as a prerequisite for writing code) – it MAY work, though I strongly insist on preventing tests from affecting software design decisions (see above why; IMNSHO, readability trumps pretty much everything else, and the ability to make mockups is light years down the list from readability).

In other words – while TDD as such is not a Bad Thing per se, it often results in (a) obsession with unit tests, and (b) reduced code readability in the name of mock-ups etc. Both these things are detrimental, at least for distributed systems (multiplayer games included).

Regression Testing and Continuous Integration

Whatever your multiplayer game is, you DO need automated regression testing, plain and simple. And it SHOULD consist of ALL of the following:

- Unit tests (though do NOT overplay them; in particular, changing design just to accommodate mock-ups IMNSHO qualifies as a Really Bad Idea™)
- Replay-based tests
- Simulation tests

Closely related to Regression Testing is Continuous Integration (CI). As it was noted in Chapter IX, CI requires quite a bit of automated testing; arguably the most important type of testing for CI is regression testing.

Replay-based Regression Testing

As we’ve discussed in Chapter V, deterministic event-driven programs can be recorded and then replayed. From testing perspective, it gives us a mechanism to write a sequence of events (from a testing environment, or even from a production one) – and to run it over new code to see whether the code still performs in exactly the same manner.

The word exactly in the phrase above is both a blessing and a curse. When your replay testing succeeds – you know for sure that everything is fine, but if it fails – well, you don’t know pretty much anything (in particular, if the new code was introduced into the same event-driven object affecting some different aspect of the same object). On the other hand, with the advent of GitHub with mostly-independent changes, each of the smaller changes MIGHT happen to be testable using replay-based regression against a previously-recorded event sequence.

In any case, replay-based regression testing won’t work if your new code changes existing behavior of the app (more precisely – of the event-driven object you’re testing); however, it does work if the new code only adds a new feature. This means that using replay-based testing as a part of fully automated testing is not usually feasible; however, semi-automated use (based on observations such as “we know that for this module behavior SHOULD NOT have changed since the last build, though some new functionality has been added”) is perfectly possible.

The Big Advantage of this kind of testing over usual unit-tests is that you can be sure that during recording phase, your players did all those unusual-event-sequences-you-were-not-able-to-think-about.
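
As a rough illustration of the record-then-replay idea, here is a minimal Python sketch. It assumes a deterministic object with a single `handle(event)` method returning its output; the `Recorder`, `replay()` and the three `Counter*` classes are hypothetical names invented for this example, not an API from the book. `CounterV2` only adds a new event (so the old recording still replays exactly), while `CounterBroken` changes existing behavior (so the replay diverges).

```python
import json


class Recorder:
    """Wraps a deterministic event-driven object and logs every (event, output) pair."""

    def __init__(self, target):
        self.target = target
        self.log = []

    def handle(self, event):
        output = self.target.handle(event)
        self.log.append({"event": event, "output": output})
        return output


def replay(log, new_target):
    """Feed recorded events to new code; return index of first divergence, or None."""
    for i, entry in enumerate(log):
        if new_target.handle(entry["event"]) != entry["output"]:
            return i
    return None


class CounterV1:
    """The 'old' code which was running while the events were recorded."""

    def __init__(self):
        self.total = 0

    def handle(self, event):
        if event["type"] == "add":
            self.total += event["amount"]
        return {"total": self.total}


class CounterV2(CounterV1):
    """Adds a new 'reset' event; behavior for old events is unchanged."""

    def handle(self, event):
        if event["type"] == "reset":
            self.total = 0
            return {"total": 0}
        return super().handle(event)


class CounterBroken(CounterV1):
    """Changes existing behavior (caps the total at 10), so replay must diverge."""

    def handle(self, event):
        super().handle(event)
        self.total = min(self.total, 10)
        return {"total": self.total}


def record(events):
    recorder = Recorder(CounterV1())
    for event in events:
        recorder.handle(event)
    # Round-trip through JSON to mimic a recording stored on disk.
    return json.loads(json.dumps(recorder.log))
```

Note that on failure `replay()` can only report where the outputs first diverged; it cannot tell whether the divergence is a bug or an intended change, which is the “curse” side of the word exactly.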

Simulation Testing

Another thing which works extremely well for testing of distributed systems, is simulation testing. Usually it comes in one of two flavors: simulating players, and simulating network problems.

Simulating Players

Simulating players is one thing you SHOULD do from the very beginning of your game development (that is, unless there exists a very strong prejudice against it²).

The idea here is that you’re creating a headless client, which does nothing but simulate the-very-dumbest-player (but the one who can still do something meaningful). The idea of this testing is NOT to test UI, nor to test how the players will abuse rules of your game; the idea is to look for all those unexpected-sequence bugs you might still have. Running a thousand players over 100 instances of your game world will tell you MUCH more about those elusive bugs than any kind of hand-written tests.
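
A headless dumb-player run can be sketched along the following lines. This is only an in-process toy under stated assumptions: the `World` class, its gold-conservation invariant, and the `dumb_player_step()` helper are all hypothetical, and a real simulator would talk to your actual server over the network rather than call it directly. The defaults correspond to a thousand players over 100 world instances.

```python
import random


class World:
    """Hypothetical game world: players trade gold; total gold must stay constant."""

    def __init__(self, player_ids, starting_gold=100):
        self.gold = {pid: starting_gold for pid in player_ids}
        self.expected_total = starting_gold * len(player_ids)

    def transfer(self, src, dst, amount):
        if src == dst or amount <= 0 or self.gold[src] < amount:
            return False  # rejected by the rules, but still a valid input
        self.gold[src] -= amount
        self.gold[dst] += amount
        return True

    def check_invariants(self):
        assert sum(self.gold.values()) == self.expected_total
        assert all(g >= 0 for g in self.gold.values())


def dumb_player_step(world, pid, rng):
    """The very dumbest player: pick a random target and a random amount."""
    dst = rng.choice(list(world.gold))
    world.transfer(pid, dst, rng.randint(0, 150))


def run_simulation(worlds=100, players_per_world=10, steps=200, seed=1):
    """Run dumb players over many worlds; return the number of steps executed."""
    rng = random.Random(seed)  # fixed seed, so that a failure can be reproduced
    for _ in range(worlds):
        world = World(range(players_per_world))
        for _ in range(steps):
            pid = rng.randrange(players_per_world)
            dumb_player_step(world, pid, rng)
            world.check_invariants()  # fail at the first step which breaks the world
    return worlds * steps


if __name__ == "__main__":
    print(run_simulation())
```

Checking invariants after every single step (rather than once at the end) is a deliberate choice here: when something does break, the seed plus the step number give you a reproducible sequence to debug.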

¹ it is an open question whether such things qualify as “races”, so for the time being I’ll leave it as an “unexpected sequence”

² one example of such games-with-a-strong-prejudice-against-simulators is poker

Simulating Network Problems

Another kind of simulation testing is related to simulating network problems. As one example – you can set up a box with Linux and netem for this purpose (there are other options out there too). Alternatively (though only if you’re using UDP) – you can build delays and packet losses/reorderings right into your own UDP library.

The key here is that you SHOULD test your game under close-to-real-world network conditions, and your usual office LAN is certainly “too good to be true” to represent any kind of Internet connection (heck, over the LAN you often have even one-to-one correspondence between TCP recv() and send() calls – the thing which falls apart within two seconds on any real-world Internet connection).
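
The remark about recv() and send() deserves a small illustration: TCP is a byte stream, so code which happens to work over the LAN by treating each recv() as one message will break once chunks get split or merged. Below is a minimal Python sketch of a parser which does not rely on such correspondence; the 2-byte length prefix and the `FrameParser` and `frame()` names are assumptions made for this example only.

```python
import struct


class FrameParser:
    """Reassembles length-prefixed messages from arbitrarily split byte chunks."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk):
        """Add bytes as recv() returned them; return all messages completed by them."""
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= 2:
            (size,) = struct.unpack_from("!H", self.buffer, 0)
            if len(self.buffer) < 2 + size:
                break  # message is not complete yet, wait for more bytes
            messages.append(bytes(self.buffer[2:2 + size]))
            del self.buffer[:2 + size]
        return messages


def frame(payload):
    """Prefix payload with its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(payload)) + payload


def parse_in_chunks(stream, chunk_size):
    """Feed the stream in fixed-size pieces, imitating unfriendly recv() boundaries."""
    parser = FrameParser()
    messages = []
    for i in range(0, len(stream), chunk_size):
        messages.extend(parser.feed(stream[i:i + chunk_size]))
    return messages
```

Feeding the same stream with `chunk_size` of 1, 3, or the whole length should produce the same list of messages; if your own parser cannot pass this kind of check, it is merely surviving on the LAN by luck.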

Note that such network latency simulators SHOULD be used to accompany quite a bit of your other testing (as in “run your simulated players over simulated network delay and see how it goes”). You DO want to be sure that your tests run not only over LAN, but in the presence of real-world network issues too.
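
For the build-it-into-your-own-UDP-library option, the core of such a simulator can be sketched as below. This is an in-process stand-in with a virtual clock, not a wrapper around real sockets; `LossyLink`, its parameters, and the default loss and delay values are all assumptions picked for illustration, not measurements of any real network.

```python
import heapq
import random


class LossyLink:
    """In-process stand-in for a bad network: drops, delays and reorders datagrams."""

    def __init__(self, loss=0.1, min_delay=0.02, max_delay=0.3, seed=1):
        self.loss = loss
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.rng = random.Random(seed)  # seeded, so that test runs are repeatable
        self.in_flight = []  # heap of (deliver_at, sequence_no, datagram)
        self.counter = 0

    def send(self, datagram, now):
        if self.rng.random() < self.loss:
            return  # dropped silently, just as UDP would do
        deliver_at = now + self.rng.uniform(self.min_delay, self.max_delay)
        heapq.heappush(self.in_flight, (deliver_at, self.counter, datagram))
        self.counter += 1

    def receive(self, now):
        """Return every datagram whose delivery time has passed, in arrival order."""
        ready = []
        while self.in_flight and self.in_flight[0][0] <= now:
            ready.append(heapq.heappop(self.in_flight)[2])
        return ready


def send_burst(link, count=100, interval=0.01):
    """Send numbered datagrams at fixed intervals; return what arrives eventually."""
    for i in range(count):
        link.send(i, now=i * interval)
    return link.receive(now=count * interval + link.max_delay)
```

Because each datagram gets its own random delay, datagrams sent 10 ms apart can overtake each other, so the receiving side sees both gaps and reorderings. Plugging simulated players on top of such a link is one way to follow the “simulated players over simulated network delay” advice without a separate netem box.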

Wireshark

And while we’re on the issue of network testing – make sure that while you run those network latency simulators, you take a close look at your traffic with a packet sniffer such as Wireshark. In most cases, you will learn a LOT of interesting things about your traffic, even when you think that you know for sure how it SHOULD behave (and even when it seems to work); when dealing with network stuff, there are lots of strange corner cases which can (and often SHOULD) be optimized.

Bottom line on Testing

A short summary of my personal recommendations for distributed system testing (games included):

- DO have automated tests
  - …including unit tests (do NOT overuse them though)
  - …including player simulation tests
- DO run these automated tests as a part of your Continuous Integration process
- DO have semi-automated tests
  - …including Replay-Based Tests (ideally – replaying production recordings)
    - if Replay-Based Test fails on the whole new revision – try to run it on separate merges of separate feature branches into your develop branch
  - DO test new functionality manually – AND re-run applicable tests when functionality changes. This process MAY be delegated to QA
    - DO record events for this testing, creating a new Replay-Based test case.

Phew. While developing your game, you’ll probably have a dozen other practices, but I’d say that the above is the very bare minimum you DO need to have.

[[To Be Continued…

This concludes beta Chapter 12( from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 13 on Network Programming.]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Craig J. Bass says

April 5, 2016 at 8:05 pm

I must admit I very much disagree with the views on TDD here, but I certainly agree that readability comes first. I don’t think you have to sacrifice readability for tests. I find readability comes with tests.

https://blog.8thlight.com/uncle-bob/2014/05/19/First.html

http://blog.cleancoder.com/uncle-bob/2015/11/18/TheProgrammersOath.html
I think point 3. is key to me.

- "No Bugs" Hare says
  
  April 6, 2016 at 11:49 am
  
  > I think point 3. is key to me.
  
  I don’t have any disagreements with point 3. However, unit tests, at the very least when applied to distributed systems, are extremely far from producing any kind of proof (and are much more like “false sense of security”); see above about unexpected-sequences stuff, which is dominating non-trivial bug space for any distributed system I know.
  
  And then, given very limited help from unit testing, I refuse to change code just to enable some of the weirder unit tests. If design is bad – it should be changed regardless of tests, that’s it, but if it is tests which dictate design – there is something very wrong in the picture.
  
  > I don’t think you have to sacrifice readability for tests.
  
  As long as you don’t sacrifice readability – I have no problems with tests whatsoever (and encourage them too :-)). The Big Fat Problem with TDD is that way too many people have started to treat it as a kind of religion (see my earlier “Best Practices vs Witch Hunts” article here: http://ithare.com/best-practices-vs-witch-hunts/ )