Determinism: Requirements vs Features

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[[About Chapter XVIII from Vol.2 of the upcoming “Development and Deployment of MMOG” book. There is no need to worry, Chapter XVIII(b) will follow next week. Stay tuned!]]

Schroedinger's Cat vs Production Post-Mortem, Low-Latency Fault Tolerance, and Replay-Based Regression Testing

As was discussed in a blog post [NoBugs16] a few months ago, determinism can have quite a few important practical uses, ranging from replay-based regression testing, to low-latency determinism-based fault tolerance, with production post-mortem in between.

In the very same post (as well as in Overload [NoBugs15a]) requirements to achieve determinism were discussed; however, one point was left out of the deliberations, and this is the question of ‘what exactly is the definition of determinism our system needs to comply with, to achieve the deterministic goodies mentioned above’. This article aims to provide some analysis in this regard.

First of all, let’s mention that, in practice, at least three different types of somewhat deterministic behavior can be observed; the differences between them are related to changes which can break deterministic behavior.

Types of determinism

Cross-platform determinism – an extremely difficult one

“The most obvious form of determinism (and usually the one which comes to mind when speaking about determinism without specifying further details), is what I call cross-platform determinism.The most obvious form of determinism (and usually the one which comes to mind when speaking about determinism without specifying further details), is what I call cross-platform determinism. A program which is cross-platform deterministic has the following properties.

Definition 1. A program in source code form is considered to be cross-platform deterministic if and only if:

When the source code of the program is compiled by several different compilers across several different platforms, the resulting executable produces exactly the same results given exactly the same inputs.

For those platforms where it cannot produce exactly the same results, ideally such a program shouldn’t compile at all (or at least should fail immediately after being started).

Notes:

This should stand for all acceptable inputs
- Ideally, non-acceptable inputs should be filtered out by the program (for example, asserted or ignored).

If the program is interactive (i.e. it interacts with the world outside itself), all the interactions with the outside world need to be considered as program inputs.
- This also applies to non-deterministic system calls such as ‘current time’; see the discussion on ways to implement this in ‘Deterministic Components for Distributed Systems’[NoBugs15a].

Factors breaking cross-platform determinism

“Cross-platform determinism is the strictest definition of determinism I know; not surprisingly, there are quite a few factors which can break itCross-platform determinism is the strictest definition of determinism I know; not surprisingly, there are quite a few factors which can break it:

CPU compatibility issues. Just as one example – if the CPU has non-IEEE-compliant floating-point arithmetic, it can easily break cross-platform determinism. The same goes for CPUs with bugs (such as an infamous Pentium FDIV bug). NB: even IEEE-compliant floating point per se doesn’t guarantee determinism; see ‘Compiler compatibility issues’.
Compiler compatibility issues. It just so happens that compilers can generate code which produces subtly different results depending on the platform. In particular, some compilers are known to rearrange floating-point calculations – which is not exactly correct (as floating-point addition is non-associative due to non-linear rounding); another example of problems relate to ‘what does the compiler use for intermediaries’ [RandomASCII]. These issues are also known to depend heavily on compiler settings 🙁 .
Runtime library compatibility issues. Even standard libraries leave quite a bit of leeway to implementers (at least in C/C++). Just as one example – if we have an unordered collection (such as unordered_map<>¹), then iteration over this collection doesn’t specify a ‘correct’ order for the items; as a result, two perfectly compliant implementations can produce rather different results, breaking cross-platform determinism as specified above. Floating-point libraries are known to introduce quite a bit of not-exactly-matching behavior too.
C/C++: Reading dirty RAM, and other ‘Undefined Behavior’ stuff.
C/C++: Using pointers for anything except for dereferencing. Especially dreadful in the presence of ASLR (Address Space Layout Randomization), but has been seen to cause severe problems in other cases too.
Multithreaded stuff. As a rule of thumb, multithreaded programs as such are not deterministic. They can be made deterministic by restricting the multithreaded model to certain limited patterns of inter-thread interactions.
1. My (by far) favorite example of a deterministic multithreaded program is having Shared-Nothing Reactors as described in [NoBugs15a] / [NoBugs16], with all the inputs of each Reactor separately considered as program inputs. This way, we make each individual Shared-Nothing Reactor deterministic, effectively removing multithreading from scope.
2. Shared-Nothing Reactor is not the only possible way to ensure determinism. Strictly speaking, even mutex-based inter-thread synchronization can be made deterministic; however, to do it, we’ll need to consider the whole state of the object protected by mutex to be program input at this point, which will reduce the practical uses of this approach to a pretty much empty set.

With such a long list of potential troubles, it is no wonder that achieving cross-platform determinism is extremely difficult (at least for C/C++). In practice, it has been observed that it is items #2 (compiler compatibility) and #3 (runtime library compatibility) which tend to cause the most problems. Item #1 is usually not that bad (though YMMV), and items #4–6 are in our hands, so we can avoid them.

Which leads us to the following observation (which is well-known in gamedev circles):

Achieving cross-platform determinism for a sizeable program ranges from ‘extremely difficult’ to ‘next to impossible’ 🙁

However, taking a look at the list above (and our notes about things which tend to cause the most trouble), we can try to limit our deterministic appetites to the very same platform – and even to the very same executable.

¹ partially ordered containers, such as multimap<>, are reportedly deterministic since C++11

Same-executable determinism – the easiest one

Let’s change our Definition 1 to the following

Definition 2. A program in source code form is considered to be same-executable deterministic if and only if:

When the source code of the program is compiled on a single compiler for a single platform, using the same libraries, the resulting executable produces exactly the same results given exactly the same inputs.

Note: the same notes as for Definition 1 still apply.

“As follows from Definition 2, the same-executable deterministic program is much more realistic to be implemented in practice (and yes, it has been done more than once too).As follows from Definition 2, the same-executable deterministic program no longer suffers from breaking-determinism factors #1 (well, save for an occasional FDIV bug), #2, and #3. This makes it much more realistic for being implemented in practice (and yes, it has been done more than once too).

Same-platform determinism against minor changes – in-between one

To implement some features (mostly this applies to Regression Replay Testing), a same-executable determinism is not sufficient; what we need is something along the lines of the following Definition 3:

Definition 3. A program in source code form is considered to be same-platform deterministic against minor changes if and only if:

It is same-executable deterministic, and

When relatively small changes to the source code are made (creating ‘new’ source from the ‘old’ one), and these changes break determinism in an unmodified piece of code, the number of changes to the source code which are necessary to restore determinism (so that the ‘new’ executable produced with the same platform + compiler + libraries but produced out from the ‘new’ code, behaves exactly as the ‘old’ one with regards to unmodified portions of the code), is relatively small too.

Note: same notes as for Definition 1 still apply.

The second condition in Definition 3 is necessary to deal with scenarios when minor changes to the code break determinism (for example, it may happen because of the compiler using a different reordering of floating-point operations for different executables); however, such occurrences of non-determinism should be identifiable and locally fixable.

Of course, any definition which says something is minor is inherently vague, and yet in practice I’ve seen these kind of things working reasonably well. Usually, it goes along the following lines:

the code is maintained as almost cross-platform deterministic. More specifically, it is written with the intent to be 100% cross-platform deterministic – and as soon as any non-determinism is spotted, it is fixed. This is not that difficult; the real difficulty lies in getting from almost cross-platform determinism to real cross-platform determinism (and the main obstacle to this approach is that spotting rarely occurring non-determinism is difficult, especially when it comes to floating-point stuff – because it doesn’t manifest itself often).

when we have a need to exploit this type of determinism, we’re always working with ‘old’ source code and ‘new’ source code. And if non-determinism is spotted in ‘new’ source – it can (and should) be fixed, just as any with other kind of non-determinism. More on this in the ‘Replay-based regression testing’ section below.

One really simple example to illustrate this might go as follows. In our ‘old’ source code, we have something like

double f(float a, float b, float c) {
  //do something
  return a + b + c; //(1)
}

Usually, the formula is much more complicated than that, but this one will do for our purposes. In fact, the line is highly likely to be non-deterministic but we didn’t spot it (or didn’t care at that point). And let’s assume (just for the sake of defining things more precisely) that the compiler interpreted it as

double f(float a, float b, float c) {
  //do something
  double tmp = (double)b + (double)c; //(2)
  return (double)a + tmp; //(2)
}

Note that while this is a perfectly valid interpretation of our first sample, it is not the only valid one. For example, a compiler might add b and c as floats, and only then convert it to a double, or it might use a different order of additions. Any such variation would produce almost the same –but not identical – results.

As a result, when we change some code near line (1) – for example, the ‘do something’ part, a compiler can rearrange things to use a different kinds of intermediaries (because it has different registers available), or a different order of floating-point additions (just because it felt that it would allow for better use of a pipeline for this specific target CPU). As a result, our new code can start to behave differently from the old one. As the difference is about extreme corner cases, it may or may not pop up during our testing. However, from the point of view of our Definition 3 (and in particular, from the point of view of replay-based regression testing as discussed below) we’re fine in both cases:

if the difference didn’t manifest itself during testing, then for the purposes of these specific tests, our code is still perfectly deterministic (!). In other words, as long as we cannot observe that the program is non-deterministic, in the context of specific input vectors we don’t care about it.

“if the difference did manifest itself during the testing, it can be identified, and the line (1) can be rewritten into two lines (2), making the ‘new’ code deterministic (and consistent with the ‘old’ code too).if the difference did manifest itself during the testing, it can be identified, and the line (1) can be rewritten into two lines (2), making the ‘new’ code deterministic (and consistent with the ‘old’ code too). Strictly speaking, this second property (consistency with the old code) is not guaranteed; however, most of the time finding a deterministic version of the new code which is equivalent to the old one is perfectly feasible.

Deterministic goodies

Now, let’s list those goodies which we can get out of determinism – and see which type of determinism is required for each one.

Deterministic lockstep etc.

Description. One common example of a reason to use determinism (in particular, in games) is to produce exactly the same results across different computers. In this case, it would be possible just to send the same inputs across the network to all the computers (and for games, the inputs are usually very small) and to get all of the computers to run exactly in sync. One notable example of such a protocols is deterministic lockstep [GafferOnGames].

Required Determinism. To make deterministic lockstep (and other similar protocols) work across clients running on different platforms, we need cross-platform determinism as defined in Definition 1 🙁 . Unfortunately, it is rarely possible (and to the best of my knowledge, most such attempts have failed 🙁 ).

Client-side replay

Description. Another common example of determinism-based features (also coming from the gamedev world) is client-side replay. In such cases, we record only the inputs of the game, and then replay it by simply feeding the same inputs to the client.

Required Determinism. To make client-side replay work across clients running on different platforms, we also need cross-platform determinism as defined in Definition 1 🙁 .

Production post-mortem

Description. As described in [NoBugs15a], if we have deterministic Reactor, then we can write a log of all the events for that Reactor. Then, if something bad happens (like a crash or an assert failure), we have not only the current state, but the whole history of the events which led to the crash. We can replay this history in the comfort of a developer’s machine to reproduce the bug 100% of the time because of the behavior being deterministic (and a reproducible bug is pretty much a dead bug).

“In practice, when saving the whole history is not practical (and it usually isn’t ;-)), we can still have a circular buffer storing the last N seconds of the program before the crash.In practice, when saving the whole history is not practical (and it usually isn’t ;-)), we can still have a circular buffer storing the last N seconds of the program before the crash. While this doesn’t allow identification of all the bugs out there (because the bug condition could have occurred before those N seconds), for quite a few systems it still allows identification of 80–90% of them.
Required Determinism. To make production post-mortem work, only same-executable determinism (as defined in Definition 2) is necessary (well, usually it is not a problem to store all the released executables).

Low-latency fault tolerance

Description. As described in [NoBugs15b], deterministic Reactors (with circular logging) can be used to achieve low-latency fault tolerance (in a sense, it is ideologically similar to the now-discontinued ‘Virtual Lockstep’ technique which was used by VMWare). Such determinism-based implementation of fault tolerance allows latencies which are inherently better than those of ‘Fast Checkpoints’.

Required Determinism. For determinism-based fault tolerance to work, we only need same-executable determinism (as defined in Definition 2). That’s because after the catastrophic server failure, we’ll use exactly the same executable to achieve exactly the same results.

Replay-based regression testing

“the same Reactors with input logging can allow the use of real-world inputs to test that certain changes didn’t really change the behavior of the systemDescription. As it was described in [NoBugs16], the same Reactors with input logging can allow the use of real-world inputs to test that certain changes didn’t really change the behavior of the system. While such testing is inherently limited to the testing of (a) refactoring and (b) new features (and is not applicable to the testing of changes) – it can still facilitate testing quite a few things in an extremely reliable manner (and it is especially important as most of development is about new features).

The idea for such testing goes along the following lines:

record all the program inputs while the old code runs in production (usually this is done on per-Reactor basis)
make changes, producing new code (and a new executable)
run a replay of the recorded inputs against the new executable, and compare the results with those of the old code. Any changes indicate that 100% regression is not achieved.

Required Determinism. To get the benefits from replay-based regression testing, we need to have same-platform determinism against minor changes as defined in Definition 3.

In practice, this is often possible. While small changes can cause different behavior (in particular, with floating-point order and intermediaries) – it is usually not that difficult to fix them (in the case of floating-point issues due to compiler optimizations, by removing ambiguities and enforcing the behavior which was used by the old code, see example above). As soon as the regression test passes, this floating-point disambiguation can be rolled back if desirable; this can be done as a separate stage, and although it will be breaking strict regression testing, with the change being trivial, it can be reviewed for near-equivalence very easily.

Features-vs-determinism-type matrix

Now, we’re in position to summarize our findings in the following table:

	Same-Executable Determinism (Definition 2) – the simplest	Same-Platform Determinism against Minor Changes (Definition 3)	Cross-Platform Determinism (Definition 1) – most complicated
Deterministic Lockstep	–	–	Yes
Client-side replay	–	–	Yes
Replay-based regression testing	–	Yes	Yes
Production post-mortem	Yes	Yes	Yes
Low-latency fault tolerance	Yes	Yes	Yes

Conclusions

We’ve analysed different types of determinism (as encountered in the real world), and figured out which of these types of determinism are required to obtain different benefits.

From a practical point of view, this means that while deterministic lockstep and client-side replay are not usually feasible if multiple platforms are involved, goodies such as replay-based regression testing, production post-mortem, and low-latency fault tolerance are usually well within reach.

[+]References

[+]Disclaimer

Acknowledgements

This article has been originally published in Overload Journal #135 in October 2016 and is also available separately on ACCU web site. Re-posted here with a kind permission of Overload. The article has been re-formatted to fit your screen.

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Paul says

December 27, 2016 at 6:37 pm

Regarding std::multimap determinism. Since C++11 it is deterministic by the standard. Quote from http://en.cppreference.com/w/cpp/container/multimap/insert:

“If the container has elements with equivalent key, inserts at the upper bound of that range.”

Regarding other containers such as std::unordered_map. Order of iteration is never defined, since ordering elements in a deterministic fashion is not a feature of this container. One, relying on the specific order of elements in such container is the one building his house on a sand.

- "No Bugs" Hare says
  
  December 28, 2016 at 9:25 am
  
  Thanks for noting determinism of partially ordered collections since C++11, I changed the example to mention unordered ones.
  
  > One, relying on the specific order of elements in such container is the one building his house on a sand.
  
  Sure, but as the iteration is provided by container – even without explicitly relying on the order, determinism can be broken.
  
  One example – if we’re iterating over the container, and simply calculating sum of float fields from each of the elements, most of the people wouldn’t say that we’re really relying on the specific order of the elements. OTOH, the order of iteration can easily affect the sum (because floating addition involves non-linear rounding, and as a consequence – is non-associative).
  
  In another example – we could send all the items from the unordered container to whoever-requests-it – and depending on the protocol and/or receiver, any such order can be valid, but we’ll still be breaking cross-platform determinism if order is not exactly the same between different platforms.
  
  Overall – unordered containers were observed to cause their fair share of non-deterministic trouble even in otherwise perfectly working (and valid) programs.
  
Mikhail Maximov says

December 28, 2016 at 8:41 am

Missed bullet before “make changes”?
———————–
The idea for such testing goes along the following lines:

record all the program inputs while the old code runs in production (usually this is done on per-Reactor basis)
make changes, producing new code (and a new executable)
———————–

- "No Bugs" Hare says
  
  December 28, 2016 at 9:00 am
  
  Thanks, fixed now (damn copy-paste).
  
DePro says

May 1, 2019 at 1:39 pm

Game engines often employ frame skip if the frame processing is falling behind, and the circumstances in which this occurs depends on the exact state of the entire machine on which it’s executing. I think you also must log such frame-skips in order to maintain determinism for reproducing bugs, and you need to ensure that no changes to the program state occur in the skippable graphics processing if you want to apply determinism for the sake of client-side replays. Does that sound accurate?
(For example, in City of Heroes, particle effects stack up if the game lags due to swapping or whatever. Apparently they’re generated within the frame-skip code. If the list of particle effects is implemented incorrectly, this could lead to an exception.)

- "No Bugs" Hare says
  
  May 1, 2019 at 5:12 pm
  
  > I think you also must log such frame-skips in order to maintain determinism for reproducing bugs,
  
  Yep (for example, as a separate frame-skip event), though if frame-skip code does nothing, alternatively we can log “we’re-processing-frame” event instead of logging frame-skip as such (positive logic is usually simpler to deal with).