Deterministic Components for Distributed Systems

 
Author:  Follow: TwitterFacebook
Job Title:Sarcastic Architect
Hobbies:Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
 
 
Schrödinger's Cat

Over the last 10 years, significant improvements have been made in program testing. I don’t want to go into a lengthy argument whether TDD is good or bad here – this is not the point of this article. However, one thing I want to note is a rather obvious (so obvious that it is often taken as granted) point that

For test results to have any meaning, the test needs to be reproducible

Hare thumb down:In theory, we can speak of ‘statistical testing’, but for most of the programs out there it would be a Really Bad Way to test them.In theory, we can speak of ‘statistical testing’ when the program is run for 1000 times (and if it fails not more than x times out of these 1000 runs, it is considered fine), but for most of the programs out there it would be a Really Bad Way to test them.

So far so obvious, but what does this really mean in practice? Let’s consider the relationship between two subtly different properties: the ‘test being reproducible’ and the ‘program being deterministic’ (defined as ‘the program has all its outputs completely defined by its inputs’). For the time being, we’ll refrain from answering the question ‘what qualifies as an input’; we’ll come to discussing it a bit later.

First, let’s note that for a 100% deterministic program, all the tests are always reproducible. In other words, a program being deterministic is sufficient to make all the tests against this program reproducible. On the other hand, if all the possible tests for our program are always reproducible, it means that our program is deterministic.

However, there are tests out there which are reproducible even when the program is not entirely deterministic. For example, if our program X prints the current time when we specify the -t option, and it emits concatenation of the input string with ‘beyond any repair’ otherwise, we can say (assuming that we don’t consider current time as one of the program inputs) that:

  • The program is not entirely deterministic
  • Surprised hare:the test with the -t option is not reproduciblethe test with the -t option is not reproducible
  • all the other tests are reproducible

Let’s see what we have observed up to now:

  • for deterministic programs, all the tests are reproducible
  • for non-deterministic programs, there can be both reproducible tests and non-reproducible ones

The second statement can be seen as an indication that all is not lost – we can have meaningful tests for non-deterministic programs. And this does stand; on the other hand, a much more unpleasant statement stands too:

For a real-world non-deterministic program, we are bound to leave some functionality untested

Let’s see this on an example of our program X. We certainly can have a set of reproducible tests over our program – the ones which don’t use the -t option. However, this will leave us with a significant part of our functionality untested.

To avoid this, we can try to test our program with the -t option to emit the right format, but wait – without meddling with the system time we won’t even be able to test it on Feb 29th etc. Ok, we can add manipulating system time to our test, but even then we’ll be testing only the time format (and not correctness of the time which was printed). We may try to measure current time before calling our program, and to compare this value with that printed by our program X; it might seem a good solution, but apparently, even if the program prints time with minute precision and runs for only 0.1 second, from time to time the test will fail. More specifically, such a test will fail whenever this 0.1 second (between the moment when we’ve measured current time and the moment our program has made its own measurement) happens to occur exactly across the boundary between two different minutes. Ok, we could add tolerance (which BTW already makes our test imprecise) and consider the program valid if the printed output printed_time satisfies an equation measured_time < printed_time < measured_time + 0.1s, but even this is not sufficient, as we haven’t accounted for potential system time adjustments (including automated ones).

Wtf hare:As we can see, even for an absolutely trivial non-deterministic program, accurate testing becomes a very non-trivial task. As we can see, even for an absolutely trivial non-deterministic program, accurate testing becomes a very non-trivial task. In practice, for a non-trivial program writing a set of tests which cover a significant part of the functionality quickly becomes a daunting task (or, more often, leads to significant functionality left untested). And if our program is traditionally multithreaded (defined as ‘multithreaded with explicit thread sync using mutexes or equivalent’) testing very quickly becomes pretty much pointless exactly because of the lack of determinism: the program which works perfectly on your test rig can easily fail on some other computer (or fail on the same computer from time to time) – just because whoever runs it was unlucky. Bummer.

Distributed programs and unit-tests: false sense of security

Let’s come back from our so-far-theoretical clouds to the earth of programming. Let me tell you one real-world story in this regard (let’s call it ‘Unit Test Horror Story’ for future reference).

Hare with hopeless face:Next day, his changes went to production and caused one of those once-per-year downtimes.Once upon a time, there was a company which successfully ran a multi-million-dollar online business. Moreover, they enjoyed very little unplanned downtime by their industry standards (1 hour per year or so). Then, on a nice sunny day (or a dark cloudy one – it won’t change anything), a new developer came to the company. It just so happened that he was a strong proponent of TDD, so he started writing a unit-test for a new feature, and the test failed; he implemented the feature, and the test succeeded. And then the new developer read a lecture to all those non-TDD developers, saying ‘Now you see how easy it is to write error-free programs!’ Next day, his changes went to production and caused one of those once-per-year downtimes. 🙁 🙁

The moral of this story is certainly not along the lines of ‘see, TDD is useless’ (IMHO, TDD, when taken in moderation, is quite useful, though it is far from being a silver bullet). The point here is to understand what has caused the downtime; and apparently, it was an unusual sequence of otherwise perfectly valid messages (sometimes such unusual sequences are referred to as ‘races’).

Moreover, from my experience, these

unusual sequences of otherwise perfectly valid messages are by far The Most Difficult To Find Bugs in pretty much any real-world distributed system.

As a consequence (and combined with an observation that such unusual sequences tend to be among the most unexpected for the developer), it means that

Unit tests, while important, are insufficient to ensure the validity of a distributed system

This happens because even if every component of your program is deterministic, when you make a distributed system out of these components, your distributed system (taken as a whole) is very unlikely to be deterministic. However, as we’ll see below, even per-component determinism helps a lot for debugging such distributed systems. This includes things such as automated replay-based regression testing, low-latency fault-tolerance and relocation for components, and the holy grail of production post-mortem analysis.

Benefits of deterministic components

Let’s discuss the major benefits of deterministic components in a distributed system.

Replay-based regression testing

Hare thumb up:if our component is entirely deterministic, we can do the following...As we’ve seen in the ‘Unit Test Horror Story’ above, unit tests are not sufficient to test components of the distributed system. On the other hand, if our component is entirely deterministic, we can do the following:

  • while a previous version of the component is running in production, we can write down all the inputs for this specific component as an input log. Ideally, we should be able to do this for all the components in the system, but it can be done on per-component basis too.
  • then, we can run the same input log against the new version of the component
  • if our new version of the component doesn’t have any changes to the logic (and has only extensions which are not activated until new inputs are used) – then the output should be exactly the same as for the previous version of the component
  • while there is no strict guarantee that this kind of testing would help with the ‘Unit Test Horror Story’, the chances are that it would. Moreover, for heavily loaded systems, experience shows that the vast majority of those unusual sequences of otherwise valid inputs do happen every few hours (YMMV, batteries not included). In any case, this kind of testing, while not being a guarantee against bugs (nothing is), is a very valuable addition to the arsenal of regression tests which can and should be run.

Of course, you’ve already noticed the weak point of this kind of testing: as noted above, it will work only as long as you have made no changes to existing functionality. This indeed is one of the reasons why this kind of testing is not a silver bullet. However, two things help in this regard:

  • Hare pointing out:more often than not, for a real-world system, functionality is added rather than modifiedmore often than not, for a real-world system, functionality is added rather than modified
  • provided that you make rather small and functionally oriented git-style independent commits (as opposed to one commit for everything which happened this week), it is often still possible to separate those changes which are not supposed to change any existing behavior, to re-apply these changes on top of the previous version, and to run replay-based regression tests against these changes. While it is admittedly an imperfect way of testing, it still does help quite a bit in practice.

Deterministic production post-mortem

However thoroughly we test our systems, from time to time they do fail . In such cases, it is of utmost importance to identify the problem ASAP, and to fix it, so it doesn’t happen again.

In this regard, deterministic components can provide a Holy Grail of production post-mortem: an ability to reproduce the bug exactly as it has happened in production. If our component is deterministic, then:

  • we’re able to write all the inputs of the component into an input log
  • Arguing hare:after the program fails in production, we can get the input log and run it in the comfort of a developer’s machine, under a debugger, as many times as we want, and get exactly the same variables at exactly the same points as happened in productionafter the program fails in production, we can get the input log and run it in the comfort of a developer’s machine, under a debugger, as many times as we want, and get exactly the same variables at exactly the same points as happened in production. Note that, strictly speaking, to get exactly the same variable values, we’ll need each line of our component to be deterministic, which is a stronger property than the component being deterministic as a whole. However, in practice the latter is normally achieved via the former, so we can ignore the difference between the two for most practical purposes.

In reality, of course, it is not that simple. If our system runs for months, storing and replaying all the inputs which have happened during those months is rarely feasible. However, we can avoid this problem by running our input log in a circular manner.

As mentioned above, running input log for the entire history of our component rarely qualifies as a viable option. On the other hand, if our component is deterministic, and we can get current state of our component in some kind of serialized form, then we can build our input log as follows:

  • input log is written in a circular manner, so that it stores only the last t seconds of the component inputs
  • on each wrap-around of our circular storage, we store the current state of our component in the same input log
  • if we need to perform a post-mortem, we use a deserialized current state to initialize our deterministic component and then reapply the remaining inputs from the input log to get a complete picture of the last t seconds of the component’s life before the crash

Such circular input logs are highly practical at least in some deployment scenarios, and provide an ability to see the last t seconds of life of a component before it failed. While it might be that the problem occurred before these last t seconds, and was only manifested later, this technique still happens to be quite useful in practice.

Low-latency fault tolerance and component relocation

Yet another major benefit of having your components both deterministic and serializable is that it is possible to gain some fault tolerance from them (without any help from application level code). In particular, the following schema will work:

  • circular input log runs on a physical server X, and our deterministic component runs on a physical server Y
  • all the outputs from our deterministic component are sequentially numbered, and go through physical server X, where the last number of the output is kept as a variable LastOutputNumber

Assertive hare:Then you can recover from any single server failure in a perfectly transparent mannerThen you can recover from any single server failure in a perfectly transparent manner:

  • if server X has failed, then server Y has all the information we need, and replacement server X’ can start writing input log from scratch (starting from storing serialized current state)
  • if server Y has failed, then we can follow the following procedure:
    • start replacement server Y’ with a replacement instance of our deterministic component (initialized with serialized current state from input log)
    • variable LastSkipNumber is set to LastOutputNumber
    • all the records in input log after serialized current state are replayed
    • during replay, all the outputs of our deterministic component are skipped until we reach LastSkipNumber. This is possible because all the outputs are deterministic, and they exactly repeat those outputs which have already been sent back to the other components/clients/whoever-else

This kind of determinism-based fault tolerance provides fault-tolerance with very little additional latency. In fact, it is ideologically similar to the virtual lockstep way of achieving fault tolerance (and provides significantly better latency than so-called fast checkpoints).

In a somewhat similar manner, it is possible to achieve low-latency relocation of the components from one physical server to another one. The idea revolves around running two instances of the component for some time to reduce latency during serialization/deserialization/state-transfer. The second instance of the component would have its outputs skipped until it has caught up with the first one, and then the first one can be dropped.

Making components deterministic

Ok, I hope that by now I’ve managed to convince you that determinism is a Good Thing™. Now, let’s discuss how to write those deterministic components.

Obvious stuff

  • First of all, let’s note that there are obvious things to keep in mind when aiming for determinism.
    • In particular: don’t use any uninitialized memory in your program
  • Judging hare:don’t use anything which causes the dreaded ‘undefined behavior’ in C/C++more generally, don’t use anything which causes the dreaded ‘undefined behavior’ in C/C++1
  • don’t use pointers for any operations except for dereferencing.
    • In particular, using pointer values (as opposed to data pointed to by pointers) as a key for sorting is especially dreadful (as usually allocators are not required to be deterministic, especially in a multithreaded environment, so such sorting can easily lead to non-deterministic results)

NB: for the purposes of this article, we won’t aim for cross-platform determinism; while cross-platform determinism is an interesting beast and has its additional benefits, for now we will set it aside to avoid complications. This will save us from quite a few problems, such as iterations over unordered collections and even more nasty different-order-of-floating-point-operations stuff.


1 well, you shouldn’t do it anyway TBH

 

Relation to Reactors

Now, let’s note that deterministic components tend to go very well alongside with the Reactor event-driven pattern (as described in [Wikipedia]). This more or less includes such things as Erlang message passing, Node.js, and to certain extent – Akka Actors.

Inquisitive hare:Reactor also serves as a very convenient point to write all these incoming events into our input log.In summary, the Reactor pattern takes incoming inputs (a.k.a. ‘service requests’ or ‘events’) and processes them synchronously, one by one. It also serves as a very convenient point to write all these incoming events into our input log.

So far so good, and if all our Reactor does is calculate a pure function, it stays deterministic just by its very nature. However, as soon as you do as little as call get_current_time() within your component, it ceases to be deterministic . Therefore, we need to discuss ways how to deal with non-determinism.

While the determinism-assuring techniques described below are, strictly speaking, not limited to Reactor and Reactor-like patterns, it is easier to describe some of them in terms of Reactors (and they’re likely to be used in the context of Reactors too).

Considering call output as a program input

As we’ve seen above, even a very simple program which prints current time is not deterministic. However, there is a trick which allows to make it deterministic. More specifically,

If we make current time an input of our program, related non-determinism will go away

In other words, all we need to do to make our program deterministic, is to replace a call to a system-level function get_current_time() with a call to our own function get_our_own_current_time(), where get_our_own_current_time() goes along the following lines:

time_t get_our_own_current_time() {
  switch( deterministic_mode ) {
    case DETERMINISTIC_REPLAY:
      return read_time_t_from_input_log();
    case DETERMINISTIC_RECORDING:
      {
      time_t ret = get_current_time();
      //NON-DETERMINISTIC!
      write_time_t_to_input_log(ret);
      //wrote to input-log
      //to ensure determinism
      return ret;
      }
    case DETERMINISTIC_OFF:
      return get_current_time();
      //NON-DETERMINISTIC!
  }
}

Bingo! We can have our determinism and eat it too!

Hare with an idea:This trick of declaring something non-deterministic as an input and writing it into input log is pretty much universal in the sense that in theory it can be applied to pretty much any non-deterministic function call including, but not limited to, reading of real random data from /dev/urandomThis trick of declaring something non-deterministic as an input and writing it into input log is pretty much universal in the sense that in theory it can be applied to pretty much any non-deterministic function call including, but not limited to, reading of real random data from /dev/urandom. Even mutexes can be handled this way (though in this case all the data protected by mutex needs to be written to the input log, ouch). On the other hand, in practice there are two significant caveats:

  • when large data chunks are read, it is often infeasible to handle them this way (if you read 1Gbyte from your file/DB, throwing it into input log is rarely a good idea); however, there are many practical cases when it can be avoided:
    • if the data is expected to be constant (like ‘read from a constant file’) – then there is no need to record it to the input log (as it can be reproduced); in extreme cases, if you suspect that data potentially can be corrupted, you can write some kind of hash into input log instead of the data itself
    • if the data is some kind of cache (which can be re-obtained from the authoritative source instead) – it doesn’t need to be logged either.
    • If we can say that data on disk is a part of our current state, then there is no need to log such accesses either. Note though that this option would usually make serializing your current state significantly more expensive
  • For frequently called functions such as obtaining current time, using this trick makes replay more fragile than it is necessary. For example, if you use this trick and then add another call to get_current_time() somewhere within your implementation, it makes your input logs incompatible (and usually this is not intended)

This trick doesn’t go well with fault-tolerant implementations (which need to write all the inputs to a separate physical box)

Guessing game

Surprised hare:In particular, as lots of input events need current time, we can say that we’ll provide time to all of them.A different technique to ensure determinism can be described as follows. If we know in advance what will be necessary for the processing of our input event, then we can supply it to our component (logging it to input log in advance, so that it is no longer a problem for fault-tolerant implementations). In particular, as lots of input events need current time, we can say that we’ll provide it to all of them. In this case:

  • Code outside of the Reactor will call the system level get_current_time() before processing each input event.
  • Code outside of the Reactor will store the time read (say, to TLS variable event_time)
    • Note that global/per-process singleton won’t be able to ensure determinism if you have more than one thread running your deterministic objects within your process.
  • It will call Reactor’s process_event() or equivalent

Reactor app-level code still needs to call: get_our_own_current_time()instead of: get_current_time()but implementation of get_our_own_current_time() becomes much simpler in this case:

thread_local time_t event_time;
//event_time is pre-populated by caller

time_t get_our_own_current_time() {
  return event_time;
}

Note that with this implementation, it is not possible to measure execution times within the event handler (as all the calls to get_current_time() for the same event will return the same value). On the other hand, as any kind of execution times measurements would make our program inherently non-deterministic (at the very least when we’re running it on a modern CPU), it is not that big deal. And if you need to make some performance analysis, you still can use something along the lines of Node.js-style console.time()/console.timeEnd(); as these functions do not return the measured time interval value to the program, but rather print it to a log file – then, as long as we do not consider log file as one of program outputs (and the program itself doesn’t read these values from the log file), we’re fine from determinism point of view.

Thinking hare:Unfortunately, not all the required inputs can be pre-guessed successfully. However, ...Unfortunately, not all the required inputs can be pre-guessed successfully. However, in quite a few cases, the following technique does help:

  • We’re starting to process_event() with or without the data that may be needed
  • If the data is needed but is not provided, we throw a special exception requesting the data from the caller
    • This exception must be thrown before any changes to Reactor’s state are made. This fits very well into a typical Validate-Calculate-Modify Reactor pattern described in [ITHare16].
    • It is also important to discard all the outputs coming from the processing of this process_event() (or to avoid emitting them in the first place)
  • If a caller receives this ThisKindOfDataRequested exception, it simply provides the requested data and repeats the same call

This exception-as-a-request-for-non-deterministic-data does help in quite a few scenarios. However, as with anything else, it is not a silver bullet , and chances are that you will need to look for your own ways to ensure determinism.

Conclusions

We’ve discussed the impact of per-component determinism on programming. In particular, we’ve found several major benefits of making your components deterministic (from replay-based regression testing to determinism-based fault tolerance). Also, we’ve discussed some ways of achieving this holy grail of determinism; while techniques discussed here won’t ensure determinism for all programs, they have been seen to allow determinism for quite a few real-world systems (ranging from stock exchanges to online games).

Don't like this post? Comment↯ below. You do?! Please share: ...on LinkedIn...on Reddit...on Twitter...on Facebook

Acknowledgements

This article has been originally published in Overload Journal #133 in June 2016 and is also available separately on ACCU web site. Re-posted here with a kind permission of Overload. The article has been re-formatted to fit your screen.

Cartoons by Sergey GordeevIRL from Gordeev Animation Graphics, Prague.

Join our mailing list:

Comments

  1. Alessandro Stamatto says

    Considering that you’re an experienced network programmer, adept of determinism, careful for high up-time, and defensor of the “Actor-like” (Reactor) Concurrency…

    What do you think of Erlang? It seems to be handmade for those cases! (maybe the performance is bad? Or it’s cumbersome for writing games? I don’t know)

    • "No Bugs" Hare says

      I know that Erlang is used for distributed systems with quite significant success. However, I do believe it is possible to do better than Erlang (and I’ve seen it done better in real world too) – in terms of performance, AND in terms of enforcing determinism, AND in terms of better handling of non-blocking stuff, AND having the same platform for client and server, AND finding developers, etc. etc…

      In other words – Erlang is good (especially as a concept), but – well, in real-world 2016 development it is possible to do better :-).

  2. Wanderer says

    Hi!

    And what is the situation about floating-point arithmetic determinism?

    I totally understand that as long as we are speaking about poker or even stock-exchange games, it’s all about integers and fixed-point numbers. Probably RPG can get away with fixed arithmetic too. But things like simulations (any kind of simuations – FPS, racing) tend to rely on floating-point. And, as far as I remember, there are some peculiarities and even randomness in those – from x87 to SSE, FMA and everything in between.

    Am I wrong?

    • "No Bugs" Hare says

      Floating-point determinism is a very-well-known headache :-(. However, for practical purposes we need to distinguish between (a) “cross-platform determinism” (which is Next to Impossible to achieve, at least with C/C++ – unless you’re running your own VM or something, as order of floating point ops is not guaranteed by compiler, and even addition is not associative in floating-point world 🙁 ), (b) “same-platform determinism for programs with most of the code being the same”, and (c) “determinism for the same executable” (which is more or less deterministic for floating-point).

      Keeping this in mind, stuff such as production post-mortem, replay testing, and determinism-based fault tolerance becomes feasible (with certain reservations). On the other hand, things such as cross-platform input-based replays or cross-platform deterministic lockstep, remain pretty much beyond the practical reach :-(.

Leave a Reply to Alessandro Stamatto Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.