Eight Ways to Handle Non-Blocking Returns in Message-Passing Programs – with Script

Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek


slide 48

Now that we’re done describing our Takes, let’s compare them. Most of this table should be self-explanatory, but a few things do need an additional word or two:

  • Of course, readability is inherently subjective, but it certainly exists – and it is certainly very important.
  • As for “Hidden State Changes”, it is essentially about having that REENTRY marker (or a reasonable facsimile) to denote points where the state of the (Re)Actor can be implicitly changed.
    • Note that for Take 7, we cannot really enforce REENTRY markers on the functions-which-call-our-RPC-functions. This is pretty bad, as we’ll need to resort to non-enforceable things such as naming conventions to indicate such functions-which-can-cause-sudden-state-change; this, in turn, will require us to spend time looking for violations of such policies, which, while not fatal, certainly won’t make our life easier.
  • With regard to serialisation of Takes 4 to 6, it hinges on serialising lambda closures. As of now, I don’t know of a way to serialise lambdas as such, but it IS possible to do one of two things:

    a) write a kinda-preprocessor which automagically replaces lambdas with equivalent OO callbacks during the build stage.

    or b) intercept ALL the allocations, including lambda allocations, with a custom allocator, and serialise the whole allocator (including both our own stuff and the lambdas) together. As noted above, this is quite risky and depends on OS-related stuff, but I know of at least one working implementation of it.

    While neither of these approaches is perfect – they seem to work, though with a fair share of trouble.

  • For co_await, my current understanding is that we have only the second option (serialising the allocator as a whole, including all the co_await frames), and it is a rather risky one. There are risks related to clashes of virtual addresses when deserialising, and risks related to compilers/libraries bypassing our user-defined allocator for whatever reason.

slide 49

Overall – I’d say that depending on specifics of your project, THREE different approaches may happen to be viable:

  • OO callbacks are good, old, working-for-sure even in C++98, and have no problems with serialisation. They are not exactly the most readable option, but if you want a sure-fire approach which will serialise your state without any need to experiment, it works pretty well, at least for smaller projects (and I’ve seen it working for a million-lines-of-code project too).
  • Futures are good if you want to stick to C++11. On the other hand, serialisation is going to be a headache, though it is generally solvable one way or another.
  • co_await is almost perfect for our purposes. Still, serialisation is going to be an even bigger headache than for futures (though hopefully still doable).

To summarise: while quite a few of our Takes are usable in the real world, unfortunately NONE of them represents an ideal solution for our non-blocking problems (at least not yet).

slide 50

Now, let’s discuss current C++ standard proposals, and what we want from them from our non-blocking-handling perspective. Relevant C++ proposals in this regard include:

  • co_await, currently billed as “stackless coroutines” (formerly Resumable Functions). As of now, co_await seems to be the most likely thing to make it into C++2a, and IMNSHO it is certainly the most viable proposal out there. There is still a significant issue with current co_await, related to difficulties with serialisation (while it seems to be possible at the allocator level, it is rather risky and cumbersome).
  • boost::-style stackful coroutines (though I know that Gor prefers to call them fibers). From our perspective, one big problem with stackful coroutines is that they let us hide REENTRY markers. This is pretty bad for our purposes: to prevent sudden state changes from causing us trouble, we’ll need to resort to stuff such as naming conventions <ouch! />. PLUS, the situation with serialising stackful coroutines is even worse than for “stackless” ones (rough translation: “I have no idea how to serialise stackful coroutines even if we’re speaking about deserialising within EXACTLY the same executable”).

  • The next proposal on our list is so-called Resumable Expressions. To be perfectly honest, I do NOT like this proposal, for two Big Reasons. First, it doesn’t allow us to enforce markers equivalent to REENTRY. Second, when implementing await-like logic, it uses mutexes (with devastating results too); more on this a bit later.

  • The last one on our list is so-called Call/CC (call with current continuation). I have to admit that I don’t know much about Call/CC (hence the grey color on the slide), but it seems to me that it is way too low-level to be used intuitively in app-level code.

slide 51

Now, let’s give a few pointers on what-is-important-for-our-non-blocking-handling-purposes implementation-wise. As none of these newer proposals are carved in stone yet (and implementations are even less so) – there are a few things out there which either do-exist-but-can-magically-disappear, or things which we’d like to add (at least in the long run).

  • First, we DO need to see those points where the state of our non-blocking program can suddenly change. In this regard, I am a big fan of the so-called Suspend-Up-and-Out model used by co_await (née Resumable Functions). Using this opportunity to speak to the members of the almighty committee: please PLEASE do NOT throw the Suspend-Up-and-Out model away, especially on premises such as those in P0114R0.

  • During our discussion, we mentioned serialisation quite a few times; overall – it is a VERY important feature in the context of using deterministic properties of the (Re)Actors (and message-passing programs in general); in particular – practical implementations of features such as post-mortem debugging and low-latency fault-tolerance require serialisation (more specifically – “Same-Executable Serialization”).

    On the other hand, it is clear that currently, without ANY built-in serialisation in C++, we cannot ask for serialisation of lambdas and co_await frames. Still, there are TWO things we can (and I think SHOULD) ask for:

    (a) we have to be sure that await-frames use ONLY the heap (no stack, nothing else), and (b) we need the ability to control the allocator which is used to store lambda closures and await-frames. This will allow us to implement the serialisation we need; it will be ugly, but as a stop-gap measure, it will do.

    In addition, WHEN serialisation is supported (using static reflection or otherwise), we want it to be supported for lambda closures and await-frames too, so please keep us in mind <smile />. In particular, when static reflection is ready, please make sure it DOES cover BOTH lambda closures AND await-frames.

    As for stackful coroutines, having the current stack serialised (and deserialised later, assuming it is EXACTLY the same code doing the deserialising) might help too.

  • Last but certainly not least, we DON’T want mutexes within implementations of whatever-coroutines-are-pushed-at-us by the almighty committee. As practice has shown, mutexes are soooo difficult to deal with that even committee members can easily leave a bad mutex-related bug in their code.

slide 52

Let’s take a look at the proposed implementation of await in P0114R0 (a.k.a. Resumable Expressions). Usage-wise, it is more or less on par with Take 7 (not perfect, but more or less usable); however, the devil, as always, is in the details.

I would be REALLY happy to say that the simulation of await() in P0114R0 qualifies as an implementation detail which we shouldn’t care about, but there is a significant problem with this specific implementation.

The problem is that the emulation of await in Resumable Expressions is mutex-based. In practice, this means a potential extra thread context switch per await. With the cost of a context switch ranging from 2000 CPU cycles to a MILLION CPU cycles (if we account for cache invalidation costs), I don’t really like it.
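To put these cycle counts into wall-clock terms, assume (purely for illustration) a 3 GHz core:

```latex
t_{\text{switch}} = \frac{N_{\text{cycles}}}{f_{\text{clock}}}, \qquad
\frac{2\,000}{3\times10^{9}\,\text{Hz}} \approx 0.67\,\mu\text{s}, \qquad
\frac{10^{6}}{3\times10^{9}\,\text{Hz}} \approx 333\,\mu\text{s}
```

At, say, 1,000 awaits per second, an extra context switch per await would then cost somewhere between about 0.67 ms and about 333 ms of CPU time per second.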

slide 53

To make things even worse, the implementation proposed in P0114R0 calls a user-defined function from under a hidden mutex. This practice has been observed to lead to fatal deadlocks which happen once a month and which the user cannot explain. It is worth noting that this problem was first demonstrated as early as 1998, when analysing a hopelessly buggy multithreaded-STL implementation (carrying a copyright by another member of WG21). The worst case of observed behaviour was what-is-seen-in-developer-space as a “deadlock on one single recursive mutex”. Such a deadlock cannot possibly happen, unless there is a hidden mutex conveniently provided by the library exactly to allow this kind of deadlock to happen. My educated guess is that the same problem exists for P0114R0.

Necessary disclaimers:

  • Code in P0114R0 is convoluted enough that there is a chance I’m misreading it.
  • More importantly, both of these mutex-related problems MIGHT be fixable (or MIGHT not). TBH, I don’t even see why this mutex is necessary in the first place, but IF there was a reason, we MAY end up in trouble.

Fortunately, our further discussion does NOT depend on whether the P0114R0 implementation of await is fixable. Still, it DOES illustrate an important point: avoid mutexes when implementing coroutines.

slide 54

To summarise this hour-long talk in one single slide:

  • First, we DO need to handle non-blocking returns.

    Moreover, the whole point of handling non-blocking returns is to allow intervening input events to interact with the state WHILE the non-blocking call is in progress.

    As a result – we DO need a way to clearly see whenever the state has a potential to change.

  • Second, unfortunately none of the options we have for handling non-blocking returns in C++ is perfect. Some of the options are outright ugly, some don’t let us see the potential for state change, and some are not easily serialisable.

  • Still, I am sure that with co_await, it is about as good as it can realistically get in the foreseeable future.

    On the other hand, when serialisation comes to the standard, I’d certainly appreciate a way to serialise (or statically reflect) such things as lambdas and await-frames.

slide 55

This concludes our very intensive talk; I hope that I was able to convey my thoughts in a digestible manner. Now, we have 3 minutes to answer some of your questions.


– ‘No Bugs’ Hare, “Development&Deployment of Multiplayer Online Games”, Vol. II, pp. 70-129.

– Kevlin Henney, “Thinking Outside the Synchronisation Quadrant”, ACCU 2017.

– “Effective Go”, https://golang.org/doc/effective_go.html

– Dmitry Ligoum, Sergey Ignatchenko, “Autom.cpp”, https://github.com/O-Log-N/Autom.cpp

– WG21 papers N4463, N3985, P0114R0, P0534R0.

– Chuanpeng Li, Chen Ding, Kai Shen, “Quantifying The Cost of Context Switch”, Proceedings of the 2007 Workshop on Experimental Computer Science.

– Sergey Ignatchenko, “STL Implementations and Thread Safety”, C++ Report, Vol. 10, No. 7, July/August 1998.



Cartoons by Sergey Gordeev IRL from Gordeev Animation Graphics, Prague.



  1. Jesper Nielsen says

    I love seeing all of those different takes on the problem. I also had to figure out a way to let workflows cross boundaries between reactors, and although it’s very verbose, I’ll share it here anyway because it’s pretty different. The idea is that I’m making my “co-routine” a state machine, with each state being a chunk of code to execute on a reactor’s thread. I’m writing this in C#; I could probably do it better using yield return and enumerators, the way Unity coroutines are done (with each step yield-returning the reactor, if any, to execute the next step on).

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Engine.Server.Model.DAL;
    using Engine.Server.Model.Player;

    namespace Engine.Server.Controller.Tasks
    {
        public class AssignPlayerTask : TaskBase
        {
            private int _state = 0;
            private ServerController _controller;
            private PlayerCharacter _player;

            public AssignPlayerTask(ServerController controller, PlayerCharacter player)
            {
                _controller = controller;
                _player = player;
            }

            // Each call runs one step, on the thread of the reactor given by Context
            public override void Execute()
            {
                switch (_state)
                {
                    case 0:
                        RemoveOldPlayer();
                        break;
                    case 1:
                        RefreshPlayerFromStorage();
                        break;
                    case 2:
                        AssignPlayer();
                        break;
                }
            }

            private void RemoveOldPlayer()
            {
                PlayerCharacter alreadyPlaying = _controller.World.FindPlayerByName(_player.Name);
                if (alreadyPlaying != null)
                    _controller.World.RemovePlayer(alreadyPlaying, true);
                StateTransit(1, TaskContext.Db); // always – we might have just logged out before logging in again – and be waiting in DB queue
            }

            private void RefreshPlayerFromStorage()
            {
                // ...
                StateTransit(2, TaskContext.World);
            }

            private void AssignPlayer()
            {
                // ...
            }

            private void StateTransit(int newState, TaskContext newContext) // TODO: Move to base task
            {
                _state = newState;
                _context = newContext;
            }

            private TaskContext _context = TaskContext.World;
            public override TaskContext Context
            {
                get { return _context; }
            }
        }
    }

    • "No Bugs" Hare says

      > “I also had to figure out a way to let workflows cross boundaries between reactors” – ouch!

      Overall, yep, this is #9 indeed :-). Still, at least at first glance, the same thing (“let workflows cross boundaries”) can be achieved via Take 3 (OO callbacks) by serializing them in the first Reactor and then deserializing them in a different Reactor; or am I missing something?

      In other words – all we have to do to move a workflow from one Reactor to another one, is to serialize state of the outstanding-request-processing, and with Take 3, all the state is within easily-serializable-callback-objects, so it _seems_ that the same thing can be achieved with IMO-better-readable-Take-3.

      • Jesper says

        Yeah, I’m not claiming it to be superior to any of the takes you posted. I’m not serializing anything, though – just enqueueing the same object for different reactors – each of them calling Execute in turn while I keep the state in the Task object itself

        • "No Bugs" Hare says

          > just enqueueing the same object for different reactors – each of them calling Execute in turn while I keep the state in the Task object itself

          You mean “within the same process, sharing memory between different reactors?”

          • Jesper says

            More like handing off objects between reactors. To be honest I’m not working with a formalized concept of reactors and I do also use some thread synchronization here and there.

          • "No Bugs" Hare says

            > More like handing off objects between reactors.

            Yes, this is what I meant :-).

            > To be honest I’m not working with a formalized concept of reactors and I do also use some thread synchronization here and there.

            Well, it is fine to break the rules occasionally, as long as you do understand what the rules are ;-). Which, in turn, means that the larger the team, the stricter the compliance with the rules should be 🙁.
