Eight Ways to Handle Non-Blocking Returns in Message-Passing Programs – with Script

 
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
 
 




slide 15

Historically, the first non-blocking implementations were based on plain messages; such an implementation of our “Item Purchase” example is shown on the slide. Yes, it is THIS bad.

Let’s take a closer look at it:

slide 16

First, we need to create a struct to store the context of our request (so we can handle the return properly). There is nothing really meaningful here – but we still have to write this boilerplate code.

slide 17

Then, we have to add a map of request_ids to this request data, so we can find the necessary request context when the reply arrives.

While in some cases it is possible to say that there is only one outstanding request per type, in general it is better to avoid such things, as they have a tendency to become unmanageable rather quickly. The same goes for ad-hoc tricks which allow identifying replies by embedding additional information into the reply instead of using a request_id.

slide 18

When we get our initial request from the Client, we need to save the context into our struct, compose the message using the function prepared by our IDL compiler, send the message, and save our struct into the map.

BTW, in addition to being verbose, this kind of code is also error-prone: it is pretty easy to get wrong.

slide 19

Then, on receiving the reply from the DB (Re)Actor, we need to find the request in the map and check its status.

slide 20

Now, just for a change, we have one line of non-boilerplate, meaningful code (the one which was present in our blocking version).

slide 21

…but as you have already guessed – it is followed by even more boilerplate.
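
To make the above a bit more tangible, here is a compilable sketch of Take 1’s plumbing – the context struct, the request_id map, the send path, and the reply dispatch. All names (PurchaseRqCtx, onDbReply, and so on) are hypothetical, and actual message sending is stubbed out:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

struct PurchaseRqCtx {          // slide 16: boilerplate context struct
    uint64_t user_id;
    int item_id;
    std::string reply_addr;     // where to send our own reply afterwards
};

static uint64_t last_request_id = 0;
static std::map<uint64_t, PurchaseRqCtx> purchase_rq_map;  // slide 17

// stand-in for "compose the message via an IDL-generated function and post it"
static std::function<void(uint64_t /*request_id*/)> send_db_request;

uint64_t onPurchaseRequest(uint64_t user_id, int item_id) {   // slide 18
    uint64_t request_id = ++last_request_id;
    purchase_rq_map[request_id] = PurchaseRqCtx{user_id, item_id, "client"};
    send_db_request(request_id);     // fire-and-forget message to the DB (Re)Actor
    return request_id;
}

int onDbReply(uint64_t request_id, int price) {               // slide 19
    auto found = purchase_rq_map.find(request_id);
    assert(found != purchase_rq_map.end());  // easy to get wrong in real code
    PurchaseRqCtx ctx = found->second;
    purchase_rq_map.erase(found);            // also easy to forget
    (void)ctx;  // the single meaningful line (slide 20) would use ctx here
    return price;
}
```

Note how, even in this stripped-down sketch, only one line is business logic; everything else is the plumbing discussed above.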

slide 22

Overall, the amount of boilerplate code in Take 1 is enormous: out of over 70(!) lines of code in Take 1, only about 10 are meaningful. This becomes especially obvious when placing Take 1 and our “Holy Grail” baseline code side by side…

This slide pretty much summarises the problems with Take 1: the amount of boilerplate code is soooo large that all the business logic is completely buried in it. What’s interesting, though, is that I’ve seen perfectly working systems based on this kind of non-blocking handling (though their maintenance costs were a completely different story).

slide 23

For our Take 2, we’ll consider a technique which is quite popular in game development today, based on non-blocking VOID-ONLY RPC calls. The idea behind using void (and non-throwing) RPC calls is that, as there is no reply whatsoever to such calls, there is no question of “what to do when the reply arrives”.

On the other hand, such void-only RPC calls provide only a very marginal improvement over plain messages; in particular, to implement a typical request-response exchange, we need two void RPC calls – one to send the request, and another to send the reply back.
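
A sketch of this two-void-RPC pattern (all names – dbGetPrice, cashierGotPrice, and so on – are hypothetical, and RPC delivery is simulated with plain calls): the “request” RPC goes one way, the “reply” RPC comes back, and the request_id map is still ours to manage.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct DbReactor;  // forward declaration

struct CashierReactor {
    struct Ctx { uint64_t user_id; };
    uint64_t last_request_id = 0;
    std::map<uint64_t, Ctx> ctx_map;      // same boilerplate as in Take 1
    int last_price = -1;

    void purchaseItem(DbReactor& db, uint64_t user_id, int item_id);

    void cashierGotPrice(uint64_t request_id, int price) {  // void "reply" RPC
        auto it = ctx_map.find(request_id);
        assert(it != ctx_map.end());      // still easy to get wrong
        ctx_map.erase(it);
        last_price = price;               // the meaningful part would go here
    }
};

struct DbReactor {
    CashierReactor* cashier;
    void dbGetPrice(uint64_t request_id, int item_id) {     // void "request" RPC
        int price = 10 * item_id;         // pretend DB lookup
        cashier->cashierGotPrice(request_id, price);        // second void RPC back
    }
};

void CashierReactor::purchaseItem(DbReactor& db, uint64_t user_id, int item_id) {
    uint64_t request_id = ++last_request_id;
    ctx_map[request_id] = Ctx{user_id};
    db.dbGetPrice(request_id, item_id);   // first void RPC
}
```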

slide 24

As a result, most of the boilerplate stuff from Take 1, such as the request context struct,

slide 25

the map of request_ids to these structs,

slide 26

as well as filling in the context on sending the request and finding it on receiving the reply – are all still here, with all the associated error-prone stuff.

slide 27

Overall, the improvements in readability and the reduction of boilerplate code provided by Take 2 are not as large as we’d like them to be. While 50 lines of code is indeed better than 70, it is still five times more than in our target “Holy Grail” code.

slide 28

Our Take 3 is about object-oriented callbacks. It is something I was doing myself almost 20 years ago when architecting a pretty large (Re)Actor-based system; the architecture is still in use pretty much without changes – and it does work.

slide 29

As we can see from the slightly increased font size, Take 3 is just a bit less verbose than Take 2. With Take 3, it is all about callback objects – and it takes lots of boilerplate to describe them. We need a callback object to handle the return from our non-blocking request to the database…

slide 30

and another callback object for the non-blocking call to our Game World.

slide 31

And only after we’re done with this meaningless boilerplate (and are over 50% of the way down our Take 3 code) do we get to somewhat-meaningful stuff (it still has lots of boilerplate, but at least it is interspersed with SOMETHING meaningful).

Currently highlighted is the code processing the original purchase request from the Client…

slide 32

and here is the return handler. One thing which we can see is that – compared to Takes 1 and 2 – there is no explicit handling of the request_id-to-callback-data maps; in Take 3, while these maps are still present behind the scenes (there is no magic here), we managed to hide them from the view of the app-level developer (which is a REALLY BIG relief).
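
The gist of Take 3 can be sketched as follows (all names are hypothetical, and the framework is simplified to a single outstanding callback – a real one would keep the request_id-to-callback map behind the scenes, as discussed above): the app-level code writes only callback objects, and the per-request context lives inside those objects.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct ReactorCallback {                       // base class for all callbacks
    virtual void react(int reply_value) = 0;
    virtual ~ReactorCallback() = default;
};

// hypothetical framework: remembers ONE outstanding callback for simplicity;
// a real one would keep a request_id->callback map hidden from the app level
struct Infrastructure {
    std::unique_ptr<ReactorCallback> pending;
    void sendRequestWithCallback(int /*item_id*/,
                                 std::unique_ptr<ReactorCallback> cb) {
        pending = std::move(cb);               // "send" the request to the DB
    }
    void deliverReply(int reply_value) {       // called when the DB reply arrives
        std::unique_ptr<ReactorCallback> cb = std::move(pending);
        cb->react(reply_value);
    }
};

struct PurchaseDbCallback : ReactorCallback {  // slide 29: boilerplate per call
    int user_id;                               // context lives in the object...
    int* out_price;                            // ...so no manual map for the app
    PurchaseDbCallback(int uid, int* out) : user_id(uid), out_price(out) {}
    void react(int price) override { *out_price = price; }
};

void cashierPurchaseItem(Infrastructure& infra, int user_id, int item_id,
                         int* out_price) {
    infra.sendRequestWithCallback(item_id,
        std::make_unique<PurchaseDbCallback>(user_id, out_price));
}
```

Note that the callback object carries all its own state, which is also what makes this Take comparatively easy to serialise.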

slide 33

If we compare our Take 3 to our baseline “Holy Grail” code, we’ll see that Take 3, while being less verbose than Take 2, still has like 4 times more lines of code than we’d like to have. On the plus side, however (in addition to being less error-prone than Takes 1 and 2), we can see that the meaningful code in Take 3 is concentrated within a smaller number of lines, which significantly improves its readability.

Overall, Take 3, while verbose, has been observed to be perfectly usable and perfectly working. I would even go as far as saying that it was the best thing we could do before the advent of C++11 – and most importantly, before lambdas.

slide 34

Up to now, all our Takes were doable even in C++98; now let’s see how we can improve on them with the power provided by C++11.

Our Take 4 is all about lambdas, and more lambdas, leading us to the infamous “lambda pyramid”. We can immediately see that the amount of boilerplate code went down very significantly; actually, this is the first time that our non-blocking code has become somewhat readable on a single slide.

Ideologically, the code in Take 4 is actually very similar to the object-oriented callbacks from Take 3: it is just that instead of those verbose manually-written callback objects, the compiler creates equivalent lambda closures for us behind the scenes.
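
The shape of such code can be sketched as follows (all names are hypothetical, and replies are delivered synchronously to keep the sketch self-contained): each non-blocking call takes the rest of the logic as a lambda continuation, so logically-linear code ends up nested – the “pyramid”.

```cpp
#include <cassert>
#include <functional>
#include <string>

using IntCont = std::function<void(int)>;

// stand-in infrastructure: delivers "replies" synchronously for the sketch
struct FakeReactorSystem {
    void dbGetPrice(int item_id, IntCont cont) { cont(10 * item_id); }
    void dbDeductFromAccount(int /*user*/, int amount, IntCont cont) {
        cont(amount);
    }
};

void cashierPurchaseItem(FakeReactorSystem& sys, int user_id, int item_id,
                         std::function<void(std::string)> done) {
    sys.dbGetPrice(item_id, [&sys, user_id, done](int price) {      // level 1
        sys.dbDeductFromAccount(user_id, price, [done](int paid) {  // level 2
            done("paid " + std::to_string(paid));                   // level 3...
        });
    });
}
```

With only two non-blocking calls the nesting is tolerable; with 4-5 of them, the pyramid becomes the readability problem discussed below.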

One important point about C++ and lambdas is that while we’re using lambda capture “by copy”, it still captures the current object (the one referred to by this) by reference. Incidentally, this is EXACTLY the behaviour we want for our purposes: we DO want to capture stack variables by value (so they survive the return from our cashierPurchaseItem function), but at the same time we DO want to capture our (Re)Actor object by reference, so we can access its CURRENT state while processing non-blocking returns.
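
This capture rule is easy to demonstrate in isolation (names are hypothetical): with capture-by-copy, the stack variable is copied into the closure, while this is effectively captured by reference, so the closure sees the object’s CURRENT state when it finally runs.

```cpp
#include <cassert>
#include <functional>

struct Cashier {
    int gold = 100;
    std::function<void()> deferred;   // stands in for a non-blocking return handler

    void startPurchase() {
        int price = 30;               // stack variable: captured BY VALUE
        deferred = [=]() {            // "by copy", but `this` is by reference
            gold -= price;            // sees the CURRENT value of this->gold
        };
        gold += 5;                    // mutate the object AFTER the capture...
    }                                 // ...price goes out of scope here
};
```

When the closure is invoked later, it still holds its own copy of price, but acts on the up-to-date object – which is exactly the combination we want for (Re)Actors. (Note that in C++20, the implicit capture of this via [=] is deprecated in favour of spelling it out.)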

If we compare our Take 4 to our “Holy Grail” code, we’ll see that verbosity-wise they’re about the same.

slide 35

However, readability-wise, Take 4 is still far from perfect (and this will become more obvious if we introduce exception handling). Most importantly, the code which is logically linear on the right side becomes NESTED within our Take 4 code. This (as well as the rather counterintuitive indents for what is essentially linear logical code) is what earned the Take 4 approach the name “lambda pyramid”; as soon as the number of non-blocking operations in the original code reaches 4-5, the whole thing becomes rather poorly readable.

slide 36

Let’s see how we can improve things further. Our Take 5 is about so-called FUTURES.

Very, very briefly, the idea is to create a placeholder, known as a “future”, where the result of some operation which will only become known in the future can be stored.

slide 37

And as soon as we have these “future” objects – we can specify actions which have to be done as soon as the result within the future becomes available. These actions are often referred to as continuations.

slide 38

If we compare our Take 5 to our baseline “Holy Grail” code, we’ll see that the verbosity of futures is still in check – but, unlike with “lambda pyramids”, the code which was linear in the blocking version is still linear with futures.

I’d certainly say that Take 5 is the best we have so far. BTW, as an added benefit, concepts such as “we need to wait for TWO non-blocking RPCs to complete” are easily expressible with futures too.
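
A minimal single-threaded future-with-continuation can be sketched in a few lines (the name ReactorFuture follows the talk; the interface is hypothetical). The continuation passed to then() runs as soon as the value is set – no thread sync is involved, which is exactly the difference from std::future<> discussed below:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// single-threaded "future": NOT std::future<>, no thread sync whatsoever
template <class T>
struct ReactorFuture {
    bool has_value = false;
    T value{};
    std::function<void(T)> continuation;

    void then(std::function<void(T)> cont) {   // attach continuation (slide 37)
        if (has_value) cont(value);            // already completed: run now
        else continuation = std::move(cont);   // otherwise run on set_value()
    }

    void set_value(T v) {                      // called when the reply arrives
        value = std::move(v);
        has_value = true;
        if (continuation) continuation(value);
    }
};
```

“Waiting for TWO RPCs” then boils down to a small combinator which counts completed futures and fires its own continuation when the count reaches two.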

slide 39

Last but not least about our Take 5: let’s note that, in spite of being CONCEPTUALLY similar to std::future<> in some sense, our ReactorFutures are not identical to it. Most importantly, std::future<> is primarily about THREAD sync, while our ReactorFutures, being used within one single thread, are quite different and are more similar to Facebook’s folly::Futures (though it SEEMS that the future futures from std::experimental will implement continuations, and will become what we want).

slide 40

If we extend the idea of futures a bit further, we can get to constructing whole code trees from lambdas. The value of such an approach becomes obvious as soon as we realise that not all code is linear, and that there are conditions and exceptions. As we’re very limited on time today, I won’t give an example with conditions – but here is an example comparing our “Holy Grail” code, with exceptions added, to the “Code Builder” approach in our Take 6.

The idea behind “Code Builder” is that we’re effectively “building” the code tree at runtime (in a CCode object), and as soon as the code tree is built, we can run it in a perfectly asynchronous manner. Most important for our current purposes is that the elements of our code tree directly correspond to the elements of our original “Holy Grail” code:

Whenever we have “try” on the right side, we have “ttry” on the left; whenever we have linear code on the right, we have more-or-less similar code on the left too; and so on. And even if we had loops with non-blocking calls within, we’d still have an exact 1-to-1 correspondence between the blocking code and the non-blocking one – and WITH EXACTLY THE SAME NESTING TOO.

Overall, Take 6, while admittedly more verbose than Take 5, is MUCH more flexible, and (before the advent of co_await) could be a good choice for more complicated use cases.
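
A toy version of such a “Code Builder” can be sketched as follows (CCode and ttry follow the names used in the talk; everything else is hypothetical, and the sketch executes the tree synchronously – a real CCode would interleave execution with non-blocking waits). The point to notice is the 1-to-1 shape: ttry() is a tree node pairing a “try” sub-tree with a catch handler, mirroring try in the blocking original:

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Step = std::function<void()>;

struct CCode {                                   // runtime-built code tree
    std::vector<Step> steps;

    CCode& run(Step s) {                         // linear code -> linear node
        steps.push_back(std::move(s));
        return *this;
    }

    // "try" on the right side -> "ttry" on the left side, same nesting
    CCode& ttry(CCode body, std::function<void(const std::exception&)> katch) {
        steps.push_back([body = std::move(body), katch = std::move(katch)]() {
            try { body.exec(); }
            catch (const std::exception& e) { katch(e); }
        });
        return *this;
    }

    void exec() const {                          // run the built tree
        for (const auto& s : steps) s();
    }
};
```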

slide 41

BTW, if we introduce the preprocessor into the picture, we can make the code in Take 6 significantly less verbose (while preserving all its good properties). Whether the improved readability is worth the trouble of debugging code which uses heavy preprocessor macros is still unclear – but such a possibility does exist.

slide 42

Our Take 7 will be about stackful coroutines and/or fibers. Fortunately for us, we don’t have to go into a lengthy discussion about the differences between the two; for us, it is sufficient to say that fibers and stackful coroutines are pretty much the same for our purposes.

In not-exactly-standard but still-widely-cross-platform C++, stackful coroutines are represented by boost::coroutines (though personally I’d prefer my IDL compiler to use boost::context directly).

As we can see, the code in our Take 7 at first glance looks better than that of Take 6 and Take 6a. However, it comes at the cost of two important caveats…

slide 43

First, with stackful fibers/coroutines, it becomes very difficult to serialise the state of our (Re)Actor – and to achieve quite a few all-important properties of (Re)Actors (including post-mortem debugging and low-latency fault tolerance), it IS necessary to serialise the (Re)Actor state.

Granted, in C++, serialisation is not a picnic even with lambdas, but with stackful coroutines I don’t even know how we could approach this task (BTW, if you do – I would be VERY interested in hearing about it after the talk).

slide 44

Even more importantly, stackful co-routines come with a Big Fat word of caution. While with fibers/coroutines we CAN avoid futures and write the code EXACTLY as a blocking one – we DON’T really want to do it.

When we compare Take 7x (using ‘x’ to denote that it SHOULD NOT be used) to the “Holy Grail” code, we’ll notice that there is one thing missing from Take 7x: those “REENTRY” markers which are necessary to indicate the points where the (Re)Actor state can be modified. This, in turn, as we discussed above, leads to difficult-to-spot errors and to significantly increased code maintenance costs. As a result, I do NOT recommend using Take 7x in real-world projects.

slide 45

A really short real-world story in this regard. Some time ago, I was presenting the Take 7x option to a billion-dollar company with several million lines of mostly-non-blocking (Re)Actor code (the idea was to replace the Take 3 they were currently using with something more palatable). Long story short: right after seeing Take 7x, they asked me: “Hey, how will we know those points where the state can be modified?”

This illustrates one of the points raised earlier – that THOSE REENTRY MARKERS are REALLY important for real-world non-blocking development.

slide 46

Our last Take, Take 8, is related to a concept which is well-known in other programming languages (such as C#), but is a new kid on the block for C++. I’m speaking about await (which was renamed co_await by WG21 in recent versions).

As of now, it is NOT a part of the standard yet, and changes are still to be expected; on the other hand, implementations are available both for MSVC and Clang.

As we can see, the code in Take 8 is almost exactly the same as our “Holy Grail” code on the right.

Even our REENTRY markers have their direct counterparts in Take 8.

slide 47

Still, we have to note that, at least as of the current C++ proposals, Take 8 is not 100% ideal for our purposes. Most importantly, it has problems with serialising the state of our (Re)Actor; while the problem is NOT as severe as in Take 7, it is still going to cause quite a bit of trouble. On the plus side, it MIGHT be possible to serialise Take 8’s state as a part of serialising the whole allocator – but doing so is rather tricky, relies on certain properties of the underlying OS, and as a result causes complications with ASLR. Also, from our current perspective, it is not 100% clear whether it is really GUARANTEED that ALL the co_await frames will REALLY go via a user-definable allocator in ALL compilers and libraries (we’d also need a guarantee that await frames won’t use anything other than the allocator, so that no on-stack storage for await frames happens).


Comments

  1. Jesper Nielsen says

    I love to see all of those different takes on the problem. I also had to figure out a way to let workflows cross boundaries between reactors, and although it’s very verbose, I’ll share it here anyway because it’s pretty different. The idea is that I’m making my “co-routine” as a state machine, with each state being a chunk of code to execute on a reactor’s thread. I’m writing this in C# – I could probably do it better using yield return and enumerators, the way Unity coroutines are done (with each step yield-returning the reactor – if any – to execute the next step on).

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Engine.Server.Model.DAL;
    using Engine.Server.Model.Player;

    namespace Engine.Server.Controller.Tasks
    {
        public class AssignPlayerTask : TaskBase
        {
            private int _state = 0;
            private ServerController _controller;
            private PlayerCharacter _player;

            public AssignPlayerTask(ServerController controller, PlayerCharacter player)
            {
                _controller = controller;
                _player = player;
            }

            public override void Execute()
            {
                switch (_state)
                {
                    case 0:
                        RemoveOldPlayer();
                        return;
                    case 1:
                        RefreshPlayerFromStorage();
                        return;
                    case 2:
                        AssignPlayer();
                        return;
                }
            }

            private void RemoveOldPlayer()
            {
                PlayerCharacter alreadyPlaying = _controller.World.FindPlayerByName(_player.Name);
                if (alreadyPlaying != null)
                {
                    _controller.World.RemovePlayer(alreadyPlaying, true);
                }
                // always – we might have just logged out before logging in again – and be waiting in DB queue
                StateTransit(1, TaskContext.Db);
            }

            private void RefreshPlayerFromStorage()
            {
                PlayerDataMapper.Refresh(_player);
                StateTransit(2, TaskContext.World);
            }

            private void AssignPlayer()
            {
                _controller.World.AssignPlayer(_player);
            }

            private void StateTransit(int newState, TaskContext newContext) // TODO: Move to base task
            {
                _state = newState;
                _context = newContext;
                _controller.Enqueue(this);
            }

            private TaskContext _context = TaskContext.World;
            public override TaskContext Context
            {
                get { return _context; }
            }
        }
    }

    • "No Bugs" Hare says

      > “I also had to figure out a way to let workflows cross boundaries between reactors” – ouch!

      Overall – yep, this is #9 indeed :-). Still, at least at first glance, the same thing (“let workflows cross boundaries”) can be achieved via Take 3 (OO callbacks), with serializing them in the first Reactor and then deserializing them in a different Reactor; or am I missing something?

      In other words – all we have to do to move a workflow from one Reactor to another one, is to serialize state of the outstanding-request-processing, and with Take 3, all the state is within easily-serializable-callback-objects, so it _seems_ that the same thing can be achieved with IMO-better-readable-Take-3.

      • Jesper says

        Yeah, I’m not claiming it to be superior to any of the takes you posted. I’m not serializing anything though – just enqueueing the same object for different reactors, each of them calling Execute in turn, while I keep the state in the Task object itself.

        • "No Bugs" Hare says

          > just enqueueing the same object for different reactors – each of them calling Execute in turn while I keep the state in the Task object itself

          You mean “within the same process, sharing memory between different reactors?”

          • Jesper says

            More like handing off objects between reactors. To be honest I’m not working with a formalized concept of reactors and I do also use some thread synchronization here and there.

          • "No Bugs" Hare says

            > More like handing off objects between reactors.

            Yes, this is what I meant :-).

            > To be honest I’m not working with a formalized concept of reactors and I do also use some thread synchronization here and there.

            Well, it is fine to break the rules occasionally – as long as you do understand what the rules are ;-). Which, in turn, means that the larger the team – the stricter should be compliance with the rules 🙁 .
