Eight Ways to Handle Non-Blocking Returns in Message-Passing Programs – with Script

 
Author: "No Bugs" Hare
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
 
 

The following are the slides and script of my talk at CPPCON2017 in Bellevue, WA, USA.


slide 1

My apologies to those who’re not here yet, but we have LOTS of material (55 slides to be exact), so we don’t have time to wait.

Hello Almighty All. Today, I’ll be speaking about message-passing programs (in particular – (Re)Actors), and about eight different ways to handle non-blocking returns in them. In addition, I’ll try to give a very brief overview of the existing proposals under consideration by the C++ committee (a.k.a. WG21) – and try to demonstrate how they fit with our non-blocking returns.

That’s a lot of stuff, and to fit into the allotted time, it has to be a very intensive session. My apologies if anything looks too sketchy; at the end, I will try to provide a few pointers to where further information can be found.

slide 2

Before we can really start with the talk, I’d like to make an important announcement. I have to confess that it is not really MY presentation.

Rather – it is a talk by (da-dum)… this guy (you can also see him on my t-shirt).

It means that if there is anything good about this presentation – it should be attributed to me, and if there is anything bad – it is all HIS fault.

BTW, if you think that my accent is bad – you should be grateful that it is not him speaking; I can assure you that his accent is MUCH worse than mine (as you may guess, it is quite difficult to produce human sounds for a hare).

slide 3

Preliminaries aside, we can proceed with the substance of this talk.

In part 0 of the talk (as we’re in C++ land, we HAVE to start everything from 0), I’ll try to define the task we’re going to solve.

First, we’ll have an extremely brief overview of message-passing and (Re)Actors, including their benefits; then we’ll discuss non-blocking processing (which is VERY closely related to interactions between the main processing and return processing). In particular, we’ll see that – depending on whether we want to process intervening events while waiting for the result of an outstanding operation – non-blocking processing CAN be simpler than the blocking one.

slide 4

There is no single definition of message-passing, but to convey the idea, I think it is best to use the following adage from “Effective Go”:

Do not communicate by sharing memory; instead, share memory by communicating.

While this wording doesn’t come from C++-land – it does convey the concept (which is more-or-less equivalent to Shared-Nothing).

In practice, it means three things:

  • first, all the processing within business logic is confined to one single thread (or at least “as if” it is single-threaded)
  • second, there is no memory sharing (which in turn means that all the mutexes at the business-logic level are gone)
  • last but not least, there should be a way to pass a message between different threads, processes, or even different computers (see the sketch right after this list)
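
As a minimal illustration of the third point (my sketch, not from the talk): an intra-process message queue which passes messages between threads by moving them, so no business-logic memory is ever shared. Note that the mutex lives entirely within the infrastructure-level queue, which is exactly what “all the mutexes at the business-logic level are gone” means in practice:

    // A minimal sketch of an inter-thread message queue; messages are
    // moved in and out, so business-logic state is never shared.
    #include <condition_variable>
    #include <deque>
    #include <mutex>

    template <class Msg>
    class MessageQueue {
      std::mutex mx;               // infrastructure-level only
      std::condition_variable cv;
      std::deque<Msg> q;
    public:
      void push(Msg m) {
        { std::lock_guard<std::mutex> g(mx); q.push_back(std::move(m)); }
        cv.notify_one();
      }
      Msg pop() {                  // blocks until a message arrives
        std::unique_lock<std::mutex> g(mx);
        cv.wait(g, [this] { return !q.empty(); });
        Msg m = std::move(q.front());
        q.pop_front();
        return m;
      }
    };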
slide 5

Unfortunately, we don’t have time to get into detailed discussion of the benefits of message passing, so I will just list them here, highlighting the most interesting ones.

Most importantly, message-passing allows us to avoid cognitive overload (which arises whenever we’re trying to deal both with business logic and with thread sync at the same time). This, in turn, simplifies programming greatly.

Second, with message-passing, it is easy to make our programs deterministic, which in turn enables such goodies as testability and production post-mortem debugging (yes, this is used in practice).

And when it comes to concurrency, scalability, and performance – message-passing architectures tend to beat mutex-based ones too – in particular, due to zero contention and due to avoiding expensive context switches (a context switch can cost up to a million CPU cycles, if we account for cache invalidation); another performance-related consideration is that message-passing programs tend to have MUCH better spatial locality. Overall, from the scalability point of view, message-passing systems are “Shared-Nothing” – and “Shared-Nothing” architectures rulezz forever.

slide 6

There are many different ways to implement message-passing; however, to be specific, for this talk we’ll concentrate on one single incarnation of it – the one I prefer to name (Re)Actors.

(Re)Actors are historically known under different names – Actors, Reactors, Event-Driven Programs, and ad-hoc Finite State Machines. They’re widely used in very different environments (from GUI to HPC, with game development in between), and are supported by different programming languages and frameworks (from Windows messages to Node.js); now it certainly looks like a good time for C++ to start supporting event-driven programming as a first-class citizen too.

Most important for us, however, is that while from now on we’ll be speaking only about (Re)Actors – most of our findings generalise to broader message passing.

One exception is related to allocator-related serialisation – while it does work for (Re)Actors, at the moment I don’t know how to generalise it to all the message-passing programs.

slide 7

As we’ll be speaking in terms of (Re)Actors quite a bit – let’s establish some basic terminology which we’ll use.

Let’s name a common denominator for all our (Re)Actors a Generic (Re)Actor. As we can see, GenericReactor is an almost-empty abstract class – with a pure virtual function react().

Let’s name “Infrastructure Code” a piece of code which calls Generic Reactor’s react(). Quite often – this call will be within so-called “event loop” as shown on the slide.
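
As the slides themselves aren’t reproduced in this transcript, here is a minimal sketch of what this might look like (the Event type and the exact signatures are my assumptions for illustration):

    struct Event { /* whatever our infrastructure passes around */ };
    Event get_event(); // implemented by Infrastructure Code

    class GenericReactor {
    public:
      virtual void react(const Event& ev) = 0; // ALL processing goes here
      virtual ~GenericReactor() = default;
    };

    // "Infrastructure Code": a typical event loop calling react()
    void event_loop(GenericReactor& r) {
      while (true) {
        Event ev = get_event();
        r.react(ev); // one single thread => no thread sync inside react()
      }
    }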

As we can see – there is one single thread, so there is no need for thread synchronisation within react(); this is very important for several reasons (including allowing for determinism and avoiding cognitive overload).

Let’s also note that the get_event() function can obtain events from wherever-we-want-to – from select()/poll()/epoll()/kqueue() (which is quite typical for servers) to libraries such as libuv (which are common for clients).

Also, let’s note that an event loop is not the only possible way to use (Re)Actors. In particular, it is perfectly possible to call (Re)Actor::react() from a thread residing in a thread pool, providing thread sync OUTSIDE of the (Re)Actor (while scenarios where this is beneficial are relatively rare, formally they’re perfectly valid). Most importantly, even in this scenario our (Re)Actor::react() stays completely thread-sync-free.
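
A sketch of this thread-pool scenario (my illustration, assuming a per-reactor mutex owned by Infrastructure Code):

    #include <mutex>

    struct ReactorSlot {
      std::mutex mx;           // owned by Infrastructure Code
      GenericReactor* reactor; // the (Re)Actor knows nothing about mx
    };

    // called from a thread-pool thread; sync stays OUTSIDE of the (Re)Actor
    void process_event_on_pool_thread(ReactorSlot& slot, const Event& ev) {
      std::lock_guard<std::mutex> guard(slot.mx);
      slot.reactor->react(ev); // still "as if" single-threaded
    }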

And finally, let’s name any specific class derived from GenericReactor – a Specific Reactor. It is the SpecificReactor which will actually implement our react() function, with all the business logic within.
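
For example (a sketch; the body is illustrative only):

    class CashierReactor : public GenericReactor {
    public:
      void react(const Event& ev) override {
        // parse/dispatch the incoming event and run business logic,
        // sending messages to other (Re)Actors as necessary –
        // all of it without any thread sync
      }
    };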

I need to note that the virtual-function-based implementation shown is not the only one possible; in particular, the same thing can be done using templates instead of virtualisation – without any difference for our further discussion.

slide 8

Having defined our landscape, let’s see what specific problem I’m going to discuss today. It is the problem of non-blocking processing.

Non-blocking code has quite a bad reputation among programmers; in particular, it is perceived to be significantly more complicated than the equivalent blocking code.

As we’ll see, in part this can be attributed to the lack of adequate support for non-blocking processing in programming languages; but in addition, there is some confusion about it.

In practice, we need to distinguish two very different scenarios. The first scenario arises when we do NOT need to process any incoming events while waiting for the result of the requested operation. In this case – if going non-blocking – we’re essentially using it ONLY to improve performance, and indeed, the complexity of the non-blocking code INCREASES compared to the blocking one.

The second scenario occurs when we DO need to process events while we’re waiting for the result of the requested operation. Examples of such situations are numerous; as one all-important example, for an interactive program, refusing to process inputs while waiting for the result of an outstanding over-the-Internet operation results in an effectively “hung” program, which is pretty much suicidal.

And if we take a look at this second scenario from the point of view of code complexity, we’ll observe that while non-blocking code can be ugly, blocking code – to implement processing events while blocking – will need to resort to threads, and worse, thread sync, which, in turn, will make it MUCH more complicated than the non-blocking one.

In particular, I am of a very strong opinion that combining thread sync with business logic IN THE SAME PIECE OF CODE almost-inevitably leads to cognitive overload (exceeding the magic number of 7±2 very quickly). This, in turn, means that reasoning about blocking-code-which-processes-intervening-events inevitably becomes extremely difficult.

slide 9

This dual nature of complexity with relation to non-blocking processing leads us to the concept of “Mostly-Non-Blocking Processing”, where we use non-blocking processing ONLY if there is a chance that we DO need to process events while we’re waiting for the result of our outstanding call.

On the other hand, under mostly-non-blocking processing, we MAY block AS LONG AS the outstanding operation is short enough that we can postpone processing of the intervening events while waiting.

One such example is accessing a local disk or a LOCAL database. For most real-world scenarios, we can say that if this takes too long – we should already be doing Fault Recovery rather than non-blocking processing.

Unfortunately, we don’t have time to elaborate on mostly-non-blocking processing; what’s most important for us now is that (mostly-non-blocking or otherwise) we’re most interested in the INTERACTIONS between the main program control flow on the one hand, and the processing of returned values on the other.

Let me repeat it once again: IT.IS.ALL.ABOUT.INTERACTIONS, nothing else.

slide 10

Now, we can get to the real stuff. In Part 1 of the talk, first we’ll define The Holy Grail of non-blocking processing. In general, we’d want our non-blocking processing to be as close to the blocking one as possible, but there is a significant caveat related to those interactions mentioned a few seconds ago.

Then, we’ll proceed to a quick look at the eight ways of handling non-blocking returns, one by one. Of course, given our time restrictions, detailed analysis isn’t possible – but as promised before, I will provide pointers to a more detailed discussion at the end of the talk.

slide 11

To compare different ways of handling non-blocking returns, we’ll take one simple, close-to-real-world example – and see how our 8 different approaches handle it.

The idea of the processing which we’ll use as a litmus test for our different Takes is shown on the diagram. We have a game, and a player who wants to purchase some item to be used in the game. The player tells the Client what she wants to purchase – then the Client sends a request to the Cashier (Re)Actor, and the Cashier (Re)Actor issues a request to the DB (Re)Actor and receives a reply. If the purchase has failed for whatever reason – the Cashier reports the failure back to the Client. And if the DB transaction was successful – the Cashier (Re)Actor issues another request to the Game World (Re)Actor (so the player gets the item and can start using it), and notifies the Client that the transaction went through ok.

By the standards of distributed systems, it qualifies as a VERY simple scenario – but as we’ll see a bit later, it is sufficient to demonstrate the differences between the approaches to handling non-blocking returns.

slide 12

Let’s take a look at the blocking code which implements this logic within the Cashier (Re)Actor; while it is clear that non-blocking code cannot be as simple as the blocking one – we should aim for our non-blocking code to be as close to this one as possible.

In our blocking version, everything looks very simple – there is a very modest amount of boilerplate code, and the essence of our interactions is very clear:

First, we’re making a blocking RPC call to our Database (Re)Actor, 

and then, depending on the result of the first call – we may issue another blocking RPC call to our Game World (Re)Actor. That’s pretty much it. 
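
The slide itself is not reproduced here, so below is a sketch of what this blocking code might look like (dbPurchaseItem is named later in the talk; the other identifiers stand in for IDL-generated stubs and are my assumptions):

    // assumed IDL-generated blocking stubs:
    //   bool dbPurchaseItem(UserId user, ItemId item);   // RPC to DB (Re)Actor
    //   void gameWorldAddItem(UserId user, ItemId item); // RPC to Game World (Re)Actor
    void CashierReactor::purchaseItem(UserId user, ItemId item) {
      bool db_ok = dbPurchaseItem(user, item); // blocking RPC call #1
      if (!db_ok) {
        reportFailureToClient(user, item);     // purchase failed => tell the Client
        return;
      }
      gameWorldAddItem(user, item);            // blocking RPC call #2
      reportSuccessToClient(user, item);       // transaction went through ok
    }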

Let’s also note that to have this kind of code – we have to assume that we do have some kind of Interface Definition Language (IDL) and an IDL compiler which generates stubs and skeletons for these blocking RPC functions (when the bright future described by Herb Sutter in his talk on Wednesday materialises, we’ll be able to have an in-language IDL, but at this time we’re not there yet).

We’ll also assume that our IDL compiler is sufficiently universal (or written by ourselves, which is not rocket science) – and that it will generate everything-we-may-need not only for this blocking example, but also for ALL our non-blocking Takes.

slide 13

When looking at our blocking code, we have to note that for non-blocking purposes it is NOT ideal. In particular, as our non-blocking RPC calls are all about allowing interactions with our (Re)Actor while the call is outstanding, the state of our (Re)Actor MAY change while we’re waiting for the results of the call.

As a result, the non-blocking code on the slide may occasionally fail.

Compared to the previous slide, all we did was add two highlighted lines – and while in blocking code they’re perfectly fine, in non-blocking code the assertion may fail. Indeed, while we were waiting for the result of dbPurchaseItem, there MAY have been an event which we needed to process (after all – the whole point of non-blocking processing is to allow processing events while we’re waiting). And processing of this intervening event, while staying completely out of sight, MAY have modified the state of our CashierReactor.
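
A sketch of the idea (the two highlighted lines and the wallet_balance() helper are my reconstruction, not the actual slide):

    void CashierReactor::purchaseItem(UserId user, ItemId item) {
      auto before = wallet_balance(user);      // highlighted line #1
      bool db_ok = dbPurchaseItem(user, item);
      // in blocking code this always holds; in non-blocking code an
      // intervening event processed while we were waiting MAY have
      // modified the state of CashierReactor, so the assertion MAY fail:
      assert(wallet_balance(user) == before);  // highlighted line #2
      // rest of the processing as before
    }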

slide 14

To allow developers to know when such implicit updates to state can happen – it is necessary at least to make it very clear WHERE EXACTLY such unexpected modifications can potentially occur.

On the slide, such points-where-state-can-be-implicitly-and-silently-modified are marked with a “REENTRY” marker for function calls (there is no special meaning to REENTRY here; it is merely intended to illustrate SOME kind of marker). As we’ll see, equivalent markers can look VERY different depending on the specific technology we’re using; however – my current point is that there should be SOME kind of indication, making the potential points where interleaving events can cause us trouble VERY CLEAR to the developer.
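
Applying such a marker to our example (REENTRY is purely illustrative syntax, per the disclaimer above):

    void CashierReactor::purchaseItem(UserId user, ItemId item) {
      // each REENTRY marks a point where the state of our (Re)Actor
      // MAY have been implicitly modified by intervening events:
      bool db_ok = REENTRY(dbPurchaseItem(user, item));
      if (!db_ok) {
        reportFailureToClient(user, item);
        return;
      }
      REENTRY(gameWorldAddItem(user, item));
      reportSuccessToClient(user, item);
    }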

With this in mind – I contend that the code shown is the best we can possibly get for non-blocking code solving our task (at the very least, I have never seen anything better); as such – I consider it to represent a “Holy Grail” non-blocking implementation for our example code. And – we’ll use this “Holy Grail” code as the baseline for comparison for all our 8 non-blocking Takes.


Comments

  1. Jesper Nielsen says

    I love to see all of those different takes on the problem. I also had to figure out a way to let workflows cross boundaries between reactors, and although it’s very verbose I’ll share it here anyway because it’s pretty different. The idea is that I’m making my “co-routine” as a state machine, with each state being a chunk of code to execute on a reactor’s thread. I’m writing this in C# – could probably do it better using yield return and enumerators, the way Unity coroutines are done (with each step yield-returning the reactor – if any – to execute the next step on)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Engine.Server.Model.DAL;
    using Engine.Server.Model.Player;

    namespace Engine.Server.Controller.Tasks
    {
        public class AssignPlayerTask : TaskBase
        {
            private int _state = 0;
            private ServerController _controller;
            private PlayerCharacter _player;

            public AssignPlayerTask(ServerController controller, PlayerCharacter player)
            {
                _controller = controller;
                _player = player;
            }

            public override void Execute()
            {
                switch (_state)
                {
                    case 0:
                        RemoveOldPlayer();
                        return;
                    case 1:
                        RefreshPlayerFromStorage();
                        return;
                    case 2:
                        AssignPlayer();
                        return;
                }
            }

            private void RemoveOldPlayer()
            {
                PlayerCharacter alreadyPlaying = _controller.World.FindPlayerByName(_player.Name);
                if (alreadyPlaying != null)
                {
                    _controller.World.RemovePlayer(alreadyPlaying, true);
                }
                StateTransit(1, TaskContext.Db); // always – we might have just logged out before logging in again – and be waiting in DB queue
            }

            private void RefreshPlayerFromStorage()
            {
                PlayerDataMapper.Refresh(_player);
                StateTransit(2, TaskContext.World);
            }

            private void AssignPlayer()
            {
                _controller.World.AssignPlayer(_player);
            }

            private void StateTransit(int newState, TaskContext newContext) // TODO: Move to base task
            {
                _state = newState;
                _context = newContext;
                _controller.Enqueue(this);
            }

            private TaskContext _context = TaskContext.World;
            public override TaskContext Context
            {
                get { return _context; }
            }
        }
    }

    • "No Bugs" Hare says

      > “I also had to figure out a way to let workflows cross boundaries between reactors” – ouch!

      Overall – yep, this is #9 indeed :-). Still, at least at first glance, the same thing (“let workflows cross boundaries”) can be achieved via Take 3 (OO callbacks), by serializing them in the first Reactor and then deserializing them in a different Reactor; or am I missing something?

      In other words – all we have to do to move a workflow from one Reactor to another is to serialize the state of the outstanding-request-processing; and with Take 3, all the state is within easily-serializable callback objects, so it _seems_ that the same thing can be achieved with the IMO-better-readable Take 3.

      • Jesper says

        Yeah, I’m not claiming it to be superior to any of the takes you posted. I’m not serializing anything though – just enqueueing the same object for different reactors, each of them calling Execute in turn, while I keep the state in the Task object itself

        • "No Bugs" Hare says

          > just enqueueing the same object for different reactors – each of them calling Execute in turn while I keep the state in the Task object itself

          You mean “within the same process, sharing memory between different reactors?”

          • Jesper says

            More like handing off objects between reactors. To be honest I’m not working with a formalized concept of reactors and I do also use some thread synchronization here and there.

          • "No Bugs" Hare says

            > More like handing off objects between reactors.

            Yes, this is what I meant :-).

            > To be honest I’m not working with a formalized concept of reactors and I do also use some thread synchronization here and there.

            Well, it is fine to break the rules occasionally – as long as you do understand what the rules are ;-). Which, in turn, means that the larger the team – the stricter should be compliance with the rules 🙁 .

  2. Cadenza Song says

    Excellent post, thanks!

    One nitpick: in the “Take 6/6a” example code, it seems there is a missing letter “a”; it should probably be: “ReactorFuture db_ok;”
