“Multi-Coring” and “Non-Blocking” instead of “Multi-Threading” – with a Script

 
 
 

The following are the slides and the script of my talk at ACCU2018 in Bristol, UK, on Apr 14th. Note that “script” ~= “how I planned to speak about it”, which is slightly different from “transcript” ~= “how it turned out” <wink />


slide 1

Good afternoon everybody. Thanks for coming, and I hope that during this talk I will be able to tell a thing or three which might be of interest.

As it says on the tin, I am going to argue that the usual multi-threading patterns (especially those using mutexes at app-level) are not the only way to design multi-core non-blocking systems. Moreover, I am going to argue that (Re)Actor-based systems are a BETTER alternative for a REALLY wide range of real-world use cases.

slide 2

Before we start, let me provide a very brief outline for the talk.

In Part I of the talk, I’ll argue that multithreading is NOT really an ultimate goal which is worth pursuing no-matter-what, but rather a TOOL to implement two related-but-distinct concepts:

#1 is multi-coring (which is very closely related to scalability)

and

#2 is non-blocking (more precisely – guarantees against response being blocked by some long-lasting I/O operation).

More importantly – the user DOES NOT care HOW EXACTLY we achieve these things, which opens the door to different implementations.

In the second part of the talk, we’ll move towards a discussion of my personal favorite subject – (Re)Actors. We won’t have much time to go deeply into details – but will still discuss such things as how certain aspects of (Re)Actors can (and I think should) be implemented, how (Re)Actors fare against Shared-Memory Architectures, and how they achieve the two Goals specified above.

Armed with a concept of one single (Re)Actor, we’ll be able to move to architectures which use NOTHING BUT (Re)Actors – that is, at app-level.

Among other things, we’ll see examples of such architectures in an AAA first-person shooter on the Client-Side, and for the Server-Side – we’ll mention some real-world systems which handle billions-messages-per-day and write tens-of-billions-DB-transactions-per-year (and purely accidentally happen to make billions-of-dollars in the process <wink />).

Last but certainly not least, we’ll briefly discuss applicability limits of message-passing systems in general, and (Re)Actors in particular.

Giving away a spoiler, I have to say that according to my experience with architecting serious systems, MOST of the heavily-loaded distributed interactive systems out there will benefit from being implemented as a (Re)Actor-fest.

With all this in mind, we’ll come to an inevitable conclusion:

What are you waiting for? Architect your next system as a (Re)Actor-fest!

slide 3

Before we can really start with the talk, I’d like to make an important announcement. I have to confess that it is not really MY presentation.

Rather – it is a talk prepared by (da-dum)… this guy (and you can also see him on my t-shirt).

It means that if there is anything good with this talk – it should be attributed to me, and if there is anything bad – it is all HIS fault.

BTW, if you think that my accent is bad – you should be grateful that it is not him speaking; his accent is MUCH worse than mine.

slide 5

Preliminaries aside, we can proceed with the substance of this talk.

In the first part of the talk, we want to take a deep breath and ask ourselves: “Does multithreading qualify as an almighty business requirement – or is it merely a puny implementation detail?”

slide 6

As for “what qualifies as a requirement”, I happen to be a big fan of the following criteria laid out by one of my former managers:

“How many customers will we lose/gain if we implement this feature?” As the guy is currently worth well over a billion, he must have been doing something right.

slide 7

One of the consequences of this “how many customers we will gain” approach is that IF a certain thing doesn’t exist in the customer’s space – it CANNOT be a business requirement, plain and simple.

This, BTW, is very consistent with the well-known requirement-for-requirements that a good requirement SHOULD BE implementation-free. Indeed, specifying too much in the requirements restricts our ability to build an optimal system, for example:

  • if we have to have serious security, but we’re saying “we have to use TLS-over-TCP”, we’re restricting our ability to improve latencies (using UDP-with-DTLS).
  • if we want our app to run in-browser, but write it down as “we have to use JavaScript as our programming language”, we’re preventing ourselves from using C++ via emscripten (and C# via IL2CPP+emscripten).
  • and so on, and so forth.

slide 8

Now, with this separation between good-requirements and bad-requirements in mind, we can ask ourselves whether “we have to use multithreading” qualifies as a good requirement?

And the answer (going contrary to intuitive feelings of LOTS of developers out there) is that

As a Big Fat Rule of Thumb – no, ‘use multithreading’ does NOT qualify as a good requirement, for the simple reason that multithreading as such is not directly observable in the end-user space.

In other words – (unless our product IS an OS or a library-to-be-used-by-other-developers) we WON’T get any customers due to writing our app as multi-threaded.

slide 9

OTOH, we still have business requirements which CAN be satisfied by using multi-threading.

Thinking about it a bit further, we can observe that if NOT for a business-level requirement to provide a response within a certain time, we wouldn’t care about multithreading AT ALL.

Indeed, we don’t have to use multithreading to do things correctly – it comes into play ONLY to provide a reply faster. This holds for ALL the use cases of multithreading – from games (where “responsiveness” is measured in terms of milliseconds), to HPC (where “in 30 days” can qualify as “responsive enough”, especially compared to “you’ll have to wait until next century”).

More specifically, multithreading is usually summoned to solve one of two related-but-still-separate problems:

  • The first one is “Doing things faster” (using multiple CPU cores). This one is fairly obvious – there are cases out there when one single CPU core is not sufficient to do whatever-we-need-to-do within the allotted time. And to get the job done – we do have to find a way to use multiple cores.
    • A close cousin of this requirement is Scalability; in a sense – it can be seen as an ability to scale our job to as-many-cores-as-we-might-need.
  • The second problem we’re often trying to solve with multithreading is keeping our app from being blocked by long external operations. Indeed, a scenario when my desktop app “hangs” just because it is waiting for a DNS server to reply while the Internet happens to be down – is a Bad Thing(tm), there is no argument about it (though whether multithreading is a good way to satisfy this requirement – is a completely different story).

That’s it – these two simple things cover the vast majority (if not all) of the use cases for multithreading.
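To make the Non-Blocking requirement a bit more tangible, here is a minimal sketch (not taken from the talk) of an app that stays responsive during a long external operation without any app-level threads or mutexes. It assumes Python's standard asyncio event loop. The names slow_lookup, run_app and demo are hypothetical, and the "DNS lookup" is merely simulated with a timer.

```python
import asyncio


async def slow_lookup(host, delay):
    # Stand-in for a long external operation, such as a DNS query
    # issued while the network is down. Simulated with a timer only.
    await asyncio.sleep(delay)
    return f"{host} -> 192.0.2.1"  # 192.0.2.1 is a documentation-only address


async def run_app():
    log = []
    # Start the long operation; the event loop is NOT blocked by it.
    lookup = asyncio.create_task(slow_lookup("example.com", 0.2))
    # While the lookup is pending, the same single thread keeps
    # handling other (here: simulated) events.
    for i in range(3):
        await asyncio.sleep(0.01)
        log.append(f"ui event {i} handled")
    # Only now do we actually need the result of the lookup.
    log.append(await lookup)
    return log


def demo():
    return asyncio.run(run_app())
```

Calling demo() returns the three "ui event" entries followed by the lookup result. From the end-user's point of view the app did not "hang", and nothing in the requirement says HOW this had to be achieved.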

slide 10

To summarise Part I of the talk:

What we REALLY want is not multithreading as such; instead, we have two separate and distinct business requirements. The first one is multi-coring (and related Scalability), and the second one is being Non-Blocking.

Note that we’re STILL not at the point of saying that “multithreading is bad” (though we’ll certainly come to it later <wink />), but what we have here is that most of the time, multithreading as such is NOT a firm business requirement, which opens the door to looking for alternative implementations.


Comments

  1. Jesper Nielsen says

    I’m a little skeptical when it comes to business processes that typically go like:
    1: Read from DB
    2: Perform some business logic – perhaps reading more from DB as required
    3: Write to DB.

    If the flow 1-3 must be encapsulated in a transaction then it will block the entire application for the full duration if it’s implemented with a single writing connection. Synchronizing business logic and storage seems to be inevitable here?

    Multiple connections don’t have to wait for each other – except when they do due to row/column/table locks including false sharing from page locks, even escalating to full on deadlocks etc…

    Basically my point is – the “old” way of doing things is a mess, but how to avoid it if business logic is part of a transaction?

    • "No Bugs" Hare says

      > I’m a little skeptical when it comes to business processes that typically go like:

      This is _exactly_ the kind of business processes I’m speaking about :-). Keep in mind though that due to one single point, “read from DB” can be 99.9% done from in-app 100%-coherent cache(!!).

      > it will block the entire application for the full duration if it’s implemented with a single writing connection.

      Yes, but OLTP apps tend to have transactions which can be made _very_ fast (I’ve seen 500us or so on average for a very serious real-world app). If elaborating on it a bit more, it tends to go as follows: all the reads within transactions are EITHER “current state” reads (these are Damn Fast, and will get into that 500us average), OR “historical”. For “historical” reads (which BTW from my experience are fairly rare for OLTP systems – I’d say that less than 5% of overall transactions involve them), they can be made in a special read-only connection (they’re historical hence immutable(!)), processed, and then the result can be passed to the write connection for writing (and as there are no “long” reads involved, it will get under that 500us limit too).
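As a rough illustration of the single-writing-connection idea above, here is a minimal sketch under loud assumptions: the class DbReactor, its message format and the "transfer" operation are all hypothetical, the cache is a plain dict, and the journal list is only a stand-in for a real write connection. The point is merely that when one (Re)Actor owns the state and handles one message at a time, "current state" reads come from the coherent in-app cache and no locks are needed.

```python
from collections import deque


class DbReactor:
    """Hypothetical single-writer (Re)Actor: owns the state and
    processes exactly one message at a time."""

    def __init__(self, balances):
        self.cache = dict(balances)  # coherent in-app cache of "current state"
        self.journal = []            # stand-in for the single write connection
        self.inbox = deque()         # incoming messages, handled in order

    def post(self, msg):
        self.inbox.append(msg)

    def run(self):
        # Drain the inbox; each message is a complete, short transaction.
        results = []
        while self.inbox:
            results.append(self._handle(self.inbox.popleft()))
        return results

    def _handle(self, msg):
        kind, src, dst, amount = msg
        if kind != "transfer":
            raise ValueError(f"unknown message kind: {kind}")
        # "Current state" read: served from the cache, no DB round trip.
        if self.cache.get(src, 0) < amount:
            return ("rejected", msg)
        # The write: applied to the cache and recorded for the DB.
        self.cache[src] -= amount
        self.cache[dst] = self.cache.get(dst, 0) + amount
        self.journal.append(msg)
        return ("committed", msg)
```

For example, posting two transfers of 70 from an account holding 100 yields one "committed" and one "rejected" result. The second message sees the effect of the first because nothing else can touch the state in between.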

      > how to avoid it if business logic is part of a transaction?

      It depends on the specific business logic we’re talking about – but up to now I did not see a real-world case where it is not possible (see also above re. “current but fast” reads vs “historical and immutable” reads). Also, FWIW, I had a long conversation on the whole thing with Hubert Matthews (BTW, you SHOULD see his ACCU2018 talk when it becomes available – he’s speaking about EXACTLY THE SAME things), and we were in agreement on pretty much everything; what was clear is that _one monolithic DB doesn’t scale, you have to split it along the lines of the transactions involved, AND should use ASYNC mechanisms to communicate between different sub-DBs_. Given that Hubert is one of the top consultants out there and deals with LOTS of various real-world systems (I have to admit that his experience is significantly wider than mine) – it should count for something ;-). The rest is indeed app-specific – but is certainly doable (my addition: and as soon as you got your DBs small enough – you can process them in one single connection ;-)).

      • Jesper Nielsen says

        There would also be the added latency between the business server and the storage server, unless they are the same (not very scalable I would think?) so we’re easily talking single or double digit milliseconds here if multiple reads and writes must be issued, which is a typical case.

        Not a problem in itself. Even ~100ms latencies are perfectly acceptable for many business processes but if the business processes are becoming serialized then it becomes problematic when scaling to many clients.

        On the other hand I just had a mental rundown of business processes I’ve been working with through the years and in fact I’ve rarely been in situations where Serializable isolation level was used. Typically we were talking Read Committed, which means that reads prior to writes might just as well be outside the transaction – and typically were. (In fact in a lot of cases prior to where I’m working now even writes for a business process weren’t batched in transactions even though they probably should have been…)

        So I guess in many cases designing processes to postpone all writing until the very end of the task, then issuing a set of writes as a transaction to the DB reactor should make it possible to interleave business tasks, with only a small writing transaction being serialized.

        • "No Bugs" Hare says

          > so we’re easily talking single or double digit milliseconds here if multiple reads and writes must be issued, which is a typical case.

          Yes, but I have yet to see apps where it is not enough for DB writes.

          > if the business processes are becoming serialized then it becomes problematic when scaling to many clients.

          Nope :-). As I said – I can handle the whole Twitter (well, coherent part of it) on one single box, and I have my doubts that your app has more than that :-).

          > I’ve rarely been in situations where Serializable isolation level was used…

          Sure – and it is akin to writing to the same memory location without the mutex :-(. Most of the time, it will work, but if it doesn’t – figuring out where the client’s money went becomes a horrible problem. I have to say that in that (Re)Actor-based system which moves billions of dollars a year in very small chunks, in 10 years there were NO situations where the money could not be traced (there were some bugs, but they were trivially identifiable so counter-transactions could be issued easily).

          > even writes for a business process weren’t batched in transactions even though they probably should have been

          An atrocity, but indeed a very common one. I remember when being at IBM 20 years ago, a horror story was told to me. Guys from some big company (let’s name it eB**) came to IBM, and asked to help optimize their DB. And IBM guys were like “what transaction isolation you guys are using?” And eB** guys were like “sorry, but what is transaction?” Curtain falls.

          And FWIW, it hasn’t improved since across the industry :-(.

          > should make it possible to interleave business tasks, with only a small writing transaction being serialized.

          I’d say “parallelize read-only parts of the business tasks” – it is better than interleaving, it is real parallelization (somehow reminiscent of (Re)Actor-with-Extractors approach for in-memory Client-Side (Re)Actors).
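The "parallelize the read-only parts, serialize only the small write" structure can be sketched as follows. This is an illustration under assumptions, not the actual system discussed here: HISTORY, read_only_part and process are hypothetical names, the "DB" is an in-memory dict, and the standard concurrent.futures thread pool stands in for whatever runs the read-only work. In CPython, threads will not speed up CPU-bound work, so the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

# Immutable "historical" data: safe to read from many workers at once.
HISTORY = {
    "alice": (10, 20, 30),
    "bob": (5, 5),
}


def read_only_part(user):
    """Read-only phase of a business task: computes a proposed write
    and touches no shared mutable state."""
    return (user, sum(HISTORY[user]))


def process(users):
    totals = {}  # mutable state, touched only by the single writer below
    # Phase 1: the read-only parts run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        proposals = list(pool.map(read_only_part, users))
    # Phase 2: the writes are applied one by one by a single writer.
    for user, total in proposals:
        totals[user] = total
    return totals
```

Here process(["alice", "bob"]) runs both read-only parts concurrently and then applies two tiny writes in sequence, so only the write phase is serialized.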

          • Jesper Nielsen says

            >I’d say “parallelize read-only parts of the business tasks” – it is better than interleaving

            Yup that’s a more precise explanation of what I meant:)
            Still there will be instances where “bad stuff” can happen since this is pretty much equivalent to “read committed” with multiple writing connections.
            A restaurant table could easily get 2 overlapping bookings when business constraint validations are dealt with in parallelized reads, and unfortunately optimistic locking is a bit more complex than checking row versions in this case.

            “double digit ms” latencies easily become “single digit seconds” when as few as 100 clients enqueue work simultaneously.
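The arithmetic behind this last claim, and behind the 500us figure quoted earlier in the thread, can be spelled out in a few lines. This is a deliberately simplified model: it assumes a single consumer that handles requests strictly one after another, a fixed service time per request, and all clients enqueueing at the same instant. The function name worst_case_wait_ms is hypothetical.

```python
def worst_case_wait_ms(clients, service_ms):
    """Time until the LAST of `clients` simultaneously enqueued requests
    completes, when one consumer serves them strictly in sequence."""
    return clients * service_ms


# 100 clients at 10 ms per request: the last one waits 1000 ms (1 second).
slow_low = worst_case_wait_ms(100, 10)
# 100 clients at 99 ms per request: the last one waits 9900 ms (9.9 seconds).
slow_high = worst_case_wait_ms(100, 99)
# 100 clients at 0.5 ms (500us) per request: the last one waits 50 ms.
fast = worst_case_wait_ms(100, 0.5)
```

So under this model both sides of the thread are consistent with the numbers: serialization hurts badly at double-digit-millisecond service times, and stays in the tens of milliseconds if transactions can be kept around 500us.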
