Server-Side MMO Architecture. Naïve, Web-Based, and Classical Deployment Architectures

posted December 21, 2015 by "No Bugs" Hare, translated by Sergey Ignatchenko

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

DB Server

DB Server handles access to a database. This can be implemented using several very different approaches.

“While in theory, it is possible to use your usual ODBC-style blocking calls to your database right from your Game Server FSMs, do yourself a favor and skip this option.The first and the most obvious model is also the worst one. While in theory, it is possible to use your usual ODBC-style blocking calls to your database right from your Game Server FSMs, do yourself a favor and skip this option. It will have several significant drawbacks: from making your Game Server FSMs too tightly coupled to your DB to having blocking calls with undefined response time right in the middle of your FSM simulation (ouch!). In short – I don’t know any game where this approach is appropriate.

DB API and DB FSM(s)

A much better alternative (which I’m arguing for) is to have at least one FSM running on your DB server, to have your very own message-based DB API (expressed in terms of messages or non-blocking RPC calls) to communicate with it, and to keep all DB work where it belongs – on DB Server, within appropriate DB FSM(s). An additional benefit of such a separation is that you shouldn’t be a DB guru to write your game logic, but you can easily have a DB guru (who’s not a game logic guru) writing your DB FSM(s).

DB API exposed by DB Server’s FSM(s), SHOULD NOT be plain SQL (which would violate all the decoupling we’re after). Instead, your DB API SHOULD be specific to your game, and (once again) should be expressed in game terms such as “take PC Z and place it (with all it’s gear) into game world #NN”. All the logic to implement this request (including pre-checking that PC doesn’t belong to any other game world, modifying PC’s row in table of PCs to reflect the number of the world where she currently resides, and reading all PC attributes and gear to pass it back) should be done by your DB FSM(s).

In addition, all the requests in DB API MUST be atomic; no things such as “open cursor and return it back, so I can iterate on it later” are ever allowed in your DB API (neither you will really need such things, this stands in spite of whatever-your-DB-guru-may-tell-you).

As soon as you have this nice DB API tailored for your needs, you can proceed with writing your Game Server FSMs, without worrying about exact implementation of your DB FSM(s).

Meanwhile, at the King’s Castle…

As soon as we have this really nice separation between Game Server’s FSMs and DB FSM(s) via your very own message-based DB API, in a sense, the implementation of DB FSM will become an implementation detail. Still, let’s discuss how this small but important detail can be implemented. Here I know of two major approaches.

Single-connection approach. This approach is very simple. You have run just one FSM on your DB Server and process everything within one single DB connection:

“Application-level cache has been observed to provide 10x+ performance improvement over DB cache even if all the necessary performance-related optimizations are made on the DB sideHere, there is a single DB FSM which has single DB connection (such as an ODBC connection, but there are lots of similar interfaces out there), which performs all the operations using blocking calls. A very important thing in this architecture is application-level cache, which allows to speed things up very considerably. In fact, this application-level cache has been observed to provide 10x+ performance improvement over DB cache even if all the necessary performance-related optimizations (such as prepared statements or even stored procedures) are made on the DB side. Just think about it – what is faster: simple hash-based in-memory search within your DB FSM (where you already have all the data, so we’re speaking about 100 CPU clocks or so even if the data is out of L3 cache), or marshalling -> going-to-DB-side-over-IPC -> unmarshaling -> finding-execution-plan-by-prepared-statement-handle -> executing-execution-plan -> marshaling results -> going-back-to-DB-FSM-side-over-RPC -> unmarshaling results. In the latter case, we’re speaking at least a few dozens of microseconds, or over 1e4 CPU clocks, over two orders of magnitude difference.¹ And with single connection to DB which is able to write data, keeping cache coherency is trivial. The main thing which gets cached for games is usually ubiquitous USERS (or PLAYERS) table, as well as some of small game-specific near-constant tables.

Despite all the benefits provided by caching, this schema clearly sounds as an heresy from any-DB-gal-out-there point of view. On the other hand, in practice it works surprisingly well (that is, as soon as you manage to convince your DB gal that you know what you’re doing). I’ve seen such single-connection architecture² handling 10M+ DB transactions per day for a real-world game, and it were real transactions, with all the necessary changes, transactions, audit tables and so on.

Actually, at least at first stages of your development, I’m advocating to go with this single-connection approach.

It is very nice from many different points of view.

First, it is damn simple.
“There is no need to worry about transaction isolation levels, locks and deadlocksSecond, there is no need to worry about transaction isolation levels, locks and deadlocks
Third, it can be written as a real deterministic FSM (with all the associated goodies); moreover, this determinism stands (a) both if you “intercept calls” to DB for DB FSM itself, or (b) if we consider DB itself as a part of the FSM state, in the latter case no call interception is required for determinism.
Fourth, the performance is very good. There are no locks whatsoever, the light is always green, so everything goes unbelievably smoothly. Add here application-level caching, and we have a winner! The single-connection system I’ve mentioned above, has had an average transaction processing time in below-1ms range (once again, with real-world transactions, commit after every transaction, etc.).

The only drawback of this schema (and the one which will make DB people extremely skeptical about it, to put it very mildly) is an apparent lack of scalability. However, there are ways to modify this single-connection approach to provide virtually unlimited scalability³ The ways to achieve DB scalability for this single-connection model will be discussed in Vol. 2.

One thing to keep in mind for this single-connection approach, is that it (at least if we’re using blocking calls to DB, which is usually the case) is very sensitive to latencies between DB FSM and DB; we’ll speak about it in more detail in Chapter [[TODO]], but for now let’s just say that to get into any serious performance (that is, comparable to numbers above), you’ll need to use RAID card with BBWC in write-back mode⁴, or something like NVMe, for the disk which stores DB log files (other disks don’t really matter much). If your DB server is a cloud one, you’ll need to look for the one which has low latency disk access (such things are available from quite a few cloud providers).

¹ with stored procedures the things become a bit better for DB side, but the performance difference is still considerable, not to mention vendor lock-in which is pretty much inevitable when using stored procedures

² with a full cache of PLAYERS table

³ while in practice I’ve never went above around 100M DB transactions/day with this “single-connection-made-scalable” approach, I’m pretty sure that you can get to 1B pretty easily, and then it MAY become tough, as the number is too different from what-I’ve-seen so some unknown-as-of-now problems can start to develop. On the other hand, I daresay reaching these numbers is even more challenging with traditional multiple-connection approach

⁴ don’t worry, it is a perfectly safe mode for this kind of RAID, even for financial applications

Multiple-Connections approach. This approach is much more along the lines of traditional DB development, and is shown on Fig VI.7:

In short: we have one single DB-Proxy FSM (with the same DB API as discussed above),⁵ which does nothing but dispatches requests to DB-Worker FSMs; each of these DB-Worker FSMs will keep its own DB connection and will issue DB requests over this connection. Number of these DB-Worker FSMs should be comparable to the number of the cores on your DB server (usually 2*number-of-cores is not bad starting number), which effectively makes this schema a kind of transaction monitor.

“The upside of this schema is that it is inherently somewhat-scalable, but that's about it.The upside of this schema is that it is inherently somewhat-scalable, but that’s about it. Downsides, however, are numerous. The most concerning one is the cost of code maintenance in face of all those changes of logic, which are run in multiple connections. This inevitably leads us to a well-known but way-too-often-ignored discussion about transaction isolation levels, locks, and deadlocks at DB level. And if you don’t know what it is – believe me, you Really don’t want to know about them. And updating DB-handling code when you have lots of concurrent access (with isolation levels above UR), is possible, but is extremely tedious. Restrictions such as “to avoid deadlocks, we must always issue all our SELECT FOR UPDATEs in the same order – the one written in blood on the wall of DB department” can be quite a headache to put it mildly.

Oh, and don’t try using application-side caching for multiple-connections (i.e. even DB-Proxy SHOULD NOT be allowed to cache). While this is theoretically possible, re-ordering of replies on the way from DB to DB-Proxy make the whole thing way too complicated to be practical. While I’ve done such a thing myself once, and it worked without any problems (after several months of heavy replay-based testing), it was the most convoluted thing I’ve ever written, and I clearly don’t want to repeat this experience.

But IMNSHO the worst thing about using multiple DB connections, is that while each of those DB FSMs can be made deterministic (via “call interception”), the whole DB Server cannot possibly be made deterministic (for multiple connections), period. It means that it may work perfectly under test, but fail in production while processing exactly the same sequence of requests.

Worse than that, there is a strong tendency for improper-transaction-isolation bugs to manifest themselves only under heavy load.

So, you can easily live with such a bug (for example, using SELECT instead of SELECT FOR UPDATE) quietly sitting in, but not manifesting itself until your Big Day comes, and then it crashes your site.⁶ Believe me, you really don’t find yourself in such a situation, it can be really (and I mean Really) unpleasant.

In a sense, working with transaction isolation levels is akin to working with threads: about the same problems with lack of determinism, bugs which appear only in production and cannot be reproduced in test environment, and so on. On the other hand, there are DB guys&gals out there who’re saying that they can design a real-world multi-connection system which works under the load of 100M+ write transactions per day and never deadlocks, and I don’t doubt that they can indeed do it. The thing which I’m not so sure about, is whether they really can maintain such quality of their system in face of new-code-required-twice-a-week, and I’m even less sure that you’ll have such a person on your game team.

In addition, the scalability under this approach, while apparent, is never perfect (and no, those TPC-C linear-scalability numbers don’t prove that linear scalability is achievable for real-world transactions). In contrast, single-connection-made-scalable approach which we’ll discuss in Vol. 2, can be extended to achieve perfect linear scalability (at least in theory).

⁵ in particular, it means that we can rewrite our DB FSM from Single-connection to Multiple-connections without changing anything else in the system

⁶ And it is not a generic “all the problems are waiting for the worst moment to happen” observation (which is actually purely perception), but a real deal. When probability of the problem depends on site load in a non-linear manner (and this is the case for transaction isolation bugs), chances of it happening for the first time exactly during your heavily advertised Event of the Year are huge.

DB Server: Bottom Line.

Unless you happen to have on your team a DB gal with real-world experience of dealing with locks, deadlocks, and transaction isolation levels for your specific DB under at least million-per-day DB write-transaction load – go for single-connection approach

If you do happen to have such a DB guru who vehemently opposes going single-connection – you can try multi-connection, at least if she’s intimately familiar with SELECT-FOR-UPDATE and practical ways of avoiding deadlocks (and no, using RDBMS’s built-in mechanism to detect the deadlock 10 seconds after it happens, is usually not good enough).

And in any case, stay away from any things which include SQL in your Game Server FSMs.

Pages: 1 2 3 4

ITHare (12)
Reddit (15 upvotes, 4 comments)

Comments

Ivan Lapshov says

December 24, 2015 at 3:37 pm

Thanks for sharing the article 🙂

I have 2 suggestions that you may consider to include in the book.

You have mentioned the TCP and UDP protocols, however websockets weren’t described. I think comparing websockets with tcp would be useful. Also, I hope there will be some chapter with frameworks description where you could mention actor-based architechture like Akka.

Reply
- "No Bugs" Hare says
  
  December 26, 2015 at 6:32 am
  
  THANKS! I’ve added Websockets above (they can be handled pretty much like TCP), and mentioned Akka’s Actors in Chapter V(d) (right near Erlang, they’re actually very similar to each other and to QnFSM).
  
  Reply
Wanderer says

January 28, 2016 at 11:55 am

Thanks again for a detailed insights and sharing your experience!

I have an FSM-related question. May be it’s again a bit too “techy” and will be discussed in vol.II only, but I’d like to ask anyway if you don’t mind.

On your diagram, it’s obvious that network-related FSMs are using “wait for event” triggering. Whether it’s good old select() or something like WSAWaitForMultipleEvents() – doesn’t really matter as it’s implementation details. At the same time, I’d like to ask about your thoughts on scheduling strategy of logic FSMs.

Basically, I know two usual approaches there – “wait for event” and “timed polls”.
* First one is basically the same as in the network FSM, with inbox queue having an event object. Again, whether it’s std::condition_variable::wait() or something like WaitForSingleEvent() – implementation details;
* Second approach can be expressed with a tight-loop including std::this_thread::sleep_for() and something like while (queue::pop_event()…);

While first one looks conceptually better, I still think second one has its own merits, especially in the cases when there is no “true real-time” constraints on event processing. Basically, my observations that it’s sometimes better to “accumulate” such events in inbox for, say, 100ms (or 500ms) and then process all of those in one batch, effectively decreasing the amount of active concurrent threads and reducing the contention. What I saw is that such approach helps with contention in case of “trivial” event handlers (i.e. when the amount of time needed for each event processing is negligible comparing to OS tick, which I suspect is true for a lot of MMO logic processing).

Of course, I suspect that such “scheduled poll” approach might not work that nice in MMO architectures with accepted poll period around 10ms-ish (*wildguess* poll value). I don’t think you can reliably make it smaller on usual OSes, definitely not for Windows, not totally sure about Linuxes.

All in all, I’d love to hear your experienced thoughts on this matter. Of course, if it’s something from much later part of the book, I totally don’t want you to distract from your plan 🙂

Reply
- Wanderer says
  
  January 28, 2016 at 12:18 pm
  
  PS: I’m asking because I don’t have any experience with MMO realms, but I worked on distributed calculations (think “Big Data Crunching” and multi-thread/multi-server simulation of physical processes). And, based on what I saw in your “64 Do’s/Dont’s” articles and this book, the back-end part of the work, i.e. “world simulation”, is actually pretty similar. Although, I never had any problems with those pesky “cheaters” 🙂
  
  So, I’m curious to see the differences in architectural decisions due to different business requirements.
  
  Reply
  - "No Bugs" Hare says
    
    January 29, 2016 at 9:42 am
    
    > Whether it’s good old select() or something like WSAWaitForMultipleEvents() – doesn’t really matter as it’s implementation details.
    
    Yep.
    
    > Basically, my observations that it’s sometimes better to “accumulate” such events in inbox for, say, 100ms (or 500ms) and then process all of those in one batch, effectively decreasing the amount of active concurrent threads and reducing the contention.
    
    Wait, which contention you’re speaking about? If you have a massive shared state protected by mutex – then there would be contention (on this mutex) and reducing number of threads would be a good thing (though it is better to be done in a different manner). With FSMs/Actors, however, it is shared-nothing, so there is nothing to compete for, no mutex, and no contention.
    
    Overall, as a Really Big Fat rule of thumb: stay away from tight loops and polling on the server side. While on the client-side they’re just nuisances (though I’m avoiding them on the clients too), on the server-side they SHOULD be avoided at all costs (well, there are exceptions, but they’re more of exotic nature, like “it may be ok to use polling when you’re shutting down your daemon”).
    
    The reason behind is trivial: it is damn too expensive – OR it brings too much latencies. Each time when you wake up your thread (only to find that nothing has arrived), you’re getting a context switch, and that’s spending like 10000 CPU clocks (EDIT: more like 100K-1M, see, for instance, http://www.cs.rochester.edu/u/cli/research/switch.pdf ). Way Too Expensive (especially when you find out that you did it for no reason). In addition, it puts you into a kind of predicament – reducing poll interval is bad because of the context switches, and increasing it hits game responsiveness.
    
    One additional interesting thing about these select()/WaitFor*() functions: with them in use, as load on the system grows (and unlike data crunching, games do not operate under 100% load, so there should be reserve at all times), “batching” of multiple requests will start to occur naturally, reducing number of context switches as it is needed. In other words, select()-based system will automagically adapt to higher loads, increasing latencies to the extent which is necessary to handle current load. It is graceful degradation in action.
    
    Overall, there is a Good Reason for all those WaitFor*() and select() functions (and there is a consensus against tight loops) – and this is avoiding context switches (and context switches can kill server performance instantly, been there, seen that).
    
    Reply
    - Wanderer says
      
      January 29, 2016 at 1:39 pm
      
      > With FSMs/Actors, however, it is shared-nothing, so there is nothing to compete for, no mutex, and no contention.
      
      Yes, except the queue itself. That can be implemented in lock-free approach, is that you mean? Without lock-free techniques, the queue is a shared resource, so there is some concurrency. And with multiple-writers/single-reader, as I understand, you still need some mutex-like or spinlock-like technique. The simple pure lock-free rung-buffer for single-producer/single-consumer doesn’t work here.
      
      > …OR it brings too much latencies
      
      I think that’s the main difference between real-time MMOs and something like data processing/simulation. In the second case, it’s sometimes OK to sync not very often (i.e. once in a second, for example). And the amount of data passing through queue is often non-trivial too (which also differs from MMO).
      
      OK. Thanks for providing these insights! I think now I better understand these differences and context of MMO.
      
      Reply
      - Wanderer says
        
        January 29, 2016 at 6:48 pm
        
        Please disregard my last comment. I just suddenly figured out that I can take any queue with any number of events and any amount of data from “shared” queue into private FSM queue with just single swap of pimpl’s. Looks like this idea is 7 years late, but it makes the process of “taking current queue” just a trivial task with a single mutex locked for 2 pointer assignments (or some other kind of lightweight sync).
      - "No Bugs" Hare says
        
        January 30, 2016 at 7:13 am
        
        What you’re suggesting, would probably work for Single-Writer-Single-Reader Queue, but IIRC, for Multiple-Writer-Single-Reader queues (and that’s what we generally need for FSMs) it is not as simple as two-pointers swap. However: (a) even if using mutex, it is still small (and the smaller the code under the lock – the less contention you have); (b) it can be implemented in a completely lockless manner, based on a circular buffer, plus CAS primitive (a.k.a. LOCK XCHG for x86 a.k.a. std::atomic_compare_exchange for C++ a.k.a. InterlockedExchange() for Windows). Implementing (b) properly is a Big Headache, but it needs to be done only once, and it has been done for example in boost::lockfree::queue (though in practice, you’ll additionally need some kind of waitForPop() function, which doesn’t seem to be provided by boost::lockfree::queue 🙁 ).
        
        I’m planning to write more on queues (specifically Multiple-Writer-Single-Reader ones) in Chapter [[TODO]] (currently Chapter XIV).
Jesper Nielsen says

April 18, 2016 at 11:01 am

Perhaps you could elaborate a little on how to scale simulation of a large world – both in terms of using several single-threaded FSM on a single server and distributing the world on several servers.
In particular – how to handle scaling of large contiguous zones if a single FSM – or even a single server -won’t cut it. I suppose the “share nothing” rule must be worked around in this case?

Reply
- "No Bugs" Hare says
  
  April 18, 2016 at 12:15 pm
  
  Good question. Yes, I didn’t answer it in “beta” chapters, but I will include it into “final” version of the book (Chapter III, protocols). Very shortly – the typical way of doing it is to split your game world into “zones” (with zones often having an overlap to account for objects moving near the border). It was described in “Seamless Servers: the case for and against” by Jason Beardsley (which is a part of “Massively Multiplayer Game Development” book published in 2003) and is still in use (it was recently mentioned, for example, in WoT presentation on GDC2016 (which should be on GDC Vault soon)).
  
  Hope it helps :-).
  
  Reply
Carlos C says

August 25, 2016 at 10:06 am

Hello there Hare!

Thanks for making this book, I’m looking forward for it’s final version, it’ll be a great addition to my library. There’s tons of knowledge in here.

Could you give out a very basic example of DB FSM? I think I’m understanding it the wrong way.

From what I’ve understood, DB FSM(s) provide a finite number of states that DB API should build it’s logic upon. That is perfectly reasonable. But..

Wouldn’t that require a huge amount of states?
What about too many specific states? (worst case being one for every DB API function/method)

I’m worried about duplication but as I said I probably got something very wrong.

Thanks!

Reply
- "No Bugs" Hare says
  
  August 27, 2016 at 4:12 pm
  
  DB FSMs I’ve seen, were essentially stateless (except for app-level caching as their state – usually cache is read-only, but write caches are also possible).
  
  One simple example: game world sends a request to DB FSM asking to move artefact X from player Y to player Z (as artefact X was lost during fight, whatever-else). On DB FSM side, most of the checks (like “whether player Y has artefact X”, etc. etc.) can be done from the read-only app-level cache, but transaction itself can be committed to DB (or can be write-cached, if the artefact is not THAT important one, or transaction commit can be postponed for a few milliseconds to save on commits – and reply back to game world can be delayed until transaction is committed, to make sure that ACID properties stand despite postponed commit, or…).
  
  So, ‘DB FSM’ (at least as I’ve seen it) was pretty much a “thing which processes DB-related requests”, with its state usually being something along the lines above.
  
  Hope it helps a bit (if not – feel free to ask further questions :-)). Also some discussion on DBs and DB FSMs is planned for upcoming Chapter XVII.
  
  Reply

DB Server

DB API and DB FSM(s)

Meanwhile, at the King’s Castle…

Related posts

Comments

Leave a Reply Cancel reply