Server-Side MMO Architecture. Naïve, Web-Based, and Classical Deployment Architectures

posted December 21, 2015 by "No Bugs" Hare, translated by Sergey Ignatchenko

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Failure Modes & Effects

FMEA: “Failure mode and effects analysis (FMEA) was one of the first systematic techniques for failure analysis.” (Wikipedia)

When speaking about deployment, one all-important question which you’d better have an answer to is the following: “What will happen if some piece of hardware fails badly?” Of course, within the scope of this book we won’t be able to do a formal full-scale FMEA for an essentially unknown architecture, but at least we’ll be able to give some hints in this regard.

Communication Failures

So, what can possibly go wrong within our deployment architecture? First of all, there are (not shown, but existing) switches (or even firewalls) residing between our servers; while these can be made redundant, their failures (or transient software failures of the network stack on hosts) may easily cause occasional packet loss, and also (though extremely infrequently) may cause TCP disconnects on inter-server connections. Therefore, our Server-to-Server protocols need to account for potential channel loss and allow for guaranteed recovery after the channel is restored. Let’s write this down as a requirement and remember it until Chapter [[TODO]], where we will describe our protocols.
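The actual protocols are left for a later chapter, so the following is only a minimal sketch of one common way to meet this requirement, not the book's protocol. It assumes cumulative acknowledgements with per-message sequence numbers; the class names `ReliableSender` and `ReliableReceiver` are hypothetical.

```python
from collections import deque


class ReliableSender:
    """Keeps every message until the peer acknowledges it (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = deque()  # (seq, payload) pairs not yet acknowledged

    def send(self, payload):
        msg = (self.next_seq, payload)
        self.next_seq += 1
        self.unacked.append(msg)
        return msg  # in real life this would be written to the TCP connection

    def on_ack(self, acked_seq):
        # Cumulative ack: everything up to acked_seq has been delivered.
        while self.unacked and self.unacked[0][0] <= acked_seq:
            self.unacked.popleft()

    def on_reconnect(self, peer_last_seq):
        # After the channel is restored, the peer reports the last sequence
        # number it saw; everything after that has to be replayed.
        self.on_ack(peer_last_seq)
        return list(self.unacked)


class ReliableReceiver:
    """Delivers each message exactly once, in order."""

    def __init__(self):
        self.last_seq = 0
        self.delivered = []

    def on_message(self, seq, payload):
        if seq == self.last_seq + 1:  # duplicates and gaps are ignored
            self.last_seq = seq
            self.delivered.append(payload)
        return self.last_seq  # cumulative ack to send back


sender, receiver = ReliableSender(), ReliableReceiver()
m1 = sender.send("match 17 finished")
m2 = sender.send("player 5 gained a level")
sender.on_ack(receiver.on_message(*m1))  # m1 arrives and is acknowledged
# m2 is lost when the connection drops; on reconnect the sender replays it.
for seq, payload in sender.on_reconnect(receiver.last_seq):
    sender.on_ack(receiver.on_message(seq, payload))
```

The point of the design is that a TCP disconnect loses nothing: the sender only forgets a message once the other side has confirmed it, and the receiver silently drops anything it has already processed.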

Server Failures

In addition, of course, any of the servers can go badly wrong. There are tons of solutions out there claiming to address this kind of failure, but you should keep in mind that usually, the stuff marked as “High Availability” doesn’t help with losing in-memory state: what you need if you want to avoid losing in-memory state is “Fault-Tolerant” techniques (see “Server Fault Tolerance: King is Dead, Long Live the King!” section below).

Fortunately, though, for reasonably good hardware (the kind which has reasonably good hardware monitoring, including fans, and at least ECC and RAID; see Chapter [[TODO]] for more discussion on it), such fatal server failures are extremely rare. From my experience (and more or less consistently with manufacturer estimates), the failure rate for reasonably good server boxes (such as those from one of the Big Three major server vendors) is somewhere between “once-per-5-years” and “once-per-10-years”, so if you have only one such server (and unless you’re running a stock exchange), you’d be pretty much able to ignore this problem completely. However, if you have 100 servers – the failure rate goes up to “once or twice a month”, which is unacceptable if such a failure leads to the whole site going down.
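The “once or twice a month” figure follows directly from the per-server numbers. Here is a back-of-the-envelope check; it assumes failures are independent and spread evenly over time, which is a simplification not stated in the text, and the function name is made up for illustration.

```python
MONTHS_PER_YEAR = 12


def fleet_failures_per_month(servers, years_between_failures):
    """Expected failures per month for the whole fleet (simplified estimate)."""
    return servers / (years_between_failures * MONTHS_PER_YEAR)


for servers in (1, 100):
    best = fleet_failures_per_month(servers, 10)   # "once-per-10-years" boxes
    worst = fleet_failures_per_month(servers, 5)   # "once-per-5-years" boxes
    print(f"{servers:>3} server(s): {best:.2f} to {worst:.2f} failures per month")
```

For 100 servers this gives roughly 0.8 to 1.7 failures per month, which matches the estimate above.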

Therefore, at the very least you should plan to make sure that a single failure of a single server doesn’t bring your whole site down. BTW, most of the time it will be a Game World Server going down, as you’re likely to have many more of these than of the other servers, so at first stages you may concentrate on containment of Game World server failures. Also we can note that, counter-intuitively, failures of DB Server are not that important to deal with;¹ not because they have less impact (they do have much more impact), but because they’re much less likely to happen than a failure of one-of-Game-World-servers.

¹ that is, beyond keeping a DB backup with DB logs being continuously moved to another location, see Chapter [[TODO]] for further discussion

Containment of Game World server failures

The very first (and very obvious) technique to minimize the impact of your Game World server failure on the whole site is to make sure that your Game World reports relevant changes (without sending the whole state) to DB Server as soon as they occur, so that if Game World server fails, it can be restarted from scratch, losing all the changes since the last save-to-DB, but at least preserving previous results. These saves-to-DB are best done at some naturally arising points within your game flow.

For example, if your game is essentially a Starcraft- or Titanfall-like sequence of matches, then the end of each match represents a very natural save-to-DB point. In other words, if Game World server fails within the match – all the match data will be lost, but all the player standings will be naturally restored as of the beginning of the match, which isn’t too bad. In another example, for a casino-like game the end of each “hand” also represents a natural save-to-DB point.

If your gameplay is an MMORPG with continuous gameplay, then you need to find a way to save-to-DB all the major changes of the players’ stats (such as “level has been gained”, or “artifact has changed hands”). Then, if the Game Server crashes, you may lose the current positions of PCs within the world and a few hundred XP per player, but players will still keep all their important stats and achievements more or less preserved.

Two words of caution with regards to save-to-DB points. First,

For synchronous games, don’t try to keep the whole state of your Game Worlds in DB

Except for some rather narrow special cases (such as stock exchanges and some slow-paced and/or “asynchronous” games as defined in Chapter I), saving all the state of your game world into DB won’t work due to performance/scalability reasons (see discussion in “Taming DB Load: Write-Back Caches and In-Memory States” section above). Also keep in mind that even if you were able to perfectly preserve the current state of the game-event-currently-in-progress (with game event being “match”, “hand”, or an “RPG fight”) without killing your DB, there is another very big practical problem of psychological rather than technical nature. Namely, if you disrupt the game-event-currently-in-progress for more than 0.5-2 minutes, for almost-any synchronous multi-player game you won’t be able to get the same players back, and will need to roll back the game event anyway.

For example, if you are running a bingo game with a hundred players, and you disrupt it for 10 minutes for technical reasons, you won’t be able to continue it in a manner which is fair to all the players, at the very least because you won’t be able to get all those 100 players back into playing at the same time. The problem is all about numbers: for a two-player game it might work, for 10+ – succeeding in getting all the players back at the same time is extremely unlikely (that is, unless the event is about a Big Cash Prize). I’ve personally seen a large commercial game that handled crashes in the following way: to restore after the crash, first, it rolled forward its DB at the DB level to get a perfectly correct current state, and then it rolled all the current game-events back at application level, exactly because continuing these events wasn’t a viable option due to the lack of players.

Trying to keep all the state in DB is a common pitfall which arises when the guys-coming-from-single-player-casino-game-development are trying to implement something multiplayer. Once again: don’t do it. While for a single-player casino game having state stored in DB is a big fat Business Requirement (and is easily doable too), for multi-player games it is neither a requirement nor feasible (at least because of the can’t-get-the-same-players-together problem noted above). Think of Game World server failure as a direct analogy of the fire-in-brick-and-mortar-casino in the middle of the hand: the very best you can possibly do in this case is to abort the hand, return all the chips to their respective owners (as of the beginning of the hand), and run out of the casino, just to come back later when the fire is extinguished, so you can start an all-new game with all-new players.

The second pitfall on this way is related to DB consistency issues and DB API.

Your DB API MUST enforce logical consistency

For example, if (as a part of your very own DB API) you have two DB requests, one of which says “Give PC X artifact Y”, and another one “Take artifact Y from PC X”, and are trying to report an occurrence of “PC X took over artifact Y from PC XX” as two separate DB requests (one “Take” and one “Give”), you’re risking that in case of Game World server failure, one of these two requests will go through, and the other one won’t, so the artifact will get lost (or will be duplicated) as a result. Instead of using these two requests to simulate a “taking over” occurrence, you should have a special DB request “PC X took over artifact Y from PC XX” (and it should be implemented as a single DB transaction within DB FSM); this way at least the consistency of the system will be preserved, so whatever happens – there is still exactly one artifact. The very same pattern MUST be followed for passing around anything of value, from casino chips to artifacts, with any other goodies in between.
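To make the single-transaction requirement concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `inventory` table, its columns, and the `take_over()` function are assumptions made up for illustration; a real DB FSM would have its own schema and would receive the request as a message.

```python
import sqlite3


def create_db():
    # Hypothetical schema: one row per artifact; UNIQUE guarantees one owner.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE inventory (pc TEXT NOT NULL, artifact TEXT NOT NULL UNIQUE)"
    )
    db.execute("INSERT INTO inventory VALUES ('XX', 'Y')")
    db.commit()
    return db


def take_over(db, artifact, from_pc, to_pc):
    """'PC to_pc took over artifact from PC from_pc' as ONE DB transaction."""
    with db:  # commits if the block succeeds, rolls back on any exception
        taken = db.execute(
            "DELETE FROM inventory WHERE pc = ? AND artifact = ?",
            (from_pc, artifact),
        )
        if taken.rowcount != 1:
            raise ValueError(f"{from_pc} does not own {artifact}")
        db.execute("INSERT INTO inventory VALUES (?, ?)", (to_pc, artifact))


db = create_db()
take_over(db, "Y", "XX", "X")
print(db.execute("SELECT pc, artifact FROM inventory").fetchall())
```

Both the “Take” and the “Give” still happen, but inside one transaction: if anything fails between them, the rollback restores the previous owner, so the artifact is neither lost nor duplicated.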

Server Fault Tolerance: King is Dead, Long Live the King!

If you want your servers to be really fault-tolerant, there are some ways to have your cake and eat it too.

However, keep in mind that all fault-tolerant solutions are complicated and costly, and in the games realm I generally consider them over-engineering (even by my standards).

Fault-Tolerant Servers: Damn Expensive

Historically, fault-tolerant systems were provided by damn-expensive hardware such as [Stratus] (I mean their hardware solutions such as ftServer; see discussion on hardware-vs-software redundancy in Chapter [[TODO]]) and [HPIntegrityNonStop] which have everything doubled (and CPUs often quadrupled(!)) to avoid all single points of failure, and these tend to work very well. But they’re usually way out of game developer’s reach for financial reasons, so unless your game is a stock exchange – you can pretty much forget about them.

Fault-Tolerant VMs

Fault-Tolerant VMs (such as VMWare FT feature or Xen Remus) are quite new kids on the block (for example, VMWare FT got beyond single vCPU only in 2015), but they’re already working. However, there are some significant caveats. Take everything I’m saying about fault-tolerant VMs with a really good pinch of salt, as all the technologies are new and evolving, and information is scarce; also I admit that I didn’t have a chance to try these things myself 🙁 .

When you’re using a fault-tolerant VM, the Big Picture looks like this: you have two commodity servers (usually right next to each other), connect them via 10G Ethernet, run VM on one of them (the “primary” one), and when your “primary” server fails, your VM magically reappears on the “secondary” box. From what I can see, modern Fault-Tolerant VMs are using one of two technologies: “virtual lockstep” and “fast checkpoints”. Unfortunately, each of them has its own limitations.

Virtual Lockstep: Not Available Anymore?

The concept of virtual lockstep is very similar to our QnFSM (with the whole VM treated as FSM). Virtual lockstep takes one single-core VM, intercepts all the inputs, passes these inputs to the secondary server, and runs a copy VM there. Like any other fault-tolerant technology, virtual lockstep causes additional latencies, but it seems to be able to restrict its appetite for additional latency to a sub-ms range, which is acceptable for most of the games out there. Virtual lockstep is the fault-tolerance method which vSphere was using prior to vSphere 6. The downside of virtual lockstep is that it (at least as implemented by vSphere) wasn’t able to support more than one core. For our QnFSMs, this single-core restriction wouldn’t be too much of a problem, as they’re single-threaded anyway (though balancing FSMs between VMs would be a headache), but there are lots of applications out there which are still heavily-multithreaded, so it was considered an unacceptable restriction. As a result, vSphere, starting from vSphere 6, has changed its fault-tolerant implementation from virtual lockstep to a checkpoint-based implementation. As of now, I don’t know of any supported implementations of Virtual Lockstep 🙁 .

Checkpoint-Based Fault Tolerance: Latencies

To get around the single-core limitation, a different technique, known as “checkpoints”, is used by both Xen Remus and vSphere 6+. The idea behind checkpoints is to make a kind of incremental snapshot (“checkpoint”) of the full state of the system and log it to a safe location (“secondary server”). As long as you don’t let anything out of your system before the coming-later “checkpoint” is committed to a secondary server, all the calculations you’re making meanwhile become inherently unobservable from the outside, so in case of “primary” server failure, nobody outside can tell whether the “primary” ever received the incoming data at all. It means that for the world outside of your system, your system (except for the additional latency) becomes almost-indistinguishable² from a real fault-tolerant server such as Stratus (see above). In theory, everything looks perfect, but with VM checkpoints we seem to hit the wall with checkpoint frequency, which defines the minimum possible latency. On systems such as VMWare FT and Xen Remus, checkpoint intervals are measured in tens of milliseconds. If your game is ok with such delays – you’re fine, but otherwise – you’re out of luck 🙁 . For more details on checkpoint-based VMs, see [Remus].
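The “don’t let anything out before the checkpoint is committed” rule can be illustrated with a toy model. This is only a sketch of the idea: it copies the whole state instead of taking incremental snapshots, the “secondary server” is just a field in the same object, and the class name `CheckpointedPrimary` and the counter state are made up for illustration.

```python
import copy


class CheckpointedPrimary:
    """Toy model of checkpoint-based fault tolerance (not a real VM API)."""

    def __init__(self, state):
        self.state = state
        self.held_output = []  # produced since last checkpoint, not yet visible
        self.backup_state = copy.deepcopy(state)  # stands in for the secondary

    def handle_input(self, delta):
        self.state["counter"] += delta
        # Output is buffered instead of being sent to the outside world.
        self.held_output.append(f"counter={self.state['counter']}")

    def checkpoint(self):
        """Commit state to the secondary, then release the held output."""
        self.backup_state = copy.deepcopy(self.state)
        released, self.held_output = self.held_output, []
        return released

    def fail_over(self):
        """Secondary resumes from the last committed checkpoint."""
        return CheckpointedPrimary(copy.deepcopy(self.backup_state))


primary = CheckpointedPrimary({"counter": 0})
primary.handle_input(5)
print(primary.checkpoint())     # output becomes visible only now
primary.handle_input(7)         # this work is not yet checkpointed
secondary = primary.fail_over()
print(secondary.state)          # state as of the last checkpoint
```

The work done after the last checkpoint is lost on failover, but because its output was never released, the outside world cannot tell the difference. The cost is that every outgoing packet waits for the next checkpoint, which is where the extra latency comes from.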

Save for latencies (and the need to have 10G connections between servers, which is not that big a deal), checkpoint-based fault tolerance has several significant advantages over virtual lockstep; these include such important things as support for multiple CPU cores, and N+1 redundancy.

² strictly speaking, the difference can be observed as some network packets may be lost, but as packet loss is a normal occurrence, any reasonable protocol should deal with transient packet loss anyway without any observable impact

Complete Recovery from Game World server failures: DIY Fault-Tolerance in QnFSM World

If you’re using FSMs (as you should anyway), you can also implement your own fault-tolerance. I should confess that I didn’t try this approach myself, so despite looking very straightforward, there can be practical pitfalls which I don’t see yet. Other than that, it should be as fault-tolerant as any other solution mentioned above, and it should provide good latencies too (well in sub-ms range).

Like any other fault-tolerant solution, for games IMHO it is over-engineering, but if I felt strongly about the failures causing per-game-event rollbacks, this is the one I’d try first. It is latency friendly, it allows for N+2 redundancy (saving you from doubling the number of your servers in case of 1+1 redundancy schemas), and it plays really well alongside our FSM-related stuff.

The idea here is to have separate Logging Servers logging all the events to all the FSMs residing on your Game World servers; then, you will essentially have enough information on your Logging Servers to recover from Game World server failure. More specifically, you can do the following:

have an additional Logging Server(s) “in front of Game Servers”; these Logging Server(s) perform two functions:
- log all the messages incoming to all Game Server FSMs
  - these include: messages coming from clients, messages coming from other Game Servers, and messages coming from DB Server
  - moreover, even communications between different FSMs residing on the same Game Server, need to go via Logging Server and need to be logged
- timestamp all the incoming messages
all your Game Server FSMs need to be strictly-deterministic
- in particular, Game Server FSMs won’t use their own clocks, but will use timestamps provided by Logging Servers instead
in addition, from time to time each Game Server FSM needs to serialize its whole state and report it to the Logging Server
then, we need to consider two scenarios: Logging Server failure and Game Server failure (we’ll assume that they never fail simultaneously, and such an event is indeed extremely unlikely unless it is a fire-in-datacenter or something)
- if it is Logging Server which fails, we can just replace it with another (re-provisioned) one; there is no game-critical data there
- if it is Game Server which fails, we can re-provision it, and then roll-forward each and every FSM which was running on it, using last-reported-state and logs-saved-since-last-reported-state saved on the Logging Server. Due to the deterministic nature of all the FSMs, the restored state will be exactly the same as it was a few seconds ago³
  - at this point, all the clients and servers which were connected to the FSM, will experience a disconnect
  - on disconnect, the clients should automatically reconnect anyway (this needs to account for IP change, which is a medium-sized headache, but is doable; in [[TODO]] section we’ll discuss Front-End servers which will isolate clients from disconnects completely)
  - issues with server-to-server messages should already be solved as described in “Communication Failures” subsection above
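The steps above can be sketched in a few dozen lines. This is a toy, single-process model under stated assumptions: the Logging Server and the Game Server FSM live in the same process, messages are `(player, points)` tuples, the timestamp is a logical counter, and the names `GameFSM` and `LoggingServer` are hypothetical.

```python
import copy


class GameFSM:
    """Strictly deterministic: state depends only on (timestamp, message)."""

    def __init__(self):
        self.state = {"scores": {}, "last_ts": 0}

    def process(self, ts, msg):
        player, points = msg
        scores = self.state["scores"]
        scores[player] = scores.get(player, 0) + points
        self.state["last_ts"] = ts  # time comes from the Logging Server only

    def serialize(self):
        return copy.deepcopy(self.state)

    @classmethod
    def restore(cls, snapshot):
        fsm = cls()
        fsm.state = copy.deepcopy(snapshot)
        return fsm


class LoggingServer:
    """Timestamps and logs every incoming message before delivering it."""

    def __init__(self):
        self.clock = 0
        self.snapshot = GameFSM().serialize()
        self.log = []  # messages received since the last reported state

    def forward(self, fsm, msg):
        self.clock += 1
        self.log.append((self.clock, msg))  # log first, deliver second
        fsm.process(self.clock, msg)

    def store_snapshot(self, fsm):
        self.snapshot = fsm.serialize()
        self.log.clear()  # older messages are no longer needed

    def recover(self):
        """Rebuild the FSM: last reported state plus roll-forward of the log."""
        fsm = GameFSM.restore(self.snapshot)
        for ts, msg in self.log:
            fsm.process(ts, msg)
        return fsm


logger, live = LoggingServer(), GameFSM()
logger.forward(live, ("alice", 3))
logger.forward(live, ("bob", 1))
logger.store_snapshot(live)
logger.forward(live, ("alice", 2))
recovered = logger.recover()  # pretend the Game Server has just died
print(recovered.state == live.state)
```

Because `process()` never consults a local clock or any other outside source, replaying the same timestamped messages on top of the same snapshot has to end in the same state, which is exactly what makes the roll-forward valid.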

In a sense, this “Complete Recovery” thing is conceptually similar to EventProcessorWithCircularLog from Chapter V (but with logging residing on different server, and with auto-rollforward in case of server failure), or to a traditional DB restore-and-log-rollforward.

Note that only hardware problems (and software bugs outside of your FSMs, such as OS bugs) can be addressed with this method; bugs within your FSM will be replayed and will lead to exactly the same failure 🙁 .

Last but not least, I need to re-iterate that I would object to any fault-tolerant schema for most of the games out there on the basis of over-engineering, though I admit that there might be good reasons to try achieving it, especially if it is not too expensive/complicated.

³ or, in case of almost-strictly-deterministic FSMs such as those CUDA-based ones, it will be almost-exactly-the-same

[[TODO!]] DIY Virtual-Lockstep

Classical Game Deployment Architecture: Summary

To summarize the discussion above about Classical Game Deployment Architecture:

It works
It can and should be implemented using QnFSM model with deterministic FSMs, see discussion above for details
Your communication with DB (DB API) SHOULD use game-specific requests, and SHOULD NOT use any SQL; all the SQL should be hidden behind your DB FSM(s)
Your first DB Server SHOULD use single-connection approach, unless you happen to have a DB guy who has real-world experience with multi-connection systems under at least millions-per-day write(!) transaction loads
- Even in the latter case, you SHOULD try to convince him, but if he resists, it is ok to leave him alone, as long as external DB API is still exactly the same (message-based and expressed in terms of whatever-your-game-needs). This will provide assurance that in the extreme case, you’ll be able to rewrite your DB Server later.
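As an illustration of a DB API that is message-based and expressed in game terms, here is a minimal sketch. The request types `MatchFinished` and `GetRating`, the `ratings` table, the starting rating of 1000, and the ±10 adjustment are all made-up assumptions; the only points being shown are that callers never see SQL and that one DB FSM owns the single connection.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchFinished:  # hypothetical game-specific request
    match_id: int
    winner: str
    loser: str


@dataclass(frozen=True)
class GetRating:  # hypothetical game-specific request
    player: str


class DbFsm:
    """Processes DB requests one at a time over a single connection."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # the one and only connection
        self._db.execute(
            "CREATE TABLE ratings (player TEXT PRIMARY KEY, rating INTEGER NOT NULL)"
        )
        self._db.commit()

    def handle(self, request):
        if isinstance(request, MatchFinished):
            return self._on_match_finished(request)
        if isinstance(request, GetRating):
            return self._on_get_rating(request)
        raise TypeError(f"unknown request: {request!r}")

    def _on_match_finished(self, req):
        with self._db:  # both rating changes commit together or not at all
            for player, delta in ((req.winner, 10), (req.loser, -10)):
                self._db.execute(
                    "INSERT OR IGNORE INTO ratings VALUES (?, 1000)", (player,)
                )
                self._db.execute(
                    "UPDATE ratings SET rating = rating + ? WHERE player = ?",
                    (delta, player),
                )
        return "ok"

    def _on_get_rating(self, req):
        row = self._db.execute(
            "SELECT rating FROM ratings WHERE player = ?", (req.player,)
        ).fetchone()
        return row[0] if row else None


db_fsm = DbFsm()
db_fsm.handle(MatchFinished(match_id=1, winner="alice", loser="bob"))
print(db_fsm.handle(GetRating("alice")), db_fsm.handle(GetRating("bob")))
```

Since the Game World only ever sends `MatchFinished`-style messages, the DB Server behind `handle()` can later be rewritten (multiple connections, a different database) without touching any of its callers.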

[[To Be Continued…

This concludes beta Chapter 9(a) from the upcoming book “Development and Deployment of Massively Multiplayer Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 9(b), “Modular Architecture: Server-Side. Throwing in Front-End Servers.”]]


Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.


Comments

Ivan Lapshov says

December 24, 2015 at 3:37 pm

Thanks for sharing the article 🙂

I have 2 suggestions that you may consider to include in the book.

You have mentioned the TCP and UDP protocols, however websockets weren’t described. I think comparing websockets with tcp would be useful. Also, I hope there will be some chapter with frameworks description where you could mention actor-based architecture like Akka.

- "No Bugs" Hare says
  
  December 26, 2015 at 6:32 am
  
  THANKS! I’ve added Websockets above (they can be handled pretty much like TCP), and mentioned Akka’s Actors in Chapter V(d) (right near Erlang, they’re actually very similar to each other and to QnFSM).
  
Wanderer says

January 28, 2016 at 11:55 am

Thanks again for a detailed insights and sharing your experience!

I have an FSM-related question. May be it’s again a bit too “techy” and will be discussed in vol.II only, but I’d like to ask anyway if you don’t mind.

On your diagram, it’s obvious that network-related FSMs are using “wait for event” triggering. Whether it’s good old select() or something like WSAWaitForMultipleEvents() – doesn’t really matter as it’s implementation details. At the same time, I’d like to ask about your thoughts on scheduling strategy of logic FSMs.

Basically, I know two usual approaches there – “wait for event” and “timed polls”.
* First one is basically the same as in the network FSM, with inbox queue having an event object. Again, whether it’s std::condition_variable::wait() or something like WaitForSingleObject() – implementation details;
* Second approach can be expressed with a tight-loop including std::this_thread::sleep_for() and something like while (queue::pop_event()…);

While first one looks conceptually better, I still think second one has its own merits, especially in the cases when there is no “true real-time” constraints on event processing. Basically, my observations that it’s sometimes better to “accumulate” such events in inbox for, say, 100ms (or 500ms) and then process all of those in one batch, effectively decreasing the amount of active concurrent threads and reducing the contention. What I saw is that such approach helps with contention in case of “trivial” event handlers (i.e. when the amount of time needed for each event processing is negligible comparing to OS tick, which I suspect is true for a lot of MMO logic processing).

Of course, I suspect that such “scheduled poll” approach might not work that nice in MMO architectures with accepted poll period around 10ms-ish (*wildguess* poll value). I don’t think you can reliably make it smaller on usual OSes, definitely not for Windows, not totally sure about Linuxes.

All in all, I’d love to hear your experienced thoughts on this matter. Of course, if it’s something from much later part of the book, I totally don’t want you to distract from your plan 🙂

- Wanderer says
  
  January 28, 2016 at 12:18 pm
  
  PS: I’m asking because I don’t have any experience with MMO realms, but I worked on distributed calculations (think “Big Data Crunching” and multi-thread/multi-server simulation of physical processes). And, based on what I saw in your “64 Do’s/Dont’s” articles and this book, the back-end part of the work, i.e. “world simulation”, is actually pretty similar. Although, I never had any problems with those pesky “cheaters” 🙂
  
  So, I’m curious to see the differences in architectural decisions due to different business requirements.
  
  - "No Bugs" Hare says
    
    January 29, 2016 at 9:42 am
    
    > Whether it’s good old select() or something like WSAWaitForMultipleEvents() – doesn’t really matter as it’s implementation details.
    
    Yep.
    
    > Basically, my observations that it’s sometimes better to “accumulate” such events in inbox for, say, 100ms (or 500ms) and then process all of those in one batch, effectively decreasing the amount of active concurrent threads and reducing the contention.
    
    Wait, which contention you’re speaking about? If you have a massive shared state protected by mutex – then there would be contention (on this mutex) and reducing number of threads would be a good thing (though it is better to be done in a different manner). With FSMs/Actors, however, it is shared-nothing, so there is nothing to compete for, no mutex, and no contention.
    
    Overall, as a Really Big Fat rule of thumb: stay away from tight loops and polling on the server side. While on the client-side they’re just nuisances (though I’m avoiding them on the clients too), on the server-side they SHOULD be avoided at all costs (well, there are exceptions, but they’re more of exotic nature, like “it may be ok to use polling when you’re shutting down your daemon”).
    
    The reason behind is trivial: it is damn too expensive – OR it brings too much latencies. Each time when you wake up your thread (only to find that nothing has arrived), you’re getting a context switch, and that’s spending like 10000 CPU clocks (EDIT: more like 100K-1M, see, for instance, http://www.cs.rochester.edu/u/cli/research/switch.pdf ). Way Too Expensive (especially when you find out that you did it for no reason). In addition, it puts you into a kind of predicament – reducing poll interval is bad because of the context switches, and increasing it hits game responsiveness.
    
    One additional interesting thing about these select()/WaitFor*() functions: with them in use, as load on the system grows (and unlike data crunching, games do not operate under 100% load, so there should be reserve at all times), “batching” of multiple requests will start to occur naturally, reducing number of context switches as it is needed. In other words, select()-based system will automagically adapt to higher loads, increasing latencies to the extent which is necessary to handle current load. It is graceful degradation in action.
    
    Overall, there is a Good Reason for all those WaitFor*() and select() functions (and there is a consensus against tight loops) – and this is avoiding context switches (and context switches can kill server performance instantly, been there, seen that).
    
    - Wanderer says
      
      January 29, 2016 at 1:39 pm
      
      > With FSMs/Actors, however, it is shared-nothing, so there is nothing to compete for, no mutex, and no contention.
      
      Yes, except the queue itself. That can be implemented in lock-free approach, is that what you mean? Without lock-free techniques, the queue is a shared resource, so there is some concurrency. And with multiple-writers/single-reader, as I understand, you still need some mutex-like or spinlock-like technique. The simple pure lock-free ring-buffer for single-producer/single-consumer doesn’t work here.
      
      > …OR it brings too much latencies
      
      I think that’s the main difference between real-time MMOs and something like data processing/simulation. In the second case, it’s sometimes OK to sync not very often (i.e. once in a second, for example). And the amount of data passing through queue is often non-trivial too (which also differs from MMO).
      
      OK. Thanks for providing these insights! I think now I better understand these differences and context of MMO.
      
      - Wanderer says
        
        January 29, 2016 at 6:48 pm
        
        Please disregard my last comment. I just suddenly figured out that I can take any queue with any number of events and any amount of data from “shared” queue into private FSM queue with just single swap of pimpl’s. Looks like this idea is 7 years late, but it makes the process of “taking current queue” just a trivial task with a single mutex locked for 2 pointer assignments (or some other kind of lightweight sync).
      - "No Bugs" Hare says
        
        January 30, 2016 at 7:13 am
        
        What you’re suggesting would probably work for a Single-Writer-Single-Reader Queue, but IIRC, for Multiple-Writer-Single-Reader queues (and that’s what we generally need for FSMs) it is not as simple as a two-pointer swap. However: (a) even if using a mutex, the code under the lock is still small (and the smaller the code under the lock – the less contention you have); (b) it can be implemented in a completely lockless manner, based on a circular buffer, plus CAS primitive (a.k.a. LOCK CMPXCHG for x86 a.k.a. std::atomic_compare_exchange_strong for C++ a.k.a. InterlockedCompareExchange() for Windows). Implementing (b) properly is a Big Headache, but it needs to be done only once, and it has been done for example in boost::lockfree::queue (though in practice, you’ll additionally need some kind of waitForPop() function, which doesn’t seem to be provided by boost::lockfree::queue 🙁 ).
        
        I’m planning to write more on queues (specifically Multiple-Writer-Single-Reader ones) in Chapter [[TODO]] (currently Chapter XIV).
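The mutex-based variant discussed in this thread (a small critical section, a blocking wait instead of polling, and taking the whole batch with a single swap) can be sketched as follows. It is an illustration of the idea only, not the queue planned for the book; the class name `MwsrQueue` is hypothetical, and Python's `threading.Condition` stands in for std::condition_variable.

```python
import threading


class MwsrQueue:
    """Multiple-writer, single-reader queue with a blocking batch pop."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def push(self, item):
        with self._cond:  # the lock is held only for an append and a notify
            self._items.append(item)
            self._cond.notify()

    def pop_all(self):
        """Sleep until something arrives, then take everything in one swap."""
        with self._cond:
            while not self._items:
                self._cond.wait()  # no polling: the reader thread just sleeps
            batch, self._items = self._items, []
        return batch


def writer(q, writer_id, count):
    for i in range(count):
        q.push((writer_id, i))


queue = MwsrQueue()
writers = [threading.Thread(target=writer, args=(queue, w, 100)) for w in range(4)]
for t in writers:
    t.start()

received, batches = [], 0
while len(received) < 400:
    received.extend(queue.pop_all())
    batches += 1
for t in writers:
    t.join()
print(f"{len(received)} events in {batches} batches")
```

When the reader is idle it gets woken per event; when it falls behind, each wake-up returns a larger batch, so the batching under load appears on its own without any sleep-based polling.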
Jesper Nielsen says

April 18, 2016 at 11:01 am

Perhaps you could elaborate a little on how to scale simulation of a large world – both in terms of using several single-threaded FSM on a single server and distributing the world on several servers.
In particular – how to handle scaling of large contiguous zones if a single FSM – or even a single server -won’t cut it. I suppose the “share nothing” rule must be worked around in this case?

- "No Bugs" Hare says
  
  April 18, 2016 at 12:15 pm
  
  Good question. Yes, I didn’t answer it in “beta” chapters, but I will include it into “final” version of the book (Chapter III, protocols). Very shortly – the typical way of doing it is to split your game world into “zones” (with zones often having an overlap to account for objects moving near the border). It was described in “Seamless Servers: the case for and against” by Jason Beardsley (which is a part of “Massively Multiplayer Game Development” book published in 2003) and is still in use (it was recently mentioned, for example, in WoT presentation on GDC2016 (which should be on GDC Vault soon)).
  
  Hope it helps :-).
  
Carlos C says

August 25, 2016 at 10:06 am

Hello there Hare!

Thanks for making this book, I’m looking forward to its final version, it’ll be a great addition to my library. There’s tons of knowledge in here.

Could you give out a very basic example of DB FSM? I think I’m understanding it the wrong way.

From what I’ve understood, DB FSM(s) provide a finite number of states that DB API should build its logic upon. That is perfectly reasonable. But..

Wouldn’t that require a huge amount of states?
What about too many specific states? (worst case being one for every DB API function/method)

I’m worried about duplication but as I said I probably got something very wrong.

Thanks!

- "No Bugs" Hare says
  
  August 27, 2016 at 4:12 pm
  
  DB FSMs I’ve seen, were essentially stateless (except for app-level caching as their state – usually cache is read-only, but write caches are also possible).
  
  One simple example: game world sends a request to DB FSM asking to move artefact X from player Y to player Z (as artefact X was lost during fight, whatever-else). On DB FSM side, most of the checks (like “whether player Y has artefact X”, etc. etc.) can be done from the read-only app-level cache, but transaction itself can be committed to DB (or can be write-cached, if the artefact is not THAT important one, or transaction commit can be postponed for a few milliseconds to save on commits – and reply back to game world can be delayed until transaction is committed, to make sure that ACID properties stand despite postponed commit, or…).
  
  So, ‘DB FSM’ (at least as I’ve seen it) was pretty much a “thing which processes DB-related requests”, with its state usually being something along the lines above.
  
  Hope it helps a bit (if not – feel free to ask further questions :-)). Also some discussion on DBs and DB FSMs is planned for upcoming Chapter XVII.
  