Ultimate DB Heresy: Single Writing DB Connection. Part II. Gradual Scalability. All the way from no-scale to perfect-scale.

 
 
 

Multiple Connections vs Single Connection

#DDMoG, Vol. VI
[[This is Chapter 20(e) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help with improving it; for further details see “Book Beta Testing“. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

In the previous part of this two-part post, I’ve pushed the concept of single-write-DB-Connection architectures pretty hard; as I’ve argued, these are MUCH simpler (and MUCH less error-prone) than multi-connection ones, and one single-write-connection DB can provide performance of up to millions of write transactions per day. While this kind of performance doesn’t mean that we don’t need scalability, it can easily mean that we can at least start with a single-write-DB-Connection architecture, adding scalability when/if the need arises.

Real-World Story of Achieving Perfect Share-Nothing Scalability

Once upon a time, I saw a game which went through the whole process of achieving scalability over single-write-connection DBs, and, well, I DID like both the process and the end result. So, let’s take a look at how the story unfolded.

Stage 1. Just a Single-Write-Connection DB

At first, everything was extremely simple (and because of this simplicity – extremely robust, too). There was a DB Server which accepted requests from Game World Servers (and other Game Servers, such as the Cashier Server) and processed them serially, one by one. These requests were implemented along the lines which I’ve insisted on in the [[TODO]] section earlier:

  • DB Server requests were expressed in terms of Game World Servers (and without any SQL); all conversion from Game Logic terms into SQL was done within DB Server (a minimal sketch of such an API is shown right after this list).
  • DB Server requests were atomic (i.e. each of the requests corresponded to exactly one DB transaction).
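
To make the “expressed in terms of Game Logic, no SQL, one request = one ACID transaction” points a bit more concrete, here is a minimal sketch of what such a DB Server API might look like (all names below are mine and purely illustrative – this is NOT the interface of the real-world system being described):

```cpp
// Minimal sketch of a DB Server API expressed in Game Logic terms.
// All names are hypothetical; the key properties are (a) no SQL leaks out of
// DB Server, and (b) each request corresponds to exactly one ACID transaction.
#include <cstdint>
#include <string>

using UserId      = uint64_t;
using ItemId      = uint64_t;
using GameWorldId = uint32_t;

struct DbResult {
    bool        ok;
    std::string error;  // empty if ok
};

class IDbServerApi {
public:
    virtual ~IDbServerApi() = default;

    // Each request below is executed by the (single) writing DB connection
    // as one atomic transaction; Game Servers never see any SQL.
    virtual DbResult transferArtifact(UserId from, UserId to, ItemId item) = 0;
    virtual DbResult debitForTournamentEntry(UserId user, uint64_t amountInCents) = 0;
    virtual DbResult moveUserIntoGameWorld(UserId user, GameWorldId world) = 0;
};
```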

At this stage, several basic DB optimizations were used, including the following:

  • Prepared statements
    • An alternative would have been to use stored procedures, but with DB Server isolating the DB from Game Logic anyway, they weren’t a strict necessity; combined with a strong dislike of any vendor lock-ins (and going for stored procedures automatically means Absolute Vendor Lock-In), it was decided against stored procedures.
  • Indexes, more indexes, and even more indexes.
    • This included indexes over multiple fields, too. BTW, in the process of indexing, the myth that two single-field indexes are the same as one double-field index was experimentally proven dead wrong: for at least one request, a double-field index provided about a 100x improvement over two single-field indexes (no real surprise here, but I’ve seen people arguing against multi-field indexes with very serious faces; see the sketch right after this list).
  • Optimized DB physical layout and underlying hardware, with the most important items in this regard being the following:
    • Separate RAID-1 disk for DB logs
    • BBWC RAID card on the physical server box
    • RAID-1/RAID-10 for data (not a 100% requirement, but tends to help too)
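
To illustrate the prepared-statement and multi-field-index points from the list above, here is a small sketch; the table, column, and index names are hypothetical, and the ~100x figure is the one observed for one specific request in the case described, not a general promise:

```cpp
// Hypothetical schema: the same WHERE clause served either by one composite
// (multi-field) index, or by two single-field indexes.
#include <string_view>

// One composite index covering the whole predicate -- the planner can walk
// straight to the matching rows:
constexpr std::string_view kCompositeIndex =
    "CREATE INDEX ix_moves_user_time ON ARTIFACT_MOVES(user_id, move_time);";

// Two single-field indexes are NOT equivalent: at best the planner intersects
// them, at worst it picks one index and filters the rest row by row
// (in the case above, this was ~100x slower for at least one request):
constexpr std::string_view kTwoSingleFieldIndexes =
    "CREATE INDEX ix_moves_user ON ARTIFACT_MOVES(user_id);"
    "CREATE INDEX ix_moves_time ON ARTIFACT_MOVES(move_time);";

// The query both variants are trying to serve; as a prepared statement it is
// parsed/planned once and then executed many times with new parameter values:
constexpr std::string_view kPreparedQuery =
    "SELECT item_id, move_time FROM ARTIFACT_MOVES "
    "WHERE user_id = ? AND move_time > ?;";
```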

We’ll discuss DB optimizations in more detail in Vol. 3 (tentatively Chapter [[TODO]]).

As for “real-time reporting” for CSR (customer service representative, i.e. support) purposes (see the [[TODO]] section above for a better definition), these reports were run against the same DB (via separate connections using the “Read Uncommitted” transaction isolation level).

Overall, DB Server was a lean and mean transaction-processing machine. And it worked pretty well, processing up to 5–10 million transactions per day. Moreover, given the simplicity, it was bulletproof-reliable and 100% consistent (such things as lost money or other assets were unheard of).

By the end of this stage, the game was able to handle up to 50K simultaneous players.

Stage 2. App-Level Cache of USERS table

At some point, as the game gained popularity, and as part of a general effort to optimise the DB and DB Server, an app-level cache of the USERS table (or more precisely – of the Users read from the USERS table) was introduced. This one thing was observed to have a profound effect on DB Server performance; while some other tables were cached later too, it was the USERS table which justified the additional complexity necessary to implement caching.

One Good Thing(tm) about single-write-DB-connection architectures is that, with our connection having a monopoly on modifying the DB, we can ensure that the app-level cache is coherent at all times, merely by updating it each time we’re updating our USERS table. And as there are usually only a few SQL statements updating the USERS table, it is not a problem to update the app-level cache whenever any of these SQL-statements-updating-USERS-table is invoked.
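
A minimal sketch of how such an always-coherent app-level cache can look when there is exactly one writing connection (names are hypothetical, and the real USERS row obviously has many more fields):

```cpp
// Because the single writing connection is the ONLY writer, the in-process
// USERS cache can be updated in the very same code path which issues the
// UPDATE, so it stays coherent by construction (no invalidation protocol needed).
#include <cstdint>
#include <optional>
#include <unordered_map>

struct User {
    uint64_t userId;
    int64_t  balanceInCents;
    // ...other frequently read USERS fields...
};

class UsersCache {
public:
    std::optional<User> get(uint64_t userId) const {
        auto it = cache_.find(userId);
        if (it == cache_.end()) return std::nullopt;  // fall back to a SELECT
        return it->second;
    }

    // Called right after each successful SQL statement which modifies USERS;
    // as there are only a few such statements, covering all of them is easy.
    void onUserModified(const User& updated) { cache_[updated.userId] = updated; }

private:
    std::unordered_map<uint64_t, User> cache_;  // single-threaded: no locks needed
};
```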

With app-level caches added, DB worked even better, and was able to handle up to 30M-50M write transactions a day. Sure, getting there did require quite a bit of work – but on the other hand, any real-world DB handling this much requires quite a bit of work to get it running smoothly.

However, at some point it was observed that, as the CSR-oriented real-time reports became more and more complicated (and the DB became larger and larger, in part due to historical data being accumulated), some of those reports started to cause table scans. In turn, these table scans sometimes caused the DB to drop some other data from the DB caches;1 this, in turn, caused observable slowdowns of the time-critical operational processing by DB Server.


1 Not app-level caches, of course

 

This has eventually led to…

Stage 3. Reporting Replica

To prevent heavy CSR-oriented real-time reports from hitting the operational DB, it was decided to separate the reporting DB from the operational/OLTP DB completely. Such separation was intended to provide several all-important benefits (and turned out to provide these benefits too 🙂 ):

  • Once and for all resolve the issue of reports affecting operational performance
  • Allow for more heavy reports to be written (and to speed up report development too)
  • Ability to reduce the size of historical data in operational DB, by moving “old” historical data to the replica(s) (and removing it from the operational DB)
  • Ability to run “newer greater” version of RDBMS on replicas, to see how often they tend to fail (and to find out when they become good enough to move them into production) ;-(

To achieve these goals, the replica should be an asynchronous replica, one with eventual consistency. It had been expected (and later confirmed in practice) that delays of up to 1–2 minutes are completely unnoticeable for CSR-oriented real-time reporting purposes; so, as long as the delays stayed within these bounds, they were not causing any problems.

Unfortunately, at that time, the built-in DB replication of the RDBMS in use wasn’t able to cope with the required kind of DB load (failing after only a few days of operation with some obscure error; moreover, each such failure required a complete re-population of the replica 🙁 ). As a result, it was decided to implement DIY replication (I really hope that DIY replication is no longer needed these days, but at that time, it was the best solution available; still, we will discuss DIY replication in Vol. 3, tentatively Chapter [[TODO]]). Fortunately, with a single-write-DB-connection, DIY replication is perfectly viable :-).

In a sense, read-only replicas can be seen as incarnations of the CQRS (=”Command and Query Responsibility Segregation”) pattern (see, for example, [MSDN.CQRS] and [Richardson.CQRS] for discussion of CQRS). This is especially true for the DIY replication we’re going to describe in Vol. 3 (which effectively follows an Event Sourcing pattern for the replica DB); on the other hand, the viability of the replica doesn’t depend much on Event Sourcing being used to implement it, and any asynchronous replica will do.
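
Just to give a feel of what such a DIY, Event-Sourcing-style replication might look like, here is a very rough sketch (this is NOT necessarily the mechanism that will be described in Vol. 3; all names are hypothetical):

```cpp
// Rough sketch: the single writing connection appends every executed modifying
// request ("event") to a log table as part of the same transaction; a separate
// applier replays the log into the replica DB in order, giving eventual
// consistency with a delay equal to the replay lag.
#include <cstdint>
#include <string>
#include <vector>

struct ReplicationEvent {
    uint64_t    seqNo;    // strictly increasing, assigned by the writer
    std::string payload;  // serialized request, e.g. "transferArtifact(...)"
};

class ReplicaApplier {
public:
    // Called periodically with events read from the master's log table in
    // seqNo order; lastApplied_ makes re-delivery after a crash harmless.
    void apply(const std::vector<ReplicationEvent>& batch) {
        for (const auto& ev : batch) {
            if (ev.seqNo <= lastApplied_) continue;  // already applied
            // ...execute the corresponding statement(s) against the replica DB,
            //    updating lastApplied_ within the same replica transaction...
            lastApplied_ = ev.seqNo;
        }
    }

private:
    uint64_t lastApplied_ = 0;  // persisted in the replica DB
};
```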

Coming back to the real-world case at hand: after implementing the read-only replica, all the goals stated above were achieved. In practice, however, there were two instances of reporting replicas running in parallel most of the time; if one was severely delayed (usually by an ultra-heavy report – and such reports, of course, mushroomed as soon as replicas were allowed to be much more liberal with regards to DB load), it was possible to switch to the second one easily. Another reason for having two replicas was that the RDBMS was observed to crash MUCH more frequently on reporting DBs than on the OLTP one; the reasons behind this disparity were two-fold: (a) the inherently more complicated nature of RDBMS work on the reporting DB; (b) using the latest-greatest version of the RDBMS on replicas (and the latest-greatest stuff tends to have MANY more bugs than the very same DBMS with SP1 or with FixPack 3).

With the reporting replica, the system was able to handle about the same 30M–50M transactions per day – but without any risk of occasional slowdowns due to heavy reports. At this point, the game was handling hundreds of thousands of simultaneous players – and all the DB transactions were still handled by one single DB writing connection.

However, as the game continued to grow, the system came to the point where the capabilities of the single writing connection were finally exhausted.

[[TODO: Stage 3a. Truncating OLTP history]]

Stage 4 (Kinda Failed). Trying Multi-Write-Connection

At this point, an attempt was made to make a (limited) switch to the multi-connection architecture. However, significant problems were observed along the way:

  • It was difficult to provide any guarantees for data consistency; as a result, achieving concurrency safety – while possible – took LOTS of effort.
    • Granted, some of these difficulties might be attributed to the team having a long-standing culture of inherently concurrency-safe single-write connections. Still, it is obvious that the complexity of concurrency-safe development in the presence of multiple modifying connections is MUCH higher than that of inherently concurrency-safe single-write connections.
  • Much more importantly, observed results were VERY far from linear scalability 🙁 . In other words, when the number of connections/cores was doubled, the improvement was anywhere between 1x (i.e. no improvement at all) and 1.7x (which, while being MUCH better, was still an indication of severe contention somewhere within the DB).
    • Moreover, while some efforts to speed the multi-connection system up were successful, all the things which helped to achieve improvements in this regard were highly irregular; in other words, you never knew whether any specific attempt to improve performance in a multi-write-connection environment would do any better 🙁 . It was a striking contrast with the single-writing-connection approach, where the effects of any change were very well predictable in advance.

I am not saying that it wasn’t possible to make a workable system out of the multi-connection approach. However, the symptoms of significantly-worse-than-linear behaviour were very annoying, and the risks of working without any idea of what could be achieved by using the multi-connection approach were deemed unacceptable.

Instead, it was decided to take a completely different route – the one of Share-Nothing DBs. One of the big advantages of Share-Nothing architectures is that there are no hidden contentions within the DB; therefore, the behaviour of the system (both when load grows and in response to any changes) remains highly predictable. In particular, if we scale a Share-Nothing system (after we’ve already accounted for the costs of all necessary separations to make it Share-Nothing), it will exhibit perfect linear scalability: doubling the number of servers exactly doubles the number of transactions we can process.

Stage 5. Separating Game Worlds and Tournaments

Pre-requisite: Inter-DB Transfers

As a prerequisite to separating the DB into several Share-Nothing DBs, there was a need to develop a way to transfer arbitrary data between DBs – and with data consistency guarantees too.

One example of when such transfers are necessary arises whenever a player wants to start playing within a certain Game World (taking some of his inventory with him). In this case, the data about the inventory taken needs to be moved to the Game World DB (and in a perfectly consistent manner too – it wasn’t acceptable to have any losses/gains along the way).

Asynchronous Inter-DB Transfer

This was implemented via the Asynchronous Inter-DB Transfer protocol described in Chapter III. The protocol allows taking some asset from a source table in DB A (manipulated by DB Server A) and transferring it to a target table in DB B (manipulated by DB Server B). It might seem trivial, but the problem here is how to provide guarantees against the disappearance of the asset or the creation of extra copies of it – and without using cross-DB transactions (as cross-DB transactions would mean that the architecture is no longer Share-Nothing). To reiterate the Asynchronous Inter-DB Transfer protocol briefly:

  • DB Server A makes an ACID transaction over DB A, with the transaction consisting of:
    • Taking the asset out of the appropriate source table
    • Making a record in the “outgoing messages” table, with the record specifying the nature of the transfer and an Outgoing-Message-ID
  • There is a communication protocol which guarantees that each of the “outgoing messages” is delivered from DB Server A to DB Server B at least once.
  • DB Server B receives the “outgoing message”, and:
    • Checks in DB B that the message with this Outgoing-Message-ID has NOT been processed yet; if it hasn’t, DB Server B makes an ACID transaction over DB B, with the transaction consisting of:
      • Recording that this Outgoing-Message-ID has been processed
      • Putting the asset into the appropriate target table

This protocol can be seen as conceptually similar to Event-Driven Architecture as described in [Fowler], [Richardson.DatabasePerService], and [Richardson.EventDrivenArchitecture] with [Richardson.ApplicationEvents]. More importantly, the protocol provides a strict guarantee that eventually – whatever happens (including crashes of DB Servers and DBs, but excluding DB corruption during crashes) – the assets will be delivered to DB B, and exactly once. For a more detailed description of the protocol, please refer to Chapter III.
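
To make the protocol a bit more tangible, here is a sketch of the two sides (the Txn wrapper, table names, and SQL are all hypothetical; for the real thing, refer to Chapter III):

```cpp
// Sketch of both sides of the Async Inter-DB Transfer protocol. Each side is
// exactly one ACID transaction on its own DB, and the PROCESSED_MESSAGES check
// turns at-least-once delivery into an exactly-once effect.
#include <cstdint>
#include <string>

struct Txn {  // stand-in for whatever DB access layer you use
    void exec(const std::string& /*sql*/) { /* run SQL via your driver */ }
    bool exists(const std::string& /*sql*/) { return false; /* SELECT 1 ... */ }
    void commit() {}
};

struct OutgoingMessage {
    uint64_t    messageId;     // Outgoing-Message-ID, unique per source DB
    std::string assetPayload;  // serialized description of the asset
};

// DB Server A: take the asset out AND record the outgoing message atomically.
void sendAsset(Txn& txnA, const OutgoingMessage& msg, uint64_t itemId) {
    txnA.exec("DELETE FROM USER_ITEMS WHERE item_id = " + std::to_string(itemId));
    txnA.exec("INSERT INTO OUTGOING_MESSAGES(message_id, payload) VALUES(" +
              std::to_string(msg.messageId) + ", ...)");
    txnA.commit();
    // From here on, a delivery layer guarantees at-least-once delivery to B.
}

// DB Server B: apply the message at most once (duplicates are simply ignored).
void receiveAsset(Txn& txnB, const OutgoingMessage& msg) {
    if (txnB.exists("SELECT 1 FROM PROCESSED_MESSAGES WHERE message_id = " +
                    std::to_string(msg.messageId))) {
        return;  // duplicate delivery: do nothing, but still ACK the sender
    }
    txnB.exec("INSERT INTO PROCESSED_MESSAGES(message_id) VALUES(" +
              std::to_string(msg.messageId) + ")");
    txnB.exec("INSERT INTO USER_ITEMS(item_id, owner, ...) VALUES(...)");
    txnB.commit();
}
```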

Alternative: Two-Phase Commit

In theory, it might have been possible to use two-phase commit instead of Async Inter-DB Transfers. Two-phase commit (for example, via the standard XA protocol) would provide BETTER guarantees than the ones provided by the Async Inter-DB Transfer protocol described above (two-phase commit guarantees ACID, while async exchanges can only guarantee BASE). On the other hand, these better guarantees don’t come for free:

  • With two-phase commits (and regardless of single or multiple DB connections), we will inevitably need to hold some kind of locks in one DB while waiting for the other one. As long as we have only a very few of these two-phase commits/locks, the system will continue to operate just fine; however, as the number of distributed locks grows, we can easily run into “distributed bottlenecks” and/or other issues which can adversely affect scalability (and dreadfully so) 🙁 .
  • Implementing two-phase commits with single-connection DBs – and without blocking the whole thing until the two-phase transaction is committed on both sides – is tricky (though possible; feel free to ask me if you’re interested ;-)).

Overall – feel free to try two-phase commit instead of Async Inter-DB Transfer, but don’t blame me if it doesn’t really scale well ;-(. On the other hand, if you’re mostly working with async transfers, having only a few (and rarely occurring) two-phase commits – it MIGHT work for you.

Actual Separation

As soon as we have this Asynchronous Inter-DB Transfer protocol in place, further separation is simple. Just take all the Game Worlds (which are almost universally very loosely coupled to anything else) and move them into a separate DB; whenever a player needs to go into a Game World, use the Asynchronous Inter-DB Transfer protocol to transfer the relevant player’s assets there. Whenever there is a need to go back, use Asynchronous Transfer to move the player’s assets back to the original table within the ‘main’ DB.

And as soon as the Game Worlds sit in a separate DB, it becomes quite easy to spread them over several separate DBs. Moreover, exactly the same process works for Tournaments, too. It means that at this point, the system becomes perfectly scalable with regards to Game Worlds and Tournaments.

Let’s also note that at this stage, the system became very close to the DB-per-Service pattern as described in [Fowler] and [Richardson.DatabasePerService] (with Event-Driven Architecture [Richardson.EventDrivenArchitecture] and Application Publishing Events [Richardson.ApplicationEvents] on the side): we have several DBs, each providing one specific service (serving Game World, Tournament, or Player-related requests) – and exchanging application-generated Events (represented by the “outgoing messages” of our Async Inter-DB Transfer protocol) between them.

Stage 6. Scaling USERS – and achieving Perfect Scalability

By the time all the Game Worlds and Tournaments were separated, the “main” DB only contained records related to players (in the case of the real-world scenario under discussion, it was the USERS table). Moreover, with the separation implemented, all of the inter-player interactions had been taken out of the “main” DB.

And as soon as the “main” DB no longer processed any inter-player interactions, it became easily shardable on a per-player basis. In other words, it became possible to split the “main DB” into several DBs, each having its own USERS table and hosting a different bunch of USERS (plus rows from tables associated with USERS).

In fact, this represents the well-known “horizontal partitioning” technique which is widely used among DB developers across many industries. In the real-world case we’re discussing now, horizontal partitioning was implemented on top of single-write-connection DBs. USERS were split (as is customary for “horizontal partitioning”) using hashes of userIDs.
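
A minimal sketch of such per-player routing might look as follows (names are hypothetical; the hash function and number of shards are whatever your system settles on, and each shard is still served by its own single-writing-connection DB Server):

```cpp
// Route each userID to one of N "main DB" shards via a hash of the userID.
#include <cstddef>
#include <cstdint>
#include <functional>

class UserShardRouter {
public:
    explicit UserShardRouter(std::size_t shardCount) : shardCount_(shardCount) {}

    std::size_t shardFor(uint64_t userId) const {
        // NB: changing shardCount_ later means rehashing/moving users between
        // shards, which (as noted above) is doable but time-consuming.
        return std::hash<uint64_t>{}(userId) % shardCount_;
    }

private:
    std::size_t shardCount_;
};

// Usage (sketch): dbServers[router.shardFor(userId)]->transferArtifact(...);
```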

And after this split of USERS into several partitions, the system obtained perfectly linear scalability. If USERS become a bottleneck, another server can be added relatively easily (although rehashing the whole DB can be tricky and time-consuming, it is perfectly doable).

And, if the Game Worlds and/or Tournaments become a bottleneck, introducing a new server is even simpler: usually there is no need to rehash, and all the necessary changes can be made fairly easily by adjusting the matchmaking process.

Observed Caveats

Of course, each solution aiming for real-world scalability has its own set of caveats. The DB-per-Service approach (and single-write-connection DBs) is no exception.

[[TODO: sword transfer within Game World without Game World DB: including specific target slot (with inter-message DB having info about the target slot); most importantly – there MUST be only one SINGLE request to DB for transfer (NOT two separate requests, as it can violate BASE!)]]

Requests Spanning Multiple DBs

The first caveat with the architecture discussed above concerns requests which span multiple DBs. However, as soon as we notice that in most (if not all) cases only read-only requests will need to span multiple DBs, we realize that there are at least two solutions for these:

  • Traditional multi-DB read-only requests, treating our separate DBs as one federated DB (though ONLY for the purposes of reading, not writing(!)). While federated DBs usually use two-phase commits (and as noted above, I don’t feel that two-phase commit is a good thing for scalability), in the case of read-only requests this does not really apply; if we’re issuing read-only requests only over Read Uncommitted connections, I expect it to be OK. NB: I didn’t try this on a heavily loaded DB myself, so take it with a grain of salt.
  • Replication from multiple operational DBs into one single reporting one. Note that in this case, there can be no collisions of the data coming to the replica from different DBs (by design), so it is not really a master-master replica, but rather a special case of “merge” replication without any conflicts. This approach does work for sure (and works as a charm too), at least with the DIY replication I’ve mentioned above. If you don’t want to go DIY for replicas, then whether your RDBMS supports this kind of “merge” replica is up to you to figure out ;-).

Consistent Distributed Backups

In any Share-Nothing DB system with DBs being completely independent, achieving a backup which is consistent across all the DBs involved becomes a headache 🙁 . Still, at least with single-write-DB-connections, such backups are perfectly doable, and we’ll discuss them in Vol. 3 (tentatively Chapter [[TODO]]). For now, let’s just note that in the most general case, such “distributed backups” require “online backup” functionality from the underlying DB – and with “roll-forward-to-point-in-time” functionality too; fortunately, most serious DBs out there have supported these commonly expected features since time immemorial :-). One caveat of such distributed backups (at least the way we’ll discuss them in Chapter [[TODO]]) is that at some point, they require stopping the processing in all of the DBs for a certain time (normally, dozens of milliseconds); fortunately, for DB Servers, this kind of delay is rarely a problem.

Generalization to Other Games

By now, we’ve discussed one successful example of single-DB-connection architectures, scaled to a hundred million transactions per day (and probably beyond too; however, you can never be sure about “beyond” until you try).

The next question we have is the following: am I sure that it is not a one-off case which is especially suitable for this kind of processing? IMO (and this opinion is based on some experience too), it is not. In other words:

My rather-educated guess is that this approach will work for most of the games out there.

I’ve honestly thought about all the systems and business/game logics I know about – and asked fellow game developers too, and didn’t find any game where the model above wouldn’t work. Of course, this doesn’t mean too much (and of course, your own case can be very different); but well, this is the best possible answer I can provide without taking a look at your specific game :-) .

One potential concern MAY apply to stock exchanges (those guys may have extremely convoluted rules); still, the stock exchange which I was working with was perfectly suitable for this model (with each of the securities traded considered a separate Game World; it will work as long as there are no synchronous interactions between different securities traded).2


2 While the stock exchange that I’ve co-architected didn’t really need partitioning at that point (working perfectly fine over a single DB connection), after thinking about its logic, I’m quite sure that it could be scaled along the lines above.

 

Batch Processing

One special case arises when your game (usually a bank or a lottery) needs some kind of batch processing. Most of the time, batch processing has the following properties:

  • It makes LOTS of modifications.
  • It should be done “as if” it is done over a snapshot in time.
  • Also, as a rule of thumb, it is easy to separate batch processing into bunches of statements which cannot interact by design. When we need to calculate something over a million rows, usually the calculations are rather similar for all the rows involved – and usually are more or less on a per-row basis too.

With this in mind, it is often possible to play the following game (pun intended):

  • Stop all the processing.
    • If there are any new incoming requests, they can be delayed (for example, recorded in memory or in a separate DB), to be applied after the batch is processed.
  • Run batch processing.
    • It can be run in parallel – simply by separating statements into non-interacting groups, and running these non-interacting groups over different DB connections (see the sketch right after this list). While this still requires dealing with concurrency, it is MUCH easier (IMO, orders of magnitude easier) than ensuring that any one statement cannot interact with any other statement.
  • Process all the delayed requests.
  • Bingo! We’ve processed the whole batch – in minimal time, and with very limited effort to ensure concurrency safety.
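
A sketch of how this “game” might look in code (names are hypothetical; the important part is that the groups cannot interact by construction, so running them over separate connections needs no app-level locking):

```cpp
// Stop normal processing, run non-interacting batch groups in parallel over
// separate DB connections, then replay the requests delayed in the meantime.
#include <functional>
#include <thread>
#include <vector>

struct BatchGroup {
    // Statements within one group may touch the same rows; statements from
    // DIFFERENT groups are guaranteed not to interact (e.g. split by userID range).
    std::function<void()> run;  // e.g. "recalculate interest for userIDs in [lo, hi)"
};

void runBatch(std::vector<BatchGroup> groups,
              const std::function<void()>& stopNormalProcessing,
              const std::function<void()>& replayDelayedRequests) {
    stopNormalProcessing();      // new incoming requests are queued, not executed

    std::vector<std::thread> workers;
    for (auto& g : groups)       // one DB connection per group, one thread per group
        workers.emplace_back(g.run);
    for (auto& w : workers)
        w.join();

    replayDelayedRequests();     // apply everything queued while the batch ran
}
```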

Summary/My Personal Advice

When developing a DB for your game, I usually suggest the following:

  • As discussed in several places above, DO separate your DB Server from the rest of your game (in particular, from Game Servers) via a very well-defined DB Server API
    • This API MUST NOT contain any SQL; instead, it MUST be expressed in terms of your Game Logic (such as “transfer artifact A from player P1 to player P2”).
    • EACH request within the DB Server API MUST correspond to exactly one ACID transaction. In other words, there MUST NOT be any situations when a transaction is left incomplete after DB Server is done with the request.
  • After doing so, DB Server becomes an implementation detail (!). It means that it can be implemented in whatever way you want. On the other hand, as one (and my own preferred) way to skin this cat, I suggest considering the following development path (with each additional step implemented only when you start feeling that you might have performance problems without it):
    • Start with a simplistic single-write-connection DB, with reporting running off the same DB (reporting using the Read-Uncommitted isolation level)
    • Optimize indexes and the physical layout
    • Add app-level caching (most importantly, of USERS/PLAYERS table)
      • NB: at this point, our system is high-performing but doesn’t scale (yet)
    • Add replica DB(s) for reporting
      • NB: at this point, we’ve got perfect read scalability (at least for reporting purposes)
    • Implement inter-DB asynchronous transfers
    • Separate Game Worlds (and if applicable, Tournaments) based on Inter-DB Async Transfers protocol
    • Optionally, separate Payments DB (again, using Async Inter-DB Transfers)
    • Horizontally partition USERS/PLAYERS table
      • Enjoy your perfectly linear scalability! :-) (both read and write)
  • That being said, there are other ways to implement scalable systems, BUT they tend to require DB expertise with real-world highly-loaded systems, and you’re not too likely to find people with such expertise :-( .

How exactly these steps should be implemented is a subject for several separate (and rather lengthy) discussions, and we’ll describe some ways of doing it in Vol. 3 (tentatively Chapter [[TODO]]). What’s important for the time being is that you can start with a single-write-DB-connection, adding scalability later as the need arises.

[[To Be Continued…

This concludes beta Chapter 20(e) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)”. Stay tuned for beta Chapter 20(f), where we’ll discuss the choice of the RDBMS for your transactional/operational DB.]]


References

[Fowler] Martin Fowler, “Microservices”

[Richardson.DatabasePerService] Chris Richardson, “Pattern: Database per service”

[Richardson.EventDrivenArchitecture] Chris Richardson, “Pattern: Event-driven architecture”

[Richardson.ApplicationEvents] Chris Richardson, “Pattern: Application publishes events”

[MSDN.CQRS] “Command and Query Responsibility Segregation (CQRS) Pattern”, MSDN

[Richardson.CQRS] Chris Richardson, “Pattern: Command Query Responsibility Segregation (CQRS)”

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.


Comments

  1. Jesper Nielsen says

    Imagine this: You have player A stored in one database, and player B stored in another.
    Player A sells a sword to Player B.
    The sensible thing now is to save both players to avoid creating a duping loophole.
    This means we now have a distributed transaction.

    • "No Bugs" Hare says

      > This means we now have a distributed transaction.

      Well, yes – but this distributed transaction can be implemented either via traditional XA/two-phase-commit, or via Async Inter-DB Transfer protocol (which guarantees to deliver the sword from one DB to another one – and without blocking any of DBs too :-)). For details of Async Inter-DB Transfer – see Stage 5 in the OP.

      As for exact way of avoiding loopholes in “selling” scenarios – it can be implemented as (a) async transfer of both items to be traded, from respective “user DBs” into a “trading” DB, (b) making an ACID transaction of trading within the “trading” DB, and (c) transferring the results of the trade back to “user DBs”. Everything is bulletproof and without contention/blocks too (at the cost of the distributed transaction as a whole being BASE rather than ACID – but I don’t see how it can become an issue in this scenario).

      • Jesper Nielsen says

        I will argue that just transferring the sword won’t cut it, because the sword typically has to go somewhere.
        Take WoW as an example – the sword goes to a location in the player’s inventory. This creates a need to save the entire player inventory because the version known by the database might have other items blocking that location.

        • "No Bugs" Hare says

          THANKS for the question! It highlights an important point – and I will need to discuss it in the book.

          The answer goes along the following lines. There are basically two options.

          Option 1 (preferred). To (a) transfer the players with their whole inventories into Game World DB when they enter Game World, then (b) transfer the artefact within Game World DB (as an ACID intra-DB transaction), and (c) transfer players back to their respective USER DBs with their inventories when they’re leaving the Game World. All transfers can easily be Async Inter-DB Transfers.

          Option 2 (I don’t like it, but it will work too). All the Game Logic (including “the slot where the sword will go”) will be validated by the Game World; most of the time Game Worlds are living their own lives and are just writing to DB the most important points (such as a transfer of sword-costing-$20k-on-eBay).

          When such a transfer happens, Game World Server has already validated the movement – and the only thing it needs to do to save it to DB – is to tell DB-which-hosts-player-A to “send sword to player B with a note to place it into slot X”. Then – the “outgoing message” will contain not only sword, but also target slot (and on receipt this slot is guaranteed to be free as the Game Logic has already checked it on app level). While in theory there can be temporary collisions within the slots due to inter-DB delays – they’re temporary by definition and are solvable via creating time-based “as of” marks for the artifacts (with timestamps set by the Game World).

          • Jesper Nielsen says

            I also like option 1 better – especially since it can be implemented at a later time as an extension of the simple case with a single Game World DB that is also the USER DB.
            What I like best about it though is that it doesn’t matter here how the player inventory is stored – serialized in a single varbinary/varchar/blob or normalized into hundreds of rows per player in Item, ItemModifier tables etc.

          • "No Bugs" Hare says

            > it can be implemented at a later time as an extension of the simple case with a single Game World DB that is also the USER DB.

            Yep :-).

            > What I like best about it though is that it doesn’t matter here how the player inventory is stored – serialized in a single varbinary/varchar/blob or normalized into hundreds of rows per player in Item, ItemModifier tables etc.

            Yes, decoupling here is great (and decoupling helps a LOT for projects which have more than 2 developers). Overall, decoupling is one big reason why I LOVE Share-Nothing reactors and async exchanges (whether in-memory or DB); performance and scalability of these are great too – but personally my favourite feature of Reactor-style Share-Nothing systems is that they tend to provide extremely good decoupling.

  2. Marcos says

    Hi, I have very little experience with DBs, and some grey areas regarding ‘Requests Spanning Multiple DBs’.

    ‘Replication from multiple operational DBs into one single reporting one’

    It seems that this may present additional issues, since the two replications may be out of sync with each other.

    When sending itemX from dbA to dbB, if replication of dbA gets behind replication of dbB, the single reporting db will see it arriving at dbB before leaving dbA, so it will show duplicate.
    While this seems a solvable issue, I can’t see if it is a hard one or an easy one.

    Also, will a single database be able to handle replication from two heavily loaded DBs? The reason we split the DB in the first place was because a single DB won’t handle the load.

    • "No Bugs" Hare says

      > When sending itemX from dbA to dbB, if replication of dbA gets behind replication of dbB, the single reporting db will see it arriving at dbB before leaving dbA, so it will show duplicate.

      It may indeed happen; however: (a) it won’t cause any formal conflicts at table level (the data will be either in different tables, or – if in the same table – separated by DBID). (b) as reporting/analytics is 99% about historical data (more on it in Chapter XVII(g)) – it doesn’t really matter much in practice. Mechanisms for this “doesn’t matter much” differ, but all are leading to the same end result :-). Example 1: very popular family of CSR/support reporting requests goes along the lines of “show me all the artifact movements on USERID account in last 2 months” – and as (as described below) each specific USER goes from the same DB, it will be pretty much consistent by itself (it may lag a bit, but consistency is not a problem; and as for the lag – as long as it is comparable to human perception, it is not a problem either – as the human reading it doesn’t really know when exactly she pressed the button compared to the events within the DB). Example 2: very typical analytical request: “give me per-country stats on player spending after we started this promotion” (ok, it will be much more complicated than that – but the idea will be still pretty much the same); these requests tend to be very resilient to any minor inconsistencies because they’re all about statistics (and because law of large numbers will level any point-in-time inconsistencies); however – if formal correctness is required, in 99% of the cases such analytics can simply ignore all the data after (now – max_replica_lag_time), making the history perfectly consistent. As a result, while there is a small percentage of reports which do need “current snapshots” – they’re really few and far between (IIRC, for a rather large system with 500+ reports, only 2 or 3 needed consistency of current snapshots).

      So – yes, pretty much any async replica (merged or not) is not a good place to get consistent snapshots “as of now” (formally – because it has only BASE guarantees rather than ACID guarantees); on the other hand – BASE (eventual consistency) guarantees in this “merged replica” still stand (and without additional efforts too), so historical reporting and analytics can run there without problems.

      Hope it helps 🙂

  3. Sebastiano Pilla says

    What about the availability of your single DB Server app instance?

    If it goes down, or if it’s unreachable from the other instances, or if you simply have to perform an upgrade, then you cannot write anymore to the database.

    How do you address this point?

    • "No Bugs" Hare says

      Good question – but I have a good answer to it too :-). Overall, this discussion belongs to Vol. 3, but I will give a few very brief hints here.

      First of all, when speaking of failures, we don’t really care about SPOFs a.k.a. single points of failure per se – rather, the ONLY thing we EVER care about with regards to availability etc. is MTBF. As a result (and supported by quite a bit of both theoretical and real-world stats, such as the MTBF of a good 4S/4U server being of the order of 5–10 years), we will find that for quite a few games out there (that is, if the number of OLTP DB servers is not too high – and to run a 1M-player game one such server is usually more than enough), it will be perfectly possible to run the game without any additional availability features – and just on one single DB server box.

      Moreover, adding fault-tolerance mechanisms can easily REDUCE MTBF (if we take into account that all such mechanisms are inevitably SPOFs themselves – and way too often poorly designed at that – their own MTBFs tend to be significantly lower than those 5–10 years, which means that the supposedly fault-tolerant system will fail more frequently than a non-fault-tolerant one 🙁 ). I have quite a bit of stats, analysis, and real-world stories to support this claim ;-( . As a result, I contend that for most of the games out there, no fault tolerance is likely to be better than having faulty fault tolerance.

      However, IF such a “natural” MTBF (resulting from having no fault tolerance at all, and relying on the MTBFs of higher-quality commodity servers instead) is not good enough – it is perfectly possible to build a fault-tolerant instance of a single-writing-connection DB Server. In particular, it can be done by relying on its determinism. As soon as our DB Server is deterministic (and with a single writing connection this is easily achievable) – we can, for example, have a proxy (whose own fault tolerance can be achieved using other means, such as VM fault tolerance or the determinism-based fault tolerance discussed in 2nd beta/Chapter V) sending requests to two single-writing-DB-connection DB Servers sitting behind it – and returning the value only after BOTH of them reply. Then all the usual stuff (failure detection, switching to non-redundant mode, etc.) can be done within the proxy. In an extreme case, such a proxy can even do majority voting among 3+ DB Servers behind it – though I haven’t seen a game where such voting is necessary (IMO – not even a bank or stock exchange, but as always, YMMV). IIRC, further improvements (by combining fault tolerance of the proxy and DB Servers together) are possible – but the scheme above is a perfectly good “proof of concept” (and is relatively difficult to misimplement too :-)).
