NoSQL vs SQL for MOGs - IT Hare on Soft.ware

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[[This is Chapter 20(c) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

In recent years, as an alternative to traditional SQL-based databases, so-called NoSQL has gained a lot of popularity. In particular, in certain circles it is touted as The Solution for Everything Out There, including games. However, before starting to analyze whether it is the case, we need to understand what NoSQL is all about. As usual, we won’t go into too much details here – our job is to compare things and allow to select approaches, leaving further details to the other sources.

NoSQL vs SQL

There are lots of NoSQL databases out there – and unlike SQL DBs (which tend to provide functionality which is relatively similar for all the SQL DBs), NoSQL ones tend to be very different. Still, I will try to outline two properties which I think differentiate NoSQL from SQL the most.

Rigidly Structured vs Semi-Structured vs Unstructured

The first difference between SQL and NoSQL DBs is about how the data in the databases is structured. This is actually the difference which has made it to the name of ‘NoSQL’ 🙂 . [[TODO: rename Semi-Structured NoSQL into Structured NoSQL – though it is still “optionally structured” compared to SQL]].

SQL – Rigidly Structured Data

Traditional SQL databases have a very rigid structure – everything needs to be fit into a table, and the table is always a 2D structure with rows and pre-defined columns.

SQL – Cost of Changing the Structure

Changing this SQL-enforced structure, while possible, is rather a headache.

“in real world, after deployment, most of the changes in DB structure are about widening columns and adding the new onesOn the other hand, in real world, after deployment, most of the changes in DB structure are about widening columns and adding the new ones. And, contrary to popular beliefs (coming from certain popular SQL DB which I don’t want to name here – and from marketing materials of certain not-so-scrupulous commercial NoSQL vendors), there are SQL DBs out there which allow to add and/or widen the column instantly (i.e. without rewriting the whole table); this is done just by changing a table description, with DBMS adding default field values as it reads the “old” rows, automagically.

This feature (which is a de-facto standard for enterprise-level DBs) makes database administration and maintenance MUCH easier. So, if you’re tired of stopping your whole game site for half a day to run your ALTER TABLE ADD COLUMN statement – keep in mind that this atrociously long wait is not a necessary property of SQL DB, but merely a problem of one sub-optimal implementation (which is apparently poorly suited for serious production environments, in spite of everybody and their dog using it); in production-oriented RDBMS systems, ADD COLUMN can be usually done even under rather heavy load (thousands writing transactions per second) without affecting performance in an any visible manner (we won’t count a 10-ms lock on DB catalog as a significant delay for DB).

On the other hand, I will not say that SQL’s rigid structure is not without its maintenance costs – it is just that these costs can be made significantly lower by choosing production-friendly RDBMS.

SQL – All Kinds of Queries, but Beware Performance Caveats

One strong point of SQL is ability to search for “whatever you want” in the table; on the other hand, one needs to keep in mind that while this stands – searching for “whatever you want” on a table with a large number of rows (we’ll discuss exact numbers just a bit later) will be veeeeerrrryyyy ssssllllooooowww unless you have indexes.

SQL – Indexes

“Without indexes – your database will become slow as soon as it goes above, say, 100'000 to 1'000'000 rows.One advantage of having some level of structure, is ability to have indexes; while indexes are often ignored during SQL development (usually – under a false premise of “indexes is a job for DBA, not DB developer”), they play an enormous role in DB performance, so at the very least DB developer (the one writing SQL statements) should think of “which indexes I will need to create for this thing to work fast when there is a billion rows in this table” (otherwise he risks ending up in a situation when such indexes don’t exist at all). Without indexes – your queries will become unacceptable for operational/transactional DBs (where usually you need to complete your query within 1-10ms) as soon as your tables reach ~100K rows (give or take an order of magnitude). For “reporting” DBs, requirements are usually more relaxed (acceptable times are generally within single-digit seconds), so the threshold of “how large the table can be before it requires indexes” is higher – more like ~10M rows (again, give or take an order of magnitude).

For modern SQL DBs, you can have pretty much any index you want (in recent years, RDBMS added even long-overdue function indexes, though I didn’t have a chance to try them myself in a serious production yet) – and they help A LOT to optimize performance of your DB (that’s without changing app-level code or structure(!)). The difference between having indexes and not having them, is the difference between having your DB realistically working or not; we’ll discuss more on indexes (and execution plans in general) in Vol. 3, Chapter [[TODO]].

NB: SQL does NOT have monopoly on indexes – read ahead for the discussion about indexes in NoSQL.

SQL – Document Storage and BLOBs

“BLOBs tend to be second-class citizens in SQL RDBMSIf we need to store unstructured data (such as documents or “current state of our Game World” for asynchronous games where storing all Game World States in-memory can too expensive) – it can be done over SQL DB too. Traditionally, it is done via using so-called Binary Large Objects (BLOBs). On the other hand, also traditionally, BLOBs tend to be second-class citizens in SQL RDBMS; they work, but in some cases they can be Really Slow 🙁 . In one case, I’ve even seen a program which simulated BLOBs (as a table with multiple VARCHAR rows per stored simulated BLOB) – and such a simulated BLOB was MUCH faster than the RDBMS-provided one.

On the other hand, unless BLOBs is most you’re going to store – it is not going to be a big problem for SQL DB; however, if most of your DB is about BLOBs – you need to think twice whether you need SQL. Also if you have a big chunk of non-BLOBs and a big chunk of BLOBs – splitting the DB in two along these lines (with BLOB DB being a NoSQL DB, or even just a plain-file-based storage) MIGHT become reasonable (though you’ll need to keep in mind consistency issues on inter-DB boundaries, ouch – we’ll discuss these in a [[TODO]] section a bit later).

[[TODO: rename SQL into “Codd-style SQL”, and introduce “SQL-with-XML”; this separation is used in Ch. XVII(i)]]

Semi-Structured NoSQL

With regards to schemas and structured/unstructured data, I tend to separate NoSQL DBs into two groups – semi-structured and unstructured.

One example of NoSQL with Semi-Structured data, is MongoDB. MongoDB stores data as BSON serialized objects (with BSON essentially being an extended JSON). This allows to store documents which (optionally) have certain fields you’re interested in – allowing (though not necessarily enforcing) data structure. Another family of Semi-Structured DBs is so-called “NoSQL column stores” such as Apache Cassandra; from our 50’000-feet view there is not that much difference between MongoDB and Cassandra in a sense that both allow semi-structured data in roughly the same sense (though if going lower, the differences will start to pop up, see, for example, discussion in [StackOverflow.MongoDBvsCassandra]).

BTW, Semi-Structured data can be stored in SQL DBs too – using so-called [SQL/XML].

Semi-structured NoSQL Queries and Indexes – Comparable to SQL

“As soon as there is some kind of “fields” in at least some of the objects – well, these fields can be queried and indexed.Semi-structured NoSQL DBs such as MongoDB or Cassandra exhibit query and indexing functionality which is very roughly comparable to that of SQL. As soon as there is some kind of “fields” in at least some of the objects – well, these fields can be queried and indexed.

Semi-Structured NoSQL – Changing Structure

Changing structure in NoSQL is trivial – actually, you can store whatever-you-want in a NoSQL table/collection/row (if you’re storing something weird, you may have difficulties finding it later – but this is a different story).

Overall, I think there is a good analogy to SQL-vs-Semi-Structured-NoSQL programming; SQL is more like statically strongly typed programming language – it is unforgiving and has significant development overheads at first, but in exchange it provides certain built-in guarantees (like “this function is always called with correct parameter types” for programming language, and “this column always contains correct type of data” for DB). Semi-Structured NoSQL is more like dynamically typed programming language – speeding up development at prototype/early stages, but not providing any built-in guarantees about the program/data.

Each of the approaches has its own merits. However, I need to say that from my experience, for larger-scale longer-term projects I tend to find that guarantees provided statically strongly typed languages – as well as guarantees provided by SQL schemas – start to outweigh speed-up benefits from the early development stages. Whether your project is large enough or will be long enough to warrant it – is difficult to say, but being an optimist, I am usually trying to bet on it to happen 😉 (it doesn’t mean I never make mistakes in this regard – time to market is extremely important these days, especially with games). Still, this observation of mine is not too firm, and costs/benefits difference are not that drastically different, so basically YMMV, and also if you choose between SQL and NoSQL wrong based on this structured/unstructured criteria – it won’t be the end of the world (however, other differences, discussed below, can be Really Fatal for specific uses of DBs).

Semi-Structured NoSQL – Documents and BLOBs

“For documents and BLOBs, NoSQL is a natural habitatFor documents and BLOBs, NoSQL is a natural habitat; if you need just a document storage (or a storage for serialized Game World States) – some kind of NoSQL (semi-structured or unstructured) is your natural choice. However, if you need to integrate your document storage with some structured data and some kind of consistency guarantees between the documents – then the situation becomes much less obvious (mostly due to the lack of multi-document transactions in quite a few NoSQL DBs out there); we’ll discuss ways of such integration (and of inter-DB interaction in general) in [[TODO]] section below.

Unstructured NoSQL – Key-Value Storages

Besides semi-structured NoSQL storages, there are also what I name unstructured NoSQL DBs; these DBs merely store key-value (name-value) pairs. One very popular example of such key-value storage is Redis.¹

While such unstructured DBs have their own uses (in particular, Redis is especially good for caching) – they also have significant drawbacks. In particular:

Pure key-value DBs can’t allow any queries besides very simple “get me the object given the key X”. In theory, they can also support “range” queries (such as “get me all the objects with the key between Y and Z”) – but this is very uncommon for key-value storages.
Pure key-value DBs cannot enable any indexes besides the key. Unstructured DB just doesn’t know where to look for the information to index it. While “point” index queries can be simulated for key-value DBs at app level, this tends to be cumbersome; moreover, “range” index queries cannot be supported² unless original key-value DB supports range indexes for its key (and usually it doesn’t).
“Still, documents, BLOBs, and serialized game state are a natural fit for unstructured key-value NoSQL DBs.Still, documents, BLOBs, and serialized game state are a natural fit for unstructured key-value NoSQL DBs.

Of course, as usual, all these drawbacks come into play if and only if you need queries and indexes (and as we’ll see below, quite often you don’t).

¹ NB: Redis values can be different from simple strings, so you might use Redis as a sorta-Semi-Structured storage; however, keep in mind that indexing capabilities are still very much lacking compared to SQL and Semi-Structured NoSQL DBs listed above

² Well, beyond storing all the data to be indexed within one very large BLOB value

BASE vs ACID

The second fundamental distinction between NoSQL and SQL databases, is that most of NoSQL DBs out there provide the only BASE set of guarantees, and most of SQL DBs out there provide much stronger ACID guarantees. In other words, when speaking about NoSQL, we usually have only eventual consistency. With SQL/ACID DBs, we tend to have much stronger ACID guarantees (though Isolation is a strange beast, and all kinds of stuff can happen with transaction isolation levels, as was described in [[TODO]] section above). Note, however, that this BASE/ACID dichotomy doesn’t really need to follow SQL/NoSQL boundary (neither it is a real dichotomy, as there is actually a whole spectrum of guarantees between BASE and ACID).

“it is perfectly possible to have ACID guarantees at least for some of the transactions in NoSQL. Note, however, that in practice most of existing NoSQL DBs out there have some limitations on ACIDIn particular, it is perfectly possible to have ACID guarantees at least for some of the transactions in NoSQL. Note, however, that in practice most of existing NoSQL DBs out there have some limitations on ACID; one popular limitation (even if NoSQL DB is marketed as ACID-compliant) – is to have ACID on a single object/row/document only (and not allowing to have a transaction across two objects/rows/documents). While not necessarily completely fatal (we’ll discuss handling this kind of stuff below in [[TODO]] section), this represents a very severe limitation compared to SQL-style guarantees.

SQL DBs without at least some ACID compliance are uncommon (and there is an argument that at the moment when you gave up ACID compliance, they’re no longer SQL-compliant, so that they don’t deserve name “SQL” anymore – I don’t want to argue about it here), but – if ignoring terminology issues – are perfectly possible.

With regards to guarantees provided, existing NoSQL DBs can exhibit one or more of the following properties (you need to figure out which ones your specific NoSQL DB has):

Atomicity
- Atomicity on single document/row/… is quite common. Example DBs providing it – MongoDB, Redis, and Cassandra.
  - There can be some strings attached, like those in Cassandra [Cassandra].
- Atomicity on transactions involving several documents/rows/objects… tends to be MUCH more complicated with (and MUCH more rarely found in) NoSQL DBs than with SQL ones. While there are a few NoSQL DBs which claim to do it – they’re far from being NoSQL mainstream, so I cannot really tell what kind of limitations apply in reality to these multi-row ACID transactions.
- One interesting example of atomicity, including atomic modifications of several different objects, is provided by Redis. Redis idea of atomicity, however, is very different from implementing atomicity in SQL DBs (which relies on rollbacks heavily). Instead of providing concurrent transactions (and atomic concurrent transactions in turn imply rollbacks), Redis just receives all the sequence of operators forming the whole transaction, and executes it in one atomic go, that’s it (and no rollback is possible – neither it is necessary). In a sense – we can think of it as of a rough analogue of SQL stored procedures (provided that each of these SQL stored procedures starts with BEGIN TRAN and ends with COMMIT).
- BEWARE: as [Allen] states (and I’ve heard about it from rather credible sources too), some of NoSQL DBs (even some of those which claim atomicity) allow you to read partially updated rows/objects (from a different connection than the one modifying DB). In my books (pun intended), I consider it as a blatant violation of atomicity, but well – who am I to argue with the DB authors who claim that their transactions are atomic while allowing this kind of stuff?
  - Much more importantly, however, is that it is EXTREMELY difficult to write something usable without such basic guarantees, so make 100% sure that your NoSQL DB DOES provide guarantees against partial reads no-matter-what-happens, before deciding on use of any specific NoSQL DB.
  - Just one example what can happen if partial reads are allowed. You have a BLOB/document which contains two fields (X=5 and Y=6) and you’re moving something from field X to field Y (that’s within the same row/object(!)). Your connection A needs to calculate X+Y for this BLOB; it starts reading this BLOB, and reads Y=6 from it; now another connection B starts to move 3 items from field X to field Y, starting with X-=3; now your connection A continues reading the BLOB – and reads X=2, so X+Y is calculated as 8, and this is only because our connection A came in at a unlucky time (while if we come at any other moment, or don’t allow partial row reads – X+Y will be always equal to 11). Believe me, we Really Really don’t want to deal with this kind of stuff, so guarantees against it are Damn Important.
- Durability
  - “In quite a few NoSQL DBs (such as in Redis), durability is optional.In quite a few NoSQL DBs (such as in Redis), durability is optional. If there is no durability – then even committed transactions may be rolled back later in case if NoSQL process crashes for whatever reason. Which is not necessarily a bad thing BTW – in particular, when we consider that for MOGs in case of crash it is often better to rollback the whole state of the “game event” (because you won’t be able to gather all the players back, see discussion on it in Chapter VII).
- Built-in Replication
  - Most of popular NoSQL DBs support replication out of the box.³ Examples include MongoDB, Redis, and Cassandra. And replication does come handy in quite a few usage scenarios.
    - Most of the time, NoSQL DBs are supporting only simple single-master-multiple-slaves replication (i.e. no multi-master stuff is supported). On the other hand, this is exactly what we need in practice.
      - For master-slave replication, usually slave replica is “asynchronous”; it means that slave is “behind” the master for some seconds or a few minutes. This goes against ACID guarantees for slave DB (but is fine under BASE) – which needs to be taken into account; on the other hand, in many use cases (in particular, for “reporting” DBs which we mentioned in [[TODO]] section above) – it is usually acceptable.
      - It IS possible to have slave exactly synchronous with master – however, it goes at the cost of slave performance affecting master – and this is rarely what we want (except for fault tolerance purposes, see below).
      - “you need to keep in mind that if the slave replica is asynchronous – then failover to it is likely to cause violation of Durability propertySome NoSQL DBs (such as MongoDB) allow to use replicas for fault tolerance; however, you need to keep in mind that if the slave replica is asynchronous – then failover to it is likely to cause violation of Durability property (i.e. some of the transactions which were already committed – may be rolled back on failover to the replica). This problem does NOT apply to synchronous replicas, though even in this case there can be significant caveats and bugs (see, for example, [Kingsbury] on durability issues in MongoDB).
    - Built-in Horizontal Partitioning/Sharding
      - Most of popular NoSQL DBs out there have a built-in horizontal partitioning a.k.a. sharding. Examples of NoSQL DBs supporting horizontal partitioning, include Redis, MongoDB, and Cassandra. The idea behind horizontal partitioning in NoSQL DBs is (give or take) about the same as the idea of horizontal partitioning in SQL DBs (supported at least for 20 years or so). Mostly it is about storing different key values (for SQL DBs – different values of PRIMARY KEY) on different servers. How exactly to distribute different keys to different servers – it depends, but usually it is either some kind of hash over the key, or user-configurable (NB: not always, but quite often having user-configurable algorithm of sharding works more optimally than its alternatives).

³ SQL DBs tend to support replication too, but keep in mind that for commercial SQL DBs, most of the time you’ll need to buy something like “Enterprise Edition” to get replication included, and “Enterprise Editions” are usually Damn Expensive 🙁

NoSQL vs SQL for MOGs. Box!

Now we know about NoSQL and SQL to start comparing them. On the other hand, comparing them “in general” is hopeless (and pointless too), so we’ll need to compare in the context of specific use cases. And as it was mentioned above, for games (this includes stock exchanges and banks ;-)) there are often five different types of databases: (a) operational/transactional one (OLTP) – that’s where all the decision making happens; (b) DBs to store Game World States; (c) “reporting” one for online reporting by CSRs etc, (d) archive DBs, and (e) analytical DB (OLAP).

Operational/Transactional DB: ACID guarantees and Multi-Document Transactions

“For the operational / transactional / OLTP DBs (more generally - for any kind of data used for automated up-to-the-second decision-making), data integrity and consistency is usually extremely importantFor the operational/transactional/OLTP DBs (more generally – for any kind of data used for automated up-to-the-second decision-making), data integrity and consistency is usually extremely important. In particular, if you store anything of value (including those in-game artifacts which are sold on eBay for $20K of real cash) – you certainly don’t want it to disappear or to be duplicated.

Moreover, for OLTP DBs it is a Really Good Thing™ to have multi-row ACID transactions. Let’s once again consider the example with transferring an artifact from one player to another one. Let’s say that we need to transfer artifact X from player A to player B. As it was described in [[TODO]] section above, to do the transfer in a manner which cannot leave us with two artifacts X – nor with zero artifacts X, we need an atomic transaction involving both player A and player B.

Now, if our DB supports multi-row ACID transactions – everything is fine: we can have each of the players in its row, and to implement reliable transfer based on this multi-row ACID transaction involving both player rows. However, if our DB doesn’t support multi-row ACID transactions, things become MUCH uglier 🙁 . I know of two ways of implementing reliable transfer procedure (i.e. transfer procedure which guarantees that there is exactly one artifact in the system under [almost-] any circumstances; this includes app crashes, DBMS crashes, and server crashes):

To keep all the players in one huge BLOB, then assuming that our DB does support single-object ACID transactions – we can update the BLOB in one atomic ACID transaction, providing all the necessary guarantees. While this approach is theoretically fine and does provide the necessary guarantees, having ALL your players in one huge BLOB extremely rarely qualifies as a good idea (due to performance/scalability issues).
To have each player as a separate BLOB, and to use Async-inter-DB-transfer protocol (see [[TODO]] section above) to transfer the artifact. It will work (well, as any other inter-DB transfer), and surprisingly, this is not THAT bad in practice (well, at least compared to the previous option of having all the players in one BLOB). However, it is still rather cumbersome.
- First of all, you’ll need to keep potentially outstanding outgoing messages and at least last-ID-of-received-messages within each of your BLOBs; moreover, you’ll need to keep it for ALL the peers with which your player can communicate.
- Second – you can have hundreds of thousands of players (hopefully more 😉 ), so keeping hundreds of thousands of such communications in each of your ‘players’ object is usually too cumbersome (and causes N^2-like patterns too); to avoid it – you’ll likely need to say that each player BLOB is not communicating with the all other player BLOBS directly, but rather communicates via a special “broker BLOB” (i.e. inter-player transfer is implemented as two inter-DB async transfers – the first one from player A to broker, with commit there(!), and the second one – from broker to player B, with another commit). And as soon as this “broker BLOB” becomes a bottleneck – you’ll need to load-balance these brokers.
- While all this is doable (and in a correct manner too) – it is rather cumbersome, especially if we compare it to simple intra-DB multi-row ACID transaction.

“while it is theoretically possible to implement operational DB without multi-row/multi-object/multi-document ACID transactions – it is MUCH easier to program operational DBs relying on having multi-row transactionsBottom line: while it is theoretically possible to implement operational DB without multi-row/multi-object/multi-document ACID transactions – it is MUCH easier to program operational DBs relying on having multi-row transactions. This observation (unless there are very strong overriding considerations which I didn’t encounter in the wild, but anything can happen), will effectively rule out most of the NoSQL DBs out there – that is, for operational/transactional DB. In addition, rigid structure for operational/transactional DB is usually a good thing (if you have “money” field for the player – it is generally good to have this field of the same type for all the players, and so on). As a result, I strongly prefer to use SQL RDBMSs for operational/transactional DBs.

If speaking about scalability/performance – don’t worry, just like indexes are not monopolised by SQL, scalability is not monopolized by NoSQL (in spite of claims of certain NoSQL vendors). In fact, near-perfect scalability is perfectly achievable on SQL RDBMS (I’ve done it myself for a real-world project with 50M daily transactions – not too much, but not too shabby either); we’ll discuss the ways to do it a bit later, in [[TODO]] section. And, if speaking about extensibility of SQL schemas – as noted above in [[TODO]] section, quite a few SQL DBs out there can execute ADD COLUMN statement instantly on a billion-row table.

As for specific type of SQL RDMBS (MVCC-based or Lock-based) to use as operational/transactional DB – the answer generally depends on the way you organize your DB. If going towards single DB connections and DB-per-Service (which I will be arguing for just a bit later) – I’d say that it doesn’t matter (and I personally had VERY good experiences with Lock-based DB/2 in such environments). If going towards much more traditional massively multi-connection stuff – my relatively wild guess is that MVCC would perform a bit better (though not without the problems which are inherent to any kind of multiple connections). We’ll discuss more on choosing specific SQL RDBMS a bit later, in [[TODO]] section.

[[TODO: Cassandra CAS-based “lightweight transactions” (probably it is possible to write ACID equivalent on top of it, but optimistic locking doesn’t scale well, and it is atrociously difficult to write correctly), something else?]]

Game World State DB

In general, I am not that big fan of saving the whole current Game World States in DBs. As we discussed it in Chapter VII (and audaciously dubbed it as “’No Bugs’ Rule of Thumb for Multi-Player Games” 😉 ), in case of a substantially multi-player “game event” (such as fight, match, even casino hand) being interrupted even for 2 minutes, chances of getting all the same players back – are minimal; as a result, exact reconstructing of the game event as of the moment of interruption, doesn’t make much sense, as it will need to be rolled back anyway (see Chapter VII for further discussion of this phenomenon). In addition – saving Game World State into DB is usually way too expensive (especially if we can avoid it ;-)).

However, there is a Big Fat Exception to this rule of thumb above. Namely, if your game is asynchronous – you’ll likely want to use something cheaper than RAM, to store your Game World States. Also, for inherently asynchronous games, there are very few (if any) essentially multi-player game events – so the logic above doesn’t really apply.

As a result, I am all for using some kind of DB to save Game World State of your asynchronous games.

“as a rule of thumb, Game World State DBs are a MUCH better fit for NoSQLNow to the question – which DB is better for these Game World State DBs? The answer here is rather clear – as a rule of thumb, they’re a MUCH better fit for NoSQL DBs. Game World State is essentially a BLOB, and as noted above – NoSQL DBs are a natural habitat for BLOBs 🙂 . Moreover, complicated queries are usually unnecessary when speaking about serialized Game World States, so even unstructured NoSQL (such as Redis) will usually do (though in this regard, YMMV depending on game specifics). Moreover, guaranteed transactions between such Game World States are often unnecessary – and it they are, they still can be handled (via that async inter-DB transfer protocol).

“Reporting” DB and “Archive” DB

“Reporting” DB is intended to run all the CSR/security/… reports. These reports (unlike analytical reports for business/marketing) are usually in the context of one, or at most very few (like 2-3) specific players; answering questions such as “give me history of money transactions of player X”, or “let’s see all his logins to see if it looks likely that his account was indeed hijacked”, or “let’s see whether these two players have enough in common to consider them as the same player” (with the latter being a reason to ban them both) – all these things (and lots of the others) belong here. I’ve seen a game which had over 500 of different reports (with 200+ of them used on a regular basis), and their CSRs were running hundreds of thousands of such reports per day.

“Until your game becomes Rather Large – you should be able to run your reports right from your operational DB. Later on – you’ll probably need to create a slave read-only replica of operational DB – and to run reports from there.Until your game becomes Rather Large (like 50K of simultaneous players – though this a VERY approximate number and may easily vary by 10x in any direction) – you should be able to run your reports right from your operational DB. Later on – you’ll probably need to create a slave read-only replica of operational DB – and to run reports from there.

As a result, it means that “reporting” DB is usually structurally identical to operational/transactional DB (which, as we’ve discussed, is a good fit for SQL DBs). In addition, “reporting” requests also tend to benefit from relations significantly (queries mentioned above are expressed in terms of SQL rather naturally). Therefore, I feel that SQL RDMBSs is a natural fit for “reporting” DBs. BTW, if speaking about MVCC-vs-Lock-based RDBMS, for “reporting” DB usually MVCC works better (mostly because that with MVCC, writes don’t block reads and vice versa).

Archive DB appears when even your “reporting” DB replicas become too large to store everything in one DB; instead of partitioning of the “reporting” DB (which is usually possible, but tends to cause quite a bit of trouble), I am usually arguing to move most of historical data (say, beyond last 3-12 months) from “Reporting” into “Archive DBs”. In practice, CSR requests beyond, say, last 6 months, are rather rare, so even asking CSR to go to archive DB if necessary is rarely a problem (and most of the time, data from “reporting” and “archive” DBs can be merged by report itself if requested). On the positive side, “Archive” DBs become purely read-only, which helps quite a bit (just as one example – they can sit on top of RAID-5); more on organisation of “Archive” DBs in [[TODO]] section below.

Analytics/OLAP DB

Unlike with “reporting” DBs, which are primarily serving player-specific requests, analytical DBs are all about aggregates, more aggregates, and even more aggregates. Typical requests to analytics DBs go like “tell me which of the countries provided us with the best effect on the ad XXX run between 9am and 10am on MM/DD/YYYY?” – and as you can imagine, the sky is the limit for complexity of these requests.

“most likely, you’ll need to repackage your raw data to work with analytics/OLAP efficientlyAs analytics/OLAP stuff revolves around aggregates – it means that most likely, you’ll need to repackage your raw data to work with analytics/OLAP efficiently; there isn’t that much point in storing all the player-specific stuff when all you’ll ever need from analytics is aggregates.

This, in turn, means that more often than not, analytics DB does NOT need to be similar in structure to “reporting”/”archive” DBs, opening the door to use either SQL-based analytics, or NoSQL-based analytics (or both 🙂 ).

To be honest, I am not that much of an expert of Analytical/OLAP stuff (that is, to put it mildly ;-)); however, my personal feeling has always been that SQL never felt the right tool for OLAP (which is supported by an observation that most of OLAP SQL-based solutions are not really “pure SQL”, and involve certain stuff on top of bare SQL) – and as a result, I’d probably prefer NoSQL for analytics (on the other hand, if I’d have an OLAP guru on my team – I’d simply referred to her judgement and experience, as they’re VERY important in this field); also let’s note that map-reduce (present in many NoSQL DBs) can be very useful when dealing with huge datasets of historical data spread over multiple server boxes (and on the other hand, for inherently historical tables ACID is rarely a strict requirement, as we can usually get a logical “snapshot in time” relatively easily).

SQL vs NoSQL for MOGs: Summary (or “On Horses for Courses”)

All the findings above can be summarized in the following short table:

DB Type	Suggested SQL/NoSQL	Reason(s)
Operational/Transactional DB (OLTP)	SQL	Multi-row ACID, write-heavy, queries tend to benefit from relations, few BLOBs
Game World State DB	NoSQL (maybe even unstructured)	Nothing but BLOBs, usually no relations, no multi-object transactions
Reporting/Archive DB	SQL (with a preference towards MVCC)	Similarities to OLTP DB, queries tend to benefit from relations heavily
Analytics/OLAP DB	SQL or NoSQL (with a preference towards NoSQL)	Very different structure, very different requests, no natural fit for SQL/relations, mostly reads

[[To Be Continued…

This concludes beta Chapter 20(c) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 20(d), where we’ll start discussing specifics of implementing transactional/operational DBs (as everything else within this book – in the context of MOG).]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.