Databases 101: ACID, MVCC vs Locks, Transaction Isolation Levels, and Concurrency

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Database alphabet soup: ACID, BASE, CAP, etc.

[[This is Chapter 20(a) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

This Chapter XVII is dedicated to databases and related issues – in the context of MOGs, of course.

And in the context of medium- and large-scale MOGs, functionality-wise it is common to have up to four different layers (or types) of the databases:

OLTP Online transaction processing is a class of information systems that facilitate and manage transaction-oriented applications... OLTP applications are high throughout and insert or update-intensive in database management.— Wikipedia —Transactional processing DBs (OLTP). Transactional-processing DB is the most time-critical DB in the whole system; it needs to handle all the real-time game requests. Most of our discussion here will revolve around transactional/OLTP processing (we’ll briefly discuss other types of DBs in Vol.3, tentatively Chapter [[TODO]]). These DBs can usually be kept fairly small (very ballpark number being 10G to 100G for a millions-of-simultaneous-players game) – which in turn can facilitate certain trickery such as in-memory DBs; more on it in Vol. 3 (tentatively Chapter [[TODO]] on DB Scaling).
- While term “OLTP” can be used to mean many different things, for our purposes we will speak about very small and simple requests – not including anything which can possibly take longer than single-digit milliseconds (not that you will need these things in OLTP).
- For the sake of clarity – we’ll assume that a typical mix of our OLTP transactions is (give or take) a 50-50 split between INSERT and UPDATE statements (on the other hand, there is some trickery which we can use to adjust this ratio if necessary, see [[TODO]] section below). This 50-50 (or actually anything from 20-80 to 80-20) split usually naturally arises from the very-frequent need to do some changes (such as “change amount of <whatever> user has in USERS table”) – and to add a record about the changes to the audit table; the former is usually implemented by UPDATE, the latter – by INSERT.
[[TODO: State Storing DBs for Asynchronous Games]]
“Real-time reporting” DBs (somewhere in between classical OLTP and OLAP). Real-time reporting DBs are usually used by support team (CSRs, security, payments, etc. etc.). Imagine a room with 100 people answering 10K e-mails per 8-hour shift (and to answer e-mails – they’ll need to run some kind of report against DB, such as “when exactly this user has logged in” etc. etc.) – and you’ll get an idea for the environment where this type of DB is used. On a positive side, information within such databases is not required to be up-to-millisecond, and usually we can live with delays of up-to-1-2-minutes. Real-time reporting DBs are often significantly larger in size that transactional DBs (up to several terabytes for a decently large game); still, as support rarely goes beyond several months back (except for payments etc.), to keep size of these beasts in check, big chunks of information can be moved to archive DBs (see below). We’ll discuss ways to organize “real-time reporting” in Vol. 3 quite a bit; for the time being, it will suffice to say that until you’re large enough, you should be able to run your real-time reporting from your OLTP DB ;-).
Archive DBs. Archive DBs are not routinely used for real-time reporting, but they can be accessed if demand arises (which is usually pretty rare). Archive DBs can be as large as you can imagine (hundreds of terabytes anybody?).
OLAP Online analytical processing is an approach to answering multi-dimensional analytical (MDA) queries swiftly... Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time— Wikipedia —Analytical DBs (OLAP). Analytical DBs include all the data warehouses/data mining/etc. stuff; we won’t spend much time discussing them in this book, in particular because they are not that different from OLAP DBs used for usual business analytics. A few pointers will be provided, however, in Vol. 3. As for the size of analytical DBs, it depends so much on the kind of stuff you want to analyse, that making any estimates is pretty much pointless.

All of these DB types are different – and often require different approaches to implement.

However, before we can start speaking about implementing these DBs in the context of games, we need to define a few well-known things which we’ll refer to as our discussion goes on. I’m speaking about three abbreviations everybody has heard about – but which still cause quite a bit of confusion. It is ACID, BASE, and CAP.

ACID properties

These days, ACID is often used as a gospel: “hey, our DB supports ACID transactions, so you should be fine!”. Still, consistently with my point of view on pretty much anything else, I strongly suggest to understand what this alphabetical soup is all about, well before starting to rely on it.

Four-letter ACID abbreviation stands (no surprise here) for four properties: Atomicity, Consistency, Isolation, and Durability. Let’s take a closer look at them (though in a bit different order).

Atomicity

“Atomicity is an extremely important property for keeping integrity of your dataAtomicity is a critically important property. Transaction being atomic means that if you’re trying to do a DB transaction, it is guaranteed to be all-or-nothing. Atomicity is an extremely important property for keeping integrity of your data;¹ to illustrate its importance, let’s consider the following example.

Let’s say that you need to transfer an artifact (which is worth $20k on eBay) from player A to player B. And then, you’re doing two operations: first, you remove the artifact from player A, and second, you add the artifact to player B. Most of the time, it will work fine; however, one gloomy day your app/DB/server/whatever-else will decide to crash right in between these two operations. Which will mean that the artifact has just disappeared 🙁 (and you can be sure that players will complain and will raise hell in forums, etc. etc.). BTW, note that if you do these two operations in reverse order, you’ll still have a problem – if your server crashed at an unfortunate time, then after you re-launch your system, the artifact may appear in your DB twice.²

This problem cannot be resolved in a good way unless you have Atomic transactions. With an atomic transaction – you just put both your operations under the same Atomic transaction – and have a guarantee (from your DB) that whatever-happens, the whole transaction will either succeed, or fail (in the latter case no changes to DB will be made at all).

For our artifact transfer example, it means that if we do both operations above under single atomic transaction, it is guaranteed that there will be exactly one artifact (if the transaction succeeds – with player B, otherwise – with player A). Bingo! We’ve got our cake and ate it too ;-).³

Overall,

to keep integrity of our data, as a Big Fat rule of thumb, ALL the modifications to our transactional processing DBs SHOULD be done under Atomic transactions. Moreover, each single transaction SHOULD leave DB in a consistent state

Note that latter sentence has severe implications on transactions. In particular, we SHOULD NOT have a DB transaction such as “add money to user account” or “remove money from user account”; instead, we should have transactions such as “move money from user account X to user account Y”, or “move money from user account to our account”,⁴ or “move money from user account to ‘outgoing payments’ table”.

As with any generic statement, exceptions do exist,⁵ but they’re very rare and far between.

¹ “consistency” may be a better word here, but I don’t want to use the word “consistency” here to avoid confusion with Consistency property, which has significantly narrower semantic meaning than we need here

² Also let’s note that crash is not the only case when having two separate transactions for a transfer will lead to problems; in particular, with concurrent access, lots of things can happen in between these two transactions

³ BTW, it also means that the order of operations under atomic transaction doesn’t matter – at least from the point of view of the end result (though it IS important from the point of view of deadlocks, see discussion in [[TODO]] section below)

⁴ this one is music for CEO’s ears 😉

⁵ one example would be cases when operations themselves are atomic, and we don’t have any DB modifications which span more than one operation (this happens, for example, if we’re using DB only for logging independent events); another example is data which is only loosely related to the rest of the data, and with loose semantical meaning too

Atomicity over multiple objects

One very important point about atomicity, is that your DB should allow you to have transactions involving multiple objects (such as rows). This is necessary to work when objects interact (and they do interact even in a simple “transfer artifact” scenario above). Moreover, contrary to implications from [MySQL.AtomicOperations], simply using incremental UPDATE statements (while being a Really Good Technique) is NOT sufficient to guarantee data integrity (in particular, if your server or app fails in between the two UPDATEs – you will be in REAL trouble (not to mention that with relying purely on incremental statements, you can easily end up with a negative amount for something inherently positive, such as number of artifacts).

In SQL world, such multi-object transactions are almost-universal (with a notable exception of MySQL-with-MyISAM which is mentioned above); for NoSQL databases (even those which support ACID), multi-object transactions are MUCH more rare; at the very least – don’t take that they exist, for granted until you read the associated documentation.

Durability

Transaction being durable means that after you’ve got a reply from DB that the transaction has succeeded, DB guarantees that it will never go back. You can think about it as of a kind of flush (sync) operation when you’re writing to file – you may write stuff to the file (or within transaction), but only after you’ve got a ‘successful’ notification from flush() (or from transaction) – you can be sure that it is indeed guaranteed to be written.⁶

“One obvious exception when durability property is not satisfied, is when you’ve used a restore from backup for your DB.One obvious exception when durability property is not satisfied, is when you’ve used a restore from backup for your DB. And well – in real world restoring from backup does happen, so we need to keep it in mind; it becomes quite a headache when multiple DBs are involved.

⁶ BTW, DBs themselves are usually using flush/sync (or a reasonable facsimile) to implement transactions

Consistency

“Consistency” term is heavily overloaded one in the field of databases, and is used to denote very different things depending on the context⁷ (that’s why I’m trying to avoid using the term myself within this book). In the context of ACID, it means that if you’ve put some constraints on your DB, then DB guarantees that no transaction which violates these constraints, will ever succeed.

In theory, there can be any kind of constraints; in practice, however, the most common ones are related to things such as primary keys and foreign keys in RDBMS (we’ll discuss them later in this Chapter). On the other hand, if your DB doesn’t allow to specify any constraints – it automatically provides Consistency property for ACID (all the constraints are satisfied, and who cares that there can be no constraints at all? 😉 )

⁷ We’ll see one example just a few pages below when discussing ‘consistency’ in the context of CAP

Isolation

Probably out of ACID properties, Isolation is the one which is the most difficult to deal with in practice 🙁 . On paper, it looks very simple: isolation guarantees that all your transactions, even if they’re running at the same time, are executed “as if” they’re executed serially. In practice, however, it is MUCH more complicated – having Isolation while keeping reasonable performance, requires quite a few compromises (and brings us to the really ugly topic of transaction isolation levels).

I don’t want to delve into fine details of transaction isolation here (especially as at a certain level, quite a few of them are DB-specific); however, we’ll spend some time discussing them – especially as we’ll need to refer to transaction isolation levels in the future.

“in real world, there are two very different approaches to implementing isolated transactions and concurrency controlFirst of all, let’s note that in real world, there are two very different approaches to implementing isolated transactions and concurrency control – and these approaches have profound effects on isolation levels.

Lock-based Concurrency Control

The first approach to implementing concurrency/isolation is based solely on locking; in other words, if one transaction uses some piece of data (these days – usually row⁸), a lock is set on this row, and is kept until the transaction succeeds or fails.

Examples of RDBMS which implement lock-based isolation, include DB/2, MS SQL Server, and the ones such as Sybase and Informix.

While from my experience, this approach works reasonably well in OLTP environments (though worse – in mixed OLTP-and-reporting ones), using it requires quite a bit of knowledge about locking – and it has quite a few different isolation levels too. While specifics of isolation levels depend on your DB, the most common ones for lock-based DBMS include (in the order from weakest guarantees to highest, and from best performance/least locks to worst performance/most locks):

Read Uncommitted. At Read Uncommitted transaction isolation level, writing within transaction sets write lock only while the row is being modified, and lifts it immediately afterwards. Reading within transaction sets read lock only while the data is being read. As such, “dirty reads” (i.e. reading data which will never make it into the database), are possible under Read Uncommitted. Let’s note though that “dirty data” in DB-speak doesn’t usually mean that DB can show you a partially-updated data object/row (I’ve never seen such a database myself; however, I’ve got reports that in some of NoSQL DBs reading partially-updated data is possible; beware!); instead, usually “dirty data” is data which has been updated by a transaction – but as modifying transaction is not committed yet, transaction including all the modifications, may be still rolled back (which, unless you’re careful, can cause all kinds of inconsistencies).
Read Committed. At Read Committed transaction isolation level, writing within transaction sets write lock (on the data/rows written) which is kept until the transaction is terminated (either committed or rolled back). As such, “dirty reads” are no longer possible; however, if two exactly the same reads are issued within the same transaction, results may be different. In particular, both so-called non-repeatable reads, i.e. reading of data modified between different queries, and phantom reads, i.e. reading of rows which were added by other transaction after this one was started, are possible. Read Committed (or similar CS level in DB/2) is probably the most popular isolation level for lock-based DBs used for transactional processing, and in most cases provides a very reasonable compromise between isolation guarantees and performance.
Repeatable Reads. At Repeatable Reads transaction isolation level, in addition to locks above, DB also locks data after reading it – and keeps the lock until the transaction is terminated. Repeatable Reads isolation level eliminates both “dirty reads” and “non-repeatable reads”, but is still open to “phantom reads”. Repeatable Reads isolation is rarely used for OLTP-style lock-based databases, due to performance issues it tends to cause 🙁 ; read locks which are kept until the end of transaction, are not a picnic at all.
Serializable. Serializable transaction isolation level eliminates “phantom reads”, at the cost of requiring even more locks than Repeatable Reads;⁹ as such, is almost-never used (on transaction processing lock-based DBs, that is).

⁸ these days, if you run into an RDBMS which doesn’t support row-level locking and is stuck with page-level locking (or, Codd forbid, table-level locking), your best bet will be to run away really fast

⁹ specifically – a range lock (or a table-level lock, ouch!)

Multi-Versioned Concurrency Control (MVCC)

“MVCC-based DBMS will create a “previous version” (“snapshot”) of that row, and will supply that “previous version” of the row to any other transaction which may try running concurrentlyWith Multi-Versioned Concurrency Control (MVCC), the whole approach is very different. Instead of locking that row for reading when somebody starts working on it, MVCC-based DBMS will create a “previous version” (“snapshot”) of that row, and will supply that “previous version” of the row to any other transaction which may try running concurrently. It makes sense – after all, until the first transaction is committed, the state of DB is not considered to be changed (and might be not changed at all if we have the transaction rolled back).

Examples of RDBMS which support MVCC, include: Oracle, MySQL with InnoDB, PostgreSQL, and MS SQL Server (the last one – not by default).

With MVCC-based databases, mostly there are only two different isolation levels (which is a Good Thing(tm) BTW 😉 ):

Read Committed. At Read Committed level, MVCC-based DB implements reads using snapshots taken at the moment of the read query issued. Writes still request write (exclusive) lock. At Read Committed level for MVCC-based databases, just like with Read Committed for lock-based DBs, “dirty reads” are not possible, but non-repeatable reads are possible. However, unlike with lock-based DBs, with MVCC-based databases writers do not block readers, and vice versa.
Repeatable Read/Serializable. At Repeatable Read isolation level (which usually – though not always – coincides with Serializable isolation level), MVCC-based DB implements reads using shapshots taken at the moment of beginning of the transaction. As a result, none of “dirty reads”, “non-repeatable reads”, and “phantom reads” are possible (at the cost of keeping more snapshots, especially if the longest of your transactions is long enough). Writes still request write lock. In addition, there is a special kind of failure which is specific for this isolation level with MVCC: the failure happens if our current transaction is trying to modify a row which has already been modified by another transaction after current transaction started. This phenomenon is known as a “serialization failure”; for further discussion, see, for example, [Oracle] or [Postgre].

There are some other types of isolation as it applies to MVCC , for example, Serializable isolation level in Postgre 9.1 and up (providing even stricter guarantees than avoiding artifacts above, see [Postgre]); however, I have my doubts that you’ll realistically need them for your game DB servers.

SELECT FOR UPDATE

In addition to isolation levels (which are usually set for the whole connection and don’t normally change per-transaction), for SQL-based DBs you can use SELECT FOR UPDATE statement to ensure proper locking even if you’re running at a lower isolation level. Most of the time, SELECT FOR UPDATE simply issues a write lock on all the rows in the SELECT (which lock, as all write locks, will stand until the transaction is completed). An example use case for SELECT FOR UPDATE goes as follows:

We’re running under Read Committed isolation level
We need to SELECT a list of items to be manipulated, and then to work on items there one by one
As we’re operating under Read Committed level, we cannot be sure that the items we’ve selected, won’t be modified by another transaction
To make sure it won’t happen, we can add FOR UDPATE to that original SELECT of list of items

“note that SELECT FOR UPDATE, while useful to avoid problems described above, can easily cause LOTS of problems with concurrency and deadlocks.Let’s note that SELECT FOR UPDATE, while useful to avoid problems described above, can easily cause LOTS of problems with concurrency and deadlocks.

Locking and Concurrency under Lock-based and MVCC-based DBs

Ok, all those lists of isolation levels are nice and everything, but what does it really mean for us as developers? There are several Big Fat Caveats with DB development, which tend to cause severe problems as soon as your DB becomes at least somehow loaded. These problems are quite similar to those problems with synchronization which we routinely experience with multi-threaded programs, so while discussing them, I will provide analogies from multi-threaded world.

Concurrency Problems are Non-Testable, Big Ouch!

One thing which needs to be mentioned with regards to concurrency problems, is that

you cannot possibly test them 🙁 🙁¹⁰

Which means that a typical pattern for a concurrency problem on the Server-Side (either with multi-threading, or with DB concurrency) goes as follows:

A bug is introduced
Then, it sits quietly for many months, or even years
Then, as server load goes up – it starts to manifest itself
- At this point, if it is a multi-threaded problem, it usually crashes/hangs the whole thing, so at least it gets noticed
- “For DB-related problems, however, you just start seeing deadlocks – or even worse, things quietly go not as supposed (like some of those people who’re entitled for bonuses, cannot get ones, or money is lost in transit, or a dog appears out of thin air and eats all your homework).For DB-related problems, however, you just start seeing deadlocks – or even worse, things quietly go not as supposed (like some of those people who’re entitled for bonuses, cannot get ones, or money is lost in transit, or a dog appears out of thin air and eats all your homework).
  - As the problem is very evasive, it may take quite a while (and you can take quite a hit in terms of how-much-players-like-your-game too), until you realize that the problem does exist
- Then, you’re finally at the stage of admitting that the problem exists, and starting to hunt it down. However, 99% of the time you won’t be able to reproduce the problem in lab, so it will take quite a while (and even more hit on player disposition towards you) until you realize what is going on.

I’ve been in these shoes myself much more than once, and should say that

Concurrency Problems Are By Far The Worst Ones Out There

This, of course, means that the best would be to avoid them altogether, but we’ll come to it a bit later ;-). For the time being, let’s discuss some specific concurrency problems.

¹⁰ ok, in theory there are ways of exhaustive testing of concurrent problems, but in practice the complexity of such tests grows exponentially, so they’re not an option at least for ever-changing application-level code

Dirty Reads, forgotten SELECT FOR UPDATEs, etc.

“no DB I know allows reading partially updated rowsWhenever we have an option to read dirty data, it can cause problems; those coming from multi-threaded world, know what can happen if we forget to acquire a lock before writing (or reading) share data – it is pretty much everything, up to a Boojum (or CEO) coming out of the sky and eating our program (or salary slip). With DBs, it is a little bit better (as no DB I know allows reading partially updated rows), but still can cause all kinds of trouble, including (depending on your app specifics) such things as money going missing 🙁 . The same goes for missing SELECT FOR UPDATEs.

Unfortunately, the only way to ensure that you never ever read dirty data – is to review your code; and then to review more, and then to pray that none of your reviewers has missed something important 🙁 .

Deadlocks

“At this point, we have transaction TA holding resource RA and waiting for resource RB, and transaction TB - holding resource RB and waiting for resource RA. In this state, unless something external happens, both transactions will stay forever 🙁 .Deadlocks occur whenever DB transaction TA has obtained lock on one object/row RA, and another DB transaction TB has obtained lock on another object/row RB, and then TA needs a lock on RB, and TB needs a lock on RA. At this point, we have TA holding RA and waiting for RB, and TB – holding RB and waiting for RA. In this state, unless something external happens, both transactions will stay forever 🙁 .

In multi-threading, this problem is non-recoverable; in DB world, deadlocks are routinely detected by RDBMS, and one of the transactions is rolled back forcefully (and application can get an error, and re-try the transaction using new version of the data involved).

However, even with this in mind (and with correct handling of such rollbacks, which may be non-trivial, though such handling may be generalized with Reactors at the cost of processing “fairness”, see [[TODO]] section below), the impact of deadlocks can be quite severe; deadlock detection normally takes seconds (which is already Damn Large for quite a few games out there), and forced rollbacks, if not kept in check, can be devastating for performance.

To deal with deadlocks, it is common to establish one common order of obtaining locks; if everybody follows this order – there can be possibly no deadlocks. On the other hand, ensuring that all the transactions follow the same lock order – is also a thing which is usually found only by code review 🙁 (however, detecting lock order violations at least CAN be implemented in a testable manner).

MVCC-based vs Lock-based DBs

Armed with all this information, now we’re ready to start answering the question which of the approaches (MVCC-based or Lock-based) is better for our DBs (specifically – for a transactional-based one).

“both MVCC-based and Lock-based DBMS issue locks (and therefore, both can cause all kinds of trouble such as deadlocks etc.); however, the difference lies with the number of locks issued by them.As noted above, both MVCC-based and Lock-based DBMS issue locks (and therefore, both can cause all kinds of trouble such as deadlocks etc.); however, the difference lies with the number of locks issued by them.

The main advantage of the MVCC-based DBs lies with the fact that with MVCC writers don’t block readers and vice versa. In practice, it means that Lock-based DBMS tend to issue MUCH more locks than MVCC-based ones; this, in turn, has substantial implications:

For a decently-loaded lock-based DB, you’re usually stuck with Read Committed isolation level; for MVCC-based one you might be able to get away with Repeatable Read/Serializable one
- This in turn means that, as a rule of thumb, you need to be MUCH more careful when writing code for a Lock-based DB 🙁
Usually, MVCC-based DBs do NOT suffer from problems with so-called “lock escalation” (which results from having too many locks, increase chances for deadlocks dramatically)

On the other hand, MVCC has a price tag attached: there is a cost of keeping all these versions (and a cost of reading old versions from the “rollback/undo segment” where they’re stored). Still, for a concurrent DB app – I would prefer MVCC-based DB.

What about Avoiding Concurrency Completely?

In the midst of the word he was trying to say,
In the midst of his laughter and glee,
He had softly and suddenly vanished away—
For the Snark was a Boojum, you see.

— Lewis Carroll —

With all that being said, usually for transactional DB I prefer to avoid concurrency completely (see [[TODO]] section below).

“I know that this is highly controversial (=”I will be pounded Really Hard just for saying it” ;-( ). However, I’ve seen it working extremely well in real-world games, and at the moment it is my preferred way of doing things.”]I know that this is highly controversial (=”I will be pounded Really Hard just for saying it” ;-(). However, I’ve seen it working extremely well in real-world games, and at the moment it is my preferred way of doing things. Also, apparently, I am not the only person in the game world successfully using this approach. Once upon a time, I had a friendly chat with a CTO of another game, with both our games running hundreds of thousands concurrent users at the moment. At some point, the dialog took the following turn:

Me: you know, we’re handling all the database load from one single DB connection (and are able to handle like 50M DB transactions/day on a single box).
Him: (looking around to make sure nobody can hear him except for me): Don’t tell anybody (I don’t want to become a laughing stock), but we’re doing the same thing…

I will save all my arguments on the merits of this approach for later (until section [[TODO]]); for the time being I should just mention that in case of no-concurrent DBs, all the arguments above against lock-based DBs tend to (softly and suddenly) vanish away, and in fact I had very good experiences with lock-based DB/2 in such non-concurrent highly loaded environments.

[[To Be Continued…

This concludes beta Chapter 20(a) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 20(b), where we’ll discuss distributed databases, CAP theorem, and NoSQL databases.]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Databases 101: ACID, MVCC vs Locks, Transaction Isolation Levels, and Concurrency