Distributed Databases 101: CAP, BASE, and Replication

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

From Duchess DB. An invitation from Queen DB

[[This is Chapter 20(b) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Ok, now we’re done with traditional single-node databases (i.e. databases which run on one single piece of hardware), and can proceed into a world of distributed databases, which will feed us another few bowls of alphabetical soup, including CAP and BASE (with some replication thrown in for a good measure).

CAP Theorem

For healthcare, there are three properties:

Availability, Being Free (as in Free Beer), and Quality.

Unfortunately, you cannot have more than two of these properties at the same time.

— Popular (and rather sad) joke in Soviet Union —

Actually, CAP theorem, in spite of all the scientific-sounding buzz around it, is merely a formal description of a pretty obvious observation. Let’s consider it based on one simple example.

Let’s say we have two datacenters (A and B), and we have a database in each of datacenters, with databases being synchronized. Moreover, we have some Clients connected to datacenter A, and some Clients – to datacenter B. Everything works fine as long as datacenter A has no problems communicating with datacenter B. However, as soon as the link between datacenters is broken (and Clients are not able to reconnect either), we have two distinct choices (in addition to the choice of scrapping the whole thing ;-)):

we can stop processing (at least in one of the datacenters) – this will keep data consistent between datacenters, but will make system unavailable for some of the Clients (in a sense that they won’t be able to work – at least to modify the database as they’re usually doing – until the connectivity between datacenters is restored)
“The worst case of inconsistency happens when row X in database A is modified (by a Client connected to datacenter A), and the same row X in database B is independently modified too (by a Client connected to datacenter B).alternatively, we can keep processing data in both datacenters (i.e. in both databases) – but then we risk that the data will become inconsistent between the databases (and if it happens – then we’ll need to reconcile the databases somehow). The worst case of inconsistency happens when row X in database A is modified (by a Client connected to datacenter A), and the same row X in database B is independently modified too (by a Client connected to datacenter B). In this case, reconciliation becomes a Really Big Headache (in general case – it is not resolvable without help from application level, and often is not resolvable at all without human intervention 🙁 🙁 ).

In other words: if we have our system partitioned (i.e. without connection between different parts – in our case between datacenter A and datacenter B) – we cannot have it both “available” and “consistent” at the same time.

This can be generalized to a CAP theorem: out of three properties, Consistency, Availability, and Partitioning – we can have no more than two simultaneously. It has been noted, however, that this “2 out of 3” interpretation may be misleading, and more down-to-earth interpretation (along the lines of our example above, i.e. “what to do when Partitioning occurs”) has been suggested; apparently, both Consistency and Availability are actually not binary, but continuous, which opens a door for many implementations with different balance of these two properties [Brewer]. In other words – what CAP theorem really prohibits, is only that we cannot have perfect Consistency with perfect Availability when arbitrary Partitioning is present; however, different implementations may make subtly different choices and combine some Consistency with some Availability when facing some kind of Partitioning. In short – while CAP theorem still stands, it is necessary to look at the mechanisms used by different systems to understand what exactly happens in case of communication problems between your datacenters (and this is exactly what we’re going to do – at least for our scenarios of interest ;-)).

Also let’s note that “Consistency” in CAP has different meaning from “Consistency” in ACID; in the context of CAP, “Consistency” is understood as (give or take) a synchronization between different copies of the same piece of data.

BASE vs ACID

If, in our “partitioning” example above (and in general too), we try to keep our ACID properties (see [[TODO]] section above) – we’ll end up with a system that, when facing Partitioning, will choose Consistency (in CAP sense) over Availability; in turn, this means sacrificing (at least some) Availability.

On the other hand, there is a different approach – sacrificing Consistency while keeping high Availability. This leads us to a different-than-ACID set of guarantees, known as BASE. BASE (IMO rather awkwardly) stands for “Basically Available, Soft state, Eventually consistent”.

I won’t go into a deeper discussion on BASE vs ACID (those interested in such a discussion may refer, for example, to [Brewer]), but will note a few things:

IMNSHO Initialism of In my not-so-humble opinion— Wiktionary —As a rule of thumb, ACID Atomicity doesn’t affect our choices when it comes to distributed systems. In other words, BASE-compliant systems also can (and IMNSHO SHOULD) exhibit Atomicity
ACID Consistency (not to be confused with CAP Consistency) can be maintained within one partition (datacenter in our example).
ACID Isolation (more specifically – high transaction isolation levels such as Serialization) CANNOT be achieved when facing Partitioning [Brewer] 🙁
ACID Durability is optional with BASE; in other words, as BASE is about “soft state”, it doesn’t strictly require Durability – but it also doesn’t prohibit Durability either.

In general case, programming having only generic BASE guarantees is very difficult and unpleasant (on intuitive level – because there is just no one single thing which we can rely on); however, if we consider specific use cases – it is often possible to make a rather straightforward and manageable system on top of BASE.

Coming from theoretical clouds back to Earth, let’s take a look at two very common (and rather obvious) examples of BASE systems.

Replication

Probably, the most obvious example of BASE system is an “asynchronous replica”. Let’s consider the following very practical example:

We have our transactional/operational DB, which makes all the decisions, keeps all the game artifacts and player balances. etc. etc. Moreover, it is an ACID-compliant DB which can do it in a perfectly strict and relatively straightforward way.
However, we also need to run lots of reports for our support/CSR/security/marketing/whatever teams. These reports tend to cause lots of reading (which can involve table scans, ouch! – more on these and execution plans in general in [[TODO]] section below). As a consequence, such heavy reads can cause “cache poisoning” for our DB, slowing down our operational DB (ouch) – and of course, we want to avoid it.
- “One way to avoid reports affecting operational DB, is via creating a read-only replica of our main operational DB – and running our reports off that replicaOne way to avoid reports affecting operational DB, is via creating a read-only replica of our main operational DB – and running our reports off that replica. In general, such a replica can be “synchronous” or “asynchronous”; however, to achieve our goal of “reports not slowing down operational DB”, we’ll need an “asynchronous” replica.
  - “Asynchronous” replica is merely about “master” (in our case – operational/transactional) DB writing all-the-stuff-it-is-changing (say, to a DB log¹) – and sending this log to the “slave” (in our case – “reporting”) DB, where this log can be re-applied to achieve the same state as on the “master”.
    - as our replica is “asynchronous”, “master” doesn’t need to wait until “slave” is updated; as a result – “slave” may be a little bit behind the “master” – but eventually (in practice, if you’re doing it right – it is possible to get it down to at most 1-2 minutes) it will receive the information from the “master”. This eventual receiving of the updates from “master” is an illustration of “Eventually consistent” property representing last letter of BASE.

This “asynchronous replication” approach is rather common and often very useful; of course, data in the reports is a little bit “stale”, but most of the time support teams work with the historical data anyway (and historical data older than that 1-2 minutes are perfectly valid in replica); even if support team needs some kind of an “instant” value, it is very rarely an exact value – but rather a very rough approximation; this is even more true if this value changes all the time (which is the case for most of the data taken from operational DB). As a result, while there MIGHT be a report or two in your system which will need to use “master” database to get the data – the vast majority of them will be just fine running off replica (and “asynchronous” replicas are essentially the systems with BASE kind of guarantees).

““multi-master” replication will inevitably lead us either to re-introduced dependencies between DBs – or to “conflicts”Let’s note that here we’re speaking only about “single-master” replication; “multi-master” replication (i.e. multiple masters for the same piece of data) will inevitably lead us either to re-introduced back dependencies between DBs (i.e. long report running on “reporting” DB will be able to affect performance of “master” one) – or to “conflicts” (with some kind of conflict resolution – see, for example, [CouchDB]). The latter is rarely acceptable in decision-making systems,² ³ and the former is the thing which we’ve tried to avoid in the first place, so it is rarely a good choice at least for this kind of usage scenarios.

¹ BTW, all the ACID-compliant DBs I know, do the logging all the time anyway, to keep ACID properties

² in theory, there is a way to avoid both locking and conflicts – it is so-called Conflict-free replicated data types (CRDTs); unfortunately, CRDTs work only in a few very narrowly defined cases and are rarely used for real-world development 🙁

³ I’ve heard of people who told that they knew some people who have seen a decision-making system with multi-master replication and conflict resolution; however, chances of me seeing it with my own eyes, are IMO pretty low

Asynchronous Inter-DB Transactions

In Chapter III, we’ve discussed a protocol for inter-database transactions with strict guarantees on the data integrity over the whole system (in particular, nothing will be lost even if everything crashes – as long as ACID guarantees for both underlying DBs stand). Actually, this protocol is a close cousin of “Event Sourcing” pattern (for example, as described in [Richardson]).

Just to remind, this protocol was based on:

moving “something” within the DB A from one of the tables to an “outgoing transfers” table in one ACID transaction; each “outgoing transfer” in “outgoing transfers” table needs to have a unique ID.
transferring the data about the “outgoing transfer” to DB B.
upon receipt of “outgoing transfer”, DB B should make an ACID transaction, both (c1) processing the “transfer” (such as adding money to recipient), and (c2) making a record that “transfer” with this ID has already been processed.

“One advantage of this protocol is that it is asynchronous; in other words, delays in one of DBs won’t affect the other one (well, beyond delays to receive the data which is necessary for processing).One advantage of this protocol (over, for example, classical two-phase protocol and/or XA) is that it is asynchronous; in other words, delays in one of DBs won’t affect the other one (well, beyond delays to receive the data which is necessary for processing). However, it comes with a price tag attached – when using this protocol, there is a time frame when whatever-we-transfer, is in a special “in-transit” state – i.e. at this time, this whatever-we-transfer is not available for processing in either of DBs.

Here, once again, we have only eventual consistency type of guarantees. On the other hand, the guarantees we have here, are pretty strong, and there are development scenarios using this protocol, which are very straightforward – we’ll see how to use this protocol in a very practical setup, a tiny bit later, in [[TODO]] section.

Bottom line about BASE

TL;DR about BASE: in a general case, programming in BASE environment is a Big Fat Headache; however, there are specific cases (such as single-master replication or asynchronous inter-DB transactions described above) where BASE-based databases (pun intended) can be made usable, and very useful too. We’ll see examples of such setups a bit later.

[[To Be Continued…

This concludes beta Chapter 20(b) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter XVII(c), where we’ll briefly discuss a few of popular NoSQL databases.]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Adi says

October 18, 2016 at 5:32 am

It would be interesting to mention Conflict-Free Replicated Data Types (CRDT) and the DBMSes that support this type of operations. These systems are useful on a per-case basis, when you have data that can be cached on the client and perform operations locally, and the state / operational change induced by operations gets lazily replicated (eventually). Also, these systems perform well on variable latency environments, if the logic allows nodes to reply with close approximations to queries (e.g., number of users online).

Genera Links:
CRDT – Wikipedia definition -> https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
CRDT popular paper -> http://hal.upmc.fr/file/index/docid/555588/filename/techreport.pdf
Github easy to follow explanation -> https://github.com/ljwagerfield/crdt

DBMSes that support CRDT:
Akka CRDT – -> https://github.com/jboner/akka-crdt
Datanet – an open source CRDT based data synchronization system -> http://datanet.co/
Riak -> http://docs.basho.com/riak/kv/2.0.2/developing/data-types/

Interviews and posts about CRDT:
High Scalability, Datanet -> http://highscalability.com/blog/2016/10/17/datanet-a-new-crdt-database-that-lets-you-do-bad-bad-things.html
Medium, CRDT types -> https://medium.com/@istanbul_techie/a-look-at-conflict-free-replicated-data-types-crdt-221a5f629e7e#.a2enxjd2j

- "No Bugs" Hare says
  
  October 20, 2016 at 1:19 pm
  
  Interesting, I’ve added a footnote about them (not more as use cases seem to be too narrow for real-world stuff 🙁 ). THANKS!