Choosing RDMBS for OLTP DB - IT Hare on Soft.ware

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Multiple Connections vs Single Connection

[[This is Chapter 20(f) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

At this point, you will usually need to choose RDBMS for your OLTP (transactional/operational) database. While it IS possible to write portable SQL and avoid choosing it now (more on it in [[TODO]] section below) – chances are that you will still be forced at least to try making your choice right away (it is after realising that the choice it not that obvious, you may decided to go for cross-RDBMS SQL).

Criteria for Production OLTP RDBMS

Before going into holy wars of “<insert-RDBMS-here> is the best, period”, let’s try to define features which are important for a production-level 24×7 OLTP RDBMS.

Lock-Based vs MVCC

“with respect to concurrency, pretty much all RDBMS these days are lock-based or MVCC-based onesAs we discussed in [[TODO]] section above – with respect to concurrency, pretty much all RDBMS these days are lock-based or MVCC-based ones. From the point of view of write-heavy OLTP processing, I’ve seen that:

For a mix of writes and reads (as in “OLTP transaction and reports”), MVCC-based RDBMS perform somewhat better than the lock-based ones
- Make it MUCH better if isolation levels higher than Read Uncommitted, are necessary for reads/reports.
- OTOH, it is NOT that common to have OLTP with lots of parallel reads – and as soon as it happens, these reads usually go to the replicas, making it a relatively moot issue.
For a pure mostly-writing OLTP (without too much reporting), lock-based RDBMS tend to perform a bit better than MVCC-based ones
- On the other hand – if you can make your OLTP load to use INSERTs over UPDATEs (see also discussion on history and audit in [[TODO]] section below), MVCC can be very efficient.

Also it is worth noting that if you’re using single-write-connection DB architecture (discussed in [[TODO]] section above) – the logical difference between Lock-based RDBMS and MVCC-based ones becomes rather moot (though performance-wise, all other things being equal, Lock-based ones will usually have a bit of an edge).

ACID guarantees

“for OLTP DB we DO need ACID transactions involving multiple rows and multiple tablesAs discussed above, for OLTP DB we DO want full-scale ACID transactions. Moreover, we DO need ACID transactions involving multiple rows and multiple tables. While exceptions to this rule do exist, they’re extremely rare and far between.

This pretty much automatically rules out MySQL+MyISAM for OLTP DBs. Note that MySQL+ISAM can be a good thing for quite a few apps (for example, as a back-end to courier tracking systems, or as a back-end for a system monitoring tool) – it is not just a good thing for usual OLTP processing which involves some kind of money-related information.

BTW, your RDBMS providing ACID guarantees pretty much implies that it has a DB log; which usually implies automated recovery (and automated rollforward) from the DB log in case of RDBMS crash.

Support for 24×7 Operation

Next set of features we’ll need, is related to support of 24×7 operations (you’re going to run your game 24×7, aren’t you?). These features include:

Online backup. Whatever we’re doing – we DO want to have a backup, and with 24×7 operation, online backup becomes a necessity.
- Usually, online backup also implies “log rollforward” capabilities. Most of the time, it goes like this – you can have 2 DBs, one being “master” and another being “slave”; you just take log files from “master”, send them to “slave”, and “rollforward” them on the slave.
  - Moreover, there are some DBs out there which allow to run read-only requests on a “slave” in “log rollforward” state (effectively making it a read-only slave replica); some other RDBMS, however, do NOT enable such requests (i.e. you need to finish “log rollforward” to bring your slave RDBMS into queriable state).
- “As an alternative to online backups, asynchronous master-slave replication can be usedAs an alternative to online backups, asynchronous master-slave replication can be used to keep an almost- in-sync backup copy (in particular, it is an interesting option for MySQL+InnoDB).
  - Note that replicas may or may not allow for “point in time” recovery which is possible with online backups + rollforward (check it with your RDBMS doc). While “point in time” recovery is needed to recover only from Really Bad Scenarios (and I didn’t see the need for it in real-world production) – it might save your bacon on one Really Bad Day.
- “instant” ADD COLUMN statement. With your DB in production, you will need to extend it, this is for sure; most of the time – this is done via ALTER TABLE… ADD COLUMN statements. And when facing ADD COLUMN statement, quite a few RDBMS out there will simply rewrite the whole table into new format of the row. And if your table had a billion rows – well, it is going to take many hours 🙁 (and while the copying is being made, all the access to that table is blocked, rendering your DB unusable for all those hours 🙁 ). It is not a rocket science to make ADD COLUMN near-instant (in single-millisecond range regardless of table size) – and there are RDBMSs out there which are doing it too, but you SHOULD keep in mind that this property is not universal 🙁 .
  - An “poor man’s” alternative is to implement lock-free ADD COLUMN (and ALTER TABLE in general) as follows:
    - make a “shadow” table with new structure
    - make a trigger which will write all the modifications from current table there
    - copy data from current table to “shadow” table (ignoring already existing rows(!) which have been put there by trigger)
    - substitute current table for a “shadow” one
  - This “poor man’s” ADD COLUMN is quite cumbersome (and affects performance significantly while working) – but if no other alternatives are available, it MIGHT work
  - “instant” ALTER COLUMN (widening fields) is a nice feature too, but as widening fields can be emulated via ADD COLUMN – I’d say it is not that important
- “As the RDBMS keeps modifying its tables – the tables gradually degradeOnline table optimization. This one needs a bit of explanation. As the RDBMS keeps modifying its tables – the tables gradually degrade (in fact, all kind of nasty things happen depending on the storage engine in use – from “overflow rows” to “dead rows”). To deal with it, some kind of optimization (known as OPTIMIZE TABLE for InnoDB, REORG TABLE for DB/2, VACUUM for Postgres, etc.) becomes necessary – and we DO want to do it online (without stopping the whole thing, as optimization of a hundred-million-row table may take a loooong while).
  - Most of the time, such optimization will require creating a “shadow copy” (kept by DB, which is always better than doing it yourself), which means additional space requirements. On the other hand, at least one RDBMS provides “in-place” table optimization.
- Containers with an optional re-balancing. We’ll discuss this issue in detail in Chapter [[TODO]], but for now let’s just mention that it is related to the need to add new HDD to store your data (which happens all the time) – and to improve the speed by spreading your data over all the HDDs, including the just-added one. This can be done by one of two mechanisms: (a) using RAID-10 (and then it doesn’t matter how DB stores data), and (b) using DB containers over multiple RAID-1 disks (with DB essentially working as RAID-0). As long as we’re not adding a new HDD (actually, a pair of HDDs to ensure redundancy) – both systems are pretty much equivalent; however, after adding a new pair of disks we’ll need a re-balancing between the disks to re-balance the load – and this re-balancing will be done by RAID or by DB respectively. RAID-level rebalancing usually causes MUCH more severe performance hit than DB-level one (pretty often, your system won’t be able to cope with the load while RAID-level rebalancing is in progress). Hence, my preference for DB-managed containers (with rebalancing after the container is added, being optional).

Performance

“Unfortunately, benchmarking DBs without a specific use case doesn't make much senseOf course, performance (and especially write performance) is critical to the OLTP DB. Unfortunately, benchmarking DBs without a specific use case doesn’t make much sense 🙁 . As a result, the best I can do is to mention some well-known performance-related architectural features and misfeatures for some of RDBMSs.

Hints for SQL Compiler

That men do not learn very much from the lessons of history is the most important of all the lessons of history.

— Aldous Huxley —

When we feed our SQL to a RDBMS, it gets compiled into an “execution plan”. And (in spite of DB developers may think or DB sales may tell you) compilers tend to get wrong from time to time 🙁 . Just one very common case of such problems occurring:

We’re using stats-based (a.k.a. cost-based) SQL Compiler
We have a large historical table with a TIMESTAMP field (happens all the time)
Stats happen to be a bit out of date – by a few hours/days (as it usually is)
We’re compiling SQL which gets some data over T=the last hour
- At this point, SQL Compiler sees in the stats that there is no data over T we requested – and decides to use index scan over the last hour (expecting 0 rows to be read).
  - There was another (actually better) execution plan (based on other index) – but SQL optimizer (expecting 0 rows in this index scan) decided to use time-based index
- However, during last hour a few millions transactions have been made, causing this index scan to take a veeeryyy loooong whiiile 🙁

To deal with such (and quite a few other) mishaps – there are so-called “SQL compiler hints”. “Hints” allow us to force RDBMS into execution plan which we want to use (and for 99% of OLTP statements it is possible to tell optimal execution plans well in advance).

“do NOT trust those people who’re saying “hey, DB always knows better than you” (usually it does, but always is an immensely strong statement which is virtually impossible to achieve)BTW, do NOT trust those people who’re saying “hey, DB always knows better than you” (usually it does, but always is an immensely strong statement which is virtually impossible to achieve). Besides the real-world cases of epic optimizer failures such as those listed above, there is also one interesting story about it. In the times of ancient RDBMS, IBM DB/2 team was adamant about NOT allowing hints, period (“We know better than you! And we don’t – this is our bug to be fixed.”). However (as always with such blanket statements) it didn’t work quite as promised. Over time problems with DB/2 compiler choosing wrong execution plan, have lead to de-facto-standard practices “how to cheat DB/2 optimizer”, that included things such as adding “OR 1=0” to the WHERE clause – just to affect optimizer (actually, “OR 1=0” was acting as a SQL compiler hint). Moreover, this trick became so ubiquitous (in spite of DB/2 team still telling “we always know better”) that DB/2 team has eventually said that behavior surrounding “OR 1=0” is so important for their customers, that they will guarantee it to continue working this way forever 😉 . But this is still not the end of the story. Around 10 years ago (and about 20 years of persistent claiming “we know better than you”) DB/2 did introduce hints (which are ugly – but are arguably still better than jumping through the hoops of “OR 1=0” etc.).

Currently, Postgres team takes the same stance as DB/2 team was taking 30 years ago; the only thing I’m wondering in this regard is not IF Postgres will change their position on hints – but WHEN it will happen. Unfortunately, the only lesson which people tend to learn from history – is that nobody learns anything from history 🙁 .

[[TODO: indexes and more indexes; including: clustered indexes, probably referring to http://use-the-index-luke.com/sql/clustering/index-organized-clustered-index , function indexes, sparse indexes, b-tree (incl. uni-/bi-directional) index, hash index, bitmap indexes (little use for OLTP)]]

OLTP Performance Issues

“It seems that some of RDBMSs were not designed with write-heavy OLTP in mind (concentrating on read queries instead)It seems that some of RDBMSs were not designed with write-heavy OLTP in mind (concentrating on read queries instead). While it doesn’t make such RDBMS inherently bad (after all, MOST of DBs out there are indeed working mostly with read queries) – it can be quite a problem for real-world write-heavy OLTP environments 🙁 . Let’s take a closer look at these rather well-known issues.

Postgres: ctid changes even on updates to a non-indexed field (controversial)

There were reports that Postgres has been observed to have serious performance issues when updating real-world DBs with quite a few indexes. You can find a detailed discussion in the [StackOverflow.PostgresUpdates] and [Klitzke], but from our perspective, it all boils down to the following quote:

“Since an index references a row by ctid, a simple UPDATE (even on a non-indexed column) will change the ctid, resulting in the need to rewrite the ctid in every index in the table that references that changed row.”

That would be pretty bad, especially for a write-heavy OLTP DB 🙁 . On the other hand, since Postgres 8.3, there is a so-called Heap-Only Tuples (HOT) feature which at least in theory should eliminate most of the related problems (though I don’t have any real-world confirmation that it really does), see [Postgres.HOT] for a brief description. The idea roughly goes as follows: with HOT working, and if new row fits into the same page – then in spite of ctid being changed, indexes still point to the same page, so they do NOT need to be updated. This can work, of course, only as long as the new row fits into the same page; to deal with it, “opportunistic mini-vacuums” seem to help: while Postgres still cannot prune the tuple being updated (it is necessary to keep it for MVCC purposes) – it can (and supposedly will) prune older tuples from the same page, which may allow to keep new row within the same page, and avoid updating indexes.

Bottom line: while Postgres did have that issue of unnecessary index updates – it is significantly mitigated by HOT feature; whether it is completely mitigated by HOT – is still an open question (and mitigation may require additional configuration to keep some space in pages for HOT); at least you can monitor efficiency of HOT for your instance of DB (see [Postgres.HOT]).

MemSQL: polling for log writes

While in-memory DBs look potentially quite interesting for OLTP – I don’t seriously consider MemSQL appropriate for these purposes, and here is an explanation why.

As mentioned in [Mituzas] – MemSQL uses 50-ms polling to issue writes to DB log. This is a pretty bad practice for any kind of OLTP DB, but if you’re going to follow my advice and go for single-write-DB-connection architecture – this misfeature of MemSQL will hurt really badly 🙁 . Make sure to double-check if they’re still doing it – but if they do, stay away at the very least for single-write-DB-connection configurations.

Execution plans and Profiling

“to debug and profile your SQL statements, the bare minimum you need is a tool which can show you “execution plans” for your SQL queriesWhen working with production DBs, you’ll need to debug and profile your SQL statements. And to do it, the bare minimum you need is a tool which can show you “execution plans” for your SQL queries. This allows to predict the way how your query will be executed (that is, if you’re compiling your SQL into the execution plan on a production DB – or a DB with the stats imported from production).

On the other hand, execution plans show only predicted costs of execution (calculated from DB stats); in practice the numbers can differ by orders of magnitude.

To address it – some kind of real-time profiling can be used. This MIGHT be a useful tool, though IMO it is not an absolute necessity: usually, with some experience and using some common sense, performance problems with queries are easy identifiable (it is usually MUCH more difficult to force DB to use the execution plan you want than to identify why currently selected execution plan sucks).

In-Memory Processing

These days, quite a few RDBMS provide in-memory processing options. These can be divided into two large groups:

“Unfortunately, non-Durable in-memory processing is usually NOT acceptable for OLTP DBs.Non-durable in-memory processing. This includes running your RDBMS from RAM disk, and MEMORY storage engine of MySQL. Unfortunately, these are usually NOT acceptable for OLTP DBs.
Durable in-memory processing. This can be seen as having an in-memory cache sitting between your app and RDBMS (actually, Oracle’s TimesTen IMDB Cache is even marketed as a cache). All major commercial vendors listed below, provide this option – and for all of them it costs arm and leg 🙁 .

Replication

As it was discussed in [[TODO]] section above – under a serious load, you will most likely need a (read-only) replica of your DB sooner or later. And if your RDBMS supports replication (and it works) – as noted above, you won’t need to DIY 🙂 .

What we need most of the time – is so-called master-slave asynchronous replication (so that delays with slave replica don’t affect the master). Other features, such as merge replication capabilities (also in a simple master-slave asynchronous environment – and without any conflicts whatsoever) may be of use too (see, for example, [[TODO]] section above).

I should mention that from what I’ve seen, RDBMS-provided replicas behaved pretty bad under heavy loads 🙁 . In one extreme case – after several days under the load of less than 1M transactions/day, replication just kept falling apart with some obscure error, requiring complete re-sync of the replica (which was a HUGE headache too 🙁 ). Moral of the story –

make sure to test your replication under severe load before relying on it

DUD A device or machine that is useless because it does not work properly or has failed to work— Wiktionary —Fortunately, if replication happens to be a DUD (and you’re using a single-connection approach) – you can implement DIY replication with a relative ease (more on it in Vol. 3, tentatively Chapter [[TODO]]).

Partitioning

RDBMS-provided partitioning is one thing which is highly touted as a tool to achieve scalability. On the other hand – as discussed in [[TODO]] section above, I tend to prefer pure Share-Nothing models (with app-level partitioning) as they tend to provide much more linear scalability than partitioning one DB across different server boxes. Still, there are cases when RDBMS-provided partitioning can be useful, so if it is provided – it is a plus (though not as big as your RDBMS sales guy will tell you).

On OLTP Non-Issues

When trying to look for comparisons between different RDBMSs, you’ll certainly see tons of arguments about support for JOINs or different interpretations of SQL standard in different RDBMSs. However, one thing I need to tell is that

While all these things are all-important for “Reporting” DBs and “Analytic” DBs – as a rule of thumb, they’re NOT that important for an OLTP one

OLTP DB is a very strange beast; in particular – statements using JOINs are rather rare there. Sure, you’ll need your JOINs somewhere else (hey, that’s what SQL is all about 😉 ) – but most of the time not in your OLTP DB (and even if you have your JOINs here – they will be extremely simple anyway). Therefore, unless you’re relying on your “Reporting DB” and your “OLTP DB” to be the same (more on it a bit below) – these issues become rather moot.

“That being said, there ARE reasons to have your “reporting” RDBMS the same as your OLTP RDBMSThat being said, there ARE reasons to have your “reporting” RDBMS the same as your OLTP RDBMS. In particular, if you manage to run your RDBMS-level replication between these two DBs – you’ll get your replica(s) without (straightforward but rather time-consuming) DIY replication.

Another issue which is highly touted by RDBMS vendors (especially commercial ones) – is fault tolerance. However, as we’ve discussed it in Chapter VII, for DB Servers starting to use fault tolerance features does NOT guarantee improvement in MTBF (this happens as soon as we take into account that MTBF of the fault tolerance system itself is not an infinity); moreover – as it was also discussed in Chapter VII – real-world observations tend to show that MTBF of the fault tolerance features tend to be less than MTBF of the good hardware server – which in turn means that MTBF of the system is higher if we do NOT use any fault tolerance (and this was observed in real world too). In other words – chances of the hardware (such as CPU or motherboard) failure for a good server are apparently lower than chances of the fault tolerance system misfiring (and causing all kinds of trouble, all the way up to “split brain” scenarios).

Sure, there are cases when you do need fault tolerance (for example, if your game is a stock exchange or a bank) – but these are likely to run on top of DB/2 / Oracle anyway; as they’re rather similar in this regard, the question of fault tolerance for your RDBMS will become rather moot.

Licensing

“as soon as you start delving into licensing for commercial RDBMSs, you’ll not only find that they’re expensive – but also that their licensing is so convoluted, that you’re going to spend several days until you find out how much it is going to costLast but certainly not least – there is always a question of licensing and license costs. And as soon as you start delving into licensing for commercial RDBMSs, you’ll not only find that they’re expensive – but also that their licensing is so convoluted, that you’re going to spend several days just to figure out how much it is going to cost. Below is my own analysis of the licensing of the three top commercial RDBMS as of the end of 2016.

A few notes before we start:

I will try to compare pricing on per-core basis
I’ll concentrate only on production-oriented licensing, ignoring all the special editions such as “Developer” ones, “Student” ones, editions available only via hosting providers, and so on.
In addition, we’ll ignore “authorized clients” licensing models too (as inapplicable for deployments with an unknown number of users).
Disclaimer: all the licensing presented as “at the best of my understanding”, with no warranties of any kind. Make sure to do your own analysis before making any decisions, especially expensive ones.
Also keep in mind that while there are certain general tendencies (like providing lower-tier DB for free) – licensing for commercial DBs changes frequently; in particular, while usually limits on cores etc. tend to go up with time – they can go down too 🙁 .
Last but not least: make sure to negotiate with your distributor; in most cases, discounts of 50% are to be expected – and for larger contracts, they can easily reach 80%.

Microsoft SQL Server 2016

Out of major commercial DBs, MS SQL Server is the cheapest one – and has one of the simplest licensing model too (yes, what follows is cheap and simple by commercial RDBMS standards ;-( ). NB: data below is derived from [SQLServer.Editions].

Microsoft SQL Server Express

Cost: Free.

Limitations: max 1 socket (or 4 cores, whichever is less), 1GB RAM, 10GB total DB Size. With SQL Server 2016 these limits are per-instance, apply to resources which will be used (rather than to the server where it will work), and multiple instances running on the same physical server to bypass per-instance limits, are officially allowed [SQLServer.CapacityLimits].

“with mere 10GB limit per DB, even DB-per-Service DBs might hit this limit pretty quicklyThis makes a pretty good case for DB-per-Service models (with single-DB-connection or not). Still, with mere 10GB limit per DB, even DB-per-Service DBs might hit this limit pretty quickly 🙁 .

Functionality: basic SQL, support for 24×7 operation, being a client for replication.

Missing Functionality: partitioning, being a master for replication.

Of course, it is possible to do DIY partitioning and replication; for DIY partitioning of non-Enterprise SQL Server – see, for example, [Clement]; for DIY replication – see Vol. 3 (tentatively Chapter [[TODO]]).

Microsoft SQL Server Standard

Cost: ~$2K-$4K per core (see [Ozar] and [SQLServer.Pricing]).

Limitations: max 4 sockets (or 24 cores, whichever comes first), 128GB RAM, pretty much unlimited total DB Size.

Functionality: basic SQL, support for 24×7 operation, replication, SQL Profiler.

Missing Functionality: partitioning.

SQL Server Standard is a fully-functional RDBMS and is a good candidate for OLTP; if your budget allows – I tend to prefer it to MySQL (reliability- and feature-wise); on the other hand, if your budget is still fatter – I’d consider DB/2 or Oracle instead (they’re substantially more expensive, but they still MIGHT be more stable in the long run than MS SQL¹).

¹ 15 years ago SQL Server crashed MUCH more frequently than well-established DB/2 and Oracle; however, since that point MS SQL did reduced the gap significantly; whether the gap is at zero now – is everybody’s guess

Microsoft SQL Server Enterprise

Cost: ~$7K-$14K per core.

Limitations: none.

Functionality: everything we might need, plus in-memory OLTP processing.

Missing Functionality: none.

SQL Server Enterprise is expensive for sure; TBH, I do NOT see any realistic use cases for it – except for in-memory OLTP (and in this field, it beats all other commercial RDBMS price-wise by a Damn Lot). Fortunately, if you’re doing things along the lines outlined in this book – most likely you won’t need in-memory OLTP until you have hundreds of millions write transactions per day – and then the price probably won’t be that high.

IBM DB/2 10.5

One field where IBM DB/2 is an indisputable champion… is in making pricing as incomprehensible as possible 🙁 ². On the other hand, DB/2 does have quite a few good technical properties (and I had very good experiences with it too), so I will still include it in this analysis.

² I’m wondering – how long it will take DB/2 marketing team to realise that they’re losing sales to new customers BADLY because of it?

DB/2 Express-C

Cost: Free.

Limitations: max 2 cores, 16G RAM, 15 TB DB size. The limits seem to be per-server (rather than per-instance), but can be applied per virtualization session [RadaMelnyk].

Functionality: basic SQL, support for 24×7 operation.

Missing Functionality: partitioning, replication.

Overall, with quite a few single-write-DB-connection deployments (especially if we’re delegating reporting to replicas), 2-core limit doesn’t look too bad, and the whole thing will likely work without being really restricted. Other limits aren’t likely to hit too bad either.

See also above on DIY partitioning and replication (and DIY partitioning for DB/2 can be done pretty much the same way as for SQL Server).

DB/2 Express

“With DB/2 Express, there are essentially two pricing models which may work for us.Cost: With DB/2 Express, there are essentially two pricing models which may work for us. One is based on so-called PVUs (whatever it means): at around $70/PVU³ and 100PVUs/core, we’ll get $7K per core. Another pricing model is based on so-called FTL – basically it is rental of the whole thing on per-year basis (with TCO lower at first, and higher in the long run). Unfortunately, I wasn’t able to find out about current FTL costs for DB/2 🙁 .

Limitations: max 8 cores, 64G RAM (per server), 15 TB DB size.

Functionality: basic SQL, support for 24×7 operation, replication (one flavor of replication, so-called SQL Replication).

Missing Functionality: partitioning, Q Replication.

DB/2 Express is quite expensive – but quite capable DB too. For OLTP, it is rather unlikely that you’ll exceed its limits. For reporting replicas (and analytics) – well, you may.

³ NB: there is no publicly available data on DB/2 pricing 🙁 , you’ll need to go to distributor, but they will probably start negotiating around this number

DB/2 Workgroup

As it stands now, DB/2 Workgroup is pretty much the same as DB/2 Express, with the following differences:

Limits are increased to 16 cores and 128G RAM (per server).
Cost is about 1.5x higher than that of DB/2 Express (i.e in the range of $10K/core).

Not sure whether you’ll need Workgroup for your OLTP – but you MAY need those increase limits for your reporting replicas.

DB/2 Enterprise

Compared to DB/2 Workgroup, limits are lifted, and price goes up by 5x-6x compared to DB/2 Workgroup (NB: that’s around $50K/core(!)). Also Enterprise Server includes quite a few bells and whistles (like Q Replication) – but TBH, at this price I’d rather develop replication myself ;-).

TBH, I do NOT see use cases for DB/2 Enterprise for OLTP (not even for banks/stock exchanges).

DB/2 Advanced * Server

To add even more to licensing complexity, DB/2 features Advanced Workgroup Server and Advanced Enterprise Server. These tend to have exorbitant prices (Advanced Workgroup being almost as expensive as non-Advanced Enterprise, and Advanced Enterprise being about 1.5x more expensive than non-Advanced Enterprise). On the other hand – they provide per-TeraByte pricing (be prepared to pay around $50K/TB for Advanced Workgroup – and $100K/TB for Advanced Enterprise); as your OLTP DB isn’t likely to exceed 1TB (and less-than-1TB licenses are not available) – that’s about what you’re going to pay.

In our context of OLTP processing, there MIGHT be one use case DB/2 Advanced Workgroup Server – that is, if you have TONS of money, and you need in-memory processing too.

Oracle Database 12c

While Oracle licensing is less complicated than DB/2 one, it tends to be even more expensive 🙁 .

Oracle DB Express (DB XE)

Cost: Free

Limits: 11G of user data, up to 1G of RAM, using single core.

Functionality: basic SQL, support for 24×7 operation.

Missing Functionality: partitioning, replication.

IMO, limit of 11G on user data makes it rather difficult to use in real-world even if we use DB-per-Service model. OTOH, if we’re speaking only about OLTP – well, I won’t say that it cannot possibly fly.

Oracle DB Standard Edition 2 (DB SE2)

Cost: $17500/core⁴ (perpetual), $3500/core (lease for 1 year) – and be prepared to pay extra for additional features whenever you need them 🙁 .

Limits: up to 16 cores (or 2 sockets, whichever comes first).

Functionality (included): basic SQL, support for 24×7 operation, replication.

Missing Functionality: partitioning.

“TBH, SE2 should be enough for all your OLTP needs (except for in-memory processing). It is going to cost you though.TBH, SE2 should be enough for all your OLTP needs (except for in-memory processing). It is going to cost you though 🙁 .

⁴ Oracle pricing is “per-Processor”; it seems that for recent x64 CPUs, “processor” is the same as “core”

Oracle DB Enterprise Edition (DB EE)

Cost: $47500/core (perpetual), $9500/CPU (lease for 1 year) – and be prepared to pay extra for additional features whenever you need them 🙁 .

Limits: none.

Functionality (included): basic SQL, support for 24×7 operation, replication, partitioning.

Optional functionality (for additional price): in-memory processing (TimesTen In-Memory Cache for Oracle EE).

The only reason I know for using Oracle EE in OLTP environments – is TimesTen. And it is Damn Expensive too 🙁 .

RDBMS-to-OLTP-Features Table

[[TODO: Google Spanner(latencies?)]]

[[TODO: online (no-lock) RUNSTATS(!!)]]

All the previous features can be summarized into one table (to reiterate: these things are important for OLTP production, and may be completely irrelevant to reporting / analytics / etc.; in other words – it is NOT about “what’s best RDBMS overall”; it is rather about “what’s the best DB for write-heavy OLTP”):

NB: most important for our purposes properties are in bold	MySQL + InnoDB⁵ ⁶	PostgreSQL	MS SQL Server	IBM DB/2	Oracle
Type	MVCC	MVCC	Lock-based or MVCC	Lock-based	MVCC
ACID guarantees	Multi-row ACID	Multi-row ACID	Multi-row ACID	Multi-row ACID	Multi-row ACID
24×7 operation
Online backup	3^rd-party Async replication can be used instead	Yes	Yes	Yes	Yes
ADD COLUMN	Copies whole table; SLOW 3^rd-party trigger-based workaround⁷	“Instant”	“Instant”⁸	“Instant”	“Instant”
Table optimization	Online⁹	Online	Online	Online In-place¹⁰	Online
Adding disks	Relies on file system; adding disks is possible only via RAID, which is very likely to cause rebalancing, likely causing severe performance problems	Relies on file system; adding disks is possible only via RAID, which is very likely to cause rebalancing, likely causing severe performance problems	There are containers (as files in “file groups”), but seems to be no rebalancing option¹¹	Containers with Optional Rebalancing	Containers with Optional Rebalancing [[TODO: check]]
OLTP Performance
Hints	USE INDEX	Third-party ([Github.pg_hint_plan])¹²	WITH(INDEX)	Possible but Ugly¹³	INDEX/NO_INDEX
Known OLTP Performance Issues		Rewriting whole row on non-index updates¹⁴
Execution Plans / Profiling	EXPLAIN, Profiling¹⁵	EXPLAIN	SHOWPLAN_*, Profiler¹⁶	EXPLAIN, Profiler	EXPLAIN, Profiling
In-Memory Processing (with Durability)	No¹⁷	No	Yes (expensive)	Yes (expensive)	Yes (expensive)
Replication / Partitioning
Async Master-Slave Replication	Yes	Yes	Yes¹⁸	Yes¹⁹	Yes²⁰
Partitioning	Yes	Yes	Yes²¹	Yes²²	Yes²³
Pricing
Price option 1	Free²⁴	Free	Free (up to 10G data)	Free (up to 2 cores)	Free (up to 11G data)
Price option 2	$5K/Server/year		$2K-$4K / core	$7K / core	$17.5K/core or $3.5K/core/year
Price option 3 (incl. in-memory processing)			$7K-$14K / core	$40K / core or $50K / TB	$70.5K / core

⁵ we did NOT consider MySQL+MyISAM because of lack of support of multi-row ACID transactions

⁶ I am not qualified to tell how much MariaDB+XtraDB is different from MySQL+InnoDB; probably not too much – but do NOT take my words for granted

⁷ at least the one by Percona; DIY is also possible

⁸ previously existing restrictions seem to be gone starting from SQL Server 2012: [Rusanu]

⁹ since MySQL 5.7.4

¹⁰ in-place reorg seems to be unavailable in Express-C edition

¹¹ see [Randal]

¹² while there are ways to influence optimiser in Postgres itself (see, for example, [Postgres.QueryPlanning]) – they’re not per-query, so using them, while theoretically possible, is more than just “Damn Inconvenient” 🙁

¹³ XML-based “profiles”? Gimme a break. And classical DB/2-ish “OR 1=0” to play around optimizer is probably even worse

¹⁴ while it is mitigated by HOT, it is not clear whether mitigation is 100%

¹⁵ visualizing is a paid option

¹⁶ Profiler requires at least SQL Server Standard

¹⁷ No, MEMORY storage engine or running whatever engine off RAM disk don’t qualify because of lack of Durability

¹⁸ Being replication master requires at least SQL Server Standard

¹⁹ requires at least DB/2 Express (and TBH, for OLTP I strongly prefer Q replication, which requires DB/2 Enterprise and pretty much out of reach)

²⁰ requires at least Oracle SE2

²¹ requires SQL Server Enterprise

²² requires DB/2 Enterprise

²³ requires Oracle Enterprise

²⁴ there is a paid support option

Picking Your Poison

“As we can see from the table above – choosing your RDBMS it is not as easy as it might seem.As we can see from the table above – choosing your RDBMS it is not as easy as it might seem. Free DBs, while the price being Really Attractive ;-), have certain problems in production. Honestly, if I could forget about price – I’d certainly choose one of commercial RDBMS above (probably DB/2 or Oracle). On the other hand – pricing for commercial RDBMS is very high even for “Standard”/”Express” editions; when going to the Enterprise level – it becomes outright atrocious 🙁 – though in practice we don’t really need them 🙂 .

As a result, if starting a new project, I would probably consider several different options and weigh their pros and cons depending on the needs of your specific project.

On Cross-RDBMS SQL

One of the possible answers to “which RDBMS we’re going to use” question is “well, ANY of them” 😉 . While this is an interesting (and actually viable) option – to do this, we’ll need to make sure that all our SQL is cross-RDBMS.

“all RDBMS vendors are working hard on achieving vendor lock-in, and usually succeed with itAnd while it IS possible to write RDBMS-agnostic SQL, it ranges from “quite difficult” to “very difficult” 🙁 (IMO – more difficult than writing cross-platform C++, which is pretty difficult to start with). The reason for this difficulty is that all RDBMS vendors are working hard on achieving vendor lock-in, and usually succeed with it 🙁 . If you ask your DB guys about going cross-RDBMS – 99% chance that they will tell you that you’re crazy on missing on all those great features of <whatever-RDBMS-they’re-familiar-with>.

Still, being a consistent critic of all the vendor lock-ins, I am advocating for cross-RDBMS SQL (and I’ve done it on a pretty large scale too). On the other hand – it is NOT a life-and-death question for your project, so it is up to you whether to do it. As usual with going cross-platform, it will hurt in the short-run, but will help in the long run.

IF you want to go in this direction, keep in mind the most common pitfalls on this way:

DON’T use stored procedures – they’re as vendor-specific as they go. With app-level “DB Server” isolating your DB from your app-level – this is not as much problem as it sounds (and prepared statements have been seen to provide very similar performance).
DON’T use triggers. Not that you really need them anyway (exceptions do exist, but they’re extremely rare and far between).
DON’T use auto increment stuff²⁵
DON’T use vendor-specific functions (the most common example of such function being CURRENT_TIMESTAMP/CURRENT TIMESTAMP/getdate()/Now()).

BTW, regardless of what your DB guy may tell you – vendor-specific extensions are NOT necessary to achieve good performance (see [[TODO]] section above about real-world system processing ~30M write transactions/day in one single DB connection without all these things).

Last but certainly not least – compiled SQL bindings (which we’ll discuss soon, and which is a Good Thing(tm) for quite a few reasons) – can be of a significant help for achieving cross-platform SQL.

²⁵ BTW, for single-write-DB-connection apps it is not a problem at all, as the last value is easily cacheable

Free DB All the Way

“Cross-platform SQL aside, let's start discussing our options. The most obvious one is to use a free RDBMS.Cross-platform SQL aside, let’s start discussing our options. The most obvious one is to use a free RDBMS.

If using free RDBMS for OLTP purposes, I tend to prefer MySQL+InnoDB over Postgres. The reason is that IMO, design decisions made for Postgres, tend to prefer read-intensive workloads to write-intensive ones, and Postgres has been reported to experience serious performance problems in write-heavy OLTP environments. On the other hand, when it comes to 3rd-party reports, quite a few Really Large companies (such as Skype) are running PostgreSQL for really large loads successfully, so it is clearly possible to use Postgres for OLTP purposes too.

On the third hand ;-), I need to mention that with MySQL as your OLTP DB, running it in production won’t be a picnic; in particular, adding new fields to your tables is going to be rather cumbersome (though solvable).

One more thing to be mentioned with respect to your RDBMS: when you have hundreds of thousands of simultaneous players – any downtime is going to cost you a LOT. And commercial RDBMS will usually have an edge over MySQL Community Edition (which tend to change much more frequently than commercial DBs, see, for example, discussion in [Schwartz]) – and over Postgres too (see, for example, complaint by [Klitzke] about intermittent problem with data inconsistency – ouch!). And if going to Enterprise Edition of MySQL – well, price-wise we won’t be too far from our next contender – Microsoft SQL Server (and if choosing between the two without including price into consideration – I’d clearly prefer SQL Server at least for OLTP).

Microsoft SQL Server, and more Microsoft SQL Server

Microsoft SQL Server is positioned between two extremes (free DBs and ultra-expensive DB/2 and Oracle). For quite a few games out there, it would be a reasonable compromise between stability (and 24×7 features) and price.

“This setup (costing you around $40K-$80K total) should be able to sustain the order of a few hundreds of thousands of simultaneous playersIf going this way, it MIGHT be possible to start with a free SQL Server Express (well, until your DB exceeds very modest 10G in size), and to go ahead – to buy SQL Server Standard 4-core license for your OLTP, and 2x 8-core licenses for your reporting replicas as the time goes. This setup (totalling to 20 cores and costing you around $40K-$80K total) should be able to sustain the order of a few hundreds of thousands of simultaneous players – and if your monetization allows to spend this kind of money for this setup²⁶ – it can be a perfectly viable option.

I want to emphasize that feature- and reliability-wise (and in spite of all the improvement made to SQL Server during last 20 years) I am still not convinced that MS SQL Server can be considered to be on par with DB/2 and Oracle. However, considering its price (which is low compared to other commercial offerings) – it might be a reasonable compromise.

Commercial RDBMS OLTP + replicas based on free RDBMS

The third route which I would consider as viable – is to use commercial RDBMS (personally I had very good experiences with DB/2) as your OLTP DB; as a rule of thumb, you won’t need more than 4 cores there. As a rule of thumb, OLTP DBs are usually not large (most of the space is taken by historical data, which can be usually pushed down to replicas) – so there is a chance that you’ll be able to run your OLTP system even on a free version (NB: with current restrictions, it looks more likely for DB/2 than for Oracle and SQL Server, but DB/2 is going to be on a higher side when you’re out of free allowance).

And if/when a free version won’t be enough – even running a “Standard” / “Express” edition for an OLTP DB (but not for your reporting replicas) isn’t likely to kill you (well, at least if you have some kind of monetisation in mind, and given this monetisation – $10K-30K doesn’t look prohibitively high).

“On the other hand, running your reporting replicas off commercial RDBMS can become way too costly quite easilyOn the other hand, running your reporting replicas off commercial RDBMS (unless it is a MS SQL Server) can become way too costly quite easily; to avoid it – well, we can use DIY replication and run replicas off your favorite RDBMS (this MAY include Postgres, though some testing will be necessary to make sure it does work reasonably fast under replication writing load).

Cash Cows (Stock Exchanges/Casinos): DB/2 or Oracle

Last but not least: if you’re running a stock exchange, bank, or a casino – I’d rather not take any risks and would run the whole thing off DB/2 or Oracle. With these games traditionally being cash cows – it is not likely that DB/2 or Oracle license costs will kill you; and both these RDBMS are known to work as a charm in heavy OLTP environments (and with all the features too). Note that most likely, you still won’t need to go above Workgroup / SE2 editions for your OLTP DB (though as always, YMMV and batteries are not included).

[[To Be Continued…

This concludes beta Chapter 20(f) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 20(g), where we’ll discuss “compiling” your SQL bindings (which are useful for quite a few reasons)]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

David Turner says

November 14, 2016 at 5:51 pm

Hi,

The conventional shortening of PostgreSQL is Postgres; you use “Postgre” (without the trailing “s”) enough that I think it’s deliberate.

Fwiw there’s a certain amount of influence possible on the Postgres query planner: https://www.postgresql.org/docs/9.5/static/runtime-config-query.html although the docs do recommend against using it too much.

- "No Bugs" Hare says
  
  November 15, 2016 at 6:47 am
  
  THANKS!
  
  About the “Postgres” – you’re right, changed it.
  
  About influencing Postgres query planner – it seems to be not a per-query hint, but rather a DB-wide config, am I right? I’ve added a note about it, but I’m not even sure if it qualifies as “barely usable” (changing config parameters for each SQL to be compiled? – well, theoretically possible, but extremely ugly and extremely inconvenient at the very least).
  
Matthew Woodcraft says

November 14, 2016 at 7:54 pm

Nitpicks:

Nearly everywhere you are admirably clear about distinguishing “core” from “socket”. But there are a couple of places where you have “/CPU”, which is not so clear.

It might be worth always being explicit about whether prices are perpetual or per year.

The [StackOverflow.PostgreUpdates] link might be better going directly to RolandoMySQLDBA’s answer rather than to the question (because the reason for the deadlocks described in the question is the poster’s locking-order bug, not the “ctid changes even on updates to a non-indexed field” performance problem).

- "No Bugs" Hare says
  
  November 15, 2016 at 7:18 am
  
  About CPUs: THANKS for pointing it out; actually, because of this I’ve took another look – and found a 2x mistake in my Oracle estimates (Oracle is actually 2x more expensive than I wrote before 🙁 , as last x64s seem to qualify as full “Processors” for Oracle licensing – and previously each x64 core counted only as 0.5x of the Oracle “Processor”).
  
  As for links – yes, I changed it (and added a link to original Uber article too). THANKS! 🙂
  
Richard Simpson says

November 16, 2016 at 5:00 pm

With Microsoft’s unification of feature sets across SKUs (announced @ Connect), how would that affect your opinions?

- "No Bugs" Hare says
  
  November 17, 2016 at 6:22 am
  
  Can you give a link to the announcement?
  
Frank H says

December 8, 2016 at 1:13 am

the “poor but wise man” can get ahead of the expensive ADD COLUMN problem by adding a several nullable columns of various types. many DBs use a bitmap to indicate whether a nullable column has a null value. therefore, at worse, an unused column will add 1 byte per row. and 1 extra byte can buy you 8 extra future use null columns. or something like that.

- "No Bugs" Hare says
  
  December 8, 2016 at 4:28 am
  
  Well yes – but it will only work once for each type :-(.
  
  - Frank H says
    
    December 8, 2016 at 8:53 pm
    
    I think the more popular DBs can support ~ 100 to 1000 columns per table. One could add many extra columns, even multiple of the same type. But I agree w your sentiment. This trick can only take you so far.
    
    - "No Bugs" Hare says
      
      December 9, 2016 at 7:47 am
      
      Exactly; it IS a workaround, but it is Damn Ugly and it can only take you so far. 🙁 I am still MUCH happier with DBs which are allowing to ADD COLUMN without rewriting the whole thing (it is not a rocket science at any rate; DB/2 is doing it since time immemorial).
      
Max Z says

December 14, 2016 at 3:31 pm

I was informed that PostgreSQL has a hints extension that is quite usable in production. I haven’t used it myself, though.

https://github.com/ossc-db/pg_hint_plan

- "No Bugs" Hare says
  
  December 15, 2016 at 5:28 am
  
  THANKS! I’ve added a reference to it.