OLTP DB Optimizations 102 – Group Commits

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[rabbit_ddmog vol=”7″ chap=”Chapter 27(e) from “beta” Volume VII”]

As it was discussed in Vol. VI’s chapter on Databases – OLTP DB with Single-writing-DB-connection tends to be very latency-sensitive. To alleviate this problem (which, BTW, happens to be pretty much the only problem with single-writing-DB-connection Shared-Nothing architectures) – we can use a concept of Group Commits.

The Idea Behind Group Commits

The basic idea behind Group Commits is simple:

When performing writing transaction, a very significant cost is the cost of fsync()-ing the DB log to a physical disk; this is necessary to comply with Durability requirement from ACID – after transaction is committed, even in case of sudden power loss (kernel panic, RDBMS crash, whatever-else) transaction should already reside in persistent storage, so RDBMS can use during crash recovery. In practice (and even if we’re using BBWC RAID) – fsync() can cost in the range of 300-500µs (see, for example, [Callaghan], and it is also consistent with my personal observations).
“if we can commit several transactions at the same time – RDBMS needs to fsync() our DB logs to physical disks only onceOn the other hand, if we can commit several transactions at the same time – RDBMS needs to fsync() our DB logs to physical disks only once
- To comply with ACID requirements, we MUST NOT confirm ANY of the transactions before fsync() completes (!).
- As soon as we complete fsync() – we can report all of the transactions written to DB log before fsync(), as “successfully committed”.

RDBMS-Level Group Commits

Most of modern RDBMSs tend to support a concept of Group Commits at DB level – and for multiple-connection configurations they DO work with only a minimal effort from our side. At least, for DB/2 Group Commits are supported via mincommit parameter, and for Oracle it can be achieved via setting COMMIT_WRITE to BATCH.¹

Unfortunately, for the single-writing-DB-connections, RDBMS-level group commits do NOT work <sad-face />. Indeed – with single-writing-DB-connections we cannot start second transaction before we commit a previous one – and in turn it means that at no point RDBMS has information about two transactions which it might want to commit. Moreover, if RDBMS is configured to wait for the second transaction (delaying committing the first one) – then using RDBMS-level group commit for our single-writing-DB will increase latency, causing performance degradation rather than expected performance improvement. In other words –

if going our preferred way of single-writing-DB connection – make sure NOT to use RDBMS-level Group Commits.

¹ Make sure NOT to use NOWAIT parameter of COMMIT_WRITE if you care about ACID(!)

DIY Group Commits for single-writing-DB-connections

Still, the same basic idea (~=”avoiding expensive fsync() by combining several transactions into a single one”) can be applied to our single-writing-DB-connection too. Moreover, it can be done with a very minimal help from app-level (i.e. most of it will reside within our own Infrastructure Code, which is independent from business- and game-logic).

Let’s see how DIY Group Commits can be implemented if we use a DB (Re)Actor to implement our DB App.² A very very brief recap of a hundred-page discussion on (Re)Actors from Vol. II:

Our app-level logic implements DBReactor object which has DBReactor.react(event) function – and we provide an instance of this DBReactor object to the Infrastructure Code
- Within app-level logic (according to Mostly-Non-Blocking paradigm) – we still use blocking calls (such as ODBC/JDBC/… calls) to communicate with our RDBMS. While this is NOT a strict requirement, it is the way all-real-world-DB-(Re)Actors-I’ve-seen, were implemented.
our Infrastructure code waits for events to arrive, and calls DBReactor.react() whenever a new event arrives.
- react() is called within single thread,³ so no thread sync within the react() is ever necessary.

Now, let’s see how we can implement Group Commits at the level of Infrastructure Code:⁴

Infrastructure Code provides “wrapper” function for the SQLEndTran()/SQLConnection.commit()/…⁵
- We make sure that to commit transaction, DBReactor always calls these “wrapper” functions provided by Infrastructure Code (and doesn’t call SQLEndTran()/… directly)
At Infrastructure Code level, before calling DBReactor.react(), we begin DB transaction (while this is not necessary for ODBC or JDBC, it MIGHT be necessary for quite a few other APIs). Let’s name this transaction a “Larger Transaction”.
As usual, we still call DBReactor.react() for each incoming event
- However, whenever DBReactor.react() calls our “wrapper” around SQLEndTran() – we do NOT call underlying ODBC SQLEndTran() immediately⁶ (instead, we SHOULD use our wrapper to make certain checks that our app-level react() is indeed working in a sane manner, with no situations, where modifying SQL statements were invoked from react(), and then react() returned without a call to SQLEndTran() wrapper).
“our Infrastructure Code does NOT send the reply back immediately, storing it within special PendingRepliesQueue insteadAt the same time, whenever our DBReactor.react() calls our Infrastructure Code to send any kind of data to the outside world (including, but not limited to, reply to the incoming request), our Infrastructure Code does NOT send the reply back immediately, storing it within special PendingRepliesQueue instead.
When DBReactor.react() is completed, and our Infrastructure Code feels like it (usually – it happens when (a) there are no incoming messages in the DBReactor’s queue, or (b) once per 10 app-level calls to wrapper-around-SQLEndTran(), or (c) on shutdown(!)) – it does the following:
- Calls real SQLEndTran()/SQLConnection.commit()/… – effectively committing transaction
- If the commit is ok – it means that we have already satisfied all the Durability requirements – and that we got the right to send back all the outgoing messages residing in the PendingRepliesQueue.
- Phew! We completed our “Larger Transaction”, so (if necessary, and if we’re not shutting down) – we should start the next one.

² pretty much the same thing can be done for more generic message-passing programs too, and after some tweaking, it can be even done for massively multithreaded apps, but for the purposes of this book we’ll limit our analysis to (Re)Actor-based implementation of DBReactor.

³ or at least “as if” it called from single thread, so there are no overlaps between different calls to react()

⁴ note that while it is ideologically similar to “write-back cache with durability” from Vol. VI’s chapter on Databases, implementation discussed here is more specific, and makes sure that most of the burden is handled at Infrastructure Code level, relieving app-level from most of the headaches.

⁵ For non-ODBC/non-JDBC environments, we also MAY need to “wrap” functions which begin transaction.

⁶ The same goes for “begin transaction” wrappers too if applicable.

Rolling Back

The schema described above is all fine and dandy, but it leaves two all-important question unanswered:

What do we do if we want to rollback one of app-level transactions?
What do we do if our commit of the Larger Transaction fails?

These are very valid questions, but fortunately, in practice – there are reasonably good answers to them. The answer to the first question is that “we do NOT want to do it at app-level, ever”. Really, with all the years spent on rather serious OLTP DBs, I never ever seen a real need to rollback at app level. Moreover, as well as for all the other (Re)Actors, for DBReactor I am a long-standing fan of the VALIDATE-CALCULATE-MODIFY pattern (see also discussion in Vol. II’s chapter on (Re)Actors), and under this pattern – there is no place for rollbacks at all.⁷

Indeed, if we do validate all the prerequisites for the transaction in advance (during VALIDATE stage) – which, BTW, is very cheap with single-DB-writing-connection approach due to the App-Level Caches – we won’t really run into any problems at DB level; also this is very consistent with an overall approach of “pushing most of the consistency checks to app-level”;⁸ while such pushing is seen as a sacrilege by quite a few DBAs – it is an extremely common practice for highly-loaded DBs.⁹

Also, to make DIY group commits work – there is no need to prohibit app-level rollbacks entirely: if app-level rollback exists, but only happens once a day or so – it still can be handled along the same lines as discussed below; it is an indiscriminate use of app-level rollbacks which will hurt DIY group commits – but such an indiscriminate use usually qualifies as a Bad Idea™ for several different reasons regardless of Group Commits.

As for the second question – well, it is not as easy to brush it off <wink />. While such failures of commit()¹⁰ don’t occur often – they still might happen <sad-face />. On the plus side – they’re indeed Damn Rare™,¹¹ so at least we do know that we shouldn’t care about performance in such a weird case.

So, what should we do if our attempt to commit() our “Larger Transaction” fails? In general, we should:

“The aim is to bring our whole system to exactly the same state as before the Larger Transaction has started. Roll back all the changes which we have made while processing our Larger Transaction. The aim is to bring our whole system to exactly the same state as before the Larger Transaction has started. As all DB changes are already rolled back as a part of rolling back Larger Transaction, remaining app-level rollback includes:
- Dropping all the replies from the PendingRepliesQueue – which is trivial. This potential for the rollback is actually one of the Big Reasons(tm) why we kept them in PendingRepliesQueue in the first place.
- Rolling back the state of our DBReactor. This is a tricky one, and will be discussed below.
Try re-appying events which formed the Larger Transaction, one by one (each in its own DB transaction). This processing will work without group commits at all – so it should be handled by usual logic without any additional issues.
- Most likely, the processing of one of these events will fail, but once again – it should be handled “exactly as failure is handled without DIY Group Commits”, so Group Commits are out of the picture.

The trickiest part in the rollback-and-re-apply logic above – is rolling back the state of our DBReactor. In a general case, perfect guaranteed rollbacks of an arbitrary (Re)Actor are not trivial – however, for our specific DBReactor, things are much better because of certain specifics:

Most of the state of DBReactor is its app-level caches.
- However, if we’re handling app-level caches via our SQL Bindings Compiler (as it was discussed in [[TODO]] section above) – it is trivial to implement rollback for app-level caches completely behind the scenes (i.e. without any involvement from app-level code) <phew />.
As for the state of DBReactor which goes beyond caches – it is NOT really that common to have significant state for DBReactor (except for caches). Moreover – in practice, this non-cache state is usually THAT small, that we can rather easily confine this state to a special data object, with this data object encapsulating all the updates to the DBReactor’s data – and implementing rollback()/commit() operations for this data object too. Then – it becomes trivial to have our Infrastructure Code to call these state.commit() and state.rollback() functions behind the scenes when it becomes necessary.
- BTW, if we’re really concerned about it – we can add another section to the “SQL description file” for our SQL Bindings Compiler – and then we can make SQL Bindings Compiler generate such rollbackable-data-object (with strict guarantees for its correctness).

⁷ And with app-level cache discussed in [[TODO]] section above – VALIDATE stage becomes very very cheap too.

⁸ One example is dropping FOREIGN KEYs in favour of app-level checks, but actually there are LOTS of different constraints to enforce at app-level

⁹ not to mention that there is no way to enforce all-the-integrity-we-really-need without app-level; at the very least, I never heard of a DB which can enforce such all-important integrity constraints as “a sum of all the money on all the accounts in the system MUST be zero”.

¹⁰ Actually, failures of any DB operation, up to and including commit()

¹¹ In particular, with single-writing-DB-connection we cannot run into deadlocks or MVCC-induced rollbacks at higher isolation levels. Still, there are inevitable bugs in dark corners of DBReactor which we should account for, at least to prevent them from killing the whole system.

Compound SQL Statements

One issue which is distrinct-from- but closely-related-to- Group Commits, is Compound SQL Statements. Just like Group Commits, Compound SQL Statements are aiming to reduce overall latencies of the DBReactor; however, unlike Group Commits (which are reducing latencies by reducing number of expensive fsync() calls), Compound SQL Statements are aiming at reducing number of round-trips between app and RDBMS.

“if we have two update-only SQL statements within the same transaction – we can easily combine them into one “compound SQL statement”, and issue this “compound SQL statement” using one ODBC/JDBC call, which in turn will save on round-trips between our app and RDBMS.The point is simple – if we have two update-only SQL statements within the same transaction – we can easily combine them into one “compound SQL statement”, and issue this “compound SQL statement” using one ODBC/JDBC call, which in turn will save on round-trips between our app and RDBMS. NB: while formally, this observation stands both for prepared and non-prepared statements – as always within this book, we’re speaking ONLY about prepared statements; moreover – if your RDBMS forces you to make a choice between “prepared non-compound” and “non-prepared compound” statements – most likely, you should choose prepared and forget about compound, it will be better.¹²

While Compound SQL Statements CAN be used at app-level (i.e. directly within our DBReactor) – I’d rather try to generate them automagically within the same SQL Bindings Compiler. An implementation might go along the following lines:

We have our app, and have no compound statements.
We have our SQL bindings compiler to generate code which will generate statistics of typical sequences of our prepared SQL statements issued while the program is running
We run this code in production – and get typical sequences of our prepared SQL statements in production.
We feed this statistics to our SQL bindings compiler – and it generates SQL Compound Statements for such frequent sequences.
- These generated SQL Compound Statements, just like any OLTP SQL statements, should be prepared.
- When a SQL-Bindings-Compiler-generated function for a SQL statement which SQL statement happens to be the first statement within generated SQL Compound Statement, is called – it does NOT lead to a call to the ODBC/JDBC, but instead, the function simply makes a record about the SQL statement into a special BufferOfOutstandingRequests (alongside with a copy of its parameters – but with OLTP, parameters are small, so copying them is rarely a problem). Effectively – we’re deferring the statement until some later point.
- Then, if the second¹³ SQL-Bindings-Compiler-generated function is called, and current contents of BufferOfOutstandingRequests followed by this second function, corresponds to one of SQL Compound Statements – this SQL Compound Statement can be invoked (and BufferOfOutstandingRequests can be cleared)
- If, by the end of the transaction (or by the point when we call function-corresponding-to-read-statement), BufferOfOutstandingRequests is not empty – Infrastructure Code simply calls all the SQL statements from BufferOfOurstandingRequests one by one.

Overall – IF statistics is favorable (and for OLTP it usually is), this technique can easily provide up to 20-30% of performance gain pretty much for free.

Oh, and if some of you are still unconvinced about usefulness of SQL Bindings Compiler – stay tuned, I’ll provide even more compelling reasons to implement it in a jiffy <wink />.

¹² Even better – re-consider your choice of RDBMS; IMNSHO, all suitable-for-OLTP DBs SHOULD support both prepared statements and compound statements.

¹³ Third, etc.

[[To Be Continued…

This concludes beta Chapter 27(e) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”.

Stay tuned for further parts of Chapter 27, where we’ll continue our discussion on DB optimizations 102 (aiming for app-level heterogeneous replicas)]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.