Historical Data in Databases. Audit Tables. Event Sourcing

posted November 21, 2016 by "No Bugs" Hare, translated by Sergey Ignatchenko

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Herodotus writes history... in binary and to Database

[[This is Chapter 20(g) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

When speaking about OLTP databases and reporting/analytics built on top of them, there is one issue that is frequently ignored – it is the issue of historical data.

One observation which stands for pretty much any real-world system, is that

99% of reporting requests and 99.9% of analytics is purely historical

Moreover, for most of OLTP databases out there, 99%+ of their size is history. In other words – unless we’re speaking about rather exotic stuff (such as high-frequency trading) – it is all about history.

Audit Tables

From practical standpoint, it means that for pretty much every transaction modifying something in your DB, you need to add a record to an audit table – explaining what exactly was done – and why. For example, a row in the audit table for modification in USERS adding the USER money tends to have information such as “what was done” (USERID, DELTAMONEY, and probably RESULTINGMONEY), “when it was done” (TIMESTAMP), “who has done it” (for example – ADMINID or NULL/USERID).

One all-important property of audit tables (at least for those tables/fields which you care about) is that

Information within the audit table should be sufficient to validate/justify current state.

What to Audit?

From the statement above, it follows that if you have MONEY field in your USERS table (corresponding to real dollars/euros/etc.) – your audit table should provide enough information to see (and to show your player/user) how it happened that she has exactly this amount of money in this field. The same goes for ANY critical information, including artifacts (the ones which may cost $20k on eBay) etc. etc.

“Regardless of what you’re doing – there will be times when your players will ask you “hey, where is my {money|artifact|…}?!” – and you’d better have a good explanation.The rationale for this is simple. Regardless of what you’re doing – there will be times when your players will ask you “hey, where is my {money|artifact|…}?!” – and you’d better have a good explanation. The explanation is necessary to see that (a) you don’t have a bug, and (b) to convince your player/user that you neither have a bug nor deliberately cheating them. And if you don’t do proper audit (where “proper” = ”implemented according to the statement above”) – then, well, both these tasks become pretty much impossible 🙁 (and no, saying Google-style “we do no evil” won’t fly – unless, of course, you’re Google ;-( ).

On the other hand, if speaking about auditing RPG player inventories – it MAY become too expensive to audit all of the movements of all the items within the player’s inventory. On the other hand, it is not really necessary too – chances that you will be asked to prove how the mundane item such as usual healing potion has “disappeared” from the inventory, are very slim (and even if you’re asked – public complaints about it “disappearing” won’t make too big waves in the community almost for sure). Still, if we’re speaking about unique artifacts (sold on eBay for thousands of real dollars) – having audit trail for them is necessary. In practice, it means that you MAY need to resort to a compromise, auditing only those ultra-expensive artifacts (and skipping audit for mundane items).

Another typical example of what needs to be saved in the audit table – is information about player and admin logins. While (as a rule of thumb) login audits are NOT going to be too interesting to your players (unless they are hacked) – your security team will LOVE login audits for sure; it is an immensely important tool for figuring out relations between supposedly independent players, and it helps to identify bypassing bans / collusions / ToC violations A LOT. At this point, we’re not interested in how to use this information (we’ll discuss it in Vol. 3, tentatively Chapter XXXIII on Anti-Cheating) – but are interested just in how to collect it.

One Central AUDIT Table or Multiple Audit Tables?

One all-important question which arises when designing your audit tables, is the following. As you’re going to save LOTS of different audits, we need to decide “whether it is better to have separate audit tables (such as “one for auditing USERS, another one for auditing LOGINS”) – or one all-encompassing AUDIT table?” Actually, as noted in [Stackoverflow.AuditTables] – there are at least FOUR different options in this regard (plus “Event Sourcing” as a fifth one). Let’s discuss them one by one:

No audit table, just “last change” fields in the main table.
- TBH, I don’t know of real-world scenarios when this approach is viable. It is not really an audit (and cannot be used for the purposes mentioned above) – plain and simple.
One single AUDIT table. Usually contains fields such as “TABLE_ID”, “FIELD_ID”, and “NEW_VALUE” / “OLD_VALUE”.
- “This will work, but often will be not-too-optimal.This will work, but often will be not-too-optimal. Single AUDIT table tends to have increased contention (which is not too good by itself) – and finding of stuff within such AUDIT is (while certainly possible) is not too easy either (both SQLs are cumbersome and performance on retrieval is hit due to an extra index scan involved; while the latter is usually a non-issue for OLTP, it may be an issue for reporting replicas).
An audit table for each table to be audited. These audit tables can be either “clones” of the respective tables-to-be-audited (i.e. the table to be audited plus audit fields such as “who”, “when”, and “why”) – or can be implemented along the lines of single AUDIT table above (i.e. recording just the changes).
- This approach is perfectly viable – and quite optimal too.
- Keep in mind though that to establish a global order of events in the system, we need to have a global audit ID [[TODO:harmonize with section on IDs below]]
- Note that at least for some types of audit tables (in particular, when auditing money movements) I prefer to have significantly extended “clones” (or to avoid “clones” completely). What is important from my perspective is to have app-level fields such as “REQUESTED_OP” and “DELTA” for each audit record (for example, as a part of “audit fields” added to the “clone”). The reason for it goes along the following lines:
  - if we just have a “bare minimum clone” (without DELTA and REQUESTED_OP) – we effectively hide the real operation approved by whoever-initiated-it, and are merely assuming that the system has handled it correctly. It may lead to potential questions of “was it a really $10K transfer which this person in support has inappropriately authorized – or it was $10 authorized as he says, and it was a bug leading it to look as $10K in the audit?”; these questions are extremely unpleasant in concurrent multi-write-connection environments (where demonstrating that there is no possible race which can lead to such effects, can be extremely time-consuming), but even for single-write-connection architectures, I feel MUCH safer if I have a very straightforward field which directly corresponds to the intention of the transaction – before I say it was indeed $10K requested (and effectively accusing that person who authorized it, of wrongdoing).
  - note that in a certain sense, these “REQUESTED_OP” and “DELTA” fields can be seen as an implementation of “Event Sourcing” pattern (though if you want to use the goodies provided by this pattern – make sure to see more detailed discussion of it below).
- Another consideration when answering eternal “to clone or not to clone” question, is that if we’re updating a single field – the whole row may be not easily available. While it is always possible to get the whole row after the update – it may incur substantial additional cost (and in multi-connection environments – make sure not to forget about transaction isolation levels).
- “This is an interesting trick, as it eliminates quite a few UPDATES – and this (obviously) tends to improve write performance. However there are some drawbacks too...An extension of clone-style (or “extended clone-style” as described above) audit tables is to have the cloned audit table similar to the one described above, but without the original table-to-be-audited 🙂 . For example, instead of usual USERS and USERS_AUDIT, we’ll have just USERS_AUDIT (and not USERS). This is an interesting trick, as it eliminates quite a few UPDATES – and this (obviously) tends to improve write performance. However:
  - While logically this model is perfectly equivalent to having two tables, the cost of retrieval of the “current value” when we don’t have USERS table and user USERS_AUDIT instead, goes up. For example, if we’re speaking about MONEY field in USERS table – and are using classical USERS plus USERS_AUDIT, then to retrieve MONEY for specific USERID (which we need all the time in OLTP) we’ll need to issue simple “SELECT money from USERS where USERID=?”, which will result in a “point” index access (and for “point” indexes we can use “hash” indexes, which have roughly O(1) complexity and pretty low costs). On the other hand, if we’re using USERS_AUDIT without USERS, then to retrieve MONEY for USERID, we’ll need to issue SQL such as “SELECT money FROM users_audit where USER_ID=? ORDER BY AUDIT_ID DESCENDING LIMIT 1“ (or equivalent). This will normally result¹ into an execution plan which uses index scan over the two-field index on (USER_ID,AUDIT_ID) – and reads only one record out of it, aborting the index scan after the very first record is read (due to LIMIT clause or equivalent). However: (a) this index needs to be a ranged index (such as btree index)² as AUDIT_IDs in the request represent a range; and (b) search in a btree index (by USER_ID, before actual index scan even starts) will take O(log(N)) rather than O(1). As a result, such requests tend to suffer in performance (compared to straightforward SELECT money FROM users).
    - On the other hand, if we’re using app-level caches (enabled by single-DB-connections, and I’m arguing for both these techniques) – app-level caches will eliminate this “cost of reading” problem almost entirely.
  - IMO even more importantly, this audit-only model doesn’t allow for easy truncation of the audit tables (actually – moving them out of OLTP DB, see discussion below), and relying on truncation/moving AUDIT records was Damn Important for quite a few serious OLTP DBs I’ve seen. In short – having your time-critical OLTP DB growing infinitely is going to hurt its performance, so we’ll likely want to truncate AUDITS (or more precisely – to move them into super-replicas as discussed below).
    - On the other hand, it IS possible to truncate AUDITS table without affecting SELECTs of the last value – it is just more cumbersome. However, while reasonable execution plan does exist for such selective truncation³ – I am not sure whether it is easy/possible for any particular DB to write an SQL which will result in this execution plan (from what I’ve seen, this kind of trickery usually lies on Extremely Difficult part of spectrum – or even beyond 🙁 ). As a result – if you want to go the way of AUDIT-only tables and do need truncation – make 100% sure that such selective truncation is possible on your RDBMS without killing the whole thing.
- Yet another option is to use “Event Sourcing” pretty much as described in [Fowler]. The idea is to record the whole history of the events incoming to our DB Server (while in usual audit we’re recording results of the changes), and then we’ll be able to rely on supposedly deterministic nature of our system to get all kinds of deterministic goodies (including Temporal Queries and Complete Rebuild). As I am a big fan of everything deterministic 🙂 , I am quite interested in this kind of stuff. However, there are a few issues to be kept in mind when implementing about Event Sourcing in practice:
  - for event sourcing to work, it is absolutely necessary to have your processing perfectly deterministic; otherwise, all hell breaks loose (in particular, if on one run you’ve calculated one winner of pretty much anything, and on the replay there will be a different one – you will regret the very moment when you decided to go for Event Sourcing). Worse – in presence of the arbitrary concurrent access, determinism is not possible (even Serializable transaction isolation is not sufficient to achieve determinism: Serializable just guarantees that some order exists between transactions, and does NOT guarantee that this order will be exactly the same on multiple runs). Therefore, at the moment I see only three ways to implement “event sourcing”:
    - using single-modifying-DB-connection architecture as discussed in [[TODO]] section above. Well, I am a fan of single-modifying-DB connections too, so I have no problems with it :-).
    - with certain app-level restrictions on concurrent transactions, it is possible to make system deterministic. However, I’m arguing for a long while that pushing synchronisation problems to the app level is a Really Bad Idea – and usually do NOT recommend this route.⁴
    - [[TODO! – deterministic replay over non-deterministic recording]]
  - [[TODO!: dangers of code changes if ever discarding state]]
  - I’d argue for having “events” written at the same transaction as the results of their processing; separating these two things, while possible, won’t speed things up (and also special mechanisms to restore consistency after the crash will become necessary).
  - truncation of EVENTS table will be a problem (and replaying the whole history of the billions of events to restore current state is not going to be practical either). As a result – snapshots (such as online backups) will be necessary, and replay will need to work from the snapshot (NB: there is a direct analogy with Circular Logging for deterministic replay of Reactors, as was discussed in Chapter V).
  - “Event Sourcing” does NOT prevent from using usual audit tables – and the latter can be more usable for reporting too. On the other hand – for OLTP DB writing both incoming events and audit can be seen as excessive. The balance between the two depends a LOT on the specifics of your application.

Overall, all these solutions (except for the “last change” fields without any history) will work in a more or less reasonable manner. Personally, however, for Really Important Tables I tend to prefer per-table “extended clone audit tables” – with information in audit fields of these “extended clones” effectively representing event (in a sense used by Event Sourcing) which has lead to this change (and if events are referenced in more than one row/table – they can be moved to one or more of separate EVENTS tables with EVENTID referenced in audit); this IMO makes a quite a nice hybrid between classical audits and event sourcing – and it allows to check the hypothesis of being correct too (including possible violations of deterministic behaviour).

¹ That is, if the table is long enough and your SQL compiler is smart enough; also we should have created that two-field index in advance

² Well, in theory it is possible to have “hybrid” hash+btree index, but I don’t know if such thing exists in practice

³ start scanning the same index as the one used for SELECT, and delete all the records on the way EXCEPT the first one

⁴ that is, unless it is a one-off very obvious restriction

On Audit IDs

The next all-important thing about the audits – is Audit_ID. In this regard, a few things need to be observed:

You DO need an Audit_ID for each of your audit records; it MUST be monotonic too. This is necessary because in addition to being a way to simply identify audit records – Audit_ID needs to define a 100% reliable ordering for them (as the final state of the system can easily depend on the ordering of incoming events).
- “in real-world current computer time (whatever it is) can happen to go backwardsUsing TIMESTAMPs in lieu of Audit_ID doesn’t really cut it – first, TIMESTAMP is not guaranteed to be unique,⁵ and second – in real-world current computer time (whatever it is) can happen to go backwards 🙁 .
  - Of course, I am not arguing for abolishing TIMESTAMPs in audit records. Rather, I am arguing for having BOTH Audit_ID (as a counter strictly guaranteed to be monotonic) AND some kind of TIMESTAMP.
- Using DB-provided transaction IDs (see, for example, [Stolze] or [Kyte] or [MSDN.tran_current_transaction]) is better than TIMESTAMPs, but in case of multiple connections by not too much 🙁 . This happens because DB-provided transaction IDs will be usually ordered according to beginning of the respective transactions, and not according to the commit of transactions – and in presence of multiple concurrent transactions these orders can be different 🙁 .
  - Still, transaction_IDs such as those above may be useful to match audit records in different audit tables (if per-table Audit_ID is used).
- One way to implement Audit_ID – is to use an (auto-)incremented field; however, in presence of concurrent transactions special care should be taken to ensure that reading/increment of this field is guaranteed to happen in exactly the same order as the changes to be audited, occur; this is heavily dependent on transaction isolation levels you’re using (and on their implementation within your specific RDBMS), so figuring it out can be tricky 🙁 .⁶
To be able to reconstruct the whole history as it happened (and to find bugs if there are any) – Audit_ID SHOULD be incremented for each new audit record (or for each new ACID transaction) in your whole database – even if you’re using several separate audit tables; in other words – I’m advocating against having per-table Audit_IDs.
- This rule is often ignored in real-world DBs (and it is not fatal either), but it can lead to very unpleasant situations when the problem becomes untrackable due to the order of the Audit_IDs being different from the actual order of transactions. Worse, probability of it happening grows with the load 🙁 .
- If using single-writing-DB-connection – such globally incremented Audit_ID is trivially achieved (by reading and caching Audit_ID as max() on all audit tables on the app start, and merely incrementing it in-memory each time audit record is made). Otherwise – achieving single Audit_ID incremented over all the audit tables may be (depending on your RBDMS) not so trivial, and/or can lead to performance hits. One possible way to bypass it is to use DB-provided transaction ID (see above) alongside your per-table Audit_ID; per-table Audit_ID will supposedly guarantee correct order of Audit_IDs compared to other changes (though depending on isolation levels – see above) – and DB-provided transaction ID will provide correct matching between the different Audit_IDs.
  - Bottom line: for classical multi-connection DBs – and if your DB doesn’t allow for fast retrieval of current transaction ID – it is up to you whether to use such global Audit_ID; otherwise – it is a Very Nice To Have feature.

⁵ NB: on some RDBMS it may be possible to achieve almost-unique TIMESTAMPS (more specifically – “unique provided that system time doesn’t go back” – which happens 🙁 ).

⁶ of course, if you’re using single-writing-DB-connection – such problems do not exist at all 🙂

Truncating/Moving Audit Records

One more thing typical for audit/historical records in heavily-loaded OLTP systems, is related to truncation of the audit records. Motivation for it goes as follows:

“As historical/audit records grow linearly with time – it is only a matter of time until they take any given sizeAs historical/audit records grow linearly with time – it is only a matter of time until they take any given size. In practice – if kept unchecked, they will take 99%+ of the DB size very soon. This leads to decreased performance (for example, it might become no longer possible to keep the whole OLTP DB in DB caches).
On the other hand, most of audit records, while being necessary for reporting purposes, tend to be unnecessary for OLTP decision-making.

Armed with these two observations, the following approach has been seen to be highly efficient:

At some point of operation of the heavy-loaded OLTP system, an asynchronous reporting replica is created (see “Stage 3” in [[TODO]] section above) – to take some load off OLTP into read-only async replica
However, while read-only replicas take away load from OLTP DB – OLTP DB still continues to grow in size (causing gradual performance degradation)
At this point (see “Stage 3a” in [[TODO]] above) – it is possible to start truncating historical data from OLTP (keeping the data in reporting replica(s)).
- This has been seen to arrest growth of OLTP DB, which in turn allows to keep OLTP as lean-and-mean as possible
  - In particular, it allowed to keep OLTP DB size for a game serving hundreds of thousands of simultaneous players, within 100G (probably smaller size if special measures are taken)
    - This in turn allowed to keep 100% of OLTP DB in RAM – which clearly helps with performance (and eliminates performance spikes)
  - It should be mentioned that truncating unused historical tables is a rather heavy process from the performance point of view; a few tips in this regard:
    - Truncation SHOULD be done in off-peak time
    - Truncation is usually best to be done in relatively small bunches (of the order of 100-500 rows) per transaction; much smaller bunches (~1-5 rows) will have too much overhead, and much larger ones (~100-500K) will cause too long locks and too much stress on the DB log.
    - Even if your architecture is generally “single-writing-connection” one (which I am advocating for), truncation still can be usually done from the second “writing” DB connection. This is possible because if only “very old” data (and unused by OLTP itself) is truncated, this second connection:
      - Cannot possibly affect things such as app-level cache coherence (that is, as long as OLTP doesn’t use this historical information at all)
      - isn’t likely to create any contentions (except for inevitable contention over DB log) with the first writing DB connection

“It should be noted that in case of OLTP history truncation and moving it to replicas, “reporting replicas” are no longer really “replicas”, but are instead primary sources of otherwise-irrecoverable information.It should be noted that in case of OLTP history truncation and moving it to replicas, “reporting replicas” are no longer really “replicas” (which can be restored from the original), but are instead primary sources of otherwise-irrecoverable information. This hasn’t been a big practical problem, but has some implications (in particular, backups of these “super-replicas” become necessary).

On the other hand, truncating OLTP history is not without its own drawbacks; in particular –AUDIT tables without the tables to be audited (described above, and allowing INSERT-only processing) becomes much more difficult (if possible at all); still, IMO for majority of game-like processing, I’d choose truncation over INSERT-only approach (though if these too approaches can be combined – that would be even better ;-)).

If we’re going to truncate the data anyway (and are not going to use it within OLTP DB) – a logical question of “do we really need to write audit in the first place?” arises. I cannot tell that I’m a big fan of not writing audits into OLTP (pushing the audit records to the replicas from the very beginning instead) – because of the potential loss of the audit records in such scenarios. On the other hand, for some audit records (for example, for not-so-critical stuff such as logins etc.) – it might be a viable approach, especially if you’re using multiple replicas (as such losses, while pretty much inevitable if you’re not writing your audit to the DB, will be very few and far between).

[[TODO: pre-audit, post-audit, full-audit]]

[[To Be Continued…

This concludes beta Chapter 20(g) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 20(h), where we’ll discuss “compiling” your SQL bindings (which are useful for quite a few reasons)]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

ITHare (15)
Reddit (7)

Comments

Jesper Nielsen says

November 21, 2016 at 5:28 pm

Ouch if virtual items are supposed to be tracked for auditing then it’s definitely a heavy argument against saving player inventories as serialized data… Saving a hundred kilobytes of history per player every few minutes on average would make the history tables grow a couple of terabytes per day with 100,000 simultaneous players. Perhaps it’s doable to save a “diff” between old and new serialized value but I suspect (without having played around with algorithms like that) that it will be probitively expensive.
I wonder how one would handle deletion of items? I guess if items can only be transfered as part of a transaction then it means that any items NOT present on the player(s) are to be deleted. You wouldn’t want to create a database transaction every time a player drinks a health potion and you wouldn’t have to either. But I can see now why most MMO games disallow dropping items on the ground:) That would be a transactional nightmare.

Reply
- "No Bugs" Hare says
  
  November 22, 2016 at 5:19 am
  
  I’d say that in practice, for audit purposes it will usually be necessary to distinguish between “just any item” and “Damn Important Item”. The latter does need to be audited (but there will be only a few of them) – but the former doesn’t. The point is that while it IS necessary to track movements of ultra-expensive artifacts – nobody will jump too high about allegedly missing an alchemy ingredient which can be obtained by spending a few minutes running around. I’ve added a paragraph about it, THANKS!
  
  > but I suspect (without having played around with algorithms like that) that it will be probitively expensive.
  
  I don’t think it will be expensive. Calculating a difference between unordered sets goes as sorting them (O(n log n)) plus one single scan over sorted sets (O(n)); if they’re already pre-sorted in Game Worlds (which comes at virtually zero cost) – then we’re speaking about O(n), and this is not much for rather small inventories.
  
  Reply
  - Jesper Nielsen says
    
    November 22, 2016 at 7:08 am
    
    Well the problem is that knowing what item is super important is less than trivial for games with randomly generated gear 🙂
    
    I was actually thinking about diff between 2 serialized representations. calculating diffs between sets would require deserializing the stored inventory. Perhaps that isn’t too bad either, since you’re already serializing the new inventory anyway. Either way a set diff can be performed in linear time even on unordered sets using hashsets.
    
    Reply
    - "No Bugs" Hare says
      
      November 23, 2016 at 6:14 am
      
      > Well the problem is that knowing what item is super important is less than trivial for games with randomly generated gear
      
      Is it really THAT random? In other words, are you really into Nethack-Like scenarios when L1 character can get a ring of polymorph + a ring of polymorph control? For an MOG, my wild guess is that it would kill the game balance very quickly. My point is that even if generation is kinda-random, order of magnitude for power of generated unit is usually somewhat guessable.
      
      A second consideration is that as a rule of thumb, if you have unique (or extremely rare) items – it is them which will be the most likely ones to have tremendous real-life values.
      
      > calculating diffs between sets would require deserializing the stored inventory.
      
      If your serialised sets are ordered, you can find differences with kinda-deserialising but without allocations – so it can be made Really Fast.
      
      Reply
      - Jesper Nielsen says
        
        November 23, 2016 at 12:48 pm
        
        It is definitely *that* random in the game I’m making:) I’m not afraid to admit that Diablo I and especially II are huge influences for me, and item-hunting + ingame trading is one of the pillars of the design.
      - "No Bugs" Hare says
        
        November 23, 2016 at 4:01 pm
        
        But you still should know relative value/power of the item, shouldn’t you?
      - Jesper Nielsen says
        
        November 24, 2016 at 9:41 am
        
        If you look at Diablo II a lot of market values of items are indeed emergent properties of the game. Some unique items, runes etc. can be defined as being valuable, but many other valuables are random magic or rare items with sought after combinations of modifiers that fit into specific character builds.
      - "No Bugs" Hare says
        
        November 25, 2016 at 4:13 pm
        
        Well, in extreme cases your can even change the “whether to audit it” logic post-factum :-).
        
        For example, you may start with auditing everything, but then when you see that some stuff takes too much time to audit – you can start filtering these audits out. Or the other way around – auditing nothing except very obvious items, and add those which happen to have significant value.
David Turner says

November 22, 2016 at 8:55 pm

Hi,

Thanks for adding the bit about Event Sourcing. Just to reinforce what you say, the strength of ES is that, by construction, it’s an absolutely comprehensive audit trail from which you can know, with certainty, that you can rebuild your world state (and all sorts of other reporting-centric states to boot). This contrasts with separate audit tables which might have most of the same data in them but maybe not some vital metadata that it turns out you want to query some years in the future, or maybe someone updated the Data Access Layer and “forgot” about auditing one of the new operations.

The thing about nondeterminism also bites you with trying to rebuild the world from any audit table, and it’s normally possible to design your events to avoid such problems. For the example you gave (where someone wins something nondeterministically) you record **who won** in the event, not just that a win occurred, pushing the nondeterminism out of the event and into the code that generates it. It’s basically essential to be able to do this for audit purposes anyway: if you ever were to change your rules (even, say, your RNG) then your historical audit logs would not be useful unless you kept the old code around too. Yuck.

Cheers,

David

Reply
- "No Bugs" Hare says
  
  November 23, 2016 at 6:42 am
  
  > by construction, it’s an absolutely comprehensive audit trail from which you can know, with certainty, that you can rebuild your world state
  
  Yes, though this stands only for a _correct_ event sourcing, which involves certain not-so-easy-to-achieve assumptions (like “determinism” and “code being unchanged”).
  
  > you record **who won** in the event… pushing the nondeterminism out of the event and into the code that generates it
  
  Wait. Let’s consider the same scenario when we need to enforce logic of “there is only one item to distribute, whoever comes first – wins”. As I see it: if I’d push the non-determinism out of the event – I will just push the problem from one micro-service to another one. As a rule of thumb, we still DO need to audit the reasons for all important decision-making – so it will need to happen in a separate micro-service with its own DB, and then we’ll face exactly the same problem of “what happens if we change the code of that micro-service”. Overall, I do NOT see any potential to address code changes in a generic manner 🙁 . As a result – to work with Event Sourcing to restore states, I WILL need to keep the old code around; and as this is damn impractical – I’d prefer to keep BOTH event sourcing (as a kind of pre-audit) PLUS traditional (post-factum-)audit.
  
  OTOH, the problem of RNG as such is not a difficult one; RNGs as such do not require much justification (their stats can be analysed post-mortem), so all RNG data can (and should) indeed be moved out (and recorded as events); more on it can be found in 2nd beta of Chapter V of the upcoming book (the discussion there is about Reactors and not DBs, but most of the trickery to achieve determinism still applies; if you want to read it – drop me a line, I’ll email it to you). However – logical decisions cannot be isolated in such a trivial manner (or at the very least I have no idea how to do it) 🙁 .
  
  Reply
Pseudonym says

November 23, 2016 at 3:52 am

English nit: It’s “monotonic”, not “monotonous”.

Reply
- "No Bugs" Hare says
  
  November 23, 2016 at 6:43 am
  
  Oops! Fixed now, thanks! 🙂
  
  Reply
Roberto Nogueira says

November 27, 2016 at 10:30 pm

In the ‘On Audit IDs’ section, a very popular alternative that has not been presented is the UUID (https://en.wikipedia.org/wiki/Globally_unique_identifier). Most RDBMS provide native support for this function, for example in MySQL:

Select uuid ();

Results in:

D6ffe252-b4ee-11e6-97de-fcaa14cc55d2

Another important aspect of having non-sequential identifiers is in security, where it makes it difficult for malicious users to move to the next history record. With sequential identifiers from any record the attacker can navigate all the others by incrementing and decreasing the keys.

Reply
- "No Bugs" Hare says
  
  November 28, 2016 at 5:22 am
  
  Audit_IDs, by their very nature, need to guarantee not only identification, but also _order_ of audit records. And GUIDS are the last thing which provides any kind of ordering; then – if using GUIDs as Audit_IDs, we’ll be back to square 1 to solve the problem of ordering (and once again, TIMESTAMPs won’t cut it because time can go backwards, so we’d need another monotonic field, etc. etc.).
  
  Reply
  - Roberto Nogueira says
    
    November 29, 2016 at 12:45 am
    
    Thank you for clarifying!
    
    Reply