Representing The Same Data Structure in SQL and NoSQL (from Classical Codd-style SQL to Key-Value NoSQL with SQL-with-XML and Structured NoSQL in between)

 
 
 

[[TODO: move this section into “NoSQL vs SQL for MOGs”]]

[Picture: “SQL Bindings Compiler”]

NB: this picture implicitly refers to a 1st round of comparison between SQL and NoSQL, available in a previous post NoSQL vs SQL for MOGs.

#DDMoG, Vol. VI
[[This is Chapter 20(i) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help with improving it; for further details see “Book Beta Testing“. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

When comparing SQL to NoSQL – we need to set a certain number of criteria for comparison; otherwise – these DBs are soooo much different that we won’t be able to compare them properly. OTOH, we should try to come up with more or less practical criteria. Saying “hey, it is not document-oriented” or “hey, it is not normalized” is just a synonym of “{NoSQL|SQL} is the best one” and won’t be of any use; on the other hand – saying “hey, it leads to O(N^2) vs O(N)” is a very practical argument, which (IF it happens) should be taken into consideration.

So, what do we really want from our DB? I would say that from a 50’000 feet point of view – and speaking about firm requirements – we want just three things:

  • Identification. We certainly want to identify individual records in our DB. While having a table without a primary key is technically possible in SQL – it is universally recognized as a Bad Idea (and for a good reason too). Overall – pretty much nobody in his right mind argues with having a unique identifier for each record.
  • No Undue Duplication. The next thing we want is to store data without unnecessary duplication.1 Let’s note that we’re not speaking about formalities such as normal forms (though they’re one of the ways to implement this “no undue duplication” rule). Rather, we’re speaking about two very practical issues which arise from undue duplication and which we want to avoid:
    • First of all, saving the same piece of data a million times wastes space. Pretty obvious – and practical too.
      • In certain circles (especially NoSQL ones) there is a perception that duplication is a Good Thing™ as disks are cheap and as duplication improves read performance. However, it is actually more complicated than that – while duplication MAY indeed improve performance (in the SQL world a similar process is known as denormalization) – undue duplication also MAY hit performance pretty badly (in particular, because essentially the same data takes much more space in caches as a result of duplication, which can greatly reduce the cache hit ratio). As a result – whether SQL or NoSQL, I tend to argue that duplication should be avoided unless demonstrated to be useful (which certainly does happen, but is certainly not 100% universal).
    • However, even more important is the second issue: if there is undue data duplication – we can easily get into consistency issues. I.e. if we have N separate copies which are meant to be consistent – we will need to update our DB in N places (and not to forget any of them). Ouch, and double-ouch.
  • Perform reasonably well for those queries we want to throw at it. This requirement is a rather obvious one; however, I want to emphasize that it is all defined by specific queries we have against our DB (and those we MIGHT have in the future). If some queries are impossible as a matter of principle – we don’t really care whether there is a possibility to perform them well. Such situations are quite common specifically in OLTP databases (where the set of queries tends to be very limited) and admittedly much more rare for reporting/analytics (especially if we’re speaking about ad-hoc queries).

Most importantly, as far as I can tell – this is quite an exhaustive list (let me know if you think I am missing something!). As for ACID transactions – they are an all-important property for OLTP DBs; on the other hand – for reporting DBs they can usually be relaxed to BASE quite easily. Overall, ACID properties and areas where they’re necessary, are discussed in detail in [[TODO]] section, so we won’t address them for the time being.


1 Note that duplicated IDs which are used to refer to a different entity (i.e. foreign-key-style references) do not usually qualify as “undue duplication”

 

Litmus Use Case – Player and His Inventory

Now, we need some kind of a more-or-less-practical use case to see how these criteria can be addressed for different types of SQL/NoSQL DBMS. As we’re speaking about games – I’ll give up on the classical “library” example (the one which is usually done via a link table between AUTHORS and BOOKS in classical Codd-style SQL) – and will consider a more relevant example of storing PLAYERS instead. [[TODO: think about extending the example to add that there are other entities (shops, ground) where inventory items can reside]]

Let’s take a look at a system where each of the PLAYERS has:

  • ID (for our current purposes – we won’t argue about it being surrogate vs natural)
  • DISPLAYNAME
  • MONEY (doesn’t matter for us whether it is in-game money or real money)
  • INVENTORY, an (unordered) set of INVENTORYITEMS, each of which contains:
    • DISPLAYNAME
    • TYPE
    • POWER
    • WEIGHT
    • VALUE
    • INVENTORYSLOT

NB for SQL developers: do NOT say I am crazy (at least not yet): this is NOT intended as a DB structure, but rather as a high-level description of “what we want to store”.
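Just to make the intent crystal clear – here is the same high-level structure expressed as plain data objects (a minimal Python sketch; the field types are my own assumptions and are not part of the model):

from dataclasses import dataclass, field
from typing import List

@dataclass
class InventoryItem:          # "what we want to store", NOT a DB schema
    displayname: str
    type: str
    power: int
    weight: int
    value: int
    inventoryslot: int

@dataclass
class Player:
    id: int
    displayname: str
    money: int                # in-game or real money - doesn't matter here
    inventory: List[InventoryItem] = field(default_factory=list)   # unordered set of items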

Let’s further assume that some of the INVENTORYITEMS are pre-defined (i.e. taken from a list rather than generated on the fly – of course, except for the INVENTORYSLOT field), and some are generated (and moreover, that they’re random enough so they cannot be enumerated).

As for the types of queries we want – first of all, we obviously need to manipulate and retrieve the PLAYER (and her INVENTORY) by PLAYER ID. However, other than that – let’s assume that we never want to query our DB by fields of the INVENTORYITEMS, with one exception. Let’s consider a scenario where we DO want to find out who owns a few very-rare artifacts (i.e. there is a pre-defined list of very-rare artifacts – and we want to query who owns them; all other INVENTORYITEMS are of no interest for such a query). NB: while this query may look artificial and unimportant – you can be pretty sure that sooner or later somewhere in your DB you will have some queries of a similar nature – i.e. going beyond pure key-based access.

Oh, and just to clarify – we’re speaking about a DB with millions of PLAYERS (and taking gigabytes in size), so any execution plans which require O(N) operations – such as table scans (or equivalent) – don’t satisfy our “perform reasonably well” requirement.

Armed with this use case and the criteria above – we can start comparing different approaches to DB design under different DBMS – from classical Codd-style SQL to Unstructured (Key-Value) NoSQL (with SQL-with-XML and Structured NoSQL in between).

Classical Codd-style SQL

The key, the whole key, and nothing but the key, so help me Codd

— Unknown

First of all, a disclaimer: the models discussed below are not the only ones possible for the DBMS we’re discussing; however – they are intended to be rather typical given the current traditions and practices.

For classical Codd-style SQL (the one which dominated database development at least from the 70s to the 90s), the data would look as follows (a minimal DDL sketch follows the lists below):

  • PLAYERS table with PLAYERID being PRIMARY KEY, and DISPLAYNAME and MONEY as fields
  • ITEMDICTIONARY table, with a list of all the pre-defined ITEMS. Has ITEMID as a PRIMARY KEY, and DISPLAYNAME, TYPE, POWER, WEIGHT, and VALUE as fields; also has an ISRAREARTIFACT field to define whether it is an artifact which can be queried. This ITEMDICTIONARY table contains all pre-defined items – and doesn’t really change (that is, unless we’re changing game rules).
  • PLAYERINVENTORYITEMS table, which is a typical “link table”2 saying which pre-defined items are owned by which PLAYER. As is typical for “link tables”, PLAYERINVENTORYITEMS contains PLAYERID and ITEMID as fields (and has a PRIMARY KEY as a combination of both fields). In a not-so-typical-for-link-tables manner, it also contains an INVENTORYSLOT field (to identify where the specific ITEMID item goes within the inventory of player PLAYERID).
    • NB for those not-so-familiar with SQL: “link tables” such as PLAYERINVENTORYITEMS are routinely used in classical SQL for many-to-many relationships such as PLAYERS-to-ITEMDICTIONARY. Very briefly – “link tables” are necessary within classical SQL to comply with the “no undue duplication” requirement (known as “normalization” in the SQL world).

Ok, by this point we have dealt with those ITEMS which are coming from pre-defined lists. However, what about generated ITEMS? To support them, we’ll need to add one more table:

  • GENERATEDITEMS table, with PLAYERID, INVENTORYSLOT, as well as DISPLAYNAME, TYPE, POWER, WEIGHT, and VALUE as fields. PRIMARY KEY is (PLAYERID, INVENTORYSLOT).3
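To make the structure above a bit more tangible – here is a minimal DDL sketch of these four tables (using SQLite from Python purely so that the snippet runs anywhere; column types and constraints are my own assumptions, and your RDBMS of choice will certainly want its own types):

import sqlite3

conn = sqlite3.connect(":memory:")    # illustration only - any RDBMS will do
conn.executescript("""
CREATE TABLE PLAYERS (
    PLAYERID      INTEGER PRIMARY KEY,
    DISPLAYNAME   TEXT    NOT NULL,
    MONEY         INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE ITEMDICTIONARY (         -- all pre-defined items; (almost) never changes
    ITEMID          INTEGER PRIMARY KEY,
    DISPLAYNAME     TEXT    NOT NULL,
    TYPE            TEXT    NOT NULL,
    POWER           INTEGER NOT NULL,
    WEIGHT          INTEGER NOT NULL,
    VALUE           INTEGER NOT NULL,
    ISRAREARTIFACT  INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE PLAYERINVENTORYITEMS (   -- "link table" PLAYERS <-> ITEMDICTIONARY
    PLAYERID       INTEGER NOT NULL REFERENCES PLAYERS(PLAYERID),
    ITEMID         INTEGER NOT NULL REFERENCES ITEMDICTIONARY(ITEMID),
    INVENTORYSLOT  INTEGER NOT NULL,
    PRIMARY KEY (PLAYERID, ITEMID)
);

CREATE TABLE GENERATEDITEMS (         -- items generated on the fly
    PLAYERID       INTEGER NOT NULL REFERENCES PLAYERS(PLAYERID),
    INVENTORYSLOT  INTEGER NOT NULL,
    DISPLAYNAME    TEXT    NOT NULL,
    TYPE           TEXT    NOT NULL,
    POWER          INTEGER NOT NULL,
    WEIGHT         INTEGER NOT NULL,
    VALUE          INTEGER NOT NULL,
    PRIMARY KEY (PLAYERID, INVENTORYSLOT)
);
""")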

If we take a look at this structure – we’ll be able to observe that:

  • It can store everything we need
  • Each record is uniquely identified (well, we do have all those PRIMARY KEYs)
  • It is free of undue duplication4 (this one is not that obvious, but after honestly trying to find duplicate data in such a DB – we aren’t likely to find it, except for very rare purely random collisions).

However, to meet the “performing reasonably well” requirement – we will need to analyse a bit further. As defined in our use case – we need two types of queries to perform reasonably well:

  • Access by PLAYERID. This type of query (assuming that all PRIMARY KEYs are also indexes, which holds for all the RDBMS I know) is already completely index-based – and so (most likely) already qualifies as “performing reasonably well”.
  • Queries to find out which players have those “rare artifacts”. Such queries will go to PLAYERINVENTORYITEMS by ITEMID to get PLAYERID. However, for such a query to be efficient – we need to make sure that we have an index by ITEMID; a multi-field b-tree index (but not a hash index) by (ITEMID,PLAYERID) will also do – but not a (PLAYERID,ITEMID) one (we’ll discuss more on the mechanics of indexing in Vol. 3, tentatively Chapter [[TODO]]). As a result, to make sure that we’re satisfying the “performing well enough” requirement, we have two options (see also the sketch after this list):
    • when specifying our PRIMARY KEY for PLAYERINVENTORYITEMS, we need to make sure that:
      • a b-tree index (and not a hash index) is used to implement the PRIMARY KEY, and
      • the PRIMARY KEY has the ITEMID field as the first one
      • these two properties combined make sure that the index-used-to-implement-PRIMARY-KEY will also work as an index-on-ITEMID-alone
    • to have a separate index for PLAYERINVENTORYITEMS by ITEMID
      • having this index separate may also allow making it a “sparse” index (also known as a “filtered” index) – further improving performance of this particular query.5 It is when implementing such a “sparse”/“filtered” index that the ISRAREARTIFACT field can become necessary.
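As a sketch of the second option (continuing the SQLite snippet above; again – illustrative only, and your RDBMS will have its own syntax for “sparse”/“filtered” indexes):

# separate index by ITEMID (option 2 above); without it, the query below
# would degenerate into a table scan over PLAYERINVENTORYITEMS
conn.executescript("""
CREATE INDEX IDX_PLAYERINVENTORYITEMS_ITEMID
    ON PLAYERINVENTORYITEMS(ITEMID);
""")

# "which players own any of the pre-defined rare artifacts?"
rare_owners = conn.execute("""
    SELECT DISTINCT pii.PLAYERID, pii.ITEMID
      FROM ITEMDICTIONARY d
      JOIN PLAYERINVENTORYITEMS pii ON pii.ITEMID = d.ITEMID
     WHERE d.ISRAREARTIFACT = 1
""").fetchall()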

Phew. It wasn’t too easy, but we managed to do it. While we used 4 separate tables to implement it – 4 tables are certainly not considered a big deal for an SQL DB (and it will perform well too, both for OLTP modifications and for those queries which we’ve defined).


2 a.k.a. junction table
3 this assumes that each INVENTORYSLOT can contain at most one ITEM; if it is not the case – a surrogate key will probably be used as a PRIMARY KEY instead
4 in Codd-SQL-speak, it probably qualifies as so-called third normal form (3NF), though I certainly wouldn’t bet my life on it ;-), especially as it being 3NF hinges on GENERATEDITEMS being “random enough” which is not that well-defined
5 whether you want to improve it further – is a different story, and it depends on relative frequencies of the queries

 

SQL with XML

Now let’s take a look at what changes if our SQL-based RDBMS allows our tables to have XML fields. Actually, we can try a more NoSQL-like style with only two tables:

  • PLAYERS with the following fields:
    • PLAYERID being PRIMARY KEY
    • DISPLAYNAME and MONEY as usual fields
    • INVENTORY is an XML field. This field could contain a list of the INVENTORYITEMS, with each INVENTORYITEM having the following properties:
      • For pre-defined items: <ITEMID> (pointing to ITEMDICTIONARY table), and <INVENTORYSLOT>
      • For generated items: <INVENTORYSLOT>, <DISPLAYNAME>, <TYPE>, <POWER>, <WEIGHT>, and <VALUE>
  • ITEMDICTIONARY table, exactly the same as for plain SQL (described above).

Again, we can more-or-less easily observe that we can represent everything we need, that we can identify things, and that we don’t have undue redundancy. So, the only thing we still need to do to be certain that the model above achieves our Holy Grail of complying with our requirements – is to make sure that our queries perform reasonably well.

As for the first type of queries – accessing everything via PLAYERID – it will work even better than in classical Codd-style SQL.6

However, the second type of query (finding which of the PLAYERS have specific artifacts) is not that simple. In the absence of indexes (other than by PLAYERID) any such query over XML will require a table scan (with parsing each of the XMLs on the way – double-ouch!). To avoid this O(N) table scan – we need an index over the <ITEMID> property of the XML INVENTORY field. If our RDBMS allows it – bingo! We’ve complied with our requirements, and can handle the queries-we’re-interested-in without table scans too.
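Just as one data point – here is a sketch of what such an index (and the query using it) might look like in SQL Server’s dialect; the element names are my own assumptions, the statements are shown as plain strings, and – as discussed right below – I am not promising that this particular index will actually be fast on your RDBMS:

# Illustrative SQL (SQL Server flavour) for an XML INVENTORY field and an XML index over it;
# to be executed via whatever DB API driver you use (pyodbc etc.) - not run here.
XML_DDL = """
ALTER TABLE PLAYERS ADD INVENTORY XML;

CREATE PRIMARY XML INDEX PXML_PLAYERS_INVENTORY
    ON PLAYERS(INVENTORY);

CREATE XML INDEX IXML_PLAYERS_INVENTORY_PATH
    ON PLAYERS(INVENTORY)
    USING XML INDEX PXML_PLAYERS_INVENTORY FOR PATH;
"""

# "which players own rare artifact @RAREITEMID?"
XML_QUERY = """
SELECT PLAYERID
  FROM PLAYERS
 WHERE INVENTORY.exist(
           '/INVENTORY/ITEM[ITEMID = sql:variable("@RAREITEMID")]') = 1;
"""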


6 more specifically – it will have the same Big-O complexity, but will probably be faster due to effective denormalisation achieved via XML

 

SQL-with-XML: Current Implementation Issues

As we can see, at least in theory SQL with XML fields and indexes can do what we need it to do (and also we can see that in theory SQL-with-XML can perform no worse – and probably even better – than classical SQL).

However, all XML-related database technology is still in its infancy by RDBMS standards, and it seems that as of 2016, support and performance of XML queries is still lacking. In particular, there are reports that SQL-with-XML performs incredibly poorly compared to the competition; for example, [SuarezEtAl] measures SQL Server with an XML field losing around 100x7 compared to MongoDB (and references to SQL Server having all XML indexes based on full-text indexes look very annoying performance-wise too). OTOH, as I look at DB/2 and Oracle documentation ([IBM] and [Oracle] respectively), their respective “XML indexes” and “function-based indexes on XMLType data” seem to be much more like traditional indexes over XML properties, so I’d probably try to use them before claiming that they won’t work fast enough.

To summarize the above:

  • At this point I don’t see why SQL-with-XML should necessarily be anywhere slower than any alternative technology8
  • On the other hand, as of 2016 the quality of XML implementations (and especially XML indexing) tends to vary significantly
  • On the third hand – I expect this performance handicap of SQL-with-XML to be reduced with time.

In other words – while with time I’d expect XML-based SQL to become a very useful and widely used addition to SQL – for the time being, make sure to test XML-based SQL, including its performance (and under conditions-resembling-real-world ones too), before using it.


7 when this kind of discrepancy arises, there is always a question whether the test has used the most optimal indexes for the loser – and unfortunately, I don’t know enough to be 100% sure that it has 🙁
8 that is, unless we’re able to trade off ACID for performance

 

Structured NoSQL

In the [[TODO]] section above, we defined Structured NoSQL (such as MongoDB). Very shortly – it is a kind of document storage with the document/object stored as a whole, but with individually addressable9 fields within the document/object.

NoSQL will usually call for another denormalisation on top of what we’ve described above for SQL-with-XML. For example, if the NoSQL DBMS we’re using is storing documents as JSON (or, like MongoDB, as BSON) – we can store our data within the following tables/collections (see also a minimal sketch after the list):

  • PLAYERS, with PLAYERS documents with the following fields:
    • PLAYERID being a JSON field which is also a key (_id in MongoDB-speak)
    • DISPLAYNAME and MONEY as regular JSON fields
    • INVENTORY is a JSON array consisting of INVENTORYITEMS. Each of INVENTORYITEMS contains:
      • For pre-defined items: ‘ITEMID’ (pointing to ITEMDICTIONARY table), and ‘INVENTORYSLOT’
      • For generated items: ‘INVENTORYSLOT’, ‘DISPLAYNAME’, ‘TYPE’, ‘POWER’, ‘WEIGHT’, and ‘VALUE’
  • ITEMDICTIONARY table, with each of the documents containing ITEMID as a key JSON field, and ‘DISPLAYNAME’, ‘TYPE’, ‘POWER’, ‘WEIGHT’, and ‘VALUE’ as regular JSON fields.
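As a minimal illustration – such a PLAYERS document, inserted via pymongo (field names follow the list above; the specific values, and keeping PLAYERID in _id, are merely my own assumptions):

from pymongo import MongoClient

# assumes a locally running MongoDB; the "game" DB name is purely illustrative
db = MongoClient("mongodb://localhost:27017/")["game"]

db.players.insert_one({
    "_id": 123456,                       # PLAYERID doubling as the document key
    "DISPLAYNAME": "No Bugs Hare",
    "MONEY": 100,
    "INVENTORY": [
        {"ITEMID": 17, "INVENTORYSLOT": 0},                  # pre-defined item
        {"INVENTORYSLOT": 1, "DISPLAYNAME": "Rusty Sword",   # generated item
         "TYPE": "weapon", "POWER": 3, "WEIGHT": 5, "VALUE": 2},
    ],
})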

Let’s note that in quite a few NoSQL DBs relying on relations from one document to another (such as ‘ITEMID’ referring to the ITEMDICTIONARY table) is quite dangerous because of the lack of multi-document ACID transactions (more on it in the [[TODO]] section); however, in our case it is ok (as long as ITEMDICTIONARY is an essentially constant table, questions of inter-document ACID consistency won’t arise).

Let’s also note that while the above structure was expressed in terms of JSON – pretty much the same would work for a column store.

Now about complying with our requirements. Once again, the only problem on the way to compliance is the performance of that query of “which PLAYERS have those rare artifacts?” And once again – it can be resolved by an appropriate index, this time on the INVENTORY.ITEMID field (or column) within the PLAYER document; once again, having Sparse/Partial/Filtered indexes is a plus performance-wise.
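Continuing the pymongo sketch above – the index and the “rare artifacts” query could look along the following lines (a plain multikey index shown; whether a partial/sparse index is worth the trouble – depends on your specific DB and workload):

# MongoDB indexes array elements automatically ("multikey" index), so a plain
# index on INVENTORY.ITEMID is enough to avoid the collection scan
db.players.create_index("INVENTORY.ITEMID")

RARE_ARTIFACT_IDS = [13, 17, 42]         # hypothetical list of rare ITEMIDs
owners = db.players.find(
    {"INVENTORY.ITEMID": {"$in": RARE_ARTIFACT_IDS}},
    {"_id": 1},                          # we only need the PLAYERIDs
)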


9 as any generalization, it is imprecise, but for our purposes it will do

 

On Further Denormalisation

In quite a few NoSQL databases, “best practices” say that it is better to avoid references (such as an ITEMID reference to a (near-)constant table ITEMDICTIONARY) completely – arguing that it is better to denormalise in the name of speed.

I’d say that at least in theory this logic shouldn’t apply to references to small near-constant tables (as such tables, again at least in theory, can be replicated to all the shards, which would mean no extra work to retrieve them, and they will be cached all the time, so the costs of joining will be very, very minimal). Whether your NoSQL DB allows doing this kind of replication (and how fast it will work) – is up to you to figure out. If it doesn’t work – it is indeed possible to denormalise the structure above further (actually – to denormalise it completely), and to get rid of the ITEMDICTIONARY table entirely, incorporating ITEMDICTIONARY data within each of the PLAYERS instead. This complete denormalisation will have its own drawbacks:

  • First of all, it will increase the size of your data. And (in spite of claims by certain NoSQL projects that disk space costs nothing these days – implying that disk space is the only cost of data being larger) this increase will also hit your caches, and as a result – your read performance too.
  • Second, if you want to change the properties of all the items of one single type – you will be facing quite a challenge. I don’t want to say that you will need it – but if it happens, it is going to be rather ugly.

Despite these drawbacks, this completely denormalised data model will work (though you’ll need to figure out its performance yourself on your specific NoSQL DB).
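For comparison – a fully denormalised PLAYERS document (with no ITEMDICTIONARY references whatsoever) might look along the following lines; note how the former dictionary fields are now copied into every player who happens to own the item (values, as usual, being purely illustrative):

fully_denormalised_player = {
    "_id": 123456,
    "DISPLAYNAME": "No Bugs Hare",
    "MONEY": 100,
    "INVENTORY": [
        # formerly a mere {"ITEMID": 17, "INVENTORYSLOT": 0} reference:
        {"INVENTORYSLOT": 0, "DISPLAYNAME": "Amulet of Yendor",
         "TYPE": "artifact", "POWER": 100, "WEIGHT": 1, "VALUE": 10000},
        {"INVENTORYSLOT": 1, "DISPLAYNAME": "Rusty Sword",
         "TYPE": "weapon", "POWER": 3, "WEIGHT": 5, "VALUE": 2},
    ],
}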

Unstructured (Key-Value) NoSQL

If speaking about key-value NoSQL – we can still use exactly the same data models as described above for Structured NoSQL. However, when trying to optimize that non-key-based query to find those PLAYERS who currently have that rare artifact – we’ll run into a table scan (or its equivalent), which in turn fails to satisfy the “perform reasonably well” requirement 🙁 .

On the other hand – “pure” key-value stores are rather rare these days, and functionality to implement secondary indexes MAY be provided in one way or another by a primarily-key-value data store. Just as one example – Redis does provide machinery to implement your own secondary indexes (albeit in a rather cumbersome manner), see, for example, [Redis].
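Just to illustrate the idea – here is a very rough sketch of such a hand-rolled “secondary index” on top of a key-value store (redis-py; the key-naming scheme is entirely my own, and the two writes are kept atomic via a MULTI/EXEC pipeline):

import json
import redis

r = redis.Redis(host="localhost", port=6379)    # assumes a locally running Redis

def grant_rare_artifact(player_id: int, item_id: int, player_doc: dict) -> None:
    pipe = r.pipeline(transaction=True)                         # MULTI/EXEC
    pipe.set(f"player:{player_id}", json.dumps(player_doc))     # the "document" itself
    pipe.sadd(f"owners_of_artifact:{item_id}", player_id)       # hand-rolled secondary index
    pipe.execute()

def owners_of_rare_artifact(item_id: int) -> set:
    return r.smembers(f"owners_of_artifact:{item_id}")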

Summary

As we’ve seen above – for our litmus test case, we can comply with our requirements on a really wide range of DBMS (ranging from classical SQL-based RDBMS to primarily-key-value Redis, with pretty much everything else in between).

Or, in other words, I tend to put it as follows –

everything we need can be expressed (and can perform reasonably well) for all types of DBMS: from classical normalized SQL DBs on one side of the spectrum, through SQL-with-XML10 and Structured NoSQL, and into Key-Value NoSQL on the other side of the spectrum.11

It means, in turn, that the differences between data models (discussed above) are not likely to be decisive when making a choice between SQL and NoSQL; the real difference lies elsewhere. In particular, as discussed in the [[TODO]] section, for OLTP purposes multi-row ACID guarantees are usually very important; on the other hand, for analytical purposes, being able to run queries over huge amounts of historical (and eventually-consistent) data tends to help significantly.


10 while implementations seem to be lacking right now, my educated guess is that – given enough demand – they will catch up fairly quickly
11 although some queries over the last one will need some aids which go beyond pure key-value

 

[[To Be Continued…

This concludes beta Chapter 20(i) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)”. Stay tuned for beta Chapter 20(j), where we’ll conclude Chapter 20 and will discuss how to put the whole DB thing together (for the time being – and also to provide a roadmap into the future)]]


References

Acknowledgement

Cartoons by Sergey Gordeev IRL from Gordeev Animation Graphics, Prague.


Comments

    • "No Bugs" Hare says

      As Garfield The Cat has once said:
      – We have a technical term for it. We name it “oops!”

      🙂 Fixed, thanks!

  1. Jesper Nielsen says

    Are you always updating on mondays? I’m starting to look forward to your weekly chapters 🙂

    It surprises me to see (PLAYERID, INVENTORYSLOT) being used as a primary key for the GENERATEDITEMS table. Normally I would expect a primary key to be something that rarely/never changes, like a GUID or auto-incremented INT, or if applicable a stable natural key.
    It’s quite unfortunate having to change the primary key for an item every time it’s traded – and even every time it’s being moved to another inventory location.
    With the chosen key you won’t be able to easily identify an item relative to its history or audit data (and to identify it you need to have perfectly timed history/audit trail to unwind changes to the key and track previous occurrences)

    If you have no duplicate objects with that schema then this is because your business logic (clustering player saves in transactions when needed) is sound and your transactions are implemented correctly. If you do have duplicate objects because of errors in your business or transaction logic then you will have a hard time identifying them.

    Now – having both a surrogate key and a UNIQUE constraint on the (PLAYERID, INVENTORYSLOT) pair makes a lot of sense if you want a clustered index (If your RDBMS supports this) because having all item records for the same player next to each other on the disc is probably a GOOD thing, even though it means you have to move them on the disk when an item changes owner. In MS SQL a clustered index is a requirement for table partitioning – I think PostGres doesn’t have an option for maintaining clustered indexes but I don’t know about the other RDBMS.

    But: You might also want to store items in other ways. Items might be inside containers, and they might not even be stored in the player inventory if player run shops, auction houses etc. are implemented.

    NB: When deciding to only store “stock” items as a link to the ITEMDICTIONARY table you have also made the decision to update “already existing” items whenever you make changes to the ITEMDICTIONARY table. This may or may not be what you want.

    • Jesper Nielsen says

      Edit: I’m reading up on this – in MySQL it seems your primary key will always be the one used for the clustered index of a table. Either way – as long as you have a stable key for a record it won’t matter much if it is also the primary key in your RDBMS or not. In MS SQL Primary key simply means unique, not null, and defaults to clustered.

    • "No Bugs" Hare says

      > Are you always updating on mondays?

      Mostly Mondays, though sometimes it slips into Tuesdays.

      > I’m starting to look forward to your weekly chapters 🙂

      Thanks! OTOH, soon I will finish discussing DBs, and will move ahead (and not sure whether it will be interesting for you).

      > It surprises me to see (PLAYERID, INVENTORYSLOT) being used as a primary key for the GENERATEDITEMS table. Normally I would expect a primary key to be something that rarely/never changes, like a GUID or auto-incremented INT, or if applicable a stable natural key.

      Logically – (PLAYERID, INVENTORYSLOT) is a perfectly valid PK. If speaking about clustered indexes – to achieve locality around PLAYERID, you will actually want a clustered index which starts with PLAYERID (!), so (PLAYERID,INVENTORYSLOT) is not bad from this point of view. Having both surrogate PK and UNIQUE constraint-on-top-of-clustered-index might be possible with some DBs – but my guess is that performance-wise it won’t be better (you will still need to update clustered index each time you’re changing INVENTORYSLOT, and this is the main cost in this case).

      So, until there is some specific performance data for specific DB – performance-wise I’d stick to clustered (PLAYERID, INVENTORYSLOT). However, if you want to audit GENERATEDITEMS (which is very unusual) – or if you need to have items lying around (which is much more frequent, see also below) – yes, you will probably need some kind of surrogate key. Note however, that performance with secondary keys heavily depends on the question of “whether secondary keys refer to PK or to pages directly” – and IIRC this varies between DBs, so maybe in this case it will be better to ensure uniqueness at app-level.

      > But: You might also want to store items in other ways. Items might be inside containers, and they might not even be stored in the player inventory if player run shops, auction houses etc. are implemented.

      Sure, but IMO extending original simple example that far is beyond the scope of this post/Chapter (or maybe it doesn’t? I’ll think about it). In any case, it can be done rather easily via creating a very similar structure for “shops” etc.; for Codd-SQL they will simply refer to GENERATEDITEMS by surrogate ID (and for NoSQL – they will be moved around tables, I don’t think it will really fly some other way without multi-doc/multi-row ACID).

      • Jesper Nielsen says

        I’m sure whatever you come up with is interesting for me as long as it’s for the book:)

        You’re right – it really depends on the actual RDBMS if secondary unique constraints are in fact feasible. It also depends on the actual RDBMS whether clustered indexes are synonymous with PK – or whether clustered indexes are even an option (Not in Postgres… I was actually considering using this until I found out but now I’m leaning more towards MySQL or MariaDB – if I even decide to normalize the inventory that is…)
        If the business and transaction logic is sound then the unique constraint is just an additional workload.
        I will still argue the need for a permanent identifier that follows the item – even if it isn’t enforced in a constraint – so you can consistently query for duplicates. Obviously this query will take a bit of time doing a table scan versus an index scan but it’s not something you’ll necessarily run very often. If this query ever returns anything then there are bugs in the code that need fixing 🙂

        Spoiler alert: In my game GeneratedItems would definitely be the table you would want to audit at least as much as you would want to audit InventoryItems:)

        • "No Bugs" Hare says

          > whether clustered indexes are even an option (Not in Postgres… I was actually considering using this until I found out but now I’m leaning more towards MySQL or MariaDB – if I even decide to normalize the inventory that is…)

          Just for reference: Postgres has “clustered tables” (clustered around the index), described here: https://www.postgresql.org/docs/9.1/static/sql-cluster.html , but it _seems_ that they’re not maintained clustered (and require regular rerunning of the CLUSTER command which obtains an EXCLUSIVE lock) – not an option in any 24×7 environment :-(.

          • Jesper Nielsen says

            That’s my understanding too. Back in the days I remember using MySQL with MyISAM and regularly running “ALTER TABLE … ORDER BY …” – YUCK 🙂

  2. Mikhail Maximov says

    SQL with XML: I think that purpose of playerID in XML field should be explained. Otherwise this will fall into “undue duplication”.
