Marshalling and Encodings

Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
Marshal, well... marshalling

#DDMoG, Vol. IV
[[This is Chapter 14 from “beta” Volume IV of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help with improving it; for further details see “Book Beta Testing”. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

We’ve touched upon the question of encodings and marshalling briefly in Chapter III, discussing principles rather than specific implementations. Now it is time to discuss implementing marshalling and encodings in more detail.

Encoding Requirements

Before taking a look at the existing (and not-yet-existing) encodings, let’s think a bit about our requirements. From my perspective, there are several important things which we’ll need:

  • Compatibility across different platforms. Most of the time, compatibility issues can be ignored for Server-to-Server communications (where both sides are usually x86, and usually you can even compile them with exactly the same compiler), but they are very important for Client-Server stuff
  • Encoded data size. Especially important for Client-Server communications, less important (within reason) for Server-to-Server ones and for locally-stored formats
  • Processing speed (both compose and parse). More important for Server-to-Server communications and for locally-stored formats, less important (within reason) for Client-Server communications (where encoded data size usually takes precedence)
  • Extensibility as “ability to add/extend fields without breaking backward compatibility”. Always a MUST-have; ideally should be bullet-proof too (i.e. inadvertent backward-compatibility-breaking changes to the protocol should be impossible or detectable).
  • Ability to drop no-longer-necessary fields. Nice-to-have, but rarely important in practice, as most of the time you will be adding new features, as opposed to abolishing old ones.

Now, let’s take a slightly different look at it and rank these properties according to their relative importance for Client-Server and Server-to-Server communications separately.

My personal (and inevitably subjective) view on the relative importance of properties for Client-Server encodings goes as follows (from most important ones to the least important ones):

  • Compatibility across different platforms
  • Extensibility (add/extend fields)
  • Encoded data size (you’re paying for the Client traffic, so it is Really Important, not to mention the potential to overwhelm the player’s last mile, smaller packets getting precedence in some of drop algos, etc. etc.)
  • Processing speed
  • Dropping fields

My take on the same relative importance, but for Server-to-Server encodings (and local serialization/deserialization ones) goes as follows:

  • Extensibility (add/extend fields)
  • Processing speed
  • Encoded data size
  • Compatibility across different platforms
  • Dropping fields

As we can see, the order of priorities goes quite differently for Client-Server and Server-2-Server encodings. As a result, <spoiler>we shouldn’t be surprised if we find out that the optimal encodings are different for Client-Server and Server-2-Server communications</spoiler>.

Some of Existing Encodings

Encodings used for games vary greatly these days. Let’s take a look at some of the popular choices. On one end of the spectrum, there are text-based encodings such as XML and JSON.


XML is a popular text-based encoding, and it has its merits elsewhere, but honestly, I don’t really see any reason to use it for games (well, except for communicating with 3rd parties such as payment processors, which tend to use XML a lot – and there is a good reason for that too).

For all internal Client-Server and Server-2-Server communications, I see XML as too verbose (causing MUCH more traffic than necessary), and taking way too much time to compose/parse.


JSON is another text-based encoding, and given its roots coming from JavaScript, it is not surprising that it is quite popular among browser-based games.

On the other hand, JSON is still verbose ([Flatbuffers] estimates it to be about 4.5x larger than Google Protocol Buffers on a sample data chunk intended to be representative of game data). It is quite slow to parse and compose, too 🙁 .

In addition, JSON (unless special measures, mentioned below, are taken) tends to have significant problems in ensuring-backward-compatibility department. While you CAN extend JSON (and very easily too), it is also easy to break backward compatibility :-(. Overall, encodings with explicit schemas (which is “pretty much any other encoding we discuss in this Chapter, except for Plain C structures”) tend to enforce backward compatibility significantly better. On the other hand, if you’re using Data Transfer Objects (DTOs), you MAY treat them “as if” they represent your schema, and with appropriate organisational controls it is likely to work. Still, size and CPU costs of JSON will remain high.

As a result, I do NOT recommend using JSON for games (even for Server-2-Server communications: while on-the-wire sizes are not too important there, CPU costs will take their toll).

Google Protocol Buffers

Google Protocol Buffers is an IDL-based binary encoding which has gained a lot of popularity in recent years. It is binary (i.e. as a rule of thumb more efficient size-wise than the text encodings above), it is extensible, and it is reasonably efficient (or at least not too inefficient) CPU-wise. Overall, it is IMHO one of the best choices for a usual business app.

However, when it comes to games (more specifically – to Client-Server communications), I still strongly prefer my own IDL with my own IDL compiler and my own (bit-oriented) encoding. The main reason for it is that there is only one encoding for Google Protocol Buffers (a.k.a. protobufs), and it is not exactly optimized for games. In particular, protobufs are missing out on quite a few goodies described in Chapter III: delta compression is not supported, there are no acceptable ranges for values, no rounding, no dead reckoning, and no bit-oriented encodings. Which means that if you use Google Protocol Buffers to marshal your in-app data structures directly, then compared to your own optimized IDL, it will cost you in terms of traffic – and cost a lot.

Alternatively, you may implement most of the compression goodies mentioned above yourself, and then use Google Protocol Buffers to transfer the compressed data; but this will clutter your application-level code with compression stuff, and still won’t be able to match traffic-wise some of the encodings possible with your own IDL (in particular, bit-oriented streams and Huffman coding will still be pretty much out of the question1).

Therefore, while I agree that Google Protocol Buffers are usually good enough for most of the business apps out there (and for Server-2-Server communications for your game too), I insist that for Client-Server game communications you usually need something better. MUCH better.

1 that is, unless you’re using Google Protocol Buffers just to transfer pre-formatted bytes, which usually doesn’t make much sense


[[TODO: Thrift, Apache Avro]]

Plain C Structures

Sending plain C structures over the network has been a common practice since time immemorial, and is often used by game developers even in the XXI century. Well, everybody who is in the business of network protocol development will tell you instantly:


Passing C structures across the network is an extremely risky-in-the-long-run thing, plagued with issues such as endianness and alignment (not to mention an utter lack of sanitisation). It tends to fail even for Server-2-Server communications, and for Client-Server communications with multiple Client platforms it is pretty much doomed.


Many hundred large volumes have been published upon this controversy:

but the books of the Big-endians have been long forbidden,

and the whole party rendered incapable by law of holding employments.

— Jonathan Swift, Gulliver’s Travels, 1726 —

One pretty nasty thing which comes into play when we’re trying to pass data “as it is stored in memory” (i.e. plain C structures) over the network, is so-called endianness. Whenever we need to store a more-than-a-single-byte-value in memory, a question arises: “should we store the least significant byte first, or the more significant byte first?” If we store the least significant byte first, we’ll end up with a “little-endian” system, otherwise – with a “big-endian” system.2

Actually, there is very little practical difference between the two (besides them being different), and all the arguments along the lines of “big-endian is better than little-endian” (or vice versa) are perfectly pointless. What matters, however, is that if you pass a C structure (one with int16_t/uint16_t, int32_t/uint32_t, etc. fields) from a machine which is little-endian to a machine which is big-endian, you will find that the value of the uint16_t/… field is different from the one which you’ve sent 🙁 .

To confuse things even further, there is also the term “network byte order” (the one used by functions such as htons()/ntohs()/htonl()/ntohl()); in fact, “network byte order” is just a synonym for “big-endian”. On the other hand, while htons()/ntohs()/htonl()/ntohl() CAN be used to achieve endian-neutral encodings, there is no law whatsoever which requires all the bytes transferred over the network to be big-endian; what is important, though, is that both sides of your communication interpret the data on the wire in the same way.

2 I’ve seen people who have heard of people who have told that they’ve seen systems with so-called “Middle-Endian” which is neither Little-Endian nor Big-Endian for 4-byte values. I haven’t seen them myself, and the chances that you’ll ever need to deal with such systems, are very slim these days.


Endianness these Days: (almost-)exclusively Little-Endian

These days, almost all systems you will find out there (both server-side and client-side) are little-endian. In particular, all x86 boxes (both server-side and desktop) are little-endian. Moreover, while ARM as such is technically bi-endian (i.e. it can be configured to be either little-endian or big-endian at boot time) – I didn’t see any ARM-based consumer devices which are big-endian.

Yes, there are still systems out there which are big-endian (mostly servers such as Solaris/SPARC, AIX/Power and some versions of Linux/Power) – but you’re not really likely to use any of these for your game servers. When we’re speaking about consoles – while both PS3 and Xbox 360 were big-endian, both PS4 and Xbox One are little-endian.

Bottom line: while there is a chance that at some point in time there will be client-side devices configured to be big-endian, at the moment (save for PS3 and Xbox 360) pretty much all of them are little-endian.

Endianness-agnostic Code

One further thing to keep in mind with regards to endianness. It is perfectly possible to write “endianness-agnostic” code, which will work equivalently regardless of underlying architecture. For example:

void serialize_uint16(DataBlock& data, uint16_t u16) {
  uint8_t* ptr = data.grow(2);
  *ptr++ = (uint8_t)u16;
  *ptr = (uint8_t)(u16 >> 8);
}//deserializing is very similar

effectively produces a little-endian ‘on the wire’ encoding regardless of the underlying platform being little-endian or big-endian (!). While this won’t help with passing plain C structures, it is one technique to keep in mind when implementing marshalling (and it is in fact the only way of doing it if your programming language doesn’t expose endianness – and this is pretty much any programming language out there except for C/C++).
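For symmetry, the deserializing counterpart can look as follows (a minimal sketch: instead of the DataBlock-based API from the snippet above, it reads from a plain byte pointer for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Endianness-agnostic deserialization: reads a little-endian
// 'on the wire' uint16_t regardless of the host byte order.
uint16_t deserialize_uint16(const uint8_t* ptr) {
  uint16_t u16 = *ptr++;         // least significant byte first
  u16 |= (uint16_t)(*ptr) << 8;  // then the most significant byte
  return u16;
}
```

Note that the code never asks “what is my endianness?” – it describes the wire format purely in terms of shifts, which is exactly what makes it endianness-agnostic.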

Endianness in C/C++

For C/C++, the code above will also work (and there are arguments to use such “endianness-agnostic code” for C/C++ too; see, for example, [Pike12]). On the other hand, the more traditional way to handle endianness in C/C++ is to use macros such as LITTLE_ENDIAN and BIG_ENDIAN (and as you’re supposedly compiling your game yourself, you won’t have much of a problem #defining these macros for each of your target platforms – or using trickery such as the one described in [Raymond13]). As soon as you’ve got your LITTLE_ENDIAN and BIG_ENDIAN macros, code such as

uint16_t toLE(uint16_t u16) {
#if defined(LITTLE_ENDIAN)
  return u16;
#elif defined(BIG_ENDIAN)
  return (u16 << 8) | (u16 >> 8);
  //if you’re REALLY adventurous, you MAY experiment with
  //  asm instructions developed for this purpose, such as
  //  the BSWAP instruction on x86.
  //The preferred alternative though is intrinsics such as
  //  _byteswap_ushort/_byteswap_ulong and
  //  __builtin_bswap16/__builtin_bswap32
#else
#error "Either LITTLE_ENDIAN or BIG_ENDIAN must be defined"
#endif
}

can be used to swap bytes when applicable.

An alternative to manually-handled LITTLE_ENDIAN/BIG_ENDIAN is the pretty-universally-available htons()/ntohs()/htonl()/ntohl() functions; the drawback of using them (a very minor one if you ask me) is that they convert everything into big-endian (and as most of the time you will be communicating between two little-endian machines, this causes two unnecessary conversions).


If you thought that endianness was bad enough, wait until we discuss alignments. The thing with alignments is that sometimes the compiler silently inserts padding within your C/C++ structures.3 For example,

struct X {
  uint8_t a;
  uint32_t b;
};

is likely (though not guaranteed) to be equivalent to

struct X {
  uint8_t a;
  uint8_t padding[3];
  uint32_t b;
};

where padding is unused space silently inserted by the compiler to make sure that b is “naturally” aligned.
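On a typical compiler, this padding can be observed directly (a sketch; the 8-vs-5 sizes below are what you’ll most likely see on mainstream x86 compilers, but they are implementation-defined, not guaranteed by the standard):

```cpp
#include <cstddef>
#include <cstdint>

struct X {
  uint8_t a;
  uint32_t b;  // compiler will usually insert 3 padding bytes before it
};

#pragma pack(push, 1)
struct PackedX {  // same fields, but padding suppressed via #pragma pack
  uint8_t a;
  uint32_t b;
};
#pragma pack(pop)

// On most mainstream compilers: sizeof(X) == 8, offsetof(X, b) == 4,
// while sizeof(PackedX) == 5.
```

This also illustrates the #pragma-based overriding of default alignments mentioned in the text – which is precisely why two programs compiled with different settings may disagree on the layout of the very same struct.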

To make things worse,

While alignments DO follow certain patterns,4 they are not standard; moreover, default alignments can be overridden with compiler flags or compiler-specific #pragmas5

With alignments in play, passing C structures between programs-on-different-platforms becomes a Really Big Problem 🙁 . On the other hand, where one developer sees a problem, another one sees an opportunity; that’s how so-called Zero-Copy protocols have arisen.

3 we won’t discuss here why alignment is necessary, just noting that there are Good Reasons to do it
4 we’ll discuss a bit of it in “Flatbuffers (and other Zero-Copy Protocols)” section below
5 and in C++11 there is also alignas


Flatbuffers (and other Zero-Copy Protocols)

Recently, a new family of protocols was developed, collectively known as “Zero-Copy Protocols”. For our purposes we’ll concentrate on Flatbuffers (though other Zero-Copy Protocols, such as [CapnProto] and [SBE], are also available).

The idea behind Flatbuffers (as well as behind other Zero-Copy Protocols) is essentially to design a protocol, which will coincide with a meaningfully-aligned C structure for most of the CPUs/compilers out there. Then, at least for those platforms where the format is meaningfully-aligned, marshalling/unmarshalling becomes trivial (and for other platforms, at least in theory, it is possible to generate non-zero-cost marshalling/unmarshalling code6).

As long as Flatbuffers are working on a “good” platform (and this is “pretty much any platform out there”) – they do NOT need to copy data at all. On the other hand, due to being optimized for CPU operation, Flatbuffers are NOT optimized space-wise; even compared to not-so-optimal-space-wise Google Protocol Buffers, Flatbuffers can lose an additional 1.5x in size 🙁 . And as noted above, I see this as a major drawback for a Client-Server encoding.

6 I’m not sure whether existing Flatbuffers already do it, so it is up to you to find it out 😉



ASN.1

ASN.1 is the granddaddy of all IDL-based protocols, and is quite an interesting beast – it can generate several different encodings from the same IDL specification. These encodings are known as “encoding rules”; as of now, there are several ASN.1 encodings out there, including the original binary byte-oriented BER, its variations DER and CER (used to guarantee that the same semantical message is always encoded exactly the same way, which is necessary for certain cryptography uses), the XML-based XER, and (most interesting for our purposes now) PER, also known as Packed Encoding Rules.

PER is essentially a bitstream, and is usually somewhat more compact than Google protobuf (YMMV, batteries not included; the end result depends on the constraints specified in the ASN.1 file, and on the nature of the data). PER does NOT, however, include additional optimisation stuff such as Huffman coding (discussed below in the “not-yet-Existing Encoding: Bit-Oriented Stream” section), so it is not as compact as I’d like it to be.

Also, keep in mind that implementing any encoding of ASN.1 yourself is a Big Headache (as ASN.1 specifications are Big and Ugly). However, there is one tool which is able to “compile” your ASN.1 into C source code: [asn1c]. While I didn’t use it myself, I’ve heard Good Things about it.

not-yet-Existing Encoding: Bit-Oriented Stream

As I’ve said above, when it comes to encodings for Client-Server communications, I am not really satisfied with any of the readily-available ones; my main concern here is the excessive traffic caused by most of these encodings.

As we’ve discussed in Chapter III, the difference between a floating-point-oriented encoding and bit-encoded-fixed-point one can be very significant – we can easily speak about 3x difference in traffic. Just one example – if you treat your angle variable as float, you will be transferring 32 bits for it (and if you have it as double – 64 bits); with a bit-oriented encoding (and assuming that your Client uses angle for rendering only, and therefore doesn’t need to deliver your angle with a precision over 0.5 degree) – you don’t really need to pass more than 10 bits of information. And if we’re using “incremental delta compression” (the one which transfers only differences in angle) – transferring the difference as float will still transfer 32 bits,7 but if we’re using 10-bit fixed-point representation – we have a chance to get down to 6-8 bits or so on average. And so on, and so forth.

What I am suggesting here is to build your own bit-oriented encoding. Granted, it won’t be a zero-copy one; on the other hand, it will be able to save a LOT of unnecessary information from being sent across the network, and it DOES count for Client-Server communications.

7 which won’t compress well either, as those unnecessary-for-the-Client lower-significance bits will be filled with pretty much garbage


Designing BitStream Protocol

Actually, when designing your own game protocol, two things really count performance-wise. In Chapter III the first one was named “encoding” (i.e. format of data on the wire), and the second one – “mapping” (i.e. the way your program will access the data). While usually they are considered together, I strongly suggest to separate them.

My current suggestion8 is to use BitStream-oriented encoding with a FlatBuffer-like mapping. In other words, your IDL compiler (see Chapter III) will generate FlatBuffer (or Cap’n Proto, or SBE) structures/classes (and you will access your messages in the same manner as you would for FlatBuffer), but will generate BitStream serialization/deserialization along the lines described below.

The idea here is to get the best of both worlds: an encoding which is efficient traffic-wise, with a mapping which is reasonably efficient CPU-wise. The latter is possible because most of the CPU gains of FlatBuffers over Google Protocol Buffers9 come not from parsing as such, but from savings on allocation/deallocation(!).

Now, let’s briefly discuss ways to design our BitStream encoding. First of all, as it was discussed in Chapter III, we want to have an IDL; moreover, as it was also discussed in Chapter III, we want to go beyond specifying “float” or “double” in our IDL, and specify exact ranges and desired accuracy for our data.

For example, an angle field (as long as it is used for rendering only, as it should be) can usually be trimmed from a 4-byte float into a NUMERIC IDL field taking values from -180 to 180 and encoded as FIXED_POINT with a precision of 0.5 degrees. When our IDL compiler sees such a specification, it should be able to understand that there are only 720 possible different values for this field, and therefore it can be encoded with only 10 bits (instead of 32 if we used a float). This should reduce the number of bits to be transferred, quite a bit.10
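The code our hypothetical IDL compiler would generate for such a field boils down to a couple of lines (a sketch; angle_to_wire/wire_to_angle are illustrative names, not from any real IDL compiler):

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Quantize an angle in [-180,180) with 0.5-degree precision:
// 720 possible values, which fit into 10 bits (2^10 = 1024 >= 720).
uint16_t angle_to_wire(float angle_deg) {
  return (uint16_t)std::lround((angle_deg + 180.0f) * 2.0f) % 720;
}

// Reconstruct the angle on the receiving side; the worst-case
// rounding error is 0.25 degrees - invisible in rendering.
float wire_to_angle(uint16_t wire) {
  return (float)wire * 0.5f - 180.0f;
}
```

The 10-bit result then goes into the bit-oriented stream discussed in this section; with a byte-oriented encoding those 2 saved-but-unaligned bits would be wasted.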

The next thing we’ll be dealing with is “field-delta-encoded fields” (as defined in Chapter III; in short – whole fields having a flag saying whether they have changed or not). With a bit-oriented stream, each field-which-didn’t-change will take exactly one bit, which is pretty good.

As a next step, we can think about the encoding of those fields which are “incrementally-delta-encoded” (i.e. numeric fields where we’re transferring the difference from the previous value).11 In particular, we can make a guess that our delta has a strong bias towards zero. As long as this stands, we can use a VLQ-like encoding to encode the numbers; however, as (unlike VLQ) we’re working with a bitstream, we are not restricted to 8-bit elements, and can use, for example, 5-bit chunks to represent the data (i.e. small numbers from 0 to 15 will be represented by 5 bits, and numbers from 16 to 255 – by 10 bits). Even better, we can choose the size of these chunks on a per-field basis, depending on the per-field statistics common for the game.

8 note that my suggestions do evolve over time, so I cannot guarantee that in 2 years I will say the same thing
9 at least for non-integer/non-float fields
10 note that while the result won’t be easily compressible byte-wise, neither are those lower-significand-bits which we don’t really need to transfer (these bits usually look as “garbage”)
11 note that if we encode a variable as float, it usually makes little sense to use “incremental delta compression” (at least as long as the difference is also a float).


On Huffman and Huffman-like Codings

Let’s take a left turn to a side road and discuss so-called Huffman coding a little bit. From a 50’000-feet view, Huffman coding goes the following way:

  • you gather statistics about “how frequently certain byte occurs in your data”
  • you build (and use) an encoding which uses these statistics to represent different bytes with a different number of bits (with more frequently occurring bytes encoded with less-than-8 bits, and less frequently occurring bytes encoded with more-than-8-bits).

Such a specific-to-statistics encoding is usually represented by a so-called “Huffman tree”. How this tree can be built is well beyond the scope of this book, but it is described pretty well in [Wikipedia.Huffman].
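While building the tree itself is out of scope, one useful property is easy to sketch: the classic construction repeatedly merges the two least-frequent nodes, and the total encoded size equals the sum of all merged-node weights (each merge adds one bit to every symbol underneath). A sketch, with made-up illustration frequencies:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Total number of bits Huffman coding needs for the given symbol
// frequencies: repeatedly merge the two lightest nodes; the sum of
// merged weights is exactly the total encoded length in bits.
uint64_t huffman_total_bits(const std::vector<uint64_t>& freqs) {
  std::priority_queue<uint64_t, std::vector<uint64_t>,
                      std::greater<uint64_t>>
      pq(freqs.begin(), freqs.end());
  uint64_t total = 0;
  while (pq.size() > 1) {
    uint64_t a = pq.top(); pq.pop();
    uint64_t b = pq.top(); pq.pop();
    total += a + b;  // every symbol under this node gains one bit
    pq.push(a + b);
  }
  return total;
}
```

Running this over real per-field stats is a cheap way to estimate how much a Huffman layer would actually buy you before committing to implementing it.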

Let’s note that in the general case, Huffman doesn’t need to operate on ‘bytes’; it operates on ‘symbols’ instead, and the number of possible symbols it takes as input does NOT need to be 2^N. This means that if we need to encode our angle variable discussed above, then we can easily take all its 720 possible values and feed them to a Huffman coder; and the Huffman coder will still generate an optimal number of bits for each of the values (that is, assuming that the statistics which were used to calculate the encoding still stand).12

Let’s further note that while usually Huffman coding is used to process a rather generic stream of symbols (for example, deflate uses it to encode the stream of symbols generated by LZ77), we can often get substantially better compression efficiency if we use different statistics for different fields of our messages. In other words, even if we have three quite similar angle variables such as yaw, pitch, and roll, the statistics for them are likely to be different, and we should be able to utilize this difference (by generating different Huffman trees for the different angle fields). The improvement becomes even more pronounced when we need to encode fields of very different nature (like coordinates, offsets, and angles).

Finally, a word of caution regarding Huffman codings: don’t spend too much time playing with them. While Huffman coding can easily get you a 20% decrease in the size of data-on-the-wire (up to 30% with per-field Huffman trees) – do NOT expect miracles; beyond 20-30% you will be facing the law of diminishing returns, and miracles are not likely to happen 🙁 . On the other hand, the ability of Huffman coding to work with symbol vocabularies which are not 2^N often DOES come in handy when designing your protocol/compression.

12 Strictly speaking, Huffman creates an optimal coding only for a system where each input symbol is represented by a whole number of bits; if we allowed partial bits to be used, we’d go into the realm of so-called arithmetic coding, but it is usually waaaay tooooo sloooow for game purposes


Optimizing Huffman speed-wise

As Huffman works at the bit level, in general it may need to read bits one by one, which is relatively slow. On modern CPUs it is not as slow as it might seem at first glance (as the cost of a register-register operation is <1 CPU clock, while the cost of a random memory access can be as large as 100 CPU clocks), but if not implemented carefully, it may lead to an observable slowdown.

Optimizing Huffman serialization is not that difficult – a simple table which tells “how many bits and which ones” you need to use for a specific input symbol, does the trick.

Improving performance of Huffman deserialization is more tricky. In this regard, I know two different (and potentially complementary) techniques.

The first one is to replace optimal-space-wise Huffman coding with a similar-but-trading-off-space-for-speed Huffman-like coding. The idea for Huffman-like coding is still the same (encoding more frequently occurring symbols with less bits), but the implementation is different. For example, an encoded symbol in Huffman-like encoding from [Ignatchenko98] always consists of 2 sub-fields: the first sub-field is a 4-bit “group number” (and group number defines the number of bits in the second sub-field), and the second sub-field is a number of the symbol within this group. Such Huffman-like codings have been seen to provide significant speed-up at the cost of marginal reduction in compression level (and therefore, a marginal increase in the size of data-on-the-wire). As always, YMMV and batteries not included.
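The general idea of such a group-based coding can be sketched as follows (note: this is a hypothetical illustration of the two-sub-field scheme, not the exact layout from [Ignatchenko98]): the 4-bit group number g says “the second sub-field has g bits”, so a symbol costs 4+g bits in total, and decoding needs no bit-by-bit tree traversal – just read g, then read g more bits.

```cpp
#include <cassert>
#include <cstdint>

// Group-based Huffman-like coding (illustration only):
// group g holds 2^g symbols; group bases are 2^g - 1, i.e. 0,1,3,7,15,...
// A value v is encoded as 4 bits of group number g, followed by
// g bits of (v - base[g]). Assumes v < 65535 so that g fits in 4 bits.

struct GroupCode { uint8_t group; uint32_t offset; };

GroupCode encode_grouped(uint32_t v) {
  uint8_t g = 0;
  while ((((uint32_t)1 << (g + 1)) - 1) <= v) ++g;  // base[g] <= v < base[g+1]
  return { g, v - (((uint32_t)1 << g) - 1) };
}

uint32_t decode_grouped(GroupCode c) {
  return (((uint32_t)1 << c.group) - 1) + c.offset;
}

// Encoded length: 4-bit group number + group-sized offset.
int encoded_bits(uint32_t v) { return 4 + encode_grouped(v).group; }
```

Small (frequent) values land in small groups and get short codes, which preserves the spirit of Huffman while keeping decoding a fixed two-read operation.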

The second technique to improve the speed of Huffman coding applies only if our statistics tables are pre-defined. If this is the case, your IDL compiler should be able to generate code with the Huffman trees forming a part of the code, saving quite a bit of performance on the memory accesses which are necessary otherwise. For example, instead of having “if(table[bit]) do_something(); else do_something_else();” (with the code being the same regardless of the Huffman tree, and only the table being generated by the IDL compiler based on stats), you can generate a specific-to-Huffman-tree “if(bit) do_something(); else do_something_else();”, saving a rather expensive extra indirection.

Using Huffman Coding for our Bitstream

Now we can return to our BitStream encoding. To use Huffman coding for our BitStream, we can run our game with one of the serializations mentioned above, and gather stats on “how frequently each of the numbers occurs in the stream” for each of the fields involved in the encoding. These stats will allow us to generate Huffman trees for each of the fields, and to generate statistically-optimal representations for these fields. A few notes about using Huffman for this purpose:

  1. Unlike, say, deflate, we’re speaking about Huffman coding with trees which do NOT change while the program runs.
    • These trees will need to be encoded into both Client and Server (either as tables, or as code)
    • As noted above, I suggest having your IDL compiler generate these trees as executable code, not as data structures which the CPU needs to traverse. This will allow saving quite a bit of CPU cycles during processing
    • If speed of Huffman is a concern, consider Huffman-like codings such as the one mentioned above
  2. If the number of bits to encode is high, your vocabulary of input symbols will grow significantly (if we’re speaking about coordinates, they can easily take millions of values). With such million-value input vocabularies, usual Huffman trees don’t work well.
    • However, often representing a long field as a tuple of (Huffman-encoded-bits,remaining-plain-bits) helps to work around this issue. In other words, if you have a 21-bit field (for example, representing a coordinate from -100’000 to 100’000 with a 0.1 precision), then you can often represent it as a pair of (8-bit-Huffman-encoded-field-taking-at-most-256-possible-values,13-bits-passed-exactly-as-is).
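Splitting such a 21-bit field is trivial arithmetic (a sketch; the 8/13 split follows the example above, and split21/join21 are illustrative names):

```cpp
#include <cassert>
#include <cstdint>

// Split a 21-bit field into an 8-bit part (to be Huffman-encoded:
// at most 256 distinct input symbols) and a 13-bit part
// (to be passed through as-is).
struct SplitField { uint8_t huff_part; uint16_t plain_part; };

SplitField split21(uint32_t v) {
  return { (uint8_t)(v >> 13), (uint16_t)(v & 0x1FFF) };
}

uint32_t join21(SplitField s) {
  return ((uint32_t)s.huff_part << 13) | s.plain_part;
}
```

The split works because for biased data (such as deltas clustered near zero), most of the redundancy sits in the high-order bits, while the low-order bits are close to uniformly distributed and would gain nothing from Huffman anyway.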

[[TODO: arithmetic coding as an alternative to Huffman]]

BitStreams and extensibility

Extensibility for our BitStream-based encoding can be achieved, for example, along the lines discussed in Chapter III (via “FENCEs”). This allows for an easy way to add new fields (and to extend existing ones).

In addition, I am arguing for having our IDL compiler provide an option to compare two encodings for backward compatibility (again, along the lines outlined in Chapter III), and for making such a check a part of our build process. This will help avoid scenarios where a supposedly-backward-compatible encoding happens to be a not-really-compatible one; computers are notoriously better at catching this kind of stuff than people.

Comparison of different Encodings

Now, let’s compare the different encodings discussed above against the requirements discussed even further above. As with any such comparison, it is going to be somewhat subjective, so take it (as anything else) with a good pinch of salt.

|                   | XML | JSON | Protocol Buffers | ASN.1 PER | BitStream (with FlatBuffers mapping) | FlatBuffers | Plain C Structures |
|-------------------|-----|------|------------------|-----------|--------------------------------------|-------------|--------------------|
| Compatibility     | Very Good | Very Good | Very Good | Limited13 | Very Good | Good | Poor |
| Encoded Data Size | Very Poor | Very Poor | Acceptable | Acceptable to Good | Good | Poor | Poor |
| Processing Speed  | Very Poor | Very Poor | Poor14 | Acceptable | Acceptable | Very Good | The Best Possible |
| Extensibility     | Very Good | Acceptable | Good | Good | Good | Good | Poor |
| Dropping Fields   | Very Good | Very Good | Acceptable15 | Acceptable15 | Acceptable15 | Acceptable15 | Very Poor16 |
| Suitability for Client-Server | Very Poor | Very Poor | Acceptable | Acceptable to Good | Good | Poor | Very Poor |
| Suitability for Server-2-Server and Local Storage | Poor | Poor | Poor14 | Acceptable | Acceptable | Very Good | Poor |

[13] In particular, support for other-than-C languages is limited
[14] “Poor” applies only to those implementations I’ve seen; Protobufs as a protocol can certainly be made at least “Acceptable”
[15] Only fields originally designated as optional can be dropped
[16] There are no optional fields, so only deprecation (without any space savings) is possible


Femida hare: As you can see, I tend to favour custom BitStream-based encodings (when applicable – with FlatBuffer-like mappings) for Client-Server communications, and FlatBuffers – for Server-to-Server communications. As always, YMMV and batteries are not included (for example, if speaking about inter-data-center Server-to-Server communications, relative importance of size-of-data-on-the-wire MAY increase, causing some shift for these connections).

As usual, the most important thing is to have your encoding isolated from your game logic, so you can play with encodings even after your game goes live. Things DO change, and having your encoding fixed once-and-forever often proves to be Too Restrictive 🙁 .
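One way to keep the encoding swappable is to hide it behind an abstract interface which the game logic depends on. Here is a minimal sketch (all names, such as `Marshaller` and `PlayerUpdate`, are hypothetical, not from the book's IDL): game logic only ever sees the interface, so a naive byte-oriented implementation can later be replaced with a BitStream-based one without touching game code.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// In-program ("Mapping") representation, independent of the wire format
struct PlayerUpdate {
    int16_t angle_xy, angle_yz, angle_xz;
};

// Game logic depends only on this interface, never on a concrete encoding
struct Marshaller {
    virtual std::vector<uint8_t> encode(const PlayerUpdate&) const = 0;
    virtual PlayerUpdate decode(const std::vector<uint8_t>&) const = 0;
    virtual ~Marshaller() = default;
};

// One possible (deliberately naive) implementation: little-endian int16s.
// A BitStream-based Marshaller could replace it after the game goes live.
struct NaiveMarshaller : Marshaller {
    std::vector<uint8_t> encode(const PlayerUpdate& u) const override {
        std::vector<uint8_t> out;
        for (int16_t v : {u.angle_xy, u.angle_yz, u.angle_xz}) {
            out.push_back(static_cast<uint8_t>(v & 0xFF));
            out.push_back(static_cast<uint8_t>((v >> 8) & 0xFF));
        }
        return out;
    }
    PlayerUpdate decode(const std::vector<uint8_t>& in) const override {
        PlayerUpdate u{};
        int16_t* fields[] = {&u.angle_xy, &u.angle_yz, &u.angle_xz};
        for (int i = 0; i < 3; ++i)
            *fields[i] = static_cast<int16_t>(in[2 * i] | (in[2 * i + 1] << 8));
        return u;
    }
};
```

The point of the design is that "playing with encodings" becomes adding another `Marshaller` subclass (in practice, generated by the IDL compiler), not rewriting game logic.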

On Strings and Encodings

One thing which deserves special mention is choosing an encoding for strings. These days, most people will say “Hey, you don’t even need to think about it – UTF-8 rulezzz!” and will be right most of the time. However, instead of taking it lightly, let’s take a slightly closer look at this question.

First of all, let’s note that

for most of the user-readable strings, it is better not to send them as strings over the wire

Instead, it is better to replace these user-readable strings with integer IDs and perform the conversion from ID to string on the Client-Side. Not only will it save you some traffic, it will also keep the decision about the Client’s language completely on the Client-Side, simplifying internationalisation significantly.
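The ID-to-string approach can be sketched as follows (a minimal illustration with made-up message IDs and table contents; in a real game the tables would be loaded per-locale from Client-side resources):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch: the Server sends a 16-bit message ID over the wire;
// the Client maps it to a user-readable string in the user's own language.
using MsgId = uint16_t;

// English table; a localized Client would select a different table
const std::map<MsgId, std::string>& messages_en() {
    static const std::map<MsgId, std::string> tbl = {
        {1, "You have been defeated"},
        {2, "Quest complete"},
    };
    return tbl;
}

std::string client_side_text(MsgId id) {
    const auto& tbl = messages_en(); // pick table by Client locale in practice
    auto it = tbl.find(id);
    return it != tbl.end() ? it->second : "<unknown message>";
}
```

Two bytes on the wire instead of a full string, and the Server never needs to know which language the Client is displaying.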

On the other hand, this approach is not possible for ALL the strings (for example, chat DOES need to be transferred as strings over the wire).

Now, let’s note that (once again) there are two slightly different things here: the first one is Encoding (i.e. format of the string on the wire) and the second one is Mapping (i.e. representation of the string within your program).

String Mapping

When it comes to Mapping, as a Big Fat Rule of Thumb, for all the user-readable strings you need to use whatever-representation-of-the-string can handle all the Unicode symbols (at the very least – those in the so-called Basic Multilingual Plane a.k.a. BMP – with code points between 0 and 65535).

Hare pointing out: It doesn’t matter much in practice whether we’re using UTF-8, Microsoft-style UTF-16 (with sizeof(wchar_t)==2), or OS-X-style UCS-4 (a.k.a. UTF-32, with sizeof(wchar_t)==4) to represent our strings within the program. In general, you SHOULD NOT rely on one visible character being exactly one char/wchar_t (this holds only for UCS-4/UTF-32, and even there combining characters can break it), and as long as you do NOT rely on it – you’ll be fine regardless of the encoding being UTF-8, UTF-16, or UTF-32. On the other hand, encodings which are not capable of representing all the Unicode symbols (such as Latin-1) are to be avoided (that is, unless the string is used only for your internal IDs).
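As a quick illustration of why “one char == one character” does not hold for UTF-8, here is a tiny sketch which counts code points rather than bytes (assuming well-formed UTF-8 input):

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a (well-formed) UTF-8 string. Continuation
// bytes all have the bit pattern 10xxxxxx, so we count every byte that is
// NOT a continuation byte. Note that s.size() would return the BYTE count.
inline std::size_t utf8_codepoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) // skip 10xxxxxx continuation bytes
            ++n;
    return n;
}
```

For instance, the two-byte sequence "\xC3\xA9" (the letter é) has `size() == 2` but is a single code point; code which assumes one byte per character would mangle it.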

String Encoding

When it comes to encoding strings, the best byte-oriented representation is usually UTF-8. On the other hand, if our encoding is BitStream-oriented, nothing prevents us from gathering symbol stats for in-string symbols and using Huffman or Huffman-like encoding for string symbols too.

[[To Be Continued…

Tired hare: This concludes beta Chapter 14 from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)”. Stay tuned for beta Chapter 16, describing some peculiarities of C++ coding when it comes to games.]]




Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.



  1. Marek says

    I’d like to refer you to a serialization library by my colleague, available on GitHub:


    This library is meant to solve two problems:

    1. Arbitrary format should be defined entirely by the user.
    * The library does not limit what formats are possible.
    * Every call to a serialization function specifies the format explicitly. There are no default formats for any types. This way the code is easy to verify against the format specification.
    2. Serialization and deserialization should be realized by the same, declarative code.
    * Declarative way of specifying the binary format is another feature that makes it easy to verify the code against the specification.
    * Unifying serialization and deserialization reduces the amount of code and eliminates the need to keep two functions in sync.

    What this library will not provide:

    1. Versioning of the binary format.
    * Since the library does not enforce any specific format, it also won’t add any version tags. This is left for the user to implement in any way he chooses. Helper classes implementing versioning might be added in the future.
    2. Default formats for common types.
    * The library provides formatters for common types, but it will not make them a default. The format will always have to be specified explicitly to make the code verifiable against the format specification.

    For more see the attached link.

    • "No Bugs" Hare says

      Yes, I completely agree with the idea of having several different encodings for the same data (which goes along the lines of separate “encodings” and “mappings”). On the other hand, TBH, I still strongly prefer IDL-based stuff to libraries.

      First, cross-language stuff tends to be MUCH simpler with IDL and IDL compiler.

      Second, I strongly prefer declarative statements to imperative code wherever possible (they tend to be significantly easier to maintain).

      Third, I don’t like to write all the encodings within the code, IMO it is much better to have them completely separated (and potentially handled by somebody who has no idea about C++ at all, but does have a good understanding of data compression and application specifics).

      And last but not least, IDL compilers can do lots of interesting stuff besides serialisation, from reflection to ORM.

      As everything in this field, it is admittedly arguable, but my experience clearly points me in direction of IDL compilers.

  2. says

    I built a bitstream protocol for my game (sadly no IDL.) It is designed to be very bit-efficient, that said, when I send large packets I am able to gain significant savings running ZIP compression on the encoded data.

    Most of my unique strings are short, I scan strings and encode as 7-bit values when possible.

    For chat data, since people use so few unique words (I used to build word prediction systems) I encode the most common 2450 words as 12 bits each. I also encode casing (all upper, all lower, random mixed, only first letter capitalized.) I may have gone overboard.

    • "No Bugs" Hare says

      > when I send large packets I am able to gain significant savings running ZIP compression on the encoded data.

      Wild guess: there are some repetitions in these packets, which are handled by deflate’s LZ77. I would suggest trying LZHL from my : it usually works better than deflate for not-so-large packets.

      > For chat data, since people use so few unique words (I used to build word prediction systems) I encode the most common 2450 words as 12 bits each.

      Interesting; however, if going into this direction, I’d further use Huffman (so more frequent words are encoded with less bits, and less frequent ones – with more bits) :-).

      • says

        The repetition isn’t obvious looking at the packet data, but there is apparently enough that the compression can pick it out. I was surprised it worked as well as it did.

        Huffman coding would be a good next step for chat data – although I keep looking at Jabber.

  3. Jesper Nielsen says

    I agree that a custom format is the best choice. At the moment I’m working with a byte-oriented protocol for simplicity because I want momentum on “the game itself”.
    Changing the protocol later on will of course be work – but it shouldn’t be too error-prone if the protocol logic is separated from the game logic and networking code, because serialization/deserialization is an easy thing to unit-test.

    • "No Bugs" Hare says

      > shouldn’t be too error-prone if the protocol logic is separated from the game logic and networking code

      Exactly! 🙂

    • "No Bugs" Hare says

      Interesting, but IMHO too specific to be easily-usable for games “as is”; however, as a foundation for your own game-related protocol – why not?

  4. Peter says

    For a bit-oriented protocol have a look at (unaligned) PER, an ISO standard where the protocol is described in ASN.1 (also an ISO standard). I have been working with this protocol out of necessity and have come to rather like its nonverbosity on the wire. For ASN.1, there are quirks and stuff that is quite hairy, but if you define your data, you can stay within the sane subset.

    • "No Bugs" Hare says

      Ah, so there is still somebody using ASN.1 beyond DER and X.509… 🙂 I’ve added a section on it (THANKS!), though implementing ASN.1 yourself is Very Tedious (as ASN.1 specifications are rather crazily written) and all the ASN.1 tools I know are Very Ugly (except, maybe, for ).

  5. says

    Would like to point out:
    – Thrift is an evolved version of Protobuf, used by FB and Twitter.
    – Apache Avro which is JSON based and doesn’t use an IDL. Developed out of Hadoop

  6. Tarek Khalil says

    > therefore it can be encoded only with 10 bits (instead of 32 if we use float). This should reduce the number of bits to be transferred, quite a bit.

    Sorry if this sounds silly to ask.

    Just to elaborate for myself: if you would “specify exact ranges” in your IDL, then implementation-wise those 10 bits will be allocated as an int16 (for example), the relevant bits will be filled, and the result sent over the wire, right?

    What I am trying to ask (which might seem like a fundamental CS question), is there a method you would use to avoid allocating two bytes to use the 10 bits?

    • "No Bugs" Hare says

      > is there a method you would use to avoid allocating two bytes to use the 10 bits?

      Yes, if we’re using a bitstream-oriented encoding (as opposed to the more traditional bytestream) – and I wrote some bitstream-oriented stuff myself. As a side benefit, bitstreams also tend to play very well with stuff such as Huffman coding, allowing us to reduce traffic a bit further pretty much for free.

      Let’s see how three angles (10 bits each) will look in a bitstream: 10 bits of angleXY (occupying byte1 and 2 bits of byte2), 10 bits of angleYZ (occupying bits 2–7 of byte2 and 4 bits of byte3), and 10 bits of angleXZ (occupying the last 4 bits of byte3 and 6 bits of byte4). IF this is our whole packet-to-be-sent, at this point (but NOT earlier) we will have to align on a byte boundary (as neither TCP nor UDP can send a non-whole byte), so we will have to round it up to 32 bits. OTOH, IF it is not the end of the bitstream, we will be able to use the remaining 2 bits of byte4 for any further information we have to send.
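      That packing can be sketched with a minimal MSB-first bit-writer (a hypothetical helper for illustration, not output of any actual IDL compiler):

```cpp
#include <cstdint>
#include <vector>

// Minimal MSB-first bit-writer sketch: packs values into a byte vector,
// zero-padding the final partial byte, so three 10-bit angles occupy
// 30 bits and come out as exactly 4 bytes on the wire.
class BitWriter {
    std::vector<uint8_t> buf_;
    int bits_in_last_ = 8; // 8 == no partial byte open yet
public:
    void write(uint32_t value, int nbits) {
        for (int i = nbits - 1; i >= 0; --i) {       // MSB first
            if (bits_in_last_ == 8) {                // open a fresh byte
                buf_.push_back(0);
                bits_in_last_ = 0;
            }
            uint8_t bit = (value >> i) & 1u;
            buf_.back() |= bit << (7 - bits_in_last_);
            ++bits_in_last_;
        }
    }
    std::vector<uint8_t> finish() { return buf_; }   // padding is already zero
};

inline std::vector<uint8_t> pack_angles(uint32_t xy, uint32_t yz, uint32_t xz) {
    BitWriter w;
    w.write(xy, 10);
    w.write(yz, 10);
    w.write(xz, 10);
    return w.finish(); // 30 bits rounded up to 4 bytes
}
```

      If more fields follow the three angles, further write() calls simply continue into the remaining 2 bits of byte4 instead of padding.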

      Sure, bitstream formats are not easy to access, but it is never a problem as marshalling/unmarshalling will be generated by IDL compiler behind the scenes, and will convert from “cpu-oriented representation as 3x int16_t” into “bitstream-oriented representation over the wire” and vice versa.

      Hope it helps 🙂
