IDL: Encodings, Mappings, and Backward Compatibility

posted February 15, 2016 by "No Bugs" Hare, translated by Sergey Ignatchenko

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Example: Encoding

We’ve already discussed IDL and Mapping (and can now use our generated stubs and specify how we want them to look). Now let’s see what Encoding is about. First, let’s see what will happen if we use “naive” encoding for our C++ struct Character, and will transfer it as a C struct (except for inventory field, which we’ll delta-compress to avoid transferring too much of it). In this case, we’ll get about 60bytes/Character/network-tick (with 6 doubles responsible for 48 bytes out of it).

Now let’s consider the following Encoding:

ENCODING(MYENCODING1) PUBLISHABLE_STRUCT Character {
  VLQ character_id;
  DELTA {
    FIXED_POINT(0.01) x;//for rendering purposes, we need our coordinates
                        //only with precision of 1cm
                        //validity range is already defined in IDL
                        //NB: given the range and precision,
                        // 'x' has 20'000'000 possible values,
                        // so it can be encoded with 21 bits
    FIXED_POINT(0.01) y;
    FIXED_POINT(0.01) z;
    FIXED_POINT(0.01) vx;
    FIXED_POINT(0.01) vy;
    FIXED_POINT(0.01) vz;
  }
  DELTA FIXED_POINT(0.01) angle;//given the range specified in IDL,
                                //  FIXED_POINT(0.01) can be encoded
                                //  with 16 bits
  DELTA BIT(2) Animation;//can be omitted, as 2-bit is default
                         //  for 3-value enum in MYENCODING1
  DELTA VLQ animation_frame;
  DELTA SEQUENCE<Item> inventory;
};

VLQ A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer.— Wikipedia —Here we’re heavily relying on the properties of MYENCODING1, which is used to marshal our struct. For the example above, let’s assume that MYENCODING1 is a quite simple bit-oriented encoding which supports delta-compression (using 1 bit from bit stream to specify whether the field has been changed), and also supports VLQ-style encoding; also let’s assume that it is allowed to use rounding for FIXED_POINT fields.

As soon as we take these assumptions, specification of our example Encoding above should become rather obvious; one thing which needs to be clarified in this regard, is that DELTA {} implies that we’re saying that the whole block of data within brackets is likely to change together, so that our encoding will be using only a single bit to indicate that the whole block didn’t change.

Now let’s compare this encoding (which BTW is not the best possible one) to our original naive encoding. Statistically, even if Character is moving, we’re looking at about 20 bytes/Character/network-tick, which is 3x better than naive encoding.

Even more importantly, this change in encoding can be done completely separately from all the application code(!) – merely by changing Encoding declaration

It means that we can develop our code without caring about specific encodings, and then, even at closed beta stages, find out an optimal encoding and get that 3x improvement by changing only Encoding declaration.

Such separation between the code and Encodings is in fact very useful; in particular, it allows to use lots of optimizations which are too cumbersome to think of when you’re developing application-level code.

To continue our example and as a further optimization, we can add dead reckoning, and it can be as simple as rewriting Encoding above into

ENCODING(MYENCODING2) PUBLISHABLE_STRUCT Character {
  VLQ character_id;
  DELTA {
    DEAD_RECKONING(0.02) {//0.02 is maximum acceptable coordinate
                          // deviation due to dead reckoning
      FIXED_POINT(0.01) x;
      FIXED_POINT(0.01) vx;
    }
    DEAD_RECKONING(0.02) {
      FIXED_POINT(0.01) y;
      FIXED_POINT(0.01) vy;
    }
    DEAD_RECKONING { //by default, maximum acceptable deviation
                     //  due to dead reckoning
                     // is the same as for coordinate
                     //  (0.01 in this case)
      FIXED_POINT(0.01) z;
      FIXED_POINT(0.01) vz;
    }
  }//DELTA
  DELTA FIXED_POINT(0.01) angle;
  DELTA BIT(2) Animation;
  DELTA VLQ animation_frame;
  DELTA SEQUENCE<Item> inventory;
};

“How much can be gained by each of such specialized encodings – still depends on the game, but if you can try-and-test a dozen of different encodings within a few hours – it will usually allow you to learn quite a few things about your traffic (and to optimize things both visually and traffic-wise too).When manipulating encodings is this simple, then experimenting with encodings to find out a reasonably optimal one becomes a breeze. How much can be gained by each of such specialized encodings – still depends on the game, but if you can try-and-test a dozen of different encodings within a few hours – it will usually allow you to learn quite a few things about your traffic (and to optimize things both visually and traffic-wise too).

Backward Compatibility

One very important (and almost-universally-ignored) feature of IDLs is backward compatibility. When our game becomes successful, features are added all the time. And adding a feature often implies a protocol change. With Continuous Deployment it happens many times a day.

And one of the requirements in this process is that the new protocol always remains backward-compatible with the old one. While for text-based¹ protocols backward compatibility can usually be achieved relatively easily, for binary protocols (and games almost-universally use binary encodings due to the traffic constraints, see “Publishable State: Delivery, Updates, Interest Management, and Compression” section above for discussion) it is a much more difficult endeavour, and requires certain features from the IDL.

What we need from an IDL compiler is a mode when it tells whether one IDL qualifies as a “backward-compatible version” of another one

Ok, this feature would certainly be nice for code maintenance (and as a part of build process), but are we sure that it is possible to implement such a feature? The answer is “yes, it is possible”, and there are at least two ways how it can be implemented. In any case, let’s observe that two most common changes of the protocols are (a) adding a new field, and (b) extending an existing field.² While other protocol changes (such as removing a field) do happen in practice, it is usually rare enough occurrence, so that we will ignore it for the purposes of our discussion here.

The first way to allow adding fields is to have field names (or other kind of IDs) transferred alongside with the fields themselves. This is the approach taken by Google Protocol Buffers, where everything is always transferred as a key-value pair (with keys depending on field IDs, which can be explicitly written to the Protocol Buffer’s IDL). Therefore, to add a field, you just adding a field with a new field-ID, that’s it. To be able to extend fields (and also to skip those optional-fields-you-dont-know-about), you need to have size for each of the fields, and Google Protocol Buffers have it too (usually implicitly, via field type). This approach works good, but it has its cost: those 8-additional-bits-per-field³ (to contain the field ID+type) are not free.

The second way to allow adding fields into encoded data is a bit more complicated, but allows to deal with not-explicitly-separated (and therefore not incurring those 8-bits-per-field cost) data streams, including bitstreams. To add/extend fields to such non-discriminated streams, we may implement the following approach:

“introduce a concept of “fence” into our Encodingsintroduce a concept of “fence” into our Encodings. There can be “fences” within structs, and/or within RPC calls
- one possible implementation for “fences” is assuming an implicit “fence” after each field; while this approach rules out certain encodings, it does guarantee correctness
- between “fences”, IDL compiler is allowed to reorder/combine fields as it wishes (though any such combining/reordering MUST be strictly deterministic).
- across “fences”, no such reordering/combining is allowed
then, adding a field immediately after the “fence” is guaranteed to be backward-compatible as soon as we define it with a default value
- within a single protocol update, several fields can be added/extended simultaneously only after one “fence”
- to add another field in a separate protocol update, another “fence” will be necessary
extending a field can be implemented as adding a (sub-)field, with a special interpretation of this (sub-)field, as described in the example below

¹ such as XML-based

² that is, until we’re throwing everything away and rewriting the whole thing from scratch

³ Google Protocol Buffers use overhead of 8 bits per field; in theory, you may use something different while using key-value encodings, but the end result won’t be that much different

Let’s see how it may work if we want to extend the following Encoding:

ENCODING(MYENCODINGA) PUBLISHABLE_STRUCT Character {
  UINT16 character_id;
  DELTA {
    FIXED_POINT(0.01) x;
    FIXED_POINT(0.01) y;
    FIXED_POINT(0.01) z;
    FIXED_POINT(0.01) vx;
    FIXED_POINT(0.01) vy;
    FIXED_POINT(0.01) vz;
  }
};
//MYENCODINGA is a stream-based encoding
//  and simply serialized all the fields 
//  in the specified order

Let’s assume that we want to extend our UINT16 character_id field into UINT32, and to add another field UINT32 some_data. Then, after making appropriate changes to the IDL, our extended-but-backward-compatible Encoding may look as follows:

ENCODING(MYENCODINGA) PUBLISHABLE_STRUCT Character {
  UINT16 character_id;
  DELTA {
    FIXED_POINT(0.01) x;
    FIXED_POINT(0.01) y;
    FIXED_POINT(0.01) z;
    FIXED_POINT(0.01) vx;
    FIXED_POINT(0.01) vy;
    FIXED_POINT(0.01) vz;
  }
  //Up to this point, the stream is exactly the same
  //  as for "old" encoding
  
  FENCE

  EXTEND character_id TO UINT32;
    //at this point in the stream, there will be additional 2 bytes placed
    // with high-bytes of character_id
    // if after-FENCE portion is not present – character_id
    // will use only lower-bytes from pre-FENCE portion

  UINT32 some_data DEFAULT=23;
    // if the marshalled data doesn't have after-FENCE portion,
    // application code will get 23
};

As we can see, for the two most common changes of the protocols, making a compatible IDL is simple; moreover, making an IDL compiler to compare these two IDLs to figure out that they’re backward-compatible – is trivial. Formally, IDL B qualifies as a backward-compatible version of IDL A, if all of the following stands:

IDL B starts with full IDL A
after IDL A, in IDL B there is a FENCE declaration
after the FENCE declaration, all the declarations are either EXTEND declarations, or new declarations with specified DEFAULT.

On Google Protocol Buffers

Google Protocol Buffers is one IDL which has recently got a lot of popularity. It is binary, it is extensible, and it is reasonably efficient (or at least not unreasonably inefficient). Overall, it is one of the best choices for a usual business app.

“Therefore, while I agree that Google Protocol Buffers are good enough for most of the business apps out there, I insist that for games you usually need something better. MUCH better.However, when it comes to games, I still strongly prefer my own IDL with my own IDL compiler. The main reason for it is that in Google Protocol Buffers there is only one encoding, and the one which is not exactly optimized for games. Delta compression is not supported, there are no acceptable ranges for values, no rounding, no dead reckoning, and no bit-oriented encodings. Which means that if you use Google Protocol Buffers to marshal your in-app data structures directly, then compared to your own optimized IDL, it will cost you in terms of traffic, and cost a lot.

Alternatively, you may implement yourself most of the compression goodies mentioned above, and then to use Google Protocol Buffers to transfer this compressed data, but it will clutter your application-level code with this compression stuff, and still won’t be able to match traffic-wise some of the encodings possible with your own IDL (in particular, bit-oriented streams and Huffman coding will be still out of question⁴).

Therefore, while I agree that Google Protocol Buffers are good enough for most of the business apps out there, I insist that for games you usually need something better. MUCH better.

⁴ that is, unless you’re using Google Protocol Buffers just to transfer pre-formatted bytes, which usually doesn’t make much sense

[[To Be Continued…

This concludes beta Chapter 3(d) from the upcoming book “Development and Deployment of Massively Multiplayer Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 7, “Engine-Centric Architecture: Unity 5, Unreal Engine 4, and Photon Server from MMO point of view”]]

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Pages: 1 2

ITHare (11)
Reddit (13)

Comments

Wanderer says

February 18, 2016 at 1:42 pm

As always, very nice and informative text! Thanks and keep up the good work! Much appreciated.

I’d like to ask about one thing. In this chapter, you are discussing IDL in a bit “detached” manner from the rest of the code.

As Mark Seemann perfectly pointed out in his blog, “Applications are Not Object-Oriented at the Boundaries”. And transport layer is one of those boundaries (just like persistence layer). That is why all IDL-related code usually operates on what’s called “bags of properties” and that’s perfectly fine.

At the same time, since the rest of the code usually follows OOP principles (unless you are John Carmack in the pre-id4 age), i.e. your actual “game objects” are not a bunch of fields and POD public structs. Instead, these are nice encapsulated objects keeping their invariants and “all that jazz”.

Therefore, my understanding of IDL and mapping was also about this “leap” between actual classes used in game logic and transport objects. Because it’s not really fun to have all this auto-generated code for marshaling if you still have to write all mappings between your “model::Character” and “dto::PublicCharacterData”.

[C++ specific part]

Also, since most of the folks over Internet came to an agreement that having “persistence-ignorant” (and in MMO case, “marshaling-ignorant”) domain model is a “good thing”, the whole thing becomes a bit of a task by itself.

For example, the only viable way I found to do it in C++ is to do something very similar to what CodeSynthesis’ ODB does. I.e. you can’t make it completely “ignorant”, but you can get away with a single injection of friend definition and do the rest with some IDL/auto-generated code (ODB folks just use your source code itself as an IDL to generate all autogen stuff).

[/C++ specific part]

One more addition, out of pure personal taste. I prefer to have de-serialization as a member function, but serialization as a free function without any dependency to the DTO struct – i.e. accepting a bunch of fields and putting them into binary form. IMHO that helps to make this model-to-dto “leap” a bit smaller with the ability to throw fields of your “domain” class directly into serializer without the need to create temporary DTO object. If course, it doesn’t work in all cases (i.e. you do need to keep previous DTO for dead reckoning and other optimizations), but it’s helpful in some cases. Especially in all RPC-like code.

Reply
- "No Bugs" Hare says
  
  February 19, 2016 at 9:43 am
  
  Yes, ability to map IDL to existing classes is one necessary thing. Actually, we’re planning to include this feature into our own IDL compiler :-).
  
  However, I don’t see much problems on this way (that is, until you’re trying to serialize non-owning pointers, and if you can parse target language to get struct definitions). I’ve added a section on it titled “Mapping to Existing Classes”, THANK YOU!
  
  In short – it pretty much follows what you wrote, with free-standing friend functions (I didn’t know of ODB specifics, but we came to the same conclusion anyway). Dealing with custom collections is a bit more tricky, but that I’ve already described with class MyInventory.
  
  Reply
  - Wanderer says
    
    February 19, 2016 at 12:24 pm
    
    On “non-owning pointers”, as far as I see, it’s a common problem for everyone doing any kind of serialization. Cause moving objects to persistent DB can actually be considered as just a specific case of serialization. Whether it’s binary over TCP, JSON in NoSQL, or an Update SQL statement, doesn’t actually matter – it’s a transformation of in-memory “object” entity into some serialized “data” form.
    
    And, from what I see in the papers by Martin Fowler and other founders of DDD approach, concepts like “Aggregates”, “Aggregate Roots”, “Repositories” etc, their main purpose is exactly this – to define the boundary of what is considered “serializable non-dividable item”.
    
    Basically, my current approach is to define some kind of mapping between “references” (i.e. non-owning pointers) and IDs, of course, assuming that I can only hold “references” to an object with identity. And for objects without identity (i.e. http://martinfowler.com/bliki/ValueObject.html), the procedure is to serialize the object as a part of containing one.
    
    In other words, this mimic the approach of C++ itself in value-member/pointer-member mechanics, which is pretty much natural (but, of course, using IDs instead of raw memory pointers 🙂 ).
    
    As of ODB, I actually came to the same conclusion before seeing it there, because there is simply no other way to hide internal details, but deliberately expose them to some specific “marshaling mechanism”. The only other way is to put the implementation of serialization into objects, but this triggers my SRP alarm and all other SOLID/tiers/layers/whatever alarms I have in my head 🙂
    
    Reply
    - "No Bugs" Hare says
      
      February 20, 2016 at 7:58 am
      
      > On “non-owning pointers”, as far as I see, it’s a common problem for everyone doing any kind of serialization.
      
      Of course; I remember discussions on these going on at least 20 years ago. OTOH, I didn’t really see realistic cases where *marshalling* of such things is really necessary (serialization being a different matter though). So for the time being I am usually successful in ignoring this problem altogether :-).
      
      > Basically, my current approach is to define some kind of mapping between “references” (i.e. non-owning pointers) and IDs, of course, assuming that I can only hold “references” to an object with identity. And for objects without identity (i.e. http://martinfowler.com/bliki/ValueObject.html), the procedure is to serialize the object as a part of containing one.
      
      Yes, this is probably the best thing to do. To implement it, I would try to experiment with using some kind of identified_pointer{T} which would hold BOTH object id AND pointer, but pointer essentially being a cache of the id. This way, serialization becomes a breeze, PLUS you’ll be able to move your objects around the heap, removing fragmentation (!), just at the cost of dropping these cached pointers (thread sync will be a headache, but is doable at least as long as you’re fine with controlled “stop the world”). Did you think about doing it this way?
      
      > As of ODB, I actually came to the same conclusion before seeing it there, because there is simply no other way to hide internal details, but deliberately expose them to some specific “marshaling mechanism”.
      
      Exactly 🙂
      
      Reply
  - Wanderer says
    
    February 19, 2016 at 12:46 pm
    
    The only drawback of such “non-owning-pointer to ID” automated transformation is the fact that receiving side will have to perform “ID to non-owning-pointer-on-instance” deserialization. So, if receiving side already has that ID in specific “Identity Map”, we are cool. But when it doesn’t, it basically triggers another RPC.
    
    That works perfectly fine when there are just one or two references. But it might fail miserably when there is an array with N references inside. Think of MyInventory item which is { “item_type”:123, “quantity”:1 } “small object” (ValueObject) with non-owning reference inside. Of course, this case is pretty much trivial, since one can simply store the map of all item_types in-memory on startup, but there are cases when such approach is not possible.
    
    I’m currently mocking up some kind of meta-tag (as a part of IDL) that can designate strategy for the reference – eager/lazy loading. Not sure how far that can get me to, but preliminary results look promising.
    
    Reply
    - "No Bugs" Hare says
      
      February 20, 2016 at 8:10 am
      
      > The only drawback of such “non-owning-pointer to ID” automated transformation is the fact that receiving side will have to perform “ID to non-owning-pointer-on-instance” deserialization. So, if receiving side already has that ID in specific “Identity Map”, we are cool. But when it doesn’t, it basically triggers another RPC.
      
      It is not *that* bad actually. First of all, if you do know what the other side has, you can include the whole thing into original call. And second, even if you don’t know it, you can request all the stuff you’re missing, in one single RPC call, bringing the whole conversation down to not-more-than-two round-trips.
      
      Reply
Wanderer says

July 15, 2016 at 9:49 am

Had to implement a simple IDL on a Windows/MSVC platform just recently, and got a couple of notes that might be helpful for Windows developers:

1. It’s not easy to get original flex/bison source code distribution to work in a Windows environment, even with cygwin/gnuwin.

For those who’d like to save their time, “Win flex-bison” can be used – https://sourceforge.net/projects/winflexbison/ . Also comes with a “custom build step” rules to incorporate lex/yacc files compilation as a Build step. There are some limitations and peculiarities:

1.1. Need to add generated lexer/parser source code manually to the project. It is required only once and, with “custom build rules”, VS will rebuild them from .y and .l on each recompile and pull updated version during compilation;

1.2. I wasn’t able to compile generated scanner/parser without warnings for x64 (initially, it was producing a lot of errors, but those can be reduced to merely “warnings”). For “Win32” target the generated scanner/parser can be compiled without warnings at all (a few manual adjustments are required in m4 prototypes). Also, one will have to define C99 using /D __STDC_VERSION__ = 199901 to avoid int types macro redefinition. Yes, this is a cheat, but it does the trick;

1.3. Versions of “Win flex-bison” differ from actual GNU flex/bison. Latest win-version v2.5.5 is actually flex of v2.5.37 and bison v3.0;

2. When AST is ready, I found it handy to generate source code using CTemplate (former Google Template) – https://github.com/OlafvdSpek/ctemplate

Again, some efforts will be needed to build it within VS. Actually, repository maintainer provides ready-to-use archive for VS2015 users in the comments in Issues section of github. The archive is here – http://xwis.net/ctemplate/ctemplate-2.3.zip

Plain Github distribution of CTemplate uses some generated code during make/install which requires m4 and Python as far as I see (usually not available on a average MSVS workstation);

Just want to drop it here as a comment in case some VS developer would like to setup lex/yacc/CTemplate environment after reading your article 🙂

PS: To sum up, it might actually be easier to setup VM with ubuntu or something and do IDL-related stuff here (since one should already have some *nix VM to do cross-platform checks). Still, it’s perfectly possible to run flex-bison-ctemplate tool-chain within Windows environment.

Reply
- Wanderer says
  
  July 15, 2016 at 10:47 am
  
  Re-reading my own comment, it seems that I didn’t express my tool-chain well enough. The basic idea is to have this pipe:
  
  IDL definitions -> (lex/yacc) -> AST -> (walk the tree) -> in-memory structures with definitions -> (CTemplate) -> generated .h/.cpp
  
  1. Although, since simple IDL is rarely needing actual “walk through AST”, one can create in-memory definitions right from the yacc(bison) by simply defining necessary yacc actions every time a new struct or field is parsed.
  
  2. This flow can generate single .h and single .cpp, running ctemplate twice with the same dictionaries. And I was able to get away with a single huge piece of source code with all stubs in my case. However, I think if one would like to have “cleaner” code and break down generated source into separate files, the following trick can be applied (haven’t tried it myself, but it should work).
  
  Add “file markers” in the single ctemplate output stream, like “#newfile filename.h” (or filename.cpp). When ctemplate output is ready, run one more “parsing” step, find all #newfile “pragmas” and drop all code after #newfile into specified destination. In that case, however, one would really need to make sure that each separate file contains proper includes to other parts of generated code. It shouldn’t be hard, because all information about dependencies is already within in-memory definition structures.
  
  To sum up, my “ideal” processing pipe would look like that:
  
  * prerequisites – idl.l (lexer), idl.y (syntax), enum_h.tpl, struct_h.tpl, struct_cpp.tpl, struct_field.tpl … etc (ctemplates with stubs for each piece of generated code)
  
  IDL definitions -> (lex/yacc) -> metadata describing structures -> (CTemplate) -> amalgam of generated code (simple line-by-line parsing) -> bunch of .h and .cpp with CMakeLists.txt (or makefile, or even something like vcproj)
  
  Reply
- "No Bugs" Hare says
  
  July 15, 2016 at 11:40 am
  
  Strange, as I didn’t really had much problems with setting it up. IIRC, last time I was using GNUWin32’s http://gnuwin32.sourceforge.net/packages/bison.htm (and also flex from them) – and the only problem I was facing, was an extra *nix header in generated code (which can be easily commented out via GnuWin32 sed or something). That was pretty much it; then I’ve made good ol’ batch file to compile the whole thing from .y / .l into .c (avoiding any issues “how to integrate it into VS”). After running .bat and having those .c files – I simply compiled them in VS (no integration needed – they’re just .c files after all). That was pretty much it – and I don’t have any info that this (IMO very straightforward) way doesn’t work now…
  
  Reply
Matthew Horsley says

January 18, 2017 at 3:13 pm

Google’s flatc compiler has a -conform flag to check if a new schema is a valid evolution of an old one.

–conform FILE : Specify a schema the following schemas should be an evolution of. Gives errors if not. Useful to check if schema modifications don’t break schema evolution rules.

Reply
- "No Bugs" Hare says
  
  January 19, 2017 at 7:42 am
  
  THANKS, added a sentence about it.
  
  Reply