IDL: Encodings, Mappings, and Backward Compatibility

 
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek
 
 


[Chapter 3(d) from “beta” Volume I]

While discussing the high-level protocols we need, I've mentioned the Interface Definition Language (IDL) quite a few times. Now it is time to take a closer look at it.

The motivation for having an IDL is simple. While manual marshalling is possible, it is damn error-prone (you need to keep it in sync in at least two different places: marshalling and unmarshalling), not to mention too inconvenient and too limiting for further optimizations. In fact, the benefits of IDLs for communication were realized at least 30 years ago, which led to the development of ASN.1 in 1984 (and, in 1993, of DCE RPC).


These days, game engines quite often have a (kinda) IDL as a part of the language/engine itself; examples include [RPC]/[Command]/[SyncVar] tags in Unity 5, or UFUNCTION(Server)/UFUNCTION(Client) declarations in Unreal Engine 4. However, for most game and game-like communications I still prefer to have my own IDL. The reason for it is two-fold: first, a standalone IDL is inherently better suited for cross-language use, and second, none of the in-language IDLs I know of is flexible enough to provide reasonably efficient compression for games; in particular, the per-field Encodings specifications described below are not possible with them.1


1 and even if Encodings (along the lines described below) were implemented as a part of your programming language, they would make the code way too cumbersome to read and maintain

 

IDL Development Flow

With a standalone IDL (i.e. an IDL which is not a part of your programming language), the development flow (almost?) universally goes as follows:

  • you write your interface specification in your IDL

    • IDL does NOT contain any implementation, just function/structure declarations

  • you compile this IDL (using an IDL compiler) into stub functions/structures in your programming language (or languages)

  • for the callee, you implement the callee-side stub functions in your programming language

  • for the caller, you call the caller-side stub functions (again, in your programming language). Note that the programming language of the caller may differ from that of the callee
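To make the flow above a bit more tangible, here is a minimal C++ sketch of what IDL-generated stubs for a single RPC might boil down to. The RPC add_score, the helpers put_le/get_le/dispatch_add_score, and the little-endian wire format are all hypothetical, invented for this illustration; "sending" is reduced to returning a byte buffer, so that both sides can run in one process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical IDL declaration (syntax invented for this sketch):
//   RPC add_score(UINT16 character_id, INT32 delta);

// ===== Generated by the (hypothetical) IDL compiler: never edit by hand =====
using Message = std::vector<std::uint8_t>;

// Appends the lowest `nbytes` bytes of `v`, little-endian.
inline void put_le(Message& m, std::uint32_t v, int nbytes) {
  assert(nbytes >= 0 && nbytes <= 4);
  for (int i = 0; i < nbytes; ++i) m.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}
// Reads `nbytes` little-endian bytes starting at `pos`, advancing `pos`.
inline std::uint32_t get_le(const Message& m, std::size_t& pos, int nbytes) {
  assert(nbytes >= 0 && nbytes <= 4);
  std::uint32_t v = 0;
  for (int i = 0; i < nbytes; ++i) v |= static_cast<std::uint32_t>(m.at(pos++)) << (8 * i);
  return v;
}

// Callee-side stub: declared by the generated code, implemented by YOU.
void add_score_impl(std::uint16_t character_id, std::int32_t delta);

// Caller-side stub: marshals the arguments (actual sending is out of scope).
inline Message add_score(std::uint16_t character_id, std::int32_t delta) {
  Message m;
  put_le(m, character_id, 2);
  put_le(m, static_cast<std::uint32_t>(delta), 4);
  return m;
}

// Callee-side dispatcher: unmarshals the message and calls your implementation.
inline void dispatch_add_score(const Message& m) {
  std::size_t pos = 0;
  auto character_id = static_cast<std::uint16_t>(get_le(m, pos, 2));
  auto delta = static_cast<std::int32_t>(get_le(m, pos, 4));
  add_score_impl(character_id, delta);
}
// ===== End of generated code =====

// Hand-written callee-side implementation (here it merely records the call).
std::uint16_t last_character_id = 0;
std::int32_t last_delta = 0;
void add_score_impl(std::uint16_t character_id, std::int32_t delta) {
  last_character_id = character_id;
  last_delta = delta;
}
```

Calling dispatch_add_score(add_score(42, -7)) ends up in add_score_impl(42, -7). Only the code after the "End of generated code" marker is hand-written; everything above it would be regenerated on every IDL re-compile.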

One important rule to remember when using IDLs is that

Never Ever make manual modifications to the code generated by the IDL compiler.

Modifying generated code effectively prevents you from modifying the IDL itself (as the next re-compile of the IDL will overwrite your manual changes, ouch), and usually qualifies as a Really Bad Idea. If you feel the need to modify your generated code, it means one of two things. Either your IDL declarations are not what you want them to be (then you should modify your IDL and re-compile it), or your IDL compiler doesn't do what you want (then you need to modify your IDL compiler).

Developing your own IDL compiler

Usually I prefer to develop my own IDL compiler. In my experience, the costs of such development (which are on the order of several man-weeks, provided that you're not trying to be overly generic) are more than covered by the additional flexibility (and the ability to change things when you need to) it brings to the project.

With your own IDL compiler:

  • whenever you feel the need to change marshalling to a more efficient one (without any changes to the caller/callee code) – no problem, you can do it
  • whenever you need to introduce an IDL attribute to say that this specific parameter (or struct member) should be compressed in a different manner2 (again, without any changes to the code) – no problem, you can add it
  • whenever you want to add support for another programming language – no problem, you can do it
  • you can easily provide a way to specify how interfaces are extended (so that extended interfaces stay 100% backward-compatible with existing callers/callees), and have your IDL compiler check whether two versions of your IDL guarantee that the extended interface is 100% backward-compatible. While techniques to keep backward compatibility are known for some of the IDLs out there (in particular, for ASN.1 and for Google Protocol Buffers), the feature of comparing two versions of an IDL for compatibility is missing from almost all the IDL compilers I know; one exception is Google's FlatBuffers compiler flatc, which seems to provide this functionality via the recently added “--conform” flag. [[IF YOU KNOW ANOTHER IDL COMPILER WHICH HAS AN OPTION TO COMPARE TWO VERSIONS OF IDL FOR BACKWARD COMPATIBILITY, PLEASE LET ME KNOW]]
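As for comparing two versions of an IDL, the check itself can be surprisingly simple once your IDL compiler has parsed both versions. The sketch below assumes one deliberately conservative rule set (old fields keep their position, name, and type; new fields may only be appended, and only with a default value), and it also assumes that your Encoding lets old receivers skip trailing fields they don't know about. The names FieldDecl, StructDecl, and check_backward_compatible are hypothetical; which rules are actually safe depends on your Encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one IDL struct, as the IDL compiler might hold it
// after parsing each version of the IDL.
struct FieldDecl {
  std::string name;
  std::string type;  // e.g. "UINT16", "NUMERIC[-10000,10000]"
  bool has_default;  // value-initialized to false if omitted in braces
};
using StructDecl = std::vector<FieldDecl>;

// Assumed, deliberately conservative rules:
//  1. every old field stays at the same position, with the same name and type;
//  2. new fields may only be appended, and each must have a default
//     (so that a new receiver can handle a message from an old sender).
// Returns "" if new_v is backward-compatible with old_v, otherwise the reason.
std::string check_backward_compatible(const StructDecl& old_v, const StructDecl& new_v) {
  if (new_v.size() < old_v.size()) return "field removed";
  for (std::size_t i = 0; i < old_v.size(); ++i) {
    if (old_v[i].name != new_v[i].name) return "field renamed or moved: " + old_v[i].name;
    if (old_v[i].type != new_v[i].type) return "type changed: " + old_v[i].name;
  }
  for (std::size_t i = old_v.size(); i < new_v.size(); ++i)
    if (!new_v[i].has_default) return "new field without default: " + new_v[i].name;
  return "";
}
```

An IDL compiler would run such a check for every struct (and every RPC parameter list) present in the old version, and refuse to proceed on the first non-empty result.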

Now to the question “how to write your own IDL compiler”. Very briefly, the most obvious and straightforward way is the following:

  • write down the declarations you need (for example, in BNF). To start with your IDL, you usually need only two things:
    • declaring structures
    • declaring RPCs
    • in the future, you will probably want more than that (collections being the most obvious example); on the other hand, you’ll easily see it when it comes 🙂
  • then, you can re-write your BNF into YACC syntax
  • then, you should be able to write the code which builds an Abstract Syntax Tree (AST; a tree representation of the abstract syntactic structure of the source code) within YACC/Lex (see the discussion on YACC/Lex in Chapter VI).
  • As soon as you have your AST, you can easily generate whatever-stubs-you-want.
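To illustrate the last two steps, here is a C++ sketch of a tiny AST for structure declarations, plus one code generator walking it. To keep the sketch self-contained, the AST is built by hand rather than by YACC/Lex actions; the node types AstField/AstStruct and the IDL-to-C++ type table are assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical AST nodes for the "declaring structures" part of an IDL.
// In a real IDL compiler, these nodes would be created by YACC/Lex actions.
struct AstField {
  std::string idl_type;  // e.g. "UINT16", "NUMERIC"
  std::string name;
};
struct AstStruct {
  std::string name;
  std::vector<AstField> fields;
};

// One back-end: emit a C++ struct declaration. Other back-ends (other
// languages, marshalling code) would walk the very same AST.
std::string generate_cpp(const AstStruct& s) {
  assert(!s.name.empty());
  // Assumed default mapping from IDL types to C++ types;
  // .at() throws std::out_of_range for an IDL type we don't know about.
  static const std::map<std::string, std::string> cpp_type = {
      {"UINT16", "uint16_t"}, {"NUMERIC", "double"}};
  std::ostringstream out;
  out << "struct " << s.name << " {\n";
  for (const AstField& f : s.fields)
    out << "  " << cpp_type.at(f.idl_type) << " " << f.name << ";\n";
  out << "};\n";
  return out.str();
}
```

For AstStruct{"Character", {{"UINT16", "character_id"}, {"NUMERIC", "x"}}}, generate_cpp() returns a C++ struct Character with a uint16_t character_id and a double x; adding, say, a C# back-end means adding another function that walks the same AstStruct.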

2 see section “Publishable State: Delivery, Updates, Interest Management, and Compression” above for discussion of different compression types

 

IDL + Encoding + Mapping

Now, let's take a look at the features which we want our IDL to have. First of all, we want our IDL to specify the protocol that goes over the network. Second, we want our IDL compiler to generate code in our programming language, so that we can use the generated functions and structures in our code, with marshalling for them already generated.

When looking at existing IDLs, we'll see that there is usually one single IDL which defines both these things. However, for a complicated distributed system such as an MMO, I suggest splitting it into three separate files to achieve a clean separation of concerns, which simplifies things in the long run.

The first file is the IDL itself. This is the only file which is strictly required. The other two files (Encoding and Mapping) should be optional on a per-struct-or-function basis, with the IDL compiler using reasonable defaults if they're not specified. The idea here is to write only the IDL to start working, but to have the ability to specify better-than-default encodings and mappings when/if they become necessary. We'll see an example of it a bit later.
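As an illustration of such “reasonable defaults”, consider how an IDL compiler might pick the C++ type for an integer field INT[a,b) when the Mapping file says nothing about it, and use the Mapping entry when there is one. The rule “smallest integer type which holds the whole range”, the IdlIntRange struct, and keying Mapping entries as "Struct.field" are all assumptions for this sketch (the rule happens to agree with the UINT8 default for INT[0,120) used in the Mapping example below).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical representation of an IDL integer range INT[lo,hi) (hi exclusive).
struct IdlIntRange {
  std::int64_t lo;
  std::int64_t hi;
};

// Assumed default: the smallest C++ integer type holding every value of the range.
std::string default_int_mapping(IdlIntRange r) {
  assert(r.lo < r.hi);
  const std::int64_t max = r.hi - 1;
  if (r.lo >= 0) {
    if (max <= UINT8_MAX) return "uint8_t";
    if (max <= UINT16_MAX) return "uint16_t";
    if (max <= UINT32_MAX) return "uint32_t";
    return "uint64_t";
  }
  if (r.lo >= INT8_MIN && max <= INT8_MAX) return "int8_t";
  if (r.lo >= INT16_MIN && max <= INT16_MAX) return "int16_t";
  if (r.lo >= INT32_MIN && max <= INT32_MAX) return "int32_t";
  return "int64_t";
}

// Entries from the Mapping file, keyed by "Struct.field" (an assumed convention).
using MappingOverrides = std::map<std::string, std::string>;

// An explicit Mapping entry wins; otherwise the default is used.
std::string resolve_int_mapping(const MappingOverrides& overrides,
                                const std::string& field, IdlIntRange r) {
  auto it = overrides.find(field);
  return it != overrides.end() ? it->second : default_int_mapping(r);
}
```

With an empty Mapping, animation_frame (INT[0,120)) becomes uint8_t; a single Mapping line such as {"Character.animation_frame", "int"} changes the generated type without touching the IDL (and, as discussed below, without touching the protocol).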

The second file (“Encoding”) is a set of additional declarations for the IDL, which allows us to define Encodings (and IDL+Encodings effectively define the over-the-wire protocol). In some sense, the IDL itself is similar to the ASN.1 language as such, and IDL Encodings are similar to ASN.1 “Encoding Rules”. The IDL defines what we're going to communicate, and Encodings define how we're going to communicate this data. On the other hand, unlike ASN.1 “Encoding Rules”, our Encodings are more flexible and allow specifying a per-field encoding if necessary.
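To show what “per-field” means in practice, here is a hypothetical Encoding for a single field: the IDL only says that x is NUMERIC[-10000,10000], while the Encoding adds that it is sent as a fixed-point integer with 1cm precision. The FixedPointEncoding struct and the 1cm step are assumptions; with them, x can take 2,000,001 distinct values, which need 21 bits (2^21 = 2,097,152), instead of 64 bits for a raw double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-field Encoding: NUMERIC[lo,hi] sent as a fixed-point
// integer q = round((v - lo) / step), i.e. with precision `step`.
struct FixedPointEncoding {
  double lo, hi, step;

  // Largest encoded value (the smallest one is 0).
  std::uint32_t max_q() const {
    return static_cast<std::uint32_t>(std::llround((hi - lo) / step));
  }
  // Bits needed to represent every value in 0..max_q().
  int bits() const {
    int b = 0;
    while (b < 32 && (std::uint64_t{1} << b) <= max_q()) ++b;
    return b;
  }
  std::uint32_t encode(double v) const {
    assert(v >= lo && v <= hi);  // guaranteed by the IDL range
    return static_cast<std::uint32_t>(std::llround((v - lo) / step));
  }
  double decode(std::uint32_t q) const { return lo + q * step; }
};
```

For FixedPointEncoding{-10000.0, 10000.0, 0.01}, bits() is 21, and decode(encode(v)) stays within half a centimeter of v. Another field of the same struct (say, angle) could get a completely different Encoding.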

Among other things, keeping Encoding separate from IDL allows us to have different encodings for the same IDL. This may be handy when, for example, the same structure is sent both to the client and between the servers: optimal encodings may differ for Server-to-Client and Server-to-Server communications, as the former is usually all about bandwidth, while for the latter CPU costs may play a more significant role (intra-datacenter bandwidth usually comes for free until you overload the Ethernet port, which is not that easy these days).
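A minimal sketch of two Encodings for the same IDL field (again, NUMERIC[-10000,10000] x) might look as follows: Server-to-Client sends 1cm fixed point in 3 bytes, while Server-to-Server simply copies the 8-byte double. The Encoding enum and the encode_x/decode_x names are hypothetical, and the raw copy assumes that all servers share the same double representation and byte order.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Two hypothetical Encodings for the same IDL field NUMERIC[-10000,10000] x.
enum class Encoding { ServerToClient, ServerToServer };
using Bytes = std::vector<std::uint8_t>;

void encode_x(Encoding e, double x, Bytes& out) {
  assert(x >= -10000.0 && x <= 10000.0);  // range comes from the IDL
  if (e == Encoding::ServerToClient) {
    // Bandwidth matters most: 1cm fixed point; 0..2,000,000 fits into 3 bytes.
    auto q = static_cast<std::uint32_t>(std::llround((x + 10000.0) * 100.0));
    for (int i = 0; i < 3; ++i) out.push_back(static_cast<std::uint8_t>(q >> (8 * i)));
  } else {
    // CPU matters more than bytes within the datacenter: copy the raw double
    // (assumes all servers share the same double representation/byte order).
    std::uint8_t raw[sizeof x];
    std::memcpy(raw, &x, sizeof x);
    out.insert(out.end(), raw, raw + sizeof x);
  }
}

double decode_x(Encoding e, const Bytes& in) {
  if (e == Encoding::ServerToClient) {
    std::uint32_t q = 0;
    for (int i = 0; i < 3; ++i) q |= static_cast<std::uint32_t>(in.at(i)) << (8 * i);
    return q / 100.0 - 10000.0;
  }
  assert(in.size() >= sizeof(double));
  double x;
  std::memcpy(&x, in.data(), sizeof x);
  return x;
}
```

The caller/callee code is the same in both cases; only the Encoding selected for a given channel differs.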

The third file (“Mapping”) is another set of additional declarations, which define what kind of code we want to generate for our programming language. The thing here is that the same IDL data can be “mapped” into different data types; moreover, there is no single “best mapping”, so it all depends on your needs at the point where you're going to use it (we'll see examples of it below). Changing the “Mapping” does NOT change the protocol, so it can be safely changed without affecting anybody else.

In the extreme case, the “Mapping” file can be a file in your target programming language.
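To see why changing the Mapping cannot change the protocol, consider generated serialization for SEQUENCE<Item> written against whatever container the Mapping picks. In this sketch, Item is reduced to a single hypothetical item_id field, and the 2-byte length prefix is an assumed Encoding; std::vector<Item> and std::list<Item> then produce byte-for-byte identical output.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

struct Item {  // stand-in for the Item PUBLISHABLE_STRUCT (one hypothetical field)
  std::uint16_t item_id;
};
using Bytes = std::vector<std::uint8_t>;

// Generated-style serializer for SEQUENCE<Item>: templated on the container
// chosen by the Mapping; the bytes depend only on the IDL and the Encoding.
template <typename Container>
void serialize_item_sequence(const Container& items, Bytes& out) {
  assert(items.size() <= UINT16_MAX);  // assumed 2-byte length prefix
  auto put16 = [&out](std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
  };
  put16(static_cast<std::uint16_t>(items.size()));
  for (const Item& item : items) put16(item.item_id);
}
```

Serializing {1, 2, 300} from a std::vector and from a std::list yields the same 8 bytes, so one side may switch containers without the other side ever noticing.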

Example: IDL

While all that theoretical discussion about IDL, Encodings, and Mappings is interesting, let’s bring it a bit down to earth.

Let's consider a rather simple IDL example. Note that this is just an example structure in a purely illustrative IDL; the syntax of your IDL may vary very significantly (and in fact, as argued in the “Developing your own IDL compiler” section above, you generally SHOULD develop your own IDL compiler, at least until somebody makes an effort and does a good job in this regard for you):

PUBLISHABLE_STRUCT Character {
  UINT16 character_id;
  NUMERIC[-10000,10000] x;//for our example IDL compiler, notation [a,b] means
                          //  “from a to b inclusive”
                          //our Game World has size of 20000x20000m
  NUMERIC[-10000,10000] y;
  NUMERIC[-100.,100.] z;//Z coordinate is just ±100m
  NUMERIC[-10.,10.] vx;
  NUMERIC[-10.,10.] vy;
  NUMERIC[-10.,10.] vz;
  NUMERIC[0,360) angle;//where our Character is facing
                       //notation [a,b) means “from a inclusive to b exclusive”
  enum Animation {Standing=0,Walking=1, Running=2} anim;
  INT[0,120) animation_frame;//120 is 2 seconds of animation at 60fps

  SEQUENCE<Item> inventory;//Item is another PUBLISHABLE_STRUCT
                           // defined elsewhere
};

This IDL declares what we're going to communicate: a structure with the current state of our Character.3


3 yes, I remember that I've advised separating inventory from frequently-updated data in the “Publishable State” section, but for the purposes of this example, let's keep them together
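Note how much the ranges in this IDL tell the IDL compiler: anim has only 3 possible values, so 2 bits suffice, and animation_frame lies within INT[0,120), so 7 bits suffice (2^7 = 128). The sketch below packs character_id, anim, and animation_frame into 16+2+7 = 25 bits (4 bytes) with a minimal bit writer/reader; the BitWriter/BitReader classes are hypothetical, and actual Encodings (including compression) are the subject of page 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal LSB-first bit packer (hypothetical; not a full Encoding).
class BitWriter {
 public:
  void write(std::uint32_t value, int nbits) {
    assert(nbits >= 0 && nbits <= 32);
    assert(nbits == 32 || value < (std::uint64_t{1} << nbits));  // value must fit
    for (int i = 0; i < nbits; ++i, ++bit_pos_) {
      if (bit_pos_ % 8 == 0) bytes_.push_back(0);  // start a new byte
      if ((value >> i) & 1u) bytes_.back() |= static_cast<std::uint8_t>(1u << (bit_pos_ % 8));
    }
  }
  const std::vector<std::uint8_t>& bytes() const { return bytes_; }
  std::size_t bit_count() const { return bit_pos_; }

 private:
  std::vector<std::uint8_t> bytes_;
  std::size_t bit_pos_ = 0;
};

// Reads bits back in the same LSB-first order.
class BitReader {
 public:
  explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}
  std::uint32_t read(int nbits) {
    assert(nbits >= 0 && nbits <= 32);
    std::uint32_t value = 0;
    for (int i = 0; i < nbits; ++i, ++bit_pos_) {
      std::uint8_t byte = bytes_.at(bit_pos_ / 8);
      if ((byte >> (bit_pos_ % 8)) & 1u) value |= (1u << i);
    }
    return value;
  }

 private:
  const std::vector<std::uint8_t>& bytes_;
  std::size_t bit_pos_ = 0;
};
```

An IDL compiler would derive the bit widths (16, 2, 7) from the IDL declarations, so nobody has to keep them in sync by hand.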

 

Example: Mapping

Now let's see how we want to map our IDL to our programming language. Note that mappings of the same IDL MAY differ for different communication parties (such as Client and Server). For example, the mapping for our data above MAY look as follows for the Client:

MAPPING("CPP","Client") PUBLISHABLE_STRUCT Character {
  UINT16 character_id;//can be omitted, as default mapping
                      //  for UINT16 is UINT16
  double x;//all 'double' declarations can be omitted too
  double y;
  double z;
  double vx;
  double vy;
  double vz;
  float angle;//this is the only Mapping specification in this fragment
              //  which makes any difference compared to defaults;
              //  if we want angle to be double, we can omit it too
  enum Animation {Standing=0,Walking=1, Running=2} anim;
              //can be omitted too
  UINT8 animation_frame;//can be omitted, as
                        //  UINT8 is a default mapping for INT[0,120)

  vector<Item> inventory;//can also be omitted,
                         //  as default mapping for SEQUENCE<Item>
                         //  is vector<Item>
};

In this case, the IDL-generated C++ struct may look as follows:

struct Character {
  UINT16 character_id;
  double x;
  double y;
  double z;
  double vx;
  double vy;
  double vz;
  float angle;
  enum Animation {Standing=0,Walking=1, Running=2} anim;
  UINT8 animation_frame;
  vector<Item> inventory;

  void idl_serialize(int serialization_type,OurOutStream& os);
    //implementation is generated separately
  void idl_deserialize(int serialization_type,OurInStream& is);
    //implementation is generated separately
};
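The generated idl_serialize()/idl_deserialize() implementations are not shown in the text; here is a sketch of what they might look like for a trimmed-down Character (three fields only). OurOutStream/OurInStream are reduced to trivial byte streams, IDL_SERIALIZATION_DEFAULT is a hypothetical serialization_type value, and idl_serialize() is made const; note how the IDL ranges let the generated code reject malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Trivial stand-ins for the (unspecified) OurOutStream/OurInStream.
struct OurOutStream {
  std::vector<std::uint8_t> data;
  void put8(std::uint8_t v) { data.push_back(v); }
  void put16(std::uint16_t v) {
    put8(static_cast<std::uint8_t>(v & 0xFF));
    put8(static_cast<std::uint8_t>(v >> 8));
  }
};
struct OurInStream {
  explicit OurInStream(const std::vector<std::uint8_t>& d) : data(d) {}
  const std::vector<std::uint8_t>& data;
  std::size_t pos = 0;
  std::uint8_t get8() { return data.at(pos++); }
  std::uint16_t get16() {
    std::uint16_t lo = get8();
    std::uint16_t hi = get8();
    return static_cast<std::uint16_t>(lo | (hi << 8));
  }
};

constexpr int IDL_SERIALIZATION_DEFAULT = 0;  // hypothetical serialization_type value

struct Character {  // trimmed down to three fields
  enum Animation { Standing = 0, Walking = 1, Running = 2 };
  std::uint16_t character_id = 0;
  Animation anim = Standing;
  std::uint8_t animation_frame = 0;

  void idl_serialize(int serialization_type, OurOutStream& os) const {
    if (serialization_type != IDL_SERIALIZATION_DEFAULT)
      throw std::invalid_argument("unsupported serialization_type");
    os.put16(character_id);
    os.put8(static_cast<std::uint8_t>(anim));
    os.put8(animation_frame);
  }
  void idl_deserialize(int serialization_type, OurInStream& is) {
    if (serialization_type != IDL_SERIALIZATION_DEFAULT)
      throw std::invalid_argument("unsupported serialization_type");
    std::uint16_t id = is.get16();
    std::uint8_t a = is.get8();
    std::uint8_t frame = is.get8();
    // Validation derived from the IDL: 3 enum values, and INT[0,120).
    if (a > Running) throw std::runtime_error("anim out of range");
    if (frame >= 120) throw std::runtime_error("animation_frame out of range");
    character_id = id;  // assign only after everything has been validated
    anim = static_cast<Animation>(a);
    animation_frame = frame;
  }
};
```

Because the range checks are generated from the IDL, every Mapping and every programming language gets the same validation for free.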

On the other hand, for our Server we might want inventory to be implemented as a special class MyInventory, optimized for fast handling of specific server-side use cases. In this case, we MAY want to define our Server Mapping as follows:

MAPPING("CPP","Server") PUBLISHABLE_STRUCT Character {
  // here we're omitting all the default mappings
  float angle;
  class MyInventory inventory;
    //class MyInventory will be used as a type for generated
    //  Character.inventory
    //To enable serialization/deserialization,
    //  MyInventory MUST implement the following member functions:
    // size_t idl_serialize_collection_get_size(),
    // const Item& idl_serialize_collection_get_item(size_t idx),
    // void idl_deserialize_collection_reserve_size(size_t),
    // void idl_deserialize_collection_add_item(const Item&)
};
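Here is one way the server side might satisfy this Mapping. MyInventory's internals (a std::vector plus a std::map index by item_id), the two-field Item, and the idl_serialize_inventory/idl_deserialize_inventory names are invented for illustration; to keep the sketch short, the “wire” is a vector of 16-bit words. The point is that the generated code only ever touches MyInventory through the four required member functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Item {  // stand-in for the Item PUBLISHABLE_STRUCT
  std::uint16_t item_id;
  std::uint16_t count;
};

// Hypothetical server-side container with a fast lookup by item_id.
class MyInventory {
 public:
  // --- the four member functions required by the Server Mapping ---
  std::size_t idl_serialize_collection_get_size() { return items_.size(); }
  const Item& idl_serialize_collection_get_item(std::size_t idx) { return items_.at(idx); }
  void idl_deserialize_collection_reserve_size(std::size_t n) { items_.reserve(n); }
  void idl_deserialize_collection_add_item(const Item& it) {
    index_[it.item_id] = items_.size();
    items_.push_back(it);
  }
  // --- server-specific functionality the IDL knows nothing about ---
  const Item* find(std::uint16_t item_id) const {
    auto p = index_.find(item_id);
    return p == index_.end() ? nullptr : &items_[p->second];
  }

 private:
  std::vector<Item> items_;
  std::map<std::uint16_t, std::size_t> index_;
};

// What generated code for `inventory` might look like under this Mapping.
template <typename Collection>
void idl_serialize_inventory(Collection& c, std::vector<std::uint16_t>& out) {
  std::size_t n = c.idl_serialize_collection_get_size();
  assert(n <= UINT16_MAX);  // assumed 16-bit length prefix
  out.push_back(static_cast<std::uint16_t>(n));
  for (std::size_t i = 0; i < n; ++i) {
    const Item& it = c.idl_serialize_collection_get_item(i);
    out.push_back(it.item_id);
    out.push_back(it.count);
  }
}
template <typename Collection>
void idl_deserialize_inventory(Collection& c, const std::vector<std::uint16_t>& in) {
  std::size_t pos = 0;
  std::size_t n = in.at(pos++);
  c.idl_deserialize_collection_reserve_size(n);
  for (std::size_t i = 0; i < n; ++i) {
    Item it{in.at(pos), in.at(pos + 1)};
    pos += 2;
    c.idl_deserialize_collection_add_item(it);
  }
}
```

The Client can keep its default vector<Item> mapping; as long as both sides produce the same sequence of Items, the bytes on the wire are the same.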

As we can see, even when we're using the same programming language on both the Client-Side and the Server-Side, we MAY need different Mappings for different sides; with different programming languages, such situations become even more frequent. One classical (though rarely occurring in practice) example is that SEQUENCE<Item> can be mapped either to vector<Item> or to list<Item>, depending on the specifics of your code; as these specifics can differ between the sides of the communication, you may need to specify the Mapping.

Also, as we can see, there is another case for non-default Mappings: making IDL-generated code use custom classes (in our example, MyInventory) for the generated structs (which generally helps to make our generated struct Character more easily usable).

Mapping to Existing Classes

One thing which is commonly missing from existing IDL compilers is the ability to “map” an IDL onto existing classes. This can be handled in the following way:

  • you have your IDL and your IDL compiler
  • you make your IDL compiler parse your class definition in your target language (this is going to be the most difficult part)
  • you specify a correspondence between IDL fields and class fields
  • your IDL compiler generates serialization/deserialization functions for your class
    • generally, such functions won't be class members, but rather free-standing serialization functions (within their own class if necessary), taking the class as a parameter
    • in languages such as C++, you'll need to declare these serialization/deserialization functions as friends of the class (or provide an equivalent macro)
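In C++, the result might look like the sketch below: an existing class Player gains exactly one line (a friend declaration), and a separate serializer class, as an IDL compiler might generate it from a field correspondence, reads and writes its private members. The names Player and PlayerIdlSerializer, the correspondence itself, and the byte layout are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

class PlayerIdlSerializer;  // generated; declared here so Player can befriend it

// An existing, hand-written class that we don't want to restructure.
class Player {
 public:
  Player() = default;
  Player(std::uint16_t id, std::uint8_t level) : id_(id), level_(level) {}
  std::uint16_t id() const { return id_; }
  std::uint8_t level() const { return level_; }

 private:
  friend class PlayerIdlSerializer;  // the only change to the existing class
  std::uint16_t id_ = 0;
  std::uint8_t level_ = 0;
  std::string cached_display_name_;  // not in the IDL, so never serialized
};

// What the IDL compiler might generate from a (hypothetical) correspondence:
//   IDL field "player_id" <-> Player::id_,  IDL field "level" <-> Player::level_
class PlayerIdlSerializer {
 public:
  static void serialize(const Player& p, std::vector<std::uint8_t>& out) {
    out.push_back(static_cast<std::uint8_t>(p.id_ & 0xFF));
    out.push_back(static_cast<std::uint8_t>(p.id_ >> 8));
    out.push_back(p.level_);
  }
  static void deserialize(Player& p, const std::vector<std::uint8_t>& in) {
    assert(in.size() >= 3);
    p.id_ = static_cast<std::uint16_t>(in.at(0) | (in.at(1) << 8));
    p.level_ = in.at(2);
  }
};
```

Serializing Player(513, 7) and deserializing the resulting 3 bytes into a default-constructed Player restores id() == 513 and level() == 7, while cached_display_name_ stays purely local.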

 

 

Continued on Page 2... Further topics include IDL Encodings (including Delta Compression, rounding, etc.) and IDL Backward Compatibility