UDP from MOG Perspective

 
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,
Calling a Spade a Spade, Keeping Tongue in Cheek
 
 



Point-to-Point Communications over UDP

To implement reliable point-to-point communications over UDP, we’ll need two very basic components:

  • Acknowledgement packet
  • Retransmit on timeout

All the different implementations of reliable UDP use these two things one way or another. However, if we take a tiny bit deeper look at it, we’ll see three rather different models with regards to flow control.

Model 1. No Flow Control

The most obvious approach when it comes to reliable UDP, is the following:

  • Send the packet
  • Wait for acknowledgement (ideally – measure RTT with the same client and wait for like 2*typical_RTT)
  • If there is no acknowledgement after the wait timeout expires – re-send the packet
  • Rinse and repeat

Even though this approach is very simple, it does work; however, it SHOULD be restricted to one-off, relatively small commands. If we try to use such a simple model for long transfers (such as file transfer) – we’ll overload the target Client with packets (and in extreme cases can even overload our own Server too).
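The loop above can be sketched as follows (all names are hypothetical; `transport_send` and `wait_for_ack` stand in for real socket calls, and the “lossy link” below is simulated):

```python
def send_reliably(packet, transport_send, wait_for_ack, timeout, max_tries=10):
    """Model 1 sketch: send the packet, wait for an ack, re-send on timeout."""
    for attempt in range(1, max_tries + 1):
        transport_send(packet)              # e.g. sock.sendto(...) in real code
        if wait_for_ack(timeout):           # True if the ack arrived within ~2*RTT
            return attempt                  # how many sends it took
    raise TimeoutError("peer not responding")

# Simulated lossy link: the first two sends are "lost", the third one gets through.
sends = []
def fake_send(pkt): sends.append(pkt)
def fake_wait(timeout): return len(sends) >= 3

attempts = send_reliably(b"ONE_OFF_CMD", fake_send, fake_wait, timeout=0.1)
print(attempts)  # 3: two timeouts, then success
```

Note that there is nothing here stopping the sender from firing the next packet as soon as the previous one is acked – which is exactly why this model breaks down for long transfers.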

Model 2. Partial Flow Control: Limiting In-Transit Data

To ensure that we’re not trying to send that 100-megabyte file all at once, the best way is to keep an eye on acknowledgements coming from the other side, to calculate the number of “bytes in transit” (the collective size of those packets which are not acknowledged yet), and to ensure that at any given moment there is no more than a certain number of bytes “in transit”. These “bytes in transit” are quite easy to calculate on the sending side (assuming that the receiving side faithfully acks at least some of the packets sent); and as soon as the current number of “bytes in transit” exceeds the limit – the sending side stops sending further packets until it gets an ack from the receiving side.1
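A minimal sketch of this accounting on the sending side (class and method names are my illustrative choices, not any library’s API):

```python
class InTransitLimiter:
    """Model 2 sketch: cap the number of unacknowledged ("in transit") bytes."""
    def __init__(self, limit):
        self.limit = limit
        self.in_transit = 0   # collective size of not-yet-acked packets
        self.unacked = {}     # seq -> packet size

    def can_send(self, size):
        # NB: retransmits of old packets SHOULD bypass this check (see footnote 1)
        return self.in_transit + size <= self.limit

    def on_send(self, seq, size):
        self.unacked[seq] = size
        self.in_transit += size

    def on_ack(self, seq):
        self.in_transit -= self.unacked.pop(seq, 0)

lim = InTransitLimiter(limit=1500)
lim.on_send(seq=1, size=1000)
print(lim.can_send(1000))  # False: would exceed the in-transit cap, stop sending
lim.on_ack(seq=1)
print(lim.can_send(1000))  # True: the ack freed the budget
```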

In particular, such partial flow control seems to be the one used by [ENet] library.


1 of course, retransmit timeouts still apply, so a retransmit of an old packet MAY happen even when the number of “bytes in transit” is exceeded. Also – retransmits SHOULD NOT count against the “bytes in transit” limit

 

Model 3. End-to-End Flow Control: Advertised Receive Window

Model 2 described above will work reasonably well, but only as long as your clients and servers are NOT overloaded. If the network is fast but the receiving side is slow, partial flow control as in Model 2 MAY cause too much data to be transferred – and as a result, some packets MAY be lost not because the Internet has lost them, but because there wasn’t enough space on the receiving side to store these packets before processing them.

To deal with it, protocols such as TCP use end-to-end flow control, with the receiving side advertising how much space it has in its receiving buffers (for TCP it is known as the “Advertised Receive Window”). Then, we only need to send this “remaining space” in each ACK packet coming from the receiving side, and then use this “remaining space” as a threshold for stopping sending on the “sending side”.2

In practice, however, if we can guarantee the size of this buffer on the receiving side to be not-less-than-X, we can safely skip “advertising” the available window – that is, as long as we never allow more than X “in-transit bytes”. While this approach (essentially Model 2 plus an understanding of receiving-side limitations) won’t work for TCP (which needs to operate in an extremely wide range of environments), it MAY (and often DOES) fly for games.
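The sender-side arithmetic is simple enough to sketch (function names and all numbers are illustrative):

```python
def advertised_window(buffer_size, buffered_unprocessed):
    """Receiver side: the 'remaining space' to piggyback onto each ACK."""
    return buffer_size - buffered_unprocessed

def send_budget(bytes_in_transit, local_cap, advertised_win):
    """Sender side: how many more bytes we may put on the wire right now."""
    return max(min(local_cap, advertised_win) - bytes_in_transit, 0)

# Receiver has an 8 KB buffer with 6 KB still waiting to be processed:
win = advertised_window(8192, 6144)      # advertise 2048 bytes
print(send_budget(512, 65536, win))      # 1536: sender must stop after ~1.5 KB more
```

With the guaranteed not-less-than-X buffer mentioned above, `advertised_win` simply becomes the constant X, and we’re back to Model 2.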


2 in practice, for modern TCP implementations it is more complicated than that, with a so-called “congestion window” coming into play too, but for our purposes it doesn’t matter too much

 

Bottom line about Flow Control

If implementing (or choosing) a reliable UDP library, you DO need to take flow control into account. The simplistic Model 1 without flow control won’t work well for anything but singular packets. On the other hand, Model 2 (“partial flow control”), while still imperfect, is known to work reasonably well for games (taking into account the limitations described above). Model 3 is clearly the best one here, though in practice it is not used for games too often (most likely because of the additional complexity, though I don’t see this complexity as significant enough).

Retransmission Policies

Another question which arises when implementing Reliable UDP is “how often to retransmit those lost packets”. Original TCP uses something like “2*RTT” for the first retransmit, and then it doubles the retransmit timeout for each subsequent retransmit. This is known as “exponential backoff”, and is intended to prevent hosts from causing Internet-wide (or at least ISP-wide) network congestion. A discussion of whether this exponential backoff is really necessary to avoid network congestion is far beyond the scope of this book (though there is research out there which says that it is not [MondalEtAl]). On the other hand, what is perfectly clear is that such exponential backoff does hurt interactivity (and does it badly too).
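For illustration, the doubling schedule looks like this (a sketch only; real TCP RTO computation is more involved, also using smoothed RTT and RTT variance):

```python
def backoff_timeout(rtt, attempt, cap=60.0):
    """~2*RTT for the first retransmit, doubling for each subsequent one."""
    return min(2 * rtt * 2 ** (attempt - 1), cap)

print([backoff_timeout(0.1, n) for n in range(1, 5)])
# [0.2, 0.4, 0.8, 1.6] - and this is exactly what hurts interactivity:
# by the 8th retransmit we'd already be waiting ~25 seconds between attempts
```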

On the third hand 😉 , as described above in the “Fast-paced Updates” section, most of the Really time-critical stuff can (and SHOULD) be transferred not over guaranteed-delivery channels, but rather over “guaranteed-synchronization” channels, making the question of “how often to retransmit” much less critical.

On the fourth hand (yes, I am half-way towards the octopus now), if you still DO want to play with retransmission times, my personal suggestion would look along the following lines (yes, I know that I will be hit badly by Internet congestion zealots, but well – I don’t care that much about them).

From what I’ve seen, it should be ok to drop exponential backoff, but only for a limited time, and with per-server congestion control. The idea here is to find a reasonable compromise between being Internet-friendly and solving the problem we have at hand. In this regard, I suggest that:

  • For games, “reliable” connections should have two different modes of operation: “time-critical” and “connection-seeking”
  • I tend to say that in the “time-critical” mode, it is ok to send retransmits more aggressively than exponential backoff would dictate (for example, re-sending at constant intervals instead of backing off exponentially)
    • However, the length of this “time-critical” mode SHOULD be very limited by design. In particular, it usually doesn’t make any sense to keep re-sending stuff at small intervals for more than a minute or so (MUCH less for faster-paced games). If the player wasn’t able to communicate for over a minute – the chances that an additional exponential delay would make any significant difference for him become extremely slim.
    • As soon as the length of this “time-critical” mode is over, we are actually giving up all hope of recovering the connection for the current game event (such as combat or whatever else) – and are actually switching to “connection-seeking” mode
  • In the “connection-seeking” mode, I suggest following exponential backoff; it won’t make much difference for interactivity (we’ve already given up all hope of reconnecting within the current game event), but it is significantly more Internet-friendly
  • In addition, I suggest keeping an eye on the number of packets lost (per second) server-wide (actually, any large number of connections will do), and backing off retransmits if you see a sudden increase in the number of lost packets across all the channels combined (which indicates congestion either on your server or with your ISP, so it is better to back off)
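The two modes above can be sketched as a single interval function (all thresholds and names are illustrative and SHOULD be tuned per game):

```python
def retransmit_interval(rtt, seconds_since_first_send, backoff_attempt,
                        time_critical_window=60.0, cap=30.0):
    """Sketch of the two-mode retransmission policy."""
    if seconds_since_first_send < time_critical_window:
        return 2 * rtt                                 # aggressive, constant interval
    return min(2 * rtt * 2 ** backoff_attempt, cap)    # exponential backoff

print(retransmit_interval(0.05, seconds_since_first_send=5, backoff_attempt=0))
print(retransmit_interval(0.05, seconds_since_first_send=90, backoff_attempt=4))
# 0.1s while "time-critical"; 1.6s once we've switched to "connection-seeking"
```

The server-wide packet-loss monitor from the last bullet would simply force `seconds_since_first_send` past the window (or shrink the window) when congestion is detected.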

UDP Connections and KeepAlives: Simplistic Protocol

As we all know, UDP as such has nothing to do with connections – a UDP datagram is just a datagram to be delivered to a certain address. However, to have some kind of reliable delivery, it IS necessary to have a concept of a “connection”. In particular, all the popular “reliable UDP” libraries I know of are doing it. And as soon as we have a concept of “connect”, we need to find out about “disconnect” too (which, for game purposes, is usually done via keep-alives and disconnect-on-timeout, as described below).

Surprisingly, it is not that difficult; here is one specific schema which tends to work reasonably well:

  • you need some kind of handshake (resulting in some kind of connection identifier). Connection handshakes are tricky because of DDoS attacks, so some kind of “SYN cookie” (or equivalent) is necessary to handle it. For example:
    • we have a secret key (let’s name it “CookieKey”) on the Server side; this key is not known to anybody besides our server, and SHOULD be regenerated on each Server restart.
    • on the first packet (connection request) coming from the Client, we generate “cookie”, which is a tuple of (Client-IP,Server-time,at-least-64-bits-of-crypto-random-data,may-be-something-else), with an additional MAC (such as CMAC or HMAC, see more on MACs on DDoS in section [[TODO]] below); MAC should be generated with a “CookieKey”, effectively authenticating this whole tuple with a “CookieKey”.
    • we send this “cookie” back to the Client-IP, and do NOT store anything on the Server side (so there is no potential for resource exhaustion due to the attack).
    • Client simply repeats its connection request, but with cookie (the one received from the Server) in it this time
    • on receiving connection-request-with-cookie, Server:
      • validates that Client-IP in the cookie is the same as the source of the packet
      • validates that MAC indeed authenticates the tuple in it
      • validates that Server-time is within reasonable limits compared to current Server time
      • proceeds with connection
    • such “cookies” are aimed at preventing those DDoS attacks which are substantially similar to TCP SYN floods (i.e. attempts to exhaust Server resources via a flood of fake connection requests)
    • this whole thing is very similar to the cookies used in the [SCTP] and [DTLS] protocols, and is actually a tad better than TCP’s SYN cookies designed for the same purpose
    • When using an encryption protocol such as DTLS (which you SHOULD, see [[TODO]] section below), it will implement similar “cookies” itself, so in such a case you won’t need to do cookies yourself (but you still need to implement protection from crypto-level DDoS, see [[TODO]] section below for details)
  • you need to pass this connection identifier as a part of the UDP datagrams belonging to this connection
  • you need to drop the connection on timeout (on both sides)
  • you need to have some kind of Keep-Alive packets if nothing happens, to make sure that the connection is not dropped on timeout when it is actually still alive.
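A minimal sketch of such a stateless cookie, assuming HMAC-SHA256 as the MAC (the field widths, `max_age`, and all names are my illustrative choices, not a spec):

```python
import hashlib
import hmac
import os
import struct
import time

COOKIE_KEY = os.urandom(32)   # server-side secret; regenerated on each Server restart

def make_cookie(client_ip, now=None):
    """Build the (Client-IP, Server-time, crypto-random) tuple plus a MAC; NO server state kept."""
    now = int(time.time() if now is None else now)
    ip_field = client_ip.encode().ljust(40, b"\0")      # fixed-width field, fits IPv6 too
    body = ip_field + struct.pack("!Q", now) + os.urandom(8)
    mac = hmac.new(COOKIE_KEY, body, hashlib.sha256).digest()
    return body + mac                                   # sent back to Client-IP

def check_cookie(cookie, client_ip, now=None, max_age=30):
    """Validate a cookie echoed back by the client."""
    body, mac = cookie[:-32], cookie[-32:]
    if not hmac.compare_digest(mac, hmac.new(COOKIE_KEY, body, hashlib.sha256).digest()):
        return False                                    # tuple was tampered with
    if body[:40] != client_ip.encode().ljust(40, b"\0"):
        return False                                    # Client-IP != packet source
    issued = struct.unpack("!Q", body[40:48])[0]
    now = int(time.time() if now is None else now)
    return 0 <= now - issued <= max_age                 # Server-time within reasonable limits

cookie = make_cookie("203.0.113.7", now=1000)
print(check_cookie(cookie, "203.0.113.7", now=1005))   # True: proceed with connection
print(check_cookie(cookie, "198.51.100.1", now=1005))  # False: different source IP
```

Since the whole tuple travels inside the cookie itself, the Server stores nothing until a valid cookie comes back – which is exactly what kills the resource-exhaustion angle of the attack.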

Note that while this schema is MUCH simpler than TCP (in particular, it has no 4-way termination handshake), it does work (in fact, it looks pretty much like a very simplistic Bluetooth Low Energy protocol, which tends to work very well even with HUGE packet loss).

[[TODO!: sudden IP change; refer to TCP too]]

UDP Connections: QUIC

Recently, a new reliable-stream-over-UDP protocol emerged: [QUIC]. I didn’t try it myself, but from what I see from their design docs, I tend to like it A LOT (that is, if you don’t need to code it yourself). QUIC has several nice goodies compared to both TCP and homegrown solutions (such as Simplistic Protocol described above).

Compared to TCP, QUIC explicitly tries to reduce latencies (while I don’t have data on QUIC latencies in a gaming context, as we’re speaking about “slow-paced updates”, it is not that important here, and any improvement over TCP counts). In addition, it integrates security into the transport layer, which helps prevent some of the attacks. Also, QUIC behaves significantly better than TCP in case of an IP change (which is very common for mobile networks).

Compared to the homegrown simplistic protocol described above, QUIC has quite a few built-in goodies, including built-in packet pacing (which reduces packet loss [QUIC]), better congestion avoidance, and forward error correction (which reduces delays in certain contexts).

Overall, IMHO (no warranties of any kind) QUIC looks very promising for these slow-but-large updates. If you’re at the early stages of your development, I would certainly suggest trying QUIC (more specifically, the libquic library) and seeing how it works for you. It is more complicated than the rest of the Reliable UDP libraries out there, but it does more too (and its tight integration with crypto is a Very Good Thing in general).

UDP and Firewalls

In general, UDP is less friendly to firewalls than TCP. Or, looking at it from a different angle, there are fewer UDP-friendly network devices out there than TCP-friendly ones. In other words, there are quite a few people on the Internet who will be able to connect to your server via TCP, but won’t be able to connect via UDP. [QUIC] estimates the number of such people at about 6-9%.

From my experience, most of the time such things happen over not-so-common connections (such as hotel connections, work connections, etc.), so if you’re targeting ONLY players-playing-from-home, it might not be that bad. On the other hand, as soon as you add mobile support, things will become significantly worse in this regard, and it MAY become yet another argument to provide a TCP fallback for those players who currently cannot connect via UDP.

UDP Hole Punching

“Network address translation (NAT) is a methodology of remapping one IP address space into another by modifying network address information in Internet Protocol (IP) datagram packet headers while they are in transit across a traffic routing device.” — Wikipedia

One very common issue which game developers ask about with regards to UDP is so-called UDP hole punching. Such hole punching is absolutely necessary when both your clients are sitting behind so-called NATs (and with NATs being extremely common these days for home users, it means that hole punching is absolutely necessary for peer-to-peer over-the-Internet games). However, as long as we’re restricting ourselves to server-centric architectures (see Chapter II for the reasoning behind them), we’ll generally have no need to implement punchthrough. This is quite a relief, as in some cases (specifically with some overzealous NAT implementations, as well as “symmetric NAT” implementations) it becomes a Really Big Headache.

As a result of hole punching being unnecessary for server-centric games, I do NOT want to delve into a detailed discussion of “how hole punching works” (which would require another rather long discussion of “what this whole NAT thing is about” as a prerequisite). However, for those of you who are trying to create a P2P game running over-the-Internet, I will still give a few pointers.

[[P2P-specific]]: From what I’ve seen and heard, proper UDP hole punching is a kind of Black Magic spell which has three different levels:

  • Novice-level spell. Just implement the [STUN] protocol, PLUS make sure that there is a keep-alive in your UDP implementation, so that the punched hole is NOT closed by NAT devices (if it closes, you’ll need to punch it again). From what I’ve heard, “at least one UDP packet every 15 seconds” should be good enough (though it very much depends on the implementations of the NAT devices involved, and YMMV). This Novice-level spell will usually work for most of your players.
  • Apprentice-level spell. “Time to live (TTL) or hop limit is a mechanism that limits the lifespan or lifetime of data in a computer or network. TTL may be implemented as a counter or timestamp attached to or embedded in the data.” — Wikipedia. This spell is intended to address those overzealous NAT devices which blacklist an IP when a packet-from-an-IP-unknown-to-the-NAT-device arrives. To deal with it, on top of the Novice-level stuff, you’ll need to send your first packet with a Really Small TTL (such as 2), and then gradually increase the TTL for subsequent packets. This technique more-or-less guarantees that there is a packet which reaches your own NAT device but does NOT reach your peer’s NAT device, so that by the moment when your packet reaches her NAT device, her packet has already been there, and there is already a hole in her NAT, so your packet doesn’t cause your IP to be banned (phew).
  • Expert-level spell. From a hole punching point of view, the most annoying NATs are so-called Symmetric NATs; these beasts tend to change ports between different connections from the same source to different targets, so the usual STUN doesn’t work with them. That’s the point where REAL Black Magic begins. In practice, while the ports are different, existing implementations usually simply increment ports, which MIGHT be (ab)used to establish a punch-through connection. More on it in [Takeda] (and supposedly in the RakNet sources too).
  • With this in mind, there MIGHT also exist a Master-level spell (the one which solves All The Punchthrough Problems), but I haven’t encountered it (yet) in my quest for Holy Connectivity.

Let’s note that the whole thing described above sits on top of STUN; in theory, there are also the TURN and ICE protocols (leveraging STUN). However, with TURN requiring a relay server (!) – it is rarely an option for P2P games, so the process described above is probably your best bet.
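As for the Apprentice-level TTL trick described above, the per-packet TTL can be set via a standard socket option. A minimal sketch (the function name and the TTL range are my illustrative choices; in a real punch, the peer address would be the one learned via STUN, while here we aim at localhost just to show the calls):

```python
import socket

def punch_with_increasing_ttl(sock, peer_addr, first_ttl=2, last_ttl=8):
    """Send punch packets with gradually increasing TTL (Apprentice-level spell)."""
    for ttl in range(first_ttl, last_ttl + 1):
        # early, short-lived packets open a hole in our own NAT but die in transit
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        sock.sendto(b"punch", peer_addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
punch_with_increasing_ttl(sock, ("127.0.0.1", 7777))
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))  # 8: last TTL used
sock.close()
```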

Comparison of well-known Reliable UDP implementations

[[TODO: add libyohimbo]]

[[TODO: analyse support for blocking/nonblocking]]

There are several different “Reliable UDP” libraries out there; from our perspective, all of them are essentially targeted at those “slow-paced updates” over UDP. Below is a table which I’ve managed to put together (as with everything else, take it with a big pinch of salt):

Library                       [ENet]        [UDT]         [RakNet]      [libquic]
License                       Permissive    Permissive    Permissive3   Permissive
Last Commit4                  2 months ago  3 years ago   1 year ago    2 weeks ago
Reliability5                  Optional      Optional      Optional      Mandatory
Streams per connection        Single        Single        Single        Multiple
Flow Control                  No(?)         Yes           Yes           Yes
Congestion Control            No            Yes           Yes           Yes
Path MTU Discovery (PMTUD)    No            Optional(?)   Optional      No
Integrated Crypto             No            No            No            Yes
Integrated DDoS Protection    No            No            No            Yes6
Integrated Punchthrough       No            No            Yes           No

As you can see, each of the libraries has its own advantages and disadvantages, so you’ll need to pick your poison yourself. As noted above, I find the QUIC protocol very promising, though I didn’t try it (or libquic) myself yet.


3 since 2014
4 as of Apr’16
5 having Reliability “optional” means that you can implement your own UDP-based stuff (such as fast-paced updates) on top of the same library
6 well, to certain extent, more on DDoS protection in [[TODO]] section below

 

 

[[To Be Continued…

This concludes beta Chapter 13(b) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)”. Stay tuned for beta Chapter 13(c), dedicated to encrypting and otherwise protecting UDP in game and game-like environments.]]


Acknowledgement

Cartoons by Sergey GordeevIRL from Gordeev Animation Graphics, Prague.


Comments

  1. majiy says

    This is coming from a web-programmer with little to no practical experience with programming directly with UDP or TCP.

    For a game, how would the server conclude if a packet does actually come from the player it is “claiming” to be from?

    My best guess would be that after the player has logged in, a session id is generated for him, which is embedded inside every packet as part of its content (which would probably only make sense with encrypted packets, or session hijacking would probably be very easy). Or is there some mechanism embedded directly in UDP or TCP for this purpose? Or am I missing something obvious here?

    Any explanation or link for further reading would be much appreciated.

    • "No Bugs" Hare says

      It is difficult to explain it in these terms, but I’ll try. For TLS/DTLS, instead of a “session ID” there is such a thing as a “session key”. It is exchanged via a complicated crypto-protocol at the beginning of the communication (in the web world, it happens whenever your browser creates TLS over TCP, which in turn happens each time before HTTPS can be used). After the “session key” is established, it is used to authenticate all the data coming from the other side of the communication, effectively forming a protected channel. Such a “session-key”-based channel is a thing which has its integrity guaranteed by crypto (though it doesn’t have the “session key” included in each message). Therefore, if the client sends her userid/password over such a protected channel, starting from that point and as long as the connection is alive (more precisely – as long as the “session key” is kept by both sides of the communication) – the server can be sure that whatever-came-with-the-same-session-key still comes from the very same player (and as we’ve already authenticated the player via userid/password – we know who she is).

      Hope it helps.

  2. majiy says

    This helps a lot. Thank you for your writings and explanations, looking very much forward to the finished book 🙂
