Part IV: Great TCP-vs-UDP Debate (64 Network DO’s and DON’Ts for Game Engines)

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

This post continues our multi-part article on network-related programming for game engines. As before, we’re covering pretty much every game out there (stock exchanges included), except for browser-based games (which are quite a different beast).

Previous parts:
Part I. Client Side
Part IIa. Protocols and APIs
Part IIb. Protocols and APIs (continued)
Part IIIa. Server-Side (Store-process-and-Forward Architecture)
Part IIIb: Server Side (deployment, optimizations, and testing)

“The differences between Big-Endians (those who broke their eggs at the larger end) and Little-Endians had given rise to 'six rebellions... wherein one Emperor lost his life, and another his crown'.” (Wikipedia)

In Part IV of the article, as promised, we will get to the question that causes a lot of debate in game programming circles: what is better for gaming purposes, TCP or UDP? There are two camps out there, and the flame wars between them are comparable to the Great Big-Endian-vs-Little-Endian Debate. Without any doubt, if you ask enough people about it, you’ll get two diametrically opposed answers, ranging from

“don’t bother with TCP, it will cause terrible delays”,

to “UDP is no good, it is not reliable”.

The funniest thing is that both statements above may be correct (depending on your requirements), and the answer to the “TCP vs UDP” question hides in the specifics of your app/game. The worst case is when both statements are true for your specific game; then you’re in real trouble.

Upcoming parts:

Part V. UDP
Part VI. TCP
Part VIIa. Security (TLS/SSL)
Part VIIb. Security (concluded)

27. DO understand the Difference between TCP and UDP

First of all, to make an informed decision, you need to understand what all the buzz is about.

UDP is as close to plain IP as possible, and is all about sending “datagrams” (packets) of limited size, where each packet can be lost. There are no guarantees that every packet will be delivered, and it is the responsibility of the application layer to track lost packets and to recover from losses. In addition, with UDP it is quite easy to overload the receiver by sending it more information than it can process (this is known as “lack of flow control”).

TCP is quite the opposite of UDP. It provides a stream (not datagrams), it doesn’t restrict the size of each send(), it does provide a “reliable stream”, and it provides “flow control” (so if the receiver is lagging behind the sender, the sender will know about it and the next send() will block). Very roughly and extremely briefly, TCP implements this by keeping a send buffer and a receive buffer on each side of the connection, sending acknowledgements (ACKs) to the other side when data is received, automatically retransmitting “behind the scenes” when an ACK is not received within a certain time, and (to support flow control) exchanging and keeping track of information about how much buffer space is left on the opposite side of the channel. A few more details on how TCP implements reliability (ACKs, timeouts and so on) are provided in item #28 below.
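Because TCP is a byte stream, one send() on one side does not necessarily map to one recv() on the other: the stack may split a message or merge several messages together. A common way to cope with this (an illustration, not something this article prescribes) is to prefix each message with its length. The sketch below assumes a hypothetical 4-byte big-endian length header; `frame` and `FrameDecoder` are illustrative names, not part of any library.

```python
import struct

# Hypothetical framing: each message is preceded by its length,
# encoded as a 4-byte unsigned big-endian integer (struct format "!I").
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prepend a length header so the receiver can find message boundaries."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles whole messages from arbitrarily split or merged TCP reads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add the bytes from one recv() call; return every message now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages
```

A receiver would typically call `decoder.feed(sock.recv(4096))` in a loop and process each returned message; production code should also reject absurdly large lengths coming from an untrusted peer.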

One may ask: with all those advantages of TCP (and quite a few others, as described below), why bother with UDP at all? The answer is simple: there is a price tag attached to the TCP goodies, and it is reduced control over delays. If you need sub-second delays, you’re not likely to get away with TCP, so you’ll need to resort to UDP and handle all the associated problems 🙁 .

One important thing to understand:

There is no difference between performance of TCP and UDP when there is no packet loss.

OK, there might be a sub-microsecond-level difference due to extra processing within the TCP stack, but for all practical intents and purposes it is negligible. The real difference comes when you have packet loss; this is when TCP retransmission timeouts start to make a difference.

Therefore,

all the specific times mentioned in this article are not meant as “average-case” latencies, but rather as “worst-case acceptable delays”.

In other words, what you’ll really see delay-wise when using TCP for gaming purposes will most likely be along the following lines (sorry, I don’t have a graph handy, but I’ve seen many of them):

- most of the time, latency will be below, say, 100ms (less for a non-transatlantic connection)
- however, occasionally you’ll observe short “spikes” of varying magnitude. These “spikes” can go quite high, up to several seconds (depending on the number of packets dropped). It is these “spikes” which represent the real difference between TCP and UDP, and it is the maximum acceptable magnitude of these “spikes” which we’re speaking about below.
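To see why averages are misleading here, consider a hypothetical set of 100 latency samples (made-up numbers for illustration, not measurements): 98 of them at 80 ms plus two retransmission spikes. This minimal sketch, using Python’s standard `statistics` module, shows that the median looks great and the mean looks tolerable, while the maximum is what players actually feel during a spike.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Contrast 'typical' latency with the worst case seen in the samples."""
    return {
        "median_ms": statistics.median(samples_ms),
        "mean_ms": statistics.fmean(samples_ms),
        "max_ms": max(samples_ms),
    }


# Hypothetical samples: mostly 80 ms, with two spikes caused by retransmits.
samples = [80.0] * 98 + [3200.0, 6400.0]
print(latency_summary(samples))
```

When deciding between TCP and UDP, it is the `max_ms`-style number (or a very high percentile) that should be compared against your game’s acceptable delay.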

28. DON’T use TCP for sub-second real time

So, should you use TCP? Well, in general yes, but there is a caveat:

As a rule of thumb, you should use TCP – unless you need sub-second delivery times (think shooters or VoIP).

“Compact Cassette is a magnetic tape recording format for audio recording and playback... Between the early 1970s and the late 1990s, the cassette was one of the two most common formats for prerecorded music.” (Wikipedia)

The problem with using TCP for sub-second real time is that TCP was never intended for highly interactive communications; instead, it was optimized primarily for looong file transfers (on the order of minutes to several hours). Let’s not forget that the original [RFC793], which specifies TCP, was released in 1981 (it remained in force, with many extensions added over the years, until it was consolidated into RFC 9293 in 2022). That was the year when the IBM PC was just introduced, when practically all home computers were 8-bit (with 64KBytes of RAM being the absolute maximum, and cassette tapes(!) being a common form of persistent storage), and when all the networking stuff was confined to barely interactive Big Iron boxes (mainframes etc.). No wonder that nobody really cared about interactivity back then. However, TCP apparently worked so well, and so far beyond its original design, that we’ve started to use it pretty much everywhere, despite its certain shortcomings.

“Congestion control concerns controlling traffic entry into a telecommunications network, so as to avoid congestive collapse by attempting to avoid oversubscription of any of the processing or link capabilities of the intermediate nodes and networks and taking resource reducing steps, such as reducing the rate of sending packets. It should not be confused with flow control, which prevents the sender from overwhelming the receiver.” (Wikipedia)

One more thing which indicates the “hours-long” timeframe expected by TCP is that TCP does support a concept of “keep-alives” and timeouts on keep-alives; however, the TCP “keep-alive” timeout is set to 2 hours (sic!) by default. Not only does this make TCP “keep-alive” perfectly useless for games (except, maybe, for offline chess), but it also indicates one big problem with TCP for our purposes: it has never been intended for use in highly interactive scenarios. The lack of usable “keep-alives” might be inconvenient (and you may need to implement your own “keep-alive” on top of TCP, see item #46 in Part VI for details), but what really hurts interactivity is so-called “exponential back-off”. To explain it, we first need to discuss how TCP provides reliability.
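As an illustration of what an application-level keep-alive might track, here is a minimal sketch of the timing logic only. The `HeartbeatMonitor` class and its `interval`/`timeout` values (1 s and 5 s) are hypothetical and illustrative; it is not meant to reproduce the technique from item #46 of Part VI, and the clock is injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical application-level keep-alive running on top of TCP.

    The sender emits a small 'ping' message whenever it has sent nothing for
    `interval` seconds; the receiver declares the peer dead if nothing at all
    has arrived for `timeout` seconds.
    """

    def __init__(self, interval: float = 1.0, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.timeout = timeout
        self._clock = clock
        now = clock()
        self._last_sent = now
        self._last_received = now

    def on_sent(self) -> None:
        """Call after sending any message (a ping or real game data)."""
        self._last_sent = self._clock()

    def on_received(self) -> None:
        """Any incoming message, not only a ping, proves the peer is alive."""
        self._last_received = self._clock()

    def should_send_ping(self) -> bool:
        return self._clock() - self._last_sent >= self.interval

    def peer_is_dead(self) -> bool:
        return self._clock() - self._last_received >= self.timeout
```

In a game loop, you would call `should_send_ping()` each tick and send a tiny ping message when it returns True, call `on_received()` for every incoming message, and drop the connection once `peer_is_dead()` returns True.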

TCP operates on top of IP packets, which are unreliable (i.e. each and every one of them can be lost). To provide a reliable channel on top of unreliable packets, TCP simply sets a timeout whenever it sends a packet, and monitors incoming traffic; whenever the other side sends a packet with the “acknowledgement” (ACK) flag, the packet is considered delivered. However, if an ACK is not received within the timeout, the TCP stack on the sending side retransmits the packet ¹. So far so good, but the TCP specification says that the retransmission timeout should double(!) on each subsequent retransmission attempt. It means that if we’ve lost our first packet and had a retransmission timeout of 0.1 second (which is within the typical range, as it is determined by RTT), by the 5th retransmit the timeout will reach 3.2 seconds, and by the 8th retransmit it will reach 25.6 seconds, which is likely to be too much for any more or less interactive game out there. If packet losses were purely random (and independent), the chances of getting 8 retransmits in a row would be very poor (if each packet can be lost with 1% probability, then the probability of losing 8 packets in a row is a mere 10^-16), but in practice there are strong correlations between packet losses, which make such multiple-retransmits-in-a-row much more likely. Further implications of exponential backoff, and of using TCP in general, will be discussed in Part VI.
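The arithmetic above is easy to reproduce. This sketch assumes, as the text does, an initial retransmission timeout of 0.1 s that doubles after every retransmission with no upper limit; real stacks also cap the timeout (RFC 6298 permits a maximum as long as it is at least 60 seconds), which the sketch ignores. The loss-probability line assumes independent losses, which, as noted above, real networks do not honor.

```python
def rto_after(retransmits: int, initial_rto: float = 0.1) -> float:
    """Timeout armed after the given number of retransmissions (uncapped doubling)."""
    return initial_rto * 2 ** retransmits


def total_wait(retransmits: int, initial_rto: float = 0.1) -> float:
    """Time from the original send until the given retransmission is sent."""
    return sum(rto_after(k, initial_rto) for k in range(retransmits))


for n in (1, 5, 8):
    print(f"after retransmit #{n}: RTO = {rto_after(n):.1f} s, "
          f"waited so far = {total_wait(n):.1f} s")

# Chance that 8 transmissions in a row are all lost, assuming independent 1% losses.
print(f"{0.01 ** 8:.0e}")
```

Note that `total_wait(8)` comes to about 25.5 s: by the time the 8th retransmission goes out, the sender has already been waiting for more than 25 seconds, and the next timeout is another 25.6 s on top of that.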

While there were good reasons for this “exponential backoff” (mostly it is about congestion control for the Greater Good of the whole Internet, though the effect of it is currently being questioned [MondalEtAl]), it becomes a real pain in the neck when using TCP for highly interactive purposes.

So, now that we’ve established that TCP is not good for situations where we need to guarantee small delays, and is reasonably good when large delays are acceptable, our next question obviously is: “What is the minimum acceptable delay for which a game can still use TCP?” While the line between “can” and “cannot” is not that bright, it has been observed that for typical acceptable delays of around 5-10 seconds, you can make TCP work fine (using quite a bit of trickery, see Part VI). If in your case acceptable delays are below 1 second, you’re pretty much out of luck with TCP. And for the range between 1 and 5 seconds, the answer is “it depends”.

In general, if you can get away with TCP, you should use it (the reasons are discussed below), so the whole thing can be summarized in the following table (Your Mileage May Vary, batteries not included):

Acceptable Delay	<1 sec	1-5 sec	>5 sec
Protocol	UDP	It depends	TCP

¹ In practice, to provide streaming and flow control, TCP acknowledges not individual packets but portions (byte ranges) of the stream; the idea, however, is still the same.

29. DON’T write your own guaranteed delivery protocol over UDP

One might think: “hey, with all these TCP issues, why should I use it? I can write my own guaranteed delivery protocol over UDP (or re-implement [RFC793] with the changes I need)!” My recommendation in this regard is the following:

don’t do it, look for the options to do it over TCP or over unreliable UDP

I am not aware of scenarios where neither of these is possible (of course, this isn’t a guarantee, but it is quite an educated guess).

For guaranteed delivery purposes, you shouldn’t write your own protocol over UDP, for two reasons. First of all, UDP is much less firewall- and router-friendly than TCP (see also item #33 below), which means that in certain cases your users will have trouble connecting over UDP while TCP works fine. So, if you can do it with TCP, do it. More importantly, implementing your own reliable-delivery protocol over UDP is an extremely complicated, time-consuming, and error-prone task. Just take a look at the original TCP specification [RFC793]: it is about 80 pages, and even for a basic but properly working implementation of a reliable stream you’ll need to implement at least half of it. And to make things worse, implementing such a protocol is nothing compared to designing and testing it (there is nothing nastier than the Internet, which always comes up with new ways to break your protocol – been there, seen that).

Bottom line: if you need reliable delivery over UDP (which might happen), see item #34 below; you’ll find references to three existing “reliable UDP” libraries, which you really should consider before writing your own.

30. DO use UDP for “fire and forget” packets

There are two well-known cases when UDP really shines: VoIP and shooter games. Both scenarios can be described as sending “fire and forget” packets (NB: Unity3D’s “unreliable state synchronization” also falls into this category). In particular:

- whenever the sender sends a packet, it doesn’t care whether the packet is delivered or not
- there are no retransmissions at all
- whatever is lost is lost forever
- whatever can be recovered will be recovered by the receiver on receiving the next packet
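A minimal sketch of the receiving side of such a scheme, assuming each packet carries a complete snapshot of the sender’s state plus an increasing sequence number. The `StateUpdate` fields and the `LatestStateReceiver` class are hypothetical, and sequence-number wraparound is ignored for brevity. Lost packets are simply never asked for, and late arrivals are discarded because a newer snapshot has already superseded them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateUpdate:
    seq: int                         # assigned by the sender, strictly increasing
    position: tuple[float, float]    # hypothetical payload: a full state snapshot


class LatestStateReceiver:
    """Fire-and-forget receiver: keeps only the newest state, never requests resends."""

    def __init__(self) -> None:
        self.latest: Optional[StateUpdate] = None

    def on_packet(self, update: StateUpdate) -> bool:
        """Apply the update if it is newer than the current one; return True if applied."""
        if self.latest is not None and update.seq <= self.latest.seq:
            return False  # stale or duplicate: a newer snapshot already arrived
        self.latest = update
        return True
```

For example, if packets 1 and 3 arrive and packet 2 is lost, the receiver just moves from state 1 to state 3; should packet 2 show up later, `on_packet` returns False and it is ignored.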

Whenever your communication fits the “fire and forget” scenario, and your game doesn’t tolerate delays over 1 second, you most likely do need UDP. Further explanation of the “fire and forget” approach will be provided in Part V.

31. DON’T use Unreliable UDP for Secure Communications (that is, if you can get away without it)

Security over TCP (or, more generally, over any reliable stream, including reliable UDP streams) is relatively easy: just use TLS (more specifically, OpenSSL, see Part VIIa for details). Of course, TLS is not without its issues, but generally it is good enough for gaming purposes. On the other hand, implementing security over non-guaranteed datagrams is very tricky; even those who are supposed to be security professionals tend to make grave mistakes in this field.
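Python’s standard `ssl` module is built on top of OpenSSL, so a client-side sketch of “just use TLS” over a TCP connection can be quite short. The function name `open_tls_connection` is ours, the host name in the usage note is a placeholder, and game-specific concerns (see Part VIIa) are not addressed here.

```python
import socket
import ssl


def open_tls_connection(host: str, port: int) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS with certificate checks enabled."""
    # create_default_context() turns on certificate verification and hostname
    # checking against the system's trusted CA store.
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname is used both for SNI and for hostname verification;
        # the TLS handshake happens here because the socket is already connected.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Usage would look like `with open_tls_connection("game.example.com", 443) as conn: conn.sendall(b"hello")`; everything sent over `conn` is then encrypted and integrity-protected by TLS.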

“DTLS allows datagram-based applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery.” (Wikipedia)

One Really Bad example of what happens when even supposed specialists in the field try to implement security over unreliable datagrams without sufficient review is the well-known WEP security disaster: while WEP stands for “Wired Equivalent Privacy”, it can apparently be broken by mere eavesdropping in about 2 minutes [ArsTechnica]. BTW, I don’t mean to blame the WEP authors for it, and am mentioning it merely to illustrate how difficult implementing security over a datagram layer really is.

Bottom line: you really really don’t want to develop your own security protocol over unreliable datagrams. If you really really have to secure unreliable UDP (for example, if you need both security and sub-second interactivity for the very same data) – use DTLS. While it has certain peculiarities which might need to be taken into account when applying DTLS to games (see item #56 in Part VIIa for details), it is certainly much better than anything any of us can do without spending at least half a year of work.

32. DON’T choose UDP because of UDP Multicast

You might have heard about UDP multicast, and a thought like “hey, this is just the ticket for my game!” might have come to your mind. Don’t hold your breath: while UDP multicast does work in intranets, it generally doesn’t work over the Internet, so if your users are Internet ones (and games almost universally need to support operation over the Internet), you need to forget about UDP multicast 🙁 [Stackoverflow].

The reason is quite simple: UDP multicast in fact implements a neat concept of “publisher/subscriber” (similar to the one we’ve discussed in item #16 of Part IIb). However, each publisher/subscriber address in UDP multicast is an IP address (from 224.0.0.0 to 239.255.255.255), and currently there is no mechanism for distributing these multicast IP addresses to publishers (except for the well-defined ones assigned by IANA, see also [RFC5771]). In addition (most likely as a result of the previous observation, as routing those AD-HOC blocks from [RFC5771] would inevitably lead to conflicts between different publishers), UDP multicast addresses are not currently routed by Internet routers [Stackoverflow]. Case closed, at least for the time being 🙁.
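As a small illustration of the address range quoted above, Python’s standard `ipaddress` module can check whether an IPv4 address falls into the multicast block, 224.0.0.0/4, which spans exactly 224.0.0.0 through 239.255.255.255. The helper name `is_ipv4_multicast` is ours; the standard library also offers the equivalent `is_multicast` property, and the sketch checks that both agree.

```python
import ipaddress

# The whole IPv4 multicast block: 224.0.0.0 through 239.255.255.255.
IPV4_MULTICAST = ipaddress.ip_network("224.0.0.0/4")


def is_ipv4_multicast(addr: str) -> bool:
    """True if addr is an IPv4 multicast (group) address."""
    return ipaddress.IPv4Address(addr) in IPV4_MULTICAST


for a in ("224.0.0.1", "239.255.255.255", "240.0.0.0", "192.168.1.10"):
    # The explicit range check should agree with the library's own flag.
    assert is_ipv4_multicast(a) == ipaddress.IPv4Address(a).is_multicast
    print(a, is_ipv4_multicast(a))
```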

33. DON’T use UDP if you can get away without it

In general, if your game can work either over TCP or over UDP, you should prefer TCP. While in theory there might be no difference, there is One Big Fat Practical Problem with UDP: UDP packets are known to be significantly less well supported by firewalls and routers than TCP (stuff such as GRE is even worse, but it is beyond the scope of the present article). First, not all routers support UDP (the situation is usually much worse when NAT is involved). Second, not all firewall-configuring guys are fond of UDP (for the reasons described below).

While you might argue “which firewalls? All my players are at home, so there are no firewalls in sight”, first, even if there are no firewalls, there are routers for sure (and believe me, there are still very, very ugly routers out there). Second, in the modern world, it is more likely than not that your player will go on vacation and try to play from the hotel’s Wi-Fi. From my experience, hotel Wi-Fi networks tend to be very intolerant of anything but TCP (even for TCP quite a few ports might be closed, but this is a separate story, see item #50 in Part VI for a hint on dealing with closed TCP ports), so when your player is in a hotel it becomes quite a gamble: either UDP will work, or it will not. And if your player can play your competitor’s game because they’ve used TCP, and cannot play yours because you’ve used UDP for no good reason, it might be quite a strong incentive for him to switch to your competitor.
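For games that genuinely do need UDP, one way to limit the damage (in line with the advice in the comments below to provide a TCP backup) is to probe UDP when connecting and fall back to TCP if nothing comes back. A minimal sketch, assuming a hypothetical game-specific `probe_udp` function that sends a UDP hello to the server and reports whether a reply arrived within the given timeout; `choose_transport` and the 2-second default are likewise illustrative.

```python
from typing import Callable


def choose_transport(probe_udp: Callable[[float], bool],
                     udp_timeout_s: float = 2.0) -> str:
    """Return "udp" if a UDP round trip succeeds within the timeout, else "tcp".

    probe_udp is a hypothetical game-specific function: it sends a UDP hello
    to the server and returns True if a reply arrives in time. Any socket
    error is treated the same way as "UDP is blocked here".
    """
    try:
        if probe_udp(udp_timeout_s):
            return "udp"
    except OSError:
        pass
    return "tcp"
```

The timeout is a trade-off: too short, and players on slow-but-working links get pushed onto TCP unnecessarily; too long, and players behind UDP-hostile hotel Wi-Fi wait before they can even start.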

“A distributed denial-of-service (DDoS) attack occurs when multiple systems flood the bandwidth or resources of a targeted system.” (Wikipedia)

The second reason to avoid UDP is actually the same reason why firewall-configuring guys don’t like allowing it: DDoS attacks. While DDoS attacks may use (and do use) both UDP and TCP, in practice mounting an attack over UDP is simpler, and, more importantly, from my experience about 80-90% of the attack traffic in an average DDoS attack out there is UDP. This means that if you’re under attack and are using TCP, you can call your ISP (and your ISP can call their upstream if necessary) and block 80-90% of the attack traffic just by blocking UDP towards your servers with one single simple router rule (and without that traffic even hitting the downlink from your ISP to your servers(!)). I can assure you that both your admins and your players will appreciate this ability to block a DDoS attack (or, more precisely, they will be quite angry if you don’t have such an ability).

34. If everything else fails – DO consider “Reliable UDP” library

So, what should you do if you need both sub-second delays and reliable packet delivery?

If you have found that you cannot get away with using TCP, but also cannot do what you need with “fire-and-forget” UDP packets, take a look at one of the “reliable UDP” libraries. There are three rather popular “reliable UDP” libraries out there: one is [Enet], another is [UDT], and the third one is [RakNet] (which used to be commercial, but is free now). Personally, I haven’t tried any of them (thank God, I didn’t need to) and cannot really vouch for any of them in practical scenarios, but at the very least they’re better than whatever-you-can-possibly-come-up-with without spending at least half a year writing and testing your own library. There are also the [Reliable UDP] and [R-UDP] protocols, though I’m not aware of any currently supported libraries implementing them 🙁 .

35. DON’T use hybrid TCP+UDP

Out of all the options, the hybrid TCP+UDP option looks the least viable. I don’t see many scenarios where this approach has advantages over a reliable UDP library, and synchronization and TCP-to-UDP interaction are likely to be a mess (due to different handling of TCP and UDP by different routers).

One exception is when your game as such fits TCP’s requirements in general, but needs something on the side (such as VoIP chat) which requires UDP. In such cases, using TCP for all gaming purposes while using UDP only for VoIP is generally justified.

To be continued…

This post aims to help you to choose whether you need TCP or UDP for your game engine. Stay tuned for Part V, UDP (with Part VI, TCP, following immediately after that).

EDIT: The series has been completed, with the following parts published:
Part V. UDP
Part VI. TCP
Part VIIa. Security (TLS/SSL)
Part VIIb. Security (concluded)

References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Anonymous says

August 29, 2015 at 5:03 pm

“Don’t use UDP because routers suck at handling it”

See, if everyone avoids a protocol because some part of the chain sucks at it (and not because of any inherent limitation), it ends up sucking MORE. Because there’s no pressure to keep implementing it, and without pressure things sadly don’t work (especially when talking about hardware companies).

Not saying it’s not a valid reason, just that programmers should keep it in mind that protocol support is a constant tug of war, and if you stop pulling you lose.

- "No Bugs" Hare says
  
  September 8, 2015 at 5:11 am
  
  You are right of course, but there is one important question which a developer should ask her/himself in this regard: what is more important for him/her: to push UDP as a protocol (for the greater good of mankind), or to develop the best possible game given the current landscape? In most cases, we as developers are bound by the interests of our users (see also my old article published in Overload journal: http://ithare.com/the-guy-were-all-working-for/ ), so the answer is likely to be in favour of the latter at the cost of the former, whether we like it or not 🙁 .
  
  As developers-for-hire we have certain obligations, and they often conflict with “greater good” (whatever it is); that’s why quite a few of us go into unpaid work, where we can do whatever we think is The Right Thing 🙂 .
  
  To re-iterate: I am not arguing with your point, I am just saying that:
  * while we’re on the paid hours (or working on a project which is not entirely our own), developer ethics (which is pretty much engineering ethics) may prevent us from pushing something-which-would-be-great-as-soon-as-everybody-uses-it 🙁
  * while we’re on our own, we can (and should) do whatever we feel is The Right Thing to do 🙂
  
- no thanks says
  
  April 25, 2018 at 3:26 pm
  
  My bigger issue with this attitude is that it simply is not true. Routers have handled UDP since the day they were first created. UDP is 1000 times easier for both hardware and software to support. At best you’re maybe handling one extra person at a significant network performance cost to everyone else.
  
  Can you even cite a single router that doesn’t handle UDP? I’ve never seen one in over 20 years of working with all types.
  
  - "No Bugs" Hare says
    
    April 26, 2018 at 3:42 pm
    
    Strictly speaking you do have a point – usually, it is NOT routers but firewalls who are causing the trouble (and I can show HUNDREDS of firewalls which are doing it – in particular, 90% of the public WiFi is like it, especially hotels and airports, ouch!). But for practical purposes, we do not really care whether it is routers or firewalls (or routers misconfigured to work as packet firewalls) who are responsible for it
    
    In any case, per Google (look for their article on QUIC protocol), 8-9% of the Internet population cannot use UDP. If you don’t care about 8-9% of the world population, it is fine, but you should do it with open eyes.
    
    P.S. Oh, and let’s not forget about DDoS, which are 80-95% UDP at least over last 15 years (first time I was under DDoS, was in 2002 or so). Oh, that sigh of relief when you manage to convince your ISP to nullroute (with their upstream) just _UDP_ to your IPs (rather than nullrouting _everything_), reducing DDoS from 100Gbit/s to 5Gbit/s which they can manage… This is useful even if you run BGP-based anti-DDoS, but if you don’t – this often becomes the only life saver. So, sure – if you need it, DO use UDP. Just make sure that you DO provide TCP backup (for those who don’t have UDP, and in case of DDoS).