Preparing to Deploy your Game: To Cloud or Not to Cloud?

 
Author:
Job Title: Sarcastic Architect
Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek
 
 

Costs: Cloud vs non-cloud
#DDMoG, Vol. VII

[[This is Chapter 23(a) from “beta” Volume VII of the upcoming book "Development&Deployment of Multiplayer Online Games", which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the "release" book to those who help with improving it; for further details see "Book Beta Testing". All the content published during Beta Testing is subject to change before the book is published.

To navigate through the "1st beta" of the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

With this post, we’re starting the 3rd (and last) part of the 1st beta of the epic book “Development & Deployment of Multiplayer Online Games”. This 3rd part is dedicated to deployment and post-deployment issues.

At this point, we’re considering a situation where you have (almost) ready software – and want to start thinking about deployment. As it turns out, it is not as easy as “hey, let’s get a $10/month server in the cloud and run our game from it”. In fact, deployment is one of the trickiest (and one of the least discussed in the literature) parts of the whole task of “get your multiplayer game up and running”.

No Way to Run Servers from Office

First of all, let’s note that

You’re NOT going to run your game from servers in your office1

I am not going to discuss this issue in detail – let’s just accept it as a fact. There are several reasons behind it, but the most important one is connectivity. For games, connection quality is paramount, and whatever-you-do, connectivity in your office will suck big time compared to what-you-can-get-from-a-half-decent-datacenter (and from most clouds too); you’ll see it for yourself a bit below, when we speak about requirements for the datacenter.


1 Well, to demonstrate that I’m wrong, you may move your office to a datacenter – but this will be more of a trick-to-prove-a-point than a usual scenario.

 

To Cloud or Not to Cloud?

The very first decision you need to make on the way to deploying and launching your game is whether you want to run it in the cloud – or not.

On hosting ISP and CSPs

Historically, back in the 1990s and early 2000s, there was no such term as “cloud” – and (surprise!) it was still perfectly possible to run an Internet-based business. There were (and still are) lots of so-called “hosting ISPs” out there, and as a Big Fat Rule of Thumb(tm), a hosting ISP is a provider which sits within a datacenter and rents you servers (usually referred to as “dedicated servers” in hosting-ISP-speak), usually on a per-month basis.2

When the “cloud” buzzword was coined, and Cloud Service Providers (CSPs) started to grow faster than mushrooms, they started to sell virtual servers instead of physical server boxes. These virtual servers can be rented per-hour (or even per-second), but come with quite a few limitations, which we’ll discuss below.

As a recent (and very welcome for game servers) addition, another concept, filling the gap between traditional hosting ISPs and CSPs, has started to emerge. I’m speaking about so-called “bare-metal” cloud servers – essentially very-fast-provisioned server boxes with per-hour billing; more on them in [[TODO]] section below.

With all this in mind, for our purposes, the question of “to cloud or not to cloud”, can be rephrased as:

Do we want to use cloud-based virtual servers, or “bare-metal” cloud servers, or traditional hosting ISP?

2 Hosting ISPs may also allow you to “co-locate” your own servers on their premises, but for the time being we’ll ignore co-location; unless your game is already very large, it rarely makes sense financially (co-locating your own box is often more expensive than renting one – and that’s not even accounting for the price of the hardware server itself).

 

On “Cloud Gaming”

[[NB: part up to “Economics of the Cloud” has been moved to Vol. III, chapter on Server-Side Architecture]]

Of course, with “cloud” being THE buzzword for the last 10 years or so, it has inevitably led to the emergence of the term “Cloud Gaming”. Moreover, as Wikipedia says [Wikipedia.CloudGaming], there are two different types of “Cloud Gaming”: one based on “video streaming”, and another one based on “progressive downloading”.

Video streaming

Out of all the cloud-related stuff which happened in recent years, one of the strangest things was the concept of a game being fully played – and rendered – on the server, with the game output compressed and sent back to a “thin” Client as a video stream (a.k.a. “pixel stream”). The best-known real-world deployment of such a system was the one by OnLive (later re-launched as CloudLift); the whole thing didn’t really fly, at least commercially, and experienced significant technical issues too (in particular, it had significant problems working over anything but the very best connections with no packet loss whatsoever).

Technical problems of video/pixel streaming games are numerous:

  • Latencies are bad (ruling out faster games completely). In particular:
    • Client-Side Prediction is completely impossible with video/pixel streaming. This means that input lag cannot be less than one RTT.
    • Video streaming is just, well, streaming – which means that Head-Of-Line blocking (discussed in Vol. IV) is pretty much inevitable, and in case of a single packet loss we’re speaking about delays of at least 2*RTT. These delays need to be buffered on the receiving side to avoid constant “jerking”, so in practice we’re speaking about at least 2*RTT of delay all the time (which, BTW, was consistent with real-world delays observed for OnLive).
      • To make things worse, even with such buffers, multiple packet losses in a row (which, as noted in Vol. IV, are becoming more and more frequent over the Internet in recent years) cause considerable delays and degradation of player experience.
  • Video quality suffers significantly (to get quality comparable to Blu-ray, we’d need around 10Mbit/s of video stream – and even more CPU power to compress it in real time).
  • CPU power required on servers to render 3D – and to compress the stream with acceptable quality in real time(!) – is huge. Compared to traditional architectures (such as those discussed in Vol. III’s chapter on Server-Side Architectures), we’re speaking about a 100x increase in Server-Side CPU power(!).

As a result –

I do not think that games based on video streaming3 are viable, at least in the near future. Within this book, we will not discuss them any further.

NB: video streaming which stays completely within a LAN (such as Steam In-Home Streaming), with rendering happening on a Client PC and the result streamed to other devices within the same LAN, does not suffer from the problems above, and can be made viable with current technologies.


3 That is – streamed from Server to Client over the Internet (as opposed to within-LAN streaming, see below).

 

Progressive downloading

Progressive downloading (a.k.a. file streaming) is radically different from video/pixel streaming. The basic idea is to write your game in the usual Client-Server way, while avoiding forcing the player to do heavy downloads up front; instead, we download stuff (such as additional levels, characters, quests, etc.) as-player-plays. The key point here is to enable “instantly playable” games. Probably the best-known representative of this class of “Cloud Games” is Kalydo/Utomik.

Progressive downloading doesn’t suffer from the problems of video streaming – and can actually be seen as an evolution of DLCs (though an automated one, with much smaller pieces than traditional DLCs). As discussed in Chapter [[TODO]], progressive downloading can be implemented without departing from the traditional communication and architectural patterns (well – at least without departing from them too much).
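To make the “download as-player-plays” idea a bit more tangible, here is a minimal Python sketch. It assumes that content is split into named packs of known size, and that a fixed per-tick bandwidth budget stands in for real networking; the class and its methods are hypothetical illustrations, not Kalydo/Utomik’s actual API.

```python
from collections import deque


class ProgressiveDownloader:
    """Hypothetical sketch: background download of content packs while the
    player plays, with a pack the player needs *right now* jumping the queue."""

    def __init__(self, pack_sizes, bandwidth_per_tick):
        self.remaining = dict(pack_sizes)   # pack name -> bytes still to fetch
        self.queue = deque(pack_sizes)      # default order = usual play order
        self.bandwidth = bandwidth_per_tick

    def is_ready(self, pack):
        return self.remaining[pack] == 0

    def request_now(self, pack):
        # The player is about to enter this content: fetch it before anything else.
        if not self.is_ready(pack) and pack in self.queue:
            self.queue.remove(pack)
            self.queue.appendleft(pack)

    def tick(self):
        # Spend one tick's bandwidth budget on the head(s) of the queue.
        budget = self.bandwidth
        while budget > 0 and self.queue:
            pack = self.queue[0]
            chunk = min(budget, self.remaining[pack])
            self.remaining[pack] -= chunk
            budget -= chunk
            if self.remaining[pack] == 0:
                self.queue.popleft()
```

In a real Client, `tick()` would be driven by the network layer, and the game would show a loading screen only when the player reaches a pack for which `is_ready()` is still false.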

IaaS vs PaaS vs SaaS

In general, when speaking about “cloud”, it can mean several different “service models”; the most popular ones are “Infrastructure as a Service” (IaaS), “Platform as a Service” (PaaS), and “Software as a Service” (SaaS).

At this point, we’ll be speaking only about so-called Infrastructure-as-a-Service (IaaS). In other words, what you’ll get from your cloud provider will look almost exactly4 like a remotely accessed server; while most of the time it will be a virtual server rather than a physical one (though see below on “bare-metal cloud”), it can run exactly the same software as physical boxes.

Other cloud models (PaaS and SaaS) are trickier. First of all, we need to realize that it is very common to create proprietary APIs for services which are marketed as5 PaaS/SaaS; and proprietary APIs inevitably mean that (a) choosing the cloud is not a deployment-time decision (i.e. you need to decide on it looong before), and (b) you get absolute vendor lock-in (which is a Bad Thing(tm)). Examples of such PaaS-with-proprietary-APIs include Google App Engine and significant parts of Microsoft Azure.

On the other hand – sometimes vendors market standard services (one example would be using MySQL as a service) as PaaS or SaaS – so you might be able to use standard APIs (such as standard MySQL APIs). Such services with standard APIs may be usable for your game – though you still need to be very careful with two things: (a) costs,6 and (b) latencies. Unfortunately – cost analysis for PaaS/SaaS is complicated, and very much depends on specifics of your game, so (unlike for IaaS) we won’t be doing it here. In any case –

Having “standard” APIs is very important for services marketed as PaaS/SaaS

If PaaS/SaaS has “standard” API – we should care only about costs and latencies (and even more importantly – we can switch to our-own-instance of the same service if 3rd-party service doesn’t work for us). On the other hand – if it has proprietary API – the whole game becomes very different.

With proprietary APIs, the choice of PaaS or SaaS is not a deployment-time decision anymore; as a result, if you’d like to use this kind of PaaS or SaaS, it should have been considered much earlier in your development life cycle. So far, I haven’t seen successful PaaS/SaaS clouds with proprietary APIs aimed at games – but it doesn’t mean that they cannot possibly exist; what is clear, however, is that proprietary APIs create a very strong dependency, which is a major risk, and should be analysed very carefully before making a decision.


4 Usually, the only substantial difference is in performance and latencies.
5 at this point, we don’t care whether it is “really” PaaS/SaaS according to Wikipedia or any other source; what is important is to decipher wording used by various CSPs
6 “cloud” doesn’t mean it is cheap – so make sure to compare costs; for example, [Frost] mentions $11/day for a 10G MySQL-as-a-service database – which is outright exorbitant

 

Pros and Cons of the IaaS Cloud

In spite of everybody and their dog running full-speed for cloud deployments, the choice “whether to use cloud” is not that obvious – at least for games. There are both pros and cons of using cloud (more specifically – IaaS) services.

First, about IaaS pros (when comparing IaaS to traditional rented servers):

  • IaaS cloud does provide elasticity at cheap prices. In other words, if your load varies greatly with time, you will be able to save quite a bit by using per-hour (or even per-second) billing.
  • Fast hardware replacement in case of hardware failure. Typically, for virtualized cloud server the only problem is to detect the failure – and then your provider will re-launch your instance elsewhere within seconds (note that you still lose all the in-RAM data of the instance – and any hard disk data too). For rented servers – fixes usually come in a few hours, so to ensure the continuity you may (or may not – as failures beyond fans and HDDs are really really rare – more on it in [[TODO]] section below) need to have (and pay for) a stand-by server of each type.
    • Note that normally, high availability and fault tolerance are not included in IaaS offerings. In other words, an IaaS server is almost exactly like your physical server, and can crash at any moment; if you want high availability and/or fault tolerance, you’re still on your own (but you can still use all the methods discussed in Vol. III’s chapter on Fault Tolerance).

Now, an (apparently significantly longer) list of IaaS cloud cons (as it applies to games):

  • If you are using your system 100% of the time, cloud prices are usually significantly higher than renting the same computing power.7 As of early 2017, it was possible to rent a “workhorse” 1U/2-socket server with 2×8=16 cores, 64G RAM, and 8x2T HDDs, residing in a very decent datacenter, for about $150/month. To rent comparable computing power (though without HDDs) from a leading-but-not-overly-expensive cloud provider, the price with per-hour billing was $0.862/hour – or about $630/month, which is a wallet-blowing four times more expensive(!).8
    • In some scenarios, however, higher per-hour pricing can be compensated by elasticity. For a detailed analysis see “Economics of Cloud” section below.
  • Traffic pricing. Quite a few games out there are rather heavy traffic consumers. For example, for a “typical” simulation server capable of running 1000 players with 100kBit/s going to each of the players, with a non-cloud hosting ISP we’re speaking about an “unmetered 100Mbit/s connection”, which (as of early 2017) can be obtained for as little as $20/month.9 However, in the cloud, a similar amount of traffic (13 terabytes10) will cost you several hundred dollars; that’s of the order of a 10x price difference(!).
  • Higher and unpredictable latencies. Due to the very nature of virtualized clouds, which need to move instances around, there are occasional “latency spikes” of the order of hundreds of milliseconds – and sometimes going into seconds.
    • There is a way to mitigate it – by using so-called “bare-metal clouds” (which are essentially merely ultra-fast-provisioned servers with per-hour billing).
  • Significantly less control over the exact location of your server. The reason, once again, is clouds moving instances around; and once again, “bare-metal clouds” do mitigate it.
  • Inability to customize hardware. By design, clouds are built from commodity server boxes. While to a certain extent this holds for all hosting ISPs (i.e. with any hosting ISP you won’t be able to use exotic hardware – at least unless you’re doing co-location), cloud providers offer even fewer options to choose specific hardware than hosting ISPs do for rented servers. In particular, there are at least two options which are important for certain subsystems and obtainable from most traditional hosting ISPs, but are not available in clouds:
    • Larger 4S/4U boxes. Historically, there are two “standard” sizes for server boxes: (a) smaller “workhorse” 2S/1U boxes, and (b) larger 4S/4U server boxes.11 The latter are more expensive per CPU cycle – but on the positive side, they tend to have significantly longer MTBFs (Mean Time Between Failures, i.e. the predicted elapsed time between inherent failures of a system during operation). This, in turn, comes in handy if we want to avoid dealing with fault tolerance12 for a few critical server boxes, such as database server(s). In short, 4S/4U server boxes (coming from the Big Three server manufacturers) tend to have MTBFs in the range of 5-10 years, and usually they are the most reliable option available in practice; for more discussion, see Vol. III’s chapter on Fault Tolerance.
    • For OLTP databases such as those used in games,13 the latency of disk writes is important. With a hosting ISP, we can easily get a BBWC RAID card (more on it in [[TODO]] section below), bringing disk write latencies pretty close to the latency of a PCIe round trip – usually in the dozens of microseconds. In the cloud, we’re often speaking about distributed cloud storage (with latencies in milliseconds – about 100x higher(!)). Even for low-latency on-cloud-box SSDs (which are going to cost you even more than those already-high numbers above), we’re usually speaking about hundreds of microseconds, still much higher than good ol’ RAID-with-BBWC (coming from the previous century).
  • Need to handle resource allocation failures. To be profitable, cloud services need to keep the balance between the hardware-they-have and the hardware-they-really-use-to-run-their-services. If too many cloud users request CPU at the same time, well, the cloud provider will need to decide which customers get priority, and which are left without servers while the load spike persists.14 Sure, developments such as EC2’s “spot instances” help to establish clear priority rules15 and mitigate this problem, but IMO it is still imprudent to build a cloud-based system with no handling of the “CSP refused to allocate new server” scenario.
    • To handle such a “cloud resource allocation failure” scenario, to the best of my knowledge, two ways are possible: (a) to have a backup CSP, where your system will automatically go in case of primary CSP failing to allocate a new server for you; (b) to have an ability to reduce system load (like stopping lower-priority games, etc.). However, implementing (a) may be tricky for faster games (where “bare-metal cloud servers” are often necessary), and (b) strongly depends on the specifics of your game.
  • If your servers are virtualized, you won’t be able to use virtualization techniques of your own, including such a potentially useful thing as VM-based fault tolerance (such as VMware Fault Tolerance or Xen Remus).
    • In theory, cloud providers may include them into their offerings – but I don’t know of any provider doing it yet.
    • Bare-metal clouds are exempt from this restriction.
    • If considering VM-based fault tolerance – beware of additional latencies it creates (for more discussion – see Vol. III’s chapter on Fault Tolerance)
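The back-of-the-envelope arithmetic behind the first two cons can be sketched in a few lines of Python. Prices are the early-2017 published figures quoted above; the 730-hours-per-month convention and the “10 bits per payload byte” allowance for protocol overhead are assumptions of this sketch, not figures from any provider. Either way, the monthly traffic lands in the 13-16 terabyte range.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12 (assumed convention)


def monthly_cloud_cost(price_per_hour, hours=HOURS_PER_MONTH):
    """Cost of keeping one per-hour-billed instance running all month."""
    return price_per_hour * hours


def monthly_traffic_tb(players, kbit_per_player, utilization,
                       bits_per_byte=8, days=30):
    """Outgoing traffic per month, in terabytes (10**12 bytes)."""
    bits_per_second = players * kbit_per_player * 1000 * utilization
    total_bits = bits_per_second * days * 24 * 3600
    return total_bits / bits_per_byte / 1e12


# Early-2017 figures quoted in the text
rented = 150.0                       # $/month: 2x8 cores, 64G RAM, 8x2T HDD
cloud = monthly_cloud_cost(0.862)    # comparable compute, per-hour billing
print(f"cloud ${cloud:.0f}/month, {cloud / rented:.1f}x the rented server")

# 1000 players x 100 kbit/s at 50% average utilization
print(f"{monthly_traffic_tb(1000, 100, 0.5):.1f} TB/month (8 bits/byte)")
print(f"{monthly_traffic_tb(1000, 100, 0.5, bits_per_byte=10):.1f} TB/month "
      "(10 bits/byte, allowing for protocol overhead)")
```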

7 NB: through this section, I am speaking only about published pricing; any kind of special deals which might be available from service providers (especially for high-profile games), are not included.
8 Sure, you can buy the same thing with per-month billing, but (a) you will lose all the elasticity, and (b) it will still be about 2x more expensive than dedicated servers. As for ultra-cheap cloud providers such as Linode or Digital Ocean – they bill per month and provide per-core pricing which is comparable to the cost of renting servers; however – as with any per-month billing, there are no elasticity benefits, and other considerations – such as latencies and their DDoS policies – are pretty bad for game servers.
9 YMMV, no warranties of any kind, batteries not included.
10 Assuming 50% utilization – see Fig 21.1 in “Economics of Cloud” section below for an example.
11 there is a third common size – 2S/2U, but its uses are quite specific and relatively limited
12 As it was discussed in Chapter [[TODO]], fault tolerance itself is also often error-prone and tends to cause quite a bit of trouble – and this can include MTBF being decreased because of faulty fault tolerance mechanisms.
13 especially for those which use single-write-connections which I recommend, see Chapter [[TODO]] for discussion.
14 and no CSP I know provides sufficient remedies from such failures in their respective SLAs.
15 of course, for operational servers of our game, we should not use “spot instances”; they’re important because other users of the cloud may use them, generating profit for CSP – while we’re still enjoying prioritized access to cloud resources.

 

Bare-Metal Clouds

One relatively recent and certainly welcome addition to the cloud scene is so-called “bare-metal cloud” servers. Essentially, they are just good old leased/rented servers offered by traditional hosting ISPs, but ideally with two significant improvements:

  • Ultra-fast deployment (within 2-3 minutes)
  • Ability to prepare your own disk image on one of these “bare-metal servers”, to store this image with the provider, and then to request new instance from this stored image
    • This way, first you prepare your own disk image of your own configuration – together with your preferred OS with its settings, with your own apps etc. etc. – and then you are able to deploy the whole thing within 2-3 minutes after you realize that you need this new server box. Very neat – and if not for exorbitant cloud pricing, I would say that this is exactly the way to go.
    • I am not sure that all the providers who are speaking about “bare-metal” clouds allow you to store your own image with them. However, for those ISPs which don’t, I have very serious doubts that they are usable for games. Bottom line: make sure to double-check it with your CSP of choice.

Economics of Cloud

Now, to be more precise about “cloud vs non-cloud” pricing (and answering “hey, you’re saying that cloud is more expensive, but thousands of companies tell it is cheaper” considerations) – let’s take a closer look at the economics of the cloud.

Keeping that “if you rent on-demand cloud for a month, you’ll pay 4x more” observation in mind, let’s consider several use cases.

In the first use case, we have a perfectly flat distribution of the load over the time; in the second use case – we have a significant load for 6 hours every 30 days.

In the first use case (keeping the “cloud is 4x more expensive” observation in mind), cloud is going to be 4x more expensive than leasing servers from a traditional hosting ISP. However, in the second use case, cloud will be 30x less expensive than leasing servers (just because leased servers will sit idle for over 99% of the time).

In other words –

Cloud-vs-leased pricing is all about typical load patterns for your specific business.
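The first two use cases can be checked with a tiny Python model. Load is measured in servers, costs are in units of “one rented server for one hour”, and the only assumption is the 4x cloud premium from the text (no reserve, no traffic).

```python
HOURS = 30 * 24        # one 30-day month, hourly resolution
CLOUD_PREMIUM = 4.0    # cloud price per server-hour / rented price per server-hour


def rented_cost(hourly_load):
    """Rented servers are paid for around the clock and sized for the peak."""
    return max(hourly_load) * len(hourly_load)


def cloud_cost(hourly_load):
    """Cloud servers are paid for only in the hours they actually run."""
    return CLOUD_PREMIUM * sum(hourly_load)


flat = [1.0] * HOURS                        # use case 1: perfectly flat load
spike = [1.0] * 6 + [0.0] * (HOURS - 6)     # use case 2: 6 busy hours per month

print(cloud_cost(flat) / rented_cost(flat))     # cloud is 4x more expensive
print(rented_cost(spike) / cloud_cost(spike))   # cloud is 30x cheaper
```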

Now, let’s look at the 3rd use case – which is much more tailored to games. As this 3rd use case, let’s consider the load of a “global” game server over the course of a day; “global” here means that the game serves all the markets across the globe from the same set of servers (i.e. there is only one Huge Server Farm, and there are no such things as “NA Server”, “West Europe Server”, and so on). For such “global” games, a somewhat “typical” graph of load distribution over time can be seen in Fig 21.1:

Fig 21.1

If we take this load pattern and try to compare the costs for the cloud and traditional hosting, we’ll see that:

  • Assuming that we have enough load to use lots of servers – the cost of the cloud will be proportional to the integral of this function.
  • To estimate the costs for traditional hosting with servers rented per-month, we’ll need to take the maximum load (and to have some reserve over it too – in practice, for larger games a 20% reserve is not that bad a starting number).

Performing these calculations (and still assuming cloud per-hour pricing to be 4x higher than traditional one) – we’ll see that

for the type of load curve shown on Fig 21.1, cloud is about 2x more expensive.16

For our 4th use case, we’ll consider a game with “regional” servers (such as an “NA Server” serving all of North America, with a different server serving Europe, and so on); as discussed in Chapter [[TODO]], such deployment configurations are popular for very-latency-critical games (such as MOBAs or FPS). A typical load graph for this type of game is shown in Fig 21.2:

Fig 21.2

If we, once again, calculate the costs of the cloud-based solution vs the costs of server rental (and under the same assumptions as above) – we’ll see that

for the load graph shown on Fig 21.2, cloud costs are almost exactly the same as traditional server rental costs

(in practice, however, for most of the games traditional servers will still win because of significantly lower traffic costs).
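Both calculations boil down to one ratio: cloud cost / rented cost = premium × (average load / peak load) / reserve. The Python sketch below illustrates it with made-up 24-hour curves that merely mimic the shapes described (a broad “global” curve whose average is 60% of peak, and a “regional” one with a sharp 18-22 peak whose average is 30% of peak); they are not the data behind Fig 21.1 and 21.2, and traffic is ignored. Repeating a daily curve over a month leaves the ratio unchanged.

```python
CLOUD_PREMIUM = 4.0   # cloud vs rented price per server-hour (from the text)
RESERVE = 1.2         # rented capacity = 120% of peak load


def cloud_cost(load):
    # proportional to the integral (here: the sum) of the load curve
    return CLOUD_PREMIUM * sum(load)


def rented_cost(load):
    # proportional to peak load plus reserve, paid for every hour
    return RESERVE * max(load) * len(load)


def cloud_to_rented(load):
    return cloud_cost(load) / rented_cost(load)


# Made-up hourly curves (24 values each, peak = 100), NOT the data of Fig 21.1/21.2
global_day = [40] * 6 + [50] * 4 + [70] * 6 + [100] * 2 + [70] * 4 + [50] * 2
regional_day = [10] * 6 + [15] * 6 + [25] * 6 + [100] * 4 + [10] * 2

print(f"global:   cloud costs {cloud_to_rented(global_day):.2f}x rented")
print(f"regional: cloud costs {cloud_to_rented(regional_day):.2f}x rented")
```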


16 and that’s even without taking traffic costs into account

 

What About those Big-Name Games??

From the analysis of typical load patterns above, it seems that for games,17 it doesn’t make much sense to use cloud.

On the other hand, there are quite a few big-name games out there which are very loud about being cloud-hosted, and I certainly do not want to say that they’re idiots-not-able-to-count-their-money.

There are two important considerations which may still affect the balance of pros and cons in favor of the cloud. The first consideration is that I have strong suspicions that at least for some of these Big Name Games, some kind of a special pricing deal exists between the game and the cloud provider. Which, in turn, means that

their experience and calculations may NOT apply to your specific case.

Yes, with cloud hosting being a big business, CSPs may be really eager to get some Big Name Game™ customer onto their system (and to make all kinds of buzz about it); to get such a customer, they’re often willing to provide these Big Name Games™ with all kinds of support unimaginable for usual customers, and willing to drop their pricing a lot too. The problem, however, is that if you’re not The Big Name Game (yet), then most likely you won’t get that preferential treatment.

The second consideration which happens to be in favor of cloud, is much more interesting. It is related to an observation that there are other irregularities of the traffic beyond daily variations shown on the graphs above; as any deviation from flat load distribution makes the cloud more viable price-wise – such additional variations are certainly playing on the pro-cloud side.


17 if we use publicly available pricing (which indicates that cloud is 4x more expensive than renting/leasing servers on per-month basis) – and assuming that we need to deal only with intra-day load variations

 

Beyond Intra-Day Variations – Cloud rulezz?

For those daily load graphs above, there is one thing to be understood: these graphs represent just typical intra-day variations. However, in practice, beyond intra-day variations – there are quite a few other load variations:

  • Weekly variation. Usually (no warranties of any kind) – it happens to be not too drastic (though still improving viability of the cloud-based solutions).
  • Seasonal variation. This one can be quite significant – but OTOH, it is usually slow enough so it can be handled with per-month rented servers quite efficiently.
  • Event-, ad-, and promotion-driven variations. Now we’re speaking. These are variations which can be Really Big™ (and the better your marketing team is working, the bigger these variations will be). I’ve seen promotion-driven traffic reach 2x the “normal” peak-time traffic, and have heard about 5x and more.

And

If your promotion-driven traffic is 5x over normal daily peaks – cloud will probably become less expensive than renting servers

(if you reserve rented servers to run promotional peaks – most of your servers will be sitting idle most of the time, when there is no peak).

This represents quite a compelling case for using the cloud-based services (keeping in mind latency restrictions and using bare-metal cloud when necessary). Still, it is not the last point in our cloud-vs-rented-servers analysis.

Hybrid (Rented+Cloud) Deployments

While promo-based variations above look as a solid justification for cloud-based deployments, let’s hold our horses before joining that crowd-of-companies running full-speed into the cloud. In fact, as promotions need to be quite rare – it would be a waste to pay 4x more for cloud all the time when promotions are not running.18

In fact, the following “hybrid” model works better than both pure cloud-based and pure rental-based ones:

  • We’re renting servers on a per-month basis to cover most of our regular load. For a typical load graph like the one in Fig 21.1, I’d probably argue for having all the usual load handled by rented servers.19 For a typical load graph like the one in Fig 21.2, intuitively I’d say that we should rent enough servers to cover about 20 hours of operation (i.e. all day except the peak from 18 to 22); more precise analysis shows that the optimum for the load in Fig 21.2 is around renting enough servers to handle 70’000 simultaneous players20 – moving any load above this number to the cloud.
  • For everything-else-which-goes-above-rented-capacity – we’re using cloud. This will be necessary to cover:
    • Ad-, event-, and promotion-driven load spikes
    • Any unexpected spikes (however rarely, they do happen)
    • Regular sharp load spikes (such as the daily spike on Fig 21.2)
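A minimal Python sketch of this spill-over logic, which also folds in the “CSP refused to allocate a new server” handling discussed among the IaaS cons: rented capacity is filled first, then the primary CSP, then a backup CSP, and whatever is still missing is reported so the game can reduce load. `CloudProvider`, `try_allocate()` and `place_load()` are hypothetical names; real CSP APIs, provisioning delays (even the 2-3 minutes of bare-metal clouds) and per-game placement are all glossed over.

```python
from dataclasses import dataclass


@dataclass
class CloudProvider:
    """Hypothetical stand-in for a CSP: may grant fewer servers than requested."""
    name: str
    available: int   # servers this CSP is willing to give us right now

    def try_allocate(self, wanted: int) -> int:
        granted = min(wanted, self.available)
        self.available -= granted
        return granted


def place_load(demand, rented_capacity, providers):
    """Cover `demand` (in servers): rented boxes first, then each CSP in turn.

    Returns (placement, shortfall); a non-zero shortfall is the
    "CSP refused to allocate" case, to be handled by shedding
    lower-priority load (e.g. not starting new low-priority games).
    """
    placement = {"rented": min(demand, rented_capacity)}
    missing = demand - placement["rented"]
    for csp in providers:
        if missing == 0:
            break
        granted = csp.try_allocate(missing)
        placement[csp.name] = granted
        missing -= granted
    return placement, missing
```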

Let’s take a closer look at the costs of the cloud-vs-rented-vs-hybrid for both “global” and “regional” daily load distributions (represented by Fig 21.1 and Fig 21.2 respectively); for each of these loads we’ll consider two cases: one taking them “as is”, and another one – using the same load plus assuming that every month, there is one day when you’re running promotions – and that the load during promotions is 5x higher than usual.

 

Table 21.1 below shows the monthly costs for different models (normalized to the cost of cloud):

| Type of Load21 | Cost (rented servers)22 | Cost (cloud)23 | Cost (“hybrid”24) |
|---|---|---|---|
| “global” game (Fig 21.1), without promotional days | 0.47 | 1.0 | n/a25 |
| “regional” game servers (Fig 21.2), without promotional days | 1.03 | 1.0 | 0.51 |
| “global” game (Fig 21.1), with promotional days 5x higher than regular ones | 2.2 | 1.0 | 0.53 |
| “regional” game servers (Fig 21.2), with promotional days 5x higher than regular ones | 4.5 | 1.0 | 0.57 |

NB: costs in the Table 21.1 do not include traffic costs, so real-world numbers will be different – and depending on your specific pricing. However, it should be sufficient to illustrate the basic idea behind “hybrid” rented+cloud deployments.

As we can see, “hybrid” rented+cloud model – with handling most of “flat” load on the cheaper rented servers, and handling load spikes on the cloud – tends to work much better than both “pure” rented servers (which become too expensive to handle huge occasional loads such as promo loads), and “pure” cloud (which is too expensive to handle flat loads26).
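Where exactly should the rented/cloud boundary go? Under a simplified model (costs in rented server-hours, cloud 4x more expensive per hour, per-hour granularity, no reserve, no traffic), renting capacity C over T hours costs C·T + 4·Σ max(L(t) − C, 0). This is convex and piecewise-linear in C, with slope T − 4·(number of hours with L > C), so the optimum is the level which the load exceeds for about a quarter of the hours. The Python sketch below finds it by brute force over a made-up “regional” daily curve with a sharp 18-22 peak (not the data behind Fig 21.2), with and without one 5x promotional day, so its numbers will not match Table 21.1.

```python
CLOUD_PREMIUM = 4.0   # cloud price per server-hour relative to rented


def hybrid_cost(load, rented):
    """Rented capacity is paid every hour; only overflow above it goes to cloud."""
    overflow = sum(max(x - rented, 0) for x in load)
    return rented * len(load) + CLOUD_PREMIUM * overflow


def best_rented_capacity(load):
    # hybrid_cost is convex and piecewise-linear in `rented`, with kinks at
    # the observed load levels, so the minimum is at one of them (or at zero).
    candidates = sorted(set(load) | {0})
    return min(candidates, key=lambda c: hybrid_cost(load, c))


# Made-up "regional" day: low load, then a sharp peak from 18:00 to 22:00
regional_day = [10] * 6 + [15] * 6 + [25] * 6 + [100] * 4 + [10] * 2
month = regional_day * 30
promo_month = regional_day * 29 + [5 * x for x in regional_day]  # one 5x day

for name, load in (("regular", month), ("with promo day", promo_month)):
    c = best_rented_capacity(load)
    pure_cloud = CLOUD_PREMIUM * sum(load)
    print(f"{name}: rent {c}, hybrid/cloud = {hybrid_cost(load, c) / pure_cloud:.2f}")
```

For this invented curve, the optimum lands at the top of the off-peak plateau in both cases, so only the evening peak (and the promotional day) goes to the cloud; that is the same shape of answer as described above, even though the specific numbers differ.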

BTW, there are already quite a few games out there which are using such a “hybrid” approach; moreover, I’ve heard about a game which went further and is using three different sets of servers: IIRC, one set of servers was rented on a per-month basis, another one was a cloud with per-day billing, and the third set was a cloud with per-minute billing; each subsequent set is more expensive than the previous one, but allows for better granularity, saving costs in the long run.

NB: let’s keep in mind that if our game is fast-paced (which is likely for the load pattern in Fig 21.2, typical for per-continent datacenters), we’ll most likely need to use “bare-metal” servers for the “cloud” part of the load, to avoid virtualization-related latencies; fortunately, more and more CSPs are providing them (though prices vary greatly).


18 in fact, if promotions are running all the time, the distribution becomes flat, and the whole argument falls apart.
19 formally, the optimum will be a bit lower than that, but most likely the gain won’t be worth the trouble.
20 NB: this number doesn’t account for traffic costs
21 for all types of daily load, we’re assuming one “promotional” day per month
22 with a 20% reserve over the peak promotional load
23 as noted above, we’re normalizing everything to the cost in the cloud, so in this table cloud costs are “1.0” by definition
24 along the lines outlined above
25 for this type of load, we don’t really need anything better than rented servers
26 even more so if we take traffic into account

 

[[TODO: add payments/analytics/CRM/content generation/content distribution – these might be better suited for the cloud, esp. analytics and content generation, which have “spiky” loads; content distribution is also a good candidate, as standard HTTP-based stuff such as CDNs can be trivially reused]]

Summary on Rented-Servers-vs-Cloud

A brief summary of the musings above:

  • If the load is flat, cloud is significantly more expensive than rented servers (as of early 2017, 4x more expensive)
    • In addition, traffic prices for cloud servers tend to be very high too (up to the point of being outright atrocious)
  • On the other hand, for load spikes, cloud is much cheaper than rented servers (just because the rented servers will sit idle most of the time waiting for that load spike)
  • Quite often, it makes sense to use a “hybrid” scheme, covering “usual” loads with rented servers, and load spikes above them with the cloud.
    • It also allows you to mitigate quite a few restrictions of “pure” cloud deployments; for example, in “hybrid” deployments you may still use a “rented” database server with BBWC RAID, VM-based fault tolerance for critical servers, etc.
  • For low-latency games – “bare-metal cloud” may be necessary.
  • For all kinds of cloud deployments – make sure to have a way to handle cloud server allocation failures.
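Handling cloud server allocation failures can take many forms. One possible approach, sketched below in Python purely under our own assumptions, is to keep an ordered list of capacity sources (for example, a preferred cloud region, then another region or another CSP), to fall back through them, and to leave the final “nothing available” case to an explicit degradation policy. `AllocationError`, `Pool`, and `allocate_with_fallback()` are hypothetical names, not any CSP’s API; a real allocator would also log failures and back off between retries.

```python
from typing import Callable, Optional, Sequence, Tuple


class AllocationError(Exception):
    """Raised by a (hypothetical) pool when it cannot start a server."""


# A pool is a name plus a callable that starts one server and returns its id.
Pool = Tuple[str, Callable[[], str]]


def allocate_with_fallback(pools: Sequence[Pool],
                           attempts_per_pool: int = 2) -> Optional[Tuple[str, str]]:
    """Tries pools in order of preference (e.g. cheapest first).
    Returns (pool_name, server_id), or None if every pool refused;
    in the latter case the caller has to degrade gracefully."""
    for name, allocate in pools:
        for _ in range(attempts_per_pool):
            try:
                return name, allocate()
            except AllocationError:
                continue  # real code would log this and maybe back off
    return None


if __name__ == "__main__":
    def no_capacity() -> str:
        raise AllocationError("no capacity in this region")

    pools = [("cloud-region-a", no_capacity),
             ("cloud-region-b", lambda: "srv-42")]
    result = allocate_with_fallback(pools)
    if result is None:
        print("no servers available: stop admitting new players")
    else:
        print("allocated", result[1], "in", result[0])
```

The important design point is that `None` is a first-class outcome: the code calling the allocator has to decide in advance what the game does when no capacity can be obtained (for example, temporarily refusing new players), rather than discovering it during a promotion.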

[[To Be Continued…

This concludes beta Chapter 23(a) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”.

Stay tuned for beta Chapter 23(b), where we’ll discuss how to choose your ISP/CSP]]


Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.


Comments

  1. Dahrkael says

    Very good explanation!
    For sure getting billed for bandwidth is a killing move for games, good ol’ physical servers ftw. maybe even using some 3-minute-deploy monthly-billed VPS for peaks does the trick instead of using cloud instances.

    One thing about video-streaming games: shouldn’t it use UDP to stream the video as any decent realtime video streaming application? Why would head of line blocking be a problem? I’m pretty sure that lost frame is not crucial to play the game.

    • "No Bugs" Hare says

      > One thing about video-streaming games: shouldn’t it use UDP to stream the video as any decent realtime video streaming application? Why would head of line blocking be a problem? I’m pretty sure that lost frame is not crucial to play the game.

      These days, any half-decent video compression doesn’t send frames one by one (the last one doing it, IIRC, was MJPEG – which really, really suxxx). Instead, modern compressors (including H.264, VC-1, and so on) compress portions of the stream starting with a so-called “key frame”, and the distance between “key frames” can be like minutes (and if we reduce this distance, we’ll be increasing the required bandwidth A LOT). As a result, dropping even one byte makes it impossible to decode the whole thing until the next “key frame”, and this is exactly head-of-line blocking. Which means that for modern video streaming (unlike for VoIP) there isn’t much difference between TCP and UDP :-((.
