Server-Side Hardware for MOGs - IT Hare on Soft.ware

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[Chapter 23(c) from “beta” Volume VII]

After all the (admittedly boring) discussions of services provided by different types of ISPs – we can get to something which makes quite a few developers’ hearts race <wink />. I’m speaking about Server-Side hardware.

Side note: maybe it is just me – but I am indeed very fond of choosing my hardware, especially at home or for Server-Side production. And in practice – BOTH for home devices, AND for servers, it DOES pay to know what exactly you’re shopping for.

RENTING our Hardware

First of all, let’s re-iterate that most of the time we won’t be speaking about co-location (as discussed above – it is better to avoid colo except for some rather special cases). As a result –

As a Big Fat Rule of Thumb™, we won’t be BUYING our hardware.

Instead – we’ll be RENTING it from ISP guys.¹

This, in turn, restricts our choices of the available server hardware. We cannot say “hey, I built this nice tower with four GTX 1080 Tis, please put it into the datacenter”; nor can we often go for very-nice-but-scarcely-available-for-rent server boxes such as HP DL580 (however, a similar Dell R930 can be found much more easily).

¹ Usually on per-month basis, though per-hour basis becomes more and more available – for a price, of course.

Typical Server-Side Network Diagram

Also, let’s note that most of the time, we won’t need just servers; rather, for any serious MOG, we’ll be looking for the following hardware:

Servers, including:
- LOTS of “workhorse” 2S/1U – for sure
- SOME 2S/2U boxes – mostly for storage, though YMMV
- A FEW 4S/4U boxes – likely. While quite a few people argue that 4S/4U servers are a waste of money – for some of the servers it is not that obvious, more on it below.
Ethernet Switch(es)
- Usually – the managed one(s), with VLANs etc.
Firewall(s)
- If we’re going to get our latency-critical traffic firewalled – we MUST look for low-latency firewalls (which BTW is not going to be easy).

Here is a typical network diagram for a very very basic MOG deployment (it is more typical for slower MOGs):

With such slower-but-more-secure deployments, our uplink Internet port goes to our firewall; nothing else is connected directly to the Internet – so the firewall can control all the traffic.

Several important points about the diagram on Fig 21.1:

Everything which is on the right side of the firewall – goes over the “private” connections/network we mentioned when we spoke about ISP-provided services. BTW, traffic over the “private” network should not be billed by your ISP – but make sure to check whether this is really the case(!)
Ethernet connections shown, may go between your boxes directly, or can go from one rack to another one via some kind of patch panel; it doesn’t really matter – and we don’t care about it at all.²
Switch we want to use – should be a “managed” switch with support for so-called “VLANs”=”Virtual LANs”. Very basically – we can consider each VLAN “as if” it is a completely separated LAN (in spite of going via the same physical switch).

Now about those VLANs. On the diagram on Fig 21.1, we have three of them: DMZ VLAN, “Trusted” VLAN, and “MGMT” VLAN. Let’s consider them one by one:

DMZ VLAN is a traditional way to separate web/e-mail traffic from the rest of your system (web/e-mail servers are traditionally considered a high-risk from security perspective).
“Trusted” VLAN is where all your traffic (pre-filtered by a firewall) goes.
“MGMT” VLAN is quite an interesting beast. Its only purpose – is to manage your servers (using so-called “out of band management” such as IPMI, iLO, DRAC, or IMM; more “out of band management” below). As a good security practice:
- no “normal” traffic ever goes over MGMT VLAN – only control traffic from admins.
- the only traffic which can get into MGMT VLAN from firewall – is VPN traffic from your admins.

As it was noted above – the diagram on Fig 21.1 is more typical for slower-paced games; for fast-paced games – it is often modified to something along the lines of the Fig 21.2:

Here, we have another VLAN – “Game” VLAN, which handles latency-critical game traffic. This traffic comes directly from the Internet – but goes only to Game Servers. And to allow Game Servers to reach DB Server – we’re still using “Trusted” VLAN (in this way, we’re ensuring that at least our DB Server cannot be attacked directly from the Internet). NB: at least if your Game Server is a Linux one, it is possible to use the same physical NIC to participate in both “Game” VLAN and “Trusted” VLAN without losing much security; this is achieved by using IEEE 802.1Q “tagging” (you need a switch supporting 802.1Q too, but this is rarely a problem for anywhere-serious managed switches).
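To make the “tagging” a bit less magical, here is a minimal Python sketch of what an 802.1Q tag looks like on the wire: 4 extra bytes inserted right after the two MAC addresses, carrying the VLAN ID. The VLAN numbers (10 and 20) and the helper names are purely illustrative assumptions; in real life the tagging is done by the OS and the switch, not by your code.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    # TCI layout: priority (3 bits), DEI (1 bit, left at 0), VLAN ID (12 bits)
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def vlan_id_of(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return (tci & 0x0FFF) if tpid == TPID_8021Q else None


# Hypothetical VLAN numbers; the real ones are whatever is configured on the switch.
GAME_VLAN, TRUSTED_VLAN = 10, 20

# Dummy frame: zeroed MACs, EtherType 0x0800 (IPv4), placeholder payload.
untagged = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + b"payload"
to_game = add_vlan_tag(untagged, GAME_VLAN)
to_trusted = add_vlan_tag(untagged, TRUSTED_VLAN)
```

The same physical NIC sends both `to_game` and `to_trusted`; the switch looks at the 12-bit VLAN ID and keeps the two kinds of traffic apart.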

About two uplink ports on the Fig 21.2: nothing really prevents us from using just one uplink port to connect both firewall and “Game” VLAN (to do it, we just need to connect the left side of the firewall, as well as Uplink port, back to switch; both should be connected to ports configured as a part of our “Game VLAN”).

² well, as long as all the cables are ok – and Ethernet cables are known to be a big source of problems <sad-face />

Hardware

For our current purposes, these diagrams are especially interesting, because they show us pretty much all the types of boxes we need to have. On the other hand, we should point out that both diagrams on Fig 21.1 and on Fig 21.2 are extremely basic – and that for any anywhere-serious deployment, you’ll need significantly more than this (in particular, for the time being we’re ignoring redundancy entirely). However – they should be enough to give an idea of what we should expect on our Server-Side network-wise. Also, if you need more than this (for example, you need a Load Balancer³) – you should be able to find a place for it on the diagram yourself.

Now, let’s consider all the boxes on these diagrams one by one, and let’s start with non-Server boxes first.

³ Though as it was discussed in Vol. III’s chapter on Server-Side Architecture, I am not that fond of hardware Load Balancers.

Switch

When speaking about switch – it will be almost-universally present, though sometimes it will be a part of ISP-provided “private network” connectivity. If it is the case – you are in luck (just make sure that you’ll be able to specify several VLANs where your servers should be connected).

If VLANs are not a part of the service provided by ISP – you should be able to rent a switch from them; just as a ballpark number – as of 2017, it was possible to rent a 24-port 1Gbit/s managed switch for about EUR 50/month (and as you can see from the diagrams above – 24 ports is not really much when it comes to switches).

BTW, if you ever need more than 24 ports – keep in mind that you can cascade switches (though inter-switch connection is rather tricky – so make sure to read about “trunking” and think about separating different VLANs to different physical switches to reduce risk of inter-switch connection being overloaded).

Firewall

As for the firewall – the choice of it depends greatly on a question “do we want to firewall our gaming traffic?” (on Fig 21.1 we do firewall it, and on Fig 21.2 we don’t).

Low-Latency Firewalls – Packet-Only Filtering

As a rule of thumb, if our game is fast-paced, AND we want to firewall our gaming traffic – we need a low-latency firewall capable of handling LOTS of small packets (beware: “being able to handle 1Gbit/s in general”, and “being able to handle 1Gbit/s in 60-byte packets” are often two very different things).
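To see why small packets hurt so much, it helps to convert link speed into packets per second. The sketch below is back-of-the-envelope arithmetic, assuming that “60-byte packets” means minimum-size Ethernet frames (60 bytes plus the 4-byte FCS) and that each frame also costs 20 bytes of preamble and inter-frame gap on the wire; the function name is made up for illustration.

```python
ETHERNET_OVERHEAD_BYTES = 8 + 12  # preamble/SFD + minimum inter-frame gap
FCS_BYTES = 4                     # frame check sequence appended to every frame


def max_packets_per_second(link_bits_per_sec: float, frame_bytes_without_fcs: int) -> float:
    """Upper bound on frames per second for a given link speed and frame size."""
    on_wire_bytes = frame_bytes_without_fcs + FCS_BYTES + ETHERNET_OVERHEAD_BYTES
    return link_bits_per_sec / (on_wire_bytes * 8)


small = max_packets_per_second(1e9, 60)    # minimum-size frames: ~1.49 million packets/s
large = max_packets_per_second(1e9, 1514)  # full-size frames:    ~81 thousand packets/s
print(f"{small:,.0f} pps vs {large:,.0f} pps ({small / large:.1f}x more packets to inspect)")
```

In other words, under these assumptions the same 1Gbit/s can mean roughly 18x more per-packet work for the firewall when traffic consists of small packets.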

When speaking about low latencies and high traffic, one should keep away from quite a few vendors who are just putting Linux distro on their hardware, then get each packet into user space – and process it there, incurring kernel-user-kernel penalty for each packet. Such boxes (especially those oriented on throwing in as many complicated checks as possible) tend to be overwhelmed very easily by serious gaming traffic (which tends to consist of tons of small packets).

With regards to low-latency firewalls, I had very good experiences with Netscreen firewall appliances (now Juniper, though Juniper did discontinue the Netscreen line and replaced it with the supposedly-better SRX series); also I’ve heard good things about higher-end Cisco ASAs. Whether your ISP will provide any of these – depends, but I’ve seen up to ASA 5545 available for rent from a hosting ISP; ASA 5545 handles up to 1Gbit/s with sub-millisecond latencies – even when traffic consists of smaller packets (OTOH, it is going to cost you – the one I’ve seen was rented at EUR 650/month).

Last but not least: for our own gaming traffic – we won’t need complicated firewall rules (nor will we be able to write them anyway); this might call for using a simple low-latency packet-filtering firewall for gaming traffic (such a box can be inserted right after the second uplink port on Fig 21.2) – and a complicated state-based monster for all the other traffic.

It means that for gaming traffic – we can at least try to use a simple Linux box with iptables (or FreeBSD-based pfSense) configured to throw away anything but UDP packets going to our own ports; it is not that much filtering – but for gaming UDP traffic, it is quite difficult to do more than that anyway. Note BTW that as netfilter-controlled-by-iptables operates in kernel mode – it will perform significantly better than those userspace firewalls I criticized above.
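As a rough illustration of how little there is to such a filter, here is a Python sketch which generates iptables commands for a Linux box sitting between the uplink and the “Game” VLAN. Everything specific in it is an assumption made for the example: the interface names (`eth0`/`eth1`), the documentation-range IP address, the UDP port, and the helper name itself; treat it as a starting point to adapt, not as a ready-to-use ruleset.

```python
def game_filter_rules(wan_if: str, lan_if: str, server_ip: str, udp_ports: list) -> list:
    """Build stateless iptables commands: forward only UDP to/from our game ports."""
    # Default policy: drop everything that is forwarded through the box.
    rules = ["iptables -P FORWARD DROP"]
    for port in udp_ports:
        # Inbound: Internet -> Game Server, only UDP to one of our own ports.
        rules.append(
            f"iptables -A FORWARD -i {wan_if} -o {lan_if} -p udp "
            f"-d {server_ip} --dport {port} -j ACCEPT"
        )
        # Outbound: replies from the same Game Server port back to the Internet.
        rules.append(
            f"iptables -A FORWARD -i {lan_if} -o {wan_if} -p udp "
            f"-s {server_ip} --sport {port} -j ACCEPT"
        )
    return rules


# Hypothetical values: eth0 faces the uplink, eth1 faces the "Game" VLAN.
rules = game_filter_rules("eth0", "eth1", "203.0.113.10", [7777])
print("\n".join(rules))
```

Note that the rules are deliberately stateless (no connection tracking), which keeps per-packet work low; the flip side is that everything else, ICMP included, gets dropped.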

Stateful Firewalls – Better Protection, Worse Latency and Throughput

For usual (non-gaming or slower) traffic – firewalls tend to be very different from simplistic UDP packet filtering. Here we’re speaking about serious stateful inspection, understanding of higher-level protocols such as TLS/HTTP/etc., and so on. Fortunately – in these cases we don’t usually speak about low latencies (and rarely – about high traffic), so it is not that difficult to find a firewall box for this purpose.

In some cases – ISPs provide such firewalls themselves (I’ve seen it marketed as a “shared firewall”). If it is not an option – you should look at least for a stateful firewall with support for TLS, HTTP, and VPN (simplistic packet filtering isn’t enough for web/e-mail/TLS traffic) – all the way up to so-called UTM firewalls. UTM firewalls can be usually found as appliances – but if you happen to dislike the idea of paying for appliance firewall – or if your ISP doesn’t provide such an option to rent a firewall – you may want to rent a separate server box (or for cloud – a separate VM) and run a software firewall such as untangle, endian, or ClearOS on top of it.

Servers

After we’re done with network stuff – we can start dealing with Servers <at last! />

On Out-of-Band Management

When you’re choosing your rented or co-location servers, one of the most important practical things⁴ is so-called Out-of-Band Management.

KVM switch: a KVM switch (with KVM being an abbreviation for 'keyboard, video and mouse') is a hardware device that allows a user to control multiple computers from one or more sets of keyboards, video monitors, and mice. (Wikipedia)

Most importantly for us now – Out-of-Band Management allows you to:

have remote access to your box (known as Remote Console, KVM-over-IP, etc.). It works as follows: you sit in front of your laptop half a globe away from the server – and see the server’s screen on your laptop (this accounts for the letter V – Video – in the KVM abbreviation; and you can also use Keyboard and Mouse). It is somewhat similar to remote control programs such as TeamViewer etc. – but Out-of-Band Management works completely separately (via separate hardware, a separate network port, etc. etc.) from the hardware which is used by the OS itself; think of it as fully separate hardware which captures video output – and sends it to you using completely independent means. As a result – Out-of-Band Management doesn’t require the OS to work, and can work without the OS being correctly configured or even installed; in fact – it can even show you the BIOS load screen.
- This comes in Really Handy™ if you have changed the configuration of your server box – and by doing so, misconfigured your NIC so that you can no longer access your box over IP to change this configuration back. I know of only two ways out of this situation: (a) “remote hands”, and (b) out-of-band access.
use “remote media” (also known as “virtual media”). This feature works as follows: you, sitting at your laptop a thousand miles from your Server – insert a CD into your laptop, and your Server can see your CD as if it is inserted into the Server itself. This feature allows you to install any-OS-you-want, completely from scratch (and without any help from ISP guys – that is, beyond connecting cables and maybe configuring the Out-of-Band interface once).

With Out-of-Band Management – the only two things you will ever need from ISP guys are to:

connect cables
configure Out-of-Band Management on your Server to use your IP addresses (in case of the diagrams on Fig 21.1-21.2, IPs should be those from whatever-you-designated for your MGMT VLAN).

As for “which Out-of-Band Management to use” – usually, the only realistic option is to stick to whatever-provided-by-your-Server’s-vendor. For HP boxes – it is known as iLO, for Dell – it is DRAC, for IBM/Lenovo – it was renamed several times, with the last name I’ve heard being IMM; for SuperMicro – it is IPMI. I’ve tried the first three of these myself – and all provided more or less the same functionality (and while I didn’t try SuperMicro’s IPMI – it is reported to provide all the necessary capabilities too). A word of warning though – even if your server supports iLO/DRAC/IMM/IPMI – make sure that you (or your ISP) have a license for it, with the license supporting both KVM and “remote media” functionality.

If you’re renting a cloud server – make sure that your ISP does provide something like “console access” to your cloud instances; with “console access” – you usually won’t be able to install the OS from scratch, but you will still be able to fix your IP misconfigurations; this is a bit worse than a full-scale Out-of-Band Management with “remote media” – but is still workable.

⁴ after the obvious stuff such as CPU/RAM/etc.

“Workhorse” Expendable Servers – 2S/1U

With understanding of Out-of-Band Management – we can start speaking about typical server configurations.

Rack Unit: a rack unit (abbreviated U or RU) is a unit of measure defined as 1.75 inches (44.45 mm). It is most frequently used as a measurement of the overall height of 19-inch and 23-inch rack frames, as well as the height of equipment that mounts in these frames. (Wikipedia)

If you’re renting your servers – the most popular choice is a “workhorse” server, which has 2 CPU sockets – while being just 1 Rack Unit (=1.75”, a.k.a. “RU”, or “U”) in height. A combination of these two properties (Sockets and Units) is often abbreviated as 2S/1U.

2S/1U server boxes are among the most popular servers out there; they tend to provide the best price/performance – and are pretty decent overall.

For such a 2S/1U box, you can expect:

2 CPU sockets – ideally with almost-the-latest-greatest CPUs installed. OTOH – YMMV greatly depending on ISP(!); as a rule of thumb – 1-2 years behind “latest greatest” is pretty normal, but make sure that you’re not being sold a 5-year-old CPU for the price of a modern one.
Anywhere from 4G to 256G RAM
Up to 8-10(!) HDDs/SSDs (how they fit them into 1U – is everybody’s guess, but it seems to work)
Usually – hot-replaceable fans, hot-replaceable power sources, hot-replaceable HDDs/SSDs
Usually – S.M.A.R.T. with pre-failure detection of HDDs/SSDs

“Workhorse” 2S/1U Servers are usually your best bet whenever you need a server which should perform well, but for which your system is ready to handle a failure in a more-or-less seamless manner;⁵ in other words – 2S/1U server boxes work the best as cheap expendable servers.

MTBF: mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a system during operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. (Wikipedia)

In game environments – 2S/1U boxes are usually used to run your Game Worlds (and also – as Front-End Servers if you’re using them – see Vol. III’s Chapter on Server-Side Architecture for discussion). The rationale goes along the following lines. First, for a reasonably large game, you’re going to have some dozens of such Game World Servers (“industry standard” number for simulation-and-alike games is around 100 players/core or 1000 players/server box, so to run a mere 50’000 simultaneous players, you’re likely to need 50 of such 2S/1U servers). Second, when you have dozens of servers – chances of one of them failing go pretty high (with typical MTBFs for 2S/1U boxes going at about 2 years, for 50 of them we’ll be speaking about one of them breaking every two weeks or so, ouch! Moreover – even if we use more reliable 4S/4U with MTBFs of 5 years – we’ll still be looking at crashes about every month). Third – as crashing every two weeks and even every month is usually unacceptable – we’ll need to have fault tolerance (or at least some kind of fault mitigation – more on it in Vol. III) anyway. Fourth – as we’re going to build some kind of fault tolerance/mitigation anyway – we need the boxes with the best price/performance ratio, i.e. paying for better reliability doesn’t make too much sense. Fifth – as 2S/1U boxes tend to provide the best price/performance – they are the ones commonly used for Game World Servers.
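The arithmetic behind these numbers is simple enough to sketch in a few lines of Python. The sketch assumes that boxes fail independently and at a constant rate (so a fleet of N boxes fails N times more often than one box); the function names are made up for illustration, and the inputs are the ballpark figures from the paragraph above.

```python
import math


def servers_needed(players: int, players_per_server: int = 1000) -> int:
    """Number of boxes needed at the rule-of-thumb 1000 players per server box."""
    return math.ceil(players / players_per_server)


def days_between_fleet_failures(mtbf_years: float, servers: int) -> float:
    """Average days between failures somewhere in a fleet of identical boxes."""
    return mtbf_years * 365 / servers


fleet = servers_needed(50_000)                        # 50 boxes
cheap_2s1u = days_between_fleet_failures(2, fleet)    # 14.6 days, i.e. about two weeks
pricey_4s4u = days_between_fleet_failures(5, fleet)   # 36.5 days, i.e. about a month
print(fleet, cheap_2s1u, pricey_4s4u)
```

Either way, somebody (or rather something) has to handle a dead box every few weeks, which is exactly why better per-box MTBF doesn’t buy much for Game World Servers.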

BTW, RAIDs, while still being a good thing for such boxes (at least if you’re using fault mitigation rather than full-scale fault tolerance) – do NOT usually require a hardware RAID: a software RAID will do nicely.

Popular examples of 2S/1U servers include:

HP ProLiant DL160 (my personal favorite in this class), DL360
Dell PowerEdge R430, R630
Lenovo (former IBM) x3550
SuperMicro – 1028, 6018, etc.

⁵ NB: as it was discussed in Vol. III’s chapter on Fault Tolerance – for quite a few game-like systems out there, it is not necessary to have ALL your servers as fault-tolerant (very shortly – doing so is complicated, and most of the cheaper redundancy systems actually reduce MTBFs because such redundancy systems themselves act as SPOFs, and happen to be pretty poorly designed and tested as well).

Caveat Emptor (=”Buyer Beware”)

These days, there are ISPs out there who will readily sell you dedicated servers at really bargain prices (as low as $5/dedicated-server/month). However – whenever you’re comparing pricing – make sure that you’re comparing apples-to-apples. All the performance numbers for servers within this chapter (and within this book in general), are assuming that you’re using reasonably-recent Xeon CPUs; however – these days you can be easily sold access to a Raspberry Pi marketed as a “dedicated server”, and anything-else-in-between. Whenever you’re using something other than Xeon, keep in mind that:

Intel Atoms can take as much as 5x performance hit per core compared to Xeons [YCombinator].
Low-voltage Intel x64 (LXXXX series) can be easily 2x-3x slower per-core than Xeons.
ARM can be 10x slower per-core than Xeons.

Oh, and BTW – keep in mind that in spite of all the scalability, per-core performance does matter. In other words – if you have your game running fine on 10 Xeon cores – it doesn’t necessarily mean that it will run on 100 ARM cores.⁶

⁶ there is a minimum per-core performance which is needed for a scalable program to be able to scale (in an extreme example – a million boxes, each capable of doing 1000 cycles/second, won’t cut it) – and quite a few games are written in a way that 10x per-core performance degradation can kill all their scalability.

On Blade Servers

Blade servers are an even-more-packed version of 2S/1U boxes. As a rule of thumb – a “blade” server consists of a 6U/7U/10U “enclosure” – and this enclosure is filled with vertically-oriented “blades”; the whole point of “blades” is that while they look separate from the outside – they reuse stuff such as power, cooling, Ethernet switch, etc. from the chassis, which allows achieving even higher physical density within racks (usually – you can obtain about 2x more CPUs in the same rack space using blades).

In theory – a 2S blade should be “almost-indistinguishable” from traditional 2S/1U server box; however – in practice, I’ve seen blades suffering from quite severe problems (ranging from reliability issues to substantial-packet-loss-within-blade-chassis).⁷ As a result – when having a choice (and unless price difference is significant), I’d rather not use blade-based servers at all.

⁷ granted, it might have been teething problems – but as they were lasting for years, I am not sure whether these problems are really gone.

In-Between/Storage Servers – 2S/2U

Close cousins of 2S/1U servers are 2S/2U servers. While, sitting half a globe away, you won’t see much difference from the server-being-twice-thicker – it makes a difference for ISPs, as the space in datacenters costs money. As a result – 2S/2U boxes tend to cost more than 2S/1U ones (though YMMV very easily, so if you’re offered a 2S/2U box at the price of a 2S/1U one – there is nothing bad/suspicious about it; 4S/4U boxes are a different story – and they are almost-universally more expensive).

Other than being (obviously) larger and (probably) more expensive – 2S/2U servers tend to exhibit the following properties compared to 2S/1U ones:

More HDDs/SSDs can be fit into a 2U chassis (up to 24 SSD/HDD bays).
As a rule of thumb – 2S/2U boxes from the same vendor tend to have somewhat better MTBFs (arranging for air flows in 1U boxes is atrociously difficult – which tends to cause substantial local overheating, and as a result, more problems in the long run).

Overall, there are several potential uses for 2S/2U servers in game-like environments:

As the expendable servers (replacing 2S/1U ones) – but keep pricing in mind.
As entry-level OLTP DB servers. This is facilitated by better storage – and somewhat better MTBFs – compared to 2S/1U servers. For more discussion on DB servers – see below.
As DB Servers for all-purposes-except-OLTP (and as it was discussed in Vol. VI’s chapter on Databases – there can be lots of these: from reporting replicas, and all the way to archives and analytical NoSQL DBs). With 24 bays – we can easily fit up to 48 terabytes per such a box (not accounting for RAID redundancy).
As servers to provide NAS storage (for example, for archive DBs, maybe some huge analytical stuff, etc.).
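To put the “not accounting for RAID redundancy” caveat into numbers, here is a small Python sketch. It assumes 2TB drives (which is what 48 terabytes over 24 bays implies) and, for simplicity, one single RAID group spanning all the bays; the function names and the choice of RAID levels are illustrative assumptions rather than a recommendation.

```python
def raw_capacity_tb(bays: int, drive_tb: float) -> float:
    """Capacity of all the drives put together, before any redundancy."""
    return bays * drive_tb


def usable_capacity_tb(bays: int, drive_tb: float, raid_level: str) -> float:
    """Usable capacity of one RAID group spanning all the bays."""
    if raid_level == "raid10":
        return bays * drive_tb / 2      # every drive is mirrored
    if raid_level == "raid6":
        return (bays - 2) * drive_tb    # two drives' worth of parity
    if raid_level == "raid5":
        return (bays - 1) * drive_tb    # one drive's worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


raw = raw_capacity_tb(24, 2)                 # 48 TB
mirrored = usable_capacity_tb(24, 2, "raid10")   # 24 TB
parity = usable_capacity_tb(24, 2, "raid6")      # 44 TB
print(raw, mirrored, parity)
```

So depending on the RAID level, the same 24-bay box ends up with anywhere from a half to almost all of its raw capacity.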

Popular examples of 2S/2U servers include:

HP ProLiant DL180, DL380
Dell PowerEdge R530, R730
Lenovo (former IBM) x3650
SuperMicro – 2027, 6027, etc.

Mission-Critical Servers – 4S/4U

As it was discussed in Vol. III’s Chapter on Fault Tolerance – in quite a few cases (especially for games), it happens that a single mission-critical server box happens to be more reliable (i.e. fails less frequently) than a supposedly-redundant system; this is common when the redundancy system becomes a source of faults itself – and it occurs all the time if some kind of drivers are involved in implementing redundancy.

As a result, IF you can afford failures once-per-5-years-or-so – it often happens that a single OLTP DB server (with an ongoing backup, of course – we’ll discuss ongoing backups a bit later) is your best option. In such cases – it is usually not a problem to pay a bit more for such a single server in exchange for better MTBFs – which leads us to more expensive 4S/4U boxes. Moreover, such a 4S/4U box (if you’re following guidelines discussed in Vol. VI’s chapter on Databases) can easily handle 10B+ transactions per year⁸ – and this is a Damn Lot (~=”unless you’re running post-2005 NASDAQ, it should be enough”).
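To get a feeling for what “10B+ transactions per year” means for a single box, it can be converted into an average per-second rate. The Python sketch below does just that; the function name is made up, and keep in mind that this is an average over the whole year, so real-world peaks will be higher (by how much depends on your game).

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def average_tps(transactions_per_year: float) -> float:
    """Average transactions per second, assuming load spread evenly over a year."""
    return transactions_per_year / SECONDS_PER_YEAR


seen = average_tps(10e9)        # roughly 317 transactions/second on average
suspected = average_tps(100e9)  # roughly 3171 transactions/second on average
print(f"{seen:.0f} tps, {suspected:.0f} tps")
```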

As a result – I am often arguing for using 4S/4U server boxes for mission-critical servers (such as OLTP DB servers). As a rule of thumb, 4S/4U servers have the following differences from 2S/2U ones:

4 CPUs instead of 2
Even better reliability (here, if it is a product from a major vendor,⁹ we’re speaking about 5-7 years of MTBF)
Significantly larger RAM (12TB in one single server box, anyone?)
Surprisingly, number of HDDs/SSDs bays is not necessarily larger (these boxes are usually not optimized for storage, but rather for expansion with DAS etc.); still, we normally have 10-24 bays onboard.
- BTW, for an OLTP DB Server – you need at least four HDDs/SSDs (better – 6-10, more on it in [[TODO]] section below)
Hardware RAID controller becomes a kinda-must in this class.
- As a rule of thumb, you DO need BBWC hardware RAID for OLTP DB servers – we’ll discuss why software one won’t do, a bit later in this Chapter.
Hot-replaceable fans, hot-replaceable power sources, hot-replaceable HDDs/SSDs are a must too. So is S.M.A.R.T. with pre-failure detection of HDDs/SSDs.

Popular examples of 4S/4U servers include:

HP ProLiant DL580 (my personal favorite in this class; however – just 10 HDD bays in some cases can be a limitation compared to R930; also – finding DL580 for rent is going to be a challenge)
Dell PowerEdge R930
Lenovo (former IBM) x3850
SuperMicro – 8048, etc.

⁸ actually – I strongly suspect that such a box can handle up to 100B transactions/year, but I didn’t see 100B with my own eyes

⁹ This seems to stand for HP, IBM, and Dell, but I am not sure if it stands for SuperMicro

On Storage: Internal/DAS/SAN/NAS

In general – if one of your DB Servers doesn’t have enough bays to store your data – you need to go further. With this in mind, the following types of storage are known:

Internal Storage. For a 4S/4U box such as HP DL580, Lenovo x3850, or Dell R930, you can expect anywhere from 10 to 24 bays.
DAS (Direct Attached Storage). DAS is just a separate box with HDDs, connected to the main server (usually – using SCSI, SAS, or something similar). For all intents and purposes (except for physical location of HDDs) – it is the same as internal storage (pretty much the same latencies, the same way to manage it, etc., etc.) The only problem with DAS is that they’re rarely available for rent from ISPs – so they’re not really available for most of our environments <sad-face />
SAN (Storage Area Network). For larger organizations, SANs are traditionally THE way to organize Server-Side storage. Essentially – it is a kinda “storage as a service” (which was used for ages before the “cloud” buzzword was invented), which is able to store LOTS of data, and is providing virtual hard drives (as block devices) for your servers. Connection from server to SAN is usually done via optical fiber.
- If you can use it – SAN is indeed the best way to store your data – with one small-but-very-important exception: the OLTP DB Server. For OLTP DB Servers, disk access latencies are often absolutely critical (for more discussion – see Vol. VI’s chapter on Databases) – and all-SAN-implementations-I-know lose to internal storage/DAS+BBWC RAID in this regard (SAN controllers I’ve seen don’t have their own battery-backed storage, so they cannot acknowledge writes without going over the SAN, which takes time; if we’re speaking about fiber-optics-based SAN – it is usually in the range of an additional few hundred microseconds, which is not fatal for non-OLTP, but can easily hurt your OLTP DB significantly).
- From our renting perspective – SAN-systems-as-a-whole are usually not available for rent. However, your hosting ISP may provide you with access to logical volumes within their own SAN. Unfortunately, most ISPs/CSPs tend to provide SANs which (while being SANs – i.e. operating at block level) are working very poorly from a latency perspective (for example, [https://www.datadoghq.com/blog/aws-ebs-latency-and-iops-the-surprising-truth/] reports 100+ms latency spikes – no, thanks; just to compare – with internal/DAS storage with BBWC RAID, we’re speaking about ~100 microseconds – and this is a whopping 3 orders of magnitude difference!)
NAS (Network Attached Storage). NAS is quite similar to SAN, but operates at a different level of abstraction. Very shortly – NAS can be seen as a server box with lots of hard drives, sharing its file system using SMB or NFS¹⁰ – usually over IP-over-Ethernet. Formally – the difference between SAN and NAS is that SAN operates at block level (and looks to the client OS as a block device/disk), while NAS operates at file system level (and looks like a file system). In practice, however – while SAN might or might not have high latencies such as 10+ms – NAS does have high latencies almost for sure.
- From our renting perspective – we might want to use NAS-provided-by-ISP/CSP, or we can build our own one. Just take one of those 2S/2U server boxes, fill it to the brim with 2T HDDs – and run Linux on top of it, sharing this storage to the rest of your “trusted” VLAN.

¹⁰ or one of a dozen of other NAS protocols

On Disk Latencies, BBWC RAID, and NVMe

As it was discussed in Vol. VI’s chapter on Databases, OLTP DBs (especially if we’re using them with a single modifying DB connection – which I am arguing for) tend to be very sensitive to disk write latencies; more generally – any heavily-writing DB is going to be quite sensitive to the latency of writing to the database log (very shortly – until the disk has confirmed that log data for the transaction is committed – the DB cannot consider the transaction completed, as it would violate Durability guarantees).

If we’re using usual HDDs – writing latencies are going to be in 10-20ms range even for fastest HDDs. For SSDs, latencies are generally better – though, depending on specifics, can still get into single-digit millisecond range.

Ideally, latency-wise, the request originating from DBMS, should get via PCIe to the controller card – and it should be terminated on the PCIe card itself, without performing any long operations or going beyond the card. At this point, I know of two technologies which allow for this kind of latencies:

Hardware RAID card with Battery-Backed Write Cache (BBWC). The point here is that such a card has an onboard write cache, and moreover – this write cache has a battery which allows it to survive power outage (usually – for 72 hours or so). In turn, it means that such RAID-with-BBWC card doesn’t really need to wait until the data is written to HDD (or SSD) sitting behind the RAID. Instead – RAID-with-BBWC can acknowledge write as soon as it got the data from PCIe and wrote the data to its onboard write cache (which is damn fast – we’re only speaking about dynamic memory write here); moreover – all the Durability guarantees still stand(!). BTW, in case of RAID-with-BBWC, writing latencies do not depend on the nature of the disks-used-to-implement-RAID¹¹ – so for write latencies it doesn’t really matter whether you’re using HDD or SSD behind your RAID-with-BBWC.
Much more recent NVMe.

It is interesting to note that when speaking about write latencies of ages-old RAID-with-BBWC vs much-newer NVMe – we’ll see that from application perspective (i.e. counting from app-sending-request to app-receiving-reply), they will be in the same ballpark range (hundreds of microseconds); this is related to an observation that both RAID-with-BBWC and NVMe can terminate write request right there on PCIe card. While NVMe will still outperform RAID-with-BBWC-plus-HDDs for reading, for writing they will be pretty much the same.

As a result – for OLTP DB log writing (and this is exactly where latencies are critical for OLTP DBs) – there isn’t that much difference between RAID-with-BBWC and NVMe. This is a relief, as most server-boxes-from-major-vendors have had RAID-with-BBWC for ages (while finding a server-box-with-NVMe for rent is still not easy <sad-face />).
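To put the latencies above into perspective, here is a back-of-the-envelope sketch. It rests on a simplifying assumption of mine (not a measurement): a single modifying DB connection issues exactly one log write per transaction and waits for it, with no group commit, so log-write latency alone caps the commit rate. The latency figures are the rough ones from the text; 0.3ms is an assumed stand-in for "hundreds of microseconds", and the function and scenario names are made up for illustration.

```python
# Rough ceiling on commits/second for a single writing DB connection,
# ASSUMING one synchronous log write per transaction and no group commit.
# Latencies are ballpark figures from the text, not measurements.


def max_commits_per_second(log_write_latency_ms: float) -> float:
    """Upper bound on serialized commits if each one waits for one log write."""
    if log_write_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / log_write_latency_ms


# Hypothetical scenario labels; 0.3 ms is an assumed value for
# "hundreds of microseconds".
SCENARIOS_MS = {
    "HDD, slow end (20 ms)": 20.0,
    "HDD, fast end (10 ms)": 10.0,
    "RAID-with-BBWC or NVMe (~0.3 ms, assumed)": 0.3,
}

if __name__ == "__main__":
    for name, latency_ms in SCENARIOS_MS.items():
        rate = max_commits_per_second(latency_ms)
        print(f"{name}: at most ~{rate:,.0f} commits/s")
```

Real DBMSs can batch several transactions into one log write, so actual numbers will differ; the point is only the order-of-magnitude gap between plain HDDs and a write terminated on the PCIe card.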

BTW, just to compare latencies for cloud storage: data from [Lê-Quôc] shows that for cloud storage, we’re speaking about latency spikes of up to 100ms easily – and this is about 300x of what-we-can-expect from a reasonable non-cloud-based solution (NB: acknowledging this, some cloud vendors have started to sell servers with internal storage in addition to cloud storage; for such boxes-with-local-storage – things are going to be better than for cloud-based storage, though to get an idea of real latencies for such boxes-with-pretty-much-unknown-hardware, you’ll usually need to measure it yourself).
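For measuring it yourself, a minimal sketch along the following lines might serve as a starting point. It appends small blocks to a scratch file and times each write-plus-fsync, which is only a crude imitation of DB log writing. The function names, the 4096-byte block size and the sample count are arbitrary choices of mine; whether `fsync` really reaches stable storage depends on the OS, file system and hypervisor, so treat the results as indicative at best.

```python
import os
import statistics
import tempfile
import time


def measure_fsync_latencies(directory, samples=200, block_size=4096):
    """Append small blocks to a scratch file, fsync each one,
    and return the per-write latencies in milliseconds."""
    payload = b"\0" * block_size
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data down to the device
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


def summarize(latencies_ms):
    """Median, approximate 99th percentile and maximum, in milliseconds."""
    ordered = sorted(latencies_ms)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # "." is a placeholder: point it at a directory on the volume under test.
    print(summarize(measure_fsync_latencies(".")))
```

Spikes matter more than averages here, so it is worth looking at the p99 and max values, and running the measurement at different times of day on shared storage.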

¹¹ though read latencies are still better for SSDs

Choosing Storage

The task of choosing optimal storage for your servers is quite complicated and non-obvious; however – as a very rough, very first approximation, the following guidelines might help:

For your Game World Servers and Matchmaking Server – 2 internal HDDs (with a software RAID on top of them) will usually do; in cloud – it is still better to use internal storage if available, but even a high-latency virtualized one will work (in the latter case, be very careful, however, NOT to have any swapping – it is usually better to disable swap file altogether if you’re using slow virtualized storage)
For your OLTP DB – I am arguing for internal storage only – and with BBWC RAID card too (NVMe is also an option – but it is still much more difficult to find for rent). Whether to use 2S/2U box or 4S/4U one – is pretty much up to your wallet.
- As for number of HDDs/SSDs – from what I’ve seen, 10 bays is perfectly enough for an OLTP DB. As it was discussed in Vol. VI’s chapter on Databases – high-performance OLTP DB needs to be truncated (removing older data), and actually should fit into RAM for performance, so it is normally limited to single-digit terabytes (I’d even say less-than-1T), and even 10 bays will provide it without any problems.
For your reporting replica DBs – 2S/2U boxes with internal storage will usually do. Fiber-based SAN would also work – but I don’t know of ISPs/CSPs which provide fiber-based SANs for rent <sad-face />
For analytical DBs – it depends; if they’re really huge – you might need to go for NAS, though it might easily cause significant performance hits due to much higher latencies. BTW, your own NAS (on top of 2S/2U boxes) can be usually organized with less latency than ISP/CSP-provided one (frankly speaking, 100ms latency is atrocious even for NAS; if using BBWC RAID+SSD for your own NAS box – it is usually possible to obtain ~2-5ms latencies quite easily).
For archive DBs – latency is rarely an issue, so NAS is usually perfectly fine (either provided by ISP/CSP, or your own one).
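If you keep your deployment notes in code, the guidelines above can be restated as a simple lookup table. This is merely a sketch: the role keys, the `StorageGuideline` type and `recommend_storage()` are hypothetical names of mine, and the entries only paraphrase the very rough first approximation given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageGuideline:
    storage: str  # what to rent
    note: str     # main caveat from the guidelines


# Shared entry: Game World Servers and Matchmaking Server get the same advice.
_STATELESS = StorageGuideline(
    "2 internal HDDs with software RAID",
    "slow virtualized storage also works, but avoid swapping",
)

# Hypothetical role keys; values paraphrase the rough guidelines in the text.
GUIDELINES = {
    "game_world_server": _STATELESS,
    "matchmaking_server": _STATELESS,
    "oltp_db": StorageGuideline(
        "internal storage behind a BBWC RAID card (NVMe if you can rent it)",
        "about 10 bays; keep the DB truncated so it fits into RAM",
    ),
    "reporting_replica_db": StorageGuideline(
        "2S/2U box with internal storage",
        "fiber-based SAN would work too, if you can find one for rent",
    ),
    "analytical_db": StorageGuideline(
        "depends on size; NAS if really huge",
        "NAS means higher latencies; your own NAS can beat a provided one",
    ),
    "archive_db": StorageGuideline(
        "NAS (ISP/CSP-provided or your own)",
        "latency is rarely an issue",
    ),
}


def recommend_storage(role: str) -> StorageGuideline:
    """Look up the rough guideline for a server role."""
    try:
        return GUIDELINES[role]
    except KeyError:
        raise ValueError(f"unknown server role: {role!r}") from None


if __name__ == "__main__":
    for role, guideline in GUIDELINES.items():
        print(f"{role}: {guideline.storage} ({guideline.note})")
```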

On Vendors

For serious servers (I am not speaking about Raspberry Pi here), traditionally there are four big vendors: HP, Dell, IBM/Lenovo, and SuperMicro.

When speaking about expendable 2S/1U or 2S/2U server boxes – IMO there isn’t much difference which ones to use. In other words, if ISP-you-like provides you with one of these four – as a rule of thumb, you should be fine. Still, if more than one is available, and pricing is about the same, here is my personal list in the order of preference (once again, this is for expendable servers):

HP
Lenovo/IBM or Dell
SuperMicro

On the other hand – if we’re speaking about mission-critical servers (such as DB Server) – I would probably try to avoid SuperMicro altogether; while I am not saying that their servers are bad – I still think that they’re not mature enough to rely on them for mission-critical stuff. In other words – if the only option provided by your ISP for mission-critical servers is SuperMicro – I would consider it as a negative-to-be-taken-into-account (and whether this negative can be countered by other virtues of your ISP – depends a LOT on your specifics). Among the remaining three vendors – my current personal order of preference for mission-critical servers goes as follows:

HP¹²
Lenovo/IBM
Dell

What’s more important, however – is to

make sure that you DO know what the vendor of those-servers-rented-to-you is, AND that the vendor is one of the Big Four mentioned above.

While it might be possible to find decent servers from the other vendors – it is IMO too high-risk (and MOG deployment already has enough risks to deal with). Moreover – when going outside of the Big Four – you’ll likely live without Out-of-Band Management – and this is not a picnic at all (especially when you need to fix a critical problem on a Friday night – and “remote hands” are not available until Monday).

¹² no, I am not paid by HP – it is just that I had pretty good experience with their servers (their desktops/laptops/printers are a completely different story though)

[[TODO: web/mail server]]

[[TODO: choosing ISP]]

[[To Be Continued…

This concludes beta Chapter 23(c) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)”.

Stay tuned for beta Chapter 23(d), where we’ll discuss certain issues related to configuration of your (probably-rented) hardware]]

References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.