Not sure how relevant the RAID5 discussion would be. There are dramatic improvements in IO happening right now after a long period of stagnation. Given the shift toward SSD and NVME...
THANK YOU for bringing this point up! :-). Added it to the ToC.
I've double-checked it just to be sure. When speaking about SSDs, manufacturers still usually mean "just like an HDD (i.e. with SAS or a similar interface) but better"; as such, SSDs are usually meant to be used "just like HDDs" (i.e. within RAIDs), and they don't seem to provide reliability guarantees sufficient to run an SSD without RAID. On the other hand, at least some enterprise-level NVMe devices seem to provide these guarantees (I will need to dig up the proof or ask the manufacturers to be 100% sure), so it may be possible to use them without RAID. On the third hand ;-), enterprise-level SSDs and NVMe devices still cost around 5x-10x more than an HDD of equivalent size, which may make a big difference. On the fourth hand, I am not sure which will provide lower latency - BBWC RAID or NVMe (both sit on PCIe and terminate right there on the PCIe board, so they should be comparable, but further differences may exist, depending on specific implementations). So, RAIDs still have their place (and probably will for a while - at least for archive/analytical data); as for OLTP loads - they MAY be phased out, but I expect BBWC RAIDs to be a viable option for a long while.
Smaller MMOs (especially those starting development now) may be able to get away without the complexity of distributed sharding at all. Scale-Up is a lot more sensible than it used to be.
Will think about it (and you're certainly right about scale-up), but I see one problem: can you think of an MMO which is not trying to conquer The Whole World? In other words: sure, there are small MMOs, but will any of them commit to being small at the early stages of development? Also note that as long as you can have your DB in one place, horizontal scaling of everything else is not rocket science, and given a reasonable architecture, usually comes more or less naturally. Scaling the DB is a different story, but DB-scaling belongs to Part D and will come with a big banner on top: ONLY DO IT IF YOU NEED IT ;-). In short - I wouldn't recommend trying to keep everything on one server (costs are low, risks are high), but keeping the whole DB on one server is perfectly possible until you're Really Huge (and was possible like 10 years ago too).
Perhaps a topic on asset delivery and CDNs?
I was going to mention it somewhere, but apparently forgot :-( . So - I've added it and THANK YOU! :-)
Planning for game exploits (eg. duping, soft/hard virtual currency or payment hacks) with the ability to roll-back the world (all or selectively) to a prior point in time?
Rolling back the world (beyond usual DB log rollback on restart) is one Really Tricky Thing. From my experience, such rollbacks are always gameplay-level, always part-of-the-world, always rely heavily on specifics of the game, and are mostly a business decision, so at this point I don't really see how to generalize it for just "some undefined game out there". If you have any ideas how to describe it without relying on gameplay specifics - please share, it will really be a Good Thing to describe in such a book...
Also, what EVE online does -- stuff the whole thing into RAM + Hybrid RAM (2 RAMSAN400 and 2 RAMSAN500 units).
I've seen systems-which-stuff-everything-in-RAM and didn't like them (though maybe it is not inherent, but was about poor implementations; I didn't really think about the reasons behind it). First, reporting/analytics/... is a Big Problem; second, there were HUUUGE issues related to recovery from a certain class of software bugs (in short - in some cases those guys weren't able to restart the system-which-worked-for-many-days until a bug in some minor module was fixed; Big Ouch).
(read)-CACHING of all the data in RAM is a completely different story though :-)
BTW, your link about EVE online seems to say that they have more conventional architecture - with conventional RDBMS for persistence (which in turn uses RAMSANs instead of HDDs, but apparently it doesn't change much - I've seen about the same DB transaction numbers on a classical BBWC RAID with usual HDDs).
Ah, yes - the DB is ram based - I got a little ahead of myself.
Interesting that you've seen similar numbers on a classical BBWC RAID with normal HDDs. Maybe you could mention that? IIRC, one of the big differences about eve was their single, huge, ram-based DB and everyone was "in the same world".
Interesting that you've seen similar numbers on a classical BBWC RAID with normal HDDs. Maybe you could mention that?
You mean mention in the book? Yes, this will be there (most likely in several places). The key point will be that you can (and generally should) keep your DB on one single server at least until you're DAMN HUGE (and then it is still possible to scale out, though it is quite an endeavour). This point won't change much with hardware improving (just definition of DAMN HUGE will change ;-) ).
IIRC, one of the big differences about eve was their single, huge, ram-based DB and everyone was "in the same world".
I've seen at least one system-other-than-EVE with a single, huge RAM-based DB, so I'm not sure whether they're that unique technically :-). As I've said above, I didn't really like that system (maybe it could be improved, but IMHO the whole thing was inadequate at least for that specific game).
Here is some newer stuff they're doing:
http://community.eveonline.com/news/dev-blogs/tranquility-tech-3/
oh interesting. Thanks.
e1/ oh...my....god. They are nuts!
Just a few comments.
1) For the C++, please use C++11/14 in a modern style and not C with Classes. C++ can be written elegantly.
2) In talking about endianness, please mention that practically all the major CPUs support little endian, and there is an advantage in making your serialization and communication little endian as you will avoid unnecessary conversions.
3) For secure hashing and checksums, mention Blake2 https://blake2.net/ which is faster than the md5 and sha1/2, and also customizable in that it can produce hashes with lengths from 1 byte to 64 bytes.
1) For the C++, please use C++11/14 in a modern style and not C with Classes. C++ can be written elegantly.
As the book is not about C++, but rather will use C++ for examples, I don't want to be too C++-specific outside of special Chapter dedicated to C++. Within that chapter - yes, it will go beyond C with classes (in particular, I'm planning to cover STL to some extent), but don't expect it to be a book on "how to use all the stuff included in C++11". In general, I'm a big fan of KISS principle and don't like using stuff just for the sake of using it, and in simple examples there isn't likely to be a case for most of C++11+ features.
2) In talking about endianness, please mention that practically all the major CPUs support little endian, and there is an advantage in making your serialization and communication little endian as you will avoid unnecessary conversions.
Yes, something more-or-less along these lines is planned (though more emphasis will still be on the importance of clean well-defined APIs for marshalling than on implementation details such as endianness). If I forget to mention it in the beta version of an appropriate chapter - feel free to remind me :-).
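To illustrate the point about keeping endianness behind a clean marshalling API, here is a minimal sketch in Python. The message layout (a 32-bit player id followed by two 16-bit coordinates) and the names `marshal_move`/`unmarshal_move` are made up for this example and do not come from the book; the only thing the caller sees is the pair of functions, while the little-endian choice stays an implementation detail inside them.

```python
import struct
from typing import Tuple

# Hypothetical wire format (an assumption for illustration):
# u32 player_id, u16 x, u16 y. The "<" prefix means little-endian
# with standard sizes and no padding, regardless of the host CPU.
_MOVE_FORMAT = struct.Struct("<IHH")


def marshal_move(player_id: int, x: int, y: int) -> bytes:
    """Serialize a move message into its 8-byte wire representation."""
    return _MOVE_FORMAT.pack(player_id, x, y)


def unmarshal_move(data: bytes) -> Tuple[int, int, int]:
    """Parse an 8-byte wire message back into (player_id, x, y)."""
    return _MOVE_FORMAT.unpack(data)
```

If the wire format ever had to change (including its byte order), only these two functions would need to be touched.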
3) For secure hashing and checksums, mention Blake2 https://blake2.net/ which is faster than the md5 and sha1/2, and also customizable in that it can produce hashes with lengths from 1 byte to 64 bytes.
I don't think that this book is a good place to discuss subtle differences between security algorithms. In practice, there will be very little difference between secure hashes (sans MD5, which is broken pretty badly), and observable performance differences (when taking into account that secure hashing takes only a tiny fraction of the CPU time on both client and server) will be negligible too. I am not arguing against Blake2, but saying that this book is the wrong medium for arguments on cryptography as such; applied crypto is a different matter though.
Basically, if mentioning Blake2, I will need to explain why Blake2 is so much better than the other 3 finalists-but-not-winners of NIST competition, which is very far beyond the scope of this book (not to mention that I myself have no firm opinion about it).
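For readers curious about the variable output length mentioned in the suggestion, here is a minimal sketch using Python's standard `hashlib`, which ships BLAKE2b with a `digest_size` parameter from 1 to 64 bytes. The helper name `short_digest` is made up for this example, and nothing here is meant as a recommendation of one hash over another.

```python
import hashlib


def short_digest(data: bytes, size: int) -> bytes:
    """Return a BLAKE2b digest of `data` that is exactly `size` bytes long.

    hashlib.blake2b accepts digest sizes from 1 to 64 bytes.
    """
    if not 1 <= size <= hashlib.blake2b.MAX_DIGEST_SIZE:
        raise ValueError("BLAKE2b digest size must be between 1 and 64 bytes")
    return hashlib.blake2b(data, digest_size=size).digest()
```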
In the math behind array redundancy, a "drive failure" is (often) considered as an instantaneous event. The whole disk goes. This isn't how hard drives fail, just how you hear about people losing data. The way that hard drives degrade is that blocks go bad.
Sometimes they fail the way you describe it, but sometimes they fail in a very different manner. From my experience, PFA/SMART can predict failures only in about 30% of HDD failure cases; in all other cases it is "sudden death" (I have no idea why, but suspect mechanical failures, and those bearings working @7200 rpm are Big Fat SPOFs, there is no doubt about it). But even if PFA were able to predict 95% of all the failures, it would still mean that for anything-which-has-potential-monetary-implications you'd better have RAID.
P.S. With SSDs, situation is potentially different (there are no mechanical failures with Big Fat Mechanical SPOFs).
P.P.S. BTW, for most applications I'm not arguing for RAID-5, indeed preferring RAID-1/RAID-10 (as RAID0-over-RAID1) instead. However, RAID-5s tend to work pretty well for archive kind of data (and are quite a bit cheaper than RAID-10s).
What about RAID-6?
RAID-6, if implemented properly, is somewhat better for archive data than RAID-5 (formally - it has smaller vulnerability window); however, there is still some (though minor) risk of running into implementation problems with RAID-6 (I've seen it myself a few years ago with a major enterprise manufacturer), so it is still not that black-and-white.
I think the most pertinent articles to get available on the website would be:
Chapter XV. Random Number Generation
What is a Good RNG
Marsaglia/NIST tests
PRNG
Hardware RNG
Hardware-assisted (P)RNG
RNG-Critical Environments
Beyond Bit-Stream: Basic Chi-Square Analysis
Separating Game RNG from Transport-Security RNG
and to a lesser extent:
Chapter III. Yes, It Is Going to Be Client-Server
On Client-Server Scalability
On P2P
I think there is a real online gap for some pertinent data in a good form factor/density.
Where do I sign up for an announcement when it releases?
"Beta" chapters will be published more or less on weekly basis, and posted to /r/programming too. Final release is going to take a while (like mid-2017), see http://ithare.com/book-beta-testing-development-and-deployment-of-massively-multiplayer-games-from-social-games-to-mmofps-with-stock-exchanges-in-between for details.
Saw this earlier today, skimmed the chapter names now. It looks like there is a lot of good content in there and I would probably enjoy some of it.
It seems you spelled Heresy wrong in the subsection -> Herecy: Security by Obscurity is NOT Necessarily Bad.
Yep, fixed now, thanks
Just realized something reading chapter 1, footnote 1 (which applies more to the ToC than to Chapter 1): one thing in particular for UDP is that sometimes there is the need for STUN et al. (and even then, it's not always enough).
You're right; "guaranteed" NAT traversing for TCP connections to the well-known IP of the server is one of the reasons why I'm usually saying "if you can get away with TCP - do it" ;-). I've added "NAT Traversing" to Chapter X. THANK YOU!
Regarding logging and the ancillary crap surrounding it...
Please don't skip over the importance of being able to tell when an event happened, to whom and who else was close. I worked on an online poker game that was able to have thousands of people playing at once. (not all at the same table) but the logging was so limited we could not:
Catch people breaking the TOS (multiple games from the same account)
Proving cheating was hard. The game was plagued with cheaters
Finding bots was nearly impossible
Assessing an account for refunds where bugs were found was nearly impossible
The game was developed very quickly and was not well thought out and full of bugs (Sound familiar?). If we had made logging a religion from the beginning, we could have mitigated a lot of the tech support costs and problems later.
Of course, logging is important, though for this kind of stuff I would certainly suggest having something better-than-logs :-). IMNSHO, most of this information belongs not even in the plain-text logs, but in the "audit trail" in the database, so CSRs can access it easily. Most of the work on catching cheaters should be done without developers (and interpreting logs usually requires developers).
So, I see it as two quite separate things: 1. plain-text logs for developers; 2. "audit trail" in DB for CSRs (senior CSRs, security/anti-cheating teams/...).
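As a rough illustration of the second item, here is a minimal sketch of an audit trail kept in a database, using Python's built-in `sqlite3`. The table layout, the column names, and the functions `open_audit_db`, `record_event` and `who_else_was_at_table` are all assumptions made up for this example; a real game would use its production RDBMS and a schema tailored to its own events.

```python
import sqlite3


def open_audit_db(path=":memory:"):
    """Open (or create) a database holding a hypothetical audit_trail table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_trail (
               id          INTEGER PRIMARY KEY AUTOINCREMENT,
               happened_at REAL    NOT NULL,  -- when it happened (epoch seconds)
               account_id  INTEGER NOT NULL,  -- to whom it happened
               table_id    INTEGER NOT NULL,  -- where it happened
               event       TEXT    NOT NULL,  -- what happened
               details     TEXT
           )"""
    )
    return conn


def record_event(conn, happened_at, account_id, table_id, event, details=""):
    """Append one audit record; `with conn` commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO audit_trail "
            "(happened_at, account_id, table_id, event, details) "
            "VALUES (?, ?, ?, ?, ?)",
            (happened_at, account_id, table_id, event, details),
        )


def who_else_was_at_table(conn, table_id, since, until):
    """Return the accounts seen at a table within a time window."""
    rows = conn.execute(
        "SELECT DISTINCT account_id FROM audit_trail "
        "WHERE table_id = ? AND happened_at BETWEEN ? AND ? "
        "ORDER BY account_id",
        (table_id, since, until),
    ).fetchall()
    return [row[0] for row in rows]
```

A query like the last one is something a CSR tool could run directly, without anybody having to grep plain-text logs.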
NB: if you make (useful) comments here on Reddit, you can get a free e-copy when the book is published, details are here: http://ithare.com/book-beta-testing-development-and-deployment-of-massively-multiplayer-games-from-social-games-to-mmofps-with-stock-exchanges-in-between/
P.S. Hope this comment doesn't go against reddiquette, if it does - let me know.
It looks great! A suggestion w.r.t your book-in-progress.
In the TOC, under 2D graphics, I noticed Double-Buffering, but not Triple-Buffering, v-sync, or adaptive-sync (which are all ways to avoid partial screen updates / screen-tearing). Maybe you had planned on it already, but if not -- it might be worth adding.
I should admit that I'm a fan of double-buffering for 2D (compared to others, it can provide very good results with minimal requirements to hardware support; in fact, in most cases of 2D-graphics-with-only-small-sprites-moving it works without visual artifacts even if only simple BitBlt() is used). OTOH, alternatives do need to be mentioned in the book, so THANK YOU!
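For those who haven't run into it, the idea behind double-buffering can be sketched in a few lines of Python. This is a toy model only: the "screen" is a grid of characters, and the class `DoubleBufferedScreen` with its methods is made up for this example. The whole frame is drawn into an off-screen back buffer and then copied to the visible buffer in one step (which is roughly the role a single blit plays in a real 2D client), so a half-drawn frame is never what gets shown.

```python
class DoubleBufferedScreen:
    """Toy double-buffered 'screen' made of single-character pixels."""

    def __init__(self, width, height, background="."):
        self.width = width
        self.height = height
        self.background = background
        self._front = self._blank()  # what the viewer sees
        self._back = self._blank()   # where the next frame is drawn

    def _blank(self):
        return [[self.background] * self.width for _ in range(self.height)]

    def clear_back(self):
        """Start a new frame in the off-screen buffer."""
        self._back = self._blank()

    def draw_sprite(self, x, y, glyph):
        """Draw into the back buffer only; the visible buffer is untouched."""
        if 0 <= x < self.width and 0 <= y < self.height:
            self._back[y][x] = glyph

    def present(self):
        """Copy the finished frame to the visible buffer in one step."""
        self._front = [row[:] for row in self._back]

    def visible_rows(self):
        return ["".join(row) for row in self._front]
```

Until `present()` is called, nothing drawn for the next frame is visible, which is what keeps partially updated sprites off the screen.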
Looks very, very good. A little hard to offer suggestions based on the ToC alone, but some initial thoughts...
Not sure how relevant the RAID5 discussion would be. There are dramatic improvements in IO happening right now after a long period of stagnation, given the shift toward SSD and NVMe (and even 3D XPoint supposedly next year) and how quickly the prices are dropping. RAM density is also going up, with multiple TB in a single server now. Even 10Gig NICs are dead cheap.
Smaller MMOs (especially those starting development now) may be able to get away without the complexity of distributed sharding at all. Scale-Up is a lot more sensible than it used to be.
Perhaps a topic on asset delivery and CDNs?
Planning for game exploits (eg. duping, soft/hard virtual currency or payment hacks) with the ability to roll-back the world (all or selectively) to a prior point in time?