Random Number Generation - IT Hare on Soft.ware

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[[This is Chapter 19 from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Random number generation is often seen as trivial by game developers (“hey, just call rand() and you’re done!”). Well, it is not really trivial. It is quite easy to make a very silly mistake, which can go unnoticed for years, and when you learn about it – the Pretty Bad Damage has already been done to your game/company/… 🙁 .

Psychological aspects of RNG

One important thing to keep in mind is that humans tend to have very selective memory when it comes to good and bad occurrences. Which translates into:

even if your RNG is statistically perfect, people will still complain 🙁

As Jeffrey Kaplan from Blizzard has put it: “We found that this had a lot of problems where players would run into streaks, and they only remembered the shitty streaks. ” [ShackNews.OnBlizzard].

“Some years ago, I made an interesting exercise and found that I myself was guilty of it too.Some years ago, I made an interesting exercise and found that I myself was guilty of it too. When I was playing Civ IV and my stronger unit got killed by a much weaker unit by AI opponent, I (as probably most of the people out there) was complaining about the whole random thing being rigged in favor of AI. Intuitively, I was perfectly sure that the thing is extremely unfair. However, I started to write down the stats of the occurrences. And found that while they didn’t pass certain spectral tests, overall averages were within expectations. In other words, while the RNG wasn’t rigged as such¹ (though spectral correlations MIGHT have been somewhat unfair given the play patterns of me and AI), it certainly did feel unfair.

This might seem as an excuse to use poor RNGs, but it is not. Because

If your RNG is flawed, people will complain even more 🙁

Even more importantly, if your RNG is flawed and it is important enough for players, they may gather stats proving that you are not doing a good job. Also, you will lose an option to prove them that your RNG is good (which is a separate challenge, see discussion in “Dealing with Perception Problems” section below).

¹ Of course, all Civ games at levels above, IIRC, Noble, are heavily rigged towards AI, but that’s by design and is well-known

RNG Alternatives

Some people in the industry have argued removing an element of luck altogether. Examples of such techniques to avoid removing randomicity include Blizzard’s “progressive percentages” [ShackNews.OnBlizzard] and Prismata’s concept of “DeckHand” [Grant].

However, it should be noted that avoiding randomicity is certainly not a silver bullet and has its own drawbacks. As Jeffrey Kaplan has put it: “The problem was… we found that while it was great that it got rid of the bad streaks, it also got rid of all the good streaks”. “My job here is to tell how to implement RNG properly if you decided that you do need randomicityAnd Prismata, while avoiding randomicity after the game is started, still relies on random choices to start the game with (which means RNG, and also means player complaints [Grant]).

To summarise – whether you need an RNG, is a gameplay choice, and in this book I’m trying to avoid arguing pro or contra specific gameplay choices. My job here is to tell how to implement RNG properly if you decided that you do need randomicity (and for most of the games out there you will, believe me 😉 ).

Inherent Difficulties with Finding RNG Faults

Before we start discussing RNGs, let’s make one unpleasant observation:

Faulty RNG can easily sit there for years without you noticing it 🙁

Faulty RNG may easily produce a bit stream which looks good enough at the first glance (“hey, it certainly looks as a random hex string every time I look at it”), and deficiencies may surface only after serious tool-based analysis as described below.

On the first glance, such well-hidden failures may look as a non-issue, but it is not the case, because such failures can be (a) exploited and (b) proven after the fact.

For example, your players may notice the problem, will start writing about it (and even may start exploiting it – we’ll see an example of it later). Or, some day somebody may gather stats and prove that all the time your RNG wasn’t fair, and was favouring some playing style. Effects of these things surely depend on specifics of your game, but for some games it can easily be devastating, as we’ll see below.

Three RNG Spectacular Failures

“Just to describe what kind of problems a poorly implemented RNG can easily cause, I will provide three real-world examplesJust to describe what kind of problems a poorly implemented RNG can easily cause, I will provide three real-world examples. The first two are not directly related to games, but they are still very characteristic ones to show the scale of potential problems caused by poor RNGs. The third example is a real-world horror story of the game falling victim of its RNG.

Two Big Fat Problems with Crypto RNGs

Spectacular Failure #1. In 2006, Debian developer has made a change to RNG initialization code.² The change effectively reduced amount of RNG seed material drastically, which in turn has lead to lots of the keys generated by Debian, being insecure. The problem was not discovered until 2008 (illustrating my point above about “sitting there for years without noticing it”), and has caused a whole lot of security problems worldwide. When your keys generated two years ago become vulnerable all of a sudden, it means that all your supposedly secure communications over last two years can be read by the attacker. Ouch!

Spectacular Failure #2 is a more recent one. In 2013, it has been found that Java class SecureRandom as it was implemented on Android, was not exactly secure and often generated the same “randoms” (much more often than with probability of 2^-n which we should expect). This, in turn, has caused a vulnerability which allows to extract private keys used to sign Bitcoin wallets (see, for example, [StackExchange] for explanation). From non-technical point of view – this bug in RNG allowed to steal all the money in Bitcoin wallets… Double Ouch!

² While I have a rather strong opinion on “who’s the one to blame” in this disaster, I don’t want to go into this discussion here

Numerous Problems in a Real-World Game-Critical RNG

Spectacular Failure #3 is probably the most interesting one for several reasons. First of all, it is not about cryptography and is directly applied to games. In addition, it illustrates several very typical bugs in custom-made random generators.

This spectacular failure is all about the RNG which was used by Planet Poker, the company which used to run real-money(!) games at the time; while it happened around 2000, the bugs made there are still very typical these days. The story is very-well described in nauseating detail in the original article [Cigital], so I will just provide a short list of Big Fat Mistakes Planet Poker gamedevs have made with their RNG:

First and foremost, they used a pseudo-random generator initialized ONLY WITH SYSTEM TIME (they used the one with a millisecond precision).
- Moreover, they were doing it FOR EACH AND EVERY HAND(!!)
  - “attackers were able to know the cards of their opponents at the table (!!!)These two mistakes has made their RNGs vulnerable to a simple attack, demonstrated in [Cigital]. In short – attackers were able to know the cards of their opponents at the table (!!!). The idea behind the attack was simple – as soon as we know when the RNG is seeded, and can synchronise our clock with the server clock, we have just a few hundreds of thousands of potential seeds. Then, when we see three cards on the flop (and have our own 2 cards too), we can run these few hundreds of thousands of potential seeds against the RNG, and to find the seed which would generate this flop and our own cards too. This matching seed defines all the cards of the opponents.
- They were using a pseudo-RNG with a merely 32-bit internal state to produce 52-card decks of cards. However, a properly shuffled 52-card deck can have 52! ~= 2^226 different outcomes. And as pseudo-RNG with a 32-bit state can possibly have only 2^32 different patterns, it means that they would never get a vast majority of the potentially available hands (!)
- They had a flawed shuffling algorithm which has caused a flaw in statistical shuffling distribution (see [Cigital] for explanation)
- And they had an off-by-one error (a silly one, but it has also caused a skew in statistical distribution, see [Cigital]).
- One thing AFAIR not mentioned in [Cigital] but still affecting distribution: when you have N bits, obtaining a random number in range of 0 to M where M is not 2^N -1, via taking a reminder, the random number is inherently skewed (!) – just because the number of states-you’re-converting (which is 2^N), is not exactly even for all the outcomes. Though admittedly it has MUCH milder effect than any of the flaws mentioned above, it is still a flaw.
- “they DID publish their shuffling algorithm without making a closed review of it firstLast but not least – they DID publish their shuffling algorithm without making a closed review of it first. Yes, I know I will be beaten hard for this statement (and yes, unless they’ve published the algorithm, they would still run that flawed software), but TBH – for that company it would be MUCH better not to publish the algorithm at all, then to publish it and get all the embarrassment, loss of players, etc. etc. etc.
  - On the other hand, it IS important to review your RNG algorithms, and MAYBE even to publish them. However, before pushing the button “upload code to your website”, you need to make sure that your RNG is really good. And you won’t be able to check it yourself (even if you’re a security pro) – just because it is YOUR code, it always feels better than it should; in other words, you “know” how it works instead of scrutinizing it. That’s why it is paramount to have a THIRD-party review. Ideally – by a different company specializing in security, though you MIGHT be able to get away with a completely different team within your own company.

BTW, if your game calls for a serious RNG, make sure to read [Cigital] – it provides a Very Good Lesson on “how NOT to implement your RNGs”.

What Qualifies as a Good RNG

“now as we’ve seen what a good RNG is certainly not, we can start thinking about what a good RNG is.Ok, now as we’ve seen what a good RNG is certainly not, we can start thinking about what a good RNG is. Well, unfortunately, you cannot really prove that RNG is good;³ the only thing you can really do, is to prove that RNG is bad.

In short – to perform a black-box test of your RNG, the best thing you can do is to run a series of tests looking for certain patterns within your output.

³ technically, for some of the PRNGs you can kinda-prove it (see, for example, Blum-Blum-Shub generator, for which there is a kinda-proof – that is, there is a strict proof under an assumption that certain math problem is non-solvable in reasonable time, which is itself not proven, and is not likely to be proven). However, even if there would be a really strict proof that certain PRNG is really good, we’ll still be stuck with a question “how to seed this PRNG”, which leads us into an inherently non-provable field.

Testing bit stream – Marsaglia DieHard, NIST, and TestU01

In particular, at some stage most of RNGs out there produce supposedly completely random bit stream one way or another (even if you have an RNG which generates numbers rather than bit stream, it is usually trivial to make it generate bit stream instead – just request integers which are uniformly distributed from 0 to 2^N-1 inclusive).

“You should run BOTH of them if you’re using your RNG in an anywhere serious mannerAs soon as you have your supposedly random bit stream (a looooong one – usually in millions of bits), there is a way to get it tested for certain patterns to be present. In fact, there are two sets of tests which can indicate any problems with your bit stream. Three most useful of these tests are [Marsaglia.Diehard], [NIST], and [TestU01]. You should run AT LEAST TWO of them (better – all three) if you’re using your RNG in an anywhere serious manner (i.e. if it can affect the fairness of your games – or, if we put it in a different perspective – if your players are complaining about your RNG). Note though that interpreting results of these tests can be rather tricky (and may require certain understanding of stuff such as statistically expected outcomes and p-values).

Testing beyond bit stream – Chi Square

As we’ve seen above on the example of Planet Poker, it is very easy to make a mistake in converting your bit stream into the game events (these guys has managed to make four(!) separate mistakes on that way). In turn, it means that

even after your bit stream is good, your game events may still be skewed pretty badly

As a result, for RNG-critical games, I strongly insist on perform testing on game-level data too (i.e. after bit stream is converted to whatever-game-level-events you have – ranging from a shuffled deck to sequences of critical-hit decisions).

The simplest form of such analysis does not require a math degree, and can be done using good ol’ chi square goodness-of-fit test [Wikipedia.Chi.Square.Goodness.of.Fit]. Very briefly, the process, when applied to games and RNGs, usually goes as follows:

We’re gathering stats of our game-level events (ideally – both before production, and while our game is running in production). For example, if, like Planet, we have a game of poker, we can record all the decks dealt.
“We’re making a hypothesis that distribution of our game-level events is as it should be according to rules of our game.We’re making a hypothesis that distribution of our game-level events is as it should be according to rules of our game. For example, we’re making a hypothesis that for the 1^st card dealt, probability of all the 52 cards is the same.⁴ To test this hypothesis, we calculate “how many times each of the cards was dealt as the 1^st card of the deck”. In other words, we’re throwing our statistical events into different “bins”, and calculating the number of occurrences within of each “bin”. These numbers will be close, but almost-never will be exactly the same (neither they should). The role of chi-square test is to find out whether the observed deviations from completely-flat distribution are statistically acceptable or not.
For each card (from aces to deuces), we’re calculating a value of x = (O-E)^2/ E, where O is an observed value of number of occurrences for this card, and E – is an expected value (which, under our hypothesis of flat distribution, is the same for all the cards, and equal to the total-number-of-cards-dealt / 52). Then, we calculate the sum of all x, which becomes the chi-square value for our experiment.
Then we take a look at the table such as [Chi.Square.Critical.Values], and look for a row with an appropriate number of bins “Degrees of freedom” a.k.a. ν (for equal distribution of poker cards, we’ll be looking for 51 “Degrees of freedom”, as a number of bins = 52 minus 1, as we’re calculating one parameter – the average – from the data itself).
Therefore, if we find that for our test of equal card distribution, if our calculated chi-square value is below 38.6 – then “Probability less than the critical value” is less than 10%, i.e. our hypothesis that everything is fine, stands with probability of 90%. And if our calculated chi-square value is above 65.4 – then “Probability less than the critical value” is over 90%, i.e. our hypothesis that everything is fine, stands with probability of less than 10% 🙁 .
- “In science, the “magic number” to challenge hypothesis is accepted to be 5% In science, the “magic number” to challenge hypothesis is accepted to be 5%. In other words, to challenge the hypothesis about equal distribution of the cards, your chi-square value should be above 69.8 in at least one of the bins. However, in practice, as soon as you get close to 10% of your hypothesis being correct (i.e. to 90% “probability less than critical value” number), I strongly suggest to make another experiment analyzing more decks, to see whether you’re getting closer to the dangerous zone or not. If, while adding more decks to your experiment, the chi-square tends to lower – it means that you’ve experienced a fluke in your first experiment, and everything is fine. OTOH, if , while adding more decks to your experiment, the chi-square tends to raise – it can easily indicate a Big Fat Problem with your algorithms 🙁 .

Of course, it is by far NOT the only statistical test which is possible to run, but it is the one which can (and IMO SHOULD) be performed yourself. However, for Really RNG-Critical Environments I strongly suggest to hire a company which will be able to perform a much more sophisticated analysis for you.

⁴ Note that while for the game of Holdem used on Planet, equal-distribution hypothesis should stand for all the pocket cards, for cards-on-the-flop and further, equal distribution hypothesis is inherently invalid, as it becomes conditional on player actions, which can easily skew the distribution at least in some cases(!)

Keep reading for discussion on ways to implement reasonably good RNG

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Pages: 1 2

Comments

Grégory OBANOS says

June 21, 2016 at 11:37 am

Hi, very interesting article !
By the way, footnote anchors on first page are broken because note are on second page.

- "No Bugs" Hare says
  
  June 21, 2016 at 12:46 pm
  
  Fixed now (I hope), thanks for mentioning it!
  
Kevin Conner says

June 21, 2016 at 4:34 pm

This is FANTASTIC! This is what my friend and I said for YEARS!

I would like to add some personal experiences in checking random number generation.

I am a “math dunce” to some degree. But I LOVE playing with numbers, especially random generation. I have found that in addition to checking the end result, if you dissect your random generation into pieces – literally dissect your code, as if you are pulling it apart to view each “organ” – it helps to find small random problems before they become large random problems.

As a result, I always check each stage for random acceptability before checking the end result. This has helped me greatly in tracking down small cracks before they become huge!

I am sharing this article to everyone I know. It feels good to have someone legitimize what you’ve been saying for years.

TheBuzzSaw says

June 21, 2016 at 4:36 pm

“… secure communications over last two years can be read by the attacher.”

Attacker?

- "No Bugs" Hare says
  
  June 22, 2016 at 6:46 am
  
  Oops, thanks, fixed now!
  
ALearningDeveloper says

June 22, 2016 at 2:57 am

Hello,

Great article, much appreciated with the efforts put into it and the whole series, but I have a question about the “Poor Man’s” Crypto PRNG. Below are the steps from my understanding:

1. Seed the PRNG with a random cryptographically-secure key
2. Initialize the counter with a different random (cryptographically secure?) value, or a fixed one
3. Encrypt the counter with AES-ECB using the key in 1
4. Increment the counter (using whatever function that gives you a sequence) and repeat

If I understand correctly, there is no encryption (random number generation?) done with AES-CTR. If that is the case, the name AES-CTR sounds misleading. I would call it AES-ECB with a counter.

or…

I got it wrong, CTR mode is actually used, and the ECB mode in the third bullet is a typo. This is what would happen:
a. Seed the PRNG with a random cryptographically-secure key
b. Initialize the keystream block with the key in (a) and a counter value which can be different random (cryptographically secure?) or fixed
c. Encrypt the keystream block with AES-CTR using the key in (a) and XOR the output with only the counter
d. Increment the counter and repeat

But in either case (or maybe a third case or more), I am puzzled and I could use more explanation. Thank you!

- "No Bugs" Hare says
  
  June 22, 2016 at 7:00 am
  
  Actually, you’ve got it right (and yes, ECB is NOT a typo). However, the very same thing is a description of AES running in a CTR mode, while encrypting plaintext consisting of zero bytes of (potentially) unlimited length. If you take a look at the pics of ECB and CTR in https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation, it should become rather apparent.
  
  I’ve tried to make it more clear now, THANKS!
  
Thor Lancaster says

July 30, 2020 at 7:19 pm

Shouldn’t the “my_random” function have a return type of int, not void?
Doesn’t affect the meaning of the article but it makes my bunny ears twitch spontaneously.

- "No Bugs" Hare says
  
  September 9, 2020 at 7:35 am
  
  Fixed, thanks!