Bot Fighting 202. Time-Based Protection

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Chapter 29(j) from “beta” Volume VIII

One wide class of bot protections which is IMNSHO drastically underestimated (and, as a result, grossly underused) is the time-based ones.

Let’s separate all time-based protections into two categories: “local” ones (which do NOT involve going to the Server-Side), and “Server-Side” ones.

Local Time-Based Protection

First, let’s see what we can do while staying completely on our Client-Side. Apparently, a lot.

Time-Based Debugging Detection

What we can do rather easily with time measurements, is to detect debugging. For example, if we have a piece of non-blocking code, and mark it as such – then, whenever we detect that the time between entering such a piece-of-non-blocking-code and leaving it, is more than a few seconds – then, in any realistic scenario, we can safely say that one of the following occurred:

Either the system is hopelessly swapping
Or we’re being debugged.

If the system is swapping that hopelessly, our game is unplayable anyway, so whatever-we-do-with-it, it won’t realistically change anything (so false positives are not a problem¹). And if we’re being debugged – well, this is what we’re looking for in the first place.

Practically, in C++ it can be implemented as a C++ class ObfNonBlockingCode, with its constructor measuring and storing current time, and destructor measuring the time again, subtracting these two times, and then using this measured time difference. In ithare::obf library [NoBugs2018], measured time difference is used as follows:

Measured time difference is divided by some pre-defined threshold (with the threshold pre-calculated in such a manner that normally, after the division, the result becomes zero).² In other words, at this point we have a value which is either zero (indicating that we’re not debugged), or non-zero (which indicates that we ARE being debugged).
- Speaking of thresholds, ithare::obf currently uses something-of-the-order-of-15-seconds; this is MUCH more than usually-experienced-delays-due-to-thread-rescheduling (which are usually within hundreds-of-milliseconds), and on the other hand, is still small enough so that it is rather likely for a human being to spend more-than-that in debugging.
Then, this “should-be-zero-if-not-debugged” value can be used either directly, or as data-to-mix-into-the-obfuscation-as-supposed-zero-constant (so, if our variable is not zero, the data will be corrupted and will likely lead to a crash sooner rather than later).
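The idea above can be sketched roughly as follows. This is NOT the actual ithare::obf code: the names NonBlockingCodeGuard, should_be_zero() and mix_supposed_zero() are hypothetical, std::chrono::steady_clock is used only to keep the sketch portable, and the 2^14-millisecond (about 16 seconds) threshold is an assumption standing in for "something-of-the-order-of-15-seconds".

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: maps elapsed milliseconds to a value which is zero
// as long as the elapsed time stays below 2^14 ms (16384 ms, about 16 s).
// The right shift stands in for the division by the threshold.
inline std::uint64_t should_be_zero(std::uint64_t elapsed_ms) {
    return elapsed_ms >> 14;
}

// Hypothetical RAII guard around a piece of non-blocking code:
// the constructor stores the current time, the destructor measures again
// and writes the supposed-zero value into 'out'.
class NonBlockingCodeGuard {
public:
    explicit NonBlockingCodeGuard(std::uint64_t& out)
        : out_(out), start_(std::chrono::steady_clock::now()) {}

    ~NonBlockingCodeGuard() {
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
                           std::chrono::steady_clock::now() - start_).count();
        out_ = should_be_zero(static_cast<std::uint64_t>(elapsed));
    }

    NonBlockingCodeGuard(const NonBlockingCodeGuard&) = delete;
    NonBlockingCodeGuard& operator=(const NonBlockingCodeGuard&) = delete;

private:
    std::uint64_t& out_;
    std::chrono::steady_clock::time_point start_;
};

// Mixing the supposed-zero value into data: if it is zero, 'secret' comes
// back unchanged; if not, 'secret' is silently corrupted.
inline std::uint32_t mix_supposed_zero(std::uint32_t secret,
                                       std::uint64_t supposed_zero) {
    // Odd multiplier, so any non-zero value below 2^32 stays non-zero.
    return secret ^ static_cast<std::uint32_t>(supposed_zero * 0x9E3779B1u);
}
```

In use, the guard would be declared at the top of the scope which is known to be non-blocking, and the resulting value fed into mix_supposed_zero() somewhere later, preferably far away from the place where the time was measured.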

¹ Make sure to check it with your legal department, because it MAY depend on the-action-you’ll-be-taking (!)

² and of course, as our calculations are very approximate – we can always replace division with a right-shift-for-appropriate-number-of-bits

Ways to Measure Time

Our next question is “how we can measure the time for our obfuscation purposes?” In general, at least for x86/x64, I know of three different approaches:

System-level calls such as GetTickCount(), QueryPerformanceCounter(), or gettimeofday(). TBH, I am not fond of using system-level calls for obfuscation purposes (they are waaaay too obvious in the binary code). If there is nothing else – they might do, but given any other choice – I’d avoid them (again – for obfuscation purposes).
RDTSC instruction (either using __rdtsc() or inline asm). RDTSC is very lightweight (which is a plus), but on the negative side – it is still rather obvious in binary code.
- As for other risks of using RDTSC, [MSDN] says: “We strongly discourage using the RDTSC or RDTSCP processor instruction to directly query the TSC because you won’t get reliable results on some versions of Windows, across live migrations of virtual machines, and on hardware systems without invariant or tightly synchronized TSCs.” Well, let’s discuss it one-by-one:
  - I have never seen those “some versions of Windows” where TSC wasn’t reliable. Moreover, I have never heard of somebody seeing them. [[TODO: some ACPI-related issues were reported; not that we REALLY care, but it should be possible to gather stats and ignore one-off issues]]
  - Hardware systems causing problems with TSC, did indeed exist at least in the past, but they were limited to multi-socket boxes (in fact, all such occurrences I know about, were related to motherboard failing to synchronize TSC across different sockets). As (a) multi-sockets are extremely unlikely to be encountered on Client-Side, and (b) as inter-socket discrepancies are extremely unlikely to compare to 15 seconds (hey, even if motherboard doesn’t synchronize CPUs-in-different-sockets properly, they will still start within some tens-of-milliseconds) – I don’t see it as a problem.³
  - As for live migrations of virtual machines, for our MOG-related Client-Side stuff, it is very unlikely to happen.
- As a result, ithare::obf still takes its chances with RDTSC – unless a better memory-reading option mentioned below is available <wink />.
Direct reading of system-provided memory. At least under Windows, there is a not-so-obvious way of obtaining time. The thing is that, as noted in [Tim@strafenet.com], GetTickCount() is nothing more than reading from two-fixed-addresses with some basic math afterwards. Perfect – we can do it ourselves without ever going to system library, easily <wink />. A relevant fragment from ithare::obf:

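// Windows-only: reads the tick count (offset 0x320) and the tick multiplier (offset 0x004)
// from the shared user data page mapped at 0x7FFE0000, i.e. the same math as GetTickCount().
#define ITHARE_OBF_TIME_NOW() ((uint64_t((*(uint32_t*)(0x7FFE'0320)))\
                              *uint64_t((*(uint32_t*)(0x7FFE'0004))))>>0x18)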

If, on top of it, we obfuscate those rather-obvious 0x7FFE’XXXX constants (along the lines discussed in “Obfuscating literals” section above) – we’ll get a very non-obvious-in-binary-code way to read current time (and as an option, to have the-program-being-debugged, crash in an extremely non-obvious manner <evil-grin />). Oh, and as a side benefit – even if we’re this obvious as listed above, to the best of my knowledge⁴ ScyllaHide isn’t able to fix it automagically <wink />.
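For the RDTSC option from the list above, a minimal sketch might look as follows. The names sketch_ticks_now() and supposed_zero_from_ticks() are hypothetical, and the shift of 35 bits is an assumption (2^35 ticks is roughly 9 to 17 seconds for TSC frequencies between 4 and 2 GHz). On non-x86 targets the sketch falls back to std::chrono, purely so that it still compiles.

```cpp
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define SKETCH_HAS_RDTSC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  #include <x86intrin.h>
  #define SKETCH_HAS_RDTSC 1
#else
  #define SKETCH_HAS_RDTSC 0
#endif

// Hypothetical helper: raw tick counter.
// On x86/x64 it is the RDTSC instruction (via the __rdtsc() intrinsic);
// elsewhere it is a nanosecond count from std::chrono::steady_clock.
inline std::uint64_t sketch_ticks_now() {
#if SKETCH_HAS_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Hypothetical helper: zero if fewer than 2^35 ticks passed between
// 'start' and 'end', non-zero otherwise (shift instead of division).
inline std::uint64_t supposed_zero_from_ticks(std::uint64_t start,
                                              std::uint64_t end) {
    return (end - start) >> 35;
}
```

As the threshold is so coarse, there is no need to know the exact TSC frequency; being within a factor of two or so is good enough for this purpose.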

³ Though until it is tested on a million-size player population, there is certainly some risk involved.

⁴ Which isn’t much in this case TBH, so make sure to test my claim yourself if trying to rely on it.

Detecting ScyllaHide

When dealing with such programs as ScyllaHide, it is possible to detect them using timing. In particular, discrepancies between elapsed time as measured by GetTickCount(), RDTSC, and values-read-from-SharedUserData, can be used for detection. Very briefly – whenever ScyllaHide installs timer-based hooks – it can be detected fairly easily. The first line of detection would be comparing those-values-read-from-SharedUserData (and/or RDTSC), with (hooked-by-Scylla) GetTickCount(). But even if Scylla manages to hide from this one – we’ll still be able to detect it using more-heavy-weight techniques which are more typical for VM detection (and described below).
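The first line of detection described above boils down to comparing the same interval as seen by two different time sources. A minimal sketch of this comparison is below. ElapsedPair and time_sources_disagree() are hypothetical names, the code deliberately leaves out how the two elapsed values are obtained, and the tolerance has to be chosen by you (it has to be well above the resolution of GetTickCount(), which is typically in the range of 10 to 16 milliseconds).

```cpp
#include <cstdint>

// Elapsed milliseconds for the SAME interval, measured in two ways.
struct ElapsedPair {
    std::uint64_t via_api_ms;     // e.g. from GetTickCount(), which may be hooked
    std::uint64_t via_direct_ms;  // e.g. from SharedUserData, or RDTSC converted to ms
};

// Hypothetical check: true if the two sources disagree by more than
// 'tolerance_ms', in either direction.
inline bool time_sources_disagree(const ElapsedPair& e,
                                  std::uint64_t tolerance_ms) {
    const std::uint64_t diff = e.via_api_ms > e.via_direct_ms
                                   ? e.via_api_ms - e.via_direct_ms
                                   : e.via_direct_ms - e.via_api_ms;
    return diff > tolerance_ms;
}
```

As with debugging detection, the result does not have to be used right away; it can be accumulated over several intervals, or mixed into the obfuscation as a supposed-zero value.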

Detecting Being Run under VM

If our Client app is running under a virtual machine (and VMs are increasingly used to run bots <sad-face />), we can still use timers to detect it. In particular:

If VM does not virtualize RDTSC, then – at least in case of suspend/resume – we’re likely to see discrepancies between non-virtualized RDTSC and virtualized GetTickCount()/whatever-else. In addition, we’re likely to see spikes in times which RDTSC takes [Ortega].
If VM does virtualize RDTSC, it is even simpler – we’ll see a much more consistent picture of RDTSC-taking-MUCH-longer than it should.

For a detailed discussion on using-RDTSC-to-detect-running-under-VM – see [Ortega]. Just make sure to account for sporadic huuuge delays due to thread context switches which happen right between our measurements (i.e. one single delay – or actually, any-delay-happening-once-in-a-blue-moon – is NOT sufficient evidence of running under VM, but an average of a dozen such measurements can easily be).
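To illustrate the point about once-in-a-blue-moon delays, here is a rough sketch of aggregating a dozen measurements before drawing any conclusion. Note that it uses a median instead of the plain average mentioned above; this is a substitution, chosen only because a median ignores a single huge outlier completely. Both function names are hypothetical, and threshold_ticks is an assumption which has to be calibrated on real hardware.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Median of measured costs (in ticks) of a single timer read.
// Takes the vector by value, because nth_element reorders it.
inline std::uint64_t median_cost(std::vector<std::uint64_t> samples) {
    if (samples.empty()) return 0;
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Hypothetical check: requires at least a dozen samples, and reports
// "looks virtualized" only if the TYPICAL read is slower than the threshold.
inline bool timer_read_looks_virtualized(
        const std::vector<std::uint64_t>& samples,
        std::uint64_t threshold_ticks) {
    return samples.size() >= 12 && median_cost(samples) > threshold_ticks;
}
```

With this approach, one context switch landing between two measurements changes nothing, while consistently slow reads (as with virtualized RDTSC) still show up.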

On ithare::obf: time-based detection of ScyllaHide and VMs is on the list, but is not currently high in priority (read: “it is going to take a looooong while…” <sad-face />).

Remote Time-Based Protection

As we can see, even when running under purely local conditions (and even under VM(!)), time measurements still can help us to detect that we’re being debugged – or that we run under VM. But if we can involve our Server-Side, our possibilities expand further:

First, even simply dropping the connection on the player timeout (which we should do anyway – see Vol. IV’s chapter on Communications) will make the hacker’s life much more unpleasant. Indeed – if sitting-within-debugger longer than 15-seconds-or-so causes you to restart from scratch, it is quite annoying.
In addition, we can have our Server-Side send challenges to the Client, and measure response times of the Client. Then, we can collect the statistics about the timing of these responses – and then use these statistics (to raise red flags, or in some cases – even to ban the player outright).⁵
- BTW, the challenges can either come from Server to Client – or from Server all the way to the player (such as captchas). From what I’ve seen and heard – both tend to work pretty well to detect bots <wink />.
- In theory, it can even be generalized to the point when we can guess what is exactly the piece of code which is currently being debugged on the other side <wink />.
Moreover, with Server-Side timing available, we can go even further than simple debugger/VM detection, and get to real-life bot detection. What if we send not just a challenge, but a “challenge which includes some piece of code to be executed on the Client-Side”? This way, decompiling this piece-of-code within the time-necessary-to-reply, becomes perfectly impossible, so we should be able to catch the cheater (either he doesn’t try to block our code and gets caught by the code, or he does block the code and gets caught on the Server-Side) – and without that much hassle…
- This can become your Ultimate Tool for catching cheaters.
- OTOH, such an Ultimate Tool (and any Ultimate Tool) still has to be used very carefully. In particular, the information about kinds of data/system info this code will try reading to calculate the reply, is of very significant value (if the hacker knows an exhaustive list of such data – he’ll be able to avoid modifying it, staying “clean” during the real game session).
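As for collecting statistics about challenge response times on the Server-Side, a very rough per-player sketch might look as follows. Everything here is an assumption made for illustration: the class name ChallengeTimingStats, the minimum of 20 samples, and the "more than 1 in 4 responses were slow" rule. Real-world statistics are likely to be more elaborate.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-player statistics over challenge response times.
class ChallengeTimingStats {
public:
    explicit ChallengeTimingStats(std::uint32_t slow_threshold_ms)
        : slow_threshold_ms_(slow_threshold_ms) {}

    // To be called on the Server-Side whenever a reply to a challenge arrives.
    void on_response(std::uint32_t response_ms) {
        ++total_;
        if (response_ms > slow_threshold_ms_) ++slow_;
    }

    // Raises a red flag only after enough samples were collected,
    // and only if more than a quarter of the responses were slow.
    bool red_flag() const {
        return total_ >= kMinSamples && slow_ * 4 > total_;
    }

private:
    static constexpr std::size_t kMinSamples = 20;
    std::uint32_t slow_threshold_ms_;
    std::size_t total_ = 0;
    std::size_t slow_ = 0;
};
```

Requiring a minimum number of samples follows the same logic as with local measurements: one slow reply can be a network hiccup, while a pattern of slow replies is much harder to explain away.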

[[TODO: describe risks, pitfalls – and ways to mitigate. Very briefly – it falls under the same high-risk category as Client self-updates – and has to be treated accordingly (with LOTS of attention paid to ensure proper handling of signatures). In particular – private-key-used-for-signing, should stay on an air-gapped box, i.e. on a machine which has-never-been-exposed-to-the-Internet(!).]]

⁵ In such cases, it MIGHT be more beneficial to ban the offender right away, rather than to wait for the “ban wave” to come.

[[To Be Continued…

This concludes beta Chapter 29(j) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”.

Stay tuned for Chapter 29(k), where we’ll discuss how my beloved 😉 (Re)Actors tend to help us with anti-cheating]]

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Goose says

January 24, 2018 at 7:03 pm

I would add a note that not all players using the game inside a VM are malicious. It’s fairly common for Linux users to have a Windows VM so they can play Windows platform games, but otherwise live in Linux. I’d hate for someone to read this and decide to prevent games from running inside VMs.

- "No Bugs" Hare says
  
  January 25, 2018 at 3:20 am
  
  Short answer – I agree with you (with some rather minor reservations and major elaborations) :-).
  
  Long answer: IIRC, it was already mentioned in previous posts/parts-of-chapter-29 before. In short – while some gamedevs (such as Riot) already prevent their games from running in VMs, I am generally arguing against prohibiting VMs outright. OTOH, players-using-VMs do represent an additional risk cheating-wise, so IMO it qualifies as a red flag – which, in turn, calls for more attention to such players at all levels (usually going beyond purely-Client-Side, but can include more-Server-Side-stats-analysis, a look from the human member of anti-cheating CSR team, etc. – I’ve seen even captchas issued to “red-flagged” users to prevent grinding bots :-)).
  
  On the third hand (yes, we hare do have three hands ;-)), IF we do NOT ban VMs, BUT the player still goes to the great lengths to hide his VM (renaming drivers etc., so that VM can be detected only by timer-based stuff which is next-to-impossible-to-hide-from for an MOG) – chances that he’s malicious, start to go through the roof… Which BTW means that IF you have to hide your VM from Riot-or-some-other-game-which-bans-VMs – it is a bit safer to use separate VM-without-trying-to-hide-it for those-games-which-don’t-ban-VMs-outright.
  
Phlosioneer says

January 25, 2018 at 6:02 am

A correction and a concern.

First: “OTOH, such an Ultimate Tool (and any Ultimate Tool) still has to be used very In particular, the information about kinds of data/system info this code will try reading”
I think you forgot the word “carefully” after “very”, along with a period.

Second: I don’t like the idea of my game executing code provided by a game server. At best, the server must remain trusted and the connection must remain secure. At worst, the game may have a bug and then run code from an unexpected, untrusted source.

The consequences could be quite dire: almost all of your player base will *immediately and simultaneously* receive and run malware. By the time it’s detected and stopped, if these checks are being run with once-a-minute frequency, the fastest you can receive reports, confirm them, and shut down the server, is ~15 minutes. On average, I’d guess an hour would be more common; and in the worst case, it goes completely undetected for days.

There are some possible solutions to this. The most obvious option (to me) would be to sandbox it somehow. But this can cause bugs if not done properly – game developers are not usually required to understand how to properly sandbox downloaded code.

Another option is to have the code be run in an interpreter; for example, perhaps the game already has a Lua scripting thing. However, then you compromise the two things you wanted in the first place; obfuscation and speed. Either you’re sending a long string of code, which is obfuscated but will have widely varying execution time; or you’re sending a short string of code for consistency, but the operation is no longer sufficiently obfuscated.

A third option is to do some kind of sanitizing algorithm; maybe the code isn’t allowed to have any pointer accesses. But the code that checks for x86-or-whatever instructions that include pointer accesses will be a red flag when statically analyzing the game’s code. Then you run into the same problem you covered in Advocating Obscurity part III (On “Encrypted” Code), where your obfuscation / security thing has one Achilles heel.

The technique should definitely be included. But I think it’s irresponsible to not point out the pitfalls, caveats, and serious security considerations of running arbitrary, *native* code from an online source every minute or so. Especially when you’re calling it the “Ultimate Tool” – the only approach on this page without mention of potential problems.

- "No Bugs" Hare says
  
  January 26, 2018 at 1:52 pm
  
  1. You’re right, thanks, I fixed it.
  
  2. As for “receiving and running malware” – yes, this is risky, but on the other hand – it is essentially the same as running a Client self-update system (which we have to run anyway). “Essentially the same” means that _both_ attacks _and_ protections are the same. In particular, as for protection – it is all about signature-with-a-public-key-which-is-stored-on-the-Client. As for the residual risks after the signature is found to be valid – they’re still orders of magnitude lower than the risk that Microsoft takes when one single cracked private key can infect ALL the PCs in existence.
  
  BTW, when speaking about signatures – they SHOULD be performed on an “air gapped” box, so the private key is never exposed to the Internet at all (well, it is still possible to attack this private key, but it won’t be easy, this is for sure; strictly speaking – it is even possible to make this private key even theoretically invulnerable without physical access (!), but this technology is unlikely to be used by gamedevs). In any case, given properly implemented signatures (issued by air-gapped trusted box), I do not feel that other measures such as sandboxing etc. are really necessary (FWIW, I never heard about a real-world attack via broken signatures issued by an air-gapped box).
  
  > But I think it’s irresponsible to not point out the pitfalls, caveats, and serious security considerations of running arbitrary, *native* code from an online source every minute or so
  
  You’re right, it should be definitely included. I added a [[TODO]] about it, THANKS!
  
Pseudonym says

January 26, 2018 at 6:52 am

RDTSC on older CPUs doesn’t play well with ACPI. Closing the lid of a laptop could be enough to make RDTSC misbehave. I think this may be the problem that MSDN is referring to.

- "No Bugs" Hare says
  
  January 26, 2018 at 2:07 pm
  
  You’re probably right – I added a note about it, THANKS!