Bot Fighting 201, Part 2: Obfuscating Literals

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek


In the previous instalment, we discussed a few basic techniques related to Data+Code Obfuscation; most of the time, we were speaking about bijections.

On the other hand, bijections taken alone are NOT exactly sufficient to obfuscate our code; this phenomenon is related to two observations:

Literals can tell a LOT about our program.
- Moreover, if using literals in our bijections – they can easily give away the correspondence between our obfuscation and deobfuscation routines – and this is one of those pieces of information which are better to be kept out of plain view.
If trying to obfuscate literals using simple bijections-over-literals – first, our own compiler will try to optimize constants out, and then attacker’s decompiler will try to optimize out the rest.
- In other words – if using simple bijections-over-literals, we cannot be sure that our supposedly-obfuscated literal looks reasonably-obfuscated to the hacker, or not <sad-face />.

As a result, we DO need to find a somewhat-special way to obfuscate literals (to make sure that compiler+decompiler doesn’t optimize it out too easily). In this regard, several different approaches are possible.

Volatile

One very simple¹ way of obfuscating literals is to generate a random constant CC (just as above, at build-time), and then to generate the following code:

template<>
struct obfuscated_literal<size_t,13,456> {
  //13 is the constant we’re hiding.
  //456 is an ID of this obfuscation (similar to
  // the obf<> discussed above)
  static size_t volatile c;
  static constexpr size_t CX = obfuscate(13+CC);
  //obfuscate() should be constexpr function,
  // not a big deal in C++17

  static size_t value() {//static member functions cannot be const-qualified
    return deobfuscate(c)-CC;
  }
};
volatile size_t obfuscated_literal<size_t,13,456>::c = CX;

As variable c is declared as volatile (~=”allowed to be changed at any moment without any reason-known-to-the-compiler”), the compiler just doesn’t have any options other than to read it (and on each and every access to it too); this ensures that the call to deobfuscate() won’t be optimized out.

A variation of this technique is to have a table-based implementation of the deobfuscation, where the table-used-for-deobfuscation is declared volatile.
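To make the volatile trick above a bit more tangible, here is a minimal self-contained sketch. Everything not present in the snippet above is an assumption made for illustration: the multiply-and-add bijection behind obfuscate()/deobfuscate(), the values of CC, MUL, INV and ADD (in real life they would be generated randomly at build-time), the use of uint32_t instead of size_t, and the helper hidden_13().

```cpp
#include <cstdint>

// Assumed build-time constants; a real generator would randomize them per build.
constexpr std::uint32_t CC  = 0x5A17C3E9u;
constexpr std::uint32_t MUL = 0xAAAAAAABu;  // odd, hence invertible mod 2^32
constexpr std::uint32_t INV = 3u;           // MUL * INV == 1 (mod 2^32)
constexpr std::uint32_t ADD = 0x01234567u;
static_assert(static_cast<std::uint32_t>(MUL * INV) == 1u,
              "MUL and INV must be inverses mod 2^32");

// Hypothetical bijection pair; any constexpr bijection would do here.
constexpr std::uint32_t obfuscate(std::uint32_t x)   { return x * MUL + ADD; }
constexpr std::uint32_t deobfuscate(std::uint32_t y) { return (y - ADD) * INV; }

template<class T, T Value, int Id>
struct obfuscated_literal;  // primary template; specializations are generated

template<>
struct obfuscated_literal<std::uint32_t, 13, 456> {
  static volatile std::uint32_t c;
  static constexpr std::uint32_t CX = obfuscate(13u + CC);
  // c is volatile, so it has to be re-read on every call
  static std::uint32_t value() { return deobfuscate(c) - CC; }
};
volatile std::uint32_t obfuscated_literal<std::uint32_t, 13, 456>::c =
    obfuscated_literal<std::uint32_t, 13, 456>::CX;

// Convenience wrapper, used only for demonstration.
std::uint32_t hidden_13() {
  return obfuscated_literal<std::uint32_t, 13, 456>::value();
}
```

Calling hidden_13() yields 13 at run time, while the data segment holds only the obfuscated CX.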

¹ and supposedly pretty much bulletproof at least against our own compiler optimizing it out

No-inline Functions

In addition to volatiles, I had good experience with using non-inlineable function calls (__declspec(noinline) under MSVC, __attribute__((noinline)) under GCC/Clang) to hide constants. Even better – I made them use potential-pointer-aliasing (which is a big headache for compiler writers, effectively prohibiting them from making certain assumptions, good for us <wink />!):

__declspec(noinline)
size_t aliased_pointer_func(size_t* x, size_t* y) {
  *x = 0;
  *y = 1;
  return *x;//can be either 0, or 1,
            // depending on
            // pointers x and y being equal or not
}

TBH, I don’t really see how a call to such a function can be optimized out (unless compiler ignores “noinline” specification); apparently, at least those compilers I tried, agreed with me on it, and left the call in place <smile />.
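How exactly the result of such a call is turned into a literal is not spelled out above, so the following is only one possible sketch. The NOINLINE macro and the way hidden_13() assembles 13 out of the two possible return values are assumptions made for illustration.

```cpp
#include <cstddef>

// Portable spelling of "do not inline"; the macro name is an assumption.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

NOINLINE std::size_t aliased_pointer_func(std::size_t* x, std::size_t* y) {
  *x = 0;
  *y = 1;
  return *x;  // 1 if x and y point to the same object, 0 otherwise
}

// Hypothetical use: build 13 (binary 1101) out of the two call results.
std::size_t hidden_13() {
  std::size_t a = 0, b = 0;
  const std::size_t same = aliased_pointer_func(&a, &a);  // pointers alias
  const std::size_t diff = aliased_pointer_func(&a, &b);  // pointers differ
  return (same << 3) | (same << 2) | same | (diff << 1);
}
```

Within hidden_13() the literal 13 never appears as such; it exists only as a combination of the results of two calls which the compiler is told not to inline.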

Reading OS memory

Elaborating a bit further on the idea of obfuscating things (and at the same time, doing poor-man’s anti-debugging for free <wink />) via reading-some-memory, we can write something along the following lines:

template<>
struct obfuscated_literal<size_t,13,457> {
  static constexpr size_t CX = obfuscate(13);
  static size_t value() {//static member functions cannot be const-qualified
    uint8_t* peb = (uint8_t*)__readfsdword(0x30);
    uint8_t beingDebugged = peb[2];
    return deobfuscate(CX*(beingDebugged+1));
  }
};
//Warning: at some point, I ran into a code generation bug with current MSVC compiler
// related to __readfsdword() intrinsic; as a result, I do NOT recommend to use 
// __readfsdword() in generated code; for a seemingly-working example of using it in a safe 
// manner, see [[TODO]]

The idea here is actually rather neat: if beingDebugged is (as we expect) 0, then beingDebugged+1 is 1, and then value() will return 13. However, if we’re run under a debugger, the result will be very different, leading to a crash (or to a pretty-much-endless-loop) very soon (and without letting the hacker know what exactly went wrong). Moreover, this technique is NOT limited to reading specifically the BeingDebugged flag; in fact – most of the in-memory flags discussed above (including heap flags, NTGlobalFlag, etc.) can be used in this manner.

Unfortunately, all such memory locations are well-known and are routinely patched by ScyllaHide. It means that this technique won’t be able to seriously help our defense efforts – but as it comes pretty much for free (while solving our task-at-hand=”making sure compiler doesn’t optimize literal obfuscation out”) – why not use it?
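The arithmetic behind this trick can be sketched in a portable way. In the sketch below, several things are assumptions made for illustration: the multiplicative bijection (MUL/INV), the helper names, and the use of the documented IsDebuggerPresent() call (which reports the same BeingDebugged flag) instead of reading the PEB directly; on non-Windows platforms the flag is simply stubbed to 0.

```cpp
#include <cstdint>
#if defined(_WIN32)
#  include <windows.h>
#endif

// Hypothetical bijection: multiplication by an odd constant mod 2^32.
constexpr std::uint32_t MUL = 0xAAAAAAABu;
constexpr std::uint32_t INV = 3u;
static_assert(static_cast<std::uint32_t>(MUL * INV) == 1u,
              "MUL and INV must be inverses mod 2^32");

constexpr std::uint32_t obfuscate(std::uint32_t x)   { return x * MUL; }
constexpr std::uint32_t deobfuscate(std::uint32_t y) { return y * INV; }

constexpr std::uint32_t CX = obfuscate(13u);

// Stand-in for reading the BeingDebugged flag (assumption: stub on non-Windows).
std::uint32_t read_debug_flag() {
#if defined(_WIN32)
  return IsDebuggerPresent() ? 1u : 0u;
#else
  return 0u;
#endif
}

// Shows how the flag leaks into the deobfuscated value.
std::uint32_t value_for_flag(std::uint32_t flag) {
  return deobfuscate(CX * (flag + 1u));
}

std::uint32_t hidden_13() { return value_for_flag(read_debug_flag()); }
```

With this particular bijection, a flag of 0 gives 13, and a flag of 1 gives 26, so code relying on the literal silently goes off the rails under a debugger.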

Changing Variables with Invariants

All the methods above work, but each of them can be detected relatively easily; while as of now I don’t know of tools doing it, I am pretty sure that they can (and probably will) be developed as time passes. However, there is one further thing which we can do to complicate such analysis (and IMO, very significantly too).

For example, we could use the following code to obfuscate our literal 13:

template<>
struct obfuscated_literal<size_t,13,458> {
  static inline size_t c = 0x3475723D;//note that 0xD == 13
    //'inline' (C++17) is needed to initialize
    // a non-const static member in-class
  static constexpr size_t CC = some_random_constant * 16;
  static size_t value() {
    c += CC;
    return c & 0x0F;
  }
};

The point here is that while the variable c is changing on each call to value() (so it cannot be detected as a ‘de-facto constant memory location’), the last 4 bits of c are always 0xD, so value() function always returns 13 (as it should).
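As a self-contained sketch of the same idea (assuming a 32-bit variable, a made-up value standing in for some_random_constant, and a hypothetical checking helper invariant_holds()):

```cpp
#include <cstdint>

// Ever-changing variable; the lowest hex digit 0xD == 13 is the invariant.
std::uint32_t c = 0x3475723Du;

// Assumed stand-in for some_random_constant * 16; any multiple of 16 works.
constexpr std::uint32_t CC = 0x0BADC0DEu * 16u;
static_assert(CC % 16u == 0u, "the step must not touch the lowest 4 bits");

std::uint32_t value() {
  c += CC;           // wraps mod 2^32, which is a multiple of 16
  return c & 0x0Fu;  // the lowest 4 bits are never changed by the addition
}

// Demonstration helper: every call returns 13 while c keeps changing.
bool invariant_holds(int calls) {
  for (int i = 0; i < calls; ++i) {
    const std::uint32_t before = c;
    if (value() != 13u || c == before) return false;
  }
  return true;
}
```

Here invariant_holds(1000) returns true: the memory location changes on every call, and yet the literal derived from it stays the same.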

Moreover, this is certainly not the only technique which allows us to have ever-changing variable which does keep certain invariant (which we can rely on to derive our constant from). In particular, at least the following ever-changing-variables-with-invariants are possible:

Generalization of the technique above to any number of bits.²
If we realize that actually, in the code above &0x0F is equivalent to mod 16 (and ‘+=’ is actually addition mod 2^32 where 2^32 is a multiple of 16), we can generalize the technique above to operations mod M1 and mod M1*M2, where neither M1 nor M2 is 2^n.³
Actually, any finite cyclic subgroup will do here. Unfortunately, I have to admit that I am extremely incompetent in group theory, however:
- it is perfectly possible to generate such a subgroup defined in a table manner (and if we define it over uint8_t, implementation will be very practical too).
Also, XOR-ing our variable with a constant will ensure that only some of the bits will change, and all the others remain intact (with those intact bits effectively forming our invariant). This, in turn, can be generalized to the following:
- Considering our variable as a bunch of ‘digits’.
  - To confuse things better, it is better to use non-binary ‘digits’.
    - To confuse things even further, valid range for each ‘digit’ can be made different.
- On each iteration, change only some of the ‘digits’ (say, in a pseudo-random manner), keeping the rest of the ‘digits’ intact.
- Those intact digits form our invariant, and can be used to calculate our literal.
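The simplest (binary) flavour of the XOR-based variation can be sketched as follows. The bit positions, the initial value, and the LCG used to pick which bits to flip are all assumptions made for illustration; a generator would randomize them, and could move on to non-binary 'digits' as described above.

```cpp
#include <cstdint>

// Assumed positions where bits 0..3 of the literal are hidden.
constexpr std::uint32_t KEEP =
    (1u << 3) | (1u << 9) | (1u << 17) | (1u << 26);

// Spread the 4 bits of v over the KEEP positions.
constexpr std::uint32_t scatter(std::uint32_t v) {
  return ((v & 1u) << 3) | (((v >> 1) & 1u) << 9) |
         (((v >> 2) & 1u) << 17) | (((v >> 3) & 1u) << 26);
}

// Collect the 4 hidden bits back into a number.
constexpr std::uint32_t gather(std::uint32_t x) {
  return ((x >> 3) & 1u) | (((x >> 9) & 1u) << 1) |
         (((x >> 17) & 1u) << 2) | (((x >> 26) & 1u) << 3);
}
static_assert(gather(scatter(13u)) == 13u, "scatter/gather must round-trip");

std::uint32_t c   = (0xC0FFEE42u & ~KEEP) | scatter(13u);
std::uint32_t rng = 0x12345678u;  // state of a tiny LCG choosing bits to flip

std::uint32_t value() {
  rng = rng * 1664525u + 1013904223u;
  c ^= (rng & ~KEEP);  // flip pseudo-random bits, never the KEEP ones
  return gather(c);
}

// Demonstration helper.
bool invariant_holds(int calls) {
  for (int i = 0; i < calls; ++i)
    if (value() != 13u) return false;
  return true;
}
```

Only 4 out of 32 bits carry the literal; the remaining 28 are noise which changes on every call.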

² as long as it is strictly less than variable size

³ Beware: with M!=2^n, we MUST avoid overflow mod 2^32. For a supposedly-working example, see [[TODO]]
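A minimal sketch of the mod-M1 / mod-M1*M2 generalization, with the overflow caveat taken into account. The moduli, the step, and the initial value below are made-up illustrative numbers, not something a real generator would be limited to.

```cpp
#include <cstdint>

constexpr std::uint32_t M1   = 17u;            // literal lives mod M1 (13 < 17)
constexpr std::uint32_t M2   = 1000003u;
constexpr std::uint32_t M    = M1 * M2;        // variable lives mod M1*M2
constexpr std::uint32_t STEP = M1 * 734521u;   // any multiple of M1 below M

static_assert(STEP % M1 == 0u, "the step must preserve c mod M1");
static_assert(STEP < M, "the step is kept below the modulus");
static_assert(M <= 0xFFFFFFFFu - STEP, "c + STEP must not wrap mod 2^32");

std::uint32_t c = 13u + M1 * 4242u;  // c mod M1 == 13, and c < M

std::uint32_t value() {
  // c < M at all times, so c + STEP cannot overflow (see static_assert above);
  // as M is a multiple of M1, reducing mod M keeps c mod M1 intact.
  c = (c + STEP) % M;
  return c % M1;
}

// Demonstration helper.
bool invariant_holds(int calls) {
  for (int i = 0; i < calls; ++i)
    if (value() != 13u) return false;
  return true;
}
```

Unlike the power-of-two case, there is no tell-tale & 0x0F mask here; the price is two modulo operations per access.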

Changes and Threads

From the practical perspective, using ever-changing-variables-with-invariants requires being very careful in presence of multithreading <sad-face />. In particular:

Formally, unless we’re using thread_local variables,⁴ we have to ensure that our access to the variable is atomic.
- Note that for our purposes, there is no need to ensure atomicity of the whole read-modify-write operation; instead, it is sufficient to have both read and write as atomic (and if there is an intervening write coming from a different thread – it cannot violate invariant anyway).
- In practice – as of now, even simple reads/writes to properly-aligned-variable will do, but formally doing so qualifies as a dreaded Undefined Behavior, so nobody knows when it starts hurting us <sad-face />
- OTOH, std::atomic<> is formally correct (and without too much overhead too)
More importantly, from the practical point of view, we have to ensure that we don’t thrash CPU caches by modifying our ever-changing variable from different threads too often. From my current perspective, two approaches are possible in this regard:
- Say that the potential cost of this trick is huuuuge (like 100+ cycles), so we will use it only in the least performance-critical obfuscations.
- Use ever-changing-variables only with thread_local specifier, which by definition eliminates all the MT-related problems (at the cost of having storage for these variables for all our threads, but this is rarely a problem, at least as long as we keep number of our generated thread_local variables to some hundreds).
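Both options can be sketched as follows; the constants and function names are assumptions made for illustration. The atomic version uses separate relaxed load and store (not an atomic read-modify-write), which, as noted above, is sufficient because any intervening write preserves the invariant too.

```cpp
#include <atomic>
#include <cstdint>

// Assumed step; any multiple of 16 keeps the lowest 4 bits intact.
constexpr std::uint32_t CC = 0x0BADC0DEu * 16u;

// Option 1: shared variable, formally correct via std::atomic<>.
std::atomic<std::uint32_t> shared_c{0x3475723Du};

std::uint32_t value_atomic() {
  // Separate atomic read and atomic write; a racing write from another
  // thread also adds a multiple of 16, so the invariant cannot be broken.
  const std::uint32_t v = shared_c.load(std::memory_order_relaxed) + CC;
  shared_c.store(v, std::memory_order_relaxed);
  return v & 0x0Fu;
}

// Option 2: one copy per thread, no sharing and no cache-line ping-pong.
std::uint32_t value_thread_local() {
  thread_local std::uint32_t c = 0x3475723Du;
  c += CC;
  return c & 0x0Fu;
}
```

Both functions can be called from any number of threads and return 13 each time; the thread_local flavour avoids bouncing a cache line between cores, at the cost of per-thread storage.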

Obfuscating String Literals

Up to now, we were discussing obfuscating integer literals; however – in practice it is even more important to obfuscate string literals. One traditional way of obfuscating string literals is to have strings-stored-in-data-segment XOR-ed with a single byte; it is a very-well known technique – and is easily defeated too <sad-face />; in particular – such a single deobfuscation routine can be easily found and hacked, revealing all the supposedly-obfuscated strings to the attacker.

However, we can (and SHOULD) do MUCH better than that. Indeed, if we apply our approach discussed-above-with-regards-to-integer-literals, to string literals – we’ll be able to have orders-of-magnitude better protection than naïve XOR-with-byte. In particular, if we (a) generate individual obfuscation code for each of the string literals, and (b) have generated obfuscation of the same length as literal-being-obfuscated – we’ll get the following very-substantial benefits:

There WON’T be one single point of attack.
- Each and every obfuscation has to be hacked individually.
- Moreover, if we re-generate our obfuscation code on every build – it has to be done from scratch for each new build.
Deobfuscation won’t contain XORs (which are screaming out loud “we’re deobfuscating some stuff here!”).
If we like it, deobfuscation can be made loop-less.
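To illustrate what such per-literal generated code might look like, here is a hand-written sketch for the (hypothetical) literal "gold". The literal itself, the ID 459, the per-character transformations and their constants are all assumptions; in a real build they would be produced by a generator and re-randomized on each build.

```cpp
#include <cstdint>
#include <string>

// Obfuscated bytes, same length as the literal; each position uses its own
// transformation: 'g'+0x5B, 'o'*3, rotl('l',3), 0x17-'d' (all mod 256).
static volatile std::uint8_t d459[4] = {0xC2, 0x4D, 0x63, 0xB3};

// Loop-less, XOR-less deobfuscation generated for this one literal only.
std::string obfuscated_string_459() {
  const std::uint8_t b0 = d459[0], b1 = d459[1], b2 = d459[2], b3 = d459[3];
  std::string s(4, '\0');
  s[0] = static_cast<char>(static_cast<std::uint8_t>(b0 - 0x5B));  // undo +0x5B
  s[1] = static_cast<char>(static_cast<std::uint8_t>(b1 * 171u));  // 3*171 == 1 mod 256
  s[2] = static_cast<char>(
      static_cast<std::uint8_t>((b2 >> 3) | (b2 << 5)));           // rotate right by 3
  s[3] = static_cast<char>(static_cast<std::uint8_t>(0x17 - b3));  // undo 0x17 - x
  return s;
}
```

As there is no common routine and no common key, finding one such function tells the attacker nothing about the other string literals in the binary.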

⁴ or otherwise guaranteeing that access to the same ever-changing-variable from different threads is impossible.

[[To Be Continued…

This concludes beta Chapter 29(g) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)”.

Stay tuned for Chapter 29(h), where we’ll discuss my open-source C++17 obfuscation library – which does most of the things discussed in “Bot Fighting 201” more or less for free]]

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.