Bot Fighting 201: Declarative Data+Code Obfuscation with Build-Time Polymorphism in C++

	Author:	“No Bugs” Hare Follow:
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

Obfuscation: What You See Is NOT What You Get

[rabbit_ddmog vol=”8″ chap=”Chapter 29(f) from “beta” Volume VIII”]

After we discussed Bot Fighting 101–103 (which were about techniques which are commonly used, but are mostly inefficient against serious hackers), we can proceed to a much more interesting topic: how to increase resilience of our Client-Side program to those Brute-Force Attacks discussed in [[TODO]] section above. To re-iterate – Brute-Force Attacks tend to start with identifying system calls (which are pretty much impossible to hide completely) – in the context of MOG it is usually a socket call; after a system call is identified – the attacker goes up the call stack to get to the-place-of-interest-to-him (usually – starting from that-point-where-the-message-to-be-sent-over-the-socket-is-prepared, and going up from there).

At this point, we’ll discuss one serious anti-hacking measure, which is aimed to:

Keep our source code perfectly readable (and with minimal changes);
Make our binary code as-unreadable-as-possible
Obfuscate in-memory data
Make sure that even if source code stays the same – binary code changes a lot on each build. In other words – we’re aiming to have build-time code polymorphism.

It may sound too-good-to-be-true, but apparently, it is perfectly achievable at least in C++; let’s take a closer look at this technique.

Preliminaries – How Our Source Code Will Look

To illustrate our approach, let’s take an extremely simplistic function which we want to obfuscate:

//Listing 29.factorial.orig
uint32_t factorial(uint32_t x) {
  uint32_t ret = 1;
  for(size_t i=1; i <= x ; ++i) {
    ret *= i;
  }
  return ret;
}

NB: in our exercises, we’ll be using MSVC 2017 (in Release mode) – as it is probably the most popular compiler for the Client-Side; for other compilers, the results will be similar, though certainly not identical.

NB2: for my tests, I used __declspec(noinline) for factorial() to make sure that compiler doesn’t inline it (which would complicate our analysis); it is a TEST-ONLY feature, and as a Bit Fat Rule of Thumb™, in production code we should be very aggressive with inlining, and avoid any __declspec(noinline).

When we compile our factorial() function, we get the following asm:¹

;Listing 29.factorial.orig.asm
push esi
mov esi,1
mov edx,esi
cmp ecx,esi
jbe l2
nop dword ptr[eax]; aligning next instruction
l1: imul esi,edx; ret *= i
inc edx; ++i
cmp edx,ecx
jb l1
l2: mov eax,esi
pop esi
ret

Here, in spite of all the attempts of the compiler to make things less clear, there is an obvious loop starting from label l1, with a multiplication inside. As the result of the multiplication is multiplied again and again with an ever-incremented counter – it is relatively easy to guess that we’re dealing with calculating factorial here.

Now, let’s see what we can do to make the code less obvious to the attacker, while preserving clarity of our source code. Let’s introduce an obf<> template, defined as follows:

//Listing 29.obf
template<class T, int ID>
//Here, ID is an ‘identifier’ of
// the instance of our obfuscation
// We DO want to keep them different, don’t we?
class obf {
  T val;

  public:
  obf() {}
  template<class T2>
  obf(T2 v) { val = v; }
  T value() const { return val; }
  operator T() const { return value(); }

  template<class T2>
  obf& operator *=(T2 t) { val *= t; return *this; }
  //NB: in this context, I tend to prefer templatized member functions
  // to free-standing friends, because it is generally easier
  // to address issues from improper ordering, than from type
  // mismatches. And having templatized free-standing friends
  // is not an option due to ambiguities. OTOH, if you prefer
  // free-standing non-templatized friends – they’re ok too.

  //do the same for =, +=, -=, /=, %=
  // and IF you need them – for bitwise ops too

  obf& operator ++() { val++; return *this; }
  obf operator ++(int) { return obf(val++); }
  //do the same for –-

  template<class T2>
  bool operator <(T2 b) { return val < b; }
  //do the same for +, -, *, /, %
  // and IF you need them – for bitwise ops too
};

Now, we can rewrite our factorial() function into the following:

//Listing 29.factorial.obf
obf<uint32_t,123> factorial(obf<uint32_t,124> x) {
  obf<uint32_t,125> ret = 1;
  for(obf<size_t,126> i=1; i < x ; ++i) {
    ret *= i;
  }
  return ret;
}
//BTW, we’re NOT required to have ALL the variables
// obfuscated; in fact – it should be done on case-by-case
// basis

NB: let’s keep in mind that while obf<uint32_t> is very close to uint32_t, it is not a ‘perfect wrapper’ (in particular, because of ‘1 built-in + 1 user-defined’ rule for implicit conversions). It means that some changes MIGHT be necessary to the code after we replace type with obf<type> in some declaration – but they’re usually solvable very easily by replacing variable with variable.value(). Also – make sure NOT to add any casts when enabling obf<> types; adding casts opens a huuuuge can of worms able to introduce LOTS of subtle bugs; fortunately – from what I’ve seen, 99% of the code works after replacing type with obf<type> ‘as is’, and adding .value() allows to deal with the rest.

As we didn’t really change anything, this code still produces the same results when we run it; apparently, asm did change a bit, but the inner loop (the one which reveals all the stuff to the attacker), is still exactly the same.²

Now, we can go a bit further and avoid writing those IDs manually, making an OBF() macro to wrap our obf<> type:³

#define OBF(t) obf<t,__LINE__>
//If you feel like it and use C++17,
// you can even use constexpr function of (__FILE__,__LINE__)
// as an ID; in C++14 it may be possible
// but is hardly worth the trouble

Then, our factorial() function will start to look as

//Listing 29.obf.macro
OBF(uint32_t) factorial(OBF(uint32_t) x) {
  OBF(uint32_t) ret = 1;
  for(OBF(size_t) i=1; i < x ; ++i) {
    ret *= i;
  }
  return ret;
}
//Again, we’re NOT required to have ALL the variables
//  obfuscated

IMO, it looks reasonably close to the original; moreover, I’d argue that it is probably the best thing we can have in this regard. In general, we DO need to specify which variables/parameters have to be obfuscated, and moreover – have to specify how performance-critical they are. In practice, I usually suggest to use template classes such as obf0<>, obf2<>,… obf5<> etc. (with ‘obf0<>’ meaning ‘obfuscate lightly’, and ‘obf5<>’ meaning ‘obfuscate heavily’), and corresponding macros OBF(), OBF2(),…, OBF5().

By now, we’re done with the changes to our source code;⁴ now, let’s see how these (IMO rather minor) changes allow to make the life of the hacker by orders of magnitude more difficult.

¹ All asm in this Chapter uses Intel notation – opposed to AT&T notation; Intel one is also the one provided by MSVC.

² saving for using different registers

³ in MSVC17, for some reason __LINE__ is not a constexpr; however, a workaround is possible (see [[TODO]] for implementation)

⁴ defined as “whatever we’re writing manually”

Take 1 – Naive XOR

Let’s say that we have:

a script, which extracts a list of all the instances of obf<> and OBF() out of our source files (what we need is just a list of (type,ID) tuples, which is trivial to get)
another script, which takes this list of (type,ID) tuples and generates randomized instances of obf<type,ID> wrappers.

One extremely simplistic example of such a generated wrapper would look as follows:

//Listing 29.obf.xor
//GENERATED RANDOMIZED CODE
//CHANGES ON EACH BUILD
//DO NOT MODIFY
template <>
class obf<size_t,126> {
//NB: I am using full template specialization here,
// but it can be partial specialization too
  using T=size_t;
  T val;

  template<class T2>
  void init(T2 v) { val = v ^ 0x28f3472a; }

public:
  obf() {}

  template<class T2>
  obf(T2 v) { init(v); }

  T value() const { return val ^ 0x28f3472a; }
  operator T() const { return value(); }

  template<class T2>
  obf& operator *=(T2 t) { init(value() * t); return *this; }
  obf& operator ++() { init(value()+1); return *this; }
  obf operator ++(int) { 
    obf ret = obf(value()); 
    init(value()+1); 
    return ret;
  }
  //do the same for –-

  template<class T2>
  bool operator <(T2 b) { return value() < b; }
  //do the same for +, -, *, /, %
  // and IF you need them – for bitwise ops too
};

Now, let’s take a look at disassembly. There, we’ll see that while in Debug mode all the XORs with 0x28f3472a are present – in Release mode they’re completely eliminated by compiler, so our inner loop is still exactly the same, with only imul and inc inside (!).

Does it mean that at this point, we failed to obfuscate our data and code? Sure. Does it mean that any such obfuscation is hopeless? Certainly not.

Take 2 – Adding Global volatile

If we go a bit further and generate our wrapper along the following lines:

//Listing 29.obf.xor.global
//GENERATED RANDOMIZED CODE
//CHANGES ON EACH BUILD
//DO NOT MODIFY
volatile size_t global126 = 1;

template <>
class obf<size_t,126> {
  using T=size_t;
  T val;

  template<class T2>
  void init(T2 v) { val = (v ^ 0x28f3472a) * global126; }
  //Order is IMPORTANT here: if we first multiply,
  //  and then XOR – compiler will still be able to
  //  eliminate paired XORs
  //Also, using DIFFERENT operations is IMPORTANT too;
  //  otherwise, compiler may reorder and eliminate things

public:
  obf() {}

  template<class T2>
  obf(T2 v) { init(v); }

  T value() const { return val ^ 0x28f3472a0; }
  operator T() const { return value(); }

  //other operators are the same as above
};

Holding our breath, we take a look at the asm, and can see that now we’re talking! Our innermost loop looks as follows:

;Listing 29.obf.xor.global.asm
l1: imul eax,ecx
inc ecx
xor ecx,28F2372Ah
imul ecx,dword ptr[global126]
xor ecx,28F2372Ah
cmp ecx,dword ptr[x]; on-stack variable
jb l1

While it is certainly not obfuscated enough, and the idea behind the code can be still seen rather clearly, we have already managed to get our obfuscation code in.

Take 3 – ADD instead of XOR

Bijection In mathematics, a bijection, bijective function, or one-to-one correspondence is a function between the elements of two sets, where each element of one set is paired with exactly one element of the other set, and each element of the other set is paired with exactly one element of the first set.— Wikipedia —Now, let’s see what we can do with our obf<> class to obfuscate things better. First, let’s note that XOR is actually only one of the ways to obfuscate things (and is pretty poor at that, BTW – as XOR is rarely used in normal code,⁵ using XOR for obfuscation means every XOR will scream loudly ‘I am obfuscation!!’); also – such non-zeroing XORs may cause AV heuristics to cause false positives [RaabeBallenthin]. However, there are lots of other techniques to make things much less obvious.

Technically, what we’re looking for here, is any kind of bijection; we’ll use this bijection to convert our data from one representation into another one (and as it is a bijection, we can revert it later).⁶

One very simple example of such bijection is just replacing XOR some-random-constant with ADD some-random-constant; then, our init() function will look as val = (v + (size_t)0x28f3472a) * global126 and value() function as return val – (size_t)0x28f3472a0.⁷ After compiling such ADD-based obfuscation (mathematically – addition modulo 2^32), The innermost loop in our generated code will be already less obvious than the one with XORs:

;Listing 29.obf.add.global.asm
l1: inc ecx
imul eax,edx
imul ecx,dword ptr[global126]
lea edx,[ecx-28F2372Ah] ; a counterpart operation
;   was moved out of the loop
cmp edx,dword ptr[x]; on-stack variable
jb l1

Here, MSVC compiler did some serious optimization relying on the interrelations between our obfuscating addition modulo 2^32⁸ + multiplication-by-global, and intra-loop increment+comparison; it certainly didn’t make things simpler-to-understand, nice job obfuscating our intentions!⁹

Our Listing 29.obf.add.global.asm already looks significantly better than original, but TBH, we didn’t even start any serious obfuscation stuff. Most importantly –

As we’re not writing our obf<> classes manually (instead, we have a code generator doing it for us on each build), the sky is the limit to the obfuscations we can generate.¹⁰

⁵ except for XOR EAX,EAX etc. which is a very obvious special case

⁶ actually, we can even use more generic injections, but it is a little bit more complicated, see [[TODO]] section below

⁷ make sure to use ONLY unsigned types here, as signed overflow is apparently Undefined Behaviour in C++ (and starting from around 2015 or so, compilers started to abuse it, so relying on it can lead to really nasty bugs <very-sad-face />). To obfuscate the signed types – just cast them into unsigned counterpart first.

⁸ or modulo 2^64, depending on sizeof(size_t)

⁹ Of course, obfuscating wasn’t a goal of the compiler, but it just so happens that serious optimizations at least for x86/x64 command set tend to result in serious obfuscation too. This includes lots of blending of different inlined functions – in the example above we cannot really say where one function ends, and another one begins; perfect! <wide-smile />

¹⁰ for time-critical code, there are also performance considerations, more on them in [[TODO]] section below.

Take 4 – Throw In Multiply-by-Odd

Our next very-simple-but-efficient obfuscation is related to multiplication-by-an-odd-constant-modulo-2^32¹¹ (and reverse operation is a multiplication-by-another-odd-constant-modulo-2^32). Explanation why it is a bijection, is beyond the scope of this book (very very sketchy – it relies on a fact that any odd-constant and 2^32 are co-primes).

How to calculate inverse-constant (given original constant), is not-so-trivial, and we’ll discuss it in [[TODO]] section below, but for the time being, we’ll use a pre-calculated one. Let’s observe that 13*17*97*315643*2829487 mod 2^32 = 1.

This means that we can write our init() function using any of the 5-numbers-listed-above, and our value() function – using all the remaining 5 numbers, for example:

//Listing 29.obf.mul
//GENERATED RANDOMIZED CODE
//CHANGES ON EACH BUILD
//DO NOT MODIFY
template <>
class obf<size_t,126> {
  using T=size_t;
  T val;

  static_assert(sizeof(size_t)==4,””);

  template<class T2>
  void init(T2 v) { val = v *(size_t)17*(size_t)97*(size_t)315643; }

public:
  obf() {}

  template<class T2>
  obf(T2 v) { init(v); }

  T value() const { return val * (size_t)13*(size_t)2829487; }
  operator T() const { return value(); }
  //other operators are the same as above
};

When compiled it leads to the following asm code in the innermost loop of our factorial() function:

;Listing 29.obf.mul.asm
l1: mov esi,ecx
inc edi
imul esi,eax
add ecx,1F0620CBh
imul eax,esi,23144E3h
cmp edi,edx
jb l1

Well, as we can see – even without globals it doesn’t look too obvious; sure – there is a loop, and there is a multiplication within the loop, but the other stuff (such as the second multiplication) and strange-looking constants, is not really obvious (and while it is merely obfuscation, it is not “dead code” either, so it cannot be easily eliminated by dead-code analysis).

¹¹ of course, 2^16, 2^64 etc. will also do

Take 5 – (kinda)-Feistel Round

The last primitive which we’ll discuss today (with many more to follow in [[TODO]] section below), is a (kinda)-Feistel-round, known from cryptography. Classical Feistel round looks as follows: we take our variable, split it into two equal parts bit-wise, then we feed the “left” part to some (not-necessarily-bijection(!)) “round function” f(), and then XOR it with the “right” part; the result consists of the “right” part (unmodified), and “left” part (XOR-ed with f(right_part)). We’ll do almost the same, just replacing too-obvious-in-the-code-XOR with addition modulo 2^32 (and using x^2 as our “round function” f()):

//Listing 29.obf.Feistel
//GENERATED RANDOMIZED CODE
//CHANGES ON EACH BUILD
//DO NOT MODIFY
template <>
class obf<size_t,126> {
  using T=size_t;
  static_assert(sizeof(size_t)==4,””);
  uint16_t right; uint16_t left;
  uint16_t f(uint16_t x) const {
    return x*x;
  }

  template<class T2>
  void init(T2 v) { right = (uint16_t)v;
  left = (uint16_t)(v >> 16) + f(right); }

public:
  obf() {}
  template<class T2>
  obf(T2 v) { init(v); }

  T value() const { return (uint32_t)((left-f(right)) << 16) + (uint32_t)right; }
  operator T() const { return value(); }
  //other operators are the same as above
};
static_assert(sizeof(obf<size_t,126>)==4,””);

With this code, the inner loop of our factorial() function happens to look as follows:

l1: mov edx,esi
imul edx,esi
movzx esi,si
movzx edx,dx
sub ebx,edx
shl ebx,10h
add esi,ebx
imul eax,esi
lea edi,[esi+1]
movzx edx,di
mov esi,edi
imul esi,edi
mov dword ptr[ebp-4],edx
mov edx,edi
shr edx,10h
add edx,esi
movzx ebx,dx
movzx edx,si
mov esi, ebx
sub esi,edx
movzx edx,di
shl esi,10h
add esi,edx
cmp esi,ecx
mov esi,dword ptr[ebp-4]
jb l1

Good luck finding our original factorial() logic here (there is still a loop-with-imul-inside, but the rest is not exactly clear to put it mildly)…

BTW, it is still NOT the-result-we’re-aiming-for; rather – it is just an illustration of the building blocks for the real stuff; stay tuned for the rest! <wink />

[[TODO: data protection vs common attacks such as [https://blog.badtrace.com/post/reverse-engineering-a-csgo-cheat/] ]]

[[To Be Continued…

This concludes beta Chapter 29(f) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”.

Stay tuned for Chapter 29(g), where we’ll proceed to the question of “how to obfuscate things in C++17 alone, without any external compiler” – and will get ORDERS OF MAGNITUDE more sophisticated than the above – while keeping our source code readable along the lines above]]

[+]References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Goose says

December 28, 2017 at 3:03 pm

Is it Sergey I. or Sergey G. who think up these images?

- "No Bugs" Hare says
  
  January 2, 2018 at 7:27 am
  
  Usually, it is ‘No Bugs’ who comes up with a _textual_ description, and Sergey G who translates it into lines-on-paper. Kinda as a movie screenwriter and director ;-).