War on Clones, Part II. Identifying Mobile and Browsers. Social and Payment-Based Identification. Putting it all together.

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

[[This is Chapter 15(c) from “beta” Volume IV of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides a free e-copy of the “release” book to those who help with improving it; for further details see “Book Beta Testing”. All the content published during Beta Testing is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Identifying Mobile Devices

IMEI: “a number, usually unique, to identify 3GPP (i.e., GSM, UMTS and LTE) and iDEN mobile phones, as well as some satellite phones” (Wikipedia)

In the field of mobile device identification, we need to note that each and every mobile device has to have a so-called IMEI. It is a pretty good hardware-based identifier (though, as pretty much anything else, it is hackable – and as many as 10% of IMEIs are reported to be non-unique too [BBC]), and it would be pretty good for our purposes. The only teensy problem is how to get that IMEI from the OS where our app is running; as we’ll see below, different OSes are quite different in this regard.

Another mobile-specific thing (and the one which tends to help QUITE A BIT on mobile) is related to the observation that mobiles tend to be MUCH more tightly integrated with social accounts than PCs. As our real goal is not really “to identify the device”, but rather “to identify the real-world user behind the device”, social IDs tend to work just fine; while it is certainly possible for a cheater to have a separate phone with a fake account, this is Quite Unlikely (that is, unless the cheater is a professional making money out of the cheating).

Identification under iOS

When running under iOS, our apps live pretty much in a jail constructed for us courtesy of Apple. It means that (unless we want our app to run ONLY on jailbroken devices, which we do NOT really want) – we need to play by the rules specified by Apple.

Identifying iOS Devices

For quite a long while, the primary way of identifying iOS devices was the so-called UDID; with UDID, you had a very good way to identify an iOS device. Yes, it wasn’t perfect (nothing is), but it was really good compared to the other ways of identification. However, UDID got deprecated about 5 years ago (around 2011, with iOS 5), and now device identification as such is not really allowed.

Moreover, as of 2016, under iOS neither IMEI nor MAC address can be accessed 🙁 [StackOverflow.UIDDeprecated]. And as we need to play by Apple rules, it basically leaves us with two and a half options provided by Apple:

Random UUID stored in defaults database (in fact, it is similar to “Hidden Crypto-ID” as we’ve discussed above). Pros: is likely to stay, no false positives (that is, if you’re using CFUUIDCreate() or a different crypto-quality random number). Cons: simple reinstalling of your app changes UUID 🙁 . However, see below about keychain workaround.
identifierForVendor [Apple.IdentifierForVendor]. Pros: still works even after access to UDID has been thrown away, and is reasonably unique. Cons: removing ALL your apps is reported to kill identifierForVendor 🙁 [StackOverflow.FindingIMEI]. Also, the future of identifierForVendor is unclear, as Apple seems to prefer the UUID approach.
advertisingIdentifier. Again, a reasonably unique identifier. However, it has been reported that Apple is currently rejecting apps without ads which use advertisingIdentifier 🙁 [StackOverflow.FindingIMEI]. Pros: pretty unique, usually survives better than the others. Cons: to use it, you MAY need to run some ads 🙁 .

Ok, now about one thing which is not officially endorsed by Apple, but still seems to work as of 2016 🙂 :

If you store your random UUID into a “key chain”, it will reportedly survive the reinstall of your app [StackOverflow.HowToPreserve]. Pros: still no false positives. As of 2016, it seems to survive all app reinstalls. Cons: a device reset still wipes it out. In addition, there are things to keep in mind if KeyChain is synced via iCloud [StackOverflow.HowToPreserve]; basically, in these cases you’re likely to see several different devices (belonging to the same user(!)) having the same ID, which is usually Good for our purposes, but might need slightly different treatment in tools to avoid treating such scenarios as suspicious. Despite all the issues above, this UUID-in-keychain is probably the best single way to identify iOS devices as of mid-2016.

A word on system fingerprinting under iOS: iOS is one of those systems where fingerprinting doesn’t work well 🙁 . The reason for it is two-fold. First, as there is one and only one manufacturer of iOS devices, the number of hardware configurations for iOS devices is very limited to start with (and with auto-updates, the number of OS-level software variations is very limited too). Second, Apple does a good job of “sandboxing” apps in a non-jailbroken iOS. Just as one example: the list of installed apps would make a Great Fingerprint, but Apple does not allow reading the list, and is working hard on closing the remaining holes which may still allow deducing this list. With exploits-which-were-fixed-in-iOS9 including stuff such as leaks via icon cache, I don’t really feel that we can rely on the other-apps-list being available for any significant time. It leaves us with very few options for “fingerprinting”, such as using browser-based fingerprints (which also don’t work well for iOS devices due to the reasons above), and very basic stuff such as device type, iOS version, screen resolution, etc. I’ve heard of people trying to play with frequently-monitored free disk space (which IS rather unique), but I haven’t heard of it producing anywhere near reliable results.

Bottom line: when trying to identify an iOS device, probably the best single identification method (as of 2016) is to generate a crypto-random ID (or UUID) and to put it into the keychain. However, it is certainly not bulletproof, and we DO need other methods too.
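The generate-once-then-reuse logic behind this approach can be sketched as follows. This is only an illustration in Python, not iOS code: PersistentStore is a hypothetical stand-in for the keychain (a plain in-memory dict here), and the key name device_id is likewise an assumption.

```python
import secrets


class PersistentStore:
    """Hypothetical stand-in for the iOS keychain: a plain in-memory dict."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


def get_or_create_device_id(store, key="device_id"):
    """Return the stored ID, generating a crypto-random one on first use."""
    device_id = store.get(key)
    if device_id is None:
        # 128 bits from the OS CSPRNG; at this size accidental collisions
        # (i.e. false positives) are practically impossible.
        device_id = secrets.token_hex(16)
        store.set(key, device_id)
    return device_id
```

As long as the store survives (which is exactly what the keychain is reported to do across reinstalls), repeated calls keep returning the same ID; once the store is wiped, a new unrelated ID is generated.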

Accessing user social data from iOS

As mobile devices are almost-universally used for social IDs (for the purposes of this Chapter, “social” includes e-mail, facebook ID, twitter ID, and so on) – and apps are encouraged to use this social stuff too – well, we can use these IDs to identify our players.

To do it, there are two general ways:

Using Accounts Framework [Apple.AccountFrameworkReference]. For a practical example on “how to get Twitter account from Accounts Framework”, see [StackOverflow.HowToGetUserInfoFromTwitter]
Using per-social-network SDK, such as [Facebook.SDKForIOS]

Note that in any case, you will most likely need to justify accessing this information to the user. An ideal thing in this regard would be to have your users log in using their social accounts; among other benefits (such as ease of use for the users, with no need to deal with passwords, etc.), this will give you that rather-reliable social ID without any kind of misleading on your side, and without causing any suspicions either 🙂 .

Identification under Android

Phew. As soon as we’re past iOS, everything else will go more smoothly. For Android, it doesn’t come as a big surprise, as Android (as an OS) is still pretty much Google-land, and Google’s primary business (ads) essentially relies on the ability to identify people and their preferences; hey, identifying people is exactly what we’re trying to do 🙂 .

So, what we have for Android:

ANDROID_ID. Supposed to be unique per device; however, it has been reported to be NON-unique (in pretty much the same manner as MACs – depending on the hardware manufacturer) [StackOverflow.IsSecureAndroidIdUnique]. Also, it MAY change on device reset.
Using TelephonyManager.getDeviceId(). This has been seen to provide better results than ANDROID_ID (as it is supposed to return IMEI/MEID, which are quite unique). Note that for dual-SIM devices, there are two IMEI numbers (though they still do NOT depend on SIM(s) inserted). For details on obtaining IMEI under Android, see, for example, [StackOverflow.HowToGetTheDevicesIMEI]
Wi-Fi MAC address is readable, though all the stuff about MACs mentioned in Part I of this post still applies.
Unlike iOS, system fingerprinting does work for Android. First of all, the spectrum of hardware is MUCH wider for Android than for iOS. Second, obtaining the “installed apps” list is possible. Third, all kinds of interesting things (such as, for example, the ARP table, which gives a list of MAC addresses of all the devices in the vicinity, so that changing the MAC address on this specific device won’t help) are readable on Android.
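To keep a fingerprint such as the installed-apps list from being too invasive, it can be hashed on the Client so that only the hash travels to the Server. Below is a minimal sketch, in Python for illustration only (real code would live in the Android app); the function name, the salt, and the package names in the usage are all assumptions.

```python
import hashlib


def fingerprint_hash(items, salt):
    """Hash a collection of strings (e.g. installed package names) into one value.

    Sorting and de-duplicating makes the result independent of the order
    in which the OS happens to enumerate the items.
    """
    h = hashlib.sha256()
    h.update(salt)
    for item in sorted(set(items)):
        data = item.encode("utf-8")
        # Length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


# Usage with made-up package names and a made-up salt:
print(fingerprint_hash(["com.example.game", "com.example.chat"], b"per-game-salt"))
```

Note that a single hash over the whole list changes whenever any app is installed or removed, so it is only good for exact matches at one point in time; it is one signal among many rather than a stable ID.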

And all the social account things are usually working on Android too…

In short – as much as iOS is a device identification nightmare (that is, as long as you’re playing by Apple rules), Android is a device (and user) identification paradise.

[[TODO: PSID and XBox]]

Identifying Browsers

Believe it or not, you MAY even be able to identify the client device if your Client is browser-based. There are at least several techniques which provide kinda browser fingerprints. I know of quite a few such systems, including [AmIUnique] and [fingerprintjs2] (there are quite a few commercial ones too, but many of them use deceptive promotion tactics, which raises Big Questions about their intentions). Which one of the techniques to use is up to you to figure out; however, keep in mind that:

Browser fingerprinting is NOT to be used as a sole way of identifying client devices. It is true for ANY kind of fingerprinting, but it applies to browser fingerprinting in spades.
The two most interesting browser fingerprinting techniques are so-called “Canvas fingerprinting” and “WebGL fingerprinting”. The idea is to simply draw a certain rather complicated picture (using HTML Canvas or WebGL respectively) – and then to use some kind of hash of this picture as a fingerprint. And with all the variations in things such as video cards/drivers/installed fonts/etc., the picture will be quite unique. On the flip side, each video card driver upgrade on the Client box has a chance to break the fingerprint, so these fingerprints shouldn’t be used to identify users in the long run.
On the devices such as iPhones/iPads, which have rather limited number of configurations, fingerprinting (including browser fingerprinting) is NOT likely to work well.
Some people (especially cheaters¹) will go to great lengths to prevent fingerprinting. To get an idea of typical actions users take to avoid being fingerprinted, take a look at [PixelPrivacy]; very briefly – the best they can realistically do² in this regard is using Tor browser, but anybody playing your game via Tor browser (in particular, with Canvas disabled) should already raise a red flag.³
Other than that – well, you can try 🙂 . Reliability will be limited (so you SHOULD NOT rely on it alone to ban somebody outright), but well – it MAY be handy to find out what is going on.

Another (completely different) technique which MAY help with identifying browsers is HSTS, as described in [NakedSecurity]. In short – HSTS is a kinda-cookie, but it is not just any cookie, so it is MUCH more difficult to remove (and for a good reason too). Which means that HSTS may be used for device identification purposes (and completely independently of “browser fingerprints”, so it may be used to complement whatever-browser-fingerprint-you’re-using). Note that HSTS is not transferred as a part of the request, and all we have is the information whether the browser comes via http:// or https://, so each HSTS entry can transfer only one bit of data; however, making 20 “existing-or-non-existing” HSTS cookies will give you about a million different IDs (2^20), and raising the number to 30 will bring you into billions (2^30).
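The one-bit-per-host arithmetic can be sketched as follows. This is a hedged illustration of the encoding only, not of any real server setup: the host naming scheme b0.example.com, b1.example.com, ... and both function names are assumptions, and actually setting HSTS headers and observing the incoming scheme is left out.

```python
def hosts_to_mark(device_id, nbits, domain="example.com"):
    """Hostnames on which to set HSTS so that together they encode device_id.

    Bit i of the ID is stored as "HSTS present" on the host b<i>.<domain>.
    """
    if not 0 <= device_id < 2 ** nbits:
        raise ValueError("device_id does not fit into nbits")
    return [f"b{i}.{domain}" for i in range(nbits) if (device_id >> i) & 1]


def decode_id(arrived_via_https, nbits):
    """Rebuild the ID from per-host observations.

    arrived_via_https[i] is True if a plain http:// request to host i
    showed up at the server as https:// (i.e. the browser had HSTS for it).
    """
    return sum(1 << i for i in range(nbits) if arrived_via_https[i])
```

With nbits=20 this gives 2**20 = 1,048,576 possible IDs, and with nbits=30 it gives 2**30 = 1,073,741,824. One caveat of this particular encoding: an ID of 0 is indistinguishable from a browser which has never been seen, so a real scheme would probably want to reserve it.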

¹ but certainly not ONLY cheaters

² IMNSHO, disabling JS doesn’t count as “realistic”

³ that is, if they ARE able to play at all, as Tor is notoriously slow, so anything but social games isn’t likely to fly

Not-Really-Technical Identification

By now, we’ve discussed certain technical means to identify your players (mostly via more-or-less-successful attempts to identify their devices). However, in the real world populated by humans (as opposed to a post-Judgment-Day world populated by robots), non-technical identification is usually MUCH more reliable. As it is non-technical, I won’t go into too many details of such identification in this supposedly technical book, but let’s still take a quick look at it.

Social Identification

As noted above, social account DOES serve as a reasonably good identification. However, you should be aware that not all accounts are created equal; there are some “fake” social accounts out there. Such “fake” social accounts may be completely innocent (for example, I know quite a few people who’re not comfortable playing under their real-world names), or they may belong to cheaters. Which means that (once again) we CANNOT make any immediate judgement based only on this info, but we still CAN use this information for fraud and abuse prevention purposes.

How to detect that an account is fake is beyond the scope of this book; one of the problems in this regard is that publishing pretty much any detection technique will make that technique unreliable. That’s why I REALLY don’t like policies such as “everybody with less than 10 friends is a fake, so let’s ban him” – such a policy WILL cause both false positives and false negatives.

On E-mails

One of the (quasi)-social methods of (quasi-)identification is related to e-mails. And yes, an e-mail DOES provide a little bit of identification. In particular, while bots creating e-mail accounts do exist – asking for an e-mail still creates a barrier against some of the wannabe-abusers. However (as with anything else) having an e-mail DOESN’T guarantee that the account is real (and even less – that it is not a duplicate).

Overall, e-mails tend to perform worse for identification purposes than social accounts; the reason is simple – for a social account you can get much more information than just an ID (including account creation time, number and quality of friends, etc. etc.), and this additional information can be used to identify people (or at least to make a guess whether the account is a real or fake one).

Payment-Based Identification

One of the best overall ways of player identification is related to payments. Reliability of this way of identification is closely related to the complexity and cost of obtaining a new payment method (which, in turn, is closely related to security measures which are routinely undertaken by financial institutions). In short – it is just MUCH more complicated to get a new credit card than to get a new e-mail address, and MUCH more difficult to create a duplicate PayPal account than to create a duplicate Facebook one.

Therefore:

If you do have the luxury of having payments – make sure to use them to identify duplicate accounts too. Successful⁴ use of the same credit card by two separate accounts certainly indicates that the people behind them are closely related; whether to allow both of them to play is up to your policies, but the relation between the two is certainly there.
Even payment-based identification DOES NOT guarantee against false negatives. Moreover, even if you could trust the “name on the card” field provided during the payment (hint: you usually cannot, as “name on the card” is rarely sent to the bank, see Chapter [[TODO]] for discussion), assuming that all the people named “John Smith” in a city like New York or London, UK, are the same person is still NOT a good idea.
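As an illustration of the first point, here is a minimal sketch of finding accounts linked by a shared payment method while counting only successful payments. The tuple layout and the notion of an opaque method_token (a token or hash standing in for the card or PayPal account, never the raw card number) are assumptions made for this example.

```python
from collections import defaultdict


def accounts_sharing_payment_method(payments):
    """Group accounts by payment method, counting only SUCCESSFUL payments.

    payments: iterable of (account_id, method_token, succeeded) tuples.
    Returns {method_token: set of account_ids} for tokens used by 2+ accounts.
    """
    by_method = defaultdict(set)
    for account_id, method_token, succeeded in payments:
        if succeeded:  # failed attempts prove nothing about the relation
            by_method[method_token].add(account_id)
    return {tok: accts for tok, accts in by_method.items() if len(accts) > 1}
```

The output is a list of related accounts to look at, not a verdict; what to do with two accounts paying from the same card remains a policy decision.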

⁴ Note that “successful” is a VERY important thing here; otherwise you may become a victim of a misidentification fallacy (an example of a similar fallacy was described, for example, in [NoBugs])

Putting it All Together

Ok, we’ve seen quite a few different techniques; let’s try to put them into a larger picture.

Nothing is 100% Reliable

First of all, let’s note that

whatever we’re doing to identify users – it is NOT 100% reliable.

There is no single method which is 100% free of false positives and 100% free of false negatives.

In other words – whatever we’re doing, we’re working within the “security by obscurity” domain; while we CAN engage those-looking-to-bypass-bans in a sword-beats-shield-beats-sword kind of fight, we CANNOT possibly have a bulletproof solution.⁵ This is related to the all-important fact that attackers are playing on their “home turf” (for a discussion on the importance of “home turf” – see Chapter II from Vol.1).

⁵ Even if the most drastic identification methods, such as a Pentium-III-style Processor Serial Number, were available (and without the ability to disable them in BIOS), they still could be defeated by VM trickery.

Use Everything You Can Get Your Hands On

From a purely practical standpoint, it means that

As a rule of thumb, we need to use all the methods available.

This, however, is DIFFERENT from saying that “we need to get all the information we can”. Obtaining too much information from player’s computer would be too invasive and sometimes even illegal. However, within reason (and with proper use of hashes, see above) – it IS possible to get the information which is not THAT invasive or sensitive, while still being able to protect yourself from abusers.

That being said – DON’T play with fire

That being said, there are two further all-important points in this regard. First of all,

NEVER EVER do anything which would cause you problems IF your community learns about it.

Player communities are usually as averse to abusers as you are. They DO want to play in a cheater-free and abuser-free environment. However, there is a line between (a) doing something which is Really Necessary to protect your other players from cheaters and abusers, and (b) doing something which is Too Invasive. From our developer’s chair, it is often too easy to cross this fine line 🙁 (especially as this line varies from one player community to another); that’s why it is often a Really Good Idea™ to defer such decisions to your community (within reason, of course – opinions such as “all the cheaters should be dragged out into the street and shot” shouldn’t be taken literally 😉 ).

On the other hand, I do NOT mean that you should consult your community before implementing any SPECIFIC feature (as we’re in the security-by-obscurity realm, revealing fine details is NOT a good idea, as it weakens your defences significantly – or even completely). However, if you already have a community, you MAY want to ask it about doing Client-side detection (with some vague examples of “what kinds of information we may collect”), and if you don’t have a community yet – you usually SHOULD write about this feature of your Client into your very first Terms and Conditions (T&C).

The second all-important point is:

DO consult your legal team BEFORE implementing ANY client-side data collection

ANY information gathering which goes on at the Client side is a potential legal minefield; DON’T go into it without a mine detector (that is, a legal advisor). You certainly DON’T want your company to collapse under a $100M lawsuit just because you wrote a minor additional data collection feature which turned out to be illegal under some statute in a certain state (or just because you didn’t disclose the feature, so it wasn’t included in your T&C).

Everybody Makes Small Mistakes Once in a While

Phew. With this unpleasant stuff aside, we can continue our discussion about the identification. First of all, we need to realise that

Everybody makes occasional mistakes, cheaters/abusers included.

The point here goes along the following lines. It IS theoretically possible to keep accounts completely separated (such as “having a completely separate computer in a completely separate network, going via a completely separate ISP, etc. etc.”). However, the longer two such accounts exist – the higher the chances that the abuser will simply enter the wrong login into one of them, or that one ISP will be down, so he’ll use the other one “just this time”, or that he’ll forget to restore from a VM snapshot before using the 2nd account, or that something else of a similar nature will happen. In fact, if the abuse goes on long enough – it is BOUND to happen.

Unfortunately, I cannot share any examples in this regard from real-world games (these things are generally WAY TOO sensitive to be published); however, there is a widely publicized case of a guy who was MUCH more security-aware than ANY of our abusers – and still slipped once: “a question about database programming posted on Stack Overflow, dated March 16, 2013, asking, ‘How do I connect to a Tor hidden service using curl in php?’ The email listed was rossulbricht@gmail.com. A minute later, that user changed the alias to Frosty.” [Wired]. This single minor one-minute slip was sufficient to lead to his arrest on the charges of no less than “money laundering, computer hacking, conspiracy to traffic narcotics, and attempting to have six people killed” [Wikipedia.SilkRoad].

Log Everything and a bit More

Of course, we’re certainly not the FBI, but our abusers are not as sophisticated as Dread Pirate Roberts either; what’s important is that they WILL make mistakes. And as this is a weakness of abusers – we (as abuse fighters) SHOULD use it to make our games better for our players (which is usually good for our bottom line too ;-)). From a practical standpoint, it means:

ALL the collected information which has reached Server-Side, SHOULD be logged.⁶

As practice shows, these logs (for example, with records-with-all-the-info-we’ve-got made on each login) are Really Invaluable for your security teams.
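A record-with-all-the-info-we’ve-got per login might look like the sketch below: one JSON line per login event, which is easy to append to a file and to grep later. The field names and the sample client_info keys are purely illustrative assumptions, not a prescribed format.

```python
import json
import time


def login_log_line(account_id, client_info, now=None):
    """Serialize one login event as a single JSON line.

    client_info: dict with whatever identification data reached the server
    (the keys used in the usage below are illustrative only).
    """
    record = {
        "ts": time.time() if now is None else now,
        "event": "login",
        "account_id": account_id,
        "client": client_info,
    }
    # sort_keys keeps lines stable and easy to diff/grep
    return json.dumps(record, sort_keys=True)


# Usage with made-up values (203.0.113.7 is a documentation-only address):
print(login_log_line("acc42", {"ip": "203.0.113.7", "device_id": "ab12"}))
```

Keeping the whole client_info blob, rather than picking fields in advance, fits the point above: at logging time we don’t yet know which detail will matter to the security team.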

⁶ Of course, subject to any legal regulations and any promises you’ve made to your players

Real-World Inter-People Relationships

One additional observation about the nature of human relationships, which is highly relevant to real-world player identification and bans, is that

Real-world people tend to form “clusters”

Theory of Six Handshakes: “...is the theory that everyone and everything is six or fewer steps away, by way of introduction, from any other person in the world, so that a chain of 'a friend of a friend' statements can be made to connect any two people in a maximum of six steps.” (Wikipedia)

On the other hand, as the “theory of six handshakes” says, “everyone is six or fewer steps away” (in terms of friends). In other words, by following a chain of friend-of-friend-of-friend-of-friend-of-friend-of-friend, we can reach EVERYBODY in the world (yes, the President of the US and the Queen of the United Kingdom included). I won’t vouch for this theory to be 100% correct, but for our purposes it doesn’t matter if it is really “six handshakes” or “eight handshakes”; what really matters for us is only that the “number of friends-of-friends tends to grow exponentially”.

These observations lead to the following practical conclusions:

Your security team MAY want to consider a friend-of-the-known-abuser as a potential duplicate account of the same abuser, and spend more time on him; however, as a rule of thumb, such an observation SHOULD NOT be used as a reason for an automated ban, but it MAY be a reason to do additional research, maybe – to ask some questions, etc.
On the other hand, NOT every friend of an abuser is an abuser; if banning on this principle – you risk banning your whole player population within half a second (!)
You may even say that people who DON’T have friends or other connections are likely to be bots or some other type of abuser. This, again, MUST NOT be taken as a reason to ban (hey, WTF? – I myself am often playing from fake social accounts ;-)), but, for example, if you’re afraid of grinding bots – it MIGHT be a reason to start showing such players a captcha more often than the others (more on it in Vol.3, tentatively Chapter XXXIII).
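A tool for the security team along these lines could be as simple as the following sketch, which walks the friend graph breadth-first from a known abuser and stops after a small number of hops. The graph representation (a dict of account to friends) and the function name are assumptions for illustration; the result is meant to feed a manual review queue, NOT an automated ban.

```python
from collections import deque


def review_candidates(friends, known_abuser, max_hops=1):
    """Return {account: hops} for accounts within max_hops of known_abuser.

    friends: dict mapping account -> iterable of friend accounts.
    """
    hops = {known_abuser: 0}
    queue = deque([known_abuser])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand further; the set grows very fast
        for friend in friends.get(current, ()):
            if friend not in hops:
                hops[friend] = hops[current] + 1
                queue.append(friend)
    del hops[known_abuser]
    return hops
```

Keeping max_hops at 1 (or 2 at most) is the whole point: because of the exponential growth, each extra hop multiplies the number of candidates until it covers pretty much everybody.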

Auto-Bans

Ok, now as you’re collecting all that information, you most likely want to ask –

What should we do with all those terabytes of data?

Well, my STRONG recommendation is the following:

Until you have real-world problems, you should just collect that information and do ABSOLUTELY NOTHING automated about it

In other words:

Devising any kind of an automated ban before you see any real-world problem, is usually detrimental

There is one exception to this rule: you WILL probably want to have automated safeguards for payments (see Chapter [[TODO]]). Other than that – just sit there, gather information, and brace yourself for an upcoming wave of abuse and fraud.

Sure, you will likely need to implement certain automated procedures some time after the launch, but if you implement them in advance without seeing the whole picture – you will likely be creating more problems than you solve.

As most developers (myself included) tend to intuitively dislike this approach, let’s see the reasons behind it. Before the launch (and before having your first 100K players) – you DON’T have any idea about the things which are happening in the wild, both on the detection front and on the abuse front. For example, it MIGHT easily happen that for the majority of your devices all the MAC addresses are reported as the same (which, BTW, WILL really happen if the majority of your devices are iPhones – all of them tend to report the same “fake” MAC). Or it might happen that duplicate accounts are not really a problem for you. Therefore,

Until you’re running in the real world, AND got your first real-world stats and first real-world problems – DON’T implement any automated bans.

Still, you SHOULD collect all the information you can (and may) collect, from the very beginning. This will allow you to have all the information you need, at the moment when the first Big Abuser comes in.
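Once real-world logs do exist, a quick check like the following can show whether an identifier is actually useful before any automation is built on top of it. It is only a sketch: the function name and the returned tuple are assumptions made for this example.

```python
from collections import Counter


def identifier_stats(values):
    """Summarize how well a logged identifier separates devices.

    values: list of identifier values, one per observed device/login.
    Returns (most_common_value, share_of_most_common, distinct_count).
    """
    if not values:
        raise ValueError("no values to analyze")
    counts = Counter(values)
    value, n = counts.most_common(1)[0]
    return value, n / len(values), len(counts)
```

If one value accounts for most of the observations (as in the same-MAC-for-everybody scenario above), the identifier is close to useless for telling devices apart, and any ban keyed on it would hit innocent players.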

That’s exactly the reason why in this Vol.2 (which is dedicated to “development” but not “deployment”) I DID go into details of “how to collect the information”, but DIDN’T go into further discussion on “how to use this information”. For the time being – just keep collecting it. Then, having all this information, you’ll be in a MUCH better position to implement forensic tools and automated bans when you’re big enough to become a target for cheaters.

[[To Be Continued…

This concludes beta Chapter 15(c) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 15(d), where we’ll discuss the Ultimate Security Heresy – the one about Security by Obscurity being useful…]]

References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.