Direct Payment Processing. Recovery from ‘Unknown’ Transaction Status. PCI DSS.

 
 
 
Happy New Year to everyone, and – here goes another Chapter (this time – 21(b))

#DDMoG, Vol. VI
[[This is Chapter 21(b) from “beta” Volume VI of the upcoming book “Development&Deployment of Multiplayer Online Games”, which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the “release” book to those who help with improving; for further details see “Book Beta Testing“. All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Having discussed credit card processing, we’ve got all the terminology we need to talk about the different ways in which credit cards and other payment methods can be used.

Two Across-the-Board Caveats

Before proceeding further, we need to make two all-important observations.

The first one is simple and rather obvious:

In the context of payments, whatever we do, we MUST NOT trust the Client (let alone trust a web browser)

Sure, as we’ve already discussed (in particular, in Chapter III), we shouldn’t trust the Client even when we’re not speaking about payments; however, in the context of Chapter III, there were some cases (in particular, Lag Compensation for first-person shooters) where some limited trust in the Client was more or less necessary to provide an enjoyable player experience. For anything payment-related, any trust in the Client is a Big No-No, period.

The second observation is related to the fact of life that

Whatever protocol we’re using, distributed transactions can result in “Unknown” transaction status1

Even if we’re using supposedly reliable TCP (HTTP/whatever-else), if we established a TCP connection, sent the request, and got nothing back (with the TCP connection terminated or hung for whatever reason) – we have absolutely no way to say whether our transaction went through or not; essentially – the transaction has an “Unknown” status. For some data flows, it can be an “Initiated-but-not-Confirmed” status rather than an “Unknown” status, but the point about distributed transactions having a “we don’t know what is going on, yet” status still stands.
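To make the point a bit more tangible, here is a minimal Python sketch of how a caller might classify what came back from a request. Everything in it is an assumption made for illustration: `send_request` is a hypothetical callable standing in for the real network round trip, and the status names simply mirror the ones used in this chapter.

```python
from enum import Enum


class TxStatus(Enum):
    APPROVED = "APPROVED"
    DECLINED = "DECLINED"
    UNKNOWN = "UNKNOWN"


def classify_outcome(send_request):
    """Run send_request() and map what happened to a transaction status.

    send_request is a hypothetical callable which performs the round trip
    and returns "APPROVED" or "DECLINED" when a reply actually arrives.
    """
    try:
        reply = send_request()
    except (TimeoutError, ConnectionError):
        # The request may or may not have reached the other side, and may or
        # may not have been executed there: from here we cannot tell.
        return TxStatus.UNKNOWN
    return TxStatus(reply)


def lost_reply():
    """Simulates a connection which dies after the request was sent."""
    raise ConnectionResetError("connection dropped before the reply arrived")
```

Note that the `except` branch deliberately does not return DECLINED: treating a lost reply as a failure is exactly the mistake which leads to charging a player without crediting them (or vice versa).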

Overall, recovery from such scenarios is doable for pretty much any Payment Processor; however, implementing it properly can be quite a headache (especially if facing a processor with ridiculous API requirements such as a 2-hour limit on transactions requested, as mentioned in [[TODO]] section below).


1 Or a reasonable facsimile

 

Data Flows for Payment Processing

There are tons of different payment processing systems out there, and each tends to have its very own API (usually with a fair share of its own idiosyncrasies). Still, having seen more than a dozen different payment providers, IMO from the data flow point of view all the payment processing systems I know fall into three distinct categories; let’s name them “Direct Processing”, “Indirect Processing”, and “Client-Centric Processing”.2


2 Apparently, there is a way not to trust the Client even if the processing is Client-Centric; we’ll see how it can be achieved below.

 

Direct Processing

The most straightforward (though not necessarily the simplest to implement) way of payment processing goes along the following lines:

  • Your Game Client (or your web site) collects information from your player, and passes it to one of your servers (let’s name it Cashier Server).
  • Your Cashier Server, on behalf of the player, and using the information it has got from your player, goes to your Payment Provider.
  • Your Payment Provider validates all the information – and tells your Cashier Server that transaction is either APPROVED, or DECLINED.
  • On transaction being APPROVED – your Cashier Server initiates required updates to player’s inventory etc.

As noted above, transactions can end up in an “UNKNOWN” state and will need to be recovered. Also – you need to be careful to make sure that at least eventual consistency is guaranteed between the Cashier Server and the DB which contains player inventory. If the Cashier Server operates over the same DB as the one with PLAYERS and their inventories – it is not a problem, but as we’ll see a little bit below, there are often Really Good Reasons to keep the Cashier Server DB very separate from everything else – and then reaching consistency, while not exactly rocket science, will require a certain effort on your side.
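The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeProvider`, `CashierServer`, and their methods are hypothetical names (no real Payment Provider API looks exactly like this), the inventory is a plain dict, and handling of ‘Unknown’ status is left out on purpose.

```python
class FakeProvider:
    """Hypothetical stand-in for a Payment Provider: approves up to a limit."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents

    def charge(self, payment_details, amount_cents):
        return "APPROVED" if amount_cents <= self.limit_cents else "DECLINED"


class CashierServer:
    """Hypothetical Cashier Server sitting between the Client and provider."""

    def __init__(self, provider, inventory):
        self.provider = provider
        self.inventory = inventory  # player_id -> list of item names

    def purchase(self, player_id, item, amount_cents, payment_details):
        # The Cashier Server goes to the provider on behalf of the player...
        result = self.provider.charge(payment_details, amount_cents)
        # ...and touches the inventory only on an explicit APPROVED.
        if result == "APPROVED":
            self.inventory.setdefault(player_id, []).append(item)
        return result
```

The one design point worth noticing is that the inventory update is driven by the provider’s answer as seen by the Cashier Server, never by anything the Client claims about the payment.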

[[TODO: in-Client vs On-Web]]

Recovery from ‘Unknown’ transaction State

For Direct Processing, a transaction is in ‘Unknown’ state at least from the moment we have sent it to our Payment Processor until we’ve got a reply from the other side. To recover from such ‘Unknown’ transactions, we’ll need to:

  • Have a way to identify the transaction
    • We cannot use payment provider’s ID for this purpose (as we got nothing from them yet).
    • Ideally – we should use “our” ID, but surprisingly few payment processors support this concept.
    • Other than by “our” ID – identifying transactions gets relatively ugly, but usually comparing all the available transaction fields does the trick.
      • However, beware of seemingly identical transactions (which, while being seemingly identical on the payment provider’s side, can have different implications on your side, and you should be sure not to apply the same seemingly identical transaction twice instead of applying two different ones). On the other hand, such collisions, provided that payment provider gives you enough fields to compare, are extremely rare.
  • Make sure that the transaction is stored and committed (with ‘Unknown’ status) to our DB before we even send the data to the payment processor. It should be updated to whatever-status-is-received-from-Payment-Processor as soon as such status is received.
  • There should be a mechanism provided by the Payment Processor to recover from this ‘Unknown’ status. These mechanisms vary greatly from one payment provider to another. I saw at least two significantly different approaches being used for this purpose:
    • All transactions are implemented as idempotent by the Payment Provider. It means that if you got a transaction with the ‘Unknown’ status – you just need to repeat it, and if it is a duplicate – the Payment Processor will detect it and will return the same result it returned before. NB: all such mechanisms tend to rely greatly on “your” transaction IDs being unique, so make sure that they’re indeed unique.
      • Actually, our generic ‘Inter-DB Async Transfer’ protocol as discussed in Chapter III (and later in Chapter XVII) can be seen as an incarnation of such an idempotent implementation.
    • Ability to request “whether this transaction came through” (and to get its result if it did).
      • Ways to request this information vary greatly too from one Payment Provider to another. The worst I’ve seen was along the lines of “you can request the list of all your transactions to see whether the one you want has made it, but the list is time-based (with time being in terms of their server) and cannot cover more than 2 hours”.3 As you can imagine, if they were down for more than 2 hours, handling recovery of ‘Unknown’ transactions became quite convoluted, to put it mildly.
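The “commit as ‘Unknown’ first, then send, then repeat on doubt” discipline described above can be sketched as follows. This is a minimal Python sketch under loud assumptions: `IdempotentProcessor` is a hypothetical in-memory stand-in for a processor which de-duplicates on “our” ID (real APIs differ a lot), the table layout is made up, and SQLite is used only to keep the example self-contained.

```python
import sqlite3
import uuid


class IdempotentProcessor:
    """Hypothetical processor which de-duplicates on the merchant's own ID."""

    def __init__(self):
        self.results = {}            # our_tx_id -> result returned earlier
        self.drop_next_reply = False

    def charge(self, our_tx_id, amount_cents):
        if our_tx_id not in self.results:
            self.results[our_tx_id] = "APPROVED"   # executed exactly once
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionError("reply lost in transit")
        return self.results[our_tx_id]


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tx (our_tx_id TEXT PRIMARY KEY,"
               " amount_cents INTEGER, status TEXT)")
    return db


def attempt(db, processor, our_tx_id, amount_cents):
    try:
        status = processor.charge(our_tx_id, amount_cents)
    except ConnectionError:
        return  # the row stays UNKNOWN; recover() will repeat it later
    db.execute("UPDATE tx SET status = ? WHERE our_tx_id = ?",
               (status, our_tx_id))
    db.commit()


def submit(db, processor, amount_cents):
    our_tx_id = uuid.uuid4().hex   # "our" ID, unique per transaction
    # Commit the row as UNKNOWN *before* anything goes over the wire.
    db.execute("INSERT INTO tx VALUES (?, ?, 'UNKNOWN')",
               (our_tx_id, amount_cents))
    db.commit()
    attempt(db, processor, our_tx_id, amount_cents)
    return our_tx_id


def recover(db, processor):
    """Repeat every UNKNOWN transaction; safe only because of idempotency."""
    rows = db.execute("SELECT our_tx_id, amount_cents FROM tx"
                      " WHERE status = 'UNKNOWN'").fetchall()
    for our_tx_id, amount_cents in rows:
        attempt(db, processor, our_tx_id, amount_cents)


def status_of(db, our_tx_id):
    row = db.execute("SELECT status FROM tx WHERE our_tx_id = ?",
                     (our_tx_id,)).fetchone()
    return row[0]
```

In this sketch, a lost reply leaves the row in UNKNOWN, and a later `recover()` repeats the very same request with the very same ID; as the processor side has already recorded the result under that ID, the money moves only once.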

Bottom line: while ways of recovery from “Unknown” transactions vary, it is certainly doable at least for vast majority of the Payment Processors out there.
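For the less fortunate case (no “our” ID, and only a time-limited listing call to work with), recovery boils down to walking the outage period window by window and comparing fields. Below is a rough Python sketch of that idea; `list_transactions` is a hypothetical callable standing in for whatever listing request the processor offers, and the field names are made up for the example.

```python
from datetime import datetime, timedelta

MAX_WINDOW = timedelta(hours=2)  # the limit from the war story above


def split_into_windows(start, end, max_window=MAX_WINDOW):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_window, end)
        yield cursor, window_end
        cursor = window_end


def find_transaction(list_transactions, start, end, wanted):
    """Return all listed records whose fields match every entry in `wanted`.

    list_transactions(window_start, window_end) is a hypothetical listing
    call returning a list of dicts for that time window.
    """
    matches = []
    for window_start, window_end in split_into_windows(start, end):
        for record in list_transactions(window_start, window_end):
            if all(record.get(key) == value for key, value in wanted.items()):
                matches.append(record)
    return matches
```

Two practical remarks. First, as the listing is in terms of the processor’s clock, it is probably wise to pad `start` and `end` generously rather than trust your own timestamps. Second, getting more than one match back is exactly the “seemingly identical transactions” case mentioned above, and deserves a closer look rather than blind application.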


3 BTW, as this request took ages to complete even with just 2 hours, I’m pretty sure that DBAs of that payment provider had no idea about indexes such as (MERCHANT_ID, TIMESTAMP DESCENDING) – such an index, as we’ll see in Vol. 3, allows getting all the required transactions within milliseconds.
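To illustrate the index from this footnote, here is a small self-contained sketch using SQLite from Python. The table and column names are assumptions made up for the example (nothing is known about that provider’s real schema); the point is only that a composite index on (merchant, timestamp descending) lets the DB jump straight to one merchant’s time range instead of scanning everything.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transactions"
           " (merchant_id INTEGER, ts INTEGER, amount_cents INTEGER)")
# The index from the footnote: merchant first, then timestamp descending.
db.execute("CREATE INDEX idx_merchant_ts ON transactions (merchant_id, ts DESC)")
db.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(m, t, 100) for m in range(1, 4) for t in range(0, 36000, 60)],
)

QUERY = ("SELECT ts, amount_cents FROM transactions"
         " WHERE merchant_id = ? AND ts >= ? AND ts < ?"
         " ORDER BY ts DESC")


def recent_transactions(conn, merchant_id, since, until):
    """All transactions of one merchant within [since, until), newest first."""
    return conn.execute(QUERY, (merchant_id, since, until)).fetchall()


def query_plan(conn):
    """Text of SQLite's plan for QUERY (the detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (2, 0, 7200)).fetchall()
    return " ".join(row[-1] for row in rows)
```

Calling `query_plan(db)` should show the query being served via `idx_merchant_ts` rather than by a full table scan; the exact wording of the plan differs between SQLite versions.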

 

Main Caveat of Direct Processing – Trusting Merchant

The whole “Direct Processing” schema is all very straightforward, until we realize one all-important thing about it. With Direct Processing, the customer has to trust us (the merchant) with their details (for example, for credit cards – with all the information about the credit card). Moreover, at least for credit cards, this information-she-trusts-us-with is inherently sufficient to initiate not just this transaction, but any transaction.

This leads to several all-important complications – such as inherent deniability for the player (!). Worse than that, in the credit card world, it has led to the problem of leaking credit card details; the problem was soooo bad that about 15 years ago, VISA and MasterCard decided to introduce special requirements which apply to everybody who processes credit card numbers (in this field, credit card numbers are known as Primary Account Numbers, or PANs). These requirements are known as PCI DSS (Payment Card Industry Data Security Standard), and apply to all the merchants processing credit cards in the direct manner described above.

PCI DSS

Complying with PCI DSS (which becomes necessary as soon as you’re trying to process credit cards in a Direct Processing manner) is quite a headache. Well, technically, you may be able to ignore it for a while (as compliance for lower-volume merchants is done via self-assessment, so you can cheat), but I STRONGLY advise against going this way.4

And honestly complying with PCI DSS is indeed quite complicated. First of all, let’s note that with Direct Processing as described above – you’ll probably need the so-called Self-Assessment Questionnaire D (PCI DSS SAQ D) – and it is 335 items long 🙁 .

On the other hand, most of PCI DSS requirements make perfect sense regardless of formal compliance (i.e. most of PCI DSS represents “security best practices”), so taking a look at it is a good thing anyway.


4 If your system is broken and you get caught cheating on your PCI DSS self-assessment – more often than not, it will be the end of your business.

 

Complying with PCI DSS

Complying with PCI DSS is a complicated topic, and

If you need to comply with PCI DSS, you DO need to know your security stuff.

On the other hand, if you’re going to run a medium-sized game with a few hundred thousand simultaneous players – you’ll need to think about your security anyway, and most of the PCI DSS requirements are good to follow regardless of being formally compliant.

In other words – if you’re serious about your multiplayer game, you need to have a strong Server-Side team anyway, and they should know a bit or three about security too. On the third hand –

Risks of PCI DSS non-compliance can be Damn High,5 so unless you have somebody on the team with serious security experience – it is probably better to avoid PCI DSS (and direct CC processing) for the time being.

On the fourth hand (yes, I know I’m halfway to becoming an octopus) – it is still better to design your system in a way which will allow you to comply with PCI DSS rather easily later, when your game grows larger, and you can afford a security specialist on your team.


5 If you’re hacked with PANs stolen, and you’re found non-compliant – it can mean instant death to your business 🙁

 

Architecting for PCI DSS

As noted above, when architecting your payment processing, I am arguing for doing it in a way which will enable doing PCI DSS later if/when you feel like it (and if/when your game is successful enough, you almost certainly will have reasons at least to consider doing your own credit card processing).

Fig XVIII.1 shows an architecture, based on the Classical Deployment Architecture which was discussed in Chapter VII, and which is suitable for future expansion into PCI DSS:

Fig XVIII.1

Here, everything which is non-payments-related goes along the lines discussed in Chapter VII (and please note that Front-End Servers are optional).

As for the payments – even without PCI DSS, it turns out to be very convenient to have a Cashier Server (which handles all the communications with the Clients, but knows only the very bare minimum about the specifics of the payment gateways).

Unlike the Cashier Server, which is by design very generic, Payment Gateway Servers are inherently very specific to your payment providers (regardless of them being Direct Payment, Indirect Payment, or Client-Centric), and handle all the communications with the respective Payment Provider (this should include handling protocols, obtaining OAuth tokens, recovering from transactions in ‘Unknown’ state, etc. etc.). The key point with regards to being PCI-DSS-proof is to have each of the Payment Gateway Servers use its own database, rather than using your big DB sitting behind an accessible-for-all DB Server. In fact, given the usual nature of Payment Gateway-specific data, this separation is not really restricting, and is actually quite a good way to ensure good decoupling of the time-critical main DB from payment-related processing. To ensure transaction consistency, we should use Asynchronous Inter-DB Transfers (first mentioned in Chapter III, and later discussed in Chapter XVII) to move the funds back and forth between the main DB and per-gateway DBs.

From PCI DSS perspective, these per-gateway DBs (and Cashier Server using Inter-DB Transfer Protocol) enable easy migration into the following architecture which allows for a relatively easy support for PCI-DSS-compliant credit card processing:

Fig XVIII.2

[[TODO: make Client and Client-Server connection in bold]]

As we can see, the differences between Fig XVIII.2 and Fig XVIII.1 are minimal: there is still pretty much the same Cashier Server, and there are still Payment Gateway Servers with their own DBs.

On the other hand – if we take a look at this architecture from PCI DSS compliance point of view – we’ll see that

Only those Servers, Links, and DBs, which are shown in bold, ever handle credit card numbers a.k.a. PANs

This will allow us to keep these elements in a separate network, properly protected according to PCI DSS requirements (and separated via firewalls from the rest of your network too),

…and leave pretty much everything else “as is”6

This whole trick is possible because of the so-called PCI DSS scope, which very roughly can be described as “everything which transfers or stores PAN is in scope”. In fact, it is a bit more complicated than that, though the general direction still stands; make sure to read PCI DSS itself very carefully, and for more on it please refer to a pretty good discussion in [Halbleib].

One important thing to be noted, is that if we want to simplify PCI DSS compliance based on this architecture,

We MUST NOT allow using credit card numbers outside of the protected PCI-DSS-compliant area

[CSR: Customer service advisors (representatives) interact with customers to provide answers to inquiries involving a company's product or services. (Wikipedia)]

However scary it may sound, apparently it is not that big a deal for usual business purposes. First of all, you won’t have any problems with PCI DSS compliance as long as you use surrogate “CC_IDs” to identify credit cards, refer to your credit card transactions by CC_ID within your main DB (and without storing PAN in any form), and keep a mapping of CC_IDs to real PANs within your Payment Gateway DB (i.e. within your PCI DSS perimeter).7

BTW, using a crypto-hash (such as SHA-3) of PAN as its CC_ID is NOT advisable, for two reasons. First, it won’t make your life that much simpler – as long as reversing the crypto-hash is impossible, to get the full PAN from CC_ID you’ll still need to refer to that CC_ID-to-PAN table sitting within the PCI DSS perimeter. Second, it may allow for a certain class of attacks which reverse the crypto-hash into PANs (effectively a full-space search, optimized via rainbow tables); as soon as somebody does this work and publishes a “rainbow table” – you’ll be in significant trouble. Moreover, if you reveal “truncated PANs” (see below) – they will reduce the search space by orders of magnitude, and then reversing the hash can be done very easily (while searching the original space of 10^16 ≈ 2^53 is quite a feat, knowing just 5 digits reduces it to 10^11 ≈ 2^37, which can be broken trivially).
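The arithmetic above, and the attack itself, can be illustrated with a toy Python sketch. Assumptions to keep in mind: the “PAN” used with it should be a well-known test number rather than a real card, the hash is plain unsalted SHA3-256 of the digit string, and a realistic demonstration is only feasible when few digits are unknown (a real attacker facing 10^11 candidates would need more hardware and patience, though not that much more).

```python
import hashlib
import math


def search_space_bits(unknown_digits):
    """Size of the search space, in bits, for a given number of unknown digits."""
    return math.log2(10 ** unknown_digits)


def recover_pan(cc_id_hash, prefix, suffix, total_digits=16):
    """Brute-force a PAN from its SHA3-256 hash, given known leading/trailing digits."""
    unknown = total_digits - len(prefix) - len(suffix)
    for n in range(10 ** unknown):
        middle = str(n).zfill(unknown) if unknown else ""
        candidate = prefix + middle + suffix
        if hashlib.sha3_256(candidate.encode()).hexdigest() == cc_id_hash:
            return candidate
    return None
```

`search_space_bits(16)` comes to roughly 53 and `search_space_bits(11)` to roughly 37, matching the numbers in the text. With only 4 digits unknown, `recover_pan` returns in well under a second on any modern box, which is the whole problem with PAN-derived CC_IDs in a nutshell.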

Bottom line:

Stick to CC_IDs which are NOT derived from PAN – and you’ll be better off

On the other hand – occasions when you do need to get PAN from CC_ID will be very rare. In practice, I know only of the following uses for CC_ID/PAN:

  • Answering questions “whether it is the same CC”, and for these purposes CC_ID will do just fine without converting it to PAN.
  • Answering BIN-related questions such as “which country this CC belongs to” – and such wide categorizations, to the best of my knowledge, are generally allowed to be stored outside of PCI DSS perimeter (for example, as a CC_ID-to-country table within the main DB).
  • Showing (usually “truncated” per PCI DSS8) PAN to the end-user – and with the architecture above, this won’t cross our PCI DSS perimeter.
  • Showing (usually “truncated” per PCI DSS) PAN to the authorized CSR on the “need-to-know” basis. This is allowed by PCI DSS (though strict compliance is still rather tricky); OTOH, as long as you’re indeed truncating your PAN – it is not that much of a concern.

As a result – to the best of my understanding,9 it should be usually fine to:

  • use CC_ID outside of PCI DSS perimeter
  • implement an API to convert CC_ID into truncated PAN; this should be intended to show this truncated PAN to your CSRs
    • it is important NOT to have any API whatsoever which allows extracting full (non-truncated) PANs. If you can avoid providing such an API – dangerous leaks across your PCI DSS perimeter become rather unlikely, which usually makes auditors quite happy 🙂
  • if necessary business-wise – IIRC it is OK to implement a mapping such as CC_ID-to-country and store it outside of your PCI DSS perimeter. Overall, the main concern of the whole PCI DSS is that somebody will steal your DB and will start using these stolen credit card numbers for fraudulent transactions – and mappings of CC_ID-to-country won’t be really helpful for this purpose.
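The points above can be summarized in a small Python sketch of the component which would live inside the PCI DSS perimeter. It is an illustration under assumptions, not a compliant implementation: `PanVault` and its methods are hypothetical names, a dict stands in for the Payment Gateway DB, and the truncation format simply follows the first-digit-plus-last-four example from footnote 8.

```python
import secrets


class PanVault:
    """Hypothetical CC_ID-to-PAN mapping, living inside the PCI DSS perimeter."""

    def __init__(self):
        self._pan_by_cc_id = {}
        self._cc_id_by_pan = {}

    def register(self, pan):
        """Return the CC_ID for this PAN, creating one on first sight."""
        if pan in self._cc_id_by_pan:
            return self._cc_id_by_pan[pan]
        cc_id = secrets.token_hex(16)  # random, NOT derived from the PAN
        self._pan_by_cc_id[cc_id] = pan
        self._cc_id_by_pan[pan] = cc_id
        return cc_id

    def truncated_pan(self, cc_id):
        """The only PAN-related data ever leaving the perimeter."""
        pan = self._pan_by_cc_id[cc_id]
        return pan[0] + "X" * (len(pan) - 5) + pan[-4:]

    # Deliberately no method returning the full PAN: whatever needs the full
    # number (talking to the Payment Provider) runs inside the perimeter.
```

As the same PAN always maps to the same CC_ID, “is it the same CC” questions can be answered in the main DB by comparing CC_IDs, without ever asking the vault for anything.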

Phew. We’ve got our compliance – and that’s without imposing PCI DSS requirements on the whole system (which can be indeed quite difficult in the context of games). In a sense – this whole approach demonstrates yet another (and rather unusual) benefit of micro-services (those with their own private DBs):

micro-services split along the appropriate lines, can significantly simplify the compliance

[[TODO: this architecture allows to avoid “overkills”, such as running an antivirus on each of game servers – or running an IDS on game server network]]


6 still following “best security practices”, but without compliance fanaticism 😉 .
7 in fact, it is likely that you’ll have this CC_ID-to-PAN mapping shared between different PCI-DSS-compliant Payment Gateways – though still staying within the PCI DSS perimeter
8 i.e. having form such as 1XXXXXXXXX5678
9 No warranties of any kind, make sure to double-check with your PCI DSS auditor if you happen to have one

 

[[To Be Continued…

This concludes beta Chapter 21(b) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”. Stay tuned for beta Chapter 21(c), where we’ll discuss indirect processing, client-centric processing, and reconciliation.]]


Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.


Comments

  1. qm2k says

    > BTW, to the best of my knowledge (though without warranties of any kind), it is ok to use crypto-hash (such as SHA-3 hash) of PAN as it’s CC_ID; still, doing it won’t make your life that much simpler – as reverting crypto-hash is by design impossible, to get full PAN from CC_ID you’ll still need to refer to that CC_ID-to-PAN table sitting within the PCI DSS perimeter.

    Given very small size of the PAN space, any hash (especially good one) can be trivially reversed by brute force/rainbow table. Do not recommend anybody doing it this way.

    • "No Bugs" Hare says

      Realistically speaking, a rainbow table on a full space of 2^53 is going to take dozens of terabytes – and calculating it is not really an easy feat even these days. On the other hand – this technique (using hash as CC_ID), while not providing any significant benefits – does present a risk (for no reason), and moreover – can be misused very easily (for example, if combined with “truncated” numbers, the search space can easily go down by orders of magnitude without developers realizing it – and THEN it will become a problem). I’ve changed the article to recommend against it, THANKS!

  2. Jesper Nielsen says

    If you’re entering PAN in the client itself (instead of loading a web-browser) doesn’t this also require the client to be PCI DSS compliant?

    • "No Bugs" Hare says

      Strictly speaking – yes, but as long as Client doesn’t store PAN (just transmitting it to a PCI DSS compliant back-end via TLS) – it rarely causes practical problems.

  3. Mikhail Maximov says

    Fig XVIII.2 may be done better – the line between non-compliant gateways and their databases is rather ugly.
    I see two possible ways to improve:
    – move non-compliant gateways all the way down (and mirror compliant part; or
    – swap non-compliant databases with db-server (and its db)

    • "No Bugs" Hare says

      TBH, I tried it in different ways, but still wasn’t able to find non-ugly way to draw the diagram (which at the same time doesn’t deviate too far from Fig XVIII.1). Will think more about it on the next pass.
