codezombie March 24, 2015 at 04:39

Antifraud. Functional and non-functional requirements (part 2)

In the first part of the experiment, I described why fraudulent payments (fraud) are an acute problem for all participants of the online payment market, what difficulties you have to overcome to build your own fraudulent payment monitoring system (antifraud system), and why for most merchants such systems are an expensive pleasure they are not always ready to pay for.

Development of such systems is further complicated by the fact that an antifraud system is business-critical. Its downtime halts the business process (payment acceptance), and its incorrect operation increases the risk of financial and reputational losses for the company (online store, bank).

Therefore, the practices and approaches described in this article apply not only on the merchant's side but also on the side of other participants in Internet acquiring: aggregators, payment systems, banks. Moreover, these approaches are often best practices that the respective organizations keep closed from the community.

This part describes the requirements for an antifraud system that have a significant impact on its software architecture.

Non-functional requirements

Quality attributes

About the selection of quality attributes

I will not pad the description by explaining why I chose these particular quality attributes: the reasons are obvious given the type of system being designed (business-critical).

I also deliberately do not give specific figures for the availability of the antifraud system or for its other quality attributes, since the article does not aim to discuss one particular system. Instead, it describes the set of approaches and principles underlying such systems.

Quality Attributes:

distribution;
fault tolerance;
high scalability;
reliability.

Legal restrictions

Legal restrictions are one of the important factors determining the software architecture of the antifraud system.
For example, under the PCI DSS standard you may not store the full card number (PAN)* or the security code (CVV). Storing the first six and last four digits of the card number is allowed, and nothing prohibits generating an internal unique identifier for customer cards. The cardholder name and the card expiration date may be transmitted only over secure channels.

* On storing the PAN

In fact, at a high enough level of PCI DSS certification (somewhere around the 80th :)), storing the PAN in encrypted form is allowed.
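The truncation and the internal card identifier mentioned above can be sketched in Python. This is a minimal sketch, assuming the identifier is derived as a keyed HMAC-SHA256 of the full PAN (one common approach, not something the article or the standard prescribes). The key `CARD_ID_KEY` and the functions `mask_pan` and `card_id` are hypothetical, and whether a keyed hash is acceptable in your setup is a question for your PCI DSS assessment.

```python
import hashlib
import hmac

# Hypothetical secret key; in a real system it would come from a key store,
# never from source code.
CARD_ID_KEY = b"replace-with-secret-from-key-store"


def mask_pan(pan: str) -> str:
    """Keep only the first six and last four digits of the card number."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    if len(digits) < 12:
        raise ValueError("PAN too short to truncate")
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]


def card_id(pan: str, key: bytes = CARD_ID_KEY) -> str:
    """Derive a stable internal card identifier (hex HMAC-SHA256 of the PAN).

    The same card always maps to the same identifier, so payments can be
    grouped per card without the PAN itself being kept.
    """
    digits = "".join(ch for ch in pan if ch.isdigit())
    return hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()
```

For example, `mask_pan("4111 1111 1111 1111")` returns `"411111******1111"`, and `card_id` gives the same value whether or not the number was entered with spaces.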

In addition to the PCI DSS requirements, you must comply with the provisions of the Russian Federal Law "On Personal Data" (152-FZ).
Discussing the full variety of technical and bureaucratic procedures (with all the legal subtleties that follow) required just to store and process a client's first and last name would most likely take ten pages of instructions, and implementing those instructions would take a month and a half of work (I'm joking, but only partly). Therefore the best way to ~~avoid creating unnecessary work for yourself~~ comply with 152-FZ is not to fall under its scope at all.

In the antifraud system being designed, all program modules will work with depersonalized data.

Summing up the legal restrictions, we add the following requirements to the system:

do not store the card PAN or CVV in any form;
store other payment data only in protected form;
transfer information between the merchant (software client) and the antifraud system only over secure communication channels;
work only with depersonalized data.
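One way to meet the last requirement is to convert each incoming payment into a record that carries only derived values before it reaches any antifraud module. A minimal sketch, assuming personal fields are replaced with keyed-hash pseudonyms; `PSEUDONYM_KEY`, `pseudonym`, `DepersonalizedPayment` and `depersonalize` are hypothetical names, not part of any real library.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical pseudonymization key, assumed to be kept outside the
# antifraud modules (e.g. on the payment intake side).
PSEUDONYM_KEY = b"replace-with-secret-from-key-store"


def pseudonym(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map a personal value to a stable keyed hash.

    Whitespace and letter case are normalized first, so "Ivan  Petrov"
    and "IVAN PETROV" get the same pseudonym.
    """
    normalized = " ".join(value.split()).casefold()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class DepersonalizedPayment:
    card_bin: str    # first six digits of the card number
    card_last4: str  # last four digits of the card number
    card_id: str     # keyed hash of the full card number
    holder_id: str   # keyed hash of the cardholder name
    email_id: str    # keyed hash of the e-mail address


def depersonalize(pan: str, holder: str, email: str) -> DepersonalizedPayment:
    """Build the only representation of a payment the antifraud modules see.

    The CVV is not even a parameter; the PAN, name and e-mail themselves
    are not copied into the result.
    """
    digits = "".join(ch for ch in pan if ch.isdigit())
    return DepersonalizedPayment(
        card_bin=digits[:6],
        card_last4=digits[-4:],
        card_id=pseudonym(digits),
        holder_id=pseudonym(holder),
        email_id=pseudonym(email),
    )
```

Because pseudonyms are stable, heuristics such as "one client, many e-mails" can still count distinct `email_id` values without ever seeing an address.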

Functional requirements

API requirement

First, let us consider the requirements from the point of view of the outside world, i.e. software clients (merchants). Software clients interact with the antifraud system according to the following API requirements:

Functional:

provide the client with an API to submit payment data;
return to the client the prediction of whether the payment is fraudulent;
provide the client with an API to adjust payment results.
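The three functional requirements map onto two operations: submitting a payment (which returns the prediction) and adjusting a result later. A minimal in-memory sketch, assuming a probability score with a fixed decision threshold; all type and method names (`PaymentRequest`, `Prediction`, `Adjustment`, `AntifraudApi`) are hypothetical, since the protocol itself is not defined yet, and the HTTPS transport is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass(frozen=True)
class PaymentRequest:
    payment_id: str
    card_id: str   # internal card identifier, never the PAN
    amount: int    # in minor currency units
    currency: str
    ip: str


@dataclass(frozen=True)
class Prediction:
    payment_id: str
    fraud_probability: float
    verdict: Verdict


@dataclass(frozen=True)
class Adjustment:
    payment_id: str
    was_fraud: bool  # actual outcome reported later by the merchant


class AntifraudApi:
    """In-memory stand-in for the client-facing operations."""

    def __init__(self, score: Callable[[PaymentRequest], float],
                 threshold: float = 0.5) -> None:
        self._score = score
        self._threshold = threshold
        self.labels: Dict[str, bool] = {}  # payment_id -> confirmed outcome

    def submit_payment(self, req: PaymentRequest) -> Prediction:
        """Accept payment data and return the fraud prediction."""
        p = self._score(req)
        verdict = Verdict.REJECT if p >= self._threshold else Verdict.ACCEPT
        return Prediction(req.payment_id, p, verdict)

    def adjust(self, adj: Adjustment) -> None:
        """Record the merchant's correction; it becomes a training label."""
        self.labels[adj.payment_id] = adj.was_fraud
```

Keeping adjustments as labels is what later lets the model learn from new payments rather than only from the initial history.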

Non-functional:

provide a public protocol for interacting with the client;
communicate with the client only over secure communication channels.
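On the service side, "only secure channels" comes down to a TLS configuration. A minimal sketch using Python's standard `ssl` module, assuming TLS 1.2 as the minimum accepted protocol version; the function name `make_server_context` is hypothetical and the certificate paths are deployment-specific.

```python
import ssl
from typing import Optional


def make_server_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context for the client-facing API.

    Connections using anything older than TLS 1.2 are refused. The
    certificate and key are optional here only so the context can be
    built without deployment files.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile is not None:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

A listening socket can then be wrapped with `ctx.wrap_socket(sock, server_side=True)`.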

Business requirements

From the point of view of the antifraud system's internal logic, we single out only one essential business requirement: to predict, from the payment data, whether the transaction will succeed.
While implementing this requirement, we will try to prove that the payment will not go through. The main reasons a transaction is refused are incorrect payment data and fraud. Below we analyze how to check for each of these reasons.

Validation of payment data entry

Do not rely on the merchant to validate payment details properly. Whether an error comes from a user typo or from malicious actions, catching it early both saves CPU cycles and keeps noise out of the trained model (we will talk about the model later).

Check that the cardholder name contains at least two letters (hyphens and digits in the name are acceptable), that the card is still valid (has not passed its expiration date), and that the card number passes the Luhn check.
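The name and expiration checks are simple enough to sketch directly. A minimal sketch, assuming names use Latin letters, digits, spaces, hyphens, apostrophes and dots (the exact allowed set is an assumption), and that the expiration year arrives as four digits; the function names are hypothetical.

```python
import datetime as dt
import re

# Assumed character set for cardholder names: Latin letters, digits,
# spaces, hyphens, apostrophes and dots.
_NAME_CHARS = re.compile(r"[A-Za-z0-9 .'\-]+")


def holder_name_ok(name: str) -> bool:
    """At least two letters; digits and hyphens are tolerated."""
    name = name.strip()
    if _NAME_CHARS.fullmatch(name) is None:
        return False
    return sum(ch.isalpha() for ch in name) >= 2


def card_not_expired(exp_month: int, exp_year: int, today: dt.date) -> bool:
    """A card is valid through the last day of its expiration month.

    exp_year is assumed to be a four-digit year.
    """
    if not 1 <= exp_month <= 12:
        return False
    return (today.year, today.month) <= (exp_year, exp_month)
```

Passing `today` explicitly keeps the check deterministic and easy to test; in production it would be the current date in a fixed time zone.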

Luhn algorithm

The Luhn algorithm computes the check digit of a bank card number. It is designed to detect errors caused by accidental data corruption, and it lets you judge only with a certain degree of confidence that the card number contains no errors.
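The check itself takes a few lines: starting from the rightmost digit, every second digit is doubled (subtracting 9 if the result exceeds 9), and the number is valid if the total is divisible by 10. A minimal sketch; the function name `luhn_valid` and the decision to tolerate spaces are assumptions.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the card number passes the Luhn (mod 10) check.

    Spaces are ignored; any other non-digit character fails the check.
    """
    compact = number.replace(" ", "")
    if len(compact) < 2 or any(ch not in "0123456789" for ch in compact):
        return False
    total = 0
    # Walk from the rightmost digit; double every second one.
    for i, ch in enumerate(reversed(compact)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

This is why the check gives only limited confidence: it catches every single mistyped digit and most swaps of adjacent digits, but, for example, swapping "09" and "90" leaves the sum unchanged.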

Checking if the transaction is fraudulent

There are a great many heuristics for spotting signs of fraud, and some companies boast of using 200 of them. I immediately suspect, though, that some of these heuristics are backed by nothing, are derived from other heuristics, or are outright crutches that fit the result to the training sample better while having no effect on real data. A large number of heuristics gives you only an overfitted model, incorrect classification of transactions as fraudulent or not, and lower application performance.

Therefore, I will list only the main heuristics, which in the general case are also the most effective:

one card - many IPs, and conversely, one IP - many cards;
one card - many purchases/failed attempts;
one client - many cards (especially ones issued by different banks);
one client - many postal codes, emails;
the client’s name does not match the name of the account owner on the merchant’s website (if there is one);
the client’s country does not match the country of the account owner on the merchant’s website (if there is one);
the payment is made at night (in the client’s local time).

But how much is “many”? Over what period (5 seconds or 2 weeks)? And how do we deal with the fact that the weight of filter x₁ is not equal to the weight of filter x₂, and that their weights should change dynamically while the application runs?
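Whatever thresholds and weights are eventually used, the raw counts behind these heuristics have to be computed over some time window. A minimal sketch of a sliding-window counter of distinct values per key, with the window length as a parameter rather than a hard-coded answer; the class `DistinctInWindow` is hypothetical, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Hashable, Tuple


class DistinctInWindow:
    """Distinct values per key over the last `window` seconds.

    With key = IP and value = card id it counts "one IP - many cards";
    swapping them gives "one card - many IPs". Timestamps are assumed
    to arrive in non-decreasing order.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self._events: DefaultDict[Hashable, Deque[Tuple[float, Hashable]]] = defaultdict(deque)

    def add(self, key: Hashable, value: Hashable, ts: float) -> int:
        """Record an event and return the distinct count within the window."""
        events = self._events[key]
        events.append((ts, value))
        # Evict events that are at least `window` seconds old.
        while events and events[0][0] <= ts - self.window:
            events.popleft()
        return len({v for _, v in events})
```

Instead of choosing one window, several counters (for instance 5 seconds, 1 hour and 2 weeks) can run side by side, each producing a separate feature whose importance is left to the model.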

Often the main approach is to naively assign fixed thresholds to some of the filters and then handle these conditions in constructs like this one (this is pseudocode, not 1C):

if (количество_карт_с_одного_ip > 4) {  // number of cards used from one IP
	статус_платежа = отклонен;  // payment status = rejected
	return;
}
else {
	if (количество_покупок_с_карты_за_1_час > 5) {  // purchases from one card within 1 hour
		статус_платежа = отклонен;
		return;
	}
	else {
		// continuation magic…
	}
}
// process the payment…

I don’t even want to start listing the shortcomings of this approach or the eventual cost of such code, which consists of losses from false positives (rejected “honest” payments) and from fraud that slips through as soon as fraudsters slightly change their strategy.

Therefore, the only right decision is to build a system in which the heuristic filters learn on their own, both from the accumulated payment history and from new payments. Several machine learning algorithms are available for this: logistic regression, support vector machines, neural networks.
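Logistic regression, the first of these options, shows the idea most directly: each heuristic becomes an input feature and its weight is learned instead of being hard-coded as a threshold. A minimal pure-Python sketch trained by stochastic gradient descent, so that the same `update` step serves both for the payment history and for new labelled payments. The class name, learning rate, L2 penalty and the assumption that features are scaled to roughly [0, 1] are all illustrative choices.

```python
import math
from typing import List, Sequence


class OnlineLogisticRegression:
    """Logistic regression trained by stochastic gradient descent (SGD).

    Each heuristic (cards per IP, purchases per card, night-time flag, ...)
    is one input feature whose weight is learned from labelled payments.
    """

    def __init__(self, n_features: int, lr: float = 0.1, l2: float = 1e-4) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # learning rate
        self.l2 = l2  # L2 penalty that limits weight growth

    def predict_proba(self, x: Sequence[float]) -> float:
        """Estimated probability that the payment with features x is fraud."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid: never calls exp() on a large positive number.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: Sequence[float], y: int) -> None:
        """One SGD step on a single labelled payment (y = 1 means fraud)."""
        err = self.predict_proba(x) - y  # gradient of the log-loss w.r.t. z
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * (err * xi + self.l2 * self.w[i])
        self.b -= self.lr * err

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int], epochs: int = 20) -> None:
        """Train on the accumulated payment history."""
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                self.update(xi, yi)
```

After `fit` on the history, each merchant correction can be passed to `update`, so the weights keep drifting as fraudsters change strategy instead of waiting for someone to retune hard-coded thresholds.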

Global filters

By global filters I mean lists such that, if a payer is on one of them, there is no point in running any of the other checks (payment data validation, fraud detection). These include blacklists of bank cards, IP addresses, countries, and merchants.

Global filters can be static or dynamic, and can stem either from business rules (the merchant does not accept payments from the Arctic) or from detected abnormal activity (for example, from a particular IP address).
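Both kinds of global filter fit in one structure if static entries never expire and dynamic ones carry a time-to-live. A minimal sketch, assuming a payment is represented as a flat mapping of attribute kind to value; the `GlobalFilter` class, its methods and the placeholder country code "XX" are hypothetical.

```python
import time
from typing import Dict, Mapping, Optional, Set


class GlobalFilter:
    """Blacklists checked before any validation or fraud scoring.

    Static entries (business rules, e.g. a blocked country) never expire;
    dynamic entries (e.g. an IP showing abnormal activity) expire after a TTL.
    """

    def __init__(self) -> None:
        self._static: Dict[str, Set[str]] = {}
        self._dynamic: Dict[str, Dict[str, float]] = {}  # kind -> value -> expiry

    def block(self, kind: str, value: str, ttl: Optional[float] = None,
              now: Optional[float] = None) -> None:
        """Add a static entry (ttl=None) or a dynamic one that expires."""
        if ttl is None:
            self._static.setdefault(kind, set()).add(value)
        else:
            now = time.time() if now is None else now
            self._dynamic.setdefault(kind, {})[value] = now + ttl

    def is_blocked(self, kind: str, value: str, now: Optional[float] = None) -> bool:
        if value in self._static.get(kind, set()):
            return True
        now = time.time() if now is None else now
        expiry = self._dynamic.get(kind, {}).get(value)
        return expiry is not None and now < expiry

    def check(self, payment: Mapping[str, str], now: Optional[float] = None) -> bool:
        """True if any attribute of the payment is listed: reject outright."""
        return any(self.is_blocked(kind, value, now) for kind, value in payment.items())
```

Running `check` first means a listed payer costs one dictionary lookup per attribute, with no validation or model evaluation at all.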

Conclusion of the 2nd part

In the first two parts, we examined the main aspects of a predominantly non-technical nature that must be taken into account when designing and developing a system for recognizing fraudulent payments.

We are going to build a fault-tolerant, highly scalable, reliable antifraud service that exposes a REST API (over HTTPS) to software clients "on the outside" and contains logic based on machine learning methods "on the inside". To add even more intrigue: the service will run on one of the public cloud platforms.

In the next part, we will finally ~~get down to business~~ look at the software architecture of the antifraud service, its modular structure, and key implementation details.
