codezombie March 26, 2015 at 01:57

Antifraud. Service Architecture (Part 3)

This is the third part of an experiment in building a system that recognizes fraudulent payments (an antifraud system). The goal is an antifraud service that is affordable to develop and own, and that lets the participants in online payments - merchants, aggregators, payment systems, banks - reduce the risk of fraudulent charges (fraud) made through their sites.

In the previous part we focused on the functional and non-functional requirements for the antifraud service. In this part we look at the software architecture of the service, its modular structure and the key details of its implementation.

Antifraud in azure

Infrastructure

The service consists of several applications running in Microsoft Azure. Hosting on a cloud platform instead of on-premise not only makes it possible to build, in a very short time, a service that meets all the requirements listed in the second part under “Non-functional requirements -> Quality attributes”, but also significantly reduces the initial spending on hardware and software.

Antifraud service consists of the following systems:

Antifraud API Service - a REST service that provides an API for interacting with the Fraud Predictor ML service.
Fraud Predictor ML - a fraud detection service based on machine learning algorithms.
Transactions Log - NoSQL storage of transaction information.

In addition, the service has numerous clients: merchant web applications or JS widgets that call the Antifraud API Service.

The schematic diagram of the interaction of these systems is illustrated above.

Architectural patterns used

Infrastructure, along with the subject area and legislation, can impose a large number of restrictions that must be taken into account at the architectural level. We already discussed domain and legal restrictions in the previous parts of the article; below we discuss the advantages and limitations that come with choosing the Microsoft Azure cloud platform.

The Azure services used by the antifraud system - Cloud Services with web / worker roles, Azure Table, Azure Queue, Azure ML, etc. - require almost no initial spending on infrastructure and also give the following advantages out of the box:

high availability: SLA not lower than 99.95%;
storage reliability: highly redundant storage systems;
storage security: ISO 27001/27002 and other certifications, including PCI DSS 3.0;
fault tolerance: all worker nodes can (and should) be run in multiple instances;
scalability: automatic scaling of the number of worker nodes depending on the load, partitioning of NoSQL storage tables based on PartitionKey.

I also count the following as bonuses:

convenient application monitoring;
deep integration with Visual Studio.

But all these advantages only became available after the architecture of the antifraud service was tailored to the cloud, as follows:

web / worker nodes are stateless;
horizontal partitioning for storing structured or semi-structured data (Sharding Pattern [1]);
network interactions occur only asynchronously and only with the use of retry policies (Retry Pattern [1]);
for load balancing and guaranteed task processing, message queues are used (Queue-Based Load Leveling Pattern [1]).
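
The Retry Pattern from the list above can be illustrated with a minimal sketch. It is not the code of the service: the names TransientError and call_with_retry, the number of attempts and the delays are assumptions made for the example, and in a real Azure application the retry policies built into the storage client libraries would normally be used instead.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


async def call_with_retry(operation, max_attempts=4, base_delay=0.1,
                          sleep=asyncio.sleep):
    """Await operation(); on TransientError back off exponentially and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # attempts exhausted: let the caller decide what to do
            # Exponential backoff plus jitter, so that several stateless
            # instances do not retry against the same resource in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            await sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    attempts = {"n": 0}

    async def flaky_call():
        # Stand-in for a network call that fails twice and then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise TransientError()
        return "saved"

    print(asyncio.run(call_with_retry(flaky_call, base_delay=0.01)))
```

Only errors that are explicitly marked as transient are retried; anything else propagates immediately, which keeps a retry loop from hiding real bugs.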

In addition, the antifraud service is a near real-time system, so when implementing it:

we use data-parallel algorithms (MapReduce is the simplest and one of the most efficient);
we use the Push'n'Forget approach for operations such as saving a single record to the transaction log (one lost record per 10K saved ones will not noticeably affect the accuracy of the machine learning algorithm);
we avoid locking the transaction log (or any other shared resource), which is achieved by adding a timestamp field to the transaction information;
we “kill” long-running requests (or at least handle them in some way).
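
One possible way to “kill” a long request is to put a deadline on it. The sketch below uses the standard asyncio.wait_for call; the helper name with_deadline and the idea of returning a fallback value are assumptions for the example, since the article does not say what the service actually does with slow requests.

```python
import asyncio


async def with_deadline(coro, timeout_s, fallback):
    """Return the coroutine's result, or `fallback` if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the slow call at this point.
        return fallback


if __name__ == "__main__":
    async def slow_prediction():
        # Stand-in for a call to a prediction service that takes too long.
        await asyncio.sleep(1)
        return "not fraud"

    print(asyncio.run(with_deadline(slow_prediction(), 0.05, "unknown")))
```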

You must also keep in mind that all cloud services have limitations:

technical: the most common are the maximum number of requests per second and the maximum message size;
technological: the most serious are the protocols supported for interacting with PaaS services.

Interaction between service components

For a merchant, the service is a REST service, the Antifraud API Service, that is accessed over HTTPS. The Antifraud API Service runs in a cluster of several stateless web roles (in Azure, a web role is the application layer that acts as a web application).

The following sequence diagram describes the possible interactions of the merchant with all subsystems of the antifraud service.

Antifraud sequence diagram

Step 1. Sending a request with payment information.
Step 2. Transformation of the Model (in terms of MVC).
Step 3. Sending a request to the payment prediction service.
Step 4. Returning the result - whether the payment is expected to be successful.
Step 5. Saving data.
Step 6. Returning the result to the client.
Steps 7, 8. Recalculation and updating of the training sample, retraining of the model.
Steps 9-12 (optional). The client sends a request with information about the actual payment result (when the predicted result differs from the actual payment result transmitted in the request).

Let us consider each of the steps in more detail.

The request from the merchant arrives at the Controller (in MVC terms) (step 1). The resulting Model (in MVC terms) then goes through:

transformation from the controller model into a domain object;
a request to external geolocation services (Azure Marketplace) to determine the country by the payer's postal code and the country by the IP address of the host from which the request to charge the card came;
a check against the global filters;
validation of the payment data;
preliminary analysis of the received transaction - heuristics are calculated for time frames of 5 seconds, 1 minute and 24 hours;
masking of the buyer's personal data and payment data - the card holder's name, the account holder's name on the merchant's website, the payer's address, phone number and email are hashed;
removal of unnecessary data - for example, the card expiration date is no longer needed after step 4.
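
The last two items of the list can be sketched as follows. The field names, the function mask_transaction and the use of a keyed HMAC-SHA256 hash are assumptions for the example; the article only says that the personal fields are hashed. For brevity the sketch drops the unneeded field in the same pass, whereas in the service the card expiration date is only discarded after step 4.

```python
import hashlib
import hmac

# Hypothetical field names; the real model is not shown in the article.
SENSITIVE_FIELDS = ("card_holder", "account_name", "address", "phone", "email")
DROPPED_FIELDS = ("card_expiry",)


def mask_transaction(tx, secret_key):
    """Return a copy of tx with personal fields hashed and unneeded ones removed."""
    masked = {}
    for field, value in tx.items():
        if field in DROPPED_FIELDS:
            continue  # not needed once the prediction has been made
        if field in SENSITIVE_FIELDS and value is not None:
            # Normalize first so that "A@B.com " and "a@b.com" hash identically.
            normalized = str(value).strip().lower().encode("utf-8")
            masked[field] = hmac.new(secret_key, normalized,
                                     hashlib.sha256).hexdigest()
        else:
            masked[field] = value
    return masked


if __name__ == "__main__":
    tx = {"amount": "19.99", "email": "buyer@example.com", "card_expiry": "12/17"}
    print(mask_transaction(tx, b"demo-key"))
```

Because the same input always gives the same hash, the masked values can still be compared and counted when statistics are calculated, without the original data being stored.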

Heuristics, global filters and the criteria for validating payment data were discussed in detail in the previous part of the article.
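
As an illustration of a heuristic over the 5 second, 1 minute and 24 hour time frames, here is a minimal sketch that counts earlier payments made with one card inside each window. The function name window_counts and the choice of a simple count are assumptions; the actual heuristics are the ones described in the previous part.

```python
from bisect import bisect_left, bisect_right

# Time frames named in the article, in seconds.
WINDOWS_S = {"5s": 5, "1m": 60, "24h": 24 * 3600}


def window_counts(timestamps, now):
    """Count payments per window.

    timestamps: sorted payment times (in seconds) for a single card.
    now: time of the current request, in the same units.
    """
    counts = {}
    last = bisect_right(timestamps, now)  # ignore anything after `now`
    for name, width in WINDOWS_S.items():
        first = bisect_left(timestamps, now - width)
        counts[name] = last - first
    return counts


if __name__ == "__main__":
    print(window_counts([0, 100, 158, 159], now=160))
```

Keeping the timestamps sorted makes each window a binary search rather than a full scan, which matters for a near real-time service.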

In step 2, the domain object is transformed into a DTO, which is then handled as follows:

it is sent to the Fraud Predictor ML service (step 3);
after a response is received from Fraud Predictor ML (step 4), information about the transaction and its result is saved in the transaction log (step 5; more on the log a little later);
the answer about the predicted result of the payment (fraudulent or not) is returned to the client (step 6).

To improve the quality of the prediction algorithm, clients can use the API to refine the result of a transaction. If the actual result of the payment differs from the value returned by our antifraud service, the merchant can report this by sending a request that corrects the transaction result (step 9). Such requests:

have the format ;
are processed by the Merchant API Service and, after validation, are placed in Azure Queue (a fault-tolerant queue service).

Requests are taken from the queue by one of the robots, each of which is a stateless worker role (in Azure, a worker role is the application layer that acts as a background handler).
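
The queue-and-worker arrangement can be sketched with the standard library queue as a stand-in for Azure Queue. The function drain, the message shape and the decision to collect failed messages in a list are assumptions for the example; a real worker role would use the Azure Queue client, where a message that is not deleted becomes visible again after a timeout.

```python
import queue


def drain(q, handle, max_messages=100):
    """Process up to max_messages messages; return (handled, failed_messages)."""
    handled = 0
    failed = []
    for _ in range(max_messages):
        try:
            message = q.get_nowait()
        except queue.Empty:
            break  # nothing left to do for now
        try:
            handle(message)
            handled += 1
        except Exception:
            # The worker is stateless, so a bad message is set aside
            # instead of stopping the whole loop.
            failed.append(message)
    return handled, failed


if __name__ == "__main__":
    q = queue.Queue()
    q.put({"transaction_id": "tx-1", "actual_result": "fraud"})
    q.put({"transaction_id": "tx-2"})  # malformed: no actual_result

    def apply_correction(message):
        print(message["transaction_id"], "->", message["actual_result"])

    print(drain(q, apply_correction))
```

Because the queue absorbs bursts of correction requests, the API service can answer the merchant immediately while the workers catch up at their own pace.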

Transaction store

Both transaction information and additional information about transactions (mainly statistics) are stored in the transaction log - long-term storage based on Azure Table (a fault-tolerant key-value NoSQL storage service).

The transaction log consists of 2 tables:

the TransactionsInfo table with facts about the transaction: transaction id (Row Key), merchant id, hash of the card holder's name (if available), payment amount and currency, etc.;
the TransactionsStatistics table with calculated statistical metrics: how many times this card was used to pay (over several time frames), from how many IP addresses, what the time interval between payments was, how long ago the customer registered with the merchant, how many successful payments the customer has made, etc.
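
A row of TransactionsInfo could look roughly like the sketch below. The article states that the transaction id is the Row Key; using the merchant id as the PartitionKey, the property names and the class TransactionInfo are assumptions made only to show how an entity maps onto Azure Table's PartitionKey / RowKey model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionInfo:
    transaction_id: str
    merchant_id: str
    amount: str  # kept as a string to avoid float rounding of money
    currency: str
    card_holder_hash: Optional[str] = None  # present only if the name was sent

    def to_entity(self) -> dict:
        """Flatten into the key-value shape of a table entity."""
        entity = {
            "PartitionKey": self.merchant_id,  # assumption: shard by merchant
            "RowKey": self.transaction_id,     # from the article
            "MerchantId": self.merchant_id,
            "Amount": self.amount,
            "Currency": self.currency,
        }
        if self.card_holder_hash is not None:
            entity["CardHolderHash"] = self.card_holder_hash
        return entity


if __name__ == "__main__":
    print(TransactionInfo("tx-1", "m-42", "19.99", "USD").to_entity())
```

The choice of PartitionKey decides how the table is sharded, so in practice it should follow the most common query and the expected load per key.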

In steps 7 and 8 the model is retrained. The training sample is taken from the transaction log, because the log contains the latest information on payments and their results. Retraining can be triggered on a schedule, when a fixed number of new entries has appeared in the transaction log, or when the share of incorrect predictions exceeds a certain threshold.
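
The three retraining triggers can be combined in one check. This is a sketch only: the function should_retrain and all threshold values are hypothetical and would have to be tuned for a real service.

```python
def should_retrain(now, last_trained, new_rows, wrong, total, *,
                   max_age_s=24 * 3600, max_new_rows=10_000,
                   max_error_rate=0.05):
    """Return True if any of the three triggers fires.

    now, last_trained: times in seconds.
    new_rows: entries added to the transaction log since the last training.
    wrong, total: incorrect and total predictions since the last training.
    """
    if now - last_trained >= max_age_s:
        return True  # schedule
    if new_rows >= max_new_rows:
        return True  # enough new data has accumulated
    if total > 0 and wrong / total > max_error_rate:
        return True  # too many incorrect predictions
    return False


if __name__ == "__main__":
    print(should_retrain(now=3600, last_trained=0, new_rows=120,
                         wrong=9, total=100))
```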

We will cover training the model for detecting fraudulent payments in the next, final part.

Conclusion of the 3rd part

In this part we discussed the architecture of the antifraud service, identified its functional parts - Antifraud API Service, Fraud Predictor ML, Transactions Log - and defined their areas of responsibility and the ways they interact with each other.

With the right approach to architecture, deploying an anti-fraud service in the Microsoft Azure cloud will significantly reduce the initial financial costs of the infrastructure, as well as reduce the time spent on issues related to system scalability, reliable data storage and high availability of services.

In the next, concluding part we will continue building an antifraud service that is an order of magnitude cheaper to develop and own than its counterparts: we will develop the Fraud Predictor ML service, which is based on the Azure Machine Learning service and is the analytical core of the antifraud service.

Useful sources

[1] Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications, MSDN.
