How we built cloud FaaS inside Kubernetes and won the Tinkoff Hackathon


    Last year, our company began organizing hackathons. The first such competition was very successful; we wrote about it in a previous article. The second hackathon was held in February 2019 and was no less successful; the organizer recently wrote about its goals.

    The participants were given a rather interesting task with complete freedom in choosing the technology stack for its implementation. They had to build a decision-making platform for conveniently deploying customer scoring functions: one that could process a fast stream of applications, withstand heavy loads, and scale easily.

    The task is non-trivial and can be solved in many ways, as the final project presentations showed. Six teams of five people took part in the hackathon, and all of them produced good projects, but our platform turned out to be the most competitive. The result was a very interesting project, which I would like to describe in this article.

    Our solution is a platform based on a serverless architecture inside Kubernetes that reduces the time needed to bring new features to production. It allows analysts to write code in an environment convenient for them and deploy it to production without involving engineers and developers.

    What is scoring?


    Tinkoff.ru, like many modern companies, uses customer scoring. Scoring is a system for assessing customers based on statistical methods of data analysis.

    For example, a client asks us for a loan, or wants to open a sole proprietor account with us. If we plan to issue a loan, we need to assess the client's solvency; if it is a sole proprietor account, we need to be sure the client will not conduct fraudulent transactions.

    These decisions are based on mathematical models that analyze both the data from the application itself and the data in our storage. In addition to scoring, similar statistical methods can be used in a service that generates individual recommendations for new products for our customers.

    Such an assessment method can take a variety of input data, and at some point we may add a new input parameter that, according to analysis of historical data, will increase the conversion rate of the service.

    We store a lot of data on customer relationships, and the volume of this information is constantly growing. For scoring to work, beyond the data itself we also need rules (or mathematical models) that allow us to decide quickly whose application to approve, whose to refuse, and who should be offered a couple more products to gauge their potential interest.

    For this task we already use a specialized decision-making system, IBM WebSphere ILOG JRules BRMS, which decides whether to approve or refuse a particular banking product for a client based on rules set up by analysts, technologists, and developers.

    There are many ready-made solutions on the market, both scoring models and decision-making systems, and we use one of them in our company. But the business keeps growing and diversifying, both the number of customers and the number of products on offer are increasing, and with them come ideas for improving the existing decision-making process. People working with the existing system surely have many ideas on how to make it simpler, better, and more convenient, but ideas from the outside are useful too. The new hackathon was organized to collect such sound ideas.

    Task


    The hackathon was held on February 23. The participants were given a combat mission: to develop a decision-making system that had to meet a number of conditions.

    We were told how the existing system functions, what difficulties arise during its operation, and what business objectives the platform under development should pursue. The system should have a fast time-to-market for newly developed rules, so that analysts' working code gets into production as quickly as possible. For the incoming flow of applications, decision-making time should tend to a minimum. The system should also support cross-selling, so that the client can be offered other company products if we approve them and the client shows potential interest.

    Clearly, a release-ready project that would actually go into production cannot be written in one night, and covering the whole system is quite difficult, so we were asked to implement at least part of it. A number of requirements were established that the prototype had to satisfy. One could try either to cover all the requirements broadly or to work out individual parts of the platform in detail.

    As for technology, all participants were given complete freedom of choice. Any concepts and technologies could be used: data streaming, machine learning, event sourcing, big data, and others.

    Our solution


    After a short brainstorming session, we decided that a FaaS solution would be ideal for the task.

    For this, we needed to find a suitable serverless framework for implementing the rules of the decision-making system under development. Since Tinkoff actively uses Kubernetes for infrastructure management, we examined several ready-made solutions based on it; I will say more about them later.

    To find the most effective solution, we looked at the product through the eyes of its users. The main users of our system are the analysts involved in developing the rules. The rules must be deployed to a server, or, as in our case, to the cloud, for subsequent decision making. From the analyst's perspective, the workflow looks like this:

    1. The analyst writes a script, rule, or ML model based on data from the storage. As part of the hackathon we decided to use MongoDB, but the choice of storage system is not important here (a sketch of such a script is shown after this list).
    2. After testing the developed rules on historical data, the analyst uploads his code to the admin panel.
    3. To ensure versioning, all code goes to Git repositories.
    4. Through the admin panel, the code can be deployed to the cloud as a separate, functional serverless module.
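
    What such a rule script might look like is sketched below. This is a minimal illustration only: it assumes pymongo, and the collection and field names are invented.

        # A minimal sketch of a rule script an analyst might write; assumes pymongo,
        # with invented collection and field names.
        from pymongo import MongoClient

        def score(application: dict) -> bool:
            db = MongoClient("mongodb://localhost:27017")["scoring"]
            # Look up the client's history in the shared storage
            history = db.client_history.find_one({"phone": application["phone"]}) or {}
            # Toy decision logic: approve if the client has no overdue payments
            return history.get("overdue_payments", 0) == 0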

    Source data from clients should go through a specialized Enrichment service, designed to enrich the initial request with data from the storage. It was important to implement this service so that it worked with the same single storage the analyst draws on when developing rules, in order to maintain a unified data structure.

    Even before the hackathon, we settled on the serverless framework we would use. There are quite a few technologies on the market that implement this approach. The most popular solutions for a Kubernetes architecture are Fission, OpenFaaS, and Kubeless. There is even a good article describing and comparing them.

    After weighing all the pros and cons, we opted for Fission. This Serverless framework is quite easy to manage and meets the requirements of the task.

    To work with Fission, you need to understand two basic concepts: function and environment. A function is a piece of code written in one of the languages for which a Fission environment exists. The list of environments implemented in this framework includes Python, JS, Go, JVM, and many other popular languages and technologies.

    Fission can also execute functions split across several files and pre-packaged into an archive. Fission runs in a Kubernetes cluster on specialized pods managed by the framework itself. To interact with the cluster's pods, each function must be assigned a route, to which you can pass GET parameters, or a request body in the case of a POST request.
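
    To make this concrete, here is roughly what a minimal function for Fission's Python environment might look like. The environment is built on Flask and calls the module's main() for each request routed to the function; the decision logic here is invented:

        import json
        from flask import request  # provided by Fission's Flask-based Python environment

        def main():
            application = request.get_json()  # body of the POST sent to the function's route
            # Invented rule: approve if declared income covers twice the monthly payment
            approved = application.get("income", 0) >= 2 * application.get("monthly_payment", 0)
            return json.dumps({"approved": approved})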

    As a result, we planned to get a solution that allows analysts to deploy their rule scripts without involving engineers and developers. This approach also spares developers from rewriting analysts' code in another language. For example, in our current decision-making system we have to write rules in niche technologies and languages with an extremely limited scope, and there is a strong dependence on the application server, since all of the bank's draft rules are deployed in a single environment. As a result, deploying new rules requires a release of the entire system.

    In the solution we proposed, there is no release cycle for rules: the code is deployed at the click of a button. Infrastructure management in Kubernetes also lets you forget about load and scaling; such problems are solved out of the box. And using a single data store eliminates the need to reconcile real-time data with historical data, which simplifies the analyst's work.

    What did we get


    Since we arrived at the hackathon with a ready-made solution (in our imagination), all we had to do was turn our ideas into lines of code.

    The key to success at any hackathon is preparation and a good plan. So, first of all, we decided which modules our system architecture would consist of and which technologies we would use.

    The architecture of our project was as follows:


    This diagram shows two entry points: the analyst (the main user of our system) and the client.

    The work process is structured like this. The analyst develops a rule function and a data enrichment function for his model, saves the code in a Git repository, and deploys his model to the cloud through the administrator's application. Let's consider how the deployed function is called and how decisions are made on incoming client requests:

    1. A client fills out a form on the site and sends a request to the controller. An application on which a decision must be made arrives at the system's input and is recorded in the database in its original form.
    2. Next, the raw request is sent for enrichment, if necessary. The initial request can be supplemented with data both from external services and from the storage. The resulting enriched request is also stored in the database.
    3. The analyst's function is launched: it receives the enriched request as input and produces a decision, which is also written to storage (a sketch of this pipeline follows the list).
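
    The pipeline can be sketched roughly as follows; pymongo, the Fission router address, the routes, and the collection names are all invented for illustration:

        import requests
        from pymongo import MongoClient

        db = MongoClient("mongodb://localhost:27017")["decisions"]
        ROUTER = "http://router.fission"  # hypothetical in-cluster Fission router address

        def process_application(raw: dict) -> dict:
            db.raw_requests.insert_one(dict(raw))  # 1. record the original application
            enriched = requests.post(f"{ROUTER}/enrich", json=raw).json()  # 2. enrich it
            db.enriched_requests.insert_one(dict(enriched))
            decision = requests.post(f"{ROUTER}/rule", json=enriched).json()  # 3. decide
            db.decisions.insert_one(dict(decision))
            return decision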

    As storage, we decided to use MongoDB, given its document-oriented storage of data as JSON documents, since the enrichment services, like the initial request, aggregated all data through REST controllers.
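
    The documents stored at each stage might look roughly like this; all field names are invented, based on the form fields and enrichments described later in the article:

        raw_request = {
            "request_id": "42", "product": "credit",
            "name": "Ivan Ivanov", "birth_date": "1990-05-12", "phone": "+79001234567",
        }
        enriched_request = {
            **raw_request,
            "zodiac_sign": "Taurus",   # computed by an enrichment function
            "mobile_operator": "MTS",  # fetched from an external REST service
        }
        decision = {"request_id": "42", "product": "credit", "approved": True}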

    So, we had one day to implement the platform. We distributed the roles quite successfully; each team member had his own area of responsibility in our project:

    1. The frontend of the admin panel for the analyst's work, through which he could load rules from the version control system holding the written scripts, choose options for enriching the input data, and edit rule scripts online.
    2. The backend of the admin panel, including a REST API for the frontend and the VCS integration.
    3. Setting up infrastructure in Google Cloud and developing a source data enrichment service.
    4. The module for integrating the admin application with the Serverless framework for the subsequent deployment of the rules.
    5. Rule scripts for testing the health of the entire system, and analytics aggregation over incoming applications (decisions made) for the final demonstration.

    Let's go through these in order.

    Our frontend was written in Angular 7 using the bank's UI Kit. The final version of the admin panel looked like this:


    Since time was short, we implemented only the key functionality. To deploy a function to the Kubernetes cluster, you had to select an event (a service for which the rule should be deployed in the cloud) and write the code of the function implementing the decision logic. For each deployment of a rule for the selected service, we wrote a log entry of that event; the admin panel showed the logs of all events.

    All function code was stored in a remote Git repository, which also had to be configured in the admin panel. To version the code, all functions were stored in different branches of the repository. The admin panel also made it possible to adjust the written scripts, so that before deploying a function to production you could not only review the code but also make the necessary changes.


    In addition to the rule functions, we also implemented stage-by-stage enrichment of the source data via Enrichment functions, whose code likewise consisted of scripts that could go to the data warehouse, call third-party services, and perform preliminary calculations. To demonstrate our solution, we calculated the zodiac sign of the client who submitted an application and determined his mobile operator using a third-party REST service.
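
    An enrichment function in the same Fission Python style might look like this; the zodiac boundaries are the standard calendar ones, while the request field names are invented:

        import json
        from flask import request

        SIGNS = [  # (month, last day of the sign in that month, sign), January first
            (1, 19, "Capricorn"), (2, 18, "Aquarius"), (3, 20, "Pisces"),
            (4, 19, "Aries"), (5, 20, "Taurus"), (6, 20, "Gemini"),
            (7, 22, "Cancer"), (8, 22, "Leo"), (9, 22, "Virgo"),
            (10, 22, "Libra"), (11, 21, "Scorpio"), (12, 21, "Sagittarius"),
        ]

        def zodiac(month: int, day: int) -> str:
            last_day, sign = SIGNS[month - 1][1], SIGNS[month - 1][2]
            # Past the boundary day, the date belongs to the next sign in the list
            return sign if day <= last_day else SIGNS[month % 12][2]

        def main():
            application = request.get_json()
            month, day = (int(x) for x in application["birth_date"].split("-")[1:3])
            application["zodiac_sign"] = zodiac(month, day)
            return json.dumps(application)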

    The platform backend was written in Java as a Spring Boot application. To store the admin panel's data we originally planned to use Postgres, but during the hackathon we decided to limit ourselves to a simple H2 to save time. On the backend we implemented integration with Bitbucket to version the request enrichment functions and the rule scripts. For integration with remote Git repositories we used the JGit library, a kind of wrapper over the CLI commands that lets you execute any Git instructions through a convenient programming interface. So we had two separate repositories, one for enrichment functions and one for rules, with all scripts arranged in directories. Through the UI it was possible to select the latest committed script from an arbitrary branch of a repository. When changes were made to the code through the admin panel, commits of the modified code were created in the remote repositories.
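
    Our backend did this with Java's JGit; purely for illustration, here is the same check-out, commit, and push flow sketched with Python's GitPython, with the repository URL and branch layout invented:

        from git import Repo

        def update_rule(repo_url: str, branch: str, edited_script: str) -> str:
            # Check out the rule's branch, commit the edited script, push it back
            repo = Repo.clone_from(repo_url, "/tmp/rules")
            repo.git.checkout(branch)                   # each rule lives in its own branch
            with open("/tmp/rules/rule.py", "w") as f:  # overwrite with the analyst's edit
                f.write(edited_script)
            repo.index.add(["rule.py"])
            repo.index.commit(f"Edit {branch} via admin panel")
            repo.remote("origin").push()
            return repo.head.commit.hexsha              # new version to show in the UI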

    To implement our idea, we needed suitable infrastructure. We decided to deploy our Kubernetes cluster in the cloud, and our choice was Google Cloud Platform. The Fission serverless framework was installed on the Kubernetes cluster we deployed in GCloud. Initially, the source data enrichment service was implemented as a separate Java application wrapped in a pod inside the k8s cluster. But after a preliminary demonstration of our project in the middle of the hackathon, we were advised to make the Enrichment service more flexible, allowing a choice of how to enrich the raw data of incoming applications. So we had no choice but to make the enrichment service serverless as well.

    To work with Fission, we used the Fission CLI, which is installed on top of the Kubernetes CLI. Deploying functions to the k8s cluster is quite simple: you only need to assign the function an internal route, plus an ingress to allow incoming traffic if access from outside the cluster is required. Deploying a function usually takes no more than 10 seconds.
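
    For reference, deploying a function with the Fission CLI looks roughly like this (the function and route names are illustrative):

        # Register a Python environment once per cluster
        fission env create --name python --image fission/python-env

        # Deploy a rule script as a function and expose it on a route
        fission function create --name mortgage-rule --env python --code rule.py
        fission route create --method POST --url /mortgage --function mortgage-rule

        # Quick smoke test
        fission function test --name mortgage-rule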

    Final demonstration of the project and summing up


    To demonstrate our system in action, we placed a simple form on a remote server through which one could apply for one of the bank's products. To apply, you had to enter your initials, date of birth, and phone number.

    The data from the client's form went to the controller, which simultaneously sent the application to all available rules, enriching the data beforehand according to the given conditions and storing everything in the common storage. In total, we deployed three decision-making functions for incoming applications and four data enrichment services. After submitting an application, the client received our decision:


    In addition to a rejection or an approval, the client also received a list of other products, for which we had sent requests in parallel. This is how we demonstrated the possibility of cross-selling on our platform.
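
    The parallel fan-out across the deployed product rules might be sketched like this; the router address and the routes are invented:

        from concurrent.futures import ThreadPoolExecutor
        import requests

        PRODUCT_ROUTES = ["/credit", "/toy", "/mortgage"]  # the three demo products

        def decide_all(enriched: dict) -> dict:
            # Call every product rule in parallel and collect the decisions
            def call(route: str):
                resp = requests.post(f"http://router.fission{route}", json=enriched)
                return route.strip("/"), resp.json().get("approved", False)
            with ThreadPoolExecutor(max_workers=len(PRODUCT_ROUTES)) as pool:
                return dict(pool.map(call, PRODUCT_ROUTES))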

    In total, three invented bank products were available:
    • A loan.
    • A toy.
    • A mortgage.

    During each demonstration, we deployed prepared functions and enrichment scripts for each service.

    Each rule needed its own set of input data. To approve the mortgage, we calculated the client's zodiac sign and tied it to the logic of the lunar calendar. To approve the toy, we checked that the client was of legal age. And to issue the loan, we sent a request to an external open service that determines the mobile operator, and based the decision on that.

    We tried to make our demonstration interesting and interactive: everyone present could open our form and check whether our imaginary services would be approved for them. At the very end of the presentation, we showed analytics on the applications received: how many people used our service, and the numbers of approvals and refusals.

    To collect analytics online, we additionally deployed the open-source BI tool Metabase and hooked it up to our storage. Metabase lets you build dashboards with analytics on the data you care about: you only need to register a connection to the database, select the tables (in our case, collections, since we used MongoDB), and specify the fields of interest.

    As a result, we got a good prototype of a decision-making platform, and during the demonstration every listener could personally test how it worked. An interesting solution, a ready prototype, and a successful demonstration allowed us to win despite strong competition from the other teams. I am sure that each team's project could be the subject of its own interesting article.
