# Kings of Mathematics: Big Data analytics in a bank. GAUSS project at VTB

Which of a bank's offers for opening current accounts and deposits can be considered successful, and which need improvement? What can be improved in the procedure for foreign exchange transactions and in remote banking services? At VTB's Transactional Business Department, we are constantly looking for answers to these questions. How does our IT development strategy help us, and how do clients benefit from it? Read on below.

How do you quickly calculate the sum of the numbers from 1 to 100? According to legend, the great German mathematician Carl Friedrich Gauss solved this problem while still a schoolboy. He noticed that pairs taken from opposite ends always give the same sum: 1 + 100 = 101, 2 + 99 = 101, and so on. There are 50 such pairs, so he instantly got the result 50 × 101 = 5050, demonstrating remarkable analytical abilities.
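The pairing argument generalizes to any n: the numbers 1 to n sum to n(n + 1)/2. Here is a minimal Python sketch comparing the shortcut with term-by-term addition; the function names are illustrative.

```python
def gauss_sum(n: int) -> int:
    """Sum of 1..n via the pairing argument: n * (n + 1) / 2."""
    # n * (n + 1) is always even, so integer division is exact.
    return n * (n + 1) // 2


def naive_sum(n: int) -> int:
    """Sum of 1..n by adding the numbers one at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


print(gauss_sum(100))  # 5050
print(gauss_sum(100) == naive_sum(100))  # True
```

The shortcut takes the same handful of operations for any n, while the loop grows with n.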

The repetitive data processing tasks that a modern bank faces every day are far more complicated than the ones the future “king of mathematics” dealt with at the end of the 18th century. The approach to solving them, however, has not changed. As before, to get the result faster and more accurately, you need to automate the process.

Making financial forecasts, creating analytical reports, and analyzing trends and risks without Big Data solutions is like finding the sum of the numbers from 1 to 100 by adding them up one by one. The GAUSS pilot project (Global Transaction Business Analytic Union Source & System), launched at VTB's Transactional Business Department earlier this year, helps collect information from the bank's various databases and automate work with it.

## What is GAUSS of the 21st century?

A modern bank holds a huge amount of data on all its operations, and the volume is constantly growing. This information is of great value, but to avoid drowning in it, you need to learn how to use it correctly.

The GAUSS project began by combining all the information available in the bank for 2014-2016 and providing convenient access to it. Employees working with the system can retrieve the data they need at any time, using any combination of parameters and options. As a result, preparing a report now takes a couple of hours rather than several days, and employees work more efficiently. The reports feed decisions on improving the quality of customer service, creating more attractive offers, and so on.

We plan to develop the project further by expanding the database with statistical information from all available sources. GAUSS should become the basis for a unified corporate “Data Lake”, into which you can “dive” at any time for whatever information matters at the moment.

However, the scope of the GAUSS project is much wider than reporting alone. We hope that very soon it will be possible to use it to:

· Assess various risks (credit, client, partner);
· Identify fraudulent schemes;
· Model targeted commercial offers;
· Work with the Microsoft Business Intelligence analytical system, etc.

## How does GAUSS work?

While working on the project, we deliberately chose not to use commercial solutions. GAUSS is built on the Hadoop / Hive / Ambari / Oozie / Spark / ORC / YARN stack. To build data marts, we use PostgreSQL, which we consider the world's leading open-source relational database management system. That said, PostgreSQL can be replaced with any other database without affecting the operation of the system.

Because of the huge volume of constantly incoming information and the emergence of new ways to analyze it, no Big Data project can be solved with standard templates; each one is a new and complex task. We therefore designed a multi-stage architecture: RAW information is loaded from all sources, then aggregated, processed and enriched, and finally turned into OLAP data cubes and data marts for presenting the information.

To make sure the data is presented correctly, we developed flexible mechanisms for mapping source data to target information, data quality control systems (Data Governance) for the generated information, and mechanisms for obtaining the detailed records behind aggregates (data drilldown). This lets us change direction painlessly during implementation and adapt to change.

The GAUSS system is being developed according to the Agile / Scrum methodology. This lets us take into account new requirements from business customers, feedback and newly arriving data, while keeping every team member focused on the result. After all, when you work with Big Data, new hypotheses keep arising about how to use the information hidden in the petabytes of the “data lake”.
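In miniature, the flow described above (load raw data, map source fields to a target schema, check quality, aggregate, drill down) might look roughly like the sketch below. This is a pure-Python toy under stated assumptions, not GAUSS code: the source names, field names, sample records and the single quality rule are all hypothetical, and a real deployment would run these stages on the Hadoop / Spark stack.

```python
from collections import defaultdict

# Hypothetical raw records from two source systems with different field names.
RAW_SOURCES = {
    "core_banking": [
        {"cust": "C1", "prod": "deposit", "amt": "1000.00", "dt": "2016-03-01"},
        {"cust": "C2", "prod": "fx", "amt": "250.50", "dt": "2016-03-01"},
        {"cust": "C1", "prod": "fx", "amt": "bad-value", "dt": "2016-03-02"},
    ],
    "remote_banking": [
        {"client_id": "C2", "product": "fx", "amount": "99.50", "date": "2016-03-02"},
    ],
}

# Illustrative mapping of source field names to the target schema.
FIELD_MAPS = {
    "core_banking": {"cust": "client", "prod": "product", "amt": "amount", "dt": "date"},
    "remote_banking": {"client_id": "client", "product": "product",
                       "amount": "amount", "date": "date"},
}


def map_record(source, record):
    """Rename source fields to target names and tag the record with its origin."""
    mapped = {target: record[src] for src, target in FIELD_MAPS[source].items()}
    mapped["source"] = source
    return mapped


def is_valid(record):
    """Minimal quality check (an assumption): the amount must parse as a number."""
    try:
        float(record["amount"])
    except ValueError:
        return False
    return True


def build_cube(records):
    """Aggregate amounts by (product, date); keep detail rows for drilldown."""
    totals = defaultdict(float)
    details = defaultdict(list)
    for rec in records:
        key = (rec["product"], rec["date"])
        totals[key] += float(rec["amount"])
        details[key].append(rec)
    return dict(totals), dict(details)


def run_pipeline(raw_sources):
    """Run all stages: map, validate, aggregate. Rejected rows are returned too."""
    mapped = [map_record(src, r) for src, rows in raw_sources.items() for r in rows]
    clean = [r for r in mapped if is_valid(r)]
    rejected = [r for r in mapped if not is_valid(r)]
    totals, details = build_cube(clean)
    return totals, details, rejected


totals, details, rejected = run_pipeline(RAW_SOURCES)
print(totals[("fx", "2016-03-02")])   # aggregate for one cube cell
print(details[("fx", "2016-03-02")])  # drilldown: the rows behind that cell
print(len(rejected))                  # rows that failed the quality check
```

Keeping the detail rows next to each aggregate is what makes drilldown cheap in this toy. At real data volumes the details would stay in the underlying store and be queried by the same key.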