Catch, fish: how a bank uses a "data lake". VTB's experience

    You go to the bank for a loan to develop your business, buy a car, or for some other purpose. To lend or not to lend: the bank's specialists decide each case individually, taking into account the client's credit history, income and other factors. It would seem that the credit system was set up long ago and works properly. Is there room for anything new here? We in VTB's retail business believe there is. Studies confirm that the data on client behavior that the bank has at its disposal is far from fully used, and applying IT in this area gives a very good effect!

    How we integrate IT into the business and what benefits our customers get - read under the cut.

    In 2016, we implemented the first stage of a large project for processing and analyzing customer information for the retail business of VTB Group. Thanks to this project, our clients began to receive personalized offers based on an analysis of their past behavior. At the first stage, we collected and used up to 60% of the data, and the results exceeded all expectations. Most customers willingly accepted individual offers and, most importantly, were satisfied. This means that the idea of a selective approach worked, and the system functions "perfectly".

    The second stage is next: the launch of a new DataResearchPlatform based on DataLake (“data lake”), which in the future should cover 99.9% of all client activity data available in the bank.

    Why DataLake?

    Like many modern Big Data solutions, our new DataResearchPlatform is built on a “data lake”. Why did we choose this particular technology? DataLake is good because it lets you store huge amounts of raw data in its original format. This data can then be used however you like: compared, combined, or organized according to various criteria. Unlike in a standard data warehouse, the data in DataLake is immediately available to analysts in full and with all of its original relationships. This opens up more opportunities to find unexpected uses for the data, but it requires the appropriate technology and tools.
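    To make the idea of raw storage more concrete, here is a minimal sketch of the "schema on read" approach that data lakes are commonly associated with. It is only an illustration: the event records, field names and helper functions below are hypothetical and are not taken from DataResearchPlatform.

```python
import json

# Hypothetical raw events, kept exactly as they arrived.
# Different sources produce records with different fields.
RAW_EVENTS = [
    '{"client_id": 1, "type": "card_payment", "amount": 1200.0, "mcc": "5411"}',
    '{"client_id": 1, "type": "atm_withdrawal", "amount": 5000.0}',
    '{"client_id": 2, "type": "card_payment", "amount": 300.0, "mcc": "5812"}',
    '{"client_id": 2, "type": "loan_payment", "amount": 15000.0, "days_late": 0}',
]


def read_events(raw_lines):
    """Parse raw lines lazily; no field is dropped or reshaped on the way in."""
    for line in raw_lines:
        yield json.loads(line)


def spend_by_client(events, event_type):
    """One possible view of the data: total amount per client for one event type."""
    totals = {}
    for event in events:
        if event.get("type") == event_type:
            client_id = event["client_id"]
            totals[client_id] = totals.get(client_id, 0.0) + event["amount"]
    return totals


# Two analysts can apply two different "schemas" to the same raw records.
card_spend = spend_by_client(read_events(RAW_EVENTS), "card_payment")
late_flags = {
    event["client_id"]: event["days_late"] > 0
    for event in read_events(RAW_EVENTS)
    if "days_late" in event
}
```

    The point of the sketch is that the structure is decided at the moment of reading, so a new question does not require reloading or reshaping the stored data.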

    Client information is processed using data mining. Thanks to this, bank specialists can test their hypotheses about client behavior and its impact on solvency, as well as develop new predictive models.
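    As an illustration of what testing such a hypothesis might look like, here is a small sketch of a two-proportion z-test that compares default rates in two client groups. The groups, the counts and the function name are invented for the example; the article does not say which statistical methods the bank actually uses.

```python
import math


def two_proportion_z(defaults_a, n_a, defaults_b, n_b):
    """Return (z, two-sided p-value) for the difference in default rates."""
    rate_a = defaults_a / n_a
    rate_b = defaults_b / n_b
    # Pooled rate under the null hypothesis that both groups default equally often.
    pooled = (defaults_a + defaults_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: clients with regular salary deposits (group A)
# versus clients without them (group B).
z, p_value = two_proportion_z(defaults_a=30, n_a=1000, defaults_b=60, n_b=1000)
significant = p_value < 0.01
```

    A small p-value here would suggest that the behavioral feature is related to solvency and is worth trying as an input to a predictive model.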

    There are other benefits that we expect from working with DataLake:

    • grow our own DataArchitect and DataScientist specialists inside the company;
    • gain solid experience in data mining;
    • completely review and improve our customer relationship management (CRM) systems;
    • learn to predict the risk for each particular client more accurately.

    Once the system is in place, the bank can take the most modern fishing rods and go fishing on its "lake". And there is no doubt that the catch will be excellent every time, and that the bank will want to share it with customers. Thanks to an in-depth analysis of client behavior, the bank can offer borrowers special offers, better credit conditions and individual (more favorable) interest rates on loans.

    How does DataResearchPlatform work?

    Before the decision to switch to DataLake was made, VTB already had a data warehouse, so the first thing we did was integrate the new platform with it.

    In addition, at the first stage we debugged the technological environment for modeling: we worked out mechanisms for updating all installed software and expanded the Hadoop cluster. It was also important to develop new approaches to how users work, since the new platform imposes certain requirements on access control for data.

    As a result, the current version of DataResearchPlatform is deployed on 12 BDA nodes with up to 288 TB of storage (we plan to expand it to 18 nodes by the end of the year). The platform combines the Hadoop ecosystem, open source technologies and industrial enterprise solutions, and runs on the Oracle Big Data Appliance software and hardware solution. The analytical tools used to work with the data are SAS HPDM, SAS EG, Python and R.
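    For a rough sense of scale, the figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes that capacity grows linearly with the number of nodes and that 288 TB is raw capacity stored with the HDFS default replication factor of 3. Neither assumption is confirmed by the article, so the results are illustrative only.

```python
NODES_NOW = 12          # from the article
CAPACITY_NOW_TB = 288   # from the article
NODES_PLANNED = 18      # from the article
HDFS_REPLICATION = 3    # HDFS default; an assumption about this cluster

# Assumption: every node contributes the same share of capacity.
per_node_tb = CAPACITY_NOW_TB / NODES_NOW
planned_capacity_tb = per_node_tb * NODES_PLANNED


def usable_tb(raw_tb, replication=HDFS_REPLICATION):
    """Capacity left for unique data if every block is stored `replication` times."""
    return raw_tb / replication


print(f"Per node: {per_node_tb:.0f} TB")
print(f"After expansion: {planned_capacity_tb:.0f} TB")
print(f"Unique data at replication {HDFS_REPLICATION}: {usable_tb(planned_capacity_tb):.0f} TB")
```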

    Users with the DataArchitect and DataScientist profiles received fully secured access to the data, and the data volumes were expanded. DataResearchPlatform now already holds almost all the information on client activity that is available to the bank. It can be "caught" from the "lake" at any time and used for the benefit of the client.

    The project team: members of the VTB24 board - A. Sokolov and S. Rusanov.
