As we predicted the outflow, approaching it as a natural disaster
Sometimes, in order to solve a problem, you just need to look at it from a different angle. Even if in the last 10 years such problems were solved in the same way with different effects, it is not a fact that this method is the only one.
There is such a topic as customer churn. The thing is inevitable, because customers of any company can take and stop using its products or services for many reasons. Of course, for the company, outflow is a natural, but not the most desirable action, so everyone is trying to minimize this outflow. And even better - to predict the likelihood of an outflow of a particular category of users, or a specific user, and offer some retention steps.
Analyze and try to keep the customer, if possible, at least for the following reasons:
There are standard approaches to forecasting outflows. But at one of the AI championships, we decided to take and try Weibull distribution for this. Most often it is used for survival analysis, weather forecasting, natural disaster analysis, industrial engineering and the like. Weibull distribution is a special distribution function parameterized by two parameters and . Wikipedia In general, the thing is interesting, but to predict the outflow, and indeed in fintech, it is used not so often. Under the cut, we’ll tell you how we (the Laboratory of Data Mining) did this by winning gold at the AI Championship in the “AI in Banks” nomination.
Let's look a bit at what client outflow is and why it is so important. For business, the customer base is important. New customers come to this database, for example, after learning about a product or service from advertising, they live for some time (actively use the products) and after some time they stop using it. Such a period is called the “Customer Lifecycle” (eng. Customer Lifecycle) - a term that describes the steps that the customer goes through when they find out about a product, make a purchase decision, pay, use and become a loyal consumer, and ultimately stop using for one reason or another products. Accordingly, the outflow is the final stage of the client’s life cycle when the client ceases to use the services, and for business this means that the client has ceased to be profitable and in general any benefit.
Each client of the bank is a specific person who selects a particular bank card specifically for his needs. Often travels - a map with miles is useful. He buys a lot - hello, card with cashback. He buys a lot in specific stores - and for this there is already a special affiliate plastic. Of course, sometimes a card is also selected according to the criterion of “Cheapest Service”. In general, there are enough variables here.
And another person chooses the bank itself - is it worth choosing a bank card, branches of which are only in Moscow and the region when you are from Khabarovsk? If the card of such a bank is at least 2 times more profitable, the presence of bank branches nearby is still an important criterion. Yes, 2019 is already here and digital is our everything, but a number of issues for some banks can be solved only at the branch. Plus, again, some part of the population trusts a physical bank much more than an application on a smartphone, this also needs to be taken into account.
As a result, a person may have many reasons for rejecting the products of the bank (or the bank itself). He changed his job, and the card rate changed from a salary to “For mere mortals,” which is less profitable. He moved to another city where there are no bank branches. I did not like talking with an unskilled operator in the department. That is, there may be even more reasons for closing an account than for using a product.
And the client can not only explicitly express his intention - come to the bank and write a statement, but simply stop using the products without breaking the contract. Here, to understand such problems, it was decided to use machine learning and AI.
Moreover, the outflow of customers can occur in any industry (telecom, Internet providers, insurance companies, in general, wherever there is a customer base and periodic transactions).
First of all, it was necessary to describe a clear boundary - since when did we begin to consider the client to be gone. From the point of view of the bank that provided us with the data for work, the state of activity of the client was binary - it is either active or not. There was an ACTIVE_FLAG flag in the “Activity” table, the value of which could be either “0” or “1” (respectively, “Inactive” and “Active”). And everything would be fine, but the person is such that he can actively use it for a while, and then fall out of the active for a month - he gets sick, goes to another country to rest, or even goes to test the card of another bank. Or maybe after a long period of inactivity, again start using the services of the bank.
Therefore, we decided to call the period of inactivity a certain continuous period of time during which the flag was set for it as “0”.
Clients go from inactive to active after periods of inactivity of various lengths. We have the opportunity to calculate the degree of empirical value “reliability of periods of inactivity” - that is, the likelihood that a person will again begin to use the products of the bank after temporary inactivity.
For example, this graph shows the resumption of activity (ACTIVE_FLAG = 1) of customers after several months of inactivity (ACTIVE_FLAG = 0).
Here we will clarify a little the data set that we started working with. So, the bank provided aggregated information for 19 months in the following tables:
For work, we needed all the tables except the "Map".
The difficulty here was something else - in this data the bank did not indicate what kind of activity took place on the cards. That is, we could understand whether there were transactions or not, but we could no longer determine their type. Therefore, it was unclear whether the client was withdrawing cash, whether he received a salary, or if he spent money on purchases. And we did not have data on account balances, which would be useful.
The sample itself was unbiased - for this section for 19 months the bank made no attempts to retain customers and minimize outflow.
So, about periods of inactivity.
To formulate the definition of outflow, you need to choose a period of inactivity. To create an outflow forecast at a time, you must have a customer history of at least 3 months in the interval . Our history was limited to 19 months, so we decided to take a period of inactivity of 6 months, if any. And for the minimum period for a qualitative forecast they took 3 months. Figures at 3 and 6 months we took empirically based on an analysis of the behavior of customer data.
The definition of outflow we formulated as follows: month outflow clientthis is the first month with ACTIVE_FLAG = 0, where at least six zeros in a row in the ACTIVE_FLAG field have been running since this month, in other words, the month since which the client has been inactive for 6 months.
Number of customers left
Number of customers left
In such competitions, and indeed in practice, outflows are often predicted in this way. The client uses products and services at different time intervals; data on interaction with him is presented in the form of a feature vector of fixed length n. Most often this information includes:
And after that they derive the definition of outflow, its own for each task. Then they use the machine learning algorithm, which predicts the likelihood of the client leaving based on the vector of factors . To learn the algorithm, one of the well-known frameworks for building ensembles of decision trees, XGBoost , LightGBM , CatBoost or their modifications is used.
The algorithm itself is not bad, but in terms of forecasting outflow, it has several serious disadvantages.
We decided immediately that we would not use standard approaches. In addition to us, 497 people registered in the championship, each of whom had a good experience. So, trying to do something in a standard way under such conditions is not a good idea.
And we began to solve the problems facing the binary classification model by predicting the probabilistic distribution of customer outflow times. A similar approach can be seen here ; it allows more flexible forecasting of outflow and testing more complex hypotheses than in the classical approach. As a family of distributions simulating the outflow time, we chose the Weibull distributionfor its widespread use in survival analysis. Customer behavior can be seen as a kind of survival.
Here are examples of Weibull probability density distributions depending on parameters and :
This is the probability density distribution of customer churn of three different customers over time. Time is presented in months. In other words, this graph shows when it is most likely that the client will be outflowing in the next two months. As you can see, a client with a distribution has a greater potential to leave earlier than clients with a Weibull (2, 0.5) and Weibull (3.1) distribution.
The result is a model that for each client for any
month predicts the Weibull distribution parameters, which best reflects the onset of the probability of outflow over time. If more details:
Here are the advantages of this method:
But it’s not enough to create a good model, you also need to properly evaluate its quality.
As a metric, we chose Lift Curve. It is used in business for such cases because of an understandable interpretation; it is well described here and here . If you describe the meaning of this metric in one sentence, you get “How many times the algorithm makes the best prediction in the first% than randomly. "
The competition conditions did not establish a specific quality metric by which various models and approaches can be compared. Moreover, the definition of the concept of outflow may be different and may depend on the statement of the problem, which, in turn, is determined by business goals. Therefore, in order to understand which method is better, we trained two models:
The test sample consisted of 500 pre-selected customers who were not in the training sample. For the model, hyper-parameters were selected using cross-validation by client. To train each model, the same sets of attributes were used.
Due to the fact that the model does not have memory, special signs were taken for it, showing the ratio of changes in the parameters of one month to the average value for the parameters over the past three months. What characterized the rate of change of values over the last period of three months. Without this, a model based on Random Forest would be in a previously losing position relative to Weibull-LSTM.
Here, everything is clearly literally a couple of pictures.
Comparison of the Lift Curve for the classical algorithm and Weibull-LSTM
Comparison of the Lift Curve metrics for the months for the classical algorithm and Weibull-LSTM
In general, LSTM does the classical algorithm in almost all cases.
A model based on a recurrent neural network with LSTM cells with a Weibull distribution can predict the outflow in advance, for example, predict the departure of a client over the next n months. Consider the case for n = 3. In this case, for each month, the neural network must correctly determine whether the client will leave from the next month until the nth month. In other words, she must correctly determine whether the client will remain after n months. This can be considered a prediction in advance: predicting the moment when the client just started thinking about how to leave.
Compare the Lift Curve for Weibull-LSTM 1, 2, and 3 months before the outflow:
We already wrote above that the forecasts that are made for customers who are not active for some time are also important. Therefore, here we will add to the sample such cases when the departed client has already been inactive for one or two months, and check that Weibull-LSTM correctly classifies such cases as outflow. Since such cases were present in the sample, we expect the network to cope well with them:
Actually, this is the main thing that can be done with information on hand that such and such customers are preparing to stop using the product. Speaking about building a model that could offer something useful to customers in order to keep them, this will not work if you do not have a history of such attempts that would end well.
We didn’t have such a story, so we decided it like this.
The usual A / B testing can serve as a check of the quality of such retention - we divide customers who potentially leave into two groups. We offer products based on our retention model to one, and we offer nothing to the second. We decided to train a model that could benefit already at point 1 of our example.
We wanted to make segmentation as interpretable as possible. To do this, we chose several signs that could be easily interpreted: the total number of transactions, salary, total turnover of the account, age, gender. The signs from the “Cards” table were not taken into account as uninformative, and the signs from table 3 “Contracts” - due to the complexity of processing in order to avoid data leakage between the validation set and the training set.
Clustering was performed using Gaussian mixture models. The Akaike information criterion made it possible to determine 2 optimums. The first optimum corresponds to 1 cluster. The second optimum, less pronounced, corresponds to 80 clusters. The following conclusion can be drawn from this result: it is extremely difficult to divide data into clusters without a priori given information. For better clustering, you need data that describes each client in detail.
Therefore, the task of training with a teacher was considered in order to offer each individual client a product. The following products were considered: “Term deposit”, “Credit card”, “Overdraft”, “Consumer loan”, “Car loan”, “Mortgage”.
Another type of product was present in the data: “Current Account”. But we did not consider it because of the low information content. By users who are customers of the bank, i.e. They didn’t stop using its products, a model was built that predicted which product might be of interest to them. Logistic regression was chosen as the model, and the Lift value for the first 10 percentiles was used as the quality assessment metric.
The quality of the model can be estimated in the figure.
Customer recommendation model results
This approach brought us the first place in the “AI in Banks” nomination at the RAIF-Challenge 2017 AI Championship.
Apparently, the main thing was to approach the problem from an unusual side and use the method that is commonly used for other situations.
Although the massive outflow of users may well be a natural disaster for services.
This method can also be noted for any other area where it is important to consider the outflow, not by banks as a whole. For example, we used it to calculate our own outflow - in the Siberian and St. Petersburg branches of Rostelecom.
"Laboratory of data mining" company "Search portal" Sputnik "
There is such a topic as customer churn. The thing is inevitable, because customers of any company can take and stop using its products or services for many reasons. Of course, for the company, outflow is a natural, but not the most desirable action, so everyone is trying to minimize this outflow. And even better - to predict the likelihood of an outflow of a particular category of users, or a specific user, and offer some retention steps.
Analyze and try to keep the customer, if possible, at least for the following reasons:
- attracting new customers is more expensive than retention procedures . To attract new customers, as a rule, you need to spend some money (advertising), while existing customers can be activated with a special offer with special conditions;
- Understanding why customers are leaving is the key to improving products and services .
There are standard approaches to forecasting outflows. But at one of the AI championships, we decided to take and try Weibull distribution for this. Most often it is used for survival analysis, weather forecasting, natural disaster analysis, industrial engineering and the like. Weibull distribution is a special distribution function parameterized by two parameters and . Wikipedia In general, the thing is interesting, but to predict the outflow, and indeed in fintech, it is used not so often. Under the cut, we’ll tell you how we (the Laboratory of Data Mining) did this by winning gold at the AI Championship in the “AI in Banks” nomination.
About outflow in general
Let's look a bit at what client outflow is and why it is so important. For business, the customer base is important. New customers come to this database, for example, after learning about a product or service from advertising, they live for some time (actively use the products) and after some time they stop using it. Such a period is called the “Customer Lifecycle” (eng. Customer Lifecycle) - a term that describes the steps that the customer goes through when they find out about a product, make a purchase decision, pay, use and become a loyal consumer, and ultimately stop using for one reason or another products. Accordingly, the outflow is the final stage of the client’s life cycle when the client ceases to use the services, and for business this means that the client has ceased to be profitable and in general any benefit.
Each client of the bank is a specific person who selects a particular bank card specifically for his needs. Often travels - a map with miles is useful. He buys a lot - hello, card with cashback. He buys a lot in specific stores - and for this there is already a special affiliate plastic. Of course, sometimes a card is also selected according to the criterion of “Cheapest Service”. In general, there are enough variables here.
And another person chooses the bank itself - is it worth choosing a bank card, branches of which are only in Moscow and the region when you are from Khabarovsk? If the card of such a bank is at least 2 times more profitable, the presence of bank branches nearby is still an important criterion. Yes, 2019 is already here and digital is our everything, but a number of issues for some banks can be solved only at the branch. Plus, again, some part of the population trusts a physical bank much more than an application on a smartphone, this also needs to be taken into account.
As a result, a person may have many reasons for rejecting the products of the bank (or the bank itself). He changed his job, and the card rate changed from a salary to “For mere mortals,” which is less profitable. He moved to another city where there are no bank branches. I did not like talking with an unskilled operator in the department. That is, there may be even more reasons for closing an account than for using a product.
And the client can not only explicitly express his intention - come to the bank and write a statement, but simply stop using the products without breaking the contract. Here, to understand such problems, it was decided to use machine learning and AI.
Moreover, the outflow of customers can occur in any industry (telecom, Internet providers, insurance companies, in general, wherever there is a customer base and periodic transactions).
What have we done
First of all, it was necessary to describe a clear boundary - since when did we begin to consider the client to be gone. From the point of view of the bank that provided us with the data for work, the state of activity of the client was binary - it is either active or not. There was an ACTIVE_FLAG flag in the “Activity” table, the value of which could be either “0” or “1” (respectively, “Inactive” and “Active”). And everything would be fine, but the person is such that he can actively use it for a while, and then fall out of the active for a month - he gets sick, goes to another country to rest, or even goes to test the card of another bank. Or maybe after a long period of inactivity, again start using the services of the bank.
Therefore, we decided to call the period of inactivity a certain continuous period of time during which the flag was set for it as “0”.
Clients go from inactive to active after periods of inactivity of various lengths. We have the opportunity to calculate the degree of empirical value “reliability of periods of inactivity” - that is, the likelihood that a person will again begin to use the products of the bank after temporary inactivity.
For example, this graph shows the resumption of activity (ACTIVE_FLAG = 1) of customers after several months of inactivity (ACTIVE_FLAG = 0).
Here we will clarify a little the data set that we started working with. So, the bank provided aggregated information for 19 months in the following tables:
- “Activity” - monthly customer transactions (by card, in the Internet banking and mobile banking), including payroll and turnover information.
- “Cards” - data on all cards that a client has, with a detailed tariff schedule.
- “Contracts” - information on client contracts (both open and closed): loans, deposits, etc., indicating the parameters of each.
- “Clients” - a set of demographic data (gender and age) and the availability of contact data.
For work, we needed all the tables except the "Map".
The difficulty here was something else - in this data the bank did not indicate what kind of activity took place on the cards. That is, we could understand whether there were transactions or not, but we could no longer determine their type. Therefore, it was unclear whether the client was withdrawing cash, whether he received a salary, or if he spent money on purchases. And we did not have data on account balances, which would be useful.
The sample itself was unbiased - for this section for 19 months the bank made no attempts to retain customers and minimize outflow.
So, about periods of inactivity.
To formulate the definition of outflow, you need to choose a period of inactivity. To create an outflow forecast at a time, you must have a customer history of at least 3 months in the interval . Our history was limited to 19 months, so we decided to take a period of inactivity of 6 months, if any. And for the minimum period for a qualitative forecast they took 3 months. Figures at 3 and 6 months we took empirically based on an analysis of the behavior of customer data.
The definition of outflow we formulated as follows: month outflow clientthis is the first month with ACTIVE_FLAG = 0, where at least six zeros in a row in the ACTIVE_FLAG field have been running since this month, in other words, the month since which the client has been inactive for 6 months.
Number of customers left
Number of customers left
As it is considered outflow
In such competitions, and indeed in practice, outflows are often predicted in this way. The client uses products and services at different time intervals; data on interaction with him is presented in the form of a feature vector of fixed length n. Most often this information includes:
- User-specific data (demographic data, marketing segment).
- The history of the use of banking products and services (these are customer actions that are always tied to a specific time or period of the interval we need).
- External data, if able to get it - for example, reviews from social networks.
And after that they derive the definition of outflow, its own for each task. Then they use the machine learning algorithm, which predicts the likelihood of the client leaving based on the vector of factors . To learn the algorithm, one of the well-known frameworks for building ensembles of decision trees, XGBoost , LightGBM , CatBoost or their modifications is used.
The algorithm itself is not bad, but in terms of forecasting outflow, it has several serious disadvantages.
- He does not have the so-called "memory . " The input of the model receives a given number of features that correspond to the current moment in time. In order to lay information about the history of changes in parameters, it is necessary to calculate special features that characterize changes in parameters over time, for example, the number or amount of bank transactions over the past 1.2.3 months. Such an approach can only partially reflect the nature of temporary changes.
- Fixed forecasting horizon. The model is able to predict the outflow of customers only for a predetermined period of time, for example, a forecast one month in advance. If you need a forecast for another period of time, for example, for three months, then you need to rebuild the training set and retrain the new model.
Our approach
We decided immediately that we would not use standard approaches. In addition to us, 497 people registered in the championship, each of whom had a good experience. So, trying to do something in a standard way under such conditions is not a good idea.
And we began to solve the problems facing the binary classification model by predicting the probabilistic distribution of customer outflow times. A similar approach can be seen here ; it allows more flexible forecasting of outflow and testing more complex hypotheses than in the classical approach. As a family of distributions simulating the outflow time, we chose the Weibull distributionfor its widespread use in survival analysis. Customer behavior can be seen as a kind of survival.
Here are examples of Weibull probability density distributions depending on parameters and :
This is the probability density distribution of customer churn of three different customers over time. Time is presented in months. In other words, this graph shows when it is most likely that the client will be outflowing in the next two months. As you can see, a client with a distribution has a greater potential to leave earlier than clients with a Weibull (2, 0.5) and Weibull (3.1) distribution.
The result is a model that for each client for any
month predicts the Weibull distribution parameters, which best reflects the onset of the probability of outflow over time. If more details:
- Target characteristics in the training sample - the time remaining before the outflow in a particular month for a particular client.
- If there is no outflow indicator for the client, we assume that the outflow time is longer than the number of months, starting from the current and the end of our history.
- The model used: a recurrent neural network with an LSTM layer.
- As a loss function, we use the negative logarithmic likelihood function for the Weibull distribution.
Here are the advantages of this method:
- The probabilistic distribution, in addition to the obvious possibility of binary classification, allows you to flexibly predict various events, for example, whether the client ceases to use the services of the bank within 3 months. Also, if necessary, various metrics can be averaged over this distribution.
- The LSTM recurrent neural network has memory and efficiently uses the entire history. With the expansion or refinement of history, accuracy grows.
- The approach can be scaled without problems when dividing time intervals into smaller ones (for example, when dividing months into weeks).
But it’s not enough to create a good model, you also need to properly evaluate its quality.
How to evaluate the quality
As a metric, we chose Lift Curve. It is used in business for such cases because of an understandable interpretation; it is well described here and here . If you describe the meaning of this metric in one sentence, you get “How many times the algorithm makes the best prediction in the first% than randomly. "
We train models
The competition conditions did not establish a specific quality metric by which various models and approaches can be compared. Moreover, the definition of the concept of outflow may be different and may depend on the statement of the problem, which, in turn, is determined by business goals. Therefore, in order to understand which method is better, we trained two models:
- A frequently used binary classification approach using the machine learning algorithm of a decision tree ensemble ( LightGBM );
- Model Weibull-LSTM
The test sample consisted of 500 pre-selected customers who were not in the training sample. For the model, hyper-parameters were selected using cross-validation by client. To train each model, the same sets of attributes were used.
Due to the fact that the model does not have memory, special signs were taken for it, showing the ratio of changes in the parameters of one month to the average value for the parameters over the past three months. What characterized the rate of change of values over the last period of three months. Without this, a model based on Random Forest would be in a previously losing position relative to Weibull-LSTM.
Why LSTM with Weibull distribution is better than the approach based on the ensemble of decision trees
Here, everything is clearly literally a couple of pictures.
Comparison of the Lift Curve for the classical algorithm and Weibull-LSTM
Comparison of the Lift Curve metrics for the months for the classical algorithm and Weibull-LSTM
In general, LSTM does the classical algorithm in almost all cases.
Outflow prediction
A model based on a recurrent neural network with LSTM cells with a Weibull distribution can predict the outflow in advance, for example, predict the departure of a client over the next n months. Consider the case for n = 3. In this case, for each month, the neural network must correctly determine whether the client will leave from the next month until the nth month. In other words, she must correctly determine whether the client will remain after n months. This can be considered a prediction in advance: predicting the moment when the client just started thinking about how to leave.
Compare the Lift Curve for Weibull-LSTM 1, 2, and 3 months before the outflow:
We already wrote above that the forecasts that are made for customers who are not active for some time are also important. Therefore, here we will add to the sample such cases when the departed client has already been inactive for one or two months, and check that Weibull-LSTM correctly classifies such cases as outflow. Since such cases were present in the sample, we expect the network to cope well with them:
Customer retention
Actually, this is the main thing that can be done with information on hand that such and such customers are preparing to stop using the product. Speaking about building a model that could offer something useful to customers in order to keep them, this will not work if you do not have a history of such attempts that would end well.
We didn’t have such a story, so we decided it like this.
- We are building a model that defines interesting products for each client.
- In each month, we run a classifier and identify potential outgoing customers.
- Some customers offer a product, according to the model of paragraph 1, remember their actions.
- After a few months, we look which of these potentially outgoing customers left and which remained. Thus, we form a training sample.
- We train the model on the story obtained in paragraph 4.
- Optionally, repeat the procedure, replacing the model from paragraph 1 with the model obtained in paragraph 5.
The usual A / B testing can serve as a check of the quality of such retention - we divide customers who potentially leave into two groups. We offer products based on our retention model to one, and we offer nothing to the second. We decided to train a model that could benefit already at point 1 of our example.
We wanted to make segmentation as interpretable as possible. To do this, we chose several signs that could be easily interpreted: the total number of transactions, salary, total turnover of the account, age, gender. The signs from the “Cards” table were not taken into account as uninformative, and the signs from table 3 “Contracts” - due to the complexity of processing in order to avoid data leakage between the validation set and the training set.
Clustering was performed using Gaussian mixture models. The Akaike information criterion made it possible to determine 2 optimums. The first optimum corresponds to 1 cluster. The second optimum, less pronounced, corresponds to 80 clusters. The following conclusion can be drawn from this result: it is extremely difficult to divide data into clusters without a priori given information. For better clustering, you need data that describes each client in detail.
Therefore, the task of training with a teacher was considered in order to offer each individual client a product. The following products were considered: “Term deposit”, “Credit card”, “Overdraft”, “Consumer loan”, “Car loan”, “Mortgage”.
Another type of product was present in the data: “Current Account”. But we did not consider it because of the low information content. By users who are customers of the bank, i.e. They didn’t stop using its products, a model was built that predicted which product might be of interest to them. Logistic regression was chosen as the model, and the Lift value for the first 10 percentiles was used as the quality assessment metric.
The quality of the model can be estimated in the figure.
Customer recommendation model results
Total
This approach brought us the first place in the “AI in Banks” nomination at the RAIF-Challenge 2017 AI Championship.
Apparently, the main thing was to approach the problem from an unusual side and use the method that is commonly used for other situations.
Although the massive outflow of users may well be a natural disaster for services.
This method can also be noted for any other area where it is important to consider the outflow, not by banks as a whole. For example, we used it to calculate our own outflow - in the Siberian and St. Petersburg branches of Rostelecom.
"Laboratory of data mining" company "Search portal" Sputnik "