AlexSerbul December 28, 2016 at 16:32

Enhanced regularization of neural networks in online stores - using ... napalm

Winking at grandfather Einstein, adjusting his backpack of napalm and smoothing out his stylish black T-shirt printed with the formula of the normal distribution, the lead analyst threw open the doors of the PR department, flashed a brilliant smile and said: “Guys, still collecting customer e-mails in Excel sheets and doing your creative work at random, left-handed, with your eyes closed?” Having received a joyful “yep :-)”, the fighter mentally thanked John Napier for his work for the benefit of enlightening humanity and reducing routine labor and ... cheerfully pulled the trigger.

Albert Einstein has always inspired analysts to implement advanced algorithms

After 5 minutes the fuel in the backpack had already run out and it had become quite warm, if not hot, but the colleagues (?) noticed nothing and went on counting the likes under their posts on social networks.

“Now that is concentration,” the analyst thought, picked up the minigun and stepped into the CRM customer service department. “Guys, still segmenting customers by the number of New Year gifts they send you, while the word ‘clustering’ makes your forehead ache?” Without waiting for an answer and whispering “I’ll teach you to calculate churn rate and CLV in your heads”, the analyst took a step forward and plunged into an atmosphere of magical thinking, superstition and steampunk.

Steam punk seems to be about applied science, but when you look closely, it’s magic, pure magic!

Ahead lay the content management department, and of the tools for popularizing new effective algorithms only two remained: a quantum annihilator and an old chainsaw with a fine BMW-style timbre. Choosing the trusted combat companion, the analyst knocked quietly on the door of the content department, entered with a guilty look and asked: “Still shopping on Amazon and AliExpress?” “Yeah!” “Still stubbornly not using personalization in the online store’s catalog, not believing in its effectiveness and not using A/B testing?” “Well, of course, something sells anyway!” “May the force be with us” was the last thing heard from the lips of the orderly of the forest before the sound of the engine began to drown him out ...

Of the means of implementation, there remained a secret box under the desk and ... the development department. The secret weapon, more effective even than the quantum annihilator, was meant for precisely that department. There was enough good beer in the box for everyone, with some to spare, and the cookies went down well while discussing with dear brothers in arms the subtleties of speeding up Keras in production. The day was a success!

“There’s a data scientist here for an interview,” heard the conqueror of multidimensional spaces, curled up on the sofa at lunchtime, and ... woke up. What a strange dream; where did that come from? Oh yes, we are introducing neural networks into the online store. Down to business!

“What a strange dream I had,” Alice thought, and ran home so as not to be late for tea.

What am I getting at? :) The problem this post addresses is simple: how to make an online store bring in more profit while spending little on the latest machine learning technologies. And how to convince the store’s owner and employees to use them! (and the hand reaches for the weapon again)

Software is close and accessible as never before

On the one hand, the basic, bearded, mustachioed and still lively machine learning methods that can make an online store more profitable are now, more than ever, accessible and ready for rapid prototyping and, in principle, for deployment.

If you love extreme pleasures, go ahead: run “pip install”, and by the time a cup of coffee has been poured, the analyst’s laptop has everything installed to bring up a neural network, train it on a small amount of data and get a result that is a few percent better.

If you are sure the model will be useful and you expect a lot of data, you can spend a little more time in a more verbose language and get a similar result with a stronger guarantee of fast operation in production.

In general, the technology is very accessible now and you don’t have to think too hard: take something ready-made with support for tensors and GPUs, sit an analyst who is willing to hack something together in Python down at a laptop, set up SSH proxying from the production servers, and roll the service out to customers.

Algorithms for an online store that are likely to increase its profit

Everything here is also rosy and accessible. Let's go through each.

Personal recommendations

If the data is relatively small, take the ready-made matrix “compressor” (factorization) on Spark and you’re done. If there is a lot of data, you can squeeze it into memory and run classic collaborative filtering.
Among the fresh and tasty options, if there is a budget for a GPU cluster, you can try cooking with Amazon DSSTNE. Here colleagues train a classic denoising autoencoder that “restores” the client’s personal recommendations. The model is wide and shallow, but the matrices are multiplied in pieces on different cores / machines, and in C++, leaving TensorSLOW far behind.

Amazon DSSTNE compresses a very long vector of the User’s Products into a much shorter distributed representation (a set of Product categories / subcategories), which is then expanded back into a “full” vector of Product recommendations for the User.
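To make “classic collaborative filtering” a little less abstract, here is a minimal item-to-item sketch in plain Python. It is not DSSTNE and not the Spark implementation: the purchase data, the function names and the choice of cosine similarity over binary purchase vectors are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: user -> set of product IDs.
purchases = {
    "alice": {"tent", "sleeping_bag", "stove"},
    "bob": {"tent", "sleeping_bag"},
    "carol": {"stove", "kettle"},
    "dave": {"tent", "stove", "kettle"},
}


def item_similarities(purchases):
    """Cosine similarity between items over binary user-item vectors."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            common = len(buyers[a] & buyers[b])
            if common:
                sims[a][b] = common / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(user, purchases, sims, top_n=3):
    """Score items the user has not bought by similarity to what they own."""
    owned = purchases[user]
    scores = defaultdict(float)
    for item in owned:
        for other, sim in sims[item].items():
            if other not in owned:
                scores[other] += sim
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


sims = item_similarities(purchases)
print(recommend("bob", purchases, sims))
```

The pairwise loop is quadratic in the number of items, so on a real catalog this is exactly the place where the ready-made distributed tools mentioned above earn their keep.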

For content-based recommendations, you can first use a ready-made search engine, and then look towards “semantic vectors” of words and phrases, starting with Word2Vec or GloVe.

Advanced search engine

Here you can often improve the classic search algorithm using semantic ranking. Just do not try to repeat Yandex’s DSSM experiment head-on: look at the input dimension of their neural network and estimate how much hardware, and of what kind, you will need for that flight into space.

Chatbot

To teach a chatbot to answer questions you need ... to work a lot with your head and hands. The easiest way to do this is with ... regular expressions and first-order predicates stored in a database such as MySQL. You can also build forms with parameters, in the style of booking plane tickets to the USA. And the popular question “how do I make a chatbot on a neural network” makes me want to laugh hysterically, tear pages out of a famous book by a famous professor, roll cigarettes from them and blow sideways figure eights (the infinity symbol) into the asker’s face.
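A rough sketch of the “regular expressions” approach might look like the following. The rules, the answer texts and the fallback are all made up for illustration, and here they live in a Python list rather than in MySQL.

```python
import re

# Hypothetical rules: (compiled pattern, answer template).
# Named groups in a pattern are substituted into the answer template.
RULES = [
    (re.compile(r"\b(?:where is|track)\b.*\border\s+#?(?P<order_id>\d+)", re.I),
     "Order {order_id} is on its way; see the tracking page for details."),
    (re.compile(r"\b(?:delivery|shipping)\b.*\b(?:cost|price)\b", re.I),
     "Delivery prices are listed on the delivery page."),
    (re.compile(r"\b(?:hi|hello)\b", re.I),
     "Hello! How can I help you?"),
]
FALLBACK = "Sorry, I did not understand. Passing you to a human operator."


def reply(message):
    """Return the answer of the first matching rule, or the fallback."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(**match.groupdict())
    return FALLBACK


print(reply("Where is my order #1234?"))
print(reply("What is the meaning of life?"))
```

Rule order matters: the first match wins, so more specific patterns should come before generic ones such as the greeting.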

Technical Support Automation

Here, too, there is a ready-made story with well-known implementations. There are two problems. First, you need a large corpus of dialogs in the store’s language (and for Russian the situation with corpora is still deplorable). Second, when a new kind of question arrives, irrelevant answers will end up in Recall@5 (yes, that very generalization), because so far neural networks mostly assist and only occasionally do the job better than humans (and to get there you have to work yourself to death training them).

Next Best Offer

Here you can track the internal cycles of the online store’s Buyer and offer him something to buy at key points. For example, maintain a loyalty cycle for each Buyer. There are many algorithms, from venerable Markov models to modern and effective recurrent cells.
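As an illustration of the “Markov model” end of that spectrum, here is a minimal first-order sketch: count which category tends to follow which, and offer the most frequent follower. The purchase histories and function names are hypothetical, and a real loyalty cycle would also need to take time into account.

```python
from collections import Counter, defaultdict

# Hypothetical purchase sequences, one list of categories per Buyer.
histories = [
    ["phone", "case", "charger"],
    ["phone", "case", "headphones"],
    ["phone", "charger"],
    ["laptop", "mouse"],
]


def fit_transitions(histories):
    """Count first-order transitions: previous purchase -> next purchase."""
    counts = defaultdict(Counter)
    for seq in histories:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def next_best_offer(last_purchase, counts):
    """Return (category, estimated probability) or None if nothing is known."""
    followers = counts.get(last_purchase)
    if not followers:
        return None
    total = sum(followers.values())
    # Most frequent follower; ties are broken by name for determinism.
    item, n = max(followers.items(), key=lambda kv: (kv[1], kv[0]))
    return item, n / total


counts = fit_transitions(histories)
print(next_best_offer("phone", counts))
print(next_best_offer("mouse", counts))
```

A recurrent network plays the same role but conditions on the whole history instead of only the last purchase.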

Customer clustering

Here, too, everything has long been ready and easy to deploy in minutes . You can identify target groups of customers and bombard them with personalized recommendations!
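For a feel of what sits under the hood of such ready-made tools, below is a bare-bones k-means sketch in plain Python. The customer features (orders per year, average order value), the data and the naive initialization with the first k points are assumptions for illustration; a real project would use a library implementation and scale the features first.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical customers: (orders per year, average order value).
customers = [(1, 20.0), (12, 150.0), (2, 25.0), (10, 140.0), (1, 30.0), (11, 160.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each resulting cluster can then get its own mailing or its own set of recommendations.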

So far, we have implemented some of these algorithms in our product: content-based recommendations and collaborative filtering. Well, so that there is something to start ... the battle with.

But how to prove that the algorithms work?

Here it is already becoming more interesting.

So smart Pythonista data scientists have appeared in your online store ...
So in a week you have brought up basic models of the popular algorithms above on their work laptops ...
So you have started offering clients personalized recommendations, sending e-mail campaigns by cluster and offering products at moments when they are “hard to refuse” ...

And ... that’s where you get stuck!

Proving that things have improved, and that the resources spent on the smart guys, their many smart obscure words and their even more obscure matplotlib pictures are paying off, turns out to be difficult, very difficult.

Scary and incomprehensible pictures on analysts' laptops

Sadly, there are orders of magnitude fewer explanatory articles and techniques on the topic “how to prove that personal recommendations work” than fascinating articles about GANs or about generating pornographic images with deep neural networks. And when you read what Netflix writes about how colleagues there measure and improve their recommendation system with crude empiricism, sincere faith and voodoo spells, it becomes great fun (do read those articles if you haven’t; they contain the real experience of pioneers).

In fact, it all boils down to “faith” and to intensive, constant A/B testing, in which some clients bypass the trained models and some go through the machine learning algorithms.
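To replace at least part of the “faith” with arithmetic, the two groups can be compared with a standard two-proportion z-test. The sketch below uses only the Python standard library; the visitor and order counts are invented, and the function name is an assumption.

```python
from math import erf, sqrt


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for conversion in control (A) and treatment (B).

    Returns (rate_a, rate_b, z, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (nobody or everybody converted): no evidence.
        return p_a, p_b, 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value


# Hypothetical numbers: A bypasses the model, B sees personalized content.
p_a, p_b, z, p_value = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"control {p_a:.2%}, treatment {p_b:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only says the difference is unlikely to be noise; whether a fraction of a percent of conversion pays for the analysts and the GPUs is a separate calculation.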

Constant feedback - required

We see that the intensity of content personalization is constantly increasing. Through our smartphones the industry giants are watching us closely and, apparently though not provably, they are confident that content personalization increases their profit and that the spending on smart data analysts and GPUs is justified.

The giants we all know live off advertising. The better targeted it is, the greater their profit. That is why, it seems to me, they, and indeed nearly all well-known IT companies, are investing more and more in data analysis and in machine learning algorithms to promote their products.

Constantly monitoring the effectiveness of content personalization for Clients with and without machine learning, along with plain quality control of the trained models on real data, is a separate project, arguably even larger in scope and resources than using ready-made, well-studied, openly published algorithms and frameworks.

And that’s not all. Comparing the performance of algorithms that, as you know, can simply stop working (better than before) requires constant measurement of conversion and A/B testing in all structural divisions, from the marketing department to the customer service department. Otherwise, you will end up with a beautiful, modern, expensive (especially if you take up neural networks on GPUs) and useless toy, plus a set of gurus on the team who speak an incomprehensible language.

Conclusions. How can an online store jump onto the departing train?

I hope it has become clear that the large market players calculated the profit growth from using machine learning to personalize content long ago, are actively developing this area, employ the best minds and exploit the best hardware (GPUs). Information on effectiveness is next to impossible to find in open access. But judging by conference talks, 1-2 models out of roughly 10 take off and make it possible not only to increase conversion, but also to get ahead of competitors by getting closer to Customers.
Until you begin to measure conversion in a transparent and systematic way, personalization using machine learning is unlikely to help and may even complicate business processes.
After establishing clear, constantly tracked conversion metrics, learn to run systematic A/B tests of changes made to the online store by hand (a redesigned main page, changed logic in the order wizard).
If your goal is to bring the marketing department up to the next cosmic velocity, look at how the Customers’ e-mail addresses are handled: in Excel sheets, or properly, together with other data. If the process can be automated, do it. Then try segmentation using machine clustering, send a test mailing and verify the result with A/B testing.
If your goal is to put the customer service department into Mars orbit, likewise make sure the list of clients does not live in Excel or in correspondence. At a minimum, it should be in a CRM. Then, using simple techniques (a Bayesian classifier, logistic regression) and with A/B testing switched on, start working with churn groups and with attractive Clients identified by CLV analysis. Think carefully, and bring in smart analysts, about how to verify that the deployed model has improved conversion and the relevance of offers to customers. This can be much harder than everything done earlier put together :-) Just between us: the conversion will be much higher, but proving it to an Australopithecus is worth a Nobel Prize nomination.
After obtaining a stable result and establishing the basic processes in the structural divisions of the online store, you can start thinking about improving the quality of the algorithms using neural networks or bringing in more valuable data analysis experts.
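One practical detail of systematic A/B testing is keeping each Customer in the same group on every visit. A common way to do that, sketched here under the assumption that a stable customer identifier is available, is to hash the identifier together with the experiment name. The function name, experiment name and 50/50 split are illustrative only.

```python
import hashlib


def assign_group(user_id, experiment, treatment_share=0.5):
    """Deterministically assign a user to 'treatment' or 'control'.

    The same (experiment, user_id) pair always lands in the same group,
    while different experiments split the audience independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical experiment: personalized catalog versus the old one.
groups = [assign_group(uid, "personal_catalog") for uid in range(10_000)]
print(groups.count("treatment") / len(groups))
```

No assignment table needs to be stored: any service that knows the customer ID and the experiment name computes the same answer.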

Remember that you can only improve what you can measure!

And what if you do it differently?

Unfortunately, most people do not like change. Who is going to improve something by 5% when it works as it is, worked for 20 years before that, and “everyone does it this way”? Clearly, by suggesting that the accounting department conduct a “Fisher discriminant analysis”, the most you can achieve is to raise hormone production in the most active women who love smart men. But mostly they will quietly hate you :-) The only employee who most likely will not turn down an extra 5% of conversion and profit is the head of the sales department, but without hard drugs you cannot prove to him that personalized content based on machine learning works. People who have been selling without AI for 20 years can be persuaded only by constant A/B testing on the site, in the form of a pink kitten and a green puppy. They say that girls with bleached hair from the physics and mathematics faculty sometimes help by discussing the delights of measuring everything accurately, always; but such goddesses are very hard to find, and there are far more goddesses who understand social networks better than the workings of their own brain.

There is little chance that hard drugs will help convince the sales force to use content personalization.

Summary

In general, colleagues, there is no 100% working recipe for introducing new personalization and predictive marketing technologies into an online store. Sometimes decimation will help. Sometimes, making the waking dream from the beginning of the post come true. But it is better, of course, to act with patience and love, if a budget is allocated for them :-) From experience, you need help and support from the top and a clearly defined goal: to increase conversion, and to measure it constantly, again and again. Loosen your grip and you will be left with an expensive toy and a bunch of new unfamiliar words. Good luck to everyone and happy upcoming New Year: we wish you more continuous ABCDE tests in the new year!
