Overview of AI & ML solutions in 2018 and forecasts for 2019: Part 2 - Tools and Libraries, AutoML, RL, Ethics in AI

Hello! Here is a translation of an Analytics Vidhya article reviewing events in AI / ML in 2018 and trends for 2019. The material is quite long, so it is split into 2 parts. I hope the article will interest not only specialists in the field, but also anyone curious about AI. Enjoy reading!

Read first: Part 1 - NLP, Computer Vision

Tools and Libraries

This section will appeal to all data science professionals. Tools and libraries are the bread and butter of data scientists. I have taken part in many debates about which tool is better, which framework replaces another, which library is the epitome of “economical” computing, and so on. I am sure many of you can relate.

But there is one thing we can all agree on - we must stay aware of the latest tools in this area or risk being left behind. The pace at which Python has overtaken its competitors and established itself as the industry leader is a good illustration. Of course, a lot comes down to subjective choice (which tool your organization uses, compatibility with the existing infrastructure, etc.), but if you are not keeping up with the times, it is time to start NOW.

So what made the headlines this year [in 2018 - translator's note]? Let's find out!

PyTorch 1.0

What is all the hype around PyTorch, which I have already mentioned many times in this article?

Given how slow TensorFlow can be, it opened the door to deep learning for PyTorch. Most of the open source code I see on GitHub is implemented in PyTorch. This is no coincidence - PyTorch is very flexible, and the latest version (v1.0) already powers many Facebook products at scale, including processing 6 billion text translations per day.

PyTorch is still gaining momentum and its growth will continue in 2019, so now is the time to join the community.

AutoML - Automated Machine Learning

Automated Machine Learning (or AutoML) has gradually gained popularity over the past couple of years. Companies such as RapidMiner, KNIME, DataRobot and H2O.ai have already released excellent products that demonstrate the great potential of this approach.

Can you imagine working on an ML project where you only need a drag-and-drop interface and no coding? This scenario may become real in the near future. In addition, a significant event has already happened in ML / DL - the release of Auto Keras!

Auto Keras is an open source library for performing AutoML tasks. The idea is to make deep learning accessible to subject matter experts who may not have experience with ML. You can get acquainted with the product here. In the coming years, it is poised to make a huge breakthrough.

TensorFlow.js - Deep Learning in the browser

Ever since we got into this line of work, we have been building and designing machine learning and deep learning models in our favorite IDEs and notebooks. How about taking a step further and trying something different? Yes, I'm talking about deep learning right in your web browser!

This has now become a reality thanks to TensorFlow.js. The project site has several examples that show how cool this open source concept is. TensorFlow.js has three main advantages / features:

  • Develop and deploy ML models using JavaScript;
  • Run existing TensorFlow models in your browser;
  • Retrain ready-made models.

Trends in AutoML for 2019

I wanted to focus on AutoML in this article. Why? I feel that the data science landscape will change over the next few years, but don’t take my word for it! Marios Michailidis, Kaggle Grandmaster at H2O.ai, will tell you what to expect from AutoML in 2019:
Machine learning continues its journey towards becoming one of the most important trends of the future — of where the world is heading. This expansion has increased the demand for applications in this area. Given this growth, it is imperative that automation be the key to maximizing the use of resources in data science. After all, the applications are endless: lending, insurance, anti-fraud, computer vision, acoustics, sensors, recommendations, forecasting, NLP. It is a great honor to work in this area. The list of trends that will remain relevant is as follows:

  1. Providing visualizations and insights to help describe and understand the data;
  2. Finding / building / extracting the best features for a given data set;
  3. Building more powerful / smarter predictive models;
  4. Reducing the gap between black box modeling and the use of such a model;
  5. Making it easier to put these models into production.

Reinforcement Learning

Reinforcement learning is a machine learning approach in which the system being trained (the agent) learns by interacting with an environment. From the point of view of cybernetics, it is a type of cybernetic experiment. The environment's response to the agent's decisions (rather than a dedicated reinforcement control system, as in supervised learning) serves as the reinforcement signal, so this kind of learning is a special case of supervised learning in which the teacher is the environment or a model of it. Keep in mind as well that some reinforcement rules rely on implicit teachers, for example, in the case of an artificial neural environment, on the simultaneous activity of formal neurons, which is why they can be classified as unsupervised learning.

- source: Wikipedia
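
The agent-environment interaction described in this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, its `reset` / `step` methods and the random policy are hypothetical names invented here and do not come from the article or from any RL library.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, reward only in the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the corridor
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # the environment's reinforcement signal
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """Let the agent interact with the environment; return (total_reward, steps)."""
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(state)                  # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


def random_policy(state):
    """An untrained agent: ignores the state and acts at random."""
    return random.choice([-1, 1])


if __name__ == "__main__":
    reward, steps = run_episode(Corridor(), random_policy)
    print(f"reward={reward}, steps={steps}")
```

A learning agent would replace `random_policy` with something that uses the rewards it has received to choose better actions over time.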

If I were asked which area I would like to see develop faster, the answer would be reinforcement learning. Despite occasional headlines, there have not yet been any real breakthroughs in this area, and more importantly, the community seems to regard reinforcement learning as still too complex, with few real-world applications.

To some extent this is true, and next year I would like to see more practical examples of RL in use. Every month I try to feature at least one RL repository or discussion from GitHub and Reddit to encourage conversation on the topic. It may well be the next big thing to come out of all this research.

OpenAI has published a really useful toolkit for those who are just getting started with RL. You can read the introduction to RL here (I found it very useful).

If I missed something, I will be glad to hear your additions.

OpenAI Development in Deep Reinforcement Learning

Just as the development of RL has been slow, the amount of educational material on the topic remains minimal (to put it mildly). Despite this, OpenAI has shared some excellent material on the subject. They called their project “Spinning Up in Deep RL”, and it is available at this link.

Simply put, this is an exhaustive list of RL resources. The authors have tried to make the code and explanations as simple as possible. There is plenty of material, including RL terminology, tips on developing as an RL researcher, lists of key papers, a well-documented code repository, and example tasks to get you started.

There is no reason to put it off: if you plan to start working with RL, your time has come!

Dopamine from Google

To boost development and involve the community in reinforcement learning, the Google AI team introduced Dopamine, a TensorFlow framework designed to make research more flexible and reproducible for everyone.

In this GitHub repository, you can find the necessary information for learning along with the TensorFlow code. This is probably the ideal platform to start simple experiments in a controlled and flexible environment. Sounds like a dream come true for any specialist.

Trends in Reinforcement Learning for 2019

Xander Steenbrugge, a speaker at DataHack Summit 2018 and founder of the ArxivInsights channel, is an expert in reinforcement learning. Here are his thoughts on the current state of RL and what we should expect in 2019:
At the moment I see three main problems in the field of RL:

  1. Sample complexity (the agent must see / collect a large amount of experience in order to learn)
  2. Generalization and transfer learning (train on task A, test on related task B)
  3. Hierarchical RL (automatic decomposition of subgoals)

I am sure that the first two problems can be solved using a similar set of methods related to unsupervised representation learning.

Today in RL, we train deep neural networks that map raw input (for example, pixels) to actions in an end-to-end manner (for example, with backpropagation) using sparse reward signals (for example, the score in an Atari game or the success of a robotic grasp). The problem here is that:

First, it takes a long time to “grow” useful feature detectors, because the signal-to-noise ratio is very low. RL basically begins with random actions until you are lucky enough to stumble upon a reward, and then you still need to figure out what exactly caused that specific reward. Further exploration is either hard-coded (epsilon-greedy exploration) or encouraged by methods such as curiosity-driven exploration. This is inefficient, and it brings us back to problem 1.

Second, such deep neural network architectures are known for their tendency to “memorize”, and in RL we usually test agents on the same data they were trained on, so “memorizing” is actually encouraged in this paradigm.

One possible way forward that I am enthusiastic about is to use unsupervised representation learning to transform messy high-dimensional input spaces (for example, pixels) into a lower-dimensional “conceptual” space that has certain desirable properties, such as linearity, disentanglement, robustness to noise and more.

Once you manage to map the pixels into such a “latent space”, learning suddenly becomes simpler and faster (problem 1), and you can hope that the policies learned in this space will generalize better thanks to the properties mentioned above (problem 2).

I am not an expert on the hierarchy problem, but all of the above applies here too: it is easier to solve a complex hierarchical problem in a latent space than in the raw input space.
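
The hard-coded epsilon-greedy exploration mentioned in the quote above can be illustrated with a minimal sketch. It assumes a toy chain environment invented for this example (action 0 moves left, action 1 moves right, and the only reward is given in the last state) and uses plain tabular Q-learning instead of a deep network; the function names and hyperparameter values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore at random, otherwise take the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # break ties at random so an untrained agent does not always pick action 0
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def train(n_states=6, episodes=200, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain with a sparse reward in the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            action = epsilon_greedy(q[state], epsilon, rng)
            move = 1 if action == 1 else -1
            next_state = min(max(state + move, 0), n_states - 1)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Q-learning update: move the estimate toward reward + discounted best next value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(train()):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

Until the first reward is found, every value is zero and the agent is effectively acting at random, which is the low signal-to-noise situation described in the quote; the longer the chain, the longer that initial random search takes.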

A couple of notes from the translator

What is representation learning?
In machine learning, feature learning or representation learning is a set of techniques that allow a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine both to learn the features and to use them to perform specific tasks.

Feature learning can be supervised or unsupervised:

  • In supervised feature learning, features are learned using labeled input data.
  • In unsupervised feature learning, features are learned from unlabeled data.

source: Wikipedia

What is a latent space?
The word “latent” here means “hidden”. It is most often used in this sense in machine learning: you observe some data that lives in a space you can observe, and you want to map it into a hidden space where similar data points are closer to each other.

For example, consider 4 images:

In the observed pixel space there is no immediate similarity between any two of the images. But if you map them into a latent space, you would want the images on the left to be closer to each other in that space than to either of the images on the right. In this way, your latent space captures the essence of the structure of your data as it relates to the task. In LDA, you model the task so that documents on similar topics end up closer together in the latent topic space. With word embeddings, you want to map words into a latent vector space so that words with similar meanings are closer together in that space.
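
To make the idea concrete, here is a small sketch under loud assumptions: the 4x4 “images” with a single bar are invented for illustration, and `encode` is a hand-written stand-in for an encoder that would normally be learned from data. It only shows that distances in pixel space and in a latent space can disagree.

```python
import math


def bar_image(kind, index, size=4):
    """Build a size x size binary image with one horizontal ("h") or vertical ("v") bar."""
    image = [[0] * size for _ in range(size)]
    for i in range(size):
        if kind == "h":
            image[index][i] = 1
        else:
            image[i][index] = 1
    return image


def pixel_distance(a, b):
    """Euclidean distance between two images in the observed pixel space."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def encode(image):
    """Hand-written stand-in for a learned encoder (an assumption of this sketch).

    Maps an image to two numbers: the largest row sum and the largest column sum.
    """
    row_max = max(sum(row) for row in image)
    col_max = max(sum(col) for col in zip(*image))
    return (row_max, col_max)


def latent_distance(a, b):
    """Euclidean distance between two images after encoding them."""
    za, zb = encode(a), encode(b)
    return math.hypot(za[0] - zb[0], za[1] - zb[1])


if __name__ == "__main__":
    top, bottom, left = bar_image("h", 0), bar_image("h", 3), bar_image("v", 0)
    print("pixel space :", pixel_distance(top, bottom), pixel_distance(top, left))
    print("latent space:", latent_distance(top, bottom), latent_distance(top, left))
```

In pixel space the two horizontal bars share no pixels, so the top bar is actually closer to the vertical bar that overlaps it in one corner. In the latent space both horizontal bars map to the same point, which is the kind of structure a good representation is expected to expose.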

Bonus: Check out Xander’s video on overcoming sparse rewards in Deep RL (the first problem outlined above).

Sample complexity will continue to improve as more and more auxiliary learning tasks are added to augment the sparse, extrinsic reward signal (things like curiosity-driven exploration, autoencoder-style pre-training, disentangling causal factors in the environment, etc.). This works especially well in environments with very sparse rewards.

Because of this, training systems directly in the physical world will become more and more feasible (instead of today's applications, which are mostly trained in simulated environments and then use domain randomization to transfer to the real world). I expect that 2019 will bring the first truly impressive robotics demonstrations that are only possible with deep learning methods and cannot be hard-coded / engineered by humans (unlike most of the examples we have seen so far).

I believe that after the success of Deep RL in the AlphaGo story (especially considering the recent results of AlphaFold), RL will gradually begin to be used in real business applications that bring practical value outside academia. At first, however, its use will be limited to areas that have accurate simulations for large-scale virtual training of these agents (for example, drug discovery, optimization of electronic chip architecture, routing of vehicles and packages, and others).

A general shift in RL development has already begun: testing an agent on its training data will no longer be considered acceptable. Generalization metrics will become key, as is the case with supervised learning methods.

AI for Good - the move toward “ethical” AI

Imagine a world driven by algorithms that determine every human action. Not the most pleasant scenario, is it? Ethics in AI is a topic that we have always discussed at Analytics Vidhya, but it gets lost amid all the technical discussions, whereas it should be treated on a par with other topics.

This year quite a few organizations found themselves in an embarrassing position, with the Cambridge Analytica scandal (Facebook) and Google’s internal controversy over weapons development topping the list.

There is no simple, one-size-fits-all recipe for handling the ethical aspects of AI. The question requires a nuanced approach combined with a structured plan that someone must take responsibility for executing. Let's look at a couple of major events that shook the field at the beginning of this year.

Campaigns from Google and Microsoft

It was great to see that large corporations focused on the ethical side of AI (although the path that led them to this point was not very elegant). Pay attention to the guidelines and principles published by some of the companies:

In essence, these documents deal with fairness in AI, as well as when and where to draw the line. Referring to them when you start a new AI-based project is always a good idea.

How GDPR changed the rules of the game

The GDPR (General Data Protection Regulation) has definitely had an impact on how data is collected for building AI applications. The GDPR was introduced to give users greater control over their data (what information about them is collected and shared).

So how does this affect AI? Well, if data scientists have no data, or not enough of it, building any model cannot even begin. This has certainly disrupted the way social platforms and other sites used to work. The GDPR set a fine example by dotting the i's, but it has limited the usefulness of AI for many platforms.

Ethical trends in AI for 2019

There are a lot of gray areas in this field. We must come together as a society to integrate ethics into AI projects. How can we do this? Kunal Jain, founder and CEO of Analytics Vidhya, emphasized in his talk at DataHack Summit 2018 that we will need to develop a framework that others can follow.

I expect to see new roles in organizations dedicated to ethics in AI. Corporate best practices will need to be restructured and management approaches revised as AI becomes a central element of company vision. I also expect governments to play a more active role here, with fundamentally new or revised policies. Indeed, 2019 will be a very interesting year.


Impactful is the only word that briefly describes the amazing events of 2018. I became an active user of ULMFiT this year, and I look forward to exploring BERT as soon as possible. A truly amazing time.

I would be glad to hear your opinion! Which development did you find most useful? Are you working on a project that uses the tools and approaches discussed in this article? What are your predictions for the coming year? I look forward to your responses in the comments below.
