The Mathematical Secrets of Big Data

Original author: Ingrid Daubechies
  • Translation

Machine learning never ceases to amaze, yet for mathematicians the reasons for its success are still not entirely clear.

A couple of years ago, at a dinner party I was invited to, the distinguished differential geometer Eugenio Calabi volunteered to share with me his tongue-in-cheek theory of the difference between pure and applied mathematicians. When pure mathematicians hit a wall in their research, they often narrow the problem, trying to sidestep the obstacle that way. Their applied colleagues, in the same situation, conclude instead that the state of affairs is a sign they should learn more mathematics and build more effective tools.

I have always liked this way of looking at things, because it makes clear that applied mathematicians will always be able to use the new concepts and structures that keep appearing in fundamental mathematics. Today, when the agenda is the study of "big data" (blocks of information too large or too complex to be understood with traditional data-processing methods alone), the trend is all the more relevant.

At best, our current mathematical understanding of many of the techniques central to the ongoing big-data revolution is inadequate. Consider the simplest case, supervised learning, which companies such as Google, Facebook, and Apple have used to create voice- and image-recognition technologies whose accuracy approaches human performance. Developing such a system begins with assembling an enormous set of training samples, millions or billions of images or voice recordings, which are used to train a deep neural network to pick out statistical patterns. As in other areas of machine learning, the hope is that the computer can process enough data to "learn" the task: the machine is not programmed with a detailed decision-making scheme; instead, it follows algorithms that let it focus on the relevant patterns in the samples.
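
To make that pipeline concrete, here is a minimal sketch of supervised learning using scikit-learn's small built-in digits dataset and a tiny multilayer perceptron. The dataset, model size, and library are illustrative assumptions, a toy stand-in for the billion-sample systems described above.

```python
# A minimal supervised-learning sketch: fit a small neural network to
# labeled examples, then ask it to classify inputs it has never seen.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# "Training samples": images of handwritten digits plus their labels.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The network is not programmed with rules for telling digits apart;
# it adjusts its internal parameters to fit the labeled examples.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)

print("accuracy on unseen images:", model.score(X_test, y_test))
```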

In mathematical language, these supervised-learning systems are given a large set of inputs and the corresponding outputs; the computer's task is to master a function that will reliably produce the correct output for any new input. To do this, the computer decomposes the task into a number of unknown functions, so-called sigmoid functions. These S-shaped functions resemble the ramp from a road up to a sidewalk: a smooth transition from one level to another, where the starting level, the height of the step, and the width of the transition region are not determined in advance.
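
As a sketch, the logistic function is one common choice of such an S-shaped curve. The parameter names below (base, height, center, width) are illustrative, chosen to mirror the road-to-sidewalk picture; these are exactly the kinds of quantities a network would learn from data.

```python
import numpy as np

def sigmoid(x):
    # The logistic function, one common choice of S-shaped "step".
    return 1.0 / (1.0 + np.exp(-x))

def ramp(x, base=0.0, height=1.0, center=0.0, width=1.0):
    # A smooth transition from `base` to `base + height`, centered at
    # `center` and spread over roughly `width` units: the road-to-sidewalk
    # shape, with all four quantities left free to be learned.
    return base + height * sigmoid((x - center) / width)

x = np.linspace(-5, 5, 11)
print(ramp(x, base=1.0, height=2.0, center=0.5, width=0.8))
```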

An input signal is fed into the first layer of sigmoid functions, whose results can be combined before being passed to a second layer of sigmoid functions, and so the process continues from layer to layer. This web of function values forms the "network" in the neural network. A "deep" neural network is one with many layers.
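
Here is a minimal sketch of that layer-by-layer flow for a toy network with two layers. The layer sizes and random weights are arbitrary assumptions; in a real system they would be learned from training data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Illustrative sizes: a 3-dimensional input, 4 units, then 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> layer 1
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 1 -> layer 2

def forward(x):
    h1 = sigmoid(W1 @ x + b1)   # first layer of sigmoids
    h2 = sigmoid(W2 @ h1 + b2)  # results combined, then passed onward
    return h2

print(forward(np.array([0.5, -1.0, 2.0])))
```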

Several decades ago, researchers proved that such networks are universal, meaning that they can approximate any possible function. Other scientists later proved theoretical results about the unique correspondence between a network and the function it generates. But those results concerned networks with an enormous number of layers and with many nodes within each layer. In practice, neural networks use roughly 2 to 20 layers*. Because of this limitation, none of the classical results comes close to explaining why neural networks and deep learning work so spectacularly well.
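
The informal content of the universality results can be glimpsed in one classic construction: the difference of two steep sigmoids is approximately a rectangular bump, and sums of such bumps can approximate any reasonable function. A minimal sketch (the steepness value is an arbitrary assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bump(x, a, b, steepness=50.0):
    # Approximately 1 on the interval (a, b) and 0 outside it; summing
    # shifted, scaled bumps like this one can approximate any reasonable
    # function, which is the informal content of the universality theorem.
    return sigmoid(steepness * (x - a)) - sigmoid(steepness * (x - b))

x = np.array([0.0, 0.1, 0.4, 0.6, 0.9, 1.0])
print(bump(x, 0.25, 0.75))  # ~1 inside the interval, ~0 outside
```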

Here is the credo of most applied mathematicians: if a mathematical technique works really well, there must be a good mathematical explanation, and we simply have to find it. In this case, it may turn out that we do not yet even have the appropriate mathematical framework to figure it all out (or, if such a framework exists, it may have been created within pure mathematics, from where it has not yet spread to other mathematical disciplines).

Another method used in machine learning, unsupervised learning, serves to uncover hidden connections in large blocks of information. Suppose, for example, that you are a researcher who wants to study human personality types in detail. You have received a generous grant that lets you administer a 500-question personality test to 200,000 participants, with answers on a scale from one to ten. You end up with 200,000 data points in 500 virtual "dimensions", one dimension for each question in the test. Taken together, these points form a lower-dimensional surface in the 500-dimensional space, in much the way a simple mountain range corresponds to a two-dimensional surface in three-dimensional space.

As the researcher, you want to identify this lower-dimensional surface so that you can reduce the personality portraits of the 200,000 participants to their essential characteristics; this is like discovering that two variables suffice to identify any point within a particular mountain range. Perhaps the personality test, too, can be captured by a simple function of far fewer than 500 variables. A function of this kind would expose the hidden structure in the data.
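
One classical tool for finding such low-dimensional structure is principal component analysis (PCA); the article does not name a specific method, so treat this as an illustrative sketch. We fabricate survey-like data that secretly depends on only two hidden traits and check that the spectrum reveals this (the population is scaled down from 200,000 to keep the example fast):

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_questions, n_traits = 2_000, 500, 2   # scaled-down population

# Hypothetical data: each answer is a mix of two hidden traits plus noise.
traits = rng.normal(size=(n_people, n_traits))
loadings = rng.normal(size=(n_traits, n_questions))
answers = traits @ loadings + 0.1 * rng.normal(size=(n_people, n_questions))

# PCA via the singular value decomposition: the size of the leading
# singular values shows how many dimensions actually matter.
answers -= answers.mean(axis=0)
singular_values = np.linalg.svd(answers, compute_uv=False)
print(singular_values[:5])  # two large values, then a sharp drop
```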

Over roughly the past 15 years, researchers have created a number of tools for analyzing the geometry of these hidden structures. For example, you can build a model of the surface by first zooming in on many different points. At each point you place a drop of virtual ink and watch how it spreads. Depending on how the surface is curved at a given point, the ink will flow in some directions and not in others. By joining all the ink spots together, you get a fairly clear picture of what the surface looks like as a whole. With this information you would have more than just a set of data points: you would see the connections present on the surface, the interesting loops, folds, and kinks. And that means you would understand how to explore the information you have received.
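
The ink-drop picture is close in spirit to diffusion-based methods such as diffusion maps; the article names no specific technique, so the sketch below is an assumption about which tool is meant. We sample points from a noisy loop in three dimensions, let "ink" diffuse between nearby points, and recover loop-shaped coordinates from the diffusion operator:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 2 * np.pi, 300))
points = np.column_stack([np.cos(t), np.sin(t), 0.1 * rng.normal(size=300)])

# How strongly the ink spreads from each point to every other point.
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
kernel = np.exp(-d2 / 0.1)                            # intensity falls off with distance
markov = kernel / kernel.sum(axis=1, keepdims=True)   # one diffusion step

# Eigenvectors of the diffusion operator give low-dimensional coordinates
# that reveal the loop structure of the hidden curve.
eigvals, eigvecs = np.linalg.eig(markov)
order = np.argsort(-eigvals.real)
embedding = eigvecs.real[:, order[1:3]]   # skip the trivial constant eigenvector
print(embedding[:5])
```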

These methods have already led to many interesting and useful discoveries, but further tools will be required. Applied mathematicians will have plenty of work to do. And even in the face of such difficult problems, they trust that many of their "pure" colleagues will keep an open mind, follow what is happening in ongoing projects, and help uncover connections to other existing mathematical structures. Or perhaps even build entirely new ones.

* The original version of this article stated that practical neural networks use only two or three layers. Today that figure is more than 10 layers for state-of-the-art systems. The winner of the most recent ImageNet Large-Scale Visual Recognition Challenge, an image-recognition algorithm from Google, used 22 layers.
