Deep learning: opportunities, perspectives and a bit of history
Over the past few years, the phrase “deep learning” has been surfacing in the media more and more often. Outlets such as KDnuggets and DigitalTrends try not to miss news from the field and regularly cover popular frameworks and libraries.
Even mainstream publications like The New York Times and Forbes write regularly about what scientists and developers in the deep learning field are doing, and interest in the topic shows no sign of fading. Today we will talk about what deep learning is capable of now and how it may develop in the future.
Photo: xdxd_vs_xdxd, CC
A few words about deep learning, neural networks and AI
What is the difference between a deep learning algorithm and an ordinary neural network? According to Patrick Hall, a lead data scientist at SAS, the most obvious difference is that the neural network used in deep learning has more hidden layers. These layers sit between the first (input) and last (output) layers of neurons. Moreover, it is not necessary to connect every neuron in one layer to every neuron in the next.
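To make the difference concrete, here is a minimal sketch, assuming PyTorch is available; the layer sizes are arbitrary and chosen purely for illustration, not taken from any system mentioned in the article:

```python
import torch.nn as nn

# A "classic" shallow network: a single hidden layer sits between
# the input layer and the output layer.
shallow_net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# A "deep" network in the sense used above: the same input and output,
# but with several hidden layers stacked in between. Layers also do not
# have to be fully connected; convolutional layers are one example.
deep_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
```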
The distinction between deep learning and artificial intelligence is less straightforward. For example, Pedro Domingos, a professor at the University of Washington, holds the view that deep learning is a hyponym of “machine learning”, which in turn is a hyponym of artificial intelligence. Domingos says that in practice their areas of application intersect rather rarely.
However, there is another opinion. Hugo Larochelle, a professor at the University of Sherbrooke, is confident that these concepts are almost completely unrelated. He notes that AI focuses on goals, while deep learning focuses on a specific technology or methodology for machine learning. Therefore, from here on, when speaking about achievements in the field of AI (such as AlphaGo), we will keep in mind that such projects use deep learning algorithms alongside other techniques from AI in general and machine learning in particular, as Pedro Domingos rightly notes.
From the “deep neural network” to deep learning
Deep neural networks appeared long ago, back in the 1980s. So why did deep learning begin to develop actively only in the 21st century? Representations in a neural network are built up layer by layer, so it was logical to assume that more layers would allow the network to learn better. But the training method plays a big role. Previously, deep networks were trained with the same algorithm as ordinary artificial neural networks: backpropagation. This method could effectively train only the last layers of the network, so the process was extremely slow and the hidden layers of a deep network did not really “work”.
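One commonly cited reason for this is the vanishing gradient effect: with saturating activations such as the sigmoid, the error signal shrinks as it is propagated back through many layers, so the early layers barely learn. A rough numerical sketch of that effect (a 20-layer chain with sigmoid activations, numbers chosen purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)
gradient = 1.0
for layer in range(20):                          # a 20-layer chain, purely illustrative
    x = np.random.randn()                        # some pre-activation value
    gradient *= sigmoid(x) * (1.0 - sigmoid(x))  # the sigmoid derivative is at most 0.25
    # (weights are ignored; this only shows the activation factor at work)

print(gradient)  # a tiny number: the earliest layers receive almost no signal
```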
It was only in 2006 that three independent groups of scientists managed to overcome these difficulties. Geoffrey Hinton was able to pre-train a network using Boltzmann machines, training each layer separately. To tackle image recognition, Yann LeCun proposed convolutional neural networks consisting of convolutional and subsampling (pooling) layers. The stacked autoencoders developed by Yoshua Bengio also made it possible to engage all the layers of a deep neural network.
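As a toy illustration of the “convolutional layers plus subsampling layers” idea, here is a small PyTorch sketch; the sizes are my own, chosen for a 28x28 grayscale input, and do not reproduce LeCun's original architecture:

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # convolutional layer: local feature detectors
    nn.ReLU(),
    nn.MaxPool2d(2),                   # subsampling (pooling) layer: shrinks the feature maps
    nn.Conv2d(6, 16, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),         # classifier head sized for a 28x28 input
)
```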
Projects that “see” and “hear”
Today, deep learning is used in very different areas, but perhaps the most common use case is image processing. Face recognition has existed for a long time, but, as they say, there is no limit to perfection. The developers of the OpenFace service believe the problem is not yet solved, because recognition accuracy can still be improved. And these are not just words: OpenFace can distinguish even people who look alike. Details about how the program works have already been covered in a separate article. Deep learning also helps when working with black-and-white images, which the Colornet project uses it to colorize automatically.
In addition, deep networks can now recognize human emotions. Combined with the ability to track the use of a company's logo in photographs and to analyze the accompanying text, this becomes a powerful marketing tool. Similar services are being developed by IBM, among others. Their tool makes it possible to evaluate the authors of texts when searching for bloggers for collaboration and advertising.
The NeuralTalk program can describe an image in a few sentences. A set of images, each accompanied by five sentences describing it, is loaded into the program's database. During training, the algorithm learns to predict sentences from a keyword using the preceding context; at prediction time, a Jordan-type recurrent neural network generates the sentences that describe the pictures.
Today there are many applications that handle various audio tasks. For example, the Magenta project, developed by a Google team, can compose music. But most applications focus on speech recognition. The Google Voice service can transcribe voicemail and manage SMS messages, and researchers used existing voice messages to train the deep networks behind it.
Projects in the conversational genre
According to scientists such as Noam Chomsky, it is impossible to teach a computer to fully understand speech and hold a meaningful dialogue, because even the mechanics of human speech are not fully understood. Attempts to teach machines to talk began in 1968, when Terry Winograd created the SHRDLU program. It could recognize parts of speech, describe objects, answer questions, and even had a small memory. But attempts to expand the machine's vocabulary made the application of its rules impossible to manage.
But today, thanks to deep learning, Google, in the person of developer Quoc Le, has taken a big step forward. His systems can reply to letters in Gmail and even help Google technical support specialists. And the Cleverbot program studied dialogue from 18,900 films, so it can answer questions even about the meaning of life: the bot believes that the meaning of life is to serve good. However, researchers once again ran into the fact that artificial intelligence only simulates understanding and has no notion of reality. The program perceives speech merely as combinations of symbols.
Teaching machines language can also help with translation. Google has long been working to improve the quality of translation in its service. But how close can machine translation be brought to the ideal if even a person cannot always correctly grasp the meaning of a statement? Ray Kurzweil proposes to solve this problem by representing the semantic meaning of words in a language graphically. The process is quite labor-intensive: into the Knowledge Graph, a special knowledge base created by Google, scientists have loaded data on almost 700 million topics, places and people, connected by nearly a billion links. All of this is aimed at improving the quality of translation and the machine's perception of language.
The very idea of representing language by graphical and/or mathematical means is not new. Back in the 1980s, scientists faced the task of presenting language in a format a neural network could work with. As a result, they proposed representing words as mathematical vectors, which made it possible to gauge the semantic proximity of different words (for example, in a vector space the words “boat” and “water” should lie close to each other). These studies underlie today's Google projects, which modern researchers now describe not in terms of individual word vectors but of “thought vectors”.
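A minimal sketch of the word-vector idea; the three-dimensional vectors below are invented for illustration, whereas a real model such as word2vec learns vectors with hundreds of dimensions from a large corpus:

```python
import numpy as np

vectors = {
    "boat":  np.array([0.9, 0.1, 0.8]),
    "water": np.array([0.8, 0.2, 0.9]),
    "piano": np.array([0.1, 0.9, 0.0]),
}

def cosine_similarity(a, b):
    """Semantic proximity as the cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["boat"], vectors["water"]))  # ~0.99, close in meaning
print(cosine_similarity(vectors["boat"], vectors["piano"]))  # ~0.16, far apart
```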
Deep Learning and Healthcare
Today, deep learning is even making its way into healthcare and helps monitor patients' condition no worse than doctors do. For example, the Dartmouth-Hitchcock Medical Center in the United States uses Microsoft's specialized ImagineCare service, which allows doctors to catch subtle changes in patients' condition. The algorithms receive data on weight changes, monitor blood pressure, and can even recognize a patient's emotional state from analysis of telephone conversations.
Deep learning is also used in pharmaceuticals. Today, molecularly targeted therapy is used to treat various types of cancer. But to create an effective and safe drug, it is necessary to identify active molecules that act only on a given target and avoid side effects. The search for such molecules can be carried out with deep learning (a description of a project conducted jointly by scientists from universities in Austria and Belgium and the R&D department of Johnson & Johnson can be found in this paper).
Does the algorithm have intuition?
How "deep" is deep learning? The answer to this question can give developers AlphaGo . This algorithm cannot speak, cannot recognize emotions. But he is able to beat anyone in a board game. At first glance, there is nothing special. Almost 20 years ago, a computer developed by IBM first defeated human chess. But AlphaGo is a completely different matter. Board game Go appeared in ancient China. The beginning is something like chess - opponents play a cage on the board, black pieces against white ones. But the similarities end there, because the pieces are small pebbles, and the goal of the game is to surround your opponent’s pebbles.
But the main difference is that there are no winning combinations known in advance, and it is practically impossible to calculate several moves ahead. The machine cannot simply be programmed to win, because a winning strategy cannot be built beforehand. This is where deep learning comes into play. Instead of being programmed with specific moves, AlphaGo analyzed hundreds of thousands of recorded games and played a million games against itself. Artificial intelligence can thus learn from practice and perform complex tasks, acquiring what a person would call “an intuitive understanding of a winning strategy”.
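To give a feel for the self-play loop, here is a deliberately simplified sketch; the game, the moves and the update rule are placeholders of my own, and this is not AlphaGo's actual algorithm, which combines deep networks with Monte Carlo tree search:

```python
import random

# Toy "policy": a preference weight for each of three placeholder moves.
policy = {"move_a": 1.0, "move_b": 1.0, "move_c": 1.0}

def play_one_game(policy):
    """Placeholder for a game of self-play: sample moves according to the
    current preferences and declare a random winner. A real system would
    play an actual game of Go here."""
    moves = random.choices(list(policy), weights=policy.values(), k=10)
    winner = random.choice(["black", "white"])
    return moves, winner

for game in range(1000):
    moves, winner = play_one_game(policy)
    reward = 1.0 if winner == "black" else -1.0
    for move in moves:
        # Nudge preferences toward moves seen in winning games,
        # keeping the weights positive so sampling stays valid.
        policy[move] = max(0.01, policy[move] + 0.01 * reward)
```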
Machines won't take over the world
Despite the staggering success of AlphaGo, artificial intelligence is still far from enslaving the human race. Machines have learned a peculiar kind of “intuitive thinking” by processing huge amounts of data, but, according to Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory, abstract and creative thinking remain beyond their reach.
Despite real progress in image recognition, a computer can still confuse a traffic sign with a refrigerator. Together with her colleagues, Li is compiling a database of images with detailed descriptions and a large number of tags, which will allow a computer to get more information about real-world objects.
According to Li, this approach, training on a photo together with its detailed description, is similar to how children learn by associating words with objects, relationships and actions. Of course, the analogy is rather rough: a child does not need every object and its surroundings described in detail to understand how real-world objects relate to one another.
Josh Tenenbaum, an MIT professor who studies cognitive science, notes that the way a computer learns about the world differs greatly from human cognition; for all their size, artificial neural networks cannot compare with the architecture of biological ones. The ability to speak, for instance, forms in a person very early and is grounded in visual perception of the world and control of the musculoskeletal system. Tenenbaum is sure that machines cannot be taught full-fledged thinking without imitating human speech and the psychological component.
Fei-Fei Li agrees with this opinion. In her view, the current level of work on artificial intelligence will not bring it close to the human level, if only because people also possess emotional and social intelligence. So the machines' takeover of the world will have to be postponed for at least another couple of decades.
P.S. Additional reading: our IaaS digest, 30 materials on the applicability of cloud technologies.