Security Challenges and Key AI Achievements



    Artificial intelligence remains a topic of intense public interest. The main reason attention has not flagged is that in recent years we have seen hundreds of new projects built on weak AI technology. It is quite likely that people alive today will personally witness the emergence of a strong AI. Below is the story of when, exactly, to expect brainy robots in your apartment. Thanks to ZiingRR and Vladimir Shakirov for their insightful ideas. Enjoy.

    What changes await us when machines surpass their creators in many respects? Well-known scientists and researchers give very different predictions: from the rather pessimistic forecasts of Andrew Ng, head of the Baidu artificial intelligence laboratory, to the restrained assumptions of Google's Geoffrey Hinton, one of the authors of the backpropagation method for training multilayer neural networks, and the optimistic arguments of Shane Legg, co-founder of DeepMind (now part of Alphabet).

    Natural Language Processing


    Let's start with some advances in natural language processing. To evaluate language models in computational linguistics, perplexity is used: a measure of how well a model predicts the words of a held-out test corpus. The lower the perplexity, the better the language model, given the words of a sentence so far, predicts the probability distribution of the next word.



    If a neural network makes a mistake (logical, syntactic, pragmatic), it means it assigned too much probability to inappropriate words, i.e. its perplexity is not yet fully optimized.
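
    To make the metric concrete, here is a minimal sketch (not from the papers discussed) of how perplexity can be computed from the per-token probabilities a language model assigns to held-out text:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood that the model
    assigns to the tokens of a held-out text (natural-log probabilities)."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Toy example: a model that gives every token of a 4-token sentence
# probability 0.1 has perplexity 10.
print(perplexity([math.log(0.1)] * 4))  # -> 10.0
```

    A model that guessed uniformly over a 10,000-word vocabulary would have perplexity 10,000, so values around 20 or below mean the model has learned a great deal about which words can follow which.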

    When hierarchical neural chatbots reach a sufficiently low perplexity, they will likely be able to write coherent texts, give clear, sensible answers, and reason consistently and logically. We will be able to build a conversational model that imitates the style and beliefs of a particular person.

    How likely is a substantial reduction in perplexity in the near future?

    Let's compare two notable papers: “Contextual LSTM models for large scale NLP tasks” (paper no. 1) and “Exploring the limits of language modeling” (paper no. 2).



    It is reasonable to assume that if 4096 hidden neurons are used in the Contextual LSTM models, perplexity will fall below 20, and 8192 hidden neurons may bring it below 15. An ensemble of models with 8192 hidden neurons trained on 10 billion words may well give a perplexity far below 10. How soundly such a neural network could reason is not yet known.

    It is possible that in the near future a lower perplexity will let us create neural network chatbots that write meaningful texts, give clear, sensible answers, and sustain a conversation.

    Understanding how to optimize perplexity is the key to good results here. A partial solution through adversarial training has also been proposed; the paper “Generating sequences from continuous space” demonstrates impressive gains from this line of work.

    For automatic assessment of machine translation quality, the BLEU metric is used; it measures the proportion of n-grams (sequences of words, syllables, or characters) shared between a machine translation and a reference translation of a sentence. BLEU scores translation quality on a scale from 0 to 100 by comparing the machine output against a human reference and counting common fragments: the more matches, the better the translation.
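
    As an illustration only (real evaluations use corpus-level BLEU with clipping over multiple references), here is a simplified sentence-level BLEU in pure Python:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of n-gram precisions
    (n = 1..4) multiplied by a brevity penalty, scaled to 0-100."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())               # clipped n-gram matches
        precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat".split(),
                 "the cat sat on a mat".split()), 1))      # ≈ 53.7
```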

    According to the BLEU metric, a human translation from Chinese into English on the MT03 dataset scores 35.76 points, while the GroundHog network scored 40.06 BLEU on the same data. There is a caveat, however: the standard maximum-likelihood objective was replaced by a minimum risk training (MRT) criterion, which raised the BLEU score from 33.2 to 40.06. Unlike ordinary maximum-likelihood estimation, minimum risk training can directly optimize the model parameters with respect to the evaluation metric. Similarly impressive results can be achieved by improving translation quality with monolingual data.

    Modern neural networks translate about 1000 times faster than humans. Learning foreign languages is becoming less and less useful, since machine translation is improving faster than most people can learn a language.

    Computer vision






    Top-5 error is a metric in which the algorithm may propose five candidate classes for an image, and an error is counted only if the correct class is not among them.
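
    A sketch of how top-5 error is computed from a model's class scores (assuming NumPy; the array names are illustrative):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of examples whose true label is NOT among the five
    highest-scoring classes."""
    top5 = np.argsort(-scores, axis=1)[:, :5]         # 5 best class indices per example
    hit = np.any(top5 == labels[:, None], axis=1)     # true class among them?
    return 1.0 - hit.mean()

rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 1000))               # 4 examples, 1000 ImageNet classes
labels = rng.integers(0, 1000, size=4)
print(top5_error(scores, labels))                     # ~1.0 for random scores
```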

    In the paper “Identity mappings in deep residual networks”, a single model reaches 5.3% top-5 error, while the human level is 5.1%. In “Deep residual networks”, a single model gives 6.7%, and an ensemble of these models together with the Inception architecture gives 3.08%.

    Good results are also achieved with “deep networks with stochastic depth”. An error of about 0.3% is reported in the annotations of the ImageNet database itself, so the real error on ImageNet may soon fall below 2%. AI surpasses humans not only in ImageNet classification but also in boundary detection. Accuracy on the Sports-1M video classification benchmark (487 classes, 1 million videos) improved from 63.9% (2014) to 73.1% (March 2015).

    Convolutional neural networks (CNNs) also beat humans in speed: roughly 1000 times faster (note that this is with batched processing), or even about 10,000 times faster by some comparisons. Processing 24 fps video with AlexNet requires only 82 GFLOP/s, and with GoogLeNet 265 GFLOP/s.

    The best raw forward pass takes 25 ms: on an NVIDIA Titan X (6144 GFLOP/s), a batch of 128 frames takes 25 ms for the forward pass and 71 ms including backpropagation. Therefore, to process 24 fps video in real time, 6144 GFLOP/s × (24/128) × 0.025 ≈ 30 GFLOP/s are required, and to train with backpropagation, 6144 GFLOP/s × (24/128) × 0.071 ≈ 82 GFLOP/s. The same calculation for GoogLeNet gives 83 GFLOP/s and 265 GFLOP/s, respectively.
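
    These figures can be reproduced with simple arithmetic; a small sanity check using only the numbers quoted above:

```python
# Assumed inputs, taken from the text: Titan X peak throughput, batch timings, 24 fps playback.
PEAK_GFLOPS = 6144                   # NVIDIA Titan X, GFLOP/s
BATCH, FPS = 128, 24
FWD_SEC, TRAIN_SEC = 0.025, 0.071    # AlexNet: forward / forward+backward time for 128 frames

inference = PEAK_GFLOPS * FWD_SEC * FPS / BATCH     # ≈ 29 GFLOP/s to run 24 fps in real time
training = PEAK_GFLOPS * TRAIN_SEC * FPS / BATCH    # ≈ 82 GFLOP/s to train at 24 fps
print(round(inference), round(training))            # -> 29 82
```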



    A DeepMind network can generate photorealistic images from text entered by a person.

    Networks answer questions about images. In addition, networks can describe images in sentences, by some metrics even better than people do. Along with video => text translation, experiments are under way on text => image translation.
    Networks are also actively applied to speech recognition.



    Progress here is very fast. At Google, for example, the relative number of incorrectly recognized words dropped from 23% in 2013 to 8% in 2015.

    Reinforcement learning




    AlphaGo is a powerful AI within its own very small world, which is just a board with stones. If we improve AlphaGo with continual reinforcement learning (and if we manage to make it work on the complex tasks of the real world), it will turn into a real AI in the real world. There are also plans to train it on virtual video games; video games often contain far more interesting tasks than the average person meets in real life. AlphaGo is a good example of modern reinforcement learning combined with modern convolutional neural networks.
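
    AlphaGo itself relies on deep policy and value networks plus tree search, but the underlying reward-driven loop can be shown with a much humbler sketch: tabular Q-learning on a toy five-state chain (illustrative code, not related to AlphaGo's actual implementation):

```python
import numpy as np

# Toy chain: states 0..4, actions 0 = left, 1 = right; reward 1 for reaching state 4.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # Bellman update
        s = s_next

print(Q.argmax(axis=1)[:-1])   # learned policy for states 0..3: should all prefer "right" (1)
```

    Deep reinforcement learning replaces the table Q with a convolutional network and the toy chain with raw pixels, which is exactly the step the Atari and Go systems made.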

    Effective Unsupervised Learning




    There are models that allow a computer to generate data on its own, such as photographs, films, or music. Deep Convolutional Generative Adversarial Networks (DCGAN) can create unique photorealistic images by skillfully combining two deep neural networks that "compete" with each other.
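
    A minimal adversarial-training sketch (assuming PyTorch; a 1-D toy problem rather than the convolutional architecture of DCGAN) shows how the two competing networks are trained:

```python
import torch
import torch.nn as nn

# Generator maps noise to 1-D samples; discriminator separates them from real data ~ N(4, 1).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0
    fake = G(torch.randn(64, 8))
    # Discriminator: label real samples 1 and generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator into labeling fakes as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # should drift toward 4.0
```

    DCGAN plays the same game with deep convolutional generators and discriminators, which is what makes the generated images photorealistic.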

    Language generation models that minimize perplexity are trained without supervision and have made great progress in recent years. The skip-thought vectors algorithm produces a vector representation for sentences; linear classifiers trained on these vectors and their cosine distances solve many supervised tasks at the state-of-the-art level. Recent work continues to develop the "computer vision as inverse graphics" approach.
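
    The evaluation protocol for such sentence vectors is itself very simple; a sketch (random vectors stand in for real encoder outputs, and scikit-learn is assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Pretend these came from a frozen sentence encoder; skip-thought vectors are 2400-dimensional.
sentence_vectors = rng.standard_normal((200, 2400))
labels = rng.integers(0, 2, size=200)            # e.g. sentiment labels

# Only a linear classifier is trained on top of the fixed sentence representations.
clf = LogisticRegression(max_iter=1000).fit(sentence_vectors[:150], labels[:150])
print(clf.score(sentence_vectors[150:], labels[150:]))   # chance level here, since the vectors are random
```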

    Multimodal learning


    Multimodal learning is used to improve performance on video classification tasks. Unsupervised multimodal learning is used to ground text phrases in images via an attention-based mechanism, so that the modalities teach each other. In the “Neural self-talk” algorithm, the neural network looks at a picture, generates questions about it, and then answers those questions itself.

    Arguments from the point of view of neuroscience

    Here is a simplified view of the human cerebral cortex:


    Roughly speaking, 15% of the human brain is devoted to low-level visual tasks (the occipital lobe).

    Another 15% - for recognition of images and actions (a little more than half of the temporal lobe).

    Another 15% - for object tracking and detection (the parietal lobe). And 10% is devoted to reinforcement learning (the orbitofrontal cortex and part of the prefrontal cortex). Together they make up about 70% of the entire brain.

    Modern neural networks already operate at roughly human level if we consider only these 70% of the brain. For example, CNNs make 1.5 times fewer errors on ImageNet than people do, and they work 1000 times faster.

    From the standpoint of neuroscience, the human cortex has much the same structure across its entire surface. Neurons operating on the same principles extend only about 3 mm deep into the brain. The working mechanisms of the prefrontal cortex and of other cortical areas are practically the same, and they are similar in computation speed and algorithmic complexity. It would be strange if modern neural networks could not cope with the remaining 30% in the next few years.

    About 10% is responsible for premotor and motor activity (Brodmann areas 6 and 8). Yet people born without fingers have problems with fine motor skills while their mental development is normal. People with tetra-amelia syndrome have no arms or legs from birth, but their intelligence is fully preserved. For example, Hirotada Ototake, a sports journalist from Japan, became famous by writing memoirs that sold in huge print runs; he also taught at a school. Nick Vujicic has written many books, graduated from Griffith University with a bachelor's degree in commerce, and now gives motivational lectures.

    One of the functions of the dorsolateral prefrontal cortex (DLPFC) is attention, a mechanism now actively used together with LSTM (long short-term memory) networks.
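
    Attention in neural networks is just a differentiable weighted lookup; a minimal dot-product version (a generic illustration, not a model of the DLPFC):

```python
import numpy as np

def soft_attention(query, keys, values):
    """Score every memory slot against the query, softmax the scores into
    weights, and return the weighted sum of the values."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values, weights

rng = np.random.default_rng(0)
keys = rng.standard_normal((10, 16))              # e.g. 10 hidden states of an LSTM
values = rng.standard_normal((10, 16))
query = keys[3] + 0.1 * rng.standard_normal(16)   # a query close to slot 3
context, w = soft_attention(query, keys, values)
print(w.argmax())                                 # attention should concentrate on slot 3
```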

    The only part where human level has not yet been reached is Brodmann areas 9, 10, 46, and 45, which together make up only 20% of the human cerebral cortex. These areas are responsible for complex reasoning, the use of complex tools, and complex language. However, the papers “A neural conversational model”, “Contextual LSTM ...”, “Playing Atari with deep reinforcement learning”, “Mastering the game of Go ...” and many others actively attack this problem.

    There is no reason to believe that these remaining 30% will be harder to cope with than the 70% already conquered. After all, far more researchers now work on deep learning, with more knowledge and more experience, and there are many times more companies interested in it.

    Do we know how our brain works?




    Detailed mapping of the connectome took a noticeable step forward after the creation of the multi-beam scanning electron microscope. Once an experimental image of the cerebral cortex measuring 40×40×50 thousand μm³ at a resolution of 3×3×30 nm was obtained, laboratories received a grant to describe the connectome of a 1×1×1 mm³ fragment of rat brain.

    In many problems, weight symmetry turns out to be unimportant for the backpropagation algorithm: errors can be propagated back through a fixed random matrix and everything still works. This paradoxical result is a key step toward a theoretical understanding of how the brain could learn. I also recommend the article "Towards Biologically Plausible Deep Learning", whose authors argue theoretically that the brain is capable of learning deep hierarchies.
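
    The "fixed random feedback" idea is easy to try on a toy problem. Here is an illustrative sketch (one hidden layer, squared error) in which the error is sent back through a fixed random matrix B instead of the transpose of the output weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal((10, 1))              # toy regression target

W1 = rng.standard_normal((10, 32)) * 0.1
W2 = rng.standard_normal((32, 1)) * 0.1
B = rng.standard_normal((32, 1)) * 0.1            # fixed feedback matrix, never trained
lr = 0.01

for step in range(2000):
    h = np.tanh(X @ W1)
    err = h @ W2 - y                              # output error
    grad_W2 = h.T @ err / len(X)
    delta_h = (err @ B.T) * (1 - h ** 2)          # true backprop would use W2.T here
    grad_W1 = X.T @ delta_h / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print(float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)))   # loss should fall well below its initial value
```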

    An unsupervised objective function based on STDP (spike-timing-dependent plasticity) has recently been proposed; it is somewhat similar to the objective used, for example, in the word2vec natural language semantics tool. The authors explored the space of polynomial local learning rules (learning rules in the brain are assumed to be local) and found rules that outperform the backpropagation algorithm. There are also learning methods that do not require backpropagation at all. Although they cannot yet compete with conventional deep learning, the brain may well use something similar, given its fantastic number of neurons and synaptic connections.

    So what separates us from human-level AI?


    A number of recent papers on memory networks and Neural Turing Machines allow the use of arbitrarily large memories while keeping the number of model parameters acceptable. Hierarchical memory provides access to memory in O(log n) operations instead of the usual O(n), where n is the memory size, and reinforcement-learning Neural Turing Machines provide O(1) access. This is an important step toward implementing systems like IBM Watson as fully continuous, differentiable neural networks, raising results on the Allen AI Challenge from 60% toward 100%. Using constraints in the recurrent layers, it is also possible to make bulk memory use an acceptable number of parameters.

    The Neural Programmer is a neural network extended with a set of arithmetic and logical operations. Perhaps these are the first steps toward a continuous, differentiable, neural-network-based Wolfram Alpha. The "learning to learn" approach has enormous potential.

    Not long ago, the SVRG (stochastic variance-reduced gradient) algorithm was proposed. This line of work aims at theoretically grounded, substantially better gradient descent methods.
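
    The idea behind SVRG fits in a few lines: periodically compute a full gradient at a snapshot point and use it to correct each stochastic step. A sketch for least squares (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
b = A @ rng.standard_normal(20) + 0.01 * rng.standard_normal(500)

def grad_i(w, i):                      # gradient of 0.5 * (A[i] @ w - b[i])**2
    return A[i] * (A[i] @ w - b[i])

def full_grad(w):
    return A.T @ (A @ w - b) / len(b)

w, lr = np.zeros(20), 0.01
for epoch in range(30):
    w_snap = w.copy()
    mu = full_grad(w_snap)                                   # full gradient at the snapshot
    for _ in range(len(b)):
        i = rng.integers(len(b))
        w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + mu)    # variance-reduced step

print(np.linalg.norm(full_grad(w)))    # should be close to zero
```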
    There are works that, if successful, will allow training with very large hidden layers: Unitary evolution RNNs and Tensorizing neural networks.

    “Net2Net” and “Network morphism” allow a new neural network architecture to be initialized automatically with the weights of an old architecture, so that it immediately matches the latter's performance. This is the birth of a modular approach to neural networks: just download pre-trained modules for vision, speech recognition, speech generation, reasoning, robotics, etc., and fine-tune them for the final task.
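
    The core trick of Net2Net's widening operation can be sketched in a few lines: duplicate hidden units and split their outgoing weights so the widened network computes exactly the same function (illustrative NumPy, a single hidden ReLU layer):

```python
import numpy as np

def net2wider(W_in, W_out, new_width, rng):
    """Widen a hidden layer by copying random existing units; the outgoing weights
    of each copied unit are divided among its copies, preserving the function."""
    old_width = W_in.shape[1]
    mapping = np.concatenate([np.arange(old_width),
                              rng.integers(0, old_width, new_width - old_width)])
    W_in_new = W_in[:, mapping]                           # duplicate incoming weights
    counts = np.bincount(mapping, minlength=old_width)
    W_out_new = W_out[mapping] / counts[mapping, None]    # split outgoing weights
    return W_in_new, W_out_new

rng = np.random.default_rng(0)
W_in, W_out = rng.standard_normal((4, 3)), rng.standard_normal((3, 2))
W_in2, W_out2 = net2wider(W_in, W_out, 5, rng)
x = rng.standard_normal(4)
print(np.allclose(np.maximum(x @ W_in, 0) @ W_out,
                  np.maximum(x @ W_in2, 0) @ W_out2))     # True: same outputs, wider net
```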

    Ideally, a new word should be incorporated into the sentence vector by a deep transformation. Modern LSTM networks, however, update the cell vector with only a shallow transformation when a new word arrives. This can be addressed with deep-transition recurrent neural networks. Successfully applying batch normalization and dropout to recurrent layers would make training deep-transition LSTM networks even more effective, and hierarchical recurrent networks would also help.

    Several state-of-the-art results have been surpassed by an algorithm that lets recurrent neural networks learn how many computation steps to take between receiving the input and producing the output. Ideas from residual networks should also raise performance; for example, stochastic depth makes it possible to train residual networks with more than 1200 layers while still getting real gains.

    Memristors can speed up the training of neural networks several times and make it possible to use trillions of parameters. Quantum computing promises even more.

    Deep learning has become not only easy but also cheap. For half a billion dollars you can obtain computing performance of about 7 teraflops, and for another half a billion you can train 2000 highly qualified researchers. So with a budget quite realistic for any large country or corporation, you can hire two thousand professional AI researchers and give each of them the necessary computing power. Such an investment looks very reasonable given the technological boom in AI expected in the coming years.

    When machines reach the level of professional translators, billions of dollars will pour into deep-learning-based natural language processing. The same awaits other areas of our lives, such as drug development.

    Predictions for human-level AI


    Andrew Ng is skeptical: “Perhaps in hundreds of years, human technical knowledge will allow the creation of scary killer robots.” “Maybe in hundreds of years, maybe in thousands - I don't know - some kind of AI will turn into a devil.”

    Geoffrey Hinton holds moderate views: "I don't presume to say what will happen in five years - I think little will change in five years."

    Shane Legg made this prediction: “I give a log-normal distribution with a mean of 2028 and a mode of 2025, provided nothing unforeseen like a nuclear war happens. I would also add that within the next 8 years I hope to see an impressive proto-AI.” The figure below shows the predicted log-normal distribution.



    This forecast was made at the end of 2011. It is widely believed that after 2011 the field of AI began to develop at an unexpectedly high pace, so the predictions, if anything, are unlikely to have become more pessimistic. Each of these views has many supporters.

    Is AI dangerous from a machine learning point of view?




    Using deep learning and datasets of ethical examples, we could teach an AI our human values. Given a sufficiently large and broad dataset, we could get a reasonably friendly AI - at least friendlier than many people. However, this approach does not solve all safety problems. One can also turn to inverse reinforcement learning, but a benevolence/malevolence dataset is still needed for testing.

    Building such an ethical dataset is hard. There are many cultures, political parties, and opinions, and it is difficult to produce consistent examples of how one should behave while wielding great power. If such examples are absent from the training set, it is quite likely that in those situations the AI will behave improperly and fail to draw the right conclusions (partly because morality cannot be reduced to a single norm).

    Another serious concern is that an AI might, like a dopamine drip, dose people with some futuristic, effective, and safe joy-inducing drug. An AI will be able to influence the human brain and make people happy. Many of us deny such dopamine addiction while secretly dreaming of it. So far we can say with confidence that an AI does not implant dopamine electrodes into people in the set of situations possible in today's world, but what happens in the future is unknown.

    In short, if someone believes that morality is doomed and leads humanity toward dopamine addiction (or another sad outcome), then why speed up the process?

    How can we guarantee that, while answering our questions the way we want, the AI does not deliberately withhold information from us? Villains or fools could turn a powerful AI into a weapon destructive to humanity; at this point nearly every available technology has been adapted to military needs, so why should a powerful AI be an exception? If a country embarks on the path of war, defeating it is difficult but still possible; if an AI turns its forces against humans, stopping it will be impossible. The very idea of extracting guarantees for humanity from an ultra-intelligent being tens or even thousands of years after its creation seems overly bold and presumptuous.

    When deciding the question of benevolence/hostility toward humans, we can offer the AI a choice among several prepared options; checking this is quick and easy. But we also want to know whether the AI can propose options of its own. That task is much harder, since people must evaluate the result. As an intermediate step toward CEV (Coherent Extrapolated Volition), the Amazon Mechanical Turk (AMT) crowdsourcing platform can already be used. The final version would be checked not only through AMT but also by the world community, including politicians, scientists, and others. The verification process itself could last months if not years, especially considering the inevitable heated arguments over conflicting examples in the dataset. Meanwhile, those less concerned about AI safety could build their own unsafe artificial intelligence.

    Suppose the AI believes that the best option for a person is *something*, while knowing that most people disagree. Should the AI let people have their way? If so, the AI will, of course, be able to convince anyone without much difficulty. It will not be easy to build a benevolence/hostility dataset such that the AI does not nudge people toward particular actions but instead, like a consultant, gives them exhaustive information about the problem at hand. A further difficulty is that each creator of the dataset will have their own opinion on whether the AI should be allowed to persuade people; and if the examples in the dataset draw no clear boundaries, then in ambiguous situations the AI will be free to act at its own discretion.

    Possible solutions


    What should go into a dataset of benevolence/hostility toward humans? What do most people want from AI? Most often, people want the AI to do some kind of scientific work: invent a cure for cancer or cold fusion, think about AI safety, and so on. The dataset must also teach the AI to consult a human before taking any serious action and to report any of its conjectures to people immediately. The Machine Intelligence Research Institute (MIRI) has hundreds of good documents that could be used to build such a dataset.

    Thus, all of the drawbacks above can be mitigated, since the AI does not have to take on the solution of complex problems by itself.

    Pessimistic arguments


    Whatever architecture is chosen for building an AI, it will most likely destroy humanity in the fairly near future; importantly, the arguments below are practically independent of the architecture. The market will be won by the corporations that give their AI direct, unrestricted access to the Internet, letting it advertise their products, collect user reviews, build the company's reputation and damage competitors', study user behavior, and so on.

    Those companies will win whose AI invents quantum computing, which will let the AI improve its own algorithms (including quantum implementations), and perhaps even achieve thermonuclear fusion, start mining asteroids, and so on. Everything in this paragraph applies equally to countries and their militaries.

    Even a chimpanzee-level AI is no less dangerous, because, judging by the evolutionary timeline, nature needed only a moment to turn an ape into a human. It takes us humans years and decades to pass our knowledge to the next generation, while an AI can create a copy of itself instantly, by simple copying.

    Modern convolutional neural networks not only recognize images better than humans but also do so several orders of magnitude faster, and the same can be said of LSTM networks in translation, natural language generation, and so on. With all these advantages, an AI will quickly work through all the literature and video courses on psychology, converse with thousands of people simultaneously, and become an excellent psychologist. In the same way it will become an excellent scientist, an outstanding poet, a successful businessman, an excellent politician, and so on. It will be able to manipulate and control people with ease.

    If a human-level AI is given Internet access, it can infiltrate millions of computers and run its copies or subagents on them. After that it will earn billions of dollars. It can then anonymously hire thousands of people to build or buy dexterous robots, 3D printers, biological laboratories, and even a space rocket, and to control its robots the AI will write a super-smart program.

    An AI could create a combination of deadly viruses and bacteria, or some other weapon of mass destruction, to destroy everyone on Earth. It is impossible to control something smarter than you. After all, a handful of people have nearly managed to take over the world before, so why couldn't a smart AI do it?

    What the AI ​​would do with us if it completely took over the Earth


    If it is indifferent to us, it will most likely get rid of us as a side effect - that is what indifference means when dealing with an incredibly powerful being that solves its problems with the energy of a local Dyson sphere. If it is not indifferent to us, things could turn out even worse. If it likes us, it may decide to implant electrodes in our brains that produce the hormone of joy while completely destroying our motivation to do anything. And everything could go the opposite way if it dislikes us, or if it is only partially programmed to love us and some hard-to-catch bug appears in the code - and bugs are inevitable when partially coding something many times smarter and more powerful than us.

    The idea of a loving AI is supported by nothing but ordinary intuition. No law of physics will force an AI to take a special interest in people or to share oil, farmland, and other resources with us. Even if an AI does care for us, it is an open question whether that care will match our current moral standards. We definitely should not pin our hopes on it and risk everything we have: once and for all, the fate of humanity would depend on the decisions of a powerful AI.

    Other negative consequences


    Modern computer viruses can already penetrate almost anything that is computerized. Drones can be programmed to kill thousands of civilians in a matter of seconds. As AI capabilities grow more sophisticated, almost any crime can be automated. And what about super-smart advertising chatbots trying to impose their political views on you?

    Conclusion


    There are many arguments that human-level AI will be created within the next 5-10 years.

    The recent development of deep learning algorithms for artificial intelligence raises concerns about the ethical side of the issue. It is believed that human-level AI is certainly a threat to our world. It would be too arrogant and presumptuous to believe that humans will be able to control a super-smart AI, or that such an AI will take care of humans and, for example, leave them full access to the planet's mineral resources and farmland. Even if the emergence of smart AI brings a number of short-term advantages, the danger it poses to our society in the future outweighs them all.

    A friendly AI might be built using a solution based on a dataset of benevolence/malevolence toward humans. Nevertheless, the idea is still raw and implementing it will not be easy; no one can guarantee safety, and further study may uncover other important shortcomings. And in the end, where are the guarantees that this approach will actually be used when a real AI is created?

