0decca November 21, 2013 at 20:18

Internet with immunity, or why God does not play Lego

Most programmers never stop learning. We read books by the gurus and study the code of professionals. We argue about which method is better and which solution is prettier.
But imagine that we managed to get a look at the code of a super professional. What can we learn from him? And what conclusions can we draw?

So: artificial immune systems (AIS).

The real immune system is an extremely complex and poorly understood thing, built from cells with long-term memory whose combined weight is comparable to that of the human brain.
Describing the entire immune system would be expensive, and a detailed article would suit a portal for insomniacs better than Habr. So I will skim over the key points very superficially, skipping many details that matter little for reproduction in silico (however important they are in vivo).

The immune system is usually divided into two parts, innate and adaptive. However, nature is a poor system architect and does not like levels of abstraction, though it does like reusing inherited solutions. The human immune system is a conglomerate of ancient algorithms and newer developments, and the two are not always easy to separate.
So we will start by listing the objects and processes.

1. Lymphocytes. The cells that form the basis of the immune system; an abstract base class for the specialized versions (B-cells and T-cells). Some types of lymphocytes carry on their surface a molecular detector, an antigen receptor, which binds to a protein molecule or to a fragment of one (the closest analogy is a key and a lock); the receptors of T-cells recognize fragments displayed on the major histocompatibility complex (MHC). This detector may be unique within the system. Many types of lymphocytes can change their type during maturation; they have states that switch under the influence of various activators.
1.1. T-lymphocytes. Cells with many subtypes and specializations. Various types inherit from this base class, and they can change their specialization.
1.1.1 T-killers (Tk), killer cells. Staff executioners in the service of total control, purging the territory of their own kind. They destroy cancer cells, damaged cells, and virus-infected cells.
1.1.2 T-helpers (Th). Cells that activate other cells and carry information.
1.2. B-cells. Cells responsible for the body's adaptive response. B-cells can change their genome and grow different detectors at different stages of their development.
1.2.1 Plasma B-cells. The main supplier of antibodies.
1.2.2 Memory B-cells. Long-lived cells that carry information about how to produce a given antibody.
2. Antigen-presenting cells (APC). A group of cells that includes macrophages and dendritic cells (DC). Their purpose is to “show” other cells of the immune system protein fragments, digests formed by splitting cellular proteins into pieces. A fragment is attached to the MHC and exposed on the cell membrane. In effect, this is a mock-up of a “lock” against which keys are selected.
2.1. DC, dendritic cells. The concept of “danger theory” is associated primarily with them. Most of their lives they are inactive, but when problems arise they activate and start working.
3. Antigen (Ag). The part of an external threat that the immune system detects. Usually these are proteins or toxins.
4. Antibodies. Proteins of a characteristic Y shape that can bind to an antigen; also known as immunoglobulins. They include molecular “hinges” for holding various digests. Toxins are neutralized directly when the antibody binds to them. If a cell is affected, the antibody binds to the complexes on the cell surface and marks the cell for subsequent destruction.

When we talk about the “key and lock” metaphor, we must understand that it is only a metaphor.
A key either fits a lock or does not; it is a binary attribute. For a molecular detector, the “fits or does not fit” value is a real number, and it is called affinity. This is a very important point, because it is what allows affinity to be adjusted on the fly.
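
To make the difference between a binary lock and a real-valued affinity concrete, here is a minimal sketch. It assumes detectors and antigens are encoded as equal-length bit strings and measures affinity as the share of matching positions; the encoding, the function names and the 0.75 threshold are illustrative assumptions, not something prescribed by immunology or by any particular AIS library.

```python
def affinity(detector: str, antigen: str) -> float:
    """Fraction of positions at which two equal-length bit strings agree (0.0 to 1.0)."""
    if len(detector) != len(antigen):
        raise ValueError("detector and antigen must have the same length")
    matches = sum(d == a for d, a in zip(detector, antigen))
    return matches / len(detector)


def binds(detector: str, antigen: str, threshold: float = 0.75) -> bool:
    """The binary key-and-lock verdict is just the affinity compared to a threshold."""
    return affinity(detector, antigen) >= threshold


print(affinity("10110010", "10110010"))  # 1.0, identical strings
print(affinity("10110010", "10010011"))  # 0.75, two of eight positions differ
print(binds("10110010", "10010011"))     # True at the default threshold
```

Because the verdict is derived from a number, both the threshold and the detector itself can be tuned gradually, which a literal key and lock would not allow.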

A certain part of the immune system (primarily the innate subsystem) basically follows the traditional detector-reaction scheme that modern antiviruses also use. But not all of it. The real immune system has something that our defense systems lack: processes that make a zero-day attack very unlikely.

We will come back to this later; for now, let us move on to a (heavily abbreviated) description of the processes.

1. Apoptosis is controlled cell death, ordinary genocide within a single organism with the consent of the central government and the general public. Sometimes, for one reason or another, a cell of your own body needs to be destroyed: planned decommissioning of used material. A garbage collector, also known as a bioreactor. The integrity of the membrane is preserved to the very end.
2. Necrosis is unplanned destruction, for example as the result of an external attack. Here the membrane loses its integrity, the contents of the cell spill out, and the body is able to register the mass death. Such death from external causes is the signal to start. After it, the body's forces are mobilized: the temperature rises (speeding up the necessary chemical reactions), blood flow accelerates, and the cell training mechanisms switch on, so that the attack is met fully equipped.

The difference between the two processes is the essence of what is called the Danger Theory.
This is a relatively new theory; it dates back to 1994 and is associated with the name of the legendary Polly Matzinger. The fundamental distinction is that the body (unlike a traditional antivirus) responds not to external signals as such, but to a combination of its internal state and external influences. In other words, the system is essentially reflective. It may not know that someone is attacking it. But it knows when it is ill.

This is a crucial distinction.
In fact, we are moving from a simple “friend or foe” classifier to a complex anomaly detector, which determines the moment when the system begins to lose its integrity and something needs to be done. And this anomaly detector has excellent support for concept drift.
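
The shift from “friend or foe” to a danger-driven reaction can be sketched as follows. This is only a toy rule under stated assumptions: the `novelty` and `damage` scores, both thresholds and the three verdicts are hypothetical names invented for the illustration, and real algorithms such as DCA combine many more signals.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    novelty: float  # hypothetical 0..1 score: how unfamiliar the external input looks
    damage: float   # hypothetical 0..1 score: internal distress (the "necrosis" level)


def friend_or_foe(obs: Observation, novelty_threshold: float = 0.5) -> str:
    """Classic classifier: looks only at the external pattern."""
    return "respond" if obs.novelty >= novelty_threshold else "tolerate"


def danger_model(obs: Observation, novelty_threshold: float = 0.5,
                 damage_threshold: float = 0.3) -> str:
    """Danger-style rule: the internal state gates the reaction to external input."""
    if obs.damage < damage_threshold:
        return "tolerate"      # unfamiliar but harmless: no reaction
    if obs.novelty >= novelty_threshold:
        return "respond"       # the system is hurting and something unfamiliar is present
    return "investigate"       # hurting with no unfamiliar input: look inward


update = Observation(novelty=0.9, damage=0.0)  # e.g. a new but harmless program
attack = Observation(novelty=0.9, damage=0.7)  # unfamiliar input plus internal damage
print(friend_or_foe(update), danger_model(update))  # respond tolerate
print(friend_or_foe(attack), danger_model(attack))  # respond respond
```

The pattern-only classifier raises an alarm for every unfamiliar input, while the danger-style rule stays quiet until the system itself shows damage.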

However, the real immune system also reacts to external patterns.
Bone marrow continuously creates B- and T-cells, many of them with a randomized part of the genome responsible for detection. The sensor part is then tested against the antigens of the body itself, and any cell that detects anything at all of its own is destroyed immediately. This enforces a hard criterion of zero false positives. At the same time, a huge array of sensors with random responses is created, giving maximum coverage of possible attacks without covering the body itself. This stage is called negative selection.
In effect, this is the first step of what the computer world calls a novelty detector.
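
Negative selection as described above can be sketched in a few lines. The bit-string encoding, the matching rule (share of equal positions), the 0.8 threshold and all function names are assumptions made for the illustration; published negative selection algorithms use a variety of other matching rules.

```python
import random


def matches(detector: str, sample: str, min_affinity: float = 0.8) -> bool:
    """A detector fires when enough positions agree with the sample."""
    same = sum(d == s for d, s in zip(detector, sample))
    return same / len(detector) >= min_affinity


def generate_detectors(self_set, count, length, rng, max_tries=100_000):
    """Draw random detectors and discard every one that reacts to 'self'."""
    detectors = []
    tries = 0
    while len(detectors) < count and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s) for s in self_set):
            detectors.append(candidate)  # survived negative selection
    return detectors


def is_nonself(sample: str, detectors) -> bool:
    """Anything that triggers at least one surviving detector is treated as foreign."""
    return any(matches(d, sample) for d in detectors)


rng = random.Random(42)
self_set = ["000000000000", "111111111111", "000000111111", "101010101010"]
detectors = generate_detectors(self_set, count=50, length=12, rng=rng)
print(any(is_nonself(s, detectors) for s in self_set))  # False by construction
```

Note that the zero false positive guarantee holds only for the “self” samples shown during generation; anything normal that was not in `self_set` can still trigger a detector.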
The last item is what is called affinity maturation. It is a rapid change of the genome, somatic hypermutation. When activated in case of danger, the cell begins to modify its genome at random, thereby tuning the detector: changing its generalization, widening or narrowing its capture space, and gaining the ability to catch a specific variant of the attack more precisely.
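
A rough sketch of affinity maturation, in the spirit of clonal selection: clone the detector, mutate the clones, keep the best. The bit-string encoding, the mutation schedule (the rate shrinks as affinity grows) and every name and parameter here are assumptions for the illustration; this is not a faithful copy of CLONALG or of the biology, and it omits the re-check of mutants against “self”.

```python
import random


def affinity(detector: str, antigen: str) -> float:
    """Share of positions at which detector and antigen agree."""
    return sum(d == a for d, a in zip(detector, antigen)) / len(antigen)


def mutate(detector: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < rate else b for b in detector)


def mature(detector: str, antigen: str, rng: random.Random,
           generations: int = 30, clones: int = 20) -> str:
    """Repeatedly clone and mutate the detector, keeping the best variant seen so far."""
    best = detector
    for _ in range(generations):
        # Poor detectors mutate aggressively, good ones only fine-tune.
        rate = max(0.02, (1.0 - affinity(best, antigen)) / 2)
        candidates = [mutate(best, rate, rng) for _ in range(clones)]
        best = max(candidates + [best], key=lambda d: affinity(d, antigen))
    return best


rng = random.Random(7)
antigen = "1100101011110000"
naive = "0000000000000000"
tuned = mature(naive, antigen, rng)
print(affinity(naive, antigen), "->", affinity(tuned, antigen))
```

Since the current best detector always competes with its own clones, affinity never decreases from one generation to the next.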

Now imagine a whole array of sensors scattered throughout the body of your beloved self. All those millions of cells whose weight is comparable to the weight of the brain.
And each of them has a unique detector.
Personally, I am impressed by the scale and capabilities of such a system.
No modern supercomputer is comparable in processing power to the immune system of an average gopnik chewing sunflower seeds, not to mention a mobile cluster of such gopniks.
However, let us not forget that the PM of this project (may his backups never be lost) had unlimited funding and open-ended deadlines, i.e. what we can only dream of. It is possible that with such a budget we would have done better.

By the logic of the narrative, I should now describe the in silico equivalents of the immune system. However, that is an even more thankless task than listing all the variants of genetic algorithms.
I will give only the names of the main methods that grew out of the analysis of immune networks and that I personally found interesting. Those interested can find descriptions in the relevant journals or simply google a lot of text in PDF format.

DCA - dendritic cell algorithm
AIRS - artificial immune recognition system
FIN - formal immune networks
CLONALG - clonal selection algorithm
aiNet - artificial immune network for clustering and filtering
libtissue - library for experiments in the field of immune systems

For professionals I can also recommend the website www.dangertheory.com
Practitioners may be interested in the website www.artificial-immune-systems.org/algorithms.shtml
A number of the algorithms have been ported to Weka, so you can play with them in your personal sandbox.

We now turn to a topic that is closer to the author's heart: a discussion of practical implementation in code. I do not mean implementing algorithms torn piecemeal out of a finished living system. We are interested in a holistic understanding.

A review of the Chief Programmer's code, which has accidentally fallen into our hands, may be more useful than reading books by C++ practitioners. Nature's design patterns are very different from our politically correct restrictions and the rules of “good” code. That is why this article focuses on general ideas rather than specific algorithms: there are too many of them, which suggests that none of them fully works.

The study of immune systems by interdisciplinary teams of biologists, mathematicians and programmers was a very useful step, one that distributed a considerable number of grants among the interested parties. Open-source libraries, the DCA algorithm, the beautiful term Immunocomputing, and a fresh look at information security problems are also significant achievements.
But what do we want to get in the end?
What can we learn from the Chief System Architect?
Reproducing nature's solutions directly in hardware is foolish; otherwise our cars would stand on metal legs instead of wheels. Neural networks have little in common with biological neurons, and genetic algorithms are not an exact copy of the real processes by which the double helix reproduces.

Below I will express the humble opinion of an engineer who is not very well versed in immunology.

1. We are used to classifying processes by our own primitives, which are subjective and strictly human. As a result, every algorithm listed above implements only the part of living nature that we managed to understand. Should a future AIS include an anomaly detector, clustering, supervised or unsupervised learning?
The real immune system has no such components in a distinct form. Unlike us, God does not play Lego. The real purpose of the immune system is not even to distinguish “self” from “non-self”, as was previously thought. Maintaining one's own integrity is something that has no catchy name in IT marketing brochures. So when we say that an AIS includes some component, that is not entirely correct. An AIS is not a classifier, an anomaly detector or data fusion, although it has the corresponding functionality. It is a term of its own; decompose it into components and something is lost.
2. The security system is not a superstructure over the protected object. It is part of that object. It does not protect against attacks as such, since we all live in constant defense against the outside world. It maintains the viability and integrity of the object in an unfriendly environment. Deep reflexivity of the protected system is therefore built in from the start; it cannot be bolted on from the outside as a separate layer.
3. As a consequence of deep reflexivity, the internal sensors should be a random sample from the largest possible list of component states and their combinations. Then the protection can track its own state fairly completely. In the limit, the system should have access to all of its own data, including its code at an arbitrary level of detail. In modern commercial systems this looks unrealistic, given all the politically correct rules of “good” coding and the closed source of components. A Gödel machine is hard to build while adhering to style guides, generally accepted programming practices or expert recommendations.
4. The security system is individual to the protected object. A secretary's computer in a bank may be protected against games of solitaire. A sandbox in an antivirus lab should be open to Trojan infection. In each case the notions of “norm” and “danger” are different. An accounting server and a render farm are too different for it to make sense to build their defenses on a single list of filters, patterns and access lists.
For some people, reading a political post in favor of an unloved president is an attempted invasion, so their Kindle should be protected from such texts. The concept of “danger” expands to “unwanted behavior”. Spam filters, porn filters, opinion filters: these are other markets, but AIS can be applied in each of them, creating Lem's Ethicosphere in the information space.
5. The security system is multi-level, and not in the sense that antivirus marketers write about. It is a conglomerate of subsystems in which not only does each component have its own immune system, but the conglomerate itself is protected by its own rules, independent of the protection of the components. In theory, one can protect a corporate network alongside the individual protection of each computer, since the threats to a computer and to its parent network are different. And in theory one can protect the entire Internet; the system scales very easily.
6. Protection must be adaptive. This does not mean it has to be trained from scratch with every new installation. But it should evolve as new threats appear, without waiting for updates from a fashionable antivirus.
7. A real immune system has a large number of objects but a small number of relationship types; such a low level of coupling is out of reach for modern architects. There is no single coordination center, although dendritic cells and the medium itself do part of the interaction work. It is a purely distributed system that makes the most of distribution and parallelism.
8. The last and most important point. There is no business model for the mass application of such systems, just as there was no business model for peer-to-peer networks before Napster and the CD industry. People from several antivirus companies with whom I spoke confirmed this. However, it was clear from the very beginning: the antivirus market is too large and too capitalized for sudden moves and revolutionary changes.

And, in conclusion, a few thoughts on a possible implementation of a prototype.

Anomaly detectors are the key point. There are many of them, from one-class classifiers to probabilistic models. However, this is a very coarse level that produces many false positives (FP) in real situations. For research prototypes, one could try incremental decision trees, for example asymmetric random forests built from one-class very fast decision trees.
Ideally, this would be done with genetic algorithms, if one could overlook their appetite for resources and weak generalization.
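
For orientation, here is about the simplest incremental one-class detector one could write: a running mean and variance of a single metric (Welford's method) with a z-score cutoff. It is a deliberately crude stand-in, far simpler than the incremental one-class trees mentioned above; the class name, the threshold of 3 standard deviations and the warm-up length are assumptions for the illustration.

```python
import math


class RunningBaseline:
    """One-class detector for a single metric: learns "normal" online, flags outliers."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold  # hypothetical cutoff, in standard deviations
        self.warmup = warmup        # samples to absorb before any verdicts are issued
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford's method)

    def score(self, x: float) -> float:
        """Distance of x from the learned mean, in standard deviations."""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / std

    def observe(self, x: float) -> bool:
        """Return True if x looks anomalous; otherwise fold it into the baseline."""
        if self.n >= self.warmup and self.score(x) > self.threshold:
            return True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False


detector = RunningBaseline()
normal = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9]
print([detector.observe(v) for v in normal + [50, 10]])
```

Only the value 50 is flagged. A detector like this learns from normal data alone, which is exactly why it is coarse: it knows nothing about what an attack looks like, only about what it has seen so far.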

Sensors are random combinations of system states with good selectivity. They can be obtained with genetic algorithms (which is somewhat costly) or with plain Monte Carlo. In any case, sensor design must be fully automatic, with no human involvement, because there have to be too many of them for manual work.
One example of an implementation is DASTON, a set of software sensors with minimal intelligence in the heap of an executing program.

Concept drift. A computer can be handed over from the accounting department to gamers, and the “normal” state will immediately become “abnormal”. The system should be able to forget what is unimportant in the new environment, while remembering the behavior patterns common to both. That is, there should be a characteristic period of reinitialization or retraining. The affinity maturation technique in itself allows this, but its limitations are not yet clear.
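
The “characteristic period” of forgetting can be illustrated with plain exponential forgetting of a single metric. This is a swapped-in, much simpler technique than affinity maturation, shown only to make the idea of a forgetting rate tangible; the class name and the value of `alpha` are assumptions.

```python
class ForgettingBaseline:
    """Exponentially weighted mean and variance of one metric; old data fades away."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha     # hypothetical forgetting rate: larger means faster forgetting
        self.mean = None
        self.var = 0.0

    def update(self, x: float) -> None:
        if self.mean is None:  # the first sample initializes the baseline
            self.mean = x
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)


baseline = ForgettingBaseline(alpha=0.1)
for _ in range(50):
    baseline.update(10.0)      # the accountants' workload
for _ in range(50):
    baseline.update(100.0)     # the gamers' workload
print(round(baseline.mean, 1), round(baseline.var, 1))
```

After the switch, the baseline drifts toward the new “normal” at a speed set by `alpha`. What such a scalar baseline cannot do is the second half of the requirement: remember the patterns common to both environments.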

The system should be able to separate the general from the particular. Exchanging rules and detector components between computers is possible, but only as a form of co-evolution, with no possibility of being compromised by a neighboring machine. Even with co-evolution that is rather hard to guarantee.

The real immune system is often regarded as an anomaly detector. As I mentioned earlier, this is fundamentally wrong. The body knows that it is sick. How can a computer or a network get a signal that something is wrong?
This is a very difficult question. There are no formal signs of a “sick” system. A Trojan differs from a browser only in authorship: both allow remote control and download updates without the user's consent, both collect passwords and can store them remotely. What is a healthy computer, and what is a sick one? A living organism can tell an infection from an accidental change in its condition by observing the level of necrosis. What can a computer observe? The user's reaction? A count of corrupted files? A change in the Kolmogorov complexity of the system?
As soon as we relax the requirements and move from “necrosis” to an abstraction, we lose a certain part of the logic of AIS.
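
Of the candidate signals above, Kolmogorov complexity is not computable, so in practice it can only be approximated, for example by how well the data compresses. The sketch below uses the compression ratio from Python's standard `zlib` module as such a proxy; the function names are made up for the illustration, and whether a shift in this ratio really indicates a “sick” system is exactly the open question.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; a crude, computable stand-in for complexity."""
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)


def complexity_shift(before: bytes, after: bytes) -> float:
    """How much the ratio moved between two snapshots of the same data."""
    return compression_ratio(after) - compression_ratio(before)


logs = b"GET /index.html 200\n" * 500  # highly regular data
noise = os.urandom(len(logs))          # stands in for encrypted or scrambled content
print(compression_ratio(logs) < compression_ratio(noise))  # True
```

A sudden jump of the ratio toward 1.0 across many files would be one possible analogue of mass “necrosis”, although a user who merely archives a folder produces the same jump.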

Understanding that an anomaly has occurred is simple. But how do we decide automatically, without human intervention, what to do next?
Blocking the network address that is the source of the anomalies is easy. Most commercial applications of AIS algorithms deal with network traffic analysis. But there is no guarantee that the address is not the corporate accounting server that a Trojan has got into. And even if it is, should we really stop the payroll of a thousand angry workers, who have their own views on the process, on accounting and on programmers?
Blocking a thread of execution on the computer is also easy. But what if it is a spoofed file system driver? Then the action is equivalent to turning the computer off, which is of course the supreme measure of information protection, but is it useful?
By the terms of the problem, the system must stabilize its own functioning and minimize the damage from attacks. Shutting down a computer or stopping a server to clean it or restore it from backup is itself damage, DoS in its purest form.
And this is the part of the immune system that no one has yet managed to reproduce.
Perhaps the most important one.
Alas, I have not come across a universal solution; so far, all that can be done is an online rollback to a recovery point.
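
The rollback fallback can be shown on a toy scale. The sketch assumes the protected state fits in memory and that some health check exists; `Recoverable`, `guarded_step` and the health predicate are hypothetical names, and a real system would have to snapshot files, processes and network state rather than a dictionary.

```python
import copy


class Recoverable:
    """Keeps one recovery point for an in-memory state."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self) -> None:
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._checkpoint)


def guarded_step(system: Recoverable, step, is_healthy) -> bool:
    """Apply a change without stopping the system; undo it if the health check fails."""
    system.checkpoint()
    step(system.state)
    if not is_healthy(system.state):
        system.rollback()
        return False
    return True


def is_healthy(state) -> bool:
    return state["balance"] >= 0   # hypothetical health check


def bad_step(state) -> None:
    state["balance"] -= 500        # a change that damages the state


def good_step(state) -> None:
    state["balance"] += 50


ledger = Recoverable({"balance": 100})
print(guarded_step(ledger, bad_step, is_healthy), ledger.state)   # False {'balance': 100}
print(guarded_step(ledger, good_step, is_healthy), ledger.state)  # True {'balance': 150}
```

The sketch also shows the limits of the approach: it undoes the damage, but it neither identifies the cause nor prevents the same step from being attempted again.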

And yes, with all this junk on board you still have to take off. The system must not spend 146% of its resources on its own protection. I would like to keep the additional load within the logarithm of the overall system load. Already at this stage, the use of genetic algorithms or SVM starts to become problematic.

And finally, three spoonfuls of tar in the barrel of jam.
Using the internal state as a source for learning is a powerful thing, but an extremely dangerous one.
When the volume of data changes sharply, feedback loops can (and should) arise, and they can be either negative (i.e. very useful to us) or positive (with a very unpleasant effect).
The latter leads to overfitting and self-confirmation: the system spends its resources on learning its own changes to its beloved self.
Such forced self-knowledge leads to autism, and few people will be pleased when the accounting server goes off into computer Zen.
The second drawback of immune systems is their imperfection, obvious to any allergy sufferer. Autoimmune diseases multiply every year. And an analogue of the AIDS virus, which latches onto helper T-cells and disables the entire immune system at once, is possible in a computer too. Moreover, the aging of the body is largely associated with the deregulation of the immune system, arthritis for example.
The third drawback is subjective. I deeply believe that silver bullets either hit the target poorly or are expensive. Everything has its price, and in the case of AIS that price has not yet been determined.
