How regular expressions appeared

Short introduction

I have always been interested in the history of the emergence of scientific concepts. Before studying a new subject, a series of faceless definitions first arises. Some of them remain such, others attract attention and eventually grow into full-fledged objects of the "picture of the world." As an inaccessible ideal of such an aspiration, we can cite Littlewood's statement about Ramanujan :
every natural number was his best friend

It was always interesting to me not only to master the concept, but also to understand how it appeared. Behind every definition is always a personality. It is interesting to understand what ideas were the basis of this or that concept and why the new definitions were accepted and supported by other people with such enthusiasm that they remained in the textbooks.

Next will be a small study of this kind, the object of which is the concept of regular expression .

McCallock-Pitts neural networks

Surprisingly, neural networks served as the main motivation for introducing the concept of regular expression. A little bit about the history of awareness of this concept.

The concept of a neural network appeared as a result of interdisciplinary interaction. In 1942, Walter Pitts, an American logician who worked mainly in the field of cognitive psychology, met the famous physiologist Warren McCallock. McCallock came to Chicago to work at a local university. Pitts had no housing and McCallock offered to live with him. A consequence of the nascent relationship was the work "The logical calculus of ideas related to nervous activity."

The work proposed a model of human nervous activity. This model was based on an abstraction of the concept of a neuron. The mathematical object resulting from such an abstraction is called the McCallock-Pitts neuron. The McCallock-Pitts neuron is a formal model of a nerve cell composed of a body and several processes called nerve fibers. In the McCallock-Pitts model, a cell is a cell equipped with a special numerical attribute, the so-called threshold. From each neuron comes an edge that simulates a nerve fiber and is called an axon. An axon can branch, but it is necessarily the input to one or more other neurons. Thus, in addition to the outgoing fiber (axon), the neuron has several incoming fibers (synapses).

Some of the inputs may be exciting, others may be inhibitory. Exciting fibers are indicated by a regular arrow, and inhibitory fibers by a dashed arrow. Each neuron generates a signal on the axon only if the number of exciting input signals exceeds the threshold of a given neuron. Axons are indicated by solid outgoing arrows. The body of a neuron is indicated by a triangle. Thus, the McCallock-Pitts neuron can be represented as follows:
Neurons are combined into networks called McCallock-Pitts neural networks. The entire network functions as a state machine. To do this, the concept of time is introduced in the network. At each moment of time, the output signal from a neuron goes to the input of the neuron to which this neuron is connected as an input. Thus, fixing the time instants, we obtain the state of the nervous network, and the transition from one state to another occurs with increasing current time.

The axons of some neurons in the network can be synapses for the same neurons, i.e. neurons can form “loops”. The presence of loops is very important for the McCallock-Pitts neural network. Loops allow you to memorize the signals that were input. If a signal was supplied to the input of a neuron, then you can excite the axon, which is the input for the same neuron, and thus, from now on, this neuron will be in an active state, self-excited with each new moment in time.

Consider, for example, the McCallock-Pitts neural network, which is the simplest storage device. Actually, the storage device will be one neuron with a loop, the remaining network elements are "serving" elements.

The excitation of a certain signal at the output of the network will be called an event. The question arises: is it possible to determine its background from a given event, i.e. the sequence of signals passing through the network at previous points in time so as to reach the entrances to this network? If we take into account that the McCallock-Pitts networks were invented to simulate brain activity, then this question can be reformulated as follows: what external factors led to the state of mind in which it is in a given situation? It turns out that the presence of loops does not make it possible to determine exactly when a given event occurred, and the presence of alternatives (inputs from two different neurons into a given neuron) does not make it possible to determine exactly where this event occurred. McCallock and Pitts thought

Regular events

American mathematician Stephen Kleene studied events at McCallock-Pitts networks. In "Representing Events in Nerve Networks and State Machines," Kleene proposed an elegant way of describing such events using the language of regular expressions . We can say that any McCallock-Pitts network is representable in the form of a regular expression describing the inputs of this network and the background of the input signals, which led to the fact that one of the output signals was excited.

Recall that an event in the McCallock-Pitts network is the appearance of a signal at its output, i.e. on an axon, which is not a synapse for any neuron of a given network. Kleene studied the question: what excitations of any inputs (i.e., synapses of some neurons of a network that are not axons of any neurons of a given network) into the network led to a signal appearing at this output? As an example, consider the network shown in the figure below:

Suppose that a signal is applied to the input of the synapse indicated by the symbol a. The only neuron of this network has threshold 2, therefore, the axon of this neuron will be excited only if the synapse of start S is active. In this case, the axon R is active, but its branch, again entering the neuron, is also active. What can we say at time t about what is the background of the appearance of the signal R at the output? It turns out only that at time t-1 the axon a was definitely active. It can also be assumed that at time t-2 and earlier, axon a could also be excited. We cannot say for sure whether it was in reality, and if it was, at what point in time the axon a became active. Therefore, we express our hypothesis by a *. Thus, the background of the occurrence of event R can be described by the expression a * a. Obviously everywhere

There are also inaccuracies of a different kind. Consider the following network:

Here, the output synapse R is excited if the axons c and the synapse of the previous neuron are active. What can we say about when this synapse will be excited? It turns out that only one of the axons a or b is excited. We denote this fact by the expression a | b. Thus, the background of the occurrence of event R can be described as (a | b) c.

In the examples we examined, we got a convenient way of describing events in McCallock-Pitts networks. Three operators are used for description: *, | and the concatenation (join) operator. It turns out that with the help of these three operators it is possible to describe events in any McCallock-Pitts network. Events that would not be described in this way do not exist on McCallock-Pitts networks. This is the subject of proof of the so-called Kleene's theorem. Events that can be described by the above operators, Kleene called regular (in the sense that there are no other events). Regular events can be described by expressions, using the operators *, | and concatenation. Such expressions are called regular expressions. Given that each McCallock-Pitts network is a state machine,

Forgotten returns

The work of Kleene came out in the mid-50s of the twentieth century. As often happens with scientific concepts that do not find application, the work has been forgotten. This would continue to this day, but at the end of the 60s, the American programmer Ken Thompson discovered that regular expressions are convenient for using string search patterns in long texts. The regular expression is converted to a state machine, which searches for strings matching patterns. To build a finite state machine, Thompson came up with a special algorithm, which is now called the "Thompson Construction". Thompson incorporated the regular expression search engine into his ed text editor, and since then regular expression has become the de facto standard for defining search patterns.

The history of the concept of regular expression provides a good opportunity to see how the objects of human thought evolve, traveling from one area of ​​human activity to another. These areas can be very far from each other, and the exchange of concepts between these areas often looks unexpected. Interestingly, Kleene’s in-depth work in which regular expressions were introduced did not make this concept known. But, in fact, a simple utilitarian application of the concept raised regular expressions to the peak of fame.

Also popular now: