Cognitive bias of universal intelligence
In previous articles ( http://habrahabr.ru/post/150056/ and http://habrahabr.ru/post/150902/ ), we examined the simplest models of ideal minimal intelligence (IMI), in particular the AIξ model. With some minor reservations, one can almost agree that the "AIXI model is the most intelligent unbiased agent possible" [Hutter, 2007] and that an IMI, given sufficient computing resources and information, would be no more limited in its behavior than a human. That last disclaimer points to the main reason why these models have not led to the creation of real AI and why they can only be considered a first small step toward it. It is important to determine where to go next.
As we noted in the previous articles, what is of real interest is a "pragmatic" intelligence (that is, one optimized for our world), not the "unbiased" intelligence (Pareto-optimal over the entire class of computable environments) described by IMI models. Pragmatic AI cannot be built solely on a purely analytical treatment of the problem of choosing optimal actions in an arbitrary environment. Even the most effective self-optimization would be insufficient, since it would require not only an amount of computation comparable to evolution, but also a corresponding amount of physical interaction between the agent and the environment. The pragmatism of universal AI should instead be ensured by introducing a "cognitive bias", elements of which appear in some form in classical studies of AI and human thinking, whose results are of empirical utility. The information accumulated there cannot simply be discarded, but it needs a special interpretation within the framework of the theory of universal intelligence. We will show that introducing cognitive functions does not expand the fundamental capabilities of IMI, but should increase its effectiveness/pragmatism.
IMI models do not include such a distinguished cognitive function as perception, if by perception we mean not merely receiving sensory data as input, but precisely those specific cognitive processes characteristic of natural systems. At the same time, natural systems of perception have a pronounced structure that imposes a large inductive bias, consistent with the regularities encountered in the real world. This bias is implemented in the form of representations of information and makes a very effective interpretation of sensory data possible without exhaustive search.
Using the example of perception, it should be absolutely clear that IMI models requiring a direct search over algorithmic models, for example for images whose descriptions exceed millions of bits (i.e., the number of models under consideration far exceeds 10^100000), are utterly unrealistic. At the same time, it should be emphasized that the human system of sensory perception retains universality: it can detect a stimulus defined by an almost arbitrary regularity, whereas for any computer vision system it is very easy to find a class of stimuli it fails to identify. This is clearly seen in attempts to model the formation of a conditioned reflex to non-trivial stimuli (see, for example, [Potapov and Rozhkov, 2012] and references therein). It is in this context that the opinion is expressed that, despite significant progress in robotics, artificial intelligence, machine perception, and learning, there is a lack of truly cognitive systems with enough generality to work in an unstructured environment [Pavel et al., 2007].
In IMI, perception is implicit in the process of constructing models: as a specific cognitive function, it is not separated from more complex symbolic models of the world. Such a division itself (a "soft" division!) can be considered a heuristic, but on its own it is naturally not enough, and the general question arises of how to make the process of constructing models in IMI more efficient.
IMI does not involve selecting and fixing models of the environment. For optimal prediction, all possible models, with different weights, are enumerated over all available data and taken into account in the prediction. To us this naturally looks utterly wasteful. It may even seem that a person is inclined to search for one single true model of the world. In particular, all of science is an attempt to build a single unified model of the world governed by unambiguous laws. Of course, in the process of scientific research various competing theories are considered, but in the end a choice is made between them. Moreover, the multiplicity of theories is largely supported not within a single intellect, but through the multi-agent nature of society.
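As a rough sketch of what "all possible models with different weights" means, here is a toy Bayesian mixture in the spirit of algorithmic probability. The model set, description lengths, and data sequence are invented for illustration; the real Solomonoff mixture runs over all computable models, which is precisely what makes it intractable.

```python
# Each "model" maps a bit history to P(next bit = 1).
# Description lengths (in bits) are illustrative assumptions.
models = {
    "always-one":  (2, lambda h: 0.99),
    "always-zero": (2, lambda h: 0.01),
    "alternating": (3, lambda h: 0.99 if not h or h[-1] == 0 else 0.01),
    "fair-coin":   (1, lambda h: 0.5),
}

# Prior weight ~ 2^-L, as in algorithmic probability.
weights = {name: 2.0 ** -length for name, (length, _) in models.items()}

def predict(history):
    """Mixture prediction: every model votes, weighted by its posterior mass."""
    total = sum(weights.values())
    return sum(weights[n] * models[n][1](history) for n in models) / total

def update(history, bit):
    """Bayesian reweighting: each model is scored by the likelihood it assigned."""
    for name, (_, m) in models.items():
        p1 = m(history)
        weights[name] *= p1 if bit == 1 else 1.0 - p1

history = []
for bit in [1, 0, 1, 0, 1, 0, 1, 0]:
    update(history, bit)
    history.append(bit)
# After an alternating sequence, the "alternating" model dominates the mixture.
```

Note that no model is ever discarded: poor models merely lose weight, which is exactly the wasteful-looking behavior discussed above.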
In perception, too, there is a pronounced tendency to choose a single model or interpretation. This is especially evident in bistable illusions, where human vision selects one of two equivalent interpretations. A person can consciously make their vision switch to the other interpretation, but cannot see both options at the same time. This occurs both at fairly low levels, say, of a structural description, and at the semantic level. Many such bistable illusions are well known.
However, it should be emphasized here that choosing a single consistent model for prediction when selecting actions is not only unnecessary but even harmful. A real AI, generally speaking, is not obliged to try to build one (and only one) model of the environment, let alone in explicit form. Attempts to create AI that maintains a global true model (a large base of consistent axioms) ran into significant difficulties: truth-maintenance systems struggle when new information arrives, inductive behavior becomes problematic to implement, and so on.
A person can naturally operate with conflicting models, not just conflicting data, using some models in some cases and others in other cases. One can even say that there are no conflicting data "in nature" (at least, not for the agent); conflicts appear as a heuristic artifact when universal prediction models are simplified. A person may also fail to see or hear something entirely and put forward various assumptions about what was there. That is, the theoretically ideal consideration of all possible models is replaced by a "smart" analysis of the results of model enumeration.
The selection and incremental refinement of models will naturally be necessary in a real AI. Performing induction on all available data and enumerating all possible models at every step is extremely wasteful under limited resources. But at the same time, the resource constraints introduced should not be so rigid that only a single "true" model remains.
In particular, thoughtless simplification of model search and prediction leads to the loss of important forms of behavior (for example, inductive behavior aimed at seeking information). In science, too, the choice between theories is not made on current data alone: different theories are examined simultaneously in order to determine which experiments will provide new information that reduces the existing uncertainty. It is thanks to such inductive behavior (possible only while many models are under consideration) that information is accumulated which increases the difference in the quality of models so much that the choice becomes almost unambiguous. Models that are close in content (and quality) are then treated not as independent models but as one "fuzzy" model. Introducing such indefinite/fuzzy models can yield the effects of inductive behavior even when a single model is chosen: some actions will lead to a greater reduction in uncertainty, which makes it possible to receive more certain reinforcements from subsequent actions. Naturally, the question of how to replace sets of models with "fuzzy" models most efficiently remains open and requires theoretical consideration.
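The effect of inductive behavior described above can be made numerically concrete. The following hypothetical sketch keeps two competing models alive and scores each available action by its expected information gain; the action names and all probabilities are invented for illustration.

```python
import math

def entropy(p):
    """Binary entropy of a belief P(model = A) = p."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# P(observation = 1 | model, action): under "probe" the two models disagree
# sharply; under "wait" they are indistinguishable.
likelihood = {
    "probe": {"A": 0.9, "B": 0.1},
    "wait":  {"A": 0.5, "B": 0.5},
}

def expected_info_gain(action, p_model_a=0.5):
    """Expected drop in entropy over {A, B} after seeing the observation."""
    h_prior = entropy(p_model_a)
    gain = 0.0
    for obs in (1, 0):
        l_a = likelihood[action]["A"] if obs else 1 - likelihood[action]["A"]
        l_b = likelihood[action]["B"] if obs else 1 - likelihood[action]["B"]
        p_obs = p_model_a * l_a + (1 - p_model_a) * l_b
        posterior_a = p_model_a * l_a / p_obs
        gain += p_obs * (h_prior - entropy(posterior_a))
    return gain

# The informative "probe" action wins; "wait" teaches the agent nothing.
```

An agent that had already collapsed to a single model could not compute this score at all, which is exactly why premature model selection destroys inductive behavior.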
So, models appear both as a "caching" of the results of induction obtained at previous time steps and as a restriction of the enumeration of the infinite set of models in universal induction; but these models must be introduced "softly" so that universality is not lost.
Limiting the number of models under consideration is necessary but far from sufficient. Even searching for the single best model (that is, using Kolmogorov complexity instead of algorithmic probability) is unrealistically expensive: the time to find a model of length L will be proportional to 2^L. The complexity of a model depends on the choice of the reference machine (the programming method). But even with a well-chosen reference machine, the length of some models (describing real regularities) will turn out to be too large for these models to be found by direct search. Additional metaheuristics are needed here.
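The 2^L growth is easy to verify directly. A minimal sketch of enumerating binary "programs" of a given length:

```python
from itertools import product

def enumerate_models(L):
    """All bit strings of length L - feasible only for tiny L."""
    return [''.join(bits) for bits in product('01', repeat=L)]

# Search space doubles with every extra bit of model length.
for L in (4, 8, 12):
    assert len(enumerate_models(L)) == 2 ** L

# Cumulative count over all lengths 1..L is 2^(L+1) - 2; for L in the
# millions (an image-sized model) no conceivable machine can enumerate it.
assert sum(2 ** l for l in range(1, 21)) == 2 ** 21 - 2
```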
Within the problem of visual perception, we have already made an attempt to bring universal induction closer to real methods of image analysis [Potapov, 2012], namely the principle of representational minimum description length, which stems from the need to decompose the task of constructing a model of the agent's full interaction history with the environment into subtasks solved almost independently. If a long string of sensory data is divided into substrings, the total complexity estimate of the substrings will be much larger than the algorithmic complexity of the history as a whole, so such direct decomposition is unacceptable. However, if the mutual information of these substrings is extracted and used as prior information when describing each substring individually, the total conditional algorithmic complexity of the substrings will be much closer to the complexity of the history. This mutual information can be interpreted (or expressed) as a representation (a description method). Introducing representations is similar to choosing a reference machine, but differs in two respects: the concept of a representation includes additional decomposition metaheuristics, and different representations can be used for different data fragments, which may correspond to model spaces that are not necessarily algorithmically complete (whereas the reference machine defines a single prior probability distribution over the model space).
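The effect this principle relies on can be illustrated crudely with an off-the-shelf compressor standing in for algorithmic complexity. The strings and the shared "representation" below are invented for illustration; real conditional Kolmogorov complexity is uncomputable, but zlib's preset-dictionary mechanism plays the role of "prior information used when describing each substring individually".

```python
import zlib

# A shared regularity occurring in both "sensory substrings".
representation = b"the quick brown fox jumps over the lazy dog; " * 2
s1 = b"scene one: " + representation
s2 = b"scene two: " + representation

def code_len(data, zdict=None):
    """Compressed length of `data`, optionally given a shared dictionary."""
    if zdict is None:
        comp = zlib.compressobj(9)
    else:
        comp = zlib.compressobj(9, zlib.DEFLATED, 15, 9,
                                zlib.Z_DEFAULT_STRATEGY, zdict)
    return len(comp.compress(data) + comp.flush())

# Describing the substrings independently pays for the shared structure twice;
# conditioning both on the extracted "representation" pays for it once.
independent = code_len(s1) + code_len(s2)
conditional = code_len(s1, representation) + code_len(s2, representation)
```

A full accounting would also charge for the representation itself once; even so, the conditional total stays well below the independent one, which is the point of the decomposition.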
Indeed, taking vision as an example, the description of images (both in natural systems and in applied automatic methods) is always carried out within some a priori representation, whose purpose is not only to shift the probability distribution in the space of environment models, but also to make their decomposition valid. In particular, thanks to a priori image representations, computer vision methods are applicable to each image separately, instead of requiring a large set of diverse images as input, from whose totality the complex patterns occurring in each image could be deduced.
The concept of representation is very productive. It applies to representations of sensory data, representations of knowledge, and mental representations (that is, representations can be called a common cognitive feature of natural intelligence). Even the general idea of hierarchical descriptions, which has independent value, should be considered an important but particular form of representation. Hierarchical decomposition is, of course, potentially more efficient, and representations of this type are very common in machine perception. However, intensive hierarchical decomposition degrades the quality of models built within the corresponding representations. This negative effect can be compensated by introducing adaptive resonance. Then again, in some approaches to strong AI the value of adaptive resonance is absolutized (it is believed to be the key to strong AI). Although the value of the adaptive resonance mechanism is certainly great, it must be understood that it is only one of the metaheuristics that can be formalized within the theory of universal induction.
It is worth noting that representations are not innate, even in the case of sensory perception. There is much evidence that in humans, as in many other animals, representations (even at very low levels of perception) adapt to a specific environment. For AI, this means representations must be constructed automatically, which can also be studied within the theory of universal induction [Potapov et al., 2010] and which should probably be an element of the self-optimization of an effective universal AI, since representation learning is a more concrete and "pragmatic" way of incrementally refining the reference machine that defines the prior probability distribution over the model space.
Above, we spoke of incremental model construction as a way to reduce enumeration as the history gradually lengthens. However, the task of choosing optimal actions also has high computational complexity, and for this task it is natural to introduce incremental decision schemes. Such schemes lead to the concept of planning, which is also one of the cognitive characteristics of humans.
It can be noted that not only IMI but also weak-AI methods based on "brute force" do not use planning. In particular, this applies to successful chess programs, which makes them very different from human players, who almost always rely on plans, following the principle that "a bad plan is better than none" [Bushinsky, 2009]. Planning, including the reuse of search results from previous time steps, saves resources: plans are built in advance (when circumstances allow, that is, when free computing resources are available) and are only refined during execution, so there is no need to rebuild the entire search tree from scratch at every moment. Such a strategy can, of course, be included in IMI, but a good implementation of it is non-trivial. In this sense, a chess program is intelligent but inefficient, unlike a human. However, the fact that for a narrow class of environments such as chess one can get by with inefficient AI does not mean the same is possible for universal intelligence.
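The reuse of search results across time steps can be sketched as follows. The toy two-action environment and the planning depth are invented for illustration; the point is only that after the agent commits to an action, the subtree rooted at the chosen child is kept and extended rather than rebuilt from scratch.

```python
class Node:
    """A node of the agent's search tree over action sequences."""
    def __init__(self, state):
        self.state = state
        self.children = {}            # action -> Node

    def expand(self, actions_fn, depth):
        """Grow the subtree below this node to the given depth, reusing
        any children that already exist."""
        if depth == 0:
            return
        for a in actions_fn(self.state):
            child = self.children.get(a)
            if child is None:
                child = Node(self.state + (a,))
                self.children[a] = child
            child.expand(actions_fn, depth - 1)

    def size(self):
        return 1 + sum(c.size() for c in self.children.values())

actions = lambda s: ("left", "right")   # toy two-action environment

root = Node(())
root.expand(actions, depth=4)           # initial plan, 4 steps ahead
before = root.size()                    # 31 nodes of search performed

root = root.children["left"]            # the agent acts: keep the chosen subtree
reused = root.size()                    # 15 nodes of prior work survive

root.expand(actions, depth=4)           # refine the plan: only the missing ply is added
```

Rebuilding from scratch at every step would redo the whole tree each time; here roughly half the previous step's work carries over, and the saving compounds over a long interaction.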
Planning is closely related to other ways of optimizing search. People make plans and perform search in terms of generalized actions; the more distant the plans, the more abstract the actions in which they are described. Using generalized actions is clearly a heuristic. These actions are also described within some representations, but they are not directly derived in the theory of universal induction. In practice, in weak-AI methods such representations are specified a priori, and specific planning algorithms are developed for them. This is clearly not enough for universal AI.
Besides planning itself as incremental search, and representations for the search space, there are many heuristic techniques for reducing search. On the one hand, search and optimization methods such as heuristic programming, simulated annealing, genetic algorithms, etc., are highly developed in classical AI. On the other hand, there is currently no general solution to the search problem. It is very likely that a single a priori effective search method cannot exist, and some kind of self-optimization strategy is inevitable, since different heuristics and specific search methods suit different tasks.
Currently, there is no theory of effective pragmatic general self-optimization capable of inventing arbitrary search heuristics. However, even if a method of such self-optimization existed, it would require some general metaheuristics to accelerate it (otherwise it would not be pragmatic).
In general, it is clear that planning, like other methods of reducing enumeration, is "only" an element of optimizing computing resources. It can be introduced non-heuristically, maintaining exact correspondence with IMI, but in that form it will not be very effective. More heuristic implementations of planning will not work for all possible environments, but they can be very effective for a specific yet very wide class of environments. This is how such essentially heuristic concepts (meaningless for certain classes of environments) arise as the suspension and resumption of a plan's execution. At the same time, the question remains open as to which particular planning mechanisms (and which search metaheuristics) should be made innate.
Knowledge plays a special role in human intelligence, yet IMI makes no explicit use of knowledge. Instead, it builds holistic models of the history of interaction with the environment without explicitly extracting knowledge from them. In principle, knowledge is often viewed simply as the upper level of hierarchical models of perception and control (for example, as the upper level of the visual system). In this reading, little can be added to what has already been discussed in the sections on perception, planning, and representations. However, knowledge systems have their own characteristics. In particular, only representations of knowledge (and not lower-level representations) are modality-independent and describe "meaning", and knowledge serves more than the description of internal models of the environment.
In general, representations of knowledge could probably be discovered in the process of IMI self-optimization, but that process would require an extremely long interaction with the environment. Useful representations of the environment, abstracted from specific modalities, can further accelerate the extension of IMI toward a pragmatic, effective strong AI. But, again, these representations should not limit the universality of IMI, as happens in almost all existing cognitive architectures and in narrower knowledge-based systems.
One can say that IMI models have memory, but of the most primitive kind: they simply store all raw data without performing any other function. Yet memory is one of the central elements of most cognitive architectures, and human memory is far more complex, its functions going well beyond storage. As is well known, the main function of human memory, and the main problem for reproducing it on a computer, is retrieval by content. Say, we can recall some event, place, object, or person from a verbal description, an image fragment, a pencil sketch, etc.
IMI models have nothing of the kind. Does this mean that a universal agent will be unable to exhibit the behavior our memory makes available to us? Not at all. We need memory first of all for prediction: we remember the past in order to predict the future (or at least make better choices in it). It is difficult to come up with any other biological purpose of memory. In fact, natural memory is so closely integrated with the functions of induction and prediction that it is practically inseparable from them in pure form. The special organization of memory exists because this is computationally most efficient (given the features of our world). Let us substantiate the second thesis separately from the first. If our memory merely stored raw data (for example, as one long film), then to find scenes in this "film" satisfying some search criterion, one would have to re-watch the entire movie, processing every scene. What would be the point, if the "movie" has already been watched and interpreted once? Naturally, it is more economical to remember the descriptions already constructed and to search directly among them.
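The "film versus descriptions" argument can be sketched in a few lines: an index over already-constructed descriptions answers content queries without replaying the raw history. Scene ids and tags are invented for illustration.

```python
raw_film = []          # raw history: what IMI stores, useless for direct lookup
memory_index = {}      # tag -> set of scene ids (the "descriptions")

def remember(scene_id, raw_frames, tags):
    """Store raw data once and index its interpretation once."""
    raw_film.append((scene_id, raw_frames))
    for tag in tags:
        memory_index.setdefault(tag, set()).add(scene_id)

def recall(*query_tags):
    """Retrieve scenes matching all query tags without replaying the film."""
    sets = [memory_index.get(t, set()) for t in query_tags]
    return set.intersection(*sets) if sets else set()

remember(1, "frames-001", {"beach", "dog", "sunset"})
remember(2, "frames-002", {"office", "meeting"})
remember(3, "frames-003", {"beach", "meeting"})
```

Recall cost now depends on the number of matching descriptions, not on the length of the whole history, which is the economy the text argues natural memory implements.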
With unlimited resources such economy is unnecessary, and at each point in time IMI simply re-processes the entire interaction history. It has already been noted [Goertzel, 2010] that the absence of memory as a cognitive structure in IMI is tied to the assumption of unlimited resources. But as soon as we want to increase the realism of our universal agent by taking limited resources into account, we will be forced to complicate the memory structure and integrate it with the procedures for constructing models, predicting, and choosing actions.
Beyond its special functions, natural memory has a certain organization (episodic/semantic, short-term/long-term, etc.). This organization follows in part from the aspects already considered. For example, the IMI model reproduces the entire interaction history, which simultaneously describes episodic and semantic content. As soon as representations are introduced that do not reproduce specific data but define the "terms" in which that data is described, the corresponding separation of memory types appears. To understand many features of the organization of memory, one must consider the dynamics of how representations unfold in time.
There are other features of memory organization that can provide additional elements of cognitive bias or heuristics for the search for models and actions. For example, an obvious heuristic is the presence of modality-specific memory. From this follows the banal (but important in the context of IMI) conclusion that, to simplify induction (the process of constructing models), data of different modalities are interpreted relatively independently. Such a separation seems too natural to question, but again we emphasize that it is essentially heuristic and far from complete.
Here is another very revealing feature of the organization of human memory: chunks, which are even taken as the basis of some cognitive architectures [Gobet and Lane, 2010]. They are probably associated with the ultimate decomposition of models in memory (that is, the division of the entire memorized set of objects into minimal groups united by individual models). Chunks may be only an epiphenomenon of the decomposition of the induction problem, but they clearly show how strongly real intelligence tries to minimize its costs in solving it.
Thus, the characteristics of human memory are an important source of elements of “cognitive bias,” but a proper understanding of these features also requires detailed analysis within the framework of universal intelligence.
Symbolic and sub-symbolic levels
In the methodology of classical AI there is a fairly strict division into sub-symbolic (e.g., neural network) and symbolic (e.g., logical) methods. It also manifests itself in the division of cognitive architectures into emergent and symbolic. There is now a tendency to combine the two approaches, in particular in the form of hybrid architectures. But the very presence of such a division is noteworthy. Indeed, IMI has no such division. Does a human have it; that is, is an explicit division into symbolic and sub-symbolic levels a feature of natural cognitive architecture?
It is quite obvious that the separation of two such different levels stems from the fact that the upper level is accessible through consciousness and the lower through neurophysiological studies (whose results can be most directly associated with the lower levels of sensorimotor representations). Intermediate levels are simply inaccessible to direct observation and therefore much less studied. In this connection, AI sometimes singles out the "middle layer problem" (or the problem of the "semantic gap") as one of the most difficult. However, the presence of intermediate levels of organization, although it somewhat softens the symbolic/sub-symbolic dichotomy, does not cancel the fact that separable levels of organization exist in natural intelligence.
Such a clear division is unlikely to arise on its own if it is not laid down architecturally. In particular, it could hardly be distinguished in the models formed by IMI (even if concepts of different levels were present in these models, they would be hopelessly mixed). In natural intelligence, not only do the environment models under construction have a pronounced multi-level structure, but the methods of operation at different levels also differ markedly (at the least, it is noteworthy that consciousness attaches mainly to models of the higher levels). Thus, at the sub-symbolic level mainly typical patterns in a large array of sensory data are taken into account, while the symbolic level works with arbitrary patterns in highly reduced data. However, this is only a general description of the levels. With their introduction, universality must still be preserved, in the form of direct and feedback connections between levels, and in the form of the possibility of constructing any computable predicates (basic perceptual concepts) at the sub-symbolic (and intermediate symbolic) level. An example is the laws of gestalt (the laws of perceptual grouping), which are typical for all people but can nevertheless differ between cultures (which can manifest, for example, in (in)susceptibility to certain optical illusions). In other words, the laws of perceptual grouping correspond to typical patterns in sensory data, but whether these patterns are put into the corresponding representations may depend on the features of ontogenesis.
All this can be interpreted as a general a priori structure of representations and heuristics of model building within their framework, which (in addition to the concept of representations) provide significant savings in computing resources (but without a fatal violation of universality).
There are many cognitive characteristics of human thinking that are in some way connected with association. It is a many-sided phenomenon, since association can operate both on representations and on models, and at all levels of abstraction. But in all cases there is something in common.
Obviously, the decomposition of real problems, both of induction and of the choice of actions, always turns out to be incomplete. Processes related to association can be interpreted as processes that establish possible connections between data elements, models, and representations that, as a result of decomposing the single task facing a universal intellect, were treated as independent.
The most obvious such interpretation is for the decomposition of the induction problem. An association is established between two models of fragments of sensory data if there is mutual information between them, which can be expressed in statistical terms (frequent co-occurrence) or in structural terms (the existence of a simple algorithm translating one model into the other). The latter also underlies analogies and metaphors.
An example of the most complex form of association is transfer learning, in which representations from one subject area are carried over to others. That this is possible and useful at all is evidence of special properties of our world, on which transfer learning relies. The mere existence of such connections, of mutual information between different subject areas, is not some special property of our environment (it would rather be surprising if there were no such connections at all); what is indicative is the ability to find and use these connections under limited resources. This clearly points to the universality of human intelligence, to the absence of strict restrictions on the structure of the representations it constructs.
It is difficult to say whether the mechanisms of transfer learning, the establishment of associations, analogies, and metaphors differ significantly, or whether these are different applications of the same mechanism. But all of them (like the adaptive resonance mentioned above) can be seen either as ways to reduce resource requirements or as ways to eliminate the negative effect of that reduction, depending on whether we are moving from pragmatic effective intelligence toward universality or from universal intelligence toward efficiency/pragmatism. At present, transfer learning is considered separately from the problems of universal AI, and it is not surprising that modern transfer learning models are overly specialized: in them, the mapping between the two representations (between which the knowledge transfer is carried out) is always set manually and works only for them. The apparent universality of transfer learning in humans suggests that it should lie very close to the core of universal AI.
Once again, IMI does not provide for transfer learning separately, and it is not needed there (but only thanks to unrealistically unlimited resources): all mutual information between all pieces of data is already taken into account, and transfer at the level of search heuristics is unnecessary because there are no search heuristics. Since transfer is carried out at the level of representations, in theory it should appear together with them and allow a smoother transition from universal AI to the use of representations in real AI.
Transfer learning [Senator, 2011] is an example of the most developed form of association. No less remarkable (owing to its extreme prevalence) is association at the lowest level. At the behavioral level these are conditioned reflexes (and at the even lower, neural-network level, Hebb's rule). Of course, association itself is often understood as something more complex than just conditioned reflexes (for example, according to V.F. Turchin, associating is a control system over complex reflexes, that is, a metasystem relative to them). However, their basis is the same.
Association is often considered an independent (sometimes fundamental) principle of natural thinking, contrasted with induction (supposedly, induction necessarily constructs explicit models, while association is model-free and tied not to any directed optimization but to unique principles of self-organization). Of course, behind association stands a very effective metaheuristic reflecting a regularly occurring feature of our world (which, roughly speaking, boils down to the fact that the closer two events are in time and space, the more likely they are related; developed association, of course, is not limited to this). Naturally, heuristics (including association) are not derivable from the theory of unbiased universal intelligence, and in this sense they can be considered additional principles.
However, association can be considered neither the sole nor the main basis of thinking. This is clearly seen in the example of Hebb's rule and in the example of reflexes. Thus, Hebb's rule alone is not enough to solve complex learning problems involving the construction of invariants. In the case of reflexes, the main difficulty is not strengthening the connection between two known stimuli, but singling out classes of connected stimuli, which can be described by arbitrary patterns (the stimulus can be merely the switching-on of a light bulb, a light bulb of a certain brightness or color, a double flash that is first brighter and then dimmer, etc.). It is noteworthy that different animals differ in their ability to identify patterns in stimuli. Thus, chickens cannot learn to choose the lighter feeder (out of several feeders, in one of which lies food invisible until the moment of choice). And even monkeys find it difficult to detach themselves from the local context (for example, to use objects that are not currently in sight to reach fruit). Human intelligence is universal, and this universality is not explained by association but is combined with it.
Reasoning is what is often considered thinking proper. Is there any reasoning in AIξ? In a sense, there is. Some of our reasoning boils down to determining where this or that action will lead us (and all of AIξ's resources are spent precisely on determining this). Say, thinking over an upcoming conversation, we can imagine what we will be told and what we will be able to answer, as well as what emotions we will experience. However, our reasoning is far from always directly related to predicting what sensory input and what reinforcement we will get when performing certain actions. Often we think about things that are not directly related to us. And often our reasoning (that which is introspectively available to us) contains no hint of induction. Indeed, it is deduction that is more often associated with thinking. It is no accident that in many expert systems reasoning is modeled using inference mechanisms. IMI models, however, contain no inference mechanisms at all. In some models, such as AIξtl or Gödel machines, logic is introduced to justify statements about algorithms, but this has almost nothing to do with ordinary reasoning. Does this not indicate that something fundamental is lacking in IMI?
In fact, this does not necessarily follow. In methods of deductive inference, an enumeration of admissible chains of inference rules is performed until a provable statement or its refutation is obtained. Such enumeration is similar to the enumeration carried out in IMI for one fixed model of the environment. The clear difference is that in IMI it is performed at every moment of time for complete models of environments, which are themselves enumerated anew on the basis of the information just received. In effective pragmatic systems this is simply impossible: one has to consider models of only fragments of the environment, and even treat these models as fixed. The analyzed fragment of the environment may not be directly connected with us, and we can consider actions that are not our own (for example, we can think about what will happen to a planet if a supernova explodes near it). The tendency to analyze fragments of reality that are only very indirectly connected with us (and even to create imaginary worlds) is quite curious, but it requires a separate discussion and most likely relates to issues of motivation (the objective function). It is difficult to imagine that IMI, in order to maximize its objective function, will (albeit virtually) indulge in abstract thoughts about the structure of the Universe (or rather, strive to obtain the information necessary for this), but there is no contradiction here, especially if it receives reinforcement for creating good cosmological theories.
What is important for us now is that deductive analysis of models of environmental fragments is associated with saving resources. The results of computing an algorithmic model for different sequences of actions can be remembered and reused as long as this model remains unchanged. Naturally, the methods of such resource saving are closely related to the issue of representations (including declarative representations, which may have to do with extending the concept of computability) and can be highly nontrivial. And, of course, they are not derivable from IMI. Thus, logic can be interpreted as a meta-representation useful for analyzing fragments of our world specifically, since the possibility of distinguishing objects and relations is its general (though not strict) property, which could well be irrelevant in some other reality, where our logic would be useless. Here much work needs to be done to identify the principles of effective reasoning (which are as far from an exhaustive search over elementary actions as image-processing methods are from universal induction based on algorithmic probability). At the same time, "caching" the results of analyzing fixed models raises additional questions related to updating these results when new information is obtained (in particular, this is the well-known closed-world problem, or non-monotonic reasoning), which also requires a solution.
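The idea of remembering and reusing the results of a fixed model can be sketched as ordinary memoization; the `predict` function below is a hypothetical stand-in for an expensive simulation of an environment-fragment model, not anything from the IMI literature:

```python
from functools import lru_cache

# A hypothetical fixed (unchanging) model of an environment fragment:
# given a sequence of actions, it predicts an outcome. While the model
# stays fixed, its results can be cached and reused instead of recomputed.
@lru_cache(maxsize=None)
def predict(actions: tuple) -> int:
    # stand-in for an expensive simulation of the model
    state = 0
    for a in actions:
        state = (state * 2 + a) % 97
    return state

predict((1, 0, 1))   # computed once
predict((1, 0, 1))   # the second call is served from the cache
```

The catch mentioned above appears here directly: as soon as new information changes the model, all cached results become suspect and must be invalidated (e.g. `predict.cache_clear()`), which is exactly the update problem of non-monotonic reasoning.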
Here again, one might think that IMI contributes nothing to the solution of problems known without it (for example, inference and truth maintenance). However, we note again that IMI poses these known problems in a much more general way. Thus, within universal intelligence, predicate logic appears only as a meta-representation, which has a heuristic nature and does not have to be specified a priori: a self-optimizing universal intelligence (for example, a human) can itself learn logic and its effective use; our task is to create such an intelligence and reduce its training time to something acceptable. This can be easier than manually creating many special-purpose methods, just as it is easier, say, to implement a certain teaching method than to lay in all the necessary particular facts by hand. In doing so, when creating AI, one should try to achieve less cognitive bias than in humans. So, although an a priori preference for dividing the perceived world into objects with properties and relations may well speed up learning, this preference should not be too strict.
Interaction with other intelligent agents is a very significant part of the environment. These agents are very complex, so inductive reconstruction of suitable models of other agents would require a very long interaction with the real world and a huge amount of computing resources. Naturally, some theory of mind (the ability to model minds, in particular those of other agents) should be built into an effective pragmatic AI. But it should be added to universal AI as an element of cognitive bias, which biases the models without imposing insurmountable restrictions on them.
Social interactions are not limited to predicting the behavior (or reconstructing the models) of other agents as part of the environment. Naturally, social agents interact with each other in the same way as with the rest of the environment: through their sensors and effectors. But through them they can transmit to each other fragments of environment models, behavioral strategies, and even elements of objective functions. In reality, it is society that forms complex objective functions, inductive bias, and search heuristics (in the form of ethics, science, art, etc.), thanks to the exchange of information and computing resources between agents. An unbiased universal agent can, given sufficient time (if during this time someone ensures its survival), learn to interpret sensory data correctly, extracting this information from them (although some innate mechanisms will be required for learning objective functions). But an effective pragmatic intelligence must have this ability a priori, that is, have an inductive preference for social environments [Dowe et al., 2011] or a "communication prior" [Goertzel, 2009]. Of course, the more highly developed an animal is, the less prepared for independent life its young are born, and universal AI can be forgiven long "postnatal" helplessness, but such a priori skills as picking out the images of other agents in the sensory stream and imitation can very noticeably shorten the period of complete helplessness.
An essential (but not the only) aspect of social interactions is language. The analysis of language in the context of universal agents is still little studied. For example, the importance of two-part coding has been discussed (within the framework of the minimum message length principle), which allows agents to effectively exchange the regular parts of models, separated from noise [Dowe et al., 2011]. But the bulk of the important questions still require detailed analysis. This includes the semantic grounding of symbols, as well as the obvious point that for universal agents it will be most effective (at least at first) to adopt the knowledge accumulated by mankind, for which one needs to understand natural languages, which are associated with certain ways of representing knowledge.
One additional important aspect of multi-agent interactions is that the environment is much more complex and computationally powerful than the agent itself. This aspect is not a heuristic or inductive bias, but it also requires consideration in IMI models.
Emotions are often seen as a component of cognitive architecture, so they must also be discussed. At the same time, emotions are clearly related to the objective function, therefore their purpose (unlike other elements of cognitive architecture) cannot be completely reduced to saving resources and reducing training time, for which search heuristics and inductive bias serve.
We have already briefly discussed the objective function problem in previous articles. A “good” objective function (for example, at least accurately estimating survival) cannot be specified a priori. An innate objective function is a rough “heuristic” approximation of some “true” objective function. For example, pain and pleasure are a very rough approximation of the fitness function - death can be painless, and a life-saving operation can be accompanied by severe pain. Emotions and other components of assessing the quality of a situation allow a more accurate approximation. Some of them are congenital. Others are acquired throughout life.
In this connection, it is worth distinguishing heuristic approximations of the true objective function from quality assessments of states that take into account the potential values of the objective function associated with expected (predicted) states. Thus, we can avoid situations that we fear without thinking each time about the causes of the fear. And these are ordinary heuristics, which do not define the maximized objective function but reduce the enumeration of possible actions under a fixed objective function. For example, the pleasure of satisfying curiosity and aesthetic pleasure can be introduced as separate components of the basic objective function (for which there are models based on algorithmic information theory [Schmidhuber, 2010]). Or an intelligent agent may be curious if it is able to predict from experience that obtaining new information will be useful for its survival (more precisely, for obtaining bodily pleasure and avoiding pain). Since this is a difficult prediction task, the agent can develop a "sense of curiosity" as an element of an approximate assessment of future rewards, in order to save computing resources. It is worth emphasizing that these two options are fundamentally different, because they correspond to different maximized objective functions, so that the corresponding agents may in some situations make different choices. Moreover, in reality both each of these options separately and their combination can occur.
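The distinction between the two options can be made concrete with a minimal sketch; all numbers and the specific form of the value estimates here are arbitrary assumptions for illustration only:

```python
# Two ways to obtain "curious" behavior:
# (a) curiosity as a term of the basic objective function itself;
# (b) curiosity as a learned heuristic estimate of future external reward.

def value_a(external_reward, info_gain, beta=0.5):
    # Variant (a): the maximized objective itself includes an
    # information-gain bonus, i.e. it is a different objective function.
    return external_reward + beta * info_gain

def value_b(external_reward, predicted_future_reward):
    # Variant (b): the objective is unchanged; a learned estimate of
    # future external reward replaces an expensive explicit prediction.
    return external_reward + predicted_future_reward

# The two agents can rank the same pair of actions differently:
explore = {"external": 0.0, "info": 1.0, "future": 0.2}
exploit = {"external": 0.4, "info": 0.0, "future": 0.0}
a_prefers_explore = (value_a(explore["external"], explore["info"])
                     > value_a(exploit["external"], exploit["info"]))
b_prefers_explore = (value_b(explore["external"], explore["future"])
                     > value_b(exploit["external"], exploit["future"]))
# with these toy numbers, agent (a) chooses to explore, agent (b) does not
```

This is precisely the point of the paragraph above: the two agents maximize different functions, so they can make different choices in the same situation.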
Since human intelligence is effective and pragmatic, in it the total future reinforcement is predicted mostly without explicit reference to the basic objective function. Because of this, learning the objective function itself is closely intertwined with learning heuristics for predicting its future values. Moreover, the corresponding learning mechanisms have their own inductive bias, including in the form of certain a priori representations. For example, for some emotions there may be no innate mechanisms for "computing" their values, while the distinguished types of emotions may themselves be set at the level of representations. Because of all this, with respect to human emotions, feelings, etc., it is difficult to determine to what extent each of them belongs to the terms of the basic objective function and to what extent to the heuristics for predicting its future values. For this reason psychology has still not reached a consensus on the mechanisms of emotions, their role, and their origin. Nevertheless, this whole part of the cognitive system of natural intelligence is quite interpretable within the framework of universal intelligence models, including in terms of increasing their effectiveness.
A cognitive function such as attention is a very broad phenomenon. However, it is quite obvious that its emergence is associated with resource restrictions. For example, visual attention is directed at the most informative or significant (in terms of the objective function) parts of the scene, which means that these parts are analyzed in detail using more resources than other parts. Naturally, the distribution of resources in solving other cognitive tasks can also be interpreted as attention.
This thesis can also be extended to multi-agent architectures. It can hardly be supposed that intelligence is in principle incapable of solving many problems in parallel while maintaining some unity. At least for IMI this is possible (and does not threaten it with schizophrenia), since IMI can work with any number of different data sources and carry out induction and action selection simultaneously in as many separate "bodies" as needed. If different data sources contain mutual information, this will be taken into account "automatically". That is, with unlimited resources, an intelligence that processes data coming from different bodies has no need to focus on any one of them. The phenomenon of attention arises when resource limits are introduced, which implies processing first of all those pieces of data that are most informative or significant in terms of the objective function.
Here it is worth noting another side of the phenomenon of attention: it can also be interpreted as directing actions at a particular object. Such "external" attention concerns the distribution of time not among internal computing operations but among external actions. Such "external" attention in IMI should be realized "automatically": the universal agent should be quite capable, say, of turning a camera toward a sharp sound in order to obtain information essential for avoiding a strong decrease in the value of its objective function (naturally, if this agent has a priori information indicating a possible connection between loud sound and danger). But there is no internal attention as a distribution of limited resources in IMI, so IMI by itself tells us nothing about how such attention is organized in humans.
There are many models of attention for cognitive architectures (for example, [Iklé et al., 2009], [Harati Zadeh et al., 2008]). One can say that attention mechanisms are present even in simple universal solvers (for example, [Hutter, 2002]), which take into account computational complexity and try to allocate resources optimally among the various hypotheses under consideration. Naturally, more developed attention mechanisms should be present in an effective pragmatic AI. But the details of these mechanisms depend heavily on the other parts of the cognitive architecture. Thus, meaningful attention models should be developed in conjunction with resource-limited extensions of the IMI models.
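As a hedged sketch of what "allocating resources among hypotheses" can mean in such solvers, one common scheme (in the spirit of Levin search, simplified here to a toy setting with hypothetical numbers) gives each candidate program of description length l a share of compute proportional to 2**(-l):

```python
# Attention-like resource allocation among hypotheses: shorter
# (a priori more probable) hypotheses receive more compute.
def allocate(lengths, budget):
    """Split a compute budget among hypotheses in proportion to the
    universal prior weight 2**(-l) of each hypothesis of length l."""
    weights = [2.0 ** -l for l in lengths]
    total = sum(weights)
    return [budget * w / total for w in weights]

# Three hypotheses of lengths 1, 2, 3 bits and 700 units of compute:
shares = allocate([1, 2, 3], budget=700)
# the shortest hypothesis gets twice the compute of the next one
```

This is only an illustration of the allocation principle, not the full scheduling scheme of [Hutter, 2002].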
It is often believed that the main thing separating a computer from a human is the former's lack of functions such as self-awareness, understanding, etc. This opinion is characteristic not only of people far from AI but also of those who deal with it (at least in philosophical terms). Searle even defined strong AI as an AI possessing all these features. And the impossibility of true understanding is exactly what Penrose and other proponents of the belief that strong AI is impossible attribute to computers.
Many experts, not limiting themselves to general discussions of AI but engaged in developing specific solutions in this area, see much more serious difficulties in, for example, the problems of search, learning, knowledge representation, etc., while these "human" functions are not considered so complicated. Thus, self-awareness is interpreted simply as a top-level control module that receives and processes information about the work of the other blocks of a cognitive architecture. It is clear that such metacognitive functions cannot be fully realized without intelligence itself. Then the computer is not endowed with self-awareness not because it is something mysterious and inherent only in humans, but because the more basic functions have not been implemented. Because of this, technical specialists often shy away from these aspects of thinking, considering them "humanitarian" and, in contrast to philosophers, interpreting them too simplistically. Nevertheless, metacognitive functions are beginning to attract more attention [Anderson and Oates, 2007] and are even implemented in some form in some cognitive architectures [Shapiro et al., 2007] (although these implementations are quite interesting and informative, in our opinion they are "weak"). Their discussion cannot be avoided entirely in the context of a conversation about universal intelligence.
Indeed, in IMI models neither self-awareness nor understanding is realized in explicit form, which raises the natural question of whether something important is missing in these models. It is clear from an analysis of a number of metacognitive functions (meta-learning, meta-reasoning) [Anderson and Oates, 2007] that their purpose is to compensate for the non-optimal functioning of the basic cognitive functions. Moreover, the only possible reason for errors, say in learning, that the agent can correct itself is that insufficient resources were allocated to the corresponding learning problem. Indeed, when universal induction with unlimited resources is used, the result cannot, in principle, be improved on the same data, and meta-learning is pointless. Of course, metacognitive functions are not reduced simply to redistributing resources (that is just one particular device, the prerogative of attention). Thus, in the case of learning, resource saving can manifest itself in using only part of the data, ignoring context, using simplified representations, etc. And meta-learning should be concerned not with devoting more resources to a universal learning method, but with evaluating the success of the learning block and bringing in, for example, more general methods when simpler ones have failed. Hence the concept of the so-called metacognitive cycle, in which "what went wrong and why" is to be determined [Shapiro and Göker, 2008].
This interpretation of metacognitive functions is too general, and questions arise with respect to specific functions. Thus, understanding (which, admittedly, is not always interpreted as a metacognitive function, but which, in our opinion, possesses undoubted attributes of such functions) is not so easy to associate with "cognitive bias". There are many examples showing that specific (weak) AI systems do not realize understanding. But these examples do not indicate the fundamental impossibility of machine understanding; rather, they allow us to determine the role of understanding in saving resources. We have already examined the classic chess position ( http://habrahabr.ru/post/150056/ ) in which a computer program capable of beating a grandmaster plays incorrectly because of a lack of understanding of the position. With unlimited computing resources, as a result of deep enumeration, the program could avoid the erroneous move. Moreover, such an (algorithmic) description of the given situation is possible (for example, in the form of an evaluation function) that would allow the error of the corresponding move to be detected. That is, understanding a situation is associated with the use of a representation that allows effective actions to be chosen without a high expenditure of computing resources.
A similar conclusion can be drawn from other examples. The following classical problem is indicative. There is a board of 8x8 cells from which two corner cells lying on the same diagonal have been removed. It is required to tile the board with 1x2 domino tiles. An ineffective intelligence (lacking understanding but possessing unlimited resources) could go through all the tiling options. But a person experiences the effect of understanding upon imagining that this board has a checkerboard coloring, so that it contains 32 cells of one color and 30 of the other, while each tile necessarily covers one cell of each color; hence no tiling exists. Choosing the right representation makes the task elementary. Perhaps even more indicative are such tasks on "creative thinking" as constructing four equilateral triangles from six matches. Here, too, the choice of representation of the situation is of fundamental importance.
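The counting step behind the coloring argument can be checked directly; the sketch below simply colors the mutilated board and counts cells of each color:

```python
# Color the 8x8 board like a chessboard, remove two corners lying on
# the same diagonal, and count the remaining cells of each color.
removed = {(0, 0), (7, 7)}          # two same-colored corner cells
cells = [(r, c) for r in range(8) for c in range(8) if (r, c) not in removed]
dark = sum((r + c) % 2 == 0 for r, c in cells)
light = sum((r + c) % 2 == 1 for r, c in cells)
# 30 cells of one color and 32 of the other remain; since every 1x2 tile
# covers one cell of each color, at most 30 tiles fit, so the 62-cell
# board cannot be tiled completely.
```

The exhaustive search the "ineffective intelligence" would perform is replaced here by a two-line count, which is exactly the resource saving that the right representation buys.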
Understanding images, too, is associated with constructing their descriptions within certain representations (which, as a rule, should facilitate the execution of adequate actions). Apparently, the same can be said about understanding natural language, although additional problems arise there.
Perhaps understanding is not the very use of effective representations, but a metacognitive function that gives an assessment (accessible to consciousness) of the effectiveness of the representations in use. If a person cannot understand something, or understands it insufficiently well, he is often (though not always) aware of this; likewise, the sensation of achieving clear understanding is available to a person, which should probably be connected with the problem of self-optimization.
Access to the internal content of thinking processes is characteristic of all metacognitive functions, and is integrally expressed in the phenomenon of self-consciousness. There is nothing of the kind in IMI in explicit form (it has no need to monitor its own thoughts: they are already ideal), but this does not mean that it would be unable to behave as a self-aware agent. Yet will it be able to use expressions such as "I think", "I believe", "I know", "I can", "I want", "I remember", etc. correctly, if its action-selection methods receive no information about their own working (and the use of such expressions may be important for survival in the existing multi-agent environment)? It is not easy to answer this question unequivocally. Perhaps IMI will be able to use these expressions correctly (pragmatically) without understanding their meaning, but this will require extremely extensive experience of interaction with the social environment and, of course, unlimited computing resources. Indeed, pronouncing words is not fundamentally different from any other motor output, and if there is a computable mapping between input data and the required output actions, IMI can reconstruct it from a suitable interaction history. Nevertheless, the possibility of "unconscious" performance of actions requiring the missing introspective information continues to raise doubts. Fortunately, it is not necessary to dispel these doubts, since for creating an effective pragmatic AI, access to this information is useful not only for communication with other agents but also for self-optimization.
Using this information is nontrivial. Placing IMI in an environment that includes other IMIs leads to a contradiction (one agent models another agent, which in turn models the first agent, and so on to infinity). Complete introspection would cause a similar contradiction. This contradiction is removed by introducing resource restrictions, which, however, violate the abstract ideal intelligence of IMI. This means that the problem of introspection (and, in general, the "theory of mind" problem) is not solved within IMI and requires the development of additional principles. Although the theory-of-mind problem (and metacognitive functions in general) is associated with cognitive bias (especially as regards self-optimization heuristics), it may also be related to a lack of universality in the basic IMI models themselves.
We have analyzed some cognitive features of human thinking, which can quite naturally be interpreted as heuristics and inductive bias, which ensure the effective pragmatism of natural intelligence, that is, its acceptable work in a certain class of environments under conditions of limited resources and training time.
In general, the requirement of limited resources is not new, and it is fairly obvious that many cognitive features stem from it. However, so far no nontrivial examination of the connection between the mathematical theory of universal AI and complex cognitive architectures has been carried out [Goertzel and Iklé, 2011]. To establish such a connection, it is necessary not merely to describe cognitive functions superficially, but to introduce them rigorously as an extension of the IMI models while preserving the universality that these models possess. We will address this problem in future articles.
(Anderson and Oates, 2007) Anderson ML, Oates T. A Review of Recent Research in Metareasoning and Metalearning // AI Magazine. 2007. V. 28. No. 1. P. 7–16.
(Bushinsky, 2009) Bushinsky Sh. Deus Ex Machina - A Higher Creative Species in the Game of Chess // AI Magazine. 2009. V. 30. No. 3. P. 63–70.
(Dowe et al., 2011) Dowe D., Hernández-Orallo J., Das P. Compression and Intelligence: Social Environments and Communication // Lecture Notes in Computer Science 6830 (Proc. Artificial General Intelligence - 4th Int'l Conference). 2011. P. 204–211.
(Gobet and Lane, 2010) Gobet F., Lane PCR. The CHREST Architecture of Cognition. The Role of Perception in General Intelligence // E. Baum, M. Hutter, E. Kitzelmann (Eds), Advances in Intelligent Systems Research. 2010. V. 10 (Proc. 3rd Conf. on Artificial General Intelligence, Lugano, Switzerland, March 5-8, 2010). P. 7–12.
(Goertzel, 2009) Goertzel B. The Embodied Communication Prior // In: Yingxu Wang and George Baciu (Eds.). Proc. of ICCI-09, Hong Kong. 2009.
(Goertzel, 2010) Goertzel B. Toward a Formal Characterization of Real-World General Intelligence // E.Baum, M.Hutter, E.Kitzelmann (Eds), Advances in Intelligent Systems Research. 2010. V. 10 (Proc. 3rd Conf. on Artificial General Intelligence, Lugano, Switzerland, March 5-8, 2010.). P. 19–24.
(Goertzel and Iklé, 2011) Goertzel B., Iklé M. Three Hypotheses About the Geometry of Mind // Lecture Notes in Computer Science 6830 (Proc. Artificial General Intelligence - 4th Int'l Conference). 2011. P. 340–345.
(Harati Zadeh et al., 2008) Harati Zadeh S., Bagheri Shouraki S., Halavati R. Using Decision Trees to Model an Emotional Attention Mechanism // Frontiers in Artificial Intelligence and Applications (Proc. 1st AGI Conference). 2008. V. 171. P. 374–385.
(Hutter, 2002) Hutter M. The Fastest and Shortest Algorithm for All Well-Defined Problems // International Journal of Foundations of Computer Science. 2002. V. 13. No. 3. P. 431–443.
(Hutter, 2007) Hutter M. Universal Algorithmic Intelligence: A Mathematical Top→Down Approach // In: Artificial General Intelligence. Cognitive Technologies, B. Goertzel and C. Pennachin (Eds.). Springer, 2007. P. 227–290.
(Iklé et al., 2009) Iklé M., Pitt J., Goertzel B., Sellman G. Economic Attention Networks: Associative Memory and Resource Allocation for General Intelligence // In: B. Goertzel, P. Hitzler, M. Hutter (Eds), Advances in Intelligent Systems Research. 2009. V. 8 (Proc. 2nd Conf. on Artificial General Intelligence, Arlington, USA, March 6-9, 2009). P. 73–78.
(Pavel et al., 2007) Pavel A., Vasile C., Buiu C. Cognitive vision system for an ecological mobile robot // Proc. 13 Int'l Symp. on System Theory, Automation, Robotics, Computers, Informatics, Electronics and Instrumentation. 2007. V. 1. P. 267–272.
(Potapov and Rozhkov, 2012) Potapov AS, Rozhkov AS. Cognitive Robotic System for Learning of Complex Visual Stimuli. 2012. (in press)
(Potapov et al., 2010) Potapov AS, Malyshev IA, Puysha AE, Averkin AN New paradigm of learnable computer vision algorithms based on the representational MDL principle // Proc. SPIE 2010. V. 7696. P. 769606.
(Potapov, 2012) Potapov AS Principle of Representational Minimum Description Length in Image Analysis and Pattern Recognition // Pattern Recognition and Image Analysis. 2012. V. 22. No. 1. P. 82–91.
(Schmidhuber, 2010) Schmidhuber J. Artificial Scientists & Artists Based on the Formal Theory of Creativity // In: E. Baum, M. Hutter, E. Kitzelmann (Eds), Advances in Intelligent Systems Research. 2010. V. 10 (Proc. 3rd Conf. on Artificial General Intelligence, Lugano, Switzerland, March 5-8, 2010). P. 145–150.
(Senator, 2011) Senator TE. Transfer Learning Progress and Potential // AI Magazine. 2011. V. 32. No. 1. P. 84–86.
(Shapiro et al., 2007) Shapiro SC, Rapaport WJ, Kandefer M., Johnson FL, Goldfain A. Metacognition in SNePS // AI Magazine. 2007. V. 28. No. 1. P. 17–31.
(Shapiro and Göker, 2008) Shapiro D. and Göker MH Advancing AI Research and Applications by Learning from What Went Wrong and Why // AI Magazine. 2008. V. 29. No. 2. P. 9–10.