The Practical Business of Ontology: A Tale from the Front Lines

Original author: Stephen Wolfram


A translation of Stephen Wolfram's post “The Practical Business of Ontology: A Tale from the Front Lines”.

Philosophy of Chemicals


“We just have to decide: is a chemical like a city or like a number?” I spent my day yesterday, as I have most days for the past 30 years, designing new features of the Wolfram Language. And yesterday afternoon one of my meetings was a lively discussion about how to extend the language's chemistry capabilities.

At some level the problem we were discussing was a very practical one. But as so often happens, what we're doing ends up connecting with some deep intellectual issues. And to actually get the right answer, and to successfully design language features that will stand the test of time, we had to dig into those depths and talk about things that usually wouldn't come up outside of a philosophy seminar.

Part of the issue, of course, is that we're dealing with things that have never really come up before. Traditional computer languages don't try to talk directly about things like chemicals; they just deal with abstract data. But in the Wolfram Language we're trying to build in as much knowledge about everything as possible, and that means we have to deal with actual things in the world, like chemicals.

We've built a whole system in the Wolfram Language for handling what we call entities. An entity could be a city (like New York), or a movie, or a planet, or a zillion other things. An entity has a name (“New York”). And it has definite properties (like population, land area, founding date, ...).

We've long had a notion of chemical entities, like water, ethanol, or tungsten carbide. Each of these chemical entities has properties, like molecular mass, structure graph, or boiling point.
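To make the idea of an entity with a name and properties a little more concrete, here is a minimal Python sketch. The `Entity` class, the `PROPERTIES` store, and the `set_property`/`get_property` helpers are hypothetical names chosen for illustration; they are not the Wolfram Language's actual entity machinery, and the only data filled in are the well-known molecular formulas of water and ethanol.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Entity:
    """Hypothetical sketch of a named entity (not the Wolfram Language API)."""
    entity_type: str  # e.g. "City" or "Chemical"
    name: str         # the name used to refer to the entity


# Hypothetical property store: (entity, property name) -> value.
# Frozen dataclasses are hashable, so entities can be used in dict keys.
PROPERTIES: dict[tuple[Entity, str], Any] = {}


def set_property(entity: Entity, prop: str, value: Any) -> None:
    PROPERTIES[(entity, prop)] = value


def get_property(entity: Entity, prop: str) -> Any:
    # Returns None when no value has been recorded for this property.
    return PROPERTIES.get((entity, prop))


water = Entity("Chemical", "Water")
ethanol = Entity("Chemical", "Ethanol")
set_property(water, "MolecularFormula", "H2O")
set_property(ethanol, "MolecularFormula", "C2H6O")

print(get_property(water, "MolecularFormula"))  # H2O
print(get_property(ethanol, "BoilingPoint"))    # None: nothing recorded here
```

The key design point is that an entity is referred to as a single opaque unit by its name; its properties are looked up, never derived from any internal structure.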

And we've got many hundreds of thousands of chemicals for which we know lots of properties. But all of these are in a sense concrete chemicals: specific compounds that we could put in a test tube and do experiments with.

But what we were trying to figure out yesterday was how to handle abstract chemicals: chemicals that we just specify abstractly, say by giving a graph that represents their chemical structure. Should these be represented by entities, like water or New York? Or should they be considered more abstract, like lists of numbers, or, for that matter, mathematical graphs?

Well, of course, among the abstract chemicals we can construct are ones we already represent as entities, like sucrose or aspirin. But there's an important distinction. Are we talking about individual molecules of sucrose or aspirin? Or about these things as bulk materials?

At some level it's a confusing distinction. Because we might think that once we know the molecular structure, we know everything, and it's just a matter of calculating it out. And some properties, like molar mass, are basically trivial to calculate from the molecular structure. But others, like melting point, are very far from trivial.
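The “trivial” end of that spectrum fits in a few lines: molar mass is just a sum of atomic weights over the atoms in one molecule. This Python sketch assumes rounded standard atomic weights for a handful of elements and takes a molecule as a hypothetical element-to-count dictionary. Nothing comparably short exists for a bulk property like melting point.

```python
# Rounded standard atomic weights in g/mol; a small illustrative subset.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(atom_counts: dict[str, int]) -> float:
    """Sum the atomic weight of every atom in one molecule (g/mol)."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in atom_counts.items())


water = {"H": 2, "O": 1}
ethanol = {"C": 2, "H": 6, "O": 1}

print(round(molar_mass(water), 3))    # 18.015
print(round(molar_mass(ethanol), 3))  # 46.069
```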

Well, but is this just a temporary issue that one shouldn't base a long-term language design on? Or is it something more fundamental that will never change? Conveniently, I've done enough basic science to know the answer: yes, it's something fundamental. It's connected to what I call computational irreducibility. For example, the precise value of, say, the melting point for an infinite amount of some material may actually be fundamentally uncomputable. (This is related to the undecidability of the tiling problem: fitting tiles together is like seeing how molecules will arrange themselves to form a solid.)

So by knowing this (rather advanced) piece of basic science, we know that it makes sense to carefully distinguish bulk versions of chemicals from individual molecules. Clearly there's a close relation between, say, water molecules and bulk water. But there's still something fundamentally and irreducibly different about them, and about their properties.

At least the atoms should be fine


OK, so let's talk about individual molecules. Obviously they're made of atoms. And at least when we talk about atoms, we're on fairly solid ground. It might be reasonable to say that any given molecule always has some definite collection of atoms in it, though maybe we'll want to consider “parametrized molecules” when we talk about polymers and the like.

But at least it seems safe to treat types of atoms as entities. After all, each type of atom corresponds to a chemical element, and there are only a limited number of those on the periodic table. Of course, in principle one can imagine additional “chemical elements”, and one could even think of a neutron star as a kind of giant atomic nucleus. But again there's a reasonable dividing line: there are almost certainly only a limited number of fundamentally stable types of atoms, and most of the others have ridiculously short lifetimes.

There's an immediate footnote, however. A “chemical element” isn't quite as definite a thing as one might imagine, because in general it's a mixture of different isotopes. And, say, from one tungsten mine to another, that mixture might change, giving a different effective atomic mass.
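The effective atomic mass of an element is an abundance-weighted average of its isotope masses, which is why it can shift from one sample to another. In this Python sketch the two “mines” and their abundances are hypothetical numbers chosen for illustration, not measured data, and each isotope's mass is approximated by its mass number.

```python
def effective_atomic_mass(mix: dict[int, float]) -> float:
    """Abundance-weighted mean of isotope masses.

    `mix` maps isotope mass (approximated here by mass number) to relative
    abundance; abundances are normalized, so they need not sum exactly to 1.
    """
    total = sum(mix.values())
    return sum(mass * fraction for mass, fraction in mix.items()) / total


# HYPOTHETICAL tungsten isotope mixes from two mines (illustrative only).
mine_a = {182: 0.25, 183: 0.15, 184: 0.30, 186: 0.30}
mine_b = {182: 0.30, 183: 0.15, 184: 0.30, 186: 0.25}

print(round(effective_atomic_mass(mine_a), 2))  # 183.95
print(round(effective_atomic_mass(mine_b), 2))  # 183.75
```

Same element, different mixture, different effective mass: which is exactly why it helps to keep a single “tungsten” entity and treat the mine as a qualifier on a property.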

And actually this is a good reason to represent types of atoms by entities. Because then one just needs a single entity representing tungsten that can be used in talking about molecules. Only if one wants properties of that type of atom that depend on qualifiers, like which mine it came from, does one have to deal with such things.

In a few cases (like heavy water), one will need to talk explicitly about isotopes in what is essentially a chemical context. But most of the time, specifying a chemical element is enough.

To specify a chemical element you just have to give its atomic number Z. And then textbooks will tell you that to specify a particular isotope you just have to say how many neutrons it contains. But that ignores the unexpected case of tantalum. Because one of the naturally occurring forms of tantalum (180mTa) is actually an excited state of the tantalum nucleus, which happens to be very stable. And to specify it properly, you have to give its excitation level as well as its neutron count.
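The point about 180mTa is that (Z, N) alone is not always enough as a key: a third field is needed for the excitation state. The `Nuclide` class below is a hypothetical Python sketch of such a key. It uses the common “m” suffix convention for long-lived isomers and deliberately leaves out spin and parity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nuclide:
    """Hypothetical key for a nuclear species.

    z: atomic number (protons), which fixes the element.
    n: neutron count, which fixes the isotope.
    isomer: 0 for the ground state, 1 for the first long-lived excited
            (isomeric) state, and so on.
    """
    z: int
    n: int
    isomer: int = 0

    @property
    def mass_number(self) -> int:
        return self.z + self.n

    def label(self, symbol: str) -> str:
        # Conventional notation: "m" for the first isomer, "m2", "m3", ... after that.
        if self.isomer == 0:
            suffix = ""
        elif self.isomer == 1:
            suffix = "m"
        else:
            suffix = f"m{self.isomer}"
        return f"{self.mass_number}{suffix}{symbol}"


ta180 = Nuclide(z=73, n=107)             # tantalum-180, ground state
ta180m = Nuclide(z=73, n=107, isomer=1)  # tantalum-180m, the long-lived excited state

print(ta180.label("Ta"), ta180m.label("Ta"))  # 180Ta 180mTa
print(ta180 == ta180m)                         # False: same Z and N, different state
```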

In a sense, quantum mechanics comes to the rescue here. Because while there are an infinite number of possible excited states of a nucleus, quantum mechanics says that all of them can be characterized by just two discrete values: spin and parity.

Each isotope and each excited state is different and has its own special properties. But the world of possible isotopes is much more orderly than, say, the world of possible animals. Because quantum mechanics says that everything in the world of isotopes can be characterized simply by a limited set of discrete quantum numbers.

We went from molecules to atoms to nuclei, so why not talk about elementary particles too? Well, that gets trickier. Yes, there are well-known particles like electrons and protons that are pretty easy to talk about, and that are readily represented by entities in the Wolfram Language. But there are lots of other particles. Some of them, like nuclei, are pretty easy to characterize: you can say things like “this is a particular excited state of a charm-quark-anti-charm-quark system”. But in particle physics one is dealing with quantum field theory, not just quantum mechanics. One can't just “count elementary particles”; one also has to deal with the possibility of virtual particles and so on. And in the end the question of what particles can exist is a very complicated one, rife with computational irreducibility. (For example, the question of what stable states there can be in the gluon field is a much more elaborate analog of the tiling problem I mentioned in connection with melting points.)

Maybe one day we'll have a complete fundamental theory of physics. And maybe it'll even be simple. But exciting as that would be, it's not going to help much here. Because computational irreducibility means that there's essentially an irreducible distance between what's underneath and the phenomena that emerge from it.

So in creating a language to describe the world, we need to talk in terms of things that can actually be observed and computed about. We need to pay attention to the basic physics, not least so that we avoid setting up things that will later lead to confusion. But we also need to pay attention to the actual history of science, and the actual things that have been measured. Yes, there are, for example, an infinite number of possible isotopes. But for many purposes it's perfectly useful just to set up entities for the ones that are known.

Space of possible chemicals


But is it the same in chemistry? In nuclear physics, we think we know all the reasonably stable isotopes that exist, so any additional, exotic ones will be very short-lived, and therefore probably unimportant in practical nuclear processes. But chemistry is a completely different story. There are tens of millions of chemicals that people have studied (and, say, put into papers or patents). And there's really no limit on the molecules one might want to consider, and that could be useful.

So how can we refer to all these potential molecules? Well, as a first approximation we can specify their chemical structures, by giving graphs in which every node is an atom and every edge is a bond.

What really is a “bond”? While it's incredibly useful in practical chemistry, it's at some level a fuzzy notion: a kind of semiclassical approximation to a full quantum mechanical description. There are some standard extra labels, such as double bonds and ionization states. But in practice chemistry gets done very successfully just by characterizing molecular structures with appropriately labeled graphs of atoms and bonds.
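A labeled graph of this kind can be sketched with plain Python tuples: nodes carry an element symbol and a formal charge, and edges carry a bond order. The representation and the helper names are assumptions for illustration, not a standard format. Summing the bond orders at each atom shows how the double-bond labels get used, and summing formal charges gives the ionization state.

```python
# Hypothetical labeled molecular graph: each node is an atom (element symbol,
# formal charge); each edge is a bond (atom index, atom index, bond order).
Atom = tuple[str, int]
Bond = tuple[int, int, int]


def bond_order_sums(atoms: list[Atom], bonds: list[Bond]) -> list[int]:
    """Total bond order at each atom (a double bond counts as 2)."""
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    return totals


def net_charge(atoms: list[Atom]) -> int:
    """Sum of formal charges: the net charge of the molecule or ion."""
    return sum(charge for _, charge in atoms)


# Carbon dioxide, O=C=O: two double bonds to the central carbon.
co2_atoms: list[Atom] = [("O", 0), ("C", 0), ("O", 0)]
co2_bonds: list[Bond] = [(0, 1, 2), (1, 2, 2)]

# Hydroxide ion, OH-: the negative charge is a label on the oxygen node.
oh_atoms: list[Atom] = [("O", -1), ("H", 0)]
oh_bonds: list[Bond] = [(0, 1, 1)]

print(bond_order_sums(co2_atoms, co2_bonds))  # [2, 4, 2]
print(net_charge(oh_atoms))                   # -1
```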

OK, so should chemicals be represented by entities or by abstract graphs? If it's a chemical one has already heard of, like carbon dioxide, an entity seems convenient. But what if it's a new chemical that's never been discussed before? One could think about inventing a new entity to represent it.

Any self-respecting entity, though, should have a name. So what would that name be? In the Wolfram Language, it could just be the graph that represents the structure. But maybe one wants something that looks more like an ordinary textual name: a string. There's always the IUPAC way of naming chemicals, with names like 1,1′-{[3-(dimethylamino)propyl]imino}bis-2-propanol. And there's the more computer-friendly SMILES version: CC(CN(CCCN(C)C)CC(C)O)O. Whatever the underlying graph, one can always generate one of these strings to represent it.

But there's a wrinkle: the string isn't unique. In fact, however one chooses to write down the graph, the result can't always be unique. A particular chemical structure corresponds to a particular graph. But there can be many ways to draw that graph, and many different representations of it. And in fact even the “graph isomorphism” problem of determining whether two representations correspond to the same graph can be difficult to solve.
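One standard way around non-unique representations is canonical labeling: renumber the atoms so that every drawing of the same graph produces the same encoding. The Python sketch below does this by brute force over all atom orderings, which is only feasible for tiny molecules; practical cheminformatics tools use far more efficient canonicalization algorithms. Hydrogens are left implicit, and the graph format is a hypothetical one chosen for illustration.

```python
from itertools import permutations


def canonical_form(elements: list[str], bonds: list[tuple[int, int, int]]) -> tuple:
    """Brute-force canonical labeling of a small molecular graph.

    Tries every renumbering of the atoms and keeps the lexicographically
    smallest (element labels, sorted bond list) encoding, so two drawings of
    the same structure get the same result. Cost grows as n!, so this is a
    toy for a handful of atoms only.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] is the new index assigned to that atom.
        relabeled = [""] * n
        for old, new in enumerate(perm):
            relabeled[new] = elements[old]
        new_bonds = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        key = (tuple(relabeled), tuple(new_bonds))
        if best is None or key < best:
            best = key
    return best


# Heavy atoms only. The SMILES strings "CCO" and "OCC" both describe
# ethanol, just written starting from different atoms.
ethanol_1 = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])       # C-C-O
ethanol_2 = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])       # O-C-C
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])  # C-O-C

print(canonical_form(*ethanol_1) == canonical_form(*ethanol_2))       # True
print(canonical_form(*ethanol_1) == canonical_form(*dimethyl_ether))  # False
```

Ethanol and dimethyl ether have the same atoms (C2H6O) but different graphs, so a correct canonical form must keep them apart while merging the two ethanol drawings.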

What is a chemical, in the end?


OK, so let's imagine we represent a chemical structure by a graph. At first, it's an abstract thing. There are atoms as nodes in the graph, but we don't know how they'd be arranged in an actual molecule (and, for example, how many ångströms apart they'd be). Of course, the answer isn't completely well defined. Are we talking about the lowest-energy configuration of the molecule? (What if there are several configurations with the same energy?) Is the molecule supposed to be on its own, or in water, or in something else? How was the molecule supposed to have been made? (Maybe it's a protein that folded a particular way as it came off the ribosome.)

If we just had an entity representing, say, “naturally occurring hemoglobin”, maybe we'd be better off. Because in a sense that entity could encapsulate all these details.

But if we want to talk about chemicals that have never been synthesized, this is a slightly different story. And it seems to me that we would be better off with an abstract representation of any possible chemical substance.

But let's talk about some other cases and analogies. Maybe we should just treat everything as an entity. Every integer could be an entity, for example. Yes, there are infinitely many of them, but at least it's clear what names they should be given. With real numbers, things are already messier. For instance, there's no longer the same uniqueness as with integers: 0.99999... is actually the same number as 1.00000..., but it's written differently.
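The equality of 0.99999... and 1.00000... follows from summing a geometric series, using the standard result that the sum over k ≥ 1 of r^k equals r/(1 - r) whenever |r| < 1, here with r = 1/10:

```latex
0.99999\ldots \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}}
\;=\; 9 \cdot \frac{1/10}{1 - 1/10}
\;=\; 9 \cdot \frac{1}{9} \;=\; 1
```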

What about sequences of integers or, for that matter, mathematical formulas? Every possible sequence or every possible formula could conceivably be a different entity. But that wouldn't be particularly useful, because much of what one wants to do with sequences or formulas is to go inside them and transform their structure. What's convenient about entities, by contrast, is that each one is a single “blob” that one never needs to go inside.

So what's the story with “abstract chemicals”? It's a bit of a mixture. But certainly one will want to “go inside” and transform the structure, which argues for representing the chemical by a graph.

But then there's a potentially nasty discontinuity. We've got the carbon dioxide entity, about which we already know lots of properties. And then we've got this graph that abstractly represents the carbon dioxide molecule.

We might worry that this would confuse both people and programs. But the first thing to realize is that we can distinguish what these two things represent. The entity is the bulk, naturally occurring version of the chemical, whose properties can potentially be measured. The graph is an abstract theoretical chemical, whose properties would have to be computed.

But obviously there has to be a bridge. Given a concrete chemical entity, one of its properties will be the graph that represents the structure of its molecule. And given a graph, one needs some kind of ChemicalIdentify function, which, a bit like GeoIdentify or perhaps ImageIdentify, tries to determine from the graph which chemical entity (if any) has a molecular structure corresponding to that graph.
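ChemicalIdentify is described here as a function still to be designed, so the following is only a hypothetical Python sketch of the idea: compare a query graph against the structure graphs of known entities using a brute-force, label-preserving isomorphism test, and return the matching entity's name or None. The function names and the two-entry table are assumptions for illustration; a real implementation would need much faster matching than trying every atom permutation.

```python
from itertools import permutations
from typing import Optional

# Hypothetical graph format: element label per atom, bonds as (i, j, bond order).
Graph = tuple[list[str], list[tuple[int, int, int]]]


def is_isomorphic(g: Graph, h: Graph) -> bool:
    """Brute force: does some atom-to-atom mapping preserve elements and bonds?"""
    g_elems, g_bonds = g
    h_elems, h_bonds = h
    if len(g_elems) != len(h_elems) or len(g_bonds) != len(h_bonds):
        return False
    if sorted(g_elems) != sorted(h_elems):
        return False
    # Bonds are undirected: store endpoints as a frozenset, plus the bond order.
    target = {(frozenset((i, j)), order) for i, j, order in h_bonds}
    for perm in permutations(range(len(h_elems))):
        # perm[i] is the atom of h that atom i of g is mapped to.
        if any(g_elems[i] != h_elems[perm[i]] for i in range(len(g_elems))):
            continue
        mapped = {(frozenset((perm[i], perm[j])), order) for i, j, order in g_bonds}
        if mapped == target:
            return True
    return False


def chemical_identify(graph: Graph, known: dict[str, Graph]) -> Optional[str]:
    """Hypothetical analog of ChemicalIdentify: name of a matching known entity, or None."""
    for name, structure in known.items():
        if is_isomorphic(graph, structure):
            return name
    return None


KNOWN = {
    "Water": (["H", "O", "H"], [(0, 1, 1), (1, 2, 1)]),
    "Carbon dioxide": (["O", "C", "O"], [(0, 1, 2), (1, 2, 2)]),
}

query = (["C", "O", "O"], [(0, 1, 2), (0, 2, 2)])  # O=C=O, atoms listed in another order
print(chemical_identify(query, KNOWN))                       # Carbon dioxide
print(chemical_identify((["O", "O"], [(0, 1, 2)]), KNOWN))  # None: no matching entity
```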

Philosophy meets chemistry meets mathematics meets physics...


As I write out some of these issues, I realize how complicated all of this may seem. And, yes, it is complicated. But in our meeting yesterday it all went very quickly. Of course, it helps that everyone there had dealt with similar issues before: this kind of thing is at the foundation of everything we do. But each case is different.

And somehow this case got a bit deeper and more philosophical than usual. “Let's talk about naming stars,” someone said. Obviously there are nearby stars that we have explicit names for. And some other stars may have been identified in large-scale sky surveys and given identifiers of some kind. But there are lots of stars in distant galaxies that will never be named. So how should we represent them?

That led to talking about cities. Yes, there are definite, chartered cities that have officially been assigned names, and we probably have essentially all of these in the Wolfram Language, updated regularly. But what about some village that's set up for a single season by some nomadic people? How should we represent it? Well, it has a certain location, at least for a while. But is it even a definite single thing, or might it, say, later split into two villages, or disappear altogether?

One can argue almost endlessly about the identity, and even the existence, of many of these things. But ultimately it's not the philosophy of such things that we're interested in: we're trying to build software that people will find useful. So what matters in the end is what's going to be useful.

Of course, in most cases it's impossible to know that for sure. But it's like language design in general: think of everything people might want to do, then see how to set up primitives that will let them do it. Does someone want to represent chemicals as entities? Yes, that's useful. Does someone want to represent arbitrary chemical structures as graphs? Yes, that's useful too.

But to figure out what to actually do, one has to have a deep understanding of what's really being represented in each case, and how everything is related. And that's where philosophy has to meet chemistry, mathematics, physics, and so on.

I'm happy to say that by the end of our hour-long meeting yesterday (drawing on my 40 years of experience and the combined 100 years of experience of everyone in the room), I think we'd worked out the essence of a really good way to handle chemicals and chemical structures. It'll take a while before it's fully worked out and implemented in the Wolfram Language. But the ideas will inform the way we compute and reason about chemistry for many years to come. For me, figuring out things like this is an extremely satisfying way to spend my time. And I'm just glad that in my long-running effort to develop the Wolfram Language I get to do so much of it.
