Folding bio-calculations. Again in plain language about the resulting folding model
I've already written more than 7 articles on one of my approaches (a set of algorithms and problems) to the task of folding RNA. With each article, there were fewer readers, and some admitted that the brain endured after the second article. The comparative success of the first two articles, compared with the rest - seems to be in the simplicity of presentation and not deepening in the details. Although the latest articles made it possible for us to take a demo of my program and feel the problems, this seems to be of less interest. 
Therefore, I will try here to state in simple language yet another problem that prevents us from solving this problem. And it seems to me that this problem is connected not only with the approach I have chosen to the solution, but it is rather common for the folding task.
In my RNAInSpace software, I realized the ability to “twist” the RNA helix manually so that the geometry and limitations of such rotation became clear. But since according to previous articles this software was not very interested, then I will not present yet another demo version of this software. And let's talk about what I get.
To keep up the conversation with my readers, I will go over a number of important points.
chupvl already asked. Have you tested your solution on a real problem, or are you just disassembling a theory at the moment? There I explained to him like this:
A specific example of folding
. Now I want to clearly demonstrate what that RNA that I fold looks like.
Below is the "skeleton" of RNA, which I use as the base (2QUS).

I'm trying to collapse the viroid ribozyme NC_003540, whose tertiary structure is unknown. But it should be somewhat similar to the basic one. Although the primary sequence is almost 80% different. Below is one of the best options that I managed to get. (not automatically, but semi-automatically - similarly as if playing the FoldIt game only in another way, I already said that before full automation - like to the moon)

What differences can we see here?
1. It must be understood that the base sequence (2QUS) is slightly longer than in the viroid ribozyme, so the ends of the RNA in the viroid ribozyme will not form a long helix, as in the basic analogue.
2. The general styling fundamentally coincides - therefore, the viroid ribozyme is fundamentally folded correctly.
3. But there is at least one difference. Loop L2 (see the area in the figures at the bottom left) - is different.
3.a. There is one nucleotide in this loop, which should form 4 hydrogen bonds with two other nucleotides in the L1 loop (the area in the figures on the bottom right). It is these bonds that hold the ribozyme loops together.
(how it looks and what are the problems that I have already solved - I commented earlier here and here )
3.b. Why is there a difference? In the base ribozyme, this loop is simply longer than in the viroid ribozyme I study, and the L2 loop has to change its configuration so that it can dock with the L1 loop. And since this is a rather unnatural position of the L2 loop, the configuration of the spiral itself (which is higher on the left) should change along with this.
4. And finally, after resolving all these problems, I already rubbed my hands, thinking that folding the ends of the ribozyme with the correct position of the ribozyme nucleus would not be a problem. But he was mistaken. I never managed to form the necessary hydrogen bonds between the ends of the spiral (in the figure you can see that the ends are not close enough to each other).
Why was it not possible to finish the ribozyme?
I began to compare what the problem is with the RNA structure that I have rolled up. The ends did not want to converge, because they were prevented by the protrusion connecting the spirals L1 and L2 (in the center from above). And it turned out that literally one or two nucleotides, after this protrusion took an incorrect position. They did not give much attention to modeling. They did not form any hydrogen bonds - and I allowed them to take any possible position. And they accepted - random. And naturally, a random position did not allow the ribozyme to reach. (branch: just think - how can something happen during the simulation, if in most of the methods now used, the nucleotides are put in a random position, and the correct ones are selected according to some average statistics for the entire RNA molecule?)
I tried to change this position in an already minimized structure (second figure). But this turned out to be not possible - it was necessary to rebuild almost all the connections, i.e. to destroy all those hydrogen bonds that have already been formed.
And then I thought. I remembered a comment from Wott : the initial state dictates the result , and earlier
This is a fairly correct remark, and for me it was known, earlier in one of my scientific articles I wrote:
Therefore, then I answered in general that such an approach contradicts my results, and that it is better to start folding from the initial state extended into a chain than from a half-rolled state.
But there are still so-called stacking interactions: this is when the nucleotides, which can be simply called hexagons, are arranged so that they form a stack of coins. Below in the figure, the first 6 nucleotides (counting from the bottom) are in the stacking interaction.

So, if RNA appears gradually, then before the first hydrogen bond can form, at least twenty nucleotides must appear.
And before, I thought they were just pulled into a chain. But they are actually probably appearing one after another manage to take the position characteristic of stacking. And the chain seems to be stretched, but not quite by accident. And this is just what is important for the starting position.
But there was one more trouble. If we recall what a double strand of DNA looks like, then its length can be very large. And there are precisely such stacking interactions. But RNA, although it strives for this, but it is only enough for very small areas, and here in the third figure you see how, starting from 7 nucleotides, the RNA chain changes direction.
Some conclusions.
It’s probably not very clear what I’ve been leading all this time to. Let's try to figure it out.
1. Stacking is important as creating an initial position, which helps to prevent a situation like mine when folding in nature, in the simulation described above, when the RNA is practically folded - but the pair of nucleotides remains in a random position (not connected by stacking) - and this interferes with further folding.
2. RNA stacking is not as stable as in DNA, where a strict helix is formed. In RNA, at some nodal points, stacking forces cease to act between the pair of consecutive nucleotides. And then the chain changes direction.
3. Where the direction of the chain changes, and when complementary pairs of nucleotides are found, then hydrogen bonds begin to form, which have a stronger stabilizing effect than stacking.
4. But when the chain grows, and is divided into two or more spirals - the loops of these spirals are interconnected by non-standard hydrogen bonds. This is an even stronger interaction, and at this time the previous stacking, the approximation of hydrogen bonds, may collapse, and the shape will change starting from the spiral loops.
This is the hypothesis - the folding model that I have obtained at the moment. It still needs to be verified definitively, but at least many other variations that cannot be fair (such as a hierarchical model (which says first about creating secondary structures, and only then combining them into a tertiary) and others) have been tested now.
As a result, it turns out that the task of stacking is to push the formation of the necessary hydrogen bonds. And without its presence, hydrogen bonds by themselves are not formed.
PS In fact, everything is a little more complicated, but I do not want to load you. And so it seems he promised simply, but it didn’t work out completely. But I am ready to answer any questions where something is not clear.
Therefore, I will try here to state in simple language yet another problem that prevents us from solving this problem. And it seems to me that this problem is connected not only with the approach I have chosen to the solution, but it is rather common for the folding task.
In my RNAInSpace software, I realized the ability to “twist” the RNA helix manually so that the geometry and limitations of such rotation became clear. But since according to previous articles this software was not very interested, then I will not present yet another demo version of this software. And let's talk about what I get.
To keep up the conversation with my readers, I will go over a number of important points.
chupvl already asked. Have you tested your solution on a real problem, or are you just disassembling a theory at the moment? There I explained to him like this:
I'm folding ribozyme. Ribozymes have the same basic structure, or rather, because of what they are distinguished into one class. For comparison, I use another 2QUS ribozyme actually available. Then it’s important for me to see what are the differences and the similarities.
A specific example of folding
. Now I want to clearly demonstrate what that RNA that I fold looks like.
Below is the "skeleton" of RNA, which I use as the base (2QUS).

I'm trying to collapse the viroid ribozyme NC_003540, whose tertiary structure is unknown. But it should be somewhat similar to the basic one. Although the primary sequence is almost 80% different. Below is one of the best options that I managed to get. (not automatically, but semi-automatically - similarly as if playing the FoldIt game only in another way, I already said that before full automation - like to the moon)

What differences can we see here?
1. It must be understood that the base sequence (2QUS) is slightly longer than in the viroid ribozyme, so the ends of the RNA in the viroid ribozyme will not form a long helix, as in the basic analogue.
2. The general styling fundamentally coincides - therefore, the viroid ribozyme is fundamentally folded correctly.
3. But there is at least one difference. Loop L2 (see the area in the figures at the bottom left) - is different.
3.a. There is one nucleotide in this loop, which should form 4 hydrogen bonds with two other nucleotides in the L1 loop (the area in the figures on the bottom right). It is these bonds that hold the ribozyme loops together.
(how it looks and what are the problems that I have already solved - I commented earlier here and here )
3.b. Why is there a difference? In the base ribozyme, this loop is simply longer than in the viroid ribozyme I study, and the L2 loop has to change its configuration so that it can dock with the L1 loop. And since this is a rather unnatural position of the L2 loop, the configuration of the spiral itself (which is higher on the left) should change along with this.
4. And finally, after resolving all these problems, I already rubbed my hands, thinking that folding the ends of the ribozyme with the correct position of the ribozyme nucleus would not be a problem. But he was mistaken. I never managed to form the necessary hydrogen bonds between the ends of the spiral (in the figure you can see that the ends are not close enough to each other).
Why was it not possible to finish the ribozyme?
I began to compare what the problem is with the RNA structure that I have rolled up. The ends did not want to converge, because they were prevented by the protrusion connecting the spirals L1 and L2 (in the center from above). And it turned out that literally one or two nucleotides, after this protrusion took an incorrect position. They did not give much attention to modeling. They did not form any hydrogen bonds - and I allowed them to take any possible position. And they accepted - random. And naturally, a random position did not allow the ribozyme to reach. (branch: just think - how can something happen during the simulation, if in most of the methods now used, the nucleotides are put in a random position, and the correct ones are selected according to some average statistics for the entire RNA molecule?)
I tried to change this position in an already minimized structure (second figure). But this turned out to be not possible - it was necessary to rebuild almost all the connections, i.e. to destroy all those hydrogen bonds that have already been formed.
And then I thought. I remembered a comment from Wott : the initial state dictates the result , and earlier
And in fact, the initial position dictates which “final” state will be achieved. And the initial state in this case is the process of creating the chain itself. That is, for a good problem, it is necessary to solve iteratively - add the first, add the next - bring the system to a minimum, add the next - again bring it to a minimum.
...
Similarly, it is harmful to consider the entire elongated chain - this is an implausible state from which you can achieve both believable and not very states, but due to the large size of the options, there will be much more second ones.
This is a fairly correct remark, and for me it was known, earlier in one of my scientific articles I wrote:
...
In the terminology of game theory, this means that it is necessary to ensure the possibility from any initial position of the game to achieve a given final state. And if, for example, in a game of chess, we know how the game starts and what is the winning condition, and we know that the rules of chess ensure getting from the beginning of the game to the end, then here - when modeling the folding of macromolecules - we still need to find such rules and prove that they provide a folding process.
Therefore, then I answered in general that such an approach contradicts my results, and that it is better to start folding from the initial state extended into a chain than from a half-rolled state.
But there are still so-called stacking interactions: this is when the nucleotides, which can be simply called hexagons, are arranged so that they form a stack of coins. Below in the figure, the first 6 nucleotides (counting from the bottom) are in the stacking interaction.

So, if RNA appears gradually, then before the first hydrogen bond can form, at least twenty nucleotides must appear.
And before, I thought they were just pulled into a chain. But they are actually probably appearing one after another manage to take the position characteristic of stacking. And the chain seems to be stretched, but not quite by accident. And this is just what is important for the starting position.
But there was one more trouble. If we recall what a double strand of DNA looks like, then its length can be very large. And there are precisely such stacking interactions. But RNA, although it strives for this, but it is only enough for very small areas, and here in the third figure you see how, starting from 7 nucleotides, the RNA chain changes direction.
Some conclusions.
It’s probably not very clear what I’ve been leading all this time to. Let's try to figure it out.
1. Stacking is important as creating an initial position, which helps to prevent a situation like mine when folding in nature, in the simulation described above, when the RNA is practically folded - but the pair of nucleotides remains in a random position (not connected by stacking) - and this interferes with further folding.
2. RNA stacking is not as stable as in DNA, where a strict helix is formed. In RNA, at some nodal points, stacking forces cease to act between the pair of consecutive nucleotides. And then the chain changes direction.
3. Where the direction of the chain changes, and when complementary pairs of nucleotides are found, then hydrogen bonds begin to form, which have a stronger stabilizing effect than stacking.
4. But when the chain grows, and is divided into two or more spirals - the loops of these spirals are interconnected by non-standard hydrogen bonds. This is an even stronger interaction, and at this time the previous stacking, the approximation of hydrogen bonds, may collapse, and the shape will change starting from the spiral loops.
This is the hypothesis - the folding model that I have obtained at the moment. It still needs to be verified definitively, but at least many other variations that cannot be fair (such as a hierarchical model (which says first about creating secondary structures, and only then combining them into a tertiary) and others) have been tested now.
As a result, it turns out that the task of stacking is to push the formation of the necessary hydrogen bonds. And without its presence, hydrogen bonds by themselves are not formed.
PS In fact, everything is a little more complicated, but I do not want to load you. And so it seems he promised simply, but it didn’t work out completely. But I am ready to answer any questions where something is not clear.