3D models of human viruses. Part Two: Molecular Modeling and Bioinformatics

    In our first post about three-dimensional modeling of viruses, we listed the main stages of the process and described where we start and how we collect the initial information. In this post, we cover the next stage of the work: building models of the individual molecules from which a whole particle will later be assembled.

    Components of the influenza A/H1N1 virus particle

    A viral particle is a molecular machine that solves two fundamental problems. First, the particle must package the viral genome and protect it from damaging environmental factors while the virus travels from the cell in which it was assembled to a cell that it can infect. Second, the particle must be able to attach to the target cell and then deliver the viral genome and associated molecules inside to start a new replication cycle. These tasks are few, so viruses, with rare exceptions, can afford to be quite economical in their structure.

    In particular, the genome of most viruses is small and encodes only a few proteins, often fewer than 10. Moreover, the virus can make the cell synthesize many copies of the same protein, from which the viral protein shell, the capsid, is then assembled. Thus, viral particles usually consist of a large number of identical elements that bind to each other like pieces of a construction set, often forming regular and symmetrical structures. As a result, many (though not all) viral capsids or their fragments have a helical or icosahedral shape.

    Examples of viral capsids with icosahedral symmetry. A bacteriorhodopsin molecule is shown in the lower right corner for comparison. (Illustration from the review.)

    To build a model of a virus, it is fundamentally important to know how the individual proteins of the overall structure are arranged and how they bind to each other to form it. Modern science has a whole range of methods that can answer these questions. Unfortunately, none of them is universal: each solves only part of the problems we face when creating scientifically reliable models of viruses in atomic detail.

    Proteins: how is information about their structure obtained, stored, and displayed?

    Recall that proteins are polymer molecules consisting of monomers, amino acids, linked sequentially. In aqueous solution, proteins usually fold into complex three-dimensional globules (almost like the Rubik's Snake puzzle), whose shape depends on the amino acid sequence and some other factors. The spatial structure of these globules is determined mainly by X-ray crystallography and NMR spectroscopy. Electron microscopy has recently become capable of this task as well.
    In general, methods for determining the spatial structure of molecules are complex and have many limitations, so far from all viral proteins are fully described. X-ray crystallography requires a crystal through which X-rays are passed. The atoms of the crystal diffract the X-rays; from the diffraction pattern, the distribution of electron density in the crystal can be estimated, and from that data the positions of specific atoms can be reconstructed. This method gives a resolution of up to a little over 1 angstrom (0.1 nm). The problem is that not all proteins can be crystallized. This is especially difficult if the protein has flexible, mobile regions or fragments anchored in the membrane.

    NMR spectroscopy is based on the phenomenon of nuclear magnetic resonance and makes it possible to describe the structure of proteins in solution. This approach reveals a set of possible positions of atoms in the molecule and, in contrast to the previous method, makes it possible to assess how flexible each region is. However, NMR spectroscopy works well only for relatively small molecules, since large proteins produce too much noise.

    Electron microscopy makes it possible to describe the structure of large molecular complexes, which is very useful when it comes to viruses. For many symmetrical structures, a large set of images can be taken at different angles and analyzed to reconstruct a three-dimensional picture. For some objects, the resolution achieved with various types of electron microscopy (up to 4-5 angstroms) is not much worse than that of X-ray crystallography. Usually, though, different approaches have to be combined to obtain complete information, for example by fitting the structures of individual proteins into electron density maps obtained by electron microscopy.

    Structure of the trimer of the HIV envelope protein (red and blue fragments) in complex with a portion of one of the antibodies to this protein (green and yellow fragments), fitted into an electron density map obtained by cryo-electron microscopy at a resolution of 9 angstroms. From the article Structural Mechanism of Trimeric HIV-1 Envelope Glycoprotein Activation.

    As we wrote in the previous post, the resulting structures are systematized and stored in the Protein Data Bank database. Atom coordinates are written in the *.pdb format, and there is a whole set of programs for visualizing these data and working with such structures, for example VMD, Chimera, PyMOL, and dozens of others.

    Screenshot of a file in *.pdb format displayed as text. The coordinates of individual atoms in the amino acids of the protein are described.
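
    To make the format more concrete, here is a minimal sketch of reading atom records from *.pdb text. It relies on the fixed-column layout of ATOM and HETATM records. The names Atom, parse_atom_line and parse_pdb, as well as the sample record, are made up for illustration; real work is better done with the visualization programs mentioned above or with an established parsing library.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record; fields sit in fixed columns, not split by spaces."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue (amino acid) name
        chain_id=line[21],             # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue number
        x=float(line[30:38]),          # columns 31-38: x coordinate, angstroms
        y=float(line[38:46]),          # columns 39-46: y coordinate
        z=float(line[46:54]),          # columns 47-54: z coordinate
    )


def parse_pdb(text: str) -> list[Atom]:
    """Collect all atom records from the text of a PDB file, skipping other records."""
    return [parse_atom_line(line) for line in text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]


# Hypothetical record, written as adjacent string pieces so each field width is visible.
SAMPLE_LINE = (
    "ATOM  " "    1" " " " CA " " " "MET" " " "A" "   1" " " "   "
    "  27.340" "  24.430" "   2.614" "  1.00" "  9.67"
)

atoms = parse_pdb(SAMPLE_LINE)
print(atoms[0])
```

    Slicing by column rather than splitting on whitespace matters here, because neighboring fields in real files can touch each other without a separating space.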

    Programs can display proteins in several ways. Besides simply drawing atoms as spheres whose diameters correspond to their van der Waals radii, it is possible to show individual bonds, the surface of the molecule, and the path of the amino acid chain using ribbon-like structures (a ribbon diagram), which clearly show where the amino acids in the protein form alpha helices, where they form beta sheets, and where the regions are unstructured.

    Various options for visualizing the structure of the outer part of the hemagglutinin of the influenza virus in the Chimera program.

    As a digression, I must say that the programs scientists usually use to visualize individual molecules or protein complexes most often produce rather primitive results from an aesthetic point of view (it is enough, for example, to look at a few screenshots from the VMD program). Far wider possibilities open up if models of molecules are imported into programs used by professional designers and 3D computer graphics specialists. These programs, combined with plugins that improve rendering quality, produce really interesting and attractive visualizations. We will talk more about this in future posts. For now, here is just one example:

    Images of an immunoglobulin G molecule.

    Molecular modeling

    Missing protein structures can be predicted, which is what we have to do in order to create complete models of viral particles. A number of computational methods are used for this, based partly on data about structures already described and partly on algorithms that calculate the interactions between individual atoms of a molecule with a certain reliability. Modeling based on known structures is used because modern computing power does not yet allow spatial models of proteins to be built from the amino acid sequence alone using quantum mechanical principles. In addition, it is believed that the folds of so many proteins have already been determined that almost every new protein has a structural analogue in the PDB; the main thing is to find it.
    It is known that proteins in which more than 30% of amino acid residues are identical have very similar structures. We can find a protein with a similar amino acid sequence and a known structure and use it as a template for building a model; this is called homology modeling. The BLAST program is usually used to find a similar sequence.
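    As a small illustration of the 30% criterion, the sketch below computes the percentage of identical residues for two sequences that have already been aligned (for example by BLAST). The function name and both sequences are made up, and the choice of denominator is an assumption: identity is counted here over aligned columns without gaps, while other tools may divide by a different length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # a gap in either sequence is not counted as a compared position
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments; "-" marks a gap introduced by the alignment.
target = "MKT-AYIAK"
template = "MKTQAYLAR"
identity = percent_identity(target, template)
print(identity)  # 6 identical residues out of 8 compared columns -> 75.0
print("usable as a homology template" if identity > 30.0 else "try fold recognition")
```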
    However, some proteins with similar structures have roughly the same sequence similarity as a pair of randomly selected proteins. To find a suitable template in such cases, fold recognition methods are used. They "thread" the sequence of the protein being modeled onto various known structures and evaluate how well each template fits. Different programs use different scoring functions and therefore produce different results. There is currently no single optimal algorithm for fold recognition, so several programs are usually run at once and a template is chosen based on all their results. You can also, for example, take a protein with a similar function as a template.
    There are methods that let you assemble a model from several templates at once, combining them in an optimal way. The best of them is called I-Tasser. This is not just its authors' own claim: for several years now, I-Tasser, under the name "Zhang-server", has been winning the CASP protein structure prediction contest.
    For example, when working on a model of the influenza virus, we found that for one of the surface proteins, neuraminidase, only the part of the structure that directly performs the enzymatic function (cleaving sialic acid from cell membrane glycoproteins) had been determined experimentally. The parts of the molecule that form the stem of the protein and anchor neuraminidase in the viral lipid membrane had to be modeled by homology. The published structures of parainfluenza virus hemagglutinin-neuraminidase (3TSI) and of one of the transmembrane peptides (2LAT) were taken as templates.

    Templates for modeling the neuraminidase complex of the influenza virus. A is a fragment of the neuraminidase N2 monomer from the 2AEP structure in the PDB database, B is the stem of parainfluenza hemagglutinin-neuraminidase (3TSI), C is the 2LAT transmembrane peptide. D is the resulting final model.

    The final protein model is usually built taking into account the known structures of its fragments, the templates found by different methods, and models from the I-Tasser server. The Modeller program is used for this. It builds a homology model from one or more templates and can also make additional modifications, for example creating disulfide bonds at predetermined positions.


    Another important aspect of virus structure, on which information in the scientific literature is often incomplete, is the interaction between individual proteins. In our case, it determines which surfaces of the individual protein models will contact each other and other components of the virion in the final model. Structural bioinformatics can also help refine information on these interactions, in particular through molecular docking.

    A docking program does not simulate the natural process of complex formation, which would be too slow and resource-intensive. Instead, it enumerates possible relative positions of two or more molecules in search of the best structure. In docking, the larger molecule of a complex is usually called the receptor and the smaller one the ligand. Various scoring functions are used to assess the quality of a ligand-receptor complex. Ideally, such a function would be the free energy of the system, but that is too hard to calculate. Instead, various empirical pseudopotentials are used. They take into account the potential energy (which is simple to calculate), the contact area between ligand and receptor, compliance with rules that researchers have derived from the analysis of a large number of complexes, and all sorts of mysterious terms that have no physical meaning but improve the program's results when tested on a large set of known complexes. In modern programs, the minimum of such a pseudopotential is usually sought with variations of the Monte Carlo method and genetic algorithms.
    There are currently many molecular docking programs (the best known are DOCK, AutoDock, GOLD, FlexX, and Glide), which differ in their scoring functions, minimization methods, and additional features. During the search, the receptor and ligand molecules can either remain fixed (rigid docking) or change their conformation somewhat (flexible docking). The second option is obviously more resource-intensive, but its results are usually more plausible. Docking small molecules to proteins is now a standard step in the development of new drugs. For example, you can dock 10 million ligands and choose the hundred most promising compounds for further experimental work; this is called virtual screening.
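
    The search described above can be sketched in a few lines. This is a toy example under heavy assumptions, not the algorithm of any of the programs listed: the molecules are sets of 2D points, the ligand moves as a rigid body, the pseudopotential only counts contacts and clashes with made-up distance thresholds, and the minimum is sought with a basic Metropolis Monte Carlo walk. All names and numbers are hypothetical.

```python
import math
import random

CLASH_DISTANCE = 3.0    # closer than this counts as an atomic overlap (assumed value)
CONTACT_DISTANCE = 5.0  # closer than this counts as a favourable contact (assumed value)


def place(ligand, pose):
    """Apply a rigid-body pose (shift x, shift y, rotation angle) to 2D ligand points."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in ligand]


def score(receptor, ligand, pose):
    """Toy pseudopotential: reward contacts, heavily penalise clashes. Lower is better."""
    total = 0.0
    for lx, ly in place(ligand, pose):
        for rx, ry in receptor:
            d = math.hypot(lx - rx, ly - ry)
            if d < CLASH_DISTANCE:
                total += 10.0
            elif d < CONTACT_DISTANCE:
                total -= 1.0
    return total


def rigid_dock(receptor, ligand, start_pose, steps=2000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over rigid-body poses; returns the best pose seen."""
    rng = random.Random(seed)
    pose, current = start_pose, score(receptor, ligand, start_pose)
    best_pose, best_score = pose, current
    for _ in range(steps):
        candidate = (pose[0] + rng.gauss(0, 0.5),
                     pose[1] + rng.gauss(0, 0.5),
                     pose[2] + rng.gauss(0, 0.2))
        cand_score = score(receptor, ligand, candidate)
        delta = cand_score - current
        # Always accept improvements; accept worse poses with a Boltzmann-like probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            pose, current = candidate, cand_score
            if current < best_score:
                best_pose, best_score = pose, current
    return best_pose, best_score


receptor = [(float(x), 0.0) for x in range(0, 20, 2)]  # a flat row of "atoms"
ligand = [(-1.0, 0.0), (1.0, 0.0)]                     # a two-atom "molecule"
start = (9.0, 8.0, 0.0)
best_pose, best_score = rigid_dock(receptor, ligand, start)
print(best_pose, best_score)
```

    Virtual screening in this picture would amount to running such a search for every ligand in a library and keeping the ligands with the lowest scores.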

    Besides small molecules, docking can be used to build protein-protein and protein-nucleic acid complexes. A large number of programs and online services have been developed for this purpose (ZDOCK, pyDock, Hex). For example, while working on the human papillomavirus (HPV), we found that, although the structure of the outer capsid layer formed by the L1 protein was fully known, there was no information at all about the structure of the L2 protein, which sits closer to the genome inside the capsid, and therefore no data on how L1 pentamers interact with L2 molecules. We built a homology model of the L2 protein using the I-Tasser server and then performed docking in the Hex program. During docking, the L1 pentamer acted as the receptor: the search for the optimal L2 binding site was carried out on its surface. All structures remained fixed, that is, rigid docking was used. The result was a plausible structure of a pentamer complex assembled from L1 and the minor protein L2.

    The pentamer of the major capsid protein L1 in complex with the minor protein L2 (shown on the poster to the right of the viral particle). View from below (disassembled) and view from above. Structures obtained by combining homology modeling and docking.

    Post-translational modifications

    Finally, bioinformatics methods can be used to try to reconstruct the changes that the host cell itself makes to the viral proteins produced in it. After synthesis, most proteins undergo additional chemical changes, post-translational modifications (PTMs), which can seriously affect the protein's functions. These modifications include phosphorylation, ubiquitination, glycosylation, nitrosylation, chain cleavage, and other chemical changes. Many viral surface proteins are glycosylated, and this modification directly affects their main function, binding to cellular receptors. Matrix proteins, on the other hand, which form the layers lying directly under the lipid membrane of some viruses, often need to be linked to, for example, myristic acid, a small hydrophobic molecule that facilitates the interaction of proteins with lipids, in order to anchor in the membrane. Thus, protein modifications also require attention in our work.
    At present, possible PTMs are rather difficult to predict. The main existing methods and services either look for relevant experimental information on similar proteins or search the sequence of the protein under study for short motifs characteristic of a particular type of modification.
    In our work, when preparing the models, we use the experimental information given in the corresponding UniProt database record.
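    The motif search mentioned above can be illustrated with N-linked glycosylation, whose commonly used sequence motif is Asn-X-Ser/Thr, where X is any residue except proline. The sketch below is only an illustration of the idea and not the procedure we use: the function name and the sequence are made up, and a matching motif is a necessary condition, not proof that the site is actually glycosylated.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead keeps the
# match one residue long, so overlapping motifs (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glycosylation_sites(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence: N2 starts "NGT", N7 is followed by proline and is skipped,
# N11 starts "NNT" and N12 starts "NTS".
sites = find_n_glycosylation_sites("MNGTAANPSLNNTS")
print(sites)  # [2, 11, 12]
```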

    Stages of work on the influenza hemagglutinin model. A - visualization of the 3ZTJ structure from the PDB database. B - H1N1 influenza virus hemagglutinin model built on the basis of homology with 3ZTJ with completion of transmembrane sections of the molecule. C - model taking into account post-translational modifications (glycosylation).

    Molecular Dynamics and Structural Optimization

    The last thing I want to mention is that new models of proteins and, especially, of their complexes need to have their structures optimized. The simplest optimization method is energy minimization. It is used to quickly bring the system down into a local minimum of potential energy. It is best performed after every modification of the molecular structure. It helps avoid problems such as overlapping atoms or irregular bond lengths. Energy minimization methods are provided in almost every molecular modeling software package.
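    A minimal sketch of what such a minimization does, assuming a deliberately simplified energy: only harmonic bond terms, with a made-up force constant and ideal bond length. Real force fields also include angles, torsions, and non-bonded interactions, and real packages use more efficient minimizers than the plain steepest descent shown here.

```python
import math


def bond_energy_and_gradient(coords, bonds, k=100.0, r0=1.5):
    """Harmonic bond energy 0.5*k*(r - r0)^2 summed over bonds, with its gradient."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j in bonds:
        delta = [a - b for a, b in zip(coords[i], coords[j])]
        r = math.sqrt(sum(c * c for c in delta))
        energy += 0.5 * k * (r - r0) ** 2
        factor = k * (r - r0) / r  # dE/dr divided by r, applied along the bond vector
        for axis in range(3):
            grad[i][axis] += factor * delta[axis]
            grad[j][axis] -= factor * delta[axis]
    return energy, grad


def minimize(coords, bonds, step=0.001, max_iter=1000, tolerance=1e-8):
    """Steepest descent: repeatedly move every atom a small step against the gradient."""
    coords = [list(atom) for atom in coords]
    energy, grad = bond_energy_and_gradient(coords, bonds)
    for _ in range(max_iter):
        if energy < tolerance:
            break
        coords = [[c - step * g for c, g in zip(atom, atom_grad)]
                  for atom, atom_grad in zip(coords, grad)]
        energy, grad = bond_energy_and_gradient(coords, bonds)
    return coords, energy


# Hypothetical three-atom chain: one bond too long (2.5), one too short (0.5).
start = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 0.5, 0.0)]
bonds = [(0, 1), (1, 2)]
start_energy, _ = bond_energy_and_gradient(start, bonds)
relaxed, final_energy = minimize(start, bonds)
print(start_energy, final_energy)
print([round(math.dist(relaxed[i], relaxed[j]), 3) for i, j in bonds])
```

    The procedure only slides downhill from the starting geometry, which is why it ends in the nearest local minimum and not necessarily the best one.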
    It is worth noting that this method provides only a preliminary and very crude optimization. For more accurate preparation of spatial structures, molecular dynamics or quantum mechanics methods are used. The latter are used, for example, to optimize the structure of small ligand molecules and to calculate the energy of intermolecular interactions as accurately as possible. However, greater accuracy naturally comes with more resource-intensive calculations, which makes quantum mechanical methods impractical for large biological macromolecules.
    Molecular dynamics methods make it possible to evaluate the behavior and stability of fairly massive molecules, such as polypeptides and nucleic acids.
    Molecular dynamics studies how atoms and molecules behave and move over time. Such calculations make it possible, for example, to study the stability of individual molecules and their complexes, to assess the significance of possible conformational rearrangements, the effect of point mutations, and much more. Modern methods for analyzing the results of molecular dynamics simulations provide highly detailed information on the behavior over time of both individual atoms and the entire system under study.
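    At its core, a molecular dynamics calculation integrates Newton's equations of motion in small time steps. The sketch below shows this with the velocity Verlet scheme, a widely used integrator, applied to the simplest possible system: one particle on a spring in one dimension. The force constant, mass, time step, and function names are assumptions chosen for illustration; real simulations deal with thousands of atoms and a full force field.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate the motion of one particle in 1D; returns a list of (x, v) per step."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update with the old force
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


SPRING_K = 1.0  # assumed force constant, arbitrary units
MASS = 1.0      # assumed mass, arbitrary units


def spring_force(x):
    return -SPRING_K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_K * x * x


# Start stretched at x = 1 with zero velocity and follow the oscillation.
trajectory = velocity_verlet(1.0, 0.0, spring_force, MASS, dt=0.01, n_steps=1000)
energies = [total_energy(x, v) for x, v in trajectory]
energy_spread = max(energies) - min(energies)
print(trajectory[-1], energy_spread)
```

    Watching how well the total energy is conserved along the trajectory is one simple check that the time step is small enough.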

    Depending on how well studied the proteins of the virus we want to model are, we have to choose each time the approaches for completing and optimizing the models of all proteins and their interactions. Once all the structures are obtained, we can proceed to assembling the complete model. In the next post in this series about creating scientifically valid models of human viruses, we will describe how this is done.

    PS: Next up will be medical anatomical illustration, the history of the study of the human body in the work of illustrators over five centuries, which led the poll in the last post. With stunning engravings, wax models from the last century, plastinated bodies, atlases by outstanding researchers, 3D reconstructions based on serial sections of the frozen body of an executed convict, interactive applications, and the work of modern medical illustrators. Soon.
