DNA, New Technologies, and the Human Genome: Bioinformatics at ITMO University

Bioinformatics is a promising field of science and a rapidly developing industry. Information technology now lets biologists test drugs in a virtual environment and decipher DNA sequences in a matter of hours. In this article we talk about bioinformatics and the work being done in this area at ITMO University.
What is bioinformatics
Many scientists agree that bioinformatics studies biological processes using modern computing technologies. In practice, specialists in this field use programs to visualize amino acid sequences and develop algorithms based on probability theory and mathematical statistics. However, the original goal of bioinformatics was more general: in 1970 Paulien Hogeweg and Ben Hesper defined it as "the study of information processes in biotic systems."
If we go by this definition, the origins of the science can be traced to the 13th century, when Fibonacci built the first mathematical model of rabbit reproduction. Since then, scientists have applied increasingly formal methods to describe biological processes. In 1953 came one of the most important events in the history of bioinformatics, and possibly of science as a whole: Francis Crick and James Watson revealed the structure of DNA, which today everyone knows from high school.
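Fibonacci's rabbit model is simple enough to reproduce in a few lines. The sketch below assumes the classical statement of the problem: every pair matures in one month, every mature pair produces one new pair per month, and no rabbits die. The function name `rabbit_pairs` is chosen here for illustration only.

```python
def rabbit_pairs(months):
    """Return the number of rabbit pairs in each month, from month 1 to `months`."""
    totals = []
    young, adult = 1, 0  # month 1: a single newborn pair
    for _ in range(months):
        totals.append(young + adult)
        # Young pairs mature; every pair that was already adult breeds a new young pair.
        young, adult = adult, young + adult
    return totals


if __name__ == "__main__":
    print(rabbit_pairs(12))
```

Under these assumptions the monthly totals form the Fibonacci sequence.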
Two decades later, methods for DNA sequencing (determining the order of its nucleotides) were developed, and then the first complete genome of an organism, that of the bacteriophage φX174, was obtained. Advances in sequencing technology sped this process up, making it possible to assemble the genomes of yeast and the fruit fly Drosophila.
The turning point in the history of bioinformatics was the assembly of the human genome in 2003: scientists from around the world spent 13 years collecting pieces of its sequence. This moment marks the start of the so-called postgenomic era in the development of bioinformatics. Its main feature is an enormous amount of biological data that cannot be processed manually. This is where digital technologies come into play: they make it possible not only to interpret molecular data, whether nucleic acid and amino acid sequences or protein structures, but also to organize them into databases. For example, the GenBank database contains more than 11 billion genes of more than one hundred thousand organisms.
By the way, researchers themselves do not really like the term “decoding” the genome: they prefer “assembly” or “determining the sequence of the genome”. This suggests that unresolved problems remain even in areas that scientists have been studying closely for many years. For example, part of the human genome still consists of unknown fragments.
Moreover, even knowing the entire genome sequence does not tell us what it does. That is why many scientists in bioinformatics are now studying the connections between already known genes and their influence on the phenotype: in effect, researchers are solving familiar problems, but faster and better, using new methods and technologies.
Bioinformatics is closely intertwined with other sciences, in particular with genomics and proteomics. Genomics studies the totality of genes in an organism. With large genome databases, we can identify similarities and differences in the genotypes of living creatures and draw conclusions about the characteristics of individual species and about evolution in general; this is the subject of comparative genomics. Functional genomics studies the functions of genes, as well as the influence of some genes on others. The methods of structural genomics are used to build three-dimensional models of the proteins encoded by a specific gene.
Proteomics studies the totality of gene expression products, that is, proteins. Comparative proteomics is developing especially actively; its essence is comparing the protein composition, or proteome, of living organisms. Comparing the proteomes of two organisms reveals the reasons for the differences in their phenotypes, which in turn helps to understand the course of evolution. Comparative proteomics also makes it possible to identify proteins that contribute to the development of a disease and to test drugs against them.
On the one hand, bioinformatics is an interdisciplinary field that draws on molecular biology, genetics, mathematics, and computer science. On the other hand, while using discoveries from these sciences, bioinformatics also contributes significantly to their development: this is partly reflected in the names of modern technologies such as decision trees, neural networks, and genetic algorithms.
ITMO University Developments
Numerous studies in the field of bioinformatics are being conducted at ITMO University. In 2011, the Laboratory of Structural Bioinformatics was created, where researchers model proteins and predict protein-protein interactions. One of the laboratory's latest developments is a method for studying protein dynamics based on the principle of mass transfer. Its model of movements over relatively large distances is quite adequate and eliminates the disadvantages of previous models.
Andrei Kayava, one of the leaders of the Research Institute of Bioengineering, believes that an equally important task is identifying the functions of proteins. Random rearrangements in the structure of proteins can lead to neurodegenerative diseases such as Alzheimer's and Parkinson's. Bioinformatics makes it possible to study the amino acid sequence and predict the likelihood of these diseases. The ArchCandy method and program, developed by Andrei Kayava's research team, helps solve the problem of diagnosing neurodegenerative diseases at an early stage.
Employees of the Department of Computer Technologies have taken an active part in a number of scientific projects. Their research path in bioinformatics began with participation in the international de novo Genome Assembly Assessment Project competition. The participants developed and tested a genome assembly method that eliminates errors in reads, the data obtained from sequencing machines.
Another paper by young researchers from ITMO University describes a method for assembling contigs, the long contiguous stretches of DNA reconstructed from overlapping reads. The method splits assembly into two stages: a de Bruijn graph is used in the first, and an overlap graph in the second. A later paper describes a method in which one of the stages is microassembly: a de Bruijn graph is built from the reads, and it turns out to be much smaller than the graph from the first stage, hence the name “microassembly”. The result of this work is a genome assembly program, ITMO Assembler, which is available for download.
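To give a feel for what a de Bruijn graph is, here is a minimal sketch. It is not the ITMO Assembler code and covers only the general idea behind the first stage: reads are cut into k-mers, each k-mer becomes an edge between its (k-1)-letter prefix and suffix, and a contig is read off by following unambiguous paths. The function names, the toy reads, and the value of k are all assumptions made for illustration; real assemblers also handle sequencing errors, reverse complements, and branching.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Follow edges from `start` while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # Toy overlapping reads, invented for this example.
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
    graph = build_de_bruijn(reads, k=4)
    print(extend_contig(graph, "ATG"))
```

On these toy reads the walk starting at `ATG` reconstructs the single sequence that all three reads overlap on.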

[Image: DNA sequencers]
This work continued with the participation of ITMO University employees in the MetaFast project. The aim of the project is a software package for comparing metagenomes, the totality of the DNA of microorganisms in a given environment, across different environments. The DNA of organisms that cannot reproduce on their own, such as viruses, is difficult to assemble, because sequencing yields only fragmentary data. DNA databases contain too little data on viruses and bacteria to compare fragments of the obtained metagenomes against them, and a deep analysis takes too much time.
The developed program works much faster because it performs only partial assembly and comparison of genomes. In addition, the algorithm can identify patterns even in unfamiliar environments. According to Vladimir Ulyantsev, an employee of the Computer Technologies laboratory at ITMO University and the chief algorithm developer, this approach helps to find microorganisms in patients that are responsible for a predisposition to a specific disease. By comparing the microflora of healthy and sick people, you can quickly identify the cause of a disease and take measures to eliminate it.
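The idea of comparing samples without a reference database can be illustrated with a much simpler stand-in than MetaFast itself. The sketch below is not the MetaFast algorithm: it skips assembly entirely, counts k-mers in each sample, and measures how different two samples are with the Bray-Curtis dissimilarity (0 for identical k-mer profiles, 1 for no shared k-mers). The function names, the toy reads, and the choice of k are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer that occurs in the reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count profiles."""
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


if __name__ == "__main__":
    # Toy "samples", invented for this example.
    healthy = kmer_counts(["ACGTACGT", "CGTACGTA"], k=3)
    patient = kmer_counts(["ACGTACGT", "TTTTTTTT"], k=3)
    print(bray_curtis(healthy, healthy))
    print(bray_curtis(healthy, patient))
```

A matrix of such pairwise values between many samples can then be clustered to see which environments, or which patients, resemble each other.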
MetaFast has been tested in a wide variety of environments, including ones with a high virus content. For example, scientists showed that microbes living underground are safe: samples taken in the New York subway turned out to belong, for the most part, to already known bacteria.
The new algorithm may also be useful in the study of urbanization processes. The urban atmosphere negatively affects our microflora, and modern products destroy the bacteria needed by the body. By comparing the metagenomes of residents of large cities and remote settlements, you can find out what these beneficial bacteria are and how to preserve them.
ITMO University staff also participated in an international project to develop a web service for the comprehensive study of how cells work. The GAM program (genes and metabolites), developed by ITMO University graduate student Alexei Sergushichev, identifies the links between genes and changes in metabolism.
For example, when it is necessary to study the process of tumor development, the program takes initial data on the concentration of metabolites - simple substances involved in metabolism - and gene expression and compares them with data in the KEGG database. After this, a map of metabolic pathways is constructed, showing the process of changes in substances as a result of chemical reactions.
The service will be useful in the treatment of cancer and of diseases associated with an impaired immune system. Maps of metabolic changes help to monitor the development of a tumor and to develop mechanisms for containing it in the early stages. Using the developed algorithm, scientists have already shown that slowing down the metabolic process in lung cancer reduces the tumor growth rate.
Unlike its counterparts, the GAM web service is simple, efficient and, importantly, free, so anyone can use it. The service is already used in several dozen laboratories and pharmaceutical companies.
Conclusion: a short guide for those interested in bioinformatics

Many students and graduates, including programmers and mathematicians, are interested in how to get into the field of bioinformatics. First you need to decide what tasks you are interested in solving. In bioinformatics, the range of tasks is very wide: from pure computer science and proof of theorems to pure biology, which newcomers have to deal with actively. It is clear that most of the research is at the junction of several areas.
Next, you need to find out where the work that interests you is being done. To do this, you will have to read the papers of specific laboratories and judge whether you really want to take part in their work. At the same time, it doesn’t hurt to enroll in courses at the Institute of Bioinformatics or to look for online courses like those offered on Coursera. This will give you an idea of what bioinformatics is working on now and how it works.
It is important to understand: since bioinformatics is a discipline located at the junction of several areas, projects in this area can be associated not only with the use of informatics to solve problems of biology, but also vice versa. A vivid example of this is the preparation of a curriculum using DNA computers. Not to mention synthetic biology, in which they try to create or modify microorganisms for a specific purpose: for example, to better process biofuels.
These projects, and bioinformatics as a whole, are a vivid example of how fascinating and exciting modern science can be, not only on the big screen but also in real life. And to take part in such developments, it is not at all necessary to study or work abroad: many interesting and significant bioinformatics projects are under way at Russian universities, in particular at ITMO University.