Big data and machine learning: new opportunities for medicine

    “We have developed better technologies for selecting shoes on Amazon than for choosing the type of cancer treatment,” says MIT professor Regina Barzilay about the current state of high-tech medical projects. The assessment is disappointing: “popular” fields such as e-commerce are often ahead of more socially important ones in the level of technology they use.

    However, there is good news: solutions that were developed for the proverbial “search for shoes” can also be used to help patients. And the demand for such solutions is only growing: according to forecasts by Frost & Sullivan, the market for medical solutions based on machine learning and big data alone is growing by 40% annually and will reach $6.6 billion by 2021.

    Today we’ll talk about how big data is used in medical projects and what work in this area is under way at ITMO University.

    Photo: Charles Clegg, CC BY

    Diagnosis of diseases

    Data mining, machine learning and, in particular, natural language processing are actively used for the early diagnosis of diseases, from cancer and diabetes to schizophrenia. For example, the American PathAI project detects breast cancer in its early stages. In April 2016, the system competed with a human expert and lost: the expert's error rate was 3.5%, while the system's was 7.5%. After that, the researchers were able to increase the size of the sample on which the system was trained, and by November of the same year PathAI had surpassed the expert in diagnostic accuracy.

    As Joel Dudley, developer of the Deep Patient system at Mount Sinai Hospital in New York, says: “One of the important features of deep learning is that when making forecasts or models, you don’t need to limit yourself to the most essential information in advance.” This applies, for example, to analyzing a patient’s entire medical history when forming a treatment plan. It also applies to comparing an individual patient’s data with information about other cases: the Deep Patient algorithm draws on a database of 5 million people.

    Modeling the work of ambulances

    However, the use of big data in medicine is not limited to these examples. At ITMO University, one of the projects combining big data and medicine is being carried out by the Institute for High-Tech Computer Technologies. Together with the V.A. Almazov Northwestern Federal Medical Center, it is developing a system for managing the ambulance fleet in St. Petersburg. The objective of the project is to help dispatchers get patients to hospital as quickly as possible.

    To solve this problem, the system takes into account emergency call statistics, data on the mobility of the population during the day, and data on the load on transport networks and hospital admission departments. As a result, the solution makes it possible, first, to optimize ambulance routes and, second, to formulate recommendations for improving the operating procedures of ambulance stations.

    The project will develop in two directions: on the one hand, the decision support system will accumulate information about an increasing number of diseases. On the other hand, the project will be supplemented by a solution for automating medical documentation.
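
    The idea of combining traffic data with the load on admission departments can be illustrated with a minimal sketch. It is not the project's actual algorithm: the `Hospital` class, its fields, and all the numbers are hypothetical, and the rule used here (pick the hospital with the smallest travel time plus expected admission wait) is only one simple way to rank options for a dispatcher.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    name: str               # hypothetical label, not a real hospital
    travel_minutes: float   # assumed estimate of drive time under current traffic
    queue_minutes: float    # assumed estimate of the wait in the admission department


def time_to_admission(hospital: Hospital) -> float:
    """Total expected time until the patient is admitted."""
    return hospital.travel_minutes + hospital.queue_minutes


def choose_hospital(hospitals: list[Hospital]) -> Hospital:
    """Return the hospital with the smallest expected time to admission."""
    if not hospitals:
        raise ValueError("no hospitals to choose from")
    return min(hospitals, key=time_to_admission)


# Made-up example data: the nearest hospital is not the fastest option
# once the admission queue is taken into account.
hospitals = [
    Hospital("A", travel_minutes=12.0, queue_minutes=25.0),
    Hospital("B", travel_minutes=18.0, queue_minutes=5.0),
    Hospital("C", travel_minutes=9.0, queue_minutes=40.0),
]

best = choose_hospital(hospitals)
print(best.name, time_to_admission(best))
```

    In this toy data set, hospital C is the closest, but hospital B gives the shortest total time once the admission queue is counted. A real system would also have to estimate these inputs from call statistics and traffic data, which this sketch simply assumes are given.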

    Computational Biomedicine

    Predictive modeling and working with big data in medicine are not just a special case of applying technology, but an independent scientific field. At ITMO University, specialists in this field are trained at the Department of High Performance Computing within the master's program “Computational Biomedicine”.

    Master's students study the methods, algorithms and technologies used in bioinformatics, in genomic and epidemiological studies, and in drug development. In addition, the course covers models of physiological processes in the human body, the workflows of health care institutions, and other basic knowledge that allows an IT specialist or analyst to speak the same language as doctors, biologists and chemists.


    Speaking of chemistry: another area for working with big data in the medical sector is biological and chemical research, and the related discipline is chemoinformatics. Creating a new compound, for example for a medicinal product, requires a great many experiments and tests. Chemoinformatics speeds up this process by modeling it with the help of modern databases and machine learning algorithms.

    The development of this discipline, and especially the use of big data, has seriously changed the medical industry as a whole. The need for very large data sets leads pharmaceutical companies to join forces and work together with independent scientific and research centers; even ten years ago, such a practice seemed unlikely.

    A “side effect” of working with big data in this area is the possibility of accumulating enough information to study the so-called neglected diseases: diseases that are common among the poorest and most marginalized groups of people, living mainly in Asia, Africa and Latin America. Developing drugs for these diseases and studying them is considered economically unattractive for pharmaceutical companies. However, access to big data and, in particular, the emergence of open databases of chemical compounds and reactions can significantly reduce the cost of the process and give groups of enthusiasts the opportunity to work on such problems independently, at least at first without the support of large pharmaceutical corporations.

    At ITMO University, you can study this area and work on your own project as part of the master's program “Chemoinformatics and Molecular Modeling”, which is run jointly with the University of Strasbourg. Students learn to use (and develop) methods for constructing and analyzing databases of chemical compounds and reactions in order to predict their chemical and biological properties, predict the course of reactions and search for new drugs.
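
    One simple way to use a compound database for modeling is similarity search: look up the known compound most similar to a new one and use its measured properties as a first guess. The sketch below illustrates this with the Tanimoto coefficient on sets of structural features. It is only an illustration of the general idea, not a method described by the programs mentioned here; the feature labels, compound names and activity flags are all made up.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 1.0  # convention: two empty feature sets are identical
    return len(a & b) / len(a | b)


# Hypothetical database: compound name -> (structural features, known activity).
known = {
    "compound_1": (frozenset({"ring", "hydroxyl", "amine"}), True),
    "compound_2": (frozenset({"ring", "carboxyl"}), False),
    "compound_3": (frozenset({"hydroxyl", "amine", "chloro"}), True),
}


def most_similar(query: frozenset, database: dict) -> str:
    """Return the name of the database compound closest to the query."""
    return max(database, key=lambda name: tanimoto(query, database[name][0]))


# A new, untested compound described by the same kind of features.
query = frozenset({"ring", "hydroxyl", "amine", "methyl"})

nearest = most_similar(query, known)
similarity = tanimoto(query, known[nearest][0])
predicted_active = known[nearest][1]
print(nearest, similarity, predicted_active)
```

    Real chemoinformatics tools compute such features from the molecular structure and train statistical models on far larger databases, but the principle is the same: existing data narrows down which compounds are worth testing in the laboratory.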
