shulyndina April 2, 2019 at 12:48

An unplowed field: big data in medicine and pharmaceuticals

Grigory Bakunov, Yandex's Director of Technology Distribution, a programming evangelist, and one of the creators and the permanent host of the Radio-T podcast, spoke at last year's DUMP conference about the fundamental changes happening in medicine and pharmacology right now, the practical problems science is facing, and what the medicine of the future looks like.

Below the cut are the video of the talk and its text version.

Hello! First, briefly about sponsors. Not long ago I was invited to speak at a conference about medicine and the technology in it. They said: "We need a short 15-minute lecture." Just before my talk they stopped me for a second: a short announcement first. A man comes out and says: "A great medical conference, very cool! My sea water is on sale at the stand next door; it cures all diseases and is 150% more effective than ordinary medicine, be sure to come by." I look at this and think: Lord, if someone like that walked out at a developer conference, he would be stoned. But the doctors just sit there; either this is normal for them or they are used to it, I do not know.

Why am I telling you all this? When I was preparing this presentation, to be honest, I thought I would talk about something else. But about a week and a half ago it dawned on me that I did not want to give a typical overview of machine learning in science. Not because it is nonsense, but because, on average, if you are interested in it, you already know about it.

If you don’t know about it, then you are simply not interested.

What do I want to tell you about? I want to talk about the practical problems science is facing now, in a way that is understandable to you as programmers and, moreover, even enjoyable.

Forgive me for the illustrations. Mine are always the kind you do not even have to look at, but I will be pleased if you smile at least occasionally.

The main message, the one I should probably start with, is this. When I started working three years ago on topics related to medicine, health, pharmaceuticals, and the use of algorithmic methods, I went to several large institutes and met many smart people there, and each time I asked them the same question. How can I draw people who work in medicine and pharmacy, medicine first of all, into a dialogue? How do I get them talking? They said: well, you have to provoke them.

So I started coming to conferences with this slogan.

It always went like this. You walk into the hall and say: "Hello, dear doctors and dear scientists, I must tell you this: medicine is not a science."

Of course, this is not entirely true. I have a simple explanation of why medicine is not a science. Modern medicine looks like this: until you graduate from medical school, you do not practice medicine and you know nothing about it. And graduating is not enough. There is an internship, a complicated procedure; you study for almost 9 years, and only then are you considered a beginning doctor. There is a special esoteric language that only doctors speak. And sometimes I get the feeling that they have their own writing system, too.

That is, at first you just study and accumulate knowledge, then they assign you a teacher whom you follow around, repeating what he does. Only then do they give you a white coat, a cap, and a stethoscope (which, as you know, doctors no longer use; it is pure paraphernalia) and say: that's it, now you are a doctor.

Think for a second: does this remind you of anything? For many years you are taught, then barely admitted through an exam, then you follow your teacher and repeat everything after him. And after some time you become a teacher yourself.

This structure repeats, one to one, the structure of the secret orders of the 12th-14th centuries. Those who have played Assassin's Creed probably remember this story. One to one: a secret order.

And here is what you need to know. The task of a secret order is not to create new knowledge or to add to the old, but simply to preserve the knowledge of the ancients. Because of this, medicine was held back for many years. Thank goodness that is over. In my opinion, it ended only just now, and it ended not thanks to medicine and doctors, but because humanity began to accumulate data.

The data we accumulated often began to contradict medicine. And to contradict it strongly, in a serious way.

Most of the major and important changes in medicine that have occurred over the past 20-30 years are associated exclusively with data.

Moreover, even though medicine, in my opinion, started to become scientific in the 21st century, it has one big problem.

There is no hard definition of what science is. But there are a number of important scientific techniques. It seems to me that the most important of them is that if you do science, you constantly conduct experiments, you tell other people about them, and other people should be able to reproduce your experiment.

The key point of science in the modern world is the reproducibility of the experiment. Moreover, reproducibility in many ways. You can repeat the experiment that I did. Another person can repeat the experiment that you did.

And now the more important part: someone has to be repeating your experiments all the time. Without this there is no science and no verification.

When we came to this topic (there are several of us enthusiasts working on it), the first thing we discovered was that most of the people who work with data in science know nothing about how this works in the normal world of programmers.

I believe this is one of the most successful experiments we did: when we started working with pharma and cell biology, we introduced a culture of experiment. We wrote up each experiment and its results as an actual test. A finished, written test in Python. Every experiment was set up this way.

Each experimental action, for example applying a drug to a protein or applying a drug to a cell, was a test run. And here is what is important: all these tests ran in parallel, all the time, non-stop. This is the classic pattern called Continuous Integration.
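One way to picture an experiment written up as a Python test is the minimal sketch below. Everything in it is hypothetical: the `Measurement` record, the `run_assay` function standing in for the lab hardware, the compound name, and all the numbers are invented for illustration and are not the team's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    drug: str
    dose_ug_per_ml: float
    response: float  # instrument reading, arbitrary units


def run_assay(drug: str, dose_ug_per_ml: float) -> Measurement:
    """Hypothetical stand-in for the lab robot: a toy linear dose response."""
    return Measurement(drug, dose_ug_per_ml, response=2.0 * dose_ug_per_ml)


def test_response_grows_with_dose() -> None:
    """The hypothesis, stated as an assertion a CI server can re-run."""
    low = run_assay("compound-A", 1.0)
    high = run_assay("compound-A", 5.0)
    assert high.response > low.response


def test_response_within_expected_range() -> None:
    """The expected result of one experimental action, with a tolerance band."""
    result = run_assay("compound-A", 5.0)
    assert 9.0 <= result.response <= 11.0
```

A test runner such as pytest would collect the `test_` functions, and a CI server would re-run them on every change, which is the loop described above.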

When we started talking about this with scientists, they said: "Well, this is incredibly difficult. You would need to write some software for it." It turned out that most of the software programmers have been using for years for all of this, such as Travis or Jenkins, which we have used for many years, fits scientists one to one as well.

If you switch your head on and think about it, an experiment is code. The same classic regression stories work. For example, if at some point you decide that your scientific experiment needs changes, you run all the old tests against the new experiment and verify that they still pass.

Classical regression testing has not gone anywhere. The scientists were shocked to find that when the same experiments were carried out the old way and the new way, the experimental measurements differed by up to 20%.
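The comparison between the old and the new way of running an experiment can be sketched as a regression check. This is an illustration only: the readings, the function names, and the 5% tolerance are assumptions made up for the example, not figures from the talk.

```python
from statistics import mean
from typing import Sequence


def relative_difference(old: Sequence[float], new: Sequence[float]) -> float:
    """Relative change of the mean reading between two protocol versions."""
    old_mean = mean(old)
    return abs(mean(new) - old_mean) / abs(old_mean)


def passes_regression(old: Sequence[float], new: Sequence[float],
                      tolerance: float = 0.05) -> bool:
    """True when the new protocol reproduces the old mean within the tolerance."""
    return relative_difference(old, new) <= tolerance


# Hypothetical readings of the same assay under the old and the new protocol.
old_protocol = [10.0, 10.5, 9.5]
new_protocol = [12.0, 12.5, 11.5]

if not passes_regression(old_protocol, new_protocol):
    drift = relative_difference(old_protocol, new_protocol)
    print(f"regression: mean reading drifted by {drift:.0%}")
```

With these made-up numbers the means are 10.0 and 12.0, so the check flags a 20% drift, the order of magnitude mentioned above.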

What does 20% mean in pharma? It would seem that pharma got used to mistakes long ago: they released an unsuccessful drug, a year later they paid somebody off, the drug did not work for someone. In reality, errors discovered at late stages in pharma often lead to companies closing. If a complex side effect is revealed 4-5 years after the drug's launch and you were foolish enough to sell it, for example, in the United States or any other civilized market, the lawsuits against your company will number in the tens or hundreds, each for tens of millions of dollars. You will spend more on lawyers alone.

Introducing regression tests in this environment made it possible, in many situations, to reduce the cost of errors by 20-30%. What is 20-30% of the total turnover of the fairly large pharmaceutical company I worked with on this? Something like 4-5 billion dollars. By their standards, that is small money. To my taste, for introducing one small tool, that is quite good money.

The same story applies, one to one, to versioning and to the approach to the experiment as such. From the moment you begin to think of an experiment and of scientific work as code, you immediately realize that you need to store it all somewhere. It turned out that most of the scientists I work with now look at GitHub with enthusiasm and say: "Wait, you could do that?"

People who have been working with GitHub and git for a long time understand how it goes: you push a new test, Travis picks it up, pulls everything down, and runs the new tests. By the way, it looks very beautiful! Travis kicks in, and a mechanical arm starts moving and loading old drugs into pipettes. An incredible picture!

In fact, the most important thing in the story of "let's look at tests as code" is that versioning appeared. People began to work with hypotheses differently. Not "it seems we made a mistake somewhere," but "let's take git, run a bisect, and find which piece of code contains the mistake, which test we got wrong, and at what point things stopped working."
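The idea behind a bisect can be sketched in a few lines of Python. This is not git itself, only the binary search it performs; the version labels and the `is_good` check are hypothetical, and the sketch assumes the first version is known to be good, the last is known to be bad, and the result flips from good to bad exactly once.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str],
                      is_good: Callable[[str], bool]) -> str:
    """Binary search for the first version at which the check starts failing."""
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Hypothetical history of an experiment protocol; the tests break from v6 on.
history = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]
culprit = first_bad_version(history, lambda v: int(v[1:]) < 6)
print(culprit)
```

When every check means a costly lab run, the point of the search is that it needs about log2(n) checks instead of n.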

I do not know about you, but these stories excite me greatly. When I start thinking about it, I think: God, the stock of tools that programmers have created turned out to be incredibly large. It is just gigantic.

And never mind plain versioning on GitHub. First of all, tests are code. If we describe experiments and hypotheses as code, we have great tools for static analysis. We have great code analysis tools. Shall we look for logical errors without even starting an experiment? Shall we merge all the tests into one big algorithm and look for logical errors in it? No problem.

Here you need to understand that in pharma this kind of Continuous Integration is a rather expensive process, because every test costs money. One CI cycle, in my current work with a large pharma company, costs about 80 thousand dollars. Put it another way: if we can catch a logical mistake in an experiment before running the tests, that is an instant saving of 80 thousand dollars.

Programmers know this well: a linter and static analysis can be run before the commit. Simply do not let hypotheses that were wrong from the start reach testing. Or establish that the error is not in the hypothesis you are about to add. That happens too.
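A pre-commit check over experiment definitions might look like the sketch below. It is a toy, under loud assumptions: the `ExperimentSpec` fields and the three rules are invented here to show the shape of the idea, and real static analysis of experiment code would be far richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical declarative description of one experiment."""
    name: str
    drug: str
    dose: float
    expected_min: float
    expected_max: float


def lint(specs: List[ExperimentSpec]) -> List[str]:
    """Report logical errors that can be found without running anything."""
    problems: List[str] = []
    seen: Dict[Tuple[str, float], ExperimentSpec] = {}
    for spec in specs:
        if spec.dose <= 0:
            problems.append(f"{spec.name}: dose must be positive")
        if spec.expected_min > spec.expected_max:
            problems.append(f"{spec.name}: empty expected range")
        # Two specs for the same drug and dose must not expect disjoint results.
        other = seen.get((spec.drug, spec.dose))
        if other is not None and (spec.expected_min > other.expected_max
                                  or spec.expected_max < other.expected_min):
            problems.append(f"{spec.name}: contradicts {other.name}")
        seen.setdefault((spec.drug, spec.dose), spec)
    return problems


specs = [
    ExperimentSpec("a", "drug-X", 1.0, 0.0, 1.0),
    ExperimentSpec("b", "drug-X", 1.0, 2.0, 3.0),   # disjoint from "a"
    ExperimentSpec("c", "drug-Y", -1.0, 5.0, 4.0),  # bad dose, inverted range
]
for problem in lint(specs):
    print(problem)
```

Running such a check before the commit costs nothing, while each problem it catches would otherwise surface only after a paid lab cycle.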

And at this point another very important thing comes in.

When one person works on a chain of experiments, there is no problem. It is like one programmer writing code: no problems, you put it in a folder on Samba or in Dropbox, and all is well. The moment there are two programmers, conflicts begin. When there are 50 programmers all working on roughly the same piece of code (read: on the same set of tests), problems naturally arise. And here there is incredible room for creativity in applying the off-the-shelf tools that programmers have developed over the past decades.

Here I vote for GitHub with both hands. I sincerely believe that what you can do with GitHub beyond just storing code is simply incredible. Although, of course, I do not represent GitHub in any way.

The arrival of tools for collective work on experiments, combined with versioning, made very interesting things possible. For example, the guys I work with began sending each other pull requests with proposals. Someone went to see how things were going on another team, came up with an interesting hypothesis, and instead of just tossing it out in the smoking room, as is customary among people in biology and physics, he did the simple thing: he wrote it up as a pull request and submitted it. On the other side, the guys said: "Oh, cool idea," merged it, and after a while we saw a new test with a new experiment in the database.

Unfortunately, because most relationships between technology and pharmaceutical companies are not very public, we cannot tell everything. I can say that I know of at least one drug that started with a pull request three years ago and is now going through FDA certification.

FDA certification means that in a year this drug may appear in pharmacies. Not in ours yet, though.

This change in the minds of young scientists is very hard to overestimate. It is a transition from closed development, the accepted way for many years within small research teams, to open procedures. I am sure that in 3-4 years you will see small research laboratories that keep everything on GitHub and are ready to accept pull requests from outsiders. And that will be a bombshell. It is simply a different world, where anyone can take part, one way or another, in normal scientific activity.

Why is this important? For the same reason that open source as such is important. No, I am not saying that open source is the coolest software in the world. Moreover, it seems to me that all this belongs with the catchphrase of fifteen years ago, "The splendor and misery of open source." But without open source, a huge number of things we use every day would not exist. Half of Android, for one. Without open source, there would be no Android.

The same story is happening now with drugs and it will be cool, it will be incredibly cool when we find ourselves in this world.

Here, of course, things do not move so fast. But there is an area where our current approach is probably the easiest to apply.

There is an interesting approach: to begin with, so as not to change your whole structure or rewrite everything, start digitizing the results of the experiments you are already running. Turn them, for example, into a set of plain text files. And then use ready-made tools for working with logs.

To make this clear, I have an incredible story that I am delighted to tell every time. The results of scientific experiments are loaded into Kibana and ClickHouse, ready-made systems that normally hold large volumes of logs; various tests, measurements, and experiments are run on them, and, among other things, standard "anomaly detection" algorithms are applied. What is that called in Russian? In Russian, "anomaly detection" is called "search for frustrations." I am shocked by the phrase myself, but I like it so much.

The search for frustrations, as it turned out, works incredibly well when applied to experimental science. The coolest place where it is used now is Yandex's collaboration with CERN. CERN runs several large experiments at the Large Hadron Collider. The smallest of them is called LHCb, and billions of particle collisions occur in it. The results of each of these collisions are recorded in a database.

After that, a ready-made set of algorithms is run that finds anomalies there: objects and events that do not fit the idea of beauty. I cannot say that big discoveries are being made there right now, but if a discovery is made as part of this experiment, it will be made solely thanks to this IT approach to a seemingly classical area such as particle collision analysis.
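To give a feel for what a standard anomaly detector does with logged measurements, here is a minimal sketch based on the modified z-score (median and median absolute deviation). It is a textbook rule chosen for illustration and an assumption of this example, not the algorithms actually used at Yandex or CERN; the readings are made up.

```python
from statistics import median
from typing import List, Sequence


def find_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against in this simple version
    # 0.6745 rescales the MAD to match a standard deviation for normal data.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical readings parsed from experiment logs.
readings = [10.0, 10.5, 9.5, 10.0, 10.5, 9.5, 25.0]
print(find_anomalies(readings))
```

The median-based score is used here because a single extreme reading barely shifts the median, whereas it would inflate a mean and standard deviation and could hide itself.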

These, of course, are fundamental changes in science. In any science. Returning to pharmacy, medicine, and biology, I want to say that, in fact, the more scientific a science is, the more difficult it is to apply programming approaches in it.

That is because in physics, for example, a different culture of experiment took hold a very long time ago. Physicists got used to mathematical methods and approaches. In pharmaceuticals, medicine, and biology, they did not. So you tell them that there are tools for collective work: one part of an experiment can be performed on one side of the continent and another part on the other, and there is a system that lets you combine them. More than that: even if one person writes one thing and another writes something else, the conflict can somehow be merged. And there is a system that automatically and continuously runs the experiments you add and tells you which of them failed and which succeeded. When doctors who work in experimental medicine hear this, their eyes light up.

When you do this, you get the feeling (I hope it is not a false one) that you are changing the world. It is possible that in 20-30 years fewer people will die simply because you taught pharmacists how to use Travis.

The whole story has another, sad side. There are very few people who, like me, are trying to bring IT working methods and methodologies to areas outside of IT. I came here to tell you this whole story largely because you, perhaps, can convey to scientists, specialists, lawyers, anyone, the endless possibilities that our tools already offer.

Set aside the whole story about pharma, biology, and physics for a second. Imagine that you are working with a law firm. Do you realize that most modern contracts can be written in an algorithmic language? That modern legal codes are libraries for these contracts? That the constitution is the operating system for them? That once all of this is converted into an algorithmic language, static analysis will find defects, errors, and problems in legislation far more effectively than any professional lawyer?

I have been working in IT for many years, and I think I am good at estimating deadlines for any task. So: to digitize this whole thing, to bring all the legislation into digital form, you need a good programmer, a good lawyer, and probably a year and a half. There is a startup concept for you; take it if you want.

We are, in fact, close to the end. By and large, this approach of "take IT tools and bring them to the rest of the world" is a little messianic. As if we have a religion, and it is called, well, the word "agile" is already tainted, so let's take some other word. Let's just say "teamwork tools."

To bring automated work tools to any other specialty is a mission that allows people to save hours of life, and sometimes just human lives. That is why I am now doing this so actively.

That is all I wanted to talk about.

You can find me like this, it's me.

I am ready to answer your questions. Before we move on, I want to say that I always get nervous in front of an audience like this one. You are all very different. There are also many people here from Yekaterinburg; I am from here myself, and I know that smiling is not very customary for us. Thank you to those of you who smiled. It was great, thanks.

On the one hand, I heard the word Python; on the other, I heard "static analysis" and "the price of an error is high." Why Python, then, and not something like Haskell?

I would be very much in favor of Haskell. The only problem is that going with Python was easier for us, simply because they already had a certain amount of Python code written, and because in some places we used machine learning that we had written ourselves, which, of course, was built in Python. Haskell is easier for mathematicians; Python was easier for biologists and pharmacists.

I am a developer, and my wife is an infectious disease doctor. And her friend works in a laboratory. Purely by chance, I am not lying :) And I am from Siberia, from Novosibirsk. Sometimes I tell my wife something technical, and she says: "What are you talking about? We live in a completely different world." Seriously, they simply have a problem with computers. My question is this. When will all this magic, all this coolness you are talking about, all this open-source medicine, reach real everyday use in the outback, in Siberia, across the country in general? When will it become a real trend?

Until the end of the 20th century, medicine developed at a very slow pace. What example can I give? Everyone knows that humanity has learned to treat stomach ulcers. It suddenly turned out that most stomach ulcers are caused by a specific bacterium, Helicobacter pylori; a way to deal with it was found, everything is elementary, everything is great. Someone even received the Nobel Prize for it. But if you read the details, you will find that the same thing had been discovered in animal husbandry, and people had learned to treat it, 60 years earlier. For 60 years people kept dying.

Now the decision-making cycle for transferring data from one science to another has shrunk to 10 years. So if there are places between Novosibirsk and Khabarovsk where people do science but do not even have a computer, in 10 years everything will change. Mark my words. It will not take more than 10 years.

However, there are areas in which there is simply no application of science. Most of the doctors I now work with who want to do active science and take part in experiments, but who live outside the central cities, use their personal mobile phones as computers. That is more than enough. One of them even programs right on the phone.

“Medicine is not a science,” fine. In your other talks about a year ago, you said that only 24% of diagnoses are made correctly. What should be done about this? What are the possible solutions?

A short explanation for those who have not heard this story. There is an official figure from the WHO, the World Health Organization: on average worldwide, if you go to a doctor and the doctor gives you a diagnostic hypothesis, the probability that it is correct is 24%. Not even 50%, not even a coin toss. 24%.

What to do about it? Here is what: store as much data as possible. The problem is not really the doctor. It is that in the 6 or 9 minutes that, under Russian standards, the doctor spends talking to you, filling out your chart included, the amount of data the doctor can learn about you and analyze is negligible. But if you learn to collect it automatically, the amount of data becomes incredibly large.

I love telling this incredible story; it happened to me the year before last. I am sitting in a movie theater, wearing an Apple Watch. My cardiologist receives data from my Apple Watch. At one point he calls me. I answer cautiously: "Yes, what is it?" He says: "Listen, are you okay? I just see that your pulse is 160, and you are not at the gym."

That is what I need. This is what the medicine of the future looks like. The approach is not "I came to the doctor, complained, and he began to diagnose me" but "the doctor, looking at my body's readings, said that something was wrong and that maybe some action should be taken," and it makes it possible to change this figure radically. I think that within 20-30 years we will raise the diagnostic rate to, roughly speaking, 50%. I may not live to see it, but you will.

First question. What are some examples of areas outside IT where there is a real need for distributed collaboration? And the second question. How do I get rid of the thought that the beautiful future you are painting is not about Russia?

Half of the examples I talked about are from Russia. In many technical areas, scientific ones included, we started moving only recently. Because of this, we have less to change. There are many places where you do not need to fix an order established 50 years ago; you just come in and offer at least some order.

As for places where there is a need for collective work: please do not forget that if cab drivers at the beginning of the 20th century had been asked how they imagined the automobile and what they wanted from it, they would have said they wanted a big cart behind the horse to carry away the manure. That would have been a major innovation.

What I mean is: do not expect scientists to respond enthusiastically to your proposals. There will be some pushback at first. You come and say: it seems to me it would be good to do it this way in your particular method, in this particular place. "This way" is, for example, collective work on one article or on one test. Do not expect delight. Fortunately, after two or three iterations they realize what happiness it is, but before that there will be rejection.

It is very interesting which tests are being conducted. Do I understand correctly that the company has a specific set of pharmaceutical tests for certain products? How to introduce new tests there?

Right now, there is no way.

For example, we test for allergies; is it that kind of test?

There are affiliate tests. Let's say, for simplicity, that there is a set of biological materials known to be affected by a certain allergen, and there is an automated farm that administers a set of drugs and checks that the reaction to these drugs after the administration of another drug has not changed. Or has changed for the better. Or for the worse. That is, regular measurements are simply taken.

So the whole system is basic management: it automates things and collects data?

Automate data collection and process continuity.

That is, it is not connected with biology itself?

It is not about changing the science itself. But you see, the appearance of a technology as unobvious as writing, the writing of letters, radically changed science later on. The same thing is happening here. The appearance of new tools changes the science itself quite strongly. It just happens one step removed.

Is it a private pharmaceutical company or is it somehow supported by the state?

There do not seem to be any state-owned pharmaceutical companies in the world. My experience is based on working with two large pharmaceutical companies. One of them is world-class, with German roots.

I have a simple question. Where to get the data if you are not Yandex?

It seems to me that data should be taken from partner companies, just like Yandex does, because in reality no one in the world has enough data in one source to move science.

Science is always something that is formed at the intersection of a large amount of data with different owners.

With medical data, as I understand it, everything is much more complicated ...

It is the other way around. With medical data, everything has become much simpler in recent years. Either the data is available anonymized and in large quantities, though unfortunately not in Russian, so you have to do something with it: translate it, somehow work with it. Or the data is obtained directly from patients, each of whom has to sign a paper agreeing to the transfer of data. And that is all.

Modern science is the lot of the rich. My whole story about open source is also about the fact that, perhaps, this will allow a large number of young and moneyless scientists who do not belong to any large sect and do not work with any large pharmaceutical company to create something new collectively.

But how did you even think of combining two things that weren’t very connected at first glance - IT and medicine? These are technologies that do not intersect in the minds of most people. What were the first steps you took to follow the path you chose?

As you can probably tell from my appearance, I not only sleep little, I am also not a very healthy person. When I started working on medicine, pharmacy, and all that, I was simply trying to solve the problems faced by the people doing specific research. And the only way of solving them that I had in my hands was this one.

You know, there was a great philosopher and psychologist named Maslow who put it quite precisely. Carefully translated into Russian, he said: "When you have a hammer in your hands, it is hard to resist the temptation to treat everything around you as a nail." I had GitHub in my hands, and it was hard not to treat everything around me as code. And so it happened.

Why did no one think of crossing different fields with IT earlier? And a bit of demagoguery to follow. If various fields are abruptly crossed with IT, and we cross in the lawyers and so on, then many people who currently hold those jobs will in fact have to leave.

It’s awesome.

The question is different: what to do with them?

There are many different hypotheses. I once read a wonderful book involving soylent, which, as you remember, was made from people. In the 90s, the bioreactor was much discussed in our country. Seriously, I do not have an answer to this question. Most interestingly, I do not think IT people should be the ones to decide what to do with these people.

I have an idol, a man who has unfortunately died, but he had a brilliant phrase. At one meeting I attended, where the programmers were arguing a lot, he came out and wrote two lines. Line one: nothing will work. Line two: progress cannot be stopped. And I live with this thought: everything will certainly be bad, but progress cannot be stopped.

Yes, a large number of people will lose their jobs as a result of technological progress. But that is no reason to stop progress. Humanity will find some way out. Unconditional income, compulsory programming therapy for people who have lost their jobs.

I did not quite understand. What exactly in medicine is described by the Python tests?

In this particular case, for example, we used tests to describe the input and output data of an allergen test: what dosage of the drug is introduced into the cell, and what reading on the lumen we got as a result.

That is, through Continuous Integration do you have some kind of physical machine running?

A physical laboratory, yes. You need to understand that, on average, they already have physical laboratories. It is just that the tests in them are described not in this form but as one large program that says: let's set such-and-such characteristics by hand and check that the output is such-and-such.

This year, the DUMP conference will be held on April 19 in Yekaterinburg. Traditionally there will be a Science section. This year's program: Oleg Bartunov (Moscow State University, Postgres Professional), Peter Fedichev (Moscow Institute of Physics and Technology, Gero), Pavel Skripnichenko (UrFU, KantrSkrip), Gennady Shtekh (Naumen), Igor Mamay (Kontur), Vladislav Blinov and Valery Baranova (Tinkoff.ru), Tatyana Zobnina (Naumen). The full program is on the conference website.
