Data mining makes scientific discoveries



    New Scientist magazine has published an interesting article on how data mining is used to analyze large volumes of scientific information. The goal is to find valuable information scattered across unrelated scientific articles. People are unlikely to detect such patterns on their own, without automated processing. This is not surprising: the number of scientific documents published on the Internet in English alone has already exceeded 100 million. That is a huge amount of informational noise, from which the human mind finds it practically impossible to extract anything useful.

    Clearly, modern science cannot do without data mining. For example, petabytes of data from the Large Hadron Collider are processed for months or even years to determine the presence or absence of effects predicted by a particular theory. Here, however, we are talking about a more "subtle" analysis: examining the scientific results of different authors in search of hidden patterns and coincidences.

    For example, a California-based supercomputer called KnIT works on such tasks continuously. It analyzes 50,000 scientific articles per hour. In one case, it analyzed all the information related to a protein called p53, looking for any data about the enzymes that interact with it, which are called kinases.

    The p53 protein is very important and is considered the "guardian of the genome": it suppresses the formation of cancerous tumors in the body. The supercomputer searched scientific articles for any references that might point to new, undiscovered kinases for the p53 protein. As a test task, it analyzed scientific papers published up to 2003 and found 7 kinases that were in fact discovered over the following 10 years. In other words, the system showed that it can make real scientific discoveries. In addition, it found 2 more kinases that are still unknown to science. Initial laboratory experiments confirmed the supercomputer's predictions, although a group of scientists wants to repeat the experiments to be sure.

    KnIT's developers from IBM and Baylor College of Medicine recently gave a talk on this subject at the Knowledge Discovery and Data Mining conference in New York. Their main point is that human scientists are better suited to generating new information, while computers are better suited to analyzing the enormous body of data that results.

    Of course, KnIT is not the only project in this area, where research is very active. For example, the authors of Eve, a system developed in Manchester, claim that it has already found a new cure for malaria. That program did not study scientific papers. Instead, it emulated experiments in this area itself, trying different types of drugs.
