The Scientific Paper Is Obsolete. Here’s What’s Next.

Original author: James Somers
The scientific paper, in its modern form, is one of the inventions that made progress itself possible. Before it took shape in the 17th century, results were communicated privately in letters, ephemerally in lectures, or all at once in books. There was no public forum for incremental advances. By making room for reports of single experiments or minor technical advances, journals made the chaos of a growing science cumulative. From then on, scientists came to resemble social insects: they push progress forward steadily, as a buzzing swarm.

The earliest papers were in some ways more readable than today’s. They were less specialized, more direct, shorter, and less formal. Calculus had only just been invented. The entire data set on a topic could fit in a table on a single page. Whatever calculations the results required were done by hand, and could be checked by hand.

The more complex science becomes, the harder it is to communicate its results. Today’s papers are longer than ever and full of jargon and symbolic notation. They depend on chains of computer programs that generate data, clean up data, plot graphs, and run statistical models. These programs are sometimes so carelessly written and so narrowly focused on the result that they contribute to the reproducibility crisis. In other words, the paper fails at its main task: to report a discovery plainly enough that someone else can repeat it.

Perhaps the paper itself is to blame. Scientific methods evolve at the speed of software; the skill most in demand among physicists, biologists, chemists, geologists, and even anthropologists and psychologists is facility with programming languages and “data science” packages. And yet the main method of communicating scientific results has not changed in 400 years. Papers may be posted online, but they are still text and images on a page.

What would we get if we designed the scientific paper from scratch today? I recently talked with Bret Victor, a researcher who worked at Apple on early user-interface prototypes for the iPad and who now runs his own lab in Oakland, California, studying the future of computing. Victor has long believed that scientists have yet to take full advantage of the computer. “The situation is not very different from the printing press and the evolution of the book,” he said. After Gutenberg, the printing press was mostly used to imitate the calligraphy of bibles. It took almost 100 years of technical and conceptual improvements to invent the modern book. “There was a whole period during which people had a new printing technology, and they used it to reproduce old media.”

Victor showed what can be achieved when he reworked a journal article by Duncan Watts and Steven Strogatz, “Collective dynamics of ‘small-world’ networks.” He chose it because it is one of the most frequently cited papers in all of science, and because it is a model of clear exposition. (Strogatz is best known as the author of the column “The Elements of Math” in The New York Times.)

The Watts-Strogatz paper described its key findings the way most papers do: with text, pictures, and mathematical symbols. And, as with most papers, those findings were still hard to digest despite the clear description. The hardest parts were the ones that described procedures or algorithms, because the reader had to “play computer,” as Victor put it, trying to hold a picture of what was going on in their head while stepping through the algorithm.

In Victor’s reworking, the explanatory text was interspersed with interactive diagrams that illustrated each step. In this version you could follow the algorithm at work on an example. You could even control it.
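To give a sense of what “playing computer” involves, here is a minimal sketch of the kind of procedure that paper describes: start from a ring of nodes, each linked to its nearest neighbours, then rewire each link with some probability. It is an illustration written for this article, not Victor’s interactive version or the authors’ code; the function names (`ring_lattice`, `watts_strogatz`) and the parameters `n`, `k`, `p` and `seed` are assumptions chosen for the example.

```python
import random


def edge(a, b):
    """Store an undirected link as an ordered pair so (a, b) and (b, a) match."""
    return (min(a, b), max(a, b))


def ring_lattice(n, k):
    """Links of a ring of n nodes, each joined to its k nearest neighbours."""
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    return {edge(node, (node + offset) % n)
            for node in range(n)
            for offset in range(1, k // 2 + 1)}


def watts_strogatz(n, k, p, seed=0):
    """Visit every lattice link once; with probability p, move its far end."""
    rng = random.Random(seed)
    links = ring_lattice(n, k)
    for offset in range(1, k // 2 + 1):
        for node in range(n):
            if rng.random() >= p:
                continue  # this link keeps its original place
            old = edge(node, (node + offset) % n)
            # Allowed targets: any other node not already linked to this one.
            targets = [t for t in range(n)
                       if t != node and edge(node, t) not in links]
            if targets:
                links.remove(old)
                links.add(edge(node, rng.choice(targets)))
    return links


if __name__ == "__main__":
    for p in (0.0, 0.1, 1.0):
        links = watts_strogatz(n=20, k=4, p=p, seed=1)
        moved = len(links - ring_lattice(20, 4))
        print(f"p={p}: {len(links)} links, {moved} not in the original ring")
```

Even in this small form, following the loop by hand for twenty nodes is tedious, which is the point Victor was making: the procedure is simple to run and hard to picture.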



Strogatz admired Victor’s version. He later told me it was a shame that for more than a hundred years the tradition in mathematics has been to write papers as rigorously and formally as possible, often omitting the very visual aids that mathematicians use to make their discoveries.

Strogatz studies nonlinear dynamics and chaos, systems that tend to synchronize or self-organize: blinking fireflies, ticking metronomes, the electrical impulses of heart cells. The key is that such systems run in cycles, and Strogatz visualizes this with dots moving around a circle: when a dot returns to its starting point, that is a firefly flashing or a heart cell firing. “For almost 25 years, I have been making little computer animations of dots running around a circle, with colors indicating their frequency,” he said. “The red ones are the slow guys, the violet ones are the fast guys ... All these dots are spinning on my computer; I do this all day,” he said. “I catch patterns in colored dots running across the screen much more readily than in 500 simultaneous time series. I don’t see much that way, because that isn’t what it really looks like.”
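The picture Strogatz describes can be reduced to a few lines. The sketch below is only an illustration of the idea, under loud assumptions: the dots are independent (real fireflies and heart cells influence one another, which is what makes them synchronize), and the function name `count_flashes`, the time step `dt`, and the sample frequencies are made up for the example. Each dot advances around the circle at its own frequency and “flashes” whenever it passes its starting point.

```python
import math


def count_flashes(frequencies_hz, duration_s, dt=0.001):
    """Move one dot per frequency around a circle; count passes of the start."""
    phases = [0.0] * len(frequencies_hz)   # angle of each dot, in radians
    flashes = [0] * len(frequencies_hz)    # completed laps so far
    steps = int(round(duration_s / dt))
    for _ in range(steps):
        for i, freq in enumerate(frequencies_hz):
            phases[i] += 2 * math.pi * freq * dt
            if phases[i] >= 2 * math.pi:   # dot is back at the starting point
                phases[i] -= 2 * math.pi
                flashes[i] += 1
    return flashes


if __name__ == "__main__":
    # Slow "red" dots and fast "violet" dots, in the spirit of the quote.
    for freq, laps in zip((0.5, 1.0, 2.0), count_flashes((0.5, 1.0, 2.0), 10.25)):
        print(f"{freq} Hz dot flashed {laps} times")
```

A printed table of these phases over time is the “500 time series” view; the animation of the same numbers is the one in which patterns jump out.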

Software is a dynamic medium; paper is not. Seen this way, it is strange that research like Strogatz’s, devoted to dynamical systems, is so often laid out on paper, without the benefit of the circling dots, since those dots are what helped him see what he saw and could help the reader see it too.

This is the whole problem of scientific communication in a nutshell: today, scientific results are very often found with the help of computers. The ideas are complex and dynamic, and hard to grasp with the mind’s eye. And yet the most popular tool for sharing results is the PDF, which is literally a simulation of a piece of paper. Perhaps we can come up with something better.

Stephen Wolfram published his first scientific paper at the age of 15. By the end of his undergraduate studies he had published 10 papers, and by the age of 20, in 1980, he had completed his doctorate in particle physics at the California Institute of Technology. His secret weapon was his heavy use of computers at a time when most serious scientists considered computational work beneath them. “By that time, I was probably the heaviest user of computer algebra in the world,” he said in an interview. “It was very convenient; I could just do all the calculations on the computer. I had fun putting particularly ornate formulas in my scientific papers.”

As his research grew more ambitious, he found himself pushing the existing software to its limits. One project required half a dozen different software tools. “I spent a lot of time tying it all together,” he said. “And I decided that I should try to build a single system that would do everything I needed, one that could grow forever.” Instead of continuing as an academic, Wolfram decided to found Wolfram Research and build the ideal computing environment for scientists. A headline in Forbes on April 18, 1988, read: “Physics Whiz Goes Into Biz.”

At the heart of Mathematica, as the company named its flagship product, is a “notebook” in which you type commands on one line and see the results on the next. Type “1/6 + 2/5” and it will give you “17/30.” Ask it to multiply polynomials, and it will comply. Mathematica can do calculus, number theory, geometry, and algebra. It has functions for calculating chemical reactions and filtering genetic data. Its database contains every Rembrandt painting, and it can give you a scatter plot of his palette over time. Models of orbital mechanics are built in, and it can calculate how far an F/A-18 Hornet would glide if its engines cut out at an altitude of about 10,000 meters. A Mathematica notebook is less a record of the user’s calculations than a transcript of their conversation with an all-knowing oracle.
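For readers who have never used such a system, the two simplest examples above can be imitated in plain Python. This is only a sketch to show what “exact” answers look like; it uses Python’s standard `fractions` module rather than anything from Mathematica, and the helper `multiply_polynomials` and its list-of-coefficients representation are assumptions made for the illustration.

```python
from fractions import Fraction

# Exact rational arithmetic: no rounding to 0.5666...
total = Fraction(1, 6) + Fraction(2, 5)
print(total)  # 17/30


def multiply_polynomials(a, b):
    """Multiply polynomials given as coefficient lists, lowest power first."""
    product = [0] * (len(a) + len(b) - 1)
    for i, coeff_a in enumerate(a):
        for j, coeff_b in enumerate(b):
            product[i + j] += coeff_a * coeff_b
    return product


# (1 + x) * (-1 + x) = -1 + x^2
print(multiply_polynomials([1, 1], [-1, 1]))  # [-1, 0, 1]
```

The difference in a notebook is that each of these lines would sit in its own cell, with the answer printed directly beneath it and available to be changed and rerun.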

The notebook interface was the brainchild of Theodore Gray, who was inspired while working with an old Apple code editor. Most programming environments let you run code either line by line or all at once. The Apple editor let you select any portion of the code and run just that. Gray brought this basic concept to Mathematica, and none other than Steve Jobs helped refine the design. The notebook is designed to turn scientific programming into an interactive exercise, in which individual commands can be tweaked and rerun tens or hundreds of times, so that the user learns from the results of computational experiments and arrives at a deeper understanding of the data.

What makes the notebook especially good at this is its ability to draw plots, images, and beautiful mathematical formulas, all of which respond dynamically to changes in the code. In Mathematica you can load a voice recording, apply complex mathematical filters to the audio, and visualize the resulting sound wave. By dragging parameters with the mouse, you can change its shape and see which filters work best as you play with them. The package’s ability to handle so many different computational tasks in one simple interface is the result of “literally man-centuries of work,” as Gray puts it.

Wolfram has repeated the vision behind this work many times in lectures, blog posts, presentations, and press releases: not just to make good software, but to create an inflection point in the practice of science itself. In the mid-17th century, Gottfried Leibniz devised a notation for integrals and derivatives (the familiar ∫ and dx/dt) that made the difficult ideas of calculus almost mechanical. Leibniz believed that similar symbols, applied more broadly, could create an “algebra of thought.” Ever since, logicians and linguists have dreamed of a universal language that would eliminate ambiguity and turn the solving of complex problems into a kind of calculus.

Wolfram’s career has been a constant effort to fold all of the world’s knowledge into Mathematica, and later to make it available through Wolfram Alpha, the company’s “computational knowledge engine,” which powers many of the question-answering abilities of digital assistants like Siri and Alexa. This is Wolfram’s attempt to create an Interlingua, a programming language equally understandable to humans and machines: an algebra of everything.

The task is characteristically ambitious. In the 1990s, Wolfram would occasionally tease the public with remarks that, while building his company, he was also working on a revolutionary research project. Anticipation grew. Finally, the project arrived: a huge book, as thick as a cinder block and almost as heavy, with a title for the ages: “A New Kind of Science.”

It turned out to be a detailed study, carried out in Mathematica notebooks, of the surprisingly complex patterns produced by the simplest computational processes: cellular automata. The study was conducted both for its own sake and to understand how simple rules can give rise to complex natural phenomena, such as a tornado or the pattern on a mollusk shell. The findings, which Wolfram published without independent review, were accompanied by constant reminders of their importance.
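To make “the simplest computational processes” concrete, here is a minimal sketch of an elementary cellular automaton, the one-dimensional kind in which each cell’s next state depends only on itself and its two neighbours. It uses Rule 30, a well-known example of a simple rule with an irregular-looking output. The code is an illustration for this article, not taken from the book; the names `step` and `render`, the wrap-around edges, and the row width are assumptions of the example.

```python
def step(cells, rule=30):
    """Apply an elementary cellular automaton rule once, wrapping at the edges."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        # Read left, centre, right as a 3-bit number from 0 to 7.
        pattern = cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n]
        # Bit number `pattern` of the rule gives the cell's next state.
        new_cells.append((rule >> pattern) & 1)
    return new_cells


def render(cells):
    """Draw live cells as '#' and dead cells as '.'."""
    return "".join("#" if cell else "." for cell in cells)


if __name__ == "__main__":
    width = 31
    row = [0] * width
    row[width // 2] = 1  # a single live cell in the middle
    for _ in range(15):
        print(render(row))
        row = step(row)
```

The entire rule is the single number 30; everything the printout shows follows from applying it over and over.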

The more you encounter Wolfram, the more this seems to be simply his style. In a 1988 profile, Forbes tried to get at the roots of it: according to Harry Woolf, the former director of the prestigious Institute for Advanced Study (in Princeton), where Wolfram, at 23, was one of the youngest senior researchers, he had “a cultivated difficulty of character, added to an intrinsic sense of loneliness, isolation and uniqueness.”

When one of Wolfram’s research assistants announced at a conference a significant mathematical discovery that was a key part of “A New Kind of Science,” Wolfram threatened to sue if the work was published. “In no serious research group would a junior researcher be allowed to talk about what the senior researcher is doing,” he said at the time. Other scientists criticized Wolfram’s massive book for building on other people’s work without citing it. “He hints that he is the author of the main ideas that have been central to the theory of complex systems over the past 20 years,” one researcher told Times Higher Education in 2002.

Wolfram’s self-promotion is all the more surprising because it is entirely unnecessary. His achievements speak for themselves, if only he would let them. Mathematica was a success almost as soon as it launched. Users had long been waiting for such a product; at universities, the program became as ubiquitous as Microsoft Word. Wolfram, in turn, used the steady income to hire more engineers and subject-matter experts, feeding more and more information to his insatiable program. Today Mathematica knows about the anatomy of the foot and the laws of physics, about music, the taxonomy of coniferous trees, and the major battles of World War I. Wolfram himself helped teach the program an archaic Greek notation for numbers.

All this knowledge is “computable.” If you wanted, you could set x to the location of the Battle of the Somme and y to the daily precipitation in 1916 within a 50 km radius of that place, and have Mathematica calculate whether more deaths occurred in World War I battles while it was raining.


“I’ve noticed an interesting trend,” Wolfram wrote in a blog post. “Pick any field X, from archaeology to zoology. There either is now a ‘computational X’ or there soon will be. And it is widely viewed as the future of the field.” Wolfram argues that the more fully specialists in these fields master computational methods, the wider the range of what can be discovered will become. The Mathematica notebook could be an accelerator of science, since it can give rise to a new style of thinking. He compares this to the transition that took place in the 17th century, when people became able to read mathematical notation. It becomes a form of communication, he says, with one very important feature: it can be run again.

The idea is that a “paper” of this kind could have the same dynamism that Strogatz and Victor wanted, with interactive diagrams interspersed with the text, and with the added advantage that all the code that generates those diagrams, and all the data, would be available to readers, who could examine it and play with it. “Frankly, when you write something this simple and clear in the Wolfram Language in a notebook, there is nowhere to hide. It is what it is, and it works the way it works. There is no way to fudge the result,” Wolfram says.

To write a paper in a Mathematica notebook is to reveal both your results and your methods: the paper itself and everything you did to produce it. As a result, readers will find it easier not only to understand the work but also to reproduce it (or fail to). With millions of scientists around the world contributing to science in small increments, the only way to turn all that work into something meaningful is to let others reliably build on those contributions. “This is what scientific work done in the form of computational essays can accomplish,” Wolfram said.

Wolfram says he is surprised that computational essays have not caught on. He recalls working with Elsevier, the scientific publishing giant, in the early 1980s. “Elsevier hired me to consult on something like ‘what the future of scientific publishing will look like.’” This was before Mathematica notebooks existed, but he pitched them on much the same idea. “A few years ago I talked with someone from the company’s management again. And at that meeting I realized: oh my God, I said the exact same thing 35 years ago!”

I talked to Theodore Gray, who has since left Wolfram Research to become a writer. He said his work on the notebook was motivated in part by a conviction, already well formed by the early 1990s, “that, obviously, all scientific communication and all technical papers that use any data or mathematics or modeling or graphs or diagrams or anything like that should not be published on paper. That was quite obvious by 1990,” he said.

“For the past 29 years, the fact that the community as a whole, apart from a few people who understood this, has not taken this approach has been a source of horror and surprise,” he said. “It’s literally impossible to calculate how much has been lost, how much time has been wasted, how many results have been misunderstood or misrepresented.”

In early 2001, Fernando Perez found himself in roughly the same position Wolfram had been in 20 years earlier. He was a graduate student in physics who had pushed his tools to their limits. He was using a whole collection of systems, Mathematica among them, and every task seemed to require switching from one tool to another. He remembers having six or seven different programming books on his desk. What he wanted was a unified environment for scientific computing.

But instead of starting a company, he found two other scientists, a German oceanographer and a computer science graduate student at Caltech, who were thinking along the same lines. All of them had fallen in love with Python, a general-purpose open-source programming language, and had independently begun building tools to make it easier for scientists to use: tools that simplified working with data sets and plotting, and that encouraged a more exploratory style of programming.

Perez merged the three projects into one and took charge of it. From the very beginning, the IPython project (the “I” stands for “interactive”) was open source. The program was not just free; anyone could study its code and fix it, contributing to the common cause. The decision was deliberate. “I was interested in the ethical aspect of being able to share my work with others,” Perez, who is from Colombia, where access to commercial software was harder to come by, told me. “There was also an epistemological motivation.” He believed that if science is to be open, then the tools used to do it should be open too. Commercial software, whose source code you were legally barred from reading, was “the antithesis of the idea of science,” whose purpose is to open the black box of nature.

Hence Python. The core language is not as powerful as the Wolfram Language that drives Mathematica. But where Mathematica draws its capabilities from the work of an army of in-house programmers, Python’s core is backed by a massive library of additional features, for image processing, making music, building AI, analyzing language, and plotting graphs, created by a community of people who contribute to open source for free. Python became the de facto standard for scientific computing because open-source developers like Perez built useful tools for it; and developers were drawn to Python because it was the de facto standard for scientific computing. Programming-language communities, like any social network, thrive or die on the strength of these feedback loops.

The idea for the IPython notebook interface was taken from Mathematica. Perez admired the way Mathematica notebooks encouraged an exploratory style. “You could sketch something out, because that’s how you think about a problem, that’s how you come to understand it.” Computational notebooks “bring out the idea of a live narrative. You can think through the process, effectively using the computer as a partner in thinking and computing.”

Instead of developing a dedicated, standalone application, let alone spending man-centuries on it, the IPython team built its notebooks as simple web pages. (Perez had been joined by Brian Granger, a physics professor at California Polytechnic State University in San Luis Obispo, and Min Ragan-Kelley, a PhD student at the University of California at Berkeley working in computational physics.) The interface lacks the polish Steve Jobs brought to Mathematica, and its sophistication. But by building on the web, IPython got improvements for free: every time Google, Apple, or some random programmer released a new plotting tool or published better mathematical code, the improvement carried over to IPython. “It all paid off beautifully,” Perez said.

The paper announcing the first confirmed detection of gravitational waves was published in the traditional way, as a PDF, but it came with an IPython notebook. In the notebook, you can follow all the work that produced each of the article’s figures. Anyone can run the code themselves, change it as they like, and play with the calculations to better understand how they work. At one point the notebook’s narrative reaches the part where the signal produced by the gravitational waves is turned into sound, and you can play it in your browser and hear what the scientists first heard: the bloop of two black holes colliding.



“I think the scientific community has adopted this tool, and it is already considered universal,” Theodore Gray says of the Perez group’s notebook. “Mathematica never reached that level of acceptance.” Some 1.3 million such notebooks are already publicly available on GitHub. They are used at Google, Bloomberg, and NASA; by musicians, teachers, and AI researchers; and “in almost every country on Earth.”

At every turn, IPython chose the more inclusive path, to the point that it is no longer called IPython. In 2014 the project was renamed Jupyter, to emphasize that it no longer works only with Python. A Jupyter notebook is like a Mathematica notebook, except that it works with any programming language. You can make a notebook for Python, or C, or R, or Ruby, or JavaScript, or Julia. Anyone can add Jupyter support for their own programming language. Today it supports more than 100 languages.

Theodore Gray, who designed the interface of the original Mathematica notebook, said that he once tried, as an experiment, to add support for other programming languages. “Nothing came of it,” he told me. “The company was not interested in supporting it. And if you need to support multiple languages, you can’t do it as thoroughly.”

Eric Raymond’s 1997 essay “The Cathedral and the Bazaar” became, in a sense, the founding document of the modern open-source movement. It rejects the view that complex software must be built like a cathedral, “carefully crafted by individual wizards or small bands of mages working in splendid isolation.” Raymond’s experience as one of the stewards of Linux kernel development taught him that the “great babbling bazaar of differing agendas and approaches” that defines open-source projects is an advantage. “The fact that this bazaar style seemed to work, and work well, came as a distinct shock,” he wrote. The essay was his attempt to explain why “the Linux world not only didn’t fly apart in confusion but seemed to go from strength to strength at a speed barely imaginable to cathedral-builders.”

Mathematica had been in development long before Raymond’s Linux experience, and it has stayed in development for many years since. It is the quintessential cathedral, and its builders remain skeptical of the bazaar. “There’s always chaos,” Gray says of open-source systems. “The number of moving parts is enormous, and different parts are controlled by different groups. You will never be able to pull them together into an integrated system the way you can in a single commercial product with, so to speak, a single maniac in the middle.”

The maniac, of course, is Stephen Wolfram. Gray noted that under Mussolini, the trains ran on time. “It’s a bad analogy,” he said, but still, “I’m for having a maniac in the middle.” The Mathematica notebook is a more coherently designed, more polished product, in large part because every decision has passed through the mind of one stubborn genius. “I saw these guys from Jupyter,” Wolfram told me, “and on average they are about at the level of what we had in the 90s.” He says they cut corners. “And we are really trying to do everything right.”

But it is hard to sell the scientific community on commercial software. Although Wolfram Research has for years distributed a free notebook viewer, and although most large universities hold licenses that let students and faculty use Mathematica freely, it may be too much to ask publishers to abandon the open PDF format in favor of a commercial product. “Right now the situation is this: if you try to send a Mathematica notebook to a journal, they will complain: we don’t have Mathematica, it’s a very expensive program, give us something more standard.”

It doesn’t help that Wolfram, both the man and the company, so relentlessly proclaims the product’s superiority and necessity that even Gray compares it to CrossFit devotees who can’t stop talking about it. This is the same Stephen Wolfram who titled the book about his own work on cellular automata “A New Kind of Science.” In his post on computational essays, he writes: “At the core of computational essays is the idea of expressing computational thoughts using the Wolfram Language.”



Perhaps that is so. Perhaps computational notebooks can take root only if they are backed by a single super-language, or by a company with deep pockets and a strong stake in making them work. But the opposite may turn out to be true. A federated effort, however chaotic, may prove more robust, and it may be the only way to win the trust of the scientific community.

Wolfram sees little beyond Wolfram, and perhaps for that reason the Mathematica notebook remains relatively obscure, while its rival, derivative and simplified but open, seems to be conquering the world.

It will be some time before computational notebooks replace PDFs in scientific journals, because that would mean changing the incentive structure of science itself. Until journals start requiring scientists to submit notebooks, and until sharing your work and data freely becomes a way to gain prestige or win funding, people will most likely keep doing things the way they always have.

I talked to a neuroscientist turned programmer who has contributed to Jupyter, and he told me that the professor who ran the lab where he used to work was originally an electrophysiologist, someone who measures the activity of neurons through implanted electrodes. “Getting such data is so laborious and expensive,” he said, that no one will ever share it. “You collect one batch of data and can mine it for the rest of your career.”

“At this point, no sensible person would dispute that the practice of scientific research is undergoing a shift,” Perez, the creator of Jupyter, wrote in a 2013 blog post. Science relies more and more on computation, and the skills needed to be a good scientist are becoming more and more attractive to industry. Universities are losing their best people to start-ups, and to Google and Microsoft. “I have watched many talented colleagues leave the academic world in frustration over the past decade,” he wrote, “and I cannot think of one who was not happier years later.”

Perez told me stories of scientists who sacrificed their academic careers to build software, because building software counted for nothing in their field. The creator of matplotlib, probably the most widely used tool for plotting in scientific papers, was a postdoc in neuroscience but had to leave academia for industry. The same thing happened to the creator of NumPy, the now-popular tool for numerical computing. Perez said: “I received unequivocal comments from many colleagues and senior mentors who said: Stop doing this, you are wasting your career and your talent.” Without hesitation, they advised him to “go back to physics, to mathematics, to writing papers.”

But those who stayed are making headway. Perez recently got a position in the statistics department at Berkeley. The day after we spoke, he was due to teach an upper-division data science course built entirely on Python and Jupyter notebooks. “The lower-division version of this course drew, I think, 1,200 students,” he said. “It was the fastest-growing course in the history of the University of California at Berkeley. And it is all based on open-source tools.”

Those who seek to improve the practice of science also dream of improving its results. Leibniz’s notation, by making calculus easier to write down, expanded the space of what could be thought. The greatest scientific challenges of today are often computational puzzles: how to integrate billions of base pairs of genomic data, and 10 times as much proteomic data, and patient histories, and the results of pharmacological studies into a coherent account of how someone got sick and what needs to be done to help them? How to make practical use of the endless stream of data on temperature and precipitation, oceanography, and volcanic and seismic activity? How to build, and understand, a map of the neural connections of a thinking brain? Equipping scientists with computational notebooks, or some more advanced descendant of them, may help lift their minds to the level of problems that are currently out of reach.

At one point, Perez told me that the Jupyter project’s name honors Galileo, perhaps the first scientist in the modern sense. The Jupyter logo is an abstract version of Galileo’s original drawings of the moons of Jupiter. “Galileo had nowhere to go to buy a telescope,” Perez said. “He had to make his own.”
