Why do physicists still use Fortran

http://www.moreisdifferent.com/2015/07/16/why-physicsts-still-use-fortran/
  • Transfer
I do not know what the programming language will look like in the year 2000, but I know that it will be called FORTRAN.
- Charles Anthony Richard Hoare, ca. 1982

In the industry, Fortran is rarely used today - it was ranked 28th in one of the lists of popular languages . But Fortran is still the main language for large-scale simulations of physical systems — that is, for such things as astrophysical modeling of stars and galaxies (eg Flash ), large-scale molecular dynamics, electronic structure counting codes ( SIESTA ), climate models, and so on. In the field of high-performance computing, a subset of which are large-scale numerical simulations, today only two languages ​​are used - C / C ++ and “modern Fortran” (Fortran 90/95/03/08). Popular Open MPI Librariesfor code parallelization were developed for these two languages. In general, if you need fast code running on multiple processors, you have only two options. In modern Fortran there is such a feature as " coarray ", which allows working with parallel programming directly in the language. Coarray appeared in Fortran 95 extensions, and then were included in Fortran 2008.

The active use of Fortran by physicists often confuses computer scientists and other unrelated people who feel that Fortran is a historical anachronism.

I would like to explain why Fortran still remains useful. I don’t encourage students who study physics to teach Fortran — since most of them will be doing research, they’d better study C / C ++ (or stay at Matlab / Octave / Python). I would like to clarify why Fortran is still in use, and prove that this is not only because physicists are “lagging behind fashion” (although sometimes this is the case - last year I saw a physics student who worked with code Fortran 77, while neither he nor his head heard anything about Fortran 90). Computer scientists should consider the predominance of Fortran in numerical calculations as a challenge.

Before I delve into the topic, I want to discuss the story, because when people hear the word "Fortran", they immediately imagine punch cards and code with numbered lines. The first Fortran specification was written in 1954. The early Fortran (then its name was written in capital letters, FORTRAN) was, by modern standards, a hell of a language, but it was an incredible step forward from previous programming in assembler. FORTRAN was often programmed using punched cards, as Professor Miriam Forman of Stony Brooke University remembers without pleasure. Fortran had many versions, the most famous of which are the standards 66, 77, 90, 95, 03 and 08.

It is often said that Fortran is still used because of its speed. But is he the fastest? On the site benchmarksgame.alioth.debian.orgthere is a comparison of C and Fortranin several tests among many languages. In most cases, Fortran and C / C ++ are the fastest. Python loved by programmers often lags 100 times behind, but it’s in the order of things for interpreted code. Python is not suitable for complex numerical calculations, but works well for another. Interestingly, C / C ++ wins Fortran in all tests, except two, although in general they differ little in results. The tests, where Fortran wins, the most “physical” ones are the simulation of a system of n bodies and the calculation of the spectrum. The results depend on the number of processor cores, for example, Fortran is a little behind C / C ++ on the quad core. Tests, in which Fortran lags far behind C / C ++, most of the time are engaged in reading and writing data, and in this respect, Fortran's slowness is known.

So, C / C ++ is as fast as Fortran, and sometimes a bit faster. We are interested in, “why do physics professors continue to advise their students to use Fortran instead of C / C ++?”

Fortran has inherited code


Due to the long history of Fortran, it is not surprising that there are mountains of physics code written on it. Physicists are trying to minimize programming time, so if they find an earlier code, they will use it. Even if the old code is unreadable, poorly documented and not the most effective, it is more likely to use old code that is checked than to write new one. The task of physicists is not to write code, they are trying to understand the nature of reality. Professors have the inherited code always at hand (they often wrote this code decades ago), and they pass it on to their students. This saves their time and removes uncertainties from the process of eliminating errors.

Physic students learn Fortran more easily than C / C ++


I think learning Fortran is easier than C / C ++. Fortran 90 and C are very similar, but Fortran is easier to write. C is a relatively primitive language, so physicists who choose C / C ++ for themselves are engaged in object-oriented programming. OOP can be useful, especially in large software projects, but it is much longer to study it. We need to learn abstractions such as classes and inheritance. The PLO paradigm is very different from the procedural one used in Fortran. Fortran is based on the simplest procedural paradigm, more similar to what is happening at the computer "under the hood." When you optimize / vectorize code to increase speed, the procedural paradigm is easier to work with. Physicists usually understand how computers work, and think in terms of physical processes, such as transferring data from disk to RAM, and from RAM to the processor's cache. They differ from mathematicians who prefer to think in terms of abstract functions and logic. Also this thinking is different from object-oriented. Optimization of OOP code is more complicated from my point of view than procedural. Objects are very bulky structures compared to data structures preferred by physicists: arrays.

Easy first: Fortran work with arrays


Arrays, or, as physicists call them, matrices, are at the heart of all physical computing. In Fortran 90+ you can find many opportunities to work with them, similar to APL and Matlab / Octave. Arrays can be copied, multiplied by a scalar, multiplied with each other in a very intuitive way:

A = B
A = 3.24*B
C = A*B
B = exp(A)
norm = sqrt(sum(A**2))

Here, A, B, C are arrays of some dimension (say, 10x10x10). C = A * B gives us the elementwise multiplication of matrices, if A and B are the same size. For matrix multiplication, C = matmul (A, B) is used. Almost all internal functions Fortran (Sin (), Exp (), Abs (), Floor (), etc.) take arrays as arguments, which results in simple and clean code. There is simply no similar code in C / C ++. In the basic implementation of C / C ++, simple copying of an array requires running for loops on all elements or calling a library function. If you feed an array of the wrong library function in C, an error will occur. The need to use libraries instead of internal functions means that the resulting code will not be clean and portable, or easy to learn.

In Fortran, access to the elements of an array using the simple syntax A [x, y, z], when in C / C ++ you need to write A [x] [y] [z]. The elements of the arrays start from 1, which corresponds to the physicists' ideas about matrices, and in arrays of C / C ++, the numbering starts from zero. Here are some more functions for working with arrays in Fortran.

A = (/ i , i = 1,100 /)
B = A(1:100:10)
C(10:) = B

First, the vector A is created through an implicit do loop, also known as an array constructor. Then a vector B is created, consisting of every 10th element A, with the help of step 10. And finally, the array B is copied into array C, starting with the 10th element. Fortran supports declaring arrays with zero or negative indices:

doubleprecision, dimension(-1:10) :: myArray

The negative index at first looks silly, but I heard about their usefulness - for example, imagine that this is an additional area for placing any explanations. Fortran also supports vector indices . For example, you can transfer elements 1.5 and 7 from array A of dimension N x 1 to array B of dimension 3 x 1:

subscripts = (/1, 5, 7 /)
B = A(subscripts)

Fortran supports array masks in all internal functions. For example, if we need to calculate the logarithm of all the elements of the matrix greater than zero, we use:

log_of_A = log(A, mask= A .gt. 0)

Or we can zero out all negative elements of the array in one line:

where(my_array .lt. 0.0) my_array = 0.0

Fortran makes it easy to dynamically allocate and free arrays. For example, to place a two-dimensional array:

real, dimension(:,:), allocatable :: name_of_array
allocate(name_of_array(xdim, ydim))

In C / C ++, this requires the following entry :

int **array;
array = malloc(nrows * sizeof(double *));
for(i = 0; i < nrows; i++){
     array[i] = malloc(ncolumns * sizeof(double));
}

To free an array in Fortran

deallocate(name_of_array)

In C / C ++ for this

for(i = 0; i < nrows; i++){
    free(array[i]);
}
free(array);

Лёгкость вторая: не нужно беспокоиться об указателях и выделении памяти


In languages ​​like C / C ++, all variables are passed by value, with the exception of arrays passed by reference. But in many cases, passing an array by value makes more sense. For example, let the data consist of the positions of 100 molecules in different periods of time. We need to analyze the movement of a single molecule. We take a slice of the array (sub-array) corresponding to the coordinates of the atoms in this molecule and transfer it to the function. In it, we will deal with a complex analysis of the transferred subarray. If we passed it by reference, the transferred data would not be located in memory in a row. Because of the peculiarities of memory access, working with such an array would be slow. If we pass it by value, we will create in memory a new array located in a row. To the delight of physicists, the compiler takes on all the dirty work of memory optimization.

In Fortran, variables are usually passed by reference, not by value. Under the hood, the Fortran compiler automatically optimizes their transfer for increased efficiency. From the point of view of the professor in the field of optimizing memory usage, the compiler should be trusted more than the student! As a result, physicists rarely use pointers, although they do exist in Fortran-90 + .

A few more examples of the differences between Fortran and C


Fortran has several options for managing the compiler when searching for errors and optimizing. Errors in the code can be caught at the compilation stage, and not during execution. For example, any variable can be declared as a parameter, that is, a constant.

doubleprecision, parameter :: hbar = 6.63e-34

If the parameter in the code changes, the compiler returns an error. In C this is called const

doubleconst hbar = 6.63e-34

The problem is that const real is different from simple real. If the function that accepts real receives const real, it will return an error. It's easy to imagine how this can lead to interoperability problems in code.

Fortran also has an intent specification that tells the compiler whether the argument passed to the function is an input, an output, or both an input and an output parameter. This helps the compiler to optimize the code and increases its readability and reliability.

Fortran has other features used with varying frequencies. For example, in Fortran 95 it is possible to declare functions with the pure modifier. Such a function has no side effects - it changes only its arguments, and does not change global variables. A special case of such a function is the elemental function, which receives and returns scalars. It is used to process the elements of an array. Information that the pure or elemental function allows the compiler to perform additional optimization, especially when code is parallelized.

What to expect in the future?


In scientific calculations, Fortran remains the main language, and is not going to disappear in the near future. On the surveyamong those who use this language are visitors to the 2014 Supercomputing Convention, 100% of them said they were going to use it in the next 5 years. It also follows from the survey that 90% used a mixture of Fortran and C. Anticipating an increase in mixing these languages, the creators of the Fortran 2015 specification include more possibilities for interoperability of the code. Fortran code is increasingly being invoked from Python code. Computer scientists who criticize the use of Fortran do not understand that this language remains uniquely adapted to what it was called FOrmula TRANslation, which translates formulas, that is, transforms physical formulas into code. Many of them do not realize that the language is developing and constantly includes all the new features.

To call modern Fortran 90+ old is the same as calling old C ++, because C was developed in 1973. On the other hand, even in the newest standard Fortran 2008 there is backward compatibility with Fortran 77 and most of Fortran 66. Therefore, the development of language is associated with certain difficulties. Recently, MIT researchers decided to overcome these difficulties by developing from scratch a language for HPC called Julia , first published in 2012. Does Julia take the place of Fortran, remains to be seen. In any case, I suspect that this will take a very long time.

Also popular now: