Uninitialized variables: looking for errors
A large number of scientific studies rely on code written in Fortran. Unfortunately, "scientific" applications are not immune to commonplace errors such as uninitialized variables either. It goes without saying what such calculations can lead to. Sometimes these errors produce "major breakthroughs" in science, or cause really big problems: who knows where the results will be used (though we can guess)? I would like to share several simple and effective methods for checking existing Fortran code with the Intel compiler and avoiding such trouble.
We will focus on problems related to floating-point numbers. Errors involving uninitialized variables are hard to find, especially if the code was originally written to the Fortran 77 standard. The catch is that even if we never declare a variable, it is declared implicitly according to the first letter of its name, following the so-called implicit typing rules (still supported in the latest standards). Names beginning with the letters I through N get type INTEGER, and all other names get type REAL. So if a variable F suddenly appears in our code and we multiply something by it, the compiler will not report an error; it will simply make F a REAL. Here is a wonderful example that compiles and runs just fine:
program test
z = f*10
print *, z, f
end program test
Since F is never assigned a value, anything may appear on the screen. Here is what I got:
-1.0737418E+09 -1.0737418E+08
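To make the I-through-N rule concrete, here is a minimal sketch (the module and function names are hypothetical, chosen for illustration) in which two undeclared variables receive the same value but end up with different types purely because of their first letters:

```fortran
! Hypothetical module illustrating the default implicit typing rules.
! There is deliberately NO "implicit none" here.
module implicit_rules
contains
  function implicit_demo() result(r)
    real :: r(2)
    ncount = 2.7   ! NCOUNT starts with N: implicitly INTEGER, so 2.7 is truncated to 2
    xscale = 2.7   ! XSCALE starts with X: implicitly REAL, so it keeps the value 2.7
    r = [real(ncount), xscale]
  end function implicit_demo
end module implicit_rules
```

Calling implicit_demo() returns 2.0 and approximately 2.7: the assignment to NCOUNT silently truncates, which is exactly the kind of surprise implicit typing hides.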
Such "games" with variable declarations can be forbidden by writing IMPLICIT NONE, but only within the program unit where it appears (IMPLICIT NONE became standard in Fortran 90, although many Fortran 77 compilers already supported it as an extension). If you forget it in some module, "phantom" variables will appear there. I once saw a variable in the middle of a calculation whose name had random characters appended to it. Apparently, someone was typing in Notepad, and some of the keystrokes ended up in the program code while switching between windows. As a result, everything still compiled and ran, and the compiler never complained about the variable. Such errors are extremely difficult to track down, especially if the code has worked without problems for many years.
Therefore, I strongly recommend always using IMPLICIT NONE, so that the compiler reports every variable that has not been explicitly declared (even if it is initialized and otherwise fine):
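The "phantom variable" effect is easy to reproduce. In this hypothetical sketch (not the code from the story above), a single mistyped letter creates a new implicit REAL, and the intended sum is never accumulated:

```fortran
! Hypothetical example of a "phantom" variable created by a typo.
! Again there is deliberately NO "implicit none".
module typo_demo
contains
  ! Intended to return 1 + 2 + ... + n.
  function buggy_sum(n) result(total)
    integer, intent(in) :: n
    real :: total
    total = 0.0
    do i = 1, n                ! I is implicitly INTEGER
      totl = total + real(i)   ! typo: TOTL is silently created as an implicit REAL
    end do                     ! TOTAL is never updated, so the result stays 0.0
  end function buggy_sum
end module typo_demo
```

buggy_sum(10) returns 0.0 instead of 55.0, and without IMPLICIT NONE the compiler has nothing to complain about. Adding implicit none right after the module statement turns TOTL (and the undeclared loop variable I) into compile-time errors.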
program test
implicit none
...
end program test
error #6404: This name does not have a type, and must have an explicit type. [Z]
error #6404: This name does not have a type, and must have an explicit type. [F]
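IMPLICIT NONE does not have to be repeated in every procedure: placed in a module's specification part, it also applies to all procedures contained in that module. A minimal sketch (the geometry module and circle_area function are illustrative names):

```fortran
module geometry
  implicit none   ! applies to the module and to every procedure contained in it
contains
  pure real function circle_area(r)
    real, intent(in) :: r
    real, parameter :: pi = 3.14159265
    circle_area = pi * r * r   ! a misspelled name here would be a compile-time error
  end function circle_area
end module geometry
```

External procedures compiled in separate files are separate program units, so each of them still needs its own IMPLICIT NONE.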
When dealing with existing code, adding IMPLICIT NONE to every source file can be very laborious, so you can use the compiler option /warn:declarations (Windows) or -warn declarations (Linux) instead. It produces warnings:
warning #6717: This name has not been given an explicit type. [Z]
warning #6717: This name has not been given an explicit type. [F]
Once we have dealt with all implicitly declared variables and made sure they cause no errors, we can move on to the next act of this Marlezon ballet: searching for uninitialized variables.
One of the standard methods is to have the compiler initialize all variables with a special value, so that whenever such a variable is used, we can easily tell that the developer forgot to initialize it. This value should be very "unusual", and ideally the application should stop as soon as it is used, so that we can, so to speak, catch it red-handed.
The most logical choice is the "signaling" value SNaN (Signaling NaN, where NaN stands for Not-a-Number). This is a floating-point value with a special representation, and any attempt to perform an operation on it raises an exception. Note that a variable can also end up holding a NaN as the result of certain invalid operations, for example, dividing zero by zero, multiplying zero by infinity, or dividing infinity by infinity. Therefore, before we start "trapping" uninitialized variables, we should make sure that our code raises no floating-point exceptions of its own.
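Which operations produce a NaN, and which produce an infinity, can be checked with the standard ieee_arithmetic intrinsic module (Fortran 2003). This minimal sketch uses hypothetical names; the VOLATILE attribute is there only to stop the compiler from evaluating the expressions at compile time. Built with default floating-point settings (no trapping), 0/0, 0*Inf, and Inf/Inf each yield a quiet NaN, while 1/0 yields +Infinity:

```fortran
! Minimal sketch using the standard IEEE intrinsic module (Fortran 2003).
! Module and procedure names are hypothetical.
module nan_sources
  use, intrinsic :: ieee_arithmetic
  implicit none
contains
  ! Evaluates the invalid operations mentioned in the text: 0/0, 0*Inf, Inf/Inf.
  ! With trapping disabled (the default), each one yields a quiet NaN.
  function invalid_results() result(r)
    real :: r(3)
    real, volatile :: zero, inf   ! VOLATILE prevents compile-time evaluation
    zero = 0.0
    inf = ieee_value(1.0, ieee_positive_inf)
    r(1) = zero / zero
    r(2) = zero * inf
    r(3) = inf / inf
  end function invalid_results

  ! By contrast, a nonzero number divided by zero gives +Infinity, not NaN.
  function one_over_zero() result(r)
    real :: r
    real, volatile :: zero
    zero = 0.0
    r = 1.0 / zero
  end function one_over_zero

  ! True only for a signaling NaN; quiet NaNs return .false.
  elemental logical function is_signaling_nan(x)
    real, intent(in) :: x
    is_signaling_nan = ieee_class(x) == ieee_signaling_nan
  end function is_signaling_nan
end module nan_sources
```

is_signaling_nan returns .false. for all three results: invalid operations produce quiet NaNs, which are not the SNaN value used for initialization. Built with -fpe0, the same operations would be expected to stop the program instead.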
To do this, enable the /fpe:0 and /traceback options (Windows) or -fpe0 and -traceback (Linux), rebuild the application, and run it. If everything works as usual and the application finishes without an exception, great. But it is quite possible that various "unforeseen moments" will surface already at this stage. This is because fpe0 changes the default handling of floating-point exceptions. By default they are masked, so we can quietly divide by 0 without suspecting it; with fpe0, an exception is raised and the program stops. This happens not only on division by zero (divide-by-zero), but also on floating-point overflow and on invalid operations (floating invalid).
The numerical results may also change slightly, because denormalized numbers are now flushed to 0. This, in turn, can noticeably speed up your application, since operations on denormalized numbers are extremely slow, whereas operations on zeros, as you understand, are not.
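-fpe0 stops at the first exception. If you would first like to see which exceptions a particular computation raises without stopping the program, the standard ieee_exceptions module lets you clear and query the IEEE flags directly. This is a minimal, portable sketch with hypothetical names, not a description of how -fpe0 works internally:

```fortran
! Portable sketch using the standard IEEE_EXCEPTIONS module (Fortran 2003).
! Module and procedure names are hypothetical.
module fp_flag_check
  use, intrinsic :: ieee_exceptions
  implicit none
contains
  ! Computes q = x / y and reports which IEEE exception flags the division raised.
  ! Nothing is trapped here: with default settings the program keeps running.
  subroutine checked_divide(x, y, q, div_by_zero, invalid, overflow)
    real, intent(in)     :: x, y
    real, intent(out)    :: q
    logical, intent(out) :: div_by_zero, invalid, overflow
    real, volatile :: yv                      ! VOLATILE prevents compile-time evaluation
    call ieee_set_flag(ieee_all, .false.)     ! start from a clean state
    yv = y
    q = x / yv
    call ieee_get_flag(ieee_divide_by_zero, div_by_zero)
    call ieee_get_flag(ieee_invalid, invalid)
    call ieee_get_flag(ieee_overflow, overflow)
  end subroutine checked_divide
end module fp_flag_check
```

For example, checked_divide(1.0, 0.0, ...) reports divide-by-zero and checked_divide(0.0, 0.0, ...) reports invalid, while the program keeps running in both cases.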
Another interesting point is that an exception may appear with the fpe0 option as a result of certain compiler optimizations, such as vectorization. Say that inside a loop we divide by a value only if it is not 0, guarded by an if check. The division may still be performed, because the compiler decided that this is much faster than using masked operations. In this case, the code is executed speculatively.
This can be controlled with the /Qfp-speculation:strict (Windows) or -fp-speculation=strict (Linux) option, which disables such compiler optimizations for floating-point operations. Another way is to change the entire floating-point model with -fp-model strict (/fp:strict on Windows), which has a large negative effect on overall application performance. I talked about the floating-point models available in the Intel compiler earlier.
You can also simply lower the optimization level with the /O1 or /Od options on Windows (-O1 or -O0 on Linux).
The traceback option provides more detailed information about where the error occurred (routine name, source file, and line number).
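The loop described above might look like the following sketch (the module and subroutine names are hypothetical). Logically, no division by zero ever happens here; the concern is that a vectorized version may compute 1.0/x(i) for all elements and then keep only the results selected by the condition, so under -fpe0 the discarded division may still trap. Building with -fp-speculation=strict (or /Qfp-speculation:strict on Windows) asks the compiler not to do that:

```fortran
! Hypothetical example of the guarded division described above.
module safe_reciprocal
  implicit none
contains
  ! Computes y(i) = 1/x(i) only where x(i) is nonzero; elsewhere y(i) = 0.
  ! Logically no division by zero occurs. A vectorized build, however, may
  ! evaluate 1.0/x(i) for every element and then keep only the selected
  ! results; with -fpe0 such a discarded division by zero may still trap.
  subroutine reciprocal(x, y)
    real, intent(in)  :: x(:)
    real, intent(out) :: y(:)
    integer :: i
    do i = 1, size(x)
      if (x(i) /= 0.0) then
        y(i) = 1.0 / x(i)
      else
        y(i) = 0.0
      end if
    end do
  end subroutine reciprocal
end module safe_reciprocal
```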
Let's run a test on Windows, compiling without optimization (with the /Od option):
program test
implicit none
real a,b
a=0
b = 1/a
print *, 'b=', b
end program test
As a result, the screen shows the following:
b= Infinity
Now enable the /fpe:0 and /traceback options and get the expected exception:
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
test.exe 00F51050 _MAIN__ 5 test.f90
…
We need to eliminate such problems from our code before moving on to the next step: forcing initialization with SNaN values using the /Qinit:snan,arrays /traceback options (Windows) or -init=snan,arrays -traceback (Linux).
Now every access to an uninitialized variable will result in a runtime error:
forrtl: error (182): floating invalid - possible uninitialized real/complex variable.
In the simplest example:
program test
implicit none
real a,b
b = 1/a
print *, 'b=', b
end program test
forrtl: error (182): floating invalid - possible uninitialized real/complex variable.
Image PC Routine Line Source
test.exe 00D01061 _MAIN__ 4 test.f90
…
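Trapping stops the program at the first use of an SNaN. Sometimes it is more convenient to count how many elements of an array still hold one, for example right before the array is used. The sketch below assumes that REAL(real32) is IEEE binary32 and inspects the bit pattern directly (exponent bits all set, quiet bit clear, fraction nonzero), so the check itself never performs floating-point arithmetic on the value. The module and function names are hypothetical:

```fortran
! Hypothetical helper for spotting values that still hold a signaling NaN.
! Assumption: REAL(real32) is IEEE 754 binary32.
module snan_scan
  use, intrinsic :: iso_fortran_env, only: int32, real32
  implicit none
  integer(int32), parameter :: exp_mask = int(z'7F800000', int32)  ! all exponent bits
  integer(int32), parameter :: frac_low = int(z'003FFFFF', int32)  ! fraction bits below the quiet bit
contains
  ! True if x is a signaling NaN: exponent bits all set, quiet bit (bit 22)
  ! clear, and at least one remaining fraction bit set. Only the bit pattern
  ! is inspected, so no floating-point operation is performed on x.
  elemental logical function is_snan32(x)
    real(real32), intent(in) :: x
    integer(int32) :: bits
    bits = transfer(x, 0_int32)
    is_snan32 = iand(bits, exp_mask) == exp_mask .and. &
                .not. btest(bits, 22) .and. &
                iand(bits, frac_low) /= 0
  end function is_snan32

  ! Number of elements of a that still hold a signaling NaN.
  integer function count_snan(a)
    real(real32), intent(in) :: a(:)
    count_snan = count(is_snan32(a))
  end function count_snan
end module snan_scan
```

For example, in a program built with -init=snan,arrays, count_snan(d) right after allocate(d(n)) would be expected to return n until the elements are assigned.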
A few words about this unusual init option. It appeared relatively recently, in compiler version 16.0 (recall that the latest compiler version at the time of writing is 17.0), and can initialize the following constructs with SNaN:
- Static scalars and arrays (with the SAVE attribute)
- Local scalars and arrays
- Automatic arrays (created on entry to a procedure)
- Module variables
- Dynamically allocated scalars and arrays (with the ALLOCATABLE attribute)
- Pointers (variables with the POINTER attribute)
However, there are several restrictions, and init does not work for:
- Variables in EQUIVALENCE groups
- Variables in COMMON blocks
- Derived types and their components, except for ALLOCATABLE and POINTER components
- Dummy arguments of procedures, which are not initialized to SNaN locally; however, the corresponding actual arguments may be initialized in the calling procedure
- References in intrinsic function arguments and in I/O expressions
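Since -init does not reach ordinary derived-type components, one standard-Fortran remedy is default initialization in the type definition itself. A minimal sketch with a hypothetical particle type; the values 0.0 and -1 are arbitrary sentinels chosen for illustration:

```fortran
! Hypothetical derived type with default initialization of its components.
module particle_mod
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  type :: particle
    real(real32) :: mass   = 0.0_real32   ! arbitrary sentinel values chosen
    real(real32) :: pos(3) = 0.0_real32   ! for illustration
    integer      :: id     = -1
  end type particle
end module particle_mod
```

Every variable of type particle now starts with these values as part of the language semantics, independently of compiler options.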
The option can not only initialize values with SNaN but also set them to zero. To do this, specify /Qinit:zero on Windows (-init=zero on Linux); in this case, not only REAL/COMPLEX variables but also INTEGER/LOGICAL ones are initialized. Adding the arrays keyword initializes arrays as well, not just scalars.
For example, the options
-init=snan,zero ! Linux and OS X systems
/Qinit:snan,zero ! Windows systems
initialize scalars of type REAL or COMPLEX with SNaN, and scalars of type INTEGER or LOGICAL with zero. The following example extends initialization to arrays as well:
-init=zero -init=snan -init=arrays ! Linux and OS X systems
/Qinit:zero /Qinit:snan /Qinit:arrays ! Windows systems
Intel previously tried to provide this functionality through the -ftrapuv option, but it is now deprecated and not recommended: although it was also meant to initialize values, that did not work out as planned.
By the way, if you work with a first-generation Intel Xeon Phi coprocessor (Knights Corner), this option is not available to you, since that hardware does not support SNaN.
Finally, here is an example from the documentation, which we compile on Linux with all the options discussed above to find uninitialized variables at run time:
! ==============================================================
!
! SAMPLE SOURCE CODE - SUBJECT TO THE TERMS OF SAMPLE CODE LICENSE AGREEMENT,
! http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
!
! Copyright 2015 Intel Corporation
!
! THIS FILE IS PROVIDED "AS IS" WITH NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
! NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
! PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS.
!
! ===============================================================
module mymod
integer, parameter :: n=100
real :: am
real, allocatable, dimension(:) :: dm
real, target, dimension(n) :: em
real, pointer, dimension(:) :: fm
end module mymod
subroutine sub(a, b, c, d, e, m)
use mymod
integer, intent(in) :: m
real, intent(in), dimension(n) :: c
real, intent(in), dimension(*) :: d
real, intent(inout), dimension(*) :: e
real, automatic, dimension(m) :: f
real :: a, b
print *, a,b,c(2),c(n/2+1),c(n-1)
print *, d(1:n:33) ! first and last elements uninitialized
print *, e(1:n:30) ! middle two elements uninitialized
print *, am, dm(n/2), em(n/2)
print *, f(1:2) ! automatic array uninitialized
e(1) = f(1) + f(2)
em(1)= dm(1) + dm(2)
em(2)= fm(1) + fm(2)
b = 2.*am
e(2) = d(1) + d(2)
e(3) = c(1) + c(2)
a = 2.*b
end
program uninit
use mymod
implicit none
real, save :: a
real, automatic :: b
real, save, target, dimension(n) :: c
real, allocatable, dimension(:) :: d
real, dimension(n) :: e
allocate (d (n))
allocate (dm(n))
fm => c
d(5:96) = 1.0
e(1:20) = 2.0
e(80:100) = 3.0
call sub(a,b,c,d,e(:),n/2)
deallocate(d)
deallocate(dm)
end program uninit
First, compile with -fpe0 and run:
$ ifort -O0 -fpe0 -traceback uninitialized.f90; ./a.out
0.0000000E+00 -8.7806177E+13 0.0000000E+00 0.0000000E+00 0.0000000E+00
0.0000000E+00 1.000000 1.000000 0.0000000E+00
2.000000 0.0000000E+00 0.0000000E+00 3.000000
0.0000000E+00 0.0000000E+00 0.0000000E+00
1.1448686E+24 0.0000000E+00
We can see that the application raises no floating-point exceptions, but several values look "strange". Now let's look for uninitialized variables with the init option:
$ ifort -O0 -init=snan -traceback uninitialized.f90; ./a.out
NaN NaN 0.0000000E+00 0.0000000E+00 0.0000000E+00
0.0000000E+00 1.000000 1.000000 0.0000000E+00
2.000000 0.0000000E+00 0.0000000E+00 3.000000
NaN 0.0000000E+00 0.0000000E+00
1.1448686E+24 0.0000000E+00
forrtl: error (182): floating invalid - possible uninitialized real/complex variable.
Image PC Routine Line Source
a.out 0000000000477535 Unknown Unknown Unknown
a.out 00000000004752F7 Unknown Unknown Unknown
a.out 0000000000444BF4 Unknown Unknown Unknown
a.out 0000000000444A06 Unknown Unknown Unknown
a.out 0000000000425DB6 Unknown Unknown Unknown
a.out 00000000004035D7 Unknown Unknown Unknown
libpthread.so.0 00007FC66DD26130 Unknown Unknown Unknown
a.out 0000000000402C11 sub_ 39 uninitialized.f90
a.out 0000000000403076 MAIN__ 62 uninitialized.f90
a.out 00000000004025DE Unknown Unknown Unknown
libc.so.6 00007FC66D773AF5 Unknown Unknown Unknown
a.out 00000000004024E9 Unknown Unknown Unknown
Aborted (core dumped)
Now we can see that on line 39 the code reads the uninitialized variable AM from the MYMOD module:
b = 2.*am
There are other errors in this code, which I suggest you find yourself using the Intel compiler. I really hope this post will be useful to everyone who writes Fortran code, and that your applications will go through the necessary checks for uninitialized variables before they are released. Thank you for reading, and see you soon! Happy New Year, everyone!