What errors are hiding in the code Infer.NET?

The publication by Microsoft of the source codes of their projects is quite a good reason to check them. This time was no exception, and today we will look at suspicious places found in the Infer.NET code. Down with annotation - more to the point!
A little bit about the project and analyzer
Infer.NET is a machine learning system developed by Microsoft experts. The source code of the project has recently become available on GitHub , which was the reason for its verification. More details about the project can be read, for example, here .
The project was checked with the help of the PVS-Studio static analyzer version 6.26. I remind you that PVS-Studio is looking for errors in the code in C \ C ++ \ C # (and soon in Java) under Windows, Linux, macOS. C # code while analyzing only under Windows. The analyzer can be downloaded and tried on your project.
The test itself was extremely simple and without problems. I previously unloaded the project from GitHub, restored the required packages (dependencies) and made sure that the project is being successfully built. This is required in order for the analyzer to have access to all the necessary information for carrying out a comprehensive analysis. After assembling in a couple of clicks, I launched the analysis solution via the PVS-Studio plugin for Visual Studio.
By the way, this is not the first project from Microsoft, tested with PVS-Studio - there were others: Roslyn , MSBuild , PowerShell , CoreFX and others .
Note . If you or your friends are interested in analyzing Java code - you can write to us in supportby selecting "I want analyzer for Java". There is no public beta-version of the analyzer yet, but it should appear soon. Somewhere in the secret laboratory (through the wall) the guys are actively working on it.
But enough of the abstract conversations - let's look at the problems in the code.
Is this a bug or a feature?
I propose to try to find the error yourself - a completely solvable problem. No joke in the spirit of what was in the article " Top 10 errors in C ++ projects for 2017 ", honestly. So do not rush to read the analyzer warning, presented after the code snippet.
privatevoidMergeParallelTransitions()
{
....
if ( transition1.DestinationStateIndex ==
transition2.DestinationStateIndex
&& transition1.Group ==
transition2.Group)
{
if (transition1.IsEpsilon && transition2.IsEpsilon)
{
....
}
elseif (!transition1.IsEpsilon && !transition2.IsEpsilon)
{
....
if (double.IsInfinity(transition1.Weight.Value) &&
double.IsInfinity(transition1.Weight.Value))
{
newElementDistribution.SetToSum(
1.0, transition1.ElementDistribution,
1.0, transition2.ElementDistribution);
}
else
{
newElementDistribution.SetToSum(
transition1.Weight.Value, transition1.ElementDistribution,
transition2.Weight.Value, transition2.ElementDistribution);
}
....
}
PVS-Studio warning : V3001 There are identical.-double.IsInfinity (transition1.Weight.Value) There are identical sub-expressions. Runtime Automaton.Simplification.cs 479
As can be seen from the code fragment, the method is working with a couple of variables - transition1 and transition2 . The use of similar names is sometimes quite justified, but it is worth remembering that in this case the probability of accidentally making a mistake somewhere with a name increases.
This is what happened when checking numbers to infinity ( double.IsInfinity ). Because of the error, we checked the value of the same variable 2 times - transition1.Weight.Value. The transition2Weight.Value variable should be the checked value in the second subexpression .
Another similar suspicious code.
internal MethodBase ToMethodInternal(IMethodReference imr)
{
....
bf |= BindingFlags.Public
| BindingFlags.NonPublic
| BindingFlags.Public
| BindingFlags.Instance;
....
}
PVS-Studio warning : V3001 There are "BindingFlags.Public" there are operator. Compiler CodeBuilder.cs 194
When generating the value of the bf variable , the BindingFlags.Public enumeration element is used twice . Either this code contains an extra flag setting operation, or instead of the second use of BindingFlags.Public, there should be a different enum value.
By the way, in source code this code is written in one line. It seems to me that if it is formatted in tabular style (like here), the problem is easier to detect.
Go ahead. I give the whole body of the method and again I suggest you find the error (or maybe the error) on your own.
privatevoidForEachPrefix(IExpression expr,
Action<IExpression> action)
{
// This method must be kept consistent with GetTargets.if (expr is IArrayIndexerExpression)
ForEachPrefix(((IArrayIndexerExpression)expr).Target,
action);
elseif (expr is IAddressOutExpression)
ForEachPrefix(((IAddressOutExpression)expr).Expression,
action);
elseif (expr is IPropertyReferenceExpression)
ForEachPrefix(((IPropertyReferenceExpression)expr).Target,
action);
elseif (expr is IFieldReferenceExpression)
{
IExpression target = ((IFieldReferenceExpression)expr).Target;
if (!(target is IThisReferenceExpression))
ForEachPrefix(target, action);
}
elseif (expr is ICastExpression)
ForEachPrefix(((ICastExpression)expr).Expression,
action);
elseif (expr is IPropertyIndexerExpression)
ForEachPrefix(((IPropertyIndexerExpression)expr).Target,
action);
elseif (expr is IEventReferenceExpression)
ForEachPrefix(((IEventReferenceExpression)expr).Target,
action);
elseif (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
elseif (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action);
elseif (expr is IMethodInvokeExpression)
ForEachPrefix(((IMethodInvokeExpression)expr).Method,
action);
elseif (expr is IMethodReferenceExpression)
ForEachPrefix(((IMethodReferenceExpression)expr).Target,
action);
elseif (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
elseif (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action);
elseif (expr is IDelegateInvokeExpression)
ForEachPrefix(((IDelegateInvokeExpression)expr).Target,
action);
action(expr);
}
Found? Checking!
PVS-Studio warnings :
- V3003 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 1719, 1727. Compiler CodeRecognizer.cs 1719
- V3003 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 1721, 1729. Compiler CodeRecognizer.cs 1721
Let's simplify the code a bit to make the problems more obvious.
privatevoidForEachPrefix(IExpression expr,
Action<IExpression> action)
{
if (....)
....
elseif (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
elseif (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action);
....
elseif (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
elseif (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action)
....
}
Conditional expressions and then- branches of several if statements are duplicated . Perhaps this code was written by copy-paste, which is why the problem arose. Now it turns out that the then- branches of duplicates will never be executed, since:
- if the conditional expression is true, the body of the first if statement from the corresponding pair is executed ;
- if the conditional expression is false in the first case, it will be false in the second.
Since the then- branches contain the same actions, now it looks like a redundant code that is confusing. It is possible that there is a problem of a different kind - instead of duplicates, other checks should have been performed.
We continue.
publicintCompare(Pair<int, int> x, Pair<int, int> y)
{
if (x.First < y.First)
{
if (x.Second >= y.Second)
{
// y strictly contains xreturn1;
}
else
{
// No containment - order by left boundreturn1;
}
}
elseif (x.First > y.First)
{
if (x.Second <= y.Second)
{
// x strictly contains yreturn-1;
}
else
{
// No containment - order by left boundreturn-1;
}
}
....
}
PVS-Studio warnings :
- V3004 The 'then' statement is equivalent to the 'else' statement. Runtime RegexpTreeBuilder.cs 1080
- V3004 The 'then' statement is equivalent to the 'else' statement. Runtime RegexpTreeBuilder.cs 1093
The code looks extremely suspicious, as it contains two conditional operators with the same bodies then and else- branches. Probably in both cases it is worth returning different values. Or, if this is conceived behavior, it will be useful to remove redundant conditional statements.
There were interesting cycles. Example below:
privatestatic Set<StochasticityPattern>
IntersectPatterns(IEnumerable<StochasticityPattern> patterns)
{
Set<StochasticityPattern> result
= new Set<StochasticityPattern>();
result.AddRange(patterns);
bool changed;
do
{
int count = result.Count;
AddIntersections(result);
changed = (result.Count != count);
break;
} while (changed);
return result;
}
PVS-Studio warning : V3020 An unconditional 'break' within a loop. Compiler DefaultFactorManager.cs 474
Because of the unconditional break statement , exactly one iteration of the loop is performed, and the changed control variable is not even used. In general, the code looks strange and suspicious.
The same method (replica) met in another class. Appropriate analyzer warning: V3020 An unconditional 'break' within a loop. Visualizers.Windows FactorManagerView.cs 350
By the way, the method met with the unconditional operator continue in the cycle (the analyzer found it with the same diagnostics), but above it was a comment confirming that this is a special temporary solution:
// TEMPORARYcontinue;
I remind you that there were no such comments near the unconditional break operator .
Go ahead.
internalstatic DependencyInformation GetDependencyInfo(....)
{
....
IExpression resultIndex = null;
....
if (resultIndex != null)
{
if (parameter.IsDefined(
typeof(SkipIfMatchingIndexIsUniformAttribute), false))
{
if (resultIndex == null)
thrownew InferCompilerException(
parameter.Name
+ " has SkipIfMatchingIndexIsUniformAttribute but "
+ StringUtil.MethodNameToString(method)
+ " has no resultIndex parameter");
....
}
....
}
....
}
PVS-Studio warning : V3022 Expression 'resultIndex == null' is always false. Compiler FactorManager.cs 382
Immediately, I note that between the declaration and the above check, the value of the variable resultIndex may change. However, between the checks of resultIndex! = Null and resultIndex == null, the value cannot be changed. Therefore, the result of the expression resultIndex == null will always be false , which means that an exception will never be generated.
I hope that you have an interest in finding errors yourself without my suggestions to find a problem, but just in case I will suggest doing it again. The code of the method is small, I will give it entirely.
publicstatic Tuple<int, string> ComputeMovieGenre(int offset,
string feature)
{
string[] genres = feature.Split('|');
if (genres.Length < 1 && genres.Length > 3)
{
thrownew ArgumentException(string.Format(
"Movies should have between 1 and 3 genres; given {0}.",
genres.Length));
}
doublevalue = 1.0 / genres.Length;
var result
= new StringBuilder(
string.Format(
"{0}:{1}",
offset + MovieGenreBuckets[genres[0]],
value));
for (int i = 1; i < genres.Length; ++i)
{
result.Append(
string.Format(
"|{0}:{1}",
offset + MovieGenreBuckets[genres[i].Trim()],
value));
}
returnnew Tuple<int, string>(MovieGenreBucketCount, result.ToString());
}
Let's see what happens here. The input string is parsed by the '|' character. If the length of the array is not as expected, an exception must be generated. Wait a second ... genres.Length <1 && genres.Length> 3 ? Since there is no number that would immediately fall into both values of the range of values required ( [int.MinValue..1) and (3..int.MaxValue] ), the result of the expression is always false . Therefore, this check does not protect against anything, and the expected exception will not be generated.
This is what the analyzer warns about: V3022 Expression 'genres.Length <1 && genres.Length> 3' is always false. Probably the '||' operator should be used here.
A suspicious division operation has been encountered.
publicstaticvoidCreateTrueThetaAndPhi(....)
{
....
double expectedRepeatOfTopicInDoc
= averageDocLength / numUniqueTopicsPerDoc;
....
int cnt = Poisson.Sample(expectedRepeatOfTopicInDoc);
....
}
PVS-Studio warning : V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. LDA Utilities.cs 74 The
suspicious thing here is this: integer division is performed (the variables averageDocLength and numUniqueTopicsPerDoc are of type int ), and the result is written to a double variable . This begs the question: was it specifically done, or was it still meant the division of real numbers? If the variable expectedRepeatOfTopicInDoc were of type int , this would remove the possible questions.
In other places, the Poisson.Sample method , whose argument is the suspicious variable expectedRepeatOfTopicInDoc , is used, for example, as described below.
int numUniqueWordsPerTopic
= Poisson.Sample((double)averageWordsPerTopic);
averageWordsPerTopic is of type int , which is already at the place of use is cast to double .
And here is another place of use:
double expectedRepeatOfWordInTopic
= ((double)numDocs) * averageDocLength / numUniqueWordsPerTopic;
....
int cnt = Poisson.Sample(expectedRepeatOfWordInTopic);
Note that the variables have the same names as in the original example, only for initializing the expectedRepeatOfWordInTopic , the division of real numbers is used (due to the explicit reduction of numDocs to double type ).
In general, it is worth looking at the original place to which the analyzer issued a warning.
But thinking about whether to correct it, and how, let's leave it to the authors of the code (they know better), and go ahead. To the next suspicious division.
publicstatic NonconjugateGaussian BAverageLogarithm(....)
{
....
double v_opt = 2 / 3 * (Math.Log(mx * mz / Ex2 / 2) - m);
if (v_opt != v)
{
....
}
....
}
PVS-Studio warning : V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider using a fractional part. An example: double A = (double) (X) / Y ;. Runtime ProductExp.cs 137
The analyzer again detected a suspicious integer division operation, since 2 and 3 - integer numeric literals, and the result of the expression 2/3 is 0 . As a result, the whole expression takes the form:
double v_opt = 0 * expr;
Agree, a little strange. Several times I returned to this warning, trying to find some kind of trick, and not trying to add it to the article. The method is filled with mathematics and various formulas (which, to be honest, we didn’t really want to), is there anything to expect. In addition, I try to be as skeptical as possible about the warnings that I write to the article, and I describe them only after studying better.
But then it dawned on me - why do we need a multiplier in the form of 0 , written as 2/3 ? So this place is, in any case, worth a look.
publicstaticvoidWriteAttribute(TextWriter writer,
string name,
object defaultValue,
objectvalue,
Func<object, string> converter = null)
{
if ( defaultValue == null && value == null
|| value.Equals(defaultValue))
{
return;
}
string stringValue = converter == null ? value.ToString() :
converter(value);
writer.Write($"{name}=\"{stringValue}\" ");
}
PVS-Studio warning : V3080 Possible null dereference. Consider inspecting 'value'. Compiler WriteHelpers.cs 78 A
fair condition analyzer statement. A null reference dereference can occur in the value.Equals (defaultValue) expression if value == null . Since this expression is the right-hand operand of the operator ||, to calculate it, the left-hand operand must be false , and for this it suffices that at least one of the variables defaultValue \ value is not null . As a result, if defaultValue! = Null , and value == null :
- defaultValue == null -> false ;
- defaultValue == null && value == null -> false ; ( value check failed)
- value.Equals (defaultValue) -> NullReferenceException , since value is null .
Let's look at a similar case:
publicFeatureParameterDistribution(
GaussianMatrix traitFeatureWeightDistribution,
GaussianArray biasFeatureWeightDistribution)
{
Debug.Assert(
(traitFeatureWeightDistribution == null &&
biasFeatureWeightDistribution == null)
||
traitFeatureWeightDistribution.All(
w => w != null
&& w.Count == biasFeatureWeightDistribution.Count),
"The provided distributions should be valid
and consistent in the number of features.");
....
}
PVS-Studio warning : V3080 Possible null dereference. Consider inspecting 'traitFeatureWeightDistribution'. Recommender FeatureParameterDistribution.cs 65
Let's throw out the excess, leaving only the logic for calculating the boolean value, to make it easier to figure it out:
(traitFeatureWeightDistribution == null &&
biasFeatureWeightDistribution == null)
||
traitFeatureWeightDistribution.All(
w => w != null
&& w.Count == biasFeatureWeightDistribution.Count)
Again, the right operand of the operator || calculated only if the result of calculating the left is false . The left operand can be false , including when traitFeatureWeightDistribution == null and biasFeatureWeightDistribution! = Null . Then the right operand of the operator || will be calculated, and the call to traitFeatureWeightDistribution.All will result in an ArgumentNullException .
Another interesting code snippet:
publicstaticdoubleGetQuantile(double probability,
double[] quantiles)
{
....
int n = quantiles.Length;
if (quantiles == null)
thrownew ArgumentNullException(nameof(quantiles));
if (n == 0)
thrownew ArgumentException("quantiles array is empty",
nameof(quantiles));
....
}
PVS-Studio Warning : V3095 The 'quantiles' object was used against it. Check lines: 91, 92. Runtime OuterQuantiles.cs 91
Notice that the quantiles.Length property is first accessed , and then quantiles are checked for equality to null . In summary, if quantiles == null , the method will throw an exception, just a little bit wrong, and a bit out of place where it was needed. Apparently, they mixed up the lines in places.
If you have successfully coped with the detection of the errors given earlier on your own, I suggest you brew a cup of coffee and try the feat again, finding an error in the method below. To make it a bit more interesting, I cite the entire method code.
( Link to full size )
Okay, okay, it was a joke (or did you still succeed ?!). Let's simplify the task a bit:
if (sample.Precision < 0)
{
precisionIsBetween = true;
lowerBound = -1.0 / v;
upperBound = -mean.Precision;
}
elseif (sample.Precision < -mean.Precision)
{
precisionIsBetween = true;
lowerBound = 0;
upperBound = -mean.Precision;
}
else
{
// in this case, the precision should NOT be in this interval.
precisionIsBetween = false;
lowerBound = -mean.Precision;
lowerBound = -1.0 / v;
}
Got better? The analyzer issued the following warning for this code: V3008 The 'lowerBound' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 324, 323. Runtime GaussianOp.cs 324
And indeed, in the last else- branch, the value of the lowerBound variable is assigned twice . Apparently (and judging by the code above), the variable upperBound should participate in one of the assignments .
Follow on.
privatevoidWriteAucMatrix(....)
{
....
for (int c = 0; c < classLabelCount; c++)
{
int labelWidth = labels[c].Length;
columnWidths[c + 1] =
labelWidth > MaxLabelWidth ? MaxLabelWidth : labelWidth;
for (int r = 0; r < classLabelCount; r++)
{
int countWidth = MaxValueWidth;
if (countWidth > columnWidths[c + 1])
{
columnWidths[c + 1] = countWidth;
}
}
....
}
PVS-Studio warning : V3081 The 'r' counter is not used inside a nested loop. Consider inspecting usage of 'c' counter. CommandLine ClassifierEvaluationModule.cs 459
Please note that the internal cycle counter, r, is not used in the body of this cycle. Because of this, it turns out that throughout all iterations of the inner loop, the same operations are performed on the same elements - after all, the indices also use the counter of the outer loop ( c ), and not the inner one ( r ).
Let's see what else was interesting.
publicRegexpFormattingSettings(bool putOptionalInSquareBrackets,
bool showAnyElementAsQuestionMark,
bool ignoreElementDistributionDetails,
int truncationLength,
bool escapeCharacters,
bool useLazyQuantifier)
{
this.PutOptionalInSquareBrackets = putOptionalInSquareBrackets;
this.ShowAnyElementAsQuestionMark = showAnyElementAsQuestionMark;
this.IgnoreElementDistributionDetails =
ignoreElementDistributionDetails;
this.TruncationLength = truncationLength;
this.EscapeCharacters = escapeCharacters;
}
PVS-Studio warning : V3117 Constructor parameter 'useLazyQuantifier' is not used. Runtime RegexpFormattingSettings.cs 38
The constructor does not use one parameter — useLazyQuantifier . It looks especially suspicious against the background of the fact that a property with the corresponding name and type is defined in the class - UseLazyQuantifier . Apparently, they forgot to initialize it through the corresponding parameter.
There are several potentially dangerous event handlers. An example of one of these is shown below:
publicclassRecommenderRun
{
....
publicevent EventHandler Started;
....
publicvoidExecute()
{
// Report that the run has been startedif (this.Started != null)
{
this.Started(this, EventArgs.Empty);
}
....
}
....
}
PVS-Studio warning : V3083 Unsafe invocation of event 'Started', NullReferenceException is possible. Consider assigning an event to a local variable before invoking it. Evaluator RecommenderRun.cs 115
The fact is that between checking for null inequality and calling the handler, there can be a cancellation of the event, and if, between checking for null and calling the handler, the event has no subscribers, a NullReferenceException will be generated . To avoid such problems, you can, for example, save a reference to a chain of delegates to a local variable or use the '?.' Operator to call handlers.
In addition to the above code snippet, there were 35 such places.
By the way, 785 warnings of V3024 were met . The warning V3024 is issued when comparing real numbers using the '! =' Or '==' operators. I will not dwell here on why such comparisons are not always correct - for more information about this, see the documentation, there is also a link to StackOverflow (this is it).
Considering that formulas and calculations were often encountered, these warnings can also be important, although they were placed at level 3 (since not all projects are relevant at all).
If there is a certainty that these warnings are irrelevant - you can remove them with almost one click , reducing the total number of analyzer responses.

Conclusion
Somehow it happened that I hadn’t written articles about project validation for quite a long time, and it was quite pleasant to touch this process again. I hope that you learned something new / useful from this article, or at least read it with interest.
I wish the developers to fix the problem areas as soon as possible and remind that making mistakes is normal, yet we are human. That's why additional tools like static analyzers are needed to find what a person missed, right? Anyway - good luck with the project, and thanks for the work!
And remember that the maximum benefit from the static analyzer is achieved with its regular use .
All the best!
If you want to share this article with an English-speaking audience, then please use the link to the translation: Sergey Vasiliev. What Errors Lurk in Infer .NET Code?