PVS-Studio wanted, but could not find bugs in robots.txt

The other day, Google published the source code of its robots.txt parser. Why not run PVS-Studio on a project that has surely been tested up and down already, and maybe find an error? No sooner said than done. Sadly, nothing significant turned up. Well then, let this article simply be an excuse to praise the Google developers.
robots.txt is an index file that contains rules for search robots. It applies to the https, http, and ftp protocols. Google has made its robots.txt parser available to everyone. You can read more about this news here: Google opens the source code of the robots.txt parser
I think most readers of our articles know what PVS-Studio does. But in case you are new to our blog, here is a brief introduction. PVS-Studio is a static code analyzer that finds various errors, vulnerabilities, and shortcomings in projects written in C, C++, C#, and Java. In other words, PVS-Studio is a SAST solution, and it can run on user machines, on build servers, and in the cloud. The PVS-Studio team also loves writing articles about checking various projects. So let's get down to business and try to find errors in the source code of Google's parser.
To our regret, and to the delight of everyone else, no errors were found. We found only a couple of minor flaws, which we will talk about; we have to write about something, after all :). The absence of errors is explained by the small size of the project and the high quality of the code itself. This does not mean that no errors lurk there, only that static analysis turned out to be powerless this time.
On the whole, this article turned out in the spirit of another of our publications, "The shortest article on nginx verification".
There was an opportunity for a small optimization:
V805 Decreased performance. It is inefficient to identify an empty string by using 'strlen(str) > 0' construct. A more efficient way is to check: str[0] != '\0'. robots.cc 354
bool RobotsTxtParser::GetKeyAndValueFrom(char **key, ....)
{
  ....
  *key = line;
  ....
  if (strlen(*key) > 0) {
    ....
    return true;
  }
  return false;
}
Calling strlen just to find out whether a string is non-empty is inefficient. The check can be made much simpler, if ((*key)[0] != '\0'), and then the code does not have to walk through all the characters of a non-empty string.
V808 'path' object of 'basic_string' type was created but was not utilized. robots.cc 123
std::string GetPathParamsQuery(....) {
  std::string path;
  ....
}
The path string is declared but never used. In some cases, unused variables indicate an error. Here, though, it seems that the variable was used at some point and became unnecessary after later changes. So the analyzer also helps keep code clean and prevents errors by removing the preconditions for them to appear.
In the next case, the analyzer essentially recommends adding an explicit return at the end of main. It might be worth adding a return at the very end so that it is clear the program really finished successfully. However, if this behavior is intended, nothing needs to change, and you don't want to see the analyzer message, PVS-Studio lets you suppress this warning and never see it again :).
V591 The 'main' function does not return a value, which is equivalent to 'return 0'. It is possible that this is an unintended behavior. robots_main.cc 99
int main(int argc, char** argv) {
  ....
  if (filename == "-h" || filename == "-help" || filename == "--help")
  {
    ShowHelp(argc, argv);
    return 0;
  }
  if (argc != 4)
  {
    ....
    return 1;
  }
  if (....)
  {
    ....
    return 1;
  }
  ....
  if (....)
  {
    std::cout << "...." << std::endl;
  }
}
The analyzer also found that the two functions below have different names but the same implementation. Perhaps these functions once had different logic that later converged. Or perhaps a typo crept in somewhere, so such warnings deserve a careful look.
V524 It is odd that the body of 'MatchDisallow' function is fully equivalent to the body of 'MatchAllow' function. robots.cc 645
int MatchAllow(absl::string_view path, absl::string_view pattern) {
  return Matches(path, pattern) ? pattern.length() : -1;
}
int MatchDisallow(absl::string_view path, absl::string_view pattern) {
  return Matches(path, pattern) ? pattern.length() : -1;
}
This is the only place that makes me suspicious. The project authors may want to take a look at it.
So the check of Google's robots.txt parser showed that a project this actively used, and most likely checked for errors many times, has high-quality code. And the flaws we found do not spoil the impression at all that cool coders from Google worked on this project :).
We invite you to download PVS-Studio and try it on a project you are interested in.
If you want to share this article with an English-speaking audience, then please use the translation link: Victoria Khanieva. PVS-Studio wanted but couldn't find bugs in robots.txt