Notepad ++: code check five years later
This year the PVS-Studio static analyzer turned 10 years old. True, it is worth clarifying that 10 years ago it was called Viva64. And there is another interesting date: 5 years have passed since the previous verification of the Notepad ++ project code. Since then, PVS-Studio has been very much improved: about 190 new diagnostics are added, old ones are improved. However, you should not expect a huge number of errors in Notepad ++. This is a small project consisting of only 123 source files. Nevertheless, errors were found in the code that would be useful to fix.
Introduction
Notepad ++ is a free open source text editor for Windows, with syntax highlighting for a large number of programming and markup languages. It is based on the Scintilla component, written in C ++ using STL, as well as the Windows API and distributed under the GNU General Public License.
In my opinion, Notepad ++ is an excellent text editor. I myself use it for everything except writing code. For source analysis, I used the PVS-Studio 6.15 static analyzer . The Notepad ++ project has already been tested in 2010 and 2012years. There are 84 High warnings, 124 Medium warnings and 548 Low warnings. Warning levels indicate the validity of errors found. So, of the 84 most reliable warnings (High), 81, in my opinion, indicate real defects in the code - they should be fixed right away, without delving deep into the program logic, because the problems are obvious.
Note. In addition to processing the results of a static analyzer, it will be useful to improve the code, having decided: to use spaces or tabs for indentation. Absolutely all the code looks something like this:
Figure 1 - Different indents in the code.
Let's look at some of the errors that seemed most interesting to me.
Inheritance issues
V599 The virtual destructor is not present, although the 'FunctionParser' class contains virtual functions. functionparser.cpp 39
class FunctionParser
{
friend class FunctionParsersManager;
public:
FunctionParser(....): ....{};
virtual void parse(....) = 0;
void funcParse(....);
bool isInZones(....);
protected:
generic_string _id;
generic_string _displayName;
generic_string _commentExpr;
generic_string _functionExpr;
std::vector _functionNameExprArray;
std::vector _classNameExprArray;
void getCommentZones(....);
void getInvertZones(....);
generic_string parseSubLevel(....);
};
std::vector _parsers;
FunctionParsersManager::~FunctionParsersManager()
{
for (size_t i = 0, len = _parsers.size(); i < len; ++i)
{
delete _parsers[i]; // <=
}
if (_pXmlFuncListDoc)
delete _pXmlFuncListDoc;
}
The analyzer found a serious error leading to incomplete destruction of objects. The FunctionParser base class has a virtual parse () function , but no virtual destructor. The inheritance hierarchy from this class contains classes such as FunctionZoneParser , FunctionUnitParser, and FunctionMixParser :
class FunctionZoneParser : public FunctionParser
{
public:
FunctionZoneParser(....): FunctionParser(....) {};
void parse(....);
protected:
void classParse(....);
private:
generic_string _rangeExpr;
generic_string _openSymbole;
generic_string _closeSymbole;
size_t getBodyClosePos(....);
};
class FunctionUnitParser : public FunctionParser
{
public:
FunctionUnitParser(....): FunctionParser(....) {}
void parse(....);
};
class FunctionMixParser : public FunctionZoneParser
{
public:
FunctionMixParser(....): FunctionZoneParser(....), ....{};
~FunctionMixParser()
{
delete _funcUnitPaser;
}
void parse(....);
private:
FunctionUnitParser* _funcUnitPaser = nullptr;
};
For clarity, I made an inheritance scheme for these classes:
Figure 2 - The inheritance scheme from the FunctionParser class
Thus, the created objects will not be completely destroyed. This will lead to undefined program behavior. You can’t say how the program will behave after UB appears in it, but in practice, in this case, at least, a memory leak will occur, since the “delete _funcUnitPaser” code will not be executed.
Consider the following error.
V762 It is possible a virtual function was overridden incorrectly. See first argument of function 'redraw' in derived class 'SplitterContainer' and base class 'Window'. splittercontainer.h 61
class Window
{
....
virtual void display(bool toShow = true) const
{
::ShowWindow(_hSelf, toShow ? SW_SHOW : SW_HIDE);
}
virtual void redraw(bool forceUpdate = false) const
{
::InvalidateRect(_hSelf, nullptr, TRUE);
if (forceUpdate)
::UpdateWindow(_hSelf);
}
....
}
class SplitterContainer : public Window
{
....
virtual void display(bool toShow = true) const; // <= good
virtual void redraw() const; // <= error
....
}
Notepad ++ code found several problems with function overloading. In the SplitterContainer class inherited from the Window class, the display () method is overloaded correctly, and an error occurred while overloading the redraw () method .
Some more wrong places:
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'UserDefineDialog' and base class 'StaticDialog'. userdefinedialog.h 332
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'FindReplaceDlg' and base class 'StaticDialog'. findreplacedlg.h 245
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'GoToLineDlg' and base class 'StaticDialog'. gotolinedlg.h 45
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'FindCharsInRangeDlg' and base class 'StaticDialog'. findcharsinrange.h 52
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'ColumnEditorDlg' and base class 'StaticDialog'. columneditor.h 45
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'WordStyleDlg' and base class 'StaticDialog'. wordstyledlg.h 77
- V762 It is possible a virtual function was overridden incorrectly. See first argument of function 'redraw' in derived class 'WordStyleDlg' and base class 'Window'. wordstyledlg.h 99
- V762 It is possible a virtual function was overridden incorrectly. See third argument of function 'create' in derived class 'PluginsAdminDlg' and base class 'StaticDialog'. pluginsadmin.h 107
Memory leaks
V773 The function was exited without releasing the 'pXmlDocProject' pointer. A memory leak is possible. projectpanel.cpp 326
bool ProjectPanel::openWorkSpace(const TCHAR *projectFileName)
{
TiXmlDocument *pXmlDocProject = new TiXmlDocument(....);
bool loadOkay = pXmlDocProject->LoadFile();
if (!loadOkay)
return false; // <=
TiXmlNode *root = pXmlDocProject->FirstChild(TEXT("Note...."));
if (!root)
return false; // <=
TiXmlNode *childNode = root->FirstChildElement(TEXT("Pr...."));
if (!childNode)
return false; // <=
if (!::PathFileExists(projectFileName))
return false; // <=
....
delete pXmlDocProject; // <= free pointer
return loadOkay;
}
An interesting example of a memory leak is this function. Dynamic memory is allocated under the pXmlDocProject pointer , but it is freed only when the function is executed to the end. Which, most likely, is a flaw leading to memory leaks.
V773 Visibility scope of the 'pTextFind' pointer was exited without releasing the memory. A memory leak is possible. findreplacedlg.cpp 1577
bool FindReplaceDlg::processReplace(....)
{
....
TCHAR *pTextFind = new TCHAR[stringSizeFind + 1];
TCHAR *pTextReplace = new TCHAR[stringSizeReplace + 1];
lstrcpy(pTextFind, txt2find);
lstrcpy(pTextReplace, txt2replace);
....
}
The processReplace () function is called each time a certain substring in the document is replaced. The code allocates memory for two buffers: pTextFind and pTextReplace . The search string is copied to one buffer, and the replacement string to another. There are several errors that result in a memory leak:
- The pTextFind buffer is not flushed and is not used at all in the function. For replacement, the original txt2find buffer is used .
- The pTextReplace buffer is subsequently used, but memory is also not freed.
Conclusion: each autocorrect operation results in a leak of several bytes. The larger the search string and the more matches, the more memory flows.
Errors with pointers
V595 The 'pScint' pointer was utilized before it was verified against nullptr. Check lines: 347, 353. scintillaeditview.cpp 347
LRESULT CALLBACK ScintillaEditView::scintillaStatic_Proc(....)
{
ScintillaEditView *pScint = (ScintillaEditView *)(....);
if (Message == WM_MOUSEWHEEL || Message == WM_MOUSEHWHEEL)
{
....
if (isSynpnatic || makeTouchPadCompetible)
return (pScint->scintillaNew_Proc(....); // <=
....
}
if (pScint)
return (pScint->scintillaNew_Proc(....));
else
return ::DefWindowProc(hwnd, Message, wParam, lParam);
}
In one place, we missed the validation of the pScint pointer .
V713 The pointer _langList [i] was utilized in the logical expression before it was verified against nullptr in the same logical expression. parameters.h 1286
Lang * getLangFromID(LangType langID) const
{
for (int i = 0 ; i < _nbLang ; ++i)
{
if ((_langList[i]->_langID == langID) || (!_langList[i]))
return _langList[i];
}
return nullptr;
}
The author of the code made a mistake while writing the conditional expression. It first accesses the _langID field using the _langList [i] pointer , and then compares this pointer to zero.
Most likely, the correct code should be like this:
Lang * getLangFromID(LangType langID) const
{
for (int i = 0 ; i < _nbLang ; ++i)
{
if ( _langList[i] && _langList[i]->_langID == langID )
return _langList[i];
}
return nullptr;
}
Miscellaneous errors
V501 There are identical sub-expressions to the left and to the right of the '! =' Operator: subject! = Subject verifysignedfile.cpp 250
bool VerifySignedLibrary(...., const wstring& cert_subject, ....)
{
wstring subject;
....
if ( status && !cert_subject.empty() && subject != subject)
{
status = false;
OutputDebugString(
TEXT("VerifyLibrary: Invalid certificate subject\n"));
}
....
}
I remember that a vulnerability was found in Notepad ++ that allowed replacing editor components with modified ones. Integrity checks have been added to the code. I don’t know for sure whether this code was written to close the vulnerability or not, but you can judge by the name of the function that it serves for an important check.
Against this background, a check:
subject != subject
It looks extremely suspicious, most likely, it should correctly be like this:
if ( status && !cert_subject.empty() && cert_subject != subject)
{
....
}
V560 A part of conditional expression is always true: 0xff. babygrid.cpp 711
TCHAR GetASCII(WPARAM wParam, LPARAM lParam)
{
int returnvalue;
TCHAR mbuffer[100];
int result;
BYTE keys[256];
WORD dwReturnedValue;
GetKeyboardState(keys);
result = ToAscii(static_cast(wParam),
(lParam >> 16) && 0xff, keys, &dwReturnedValue, 0); // <=
returnvalue = (TCHAR) dwReturnedValue;
if(returnvalue < 0){returnvalue = 0;}
wsprintf(mbuffer, TEXT("return value = %d"), returnvalue);
if(result!=1){returnvalue = 0;}
return (TCHAR)returnvalue;
}
Always true or always false conditional expressions look very suspicious. The constant 0xff is always true. Perhaps they made a typo in the statement and the parameter of the ToAscii () function should have been like this:
(lParam >> 16) & 0xff
V746 Type slicing. An exception should be caught by reference rather than by value. filedialog.cpp 183
TCHAR* FileDialog::doOpenSingleFileDlg()
{
....
try {
fn = ::GetOpenFileName(&_ofn)?_fileName:NULL;
if (params->getNppGUI()._openSaveDir == dir_last)
{
::GetCurrentDirectory(MAX_PATH, dir);
params->setWorkingDir(dir);
}
} catch(std::exception e) { // <=
::MessageBoxA(NULL, e.what(), "Exception", MB_OK);
} catch(...) {
::MessageBox(NULL, TEXT("....!!!"), TEXT(""), MB_OK);
}
::SetCurrentDirectory(dir);
return (fn);
}
It is better to catch exceptions by reference. The problem with this code is that a new object will be created, which will lead to loss of information about the exception during interception. Everything stored in classes inherited from Exception will be unavailable.
V519 The 'lpcs' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 3116, 3117. babygrid.cpp 3117
LRESULT CALLBACK GridProc(HWND hWnd, UINT message,
WPARAM wParam, LPARAM lParam)
{
....
case WM_CREATE:
lpcs = &cs;
lpcs = (LPCREATESTRUCT)lParam;
....
}
Immediately frayed the old value with the new. Sounds like a mistake. If now everything works correctly, then you should leave only the second line with the assignment, and delete the first.
V601 The 'false' value becomes a class object. treeview.cpp 121
typedef std::basic_string generic_string;
generic_string TreeView::getItemDisplayName(....) const
{
if (not Item2Set)
return false; // <=
TCHAR textBuffer[MAX_PATH];
TVITEM tvItem;
tvItem.hItem = Item2Set;
tvItem.mask = TVIF_TEXT;
tvItem.pszText = textBuffer;
tvItem.cchTextMax = MAX_PATH;
SendMessage(...., reinterpret_cast(&tvItem));
return tvItem.pszText;
}
The return value of the function is a string, but instead of an empty string, someone decided to make "return false".
Code cleaning
Doing refactoring for the sake of refactoring is not worth it; there are many other useful tasks in any project. But getting rid of useless code is definitely necessary.
V668 There is no sense in testing the 'source' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. notepad_plus.cpp 1149
void Notepad_plus::wsTabConvert(spaceTab whichWay)
{
....
char * source = new char[docLength];
if (source == NULL)
return;
....
}
So why is this check here? According to the modern C ++ standard, the new operator throws an exception if there is not enough memory, and does not return nullptr .
This function is called when replacing all tabs with spaces in the entire document. Taking a large text document, I was convinced that the lack of memory in this place really leads to the crash of the program.
If you correct the check, then when there is not enough memory, the operation of replacing characters will be canceled, and the editor can be used further. All such places should be fixed, and there are so many of them that I had to put the list in a separate file .
V713The pointer commentLineSymbol was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 3928
bool Notepad_plus::doBlockComment(comment_mode currCommentMode)
{
....
if ((!commentLineSymbol) || // <=
(!commentLineSymbol[0]) ||
(commentLineSymbol == NULL)) // <= WTF?
{ .... }
....
}
There are ten such strange and useless checks:
- V713 The pointer commentLineSymbol was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 3928
- V713 The pointer commentStart was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 3931
- V713 The pointer commentEnd was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 3931
- V713 The pointer commentStart was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 4228
- V713 The pointer commentEnd was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 4228
- V713 The pointer commentLineSymbol was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 4229
- V713 The pointer commentStart was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 6554
- V713 The pointer commentEnd was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 6554
- V713 The pointer commentLineSymbol was utilized in the logical expression before it was verified against nullptr in the same logical expression. notepad_plus.cpp 6555
V601 The 'true' value is implicitly cast to the integer type. pluginsadmin.cpp 603
INT_PTR CALLBACK PluginsAdminDlg::run_dlgProc(UINT message, ....)
{
switch (message)
{
case WM_INITDIALOG :
{
return TRUE;
}
....
case IDC_PLUGINADM_RESEARCH_NEXT:
searchInPlugins(true);
return true;
case IDC_PLUGINADM_INSTALL:
installPlugins();
return true;
....
}
....
}
The run_dlgProc () function already returns a value of a far from logical type, so they return either true / false or TRUE / FALSE in the code . I wanted to write that there are at least one type of indentation in the function, but no: out of 90 lines of the function, in one line there is still a mixture of tabs and spaces. The remaining lines use tab characters. Yes, all this is not critical, but the code looks sloppy for me, as an outside observer.
V704 '! This' expression in conditional statements should be avoided - this expression is always false on newer compilers, because 'this' pointer can never be NULL. notepad_plus.cpp 4980
void Notepad_plus::notifyBufferChanged(Buffer * buffer, int mask)
{
// To avoid to crash while MS-DOS style is set as default
// language,
// Checking the validity of current instance is necessary.
if (!this) return;
....
}
I would attribute such checks to useless code. As you can see from the comment, earlier the problem of dereferencing the zero this was. According to the modern C ++ language standard, such a check is superfluous.
List of all such places:
- V704 'this && type == DOCUMENT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxmla.h 505
- V704 'this && type == ELEMENT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxmla.h 506
- V704 'this && type == COMMENT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxmla.h 507
- V704 'this && type == UNKNOWN' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxmla.h 508
- V704 'this && type == TEXT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxmla.h 509
- V704 'this && type == DECLARATION' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxmla.h 510
- V704 'this && type == DOCUMENT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxml.h 505
- V704 'this && type == ELEMENT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxml.h 506
- V704 'this && type == COMMENT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxml.h 507
- V704 'this && type == UNKNOWN' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxml.h 508
- V704 'this && type == TEXT' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxml.h 509
- V704 'this && type == DECLARATION' expression should be avoided: 'this' pointer can never be NULL on newer compilers. tinyxml.h 510
- V704 'this' expression in conditional statements should be avoided — this expression is always true on newer compilers, because 'this' pointer can never be NULL. nppbigswitch.cpp 119
Заключение
Other errors were found that were not included in the article. If desired, Notepad ++ authors can independently verify the project and carefully examine the warnings. We are ready to provide them with a temporary license.
Of course, the described problems are not visible to the average user. RAM modules are now large and cheap :). Nevertheless, the development of the project continues, and the quality of the code, as well as the convenience of its support, can be significantly improved by correcting the errors found and removing layers of outdated code.
According to my calculations, for 1000 lines of code, the PVS-Studio analyzer detected 2 real errors. Of course, these are not all the mistakes. I think, in fact, there will be somewhere around 5-10 per 1000 lines of code, which corresponds to a low error density. The size of the Notepad ++ project is 95 KLoc, which means a typical error density for projects of this size: 0-40 errors per 1000 lines of code. However, the source of these numbers is old, and I believe that, on average, the quality of the code is now higher.
Once again, I want to thank the authors of Notepad ++ for their work on an extremely useful tool and wish them every success.
If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. Checking Notepad ++: five years later
Have you read the article and have a question?
Often our articles are asked the same questions. We collected the answers here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please see the list.