
An amateur and reverse engineering. Part 2: Framework

Last time I described the start of my relationship with reverse engineering. Some time has passed since then, and I now have, to some extent, results of my research to show.
I am trying to restore the sources from a .dll library and a .pdb database. Using IDA certainly brought some results, but not satisfactory ones. Maybe I'm just not diligent enough. So I approached the problem from the other end: restoring the framework of the library project. Since I have the .pdb database, I can do that quite well. In theory. Only in theory, because the database records information from the preprocessed files, not from the original sources. So there is still work to do.
Filling
I'll start with some theory. Structurally, a .pdb database is a set of interconnected symbols (every variable, structure, function, enumeration and type is a symbol). Symbols are divided by type, and depending on the type they can have different properties. By reading the properties, you can get descriptions of structures, functions, typedefs, enumerations and constants, including the relationships between them, the names of the files and .obj modules in which the functions are located, and much more. Symbols are accessed through the DIA SDK (Debug Interface Access); it is well documented and not very difficult to get to grips with. The only "problem" is that out of the box DIA is available only for C/C++, so if you want to work from .NET you either have to wrap the interface in a .NET .dll yourself, which is another story, or simply find a ready-made module. Personally, I chose the second option after finding Dia2Lib.dll.
Perhaps there is a ready-made solution for generating code from a .pdb database, but I did not find one, so I am writing my own. I write in C#: there is less trouble with memory, although at the cost of convenience when working with files. First, I needed classes to describe the symbols. The standard ones (those from Dia2Lib) are a bit inconvenient. More precisely, if you want to slice the data along three dimensions at once, they simply can't cope.
Classes for processing symbol data
class Member {
    public string name;
    public int offcet;      // field offset
    public ulong length;    // field size in bytes
    public string type;     // full field type, including pointers, const qualifiers, alignment, etc.
    public string access;   // access level
    public uint id;         // used to identify identical types
}
class BaseClass {
    public string type;
    public int offcet;      // used for the inheritance order
    public ulong length;
    public uint id;
}
class Function {
    public string name;
    public string type;
    public string access;
    public string filename; // name of the file that contains the function
    public uint id;
}
class Typedef {
    public string name;
    public string type;
    public uint id;
}
class Enum {
    public string name;
    public uint id;
    public SubEnum[] values;
}
class SubEnum {
    public string name;
    public dynamic value;
    public uint id;
}
class VTable {
    public ulong count;     // table size
    public string type;
    public uint id;
}
class SubStructure {
    public string name;
    public uint id;
}
class Structure {
    public string name;
    public uint id;
    public Member[] members;
    public BaseClass[] baseclass;
    public Function[] functions;
    public Typedef[] typedefs;
    public Enum[] enums;
    public VTable[] vtables;
    public SubStructure[] substructures;
}
Arrays of these structures can be filled by simply enumerating the symbols, which gives a basis for the framework. Then the problems begin. The first problem, already mentioned, is that the database records every structure from the preprocessed files. Like this one, for example:
First example of a not particularly needed structure
struct /*id:2*/ _iobuf
{
/*off 0x00000000 size:0004 id:5*/ public: char * _ptr;
/*off 0x00000004 size:0004 id:8*/ public: signed int _cnt;
/*off 0x00000008 size:0004 id:5*/ public: char * _base;
/*off 0x00000012 size:0004 id:8*/ public: signed int _flag;
/*off 0x00000016 size:0004 id:8*/ public: signed int _file;
/*off 0x00000020 size:0004 id:8*/ public: signed int _charbuf;
/*off 0x00000024 size:0004 id:8*/ public: signed int _bufsiz;
/*off 0x00000028 size:0004 id:5*/ public: char * _tmpfname;
};
Hardly anyone needs a structure from the standard library in a restored project. Such structures can at least be tracked down somehow, but there is a worse example.
Second example of a not particularly needed structure
struct /*id:24371*/ std::allocator,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node>:/*0x0 id:24351*/ std::_Allocator_base,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node>
{
//
/*id:24362*/ public: __thiscall const std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node * address (const std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node &);
//
/*id:24364*/ public: __thiscall std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node * address (std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node &);
//
/*id:24367*/ public: __thiscall void allocator,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node> (const std::allocator,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node> &);
//
/*id:24372*/ public: __thiscall void allocator,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node> ();
//:d:\program files\microsoft visual studio .net 2003\vc7\include\xmemory
/*id:24374 */public: void __thiscall std::allocator,class std::allocator >,struct std::less,class std::allocator,class std::allocator > > >,0> >::_Node>::deallocate(struct std::_Tree_nod,class std::allocator >,struct std::less,class std::allocator,class std::allocator > > >,0> >::_Node *,unsigned int);
//
/*id:24376*/ public: __thiscall std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node * allocate (unsigned int ,const void *);
//:d:\program files\microsoft visual studio .net 2003\vc7\include\xmemory
/*id:24378 */public: struct std::_Tree_nod,class std::allocator >,struct std::less,class std::allocator,class std::allocator > > >,0> >::_Node * __thiscall std::allocator,class std::allocator >,struct std::less,class std::allocator,class std::allocator > > >,0> >::_Node>::allocate(unsigned int);
//
/*id:24380*/ public: __thiscall void construct (std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node *,const std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node &);
//:d:\program files\microsoft visual studio .net 2003\vc7\include\xmemory
/*id:24384 */public: void __thiscall std::allocator,class std::allocator >,struct std::less,class std::allocator,class std::allocator > > >,0> >::_Node>::destroy(struct std::_Tree_nod,class std::allocator >,struct std::less,class std::allocator,class std::allocator > > >,0> >::_Node *);
//
/*id:24386*/ public: __thiscall unsigned int max_size ();
structure /*id:24353*/ value_type;
typedef /*id:24352*/std::_Allocator_base,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node> _Mybase;
typedef /*id:24354*/std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node * pointer;
typedef /*id:24355*/std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node & reference;
typedef /*id:24357*/const std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node * const_pointer;
typedef /*id:24359*/const std::_Tree_nod,std::allocator >,std::less,std::allocator,std::allocator > > >,0> >::_Node & const_reference;
typedef /*id:24360*/unsigned int size_type;
typedef /*id:24361*/signed int difference_type;
};
And even if you filter out the standard template structures, there remain a bunch of language features that get expanded or changed during compilation. Custom templates are one example.
Example of an expanded template
struct /*id:16851*/ S_BVECTOR
{
/*off 0x00000000 size:0016 id:9357*/ private: std::vector > m_VECPath;
/*off 0x00000016 size:0004 id:8*/ private: signed int m_nCount;
/*off 0x00000020 size:0004 id:8*/ private: signed int m_nPos;
/*id:9360 */public: __thiscall S_BVECTOR::S_BVECTOR(class S_BVECTOR const &);
/*id:9362 */public: __thiscall S_BVECTOR::S_BVECTOR(void);
/*id:9364 */public: void __thiscall S_BVECTOR::resize(unsigned short);
/*id:9366*/ public: __thiscall void addsize (unsigned short );
/*id:9368 */public: void __thiscall S_BVECTOR::setsize(unsigned short);
/*id:9369*/ public: __thiscall void setsizeNew (unsigned short );
/*id:9370 */public: void __thiscall S_BVECTOR::clear(void);
/*id:9371 */public: void __thiscall S_BVECTOR::push_back(struct D3DXVECTOR2 &);
/*id:9373*/ public: __thiscall void pop_front ();
/*id:9374*/ public: __thiscall void pop_back ();
/*id:9375 */public: int __thiscall S_BVECTOR::size(void);
/*id:9377 */public: bool __thiscall S_BVECTOR::empty(void);
/*id:9379*/ public: __thiscall D3DXVECTOR2 * front ();
/*id:9381*/ public: __thiscall D3DXVECTOR2 * next ();
/*id:9382*/ public: __thiscall D3DXVECTOR2 * end ();
/*id:9383 */public: struct D3DXVECTOR2 * __thiscall S_BVECTOR::operator[](int);
/*id:9385*/ public: __thiscall void remove (signed int );
/*id:9387 */public: __thiscall S_BVECTOR::~S_BVECTOR(void);
/*id:9388*/ public: __thiscall void * __vecDelDtor (unsigned int );
};
Of course, all of this can easily be returned to its original form. But there may be quite a lot of cases where manual processing is needed. For example, for the library I want to use, 2673 structures are recorded in the database. Of these, only about 250 are really needed; the rest are std template expansions and other "standard" things. One can only hope that everything goes without problems. Well, suppose we have the blanks for the structures. Next they need to be written to files.
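To make the idea of filtering out the "standard" structures more concrete, here is a minimal C++ sketch. It is not the filter from my generator: the helper names `looksStandard` and `keepProjectStructures` and the prefix list are assumptions made up for illustration, and a real filter would most likely need a longer list and manual exceptions.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true when a structure name looks like it
// comes from the standard library or the CRT rather than from the project.
// The prefix list is an assumption for illustration only.
bool looksStandard(const std::string& name) {
    static const char* const prefixes[] = {"std::", "stdext::", "_"};
    for (const char* p : prefixes) {
        if (name.rfind(p, 0) == 0) return true;  // name starts with p
    }
    return false;
}

// Keeps only the names that do not match the filter, preserving their order.
std::vector<std::string> keepProjectStructures(const std::vector<std::string>& names) {
    std::vector<std::string> kept;
    for (const std::string& n : names) {
        if (!looksStandard(n)) kept.push_back(n);
    }
    return kept;
}
```

With such a filter, `_iobuf` and anything under `std::` would be dropped, while `S_BVECTOR` and `CTexList` would stay. The obvious weakness is that a project structure whose name starts with an underscore would be dropped too, so the result still has to be checked by hand.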
Generation
First you need the files themselves to write to. A bit of theory. When compiling, each source file, after preprocessing, is translated by the compiler into machine code. Each source file yields a .obj or .o file, depending on the compiler. Using the DIA SDK, you can get the list of all files behind each .obj module (in short, everything pulled in through #include). How to get the list of files was described in the previous article (well, described ... there is code, at least). In amateur terms, from each .obj module you can get the name of the source file the module was built from (they share the same name) and the list of included files (every file except the .cpp, although there are exceptions). After building the overall structure and linking its parts together, you can start writing out the structures.
As far as I know, it is impossible to get the name of the file in which a structure was declared back when it existed as source code. But you can find out which files the implementations of the structure's methods were spread across. So I suggest simply collecting all the files that contain the structure's methods, choosing the one that will serve as the header, writing the declaration there, and associating the remaining files with the header. However, when getting the name of the source file that contains a method, you may run into something unpleasant: either a bug or a quirk of the file format. To get the name, you first find the list of source lines by RVA (relative virtual address) and then find the file that these lines belong to. But sometimes the number of lines corresponding to the method is zero, and a file name is found anyway. Usually the wrong one. This typically shows up when analyzing a constructor.
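The split of a module's file list into "the source" and "the included files" can be sketched roughly as follows. This is only an illustration in C++: the type `ModuleFiles` and the function `splitModuleFiles` are hypothetical names, and the rule "same base name as the .obj, with a .cpp extension" is a simplifying assumption that ignores the exceptions mentioned above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the last path component in lower case (handles '/' and '\\').
static std::string lowerFileName(const std::string& path) {
    std::size_t slash = path.find_last_of("/\\");
    std::string file = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(file.begin(), file.end(), file.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return file;
}

// Hypothetical result type: one source file plus everything it pulled in.
struct ModuleFiles {
    std::string source;                 // empty if no matching .cpp was found
    std::vector<std::string> includes;  // all remaining files
};

// Assumption: the source is the first file named like the .obj but ending in .cpp.
ModuleFiles splitModuleFiles(const std::string& objPath,
                             const std::vector<std::string>& files) {
    ModuleFiles result;
    std::string obj = lowerFileName(objPath);
    std::string expected = obj.substr(0, obj.find_last_of('.')) + ".cpp";
    for (const std::string& f : files) {
        if (result.source.empty() && lowerFileName(f) == expected)
            result.source = f;
        else
            result.includes.push_back(f);
    }
    return result;
}
```

The comparison is case-insensitive because the paths stored in the database do not necessarily keep the case used on disk. If no file matches, `source` stays empty and every file ends up in `includes`, which is one way to notice the exceptions.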
Example of a structure with a broken constructor
// Above each function is the name of the source file the function comes from. The files listed before the structure definition are simply all of those sources again, without repetitions.
//e:\????\kop\project\mindpower\sdk\src\mpfont.cpp
//e:\????\kop\project\mindpower\sdk\src\i_effect.cpp
//e:\????\kop\project\mindpower\sdk\include\i_effect.h
struct /*id:9920*/ CTexList
{
/*off 0x00000000 size:0002 id:1138*/ public: unsigned short m_wTexCount;
/*off 0x00000004 size:0004 id:1778*/ public: float m_fFrameTime;
/*off 0x00000008 size:0016 id:9726*/ public: std::vector >,std::allocator > > > m_vecTexList;
/*off 0x00000024 size:0028 id:98*/ public: std::basic_string,std::allocator > m_vecTexName;
/*off 0x00000052 size:0004 id:8384*/ public: IDirect3DTexture8 * m_lpCurTex;
/*off 0x00000056 size:0004 id:8130*/ public: MindPower::lwITex * m_pTex;
//:e:\????\kop\project\mindpower\sdk\src\mpfont.cpp[0]
/*id:9921*/ public: __thiscall void CTexList::CTexList (const CTexList &);
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[3]
/*id:9927*/ public: __thiscall void CTexList::CTexList ();
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[2]
/*id:9929*/ public: __thiscall void CTexList::~CTexList ();
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[3]
/*id:9930*/ public: __thiscall void CTexList::SetTextureName (const std::basic_string,std::allocator > &);
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[16]
/*id:9932*/ public: __thiscall void CTexList::GetTextureFromModel (CEffectModel *);
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[25]
/*id:9934*/ public: __thiscall void CTexList::CreateSpliteTexture (signed int ,signed int );
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[16]
/*id:9936*/ public: __thiscall void CTexList::GetCurTexture (S_BVECTOR &,unsigned short &,float &,float );
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[2]
/*id:9938*/ public: __thiscall void CTexList::Reset ();
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[7]
/*id:9939*/ public: __thiscall void CTexList::Clear ();
//:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[6]
/*id:9940*/ public: __thiscall void CTexList::Remove ();
//:e:\????\kop\project\mindpower\sdk\include\i_effect.h[12]
/*id:9941*/ public: __thiscall void CTexList::Copy (CTexList *);
};
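One possible way to deal with the wrong file name is to ignore any method for which zero source lines were found, and only then pick a header from what remains. The C++ sketch below illustrates this; `MethodSource`, `collectFiles` and `pickHeader` are hypothetical names, and skipping zero-line hits is an assumption about how the problem could be handled, not a description of what the generator does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: the source file reported for one method and the
// number of source lines found for that method's RVA.
struct MethodSource {
    std::string file;
    unsigned lineCount;
};

static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Collects unique file names in first-seen order, skipping methods with
// zero lines, since their file name is usually wrong.
std::vector<std::string> collectFiles(const std::vector<MethodSource>& methods) {
    std::vector<std::string> files;
    for (const MethodSource& m : methods) {
        if (m.lineCount == 0) continue;
        if (std::find(files.begin(), files.end(), m.file) == files.end())
            files.push_back(m.file);
    }
    return files;
}

// Returns the first file with a .h extension, or an empty string if none.
std::string pickHeader(const std::vector<std::string>& files) {
    for (const std::string& f : files) {
        if (endsWith(f, ".h")) return f;
    }
    return std::string();
}
```

For CTexList above, the copy constructor reported with `[0]` lines would be skipped, so mpfont.cpp would not end up in the list, leaving i_effect.cpp and i_effect.h, with the latter chosen as the header.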
Usually, and unsurprisingly, a structure lives in two files, header.h and code.cpp, but there are other cases. For example, the structure has only a header, or the file with the code has the .inl extension, or, according to the .pdb database, the structure is not written anywhere at all. I used the following algorithm. If there is a header in the list of files the structure belongs to, we write the structure to the header and include the header in the file with the code, if there is one. Then we walk through the structure, building a list of all the types it uses. If a type is a structure and it has a list of files, we include that structure's header; otherwise we write that structure at the beginning of the file.
There is another unpleasant detail: structures are very fond of duplicating themselves. I don't have the slightest idea why many of them occur several times, one right after another (in fact not quite one after another, since there are many standard templates between them, but with the filter enabled they are adjacent). The fields and methods of such duplicates coincide; they differ only in their id. Personally, I just sorted the array of structures by name and, while iterating over the elements, compared the name of the current one with the name of the previous one. And it worked.
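The duplicate removal by sorting can be sketched in a few lines. This is a C++ illustration of the same idea, not the C# code from the generator; `StructInfo` is a cut-down hypothetical record, and `dropDuplicates` is a made-up name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical minimal record: only the fields needed for the comparison.
struct StructInfo {
    std::string name;
    unsigned id;
};

// Sorts by name and keeps one entry per name by comparing each element
// with the last one kept.
std::vector<StructInfo> dropDuplicates(std::vector<StructInfo> items) {
    // stable_sort preserves the original order of same-named entries,
    // so the first occurrence of each name is the one that survives.
    std::stable_sort(items.begin(), items.end(),
                     [](const StructInfo& a, const StructInfo& b) {
                         return a.name < b.name;
                     });
    std::vector<StructInfo> unique;
    for (const StructInfo& s : items) {
        if (unique.empty() || unique.back().name != s.name)
            unique.push_back(s);
    }
    return unique;
}
```

Note that this treats two structures with the same name as the same structure without comparing their members. That matches what was observed here, but it would silently merge structures that really differ.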
Result
It all worked, although of course not the way I would have liked. It did create a bunch of files that, I hope, broadly reflect the structure of the original project, but there is such a mess in them ...
One of the generated files is lwitem.h
// Methods are removed for readability and to reduce the amount of text
#ifndef __MINDPOWER::LWITEM__
#define __MINDPOWER::LWITEM__
#ifndef _MINDPOWER::LWIRESOURCEMGR_
#define _MINDPOWER::LWIRESOURCEMGR_
struct MindPower::lwIResourceMgr:MindPower::lwInterface
{
//57 methods
};
#endif
#ifndef _MINDPOWER::LWISCENEMGR_
#define _MINDPOWER::LWISCENEMGR_
struct MindPower::lwISceneMgr:MindPower::lwInterface
{
//15 methods
};
#endif
#ifndef _MINDPOWER::LWLINKCTRL_
#define _MINDPOWER::LWLINKCTRL_
struct MindPower::lwLinkCtrl
{
//3 methods
};
#endif
#include lwitypes2.h
#ifndef _STD::ALLOCATOR,STD::ALLOCATOR > >::REBIND_
#define _STD::ALLOCATOR,STD::ALLOCATOR > >::REBIND_
struct std::allocator,std::allocator > >::rebind
{
typedef std::allocator,std::allocator >,std::allocator,std::allocator > > >::_Node *> other;
};
#endif
#ifndef _MINDPOWER::LWIPRIMITIVE_
#define _MINDPOWER::LWIPRIMITIVE_
struct MindPower::lwIPrimitive:MindPower::lwInterface
{
//46 methods
};
#endif
#include d3dx8math.h
#ifndef _STD::_NUM_FLOAT_BASE_
#define _STD::_NUM_FLOAT_BASE_
struct std::_Num_float_base:std::_Num_base
{
//16 constant properties
};
#endif
#ifndef _MINDPOWER::LWIITEM_
#define _MINDPOWER::LWIITEM_
struct MindPower::lwIItem:MindPower::lwInterface
{
//26 methods
};
#endif
#ifndef _MINDPOWER::LWITEM_
#define _MINDPOWER::LWITEM_
struct MindPower::lwItem:MindPower::lwIItem
{
//12 properties
//34 methods
};
#endif
#endif
The main shortcomings: there are no namespaces, there is no filter for standard templates that would replace them with library includes, there is no internal file structure ...
The amateur's code generator on GitHub
So it goes.
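As an illustration of the missing namespace handling, a qualified name such as `MindPower::lwItem` could be split at every `::` that is not inside template angle brackets, and the leading parts emitted as namespaces. The C++ sketch below is only a possible direction: `splitQualifiedName` and `wrapInNamespaces` are hypothetical names, and it assumes that every leading part is a namespace, which is wrong for nested classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a qualified name at every top-level "::", i.e. outside <...>.
std::vector<std::string> splitQualifiedName(const std::string& name) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;  // nesting level of template angle brackets
    for (std::size_t i = 0; i < name.size(); ++i) {
        char c = name[i];
        if (c == '<') ++depth;
        if (c == '>' && depth > 0) --depth;
        if (depth == 0 && c == ':' && i + 1 < name.size() && name[i + 1] == ':') {
            parts.push_back(current);
            current.clear();
            ++i;  // skip the second ':'
            continue;
        }
        current += c;
    }
    parts.push_back(current);
    return parts;
}

// Emits a one-line forward declaration wrapped in namespaces.
// Assumption: every part except the last one is a namespace.
std::string wrapInNamespaces(const std::string& qualifiedName) {
    std::vector<std::string> parts = splitQualifiedName(qualifiedName);
    std::string out;
    for (std::size_t i = 0; i + 1 < parts.size(); ++i)
        out += "namespace " + parts[i] + " { ";
    out += "struct " + parts.back() + ";";
    for (std::size_t i = 0; i + 1 < parts.size(); ++i)
        out += " }";
    return out;
}
```

For `MindPower::lwItem` this gives `namespace MindPower { struct lwItem; }`. To tell a namespace from an enclosing class, the generator would have to look the leading part up among the known structures first.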