NevRA March 30, 2011 at 09:49

Writing a viewer for the MS Exchange mail database (part 1)

In a big, big country there lived small, pretty little people. And everything was fine until a deep, deep pit appeared right in the center of this big country. Well, I must say, it did not appear by itself and, of course, not all at once, but nobody remembers that anymore and nobody cares.

The main thing was that the pit existed and it was large, even very large, or rather infinite; more precisely, no one knows. But the government of the big country knew that something had to be done about the pit, and it was decided to study it and fill it in. For this, the very best were summoned: the most dexterous and courageous, the smartest of the men and women who lived in this beautiful state.

And the very best set to work. They studied and filled, filled and studied, studied, filled, filled, studied, and this was repeated not just many times, but a great many times.

It so happened that our little heroes left, or rather had to leave, but the pit remained. New heroes came, just as small and just as much the very best, and began to do the same.

This went on for as long as anyone remained. And one fine day the pit disappeared, all by itself, and appeared somewhere in a completely different place, where they began to study it and fill it in again...

All coincidences are accidental, everything is fictitious, and nothing like this has ever happened or could happen in reality! True, true. Well, almost true.

One such pit is what we will discuss below. But for confidentiality reasons we will cover only a small piece of it, the part that can be reached using only open sources and a documented API.

The material is almost exclusive, since there are only crumbs of information on this topic.

Have you ever wondered where MS Exchange Server stores all your mail, or how it works with it at the lowest level? That is what I am going to write a little about here.

Warning: do not try to dig deep into this topic, a whole lifetime will not be enough. You have been warned.

Introduction

MS Exchange Server (hereinafter simply Exchange) is one of the flagships of the Microsoft product line. You can read about its main functions on the wiki or on the official website. In short, it is a kind of all-in-one "combine" for working with mail, calendars and other user data, with extensive integration with various MS products (SharePoint, TFS, etc.).

In this article, however, we are interested not in what it offers the end user, but in where it gets this data from and which API it uses to do so. We will try to read a mailbox database on our own, on the Mailbox role of Exchange 2010 (Mailbox Server).

Exchange has several entry points (CAS Server) through which users can reach their data, and several protocols they can use for this, for example OWA, RPC (Outlook), POP3/IMAP4.

Regardless of the access method, Exchange sends all requests to the Mailbox role (before Exchange 2007 this was the only role), which, among other things, holds the user mailbox databases that interest us today. Physically, these databases are located on the hard disk inside *.edb files. They can be found in the Mailbox\<database name> folder in the directory where Exchange was installed. Transaction logs and other files related to the database life cycle are placed there as well, but we will not need them; the main thing for us is the *.edb file.

If you dig a little, you find that Exchange uses the Extensible Storage Engine (ESE) to access the contents of its databases. Dig a little more and it becomes clear that the ESE functions are implemented in the library ese.dll (or esent.dll). This is the core of all Exchange operations. ESE provides an extensive set of tools for working with a database. Descriptions of the functions, constants, structures and everything else you may need can be found in the ESE documentation on MSDN. Unfortunately, this documentation has not been updated for a long time, so a number of features that appeared in Exchange 2010 are missing from it, but we will not need them in this article. You can find ese.dll in the Bin folder inside the main Exchange directory.

After reading the ESE description, it becomes clear that the data is stored in tabular form, i.e. as a set of tables with columns and rows. Table cells can be of different types, plus there are all the usual database features: indexes, search, etc.

To sum up, we know that Exchange stores its databases on Mailbox roles as files with the .edb extension and accesses them through ESE (ese.dll). This is enough for us, and we can start coding.

We will list the tables in the database, as well as all their columns and the values in them. Of course, we will not learn what they mean, because only MS knows that, well, and a few other people. These columns should hold almost all the information related to mailboxes, from the names of the user's folders to the messages themselves, but figuring out what is stored where is a reverse-engineering task and will not be considered here.

Programming

Preparation

First we need:

Visual Studio 2008-2010
MS Exchange Server 2010 (any other is also suitable, but in this topic we will talk about 2010)
Knowledge of C/C++

In Exchange, we create the database that we will try to read. You can use the Exchange Management Console (EMC) for this. I will not describe the procedure; there is plenty of information about it on the Internet. We create one user in this database (via the same EMC) so that it has some content, and log into the mailbox as this user, for example through OWA, to verify that everything was done correctly. After that, we go to the Mailbox directory, look for the folder with the name of our database, and find the EDB file in it. Before copying the database, dismount it through EMC. That's it, we have a database for experiments. We copy it somewhere, for example to the directory of the future project.

From the Bin folder we copy ese.dll, through which we will work with the database.

In Visual Studio we create a C++ console project. There is an important nuance here: since Exchange 2010 exists only in a 64-bit version (unlike earlier versions, which had 32-bit builds), we have to build the project for x64, otherwise we simply cannot load ese.dll into our address space. Consequently, a 64-bit OS is needed to test the application. You can of course test on the Exchange server itself, but I use my Windows 7 workstation for this. We will also use the Unicode version of the API, so it is better to make Unicode the default character set in the project.
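To catch a wrong platform setting early, the x64 requirement can be turned into a compile-time check. This is only a small sketch: IsSixtyFourBitBuild is a hypothetical helper name, and the code assumes a compiler with C++11 support for static_assert and constexpr.

```cpp
// Refuse to compile in a 32-bit configuration: a 32-bit process cannot
// load the 64-bit ese.dll that ships with Exchange 2010.
static_assert(sizeof(void *) == 8, "Build this project for x64");

// Hypothetical helper for a run-time check or a log message.
constexpr bool IsSixtyFourBitBuild()
{
    return sizeof(void *) == 8;
}
```

With this in a common header, a Win32 build fails immediately with a readable message instead of failing later when the DLL is loaded.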

So, in the newly created project we make sure that x64 and Unicode are enabled (General - Use Unicode Character Set). Now we include the main header file for ESE:

#include <esent.h>

This file ships with the SDK bundled with Visual Studio, starting with VS 2008.
In stdafx.h we add two defines: one with the JET (ESE) version, the other to indicate that we want the Unicode version of the API:

#define JET_UNICODE
#define JET_VERSION 0x0600
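These defines matter because they must be seen before the header is included. As far as I can tell, esent.h uses the usual Windows technique of mapping a suffix-free name to an A or W variant depending on a define. The sketch below shows that technique in isolation; DEMO_UNICODE and the DemoOpenTable functions are hypothetical names, not part of ESE.

```cpp
#include <cstring>

#define DEMO_UNICODE            // plays the role of JET_UNICODE

// Two variants of the same hypothetical API call.
const char *DemoOpenTableA() { return "ANSI variant"; }
const char *DemoOpenTableW() { return "Unicode variant"; }

// The "header" exposes one of them under the short, suffix-free name.
#ifdef DEMO_UNICODE
#define DemoOpenTable DemoOpenTableW
#else
#define DemoOpenTable DemoOpenTableA
#endif
```

If the define appeared after the mapping, the short name would silently pick the ANSI variant, which is why stdafx.h is a convenient place for it.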

Well, now we need to determine what we want to get from the database. ESE is a database with tables, columns and rows, which is exactly what we will try to extract from it: tables, columns and rows. To do this, prepare the following structures:

typedef struct tagDBColumnsInfo
{
    std::wstring sColumnName;
    // element type assumed: values are kept as text
    std::vector<std::wstring> sColumnValues;
} SDBColumnInfo;

typedef struct tagDBTableInfo
{
    std::wstring sTableName;
    std::vector<SDBColumnInfo> sColumnInfo;
} SDBTableInfo;

typedef struct tagDBTablesInfo
{
    std::wstring sDBName;
    std::vector<SDBTableInfo> sTablesInfo;
} SDBTablesInfo;
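To show how these structures nest, here is a minimal sketch that fills them by hand and walks them. The definitions are repeated so the sample compiles on its own; storing values as std::wstring is an assumption, CountValues and MakeSample are hypothetical helpers, and the sample data is invented rather than read from a real EDB file. It needs C++11 or later.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same layout as the structures in this article.
struct SDBColumnInfo { std::wstring sColumnName; std::vector<std::wstring> sColumnValues; };
struct SDBTableInfo  { std::wstring sTableName;  std::vector<SDBColumnInfo> sColumnInfo; };
struct SDBTablesInfo { std::wstring sDBName;     std::vector<SDBTableInfo> sTablesInfo; };

// Hypothetical helper: total number of stored cell values in the database.
std::size_t CountValues(const SDBTablesInfo &db)
{
    std::size_t total = 0;
    for (const SDBTableInfo &table : db.sTablesInfo)
        for (const SDBColumnInfo &column : table.sColumnInfo)
            total += column.sColumnValues.size();
    return total;
}

// Invented sample data: one table with two columns.
SDBTablesInfo MakeSample()
{
    SDBColumnInfo name{L"Name", {L"Inbox", L"Sent Items"}};
    SDBColumnInfo type{L"Type", {L"1"}};
    SDBTableInfo table{L"Folders", {name, type}};
    return SDBTablesInfo{L"demo.edb", {table}};
}
```

The hierarchy is database, then tables, then columns, and each column carries its own list of values.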

The first thing to do is load the DLL itself; we do this, as always, via ::LoadLibrary(...).
We will load the functions from ese.dll dynamically, and we will need the following ones:

JetInit
JetCreateInstanceW
JetBeginSessionW
JetAttachDatabaseW
JetOpenDatabaseW
JetCloseDatabase
JetDetachDatabaseW
JetTerm
JetSetSystemParameterW
JetOpenTableW
JetGetColumnInfoW
JetRetrieveColumns
JetMove
JetGetTableColumnInfoW
JetCloseTable
JetGetSystemParameter
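Resolving a list of names like this one can be sketched as a small loop. To keep the sample portable, the lookup is passed in as a callback; on Windows it would wrap ::GetProcAddress on the module returned by ::LoadLibrary. ResolveAll, RawProc, Resolver, DummyProc and FakeResolver are all hypothetical names used only for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

typedef void (*RawProc)();                                     // generic function pointer
typedef std::function<RawProc(const std::string &)> Resolver;  // name -> address or nullptr

// Looks up every name and stores the addresses that were found.
// Returns the names that could not be resolved, so the caller can
// report all of them at once instead of stopping at the first one.
std::vector<std::string> ResolveAll(const std::vector<std::string> &names,
                                    const Resolver &resolve,
                                    std::map<std::string, RawProc> &table)
{
    std::vector<std::string> missing;
    for (const std::string &name : names)
    {
        RawProc proc = resolve(name);
        if (proc)
            table[name] = proc;
        else
            missing.push_back(name);
    }
    return missing;
}

// Stand-in for the real lookup: it "exports" only two names.
void DummyProc() {}
RawProc FakeResolver(const std::string &name)
{
    return (name == "JetInit" || name == "JetTerm") ? &DummyProc : nullptr;
}
```

Collecting the missing names is a design choice: an older or newer ese.dll may lack some exports, and seeing the whole list makes that easier to diagnose.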

Opening the database

After the functions we need have been loaded successfully, we start reading the database itself. According to MSDN, you must specify the database page size by setting the JET_paramDatabasePageSize parameter (esent.h). This is where the difficulty begins: you cannot easily find out this value from the EDB file alone, yet it has to be specified exactly, otherwise the database will not open. It can be obtained with eseutil (included with Exchange), but I took a slightly different route and found that this value is constant for a given version of Exchange and is always a multiple of 4096. Experimentally, it turned out to be 32768 for Exchange 2010.
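The experimental search can be sketched as a loop over candidate page sizes. Two assumptions are made here: only power-of-two multiples of 4096 up to 32768 are tried, and the attempt to open the database is hidden behind a callback, since with the real API each attempt would need its own set-parameter, init and attach cycle. CandidatePageSizes and ProbePageSize are hypothetical names.

```cpp
#include <functional>
#include <vector>

// Candidate page sizes: 4096, 8192, 16384, 32768.
std::vector<unsigned long> CandidatePageSizes()
{
    std::vector<unsigned long> sizes;
    for (unsigned long size = 4096; size <= 32768; size *= 2)
        sizes.push_back(size);
    return sizes;
}

// Returns the first page size for which tryOpen reports success,
// or 0 if none of the candidates worked.
unsigned long ProbePageSize(const std::function<bool(unsigned long)> &tryOpen)
{
    for (unsigned long size : CandidatePageSizes())
        if (tryOpen(size))
            return size;
    return 0;
}
```

Returning 0 for "not found" keeps the caller's check simple, because 0 is never a valid page size.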

OK, first of all, set the page size:

JET_ERR jRes = _JetSetSystemParameter(NULL, NULL, JET_paramDatabasePageSize, 32768, NULL);

JET_ERR is just a long that contains an error code. You can turn this code into a text description with the JetGetSystemParameter function (à la ::FormatMessage(...)):

JetGetSystemParameter(m_instance, m_sesid, JET_paramErrorToString,
    reinterpret_cast<JET_API_PTR *>(&jeterror), cBuff, MAX_BUFFER_SIZE);

To make handling error codes more convenient, I use the following macro (m_cLog is my internal logging class):

#define WRITE_TO_LOG_AND_RETURN_IF_ERROR(jeterror) \
if (jeterror) { \
    char cBuff[MAX_BUFFER_SIZE] = {0}; \
    if (m_instance) _JetGetSystemParameter(m_instance, m_sesid, \
        JET_paramErrorToString, reinterpret_cast<JET_API_PTR *>(&jeterror), cBuff, MAX_BUFFER_SIZE); \
    m_cLog.write(m_sEDBPath, cBuff, jeterror, __FILE__, __LINE__); \
    return jeterror; }
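The same check-log-return pattern can be tried out without ESE. The sketch below is a portable variation, not the macro above: ErrCode, g_log, ErrorToText, RETURN_IF_ERROR, Step and RunSteps are hypothetical names, and the text lookup is a stub where real code would query JET_paramErrorToString. It treats only negative codes as failures, on the understanding that ESE reports errors as negative values and warnings as positive ones.

```cpp
#include <string>
#include <vector>

typedef long ErrCode;               // stand-in for JET_ERR
std::vector<std::string> g_log;     // stand-in for the logging class

// Stub lookup with invented texts.
std::string ErrorToText(ErrCode err)
{
    return err == -1 ? "generic failure" : "unknown error";
}

// do { ... } while (0) makes the macro behave like a single statement,
// and the expression is evaluated exactly once.
#define RETURN_IF_ERROR(expr)                                          \
    do {                                                               \
        const ErrCode err_ = (expr);                                   \
        if (err_ < 0) {                                                \
            g_log.push_back(std::string(__FILE__) + ":" +              \
                std::to_string(__LINE__) + ": " + ErrorToText(err_));  \
            return err_;                                               \
        }                                                              \
    } while (0)

ErrCode Step(ErrCode result) { return result; }

// Two steps in a row; the second one returns whatever the caller passes in.
ErrCode RunSteps(ErrCode second)
{
    RETURN_IF_ERROR(Step(0));
    RETURN_IF_ERROR(Step(second));
    return 0;
}
```

A positive result such as a warning passes through silently here, which differs from a plain if (jeterror) test.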

Now we need to disable the Exchange-specific callbacks, since we know nothing about them:

jRes = _JetSetSystemParameter(NULL, NULL, JET_paramDisableCallbacks, true, NULL);

Next, we create a new instance (JET_INSTANCE m_instance) for working with the database:

jRes = _JetCreateInstance(&m_instance, NULL);

We initialize the created instance to start working with the database:

jRes = _JetInit(&m_instance);

We start a new session (JET_SESID m_sesid):

jRes = _JetBeginSession(m_instance, &m_sesid, NULL, NULL);

We attach our EDB file:

jRes = _JetAttachDatabase(m_sesid, L"demo.edb", JET_bitDbReadOnly);

And open it:

jRes = _JetOpenDatabase(m_sesid, L"demo.edb", NULL, &m_dbid, JET_bitDbReadOnly);

To sum up, if all the functions returned JET_errSuccess, the database is open and we can start reading its contents.
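The opening sequence has a natural mirror image: whatever was opened should be released in reverse order (JetCloseDatabase, JetDetachDatabase and JetTerm from the function list). A portable sketch of that idea follows. Step and RunAll are hypothetical names, and each step is a pair of callbacks instead of a real ESE call.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Step
{
    std::string name;
    std::function<long()> run;    // returns 0 on success, negative on error
    std::function<void()> undo;   // reverses the step; may be empty
};

// Runs the steps in order. On the first failure, undoes the steps that
// already succeeded, in reverse order, and returns the error code.
long RunAll(const std::vector<Step> &steps, std::vector<std::string> &trace)
{
    for (std::size_t i = 0; i < steps.size(); ++i)
    {
        const long err = steps[i].run();
        if (err < 0)
        {
            for (std::size_t j = i; j-- > 0; )
            {
                if (steps[j].undo)
                {
                    steps[j].undo();
                    trace.push_back("undo " + steps[j].name);
                }
            }
            return err;
        }
        trace.push_back(steps[i].name);
    }
    return 0;
}
```

If attaching fails, for example, the session and the instance are still released, so a failed run does not leave the engine initialized.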

Next comes a fair amount of code. I am including it because on this subject you will hardly find any anywhere else.

Enumerating tables

To enumerate the tables, we write the following function:

JET_ERR CJetDBReaderCore::EnumRootTables(SDBTablesInfo &sDBTablesInfo)
{
    sDBTablesInfo.sDBName = m_sEDBPath;
    JET_ERR jRes = OpenTable(ROOT_TABLE);
    if (jRes == JET_errSuccess)
    {
        JET_COLUMNBASE sNameInfo,
                       sTypeInfo;
        // Both calls return JET_errSuccess (0) when the column is found
        if (!ReadFromTable(ROOT_TABLE, NAME_COLUMN, sNameInfo) &&
            !ReadFromTable(ROOT_TABLE, TYPE_COLUMN, sTypeInfo))
        {
            // One entry per column we want to read from the current row
            JET_RETRIEVECOLUMN sJetRC[2];
            sJetRC[0].columnid = sNameInfo.columnid;
            sJetRC[0].cbData = sNameInfo.cbMax;
            sJetRC[0].itagSequence = 1;
            sJetRC[0].grbit = 0;
            CHAR szName[MAX_BUFFER_SIZE];
            sJetRC[0].pvData = szName;

            sJetRC[1].columnid = sTypeInfo.columnid;
            sJetRC[1].cbData = sTypeInfo.cbMax;
            sJetRC[1].itagSequence = 1;
            sJetRC[1].grbit = 0;
            WORD wType;
            sJetRC[1].pvData = &wType;

            do
            {
                jRes = GetColumns(ROOT_TABLE, sJetRC, 2);
                if (jRes != JET_errSuccess) return jRes;
                if (wType == 1) // keep only the rows that describe tables
                {
                    szName[sJetRC[0].cbActual] = 0;

                    SDBTableInfo sTableInfo;
                    std::string tmp(szName);
                    sTableInfo.sTableName.assign(tmp.begin(), tmp.end());

                    sDBTablesInfo.sTablesInfo.push_back(sTableInfo);
                }

            } while (!TableEnd(ROOT_TABLE));
        }

        jRes = CloseTable(ROOT_TABLE);
    }

    return jRes;
}

Where:

ROOT_TABLE - “MSysObjects”, let's call this table root, because it contains a list of all other tables in the database.
NAME_COLUMN - “Name”, a column containing the names of all tables.
TYPE_COLUMN - “Type”, a column containing the type of the table.
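The filtering step of the function can be illustrated on in-memory data. In this sketch CatalogRow, kTypeTable and TableNames are hypothetical names, and the rows are invented. The widening copy mirrors what the function does with tmp.begin() and tmp.end(), which is only safe for plain ASCII names.

```cpp
#include <string>
#include <vector>

// One row of the root table, reduced to the two columns read here.
struct CatalogRow
{
    std::string name;
    unsigned short type;
};

const unsigned short kTypeTable = 1;   // the value checked in the loop

// Keeps the rows whose type marks a table and widens their names.
std::vector<std::wstring> TableNames(const std::vector<CatalogRow> &rows)
{
    std::vector<std::wstring> names;
    for (const CatalogRow &row : rows)
    {
        if (row.type == kTypeTable)
            names.push_back(std::wstring(row.name.begin(), row.name.end()));
    }
    return names;
}
```

Rows with any other type value are skipped, since the root table describes more than just tables.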

As you can see in the code, we first open the root table; this is done through the JetOpenTable function:

JET_ERR CJetDBReaderCore::OpenTable(std::wstring sTableName)
{
    // m_tables caches the ids of tables that are already open
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter == m_tables.end())
    {
        JET_TABLEID tableid(0);
        JET_ERR jRes = _JetOpenTable(m_sesid, m_dbid, sTableName.c_str(), NULL,
            0, JET_bitTableReadOnly, &tableid);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)

        m_tables[sTableName] = tableid;
    }

    return JET_errSuccess;
}
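The caching idea behind OpenTable (open each table once, remember its id by name) can be sketched separately from ESE. TableCache, Handle and Opener are hypothetical names; the opener callback stands in for the real open call.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

class TableCache
{
public:
    typedef unsigned long long Handle;   // stand-in for JET_TABLEID
    typedef std::function<long(const std::wstring &, Handle &)> Opener;

    explicit TableCache(Opener opener) : opener_(std::move(opener)) {}

    // Opens the table only if it is not in the cache yet.
    // A failed open is not cached, so it can be retried later.
    long Open(const std::wstring &name)
    {
        if (handles_.find(name) != handles_.end())
            return 0;
        Handle handle = 0;
        const long err = opener_(name, handle);
        if (err < 0)
            return err;
        handles_[name] = handle;
        return 0;
    }

    bool IsOpen(const std::wstring &name) const
    {
        return handles_.find(name) != handles_.end();
    }

private:
    Opener opener_;
    std::map<std::wstring, Handle> handles_;
};
```

Calling Open twice with the same name reaches the underlying opener only once.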

Next, inside ReadFromTable, we get information about the column, because we need its id to retrieve the contents:

JET_ERR CJetDBReaderCore::ReadFromTable(
    std::wstring sTableName,
    std::wstring sColumnName,
    JET_COLUMNBASE &sColumnBase)
{
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end())
    {
        JET_ERR jRes = _JetGetColumnInfo(m_sesid, m_dbid, sTableName.c_str(),
            sColumnName.c_str(), &sColumnBase, sizeof(JET_COLUMNBASE), JET_ColInfoBase);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)
    }

    return JET_errSuccess;
}

With the id in hand, we fill in the JET_RETRIEVECOLUMN structure and pass it to JetRetrieveColumns inside GetColumns to get the table name:

JET_ERR CJetDBReaderCore::GetColumns(
    std::wstring sTableName,
    JET_RETRIEVECOLUMN *sJetRC,
    INT nCount)
{
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end())
    {
        JET_ERR jRes = _JetRetrieveColumns(m_sesid, iter->second, sJetRC, nCount);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)
    }

    return JET_errSuccess;
}
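One detail worth isolating is how the retrieved bytes become a string. As I understand it, cbActual reports the size of the column value, which may exceed the size of the buffer that was supplied, so it is safer to clamp it before using it as a length. TextFromColumn is a hypothetical helper, and this is only a sketch of the clamping, not of the ESE call.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Builds a string from a retrieved column value without reading past
// the end of the buffer, even when cbActual is larger than the buffer.
std::string TextFromColumn(const char *buffer, std::size_t bufferSize, std::size_t cbActual)
{
    const std::size_t length = std::min(cbActual, bufferSize);
    return std::string(buffer, length);
}
```

Constructing the string from a pointer and a length also removes the need to write a terminating zero into the buffer.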

That's it, we have the list of tables; let's move on to the columns. For each table we will get the list of its columns and, as we receive it, save this information in our structures.

Enumerating columns

We write the following function:

JET_ERR CJetDBReaderCore::EnumColumns(
    SDBTableInfo &sTableInfo,
    std::list<SColumnInfo> &sColumnsInfo)
{
    if (!OpenTable(sTableInfo.sTableName))
    {
        // Ask ESE for a temporary table that describes every column
        JET_COLUMNLIST sColumnInfo;
        GetTableColumnInfo(sTableInfo.sTableName, &sColumnInfo);
        MoveToFirst(sTableInfo.sTableName);

        char szNameBuff[MAX_BUFFER_SIZE];
        do
        {
            SColumnInfo ci;
            JET_RETRIEVECOLUMN sJetRC[4];

            // Column name
            sJetRC[0].columnid = sColumnInfo.columnidcolumnname;
            sJetRC[0].cbData = sizeof(szNameBuff);
            sJetRC[0].itagSequence = 1;
            sJetRC[0].grbit = 0;
            sJetRC[0].pvData = szNameBuff;

            // Column id
            sJetRC[1].columnid = sColumnInfo.columnidcolumnid;
            sJetRC[1].cbData = sizeof(DWORD);
            sJetRC[1].itagSequence = 1;
            sJetRC[1].grbit = 0;
            sJetRC[1].pvData = &ci.dwId;

            // Column type
            sJetRC[2].columnid = sColumnInfo.columnidcoltyp;
            sJetRC[2].cbData = sizeof(DWORD);
            sJetRC[2].itagSequence = 1;
            sJetRC[2].grbit = 0;
            sJetRC[2].pvData = &ci.dwType;

            // Maximum size of the column value
            sJetRC[3].columnid = sColumnInfo.columnidcbMax;
            sJetRC[3].cbData = sizeof(DWORD);
            sJetRC[3].itagSequence = 1;
            sJetRC[3].grbit = 0;
            sJetRC[3].pvData = &ci.dwMaxSize;

            GetColumns(sTableInfo.sTableName, sJetRC, 4);

            // The name comes back as wide characters; cbActual is in bytes
            ci.sName.assign(reinterpret_cast<wchar_t *>(sJetRC[0].pvData), sJetRC[0].cbActual / 2);

            SDBColumnInfo sDBColumnInfo;
            sDBColumnInfo.sColumnName = ci.sName;

            sColumnsInfo.push_back(ci);
            sTableInfo.sColumnInfo.push_back(sDBColumnInfo);
        }
        while (!TableEnd(sTableInfo.sTableName));

        CloseTable(sTableInfo.sTableName);
    }

    return JET_errSuccess;
}
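The division by two in the name assignment converts a byte count into a count of 2-byte characters. The sketch below shows that conversion on its own. It uses char16_t instead of wchar_t so that it behaves the same on platforms where wchar_t is 4 bytes wide; NameFromBytes is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Turns a raw buffer of UTF-16 code units into a string.
// cbActual is a size in bytes, so it is divided by the size of one unit;
// a trailing odd byte is ignored.
std::u16string NameFromBytes(const void *data, std::size_t cbActual)
{
    const std::size_t count = cbActual / sizeof(char16_t);
    std::u16string name(count, u'\0');
    // memcpy avoids alignment assumptions about the source buffer
    std::memcpy(&name[0], data, count * sizeof(char16_t));
    return name;
}
```

For a 4-character name the engine reports 8 bytes, and the function returns 4 characters.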

Here we open a table again, this time not the root one but one of those we found in the previous step.

Next, we need information about all the columns. To get it, we position on the first record and move through to the last one, one at a time:

JET_ERR CJetDBReaderCore::MoveToFirst(std::wstring sTableName)
{
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end()) // if already open
    {
        JET_ERR jRes = _JetMove(m_sesid, iter->second, JET_MoveFirst, 0);
        BOOL bIsEmpty = (jRes == JET_errNoCurrentRecord);
        if (bIsEmpty) return jRes; // Ignore if empty
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes);
    }

    return NO_ERROR;
}

JET_ERR CJetDBReaderCore::GetTableColumnInfo(
    std::wstring sTableName,
    JET_COLUMNLIST *pCl,
    BOOL bReplaceOld)
{
    JET_ERR jRes = JET_errSuccess;
    std::map<std::wstring, JET_TABLEID>::iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end())
    {
        jRes = _JetGetTableColumnInfo(m_sesid, iter->second, NULL, pCl,
            sizeof(JET_COLUMNLIST), JET_ColInfoList);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)

        if (bReplaceOld) // the previously opened table is no longer needed
        {
            jRes = CloseTable(sTableName);
            m_tables[sTableName] = pCl->tableid;
        }
        else
        {
            jRes = _JetCloseTable(m_sesid, pCl->tableid);
            WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)
        }
    }

    return jRes;
}
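The "move to the first record, then step until the end" pattern is easy to get wrong for empty tables, so here is a portable sketch of it. FakeCursor, ReadAll and the two constants are hypothetical stand-ins: the cursor plays the role of a table id used with JetMove, and kNoCurrentRecord stands for the error reported when there is no record to stand on.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

const long kSuccess = 0;
const long kNoCurrentRecord = -1;   // stand-in value, not the real ESE code

class FakeCursor
{
public:
    explicit FakeCursor(std::vector<int> rows) : rows_(std::move(rows)), pos_(0) {}

    long MoveFirst()
    {
        pos_ = 0;
        return rows_.empty() ? kNoCurrentRecord : kSuccess;
    }

    long MoveNext()
    {
        if (pos_ + 1 >= rows_.size())
            return kNoCurrentRecord;
        ++pos_;
        return kSuccess;
    }

    int Current() const { return rows_[pos_]; }

private:
    std::vector<int> rows_;
    std::size_t pos_;
};

// Reads every row; an empty table simply yields an empty result because
// the first move already reports that there is no current record.
std::vector<int> ReadAll(FakeCursor &cursor)
{
    std::vector<int> out;
    for (long err = cursor.MoveFirst(); err == kSuccess; err = cursor.MoveNext())
        out.push_back(cursor.Current());
    return out;
}
```

Unlike a do-while loop, this form never touches the current record before the first move has succeeded.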

Sorry, the first version of the post was cut off and I did not manage to react in time. Apparently there is some limit on the size of an article, so it will be split into two parts.
To be continued...

In the next part, we will read the database, draw conclusions, and look at an example of the data that can be "torn out" of it.

PS This code is an adapted and shortened version made for the post. Therefore it has some flaws, or rather stubs in places where functionality was cut. Please do not pay attention to them: this is not production code, I only wanted to show working examples. The code is fully working and written so that it can be published on the Internet without "eating" all the space on the page. Thank you for understanding.

PPS I understand that, given how specific it is, this information is unlikely to be useful to a wide audience, but if it helps even one person, I will be glad and the time spent on this post will have paid off.