IDA Pro Upgrade. Fixing bugs in processor modules

Published on September 24, 2018




Hello everyone,


Quite a long time has passed since my first article, but I have finally decided to keep writing, if only a little, about modifying and improving IDA Pro.


In this article we will talk about how to fix bugs in processor modules whose source code you do not have, when those bugs simply make life miserable. Unfortunately, not all of the problems listed below can really be classed as bugs, so the developers are unlikely to address them.


Locating the bugs


Note: from here on, we will look at bugs in the Motorola M68000 module (my favorite, and one I use very often).


So, bug #1: PC-relative addressing. The problem is that the disassembly listing for such instructions is not always correct. Take a look at the screenshot:

At first glance there is no error here, and it does not even interfere with the analysis. However, the opcode is disassembled incorrectly. Let's look at the same bytes in an online disassembler:

We can see that the operand should use PC-relative addressing, since the distance to the referenced address fits into a signed 16-bit (signed short) displacement.
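To make the range check concrete, here is a small standalone sketch, independent of the IDA SDK, of the test "can this target be encoded as a (d16,PC) displacement". The helper name fits_pc_relative is hypothetical. It assumes the 68000 rule that the PC used by (d16,PC) is the address of the extension word (instruction address + 2), whereas the plugin code for problem #1 measures the distance from the instruction start; the two only disagree at the very edges of the range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the IDA SDK): can `target` be reached from
// the instruction at `insn_ea` using the 68000 (d16,PC) addressing mode?
// Assumption: the PC value used by (d16,PC) is the address of the extension
// word, i.e. the instruction address + 2.
bool fits_pc_relative(uint32_t insn_ea, uint32_t target, int32_t *disp_out)
{
    // Compute in 64 bits so the subtraction can neither wrap nor be truncated
    // before the range check (a 16-bit variable would always "fit").
    int64_t disp = static_cast<int64_t>(target) - (static_cast<int64_t>(insn_ea) + 2);
    if (disp < INT16_MIN || disp > INT16_MAX)
        return false;
    if (disp_out)
        *disp_out = static_cast<int32_t>(disp);
    return true;
}
```

Note that the displacement is held in a type wider than 16 bits on purpose: storing it in a `short` first would make the range check meaningless.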


Bug #2: "mirrors" of RAM and some other regions. Because addressing on the m68k is 24-bit, all accesses to the upper (or, conversely, lower) mirror regions should be redirected to the same canonical range, and so should the cross-references.


Bug #3 (or rather not a bug, but missing functionality): the so-called Line A (1010) and Line F (1111) emulators. These are opcodes that did not fit into the main instruction set, so they are handled in a special way through interrupt vectors. The size of such an opcode depends solely on how the handler is implemented; I have only seen two-byte implementations. We will add support for them.


Bug #4: trap #N instructions do not create crefs to the trap handlers themselves.


Bug #5: movea.w instructions should produce a full xref to the address given by the word operand, but all we get is a plain word-sized number.


Fixing bugs (the template)


To fix a specific processor module, we first need to understand what options we have in principle and what a "fix" actually is.


In fact, the "patch" is an ordinary plugin. It could probably be written in Python, but I did everything in C++. Only portability suffers; if someone takes on rewriting the plugin in Python, I will be very grateful.


To begin, create an empty DLL project in Visual Studio: File -> New -> Project -> Windows Desktop Wizard -> Dynamic link library (.dll), tick the Empty Project checkbox and clear all the others:


Unpack the IDA SDK and register its path as a Visual Studio macro (I am using Visual Studio 2017) so that it is easy to refer to later. At the same time, we will add a macro for the path to IDA Pro.


Go to View -> Other Windows -> Property Manager:


Because we are working with SDK version 7.0, compilation is done with the x64 compiler. Therefore, select Debug | x64 -> Microsoft.Cpp.x64.user -> Properties:


Click the Add Macro button in the User Macros section and create an IDA_SDK macro containing the path where you unpacked the SDK:


Do the same for IDA_DIR (the path to your IDA Pro):

Note that IDA is installed into %ProgramFiles% by default, and writing there requires administrator rights.


Let's also remove the Win32 configuration (this article does not cover building for x86), leaving only the x64 one.


Create an empty ida_plugin.cpp file. We will not add any code to it yet.
Now we can set the encoding and other C++ settings:




Let's set up the include paths:


And the libraries from the SDK:



Now add the code template:


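ida_plugin.cpp code
#include <ida.hpp>
#include <idp.hpp>
#include <ua.hpp>
#include <bytes.hpp>
#include <loader.hpp>
#include <offset.hpp>
#include <kernwin.hpp> // msg(), info()
#include <funcs.hpp>   // get_func_name(), used in the fix for problem #4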
#define NAME "M68000 proc-fixer plugin"
#define VERSION "1.0"
static bool plugin_inited;
static bool my_dbg;
//--------------------------------------------------------------------------
static void print_version()
{
    static const char format[] = NAME " v%s\n";
    info(format, VERSION);
    msg(format, VERSION);
}
//--------------------------------------------------------------------------
static bool init_plugin(void)
{
    if (ph.id != PLFM_68K)
        return false;
    return true;
}
#ifdef _DEBUG
static const char* const optype_names[] =
{
    "o_void",
    "o_reg",
    "o_mem",
    "o_phrase",
    "o_displ",
    "o_imm",
    "o_far",
    "o_near",
    "o_idpspec0",
    "o_idpspec1",
    "o_idpspec2",
    "o_idpspec3",
    "o_idpspec4",
    "o_idpspec5",
};
static const char* const dtyp_names[] =
{
    "dt_byte",
    "dt_word",
    "dt_dword",
    "dt_float",
    "dt_double",
    "dt_tbyte",
    "dt_packreal",
    "dt_qword",
    "dt_byte16",
    "dt_code",
    "dt_void",
    "dt_fword",
    "dt_bitfild",
    "dt_string",
    "dt_unicode",
    "dt_3byte",
    "dt_ldbl",
    "dt_byte32",
    "dt_byte64",
};
static void print_insn(const insn_t *insn)
{
    if (my_dbg)
    {
        msg("cs=%x, ", insn->cs);
        msg("ip=%x, ", insn->ip);
        msg("ea=%x, ", insn->ea);
        msg("itype=%x, ", insn->itype);
        msg("size=%x, ", insn->size);
        msg("auxpref=%x, ", insn->auxpref);
        msg("segpref=%x, ", insn->segpref);
        msg("insnpref=%x, ", insn->insnpref);
        msg("flags[");
        if (insn->flags & INSN_MACRO)
            msg("INSN_MACRO|");
        if (insn->flags & INSN_MODMAC)
            msg("INSN_MODMAC");
        msg("]\n");
    }
}
static void print_op(ea_t ea, const op_t *op)
{
    if (my_dbg)
    {
        msg("type[%s], ", optype_names[op->type]);
        msg("flags[");
        if (op->flags & OF_NO_BASE_DISP)
            msg("OF_NO_BASE_DISP|");
        if (op->flags & OF_OUTER_DISP)
            msg("OF_OUTER_DISP|");
        if (op->flags & PACK_FORM_DEF)
            msg("PACK_FORM_DEF|");
        if (op->flags & OF_NUMBER)
            msg("OF_NUMBER|");
        if (op->flags & OF_SHOW)
            msg("OF_SHOW");
        msg("], ");
        msg("dtyp[%s], ", dtyp_names[op->dtype]);
        if (op->type == o_reg)
            msg("reg=%x, ", op->reg);
        else if (op->type == o_displ || op->type == o_phrase)
            msg("phrase=%x, ", op->phrase);
        else
            msg("reg_phrase=%x, ", op->phrase);
        msg("addr=%x, ", op->addr);
        msg("value=%x, ", op->value);
        msg("specval=%x, ", op->specval);
        msg("specflag1=%x, ", op->specflag1);
        msg("specflag2=%x, ", op->specflag2);
        msg("specflag3=%x, ", op->specflag3);
        msg("specflag4=%x, ", op->specflag4);
        msg("refinfo[");
        opinfo_t buf;
        if (get_opinfo(&buf, ea, op->n, get_flags(ea))) // expects the item's flags, not op->flags
        {
            msg("target=%x, ", buf.ri.target);
            msg("base=%x, ", buf.ri.base);
            msg("tdelta=%x, ", buf.ri.tdelta);
            msg("flags[");
            if (buf.ri.flags & REFINFO_TYPE)
                msg("REFINFO_TYPE|");
            if (buf.ri.flags & REFINFO_RVAOFF)
                msg("REFINFO_RVAOFF|");
            if (buf.ri.flags & REFINFO_PASTEND)
                msg("REFINFO_PASTEND|");
            if (buf.ri.flags & REFINFO_CUSTOM)
                msg("REFINFO_CUSTOM|");
            if (buf.ri.flags & REFINFO_NOBASE)
                msg("REFINFO_NOBASE|");
            if (buf.ri.flags & REFINFO_SUBTRACT)
                msg("REFINFO_SUBTRACT|");
            if (buf.ri.flags & REFINFO_SIGNEDOP)
                msg("REFINFO_SIGNEDOP");
            msg("]");
        }
        msg("]\n");
    }
}
#endif
static bool ana_addr = 0;
static ssize_t idaapi hook_idp(void *user_data, int notification_code, va_list va)
{
    switch (notification_code)
    {
    case processor_t::ev_ana_insn:
    {
        insn_t *out = va_arg(va, insn_t*);
        if (ana_addr)
            break;
        ana_addr = 1;
        if (ph.ana_insn(out) <= 0)
        {
            ana_addr = 0;
            break;
        }
        ana_addr = 0;
#ifdef _DEBUG
        print_insn(out);
#endif
        for (int i = 0; i < UA_MAXOP; ++i)
        {
            op_t &op = out->ops[i];
#ifdef _DEBUG
            print_op(out->ea, &op);
#endif
        }
        return out->size;
    } break;
    case processor_t::ev_emu_insn:
    {
        const insn_t *insn = va_arg(va, const insn_t*);
    } break;
    case processor_t::ev_out_mnem:
    {
        outctx_t *outbuffer = va_arg(va, outctx_t *);
        //outbuffer->out_custom_mnem(mnem);
        //return 1;
    } break;
    default:
    {
#ifdef _DEBUG
        if (my_dbg)
        {
            msg("msg = %d\n", notification_code);
        }
#endif
    } break;
    }
    return 0;
}
//--------------------------------------------------------------------------
static int idaapi init(void)
{
    if (init_plugin())
    {
        plugin_inited = true;
        my_dbg = false;
        hook_to_notification_point(HT_IDP, hook_idp, NULL);
        print_version();
        return PLUGIN_KEEP;
    }
    return PLUGIN_SKIP;
}
//--------------------------------------------------------------------------
static void idaapi term(void)
{
    if (plugin_inited)
    {
        unhook_from_notification_point(HT_IDP, hook_idp);
        plugin_inited = false;
    }
}
//--------------------------------------------------------------------------
static bool idaapi run(size_t /*arg*/)
{
    return false;
}
//--------------------------------------------------------------------------
const char comment[] = NAME;
const char help[] = NAME;
//--------------------------------------------------------------------------
//
//      PLUGIN DESCRIPTION BLOCK
//
//--------------------------------------------------------------------------
plugin_t PLUGIN =
{
    IDP_INTERFACE_VERSION,
    PLUGIN_PROC | PLUGIN_MOD, // plugin flags
    init, // initialize
    term, // terminate. this pointer may be NULL.
    run, // invoke plugin
    comment, // long comment about the plugin
             // it could appear in the status line
             // or as a hint
    help, // multiline help about the plugin
    NAME, // the preferred short name of the plugin
    "" // the preferred hotkey to run the plugin
};

Fixing bugs (understanding the template)


The print_op() and print_insn() functions show which flags the current processor module sets for particular instructions. We need this to find the flags used for existing opcodes so that we can reuse them in our fixes.


The heart of our "patch" is the hook_idp() function. For our needs we have to handle three events in it:


  1. processor_t::ev_ana_insn: needed if the processor module lacks an implementation of some opcodes
  2. processor_t::ev_emu_insn: here you can create cross-references to the data/code that new opcodes refer to (or that old opcodes fail to refer to)
  3. processor_t::ev_out_mnem: new opcodes have to be displayed somehow; this is where it happens

The init_plugin() function prevents our patch from loading with other processor modules.
And, most importantly, we attach our callback to the processor module's events:


hook_to_notification_point(HT_IDP, hook_idp, NULL);

The global variable ana_addr is a trick that keeps ana_insn from recursing when we ask the module to decode the instructions we do not parse manually. Yes, alas, this "crutch" has been around for a very long time, since the old versions.
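The recursion is easiest to see in a toy model. The sketch below is hypothetical and greatly simplified (it is not how the IDA kernel is implemented, and none of these names come from the SDK): notify() offers an event to every hook first and falls back to the module's own decoder only if no hook returns a non-zero value. Our hook calls notify() again to get the module's decoding, so without the g_in_hook flag, the counterpart of ana_addr, it would call itself forever.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Simplified, hypothetical model of a hook chain (NOT the IDA SDK).
using Hook = std::function<int(int ea)>;

static std::vector<Hook> g_hooks;
static int g_module_calls = 0;

// Stands in for the processor module's own decoder.
static int module_decode(int /*ea*/)
{
    ++g_module_calls;
    return 2;  // pretend every instruction is 2 bytes long
}

// Offer the event to every hook; a non-zero result means "handled".
static int notify(int ea)
{
    for (const Hook &h : g_hooks)
        if (int r = h(ea))
            return r;
    return module_decode(ea);
}

// Re-entrancy guard, playing the role of ana_addr in the plugin.
static bool g_in_hook = false;

static int fixer_hook(int ea)
{
    if (g_in_hook)
        return 0;            // nested call: let the module decode normally
    g_in_hook = true;
    int size = notify(ea);   // re-enters fixer_hook, which now returns 0
    g_in_hook = false;
    // ... a real plugin would patch the decoded operands here ...
    return size;
}
```

In a real plugin, a small RAII object that resets the flag in its destructor would be slightly safer than resetting it by hand on every return path.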

Fix for problem #1


To solve this properly, I had to tinker a lot with the debug output that I implemented specifically for this task. I knew that in some cases IDA does display PC-relative references correctly (in jump instructions that use an offset table located close to the current instruction, plus an index register), but correct display of this addressing mode is not implemented for the lea instruction. Eventually I found such a jump instruction and worked out which flags have to be set for the PC in parentheses to be displayed:



Code for problem #1
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;
    ana_addr = 1;
    if (ph.ana_insn(out) <= 0)
    {
        ana_addr = 0;
        break;
    }
    ana_addr = 0;
    for (int i = 0; i < UA_MAXOP; ++i)
    {
        op_t &op = out->ops[i];
        switch (op.type)
        {
        case o_near:
        case o_mem:
        {
            if (out->itype != 0x76 || op.n != 0 ||
                (op.phrase != 0x09 && op.phrase != 0x0A) ||
                (op.addr == 0 || op.addr >= (1 << 23)) ||
                op.specflag1 != 2) // lea table(pc),Ax
                break;
            // use a wide signed type: a 'short' would always pass the range check below
            sval_t diff = (sval_t)(op.addr - out->ea);
            if (diff >= SHRT_MIN && diff <= SHRT_MAX)
            {
                out->Op1.type = o_displ;
                out->Op1.offb = 2;
                out->Op1.dtype = dt_dword;
                out->Op1.phrase = 0x5B;
                out->Op1.specflag1 = 0x10;
            }
        } break;
        }
    }
    return out->size;
} break;

Fix for problem #2


Everything is simple here: just mask the addresses into specific ranges, 0xFF0000-0xFFFFFF (for RAM) and 0xC00000-0xC000FF (for VDP video memory). The main thing is to filter by the operand types o_near and o_mem.


Code for problem #2
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;
    ana_addr = 1;
    if (ph.ana_insn(out) <= 0)
    {
        ana_addr = 0;
        break;
    }
    ana_addr = 0;
    for (int i = 0; i < UA_MAXOP; ++i)
    {
        op_t &op = out->ops[i];
        switch (op.type)
        {
        case o_near:
        case o_mem:
        {
            op.addr &= 0xFFFFFF; // for any mirrors
            if ((op.addr & 0xE00000) == 0xE00000) // RAM mirrors
                op.addr |= 0x1F0000;
            if (op.addr >= 0xC00000 && op.addr <= 0xC0003F) // VDP ports and their mirror
                op.addr &= 0xC0001F; // fold 0xC00020-0xC0003F onto 0xC00000-0xC0001F
        } break;
        }
    }
    return out->size;
} break;
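The RAM part of this masking can be checked outside IDA. The following standalone sketch uses a hypothetical helper, canonical_address (not an SDK function), that applies the same two steps as the code above: drop the bits above the 24-bit bus, then fold every RAM mirror onto 0xFF0000-0xFFFFFF. It assumes the memory map the plugin targets (64 KB of RAM at 0xFF0000, repeated throughout 0xE00000-0xFFFFFF, as on the Sega Mega Drive/Genesis); the VDP range is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the plugin's RAM logic.
// Assumption: 64 KB of work RAM at 0xFF0000, mirrored across 0xE00000-0xFFFFFF.
uint32_t canonical_address(uint32_t addr)
{
    addr &= 0xFFFFFF;                   // the 68000 only drives 24 address lines
    if ((addr & 0xE00000) == 0xE00000)  // any address in a RAM mirror...
        addr |= 0x1F0000;               // ...is folded onto 0xFF0000-0xFFFFFF
    return addr;
}
```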

Fix for problem #3


To add the desired opcode, you need to:


  1. Define indices for the new opcodes. All new indices must start at CUSTOM_INSN_ITYPE:
    enum m68k_insn_type_t
    {
    M68K_linea = CUSTOM_INSN_ITYPE,
    M68K_linef,
    };
  2. The lineA / lineF opcodes are recognized when the code contains the bytes 0xA0 / 0xF0, so we read one byte.
  3. Get the address of the vector handler. In my case, the first 64 dwords of the header are the interrupt vectors; positions 0x0A and 0x0B hold the lineA / lineF handlers:
    value = get_dword(0x0A * sizeof(uint32));
    // ...
    value = get_dword(0x0B * sizeof(uint32));
  4. In ev_emu_insn, add crefs to the handler and to the next instruction, so that the code flow is not interrupted:
        insn->add_cref(insn->Op1.addr, 0, fl_CN); // code ref
        insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F); // flow ref
  5. In ev_out_mnem, output our custom mnemonic:
    const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a";
    outbuffer->out_custom_mnem(mnem);
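The decoding side of steps 2 and 3 can be sketched without the SDK. This standalone sketch uses hypothetical names (classify, handler_slot, read_be32). Unlike the plugin, which only recognizes the first bytes 0xA0 and 0xF0 (the only form I encountered), it matches the whole top nibble, as a plain 68000 does: any opcode word 1010xxxx xxxxxxxx raises vector 10 and 1111xxxx xxxxxxxx raises vector 11. Vector n lives at address n * 4, which is where 0x0A * sizeof(uint32) and 0x0B * sizeof(uint32) come from. The vector table is big-endian, so a standalone reader assembles the bytes explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone sketch, not the plugin's code.
enum class LineTrap { None, LineA, LineF };

// Classify a 16-bit opcode word by its top four bits.
LineTrap classify(uint16_t opcode)
{
    switch (opcode >> 12)
    {
    case 0xA: return LineTrap::LineA;   // 1010: Line A emulator, vector 10
    case 0xF: return LineTrap::LineF;   // 1111: Line F emulator, vector 11
    default:  return LineTrap::None;
    }
}

// Address of the vector-table slot holding the handler pointer
// (0 means "not a line trap").
uint32_t handler_slot(LineTrap t)
{
    switch (t)
    {
    case LineTrap::LineA: return 10 * 4;  // 0x28 == 0x0A * sizeof(uint32)
    case LineTrap::LineF: return 11 * 4;  // 0x2C == 0x0B * sizeof(uint32)
    default:              return 0;
    }
}

// Read a big-endian 32-bit pointer from a ROM image.
uint32_t read_be32(const uint8_t *rom, uint32_t off)
{
    return (uint32_t(rom[off]) << 24) | (uint32_t(rom[off + 1]) << 16) |
           (uint32_t(rom[off + 2]) << 8) | uint32_t(rom[off + 3]);
}
```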


Code for problem #3
/* place after the includes */
enum m68k_insn_type_t
{
    M68K_linea = CUSTOM_INSN_ITYPE,
    M68K_linef,
};
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;
    uint16 itype = 0;
    ea_t value = out->ea;
    uchar b = get_byte(out->ea);
    if (b == 0xA0 || b == 0xF0)
    {
        switch (b)
        {
        case 0xA0:
            itype = M68K_linea;
            value = get_dword(0x0A * sizeof(uint32));
            break;
        case 0xF0:
            itype = M68K_linef;
            value = get_dword(0x0B * sizeof(uint32));
            break;
        }
        out->itype = itype;
        out->size = 2;
        out->Op1.type = o_near;
        out->Op1.offb = 1;
        out->Op1.dtype = dt_dword;
        out->Op1.addr = value;
        out->Op1.phrase = 0x0A;
        out->Op1.specflag1 = 2;
        out->Op2.type = o_imm;
        out->Op2.offb = 1;
        out->Op2.dtype = dt_byte;
        out->Op2.value = get_byte(out->ea + 1);
    }
    return out->size;
} break;
case processor_t::ev_emu_insn:
{
    const insn_t *insn = va_arg(va, const insn_t*);
    if (insn->itype == M68K_linea || insn->itype == M68K_linef)
    {
        insn->add_cref(insn->Op1.addr, 0, fl_CN);
        insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F);
        return 1;
    }
} break;
case processor_t::ev_out_mnem:
{
    outctx_t *outbuffer = va_arg(va, outctx_t *);
    if (outbuffer->insn.itype != M68K_linea && outbuffer->insn.itype != M68K_linef)
        break;
    const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a";
    outbuffer->out_custom_mnem(mnem);
    return 1;
} break;

Fix for problem #4


The fix works like this: we find the trap instruction by its opcode, take the vector index from its operand, and fetch the handler address from that vector. We also put the handler's name into a comment. The result looks something like this:



Code for problem #4
case processor_t::ev_emu_insn:
{
    const insn_t *insn = va_arg(va, const insn_t*);
    if (insn->itype == 0xB6) // trap #X
    {
        qstring name;
        ea_t trap_addr = get_dword((0x20 + (insn->Op1.value & 0xF)) * sizeof(uint32));
        get_func_name(&name, trap_addr);
        set_cmt(insn->ea, name.c_str(), false);
        insn->add_cref(trap_addr, insn->Op1.offb, fl_CN);
        return 1;
    }
} break;
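The vector arithmetic used above can be checked in isolation. In this standalone sketch (hypothetical helper names, no SDK), TRAP #n uses exception vectors 32 to 47, so its handler pointer sits at (32 + n) * 4, which is exactly what (0x20 + (value & 0xF)) * sizeof(uint32) computes. The TRAP opcode itself is 0x4E40 | n, so n can also be taken straight from the opcode word.

```cpp
#include <cassert>
#include <cstdint>

// Address of the vector-table slot for TRAP #n (vectors 32..47).
uint32_t trap_vector_slot(uint32_t n)
{
    return (0x20 + (n & 0xF)) * 4;
}

// TRAP is encoded as 0100 1110 0100 vvvv; extract vvvv if the word matches.
bool decode_trap(uint16_t opcode, uint32_t *n)
{
    if ((opcode & 0xFFF0) != 0x4E40)
        return false;
    *n = opcode & 0xF;
    return true;
}
```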

Fix for problem #5


Here, too, everything is simple: first we filter for the movea.w instruction. Then, if the operand is word-sized and refers to RAM, we turn it into a proper offset relative to the base 0xFF0000. It will look like this:



Code for problem #5
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;
    ana_addr = 1;
    if (ph.ana_insn(out) <= 0)
    {
        ana_addr = 0;
        break;
    }
    ana_addr = 0;
    for (int i = 0; i < UA_MAXOP; ++i)
    {
        op_t &op = out->ops[i];
        switch (op.type)
        {
        case o_imm:
        {
            if (out->itype != 0x7F || op.n != 0) // movea
                break;
            if (op.value & 0xFF0000 && op.dtype == dt_word) {
                op.value &= 0xFFFF;
            }
        } break;
        }
    }
    return out->size;
} break;
case processor_t::ev_emu_insn:
{
    const insn_t *insn = va_arg(va, const insn_t*);
    for (int i = 0; i < UA_MAXOP; ++i)
    {
        const op_t &op = insn->ops[i];
        switch (op.type)
        {
        case o_imm:
        {
            if (insn->itype != 0x7F || op.n != 0 || op.dtype != dt_word) // movea
                break;
            op_offset(insn->ea, op.n, REF_OFF32, BADADDR, 0xFF0000);
        } break;
        }
    }
} break;
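Why a 16-bit operand can point into RAM at all: movea.w sign-extends its word operand to 32 bits, and only 24 of those bits reach the address bus. The standalone sketch below (movea_w_target is a hypothetical helper, not SDK code) computes the address the CPU actually ends up with. Words from 0x8000 to 0xFFFF land in 0xFF8000-0xFFFFFF, while smaller words land near address 0, so resolving every word operand against base 0xFF0000, as the code above does, matches the hardware only for operands that really point into RAM.

```cpp
#include <cassert>
#include <cstdint>

// Address produced by "movea.w #imm,An" on a 24-bit 68000 bus.
uint32_t movea_w_target(uint16_t imm)
{
    // Sign extension, as the CPU does (two's complement conversion).
    int32_t extended = static_cast<int16_t>(imm);
    // Keep only the 24 bits that are put on the address bus.
    return static_cast<uint32_t>(extended) & 0xFFFFFF;
}
```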

Conclusions


In fact, fixing existing modules is not a simple task once it goes beyond implementing unknown opcodes and involves something more complicated.
It takes hours of debugging the existing implementation and understanding what happens inside it (sometimes even reverse engineering the processor module itself). But the result is worth it.


Link to the source: https://github.com/lab313ru/m68k_fixer