IDA Pro Upgrade: Fixing Flaws in Processor Modules

Hello everyone,

It has been quite a long time since I wrote the first article, but I have still decided to continue, bit by bit, writing articles on modifying and improving IDA Pro.

In this article we will talk about how to fix flaws in processor modules whose source code you do not have, when those flaws make your life miserable. Unfortunately, not all of the problems listed below can be classified as bugs, so the developers are unlikely to address them.

Locating the bugs

Note: from here on, we will look at errors in the Motorola M68000 module (my favorite, and one I use very often).

So, flaw number one: addressing relative to the PC register. The problem is that the disassembly listing for such instructions is not always correct. Take a look at the screenshot:

It looks as if there is no error here, and it does not even get in the way of analysis. But the opcode is disassembled incorrectly. Let's look at the disassembly in an online disassembler:

We see that the addressing should be relative to the PC register, since the target address of the reference falls within the signed short range (a 16-bit displacement from PC).
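The condition can be checked with a few lines of plain C++. This is a standalone sketch, not SDK code: fits_d16_pc is a hypothetical helper, and it assumes the 68000 rule that the base for d16(PC) is the address of the extension word, i.e. the opcode address + 2. Note that the difference has to be computed in a type wider than 16 bits; truncating it to a short first makes any range check pass.

```cpp
#include <cstdint>

// Hypothetical helper: can `target` be encoded as d16(PC) in an
// instruction located at `insn_ea`? On the 68000 the PC value used for
// d16(PC) is the address of the extension word (opcode address + 2),
// and the displacement is a signed 16-bit value.
bool fits_d16_pc(uint32_t insn_ea, uint32_t target)
{
    const int64_t pc = int64_t(insn_ea) + 2;
    const int64_t diff = int64_t(target) - pc;   // wide signed type: no truncation
    return diff >= INT16_MIN && diff <= INT16_MAX;
}
```

The plugin's own check for this problem measures from the opcode address rather than from PC, which only makes a difference for targets within two bytes of the range boundary.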

Flaw number two: "mirrors" of RAM and of some other regions. Because addressing on the m68k is 24-bit, all accesses to the upper (or, conversely, lower) mirror regions should be redirected to one canonical range, and the cross-references should point there.

Flaw number three (or rather, not a flaw at all but missing functionality): the so-called lineA (1010) and lineF (1111) emulators. These are opcodes that did not fit into the main instruction set, so they have to be handled specially via interrupt vectors. The size of such opcodes depends only on how the handler is implemented; I have only seen a two-byte implementation. We will add support for them.
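The hardware rule behind these opcodes fits in a few lines. The sketch below is independent of the IDA SDK and its names (LineKind, classify_line, line_vector_offset) are hypothetical. It matches the whole 0xAxxx/0xFxxx ranges, while the plugin code later in this article only matches words whose first byte is exactly 0xA0 or 0xF0.

```cpp
#include <cstdint>
#include <optional>

enum class LineKind { None, LineA, LineF };   // hypothetical names

// On the 68000, any opcode word whose top four bits are 1010 is a
// "line A" opcode and 1111 a "line F" opcode; the CPU does not execute
// them but takes an exception instead.
LineKind classify_line(uint16_t opcode)
{
    switch (opcode >> 12)
    {
    case 0xA: return LineKind::LineA;
    case 0xF: return LineKind::LineF;
    default:  return LineKind::None;
    }
}

// Line A uses exception vector 10 and line F vector 11; with 4 bytes per
// vector, their entries sit at 0x28 and 0x2C from the start of the table.
std::optional<uint32_t> line_vector_offset(LineKind kind)
{
    switch (kind)
    {
    case LineKind::LineA: return 10u * 4u;
    case LineKind::LineF: return 11u * 4u;
    default:              return std::nullopt;
    }
}
```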

Flaw number four: trap #N instructions do not produce crefs to the trap handlers themselves.

Flaw number five: movea.w instructions should create a full xref to the address the word refers to, but we only get a plain word-sized number.

Fixing the bugs (the template)

To understand how to fix a specific processor module, you first need to know what options we have here in principle and what a "fix" actually is.

In fact, the "fix" is an ordinary plugin. It could probably be written in Python, but I did everything in C++. Only portability suffers; if someone takes on rewriting the plugin in Python, I will be very grateful.

To begin, create an empty DLL project in Visual Studio: File -> New -> Project -> Windows Desktop Wizard -> Dynamic link library (.dll), checking the Empty Project checkbox and unchecking all the others:

Unpack the IDA SDK and register its path as a Visual Studio macro (I use Visual Studio 2017) so that it is easy to refer to later. At the same time, we will add a macro for the path to IDA Pro.

Go to View -> Other Windows -> Property Manager:

Because we are working with SDK version 7.0, compilation is done with the x64 compiler. So choose Debug | x64 -> Microsoft.Cpp.x64.user -> Properties:

In the User Macros section, click the Add Macro button and create the IDA_SDK macro containing the path where you unpacked the SDK:

Do the same for IDA_DIR (the path to your IDA Pro installation):

Note that by default IDA is installed in %Program Files%, which requires administrator rights.

Let's also remove the Win32 configuration (this article does not cover compiling for x86), leaving only the x64 one.

Create an empty ida_plugin.cpp file, without adding any code yet.
Now the encoding and other C++ settings can be configured:

Set the include directories:

And the libraries from the SDK:

Now add the code template:

ida_plugin.cpp code
#include <ida.hpp>
#include <idp.hpp>
#include <ua.hpp>
#include <bytes.hpp>
#include <loader.hpp>
#include <offset.hpp>

#define NAME "M68000 proc-fixer plugin"
#define VERSION "1.0"

static bool plugin_inited;
static bool my_dbg;

//--------------------------------------------------------------------------
static void print_version()
{
    static const char format[] = NAME " v%s\n";
    info(format, VERSION);
    msg(format, VERSION);
}

// The fixer only makes sense for the M68K processor module
static bool init_plugin(void)
{
    return ph.id == PLFM_68K;
}

#ifdef _DEBUG
// Names indexed by optype_t and op_dtype_t values (common values only)
static const char* const optype_names[] = {
    "o_void", "o_reg", "o_mem", "o_phrase", "o_displ", "o_imm", "o_far",
    "o_near", "o_idpspec0", "o_idpspec1", "o_idpspec2", "o_idpspec3",
    "o_idpspec4", "o_idpspec5",
};
static const char* const dtyp_names[] = {
    "dt_byte", "dt_word", "dt_dword", "dt_float", "dt_double", "dt_tbyte",
    "dt_packreal", "dt_qword", "dt_byte16", "dt_code", "dt_void",
    "dt_fword", "dt_bitfild", "dt_string", "dt_unicode",
};

static void print_insn(const insn_t *insn)
{
    if (my_dbg)
    {
        msg("cs=%x, ", insn->cs);
        msg("ip=%x, ", insn->ip);
        msg("ea=%x, ", insn->ea);
        msg("itype=%x, ", insn->itype);
        msg("size=%x, ", insn->size);
        msg("auxpref=%x, ", insn->auxpref);
        msg("segpref=%x, ", insn->segpref);
        msg("insnpref=%x, ", insn->insnpref);
        msg("flags[");
        if (insn->flags & INSN_MACRO)  msg("INSN_MACRO|");
        if (insn->flags & INSN_MODMAC) msg("INSN_MODMAC");
        msg("]\n");
    }
}

static void print_op(ea_t ea, const op_t *op)
{
    if (my_dbg)
    {
        // bounds-checked lookups: unknown values are printed as "?"
        msg("type[%s], ", op->type < qnumber(optype_names) ? optype_names[op->type] : "?");
        msg("flags[");
        if (op->flags & OF_NO_BASE_DISP) msg("OF_NO_BASE_DISP|");
        if (op->flags & OF_OUTER_DISP)   msg("OF_OUTER_DISP|");
        if (op->flags & PACK_FORM_DEF)   msg("PACK_FORM_DEF|");
        if (op->flags & OF_NUMBER)       msg("OF_NUMBER|");
        if (op->flags & OF_SHOW)         msg("OF_SHOW");
        msg("], ");
        msg("dtyp[%s], ", op->dtype < qnumber(dtyp_names) ? dtyp_names[op->dtype] : "?");

        if (op->type == o_reg)
            msg("reg=%x, ", op->reg);
        else if (op->type == o_displ || op->type == o_phrase)
            msg("phrase=%x, ", op->phrase);
        else
            msg("reg_phrase=%x, ", op->phrase);

        msg("addr=%x, ", op->addr);
        msg("value=%x, ", op->value);
        msg("specval=%x, ", op->specval);
        msg("specflag1=%x, ", op->specflag1);
        msg("specflag2=%x, ", op->specflag2);
        msg("specflag3=%x, ", op->specflag3);
        msg("specflag4=%x, ", op->specflag4);

        msg("refinfo[");
        opinfo_t buf;
        // get_opinfo() expects the item flags of `ea`, not the operand flags
        if (get_opinfo(&buf, ea, op->n, get_flags(ea)))
        {
            msg("target=%x, ", buf.ri.target);
            msg("base=%x, ", buf.ri.base);
            msg("tdelta=%x, ", buf.ri.tdelta);
            msg("flags[");
            msg("type=%x|", buf.ri.flags & REFINFO_TYPE); // REFINFO_TYPE is a mask
            if (buf.ri.flags & REFINFO_RVAOFF)   msg("REFINFO_RVAOFF|");
            if (buf.ri.flags & REFINFO_PASTEND)  msg("REFINFO_PASTEND|");
            if (buf.ri.flags & REFINFO_CUSTOM)   msg("REFINFO_CUSTOM|");
            if (buf.ri.flags & REFINFO_NOBASE)   msg("REFINFO_NOBASE|");
            if (buf.ri.flags & REFINFO_SUBTRACT) msg("REFINFO_SUBTRACT|");
            if (buf.ri.flags & REFINFO_SIGNEDOP) msg("REFINFO_SIGNEDOP");
            msg("]");
        }
        msg("]\n");
    }
}
#endif

static bool ana_addr = false; // recursion guard for ev_ana_insn

static ssize_t idaapi hook_idp(void *user_data, int notification_code, va_list va)
{
    switch (notification_code)
    {
    case processor_t::ev_ana_insn:
    {
        insn_t *out = va_arg(va, insn_t*);

        if (ana_addr)                 // called back from our own ph.ana_insn() below
            break;

        ana_addr = true;
        if (ph.ana_insn(out) <= 0)    // let the processor module decode first
        {
            ana_addr = false;
            break;
        }
        ana_addr = false;

#ifdef _DEBUG
        print_insn(out);
#endif
        for (int i = 0; i < UA_MAXOP; ++i)
        {
            op_t &op = out->ops[i];
#ifdef _DEBUG
            print_op(out->ea, &op);
#endif
        }

        return out->size;
    } break;
    case processor_t::ev_emu_insn:
    {
        const insn_t *insn = va_arg(va, const insn_t*);
    } break;
    case processor_t::ev_out_mnem:
    {
        outctx_t *outbuffer = va_arg(va, outctx_t *);
        //outbuffer->out_custom_mnem(mnem);
        //return 1;
    } break;
    default:
    {
#ifdef _DEBUG
        if (my_dbg)
            msg("msg = %d\n", notification_code);
#endif
    } break;
    }
    return 0; // not handled here: the processor module processes the event
}

//--------------------------------------------------------------------------
static int idaapi init(void)
{
    if (init_plugin())
    {
        plugin_inited = true;
        my_dbg = false;
        hook_to_notification_point(HT_IDP, hook_idp, NULL);
        print_version();
        return PLUGIN_KEEP;
    }
    return PLUGIN_SKIP;
}

static void idaapi term(void)
{
    if (plugin_inited)
    {
        unhook_from_notification_point(HT_IDP, hook_idp);
        plugin_inited = false;
    }
}

static bool idaapi run(size_t /*arg*/)
{
    return false;
}

const char comment[] = NAME;
const char help[] = NAME;

//--------------------------------------------------------------------------
// PLUGIN DESCRIPTION BLOCK
plugin_t PLUGIN =
{
    IDP_INTERFACE_VERSION,
    PLUGIN_PROC | PLUGIN_MOD, // plugin flags
    init,                     // initialize
    term,                     // terminate. this pointer may be NULL.
    run,                      // invoke plugin
    comment,                  // long comment about the plugin (status line or hint)
    help,                     // multiline help about the plugin
    NAME,                     // the preferred short name of the plugin
    ""                        // the preferred hotkey to run the plugin
};

Fixing the bugs (understanding the template)

The print_op() and print_insn() functions show which flags the current processor module sets for particular instructions. This is necessary if we want to find the flags used for existing opcodes, so that we can reuse them in our fixes.

The body of our "fix" is the hook_idp() function. For our needs, three callbacks have to be implemented in it:

  1. processor_t::ev_ana_insn: needed when the processor module has no implementation of some opcodes
  2. processor_t::ev_emu_insn: here you can create cross-references to the data/code that new opcodes refer to (or that old opcodes fail to refer to)
  3. processor_t::ev_out_mnem: new opcodes have to be displayed somehow, and this is where it happens

The init_plugin() function prevents our fix from loading with other processor modules.
And, most importantly, we attach the whole callback to the processor module's events:

hook_to_notification_point(HT_IDP, hook_idp, NULL);

The trick with the global variable ana_addr is needed so that ana_insn does not fall into recursion when we try to get information about instructions that we do not parse manually. Yes, alas, this "crutch" has been around for a very long time, since the old versions.
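To see why the guard is needed, here is a self-contained toy model. It assumes, as the paragraph above implies, that calling ph.ana_insn() from inside the hook dispatches the event through the hooks again. ToyKernel and fix_hook are hypothetical names, and none of this is IDA SDK code.

```cpp
#include <functional>
#include <vector>

// Toy model of the notification chain (NOT the IDA API): hooks are asked
// first, and a nonzero return consumes the event; otherwise the
// "processor module" decodes the instruction.
struct ToyKernel
{
    std::vector<std::function<int(int)>> hooks;  // each returns a size, or 0
    std::function<int(int)> module_ana;          // stands in for the module

    int ana_insn(int ea)
    {
        for (auto &hook : hooks)
        {
            const int size = hook(ea);
            if (size != 0)
                return size;        // consumed by a hook
        }
        return module_ana(ea);      // nobody consumed it: the module decodes
    }
};

static bool in_hook = false;        // plays the role of ana_addr

// Our fixer: it wants the module's decoding first, then patches it.
int fix_hook(ToyKernel &k, int ea)
{
    if (in_hook)                    // re-entered through k.ana_insn() below
        return 0;                   // step aside so the module decodes
    in_hook = true;
    const int size = k.ana_insn(ea);    // dispatches through the hooks again
    in_hook = false;
    if (size <= 0)
        return 0;
    // ... patch the operands of the decoded instruction here ...
    return size;
}
```

Without the in_hook check, fix_hook would call k.ana_insn(), which calls fix_hook again, forever. The flag also has to be reset on every exit path, as the template does on both branches.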

Fix for problem #1

To solve this problem properly, I had to spend a lot of time with the debug output that I implemented for exactly this task. I knew that in some cases IDA does display PC-relative references correctly (in instructions that jump through an offset table located close to the current instruction, plus an index register), but for the lea instruction correct display of this addressing mode is not implemented. So I found such a jump instruction and worked out which flags have to be set for the operand to be displayed with PC in parentheses:

Problem #1 fix code
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;

    ana_addr = true;
    if (ph.ana_insn(out) <= 0)
    {
        ana_addr = false;
        break;
    }
    ana_addr = false;

    for (int i = 0; i < UA_MAXOP; ++i)
    {
        op_t &op = out->ops[i];
        switch (op.type)
        {
        case o_near:
        case o_mem:
        {
            if (out->itype != 0x76 || op.n != 0 ||
                (op.phrase != 0x09 && op.phrase != 0x0A) ||
                (op.addr == 0 || op.addr >= (1 << 23)) ||
                op.specflag1 != 2) // lea table(pc),Ax
                break;

            // Keep the distance in a type wider than 16 bits: storing it in a
            // short first would make the range check always pass.
            // SHRT_MIN/SHRT_MAX come from <climits>.
            sval_t diff = sval_t(op.addr - out->ea);
            if (diff >= SHRT_MIN && diff <= SHRT_MAX)
            {
                // flags found with the debug output on a PC-relative table jump
                out->Op1.type = o_displ;
                out->Op1.offb = 2;
                out->Op1.dtype = dt_dword;
                out->Op1.phrase = 0x5B;
                out->Op1.specflag1 = 0x10;
            }
        } break;
        }
    }
    return out->size;
} break;

Fix for problem #2

Everything is simple here: just mask the addresses onto specific ranges, 0xFF0000-0xFFFFFF (for RAM) and 0xC00000-0xC000FF (for the VDP video chip). The main thing is to filter by the operand types o_near and o_mem.

Problem #2 fix code
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;

    ana_addr = true;
    if (ph.ana_insn(out) <= 0)
    {
        ana_addr = false;
        break;
    }
    ana_addr = false;

    for (int i = 0; i < UA_MAXOP; ++i)
    {
        op_t &op = out->ops[i];
        switch (op.type)
        {
        case o_near:
        case o_mem:
        {
            op.addr &= 0xFFFFFF; // for any mirrors

            if ((op.addr & 0xE00000) == 0xE00000) // RAM mirrors
                op.addr |= 0x1F0000;

            if ((op.addr >= 0xC00000 && op.addr <= 0xC0001F) ||
                (op.addr >= 0xC00020 && op.addr <= 0xC0003F)) // VDP mirrors
                op.addr &= 0xC0001F; // fold onto 0xC00000-0xC0001F (0xC000FF would be a no-op here)
        } break;
        }
    }
    return out->size;
} break;
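The RAM part of these rules can be checked in isolation. Below is a standalone sketch (plain C++, not SDK code; canonical_addr is a hypothetical name) that applies the same two steps as the plugin: cut the address down to 24 bits, then fold the RAM mirrors onto 0xFF0000-0xFFFFFF.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the RAM part of the masking rules:
// reduce an operand address to the 24-bit bus and fold every RAM mirror
// (0xE00000-0xFFFFFF) onto 0xFF0000-0xFFFFFF.
uint32_t canonical_addr(uint32_t addr)
{
    addr &= 0xFFFFFF;                    // 24-bit address bus
    if ((addr & 0xE00000) == 0xE00000)   // top three address bits set: RAM area
        addr |= 0x1F0000;                // set bits 16-20, giving 0xFFxxxx
    return addr;
}
```

For example, 0xE01234 and 0xFFFF1234 both become 0xFF1234, so IDA ends up with a single target for both.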

Fix for problem #3

To add the new opcodes, you need to:

  1. Define indices for the new opcodes. All new indices must start at CUSTOM_INSN_ITYPE:
    enum
    {
        M68K_linea = CUSTOM_INSN_ITYPE,
        M68K_linef,
    };
  2. The lineA/lineF opcodes apply when the code contains the bytes 0xA0/0xF0, so we read one byte.
  3. Get the address of the vector handler. In my case, the first 64 dwords of the header are the interrupt vectors, and positions 0x0A and 0x0B hold the lineA/lineF handlers:
    value = get_dword(0x0A * sizeof(uint32));
    // ...
    value = get_dword(0x0B * sizeof(uint32));
  4. In ev_emu_insn, add crefs to the handlers and to the following instruction so that the code flow is not interrupted:
        insn->add_cref(insn->Op1.addr, 0, fl_CN); // code ref
        insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F); // flow ref
  5. In ev_out_mnem, output our custom opcode:
    const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a";
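What get_dword(0x0A * sizeof(uint32)) returns in step 3 can be spelled out without the SDK. This sketch assumes, as in the author's case, that the image starts with the 68000 vector table, and that the database is big-endian as it is for the M68K module; read_vector and the rom buffer are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read exception vector `n` from an in-memory copy
// of a 68000 ROM image. Vectors are 32-bit big-endian pointers stored
// back to back from offset 0, so vector n lives at n * 4.
uint32_t read_vector(const std::vector<uint8_t> &rom, std::size_t n)
{
    const std::size_t off = n * 4;
    if (off + 4 > rom.size())
        return 0;   // outside the image: treat as "no vector"
    return (uint32_t(rom[off]) << 24) | (uint32_t(rom[off + 1]) << 16) |
           (uint32_t(rom[off + 2]) << 8) | uint32_t(rom[off + 3]);
}
```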

Problem #3 fix code
/* after the includes */
enum
{
    M68K_linea = CUSTOM_INSN_ITYPE,
    M68K_linef,
};

/* inside hook_idp() */
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;

    uint16 itype = 0;
    ea_t value = out->ea;
    uchar b = get_byte(out->ea);

    if (b == 0xA0 || b == 0xF0)
    {
        switch (b)
        {
        case 0xA0:
            itype = M68K_linea;
            value = get_dword(0x0A * sizeof(uint32)); // line 1010 emulator vector
            break;
        case 0xF0:
            itype = M68K_linef;
            value = get_dword(0x0B * sizeof(uint32)); // line 1111 emulator vector
            break;
        }

        out->itype = itype;
        out->size = 2;

        out->Op1.type = o_near;
        out->Op1.offb = 1;
        out->Op1.dtype = dt_dword;
        out->Op1.addr = value;
        out->Op1.phrase = 0x0A;
        out->Op1.specflag1 = 2;

        out->Op2.type = o_imm;
        out->Op2.offb = 1;
        out->Op2.dtype = dt_byte;
        out->Op2.value = get_byte(out->ea + 1);
    }
    // for any other opcode out->size is left untouched (0), so the
    // processor module decodes the instruction as usual
    return out->size;
} break;
case processor_t::ev_emu_insn:
{
    const insn_t *insn = va_arg(va, const insn_t*);
    if (insn->itype == M68K_linea || insn->itype == M68K_linef)
    {
        insn->add_cref(insn->Op1.addr, 0, fl_CN);                    // call the handler
        insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F); // keep the code flow
        return 1; // handled: the processor module does not know these itypes
    }
} break;
case processor_t::ev_out_mnem:
{
    outctx_t *outbuffer = va_arg(va, outctx_t *);
    if (outbuffer->insn.itype != M68K_linea && outbuffer->insn.itype != M68K_linef)
        break;

    const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a";
    outbuffer->out_custom_mnem(mnem);
    return 1;
} break;

Fix for problem #4

The solution is this: find the opcode of the trap instruction, get the vector index from the instruction, and take the handler from the vector at that index. The result looks something like this:

Problem #4 fix code
case processor_t::ev_emu_insn:
{
    const insn_t *insn = va_arg(va, const insn_t*);
    if (insn->itype == 0xB6) // trap #X
    {
        qstring name;
        // TRAP #n uses exception vector 0x20 + n
        ea_t trap_addr = get_dword((0x20 + (insn->Op1.value & 0xF)) * sizeof(uint32));
        get_func_name(&name, trap_addr);
        set_cmt(insn->ea, name.c_str(), false);
        insn->add_cref(trap_addr, insn->Op1.offb, fl_CN);
    }
    // no return here: the processor module still emulates the instruction
} break;
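The index arithmetic is easy to check outside IDA. In this standalone sketch (trap_vector_offset is a hypothetical name), the vector offset is computed straight from the opcode word, whereas the plugin code above takes n from insn->Op1.value, which the processor module has already decoded.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: TRAP #n is encoded as 0100 1110 0100 nnnn
// (0x4E40-0x4E4F). It uses exception vector 32 + n (0x20 + n), and each
// vector is 4 bytes, so its entry sits at (0x20 + n) * 4.
std::optional<uint32_t> trap_vector_offset(uint16_t opcode)
{
    if ((opcode & 0xFFF0) != 0x4E40)
        return std::nullopt;            // not a TRAP instruction
    const uint32_t n = opcode & 0xF;    // the #n operand
    return (0x20 + n) * 4;
}
```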

Fix for problem #5

Here, too, everything is simple: first we filter for the movea.w operation. Then, if the operand is word-sized and refers to RAM, we turn it into a proper offset relative to the base 0xFF0000. It looks like this:

Problem #5 fix code
case processor_t::ev_ana_insn:
{
    insn_t *out = va_arg(va, insn_t*);
    if (ana_addr)
        break;

    ana_addr = true;
    if (ph.ana_insn(out) <= 0)
    {
        ana_addr = false;
        break;
    }
    ana_addr = false;

    for (int i = 0; i < UA_MAXOP; ++i)
    {
        op_t &op = out->ops[i];
        switch (op.type)
        {
        case o_imm:
        {
            if (out->itype != 0x7F || op.n != 0) // movea
                break;
            if ((op.value & 0xFF0000) && op.dtype == dt_word)
                op.value &= 0xFFFF; // keep only the 16-bit immediate
        } break;
        }
    }
    return out->size;
} break;
case processor_t::ev_emu_insn:
{
    const insn_t *insn = va_arg(va, const insn_t*);
    for (int i = 0; i < UA_MAXOP; ++i)
    {
        const op_t &op = insn->ops[i];
        switch (op.type)
        {
        case o_imm:
        {
            if (insn->itype != 0x7F || op.n != 0 || op.dtype != dt_word) // movea
                break;
            // make the word an offset relative to the RAM base 0xFF0000
            op_offset(insn->ea, op.n, REF_OFF32, BADADDR, 0xFF0000);
        } break;
        }
    }
} break;
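Why a word can point into RAM at all: MOVEA.W sign-extends its operand. The sketch below (plain C++, movea_w_target is a hypothetical name) shows that only immediates with bit 15 set land in 0xFF8000-0xFFFFFF, so the 0xFF0000 base fits exactly those; values below 0x8000 stay in 0x000000-0x007FFF.

```cpp
#include <cstdint>

// Hypothetical helper: the address that "movea.w #imm,An" leaves in An,
// as seen on the 68000's 24-bit bus. MOVEA.W sign-extends its 16-bit
// source to 32 bits before writing the address register.
uint32_t movea_w_target(uint16_t imm)
{
    const int32_t extended = static_cast<int16_t>(imm);   // sign extension
    return static_cast<uint32_t>(extended) & 0xFFFFFF;    // keep 24 bits
}
```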


In fact, fixing existing modules is not a simple task when it involves more than just implementing unknown opcodes.
It takes hours of debugging the existing implementation and understanding what happens inside it (sometimes even reversing the processor module). But the result is worth it.

Link to the source:

