GHIDRA, Playstation 1 executables, FLIRT signatures, and PsyQ

  • Tutorial

Hello everybody,



I don’t know about you, but I always wanted to reverse the old console games, having also a decompiler in stock. And now, this joyful moment in my life has come - GHIDRA has come out . I won’t write about what it is, you can easily google it. And, the reviews are so different (especially from retrograde) that it will be difficult for a newcomer to even decide to launch this miracle ... Here's an example for you: " I worked for ideas for 20 years, and I look at your Hydra with great distrust, because the NSA. But when- I’ll run it and check it in practice . "


In a nutshell - running Hydra is not scary. And what we get after launch will block all your fear of bookmarks-and-backdoors from the ubiquitous NSA.


So, what am I talking about ... There is such a prefix: Sony Playstation 1 ( PS1 , PSX , Curling iron ). Many cool games were created for it, a bunch of franchises appeared that are still popular. And once I wanted to find out how they work: what data formats are there, whether resource compression is used, try to translate something into Russian (I’ll say right away that I haven’t translated any games yet).


I started by writing with a friend a Delphicool utility for working with the format TIM(this is something like BMPthe Playstation world): Tim2View . At one time, enjoyed success (and maybe now enjoys). Then I wanted to go deeper into compression.


image


And then the problems began. MIPSI was not yet familiar with the processor . Took to study. Since IDA ProI also did not knew (I came to the reversal games Sega Mega Drivelater Playstation). But, thanks to the Internet, I found out that it IDA Prosupports downloading and analysis of executable files PS1: PS-X EXE . I tried to upload a game file (it seems they were Lemmings ) with a strange name and extension, such as SLUS_123.45in Ida, I got a bunch of lines of assembler code (fortunately, I already had an idea of ​​what it was, thanks to Windows exe-drivers under x86), and began to understand.



The first difficult place to understand was the assembly line. For example, you see a call to some function, and immediately after it is loading into the register the parameter that should be used in this function. In short, before any jumps and function calls, the instruction following the jump / call is first executed, and only then the call or jump itself.


After all the difficulties I went through, I managed to write several packers / unpackers of game resources. But I’ve never really studied code. Why? Well, everything is banal: there was a lot of code, access to the BIOS and functions that were virtually impossible to understand (they were library, and I didn’t have an SDK for curling then), instructions working with three registers at the same time, lack of a decompiler.


And so, after many, many years, it turns out GHIDRA. Among the platforms supported by the decompiler there are MIPS. Oh joy! Let's try to decompile something soon! But ... I was waiting for a bummer. Executable files are PS-X EXEnot supported by Hydra. No problem, write your own!


Actually code


Enough lyrical digression, let's write code. How to create my own bootloaders for Ghidra, I already had an idea of ​​what I wrote earlier . Therefore, it remains only to find the Memory Map of the first curling iron, the addresses of the registers and, you can collect and load binaries. No sooner said than done.


The code was ready, registers and regions were added and recognized, but there was still a big blank spot in the places where the library functions and BIOS functions were called. And, unfortunately, FLIRTHydra did not have support . If not, let's add.


The format of the FLIRTsignatures is known and described in the pat.txtfile, which can be found in the Ida SDK. Also, Ida has a tool to create these signatures specially from the library files Playstation, and called: ppsx. I downloaded the SDK for the curling iron , which is called PsyQ Playstation Development Kit, found the libfiles there and tried to create at least some signatures from them - successfully. It turns out a little text in which each line has a specific format. It remains to write code that will parse these lines, and apply them to the code.



Patparser


Since each line has a specific format, it would be logical to write a regular expression. It turned out like this:


private static final Pattern linePat = Pattern.compile("^((?:[0-9A-F\\.]{2})+) ([0-9A-F]{2}) ([0-9A-F]{4}) ([0-9A-F]{4}) ((?:[:\\^][0-9A-F]{4}@? [\\.\\w]+ )+)((?:[0-9A-F\\.]{2})+)?$");

Well, to highlight then in the list of modules separately the offset, type, and function name, we write a separate regexp:


private static final Pattern modulePat = Pattern.compile("([:\\^][0-9A-F]{4}@?) ([\\.\\w]+) ");

Now let's go through the components of each signature separately:


  1. First comes the hex sequence of bytes ( 0-9A-F ), where some of them can be any (dot character "."). Therefore, we create a class that will store such a sequence. I called her MaskedBytes:

MaskedBytes.java
package pat;
public class MaskedBytes {
    private final byte[] bytes, masks;
    public final byte[] getBytes() {
        return bytes;
    }
    public final byte[] getMasks() {
        return masks;
    }
    public final int getLength() {
        return bytes.length;
    }
    public MaskedBytes(byte[] bytes, byte[] masks) {
        this.bytes = bytes;
        this.masks = masks;
    }
    public static MaskedBytes extend(MaskedBytes src, MaskedBytes add) {
        return extend(src, add.getBytes(), add.getMasks());
    }
    public static MaskedBytes extend(MaskedBytes src, byte[] addBytes, byte[] addMasks) {
        int length = src.getBytes().length;
        byte[] tmpBytes = new byte[length + addBytes.length];
        byte[] tmpMasks = new byte[length + addMasks.length];
        System.arraycopy(src.getBytes(), 0, tmpBytes, 0, length);
        System.arraycopy(addBytes, 0, tmpBytes, length, addBytes.length);
        System.arraycopy(src.getMasks(), 0, tmpMasks, 0, length);
        System.arraycopy(addMasks, 0, tmpMasks, length, addMasks.length);
        return new MaskedBytes(tmpBytes, tmpMasks);
    }
}

  1. The length of the block from which it is counted CRC16.
  2. CRC16, which uses its own polynomial ( 0x8408):

Count Code CRC16
public static boolean checkCrc16(byte[] bytes, short resCrc) {
    if ( bytes.length == 0 )
        return true;
    int crc = 0xFFFF;
    for (int i = 0; i < bytes.length; ++i) {
        int a = bytes[i];
        for (int x = 0; x < 8; ++x) {
            if (((crc ^ a) & 1) != 0) {
                crc = (crc >> 1) ^ 0x8408;
            }
            else {
                crc >>= 1;
            }
            a >>= 1;
        }
    }
    crc = ~crc;
    int x = crc;
    crc = (crc << 8) | ((x >> 8) & 0xFF);
    crc &= 0xFFFF;
    return (short)crc == resCrc;
}

  1. The total length of the "module" in bytes.
  2. List of global names (what we need).
  3. List of links to other names (also needed).
  4. Tail bytes.

Each name in the module has a specific type and offset relative to the beginning. A type can be indicated by one of the characters::, ^, @, depending on the type:


  • " : NAME ": global name. It was for the sake of such names that I started everything;
  • " : NAME @ ": local name / label. It may not be indicated, but let it be;
  • " ^ NAME ": link to the name.

On the one hand, everything is simple, but a link can easily be not a reference to a function (and, accordingly, the jump will be relative), but to a global variable. What, you say, is the problem? And it is that in PSX it is impossible to push the whole DWORDinto the register with one instruction . To do this, download it in the form of halves. The fact is, the MIPSsize of the instruction is limited to four bytes. And, it would seem, you just need to first get one half out of one instruction, and then disassemble the next - and get a second half. But not so simple. The first half can be loaded instructions 5 back, and the link in the module will be given only after loading its second half. I had to write a sophisticated parser (probably it can be modified).


As a result, we create enumfor three types of names:


ModuleType.java
package pat;
public enum ModuleType {
    GLOBAL_NAME, LOCAL_NAME, REF_NAME;
    public boolean isGlobal() {
        return this == GLOBAL_NAME;
    }
    public boolean isLocal() {
        return this == LOCAL_NAME;
    }
    public boolean isReference() {
        return this == REF_NAME;
    }
    @Override
    public String toString() {
        if (isGlobal()) {
            return "Global";
        } else if (isLocal()) {
            return "Local";
        } else {
            return "Reference";
        }
    }
}

Let's write a code that converts text hexadecimal sequences and dots to type MaskedBytes:


hexStringToMaskedBytesArray ()
private MaskedBytes hexStringToMaskedBytesArray(String s) {
    MaskedBytes res = null;
    if (s != null) {
        int len = s.length();
        byte[] bytes = new byte[len / 2];
        byte[] masks = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            char c1 = s.charAt(i);
            char c2 = s.charAt(i + 1);
            masks[i / 2] = (byte) (
                    (((c1 == '.') ? 0x0 : 0xF) << 4) |
                    (((c2 == '.') ? 0x0 : 0xF) << 0)
                    );
            bytes[i / 2] = (byte) (
                    (((c1 == '.') ? 0x0 : Character.digit(c1, 16)) << 4) |
                    (((c2 == '.') ? 0x0 : Character.digit(c2, 16)) << 0)
                    );
        }
        res = new MaskedBytes(bytes, masks);
    }
    return res;
}

You can already think of a class that will store information about each individual function: the name of the function, the offset in the module, and type:


ModuleData.java
package pat;
public class ModuleData {
    private final long offset;
    private final String name;
    private final ModuleType type;
    public ModuleData(long offset, String name, ModuleType type) {
        this.offset = offset;
        this.name = name;
        this.type = type;
    }
    public final long getOffset() {
        return offset;
    }
    public final String getName() {
        return name;
    }
    public final ModuleType getType() {
        return type;
    }
}

And, lastly: a class that will store everything that is indicated on each line of the patfile, that is: bytes, crc, a list of names with offsets:


SignatureData.java
package pat;
import java.util.Arrays;
import java.util.List;
public class SignatureData {
    private final MaskedBytes templateBytes, tailBytes;
    private MaskedBytes fullBytes;
    private final int crc16Length;
    private final short crc16;
    private final int moduleLength;
    private final List modules;
    public SignatureData(MaskedBytes templateBytes, int crc16Length,
            short crc16, int moduleLength, List modules, MaskedBytes tailBytes) {
        this.templateBytes = this.fullBytes = templateBytes;
        this.crc16Length = crc16Length;
        this.crc16 = crc16;
        this.moduleLength = moduleLength;
        this.modules = modules;
        this.tailBytes = tailBytes;
        if (this.tailBytes != null) {
            int addLength = moduleLength - templateBytes.getLength() - tailBytes.getLength();
            byte[] addBytes = new byte[addLength];
            byte[] addMasks = new byte[addLength];
            Arrays.fill(addBytes, (byte)0x00);
            Arrays.fill(addMasks, (byte)0x00);
            this.fullBytes = MaskedBytes.extend(this.templateBytes, addBytes, addMasks);
            this.fullBytes = MaskedBytes.extend(this.fullBytes, tailBytes);
        }
    }
    public MaskedBytes getTemplateBytes() {
        return templateBytes;
    }
    public MaskedBytes getTailBytes() {
        return tailBytes;
    }
    public MaskedBytes getFullBytes() {
        return fullBytes;
    }
    public int getCrc16Length() {
        return crc16Length;
    }
    public short getCrc16() {
        return crc16;
    }
    public int getModuleLength() {
        return moduleLength;
    }
    public List getModules() {
        return modules;
    }
}

Now the main thing: we write code to create all these classes:


Parsing one taken line of a pat file
private List parseModuleData(String s) {
    List res = new ArrayList();
    if (s != null) {
        Matcher m = modulePat.matcher(s);
        while (m.find()) {
            String __offset = m.group(1);
            ModuleType type = __offset.startsWith(":") ? ModuleType.GLOBAL_NAME : ModuleType.REF_NAME;
            type = (type == ModuleType.GLOBAL_NAME && __offset.endsWith("@")) ? ModuleType.LOCAL_NAME : type;
            String _offset = __offset.replaceAll("[:^@]", "");
            long offset = Integer.parseInt(_offset, 16);
            String name = m.group(2);
            res.add(new ModuleData(offset, name, type));
        }
    }
    return res;
}

Parsing all pat file lines
private void parse(List lines) {
    modulesCount = 0L;
    signatures = new ArrayList();
    int linesCount = lines.size();
    monitor.initialize(linesCount);
    monitor.setMessage("Reading signatures...");
    for (int i = 0; i < linesCount; ++i) {
        String line = lines.get(i);
        Matcher m = linePat.matcher(line);
        if (m.matches()) {
            MaskedBytes pp = hexStringToMaskedBytesArray(m.group(1));
            int ll = Integer.parseInt(m.group(2), 16);
            short ssss = (short)Integer.parseInt(m.group(3), 16);
            int llll = Integer.parseInt(m.group(4), 16);
            List modules = parseModuleData(m.group(5));
            MaskedBytes tail = null;
            if (m.group(6) != null) {
                tail = hexStringToMaskedBytesArray(m.group(6));
            }
            signatures.add(new SignatureData(pp, ll, ssss, llll, modules, tail));
            modulesCount += modules.size();
        }
        monitor.incrementProgress(1);
    }
}

The code for creating the function where one of the signatures was recognized:


Function Creation
private static void disasmInstruction(Program program, Address address) {
    DisassembleCommand cmd = new DisassembleCommand(address, null, true);
    cmd.applyTo(program, TaskMonitor.DUMMY);
}
public static void setFunction(Program program, FlatProgramAPI fpa, Address address, String name, boolean isFunction, boolean isEntryPoint, MessageLog log) {
    try {
        if (fpa.getInstructionAt(address) == null)
            disasmInstruction(program, address);
        if (isFunction) {
            fpa.createFunction(address, name);
        }
        if (isEntryPoint) {
            fpa.addEntryPoint(address);
        }
        if (isFunction && program.getSymbolTable().hasSymbol(address)) {
            return;
        }
        program.getSymbolTable().createLabel(address, name, SourceType.IMPORTED);
    } catch (InvalidInputException e) {
        log.appendException(e);
    }
}

The most difficult place, as mentioned earlier, is counting a link to another name / variable (perhaps the code needs to be improved):


Link Count
public static void setInstrRefName(Program program, FlatProgramAPI fpa, PseudoDisassembler ps, Address address, String name, MessageLog log) {
    ReferenceManager refsMgr = program.getReferenceManager();
    Reference[] refs = refsMgr.getReferencesFrom(address);
    if (refs.length == 0) {
        disasmInstruction(program, address);
        refs = refsMgr.getReferencesFrom(address);
        if (refs.length == 0) {
            refs = refsMgr.getReferencesFrom(address.add(4));
            if (refs.length == 0) {
                refs = refsMgr.getFlowReferencesFrom(address.add(4));
                Instruction instr = program.getListing().getInstructionAt(address.add(4));
                if (instr == null) {
                    disasmInstruction(program, address.add(4));
                    instr = program.getListing().getInstructionAt(address.add(4));
                    if (instr == null) {
                        return;
                    }
                }
                FlowType flowType = instr.getFlowType();
                if (refs.length == 0 && !(flowType.isJump() || flowType.isCall() || flowType.isTerminal())) {
                    return;
                }
                refs = refsMgr.getReferencesFrom(address.add(8));
                if (refs.length == 0) {
                    return;
                }
            }
        }
    }
    try {
        program.getSymbolTable().createLabel(refs[0].getToAddress(), name, SourceType.IMPORTED);
    } catch (InvalidInputException e) {
        log.appendException(e);
    }
}

And, the final touch - apply the signatures:


applySignatures()
public void applySignatures(ByteProvider provider, Program program, Address imageBase, Address startAddr, Address endAddr, MessageLog log) throws IOException {
    BinaryReader reader = new BinaryReader(provider, false);
    PseudoDisassembler ps = new PseudoDisassembler(program);
    FlatProgramAPI fpa = new FlatProgramAPI(program);
    monitor.initialize(getAllModulesCount());
    monitor.setMessage("Applying signatures...");
    for (SignatureData sig : signatures) {
        MaskedBytes fullBytes = sig.getFullBytes();
        MaskedBytes tmpl = sig.getTemplateBytes();
        Address addr = program.getMemory().findBytes(startAddr, endAddr, fullBytes.getBytes(), fullBytes.getMasks(), true, TaskMonitor.DUMMY);
        if (addr == null) {
            monitor.incrementProgress(sig.getModules().size());
            continue;
        }
        addr = addr.subtract(imageBase.getOffset());
        byte[] nextBytes = reader.readByteArray(addr.getOffset() + tmpl.getLength(), sig.getCrc16Length());
        if (!PatParser.checkCrc16(nextBytes, sig.getCrc16())) {
            monitor.incrementProgress(sig.getModules().size());
            continue;
        }
        addr = addr.add(imageBase.getOffset());
        List modules = sig.getModules();
        for (ModuleData data : modules) {
            Address _addr = addr.add(data.getOffset());
            if (data.getType().isGlobal()) {
                setFunction(program, fpa, _addr, data.getName(), data.getType().isGlobal(), false, log);
            }
            monitor.setMessage(String.format("%s function %s at 0x%08X", data.getType(), data.getName(), _addr.getOffset()));
            monitor.incrementProgress(1);
        }
        for (ModuleData data : modules) {
            Address _addr = addr.add(data.getOffset());
            if (data.getType().isReference()) {
                setInstrRefName(program, fpa, ps, _addr, data.getName(), log);
            }
            monitor.setMessage(String.format("%s function %s at 0x%08X", data.getType(), data.getName(), _addr.getOffset()));
            monitor.incrementProgress(1);
        }
    }
}

Тут можно рассказать об одной интересной функции: findBytes(). С её помощью можно искать определённые последовательности байт, с указанными битовыми масками для каждого байта. Метод вызывается так:


Address addr = program.getMemory().findBytes(startAddr, endAddr, bytes, masks, forward, TaskMonitor.DUMMY);

В итоге возвращается адрес, с которого начинаются байты, либо null.


Пишем анализатор


Давайте сделаем красиво, и не будем применять сигнатуры, если мы не хотим, а позволим выбирать этот шаг пользователю. Для этого необходимо будет написать свой собственный анализатор кода (вы могли видеть подобные в этом списке — это всё они, да):



Итак, чтобы вклиниться в этот список, нужно будет унаследоваться от класса AbstractAnalyzer и переопределить некоторые методы:


  1. Конструктор. Должен будет вызвать конструктор базового класса с указанием имени, описания анализатора, и его типа (об этом позже). У меня выглядит как-то так:

public PsxAnalyzer() {
    super("PSYQ Signatures", "PSX signatures applier", AnalyzerType.INSTRUCTION_ANALYZER);
}

  1. getDefaultEnablement(). Определяет, доступен ли наш анализатор всегда, или же только при выполнении каких-то условий (например, если используется наш же загрузчик).
  2. canAnalyze(). Можно ли вообще использовать данный анализатор на загружаемом бинарном файле.
    Пункты 2 и 3 можно в принципе проверять одной единственной функцией:

public static boolean isPsxLoader(Program program) {
    return program.getExecutableFormat().equalsIgnoreCase(PsxLoader.PSX_LOADER);
}

Где PsxLoader.PSX_LOADER хранит имя загрузчика, и определено ранее в нём же.


Итого, имеем:


@Override
public boolean getDefaultEnablement(Program program) {
    return isPsxLoader(program);
}
@Override
public boolean canAnalyze(Program program) {
    return isPsxLoader(program);
}

  1. registerOptions(). Вовсе не обязательно переопределять этот метод, но, если нам нужно что-то до анализа спросить у пользователя, например, путь к pat-файлу, то лучше всего это делать в данном методе. Получаем:

private static final String OPTION_NAME = "PSYQ PAT-File Path";
private File file = null;
@Override
public void registerOptions(Options options, Program program) {
    try {
        file = Application.getModuleDataFile("psyq4_7.pat").getFile(false);
    } catch (FileNotFoundException e) {
    }
    options.registerOption(OPTION_NAME, OptionType.FILE_TYPE, file, null,
            "PAT-File (FLAIR) created from PSYQ library files");
}

Тут необходимо пояснить. Статический метод getModuleDataFile() класса Application возвращает полный путь к файлу в каталоге data, который имеется в дереве нашего модуля, и может хранить любые необходимые файлы, на которые мы захотим позже сослаться.


Ну а метод registerOption() регистрирует опцию с именем указанным в OPTION_NAME, типом File (т.е. у пользователя будет возможность выбирать файл через обычное диалоговое окно), дефолтным значением и описанием.


Далее. Т.к. нормальной возможности потом сослаться на зарегистрированную опцию у нас не будет, потребуется переопределить метод optionsChanged():


@Override
public void optionsChanged(Options options, Program program) {
    super.optionsChanged(options, program);
    file = options.getFile(OPTION_NAME, file);
}

Здесь мы просто обновляем глобальную переменную согласно новому значению.


Method added(). Now the main thing: the method that will be called when the analyzer starts. In it we will receive a list of addresses available for analysis, but we only need those that contain code. Therefore, you need to filter. Final code:


Added () method
@Override
public boolean added(Program program, AddressSetView set, TaskMonitor monitor, MessageLog log)
        throws CancelledException {
    if (file == null) {
        return true;
    }
    Memory memory = program.getMemory();
    AddressRangeIterator it = memory.getLoadedAndInitializedAddressSet().getAddressRanges();
    while (!monitor.isCancelled() && it.hasNext()) {
        AddressRange range = it.next();
        try {
            MemoryBlock block = program.getMemory().getBlock(range.getMinAddress());
            if (block.isInitialized() && block.isExecute() && block.isLoaded()) {
                PatParser pat = new PatParser(file, monitor);
                RandomAccessByteProvider provider = new RandomAccessByteProvider(new File(program.getExecutablePath()));
                pat.applySignatures(provider, program, block.getStart(), block.getStart(), block.getEnd(), log);
            }
        } catch (IOException e) {
            log.appendException(e);
            return false;
        }
    }
    return true;
}

Here we go through the list of addresses that are executable, and try to apply signatures there.



Conclusions and the ending


Look like that's it. In fact, there is nothing super complicated here. There are examples, the community is lively, you can safely ask about what is not clear while writing the code. Bottom line: a working loader and an analyzer of executable files Playstation 1.



All source codes are available here: ghidra_psx_ldr
Releases here: Releases


Also popular now: