
Xenoblade Chronicles - game data parsing

Hello! My name is Artem; on the internet I am better known under the silly nickname TTEMMA, but that is beside the point. I am one of the founders of the amateur translation group Russian Studio Video 7 and the only romhacker-programmer on the team.
My team and I were the first to give Resident Evil fans translations of two iconic Nintendo GameCube games, Resident Evil Remake and Resident Evil Zero. Someday I'll tell how we did it, but in this article I'd like to talk about a gorgeous game, Xenoblade Chronicles on the Nintendo Wii, and about how romhacking it went and still goes on. Everything in this game is done in the Japanese style: it is odd, and at some points you just ask yourself "Why?", but then you remember that Japanese developers do things their own way, and the questions disappear. Well, shall we begin?
Foreword
Xenoblade Chronicles is the kind of game that makes a Nintendo Wii worth buying; no, it makes buying one a must. It is a JRPG with a large open world, a pile of side quests and a gripping storyline that will stretch a playthrough over months rather than weeks. Everyone familiar with the Nintendo Wii knows the console is aimed at technically modest, colorful family games like Super Mario, but what Monolith Soft created deserves praise: their Xenoblade Chronicles has beautiful graphics despite the console's severe technical limitations (only Resident Evil Remake and Resident Evil Zero can compete with it graphically).

Having looked at the game, the team and I decided that it had to be translated into our great and mighty Russian language. But, as you know, it is not worth starting a translation without understanding the technical side of the game. So let's talk about the technical side.
Technical details
I'll say a little about the Wii itself and about what this article will not cover.
Both the Nintendo GameCube and the Nintendo Wii run on an IBM processor with the PowerPC architecture. This processor works in big-endian mode, which is important to remember (for me, hacking big-endian files is much more convenient than little-endian ones, because the bytes read in their natural order).
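Since every multi-byte value in the game files is big-endian, a reader on a little-endian PC has to convert explicitly. Here is a minimal sketch of how that might look in C#; the class and method names are hypothetical and are not part of my tools, and only BinaryPrimitives is a real .NET API.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, shown only to illustrate big-endian reads.
public static class BigEndianDemo
{
    // Reads a 32-bit unsigned value stored most significant byte first.
    public static uint ReadUInt32BE(byte[] data, int offset)
    {
        return BinaryPrimitives.ReadUInt32BigEndian(data.AsSpan(offset, 4));
    }

    // Reads a 16-bit unsigned value stored most significant byte first.
    public static ushort ReadUInt16BE(byte[] data, int offset)
    {
        return BinaryPrimitives.ReadUInt16BigEndian(data.AsSpan(offset, 2));
    }
}
```

With this, the bytes 00 00 08 00 read as 0x800, exactly as they appear in a hex editor.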
Fortunately, Nintendo took good care of game developers and provided a huge number of formats for every purpose in its SDKs, and this article is not about them. Maybe I will describe them in detail later, but my laziness is unlikely to allow it. Of the Nintendo formats I want to single out only one, BRFNA (Binary Revolution Font), which is discussed below.
Font
The font in Xenoblade Chronicles (hereinafter XC) is stored in a standard format that is rarely used in games and has the BRFNA extension.
There are only two standard font formats for Wii :
- BRFNT (more popular)
- BRFNA (rarely used)
I will not delve into their structure and will only describe the differences:
- BRFNA has an additional block that stores encoding information (ansi, kanji, european, etc.)
- In BRFNA, the textures with glyphs are compressed in formats unknown to me, while in BRFNT they are stored uncompressed and are easy to edit.
BRFNA managed to cause a lot of problems: first, the unknown compression type, and second, the somewhat strange separation of encodings. Oddly enough, the official font converter from Nintendo's 3DS SDK rescued us. It had problems of its own, though: I had to study the encodings used in XC itself, write separate configuration files for the texture converter and tinker with the converter's settings until the font was identical to the original. And, thank goodness, after several days of torment I managed to display Russian letters using the Russian character codes from UTF-8.

True, the game resisted for a long time because of the size of the new font and crashed at the very beginning of loading. At first I suspected my own crooked hands, but once I removed the umlauts from the font, the game started without complaint. I categorically did not want to remove the umlauts, so I approached from the other side: I simply changed the texture format from IA4 (4 bits of intensity, 4 bits of transparency) to plain I4 (4 bits of intensity, no transparency) and voila, XC started up like a charm.
Why did I decide to change the texture format? Because I can! To be honest, though, it did not degrade the quality of the characters. Character rendering in this game outputs only the alpha channel and does not use the main channel at all, but if the font format has no transparency, there is nothing to use except the main channel. What a waste, I thought, and decided to do without transparency so as not to take up space for nothing.
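To put a number on the saving: IA4 spends 8 bits per texel and I4 only 4, so the same sheet takes half the space. The sketch below is plain arithmetic under stated assumptions: the 512x512 sheet size is a made-up example rather than the real XC font sheet, tile padding is ignored, and the class name is hypothetical.

```csharp
// Hypothetical helper: raw texture size, ignoring tile padding.
public static class FontSheetSize
{
    public const int BitsPerTexelIA4 = 8; // 4 bits intensity + 4 bits alpha
    public const int BitsPerTexelI4 = 4;  // 4 bits intensity only

    public static int SizeInBytes(int width, int height, int bitsPerTexel)
    {
        return width * height * bitsPerTexel / 8;
    }
}
```

For an assumed 512x512 sheet this gives 262144 bytes in IA4 against 131072 bytes in I4.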
At this point the work on the font was finished and crossed off the task list.
PKB \ PKH - file containers
Actually, I started my story from the wrong end. To get to many of the main files, you first have to extract them from the PKB container.
PKB is just a container, without any pointers, sizes or file names. All you can see is a bunch of files aligned to 2048 bytes.
PKB example

The most interesting part is stored in the PKH files, but you have to work to get to them. All PKH files sit in a U8 archive named static.arc, with a separate archive for each language.
STATIC.ARC English

PKH is a rather strange index for PKB that stores the size, pointer and index of each file. From the index the game somehow derives the full file name, but I did not look into that, because it is too tedious and pointless.
I could not work out the structure of this container completely, but I learned enough to extract and repack the files.
PKH can be divided into 2 blocks, Header and Entry, which is what I did.
public class pkhModuleEntry
{
    public uint ID;
    public uint unk;
    public ushort sizeFile;
    public uint offsetFile;

    public pkhModuleEntry()
    {
        ID = unk = offsetFile = sizeFile = 0;
    }
}
public class pkhModule
{
    uint Magic;
    uint version;
    uint tableOffset;
    uint pkhSize;
    uint countFiles;
    pkhModuleEntry[] entry;
    string[] extensions;
    ...
}
The entries begin at the tableOffset pointer. The catch is that the entry data is split into several blocks, so all the file information is loaded as follows:
// Block 1: file index and an unknown value for every file
for (int i = 0; i < countFiles; i++)
{
    entry[i] = new pkhModuleEntry();
    entry[i].ID = mainPkhSfa.ReadUInt32();
    entry[i].unk = mainPkhSfa.ReadUInt32();
}
// Block 2: file sizes
for (int i = 0; i < countFiles; i++)
    entry[i].sizeFile = mainPkhSfa.ReadUInt16();
// Block 3: file pointers
for (int i = 0; i < countFiles; i++)
    entry[i].offsetFile = mainPkhSfa.ReadUInt32();
As the code above shows, all the information about the files is divided into 3 blocks:
- File indices and unknown value
- File sizes
- File pointers
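The three-block layout can also be read from a plain byte array without my stream helper. The following is only a sketch: it assumes, as the loading loops do, that the three blocks follow each other with no padding, and the names PkhEntry and PkhTables are hypothetical.

```csharp
using System;

// Hypothetical container for one PKH entry.
public sealed class PkhEntry
{
    public uint Id;
    public uint Unknown;
    public ushort SizeInBlocks;
    public uint Offset;
}

public static class PkhTables
{
    // Big-endian helpers: the Wii stores the most significant byte first.
    static ushort U16(byte[] d, int o) => (ushort)((d[o] << 8) | d[o + 1]);
    static uint U32(byte[] d, int o) => ((uint)U16(d, o) << 16) | U16(d, o + 2);

    // Reads 'count' entries whose three blocks start at 'tableOffset'.
    public static PkhEntry[] Read(byte[] pkh, int tableOffset, int count)
    {
        int idBlock = tableOffset;               // count * (ID + unk) = count * 8 bytes
        int sizeBlock = idBlock + count * 8;     // count * 2 bytes
        int offsetBlock = sizeBlock + count * 2; // count * 4 bytes
        if (offsetBlock + count * 4 > pkh.Length)
            throw new ArgumentException("PKH data is too short for the entry blocks.");

        var entries = new PkhEntry[count];
        for (int i = 0; i < count; i++)
        {
            entries[i] = new PkhEntry
            {
                Id = U32(pkh, idBlock + i * 8),
                Unknown = U32(pkh, idBlock + i * 8 + 4),
                SizeInBlocks = U16(pkh, sizeBlock + i * 2),
                Offset = U32(pkh, offsetBlock + i * 4),
            };
        }
        return entries;
    }
}
```

Computing the start of each block up front makes it possible to fetch any single entry by index instead of reading the file strictly in order.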
You may notice that the pointer to a file is stored as a uint32, that is, in a 4-byte variable, while the size is for some reason stored in 2 bytes. Let me explain this oddity. As I said above, files in PKB are aligned to 2048 bytes, and there is a reason for that: the file size is given not in bytes but in the number of 2048-byte (0x800) data blocks. For example, if the stored file size is 0xC, the size in PKB will be 0xC * 0x800 = 0x6000.
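The conversion in both directions can be sketched as follows. The names are hypothetical, and rounding up when packing is my assumption, based on the fact that every file in PKB is padded to a 2048-byte boundary.

```csharp
using System;

// Hypothetical helper for PKH size fields.
public static class PkbBlocks
{
    public const int BlockSize = 0x800; // 2048 bytes

    // Size stored in PKH (in blocks) -> size occupied in PKB (in bytes).
    public static long BlocksToBytes(ushort blocks)
    {
        return (long)blocks * BlockSize;
    }

    // Real file length in bytes -> block count to write back into PKH.
    // Rounds up, so a partially filled last block still counts.
    public static ushort BytesToBlocks(long bytes)
    {
        if (bytes < 0) throw new ArgumentOutOfRangeException(nameof(bytes));
        return checked((ushort)((bytes + BlockSize - 1) / BlockSize));
    }
}
```

Because the size field is 16 bits wide, a single file can occupy at most 0xFFFF blocks, a little under 128 MB.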
PKH example

Having studied this structure, I quickly knocked together an unpacker / packer and moved on to the containers that store the text.
Text containers
As always, the Japanese developers did odd things in their game. After a long study of the game's containers, three sources of game text were identified:
- BDAT container - stores assorted data and strings, mostly system ones (menus, trade, settings).
- SB container - stores scripts and the strings of conversations with residents.
- REV container - stores the data and strings used in cutscenes.
The developers protected their strings very thoroughly, which we did not like at all.
Only the strings are encrypted in each container, and that would not be a problem if a single encryption algorithm were used. But alas, a separate encryption algorithm was developed for each container, which created a lot of problems for us.
In this article I will only cover the BDAT container and its encryption algorithm. I will not say anything about encryption in the SB container, and I cannot say anything about encryption in the REV container, because it is still being cracked.
BDAT Container

The very first container I cracked was BDAT. At a quick glance it was hard to tell that it stores text at all. But we weren't born yesterday, so we went straight to Google this format. Some information on the container's structure was found on a foreign forum, along with proof that text is stored there. Even a tool that extracts it was found, but for some reason it did not accept my files. After searching through foreign forums I realized that their version of the game contains plain text, whereas I saw nothing of the kind in my files. Streams of information and various guesses rushed through my head, and only one turned out to be true: the developers had encrypted the strings. Only one thing remained, to figure out how.
After some manipulation I had a memory dump with the decrypted BDAT and the original file in hand, and the analysis began. After spending a lot of time comparing the files, I still could not figure out the encryption. I saw no patterns, and there was only one way out: debugging!
Unfortunately, Dolphin has a lousy debugger (or I was simply spoiled by the PCSX debugger, which has every debugging function you could want). I needed to find out where in memory BDAT gets decrypted and set a breakpoint there, but Dolphin could only break on an instruction at a given address and could not break on reads or writes to RAM, which became a problem. I went looking for a Dolphin build with extra debugging functions and found one: Dolphin DebugFast based on version 4, which added just one feature, breakpoints on RAM reads and writes. Exactly what I need, I thought, and carried on hacking.
Having found the region of memory with the data I needed, I set a breakpoint and began to study how the game decrypts its BDAT. Everything turned out to be simple and interesting at the same time. BDAT contains a 2-byte key: the first byte is loaded into register R5 and the second into R0. There is also a Boolean variable, which is set to 1 (true) at the start of decryption.
If the Boolean variable is 1, the byte is decrypted using register R5; if it is 0, the byte is decrypted using register R0.
The encryption is based on a simple XOR. Each register is initialized to 0xFF minus its key byte, and then every byte is decrypted as follows:
- Decrypted byte = Encrypted byte ^ R (5 or 0)
- R (5 or 0) = (Encrypted byte + R (5 or 0)) & 0xFF
- Flip the Boolean to the opposite value
C# code:
public static void BDAT_DecryptPart(int offset, int size, ushort key, MemoryStream data)
{
    data.Position = offset;
    // Never run past the end of the stream
    int endOffset = offset + size;
    if (endOffset > data.Length)
        endOffset = (int)data.Length;
    byte[] buffer = data.GetBuffer();
    bool reg = true; // true: use R5, false: use R0
    byte _r0 = (byte)(0xFF - (key & 0xFF));        // inverted low key byte
    byte _r5 = (byte)(0xFF - ((key >> 8) & 0xFF)); // inverted high key byte
    while (offset < endOffset)
    {
        byte inByte = buffer[offset];
        if (reg)
        {
            buffer[offset] = (byte)(inByte ^ _r5);
            // The register is advanced by the encrypted byte
            _r5 = (byte)((_r5 + inByte) & 0xFF);
        }
        else
        {
            buffer[offset] = (byte)(inByte ^ _r0);
            _r0 = (byte)((_r0 + inByte) & 0xFF);
        }
        reg = !reg;
        offset += 1;
    }
}
The encryption is designed very cleverly: each byte depends on the previous ones, and with alternation on top of that. Brilliant! Moreover, decryption costs almost no resources, yet it is impossible to grasp the algorithm without debugging.
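To write translated text back, the operation has to be reversed. The sketch below is my reconstruction of the inverse derived from the decryption steps, not code taken from the game: since each register is advanced by the encrypted byte, the encryptor must advance it by the byte it has just produced. The class name BdatCipher is hypothetical.

```csharp
using System;

// Hypothetical array-based version of the BDAT cipher.
public static class BdatCipher
{
    public static void Decrypt(byte[] data, int offset, int size, ushort key)
    {
        Transform(data, offset, size, key, false);
    }

    public static void Encrypt(byte[] data, int offset, int size, ushort key)
    {
        Transform(data, offset, size, key, true);
    }

    static void Transform(byte[] data, int offset, int size, ushort key, bool encrypt)
    {
        int end = Math.Min(offset + size, data.Length);
        byte r5 = (byte)(0xFF - ((key >> 8) & 0xFF)); // first key byte, inverted
        byte r0 = (byte)(0xFF - (key & 0xFF));        // second key byte, inverted
        bool useR5 = true;
        for (int i = offset; i < end; i++)
        {
            byte r = useR5 ? r5 : r0;
            byte input = data[i];
            byte output = (byte)(input ^ r);
            data[i] = output;
            // The register always advances by the ENCRYPTED byte:
            // the input when decrypting, the output when encrypting.
            byte cipherByte = encrypt ? output : input;
            byte next = (byte)((r + cipherByte) & 0xFF);
            if (useR5) r5 = next; else r0 = next;
            useR5 = !useR5;
        }
    }
}
```

Encrypting a buffer and then decrypting it with the same key should give back the original bytes, which is a convenient self-check before repacking a BDAT.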
Having finished with the encryption, I started on the structure of BDAT itself. After decrypting the string data, I noticed some names at the beginning of the file that looked like the names of some blocks.
Example
Encrypted block at 0x2C - 0x66.


But I put off the analysis of this block and decided to deal with the overall structure first. After a painstaking analysis, it turned out that the Header takes only 0x20 bytes; its structure is described below.
I will not go into how I worked all this out and will simply tell you what each of these bytes means.
class header
{
    public uint magic;
    public byte mode;
    public byte unk;
    public ushort offsetToNameBlock;
    public ushort sizeTableStruct;
    public ushort unkTableOffset;
    public ushort unk2;
    public ushort offsetToMainData;
    public ushort countEntryMain;
    public ushort unk3;
    public ushort unk4;
    public ushort cryptKey;
    public uint offsetToStringBlock;
    public uint sizeStringBlock;
    ...
}
- magic - constant, always equal to the ANSI string BDAT
- mode - 1: no encryption, 3: encryption is used
- unk - unknown, as the name suggests, but this byte is always zero
- offsetToNameBlock - pointer to the encrypted block with block names
- sizeTableStruct - the size of one block with all the data
- unkTableOffset - pointer to a table that I could not fully parse
- unk2 - unknown, but always 0x3D
- offsetToMainData - pointer to the block containing all the data
- countEntryMain - the number of blocks at the offsetToMainData pointer (the size of the MainData block can be calculated as sizeTableStruct * countEntryMain)
- unk3 - unknown, always 0x01
- unk4 - unknown, always 0x02
- cryptKey - 2-byte decryption key
- offsetToStringBlock - pointer to the block with text
- sizeStringBlock - size of the block with text (equal to 0 if there is no text)
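The field list can be turned into a small parser. This is a sketch under one assumption: the fields lie on disk in the order they are declared in the class, which gives the fixed offsets used below and adds up to exactly 0x20 bytes. The name BdatHeader and its members are hypothetical.

```csharp
using System;

// Hypothetical parser for the 0x20-byte BDAT header (big-endian).
public sealed class BdatHeader
{
    public const int Size = 0x20;
    public const uint ExpectedMagic = 0x42444154; // "BDAT"

    public byte Mode;
    public ushort OffsetToNameBlock;
    public ushort SizeTableStruct;
    public ushort OffsetToMainData;
    public ushort CountEntryMain;
    public ushort CryptKey;
    public uint OffsetToStringBlock;
    public uint SizeStringBlock;

    public bool IsEncrypted => Mode == 3;
    public int MainDataSize => SizeTableStruct * CountEntryMain;

    static ushort U16(byte[] d, int o) => (ushort)((d[o] << 8) | d[o + 1]);
    static uint U32(byte[] d, int o) => ((uint)U16(d, o) << 16) | U16(d, o + 2);

    public static BdatHeader Parse(byte[] d)
    {
        if (d == null || d.Length < Size)
            throw new ArgumentException("A BDAT header needs at least 0x20 bytes.");
        if (U32(d, 0x00) != ExpectedMagic)
            throw new ArgumentException("Magic is not BDAT.");
        return new BdatHeader
        {
            Mode = d[0x04],                    // 0x05: unk, always zero
            OffsetToNameBlock = U16(d, 0x06),
            SizeTableStruct = U16(d, 0x08),    // 0x0A: unkTableOffset, 0x0C: unk2
            OffsetToMainData = U16(d, 0x0E),
            CountEntryMain = U16(d, 0x10),     // 0x12: unk3, 0x14: unk4
            CryptKey = U16(d, 0x16),
            OffsetToStringBlock = U32(d, 0x18),
            SizeStringBlock = U32(d, 0x1C),
        };
    }
}
```

The unknown fields are skipped rather than stored, since nothing else in this sketch needs them.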
The data between the Header and offsetToNameBlock was unknown at first; as it turned out, it describes the blocks in MainData and has the following structure:
class typeStruct
{
    public byte unk;
    public byte type;
    public ushort idx;
    ...
}
- unk - unknown
- type - data type
- idx - pointer within MainData (the exact pointer is calculated like this: offsetToMainData + (IndexStructure * sizeTableStruct) + idx)
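The pointer formula can be written down directly. In this sketch recordIndex plays the role of IndexStructure, that is, the number of the block inside MainData; the class name and the sample numbers are hypothetical.

```csharp
// Hypothetical helper: absolute file offset of one field of one block.
public static class BdatAddressing
{
    public static int FieldAddress(int offsetToMainData, int sizeTableStruct,
                                   int recordIndex, int idx)
    {
        // Skip whole blocks first, then step to the field inside the block.
        return offsetToMainData + recordIndex * sizeTableStruct + idx;
    }
}
```

For example, with MainData at 0x80 and blocks of 0x10 bytes, the field with idx 4 in block number 2 sits at 0x80 + 2 * 0x10 + 4 = 0xA4.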
That leaves the last block, the one at offsetToNameBlock. It has the following structure:
class nameBlock
{
    public string bdatName;
    public nameBlockEntry[] nameEntry;

    public nameBlock(StreamFunctionAdd sfa, int countName)
    {
        bdatName = sfa.ReadAnsiStringStopByte();
        sfa.SeekValue(2);
        nameEntry = new nameBlockEntry[countName];
        for (int i = 0; i < countName; i++)
        {
            nameEntry[i] = new nameBlockEntry(sfa);
        }
    }
}
class nameBlockEntry
{
    public ushort offsetToStructType;
    public ushort unk;
    public string name;
    public typeStruct type;

    public nameBlockEntry(StreamFunctionAdd sfa)
    {
        offsetToStructType = sfa.ReadUInt16();
        unk = sfa.ReadUInt16();
        name = sfa.ReadAnsiStringStopByte();
        type = new typeStruct(sfa, offsetToStructType);
        sfa.SeekValue(2);
    }
}
I want to single out only the variable countName, which is not found anywhere in the Header but can be calculated by taking the pointer to NameBlock, subtracting 0x20 and dividing the result by 4. Let me explain why: the Header ends at 0x20, NameBlock starts some way after the Header, and, as we know, right after the Header comes the information about the structure of the blocks in MainData, which takes 4 bytes per structure. So, to find the number of such structures, you take the size of the structure information alone and divide it by the size of one structure, that is, 4.
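As a sketch, with hypothetical names, the calculation looks like this. It assumes nothing but typeStruct records lies between the Header and NameBlock.

```csharp
// Hypothetical helper: number of typeStruct records (and names) in a BDAT.
public static class BdatNameCount
{
    public const int HeaderSize = 0x20;    // the Header ends here
    public const int TypeStructSize = 4;   // unk + type + idx

    public static int CountNames(int offsetToNameBlock)
    {
        return (offsetToNameBlock - HeaderSize) / TypeStructSize;
    }
}
```

For a name block starting at 0x2C this gives (0x2C - 0x20) / 4 = 3 structures.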
At first glance the structure seems complex, so I will try to explain it another way:
There is a block where all the data is stored, MainData. It is divided into several smaller blocks; their number is given by the variable countEntryMain, and the size of one such block by the variable sizeTableStruct. What data is stored in one such block is described by typeStruct records, of which there can be one or several. For each typeStruct there is a name, which is stored in nameBlockEntry.
That's all: BDAT was taken apart, and a tool to extract / replace the text was knocked together and works successfully.
Example of strings extracted from BDAT

Conclusion
In this article I tried to describe how I hacked one of the legendary Wii games and to show you how Japanese developers keep doing everything they can so that nobody rummages through their files.
Perhaps there will be a follow-up on the other formats in this game, but that is not certain. If you liked my article, I'll tell you how we translated Resident Evil Remake and Resident Evil Zero.
Thank you for your time!
P.S. This is my first article on such a topic, so please don't throw slippers at me; it is better to point out mistakes right away. If I failed to cover or explain something important, please say so, so that such mistakes do not happen again.