Post by zatarita on Nov 25, 2021 17:39:55 GMT 10
Hello!
Today I would like to share what I have learned about MDT files while writing a script to decompile them. This is a technical specification mapping out the binary file.
Overview
The MDT file contains the text seen throughout the game for each of the supported languages (Japanese, English, French, German, Italian, and Spanish). To a computer, "text" is just a set of symbols mapped to numbers; this mapping is called an encoding. There are many different standards for text encoding; you've likely heard of ASCII, UTF-8, or UTF-16. RE4 uses its own internal mapping tables instead. In fact, it has two separate tables: one for the Latin-based languages and one for Japanese. The Japanese table contains katakana, hiragana, and kanji, which makes it rather large.
Each MDT file can be in one of two configurations.
1) The MDT contains the text for all 6 supported languages.
2) The MDT contains text only for the language version the game is built for.
The header gives us enough information to determine which configuration we're dealing with. Most of the stage MDTs seem to contain all the languages; the other MDTs tend to vary.
MDT files contain more than just the text. Special characters can modify the flow, appearance, and interaction of the text: changing the font color, referencing character names and item names, offering options to choose from, and so on. Combining these control characters makes for more interesting and interactive text.
The File Header:
The file header of an MDT file varies depending on the configuration. If all languages are present, the first 4 bytes of the file hold the number 6. This tells us how many offsets to expect. If this number is anything other than 6, the MDT file is a single-language file.
On the left we have a file that contains all 6 languages.
blue = offset to Japanese
pink = offset to English
green = offset to French
orange = offset to German
purple = offset to Italian
yellow = offset to Spanish
These offsets point to the metadata for the chunk that contains text for that language.
On the right we have a file that only contains text for the version the game is built for (in my case, English). This kind of file goes straight into the chunk header for the data.
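The header check described above can be sketched in a few lines of Python. This is only a sketch under assumptions the post does not confirm: the function name `read_chunk_offsets` is made up, the six offsets are assumed to be 4 bytes each and stored in the order listed above, and the byte order is assumed to be little-endian (pass ">" if your version of the game turns out to be big-endian).

```python
import struct

# Language order as listed in the post; assumed to match the order in the file.
LANGUAGES = ["Japanese", "English", "French", "German", "Italian", "Spanish"]

def read_chunk_offsets(data: bytes, byteorder: str = "<") -> dict:
    """Return {language: chunk offset} for an MDT file.

    byteorder is "<" (little-endian, an assumption) or ">" (big-endian).
    A single-language file has no offset table, so its one chunk starts at 0.
    """
    (first,) = struct.unpack_from(byteorder + "I", data, 0)
    if first != 6:
        # Anything other than 6 means the chunk header starts right away.
        return {"single": 0}
    # Six 4-byte offsets follow the count (the 4-byte width is an assumption).
    offsets = struct.unpack_from(byteorder + "6I", data, 4)
    return dict(zip(LANGUAGES, offsets))
```

Each returned offset is where that language's chunk header begins.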
The Chunk Header
The chunk header sits at the beginning of each language's offset (or at the beginning of the file for a single-language MDT).
The first 4 bytes aren't important; the second 4 bytes give the number of strings in the chunk. Following this is an array of offsets to the beginning of each string. These offsets are relative to the chunk, so you'll need to add the chunk offset gathered from the file header to get the actual offset of each string.
In our case the string count is 5, and the offsets are 0x1c, 0x112, 0x138, 0x19c, and 0x23c.
Then we have the strings themselves.
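As a rough sketch of reading the chunk header, the hypothetical helper below skips the first 4 bytes, reads the string count, and converts each relative offset into an absolute one. The 4-byte offset width and little-endian byte order are my assumptions. The example numbers above are at least consistent with 4-byte offsets: 8 bytes of header plus 5 offsets of 4 bytes is 28 (0x1c), which is exactly where the first string starts.

```python
import struct

def read_string_offsets(data: bytes, chunk_offset: int = 0, byteorder: str = "<") -> list:
    """Return the absolute file offset of every string in one chunk."""
    # Bytes 0-3 of the chunk are skipped (purpose unknown); bytes 4-7 are the count.
    (count,) = struct.unpack_from(byteorder + "I", data, chunk_offset + 4)
    # The offset array follows; each entry is relative to the chunk start.
    relative = struct.unpack_from(f"{byteorder}{count}I", data, chunk_offset + 8)
    return [chunk_offset + rel for rel in relative]
```

For a single-language file, leave `chunk_offset` at 0; for a 6-language file, pass the offset taken from the file header.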
String Encoding
Each string starts with a 0 and ends with a 1.
In between these two points we have 2-byte numbers. Each 2-byte number is looked up in a table that determines which text symbol it represents (the tables are included at the end). On top of this, there are special characters that modify how the text is presented on screen. Some of these special characters use the next 2-byte value as an argument. This will make more sense with examples:
Special Characters
0x2
Insert a line from the MDT at this position. 0xE returns control to the original string once the inserted line is done.
For example, this one merged
"It won't budge! They must be pressing against it from the other side." with
"This old truck's blocking the trail. ... "
When the 0x2 character is encountered, the next 2 bytes in the file are a line number, not another character.
So in this case it's 0x2 -> line # 0xC
It's important to note that if you merge two lines together, you will probably want to add the special character 0xE to the end of the second string. If you don't, and the second line ends the dialog, the remaining text from the first string is trimmed.
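To make the merge behaviour concrete, here is a small model of it in Python. It is my own sketch of the behaviour described above, not the game's actual logic. It assumes each line has already been split into a list of 2-byte values, that 0x6, 0x9, and 0x12 carry one argument (as described further down), and that a 0xE met outside an inserted line is simply ignored. The function name `expand` and the character codes used in the usage note are made up.

```python
START, END, INSERT, RETURN = 0x0, 0x1, 0x2, 0xE
ONE_ARG = (0x6, 0x9, 0x12)  # codes followed by one 2-byte argument

def expand(lines: list, index: int) -> list:
    """Flatten line `index`, splicing in every line referenced by 0x2."""
    out = []

    def walk(i: int, nested: bool) -> bool:
        """Return True if control should return to the calling line."""
        codes = lines[i]
        pos = 1  # skip the leading 0
        while pos < len(codes):
            code = codes[pos]
            if code == INSERT:
                # The next value is a line number, not a character.
                if not walk(codes[pos + 1], True):
                    return False  # inserted line ended the dialog
                pos += 2
            elif code == RETURN:
                if nested:
                    return True   # hand control back to the original line
                pos += 1          # assumption: ignored at the top level
            elif code == END:
                return False      # dialog ends; nothing after this is shown
            elif code in ONE_ARG:
                out.extend(codes[pos:pos + 2])  # keep the code and its argument
                pos += 2
            else:
                out.append(code)
                pos += 1
        return False

    walk(index, False)
    return out
```

With `lines = [[0, 100, 2, 1, 101, 1], [0, 200, 0xE, 1]]`, `expand(lines, 0)` gives `[100, 200, 101]`. If line 1 lacked the 0xE, the result would stop after 200, which is the trimming described above. The sketch does not guard against a line that inserts itself.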
0x3
The newline character. Nothing fancy here. Typically you only have room for one new line per "page".
0x4
New page character. This will reset the cursor to the first line, and erase existing text. Not uncommon to see this paired up with 0x8.
0x5
Unsure about this one; however, it's only seen with the "end of chapters x-x" messages.
0x6
Font color. The 2 bytes immediately after the 0x6 correspond to the color the font will be.
The font color can be one of the following colors:
White: 0-4
Grey: 5
Green: 7
Yellow: 10
Red: 11
Just the shadow of the text: 12
In our case we set the color to green before calling the string merge character, which made the merged text green. Afterwards we set the color back to white, though we don't see it here because it's the end of the page.
This is just to show that it is possible to use multiple special characters together.
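The color list above can be turned into a small lookup. This is just a sketch: `color_name` is a made-up helper, and it assumes the values in the list are decimal, since they are written without a 0x prefix. Values not in the list (6, 8, 9, and anything above 12) are reported as unknown rather than guessed.

```python
# Values taken from the list above, assumed to be decimal.
COLOR_NAMES = {5: "grey", 7: "green", 10: "yellow", 11: "red", 12: "shadow only"}

def color_name(value: int) -> str:
    """Name the font color selected by the 2-byte argument after 0x6."""
    if 0 <= value <= 4:
        return "white"  # 0 through 4 all show up as white
    return COLOR_NAMES.get(value, f"unknown ({value})")
```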
0x7
Options for selections. I'm unsure exactly how the result gets calculated; however, text that comes after a 0x7 is an option in a question dialog.
0x8
Pause until input. This pauses execution until you press continue.
0x9
Sleep. This pauses the printing of a string for a number of seconds, given by the 2 bytes that come immediately after the 0x9. This is seen in walkie-talkie cutscenes.
0xA
Item amount. When you're picking up an item and the item has a quantity attached, this is a placeholder for that quantity, e.g. TMP ammo (8) or 1000ptas. I believe this only functions correctly when it is triggered by an event that contains a quantity of some sort.
0x10
File placeholder. This is replaced with the name of a document or file you pick up that contains text.
0x11
Item placeholder. This is the name of an item as determined by the item list in core.dat 17.mdt.
0x12
Character name placeholder. This is the name of a character in the game. This is almost exclusively used for walkie-talkie nameplates. The 2 bytes immediately after the 0x12 determine the character name.
Legal values are:
0: Leon
1: Hunnigan
2: Salazar
3: Saddler
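Putting the special characters together, a string can be split into tokens roughly like this. It is a sketch under assumptions: little-endian 2-byte values, and only 0x2, 0x6, 0x9, and 0x12 taking an argument, since those are the only ones described above as using the next 2 bytes. Whether 0x10 and 0x11 also take one is not something I know. The token names and the `tokenize` function are made up for illustration.

```python
import struct

# Codes followed by one 2-byte argument, per the descriptions above.
CODES_WITH_ARG = {0x2: "insert_line", 0x6: "color", 0x9: "sleep", 0x12: "char_name"}
# Codes treated as taking no argument (an assumption for 0x10 and 0x11).
CODES_NO_ARG = {0x3: "newline", 0x4: "new_page", 0x5: "unknown_5", 0x7: "option",
                0x8: "wait_input", 0xA: "item_amount", 0xE: "return",
                0x10: "file_name", 0x11: "item_name"}
CHARACTER_NAMES = {0: "Leon", 1: "Hunnigan", 2: "Salazar", 3: "Saddler"}

def tokenize(data: bytes, offset: int, byteorder: str = "<") -> list:
    """Split one string into (kind, value) tokens, stopping at the closing 1."""
    fmt = byteorder + "H"
    (code,) = struct.unpack_from(fmt, data, offset)
    if code != 0:
        raise ValueError("string does not start with 0")
    pos = offset + 2
    tokens = []
    while True:
        (code,) = struct.unpack_from(fmt, data, pos)
        pos += 2
        if code == 1:
            return tokens  # end of string
        if code in CODES_WITH_ARG:
            # Consume the argument here so a value of 0 or 1 is not
            # mistaken for the start or end marker.
            (arg,) = struct.unpack_from(fmt, data, pos)
            pos += 2
            tokens.append((CODES_WITH_ARG[code], arg))
        elif code in CODES_NO_ARG:
            tokens.append((CODES_NO_ARG[code], None))
        else:
            tokens.append(("char", code))  # look this up in the string tables
```

Reading the argument together with its code matters: a character name of 1 (Hunnigan) would otherwise look like the end of the string.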
String Tables
Lastly, all we need is the table that maps the characters to the numbers.
For the Latin-based languages I have a complete set. For the Japanese characters I think I only have about half, but I will update the table as I learn more.
For sanity I included a zip file with text documents that contain them since there are a few hundred characters.
Download