Skyrim Mod:String Table File Format

Each plugin/master file that contains an lstring (lookup string) datatype has an accompanying set of string tables in Data\Strings. The naming convention is the plugin/master filename, then an underscore, then the language, then the extension. For example, the English tables for Skyrim.esm use 'Skyrim_English' as the base filename. There are three files with different extensions (DLSTRINGS, ILSTRINGS, STRINGS). The apparent significance is that DLSTRINGS contains journal and book entries, ILSTRINGS contains subtitled conversations, and STRINGS contains general strings such as item names. The three files share the same format, except that STRINGS stores its string data slightly differently.
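As a rough illustration of the naming convention, the sketch below builds the three expected filenames for a plugin. The function name `string_table_names` and the constant `EXTENSIONS` are hypothetical, chosen only for this example, and the letter case of the extensions is taken from the text above.

```python
from pathlib import PureWindowsPath

# The three string table extensions named in the text above.
EXTENSIONS = ("STRINGS", "DLSTRINGS", "ILSTRINGS")

def string_table_names(plugin_filename, language):
    """Return the expected string table filenames for a plugin/master file.

    Hypothetical helper: it joins the plugin name (without its extension),
    an underscore, the language, and each of the three extensions.
    """
    base = PureWindowsPath(plugin_filename).stem
    return [f"{base}_{language}.{ext}" for ext in EXTENSIONS]
```

For example, `string_table_names("Skyrim.esm", "English")` yields names that all start with `Skyrim_English`.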

The string files are simple uncompressed data. Each begins with an 8-byte header that contains the count of strings and the total size of the string data at the end of the file. The header is followed by a series of 8-byte structs, one per string, each consisting of the string ID and an offset to the string relative to the beginning of the string data.

The string data has two formats that differ only slightly: the .STRINGS file has simple null-terminated (C-style) strings, while the .ILSTRINGS and .DLSTRINGS files also have null-terminated strings but precede each one with a uint32 that declares its length.

Header

Type/Size	Info
uint32	Number of entries in the string table.
uint32	Size of string data that follows after header and directory.
Directory Entry[count]	Directory (see below).
uint8[dataSize]	Raw data.
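The header layout can be read with Python's `struct` module, as in the minimal sketch below. It assumes little-endian byte order, which this page does not state, and the function name `read_header` is hypothetical. The start of the raw data is derived from the sizes given above: 8 bytes of header plus 8 bytes per directory entry.

```python
import struct

def read_header(data):
    """Parse the 8-byte header of a string table held in a bytes object.

    Assumption: values are little-endian uint32 (not stated in the text).
    Returns (count, data_size, data_start), where data_start is the file
    offset at which the raw string data begins.
    """
    count, data_size = struct.unpack_from("<II", data, 0)
    # Header is 8 bytes; each directory entry is another 8 bytes.
    data_start = 8 + 8 * count
    return count, data_size, data_start
```

A simple sanity check on a real file would be that `data_start + data_size` equals the file length.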

Directory Entry

Directory entries are simple 8-byte structs that consist of two uint32 values: the first is the ID used by mod files to refer to the string, and the second is the offset from the beginning of the string data to the string itself. These entries are not required to be sequential. While each ID is unique, offsets are not (e.g. two different IDs can point to the same string).

Type/Size	Info
uint32	String ID
uint32	Offset (relative to beginning of data) to the string. These entries are not required to be sequential. See String Data below.
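A minimal sketch of reading the directory follows. It assumes little-endian byte order (not stated on this page) and that the directory starts immediately after the 8-byte header; `read_directory` is a hypothetical name. Entries are returned as a list of pairs rather than a dictionary keyed by offset, since several IDs may share one offset.

```python
import struct

def read_directory(data, count):
    """Return a list of (string_id, offset) pairs from a string table.

    Assumptions: little-endian uint32 values, and the directory begins
    at byte 8, directly after the header. Offsets are relative to the
    start of the string data, not to the start of the file.
    """
    entries = []
    for i in range(count):
        string_id, offset = struct.unpack_from("<II", data, 8 + 8 * i)
        entries.append((string_id, offset))
    return entries
```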

String Data

There are 2 slightly different types of string data, depending on the file extension.

.strings

Null-terminated C-style string.

Type/Size	Info
zstring	Null-terminated string data.
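Reading one such string amounts to scanning from the entry's offset to the next null byte. The sketch below assumes the whole file is already in memory as a bytes object and that `data_start` (the file offset where the string data begins) has been computed from the header; the function name `read_zstring` is hypothetical.

```python
def read_zstring(data, data_start, offset):
    """Return the raw bytes of a null-terminated string, without the null.

    data_start is the file offset of the string data block and offset is
    the value from the directory entry. bytes.index raises ValueError if
    no terminator is found.
    """
    start = data_start + offset
    end = data.index(b"\x00", start)
    return data[start:end]
```

The result is left as raw bytes because the text encoding depends on the localization.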

.dlstrings, .ilstrings

Also a null-terminated C-style string, but preceded by a uint32 that specifies its length. The length includes the null terminator.

Type/Size	Info
uint32	Length of following string, including null-terminator.
uint8[length]	Null-terminated string data.
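The length-prefixed form can be read without scanning for the terminator, as in this minimal sketch. It assumes little-endian byte order (not stated on this page), and `read_length_prefixed` is a hypothetical name. Because the stored length includes the null terminator, one trailing null byte is dropped from the result.

```python
import struct

def read_length_prefixed(data, data_start, offset):
    """Return the raw bytes of a length-prefixed string, without the null.

    Assumption: the uint32 length is little-endian. data_start is the
    file offset of the string data block and offset comes from the
    directory entry.
    """
    start = data_start + offset
    (length,) = struct.unpack_from("<I", data, start)
    raw = data[start + 4 : start + 4 + length]
    # The declared length counts the terminator, so strip exactly one null.
    return raw[:-1] if raw.endswith(b"\x00") else raw
```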

String Encodings

The string encodings supported by Skyrim are decided by the "fonts_en.swf" file in the "Skyrim - Interface.bsa", which varies between languages. The following table gives the known supported localizations of the "fonts_en.swf" file (which all have the same filename - the "_en" substring is confusingly not indicative of target language) and corresponding encodings. Blank boxes are unknown.

Localization	Primary Encoding	Secondary Encoding
English	UTF-8	Windows-1252
French	UTF-8	Windows-1252
German	UTF-8	Windows-1252
Italian	UTF-8	Windows-1252
Spanish	UTF-8	Windows-1252
Polish	UTF-8	Custom
Czech		Custom
Russian	UTF-8	Windows-1251
Japanese	UTF-8

The official translations all use the secondary encoding given in the table above, apart from Japanese. Polish and Czech use a custom Windows-1250-based encoding with the following character set (note that original ů and ý characters are not used):

ąáłśźżćščéęťěíůď
ýńňóžőö÷ř?úűü?ţ˙

Skyrim first attempts to interpret a string using its primary encoding. If the string contains byte sequences that are invalid in that encoding, the secondary encoding is used instead. It is unknown what happens if the string also contains invalid bytes when interpreted using the secondary encoding (e.g. by including unused bytes).
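A tool reading these files could mimic the fallback described above along the following lines. This is only a sketch of the described behaviour, not the game's actual implementation: `decode_string` is a hypothetical name, Python's `cp1252` and `cp1251` codecs stand in for Windows-1252 and Windows-1251, and the custom Polish/Czech encoding is not covered. Since the game's handling of bytes that are invalid in both encodings is unknown, replacing them with U+FFFD here is purely a choice made for this example.

```python
def decode_string(raw, secondary="cp1252"):
    """Decode raw string bytes, trying UTF-8 first and then a fallback.

    Assumption: the secondary codec name is supplied by the caller
    according to the localization (e.g. "cp1251" for Russian).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in the secondary encoding become U+FFFD here;
        # what the game does in that case is not known.
        return raw.decode(secondary, errors="replace")
```

Pure ASCII text decodes identically under every encoding in the table, so the fallback only matters for strings containing non-ASCII characters.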

Note that interpretation is done after alias lookup and substitution, so if the string used for an alias is in a different encoding from the string containing the alias, the combined string will not be displayed correctly. Note also that each localization's fonts have incomplete character support, e.g. the English localization's font cannot display Cyrillic characters even when strings are encoded in UTF-8, nor can it display some of the lesser-used characters available in Windows-1252.

There also appears to be a lack of UTF-8 support in certain circumstances, thus far reported for text in scripts. In these circumstances it appears that the secondary encoding is used, but this issue has not yet been investigated.
