String Encoding

Different NBT formats use different methods to encode strings.

The most common string encoding scheme is UTF-8.

Bedrock Edition uses UTF-8 to encode strings but has been known to store non-UTF-8 byte sequences in TAG_String fields.

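Java Edition uses Modified UTF-8 (MUTF-8), the variant of UTF-8 used by the Java programming language. It differs from standard UTF-8 in two ways: the null character U+0000 is encoded as the two bytes 0xC0 0x80 rather than a single zero byte, and characters outside the Basic Multilingual Plane are encoded as a UTF-16 surrogate pair, with each half taking three bytes, rather than as a single four-byte sequence.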

To handle these different encoding schemes, an encoder or decoder function can be specified when reading or writing binary NBT data.
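As a rough illustration of why the decoder is pluggable, the sketch below reads a single TAG_String payload (an unsigned 16-bit length followed by that many bytes) and hands the raw bytes to whatever decoder callable it is given. The function `read_tag_string_payload` and its parameters are hypothetical and are not part of amulet_nbt; the point is only that the byte layout is fixed while the byte-to-text step can be swapped.

```python
from __future__ import annotations

import struct
from typing import Callable

# Hypothetical alias: any function that turns raw bytes into a str.
StringDecoder = Callable[[bytes], str]


def read_tag_string_payload(
    data: bytes,
    offset: int,
    decoder: StringDecoder,
    little_endian: bool = False,
) -> tuple[str, int]:
    """Hypothetical helper: read one TAG_String payload at `offset`.

    The payload is an unsigned 16-bit length followed by that many bytes.
    Java Edition files store the length big-endian; the little-endian
    format used by Bedrock Edition files stores it little-endian.
    Returns the decoded string and the offset just past the payload.
    """
    fmt = "<H" if little_endian else ">H"
    (length,) = struct.unpack_from(fmt, data, offset)
    start = offset + 2
    raw = data[start:start + length]
    if len(raw) != length:
        raise ValueError("truncated TAG_String payload")
    # Converting bytes to text is delegated entirely to the caller's decoder.
    return decoder(raw), start + length
```

For example, `read_tag_string_payload(b"\x00\x05hello", 0, lambda b: b.decode("utf-8"))` returns `("hello", 7)`; passing a different decoder changes only how the five bytes are interpreted.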

The following functions are provided for passing to the read and write functions.

amulet_nbt.utf8_decoder(b)

Standard UTF-8 decoder

amulet_nbt.utf8_encoder(s)

Standard UTF-8 encoder

amulet_nbt.utf8_escape_decoder(b)

UTF-8 decoder that replaces each byte that is not valid UTF-8 with an escape sequence of the form ␛xFF

amulet_nbt.utf8_escape_encoder(s)

UTF-8 encoder that converts ␛x[0-9a-fA-F]{2} back to individual bytes
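The escape scheme described above can be sketched in plain Python, which shows how invalid bytes (such as those Bedrock Edition sometimes stores) survive a decode/encode round trip. This is a minimal sketch under stated assumptions, not amulet_nbt's implementation: the function names are hypothetical, it relies on Python's built-in `surrogateescape` error handler, and the upper-case hex digits are an assumption based on the `␛xFF` example.

```python
import re

# Hypothetical sketch, not amulet_nbt's implementation.
ESCAPE_CHAR = "\u241b"  # U+241B SYMBOL FOR ESCAPE, rendered as ␛
_ESCAPE_RE = re.compile(ESCAPE_CHAR + "x([0-9a-fA-F]{2})")


def escape_decode_sketch(b: bytes) -> str:
    # "surrogateescape" maps every byte that is not valid UTF-8 to a lone
    # surrogate in U+DC80..U+DCFF instead of raising UnicodeDecodeError.
    text = b.decode("utf-8", "surrogateescape")
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Recover the original byte and write it as ␛xHH
            # (upper-case hex is an assumption).
            parts.append(f"{ESCAPE_CHAR}x{cp - 0xDC00:02X}")
        else:
            parts.append(ch)
    return "".join(parts)


def escape_encode_sketch(s: str) -> bytes:
    out = bytearray()
    pos = 0
    for match in _ESCAPE_RE.finditer(s):
        # Encode the ordinary text before the escape sequence as UTF-8,
        out += s[pos:match.start()].encode("utf-8")
        # then emit the escaped byte itself.
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += s[pos:].encode("utf-8")
    return bytes(out)
```

Decoding `b"abc\xffdef"` with this sketch yields `"abc␛xFFdef"`, and encoding that string restores the original bytes. One consequence of any scheme of this form is that text which literally contains ␛x followed by two hex digits is turned into a raw byte on encoding.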

The mutf8 library is a third-party library (not developed by us) used to encode and decode Modified UTF-8.

mutf8.decode_modified_utf8(s)

Decodes a bytestring containing MUTF-8 as defined in section 4.4.7 of the JVM specification.

Parameters:

s – A bytes-like or buffer-like object to be converted.

Returns:

A unicode representation of the original string.
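To show what decoding MUTF-8 involves, here is a pure-Python sketch of the same transformation. It is a hypothetical illustration, not the mutf8 library's code: `mutf8_decode_sketch` is an invented name, and the choice to raise `ValueError` on malformed input is an assumption. It reads one-, two- and three-byte sequences into UTF-16 code units, then joins valid surrogate pairs into single characters.

```python
def mutf8_decode_sketch(b: bytes) -> str:
    """Hypothetical pure-Python MUTF-8 decoder (illustration only)."""
    units = []  # UTF-16 code units
    i, n = 0, len(b)

    def cont(k: int) -> int:
        # Continuation bytes must look like 10xxxxxx.
        if b[k] & 0xC0 != 0x80:
            raise ValueError(f"bad continuation byte at {k}")
        return b[k] & 0x3F

    while i < n:
        x = b[i]
        if x < 0x80:  # 0xxxxxxx: one byte
            units.append(x)
            i += 1
        elif x & 0xE0 == 0xC0 and i + 1 < n:  # 110xxxxx: two bytes (covers C0 80 -> U+0000)
            units.append(((x & 0x1F) << 6) | cont(i + 1))
            i += 2
        elif x & 0xF0 == 0xE0 and i + 2 < n:  # 1110xxxx: three bytes
            units.append(((x & 0x0F) << 12) | (cont(i + 1) << 6) | cont(i + 2))
            i += 3
        else:  # four-byte forms do not exist in MUTF-8
            raise ValueError(f"invalid MUTF-8 byte at {i}")

    # Join high/low surrogate pairs into supplementary characters.
    chars = []
    j = 0
    while j < len(units):
        u = units[j]
        if 0xD800 <= u <= 0xDBFF and j + 1 < len(units) and 0xDC00 <= units[j + 1] <= 0xDFFF:
            chars.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[j + 1] - 0xDC00)))
            j += 2
        else:
            chars.append(chr(u))
            j += 1
    return "".join(chars)
```

With this sketch, `b"\xc0\x80"` decodes to `"\x00"` and the six bytes `b"\xed\xa0\xbd\xed\xb8\x80"` decode to the single character U+1F600, while a standard four-byte UTF-8 sequence is rejected.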

mutf8.encode_modified_utf8(u)

Encodes a unicode string as MUTF-8 as defined in section 4.4.7 of the JVM specification.

Parameters:

u – Unicode string to be converted.

Returns:

The encoded string as a bytes object.
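The encoding direction can be sketched the same way. This is a hypothetical pure-Python illustration of the rules in section 4.4.7 of the JVM specification, not the mutf8 library's code; `mutf8_encode_sketch` is an invented name. It differs from Python's own UTF-8 encoder only for U+0000 and for characters above U+FFFF.

```python
def mutf8_encode_sketch(s: str) -> bytes:
    """Hypothetical pure-Python MUTF-8 encoder (illustration only)."""
    out = bytearray()

    def three_bytes(unit: int) -> bytes:
        # 1110xxxx 10xxxxxx 10xxxxxx for a 16-bit code unit.
        return bytes([0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)])

    for ch in s:
        cp = ord(ch)
        if cp == 0:
            out += b"\xc0\x80"  # NUL uses the two-byte form, never a zero byte
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += three_bytes(cp)
        else:
            # Split into a UTF-16 surrogate pair and encode each half separately.
            v = cp - 0x10000
            out += three_bytes(0xD800 | (v >> 10))
            out += three_bytes(0xDC00 | (v & 0x3FF))
    return bytes(out)
```

For ordinary text such as `"café"` the result matches `"café".encode("utf-8")`, whereas U+1F600 becomes six bytes instead of the four that standard UTF-8 would produce.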