Unicode

UTF-32

In the Unicode encoding UTF-32, each character (more precisely, each code point) is encoded with exactly four bytes (32 bits). This gives it the largest memory requirement of the Unicode encodings, since UTF-8 and UTF-16 use a variable number of bytes per character and usually need fewer. In return, UTF-32 encoded files or streams are easier to handle, because every character starts at a fixed, predictable byte offset and there is no variable length to account for.

One advantage of this encoding is that a particular character can be accessed directly in memory: the character at index n starts at byte offset 4n. Determining the length of a text is just as simple, because dividing the number of bytes used by four gives the number of characters (strictly speaking, the number of code points).
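
The fixed width described above can be illustrated with a short Python sketch. It assumes the buffer holds big-endian UTF-32 without a byte order mark; the helper names code_point_count and code_point_at are made up for this example and are not part of any library.

```python
# Sample text encoded as big-endian UTF-32 without a byte order mark.
text = "Привет"
data = text.encode("utf-32-be")


def code_point_count(buf: bytes) -> int:
    """Number of code points: total bytes divided by four."""
    return len(buf) // 4


def code_point_at(buf: bytes, index: int) -> str:
    """Read the code point at `index` directly from byte offset 4 * index."""
    start = index * 4
    return chr(int.from_bytes(buf[start:start + 4], "big"))


print(code_point_count(data))   # same value as len(text)
print(code_point_at(data, 2))   # third character of the text
```

No scanning from the start of the buffer is needed, which is the practical difference from UTF-8 or UTF-16, where the offset of the n-th character depends on the characters before it.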

A key disadvantage is the larger memory requirement. Compared with texts consisting of basic Latin letters stored in UTF-7, UTF-8 or ANSI, UTF-32 needs four times as much memory. Even for other scripts such as Cyrillic or Greek, UTF-32 needs considerably more memory (twice as much as UTF-8 or UTF-16 for these two scripts), because the other encodings use four bytes only for rarely used characters.
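
The size difference can be checked with a small Python sketch. The sample strings and the helper name encoded_sizes are chosen for illustration only; the "-le" codec variants are used so that no byte order mark is added to the byte counts.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8, UTF-16 and UTF-32."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


# Five characters each: basic Latin and Greek.
latin = encoded_sizes("Hello")
greek = encoded_sizes("αβγδε")

print(latin)  # UTF-32 is four times the UTF-8 size
print(greek)  # UTF-32 is twice the UTF-8 and UTF-16 size
```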

UTF-32 can be stored in either big-endian or little-endian byte order. The byte order mark is 00 00 FE FF for big-endian storage and, correspondingly, FF FE 00 00 for little-endian storage. See Endianness and Byte Order Mark for more information.
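
As a minimal sketch of how the byte order mark might be checked in Python, the following uses the constants BOM_UTF32_BE and BOM_UTF32_LE from the standard codecs module. The function name detect_utf32_byte_order is hypothetical, and the sketch assumes the input is already known to be UTF-32.

```python
import codecs


def detect_utf32_byte_order(buf: bytes):
    """Return "big" or "little" based on a leading UTF-32 byte order mark,
    or None if the buffer does not start with one."""
    if buf.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return "big"
    if buf.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return "little"
    return None


# The letter "A" (U+0041) in both byte orders, each preceded by its mark.
big = codecs.BOM_UTF32_BE + "A".encode("utf-32-be")
little = codecs.BOM_UTF32_LE + "A".encode("utf-32-le")

print(big.hex(" "))
print(little.hex(" "))
print(detect_utf32_byte_order(big), detect_utf32_byte_order(little))
```

Without the assumption that the data is UTF-32, the little-endian mark alone is not conclusive, because its first two bytes FF FE are also the UTF-16 little-endian byte order mark.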