Unicode UTF-32
In the Unicode encoding UTF-32, every character (code point) is encoded with exactly four bytes (32 bits). This makes UTF-32 the most memory-hungry of the Unicode encodings, since the others, such as UTF-8 and UTF-16, use a variable number of bytes per character. In return, UTF-32 encoded files and streams are easier to handle, because every character sits at a fixed, predictable position and there are no variable-length sequences to decode.
One advantage of this encoding is that a particular character can be accessed directly in memory: the character at index n starts at byte offset 4 × n. Determining the length of a text is just as simple, because you only have to divide the number of bytes used (not counting a byte order mark, if present) by four to get the number of characters.
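The fixed width can be illustrated with a short Python sketch. The helper names code_point_count and code_point_at are made up for this example, and the buffer is assumed to be Big Endian UTF-32 without a byte order mark.

```python
# Sample text: Latin letters, a Greek letter and an emoji.
text = "Hello, \u03a9 \U0001F600"

# "utf-32-be" encodes as Big Endian and writes no byte order mark.
data = text.encode("utf-32-be")


def code_point_count(buf: bytes) -> int:
    """Return the number of characters in a BOM-less UTF-32 buffer."""
    if len(buf) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of four bytes")
    return len(buf) // 4


def code_point_at(buf: bytes, index: int) -> str:
    """Return the character at the given index by jumping to byte 4 * index."""
    start = index * 4
    if index < 0 or start + 4 > len(buf):
        raise IndexError("character index out of range")
    # Assumption: Big Endian byte order, matching the encoding used above.
    return chr(int.from_bytes(buf[start:start + 4], "big"))


print(code_point_count(data))   # same value as len(text)
print(code_point_at(data, 7))   # the Greek capital omega
```

No decoding of the preceding characters is needed to reach index 7, which is the point of the fixed width.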
A key disadvantage is the larger memory requirement. Compared with texts consisting of Latin letters stored in UTF-7, UTF-8 or ANSI, which need one byte per letter, UTF-32 needs four times as much memory. Even with other characters such as Cyrillic or Greek letters, UTF-32 needs considerably more memory (UTF-8, for example, stores these letters in two bytes each), because in the other encodings only rare and unusual characters take four bytes.
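The difference in size can be checked with a small Python sketch. The sample strings and the helper name encoded_sizes are chosen only for illustration, and the explicit "-be" codecs are used so that no byte order mark is counted.

```python
# Illustrative sample texts in three scripts.
samples = {
    "Latin": "Hello",
    "Greek": "αβγδε",
    "Cyrillic": "привет",
}


def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {enc: len(text.encode(enc)) for enc in encodings}


for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For the Latin sample UTF-32 takes four times the space of UTF-8; for the Greek and Cyrillic samples it still takes twice as much.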
UTF-32 can be stored in either Big Endian or Little Endian byte order. The byte order mark for Big Endian storage is 00 00 FE FF; for Little Endian it is accordingly FF FE 00 00. See Endianness and Byte Order Mark for more information.
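A minimal Python sketch of how the two byte order marks could be told apart is shown below. The function name detect_utf32_byte_order is hypothetical; the constants BOM_UTF32_BE and BOM_UTF32_LE come from Python's standard codecs module.

```python
import codecs
from typing import Optional


def detect_utf32_byte_order(buf: bytes) -> Optional[str]:
    """Return "big" or "little" if buf starts with a UTF-32 byte order mark."""
    if buf.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return "big"
    if buf.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return "little"
    return None  # no UTF-32 byte order mark found


# Build one sample per byte order: BOM followed by the letter "A".
big = codecs.BOM_UTF32_BE + "A".encode("utf-32-be")
little = codecs.BOM_UTF32_LE + "A".encode("utf-32-le")

print(detect_utf32_byte_order(big))
print(detect_utf32_byte_order(little))
```

Note that the Little Endian mark FF FE 00 00 begins with the same two bytes as the UTF-16 Little Endian mark FF FE, so a detector that also handles UTF-16 would have to test for the four-byte UTF-32 marks first.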

© Stefan Trost Media 2007-2010