Unicode

UTF-16

The encoding format UTF-16 is the successor of UCS-2, the original 16-bit Unicode encoding, and is optimized for the most commonly used characters, those of the Basic Multilingual Plane (BMP). The BMP contains the Unicode characters with code points in the range U+0000 to U+FFFF: Latin and other European scripts and their symbols, as well as the commonly used African and Asian scripts. Each character in this range is mapped directly to a single UTF-16 code unit of two bytes (16 bits). Characters outside the BMP (U+10000 and above) are encoded as two code units, a so-called surrogate pair, and therefore take four bytes.
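
The direct mapping of BMP characters to one 16-bit code unit can be checked with a few lines of Python. This is only a minimal sketch: the helper name utf16_code_units is made up for this illustration, and the sample characters are arbitrary. For contrast, the last sample lies outside the BMP and therefore comes out as two code units (a surrogate pair).

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper for illustration; not part of any library.
    """
    data = text.encode("utf-16-be")  # big endian, written without a BOM
    # Every code unit is 2 bytes, so unpack the bytes as unsigned 16-bit values.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

# BMP characters: the single code unit equals the code point.
# The last sample (U+1F600) is outside the BMP and needs two code units.
for ch in ("A", "\u00e9", "\u20ac", "\u4e2d", "\U0001F600"):
    units = utf16_code_units(ch)
    print("U+%04X -> %s" % (ord(ch), " ".join("%04X" % u for u in units)))
```

For the BMP samples the printed code unit is identical to the code point, for example U+20AC -> 20AC, while U+1F600 is printed as the pair D83D DE00.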

Thus the encoding UTF-16 is well suited for characters of this plane. For texts consisting of ASCII characters, however, it requires twice as much memory as UTF-8 or ANSI, because those encodings store an ASCII character in one byte instead of two. ANSI characters outside the ASCII range still take one byte in ANSI, but two or more bytes in UTF-8.
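
The size difference is easy to measure. The following sketch compares the encoded length of the same text in several encodings. It assumes that "ANSI" means the Windows-1252 code page (cp1252 in Python); other ANSI code pages would need a different codec name. The function name encoded_sizes is hypothetical.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "cp1252")):
    """Return a dict mapping each encoding name to the encoded size in bytes.

    Assumption: cp1252 stands in for "ANSI"; utf-16-le is used so that
    no BOM is counted.
    """
    sizes = {}
    for name in encodings:
        sizes[name] = len(text.encode(name))
    return sizes

# Pure ASCII text: UTF-16 needs twice the bytes of UTF-8 and ANSI.
print(encoded_sizes("Hello"))

# BMP text outside ASCII (Japanese): here UTF-16 is the smaller one.
# cp1252 cannot encode these characters, so it is left out.
print(encoded_sizes("\u65e5\u672c\u8a9e", encodings=("utf-8", "utf-16-le")))
```

The second call shows the other side of the trade-off: the three Japanese characters take 6 bytes in UTF-16 but 9 bytes in UTF-8.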

UTF-16 Little Endian is used as the internal representation of strings in Windows 2000/XP/2003/Vista, and it is the encoding that Windows Notepad calls "Unicode". Other operating systems, such as Mac OS X and Symbian, also use UTF-16 as their default encoding.

UTF-16 encoded text can be saved in either Big Endian or Little Endian byte order. The Byte Order Mark (BOM) is FE FF for UTF-16 Big Endian and FF FE for UTF-16 Little Endian. See Endianness and Byte Order Mark for more information.
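
A reader of UTF-16 data can use the BOM to pick the byte order. The sketch below checks the first two bytes against the two BOM values given above, using the constants from Python's standard codecs module. The function name detect_utf16_byte_order is hypothetical, and the sketch deliberately ignores other encodings: the UTF-32 Little Endian BOM (FF FE 00 00) also starts with FF FE and would be misreported here.

```python
import codecs

def detect_utf16_byte_order(data):
    """Return 'utf-16-be', 'utf-16-le', or None if data has no UTF-16 BOM.

    Hypothetical helper; only the two UTF-16 BOMs are considered.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None

# "A" stored as UTF-16 with a BOM, once in each byte order.
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")

for raw in (big, little):
    order = detect_utf16_byte_order(raw)
    # Skip the 2-byte BOM, then decode the rest in the detected order.
    print(raw.hex(), order, raw[2:].decode(order))
```

Both inputs decode to the same character; only the order of the two bytes in each code unit, and the BOM in front of them, differs.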