
UTF-16

UTF-16 is one of the oldest Unicode encoding formats and is optimized for the most commonly used characters, those of the Basic Multilingual Plane (BMP). The BMP contains the Unicode characters with code points in the range U+0000 to U+FFFF. It covers Latin and other European scripts and their symbols, as well as most African and Asian scripts. Each character in this range is mapped directly to a single UTF-16 code unit of two bytes (16 bits). Characters outside the BMP (U+10000 and above) do not fit into one code unit and are stored as a pair of code units, a so-called surrogate pair, which takes four bytes.
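The direct mapping of BMP characters to 16-bit code units can be checked with a short Python sketch. The helper name utf16_code_units and the sample characters are assumptions chosen for illustration; they are not part of any product described here.

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers."""
    # An explicit byte order ("utf-16-be") means Python adds no BOM.
    data = text.encode("utf-16-be")
    # Every code unit is exactly two bytes; read them as big-endian numbers.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Sample BMP characters: Latin letter, euro sign, CJK ideograph.
for ch in "A\u20ac\u4e2d":
    units = ", ".join(f"{u:04X}" for u in utf16_code_units(ch))
    print(f"U+{ord(ch):04X} -> {units}")
```

For each of these characters the single code unit has the same value as the code point itself.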

UTF-16 is therefore well suited to text made up of BMP characters. The cost is memory: text consisting of ANSI characters needs twice as much space in UTF-16 as in an ANSI encoding, and text consisting of ASCII characters needs twice as much as in UTF-8, because those encodings store each such character in one byte instead of two.
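The size difference for ASCII text can be measured directly. The following Python sketch is only an illustration: the function name encoded_sizes and the sample string are made up, and cp1252 stands in for "ANSI" on the assumption of a Western European Windows code page.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in three encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "cp1252": len(text.encode("cp1252")),        # assumed ANSI code page
        "utf-16-le": len(text.encode("utf-16-le")),  # explicit byte order, no BOM
    }


sample = "Hello, UTF-16!"  # ASCII-only sample text, 14 characters
for name, size in encoded_sizes(sample).items():
    print(f"{name:10} {size} bytes")
```

For this 14-character sample, UTF-8 and cp1252 each need 14 bytes, while UTF-16 needs 28.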

UTF-16 Little Endian is the internal string representation in Windows 2000, XP, 2003, Vista, 7 and 10 (and in the Windows versions in between), and it is the encoding that Windows Notepad calls "Unicode". Other operating systems, such as macOS and Symbian, also use UTF-16 for strings internally.

UTF-16 encoded text can be saved in either Big Endian or Little Endian byte order. The Byte Order Mark (BOM) is FE FF for UTF-16 Big Endian and FF FE for UTF-16 Little Endian. See Endianness and Byte Order Mark for more information.
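As a rough illustration of how the two BOM values can be told apart, the Python sketch below inspects the first two bytes of a buffer. The function name detect_utf16_byte_order is hypothetical, and the check is deliberately minimal: it does not handle files without a BOM, and it does not distinguish the UTF-32 Little Endian BOM, which also begins with FF FE.

```python
import codecs


def detect_utf16_byte_order(data: bytes):
    """Return the UTF-16 codec name indicated by a leading BOM, or None."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None  # no UTF-16 BOM found


# Build a small Little Endian sample with its BOM and decode it again.
raw = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
order = detect_utf16_byte_order(raw)
print(order, raw[2:].decode(order))
```

The two BOM bytes are skipped before decoding because the explicitly named codecs (utf-16-le, utf-16-be) do not strip them.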