InfoCenter

UTF-7

UTF-7 is an encoding that represents Unicode characters using only 7-bit ASCII characters. Its advantage is that Unicode text can be represented and transferred even in environments or operating systems that understand only 7-bit ASCII.

For example, some Internet protocols, such as the original SMTP for email, allow only the 128 ASCII characters; bytes with a value above 127 are not permitted. All other UTF encodings (UTF-8, UTF-16, UTF-32) use at least 8 bits per code unit, so they cannot be used for such purposes without an additional transfer encoding.
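Python's built-in "utf-7" codec makes the difference visible. This is a minimal comparison, assuming Python 3: the UTF-8 bytes of "Käse" include values of 0x80 and above, which a 7-bit channel would reject, while the UTF-7 bytes all stay below 0x80.

```python
text = "Käse"

utf8 = text.encode("utf-8")  # standard 8-bit Unicode encoding
utf7 = text.encode("utf-7")  # Python's built-in UTF-7 codec

print(utf8)  # b'K\xc3\xa4se'
print(utf7)  # b'K+AOQ-se'

# Is every byte within the 7-bit ASCII range (0..127)?
print(all(b < 0x80 for b in utf8))  # False
print(all(b < 0x80 for b in utf7))  # True
```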

The characters A to Z, a to z, 0 to 9 and the special characters ' ( ) , . / : - ? are written as they are. Thus, texts that consist predominantly of ASCII characters remain largely readable. The ASCII characters ! " # $ % & * ; < = > @ [ ] ^ _ ` { | } may also be written as they are, but it is better to encode them, since not all programs and protocols handle them correctly.

All other characters are encoded, again using only ASCII characters: a run of such characters is converted to UTF-16, and the resulting bytes are written in a modified Base64 that omits the "=" padding. A + marks the beginning of such a run; a - (or any other character that cannot occur in Base64) marks its end. A literal + is written as +-.
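The rules above can be sketched as a small encoder. This is a minimal sketch, not a complete implementation: `utf7_encode` is a hypothetical helper, the directly written set is RFC 2152's "Set D" plus space, tab, CR and LF, the optional characters (! " # and so on) are always encoded, and every Base64 run is explicitly closed with "-".

```python
import base64

# Written as themselves: RFC 2152 "Set D" plus space, tab, CR and LF.
DIRECT = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789'(),-./:? \t\r\n"
)


def utf7_encode(text: str) -> str:
    """Encode text as UTF-7 (hypothetical helper, sketch only)."""
    out: list[str] = []
    run: list[str] = []  # characters waiting to be Base64-encoded

    def flush() -> None:
        # Convert the pending run to UTF-16, then to Base64 without "=" padding.
        if run:
            utf16 = "".join(run).encode("utf-16-be")
            b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
            out.append("+" + b64 + "-")  # always close the run with "-"
            run.clear()

    for ch in text:
        if ch in DIRECT:
            flush()
            out.append(ch)
        elif ch == "+":
            flush()
            out.append("+-")  # a literal "+" is written as "+-"
        else:
            run.append(ch)  # everything else, including the optional characters
    flush()
    return "".join(out)


print(utf7_encode("Käse"))  # K+AOQ-se
```

Closing every run with "-" costs one character in some cases but keeps the encoder simple; RFC 2152 allows omitting it when the next character cannot be mistaken for Base64.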

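The German word for cheese, "Käse", for instance, is encoded as K+AOQ-se. The ASCII characters K, s and e stay the same, while "ä" (U+00E4, UTF-16 bytes 00 E4) becomes the Base64 sequence AOQ: the 16 bits 00000000 11100100, split into 6-bit groups and padded with zeros, give 000000 001110 010000, that is A, O and Q. The beginning and the end of this sequence are marked with + and -.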
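Decoding reverses these steps: everything outside a +...- run is copied as is, and each run is Base64-decoded and read as UTF-16. The following is a minimal sketch that assumes well-formed input; `utf7_decode` is a hypothetical helper, not a library function.

```python
import base64

# Standard Base64 alphabet; UTF-7 uses it without "=" padding.
B64_ALPHABET = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def utf7_decode(data: str) -> str:
    """Decode a UTF-7 string (hypothetical helper, assumes valid input)."""
    out: list[str] = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch != "+":
            out.append(ch)  # directly encoded character
            i += 1
            continue
        # "+" starts a run: collect every following Base64 character.
        j = i + 1
        while j < len(data) and data[j] in B64_ALPHABET:
            j += 1
        b64 = data[i + 1 : j]
        if b64:
            padded = b64 + "=" * (-len(b64) % 4)  # restore padding for b64decode
            out.append(base64.b64decode(padded).decode("utf-16-be"))
        elif j < len(data) and data[j] == "-":
            out.append("+")  # "+-" stands for a literal "+"
        else:
            raise ValueError(f"malformed UTF-7 at position {i}")
        # A "-" directly after the run only terminates it and is dropped.
        if j < len(data) and data[j] == "-":
            j += 1
        i = j
    return "".join(out)


print(utf7_decode("K+AOQ-se"))  # Käse
```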

Although UTF-7 represents text that is mostly ASCII quite compactly, it never became widely established: encoding and decoding are relatively complicated, encodings such as UTF-8 are understood by most software, and in practice the 7-bit limitation rarely matters.