Entering Unicode Characters

Unicode defines more than 140,000 characters, far too many to give each one its own key on a conventional keyboard. Keyboards can offer only a small selection of the most common characters - for all others there is simply no room.

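So what can we do to enter a character that is not on our keyboard? There are several possibilities, which are presented below.

The options and notes in this article are divided into the following sections: input by character code, insertion via character maps, finding characters without character maps, custom keyboard layouts, input in HTML and XML, input in Microsoft Word, WordPad and LibreOffice, the program BabelMap, and the presentability of characters.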

Input by using the Character Code

In many programs, such as Microsoft Word or WordPad, Unicode characters can be entered directly via their character code (code point). To do so, hold down the ALT key and type the decimal code of the character on the numeric keypad. For example, ALT + 142 produces the letter Ä and ALT + 8364 produces the euro sign €. Note that codes below 256 are typically interpreted in the legacy code page of the system rather than as Unicode code points, which is why 142 (and not the Unicode value 196) yields Ä here.

The digits must be typed on the numeric keypad (not on the number row above the letters), and the keypad must be activated, usually with the NUM LOCK key above the numeric keypad. This is more difficult on laptops that have no separate numeric keypad. In that case, the FN key can normally be used to access a number block that is mapped onto the letter keys.

You can look up the character codes in the corresponding tables, for example those published by the Unicode Consortium on the page unicode.org/charts. Please note that many of these tables give only the hexadecimal notation, whereas the ALT key method requires the decimal notation. If necessary, you have to convert the hex code to decimal first.
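The conversion between the hexadecimal notation of the code charts and the decimal notation needed for the ALT key can be done with any calculator. As a minimal sketch in Python (the function name hex_to_alt_code is made up for this illustration):

```python
def hex_to_alt_code(hex_code: str) -> int:
    """Convert a hexadecimal code point such as '20AC' to its decimal value."""
    return int(hex_code, 16)


print(hex_to_alt_code("20AC"))  # 8364, the decimal code of the euro sign
print(hex_to_alt_code("A9"))    # 169, the decimal code of the copyright sign
print(ord("€"))                 # 8364, read directly from the character itself
```

If you already have the character (for example, copied from a web page), ord() gives the decimal code without a detour through the charts.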

Not all programs support every character code. In some programs, such as old versions of Windows Notepad, only codes up to 255 are supported. Any larger code is divided by 256 and the character that corresponds to the remainder is shown. For example, entering 8364 yields ¼ instead of the € sign (8364 MOD 256 = 172).
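The wrap-around can be reproduced with a few lines of Python. This is only a sketch: it assumes that the program looks up the remainder in the legacy code page 437, which is not stated above and may differ on your system, and the function name legacy_alt_result is hypothetical.

```python
def legacy_alt_result(code: int, codepage: str = "cp437") -> str:
    """Emulate a program that keeps only the remainder of code / 256.

    The code page is an assumption; other systems may use a different one.
    """
    remainder = code % 256
    return bytes([remainder]).decode(codepage)


print(8364 % 256)               # 172
print(legacy_alt_result(8364))  # the character at position 172 of the code page
print(legacy_alt_result(142))   # codes below 256 are looked up unchanged
```

Under this assumption, position 172 holds ¼ and position 142 holds Ä, which matches the examples in this section.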

Entering Unicode characters by their character code as described above is not very comfortable, especially when the tables are available only in hexadecimal or when you need certain characters frequently. A more convenient approach is to select the characters directly from character maps or character lists, which are provided by all major operating systems and are also included in some programs such as Microsoft Word, LibreOffice or OpenOffice. These programs often also let you define key combinations for frequently used characters, as is possible in Microsoft Word, for example. This approach is described in the next section, "Insertion via Character Maps".

Another possibility is to create your own keyboard layout, in which your personal symbols and characters have their own keys and are thus immediately available in all of your applications. This is covered in the section "Custom Keyboard Layout" below.

Insertion via Character Maps

In addition to entering characters via their code, many operating systems such as Windows or macOS as well as some programs such as Microsoft Word, LibreOffice or OpenOffice offer the option of inserting special characters directly via character maps. This eliminates the tedious search for the right character code, and the characters can easily be copied and pasted via the clipboard.

To show a character map, you can simply open the pre-installed program "Character Map" in Windows. On macOS, the character map is called "Characters" and can be accessed from many programs via the menu "Edit > Emoji and Symbols" (usually only emojis are displayed after opening the tool for the first time and the view of all characters must be enabled via the symbol at the top right). In programs such as office applications and other text processing programs, you can usually call up the built-in character table using functions such as "Insert > Special Character" or "Insert > Symbol" from the menu, provided the respective program provides a character table.

The character map looks different depending on the operating system and the program. Usually, however, it is a window in which all available characters or a selection of characters are listed. You can either browse through the characters until you have found the one you want, or you can narrow your search by using the search function or by selecting certain character types in the options (for example Latin letters, mathematical symbols, punctuation marks or Cyrillic letters). The available filters also differ from one implementation of the character map to the next. Once you have found your character, you can usually copy it to the clipboard or insert it directly.

Depending on the implementation, you can also use the character map to get an overview of which characters are available in which fonts. Furthermore, depending on the system and the program, there is often the possibility to set key combinations for specific characters. This allows you to access frequently used characters faster and easier instead of always having to access the character table.

It should be noted that some of these character maps do not display all Unicode characters. Often the characters are limited to the glyphs contained in the selected font, while other character maps show only the characters from the Basic Multilingual Plane (BMP). One program whose character map covers all Unicode characters is BabelMap, which is presented in the section of the same name.

Finding Characters without Character Maps

If no character map is available or your character map does not get you any further, you can also simply search the Internet for the name of a sign, symbol or character. You will find plenty of pages that list the characters and their Unicode code points. The characters you find can be copied directly from the browser with CTRL + C and pasted into another application with CTRL + V.
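If a Python interpreter is at hand, the official Unicode character names can also be looked up offline with the standard library module unicodedata. The following sketch shows both directions; the helper name describe is made up for this example.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point (hex and decimal) and official name of a character."""
    code_point = ord(char)
    return f"U+{code_point:04X} ({code_point}) {unicodedata.name(char)}"


# From name to character
euro = unicodedata.lookup("EURO SIGN")
print(describe(euro))  # U+20AC (8364) EURO SIGN

# From character to name
print(describe("©"))   # U+00A9 (169) COPYRIGHT SIGN
```

The lookup requires the exact official name, so it complements rather than replaces a fuzzy search on the Internet.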

Another tip is to create your own character table. A simple text file or an office document with a collection of your frequently used Unicode characters is sufficient. Every time you come across a new character that you want to use again, you can just copy the character to the file. And if you need characters that you already have in your file, you no longer have to search extensively - you can just directly copy and reuse the characters from the file instead.

Custom Keyboard Layout

An elegant way to make frequently used Unicode characters available in all programs is to create your own keyboard layout. This means that you decide how the keys on your keyboard are assigned, and you can switch between multiple keyboard layouts if you want. For example, you can simply put the copyright sign on the shortcut ALT GR + C or make any other change to your current keyboard layout. That saves a lot of time looking for characters in tables and is more comfortable than memorizing a lot of codes - especially if you use the same Unicode characters again and again.

Your own keyboard layout can easily be created with programs such as the Microsoft Keyboard Layout Creator or the Keyboard Layout Manager. Both programs serve basically the same purpose; the Keyboard Layout Manager is just a much leaner program than the alternative from Microsoft.

With these programs, it is possible to adapt existing layouts or to create entirely new ones. The program then automatically creates the appropriate installation package for your custom keyboard layout.

Input in HTML and XML

If you want to use a certain Unicode character in HTML or XML, you can also use the code of the character directly. The notation is &#0000; for decimal codes and &#x0000; for hexadecimal codes. The code can be entered in this form directly into the HTML source code or the XML file, where 0000 has to be replaced by the code of the Unicode character in decimal or hexadecimal notation respectively. For example, the copyright sign © can be written as &#169; or &#xA9;.
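As a small illustration, both notations can be generated and checked with Python. The function name numeric_references is hypothetical; html.unescape is part of the Python standard library and resolves such references the way an HTML parser would.

```python
import html


def numeric_references(char: str) -> tuple:
    """Return the decimal and hexadecimal character reference for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_references("©")
print(decimal_ref)  # &#169;
print(hex_ref)      # &#xA9;

# Both references decode back to the same character
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == "©")  # True
```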

In HTML, there is yet another way to enter Unicode and special characters: the so-called named entities or HTML entities. For some characters, a name is defined that can be written in an HTML file in the form "&name;". For example, &Auml; stands for Ä (A-Umlaut; &auml; is the lowercase ä), &copy; stands for the copyright sign ©, &euro; for the euro currency symbol € and &nbsp; for a non-breaking space. However, not every Unicode character has its own HTML entity.
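Which character a named entity stands for can be checked, for instance, with the Python standard library. The sketch below uses html.entities.name2codepoint, a table of the HTML 4 entity names, and html.unescape; the selection of names is just the set of examples from this section.

```python
import html
from html.entities import name2codepoint

# Look up the code point behind a few entity names
for name in ("Auml", "auml", "copy", "euro", "nbsp"):
    code_point = name2codepoint[name]
    print(f"&{name}; -> U+{code_point:04X} ({code_point})")

# Decode a text that contains named entities
decoded = html.unescape("&Auml;pfel &copy; 5&nbsp;&euro;")
print(decoded)
```

A name that is missing from the table has no HTML 4 entity, in which case a numeric character reference or the character itself has to be used.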

Named entities in HTML and XML are also important whenever characters that have a meaning within the markup syntax are to appear as visible text in the HTML source or in XML files. Specifically, this concerns the angle brackets < and >, the quotation mark " and the ampersand &, which can be written as &lt; (less than), &gt; (greater than), &quot; (quote) and &amp; (ampersand) without disturbing the syntax.
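In practice, this replacement is usually left to a library function instead of being done by hand. As one possible example, the Python standard library offers html.escape; the sample text below is invented for this illustration.

```python
import html

text = 'if a < b && name == "x"'

# Replace the characters that have a meaning in the markup syntax
escaped = html.escape(text)
print(escaped)  # if a &lt; b &amp;&amp; name == &quot;x&quot;

# The original text can be restored without loss
print(html.unescape(escaped) == text)  # True
```

Note that html.escape also replaces the apostrophe (with a numeric reference) unless it is called with quote=False.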

However, inserting Unicode characters into HTML or XML files via their character code or with the help of named entities is becoming less important today. At the beginning of the 2000s, most websites used ASCII or single-byte encodings such as Latin-1, with which only a limited number of characters could be represented directly. If someone wanted to use Unicode characters with code points outside the supported range, there was no other choice than to use named entities or other workarounds. Today, almost all websites use the encoding UTF-8, with which all Unicode characters can be represented. Therefore, in most cases, the old aids are no longer necessary because the characters can be written directly into the file. The exception, of course, is characters that have a meaning in the syntax. Entities also remain useful for characters that are difficult to recognize as such in the source code, for example non-breaking spaces or soft hyphens.
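The difference between the encodings can be tried out in a few lines of Python. This is merely an illustration with an invented sample string; the helper try_encode is hypothetical, while the error handler "xmlcharrefreplace" is a built-in feature of Python's str.encode.

```python
text = "Price: 5 €"


def try_encode(s: str, encoding: str):
    """Return the encoded bytes, or None if the encoding cannot represent s."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None


# Latin-1 has no euro sign, so direct encoding fails
print(try_encode(text, "latin-1"))  # None

# The old workaround: fall back to a numeric character reference
print(text.encode("ascii", errors="xmlcharrefreplace"))  # b'Price: 5 &#8364;'

# UTF-8 can represent the character directly
print(text.encode("utf-8"))  # b'Price: 5 \xe2\x82\xac'
```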

Unicode in Microsoft Word, WordPad and LibreOffice

From Microsoft Word 2002 on, you can enter a Unicode character directly via its code. To do this, type the hexadecimal code of the character as text into the Word document and then press the key combination ALT + X. In some localized versions of Word, such as the German one, the combination is ALT + C instead, and ALT + X applies only in dialog fields. The code is then automatically replaced by the corresponding Unicode character. The same key combination can also be used to show the code of the character immediately before the cursor, so pressing the combination repeatedly toggles between the code and the character.

The application WordPad from Microsoft, which is usually pre-installed on Windows systems, works similarly. Here, the key combination is always ALT + X, which converts Unicode character codes into characters and characters back into their codes. Otherwise, the function works just like in Microsoft Word.

The keyboard shortcut ALT + X can also be used in the office program "Writer" from LibreOffice. The only difference compared to Microsoft Word and WordPad is that LibreOffice shows the hexadecimal code in a different form: while Microsoft Word and WordPad display the code for the euro sign (€) in the form "20AC", LibreOffice uses the form "U+20AC" instead. Unfortunately, OpenOffice currently does not offer such a function.
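What the toggle does can be sketched in Python as follows. This only imitates the visible behavior described above and says nothing about how the programs implement it; the function names and the libreoffice_style parameter are made up for this illustration.

```python
def code_to_char(code: str) -> str:
    """Turn '20AC' or 'U+20AC' into the corresponding character."""
    code = code.strip()
    if code.upper().startswith("U+"):
        code = code[2:]
    return chr(int(code, 16))


def char_to_code(char: str, libreoffice_style: bool = False) -> str:
    """Turn a character back into its hexadecimal code."""
    digits = f"{ord(char):04X}"
    return f"U+{digits}" if libreoffice_style else digits


print(code_to_char("20AC"))                       # €
print(char_to_code("€"))                          # 20AC   (form shown by Word and WordPad)
print(char_to_code("€", libreoffice_style=True))  # U+20AC (form shown by LibreOffice)
```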

Tables of all character codes are available on the page unicode.org/charts of the Unicode Consortium. In addition, the office programs include character maps that can of course also be used for entering Unicode characters; they can be accessed via the menu "Insert > Special Character".

BabelMap

Finally, we would like to introduce you to a useful program that can also assist with the input of Unicode characters. It's called BabelMap and you can download it on the page babelstone.co.uk for free.

BabelMap makes it easy to view and search Unicode characters by their number (code point) or their name and to use them in a different application via the clipboard. In contrast to some other character maps, BabelMap shows not only the characters from the Basic Multilingual Plane (BMP) but also supports all other Unicode planes (SMP, SIP, TIP, SSP, SPUA-A, SPUA-B) and thus enables the user to browse through all existing Unicode characters. In addition, numerous functions for searching characters and for analyzing which characters or character blocks a font supports make the program a useful tool that goes far beyond the possibilities of conventional character maps. An explanation of all the features can be found on the BabelMap website.
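To which plane a character belongs follows directly from its code point: each plane covers 65,536 code points, so the plane number is the code point divided by 65,536 (hexadecimal 10000), rounded down. A minimal Python sketch, in which the dictionary PLANE_NAMES and the function plane_of are names chosen for this example:

```python
# Abbreviations of the planes named above, keyed by plane number
PLANE_NAMES = {
    0: "BMP", 1: "SMP", 2: "SIP", 3: "TIP",
    14: "SSP", 15: "SPUA-A", 16: "SPUA-B",
}


def plane_of(char: str):
    """Return the plane number and abbreviation for a single character."""
    plane = ord(char) >> 16  # same as ord(char) // 0x10000
    return plane, PLANE_NAMES.get(plane, "unassigned")


print(plane_of("\u20ac"))      # (0, 'BMP')  euro sign, U+20AC
print(plane_of("\U0001F600"))  # (1, 'SMP')  emoji, U+1F600
print(plane_of("\U00020000"))  # (2, 'SIP')  CJK ideograph, U+20000
```

A character map that is limited to the BMP can therefore show only characters for which this function returns plane 0.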

Presentability of Characters

Regardless of which of the methods presented you use for inserting Unicode characters, the result always depends on whether the character or sign in question can also be displayed with the font used. In particular, unusual characters can quickly lead to problems at this point.

The background of this problem is that, to put it simply, every font is a collection of images that are stored in the respective font file. These images are called glyphs. Each glyph is assigned to a character: for example, one glyph for the character "A", another glyph for the character "a", a glyph for the character "." and so forth. When we write a character in a certain font, the operating system or the program tries to fetch the glyph for that character from the font file in order to display it. So everything depends on whether the font in question actually contains a glyph for the desired character.

If this is not the case, the program may try to display the character with the help of a different font (one that has a glyph for the character), or a replacement symbol is displayed instead of the character. This replacement symbol is also a glyph that is stored in a font file (the .notdef glyph). Typically, it is an empty box, a box with an X or a box with a question mark. Which font is used as a replacement depends on the operating system and the software. Some programs use only certain fonts as fallback, while others can search all installed fonts for a suitable glyph.
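The lookup with fallback described above can be modeled in a strongly simplified way. In the following Python sketch, each font is just a dictionary from characters to glyph names; the font names, glyph names and the function find_glyph are invented for this illustration and do not correspond to any real font or rendering engine.

```python
NOTDEF = ".notdef"


def find_glyph(char: str, fonts: list):
    """Search the fonts in priority order and return (font name, glyph name).

    If no font contains the character, the .notdef glyph of the
    first (selected) font is returned as the replacement symbol.
    """
    for font_name, glyphs in fonts:
        if char in glyphs:
            return font_name, glyphs[char]
    return fonts[0][0], NOTDEF


# Toy fonts: the selected font first, then one fallback font
fonts = [
    ("ExampleSans", {"A": "A", "a": "a", ".": "period"}),
    ("ExampleSymbols", {"€": "Euro"}),
]

print(find_glyph("A", fonts))       # ('ExampleSans', 'A')
print(find_glyph("€", fonts))       # ('ExampleSymbols', 'Euro')
print(find_glyph("\u4e2d", fonts))  # ('ExampleSans', '.notdef')
```

Real systems differ mainly in how the list of fallback fonts is built, which is exactly the point on which programs and operating systems vary.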

Common font file formats such as TrueType (TTF) or OpenType (OTF) can contain a maximum of only 65,535 glyphs. In contrast, there are over 140,000 Unicode characters. This means that even if a font designer goes to the trouble of producing a font with the maximum possible number of glyphs, the font will still not contain even half of all Unicode characters. Most available fonts contain far fewer characters and support only a few writing systems; many fonts contain only a selection of the characters of a single writing system. You can determine which characters a font contains glyphs for by using a character map, for example.