Regular Expressions

Regular expressions are a powerful tool for searching text and for searching and replacing text. With regular expressions and a suitable text editor such as the Text Converter, editing texts becomes much easier. To work with regular expressions, however, you need to know some basics, which are summarized in this tutorial.

What is a regular expression?

Using a regular expression corresponds to a search that is more general than a normal search for a fixed text. For example, if you search for "A" in a text, you find all occurrences of "A". But what if you want to find any uppercase letter? You could search for "A", "B", "C" and so on, one after the other. Or you can use a regular expression: the short regular expression [A-Z] does the same as searching for each of the single characters from A to Z. A regular expression can be as precise as you want. You can search for an arbitrary date, an arbitrary e-mail address or almost any other pattern. This summary explains how.

You can use regular expressions, for example, in editors for texts or text files such as the Text Converter. There you can delete or replace text with regular expressions, or re-use the text found by a regular expression in another context or position. This makes it possible, for example, to change the format of a date or to rewrite each e-mail address as a link. A normal search would have to know every possible e-mail address in advance, which is impossible.
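The difference between a normal search and the group [A-Z] can be tried out in a few lines of Python. This is only a small sketch using Python's standard re module; the sample sentence is made up for illustration and Python is not the engine used by the Text Converter.

```python
import re

# Made-up sample text for illustration.
text = "Anna met Bob in Zurich."

# A normal search only knows one fixed character.
count_of_a = text.count("A")
print(count_of_a)

# The character group [A-Z] finds any single uppercase letter.
uppercase = re.findall(r"[A-Z]", text)
print(uppercase)
```

The normal search has to be repeated for every letter, while the regular expression covers all 26 uppercase letters at once.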

Basics

This table lists the most important conventions and characters together with their meanings. The 15 characters [ ] ( ) { } | ? + - * ^ $ \ and . are meta characters and have a special meaning within regular expressions, which is explained in the following sections. If you want to use one of these characters literally in a regular expression, put a \ in front of it to escape it from its meaning as a meta character. All other characters can normally be used in a regular expression as they are.

Regular Expression	Meaning and Example
a	The regular expression "a" matches the character "a". As long as a character is not a meta character with another meaning, it can be used in the regular expression directly. Example: abc defg abcdefgbafcgbde 0123456789
[abc]	The square brackets [ and ] define a character group. The example finds one of the characters a, b or c. Example: abc defg abcdefgbafcgbde 0123456789
[a-f]	The hyphen - can be used to define a range of characters. The example finds one of the characters a, b, c, d, e or f. Also in this example, the regular expression only matches one character. Example: abc defg abcdefgbafcgbde 0123456789
[0-9]	The hyphen can also be used to define a range of digits. The example finds the digits 0 to 9. Example: abc defg abcdefgbafcgbde 0123456789
[A-Z0-9abc]	Within a character group, ranges and single characters can be combined. In the example, the group consists of the upper case letters A to Z, the digits 0 to 9 as well as the lower case letters a, b and c. Example: abc defg abcdefgbafcgbde 0123456789 -&
[^a]	With the meta character ^ at the beginning of a character group, the character group is negated. That means, the example will match any character but not an a. Example: abc defg abcdefgbafcgbde 0123456789
^a	If the meta character ^ is not part of a character group, it stands for the beginning of a string or a line. The example would match all lines or strings beginning with a. Example: abc defg abcdefgbafcgbde 0123456789
a$	Just as the meta character ^ stands for the beginning of a string or a line, the character $ stands for its end. The example would match all strings or lines ending with an a. Example: abc defg abcdefgbafcgbde 0123456789a
^abc$	Here the meta characters ^ and $ are used together. This example would match all strings or lines which are equal to "abc". Example 1: abc Example 2: abc abc
\b	Stands for the position at the beginning or end of a word.
\B	Stands for a position which is not at the beginning or end of a word.
\babc\b	Finds the single word "abc", but not the string "abc", when it is surrounded by other characters. Example: abc abcde abc deabcde deabc abc
\babc	Finds all words beginning with "abc". Example: abc abcde abc deabcde deabc abc
abc\b	Finds all words ending with "abc". Example: abc abcde abc deabcde deabc abc
\Babc\B	Finds all words containing "abc", but not beginning or ending with "abc". Example: abc abcde abc deabcde deabc abc
abc\B	Finds all words containing "abc", but not ending with "abc". Example: abc abcde abc deabcde deabc abc
ABC|abc	The character | stands for an alternative. This regex finds "ABC" and "abc". Example: abcde ABCDE fgabcde FGABCDE
[a\-f]	If there is a \ in front of a meta character, the special meaning of this meta character is removed. In this case, the hyphen does not represent a range but stands for itself. So, the example matches the characters a, f and -. In other words, with the character \ it is possible to add meta characters to character groups. Example: abcdefg abcdefgh -
1\+1=2	This regular expression matches the string 1+1=2. Again, the meta character + is escaped with \. Example: 1+1=2 1\+1=2
[-af]	Whenever a meta character is within a character group at a position where it has no special meaning, it is used as a normal character. The example matches the characters -, a and f. Example: abcdefg abcdefgh -
[af-]	The same applies to a - at the end of a group. Example: abcdefg abcdefgh -
ab[cd]	In this example, one character from a character group is combined with "ab". So, the example matches the strings "abc" and "abd". Example: abcdef abdef cdabcd cdabdc
.	The dot stands for an arbitrary character. Every character will be found with this regular expression. Whether line breaks are included depends on the modifier s. Example: abc defg abcdefgbafcgbde 0123456789 -&
.ab	Matches any string of three characters whose last two characters are a and b, for example aab, eab, %ab, :ab and so on. Example: abc dfgabc &ab fgxab
[^a]ab	Matches the same strings as the regular expression ".ab" except the string "aab". Example: aab cab dd dab fg &ab

If you want to test a regular expression introduced on this page, you can use the software Text Converter. Just open an arbitrary text file in this tool and click on Search and Replace in the rightmost column. There you can activate regular expressions in the search box and then type in a regular expression to test how it works.
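If you prefer to try the basics in a script, the following sketch checks some of the expressions from the table with Python's re module. It assumes that Python behaves like the Text Converter for these simple patterns; the sample strings are taken from or modeled on the examples in the table.

```python
import re

sample = "abc abcde deabcde deabc"

# \babc\b: "abc" only as a whole word.
whole_words = re.findall(r"\babc\b", sample)

# \babc: start positions of words beginning with "abc".
word_starts = [m.start() for m in re.finditer(r"\babc", sample)]

# [^a]ab: any character except "a", followed by "ab".
not_a_then_ab = re.findall(r"[^a]ab", "aab cab dab &ab")

# ABC|abc: either of the two alternatives.
alternatives = re.findall(r"ABC|abc", "abcde ABCDE")

# 1\+1=2: the escaped + is a literal plus sign.
escaped_plus = re.search(r"1\+1=2", "so 1+1=2 holds") is not None

# ^abc$: the whole string has to be equal to "abc".
exact = [s for s in ["abc", "abc abc"] if re.search(r"^abc$", s)]

print(whole_words, word_starts, not_a_then_ab, alternatives, escaped_plus, exact)
```

The patterns are written as raw strings (r"...") so that Python passes the backslashes to the regular expression unchanged.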

Repetitions

To describe situations in which there are repetitions of characters or whole character classes, you can use some of the following meta characters.

Regular Expression	Meaning and Example
ab{2}	The preceding element has to appear exactly two times. This example would only match abb. Because the a is not combined with the b into a group by brackets, the repetition only affects the b and not the a. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
ab{2,3}	The preceding element has to appear at least two times and not more than three times. This regular expression would match abb and abbb, but not ab or abbbb as a whole. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
ab{2,}	The preceding element has to appear at least two times. The example would match abb, abbb, abbbb and so on. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
ab{,3}	The preceding element may appear at most three times. The example would match a, ab, abb and abbb, but not abbbb as a whole. Not every regex engine accepts a missing lower bound; {0,3} is the portable way to write it. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
ab?	The question mark indicates that the preceding element is optional. That means the preceding element can appear, but it does not have to. The example would match a and ab. The question mark is the same as {0,1}. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
ab+	The plus indicates that the preceding element has to appear at least once, but may also appear more often. The example would match ab, abb, abbb and so on, but not a. The plus is the same as the expression {1,}. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
ab*	The character * indicates that the preceding element may appear zero or more times. It is the same as the expression {0,}. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
[ab]+	This example finds strings such as a, b, ab, ba, abb, ababa and so on. It does not mean that exactly the same character has to be repeated, but that characters from the group have to be repeated. If you would like to find repetitions of the same character, you have to use backreferences. The regular expression for this would be ([ab])\1+ and will be explained below. Example: a ab abb babb abbb abbbb babbbbbbbbbbd
a[bc]+d	This expression matches strings like abd, acd, abcd, acbd, abccbd, acbcbbcd and so on. Example: abcd aad acbd fg abbccbbcd fg abd fg acd
[0-9]{2,3}	Also in this example, the digits do not have to be the same. The expression matches all numbers with two or three digits, for example the numbers from 10 to 999. Single digits, as in the string 1,2, will not be found by this expression. Example: 1,4 10 89 ab3a ab42a 234
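The repetition operators from the table can be compared side by side with a short Python sketch. The helper full_matches is a made-up name for this illustration; it uses re.fullmatch, so a candidate only counts if the pattern covers it from start to end. The upper-bound-only case is written as {0,3} here.

```python
import re

candidates = ["a", "ab", "abb", "abbb", "abbbb"]

def full_matches(pattern):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# Print, for every operator, which candidates it accepts as a whole.
for pattern in [r"ab{2}", r"ab{2,3}", r"ab{2,}", r"ab{0,3}", r"ab?", r"ab+", r"ab*"]:
    print(pattern, full_matches(pattern))

# [0-9]{2,3} finds runs of two or three digits anywhere in the text.
numbers = re.findall(r"[0-9]{2,3}", "1,4 10 89 ab3a ab42a 234")
print(numbers)
```

Note that a search inside a longer text (instead of re.fullmatch) would still find abbb within abbbb for the expression ab{2,3}; only the fourth b would be left out of the match.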

Character Classes

Besides creating your own groups of characters with square brackets, you can also use some predefined character classes. With these classes, regular expressions become shorter and clearer.

Regular Expression	Meaning and Example
\d	This expression (Digit) stands for a digit, it is the same as [0-9]. Example: abc defg abcdefgbafcgbde 0123456789 -&
\D	This expression stands for any character that is not a digit. It is the same as [^0-9] or [^\d]. Example: abc defg abcdefgbafcgbde 0123456789 -&
\w	This expression (Word) stands for a letter, a digit or an underscore. It is the same as [A-Za-z0-9_]. Example: abc defg abcdefgbafcgbde 0123456789 -&
\W	This expression stands for any character which is no digit, letter or underscore. It is the same as [^\w] or [^A-Za-z0-9_]. Example: abc defg abcdefgbafcgbde 0123456789 -&
\s	This expression stands for whitespace (Space), for example line breaks, tabs, spaces and so on.
\S	This expression stands for any character which is not a whitespace character. It is the same as [^\s].
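The predefined classes can be tried out in Python as well. One assumption to be aware of: for normal Python strings, \d and \w also match digits and letters from other scripts by default, so the sketch passes re.ASCII to get exactly the equivalences from the table, such as \d being the same as [0-9]. The sample string is made up.

```python
import re

# Letters, an underscore, digits, a space, a tab and two special characters.
sample = "abc_1 23\t-&"

digits = re.findall(r"\d", sample, flags=re.ASCII)      # same as [0-9]
words = re.findall(r"\w+", sample, flags=re.ASCII)      # runs of [A-Za-z0-9_]
non_word = re.findall(r"\W", sample, flags=re.ASCII)    # everything else
parts = re.split(r"\s+", sample)                        # split at whitespace

print(digits, words, non_word, parts)
```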

Grouping and Backreferences

With round brackets, you can group characters, for example to apply an operator to the whole group. Furthermore, round brackets create backreferences. That means the characters found within the brackets are stored, so that you can re-use them in the same regex or in the replacement when searching and replacing with regular expressions. The examples show some of the possibilities, which you should test in a program like the Text Converter to get a feeling for these expressions.

Regular Expression	Meaning and Example
(ab)+	The whole group "ab" is repeated; one or more repetitions of "ab" will be found. Example: abcde ababcde ababababa
ab(cd|ef|gh)i	Within the brackets, there are some alternatives. This example will match the strings "abcdi", "abefi" and "abghi" but no other strings. Example: abcdifg abcdefghi abghi
([ab])\1+	Here the backreference \1 is used. Each bracket creates such a reference; the 1 corresponds to the first bracket in the expression. The expression means that the letter found by the group [ab] has to be repeated one or more times after the group. Hence, this expression matches "aa", "bb", "aaa", "bbb" and so on. Example: aaaacd efbbbbbbbbghab
([ab])x\1x\1	The reference can also be used more than once. This expression matches "axaxa" and "bxbxb". Example: axaxa axax bxbxb axbxa
([ab])x(c)x\1x\2	In this expression, two references \1 and \2 are used, which correspond to the first and second bracket. The strings "axcxaxc" and "bxcxbxc" will match this expression. Example: axcxaxc axax bxcxbxc axbxa
([ab])x(c)x\2x\2	It is not necessary to use each of the references resulting from brackets in the expression. Here only the second reference is used. Example: axcxcxc axax bxcxcxc axbxa
(\d+\.)(\d+\.)	References can be used not only within a single regular expression. In the Text Converter, you can search for a string with a regex and use references like $1, $2 and so on in the replacement. If you type the example in the search field and replace it with $2$1, the two parts of the found date are swapped. Please note that you have to activate regular expressions with the options below the search and replace boxes. Example: "11.04." will be replaced by "04.11."
\ba\b\s\b([aoeiu][a-z]+)\b	With this regular expression you can find every single word "a" that is followed by another word beginning with a, o, e, i or u. In English, "an" is normally used instead of "a" in front of a word beginning with a vowel. You can use the replacement "an $1" in the Text Converter to correct this error. Example: "a idea" will be replaced by "an idea"
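The backreference examples can be reproduced in Python with re.finditer and re.sub. Note the assumption about syntax: Python writes references in the replacement as \1 and \2, where the Text Converter uses $1 and $2. The sample texts are shortened or made up for illustration.

```python
import re

# ([ab])\1+: the same letter repeated, not just any letter from the group.
repeats = [m.group(0) for m in re.finditer(r"([ab])\1+", "aaaacd efbbbgh ab")]

# Swap the two parts of a date: search (\d+\.)(\d+\.), replace with group 2, then group 1.
swapped = re.sub(r"(\d+\.)(\d+\.)", r"\2\1", "11.04.")

# Replace "a" with "an" in front of a word beginning with a vowel.
corrected = re.sub(r"\ba\b\s\b([aoeiu][a-z]+)\b", r"an \1", "a idea and a plan")

print(repeats, swapped, corrected)
```

re.finditer is used for the first pattern because re.findall would only return the content of the bracket (the single letter) instead of the whole match.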

Modifiers

The behavior of regular expressions can be changed with so-called modifiers. If you want to change these modifiers in the Text Converter in general, you can go to the menu "Settings > Settings regarding regular Expressions (RegEx)", where all of the modifiers can be changed. But it is also possible to change modifiers within a regular expression or to apply modifiers only to a part of the regex. How that works is explained in the second part of this section. The following modifiers can be adjusted in the Text Converter:

Modifier i (Case Insensitive): If this modifier is active, the search ignores the difference between upper and lower case characters. That means the regex [a-z] matches either only lower case letters (a to z, modifier i is not active) or both lower and upper case letters (a to z as well as A to Z, modifier i is active).
Modifier m (Multi Line): If this modifier is active, the text will be treated as multiple lines. That means ^ and $ match the beginning and the end of each line. If the modifier is not active, ^ and $ only match the beginning and the end of the whole text.
Modifier s (Single Line): Treat a string as a single line. If this modifier is active, the dot . matches all characters including line breaks. If this modifier is not active, the dot does not match line breaks.
Modifier g (Greedy Mode): This modifier affects all following repetition operators such as + and *. If this modifier is active, the operators behave normally, that is, greedy. If it is not active, they are applied non-greedy: + works like +?, * works like *? and so on.
Modifier x (Extended Syntax): If this modifier is active, you can use whitespace (for example spaces) in your regular expressions and add comments (everything after a # in a line is not used for the regular expression). This makes the regular expression more readable, but you have to escape every space that should be matched with \ unless it is used in a character group.

If a modifier should only be used for one regular expression or even only for a part of a regex, you can use the following methods to change the modifiers. The modifiers mentioned above are named by their letters, that means the letters i, m, s, g or x have to be used.

Regular Expression	Meaning and Example
(?i)[abc]	In this example you can see how you can activate a modifier. In this example the modifier i for case insensitivity is activated. Example: abcdef ABCDEF
(?i)[a](?-i)[cd]	By using (?-i) a modifier is deactivated. In the example, first the modifier i is activated, the letters a and A will be found. After that the modifier i is deactivated, c and d have to be lower case to match this expression. Example: ac Ac AC ad Ad AD
((?i)[a])[cd]	With brackets you can reach the same results. Of course, i has to be deactivated generally in this example. Example: ac Ac AC ad Ad AD
(?ig-msx)[abc]	If you want to change more than one modifier at the same time, you can also do that within one expression. Other possibilities are (?ims) to activate some modifiers or (?-ims) to deactivate a number of modifiers.
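The effect of the modifiers can be illustrated in Python, with a few differences that should be read as assumptions about Python and not about the Text Converter: Python passes i, m, s and x as flags (re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE), it has no g modifier (non-greedy matching is always written as +? or *?), and a modifier for only a part of the regex is written in the scoped form (?i:...) instead of (?i)...(?-i).

```python
import re

text = "first line\nsecond line"

# i: upper and lower case are treated alike.
case_insensitive = re.findall(r"[abc]", "abcABC", flags=re.IGNORECASE)

# m: with the flag, ^ matches at the beginning of every line.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s: with the flag, the dot also matches line breaks.
without_s = re.search(r"first.*second", text)
with_s = re.search(r"first.*second", text, flags=re.DOTALL)

# Greedy versus non-greedy repetition.
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")

# x: whitespace and comments inside the pattern are ignored.
date = re.compile(r"""
    \d+   # day
    \.    # literal dot
    \d+   # month
    \.    # literal dot
""", re.VERBOSE)
date_found = date.search("on 11.04. at noon").group(0)

# Modifier for only a part of the regex: a or A, followed by lower case c or d.
scoped = re.findall(r"(?i:a)[cd]", "ac Ac AC ad Ad AD")

print(case_insensitive, without_m, with_m, greedy, lazy, date_found, scoped)
```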

Unicode

A frequent question is whether and how you can use Unicode characters within regular expressions, for example Chinese characters or letters from the Cyrillic or Greek alphabet. Originally, regular expressions were only used for ANSI characters, and many programs utilizing regular expressions still only support the range of ANSI characters. However, this does not apply to the Text Converter. In this software you can use arbitrary Unicode characters in the same way as ANSI characters. The following examples show how this works.

Regular Expression	Meaning and Example
[Д-И]	In the Text Converter you can use Unicode characters in the same way you are using ANSI and ASCII characters. The example uses the range Д to И from the Cyrillic alphabet in a character group. Example: АБВГДЕЖЗИКЛ
∞	You can use arbitrary special characters directly, as in this example with the character for infinity. Example: ∞
\x{221E}	Alternatively, you can use the hexadecimal Unicode code of a character. For the infinity symbol, this code is 221E, written as in the example. Example: ∞
\x41	Also in the ASCII range, you can use the HEX code instead of the character. This is especially useful for tabs or other characters that cannot be typed directly. The example shows the hexadecimal code for A (code 41). All HEX codes can be looked up in an ASCII table. Example: ABC ABCABC
[\x{0001}-\x{221E}]	Characters defined by their HEX code can be used like any other character in the syntax of regular expressions. The example defines a character group whose range includes all characters up to the code of the infinity symbol. This includes Latin, Greek and Cyrillic but not Japanese characters. Example: ABCGHJΔΨΩБВГДЕЖカﾓヤモ
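Unicode characters work in a similar way in Python, with one difference in notation that is specific to Python: its re module does not use the form \x{221E} but writes a four-digit Unicode code as \u221E, while the two-digit form \x41 is the same. The following sketch is therefore an adaptation of the table and not the exact Text Converter syntax.

```python
import re

# A range of Cyrillic letters in a character group.
cyrillic = re.findall(r"[Д-И]", "АБВГДЕЖЗИКЛ")

# The infinity symbol, written directly and by its hexadecimal code.
infinity_literal = re.findall(r"∞", "1 < ∞")
infinity_escaped = re.findall(r"\u221E", "1 < ∞")

# HEX code 41 stands for the letter A.
letter_a = re.findall(r"\x41", "ABC abc")

# All characters up to the code of the infinity symbol: no Japanese characters.
up_to_infinity = re.findall(r"[\u0001-\u221E]", "AΩДカ∞")

print(cyrillic, infinity_literal, infinity_escaped, letter_a, up_to_infinity)
```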

Examples

With the knowledge from this page, you can write regular expressions on your own by combining the rules as you need them. As an example, the following regular expression will be analyzed. It can be used to find an arbitrary e-mail address in any text:

\b[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b

As you can see, the regular expression is surrounded by \b. That means the e-mail address should be a single word and not be surrounded by other word characters in the text. The structure of an e-mail address is name@domain.ending. In the regular expression, there is a character group before the @ whose characters have to appear at least once (+). This is the name part of the e-mail address. After the @, there are two more character groups, one for the domain and one for the ending, separated by a dot. Because the dot is a meta character within regular expressions, it has to be escaped with \. The character group of the domain can consist of an arbitrary number of characters, but at least one (+), and the character group of the ending has to consist of at least 2 and at most 4 characters. This is indicated by {2,4} behind the group.
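The analyzed expression can be tried out directly, for example with the following Python sketch. The addresses are invented for illustration, and the name EMAIL is only chosen for this example.

```python
import re

# The e-mail expression analyzed above, unchanged.
EMAIL = re.compile(r"\b[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b")

text = "Write to info@example.com or jane.doe@mail.example.org today."
found = EMAIL.findall(text)
print(found)

# Because of {2,4}, an ending with more than four letters is not accepted.
long_ending = EMAIL.search("user@example.museum")
print(long_ending)
```

The second call shows a limit of this simple expression: endings with more than four letters are not found, so the upper bound in {2,4} would have to be raised if such addresses can occur in your texts.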

With the regular expression in the example, you can find e-mail addresses in texts. But how can you work with this regular expression? In a program like the Text Converter, for example, you can use the expression to search and replace text. Simply go to the action "Replace Text" and activate regular expressions under the box for the search or replace term. With our example as the search term, you can enter a text which will replace the e-mail address. For example, you can use the following regular expressions:

Search Term:	(\b[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)
Replace with:	<a href="mailto:$1">$1</a>

With this, the found e-mail address will be rewritten as a link. As you can see, the expression in the search box must be enclosed in brackets. Only if there are brackets around the expression can you re-use the found string in the replace box. The brackets can enclose arbitrary parts of the search term. This makes several other things possible. For example, we can try the following combination:

Search Term:	.*(\b[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b).*
Replace with:	$1

With this, you are searching for an e-mail address with arbitrary characters (.*) around it. So, the whole text including the address will be found. But it will only be replaced with the string found by the term in brackets, that is, the e-mail address. In this way, you can extract an e-mail address from a text with this regular expression.
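Both replacements, the link and the extraction, can be sketched in Python with re.sub. As an assumption about syntax, Python writes the reference as \1 in the replacement where the Text Converter uses $1; the sample text is made up.

```python
import re

# The e-mail expression from above, without surrounding brackets.
EMAIL = r"\b[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"

text = "Contact: info@example.com (office)"

# Brackets around the whole expression store the address as reference 1.
linked = re.sub("(" + EMAIL + ")", r'<a href="mailto:\1">\1</a>', text)

# .* on both sides matches the rest of the line, so only the address remains.
extracted = re.sub(".*(" + EMAIL + ").*", r"\1", text)

print(linked)
print(extracted)
```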

If you are using more than one bracket in your search term, you can use them with $1, $2, $3 and so on. Here is an example:

Search Term:	(.*)search(.*)
Replace with:	$1replace$2

In this example, we are searching for the text in front of and behind the string "search". The whole match is replaced with the text in front of "search", the word "replace" and the text behind "search". This regular expression does something that can be done much more easily: it only replaces the word "search" with "replace". But if you modify the expression a little, you can search for different spellings of a word or create much more complex search terms.
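The use of two references can be sketched in Python as follows, again with \1 and \2 in place of $1 and $2. The second call is a made-up variation that shows how the middle part can be extended to different spellings of a word.

```python
import re

text = "a search term"

# Text in front of "search" is reference 1, text behind it is reference 2.
replaced = re.sub(r"(.*)search(.*)", r"\1replace\2", text)

# For a single occurrence, a plain replacement gives the same result.
plain = text.replace("search", "replace")

# Variation: accept two spellings of a word in the middle part.
spellings = re.sub(r"(.*)(?:gray|grey)(.*)", r"\1silver\2", "a grey wall")

print(replaced, plain, spellings)
```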

Regular Expression Service

Do you need help creating a regular expression? We are here to help you. Just write to us via our contact form and give us a brief description of what you are planning to do. The price of our service depends on the complexity of your project. Of course, your request will be non-binding.

Software for Regular Expressions

If you would like to use the regular expressions introduced on this page to work on text files, you can use the software Text Converter for this task. With the Text Converter you can search in text files with regular expressions, replace the matches with other texts or expressions, and delete parts of the text with the help of regular expressions. You can also split files at the position of a regular expression and save the parts as single files. Of course, with this program it is also possible to use matched parts in another context or order (back references).

Another program is the Easy MP3 Player, with which you can search your music collection with regular expressions, so that you can perform very specific searches.

Here is a list of programs in which you can use regular expressions:

Text Converter (Searching, replacing and deleting text in an entire text, in lines, in CSV cells and in XML files with the help of regular expressions, back references and file splitting on regular expressions)
Easy MP3 Player (Search your music collection with regular expressions)
Clipboard Saver (Search and replace clipboard contents with regular expressions)
File Renamer (find and replace as well as delete text in file names with regular expressions)
Word Creator (counting characters of a text that correspond to a regular expression)

Important Note

This text about regular expressions was written by Stefan Trost. It may not be used in another context, not even in parts, without the permission of Stefan Trost.