How to change the Number Format of CSV, XML or other Data
Please note
The functionality described on this page will only be available with the next TextConverter release. If you would like to try it out before, please feel free to contact us.
In this tutorial we would like to take a look at how we can convert any original numbers in CSV and XML files or text into another numerical format. To do this, we use the number format functions that you can find in the actions on the right side of the main window of the TextConverter under CSV > Number Format as well as XML > Number Format. In the last section, we will also show you how you can use number formatting outside of CSV or XML structures within any other type of text or text file.
The tutorial is divided into the following thirteen sections:
- Basic Functionality
- Integer Numbers
- Decimal Numbers
- Leading Zeros
- Thousands Separators
- Other Number Groupings
- Currencies, Weights, Lengths, Percent Signs and other additional Text
- Automatic Detection or custom Separators
- Application in Action Lists
- Application in References
- Format individual Characters, Words or Lines
- Format all Numbers in a Text
- Format only specific Numbers in a Text
Basic Functionality
You can specify your desired format using a so-called format string. This format string is a character chain that corresponds schematically and structurally to your desired target formatting of the number. We will look at the various options and variations of this format string as well as at numerous examples for different conversions in the next sections.
Additionally, it is possible to specify information about the original number formatting regarding the decimal and thousands separator used in the original format. However, this is not absolutely necessary, as in most cases, the automatic detection will already lead to the desired result. Nevertheless, there are ambiguous cases whose resolution requires a specification. We will discuss these in the section about "Automatic Detection or custom Separators" following the format string explanations.
Beyond this format-specific details, you also have to specify in which areas or parts of your text or text files the formatting should be applied. When formatting CSV data, you can specify in which rows and columns the numbers should be formatted. If you want to modify XML files, you can use the typical CSS selectors to limit the conversion to specific inner texts or attributes within the XML structure. And if you want to format numbers in other text-based data structures or within running text, please use the capabilities of the references explained in the last section of this tutorial.
Integer Numbers
A simple integer without decimal places, leading zeros or thousands separators can be created using the simple format string "0". Regardless of the original number, using "0" removes all possible additions to the original number and reduces the number to an integer.
| Format | 1 | 12,34 | 1 234 | 100 % | $4.90 |
| 0 | 1 | 12 | 1234 | 100 | 5 |
That means, numbers such as "1", "100" and "1000" remain as "1", "100" and "1000" because these numbers are already integers, while decimal numbers are rounded and thousands separators are removed. For example, "1.23" becomes "1", "1.9" becomes "2" after rounding to an integer number and the number "1,000,000" with a French thousand separator becomes "1000000". Also currency information, percent signs or other additions like length units, weight information or other text are also removed with the format string "0". For example, "10 GB" becomes "10", "1 Euro" becomes "1" or "USD 1,000.00" becomes "1000".
Decimal Numbers
Decimal numbers can be generated using two extensions to the basic format string "0", which we learned about in the last section. First, you can append any character to the end of "0" as a decimal separator. Second, you can control the number of decimal places displayed via the number of zeros appended after the decimal separator.
For example, with the format string "0.0" you create a decimal number with a single decimal place and a period as decimal separator. With "0,00" you use a comma as decimal separator and output the numbers with two decimal places. If you separate the fields of your CSV data with a comma, a period format is recommended. If you use a semicolon or tabs (TSV), both formats can be used without having to enclose the fields in quotation marks to clearly distinguish field separators from the numbers.
In addition to periods and commas, you can also use any other characters you want as decimal separators. For example, the maritime underscore or the non-breaking space to avoid splitting numbers at the end of lines, as well as any other Unicode characters of your choice. Even separators consisting of multiple identical or different characters are possible. For example, the format string "0//00" converts numbers like "1", "1,23" or "10.5" to "1//00", "1//23" and "10//50".
Once again, also during the decimal number conversion, the thousands separator as well as all additional characters are removed from the original number and the input number is rounded if it contains more decimal places than the required number format pretends. The following table shows some examples of the result of different original numbers (in the columns) for different format strings (in the rows).
| Format | 1 | 1,2 | 1.234 | 0,1 | 0,5 |
| 0 | 1 | 1 | 1 | 0 | 1 |
| 0,0 | 1,0 | 1,2 | 1,2 | 0,1 | 0,5 |
| 0.0 | 1.0 | 1.2 | 1.2 | 0.1 | 0.5 |
| 0,00 | 1,00 | 1,20 | 1,23 | 0,10 | 0,50 |
| 0’000 | 1’000 | 1’200 | 1’234 | 0’100 | 0’500 |
| 0~/00 | 1~/00 | 1~/20 | 1~/23 | 0~/10 | 0~/50 |
| .0 | 1.0 | 1.2 | 1.2 | .1 | .5 |
| ,00 | 1,00 | 1,20 | 1,23 | ,10 | ,50 |
A special case is the definition of a format string without any leading zeros. For example, ".0" or ",0". This allows the output of a number in a kind of decimal notation mainly used in English-speaking countries without an integer part, provided the integer part is zero and thus carries no additional information. Using ".0" would convert the number "0.5" to ".5" while numbers greater than or equal to 1 such as "1.5" would retain their integer part. Naturally, numbers formatted in this way are also recognized as such in the input text of your CSV or XML data and are, if necessary, further processed by the function.
Leading Zeros
With writing multiple zeros without a decimal separator in a row or with appending zeros to the left of your chosen decimal separator you can create leading zeros. Again, you control the number of leading zeros - just like the decimal places - via the number of consecutive zeros.
Using the format string "000" you can, for example, bring your numbers to a minimum length of 3 characters with leading zeros, while longer numbers are of course not shortened but are retained. "1" would become "001", "10" would become "010", "100" would remain "100" and "1000" would also remain "1000" (as well as any other number with more digits), since the latter numbers each exceed the length of three characters, which makes the leading zeros superfluous in this case.
| Format | 1 | 10 | 100 | 1000 | 12,34 |
| 0 | 1 | 10 | 100 | 1000 | 12 |
| 00 | 01 | 10 | 100 | 1000 | 12 |
| 000 | 001 | 010 | 100 | 1000 | 012 |
| 0000 | 0001 | 0010 | 0100 | 1000 | 0012 |
| 00000 | 00001 | 00010 | 00100 | 01000 | 00012 |
| 00.00 | 01.00 | 10.00 | 100.00 | 1000.00 | 12.34 |
| 000.0 | 001.0 | 010.0 | 100.0 | 1000.0 | 012.3 |
| 000,00 | 001,00 | 010,00 | 100,00 | 1000,00 | 012,34 |
Also the combination of a fixed number of decimal places and leading zeros is possible. As an example, the format string "0000.00" combines four leading zeros with two decimal places. The original number "1" would be converted to "0001.00", "12,123" to "0012.12" and "10 000" to "10000.00" with this format string.
Thousands Separators
It is also possible to define thousands separators via the format string. When doing so, a distinction must be made whether the thousands separators should be used with or without leading zeros. Accordingly, the thousands separators must be defined either within the number of leading zeros or using the auxiliary character #, which indicates optionality for the case that the leading zeros should not extend as far as the first occurrence of the thousands separator.
An example of defining a thousands separator is the format string "#,###", which converts the numbers "1", "10", "100", "1000", "10000", "100000" and "1000000" to "1", "10", "100", "1,000", "10,000", "100,000" and "1,000,000". The character # here means that the places marked with it within the number should only be written out if the number in question naturally reaches the length of at least one thousands separator's digit. Thousands digits beyond the first thousands digit, such as those of one million ("1,000,000") or one billion ("1,000,000,000") do not need to be explicitly specified in the format string and are calculated automatically. It is therefore sufficient to specify only the first thousands separator in the format string (even if you expect larger numbers).
| Format | 1 | 100 | 1000 | 1000000 | 1234,56 |
| #.### | 1 | 100 | 1.000 | 1.000.000 | 1.235 |
| #,### | 1 | 100 | 1,000 | 1,000,000 | 1,235 |
| # ### | 1 | 100 | 1 000 | 1 000 000 | 1 235 |
| #'### | 1 | 100 | 1'000 | 1'000'000 | 1'235 |
However, if you want to combine leading zeros and thousands separators, you can simply use zeros instead of the # symbol to represent the desired positions. For example, the numbers "1", "10" and "1000" formatted with the format string "0 000." will result in the output of "0 001", "0 010" and "1 000".
| Format | 1 | 100 | 1000 | 1000000 | 1234,56 |
| 0 000 | 1 000 | 100 000 | 1000 000 | 1000000 000 | 1234 560 |
| 0 000. | 0 001 | 0 100 | 1 000 | 1 000 000 | 1 235 |
| 0 000, | 0 001 | 0 100 | 1 000 | 1 000 000 | 1 235 |
| 0,000. | 0,001 | 0,100 | 1,000 | 1,000,000 | 1,235 |
| 0.000, | 0.001 | 0.100 | 1.000 | 1.000.000 | 1.235 |
| #'#00 | 01 | 100 | 1'000 | 1'000'000 | 1'235 |
| #.000, | 001 | 100 | 1.000 | 1.000.000 | 1.235 |
It's important here that we don't just write "0 000" but rather append a character at the end as a "virtual" decimal separator, even if we don't want any decimal places. It doesn't matter whether we use "0 000." or "0 000," or any other character here, since this character won't be written out later in the output anyway. If we omitted this virtual decimal separator, it would be unclear whether a format string like "0.000" is a decimal number with three decimal places or an integer with a thousands separator. This difference is illustrated in the first three examples in the table above this section. However, if at least one # character appears before our thousands separator, we don't need a virtual decimal separator, since in this case there is no ambiguity. A format string like "#.#00" despite defining two leading zeros does not require any trailing character, whereas a format string like "# 000." does.
| Format | 1 | 100 | 1000 | 1000000 | 1234,56 |
| #,###.0 | 1.0 | 100.0 | 1,000.0 | 1,000,000.0 | 1,234.6 |
| #.###,00 | 1,00 | 100,00 | 1.000,00 | 1.000.000,00 | 1.234,56 |
| 0 000,0 | 0 001,0 | 0 100,0 | 1 000,0 | 1 000 000,0 | 1 234,6 |
| 0,000.000 | 0,001.000 | 0,100.000 | 1,000.000 | 1,000,000.000 | 1,234.560 |
| 000 000,000 | 000 001,000 | 000 100,000 | 001 000,000 | 1 000 000,000 | 001 234,560 |
| #’###•000 | 1•000 | 100•000 | 1’000•000 | 1’000’000•000 | 1’234•560 |
| #--###-:0 | 1-:0 | 100-:0 | 1--000-:0 | 1--000--000-:0 | 1--234-:6 |
The thousands separators can also be combined with any number of decimal places. For example, the format string "#,###.00" formats numbers with a comma as thousands separator and a period as decimal separator ("1000" would thus become "1,000.00"), while "# ###,0" converts numbers with a comma as decimal separator and a space as thousands separator as well as a single decimal place ("1000" would become "1 000,0" with this format string).
Furthermore - just like it applies to the decimal separator - we can use not only single characters but also multiple characters (including Unicode characters) as thousands separators. Thus, in the Text Converter, we are not limited to the usual thousands separators such as period, comma, hyphen and apostrophe. Examples of these can be found in the last two rows of the last table.
Other Number Groupings
In addition to the classic thousands separations that we learned about in the last section and with which numbers are typically divided into groups of three digits, the TextConverter also supports creating digit groupings of any other size.
To do this, we simply need to adjust the position of our "thousands separator" in our format string to our desired group size. So, while we produce groups of three digits with "#.###" (the number "123456" becomes "123.456"), we can create groups of two digits with "#.##" ("12.34.56"), with "#.####" groups of four ("12.3456") and with "#.#" groups of one digit per part ("1.2.3.4.5.6").
| Format | 1 | 100 | 1000 | 1000000 | 1234,56 |
| # # | 1 | 1 0 0 | 1 0 0 0 | 1 0 0 0 0 0 0 | 1 2 3 5 |
| #.## | 1 | 1.00 | 10.00 | 1.00.00.00 | 12.35 |
| #'#### | 1 | 100 | 1000 | 100 0000 | 1235 |
| #_##### | 1 | 100 | 1000 | 10_00000 | 1235 |
| #.#:0 | 1:0 | 1.0.0:0 | 1.0.0.0:0 | 1.0.0.0.0.0.0:0 | 1.2.3.4:6 |
| 0 0 0. | 0 0 1 | 1 0 0 | 1 0 0 0 | 1 0 0 0 0 0 0 | 1 2 3 5 |
| 00 00. | 00 01 | 01 00 | 10 00 | 1 00 00 00 | 12 35 |
| 00.00.00. | 00.00.01 | 00.01.00 | 00.10.00 | 1.00.00.00 | 00.12.35 |
| 000-000. | 000-001 | 000-100 | 001-000 | 1-000-000 | 001-235 |
| 0•0•0•0’0 | 0•0•0•1’0 | 0•1•0•0’0 | 1•0•0•0’0 | 1•0•0•0•0•0•0’0 | 1•2•3•4’6 |
Of course, we can work as usual with both a decimal part (for example "#.#,0") as well as with leading zeros (for example "00 00 00.") again. As we already discussed in the last section, the virtual decimal separator at the end of the format string must not be forgotten, otherwise it would not be possible to distinguish between decimal and thousand separators. Examples of this can be found in the lower part of the table.
Currencies, Weights, Lengths, Percent Signs and other additional Text
So far, we've only looked at pure numbers. In addition to the characters "0" through "9" only the typical characters used as decimal or thousands separators in the original text or the format string have played a role in the last few sections. However, numbers often appear with other meaningful characters, such as currency, weight, or length specifications, percentage signs, or other additional letters, words, or characters, the processing of which we will examine below. Such characters can be processed by the Text Converter both in the source text as well as in the format string.
When interpreting numbers from the source text, additional text is simply ignored if the number appears continuously in the text. As an example, if we use the format string "0", the TextConverter extracts the numbers "1", "3", "7" and "100" from source texts such as "1.10 Euro", "3 kg", "7%" and "$100.00". The remaining characters besides the pure number are ignored in that example, just like also the decimal places (we come later in this section to the question what you can do if that text should not be ignored).
Conversely, the format string "0,00 €" formats the pure numbers "10", "1.5", and "1,234" as "10,00 €", "1,50 €" and "1,23 €". The numbers are formatted exactly as described in the previous sections and then the prefixed and/or appended additional text from the format string is added.
| Format | 1 | 10 | 100 | 1000 | 1,23 |
| 0 kg | 1 kg | 10 kg | 100 kg | 1000 kg | 1 kg |
| 0,00 EUR | 1,00 EUR | 10,00 EUR | 100,00 EUR | 1000,00 EUR | 1,23 EUR |
| $0.00 | $1.00 | $10.00 | $100.00 | $1000.00 | $1.23 |
| 000.00% | 001.00% | 010.00% | 100.00% | 1000.00% | 001.23% |
| # ### m | 1 m | 10 m | 100 m | 1 000 m | 1 m |
| x=.0 | x=1.0 | x=10.0 | x=100.0 | x=1000.0 | x=1.2 |
This works not only when additional characters appear on only one of the two sides, but also when they affect both the input text as well as the format string. An example is the format string "USD 0.00" which outputs the source texts "10 $" and "10,00 USD" as well as also "from 10 Dollar" or "$10" all as "USD 10.00".
However, if no unambiguous number can be extracted from the source text, the original text simply remains as it is. For example, no matter which format string you apply, the string "A1B2C" remains as such, because this text contains the two competing numbers "1" and "2" and it's unclear which of these two numbers should be processed (unless you combine those two numbers with specifying "B" as decimal separator, which would then result in the number 1.2).
CSV cells, XML attributes as well as the inner text of XML data also remain unchanged, provided the respective text doesn't contain any numbers. Empty cells and elements as well as any plain text like "abc" remain just as they are. So, you don't have to worry about the headings of your CSV columns consisting only of letters if you want to set an entire column to a specific number format, for example.
So, this method is particularly suitable for editing texts in which the source numbers are as isolated as possible. This requirement is met in most CSV and XML structures, since typically only a single number occurs per CSV cell or XML attribute. The situation is different, of course, if, for example, continuous text is to be edited, which on the one hand may contain several different numbers and on the other hand, the other text alongside the numbers should be preserved. This can also be a requirement in the case of changing the inner text of XML data. If you want to format such numbers with retaining other characters or additional text passages, instead of the functions "CSV > Number Format" and "XML > Number Format", you should use the reference function, which can be used in all text fields of the Text Converter. An explanation and some application examples of this function can be found in the last section of this tutorial about applying the number format in references.
Automatic Detection or custom Separators
All previous examples work without any further configuration using the TextConverter's auto-detection feature. Numbers like "1", "1.5" or "1,000,000" can be interpreted unambiguously and therefore require no further explanation.
But what happens if your source data contains numbers such as "1.234" or "1,234"? These numbers could both be interpreted either as a decimal number or as the integer number "1234" written with a thousands separator - depending on the decimal separator used. The situation is equally unclear if unusual decimal or thousands separators are used, which could, for example, also be part of continuous text, because without additional information, the Text Converter automatically interprets only the period and the comma as decimal separators and considers only the French space, the Swiss apostrophe, as well as the English comma and the period commonly used in Germany as possible thousands separators.
However, if other characters are to be used in the function of one of the two separators, they must be specified explicitly. For this purpose, the fields "Original Decimal Separator" and "Original Thousands Separator" are available in all number formatting functions of the TextConverter. By default, both of these fields are set to "Auto Detect". You can enter here any characters manually or you can select them from the quick selection list of common separators. Selecting "System" uses the separators provided by the operating system or manually defined in the operating system of the computer on which the TextConverter is currently running. On English systems, these are usually the period as the decimal separator and the comma as the thousands separator, while, for example, on German localizations, this assignment is usually preconfigured the other way around.
Also if your numbers, as in the initial example, can only be clearly interpreted by specifying the separators to be used, or if you want to exclude the formatting of certain number format types, you should take advantage of this separator definition option. For example, if you explicitly specify the period as the thousands separator, number sequences like "123 456 789" will not be processed in a way that they are converted as a contiguous number using the French's space as thousands separator, but instead remain unchanged.
| Format | DecimalSeparator | ThousandsSeparator | 1.234 | 1,234 | 1 |
| 0 | Auto | Auto | 1 | 1 | 1 |
| 0 | . | , | 1 | 1234 | 1 |
| 0 | , | . | 1234 | 1 | 1 |
| 0.0 | Auto | Auto | 1.2 | 1.2 | 1.0 |
| 0.0 | . | , | 1.2 | 1234.0 | 1.0 |
| 0.0 | , | . | 1234.0 | 1.2 | 1.0 |
| 0.0 | • | : | 1.234 | 1,234 | 1.0 |
Without explicitly specifying the separators, both the numbers "1,234" and "1.234" are interpreted as decimal numbers, as this interpretation is more likely and occurs more frequently in practice than considering the period and comma as thousands separators. However, if two identical separators of the above-mentioned types occur in numbers that separate groups of three numbers, the initial situation is clear, and the separators are always interpreted as thousands separators during auto-detection, as the decimal separator may only occur once and, unlike the thousands separator, may not occur multiple times within the same number. However, if multiple separators occur at intervals unusual for groups of thousands, such as in IP addresses like "192.168.0.1", the original text remains unchanged.
If you explicitly define the same character both as a decimal separator and as a thousands separator, the original number will also remain as such in any case, as this specification is not unequivocal. "Auto" and "System", on the other hand, can be used together without any problem, as the TextConverter automatically takes care of the correct assignment of the ultimately used character when using those options.
By the way, you don't have to limit your custom decimal and thousand separators to single characters from the ASCII range; you can also use any Unicode character or even separators consisting of multiple characters. This can be useful, for example, in cases where parts of a number are spread across multiple places in a text and need to be combined using the number format function.
Application in Action Lists
In addition to the configuration of the number format settings using the input fields mentioned above via the graphical user interface of the TextConverter, the number formatting can also be performed using action lists. For this purpose, the functions FORMAT_CELLS, FORMAT_INNERTEXTS as well as FORMAT_ATTRIBUTEVALUES are available and can be used as follows:
FORMAT_CELLS ALL 0
FORMAT_CELLS 1:ALL 0.00 ,
With the example shown in the first line, we convert all fields of a CSV table to integer numbers without decimal places, while the second line outputs the first column of the table in the format 0.00 as a decimal number with two decimal places, assuming the comma as the original decimal separator.
Behind the function name "FORMAT_CELLS", we first specify the cells, columns and rows to be formated in exactly the same way as with all other actions known from CSV action lists (here, "ALL" for all fields respectively "1:ALL" [columns:rows] for all rows in the first column; a general introduction can be found in the linked document). This is followed by the format string and, optionally, the original decimal separator and, if applicable, the original thousands separator. If we omit this part, the automatic detection of both separators is implicitly assumed. Of course, you can also create this command directly guided via the action list editor, so you don't have to memorize the syntax.
Similarly, also the two functions for the XML data work:
FORMAT_INNERTEXTS tagname "0,00 EUR"
FORMAT_ATTRIBUTEVALUES attributname:tagname 0,000 ,.
Behind the respective keyword of the function "FORMAT_INNERTEXTS" (for formatting the inner texts of XML data) respectively "FORMAT_ATTRIBUTEVALUES" (for formatting attribute values within tags of XML data), you first specify the selector in the same way as with the other XML actions (here, all elements with the tag name "tagname" in the first example respectively all attributes with the name "attributename" within tags with the name "tagname" in the second example; a general introduction can be found in the linked document). This is followed by the format string again, as well as the optional specification of the original decimal and thousand separators.
The example from the first line therefore formats all inner text within the tag name "tagname" as "0,00 EUR" (we enclose this format string in quotes because it contains a space that would otherwise be used as a separator between parameters), while the second line formats all attribute values of attributes with the name "attributname" within tags with the name "tagname" in the format "0,000" (with "attributname:ALL" we could convert all attributes with the name "attributname" within all tags, regardless of their names; with "ALL:ALL" all attributes, regardless of tag and attribute name, and so on). While in the first line we again force the automatic detection of the original numbers by not specifying any decimal or thousands separators, in the second example we use ",." to indicate that the original numbers use a comma as the decimal separator and a period as the thousand separator.
Of course, you can also conveniently enter the XML actions using the action list editor, which automatically creates the syntactically correct action list commands for you.
Multi-Character Decimal and Thousand Separators in Action Lists
In our previous examples, our original separators each consist only of one single character. This allows these characters to be defined unambiguously. If only one character is specified, such as "," or ".", this is the original decimal separator, and automatic detection is used for the thousands separator:
<DecimalSeparator>
And if two characters are specified, such as ",.", ".," or ". ", the first of the two characters is assumed to be the original decimal separator while the second character is assumed to be the original thousands separator:
<DecimalSeparator><ThousandsSeparator>
But how can we define separators that consist of multiple characters? After all, a definition such as "abcde" would no longer clearly indicate where one separator ends and the other begins.
So we need a separator for the separators, which you can specify using the following syntax for multi-character original decimal and/or thousands separators:
<Separator><DecimalSeparator><Separator><ThousandsSeparator>
Hence we first define a separator, which will then be used as the separator between the decimal separator and the thousands separator. This separator can only consist of one character and should, of course, not appear in either the decimal separator or the thousands separator.
This specification also allows us to use function words such as "auto" or "system" in our actions in addition to any "real" separator consisting of multiple characters:
FORMAT_CELLS 2:1-3 "$ 0.00" ;auto;'
With this action, we format the first three rows ("1-3") of the second column (2:) of our CSV data in the format "$ 0.00". We want to use the apostrophe as the thousands separator, while we want the original decimal separator to be automatically detected. To be able to write both "auto" and "." consecutively in one parameter (actions have a maximum of two parameters), we first define the separator ";" as an arbitrary character that does not appear in any of the following separator definitions. We then use this character to separate "auto" from ' resulting in ;auto;' as unambiguous separator definition.
Application in References
All the examples we have looked at so far involved applying the number formatting functions to numbers within CSV or XML data structures. But what can we do when the number we want to format appears outside of these structures, for example, in running text or another text-based data format?
Also in these cases we can use the number format function. To do so, we will use it in combination with the reference function of the TextConverter, which gives us many different options, for, for example, accessing, reusing and modifying individual characters, words or lines within any text or text file. The linked document provides an introduction to the capabilities and basics of this function; here, we will focus only on the number format function.
In order to apply number formats within references, we need the placeholder function "format", which has the following syntax:
format=<NumberFormat>
format=<NumberFormat>|sep=<DecimalSeparator>
format=<NumberFormat>|sep=<DecimalSeparator><ThousandsSeparator>
If you want to use the automatic detection provided by the Text Converter, you can use the syntax from the first line, writing, for example, "format=0" or "format=0.00". The syntax shown in the two remaining lines can be used if you want to define an original decimal separator respectively an original decimal separator and an original thousands separator. For example, with "format=0|sep=." you create integer numbers and define the point as the original decimal separator, while with "format=0.0|sep=,." you create decimal numbers with one decimal place and the point as new decimal separator, assuming the comma as the original decimal separator and a point as the original thousands separator. If your separators consist of multiple characters, you can use the same syntax as for the action lists, which was explained at the end of the last section. For example, "format=0|sep=;dsep;tsep" for "dsep" as decimal separator and "tsep" as thousands separator.
The possibilities coming with the use of references are very multifaceted, so we will limit ourselves to only a few example scenarios below to illustrate the opportunities of using the "format" function placeholder. For this to work, references must always be enabled via the option "Actions > Options > References". Otherwise, the references are treated as plain text.
Format individual Characters, Words or Lines
Essentially, the TextConverter's references are working in the same way as placeholders. Depending on the reference used, we can extract any portion of the existing text and then reuse it elsewhere in its original form or edited (in our case, edited using the number format function) - for example, in all text fields of the TextConverter's actions (such as for replacing, deleting, inserting or appending text). The only prerequisite for this is that the replacement of references has to be enabled via the option "Actions > Options > References" in the Text Converter.
As an example, the following three placeholders refer to specific characters, words and lines of the text that we format in various ways:
%ref:chars=10,5>format=0%
%ref:word=17>format=0,0%
%ref:line=3>format=0.00%
The first placeholder, %ref:chars=10,5>format=0% represents the five characters starting at character position 10 of the entire text, formatted as an integer. The second placeholder represents the seventeenth word of the text, formatted as a decimal with one decimal place and a comma as the decimal separator. The third placeholder represents the content of the third line, formatted as a decimal with a period as decimal separator and two decimal places. The numbers and formats are, of course, only examples and can be adapted as desired.
References can also refer to CSV data or XML structures, as the following three examples show:
%ref:cell=2:3>format=0 EUR%
%ref:xml=x>format=USD 0.00%
%ref:xml=x>attribute=a>format=$0.00%
The first placeholder represents the content of the CSV cell in the third row of the second column, formatted with currency symbols as "0 EUR." The second placeholder represents the inner text of the tag matching the selector x, formatted as "USD 0.00." The third placeholder represents the value of the attribute "a" from the tag matching the selector x, formatted as "$0.00." If multiple tags match the specified selector, a single one can be filtered out using >line=x. Thus, %ref:xml=x>line=1>format=USD 0.00% represents the first match of the selector x.
In addition, the individual references can be nested as desired. You can find two examples of this in the next two lines.
%ref:line=7>chars=1-3>format=0000%
%ref:file=data.txt>line=3>word=2>format=0.000,%
The first line represents characters 1 to 3 of line 7 of the entire text, formatted as an integer with four leading zeros. The second line stands for the second word of the third line of the file "data.txt", formatted as an integer with thousand separators.
Format all Numbers in a Text
After seeing some ways to create different placeholders for different text parts and different formats in the last section, we'd now like to look at a concrete use case. We want to format all numbers that appear in a text in the same way.
Through our conversion, we want to turn the text you can find in the column "Original" of the following table into the text to the right of it shown in the column "Result".
| Original | Result |
| The apples cost 2 dollars, the bananas cost 3 dollars and the pumpkin costs 14 dollars. | The apples cost 2.00 dollars, the bananas cost 3.00 dollars and the pumpkin costs 14.00 dollars. |
For this, we use the function "Text > Replace" of the TextConverter with the following settings:
Search for: ([0-9]+)
Replace with: %ref:regmatch=$1>format=0.00%
Options: Interpret as Regular Expression
Since the numbers are all different (and we therefore can't address them individually as such), we work with regular expressions, which allow us to select any character structure we need. The regular expression [0-9] stands for any digit from 0 to 9, and with [0-9]+ we search for at least one or also more digits of this type in a row to match also numbers consisting of more than one place. As a reference, we use "regmatch=$1". This reference represents the match of a regular expression. Specifically, in this case, the match $1, that is the content of the first parentheses of the regular expression (this is why we put [0-9]+ in parentheses). For formatting, we appended the function placeholder "format=0.00" to the reference, which ensures the individual formatting of each match.
With this command, the TextConverter will search for arbitrary numbers within the text. Each match is then replaced with the placeholder %ref:regmatch=$1>format=0.00%, which accordingly formats each match individually before the replacement. If we would omit the formatting part of that placeholder and simply use %ref:regmatch=$1% as replacement, the original text would not change, because this placeholder represents the match exactly (and is the same like just writing only $1).
Format only specific Numbers in a Text
Of course, it's also possible to not change all the numbers in a text in the same way but to limit the formatting to individual numbers. We'll look at this in the next example, where we only want to convert the monetary amounts within the text, while leaving all other numbers, such as quantities, unchanged.
| Original | Result |
| 5 apples cost $3, 4 bananas cost $2 and 1 pumpkin costs $12. | 5 apples cost 3.00 dollars, 4 bananas cost 2.00 dollars and 1 pumpkin costs 12.00 dollars. |
If we would use the settings from the last example, the text would change to "5.00 apples cost $3.00, 4.00 bananas cost $2.00 and 1.00 pumpkin costs $12.00.". Therefore, this time we'll include the currency in our search:
Search for: $([0-9]+)
Replace with: %ref:regmatch=$1>format=0.00 dollars%
Options: Interpret as Regular Expression
So, here, we're searching for an arbitrary number that follows the "$" character. This will result in only the text passages "$3", "$2" and "$12" being found with this search. To extract only the number and not the dollar sign from the results, we only place the dollar sign into the parentheses. We then format each of these found numbers with the format string "0.00 dollars" and obtain our desired text, in which we've changed not only the number format but also the currency symbol.
By the way, in this example, it makes no difference whether we enclose only the number or our entire search term with the brackets, since the format function would also ensure that the original $ sign is replaced with "dollars".
Further Extensions
If we also want to find and replace decimal numbers (and not only integer numbers like in the two examples before), we could extend our regular expression to match also the decimal format and, for example, include the point or the comma in addition to the digits 0 to 9. If we don't want to apply our search to the entire text but, for example, only to specific lines, we could use the function "Lines > Replace in Lines" instead of the function "Text > Replace". Such modifications allow you to vary the examples as desired and adapt them to your own needs.