TextConverter

Supported Formats

With the TextConverter, you can edit arbitrary texts and text files regardless of their format. This includes, for example, plain text files (typically with the file extension TXT), CSV files (typically with the file extension CSV or TSV), files in XML-based formats (for example with the file extensions XML, XHTML, HTML, HTM, RSS or SVG), source code files such as PHP, JS, BAT, CMD, SH, VBS, C, CPP, CS, PAS, PY or R, and any other text formats such as JSON, SQL, DIF, CSS or INI, to name just a few.

PDF documents and office documents such as Microsoft Word documents (DOC, DOCX), Microsoft Excel spreadsheets (XLS, XLSX) or other office files such as ODT, ODS, PPT or PPTX cannot be processed with the TextConverter, because internally those formats are not text files. However, the TextConverter can export text files and CSV files to the formats DOCX, ODT, XLSX and ODS as well as to images (JPG, PNG, BMP).

The TextConverter offers numerous actions to process texts and text files. With the actions for processing the entire text and with the actions for editing lines, all texts and text files of any format can be edited. In addition, the TextConverter provides some format-specific actions for the processing of CSV files and the processing of XML files.

Regardless of its format, a text file can be stored in different encodings and with different types of line breaks. The two tables below show which encodings and line break types the TextConverter supports.

Encodings

In the following table you can see an overview of all encodings supported by the TextConverter. These encodings can be read, written and changed by the TextConverter.

If you use the TextConverter with its default settings - that means without changing any settings - the TextConverter will try to automatically determine the encoding of a file. The TextConverter will then also use this encoding for storing the corresponding file. So, if you only want to edit the content of a text file (for example with replacements of text), you do not need to worry about the encoding settings.

If you would like to change the encoding of files or if you want to read files using a specific encoding, you can use the settings under "Actions > Files > Encoding". In addition to the options for reading and writing, you will also find an option that controls whether a byte order mark is written into the files. The column "BOM" in the table shows whether an encoding supports byte order marks.
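To illustrate what a byte order mark is, the following Python sketch checks whether a byte sequence starts with one of the BOMs defined in Python's standard codecs module. It is only an illustration and not the detection method used by the TextConverter. The function name detect_bom is hypothetical; the returned names are borrowed from the "Parameter" column of the table.

```python
import codecs

# BOMs are checked longest first, because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf32le"),
    (codecs.BOM_UTF32_BE, "utf32be"),
    (codecs.BOM_UTF8, "utf8"),
    (codecs.BOM_UTF16_LE, "utf16le"),
    (codecs.BOM_UTF16_BE, "utf16be"),
]


def detect_bom(data: bytes):
    """Return (name, bom_length), or (None, 0) if the data has no known BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A file without a BOM gives no hint here, so any real detection has to fall back on analyzing the content.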

The same applies if you control the TextConverter via the command line or via a script: if you do not specify an explicit encoding for reading or saving a file, the encoding is determined automatically during reading and used again for writing. If you want to deviate from this default behavior, you can use the values from the "Parameter" column of the table. An introduction and examples of the use of the parameters can be found in the article about the script control of the TextConverter in the section about parameters for encoding.

Encoding | Description | BOM | Parameter
ASCII | 7-bit encoding with 128 characters (00 to 7F) | no | ascii
Latin-1 | 8-bit encoding according to ISO 8859-1 | no | latin1
Latin-2 | 8-bit encoding according to ISO 8859-2 | no | latin2
WIN-ANSI | Language-dependent ANSI code page of your Windows installation | no | win-ansi
WIN-1250 | Windows Code Page 1250 (Central European) | no | win-1250
WIN-1251 | Windows Code Page 1251 (Cyrillic) | no | win-1251
WIN-1252 | Windows Code Page 1252 (Western European) | no | win-1252
WIN-1253 | Windows Code Page 1253 (Greek) | no | win-1253
CP437 | Code Page 437 (CP437, IBM437, OEM-US) | no | cp437
UTF-7 | For using Unicode in non-8-bit environments | yes | utf7
UTF-8 | Unicode encoding with variable 1 to 4 bytes per character | yes | utf8
UTF-16 LE | Unicode encoding with variable 2 or 4 bytes per character, Little Endian | yes | utf16le
UTF-16 BE | Unicode encoding with variable 2 or 4 bytes per character, Big Endian | yes | utf16be
UTF-32 LE | Unicode encoding with fixed 4 bytes per character, Little Endian | yes | utf32le
UTF-32 BE | Unicode encoding with fixed 4 bytes per character, Big Endian | yes | utf32be

You can find out more about the respective encodings and their differences in the introduction to the Unicode text file formats.
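What changing an encoding means at the byte level can be sketched in Python as follows. This is an illustration under assumptions, not the TextConverter's implementation: the function convert and the mapping from the parameter names in the table to Python codec names are hypothetical, and win-ansi is left out because it depends on the Windows installation.

```python
import codecs

# Assumed mapping: TextConverter parameter name -> Python codec name.
PYTHON_CODECS = {
    "ascii": "ascii", "latin1": "latin-1", "latin2": "iso8859-2",
    "win-1250": "cp1250", "win-1251": "cp1251",
    "win-1252": "cp1252", "win-1253": "cp1253", "cp437": "cp437",
    "utf7": "utf-7", "utf8": "utf-8",
    "utf16le": "utf-16-le", "utf16be": "utf-16-be",
    "utf32le": "utf-32-le", "utf32be": "utf-32-be",
}

# BOMs that Python's codecs module defines (there is none for UTF-7 here).
BOMS = {
    "utf8": codecs.BOM_UTF8,
    "utf16le": codecs.BOM_UTF16_LE, "utf16be": codecs.BOM_UTF16_BE,
    "utf32le": codecs.BOM_UTF32_LE, "utf32be": codecs.BOM_UTF32_BE,
}


def convert(data: bytes, read_as: str, save_as: str, write_bom: bool = False) -> bytes:
    """Decode bytes with one encoding and re-encode them with another."""
    text = data.decode(PYTHON_CODECS[read_as])
    out = text.encode(PYTHON_CODECS[save_as])
    if write_bom and save_as in BOMS:
        out = BOMS[save_as] + out
    return out
```

For example, the single Latin-1 byte E4 ("ä") becomes the two bytes C3 A4 in UTF-8, which is why a file read with the wrong encoding shows garbled characters.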

Line Break Types

In the following table you can see an overview of all types of line breaks provided by the TextConverter. Since the TextConverter also supports line breaks at custom characters or code points, you are not bound to this selection but you can also define and use your own line breaks at one or more characters or code points.

If you use the TextConverter with its default settings, that is, without explicitly defining a type of line break for reading or writing, the TextConverter will try to automatically determine the type of line break used in a text or text file. This type of line break is then also used again when the file is saved. If you would like to change the line break type of a file or read files using a specific line break, you can use the settings under "Actions > Files > Line Break Type".

If you would like to change the line break type of files via a script or via the command line with the TextConverter or if you want to use a specific line break type for reading files, you can use the values from the column "Parameter". You can find out how you can control the TextConverter in batch mode with parameters for the line break type in the article about the script control of the TextConverter in the section parameters for the line break type.

Line Break | System / Designation | Code Point | Parameter
CRLF | Windows, DOS, OS/2, CP/M, Symbian, Palm, Atari | U+000D + U+000A | crlf
LF | Unix, Linux, macOS, Mac OS X, Android, AmigaOS, BSD | U+000A | lf
CR | Classic Mac OS, Apple II, Commodore C64, OS-9 | U+000D | cr
NL | EBCDIC New Line - IBM Mainframe Systems | U+0015 | nl
RNL | EBCDIC Require New Line | U+0006 | rnl
LF | EBCDIC Line Feed | U+0025 | lf_ebcdic
EOL | ATASCII End Of Line | U+009B | eol
GS | Group Separator | U+001D | gs
RS | Record Separator | U+001E | rs
US | Unit Separator | U+001F | us
FF | Unicode Form Feed | U+000C | ff
NEL | Unicode Next Line | U+0085 | nel
LS | Unicode Line Separator | U+2028 | ls
PS | Unicode Paragraph Separator | U+2029 | ps
VT | Vertical Tab | U+000B | vt
TAB | Horizontal Tab | U+0009 | tab
FIXED | Fixed Line Length with x Characters | - | fixedlength-x
NOCHAR | No Character | - | nochar
- | Linebreak at Character x | - | customstr-x
- | Linebreak at Codepoint x | - | customcp-x
- | Linebreak at one of the Characters x, y or z | - | customstrs-x,y,z
- | Linebreak at one of the Codepoints x, y or z | - | customcps-x,y,z

You can find out more about the different types of line breaks in the introduction to line breaks.
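As a rough idea of how the three most common line break types can be told apart, the following Python sketch counts the occurrences of CR LF, LF and CR and reports the most frequent one. It is a simplified illustration and not the detection logic of the TextConverter; the function name detect_line_break is hypothetical and the return values are borrowed from the "Parameter" column.

```python
def detect_line_break(text: str) -> str:
    """Return "crlf", "lf" or "cr" for the most frequent type, or "none"."""
    crlf = text.count("\r\n")
    # Every CR LF pair contains one CR and one LF, so subtract the pairs
    # to count only the stand-alone characters.
    lf = text.count("\n") - crlf
    cr = text.count("\r") - crlf
    counts = {"crlf": crlf, "lf": lf, "cr": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"
```

When trying this on a file, open it in binary mode or with newline="" so that Python does not translate the line breaks while reading.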

Custom Line Breaks

If you want to work with line actions or if you want to change the line break type of files or texts using the TextConverter, you are not limited to the types of line breaks shown in the table. This selection is only the list of predefined line break types, which you can select directly in the drop down list in the TextConverter.

In order to define user-defined line breaks at one or more arbitrary characters or codepoints, you can go to "Actions > Files > Line Break Type > Read as" or "Actions > Files > Line Break Type > Save as" and select either "Custom Character" or "Custom Codepoint" from the drop down list - depending on whether you want to specify the line break for reading and/or writing as a character or as a codepoint. After this selection, an input field appears in which you can write your desired line break.

If you select "Custom Character", you can directly enter the character or characters that should be interpreted as a line break when reading or writing into the input field, for example "|" or "--".

If you select "Custom Codepoint", you have the option of entering your line break in the form of one or more codepoints. Compared to specifying characters directly, this has the advantage that you can also easily specify invisible or non-displayable characters. Codepoints can be written in hexadecimal notation, in decimal notation or in the form U+X. To define the Windows line break CR LF as a custom codepoint, you could, for example, use the formats "#0D#0A" (hexadecimal), "13 10" (decimal), "U+0D U+0A" or "U+000D U+000A".
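The three codepoint notations can be made concrete with a small Python sketch that turns such a specification into the string it stands for. The function parse_codepoints is hypothetical and only mirrors the formats named above; how the TextConverter parses its input field internally is not documented here.

```python
import re


def parse_codepoints(spec: str) -> str:
    """Convert "#0D#0A", "13 10" or "U+000D U+000A" into the actual characters."""
    chars = []
    # A token is "#" plus hex digits, "U+" plus hex digits, or decimal digits.
    for token in re.findall(r"#[0-9A-Fa-f]+|U\+[0-9A-Fa-f]+|\d+", spec):
        if token.startswith("#"):
            chars.append(chr(int(token[1:], 16)))
        elif token.startswith("U+"):
            chars.append(chr(int(token[2:], 16)))
        else:
            chars.append(chr(int(token)))
    return "".join(chars)
```

All four example notations for CR LF yield the same two-character string "\r\n".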

If you control the TextConverter via the command line or a script, the custom line breaks can be passed via the parameters customstr-x and customcp-x. With customstr-x you can pass characters and with customcp-x codepoints, with the x standing for the respective character(s) or code point(s). For example, customstr-ab (line break at the string "ab") or customcp-#0D#0A (line break at the Windows line break CR LF defined by the codepoints #0D#0A in hexadecimal notation). Further examples of the use of the parameters for custom line breaks can be found in the tutorial for the script control of the TextEncoder in the section "Custom Characters for Line Breaks". Even if this tutorial is about the TextEncoder, you can also use the examples shown there for the TextConverter.

Lines with a Fixed Line Length

In addition to line breaks at one or several characters, the TextConverter also supports reading and saving texts and text files with a fixed line length. This means that the end of a line is not defined by a certain character or a certain codepoint, but by a defined number of characters, for example by the rule that a line always consists of 10 characters.

In the TextConverter, under "Actions > Files > Line Break Type > Read as" you can select the option "Line Break after this Number of Characters (Fixed Line Length)" and enter your desired number of characters. Under "Save as" you can select "No Character" if you want to keep this type of line break. If not, simply select a different type of line break in order to change the line break type of your text.
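The idea behind reading with a fixed line length and then saving with a different line break type can be sketched in Python like this. The function names split_fixed and rewrite_fixed are hypothetical and the sketch counts characters, not bytes; it is an illustration, not the TextConverter's implementation.

```python
def split_fixed(text: str, length: int) -> list:
    """Cut the text into lines of `length` characters (the last may be shorter)."""
    if length <= 0:
        raise ValueError("length must be a positive number of characters")
    return [text[i:i + length] for i in range(0, len(text), length)]


def rewrite_fixed(text: str, length: int, save_as: str = "\n") -> str:
    """Read with a fixed line length, then save with a real line break."""
    return save_as.join(split_fixed(text, length))
```

Joining with an empty string as save_as restores the original text, which corresponds to keeping the fixed-length layout.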

A more detailed explanation about working with files with a fixed line length can be found in the tutorial about rewriting text files with a fixed line length. This tutorial is written for the TextEncoder, but you can also use everything for the TextConverter.

Line Breaks on multiple Characters

Typically, line breaks are defined by a single fixed character or by a single fixed string, for example by the fixed character LF (Unix, Linux, macOS) or the fixed string CR LF (Windows). This line break remains constant over the entire file or the entire text and no other character is interpreted as a line break.

However, with the TextConverter you can deviate from this rigid rule and define multiple characters or multiple strings that are each interpreted as a line break independently of each other, for example both CR LF and LF. This function can be useful, for example, if text files from different systems have been copied into one file and this file now needs to be repaired. In this case, the TextConverter can read the file taking both types of line breaks into account and then save it with a single uniform type of line break.
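Repairing a file with mixed line breaks can be sketched in Python as follows. This is a minimal illustration of the idea, not the TextConverter's implementation; the function names split_lines and normalize are hypothetical.

```python
import re


def split_lines(text: str, breaks: list) -> list:
    """Split the text at every occurrence of any of the given line breaks."""
    # Longer breaks go first so that "\r\n" is matched as one break
    # and not as a stray "\r" followed by "\n".
    ordered = sorted(breaks, key=len, reverse=True)
    pattern = "|".join(re.escape(b) for b in ordered)
    return re.split(pattern, text)


def normalize(text: str, breaks: list, save_as: str = "\n") -> str:
    """Read with several line break types, save with one uniform type."""
    return save_as.join(split_lines(text, breaks))
```

For example, normalize("a\r\nb\nc", ["\r\n", "\n"], "\r\n") returns a text in which every line ends with CR LF.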

If you want to use the TextConverter via the graphical user interface and define line breaks at several characters, you can go to "Actions > Files > Line Break Type > Read as" and select either "Line break at each of these characters (comma-separated)" or "Line break at each of these code points (comma-separated)". These two options let you define several characters as line breaks, either by typing the characters directly or in the form of codepoints. The individual characters or strings must be separated with a comma, for example "a,bc" for a line break at every "a" and at every "bc" in the text. If you want to use the comma itself as a line break, you can put it in quotation marks, for example "",",." for a line break at every comma and every period in the file. Codepoints can be specified in hexadecimal notation ("#0D#0A"), in decimal notation ("13 10") or in the form U+X ("U+0D U+0A" or "U+000D U+000A").
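How such a comma-separated list with a quoted comma might be split can be sketched with Python's csv module. This assumes CSV-like quoting rules, which may differ in detail from what the TextConverter accepts; the function name parse_break_list is hypothetical.

```python
import csv


def parse_break_list(spec: str) -> list:
    """Split a comma-separated list of line breaks, honoring quoted commas."""
    # csv.reader expects an iterable of lines, so wrap the single line in a list.
    return next(csv.reader([spec]))
```

With this assumption, the input a,bc yields the two line breaks "a" and "bc", while the input ",",. yields a comma and a period.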

If you control the TextConverter via the command line or via a script, you can use the parameters customstrs-x and customcps-x for line breaks at multiple characters. The x is to be replaced by the desired line breaks, for example customstrs-a,bc and customcps-#0D#0A for the two examples mentioned above. In the tutorial about the script control of the TextEncoder in the section "Line break on multiple Characters" you will find further explanations and examples for the use of the parameters customstrs-x and customcps-x. Everything in this tutorial also applies to the TextConverter.

Further information and examples on the topic are also available in the AskingBox tutorial "Repair Text Files with mixed Line Breaks". The examples there relate to the TextEncoder again, but can also be used for the TextConverter.