TextEncoder

Supported Formats

With the TextEncoder you can change both the encoding and the line break type of text files. This page lists the encodings and line break types that the TextEncoder can read, write and convert.

Encodings

The TextEncoder can read and write the following encodings. The "BOM" column shows whether the encoding supports a Byte Order Mark. The "Parameter" column shows the parameter you can use in the batch version to convert files to the corresponding format via a script.

| Encoding | Description | BOM | Parameter |
|---|---|---|---|
| ASCII | 7-bit encoding with 128 characters (00 to 7F) | no | ascii |
| Latin-1 | 8-bit encoding according to ISO 8859-1 | no | latin1 |
| Latin-2 | 8-bit encoding according to ISO 8859-2 | no | latin2 |
| WIN-ANSI | Language-dependent ANSI code page of your Windows installation | no | win-ansi |
| WIN-1250 | Windows Code Page 1250 (Central European) | no | win-1250 |
| WIN-1251 | Windows Code Page 1251 (Cyrillic) | no | win-1251 |
| WIN-1252 | Windows Code Page 1252 (Western European) | no | win-1252 |
| WIN-1253 | Windows Code Page 1253 (Greek) | no | win-1253 |
| CP437 | Code Page 437 (CP437, IBM437, OEM-US) | no | cp437 |
| UTF-7 | For using Unicode in non-8-bit environments | yes | utf7 |
| UTF-8 | Unicode encoding with variable 1 to 4 bytes per character | yes | utf8 |
| UTF-16 LE | Unicode encoding with variable 2 or 4 bytes per character, Little Endian | yes | utf16le |
| UTF-16 BE | Unicode encoding with variable 2 or 4 bytes per character, Big Endian | yes | utf16be |
| UTF-32 LE | Unicode encoding with fixed 4 bytes per character, Little Endian | yes | utf32le |
| UTF-32 BE | Unicode encoding with fixed 4 bytes per character, Big Endian | yes | utf32be |

Learn more about each encoding in the introduction to Unicode text file formats.
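To illustrate what changing an encoding involves, the following Python sketch decodes bytes from one encoding and re-encodes them in another, optionally writing a Byte Order Mark. It is not the TextEncoder's implementation: the function name `transcode` and the mapping from the parameter names in the table above to Python codec names are assumptions made for this example. WIN-ANSI is left out because it depends on the local Windows installation, and the UTF-7 signature is left out because Python's `codecs` module provides no constant for it.

```python
import codecs

# Assumed mapping: TextEncoder parameter name -> Python codec name.
PYTHON_CODECS = {
    "ascii": "ascii",
    "latin1": "latin-1",
    "latin2": "iso-8859-2",
    "win-1250": "cp1250",
    "win-1251": "cp1251",
    "win-1252": "cp1252",
    "win-1253": "cp1253",
    "cp437": "cp437",
    "utf7": "utf-7",
    "utf8": "utf-8",
    "utf16le": "utf-16-le",
    "utf16be": "utf-16-be",
    "utf32le": "utf-32-le",
    "utf32be": "utf-32-be",
}

# Byte Order Marks for the encodings covered by the codecs module.
BOMS = {
    "utf8": codecs.BOM_UTF8,          # EF BB BF
    "utf16le": codecs.BOM_UTF16_LE,   # FF FE
    "utf16be": codecs.BOM_UTF16_BE,   # FE FF
    "utf32le": codecs.BOM_UTF32_LE,   # FF FE 00 00
    "utf32be": codecs.BOM_UTF32_BE,   # 00 00 FE FF
}


def transcode(data, source, target, write_bom=False):
    """Re-encode bytes from `source` to `target` (parameter names as keys)."""
    # Strip a leading BOM of the source encoding, if there is one.
    source_bom = BOMS.get(source, b"")
    if source_bom and data.startswith(source_bom):
        data = data[len(source_bom):]

    text = data.decode(PYTHON_CODECS[source])
    encoded = text.encode(PYTHON_CODECS[target])

    # Only prepend a BOM if the target encoding has one in the BOMS table.
    if write_bom:
        encoded = BOMS.get(target, b"") + encoded
    return encoded
```

For example, `transcode(data, "win-1252", "utf8", write_bom=True)` would turn Windows-1252 bytes into UTF-8 with a leading BOM. Characters that the target encoding cannot represent raise a `UnicodeEncodeError` in this sketch.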

Line Breaks

The TextEncoder can read and write the following line break types. The "Parameter" column shows the parameter you can use in the batch version to change the line breaks of files to the respective type via a script.

| Line Break | System / Designation | Code Point | Parameter |
|---|---|---|---|
| CRLF | Windows, DOS, OS/2, CP/M, TOS | U+000D + U+000A | crlf |
| LF | Unix, Linux, macOS, Mac OS X, AmigaOS | U+000A | lf |
| CR | Classic Mac OS, Apple II, Commodore | U+000D | cr |
| NL | AIX OS, IBM Mainframe Systems, OS/390 | U+0015 | nl |
| FF | Unicode Form Feed | U+000C | ff |
| NEL | Unicode New Line | U+0085 | nel |
| LS | Unicode Line Separator | U+2028 | ls |
| PS | Unicode Paragraph Separator | U+2029 | ps |
| VT | Vertical Tab | U+000B | vt |
| TAB | Horizontal Tab | U+0009 | tab |
| FIXED | Fixed Line Length (x = Number of Characters) | - | fixedlength-x |
| NOCHAR | No Character | - | nochar |
| - | Line break at custom character x | - | customstr-x |
| - | Line break at custom code point x | - | customcp-x |
| - | Line break at one of the characters x, y or z | - | customstrs-x,y,z |
| - | Line break at one of the code points x, y or z | - | customcps-x,y,z |

See the introduction to line breaks to learn more about the different types of line breaks. For the line break types FIXED and NOCHAR, we also recommend the AskingBox tutorial on rewriting text files with fixed line length.
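As a rough illustration of what converting between the preset line break types means, the following Python sketch maps the parameter names from the table above to their code points and replaces one type with another. The function names are hypothetical and the code is not the TextEncoder's implementation. The helper for FIXED assumes that "fixed line length" means cutting the text into pieces of x characters each.

```python
# Parameter name -> line break string, taken from the table above.
LINE_BREAKS = {
    "crlf": "\r\n",    # U+000D + U+000A
    "lf": "\n",        # U+000A
    "cr": "\r",        # U+000D
    "nl": "\x15",      # U+0015
    "ff": "\x0c",      # U+000C
    "nel": "\x85",     # U+0085
    "ls": "\u2028",    # U+2028
    "ps": "\u2029",    # U+2029
    "vt": "\x0b",      # U+000B
    "tab": "\t",       # U+0009
}


def convert_line_breaks(text, read_as, save_as):
    """Split `text` at the `read_as` line break and rejoin with `save_as`."""
    lines = text.split(LINE_BREAKS[read_as])
    return LINE_BREAKS[save_as].join(lines)


def split_fixed_length(text, x):
    """Assumed FIXED semantics: every x characters form one line."""
    if x <= 0:
        raise ValueError("x must be a positive number of characters")
    return [text[i:i + x] for i in range(0, len(text), x)]
```

For example, `convert_line_breaks("a\r\nb", "crlf", "lf")` yields `"a\nb"`, and `"\n".join(split_fixed_length("abcdefgh", 3))` turns a text without line breaks into lines of at most three characters.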

Custom Line Breaks

In addition to the preset line break types listed in the table above, any custom character or string can be used as a line break. In the TextEncoder, these characters can be defined either as text or as code points. In the graphical user interface, select the option "Custom Character" or "Custom Code Point" under "Read as" and "Save as" and enter the desired characters or code points in the input field below. Code points can be specified in three different ways: hexadecimal (for example #0D#0A), decimal (for example 13 10) or in the form U+X (for example U+0D U+0A or U+000D U+000A).
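The three code point notations can be read as in the following Python sketch. The function `parse_code_points` is hypothetical and only shows one plausible way to interpret the notations described above; how the TextEncoder itself parses its input is not specified here.

```python
import re


def parse_code_points(spec):
    """Turn a code point specification into the string it describes.

    Supported notations (as described above):
      hexadecimal  "#0D#0A"
      decimal      "13 10"
      U+X          "U+0D U+0A" or "U+000D U+000A"
    """
    spec = spec.strip()
    if spec.startswith("#"):
        # Hexadecimal: each code point is introduced by '#'.
        tokens = re.findall(r"#([0-9A-Fa-f]+)", spec)
        base = 16
    elif spec.upper().startswith("U+"):
        # U+X form: hexadecimal digits after 'U+'.
        tokens = re.findall(r"[Uu]\+([0-9A-Fa-f]+)", spec)
        base = 16
    else:
        # Decimal: whitespace-separated numbers.
        tokens = spec.split()
        base = 10
    return "".join(chr(int(token, base)) for token in tokens)
```

With this sketch, `"#0D#0A"`, `"13 10"`, `"U+0D U+0A"` and `"U+000D U+000A"` all produce the same two-character string, CR followed by LF.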

When controlling the TextEncoder via the command line, you can use the parameters customstr-x and customcp-x for custom line breaks. The x stands for the respective user-defined characters or code points, for example: customstr-a (the line break character is the letter a) or customcp-#0D#0A (line break at the string defined by the code points #0D#0A = Windows line break CR LF).

Line Breaks on Multiple Characters

For line breaks on several different characters, the options "Line break at each of these characters (comma-separated)" and "Line break at each of these code points (comma-separated)" or the parameters customstrs-x and customcps-x can be used.

All characters that should be interpreted as a line break are listed separated by commas. For example, "a,b" breaks the line at each "a" and at each "b". The command line parameters are defined in the same way: for example customstrs-",",";" (line break at each comma and at each semicolon) or customcps-#0A,#0D (line break at each of the code points #0A and #0D = LF and CR).
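The difference between one custom line break string and several alternative line break characters can be sketched in Python as follows. Both function names are hypothetical, and the code only mirrors the behavior described above; it is not taken from the TextEncoder.

```python
import re


def split_on_string(text, separator):
    """One custom line break: the whole string is a single separator."""
    return text.split(separator)


def split_on_any(text, separators):
    """Several custom line breaks: each listed string is a separator."""
    if not separators:
        raise ValueError("at least one separator is required")
    # Try longer separators first so that a shorter one cannot shadow them.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    return re.split(pattern, text)
```

With CR LF as a single string, `split_on_string("a\r\nb", "\r\n")` gives two lines. With CR and LF as separate alternatives, `split_on_any("a\r\nb", ["\n", "\r"])` gives three, because the CR and the LF each count as a line break and enclose an empty line.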

More information about this topic is available in the AskingBox tutorial about text files with mixed line breaks.