Supported Formats
With the TextEncoder you can change both the encoding and the line break type of text files. This page lists the encodings and line breaks that the TextEncoder can read, write and convert.
Encodings
The following encodings can be read and written with the TextEncoder. The "BOM" column shows whether the encoding supports a Byte Order Mark. The "Parameter" column shows the parameter you can use in the batch version to convert files to the corresponding encoding via a script.
Encoding | Description | BOM | Parameter |
ASCII | 7-bit encoding with 128 characters (00 to 7F) | no | ascii |
Latin-1 | 8-bit encoding according to ISO 8859-1 | no | latin1 |
Latin-2 | 8-bit encoding according to ISO 8859-2 | no | latin2 |
WIN-ANSI | Language-dependent ANSI code page of your Windows installation | no | win-ansi |
WIN-1250 | Windows Code Page 1250 (Central European) | no | win-1250 |
WIN-1251 | Windows Code Page 1251 (Cyrillic) | no | win-1251 |
WIN-1252 | Windows Code Page 1252 (Western European) | no | win-1252 |
WIN-1253 | Windows Code Page 1253 (Greek) | no | win-1253 |
CP437 | Code Page 437 (CP437, IBM437, OEM-US) | no | cp437 |
UTF-7 | For using Unicode in non-8-bit environments | yes | utf7 |
UTF-8 | Variable-length Unicode encoding with 1 to 4 bytes per character | yes | utf8 |
UTF-16 LE | Variable-length Unicode encoding with 2 or 4 bytes per character, Little Endian | yes | utf16le |
UTF-16 BE | Variable-length Unicode encoding with 2 or 4 bytes per character, Big Endian | yes | utf16be |
UTF-32 LE | Unicode encoding with fixed 4 bytes per character, Little Endian | yes | utf32le |
UTF-32 BE | Unicode encoding with fixed 4 bytes per character, Big Endian | yes | utf32be |
Learn more about each encoding in the introduction to Unicode text file formats.
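As an illustration of what the "BOM" column refers to, the following Python sketch detects a Byte Order Mark at the start of a byte string and returns the matching parameter name from the table above. The function name detect_bom and the list BOMS are hypothetical names chosen for this example; the sketch is not the TextEncoder's own detection logic, and it leaves out UTF-7 because Python's codecs module defines no BOM constant for it.

```python
import codecs

# BOM byte sequences from Python's standard codecs module, paired with the
# parameter names from the encoding table. The UTF-32 entries come first
# because the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    ("utf32le", codecs.BOM_UTF32_LE),
    ("utf32be", codecs.BOM_UTF32_BE),
    ("utf8", codecs.BOM_UTF8),
    ("utf16le", codecs.BOM_UTF16_LE),
    ("utf16be", codecs.BOM_UTF16_BE),
]


def detect_bom(data: bytes):
    """Return the parameter name whose BOM starts `data`, or None (hypothetical helper)."""
    for name, bom in BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file without a BOM yields None here, which says nothing about its actual encoding; the 8-bit encodings in the table never carry a BOM.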
Line Breaks
The following line break types can be read and written with the TextEncoder. The "Parameter" column shows the parameter you can use in the batch version to convert the line breaks of files to the respective type via a script.
Line Break | System / Designation | Code Point | Parameter |
CRLF | Windows, DOS, OS/2, CP/M, Symbian, Palm, Atari | U+000D + U+000A | crlf |
LF | Unix, Linux, macOS, Mac OS X, Android, AmigaOS, BSD | U+000A | lf |
CR | Classic Mac OS, Apple II, Commodore C64, OS-9 | U+000D | cr |
NL | EBCDIC New Line - IBM Mainframe Systems | U+0015 | nl |
RNL | EBCDIC Required New Line | U+0006 | rnl |
LF | EBCDIC Line Feed | U+0025 | lf_ebcdic |
EOL | ATASCII End Of Line | U+009B | eol |
GS | Group Separator | U+001D | gs |
RS | Record Separator | U+001E | rs |
US | Unit Separator | U+001F | us |
FF | Unicode Form Feed | U+000C | ff |
NEL | Unicode Next Line | U+0085 | nel |
LS | Unicode Line Separator | U+2028 | ls |
PS | Unicode Paragraph Separator | U+2029 | ps |
VT | Vertical Tab | U+000B | vt |
TAB | Horizontal Tab | U+0009 | tab |
FIXED | Fixed Line Length (x = Number of Characters) | - | fixedlength-x |
NOCHAR | No Character | - | nochar |
- | Line Break at Custom Character x | - | customstr-x |
- | Line Break at Custom Code Point x | - | customcp-x |
- | Line Break at one of the Characters x, y or z | - | customstrs-x,y,z |
- | Line Break at one of the Code Points x, y or z | - | customcps-x,y,z |
See the introduction to line breaks to learn more about the different types of line breaks. For the line break types FIXED and NOCHAR, we also recommend the AskingBox tutorial about rewriting text files with fixed line length.
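To make the idea of converting between line break types concrete, here is a minimal Python sketch. It assumes that a conversion simply replaces every occurrence of the source line break with the target line break, and that FIXED means cutting the text into records of x characters without any separator. The names LINE_BREAKS, convert_line_breaks and split_fixed are hypothetical and only reuse the parameter names from the table; this is not how the TextEncoder is implemented.

```python
# A few of the line break types from the table, keyed by their parameter names.
LINE_BREAKS = {
    "crlf": "\r\n",
    "lf": "\n",
    "cr": "\r",
    "nel": "\u0085",
    "ls": "\u2028",
    "ps": "\u2029",
}


def convert_line_breaks(text: str, read_as: str, save_as: str) -> str:
    """Replace every `read_as` line break in `text` with the `save_as` line break."""
    lines = text.split(LINE_BREAKS[read_as])
    return LINE_BREAKS[save_as].join(lines)


def split_fixed(text: str, length: int) -> list:
    """Cut `text` into lines of `length` characters (assumed meaning of FIXED)."""
    return [text[i:i + length] for i in range(0, len(text), length)]
```

For example, convert_line_breaks("a\r\nb", "crlf", "lf") turns a Windows line break into a Unix one, and split_fixed("abcdefg", 3) yields the lines "abc", "def" and "g".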
Custom Line Breaks
In addition to the preset line break types listed in the table above, any custom characters and strings can also be used as a line break. These characters can be defined as text or as code points in the TextEncoder. To do so in the graphical user interface, select the option "Custom Character" or "Custom Code Point" under "Read as" and "Save as" and enter the desired characters or code points in the input field below. Code points can be specified in three different ways: hexadecimal (for example #0D#0A), decimal (for example 13 10) or in the form U+X (for example U+0D U+0A or U+000D U+000A).
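The three code point notations describe the same character sequence. The following Python sketch shows one way such a specification could be turned into the actual line break string. The function parse_code_points is a hypothetical helper written for this page, and the parsing rules (a leading # or U+ selects hexadecimal, anything else is read as space-separated decimal numbers) are assumptions derived from the examples above, not the TextEncoder's documented parser.

```python
import re


def parse_code_points(spec: str) -> str:
    """Convert a code point specification into the string it describes.

    Assumed notations, taken from the examples in the text:
      "#0D#0A"         hexadecimal, each code point prefixed with #
      "13 10"          decimal, separated by spaces
      "U+0D U+0A"      hexadecimal, each code point prefixed with U+
    """
    spec = spec.strip()
    if spec.startswith("#"):
        tokens = re.findall(r"#([0-9A-Fa-f]+)", spec)
        base = 16
    elif spec.upper().startswith("U+"):
        tokens = re.findall(r"[Uu]\+([0-9A-Fa-f]+)", spec)
        base = 16
    else:
        tokens = spec.split()
        base = 10
    return "".join(chr(int(token, base)) for token in tokens)
```

With this helper, "#0D#0A", "13 10", "U+0D U+0A" and "U+000D U+000A" all produce the two-character string CR LF.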
When controlling the TextEncoder via the command line, you can use the parameters customstr-x and customcp-x for custom line breaks. The x stands for the respective user-defined characters or code points, for example: customstr-a (the line break character is the letter a) or customcp-#0D#0A (line break at the string defined by the code points #0D#0A = Windows line break CR LF).
Line Breaks on Multiple Characters
For line breaks on several different characters, the options "Line break at each of these characters (comma-separated)" and "Line break at each of these code points (comma-separated)" or the parameters customstrs-x and customcps-x can be used.
All characters that should be interpreted as a line break are listed separated by commas. For example, "a,b" breaks the line at each "a" and at each "b". The command line parameters are defined in the same way: for example customstrs-",",";" (line break at each comma and at each semicolon) or customcps-#0A,#0D (line break at each of the code points #0A and #0D = LF or CR).
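Splitting at any one of several characters can be sketched in Python with a regular expression alternation. The function split_on_any is a hypothetical name used only for this illustration, and the sketch assumes that each listed character or string independently ends a line; it does not reproduce the TextEncoder's internal behavior.

```python
import re


def split_on_any(text: str, separators: list) -> list:
    """Split `text` at every occurrence of any string in `separators`."""
    # Escape each separator so characters like "." or "|" are taken literally,
    # and try longer separators first so that "\r\n" wins over "\r" alone.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(sep) for sep in ordered)
    return re.split(pattern, text)
```

Calling split_on_any("a,b;c", [",", ";"]) corresponds to the example customstrs-",",";" and gives the three lines "a", "b" and "c".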
More information about this topic is available in the AskingBox tutorial about text files with mixed line breaks.