Batch Text Encoder
With TextEncoder Pro CL, you can change the encoding and the line break type of text files from the command line, which makes it possible to control the TextEncoder from a script. This page introduces the topic and shows examples of how to use the TextEncoder to edit and convert individual files or the contents of entire folders. At the bottom of the page you will find an overview of all available parameters.
- Convert single or multiple Files
- Convert Contents of Folders
- Read Files using a specific Format
- Custom Characters for Line Breaks
- Convert Files with fixed Line Length
- Line break on multiple Characters
- Parameters and Settings Files
- Output of File Information
- Overview of all available Parameters
Convert single or multiple Files
Let's start with a simple example. We would like to change the encoding of a single file to UTF-8. For this we pass the following parameters:
TextEncoder.exe -cl C:\test.txt enc=utf8
The first parameter is -cl, which stands for command line. It controls whether the TextEncoder starts with its graphical user interface or in command line mode without one. We always pass this parameter when we want to control the TextEncoder via the command line; if we omit it, the TextEncoder starts normally with a window. The next parameter is the path to the file we want to edit. Finally, we pass enc=utf8, which indicates that we want to convert to the UTF-8 encoding. A list of all encodings that can be passed here can be found at the bottom of this page. In short, this command converts the file C:\test.txt to UTF-8.
Changing the line break type works in the same way. The following example changes the line break of the file C:\test.txt to CR LF (Windows):
TextEncoder.exe -cl C:\test.txt lb=crlf
Next, we want to explicitly specify that the Byte Order Mark (bom) is written to the file. We do that with the parameter "bom":
TextEncoder.exe -cl C:\test.txt enc=utf8 bom=1
The parameter "bom" can take the values "keep", "0" or "1". With "1" the byte order mark is always written to the file, with "0" never. With "keep", the state of the original file is adopted (if possible): if the original file has a byte order mark, a byte order mark is also written to the converted file; if it has none, none is written.
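To make the effect of this parameter more tangible, the following Python sketch shows what a UTF-8 byte order mark looks like at byte level. It only uses Python's standard library and says nothing about how the TextEncoder writes files internally; the helper name encode_utf8 is made up for this illustration.

```python
import codecs


def encode_utf8(text: str, bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with a byte order mark."""
    data = text.encode("utf-8")
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    return codecs.BOM_UTF8 + data if bom else data


with_bom = encode_utf8("abc", bom=True)      # comparable to bom=1
without_bom = encode_utf8("abc", bom=False)  # comparable to bom=0

print(with_bom.hex(" "))     # ef bb bf 61 62 63
print(without_bom.hex(" "))  # 61 62 63
```

The only difference between the two results is the three leading bytes, which is why a tool that does not expect a byte order mark may show stray characters at the start of such a file.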
So far we have always overwritten the file C:\test.txt. Now we want to save the converted file as a new file. For this we use the parameter "save":
TextEncoder.exe -cl C:\test.txt enc=latin1 save=C:\new.txt
Here we change the encoding of the file C:\test.txt to Latin 1 (ISO 8859-1) and save the file as C:\new.txt. The file C:\test.txt remains unchanged.
When we save the result of the conversion as a new file in this way, both the creation date and the last modification date of the new file are set to the time of the conversion. If we want to use the dates of the original file instead, we can pass "date=keep" as a parameter:
TextEncoder.exe -cl C:\test.txt enc=utf32be save=C:\new.txt date=keep
This parameter also works for overwriting files, but in this case it would only affect the modification date, since the creation date remains unchanged when overwriting in any case.
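For readers who post-process files in their own scripts, the idea behind keeping the modification date can be sketched in Python as follows. This is only an illustration of the concept with a hypothetical helper name; it is not how the TextEncoder implements date=keep, and it does not touch the creation date, which Python's os.utime cannot set.

```python
import os


def copy_modification_time(original: str, converted: str) -> None:
    """Give `converted` the access and modification time of `original`."""
    stat = os.stat(original)
    # os.utime takes (access time, modification time); the ns variant
    # avoids rounding errors from floating point timestamps.
    os.utime(converted, ns=(stat.st_atime_ns, stat.st_mtime_ns))
```

After such a call, a file manager would show the converted file with the same "last modified" time as the original.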
Of course, we can not only edit individual files but also several files at the same time, as the next example shows:
TextEncoder.exe -cl C:\test1.txt C:\test2.txt enc=utf16le
In this example, we convert the files C:\test1.txt and C:\test2.txt to UTF-16 LE. You can specify as many files as you like.
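Since the goal is to control the TextEncoder from a script, here is a minimal sketch of how such calls could be assembled in Python. The helpers build_command and convert are hypothetical names introduced for this illustration, and the sketch assumes that TextEncoder.exe can be found on the PATH; only the parameter syntax itself (-cl, file paths, name=value) is taken from the examples above.

```python
import subprocess


def build_command(paths, exe="TextEncoder.exe", **params):
    """Return the argument list for one command line call."""
    command = [exe, "-cl", *paths]
    for name, value in params.items():
        # Python identifiers cannot contain "-", so save_folder
        # is turned into the parameter name save-folder.
        command.append(f"{name.replace('_', '-')}={value}")
    return command


def convert(paths, **params):
    """Run the call (assumes TextEncoder.exe is on the PATH)."""
    return subprocess.run(build_command(paths, **params))


command = build_command([r"C:\test1.txt", r"C:\test2.txt"], enc="utf16le")
print(command)
```

Passing the arguments as a list rather than as one string leaves the quoting of paths with spaces to the subprocess module.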
Convert Contents of Folders
In addition to specifying individual files, you can also pass the path to a folder to edit its contents. The syntax is the same as editing files:
TextEncoder.exe -cl C:\folder enc=utf8
This command converts all files in the folder C:\folder to UTF-8. If you do not want to overwrite the original files, you can use the parameters save-folder, save-name or save-ext to change the folder, file name or file extension, so that the edited files are saved as new files:
TextEncoder.exe -cl C:\folder enc=utf8 save-folder=C:\newfolder
In this example, we convert all files from the folder C:\folder to the UTF-8 format and save the result in the folder C:\newfolder. The original files remain unchanged.
TextEncoder.exe -cl C:\folder enc=utf8 save-folder=C:\new save-name=%%jjjj%%-%%mm%%-%%tt%%
In this example we want to change not only the folder but also the name of the file. Here we save the files from the folder C:\folder in the format UTF-8 in the new folder C:\new and change the name of the file to the current date. This is what the placeholders used in the parameter "save-name" mean. Since we have not defined a new file extension with the parameter "save-ext", the file extension of the original file is adopted.
In case we do not want to convert all files from the folder, we can pass filters by using parameters. For example, to edit only files of a certain file extension:
TextEncoder.exe -cl C:\folder enc=utf8 filter-ext=txt
Here we pass the parameter "filter-ext" with the value "txt". This means that we only want to edit files with the extension TXT. If there are files with other endings in the same folder, they will be disregarded. Several file extensions can be specified as follows:
TextEncoder.exe -cl C:\folder enc=utf8 filter-ext=txt-htm
This filter ensures that only files with the extensions TXT or HTM are converted from the folder C:\folder. Additional filters and search options can be found at the bottom of this page in the list of all available parameters.
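The hyphen-separated filter syntax can be mimicked in a few lines of Python, for instance to preview which files a filter would select before running the conversion. The function name is hypothetical, and comparing extensions case-insensitively is an assumption of this sketch, not documented behavior of the TextEncoder.

```python
from pathlib import PureWindowsPath


def matches_filter_ext(path: str, filter_ext: str) -> bool:
    """Check a file path against a filter such as "txt" or "txt-htm"."""
    if not filter_ext:
        # An empty filter means that all extensions are accepted.
        return True
    wanted = {ext.lower() for ext in filter_ext.split("-")}
    # Assumption: extensions are compared without regard to case.
    extension = PureWindowsPath(path).suffix.lstrip(".").lower()
    return extension in wanted


for name in (r"C:\folder\a.txt", r"C:\folder\b.htm", r"C:\folder\c.pdf"):
    print(name, matches_filter_ext(name, "txt-htm"))
```

With the filter txt-htm, the first two paths are accepted and the PDF file is skipped.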
TextEncoder.exe -cl C:\folder enc=utf8 search-subdirs=0
In addition, you can use the parameter "search-subdirs" to specify whether only the files directly on the first level of the specified folder should be edited, or whether all of its subfolders should be included as well. In this example, we specify search-subdirs=0, so files in subfolders are not processed. If you omit this parameter or pass search-subdirs=1, the files from all subfolders are processed as well, for example files from a folder such as C:\folder\folder1\folder2.
TextEncoder.exe -cl C:\folder enc=utf8 save-folder=C:\new keep-subdirs=1
When all files from a folder containing subfolders are converted and the results are to be stored in another folder defined by the parameter save-folder, the question arises of how files from subfolders should be treated. This is controlled by the parameter keep-subdirs. With keep-subdirs=1, the folder structure of the original folder is preserved and the converted files are stored in corresponding subfolders within the new folder. With keep-subdirs=0, by contrast, the folder structure is ignored and all files are stored directly in the new folder without subfolders. If we omit the keep-subdirs parameter, the folder structure is ignored (corresponds to keep-subdirs=0).
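The difference between the two values can be expressed as a small path calculation. The following Python sketch reproduces the mapping described above for a file C:\folder\subfolder\file.txt; target_path is a hypothetical helper and not part of the TextEncoder.

```python
from pathlib import PureWindowsPath


def target_path(source_file, source_folder, save_folder, keep_subdirs):
    """Compute where a converted file ends up below save_folder."""
    source = PureWindowsPath(source_file)
    target_root = PureWindowsPath(save_folder)
    if keep_subdirs:
        # Keep the part of the path below the original folder.
        return target_root / source.relative_to(PureWindowsPath(source_folder))
    # Ignore the folder structure and keep only the file name.
    return target_root / source.name


source = r"C:\folder\subfolder\file.txt"
print(target_path(source, r"C:\folder", r"C:\new", keep_subdirs=False))
# C:\new\file.txt
print(target_path(source, r"C:\folder", r"C:\new", keep_subdirs=True))
# C:\new\subfolder\file.txt
```

Note that with keep-subdirs=0, files with the same name from different subfolders would map to the same target path in this sketch.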
Read Files using a specific Format
Normally the TextEncoder guesses the encoding and the line break of the existing files and reads the files on this basis. Of course, you can also enforce a specific encoding or line break type when reading the files. You do this with the parameters enc-read and lb-read.
TextEncoder.exe -cl C:\file.txt enc-read=utf8 lb-read=lf
Here we enforce that the file is read as UTF-8 with the line break LF (Unix, Linux, macOS) and interpreted accordingly. If you omit one or both of these parameters, the value enc-read=auto or lb-read=auto is used for the omitted parameter. This means that the encoding and the line break are detected automatically when the file is read.
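Why the encoding used for reading matters can be shown with a short Python example that is independent of the TextEncoder: the same bytes produce different text depending on the encoding with which they are interpreted.

```python
raw = "Käse".encode("utf-8")  # bytes as they are stored in a UTF-8 file

as_utf8 = raw.decode("utf-8")      # comparable to enc-read=utf8
as_latin1 = raw.decode("latin-1")  # comparable to enc-read=latin1

print(as_utf8)    # Käse
print(as_latin1)  # KÃ¤se
```

If the automatic detection guesses wrong for a file, enforcing the correct encoding for reading avoids exactly this kind of garbled text in the converted file.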
Custom Characters for Line Breaks
In addition to the previously used constants for common line break types such as crlf or lf, you can also use any user-defined characters as a line break. These characters can be defined either as code points or as text.
In the following example, we read a file with the line break LF and want to save it with the line break CR LF. Instead of using the constants lf and crlf, we use the code point notation, that is #0A instead of LF and #0D#0A instead of CRLF. We have to write "customcp-" in front of our code point definitions; this prefix tells the TextEncoder to interpret the following text as code points:
TextEncoder.exe -cl C:\test.txt lb-read=customcp-#0A lb=customcp-#0D#0A
The result of the conversion with this notation is of course identical to the result we get with lb-read=lf lb=crlf. The following call produces the same result again:
TextEncoder.exe -cl C:\test.txt lb-read=customcp-10 lb=customcp-U+000DU+000A
This time we have defined the line break LF in decimal (10) rather than in hexadecimal (#0A); both denote the same code point. The third way to define code points for the TextEncoder is the form U+X, which we have used here for CR LF.
Although we have only written typical line break types as code points in these examples, we can of course also define any unusual line break characters as one or more code points and use them for reading or writing in the TextEncoder.
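To illustrate that the three notations describe the same characters, the following Python sketch translates the values from the examples above into the actual characters. The function parse_customcp and its parsing rules are assumptions made for this illustration; the real parser of the TextEncoder may accept more or fewer forms.

```python
import re

# One token is "#" plus hex digits, "U+" plus hex digits, or a decimal number.
_TOKEN = re.compile(r"#([0-9A-Fa-f]+)|U\+([0-9A-Fa-f]+)|([0-9]+)")


def parse_customcp(value: str) -> str:
    """Translate e.g. "customcp-#0D#0A" into the characters it denotes."""
    prefix = "customcp-"
    if not value.startswith(prefix):
        raise ValueError("expected a value starting with " + prefix)
    spec = value[len(prefix):]
    chars = []
    pos = 0
    while pos < len(spec):
        match = _TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"cannot parse {spec[pos:]!r}")
        hash_hex, u_hex, decimal = match.groups()
        if decimal is not None:
            code_point = int(decimal)
        else:
            code_point = int(hash_hex or u_hex, 16)
        chars.append(chr(code_point))
        pos = match.end()
    return "".join(chars)


print(repr(parse_customcp("customcp-#0A")))           # '\n'
print(repr(parse_customcp("customcp-10")))            # '\n'
print(repr(parse_customcp("customcp-U+000DU+000A")))  # '\r\n'
```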
While we can define code points with the prefix "customcp-", we can define any character or text as a custom line break with the prefix "customstr-" directly. An example of this is the following call:
TextEncoder.exe -cl C:\test.txt lb-read=customstr-a lb=crlf
With this call, we replace any occurrence of the letter "a" in the file test.txt with the Windows line break CR LF. So we interpret the letter "a" as a character for a line break.
Internally, customcp- and customstr- work in the same way; the only difference lies in the notation used to define the characters. However, it is advisable to use "customstr-" only for readable characters such as the "a" from our example, while "invisible" characters are easier to define as code points with "customcp-".
Convert Files with fixed Line Length
All previous examples work with characters such as CR or LF that are interpreted as a line break. The TextEncoder can also process line breaks that are not marked by any character, namely lines defined by a fixed number of characters.
In the following example, we have a file that does not contain any line break character. Instead, we know that each line consists of 15 characters and a new line always starts after 15 characters. We want to convert this file and insert a line break (CR LF) after every 15 characters:
TextEncoder.exe -cl C:\test.txt lb-read=fixedlength-15 lb=crlf
We use the prefix "fixedlength-" for reading (lb-read) and add to this prefix our desired line length (here 15). Since we want to save the file with the line break CR LF, we use "lb=crlf" as a further parameter.
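What this conversion does to the text can be sketched in Python as follows. The function is a hypothetical stand-in that uses a line length of 5 to keep the output short; whether the TextEncoder also writes a line break after the last line is not covered here, and this sketch does not.

```python
def split_fixed_length(text: str, length: int, line_break: str = "\r\n") -> str:
    """Insert line_break after every `length` characters."""
    if length <= 0:
        raise ValueError("length must be positive")
    lines = [text[i:i + length] for i in range(0, len(text), length)]
    return line_break.join(lines)


print(repr(split_fixed_length("AAAAABBBBBCCCCC", 5)))
# 'AAAAA\r\nBBBBB\r\nCCCCC'
```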
Of course, we can also go the other way around:
TextEncoder.exe -cl C:\test.txt lb=nochar
With this call, we remove all line breaks from the file test.txt. This time, we only use the parameter "lb" (for saving) with the value "nochar" standing for no character. The parameter "lb-read" for reading is omitted this time, so the TextEncoder will try to automatically detect the line break type used in the file.
Line break on multiple Characters
Normally, a single line break type is used consistently within a file. This can be, for example, the character LF as usual on macOS, Linux and Unix systems, or CR LF from the Windows world. But how can we deal with files in which different line break types are mixed, for example because several files from different systems were concatenated without first adjusting their line breaks?
One solution is to repair the line break type of the file. The TextEncoder can do this because it is able to break lines at several different characters at the same time when reading. A call can look like this:
TextEncoder.exe -cl C:\test.txt lb-read=customcps-10,13 lb=crlf
We use the prefix "customcps-", after which we can list any number of code points separated by commas. When reading the file, a line break is then recognized at each of the listed code points. In our example, we want to split the file at the two code points 10 and 13. We can then use any other line break type for saving; in our example we take CR LF.
In addition to code points, we can also define our line break characters as text. For this, we use the prefix "customstrs-" instead of "customcps-":
TextEncoder.exe -cl C:\test.txt lb-read=customstrs-a,b,c lb=crlf
In this example, we treat the characters a, b and c as line break characters and then save the file with the new line break type CR LF. In other words, we replace every occurrence of the letters a, b or c with the Windows line break.
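The effect of this call on the text can be reproduced with a short Python sketch. The function name is hypothetical, and the sketch makes no statement about how the TextEncoder treats directly adjacent separators; it simply breaks the text at every listed separator.

```python
import re


def replace_line_breaks(text, separators, new_line_break="\r\n"):
    """Break text at any of the separators and rejoin with new_line_break."""
    if not separators:
        raise ValueError("at least one separator is required")
    # re.escape makes sure separators such as "." are taken literally.
    pattern = "|".join(re.escape(sep) for sep in separators)
    return new_line_break.join(re.split(pattern, text))


print(repr(replace_line_breaks("1a2b3c4", ["a", "b", "c"])))
# '1\r\n2\r\n3\r\n4'
```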
Parameters and Settings Files
In all previous examples, we have passed the settings for the encoding, the line break or the storage to the TextEncoder via individual parameters. However, it is also possible to use one or more settings files and pass only the path to these files instead of working with parameters such as "lb", "enc" or "save".
Settings files can be created directly in the TextEncoder. To do this, first, just adjust all settings via the user interface. You can then save these settings to a settings file via the menu "Settings > Save or Load Settings > Save as File". The result is a file with the file extension TES, which contains all your current settings for line breaks, encoding and storage options. You can find out more about settings files in the article "Save and Load Settings".
After you have created a settings file, you can simply pass it as a parameter:
TextEncoder.exe -cl C:\test.txt C:\settings.tes
With this call you convert the file "test.txt" in accordance with the settings that are stored in the file "settings.tes".
Of course, a combination of settings files and parameters is also possible. Here is an example:
TextEncoder.exe -cl C:\test.txt C:\settings.tes enc=utf8 save-ext=txt
With this call, all settings from the file "settings.tes" are loaded first. The encoding used for saving is then overridden by the "enc" parameter, and the file extension of the stored file by the "save-ext" parameter. The remaining settings, such as the name or folder of the new file or the line break type, come from the file "settings.tes".
If more than one settings file is passed, the order is crucial: the settings files are loaded in the order in which they appear on the command line, first the first one, then the second one and so on. Once all settings files have been loaded, regardless of their position among the parameters, the additional parameters are evaluated and can override the settings loaded from the settings files. You can also edit settings files manually with any text editor, for example to remove certain settings and keep only the ones you need. The TextEncoder uses the default values for all settings missing from the settings files.
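The order of precedence described here (default values, then settings files in the order given, then individual parameters) can be modeled with plain dictionaries. The following Python sketch only illustrates this order; the dictionaries are stand-ins for the contents of TES files, whose actual format is not shown on this page.

```python
def effective_settings(defaults, settings_files, parameters):
    """Merge settings: defaults < settings files in order < parameters."""
    result = dict(defaults)
    for settings in settings_files:
        # A later settings file overrides values from earlier ones.
        result.update(settings)
    # Individual parameters are evaluated last and win over all files.
    result.update(parameters)
    return result


defaults = {"enc": "keep", "lb": "keep", "bom": "keep"}
first_file = {"enc": "latin1", "lb": "crlf"}  # stand-in for a first TES file
second_file = {"lb": "lf"}                    # stand-in for a second TES file
parameters = {"enc": "utf8"}                  # passed as enc=utf8

print(effective_settings(defaults, [first_file, second_file], parameters))
# {'enc': 'utf8', 'lb': 'lf', 'bom': 'keep'}
```

Here the encoding comes from the parameter, the line break from the second settings file, and the byte order mark setting falls back to its default value.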
Output of File Information
In addition to changing the encoding or the line break type of files, the TextEncoder can also be used to output information about text files in the command line or within a script. This includes both general file information such as the size of files as well as properties of text files such as their encoding, their line break type or the number of characters, lines or words.
Further information on this topic as well as an explanation of the possibilities, examples and an overview of all parameters you can use can be found in the tutorial about the output of general file information and textfile-specific information via the command line or script.
Overview of all available Parameters
The following table lists all available parameters that you can use in the TextEncoder. Some of the parameters we have already presented in the examples on this page.
|Parameter||Values||Default||Description|
|[Files]||any file path(s)||-||Path to the file to be converted. You can specify multiple files in sequence to convert multiple files at the same time.|
|[Folders]||any folder path(s)||-||Path to a folder whose contents should be converted. To avoid converting all files from the folder, you can use the parameters search-subdirs, filter-ext, filter-name, filter-name-matchcase, filter-name-regex, filter-hiddenfiles and filter-onlytextfiles to narrow your search. Multiple folders can be specified consecutively to simultaneously convert the contents of multiple folders.|
|[TES-Files]||settings files with the file extension *.tes||-||Path to a settings file with the file extension TES that can contain settings for the conversion and the storage location. Settings files can be combined with all other parameters, which override the settings from the files. Several settings files can be specified in a row, which are then loaded consecutively. More about the use and creation of settings files can be found in the section "Parameters and Settings Files".|
|lb||keep, system, crlf, lf, cr, nl, ff, nel, ls, ps, vt, tab, nochar, customstr-x, customcp-x||keep||Line break type for the converted file. The value "keep" keeps the line break type of the original file, otherwise the specified type is used. The value "system" corresponds to the standard line break type of the operating system on which the TextEncoder is currently running, for example crlf on Windows. The constant nochar stands for no character. You can use it to remove all line break characters from a file. With customstr- and customcp-, custom characters can be defined as line break characters via text or code points (the x stands for the user-defined text or code point). An explanation and examples can be found in the section "Custom Characters for Line Breaks". An overview of the different line break types can be found here.|
|lb-read||auto, system, crlf, lf, cr, nl, ff, nel, ls, ps, vt, tab, fixedlength-x, customstr-x, customstrs-x, customcp-x, customcps-x||auto||Line break with which the file is read. If this parameter is not specified or "auto" is specified, an attempt is made to automatically detect the line break. The value "system" corresponds to the standard line break type of the operating system on which the TextEncoder is currently running, for example crlf on Windows. With fixedlength-, a text with fixed line length can be read (for example fixedlength-70 for 70 characters per line). An explanation and examples can be found in the section "Convert Files with fixed Line Length". With customstr- and customcp-, custom characters for a line break can be defined via text or code points for reading (the x stands for the user-defined text or code point). An explanation and examples can be found in the section "Custom Characters for Line Breaks". If you have files with mixed line break types, you can use customstrs- and customcps-. This allows you to define multiple comma-separated custom line breaks via text or code points, for example customcps-13,10. An explanation and examples can be found in the section "Line break on multiple Characters".|
|enc||keep, ascii, latin1, latin2, win-ansi, win-1250, win-1251, win-1252, win-1253, utf7, utf8, utf16le, utf16be, utf32le or utf32be||keep||Encoding for the converted file. With "keep" the encoding of the original file is used, otherwise the specified encoding. The encoding "win-ansi" depends on the localization of your Windows version. The Windows code page that matches your language version of Windows will be used. An overview of all available encodings can be found here.|
|enc-read||auto, ascii, latin1, latin2, win-ansi, win-1250, win-1251, win-1252, win-1253, utf7, utf8, utf16le, utf16be, utf32le or utf32be||auto||Encoding with which the file is read. If this parameter is not specified, an attempt is made to automatically detect the encoding. The encoding "win-ansi" depends on the localization of your Windows version. The Windows code page that matches your language version of Windows will be used.|
|bom||0, 1 or keep||keep||Should a byte order mark be written into the file? 0 for never, 1 for always, "keep" for as in the original file.|
|date||auto or keep||auto||With date=keep, the dates of the original file are used for the changed or newly stored file. If you omit this parameter or use date=auto, the modification date of the file is the time of processing, and the creation date remains the time at which the file was first created.|
|save||Path of any file||-||With this parameter you can save the converted file explicitly under a freely selectable file name. For example, you can specify save=C:\Folder\File.txt to save the converted file as C:\Folder\File.txt. If you only want to save the converted file in a different folder while keeping the file name and the file extension, please use the save-folder parameter and omit the save parameter. In addition, the parameters save-name for the name and save-ext for the file extension can be used in the same way and combined with each other. If you do not specify any of the save, save-folder, save-name, or save-ext parameters, the original file will be overwritten.|
|save-folder||keep or any text||keep||Folder in which the converted file is saved. If you do not specify this parameter or if you call this parameter with the value "keep", the file is saved in the folder in which the original file is located. This parameter can be combined with the parameters save-name and save-ext. Each of these parameters is optional, allowing you to independently define the folder, name and extension. If you want to save the converted file under an explicit file name with path, name and extension, please use the parameter save. If you do not specify any of the save, save-folder, save-name, or save-ext parameters, the original file will be overwritten. With the parameter keep-subdirs you can control how to deal with subfolders within the folder.|
|save-name||keep or any text||keep||Name without folder and file extension, with which the converted file is saved. If you do not specify this parameter or if you use this parameter with the value "keep", the file is given the same name as the original file. This parameter can be combined with the parameters save-folder and save-ext. Each of these parameters is optional, allowing you to independently define the folder, name and extension. If you want to save the converted file under an explicit file name with path, name and extension, please use the parameter save. If you do not specify any of the save, save-folder, save-name, or save-ext parameters, the original file will be overwritten.|
|save-ext||keep or any text||keep||File extension with which the converted file is saved. If you do not specify this parameter or if you use this parameter with the value "keep", the converted file receives the file extension that also had the original file. This parameter can be combined with the parameters save-folder and save-name. Each of these parameters is optional, allowing you to independently define the folder, name and extension. If you want to save the converted file under an explicit file name with path, name and extension, please use the parameter save. If you do not specify any of the save, save-folder, save-name, or save-ext parameters, the original file will be overwritten.|
|search-subdirs||0 or 1||1||Should the subfolders be searched when editing a folder? 0 for no, 1 for yes. A value of 0 converts only the files that are in the first level of the folder.|
|keep-subdirs||0 or 1||0||This parameter controls whether the folder structure of an original folder should be maintained in a new folder or not. Example of usage: a folder is passed whose contents should be converted. Within this folder there are one or more subfolders with an arbitrary number of levels. In addition, the files should be stored in a new or different folder (defined via the parameter save-folder). With keep-subdirs=1, the files are stored in the new folder in the same folder structure in which they were stored within the original folder. With keep-subdirs=0, the folder structure of the original folder is ignored and all files are stored directly in the folder defined for storage.|
Example: The file C:\folder\subfolder\file.txt is converted.
Case A: With "TextEncoder.exe -cl C:\folder enc=utf8 save-folder=C:\new keep-subdirs=0" the new file is stored under "C:\new\file.txt".
Case B: With "TextEncoder.exe -cl C:\folder enc=utf8 save-folder=C:\new keep-subdirs=1" the new file is stored under "C:\new\subfolder\file.txt".
|filter-ext||any text||-||If you only want to edit files with a specific extension, you can enter this extension(s) here. For example, filter-ext=txt to edit only files with the extension TXT. Multiple endings can be separated with a hyphen. For example, filter-ext=php-htm-html to edit only files with the extensions PHP, HTM or HTML. If you omit this parameter or leave it empty, files with all endings will be considered.|
|filter-name||any text||-||If you only want to edit files with a specific name, you can enter a name here. All files whose names contain the characters specified with "filter-name" are processed. With filter-name=ab, for example, files such as abc.txt or xab.txt are processed. If you omit this parameter or leave it empty, files with all names are taken into account.|
|filter-name-matchcase||0 or 1||0||Should the text specified with the parameter "filter-name" be matched case-sensitively? 0 for no, 1 for yes. With 1, the file name must contain the text in exactly the same case. With 0, the search is case-insensitive.|
|filter-name-regex||0 or 1||0||If the search filter specified with "filter-name" is to be interpreted as a regular expression, use 1. If you just want to search for the specified text, 0.|
|filter-hiddenfiles||0 or 1||0||Do you want to edit hidden files when converting a folder? 0 for no, 1 for yes. A value of 0 leaves all hidden files untouched, a value of 1 also handles the hidden files.|
|filter-onlytextfiles||0 or 1||1||Do you want to edit only text files when editing a folder? 0 for no, 1 for yes. If yes, each file is checked before being converted to see if it is a binary file and the processing is not carried out if so.|
|openfile||0 or 1||0||Should the converted file be opened after editing? openfile=1 will open the newly created file.|
|delfile||0 or 1||0||Should the original file be deleted after the conversion? delfile=1 deletes the original file. This option is only useful if the converted files are to be saved under a different name or in a different location than the original file and the original files should not be kept.|
|info||%enc%, %bom%, %encbom%, %lb%, %lines%, %chars%, %words%, placeholders for the file name and folder, placeholders for the file size, placeholders for the date and arbitrary text freely combinable||-||With the parameter "info", you can output information about the passed files and the files from passed folders in the console. As a value, you can use the placeholders %enc% (encoding of the file), %bom% (is there a byte order mark?), %encbom% (encoding and byte order mark), %lb% (line break type), %lines% (number of lines), %chars% (number of characters) and %words% (number of words) and any arbitrary text. In addition, the placeholders of the TextConverter for the file name, the folder, the file size and the date can be used. So, for example "info=%enc%" to display only the encoding, "info=%encbom% %lb%" to display encoding, byte order mark and line break type or "info=The file %filename% has %lines% lines, %chars% characters and %words% words." to output this sentence for a file. A detailed explanation as well as further examples can be found in the tutorial about the output of file information via script. Even if the examples in this tutorial refer to the TextConverter, all examples can also be used with the TextEncoder in the same way.|
In principle, all these parameters can be combined with each other and used together. If you do not define a parameter, the specified default value is used for this parameter (the - indicates that this parameter is empty by default). In the column "Values" you can see all the values that this parameter can have.
Old Program Version
If you are still using the old version of the TextEncoder: an overview of the old TextEncoder batch version can be found here.