Split Text Files into several new Files
If we want to divide the content of a text file into several new files, automating this task can save us a lot of work and, above all, a lot of time. The savings are particularly large if we want to split a very large number of files and the separation always follows the same pattern, because such a task is easy to automate. In this tutorial, we would like to show you an easy way to get a quick result without much effort. We use the program TextConverter for this.
Before we look in detail at the individual criteria for separation and the associated options, we would first like to outline the general procedure for using the TextConverter to split individual files into several new files:
- First add all the files to be separated to the file list in the program. The easiest way to do this is by simply dropping the files from any folder onto the TextConverter.
- Then activate the action "Split Files" on the right side of the main window under "Actions > Files". In the options of this action, you have to activate at least one criterion according to which the files should be split.
- When you have set all options and, if desired, other actions to edit your files, just click on the button "Convert and Save" (SHIFT + CTRL + S) to perform the separation. The settings from the storage options are used as the basis for the file names of the individual parts. Additionally, a consecutive number for each of the parts is appended to the base name.
In this general description of the procedure, we have not yet talked about which criteria we can select for the separation. We would like to go into this in the next section.
Possibilities of Separation
The TextConverter offers you 3 different criteria according to which you can split your files, and these criteria can also be combined. The following sections cover the criteria and the related options:
- Split Files on a Text or a Regular Expression
- Split Files on Line Breaks
- Split Files after Number of Characters
- Combination of multiple Criteria
- General Options for all Separations
- Placeholders for the Numbering of the Parts
- Storage and Configuration of the File Names of the Parts
- Join several Text Files
Split Files on a Text or a Regular Expression
With this option you can divide your original file at a specific text. This means that a new file begins after each occurrence of this search text. Accordingly, if your text occurs twice in the original file, three new files are stored: one with the text that appears in the original file before the first occurrence of the search text, one with the text between the first and the second occurrence of the search text, and a third file with the text that follows the second occurrence of the search text in the original file.
It does not matter whether your search text consists of only one character, several words or even multiple lines. Furthermore, the search text does not have to be a static text: If you activate the option "Interpret as Regular Expression" under the text box, you can also work with regular expressions at this point. A simple example would be the regular expression [0-9] which executes a separation on any digit.
If you would like to keep the search text at which the file was separated in the new files, you can activate one or both of the options "Keep Search Text at the Beginning of each new File" or "Keep Search Text at the End of each new File". If you activate neither of these two options, the search text will not appear in the new files.
Another option makes it possible to separate not directly at the search text but at the next line break. If the option "Split at next Line Break" is activated, related words of a paragraph remain in the same file and are not separated from each other. This allows you to separate, for example, according to sections that contain certain words without tearing the respective sections apart.
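The behavior described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not the TextConverter's actual implementation: the function name split_on_text and its parameters are hypothetical, and the assignment of the kept search text to the preceding part ("at the end") or the following part ("at the beginning") is an assumption derived from the option names.

```python
import re


def split_on_text(content, search, is_regex=False,
                  keep_at_start=False, keep_at_end=False):
    """Split content at every occurrence of search (hypothetical helper)."""
    # A static search text is escaped so that it is matched literally.
    pattern = search if is_regex else re.escape(search)
    parts = []
    current = ""
    pos = 0
    for match in re.finditer(pattern, content):
        current += content[pos:match.start()]
        if keep_at_end:
            # Assumption: the match closes the part that precedes it.
            current += match.group(0)
        parts.append(current)
        # Assumption: the match opens the part that follows it.
        current = match.group(0) if keep_at_start else ""
        pos = match.end()
    parts.append(current + content[pos:])
    return parts


# Two occurrences of the search text produce three parts.
print(split_on_text("aXbXc", "X"))
# The regular expression [0-9] separates at any digit.
print(split_on_text("ab1cd2ef", "[0-9]", is_regex=True))
```

The option "Split at next Line Break" is left out of this sketch to keep it short.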
Split Files on Line Breaks
With this option you can separate the original file on its line breaks. This means that for each line of the original file a new file is created that contains the text of the respective line.
For this option, the settings under "Actions > Files > Line Break Type" apply. By default, that is, if you do not make any changes here, the line break type of the original file is recognized automatically and you get the result you would generally expect. The decisive factor is then the usual line break that you know from an average text editor. However, you can also define other criteria for a line break in the TextConverter. For example, it is possible to define any character, any character string or several different characters as a line break. This gives you further ways to separate your files flexibly. You can find out how this works in the explanations of custom line breaks on one or several characters.
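As a rough illustration, a separation at line breaks could look like the following Python sketch. The function split_on_line_breaks and its line_break parameter are hypothetical names. Treating "\r\n", "\r" and "\n" alike stands in for the automatic recognition described above and is an assumption, not the program's documented logic.

```python
import re


def split_on_line_breaks(content, line_break=None):
    """Return one part per line (hypothetical helper)."""
    if line_break is None:
        # Assumption: accept the common Windows, old Mac and Unix line breaks.
        return re.split(r"\r\n|\r|\n", content)
    # A custom "line break" can be any character or character string.
    return content.split(line_break)


print(split_on_line_breaks("first\r\nsecond\nthird"))
print(split_on_line_breaks("a;b;c", line_break=";"))
```

Note that consecutive line breaks produce empty parts in this sketch, which is where the minimal length option described further below becomes useful.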
Split Files after Number of Characters
With this option you can cut your original file into pieces with a freely selectable length in characters. You can enter any numerical value into the field. For example, if your original file has 2500 characters and you specify a value of 1000 characters, your file is split into 3 parts: The first new file contains the first 1000 characters of the original file, the second new file contains the next 1000 characters of the original file and the third new file contains the remaining 500 characters. If your original file contains fewer characters than the specified value, there is no separation and the original file remains with its content as it is.
You can also use this option to limit the text of all files created to a maximum number of characters, for example by combining this option with the other options.
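The example with 2500 characters can be reproduced with a short Python sketch. The function name split_by_length is hypothetical and only illustrates the principle.

```python
def split_by_length(content, max_chars):
    """Cut content into pieces of at most max_chars characters (hypothetical helper)."""
    if max_chars <= 0:
        raise ValueError("max_chars must be a positive number")
    if len(content) <= max_chars:
        # Not longer than the limit: no separation takes place.
        return [content]
    return [content[i:i + max_chars]
            for i in range(0, len(content), max_chars)]


parts = split_by_length("x" * 2500, 1000)
print([len(part) for part in parts])
```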
Combination of multiple Criteria
At least one of the options introduced above must be activated in order to be able to perform the function. The activation of more than one of these options is also possible. In this case, the file is first separated according to the criterion of the first activated option. Then the resulting parts are separated again according to the criterion of the second activated option and so on.
For example, if you activate both the option for a separation on line breaks and the option for a separation after a certain number of characters, the file is first separated at the line breaks. Then all parts (here the parts are equal to the lines) are processed one by one, and if a line consists of more than the permitted number of characters, it is split again within that line in accordance with the second criterion.
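The successive application of several criteria can be pictured as a chain in which each criterion is applied to the parts produced by the previous one. The following Python sketch uses the hypothetical names split_with_criteria, by_lines and by_length and simplifies line breaks to "\n".

```python
def split_with_criteria(content, criteria):
    """Apply each criterion in turn to the parts created so far (hypothetical helper)."""
    parts = [content]
    for criterion in criteria:
        parts = [piece for part in parts for piece in criterion(part)]
    return parts


def by_lines(text):
    # Simplification: only "\n" is treated as a line break here.
    return text.split("\n")


def by_length(max_chars):
    def splitter(text):
        pieces = [text[i:i + max_chars]
                  for i in range(0, len(text), max_chars)]
        # An empty part stays a single empty part.
        return pieces or [text]
    return splitter


# First split at line breaks, then cut lines longer than 3 characters.
print(split_with_criteria("abcdefg\nhi", [by_lines, by_length(3)]))
```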
General Options for all Separations
Under the 3 options with which you can determine the criteria for the separation of the files, you will find further general options that are always used regardless of the selected criteria:
- Remove whitespace from the beginning or the end of each new file: If this option is activated, spaces, tabs and line breaks will be removed from the beginning or from the end of each new file. This means that if a part starts or ends with, for example, line breaks or some spaces, these are removed before storing so that the new file begins or ends directly with the actual text.
- Perform actions before or after splitting the files: If other actions such as text actions, line actions or CSV actions are activated in addition to the file split, the question arises whether these actions should be applied before or after the separation. This is particularly important for line actions or CSV actions that refer to a certain line or column in the text or in the file, because the split can change both the line numbers and the column numbers. An extreme example is the separation at line breaks: Before the separation, a file could have 100 lines that the actions can address individually via 100 different line numbers. After the separation, however, each file has only one line with the line number 1, so a distinction by line is no longer possible. Conversely, if you separate according to a different criterion and want to number the lines within each new file, this action must take place after the separation so that the line numbering restarts in each new file. Depending on the use case, it can therefore make more sense to execute the actions either before or only after the separation. You can control this with these two options. It is also possible to apply the actions twice, once before and once after the separation.
- Minimal length of a new file (in characters): With this option you can define a minimal length for the resulting parts, that is, for the new files. A new file is only saved if it contains at least as many characters as stated. With this option you can prevent, for example, the storage of empty files. Depending on the separation criteria, empty files without content can arise in different circumstances, for example if you split at line breaks and a file contains several consecutive line breaks or empty lines. If you specify that the new files should have a length of at least one character, such empty parts are ignored after the separation and not saved. Of course, you can also set the number higher and thus control the storage according to other criteria. If you set the number to 0, every resulting part is saved, including empty files.
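The whitespace and minimal length options can be illustrated with the following Python sketch. The function finalize_parts and its parameters are hypothetical. Whether the TextConverter checks the minimal length before or after removing whitespace is not stated here, so checking after trimming is an assumption.

```python
def finalize_parts(parts, trim_start=False, trim_end=False, min_length=0):
    """Trim the parts and drop those that are too short (hypothetical helper)."""
    whitespace = " \t\r\n"  # spaces, tabs and line breaks
    result = []
    for part in parts:
        if trim_start:
            part = part.lstrip(whitespace)
        if trim_end:
            part = part.rstrip(whitespace)
        # Assumption: the length check uses the already trimmed text.
        if len(part) >= min_length:
            result.append(part)
    return result


parts = ["  first\n", "", "\n\n", "second"]
# With a minimal length of 1, the empty parts are not kept.
print(finalize_parts(parts, trim_start=True, trim_end=True, min_length=1))
# With a minimal length of 0, every part is kept, including empty ones.
print(finalize_parts(parts, min_length=0))
```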
Placeholders for the Numbering of the Parts
In addition to the simple placeholders and the placeholders for references, the TextConverter provides two other placeholders that can only be used in connection with splitting files: %part_num% and %part_abs%.
The placeholder %part_num% stands for the number of the part while the placeholder %part_abs% stands for the total number of parts. Both placeholders can be used in the file name (that means in the fields "Folder", "Name" and "File Extension" of the storage options) as well as in the actions and the files themselves.
If a file is split into 5 parts, as an example, the placeholder %part_abs% always stands for "5" while the placeholder %part_num% depends on the respective part. For the first part, %part_num% is "1", for the second part, it is "2", and so on. With this placeholder it is therefore possible, for example, to write the number of each part in the respective partial file, to number the file names of the parts consecutively or to save the individual parts in different folders whose names contain the number of the part.
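Conceptually, the two placeholders behave like a simple text substitution that is carried out separately for each part. The following Python sketch shows this idea with the hypothetical function fill_part_placeholders. It is not the TextConverter's own code.

```python
def fill_part_placeholders(template, part_num, part_abs):
    """Replace the two split placeholders in a text (hypothetical helper)."""
    return (template
            .replace("%part_num%", str(part_num))
            .replace("%part_abs%", str(part_abs)))


total = 5
for number in range(1, total + 1):
    print(fill_part_placeholders("Part %part_num% of %part_abs%", number, total))
```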
Since the current version of the TextConverter does not provide a preview for file separations, the placeholders %part_num% and %part_abs% are not considered in the preview.
Storage and Configuration of the File Names of the Parts
You can define the folder and the name under which the new files should be saved at the bottom right of the main window of the TextConverter. Here you can select an arbitrary folder and determine a base name for all files. With the option "keep", this can also be the folder or the name of the original file.
If you use the default settings, the files containing the individual parts are numbered consecutively by appending a consecutive number to the specified name. For example, the file names of the saved parts could be "file-01.txt", "file-02.txt" to "file-20.txt".
If you want to number the files in a different way, you can use the placeholder %part_num% within the storage options, which stands for the number of the part in question. For example, if you use "%part_num% %name%" as the file name, the partial files from the example would be named "01 file.txt", "02 file.txt" to "20 file.txt". If you use "%name% (%part_num%)", the resulting file names would be "file (01).txt", "file (02).txt" to "file (20).txt".
If the file name contains the placeholder "%part_num%", there is no automatic numbering by appending the number of the part. On the other hand, if the file name does not contain the placeholder "%part_num%", an automatic numbering always occurs, unless the option "Number File Names of Parts only if necessary" is activated and no file with the resulting name already exists.
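The naming rules above can be summarized in a small Python sketch. The function part_file_name is hypothetical. The separator "-" and the two-digit padding are taken from the example file names in this section, while padding to the width of the total number of parts is an assumption. The option "Number File Names of Parts only if necessary" is not modeled.

```python
def part_file_name(name_template, base_name, extension, part_num, part_abs):
    """Build the file name of one part (hypothetical helper)."""
    # Assumption: at least two digits, more if the total count needs them.
    width = max(2, len(str(part_abs)))
    number = str(part_num).zfill(width)
    if "%part_num%" in name_template:
        # Explicit placeholder: no automatic numbering is appended.
        name = name_template.replace("%part_num%", number)
    else:
        # No placeholder: the number is appended automatically.
        name = f"{name_template}-{number}"
    name = name.replace("%name%", base_name)
    return f"{name}.{extension}"


print(part_file_name("%name%", "file", "txt", 1, 20))
print(part_file_name("%part_num% %name%", "file", "txt", 2, 20))
print(part_file_name("%name% (%part_num%)", "file", "txt", 20, 20))
```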
For the file naming of the individual parts, you can also use references. An example would be using the placeholder "%ref:line=1%" which represents the first line of the file. If you use this placeholder as a file name, the first line of each part is used as the file name for this part. If you specify, for example, the placeholder "%ref:word=1%" as the folder, the individual parts will be sorted according to their first word into different folders, each folder having the first word of the respective file as its name. Of course, you can also use any other of the available references or combine the references with other characters or placeholders. If you use references and thus already get a unique file name, you can activate the option "Number File Names of Parts only if necessary" if you do not want any additional automatic numbering of the files.
Even if we sometimes speak of only one file as the original file in this tutorial, the function can of course also be used with multiple files at the same time. This means that if you have more than one file in your file list, each file is separated individually and independently of the other files in the file list.
Join several Text Files
In addition to the possibility of dividing individual files into several new files, the TextConverter also offers the reverse way: You can learn how to put any number of files together in the tutorial about combining several text files.