
Line Breaks

There are several different ways in which line breaks can be implemented within plain text files. In this article we would like to first look at these different kinds of line break types and then address the problems that can arise due to this variability as well as provide solutions and some application examples for these problems.

Basically, there are three different categories into which we can classify the different types of line breaks: character-based line breaks, line breaks by defining a fixed line length, as well as line breaks implemented by a markup language. In the following sections, we would first like to compare these three categories and their most prominent representatives as an introduction to the topic.

Character-Based Line Break Types

Most plain text files use certain predefined characters (or bytes) to mark their line breaks. If the program that is supposed to read, process or display such a text file knows these characters, it knows that they should not be displayed as letters but interpreted as (invisible) line breaks.

This approach would be easy to implement if a single specific character for a line break had been agreed upon over time. However, because the various systems have grown historically, this is still not the case today. Depending on the operating system, different characters or bytes are used for a line break.

Characters and Codepoints for Line Breaks and their Use

The following table provides an overview of the different characters and character combinations for line breaks and the most common systems that use each type of line break:

Abbreviation | Code (Hex/Dec) | Character Set | System/Usage
CR LF | 0D 0A / 13 10 | ASCII | Windows, MS-DOS, OS/2, Symbian OS, Palm OS, Atari TOS, CP/M, MP/M, RT-11, Amstrad CPC, DEC TOPS-10 as well as most other early non-Unix and non-IBM operating systems
LF | 0A / 10 | ASCII | Line Feed - Unix and Unix-like systems (Linux, macOS, Mac OS X, Android, BSD, AIX, Xenix and so on), Amiga, AmigaOS, QNX, Multics, BeOS, RISC OS and other POSIX standard oriented systems
CR | 0D / 13 | ASCII | Carriage Return - Mac OS (Classic) up to version 9, Apple II, Lisa OS, Commodore 64 (C64), Commodore 128 (C128), Acorn BBC, ZX Spectrum, TRS-80, Oberon, HP Series 80, MIT Lisp Machine, OS-9
RS | 1E / 30 | ASCII | Record Separator - QNX (before the POSIX implementation with version 4)
EOL | 9B / 155 | ATASCII | End Of Line - Atari 8-Bit Computer
NL | 15 / 21 | EBCDIC | New Line - IBM Mainframe Systems such as z/OS (OS/390) or IBM i (i5/OS, OS/400)
LF | 25 / 37 | EBCDIC | Line Feed - EBCDIC character for ASCII's 0A
RNL | 06 / 06 | EBCDIC | Required New Line (since 2007)
- | 76 / 118 | ZX80/ZX81 | Sinclair Research Home Computers Linebreak
VT | U+000B | Unicode | Vertical Tab
FF | U+000C | Unicode | Form Feed
NEL | U+0085 | Unicode | Next Line
LS | U+2028 | Unicode | Line Separator
PS | U+2029 | Unicode | Paragraph Separator

The most widespread and frequently used character set worldwide is ASCII (American Standard Code for Information Interchange), together with the Unicode standard that is based on it. The two most common line break types also come from this character set: the Unix line break LF as well as the Windows line break CR LF.

Unix and the current macOS from Apple use the Unicode code point U+000A (LF) as a line break, while older Apple systems use U+000D (CR). Windows and MS-DOS use both of these characters one after the other in the order 0D 0A. In addition to these three characters and character sequences, the Unicode standard also requires the code points U+000B (Vertical Tab VT), U+000C (Form Feed FF = new page), U+0085 (Next Line NEL), U+2028 (Line Separator LS) as well as U+2029 (Paragraph Separator PS) to be interpreted as a line break. However, to date, only a few programs do this.
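To get a feeling for which of these code points a given tool actually treats as a line break, a short experiment helps. The following minimal sketch uses Python's built-in str.splitlines(), which is one of the functions that splits on all of the code points listed here (and on a few further separators); the sample string is made up for illustration.

```python
# Letters separated by CR, LF, CR LF, VT, FF, NEL, LS and PS.
sample = "a\rb\nc\r\nd\x0be\x0cf\x85g\u2028h\u2029i"

# str.splitlines() treats each of these as a line break;
# the pair CR LF counts as a single break.
lines = sample.splitlines()
print(lines)

# Splitting on LF alone only finds the bare LF and the LF of CR LF.
lf_only = sample.split("\n")
print(len(lines), len(lf_only))
```

Comparing both results shows how much the number of "lines" in the same text depends on which characters a program is willing to recognize.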

One of the best-known character sets outside of the ASCII world is the 8-bit character set EBCDIC (Extended Binary Coded Decimal Interchange Code) developed by IBM for its mainframe computers. This character set uses the hexadecimal character 15 (decimal 21) for a line break, which combines the functions of CR and LF. In addition, EBCDIC also contains the ASCII-typical characters CR and LF (albeit the latter under a different character code) and, from 2007, the additional character RNL (Required New Line), which can be used to encode a conditional automatic line break.

Less common or only of historical relevance are the following line breaks: EOL (End Of Line) from the 8-bit character set ATASCII (ATARI Standard Code for Information Interchange), used on Atari 8-bit computers mainly in the 1980s; the line breaks from the ZX80 and ZX81 character sets used by Sinclair Research Ltd for its computers, also in the 1980s; and RS (Record Separator), which was used by the QNX operating system until the release of version 4.0 in 1990. Some historical operating systems even defined newlines at the bit level: for example, the CDC 6000 series operating systems from the 1960s, at a time when memory was expensive, defined their line breaks as two or more zero-filled 6-bit characters at the end of a 60-bit word.

Why does the Windows Line Break consist of two Characters?

The fact that Windows, MS-DOS and most other early non-Unix and non-IBM operating systems, in contrast to the other operating systems mentioned, define their line breaks with two characters has historical reasons and can be traced back to the way typewriters and old printing devices work:

On a typewriter, a line break is carried out by two actions that can be distinguished from each other: on the one hand, the writing position moves back to the beginning of the line (carriage return) and, on the other hand, the writing position moves down one line, for example, by turning the roller and thus pushing the paper further (line feed). According to this logic, a complete "line break" is made up of a combination of these two actions. When character sets for computers were developed in the 1960s, separate control characters for the carriage return and for the line feed were defined in them so that the printers of that time could be controlled in the same way. This history is still reflected in today's most recent Windows versions.

The carriage return was given the decimal code 13 (hexadecimal 0D) in the ASCII character set at that time and is abbreviated as "CR", the line feed was given the decimal code 10 (hexadecimal 0A) and is abbreviated as "LF". Both of these characters can still be found in the current Unicode standard under the same numerical code points today.

Some systems also used the distinction between CR and LF for various text effects. If only CR without LF was used for the printer control, a carriage return without a line feed could be achieved. In this way, the writing position could reach the beginning of an already printed line and thus overprint the existing text with other characters. For example, this way text could be underlined, crossed out or written in bold. Diacritical characters outside of the character set actually used were also made possible in this way by overprinting or combining different characters. Similarly, the control character RI (Reverse Line Feed) defined with the code point U+008D in the Unicode standard can be used.

Unicode, ASCII, EBCDIC, HTML Entities and Escape Sequences

As we saw in the last section, in addition to many similarities, there are also certain differences between the individual character sets. For this reason, we would like to compare the relevant characters again in the next table:

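Character | Unicode Code Point | ASCII (Hex / Dec) | EBCDIC (Hex / Dec) | HTML Entity (Hex / Dec) | Escape Sequence
CR | U+000D | 0D / 13 | 0D / 13 | &#x0D; / &#13; | \r
LF | U+000A | 0A / 10 | 25 / 37 | &#x0A; / &#10; | \n
CR LF | - | 0D 0A / 13 10 | 0D 25 / 13 37 | - | \r\n
NEL/NL | U+0085 | - | 15 / 21 | &#x85; / &#133; | \u0085
VT | U+000B | 0B / 11 | 0B / 11 | &#x0B; / &#11; | \v
FF | U+000C | 0C / 12 | 0C / 12 | &#x0C; / &#12; | \f
LS | U+2028 | - | - | &#x2028; / &#8232; | \u2028
PS | U+2029 | - | - | &#x2029; / &#8233; | \u2029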

Since the Unicode standard has completely adopted all characters from the ASCII character set with identical code points as its "Basic Latin" block for compatibility reasons, all characters for line breaks from the ASCII character set such as the line feed LF, the carriage return CR, the vertical tab VT as well as the form feed FF are defined both in the ASCII character set and as Unicode code points with the same number.

In addition, the Unicode standard defines the code points U+0085, U+2028 and U+2029 as additional line breaks that are not part of the ASCII character set. To be distinguished from these real line breaks are the Unicode code points U+2424 (Symbol for Newline), U+23CE (Return Symbol), U+240D (Symbol for Carriage Return) as well as U+240A (Symbol for Line Feed). Although those characters do not generate a line break themselves, they can be used to create glyphs that are visible to the user in order to visualize the otherwise invisible line break characters.
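How these symbol characters can be used to make line breaks visible can be sketched in a few lines of Python. The function name show_line_breaks is made up for this illustration; the sketch only handles CR and LF and maps them to U+240D and U+240A.

```python
# Map the invisible control characters to the visible symbol code points
# mentioned above: U+240D (Symbol for Carriage Return) and
# U+240A (Symbol for Line Feed).
VISIBLE = {0x0D: "\u240d", 0x0A: "\u240a"}

def show_line_breaks(text: str) -> str:
    """Replace CR and LF with their visible symbol glyphs."""
    return text.translate(VISIBLE)

print(show_line_breaks("one\r\ntwo\nthree\r"))
```

Because the real control characters are replaced, the result is meant for display or debugging only and no longer contains any actual line break.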

The EBCDIC character set, which is mainly used on IBM mainframe systems, also has many parallels to ASCII. Although the standard EBCDIC line break is the character NL (hexadecimal code 15 / decimal code 21), which itself has no ASCII equivalent, EBCDIC also defines code points for the characters CR, LF, VT and FF. Of these four characters, only LF is defined under a code point different from ASCII in EBCDIC (25/37 instead of 0A/10).

The Unicode character equivalent to the EBCDIC NL is NEL (Next Line) with the Unicode code point U+0085. This character has been defined in the Unicode standard in addition to CR and LF to enable bidirectional conversion from and to all other encodings. If we only had the characters CR and LF available in the Unicode standard, this would not be possible: if we wanted to convert an EBCDIC text to Unicode and back again, we could first convert all NL line breaks to either LF or CR LF. When converting back, however, we would be faced with ambiguity: since EBCDIC makes a distinction between CR, LF and NL, it would no longer be clear whether our LF and CR characters were already LF and CR before (and should therefore be kept) or whether they were originally an NL (which would have to be converted back). So, only because the three different characters CR, LF and NEL are also available to us in the Unicode standard, a transformation without loss of information is possible.
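This round trip can be tried out with a small Python sketch. It assumes Python's built-in "cp037" codec as one example of an EBCDIC code page; other EBCDIC code pages and other converters may map NL and LF differently, so the result should not be generalized.

```python
# "A" NL "B" LF "C" CR as EBCDIC bytes (NL = 15, LF = 25, CR = 0D).
ebcdic = bytes([0xC1, 0x15, 0xC2, 0x25, 0xC3, 0x0D])

# Decode to Unicode and encode back again.
text = ebcdic.decode("cp037")
round_trip = text.encode("cp037")

# Show which Unicode code point each EBCDIC byte was mapped to.
print([f"U+{ord(c):04X}" for c in text])
print(round_trip == ebcdic)
```

Since NL, LF and CR each arrive at their own Unicode code point, the bytes after the round trip are identical to the input.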

Furthermore, the last two columns of the table show the HTML entities as well as the escape sequences of the individual characters. The HTML entities can be used to insert the respective characters into HTML source text. The table shows the HTML entities in both hexadecimal and decimal notation. These two variants lead to the same result and can therefore be used interchangeably. For the LF character, the named HTML entity &NewLine; can also be used. Similarly, the escape sequences from the last column are placeholders for the characters mentioned. They can be used, for example, in regular expressions or in some programming languages to insert the corresponding line break characters. More about this in the section on line breaks in the source code of programming languages.

ASCII-based 8 bit encodings such as the Windows code pages or the Latin character sets are not listed in the table, as these character sets have also adopted all ASCII characters and therefore correspond to the "ASCII" column of the table.

Just for the sake of completeness, it should also be mentioned that in addition to the common Unicode encodings, which are compatible with the ASCII character set, there is also an encoding form called UTF-EBCDIC, which is instead designed to be compatible with EBCDIC-based systems.

Byte Representations in different Encodings

Depending on the encoding used, the mentioned Unicode code points result in different bytes within a stored file. The following table provides an overview of the byte sequences of the various line break types in the encodings ASCII, UTF-7, UTF-8, UTF-16 Little Endian and Big Endian as well as UTF-32 Little Endian and Big Endian:

Character | Unicode Code Point | ASCII | UTF-7 | UTF-8 | UTF-16 LE | UTF-16 BE | UTF-32 LE | UTF-32 BE
CR | U+000D | 0D | 0D | 0D | 0D 00 | 00 0D | 0D 00 00 00 | 00 00 00 0D
LF | U+000A | 0A | 0A | 0A | 0A 00 | 00 0A | 0A 00 00 00 | 00 00 00 0A
CR LF | - | 0D 0A | 0D 0A | 0D 0A | 0D 00 0A 00 | 00 0D 00 0A | 0D 00 00 00 0A 00 00 00 | 00 00 00 0D 00 00 00 0A
NEL/NL | U+0085 | - | 2B 41 49 55 | C2 85 | 85 00 | 00 85 | 85 00 00 00 | 00 00 00 85
VT | U+000B | 0B | 2B 41 41 73 | 0B | 0B 00 | 00 0B | 0B 00 00 00 | 00 00 00 0B
FF | U+000C | 0C | 2B 41 41 77 | 0C | 0C 00 | 00 0C | 0C 00 00 00 | 00 00 00 0C
LS | U+2028 | - | 2B 49 43 67 | E2 80 A8 | 28 20 | 20 28 | 28 20 00 00 | 00 00 20 28
PS | U+2029 | - | 2B 49 43 6B | E2 80 A9 | 29 20 | 20 29 | 29 20 00 00 | 00 00 20 29

Typical 8-bit encodings that are based on ASCII, such as the Windows code pages or the Latin character sets, are not listed separately in the table. These encodings use the same bytes as ASCII, which can be found in the ASCII column. Many other ANSI code pages and character sets also follow this convention.

The byte representations listed in this table are, among other things, important for the detection of the line break type of files, which we will address in the section on recognizing the line break type of a file.
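The byte sequences for the UTF-8, UTF-16 and UTF-32 columns can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name byte_sequences is made up for this example, UTF-7 is left out, and the space-separated output of bytes.hex() requires Python 3.8 or newer.

```python
# Encodings without a byte order mark, matching the table columns.
ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]

def byte_sequences(chars: str) -> dict:
    """Return the hex byte sequence of `chars` for each encoding."""
    return {enc: chars.encode(enc).hex(" ").upper() for enc in ENCODINGS}

for name, chars in [("CR LF", "\r\n"), ("NEL", "\x85"), ("LS", "\u2028")]:
    print(name, byte_sequences(chars))
```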

Line Break Characters as Line Separators or Line Terminators

Characters for line breaks can be interpreted in two different ways, both of which have their proponents and applications: a line break character can be considered either as a separator between two lines or as a marker for the end of a line.

To demonstrate this difference, let's look at the following example, where "N" represents the line break character:

abcNdefN

The content of such a file could be interpreted in two different ways: if "N" is regarded as a separator, the file consists of three lines ("abc", "def" and an empty third line); if "N" is regarded as a terminator, the file consists of only two lines ("abc" and "def").

There are programs that regard newline characters as separators and other programs that interpret newline characters as terminators. The problems that result from this are obvious: programs that consider the line break character as a separator may read one (empty) line too many; programs that consider the line break character as a line-end marker may have problems reading the last line of a file if it is not terminated by a line break.
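Both interpretations can be found side by side in Python's string methods, which makes the difference easy to demonstrate. In this small sketch, the "N" from the example is written as LF:

```python
content = "abc\ndef\n"

# split() treats the line break as a separator between lines.
as_separator = content.split("\n")

# splitlines() treats the line break as the end of a line.
as_terminator = content.splitlines()

print(as_separator)   # ['abc', 'def', '']
print(as_terminator)  # ['abc', 'def']
```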

Input of Line Break Characters

The system line break is usually easiest to enter using the Enter key. An exception occurs if the input is made within an editor that understands other line break types and in which either a file with a non-system line break type is currently being worked on or the settings of this editor (or another program) are set to a corresponding other line break type.

Entering the other line break types is a little more difficult: some systems and text editors allow the keyboard shortcut CTRL + J to enter the LF character. Other common key combinations are CTRL + M for CR as well as CTRL + K for VT (this is also the reason why sometimes ^M is displayed for CR). If we interpret CR and LF as a carriage return and a line feed, we can reproduce these two movements using the Home key (labeled Pos1 on German keyboards) and the Arrow-Down key.

Within HTML source code, line break characters can additionally be inserted via their HTML entities, which are listed in the table that can be found in the section "Unicode, ASCII, EBCDIC, HTML Entities and Escape Sequences". Furthermore, we can enter the characters using the keyboard shortcut ALT + Codepoint of the character using the Num keypad of the keyboard and in some contexts, such as in regular expressions or in many programming languages, we can also use the escape sequences of the characters, which are also listed in the table mentioned for each of the characters. More on the latter in the section on line breaks in the source code of programming languages.

Line Breaks by Defining a Fixed Line Length

In contrast to the line break types based on specific character definitions that were introduced in the last section, text files with a fixed line length do not require the definition of one or more characters for a line break. Instead, each line of such a file is based on a line length that can initially be freely selected, but is kept constant within the whole file. In the file itself, all lines are then simply written one after the other and, if necessary, brought to the required length using a suitable filler (padding) character.

The content of such a file (here, for example, with a fixed line length of four characters) can then look like this:

ABCDABC ABCD

A program that knows the line length used for the file and can display it, can then interpret this content as follows:

ABCD
ABC
ABCD

Since the second line only contains three characters, we used a space as a filler character here. If we hadn't done that, the "A" from the third line would have moved to the end of the second line.
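Reading and writing such a file can be sketched in Python as follows. The function names split_fixed and join_fixed are made up for this illustration, and the sketch assumes that one character corresponds to one unit of line length.

```python
def split_fixed(content: str, width: int) -> list:
    """Cut the content into lines of exactly `width` characters."""
    return [content[i:i + width] for i in range(0, len(content), width)]

def join_fixed(lines, width: int, pad: str = " ") -> str:
    """Pad every line to `width` characters and write them back to back."""
    for line in lines:
        if len(line) > width:
            raise ValueError(f"line longer than {width} characters: {line!r}")
    return "".join(line.ljust(width, pad) for line in lines)

print(split_fixed("ABCDABC ABCD", 4))
print(join_fixed(["ABCD", "ABC", "ABCD"], 4))
```

Note that the padding character becomes part of the line when reading; whether trailing fillers are stripped again is a decision the reading program has to make.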

Distribution and Areas of Application

Files with a fixed line length are significantly less common than files that implement their line breaks with a defined break character. The main reason against using a fixed line length is the lack of flexibility. After all, very few texts have the same number of characters in each line.

Nonetheless, there are some useful applications for such files, for example in the case of CSV data or other data sets whose values in each line are all of the same length. In such files, additional characters for line breaks would not add any further information, so they can be omitted (especially in applications or environments where memory needs to be saved).

Fixed Line Length as System Line Break

The fixed line length was only used as a system line break on some of the first mainframe computers. At that time, fixed line lengths of 72 or 80 characters were common on such systems. This number was modeled on the punch cards used previously, which also typically included 80 columns per card, of which the columns 73 to 80 were often used for sequence numbers. Some of these systems encoded lines longer than 80 characters by placing a continuation character such as # as the first character of the next line to be linked.

Record-based file systems, such as those used by the operating systems OpenVMS, RSX-11 or various newer mainframe computers, also do not require a line break character. Such systems store text files as one record per line. Each of these records contains a length field at the beginning of the line in which the length of the respective line is stored individually. This means that no additional line delimiter in the form of a control character is necessary, since the reading program already knows from this information after how many characters the line ends, that is, how many characters have to be read to read a line. Even if storage in this way does not require a line break character, the record management systems used are usually able to pass the requested lines to a requesting program with a line separator character if necessary.
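The principle of a length field in front of each line can be illustrated with a small Python sketch. The record layout used here (a 2-byte big-endian length followed by the UTF-8 bytes of the line) is purely an assumption for this example and is not the actual on-disk format of OpenVMS, RSX-11 or any mainframe system.

```python
import struct

def write_records(lines) -> bytes:
    """Store each line as a 2-byte length field followed by its bytes."""
    out = bytearray()
    for line in lines:
        data = line.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return bytes(out)

def read_records(blob: bytes) -> list:
    """Read the lines back using only the length fields."""
    lines, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        lines.append(blob[pos:pos + length].decode("utf-8"))
        pos += length
    return lines

blob = write_records(["first", "", "third"])
print(blob)
print(read_records(blob))
```

The stored bytes contain no line break character at all, and even the empty second line is preserved.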

Line Breaks in HTML Source Code and other Markup Languages

Next to the character-based line break types and the line breaks defined by a fixed line length, which we looked at in the last two sections, there is another way to implement line breaks using a markup language.

Line Breaks in HTML Source Code

One of the best-known markup languages is HTML, the basis of web pages as we know them today. The implementation of line breaks in HTML source code and in other similar markup languages is special because line breaks can occur on two different levels: the source text itself can contain any character-based line breaks such as CR LF or LF, but they remain hidden because the final display of line breaks on the page that is later visible in the browser is based solely on the HTML tags and other formatting such as CSS style sheets.

To illustrate this, we would like to look at two HTML sources as an example. On the one hand, there is the following HTML source code:

<h1>Headline</h1><p>First Paragraph</p><p>Second<br>Paragraph</p>

On the other hand, there is this source code:

<h1>Headline</h1>
<p>First Paragraph</p>
<p>Second<br>
Paragraph</p>

As we can see, the first example does not contain any "visible" line breaks, while in the second example there is a line break after each meaningful paragraph and within the second paragraph. Nevertheless, both source codes lead to exactly the same display in the browser. So-called whitespace such as additional spaces, tabs or even line breaks do not play any role in the interpretation of the source text.

The only thing that matters in this example source code is that we have put one text into an h1 tag (from "heading 1") and two other texts into a p tag (from "paragraph"). By default (this behavior can also be overridden), both of these tags are rendered with a line break in the form of a paragraph after them. The same applies to tags like h2 (heading 2), h3 (heading 3), li (list elements) or the classic HTML line break br (simple break), which we used to wrap the second paragraph. Other tags such as the formatting tags b (bold) or i (italic) do not insert automatic breaks in the representation.

Classic line breaks in the source text that are not based on tags, on the other hand, can be used independently of the display in the browser, for example to structure the source text in order to make it more readable. These line breaks, which are later invisible in the browser, can be used in the form of character-based line breaks or, for example, they can be inserted via so-called HTML entities, which are listed in the section on HTML entities.

Override the Behavior with the pre Tag

This behavior of invisible character-based line breaks in the source text can be overridden using the HTML tag "pre" as well as the CSS style attribute "white-space:pre". The line breaks and other whitespace such as spaces in the source text that are located within a pre tag or within tags with the CSS property "white-space" with the value "pre" are output as such in the browser:

<pre>Line 1
Line 2</pre>
<span style="white-space: pre">Line 3
Line 4</span>

This source text creates four broken lines in the browser even though the line breaks were only written into the source text using otherwise invisible "whitespace". The line break between the first and the second line is created by the pre tag, the line break between the third and the fourth line is created by the CSS property of the enclosing span element.

Line Breaks in LaTeX, Markdown, RTF, Creole, PostScript, BBCode and AsciiDoc

Other common markup languages include LaTeX, Markdown, RTF, Creole and PostScript, each of which uses a different syntax to mark line breaks:

In markup languages such as BBCode or AsciiDoc, on the other hand, despite the possibility of other markups (such as "[b]word[/b]" or "*word*" for bold text), line breaks from the source text are also included in the result. So, in these markup languages, the character-based line break itself is used as markup (which is similar to Markdown).

Requirements for using Markup Languages

The prerequisite for using markup languages such as HTML, TeX / LaTeX, Markdown, RTF, Creole, PostScript, BBCode or AsciiDoc is of course that the markups, commands and tags used are known. Without knowing how a specific markup is meant to be used or interpreted, a representation is not possible.

Line Breaks in the Source Code of Programming Languages

In programming source code, we are also faced with the problem that, similar to HTML source text, we have to make a distinction between the source code itself and what is later actually displayed by the executed and possibly compiled program. It is important to master the balancing act of writing code that is as human-readable as possible without the formatting of the code having a negative impact on the program.

As in HTML, this balancing act is solved by the fact that many programming languages make a strict distinction between the line breaks in the source code and the line breaks that the program later outputs: depending on the operating system, the usual character-based line breaks can generally be used in the source code, while for line breaks in the program output there is a separate notation, which can differ from programming language to programming language. Some examples of this are listed in the following table:

Language | Explicit Line Break | System Line Break
C | char s[] = "-\r\x0A-"; | char s[] = "\n";
C++ | std::string s = "-\r\x0A-"; | std::string s = "\n";
C# | string s = "-\r\n-"; | string s = Environment.NewLine;
Java | String s = "-\r\n-"; | String s = System.lineSeparator();
Java | - | String s = String.format("-%n-");
JavaScript / TypeScript | var s = "-\n-"; | -
Delphi | var s: string; s := '-' + #13#10 + '-'; | var s: string; s := sLineBreak;
Lazarus / FreePascal | var s: string; s := '-' + #13#10 + '-'; | var s: string; s := LineEnding;
PHP | $s = "-\r\n-"; | $s = PHP_EOL;
Python | s = "-\r\n-" | s = os.linesep
Perl | my $s = "-\r\x0A-"; | my $s = "\n";
Haskell | "-\CR\LF-" :: [Char] | "\n" :: [Char]
Visual Basic | Dim s1 As String = "-" & vbCrLf & "-" | Dim s1 As String = System.Environment.NewLine
Visual Basic | Dim s2 As String = "-" & vbCr & "-" | Dim s2 As String = vbNewLine (deprecated)
Visual Basic | Dim s3 As String = "-" & vbLf & "-" | -
SQL | UPDATE tab SET col = '-' + CHAR(13) + CHAR(10) + '-'; | -

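As the table shows, in most programming languages we can use two different approaches to insert a line break: we can either write an explicit line break by naming the desired characters ourselves, or we can use the system line break provided by the language or its runtime environment.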

In the next two sections we would like to go into more detail about both variants and their pitfalls.

Explicit Line Break

In many programming languages such as C, C++, C#, Java, PHP, Python, Perl or Haskell, the escape sequences such as \r and \n introduced in the section "Character-Based Line Break Types" can be used to insert a line break into a string. Basically, \r stands for a carriage return (CR, U+000D) and \n for a line feed (LF, U+000A), which allows the line breaks to be generated for the different systems.

Depending on the programming language, the following aspects and particularities must be taken into account when using \r and \n:

The situation is somewhat different in Delphi, Lazarus, Visual Basic and SQL. Instead of the escape sequences \r and \n, in Visual Basic we can use the constants vbCr and vbLf for the characters CR and LF. There is also the constant vbCrLf for the Windows line break, that is, for both characters together. In Delphi, FreePascal and Lazarus we can insert the characters directly via their character codes #13 (CR) and #10 (LF). The situation is similar in the database language SQL, where we can use CHAR(13) and CHAR(10) to generate the corresponding characters.

Variables, Constants and Functions for the System Line Break

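In addition to this explicit definition of line breaks, most programming languages also provide us with variables, constants or functions with which the respective system line break can be inserted regardless of the platform. Examples from the table above are Environment.NewLine in C#, System.lineSeparator() in Java, sLineBreak in Delphi, LineEnding in FreePascal, PHP_EOL in PHP and os.linesep in Python.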

In this way, we can decide for ourselves whether we want to explicitly use a specific line break type (for example because our goal is to save files with exactly this line break type) or whether our program should automatically use the appropriate system line break (for example because our program should produce an appropriate output on different systems).
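In Python, this decision is typically made when opening a file. The following sketch uses the newline parameter of the built-in open() function to write the same text once with an explicit line break type and once with the system line break; the helper name written_bytes is made up for this example.

```python
import os
import tempfile

def written_bytes(text: str, newline) -> bytes:
    """Write `text` in text mode with the given `newline` setting and
    return the bytes that actually end up in the file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

print(written_bytes("a\nb\n", "\r\n"))  # explicit Windows line break
print(written_bytes("a\nb\n", "\n"))    # explicit Unix line break
print(written_bytes("a\nb\n", None))    # system line break, platform-dependent
```

With newline=None (the default), Python translates every "\n" into os.linesep when writing. For this reason, the Python documentation advises against writing os.linesep itself to files opened in text mode, since on Windows the contained LF would be translated a second time.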

Network Protocols

The use of the correct line break type also plays an important role in network protocols. Many of these network protocols, such as HTTP, SMTP, FTP and IRC, are text-based and use the CRLF line break type for their line-by-line transmitted requests.

Some programs, such as qmail, adhere strictly to this standard and accordingly refuse to process requests that use a different line break type such as LF. Other programs are more tolerant in their processing or even incorrectly always use the system line break type for their requests, which can lead to problems in communication with systems that implement the standard more strictly. Some of these problems also arise from the use of the C-typical \n which can resolve as either the correct CR LF or the incorrect LF in programming languages such as C, C++, Perl and Haskell depending on the operating system and mode.

To avoid these problems, some protocols now recommend also recognizing line break types other than CR LF. However, by continuing to send CR LF we are still on the safe side, as we do not know which possibly outdated program is being used on the other side.
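In code, the safest way to follow this advice is to write the CR LF bytes explicitly instead of relying on "\n" or the system line break. The following Python sketch builds a minimal HTTP request this way and reads lines tolerantly; the function names and the host are made up for illustration and nothing is sent over the network.

```python
CRLF = b"\r\n"

def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request with explicit CR LF breaks."""
    lines = [
        f"GET {path} HTTP/1.1".encode("ascii"),
        f"Host: {host}".encode("ascii"),
        b"",  # the empty line ends the header block
    ]
    return CRLF.join(lines) + CRLF

def split_tolerant(data: bytes) -> list:
    """Split received data into lines, accepting CR LF as well as bare LF."""
    return [line.rstrip(b"\r") for line in data.split(b"\n")]

print(build_request("example.com"))
print(split_tolerant(b"a\r\nb\nc"))
```

The sketch is strict when sending and tolerant when receiving, which matches the recommendation described above.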

Detection of the Line Break Type of a File

For certain Unicode encodings, the encoding of a text file can be marked with a so-called Byte Order Mark (BOM) at the beginning of the file. Recognizing the line break type of a file is not that easy: there is nothing comparable to the BOM that could signal the line break type used. Therefore, when we have an unknown text file in front of us, all we have left are a few rules of thumb to determine the line break type of this file.

A first indication is the operating system on which the text file has been created: if the file originated on a Windows computer, the Windows line break CR LF is likely. If the file was created on a recent Mac or on Linux, it probably uses the Unix line break LF. However, this type of classification can at best be a rule of thumb, since there are, for example, plenty of text editors for Windows that use the Windows line break in their default settings but can also create files with any other line break type. In addition, it may be completely unclear which system a file actually comes from.

For this reason, when interpreting a text file of unknown origin, we should not rely on such guesswork but rather try to make a decision based on the bytes of the file that we know. For example, we can proceed as follows:
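One simple byte-based approach can be sketched in Python as follows. The sketch only distinguishes CR LF, CR and LF, and it assumes an ASCII-compatible encoding such as UTF-8 or a Windows code page; for UTF-16 or UTF-32 files, the byte sequences from the table in the section "Byte Representations in different Encodings" would have to be searched instead. The function names are made up for this example.

```python
import re

def count_line_breaks(data: bytes) -> dict:
    """Count CR LF pairs, lone CR and lone LF bytes."""
    return {
        "CRLF": len(re.findall(rb"\r\n", data)),
        "CR": len(re.findall(rb"\r(?!\n)", data)),   # CR not followed by LF
        "LF": len(re.findall(rb"(?<!\r)\n", data)),  # LF not preceded by CR
    }

def detect_line_break(data: bytes) -> str:
    """Return "CRLF", "CR", "LF", "mixed" or "none" for the given bytes."""
    found = [name for name, count in count_line_breaks(data).items() if count]
    if not found:
        return "none"
    if len(found) > 1:
        return "mixed"
    return found[0]

print(detect_line_break(b"first\r\nsecond\r\n"))
print(detect_line_break(b"first\nsecond\r\n"))
```

Instead of reporting "mixed", a program could also pick the most frequent type from the counts, depending on how strictly it wants to treat inconsistent files.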

Fortunately, we only have to do the work described here if we want to program an application ourselves that can handle all types of text files. A program that can already do this is the TextConverter: by default, the TextConverter works with the option "Line Break Type" > "Automatic Detection", which means that the TextConverter automatically carries out the analysis of your files as described here without you noticing anything about it. However, with the TextConverter it is of course also possible to change this default setting and read or save files with any other type of line break. With the TextConverter, all line break types presented in this article can be used, as well as a fixed line length or one or more user-defined characters or code points as a line break.

Problems with File Exchanges

The different encodings for line breaks can cause serious problems when exchanging files between different systems.

The problems can be of a wide variety of kinds:

In order to nevertheless make such files readable on the system of your choice, there are two options: either we just use a program that also understands exotic types of line breaks, or we swap the character for the line break in these files before further viewing or editing. We'll look at how this works in the next section.

How to Change the Line Break Type of Files

If you would like to read text files on your system that come from other operating systems or other sources and use a line break type different from the native one of your operating system, you can rewrite the line breaks of the files in question, that is, replace the previous line breaks with the line break type you prefer. Such a rewrite may also be necessary if you want to read your text files with a program that only understands a certain kind of line break and cannot carry out the necessary conversion itself.
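For the three ASCII-based line break types CR LF, CR and LF, the principle of such a rewrite can be sketched in a few lines of Python. The function name convert_line_breaks is made up for this example, and the other line break types from this article are not covered.

```python
import re

# CR LF must come first so that it is replaced as one unit.
_ANY_BREAK = re.compile(r"\r\n|\r|\n")

def convert_line_breaks(text: str, target: str = "\n") -> str:
    """Replace every CR LF, CR or LF in `text` with `target`."""
    return _ANY_BREAK.sub(lambda match: target, text)

print(repr(convert_line_breaks("a\r\nb\rc\nd", "\r\n")))
```

When applying this to files in Python, the files should be opened with newline="" for both reading and writing, because otherwise Python's own line break translation would interfere with the conversion.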

Change Line Break Type with the TextEncoder

Regardless of the reason why you would like to change the line break type of files, you can easily make this change, even for any number of files at the same time, using the software TextEncoder. To do this, simply follow the steps below:

The TextEncoder supports all character-based line break types presented in this tutorial as well as line breaks after a fixed number of characters for both reading and saving text files. Additionally, you can also define and use custom line breaks via single or multiple characters or codepoints.

If you want to automate the line break change of one or more files (for example all files from a specific folder) via a script, you can use the TextEncoder in its version TextEncoder Pro CL.

Change Line Break Type with the TextConverter

The application TextConverter can also be used to change the line breaks of text files. The procedure is the same as that just described for the TextEncoder, and the selection of supported line break types is identical as well.

However, the line break options in the TextConverter are not located under "Changes > Line Breaks" but under "Actions > Files > Line Break Type". In addition, you can also use the TextConverter for numerous other manipulations of plain text, CSV as well as XML files, while the TextEncoder is only intended for changing the encoding and line break type of text files. The TextConverter is also available as a batch version, which can be controlled and automated via the command line or using a script.

Files with mixed Line Breaks

In the section "Detection of the Line Break Type of a File" we already discussed the case of text files that contain several types of line breaks at the same time. None of the possible line break types can then be clearly and uniquely assigned to such a file.

Emergence of Files with mixed Line Breaks

Such files with mixed line breaks can emerge in various ways:

It is possible, for example, that a file may have been edited by different people on different systems. For example, if these people are using a text editor that can only understand and write their own system line break type, the following can quickly happen: Person A creates a text file on Linux. At this point, the resulting file only contains the Unix line break type LF. Person B then opens the file on Windows and begins adding a few paragraphs. These new paragraphs are written into the file using the Windows CR LF line break, but the old LF line breaks remain untouched. Such a file then unintentionally contains several types of line breaks.

The same can happen if multiple files from different systems are appended together without first harmonizing the line break type of the files.

Repair of Files with mixed Line Breaks

But what can we do when it's already too late? How can we fix such a file with mixed line breaks? Fortunately, we don't have to do this manually since we can just use the TextEncoder again, which has already been introduced in the last section. How this works exactly is explained in the tutorial "How to Repair Text Files with mixed Line Breaks".

And so that you don't end up with files with mixed line breaks again next time: the program TextConverter can join several text files together, taking into account their different line break types, without you having to explicitly take care of it.