8.0.2. Newlines
All systems use the virtual "\n" to represent a
line terminator, called a newline. There is no
such thing as a newline character; "\n" is a platform-independent way
of saying "whatever your string library uses to represent a line
terminator." On Unix, VMS, and Windows, this line terminator in
strings is "\cJ" (the Ctrl-J character, or line feed). Versions
of the Macintosh operating system before Mac OS X used
"\cM" (Ctrl-M, or carriage return). As a Unix variant, Mac OS X uses
"\cJ".
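A quick way to see what "\n" stands for on the machine at hand is to
print its character code. This is only a minimal sketch; the variable
names are made up for illustration, and the numeric values in the
comments assume an ASCII-based platform.

```perl
use strict;
use warnings;

# ord() returns the numeric code of the first character of its argument.
my $newline_code = ord("\n");    # whatever this platform uses for newline
my $ctrl_j_code  = ord("\cJ");   # 10 (line feed) on ASCII-based platforms
my $ctrl_m_code  = ord("\cM");   # 13 (carriage return) on ASCII-based platforms

printf "newline is %d, Ctrl-J is %d, Ctrl-M is %d\n",
    $newline_code, $ctrl_j_code, $ctrl_m_code;

print "newline is Ctrl-J on this platform\n" if "\n" eq "\cJ";
```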
Operating systems also vary in how they store newlines in files. Unix
also uses "\cJ" for this. On Windows, though,
lines in a text file end in "\cM\cJ". If your I/O
library knows you are reading or writing a text file, it will
automatically translate between the string line terminator and the
file line terminator. So on Windows, you could read four bytes
("Hi\cM\cJ") from disk and end up with three characters in
memory ("Hi\cJ", where "\cJ" is
the physical representation of the newline). This is never
a problem on Unix, as no translation needs to happen between the
disk's newline ("\cJ") and the string's newline
("\cJ").
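The four-bytes-on-disk, three-in-memory effect can be reproduced on any
platform by asking for the translation explicitly. The sketch below
assumes a Perl built with PerlIO layers and uses the :crlf layer to
stand in for what a Windows text-mode filehandle does by default; the
temporary file and the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "Can't close $path: $!";

# Write through the :crlf layer, so "\n" is stored as "\cM\cJ".
open(my $out, ">:crlf", $path) or die "Can't write $path: $!";
print $out "Hi\n";
close($out) or die "Can't close $path: $!";

# Read the bytes exactly as they sit on disk, with no translation.
open(my $raw, "<:raw", $path) or die "Can't read $path: $!";
my $on_disk = do { local $/; <$raw> };
close($raw);

# Read again through :crlf, so "\cM\cJ" becomes "\n" once more.
open(my $text, "<:crlf", $path) or die "Can't read $path: $!";
my $in_memory = do { local $/; <$text> };
close($text);

printf "on disk: %d bytes; in memory: %d characters\n",
    length($on_disk), length($in_memory);
```

On a Unix filehandle opened without the :crlf layer, both lengths would
be the same, because nothing is translated.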
Terminals, of course, are a different kettle of fish. Except when
you're in raw mode (as in system("stty raw")), the
Enter key generates a "\cM" (carriage return)
character. This is then translated by the terminal driver into a
"\n" for your program. When you print a line to a
terminal, the terminal driver notices the "\n"
newline character (whatever it might be on your platform) and turns
it into the "\cM\cJ" (carriage return, line feed)
sequence that moves the cursor to the start of the line and down one
line.
Even network protocols have their own expectations. Most protocols
prefer to receive and send "\cM\cJ" as the line
terminator, but many servers also accept merely a
"\cJ". This varies between protocols and servers,
so check the documentation closely!
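When talking to a network service, one way to sidestep the ambiguity is
to spell the terminator out as "\cM\cJ" when sending and to accept
either form when receiving. The following is a sketch only: the request
text, the host name example.com, and the helper strip_eol are made up
for illustration and are not taken from any particular protocol
library.

```perl
use strict;
use warnings;

# Spelled out explicitly, so it means the same bytes on every platform.
my $CRLF = "\cM\cJ";

# Build a request whose lines end in CRLF, whatever "\n" is locally.
# The two empty strings produce the blank line that ends the headers.
my $request = join($CRLF, "HEAD / HTTP/1.0", "Host: example.com", "", "");

# Remove a trailing "\cM\cJ" or a bare "\cJ" from a received line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\cM?\cJ\z//;
    return $line;
}

print strip_eol("200 OK\cM\cJ"), "\n";
print strip_eol("200 OK\cJ"), "\n";
```

Using chomp here would remove only the local "\n", which could leave a
stray "\cM" at the end of each line read from a server that sends
"\cM\cJ".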
The important notion here is that if the I/O library thinks you are
working with a text file, it may be translating sequences of bytes
for you. This is a problem in two situations: when your file is not
text (e.g., you're reading a JPEG file) and when your file is text
but not in a byte-oriented ASCII-like encoding (e.g., UTF-16 or any of
the other encodings in which the byte for "\cJ" or "\cM" can turn up
inside an unrelated character). As if this weren't bad enough, some
systems (MS-DOS is an example) use a particular byte in a text file,
"\cZ" (Ctrl-Z) on MS-DOS, to indicate end-of-file. An I/O library
that knows about text files on such a platform will indicate EOF when
that byte is read.
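For data that is not text, the usual remedy is to tell the I/O library
so with binmode before reading or writing. The sketch below writes a
few bytes that a text-mode filehandle on some platforms might alter or
cut short, then reads them back unchanged. The byte string is made up
for illustration and is not a real JPEG.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative bytes only: includes "\cM\cJ" and "\cZ" (Ctrl-Z) in the middle.
my $bytes = "\xFF\xD8\cM\cJ\cZ\xFF\xD9";

# Scratch file, removed when the program exits.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out) or die "Can't binmode $path: $!";   # no translation on output
print $out $bytes;
close($out) or die "Can't close $path: $!";

open(my $in, "<", $path) or die "Can't read $path: $!";
binmode($in) or die "Can't binmode $path: $!";    # no translation on input
my $read_back = do { local $/; <$in> };
close($in);

print $read_back eq $bytes ? "bytes preserved\n" : "bytes changed\n";
```

On Unix the binmode calls change nothing, but leaving them in costs
little and keeps the program portable to systems where text and binary
files differ.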