Comparing Files (Unix Power Tools, 3rd Edition)

11.1. Checking Differences with diff

Go to http://examples.oreilly.com/upt3 for more information on: diff

The diff command compares two files and displays the lines that differ between them. It prints a message that uses ed-like notation (a for append, c for change, and d for delete) to describe how a set of lines has changed. The lines themselves follow this message. The < character precedes lines from the first file and > precedes lines from the second file.

Let's create an example to explain the output produced by diff. Look at the contents of three sample files:

test1	test2	test3
apples	apples	oranges
oranges	oranges	walnuts
walnuts	grapes	chestnuts
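
To follow along, the three sample files can be recreated with printf. This is only a convenience sketch: the scratch directory made by mktemp is an assumption added here so that nothing in your current directory is overwritten.

```shell
# Work in a scratch directory (an assumption, not part of the original example).
cd "$(mktemp -d)" || exit 1

# One word per line, matching the three columns shown above.
printf '%s\n' apples oranges walnuts > test1
printf '%s\n' apples oranges grapes > test2
printf '%s\n' oranges walnuts chestnuts > test3
```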

When you run diff on test1 and test2, the following output is produced:

$ diff test1 test2
3c3
< walnuts
---
> grapes

The diff command displays the only line that differs between the two files. To understand the report, remember that diff is prescriptive, describing what changes need to be made to the first file to make it the same as the second file. This report specifies that only the third line is affected, exchanging walnuts for grapes. This is more apparent if you use the -e option, which produces an editing script that can be submitted to ed, the Unix line editor. (You must redirect standard output (Section 43.1) to capture this script in a file.)

$ diff -e test1 test2
3c
grapes
.

This script, if run on test1, will bring test1 into agreement with test2. (To do this, feed the script to the standard input of ed (Section 20.6) or ex; add a w command (Section 20.4) at the end of the script to write the changes, if you want to.)
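
The steps just described might be sketched as follows. The scratch directory and the file name script.ed are assumptions made for illustration, and the sketch applies the script only if ed is actually installed, since not every system has it.

```shell
# Recreate the two sample files in a scratch directory (an assumption).
cd "$(mktemp -d)" || exit 1
printf '%s\n' apples oranges walnuts > test1
printf '%s\n' apples oranges grapes > test2

# Capture the editing script. diff exits with status 1 when the files
# differ, which is expected here; only status 2 signals real trouble.
diff -e test1 test2 > script.ed || [ $? -eq 1 ]

# Add the w command so that ed saves its changes.
echo w >> script.ed

# Apply the script only if ed is available on this system.
if command -v ed >/dev/null 2>&1; then
    ed -s test1 < script.ed
    cmp test1 test2 && echo "test1 now matches test2"
fi
```

Leaving out the w line lets you experiment without modifying test1.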

If you compare the first and third files, you find more differences:

$ diff test1 test3
1d0
< apples
3a3
> chestnuts

To make test1 the same as test3, you'd have to delete the first line (apples) and append the third line from test3 after the third line in test1. Again, this can be seen more clearly in the editing script produced by the -e option. Notice that the script specifies editing lines in reverse order; otherwise, changing the first line would alter all subsequent line numbers.

$ diff -e test1 test3
3a
chestnuts
.
1d

So what's this good for? Here's one example.

When working on a document, it is common practice to make a copy of a file and edit the copy rather than the original. This might be done, for example, if someone other than the writer is inputting edits from a written copy. The diff command can be used to compare the two versions of a document. A writer could use it to proof an edited copy against the original.

$ diff brochure brochure.edits
49c43,44
< environment for program development and communications,
---
> environment for multiprocessing, program development
> and communications, programmers
56c51
< offering even more power and productivity for commericial
---
> offering even more power and productivity for commercial
76c69
< Languages such as FORTRAN, COBOL, Pascal, and C can be
---
> Additional languages such as FORTRAN, COBOL, Pascal, and

Using diff in this manner is a simple way for a writer to examine changes without reading the entire document. By redirecting diff output to a file, you can keep a record of changes made to any document. In fact, just that technique is used by both RCS and CVS (Section 39.4) to manage multiple revisions of source code and documents.
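
Keeping such a record might look like the following sketch. The file contents are stand-ins invented for illustration, and brochure.changes is a hypothetical name for the log file. The exit status is checked because diff reports 0 when the files match, 1 when they differ, and 2 when something went wrong.

```shell
# Stand-in files in a scratch directory (contents invented for this sketch).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'power and productivity for commericial' 'users' > brochure
printf '%s\n' 'power and productivity for commercial' 'users' > brochure.edits

# Save the report of differences instead of printing it.
status=0
diff brochure brochure.edits > brochure.changes || status=$?

case $status in
    0) echo "no changes" ;;
    1) echo "changes recorded in brochure.changes" ;;
    *) echo "diff failed" >&2 ;;
esac
```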

--DD, from Unix Text Processing (Hayden Books, 1987)
