0.2. Availability of sed and awk
Sed and awk were part of Version 7 UNIX (also known as "V7," and
"Seventh Edition") and have been part of the standard distribution
ever since. Sed has been unchanged since it was introduced.
The Free Software Foundation GNU project's version of sed is freely
available, although not technically in the public domain. Source code for
GNU sed is available via anonymous FTP[1] to the host ftp.gnu.ai.mit.edu. It is in the
file ftp://ftp.gnu.ai.mit.edu/pub/gnu/sed-2.05.tar.gz. This is a tar
file compressed with the gzip program, whose source
code is available in the same directory. There are many sites
world-wide that "mirror" the files from the main GNU distribution
site; if you know of one close to you, you should get the files from
there. Be sure to use "binary" or "image" mode to transfer the
file(s).
In 1985, the authors of awk extended the language, adding many useful
features. Unfortunately, this new version remained inside AT&T
for several years. It became part of UNIX System V as of Release 3.1.
It can be found under the name of nawk, for new awk; the older version
still exists under its original name. This is still the case on
System V Release 4 systems.
On commercial UNIX systems, such as those from Hewlett-Packard, Sun,
IBM, Digital, and others, the naming situation is more complicated.
All of these systems have some version of both old and new awk, but the
names each vendor gives the two programs vary. Some have oawk and awk,
others have awk and nawk. The best advice we can give is to check your
local documentation.[2]
Throughout this book, we use the term awk to
describe POSIX awk. Specific implementations will be referred to by
name, such as "gawk," or "the Bell Labs awk."
Chapter 11, "A Flock of awks," discusses three freely available
awks (including where to get them), as well as several commercial ones.
NOTE:
Since the first edition of this book, the awk language was standardized
as part of the POSIX Command Language and Utilities Standard (P1003.2).
All modern awk implementations aim to be upwardly compatible with the
POSIX standard.
The standard incorporates features that originated in both new awk and gawk.
In this book, you can assume that what is true for one implementation of
POSIX awk is true for another, unless a particular version is designated.
0.2.1. DOS Versions
Gawk, mawk, and GNU sed have been ported to DOS. There are files on
the main GNU distribution site with pointers to DOS versions of these
programs. In addition, gawk has been ported to OS/2, VMS, and Atari
and Amiga microcomputers, with ports to other systems (Macintosh,
Windows) in progress.
egrep, sed, and awk are
available for MS-DOS-based machines as part of the MKS Toolkit
(Mortice Kern Systems, Inc., Ontario, Canada). Their implementation
of awk supports the features of POSIX awk.
The MKS Toolkit also
includes the Korn shell, which means that many shell scripts written
for the Bourne shell on UNIX systems can be run on a PC. While most
users of the MKS Toolkit have probably already discovered these tools
in UNIX, we hope that the benefits of these programs will be obvious
to PC users who have not ventured into UNIX.
Thompson Automation Software[3]
has an awk compiler for UNIX, DOS, and Microsoft Windows. This
version is interesting because it has a number of extensions to the
language, and it includes an awk debugger, written in awk!
We have used a PC on occasion because Ventura
Publisher is a terrific formatting package. One of the reasons we
like it is that we can continue to use vi to create
and edit the text files and use sed for writing editing scripts. We
have used sed to write conversion programs that translate
troff macros into Ventura stylesheet tags.
We have also used it to insert tags in batch mode. This can save
having to manually tag repeated elements in a file.
Sed and awk are also useful for writing conversion programs that handle different file
formats.
0.2.2. Other Sources of Information About sed and awk
For a long time, the main source of information on these utilities was
two articles contained in Volume 2 of the UNIX Programmer's
Guide. The article awk--A Pattern Scanning
and Processing Language (September 1, 1978) was written by
the language's three authors. In 10 pages, it offers a brief
tutorial and discusses several design and implementation issues. The
article SED--A Non-Interactive Text Editor
(August 15, 1978) was written by Lee E. McMahon. It is a reference
that gives a full description of each function and includes some
useful examples (using Coleridge's Xanadu as
sample input).
In trade books, the most significant treatment of sed and awk appears
in The UNIX Programming Environment by Brian
W. Kernighan and Rob Pike (Prentice-Hall, 1984). The chapter entitled
"Filters" not only explains how these programs work but shows how they
can work together to build useful applications.
The authors of awk collaborated on a book describing the enhanced
version: The AWK Programming Language
(Addison-Wesley, 1988). It contains many full examples and
demonstrates the broad range of areas where awk can be applied. It
follows in the style of the UNIX Programming
Environment, which at times makes it too dense for new users.
The source code for the example programs
in the book can be found in the directory
ftp://netlib.bell-labs.com/netlib/research/awkbookcode on netlib.bell-labs.com.
The IEEE Standard for Information Technology, Portable Operating
System Interface (POSIX) Part 2: Shell and Utilities (Standard
1003.2-1992)[4]
describes both sed and awk.[5]
It is the "official" word on the features available for portable shell
programs that use sed and awk. Since awk is a programming language in
its own right, it is also the official word on portable awk programs.
In 1996, the Free Software Foundation published The GNU Awk
User's Guide, by Arnold Robbins. This is the documentation
for gawk, written in a more tutorial style than the Aho, Kernighan, and
Weinberger book. It has two full chapters of examples, and covers
POSIX awk. This book is also published by SSC under the title
Effective AWK Programming, and the Texinfo source
for the book comes with the gawk distribution.
One current deficiency of GNU sed is that it has no documentation of its
own, not even a manpage.
Most general introductions to UNIX introduce sed and awk in a long
parade of utilities. Of these books, Henry McGilton and Rachel
Morgan's Introducing the UNIX System offers the
best treatment of basic editing skills, including use of all UNIX text
editors.
UNIX Text Processing (Hayden Books, 1987), by the
original author of this handbook and Tim O'Reilly, covers sed and awk
in full, although we did not include the new version of awk. Readers
of that book will find some parts duplicated in this book, but in general a
different approach has been taken here. Whereas in that book we treated
sed and awk separately, expecting only advanced users to
tackle awk, here we try to present both programs in relation to one
another. They are different tools that can be used individually or
together to provide interesting opportunities for text processing.
Finally, in 1995 the Usenet newsgroup
comp.lang.awk came into being. If you
can't find what you need to know in one of the above books, you can
post a question in the newsgroup, with a good chance that someone will
be able to help you.
The newsgroup also has a "frequently asked questions" (FAQ) article
that is posted regularly. Besides answering questions about awk, the
FAQ lists many sites where you can obtain binaries of different
versions of awk for different systems. You can retrieve the FAQ via
FTP from the host rtfm.mit.edu; it is in the file
ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq.
Copyright © 2003 O'Reilly & Associates. All rights reserved.