A common use for directing output to files is to split up a large file
into a number of smaller files. Although UNIX provides two
utilities, split and csplit,
that do a similar job, they cannot give the new
files meaningful filenames.
Similarly, sed can be used to write to a
file, but you must specify a fixed filename. With awk, you can use a
variable to specify the filename and pick up the value from a pattern
in the file. For instance, if $1 provided a string that could be used
as a filename, you could write a script to output each record to its
own file:
print $0 > $1
You should probably test the filename first, either to check its length
or to look for characters that cannot be used in a filename.
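A minimal sketch of such a test follows. The sample records, the 14-character limit, and the set of allowed characters are assumptions made for illustration, and safe() is a hypothetical helper, not a built-in; adjust all of them to suit your system.

```sh
#!/bin/sh
# Work in a scratch directory so the sketch cannot clobber real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input (made up): the first field names the output file.
printf '%s\n' \
    'alpha first record' \
    'beta second record' \
    'bad/name third record' \
    'alpha fourth record' > records

awk '
# safe() is a hypothetical helper: accept names of at most 14
# characters that start with a letter, digit, or underscore and
# continue with letters, digits, dot, underscore, or hyphen.
function safe(name) {
    return (length(name) <= 14) && (name ~ /^[A-Za-z0-9_][A-Za-z0-9._-]*$/)
}
# Usable name: write the record to its own file.
safe($1) { print $0 > $1; next }
# Anything else: report it on standard error and move on.
{ print "skipping: " $1 | "cat 1>&2" }
' records
```

Both "alpha" records end up in the file alpha, because awk truncates a file only when it first opens it; the record whose first field contains a slash is reported and skipped.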
If you don't close your files, such a program will eventually run out
of available open files and have to give up. The example we are
going to look at avoids this problem: it calls the
close() function when it has finished with each file, so it does not run
into any open-file limitations.
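The idea can be sketched on its own with a one-record-per-file splitter that closes each file as soon as it has written to it. The generated input and the count of 500 names are assumptions for illustration. Note that once a file has been closed, reopening it with > would truncate it, so this sketch appends with >> in case a name turns up again.

```sh
#!/bin/sh
# Work in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Generate sample input (made up): 500 records, each naming its own file.
awk 'BEGIN { for (n = 1; n <= 500; n++) print "part" n, "record", n }' > records

awk '{
    print $0 >> $1   # append, so a name seen again is not truncated
    close($1)        # release the file before reading the next record
}' records
```

Because only one output file is open at any moment, the number of distinct names no longer matters. The cost of using >> is that running the script a second time in the same directory appends to the files left by the first run.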
The following script was used to split up a large file containing
dozens of manpages. Each manual page began by setting a number
register and ended with a blank line:
.nr X 0
(Although the manpages used the -man macros for the most
part, the beginning of each manpage was strangely coded, making things a
little harder.) The line that provides the filename looks like this:
.if \nX=0 .ds x} XDrawLine "" "Xlib - Drawing Primitives"
The fifth field on this line, XDrawLine, contains the filename.
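To check the field positions, we can feed that one line to awk and ask for the fifth field whenever the fourth is "x}". This is only a quick sketch using awk's default whitespace field splitting; the shell variables line and name are ours.

```sh
#!/bin/sh
# The sample line from the text; single quotes keep the backslash literal.
line='.if \nX=0 .ds x} XDrawLine "" "Xlib - Drawing Primitives"'

# $1 is ".if", $2 is "\nX=0", $3 is ".ds", $4 is "x}", $5 is the name.
name=$(printf '%s\n' "$line" | awk '$4 == "x}" { print $5 }')
echo "$name"
```

This prints XDrawLine, which is the test and the field that the script relies on.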
Perhaps the only difficulty in writing the script is that the first
line is not the one that provides the filename. Therefore, we collect
the lines in an array until we get a filename. Once we get the
filename, we output the array, and from that point on we simply write
each input line to the new file. Here's the
man.split script:
# man.split -- split up a file containing X manpages.
BEGIN { file = 0; i = 0; filename = "" }

# First line of new manpage is ".nr X 0"
# Last line is blank
/^\.nr X 0/,/^$/ {
    # this conditional collects lines until we get a filename.
    if (file == 0)
        line[++i] = $0
    else
        print $0 > filename

    # this matches the line that gives us the filename
    if ($4 == "x}") {
        # now we have a filename
        filename = $5
        file = 1
        # output name to screen
        print filename
        # print any lines collected
        for (x = 1; x <= i; ++x) {
            print line[x] > filename
        }
        i = 0
    }

    # close up and clean up for next one
    if ($0 ~ /^$/) {
        close(filename)
        filename = ""
        file = 0
        i = 0
    }
}
As you can see, we use the variable file as a flag
to convey whether or not we have a valid filename and can write to the
file. Initially, file is 0, and the current input
line is stored in an array. The variable i is a
counter used to index the array. When we encounter the line that sets
the filename, then we set file to 1. The name of
the new file is printed to the screen so that the user can get some
feedback on the progress of the script. Then we loop through the
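array and output it to the new file, resetting i to 0 afterward. When the next input line is
read, file is already 1, so the
print statement writes it directly to the named file.
When the blank line that ends the manpage arrives, we close the file and
reset filename, file, and i, ready for the next manpage.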
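As a trial run, the sketch below applies a condensed copy of the same logic to a two-page sample. The sample text and the file names sample and names are made up for illustration; real manpages will of course be longer.

```sh
#!/bin/sh
# Work in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Two made-up manpages, each opened by ".nr X 0" and closed by a blank line.
cat > sample <<'EOF'
.nr X 0
.if \nX=0 .ds x} XDrawLine "" "Xlib - Drawing Primitives"
.TH XDrawLine
body of first page

.nr X 0
.if \nX=0 .ds x} XDrawPoint "" "Xlib - Drawing Primitives"
.TH XDrawPoint
body of second page

EOF

# Condensed copy of the man.split logic; progress output goes to "names".
awk '
/^\.nr X 0/,/^$/ {
    if (file == 0)
        line[++i] = $0                 # no name yet: buffer the line
    else
        print $0 > filename            # name known: write straight through
    if ($4 == "x}") {                  # the line that carries the name
        filename = $5
        file = 1
        print filename
        for (x = 1; x <= i; ++x)
            print line[x] > filename   # flush the buffered lines
        i = 0
    }
    if ($0 ~ /^$/) {                   # blank line ends the page
        close(filename)
        filename = ""
        file = 0
        i = 0
    }
}' sample > names

cat names
```

The run lists XDrawLine and XDrawPoint and leaves one file of each name behind. Each file holds its five input lines, including the two that were buffered before the filename was known and the closing blank line.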