35.9 Splitting Files at Fixed Points: split

Most versions of UNIX come with a program called split, whose purpose is to split large files into smaller files for tasks such as editing them in an editor that cannot handle large files, or mailing them when they are so big that some mailers will refuse to deal with them. For example, let's say you have a really big text file, bigfile, that you want to mail to someone. Running split on that file will (by default, with most versions of split) break it up into pieces that are each no more than 1000 lines long.
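The default behavior can be tried out with a small script. This is only a sketch, not the original listing: it assumes a POSIX-style split that accepts -l (some older versions spell the same option as -1500), it assumes mktemp is available, and the 2500-line stand-in file and the bigfile.split. prefix are made up for illustration. The last two commands preview the line-count and prefix adjustments described in the next paragraph.

```shell
#!/bin/sh
# Work in a scratch directory so the generated pieces do not clutter anything.
# (mktemp -d is an assumption; it is common but not required by POSIX.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line stand-in for "bigfile" (hypothetical test data).
awk 'BEGIN { for (i = 1; i <= 2500; i++) print "line", i }' > bigfile

# Default behavior: pieces of at most 1000 lines, named xaa, xab, xac.
split bigfile
wc -l xaa xab xac

# 1500-line pieces with a prefix other than "x" (the prefix is arbitrary).
split -l 1500 bigfile bigfile.split.
wc -l bigfile.split.aa bigfile.split.ab

# Concatenating the pieces in name order reproduces the original file.
cat xa? | cmp - bigfile && echo "pieces reassemble to bigfile"
```

Because the suffixes sort alphabetically, the recipient can rebuild the file with a single cat over the pieces.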
Note the default naming scheme, which is to append "aa," "ab," "ac," etc., to the letter "x" for each subsequent filename. It is possible to modify the default behavior. For example, you can make it create files that are 1500 lines long instead of 1000, and you can also get it to use a name prefix other than "x".

Although the simple behavior described above tends to be relatively universal, there are differences in the functionality of split on different UNIX systems. There are four basic variants of split as shipped with various implementations of UNIX:

1. A split that handles only text files, breaking them into pieces of a given number of lines.
2. A split, usually called bsplit, that handles only non-text files, breaking them into pieces of a given number of characters.
3. A split that handles both text and non-text files and tries to figure out automatically which kind of file it has been given.
4. A split that handles both text and non-text files, but must be told explicitly when it is working on a non-text file.

The only way to tell which version you've got is to read the manual page for it on your system, which will also tell you the exact syntax for using it.

The problem with the third variant is that although it tries to be smart and automatically do the right thing with both text and non-text files, it sometimes guesses wrong and splits a text file as a non-text file or vice versa, with completely unsatisfactory results. Therefore, if the variant on your system is (3), you probably want to get your hands on one of the many split clones out there that is closer to one of the other variants (see below).

Variants (1) and (2) listed above are OK as far as they go, but they aren't adequate if your environment provides only one of them rather than both. If you find yourself needing to split a non-text file when you have only a text split, or needing to split a text file when you have only bsplit, you need to get one of the clones that will perform the function you need.

Variant (4) is the most reliable and versatile of the four listed, and is therefore what you should go with if you find it necessary to get a clone and install it on your system. There are several such clones in the various source archives, including the freely available BSD UNIX version. Alternatively, if you have installed perl (37.1), it is quite easy to write a simple split clone in perl, and you don't have to worry about compiling a C program to do it; this is an especially significant advantage if you need to run your split on multiple architectures that would need separate binaries.

If you need to split a non-text file and don't feel like going to all of the trouble of finding a split clone that handles such files, one standard UNIX tool you can use to do the splitting is dd (35.6). For example, if bigfile above were a non-text file and you wanted to split it into 20,000-byte pieces, you could run dd once per piece with a block size of 20,000 bytes.
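
One way to sketch the dd approach is a loop that copies one 20,000-byte block per pass and skips one more block each time. This is an illustration under stated assumptions rather than the original listing: the 50,000-byte random test file, the piece.NNN output names, and the use of mktemp and /dev/urandom are all choices made for the example.

```shell
#!/bin/sh
# Work in a scratch directory (mktemp -d is assumed to be available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a non-text "bigfile": 50,000 random bytes (hypothetical data).
dd if=/dev/urandom of=bigfile bs=1000 count=50 2>/dev/null

piece=20000                                # bytes per output file
size=$(wc -c < bigfile | tr -d ' ')        # total size, whitespace stripped
n=0

# Copy block number n of the input into its own file until the input runs out.
while [ $((n * piece)) -lt "$size" ]; do
    out=$(printf 'piece.%03d' "$n")        # zero-padded so names sort in order
    dd if=bigfile of="$out" bs="$piece" skip="$n" count=1 2>/dev/null
    n=$((n + 1))
done

ls -l piece.*

# The pieces, concatenated in name order, should match the original exactly.
cat piece.* | cmp - bigfile && echo "pieces reassemble to bigfile"
```

The last piece is simply shorter than the others when the file size is not a multiple of the block size. Zero-padding the suffix keeps the shell's glob order correct once there are more than ten pieces.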