24.8 Save Space: tar and compress a Directory Tree

In the UNIX filesystem, files are stored in blocks (52.9). Each nonempty file, no matter how small, takes at least one block. [2] A directory tree full of little files can fill up a lot of partly empty blocks. A big file is more efficient because it fills all (except possibly the last) of its blocks completely.
The tar (19.5) command can read lots of little files and put them into one big file. Later, when you need one of the little files, you can extract it from the tar archive. Seems like a good space-saving idea, doesn't it? But tar, which was really designed for magnetic tape archives, adds "garbage" characters at the end of each file to pad it out to an even size (a multiple of 512 bytes). So a big tar archive uses about as many blocks as the separate little files do. Okay, then why am I writing this article? Because the gzip (24.7) utility solves the problem. It squeezes files down; in particular, compression gets rid of repeated characters, and that padding is nothing but repeated characters. Compressing a tar archive typically saves 50 percent or more.
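If you want to see this effect on your own system, here is a minimal sketch written as a plain POSIX `sh` script rather than interactive `%` commands. The directory name `project`, the file count (200), and the file contents are made up for the demonstration, and the work happens in a scratch directory from `mktemp -d`. The exact numbers depend on your filesystem's block size, so no expected output is shown.

```shell
#!/bin/sh
# Sketch: compare the disk space taken by a tree of tiny files with the size
# of the same tree as a tar archive and as a gzip-compressed tar archive.
# "project", the file count and the file contents are hypothetical.
set -e
WORK=$(mktemp -d)        # scratch directory; remove it when you're done
cd "$WORK"

# Build a tree of 200 files holding only a few bytes each.
mkdir project
i=1
while [ "$i" -le 200 ]; do
    echo "file number $i" > "project/f$i"
    i=$((i + 1))
done

# du -sk reports the space the tree occupies on disk, in kilobytes;
# every tiny file costs at least one whole filesystem block.
echo "tree on disk: $(du -sk project | cut -f1) KB"

# Pack the tree into one tar archive, then compress a copy with gzip.
tar cf project.tar project
gzip -c project.tar > project.tar.gz
echo "tar archive:  $(wc -c < project.tar) bytes"
echo "tar + gzip:   $(wc -c < project.tar.gz) bytes"
```

The archive is written next to `project`, not inside it, for the same reason given below: tar should never be asked to archive its own output.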
Making a compressed archive of a directory and all of its subdirectories is easy: tar copies the whole tree when you give it the top directory name. Just be sure to save the archive in some directory that won't be copied, so tar won't try to archive its own archive! I usually put the archive in the parent directory. For example, to archive my directory named project, I'd use the commands below.
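The exact commands are not reproduced here, so the following is only a sketch of the approach the paragraph describes, assuming a POSIX shell, a tar that writes to standard output with `f -`, and gzip. The sample tree (`project/main.c`, `project/io/io.c`, `project/doc/README`) is hypothetical, and a `mktemp -d` directory stands in for the parent of `project`.

```shell
#!/bin/sh
# Sketch: archive the directory "project" into project.tar.gz in its parent
# directory, then remove the original tree. The sample files are made up.
set -e
WORK=$(mktemp -d)          # stands in for the parent directory of project
cd "$WORK"
mkdir -p project/io project/doc
echo 'main program' > project/main.c
echo 'input/output' > project/io/io.c
echo 'notes'        > project/doc/README

# Run tar from the parent directory: the archive is written outside the
# tree being copied, and every name inside it starts with "project/".
tar cf - project | gzip > project.tar.gz

# Check that the compressed file is intact before deleting the originals.
gzip -t project.tar.gz
rm -r project
```

Running `gzip -t` before `rm -r` is a cautious extra step, not something the text requires; with `set -e`, a damaged archive stops the script before anything is removed.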
If you work on a system that has 14-character filename length limits, be sure that the archive filename (here,
The tar l (lowercase letter L) option will print messages if any of the files you're archiving have other hard links (18.4). If a lot of your files have other links, archiving the directory may not save much disk space: the other links will keep those files on the disk, even after your rm -r command.

Any time you want a list of the files in the archive, use tar t or tar tv:
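Because the archive is compressed, it has to be uncompressed on the way into tar. A minimal sketch, assuming `gzip -dc` (decompress to standard output) and a tar that reads the archive from standard input with `f -`; the small archive built first is hypothetical so the example runs on its own.

```shell
#!/bin/sh
# Sketch: list the contents of a gzip-compressed tar archive without
# unpacking it. The archive and its files are made up for the example.
set -e
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p project/io
echo 'main program' > project/main.c
echo 'input/output' > project/io/io.c
tar cf - project | gzip > project.tar.gz

# t prints just the names; tv adds mode, owner, size and date, much like ls -l.
# "f -" makes tar read the archive from the pipe.
gzip -dc project.tar.gz | tar tf -
gzip -dc project.tar.gz | tar tvf -
```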
To extract all the files from the archive, type:
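Here is a sketch of a full extraction, under the same assumptions as a plain POSIX shell with gzip and tar reading from standard input. The archive is rebuilt from a hypothetical sample tree first, and the original tree is removed so the extraction visibly recreates it.

```shell
#!/bin/sh
# Sketch: restore every file from project.tar.gz. The sample tree is
# hypothetical and is deleted before extraction to show it coming back.
set -e
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p project/io
echo 'main program' > project/main.c
echo 'input/output' > project/io/io.c
tar cf - project | gzip > project.tar.gz
rm -r project

# x extracts; the names stored in the archive (project/...) are
# recreated under the current directory.
gzip -dc project.tar.gz | tar xf -
```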
Of course, you don't have to extract the files into a directory named project. You can read the archive file from other directories, move it to other computers, and so on. You can also extract just a few files and/or directories from the archive. Be sure to use exactly the name shown by the tar t command above. For instance, to restore the old subdirectory named project/io (and everything that was in it), you'd type:
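A sketch of a partial restore, assuming a POSIX shell with gzip and tar: it extracts only `project/io` and does so from a different directory than the one holding the archive. The sample files and the directory name `elsewhere` are hypothetical.

```shell
#!/bin/sh
# Sketch: restore only the subdirectory project/io, working in another
# directory. The archive contents and "elsewhere" are made up.
set -e
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p project/io
echo 'main program' > project/main.c
echo 'input/output' > project/io/io.c
tar cf - project | gzip > project.tar.gz
rm -r project

# Names in the archive are relative, so they are recreated under
# whatever directory you extract in.
mkdir elsewhere
cd elsewhere
# Naming a directory extracts it and everything stored under it;
# project/main.c is not touched.
gzip -dc ../project.tar.gz | tar xf - project/io
```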