24.18 Huge Files Might Not Take a Lot of Disk Space

If you're doing filesystem cleanup, you use ls -l, and see a file with ten million bytes... "Yipes!" you say, "That must be eating a lot of disk space!" But if you remove the file, df (24.9) shows almost no difference in disk space. Why? It could be a sparse file, a file with a lot more NUL characters in it than anything else (that's a general definition, but it's basically correct). The command ls -ls (17.14) will show you sparse files; the disk usage in the first column will be much smaller than the character count would suggest.
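The comparison that ls -ls lets you make by eye can also be sketched in a few lines of Python. This is only an illustration, not part of any standard tool: the function names size_report and looks_sparse are made up here, and it assumes a UNIX-like system where os.stat() reports st_blocks in 512-byte units.

```python
import os
import sys


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    st_size is the character count that ls -l prints. st_blocks is the
    number of blocks actually allocated; this sketch assumes 512-byte
    units, which is what Python documents for UNIX systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """Heuristic only: less space allocated than the apparent size."""
    apparent, allocated = size_report(path)
    return allocated < apparent


if __name__ == "__main__":
    # Hypothetical usage: pass file names on the command line.
    for name in sys.argv[1:]:
        apparent, allocated = size_report(name)
        flag = "(sparse?)" if allocated < apparent else ""
        print(f"{allocated:>12} {apparent:>12} {name} {flag}")
```

Treat the result as a hint rather than proof. Filesystems that compress data or store tiny files inside the inode can also report fewer allocated bytes than the apparent size.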
Programs that use dbm (database management subroutines) often create sparse files because dbm uses file location as part of its hashing and tries to spread out entries in the database file so there is lots of blank space between them.

Many UNIX filesystems (although not all; the Andrew File System, for example, does not) support the ability to greatly reduce the amount of space taken up by a file that is mostly NULs by not really storing the file blocks that are filled with NULs. Instead, the OS keeps track of how many blocks of NULs there are between each block that has something other than NULs in it, and feeds NULs to anybody who tries to read the file, even though they're not really being read off a disk.

You can create a sparse file in C by using fopen(3) to open a file, fseek(3) to move the file pointer far past the end of the file, and then writing at least one byte there (the seek by itself doesn't change the file's length). The gap you skipped over will read back as NULs, and the kernel (probably) won't save all of those NULs to disk.

By the way, sparse files can be a problem to copy. The kernel isn't smart enough to figure out you're feeding it a sparse file if you actually feed it the NULs. Therefore, standard file copying programs like cp that just read the file in and write it out in a different location lose, because they end up creating a file that really does take up as much space physically as there are NULs in the abstract file object. Then your disk space might really be in trouble.

[Some operating systems have a cp -z option to solve this problem. -TC]
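Here is a rough Python sketch of both ideas: making a sparse file by seeking past the end before writing, and copying one without filling in the holes. It is not how cp or any particular system utility works. The names make_sparse and sparse_copy and the 4096-byte chunk size are assumptions made for illustration, and whether any disk space is actually saved still depends on the filesystem.

```python
import os
import tempfile

BLOCK = 4096  # assumed chunk size for this sketch; real block sizes vary


def make_sparse(path, size):
    """Create a file of `size` bytes (size >= 1), writing only its last byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)  # jump past the end; nothing is written for the gap
        f.write(b"\0")    # one real write is what fixes the file's length


def sparse_copy(src, dst):
    """Copy src to dst, seeking over all-NUL chunks instead of writing them."""
    zeros = bytes(BLOCK)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(BLOCK)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                # Skip ahead in the output; the filesystem may leave a hole.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in a hole, extend it to the final position.
        fout.truncate()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        copy = os.path.join(d, "copy")
        make_sparse(original, 10_000_000)
        sparse_copy(original, copy)
        for p in (original, copy):
            st = os.stat(p)
            # Apparent size, then allocated bytes (assuming 512-byte units).
            print(os.path.basename(p), st.st_size, st.st_blocks * 512)
```

The copy loop has to read every NUL from the source, so it is no faster than a plain copy; the only saving is in the space the destination occupies.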