What to Do in an Emergency (Running Linux, 4th Edition)

8.6. What to Do in an Emergency

It's not difficult to make a simple mistake as root that can cause real problems on your system, such as not being able to log in or losing important files. This is especially true for novice system administrators who are beginning to explore the system. Nearly all new system admins learn their lessons the hard way, by being forced to recover from a real emergency. In this section, we'll give you some hints about what to do when the inevitable happens.

You should always be aware of preventive measures that reduce the impact of such emergencies. For example, take backups of all important system files, if not the entire system. If you happen to have a Linux distribution on CD-ROM, the CD-ROM itself acts as a wonderful backup for most files (as long as you have a way to access the CD-ROM in a tight situation — more on this later). Backups are vital to recovering from many problems; don't let the many weeks of hard work configuring your Linux system go to waste.

Also, be sure to keep notes on your system configuration, such as your partition table entries, partition sizes and types, and filesystems. If you were to trash your partition table somehow, fixing the problem might be a simple matter of rerunning fdisk, but this helps only as long as you can remember what your partition table used to look like. (True story: one of the authors once created this problem by booting a blank floppy, and had no record of the partition table contents. Needless to say, some guesswork was necessary to restore the partition table to its previous state!)

Of course, for any of these measures to work, you'll need a way to boot the system and access your files, or recover from backups, in an emergency. This is best accomplished with an "emergency disk," or "root disk." Such a disk contains a small root filesystem with the basics required to run a Linux system from floppy — just the essential commands and system files, as well as tools to repair problems. You use such a disk by booting a kernel from another floppy (see Section 5.2.1 in Chapter 5) and telling the kernel to use the emergency disk as the root filesystem.

Most distributions of Linux include such a boot/root floppy combination as the original installation floppies. The installation disks usually contain a small Linux system that can be used to install the software as well as perform basic system maintenance. Some systems include both the kernel and root filesystem on one floppy, but this severely limits the number of files that can be stored on the emergency disk. How useful these disks are as a maintenance tool depends on whether they contain the tools (such as fsck, fdisk, a small editor such as vi, and so on) necessary for problem recovery. Some distributions have such an elaborate installation process that the installation floppies don't have room for much else.

At any rate, you can create such a root floppy yourself. Being able to do this from scratch requires an intimate knowledge of what's required to boot and use a Linux system, and exactly what can be trimmed down and cut out. For example, you could dispose of the startup programs init, getty, and login, as long as you know how to rig things so that the kernel starts a shell on the console instead of using a real boot procedure. (One way to do this is to have /etc/init be a symbolic link to /sbin/bash, all on the floppy filesystem.)

While we can't cover all the details here, the first step in creating an emergency floppy is to use mkfs to create a filesystem on a floppy (see the section Section 6.1.4 in Chapter 6). You then mount the floppy and place on it whatever files you'll need, including appropriate entries in /dev (most of which you can copy from /dev on your hard-drive root filesystem). You'll also need a boot floppy, which merely contains a kernel. The kernel should have its root device set to /dev/fd0, using rdev. This is covered in Section 5.2.1 in Chapter 5. You'll also have to decide whether you want the root floppy filesystem loaded into a ramdisk (which you can set using rdev as well). If you have more than 4 MB of RAM, this is a good idea because it can free up the floppy drive to be used for, say, mounting another floppy containing additional tools. If you have two floppy drives, you can do this without using a ramdisk.

If you feel that setting up an emergency floppy is too hard for you now after reading all this, you might also want to try some of the scripts available that do it for you (e.g., tomsrtbt at http://www.toms.net/rb/). But whatever you do, be sure to try the emergency floppy before disaster happens!

At any rate, the best place to start is your installation floppies. If those floppies don't contain all the tools you need, create a filesystem on a separate floppy and place the missing programs on it. If you load the root filesystem from floppy into a ramdisk, or have a second floppy drive, you can mount the other floppy to access your maintenance tools.

What tools do you need? In the following sections, we'll talk about common emergencies and how to recover from them; this should guide you as to what programs are required for various situations. It is best if the tools you put on that floppy are statically linked in order to avoid problems with shared libraries not being available at emergency time.

8.6.1. Repairing Filesystems

As discussed in Section 6.1.5 in Chapter 6, you can use fsck to recover from several kinds of filesystem corruption. Most of these filesystem problems are relatively minor, however, and can be repaired by booting your system in the usual way and running fsck from the hard drive. However, it is usually better to check and repair your root filesystem while it is unmounted. In this case, it's easier to run fsck from an emergency floppy.

There are no differences between running fsck from floppy and from the hard drive; the syntax is exactly the same as described earlier in the chapter. However, remember that fsck is usually a frontend to tools such as fsck.ext2. On other systems, you'll need to use e2fsck (for Second Extended filesystems).

It is possible to corrupt a filesystem so that it cannot be mounted. This is usually the result of damage to the filesystem's superblock, which stores information about the filesystem as a whole. If the superblock is corrupted, the system won't be able to access the filesystem at all, and any attempt to mount it will fail (probably with an error to the effect of "can't read superblock").

Because of the importance of the superblock, the filesystem keeps backup copies of it at intervals on the filesystem. Second Extended filesystems are divided into "block groups," where each group has, by default, 8192 blocks. Therefore, there are backup copies of the superblock at block offsets 8193, 16385 (that's 8192 × 2 + 1), 24577, and so on. If you use the ext2 filesystem, check that the filesystem has 8192-block groups with the following command:

dumpe2fs device | more

(Of course, this works only when the master superblock is intact.) This command will print a great deal of information about the filesystem, and you should see something like:

Blocks per group:         8192

If another offset is given, use it for computing offsets to the superblock copies, as mentioned earlier.

If you can't mount a filesystem because of superblock problems, chances are that fsck (or e2fsck) will fail as well. You can tell e2fsck to use one of the superblock copies, instead, to repair the filesystem. The command is:

e2fsck -f -b offset device

where offset is the block offset to a superblock copy; usually, this is 8193. The -f switch is used to force a check of the filesystem; when using superblock backups, the filesystem may appear "clean," in which case no check is needed. -f overrides this. For example, to repair the filesystem on /dev/hda2 with a bad superblock, we can say:

e2fsck -f -b 8193 /dev/hda2

Superblock copies save the day. The previous commands can be executed from an emergency floppy system and will hopefully allow you to mount your filesystems again.

Recently, so-called journalling filesystems have been introduced in most Linux distributions. Examples of these are the ext3 filesystem, the Reiser filesystem, and the jfs filesystem. These are less prone to filesystem corruption because they keep a log (the "journal") of all changes made. Chances are that with these filesystems, you will never need to use any of the techniques described here.

8.6.2. Accessing Damaged Files

You might need to access the files on your hard-drive filesystems when booting from an emergency floppy. In order to do this, simply use the mount command as described in Section 6.1.2 in Chapter 6, mounting your filesystems under a directory such as /mnt. (This directory must exist on the root filesystem contained on the floppy.) For example:

mount -t ext2 /dev/hda2 /mnt

will allow us to access the files on the Second Extended filesystem stored on /dev/hda2 in the directory /mnt. You can then access the files directly and even execute programs from your hard-drive filesystems. For example, if you wish to execute vi from the hard drive, normally found in /usr/bin/vi, you would use the command:

/mnt/usr/bin/vi filename

You could even place subdirectories of /mnt on your path to make this easier.

Be sure to unmount your hard-drive filesystems before rebooting the system. If your emergency disks don't have the ability to do a clean shutdown, unmount your filesystems explicitly with umount, to be safe.

Two problems that can arise when doing this are forgetting the root password or trashing the contents of /etc/passwd. In either case, it might be impossible to log in to the system or su to root. To repair this problem, simply boot from your emergency disks, mount your root filesystem under /mnt, and edit /mnt/etc/passwd. (It might be a good idea to keep a backup copy of this file somewhere in case you delete it accidentally.) For example, to clear the root password altogether, change the entry for root to:

root::0:0:The root of all evil:/:/bin/bash

Now root will have no password; you can reboot the system from the hard drive and use the passwd command to reset it.

If you are conscientious about system security, you might have shivered by now. You have read correctly: if somebody has physical access to your system, he or she can change your root password by using a boot floppy. Luckily, there are ways to protect your system against possible assaults. Most effective are, of course, the physical ones: if your computer is locked away, nobody can access it and put a boot floppy into it. There are also locks for the floppy drive only, but notice that you need such a protection for the CD-ROM drive as well for floppy-drive locks to be useful. If you don't want to use physical protection, you can also use the BIOS password if your computer supports that: configure the BIOS so that it does not try to boot from CD-ROM or floppy (even if a CD or floppy disk is inserted at boot time) and protect the BIOS settings with a BIOS password. This is not as secure because it is possible to reset the BIOS password with hardware means, but it still protects you against casual would-be intruders. Actually, of course, somebody could steal the whole computer.

Another common problem is corrupt links to shared system libraries. The shared library images in /lib are generally accessed through symbolic links, such as /lib/libc.so.5, which point to the actual library, /lib/libc.so.version. If this link is removed or is pointing to the wrong place, many commands on the system won't run. You can fix this problem by mounting your hard-drive filesystems and relinking the library with a command, such as:

cd /mnt/lib; ln -sf libc.so.5.4.47 libc.so.5

to force the libc.so.5 link to point to libc.so.5.4.47. Remember that symbolic links use the pathname given on the ln command line. For this reason, the command:

ln -sf /mnt/lib/libc.so.5.4.47 /mnt/lib/libc.so.5

won't do the right thing; libc.so.5 will point to /mnt/lib/libc.so.5.4.47. When you boot from the hard drive, /mnt/lib can't be accessed, and the library won't be located. The first command works because the symbolic link points to a file in the same directory.

8.6.3. Restoring Files from Backup

If you have deleted important system files, it might be necessary to restore backups while booting from an emergency disk. For this reason, it's important to be sure your emergency disk has the tools you need to restore backups; this includes programs such as tar and gzip, as well as the drivers necessary to access the backup device. For instance, if your backups are made using the floppy tape device driver, be sure that the ftape module and insmod command are available on your emergency disk. See Section 7.5 in Chapter 7 for more about this.

All that's required to restore backups to your hard-drive filesystems is to mount those filesystems, as described earlier, and unpack the contents of the archives over those filesystems (using the appropriate tar and gzip commands, for example; see Section 8.1 earlier in this chapter). Remember that every time you restore a backup you will be overwriting other system files; be sure you're doing everything correctly so that you don't make the situation worse. With most archiving programs, you can extract individual files from the archive.

Likewise, if you want to use your original CD-ROM to restore files, be sure the kernel used on your emergency disks has the drivers necessary to access the CD-ROM drive. You can then mount the CD-ROM (remember the mount flags -r -t iso9660) and copy files from there.

The filesystems on your emergency disks should also contain important system files; if you have deleted one of these from your system, it's easy to copy the lost file from the emergency disk to your hard-drive filesystem.