home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Running Linux, 4th Ed.Running Linux, 4th Ed.Search this book

3.3. Running into Trouble

Almost everyone runs into some kind of snag or hang-up when attempting to install Linux the first time. Most of the time, the problem is caused by a simple misunderstanding. Sometimes, however, it can be something more serious, such as an oversight by one of the developers or a bug.

This section will describe some of the most common installation problems and how to solve them. It also describes unexpected error messages that can pop up during installations that appear to be successful.

In general, the proper boot sequence is:

  1. After booting from the LILO prompt, the system must load the kernel image from floppy. This may take several seconds; you know things are going well if the floppy drive light is still on.

  2. While the kernel boots, SCSI devices must be probed for. If you have no SCSI devices installed, the system will "hang" for up to 15 seconds while the SCSI probe continues; this usually occurs after the line:

    lp_init: lp1 exists (0), using polling driver

    appears on your screen.

  3. After the kernel is finished booting, control is transferred to the system bootup files on the floppy. Finally, you will be presented with a login prompt, or be dropped into an installation program. If you are presented with a login prompt such as:

    Linux login:

    you should then log in (usually as root or install — this varies with each distribution). After you enter the username, the system may pause for 20 seconds or more while the installation program or shell is being loaded from floppy. Again, the floppy drive light should be on. Don't assume the system is hung.

3.3.1. Problems with Booting the Installation Medium

When attempting to boot the installation medium for the first time, you may encounter a number of problems. Note that the following problems are not related to booting your newly installed Linux system. See Section 3.3.4 for information on these kinds of pitfalls.

  • Floppy or medium error occurs when attempting to boot. The most popular cause for this kind of problem is a corrupt boot floppy. Either the floppy is physically damaged, in which case you should re-create the disk with a brand-new floppy, or the data on the floppy is bad, in which case you should verify that you downloaded and transferred the data to the floppy correctly. In many cases, simply re-creating the boot floppy will solve your problems. Retrace your steps and try again.

    If you received your boot floppy from a mail-order vendor or some other distributor, instead of downloading and creating it yourself, contact the distributor and ask for a new boot floppy — but only after verifying that this is indeed the problem. This can, of course, be difficult, but if you get funny noises from your floppy drive or messages like cannot read sector or similar, chances are that your medium is damaged.

  • System "hangs" during boot or after booting. After the installation medium boots, you see a number of messages from the kernel itself, indicating which devices were detected and configured. After this, you are usually presented with a login prompt, allowing you to proceed with installation (some distributions instead drop you right into an installation program of some kind). The system may appear to "hang" during several of these steps. Be patient; loading software from floppy is very slow. In many cases, the system has not hung at all, but is merely taking a long time. Verify that there is no drive or system activity for at least several minutes before assuming that the system is hung.

    Each activity listed at the beginning of this section may cause a delay that makes you think the system has stopped. However, it is possible that the system actually may "hang" while booting, which can be due to several causes. First of all, you may not have enough available RAM to boot the installation medium. (See the following item for information on disabling the ramdisk to free up memory.)

    Hardware incompatibility causes many system hangs. Even if your hardware is supported, you may run into problems with incompatible hardware configurations that are causing the system to hang. See Section 3.3.2, for a discussion of hardware incompatibilities. Section 10.2 in Chapter 10 lists the currently supported video chipsets, which are a major issue in running graphics on Linux.

  • System reports out-of-memory errors while attempting to boot or install the software. This problem relates to the amount of RAM you have available. Keep in mind that Linux itself requires at least 4 MB of RAM to run at all; almost all current distributions of Linux require 8 MB or more. On systems with 8 MB of RAM or less, you may run into trouble booting the installation medium or installing the software itself. This is because many distributions use a ramdisk, which is a filesystem loaded directly into RAM, for operations while using the installation medium. The entire image of the installation boot floppy, for example, may be loaded into a ramdisk, which may require more than 1 MB of RAM.

    The solution to this problem is to disable the ramdisk option when booting the install medium. Each distribution has a different procedure for doing this. Please see your distribution documentation for more information.

    You may not see an out-of-memory error when attempting to boot or install the software; instead, the system may unexpectedly hang or fail to boot. If your system hangs, and none of the explanations in the previous section seems to be the cause, try disabling the ramdisk.

  • The system reports an error, such as "Permission denied" or "File not found," while booting. This is an indication that your installation boot medium is corrupt. If you attempt to boot from the installation medium (and you're sure you're doing everything correctly), you should not see any such errors. Contact the distributor of your Linux software and find out about the problem, and perhaps obtain another copy of the boot medium if necessary. If you downloaded the boot disk yourself, try re-creating the boot disk, and see if this solves your problem.

  • The system reports the error "VFS: Unable to mount root" when booting. This error message means that the root filesystem (found on the boot medium itself) could not be found. This means that either your boot medium is corrupt or you are not booting the system correctly.

    For example, many CD-ROM distributions require you to have the CD-ROM in the drive when booting. Also be sure that the CD-ROM drive is on, and check for any activity. It's also possible the system is not locating your CD-ROM drive at boot time; see Section 3.3.2, for more information.

    If you're sure you are booting the system correctly, your boot medium may indeed be corrupt. This is an uncommon problem, so try other solutions before attempting to use another boot floppy or tape. One handy feature here is RedHat's new mediacheck option on the CD-ROM. This will check if the CD is OK.

3.3.2. Hardware Problems

The most common problem encountered when attempting to install or use Linux is an incompatibility with hardware. Even if all your hardware is supported by Linux, a misconfiguration or hardware conflict can sometimes cause strange results: your devices may not be detected at boot time, or the system may hang.

It is important to isolate these hardware problems if you suspect they may be the source of your trouble. In the following sections, we describe some common hardware problems and how to resolve them.

3.3.2.1. Isolating hardware problems

If you experience a problem you believe is hardware-related, the first thing to do is attempt to isolate the problem. This means eliminating all possible variables and (usually) taking the system apart, piece by piece, until the offending piece of hardware is isolated.

This is not as frightening as it may sound. Basically, you should remove all nonessential hardware from your system (after turning the power off), and then determine which device is actually causing the trouble — possibly by reinserting each device, one at a time. This means you should remove all hardware other than the floppy and video controllers, and, of course, the keyboard. Even innocent-looking devices, such as mouse controllers, can wreak unknown havoc on your peace of mind unless you consider them nonessential. So, to be sure, really remove everything that you don't absolutely need for booting when experimenting, and add the devices one by one later when reassembling the system.

For example, let's say the system hangs during the Ethernet board detection sequence at boot time. You might hypothesize that there is a conflict or problem with the Ethernet board in your machine. The quick and easy way to find out is to pull the Ethernet board and try booting again. If everything goes well when you reboot, you know that either the Ethernet board is not supported by Linux, or there is an address or IRQ conflict with the board. In addition, some badly designed network boards (mostly ISA-based NE2000 clones, which are luckily dying out by now) can hang the entire system when they are auto-probed. If this appears to be the case for you, your best bet is to remove the network board from the system during the installation and put it back in later, or pass the appropriate kernel parameters during boot-up so that auto-probing of the network board can be avoided. The most permanent fix is to dump that card and get a new one from another vendor that designs its hardware more carefully.

What does "Address or IRQ conflict?" mean, you may ask. All devices in your machine use an interrupt request line, or IRQ, to tell the system they need something done on their behalf. You can think of the IRQ as a cord the device tugs when it needs the system to take care of some pending request. If more than one device is tugging on the same cord, the kernel won't be able to determine which device it needs to service. Instant mayhem.

Therefore, be sure all your installed non-PCI devices are using unique IRQ lines. In general, the IRQ for a device can be set by jumpers on the card; see the documentation for the particular device for details. Some devices do not require an IRQ at all, but it is suggested you configure them to use one if possible (the Seagate ST01 and ST02 SCSI controllers are good examples). The PCI bus is more cleverly designed, and PCI devices can and do quite happily share interrupt lines.

In some cases, the kernel provided on your installation medium is configured to use a certain IRQ for certain devices. For example, on some distributions of Linux, the kernel is preconfigured to use IRQ 5 for the TMC-950 SCSI controller, the Mitsumi CD-ROM controller, and the busmouse driver. If you want to use two or more of these devices, you'll need first to install Linux with only one of these devices enabled, then recompile the kernel in order to change the default IRQ for one of them. (See Section 7.4 in Chapter 7 for information on recompiling the kernel.)

Another area where hardware conflicts can arise is with DMA channels, I/O addresses, and shared memory addresses. All these terms describe mechanisms through which the system interfaces with hardware devices. Some Ethernet boards, for example, use a shared memory address as well as an IRQ to interface with the system. If any of these are in conflict with other devices, the system may behave unexpectedly. You should be able to change the DMA channel, I/O, or shared memory addresses for your various devices with jumper settings. (Unfortunately, some devices don't allow you to change these settings.)

The documentation for your various hardware devices should specify the IRQ, DMA channel, I/O address, or shared memory address the devices use, and how to configure them. Of course, a problem here is that some of these settings are not known before the system is assembled and may thus be undocumented. Again, the simple way to get around these problems is to temporarily disable the conflicting devices until you have time to determine the cause of the problem.

Table 3-2 is a list of IRQ and DMA channels used by various "standard" devices found on most systems. Almost all systems have some of these devices, so you should avoid setting the IRQ or DMA of other devices to these values.

Table 3-2. Common device settings

Device

I/O address

IRQ

DMA

ttyS0 (COM1)

3f8

4

n/a

ttyS1 (COM2)

2f8

3

n/a

ttyS2 (COM3)

3e8

4

n/a

ttyS3 (COM4)

2e8

3

n/a

lp0 (LPT1)

378 - 37f

7

n/a

lp1 (LPT2)

278 - 27f

5

n/a

fd0, fd1 (floppies 1 and 2)

3f0 - 3f7

6

2

fd2, fd3 (floppies 3 and 4)

370 - 377

10

3

3.3.2.2. Problems recognizing hard drive or controller

When Linux boots, you see a series of messages on your screen, such as the following:

Console: colour VGA+ 80x25 
Floppy drive(s): fd0 is 1.44M 
ttyS00 at 0x03f8 (irq = 4) is a 16550A
...

Here, the kernel is detecting the various hardware devices present on your system. At some point, you should see the line:

Partition check:

followed by a list of recognized partitions, for example:

Partition check: 
  hda: hda1 hda2 
  hdb: hdb1 hdb2 hdb3

If, for some reason, your drives or partitions are not recognized, you will not be able to access them in any way.

Several conditions can cause this to happen:

Hard drive or controller not supported
If you are using a hard drive or controller (IDE, SCSI, or otherwise) not supported by Linux, the kernel will not recognize your partitions at boot time.

Drive or controller improperly configured
Even if your controller is supported by Linux, it may not be configured correctly. (This is a problem particularly for SCSI controllers; most non-SCSI controllers should work fine without additional configuration.)

Refer to the documentation for your hard drive and controller for information on solving these kinds of problems. In particular, many hard drives will need to have a jumper set if they are to be used as a "slave" drive (e.g., as the second hard drive). The acid test for this kind of condition is to boot up Windows or some other operating system known to work with your drive and controller. If you can access the drive and controller from another operating system, the problem is not with your hardware configuration.

See the previous section, Section 3.3.2.1, for information on resolving possible device conflicts and the following section, Section 3.3.2.3, for information on configuring SCSI devices.

Controller properly configured, but not detected
Some BIOS-less SCSI controllers require the user to specify information about the controller at boot time. The following section, Section 3.3.2.3, describes how to force hardware detection for these controllers.

Hard-drive geometry not recognized
Some older systems, such as the IBM PS/ValuePoint, do not store hard-drive geometry information in the CMOS memory where Linux expects to find it. Also, certain SCSI controllers need to be told where to find drive geometry in order for Linux to recognize the layout of your drive.

Most distributions provide a boot option to specify the drive geometry. In general, when booting the installation medium, you can specify the drive geometry at the LILO boot prompt with a command such as:

boot: linux hd=cylinders,heads,sectors

where cylinders, heads, and sectors correspond to the number of cylinders, heads, and sectors per track for your hard drive.

After installing the Linux software, you can install LILO, allowing you to boot from the hard drive. At that time, you can specify the drive geometry to the LILO installation procedure, making it unnecessary to enter the drive geometry each time you boot. See Section 5.2.2 in Chapter 5 for more about LILO.

3.3.2.3. Problems with SCSI controllers and devices

Presented here are some of the most common problems with SCSI controllers and devices, such as CD-ROMs, hard drives, and tape drives. If you are having problems getting Linux to recognize your drive or controller, read on. Let us again emphasize that most distributions use a modularized kernel and that you might have to load a module supporting your hardware during an early phase of the installation process. This might also be done automatically for you.

The Linux SCSI HOWTO contains much useful information on SCSI devices in addition to that listed here. SCSIs can be particularly tricky to configure at times.

It might be a false economy, for example, to use cheap cables, especially if you use wide SCSI. Cheap cables are a major source of problems and can cause all kinds of failures, as well as major headaches. If you use SCSI, use proper cabling.

Here are common problems and possible solutions:

If your SCSI controller is not recognized, you may need to force hardware detection at boot time. This is particularly important for SCSI controllers without BIOS. Most distributions allow you to specify the controller IRQ and shared memory address when booting the installation medium. For example, if you are using a TMC-8xx controller, you may be able to enter:

boot: linux tmx8xx=interrupt,memory-address

at the LILO boot prompt, where interrupt is the controller IRQ, and memory-address is the shared memory address. Whether you can do this depends on the distribution of Linux you are using; consult your documentation for details.

3.3.3. Problems Installing the Software

Installing the Linux software should be trouble-free if you're lucky. The only problems you might experience would be related to corrupt installation media or lack of space on your Linux filesystems. Here is a list of common problems:

If you have other strange errors when installing Linux (especially if you downloaded the software yourself), be sure you actually obtained all the necessary files when downloading.

For example, some people use the FTP command:

mget *.*

when downloading the Linux software via FTP. This will download only those files that contain a "." in their filenames; files without the "." will not be downloaded. The correct command to use in this case is:

mget *

The best advice is to retrace your steps when something goes wrong. You may think that you have done everything correctly, when in fact you forgot a small but important step somewhere along the way. In many cases, just attempting to redownload or reinstall the Linux software can solve the problem. Don't beat your head against the wall any longer than you have to!

Also, if Linux unexpectedly hangs during installation, there may be a hardware problem of some kind. See Section 3.3.2 for hints.

3.3.4. Problems after Installing Linux

You've spent an entire afternoon installing Linux. In order to make space for it, you wiped your Windows and OS/2 partitions and tearfully deleted your copies of SimCity 2000 and Railroad Tycoon 2. You reboot the system and nothing happens. Or, even worse, something happens, but it's not what should happen. What do you do?

In Section 3.3.1, earlier in this chapter, we covered the most common problems that can occur when booting the Linux installation medium; many of those problems may apply here. In addition, you may be victim to one of the following maladies.

3.3.4.2. Problems booting Linux from the hard drive

If you opted to install LILO instead of creating a boot floppy, you should be able to boot Linux from the hard drive. However, the automated LILO installation procedure used by many distributions is not always perfect. It may make incorrect assumptions about your partition layout, in which case you need to reinstall LILO to get everything right. Installing LILO is covered in Section 5.2.2 in Chapter 5.

Here are some common problems:

  • System reports "Drive not bootable-Please insert system disk". You will get this error message if the hard drive's master boot record is corrupt in some way. In most cases, it's harmless, and everything else on your drive is still intact. There are several ways around this:

    • While partitioning your drive using fdisk, you may have deleted the partition that was marked as "active." Windows and other operating systems attempt to boot the "active" partition at boot time (Linux, in general, pays no attention to whether the partition is "active," but the Master Boot Records installed by some distributions like Debian do). You may be able to boot MS-DOS from floppy and run fdisk to set the active flag on your MS-DOS partition, and all will be well.

      Another command to try (with MS-DOS 5.0 and higher, including all Windows versions since Windows 95) is:

      FDISK /MBR

      This command will attempt to rebuild the hard drive master boot record for booting Windows, overwriting LILO. If you no longer have Windows on your hard drive, you'll need to boot Linux from floppy and attempt to install LILO later.

    • If you created a Windows partition using Linux's version of fdisk, or vice versa, you may get this error. You should create Windows partitions only by using Windows' version of fdisk. (The same applies to operating systems other than Windows.) The best solution here is either to start from scratch and repartition the drive correctly, or to merely delete and re-create the offending partitions using the correct version of fdisk.

    • The LILO installation procedure may have failed. In this case, you should boot either from your Linux boot floppy (if you have one), or from the original installation medium. Either of these should provide options for specifying the Linux root partition to use when booting. At boot time, hold down the Shift or Ctrl key and press Tab from the boot menu for a list of options.

  • When you boot the system from the hard drive, Windows (or another operating system) starts instead of Linux. First of all, be sure you actually installed LILO when installing the Linux software. If not, the system will still boot Windows (or whatever other operating system you may have) when you attempt to boot from the hard drive. In order to boot Linux from the hard drive, you need to install LILO (see the section Section 5.2.2 in Chapter 5).

    On the other hand, if you did install LILO, and another operating system boots instead of Linux, you have LILO configured to boot that other operating system by default. While the system is booting, hold down the Shift or Ctrl key and press Tab at the boot prompt. This should present you with a list of possible operating systems to boot; select the appropriate option (usually just linux) to boot Linux.

    If you wish to select Linux as the default operating system to boot, you will need to reinstall LILO.

    It also may be possible that you attempted to install LILO, but the installation procedure failed in some way. See the previous item on installation.

3.3.4.4. Problems using the system

If login is successful, you should be presented with a shell prompt (such as # or $) and can happily roam around your system. The next step in this case is to try the procedures in Chapter 4. However, some initial problems with using the system sometimes creep up.

The most common initial configuration problem is incorrect file or directory permissions. This can cause the error message:

Shell-init: permission denied

to be printed after logging in. (In fact, anytime you see the message permission denied, you can be fairly certain it is a problem with file permissions.)

In many cases, it's a simple matter of using the chmod command to fix the permissions of the appropriate files or directories. For example, some distributions of Linux once used the incorrect file mode 0644 for the root directory ( / ). The fix was to issue the command:

# chmod 755 /

as root. (File permissions are covered in the section Section 4.13 in Chapter 4.) However, in order to issue this command, you needed to boot from the installation medium and mount your Linux root filesystem by hand — a hairy task for most newcomers.

As you use the system, you may run into places where file and directory permissions are incorrect, or software does not work as configured. Welcome to the world of Linux! While most distributions are quite trouble-free, you can't expect them to be perfect. We don't want to cover all those problems here. Instead, throughout the book we help you to solve many of these configuration problems by teaching you how to find them and fix them yourself. In Chapter 1, we discussed this philosophy in some detail. In Chapter 5, we give hints for fixing many of these common configuration problems.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.