Running Linux

3.3. Running Into Trouble

Almost everyone runs into some kind of snag or hangup when attempting to install Linux the first time. Most of the time, the problem is caused by a simple misunderstanding. Sometimes, however, it can be something more serious, such as an oversight by one of the developers or a bug.

This section will describe some of the most common installation problems and how to solve them. If your installation appears to be successful, but you received unexpected error messages during the installation, these are described here as well.

3.3.1. Problems with Booting the Installation Media

When attempting to boot the installation media for the first time, you may encounter a number of problems. Note that the following problems are not related to booting your newly installed Linux system. See Section 3.3.4, "Problems After Installing Linux," for information on these kinds of pitfalls.

Floppy or media error occurs when attempting to boot.

The most common cause of this kind of problem is a corrupt boot floppy. Either the floppy is physically damaged, in which case you should recreate the disk with a brand-new floppy, or the data on the floppy is bad, in which case you should verify that you downloaded and transferred the data to the floppy correctly. In many cases, simply recreating the boot floppy will solve your problems. Retrace your steps and try again.
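To verify the transfer, compare a checksum of the image you downloaded with a checksum of the same number of bytes read back from the floppy. The following is a minimal Python sketch of that comparison, not a distribution tool; it assumes the floppy can be read as a raw device (on Linux, typically /dev/fd0), and the file names in the usage note are hypothetical.

```python
import hashlib
import os
from typing import Optional

CHUNK = 64 * 1024

def md5_of(path: str, limit: Optional[int] = None) -> str:
    """MD5 hex digest of a file or raw device, reading at most `limit` bytes."""
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(size)
            if not block:  # end of the file or device
                break
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.hexdigest()

def floppy_matches_image(image_path: str, device_path: str) -> bool:
    """True if the disk starts with exactly the bytes of the image.

    Only the first len(image) bytes of the disk are hashed, because the
    disk may be larger than the image and the rest of it is irrelevant.
    """
    image_size = os.path.getsize(image_path)
    return md5_of(image_path) == md5_of(device_path, limit=image_size)
```

For example, floppy_matches_image("boot.img", "/dev/fd0") returns False if the bytes written to the disk differ from the image. If your distributor publishes a checksum for each image, you can also compare md5_of("boot.img") against it to rule out a bad download.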

If you received your boot floppy from a mail-order vendor or some other distributor, instead of downloading and creating it yourself, contact the distributor and ask for a new boot floppy--but only after verifying that this is indeed the problem.

System "hangs" during boot or after booting.

After the installation media boots, you see a number of messages from the kernel itself, indicating which devices were detected and configured. After this, you are usually presented with a login prompt, allowing you to proceed with installation (some distributions instead drop you right into an installation program of some kind). The system may appear to "hang" during several of these steps. Be patient; loading software from floppy is very slow. In many cases, the system has not hung at all, but is merely taking a long time. Verify that there has been no drive or system activity for at least several minutes before assuming that the system is hung.

The proper boot sequence is:

  1. After booting from the LILO prompt, the system must load the kernel image from floppy. This may take several seconds; you know things are going well if the floppy drive light is still on.

  2. While the kernel boots, SCSI devices must be probed for. If you do not have any SCSI devices installed, the system will "hang" for up to 15 seconds while the SCSI probe continues; this usually occurs after the line:

    lp_init: lp1 exists (0), using polling driver
    appears on your screen.

  3. After the kernel is finished booting, control is transferred to the system bootup files on the floppy. Finally, you will be presented with a login prompt, or be dropped into an installation program. If you are presented with a login prompt such as:

    Linux login:
    you should then log in (usually as root or install--this varies with each distribution). After entering the username, the system may pause for 20 seconds or more while the installation program or shell is being loaded from floppy. Again, the floppy drive light should be on. Don't assume the system is hung.

Each of the preceding activities may cause a delay that makes you think the system has stopped. However, it is possible that the system actually may "hang" while booting, which can be due to several causes. First of all, you may not have enough available RAM to boot the installation media. (See the following item for information on disabling the ramdisk to free up memory.)

Hardware incompatibility causes many system hangs. Section 1.9, "Hardware Requirements," in Chapter 1, "Introduction to Linux," presents an overview of supported hardware under Linux. Even if your hardware is supported, you may run into problems with incompatible hardware configurations that cause the system to hang. See the next section, Section 3.3.2, "Hardware Problems," for a discussion of hardware incompatibilities.

System reports out-of-memory errors while attempting to boot or install the software.

This problem relates to the amount of RAM you have available. On systems with 4 MB of RAM or less, you may run into trouble booting the installation media or installing the software itself. This is because many distributions use a "ramdisk," which is a filesystem loaded directly into RAM, for operations while using the installation media. The entire image of the installation boot floppy, for example, may be loaded into a ramdisk, which may require more than 1 MB of RAM.

The solution to this problem is to disable the ramdisk option when booting the install media. Each release has a different procedure for doing this; on the SLS release, for example, you type floppy at the LILO prompt when booting the a1 disk. See your distribution's documentation for details.

You may not see an "out of memory" error when attempting to boot or install the software; instead, the system may unexpectedly hang or fail to boot. If your system hangs, and none of the explanations in the previous section seem to be the cause, try disabling the ramdisk.

Keep in mind that Linux itself requires at least 4 MB of RAM to run at all; almost all current distributions of Linux require 8 MB or more.

The system reports an error, such as "Permission denied" or "File not found" while booting.

This is an indication that your installation boot media is corrupt. If you attempt to boot from the installation media (and you're sure you're doing everything correctly), you should not see any errors such as this. Contact the distributor of your Linux software and find out about the problem, and perhaps obtain another copy of the boot media if necessary. If you downloaded the boot disk yourself, try recreating the boot disk, and see if this solves your problem.

The system reports the error "VFS: Unable to mount root" when booting.

This error message means that the root filesystem (found on the boot media itself) could not be mounted. Either your boot media is corrupt, or you are not booting the system correctly.

For example, many CD-ROM distributions require you to have the CD-ROM in the drive when booting. Also be sure that the CD-ROM drive is on, and check for any activity. It's also possible the system is not locating your CD-ROM drive at boot time; see the next section, Section 3.3.2, "Hardware Problems," for more information.

If you're sure you are booting the system correctly, then your boot media may indeed be corrupt. This is an uncommon problem, so try other solutions before attempting to use another boot floppy or tape.

3.3.2. Hardware Problems

The most common problem encountered when attempting to install or use Linux is an incompatibility with hardware. Even if all your hardware is supported by Linux, a misconfiguration or hardware conflict can sometimes cause strange results: your devices may not be detected at boot time, or the system may hang.

It is important to isolate these hardware problems if you suspect they may be the source of your trouble. In the following sections, we describe some common hardware problems and how to resolve them.

3.3.2.1. Isolating hardware problems

If you experience a problem you believe is hardware-related, the first thing to do is attempt to isolate the problem. This means eliminating all possible variables and (usually) taking the system apart, piece-by-piece, until the offending piece of hardware is isolated.

This is not as frightening as it may sound. Basically, you should remove all nonessential hardware from your system (after turning the power off), and then determine which device is actually causing the trouble--possibly by reinserting each device, one at a time. This means you should remove all hardware other than the floppy and video controllers, and, of course, the keyboard. Even innocent-looking devices, such as mouse controllers, can wreak unknown havoc on your peace of mind unless you consider them nonessential.
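The reinsert-one-at-a-time procedure is essentially a linear search. The Python sketch below models it under stated assumptions: boots_ok is a hypothetical stand-in for the manual step of powering off, reinserting a card, and rebooting, and the device names are made up for illustration.

```python
from typing import Callable, List, Optional, Sequence

def find_culprit(
    essentials: Sequence[str],
    nonessential: Sequence[str],
    boots_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Reinsert nonessential devices one at a time and return the first one
    whose addition makes the boot fail, or None if every step boots."""
    installed = list(essentials)
    if not boots_ok(installed):
        raise RuntimeError("the minimal configuration already fails to boot")
    for device in nonessential:
        installed.append(device)
        if not boots_ok(list(installed)):
            return device
    return None

# Hypothetical stand-in for "power off, insert the card, reboot, watch the
# console": here the boot hangs whenever the Ethernet and sound cards are
# both present (for example, because they share an IRQ).
def simulated_boot(devices: List[str]) -> bool:
    return not {"ethernet", "sound card"} <= set(devices)

print(find_culprit(
    ["floppy controller", "video", "keyboard"],
    ["mouse", "sound card", "ethernet", "scsi controller"],
    simulated_boot,
))  # ethernet
```

Because devices are added cumulatively, the device reported is the one that triggers the failure. If that card works when installed alone, suspect a conflict with one of the devices added before it, as in the simulated IRQ clash.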

For example, let's say the system hangs during the Ethernet board detection sequence at boot time. You might hypothesize that there is a conflict or problem with the Ethernet board in your machine. The quick and easy way to find out is to pull the Ethernet board and try booting again. If everything goes well when you reboot, then you know that either the Ethernet board is not supported by Linux (see Section 1.9, "Hardware Requirements," in Chapter 1, "Introduction to Linux," for a list of compatible boards), or there is an address or IRQ conflict with the board. In addition, some badly designed network boards (mostly NE2000 clones) can hang the entire system when they are auto-probed. If this appears to be the case for you, your best bet is to remove the network board from the system during the installation and put it back in later, or to pass the appropriate kernel parameters at boot time so that auto-probing of the network board is avoided. The most permanent fix is to dump that card and get a new one from a vendor that designs its hardware more carefully.

"Address or IRQ conflict?" What on earth does that mean? All devices in your machine use an interrupt request line, or IRQ, to tell the system they need something done on their behalf. You can think of the IRQ as a cord the device tugs when it needs the system to take care of some pending request. If more than one device is tugging on the same cord, the kernel won't be able to determine which device it needs to service. Instant mayhem.

Therefore, be sure all your installed devices are using unique IRQ lines. In general, the IRQ for a device can be set by jumpers on the card; see the documentation for the particular device for details. Some devices do not require an IRQ at all, but it is suggested you configure them to use one if possible (the Seagate ST01 and ST02 SCSI controllers being good examples).
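Checking for IRQ conflicts amounts to grouping devices by IRQ and looking for any line claimed more than once. A minimal Python sketch, assuming you have written down each card's jumper setting; the device names and IRQ values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def irq_conflicts(assignments: Dict[str, int]) -> Dict[int, List[str]]:
    """Group devices by IRQ line and keep only lines claimed by more than one device."""
    by_irq: Dict[int, List[str]] = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: sorted(devs) for irq, devs in by_irq.items() if len(devs) > 1}

# Hypothetical jumper settings for one machine
config = {
    "ethernet": 5,
    "sound card": 5,
    "scsi controller": 11,
    "ttyS0 (COM1)": 4,
}
print(irq_conflicts(config))  # {5: ['ethernet', 'sound card']}
```

An empty result means every listed device has its own IRQ line; any entry names the devices that need to be rejumpered.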

In some cases, the kernel provided on your installation media is configured to use a certain IRQ for certain devices. For example, on some distributions of Linux, the kernel is preconfigured to use IRQ 5 for the TMC-950 SCSI controller, the Mitsumi CD-ROM controller, and the busmouse driver. If you want to use two or more of these devices, you'll need first to install Linux with only one of these devices enabled, then recompile the kernel in order to change the default IRQ for one of them. (See Section 7.4, "Building a New Kernel," in Chapter 7, "Upgrading Software and the Kernel," for information on recompiling the kernel.)

Another area where hardware conflicts can arise is with direct memory access (DMA) channels, I/O addresses, and shared memory addresses. All these terms describe mechanisms through which the system interfaces with hardware devices. Some Ethernet boards, for example, use a shared memory address as well as an IRQ to interface with the system. If any of these are in conflict with other devices, the system may behave unexpectedly. You should be able to change the DMA channel, I/O, or shared memory addresses for your various devices with jumper settings. (Unfortunately, some devices don't allow you to change these settings.)

The documentation for your various hardware devices should specify the IRQ, DMA channel, I/O address, or shared memory address the devices use, and how to configure them. Again, the simple way to get around these problems is to temporarily disable the conflicting devices until you have time to determine the cause of the problem.

Table 3-1 is a list of IRQ and DMA channels used by various "standard" devices found on most systems. Almost all systems have some of these devices, so you should avoid setting the IRQ or DMA of other devices to these values.

Table 3-1. Common Device Settings

Device                        I/O address (hex)   IRQ   DMA
ttyS0 (COM1)                  3f8                 4     n/a
ttyS1 (COM2)                  2f8                 3     n/a
ttyS2 (COM3)                  3e8                 4     n/a
ttyS3 (COM4)                  2e8                 3     n/a
lp0 (LPT1)                    378-37f             7     n/a
lp1 (LPT2)                    278-27f             5     n/a
fd0, fd1 (floppies 1 and 2)   3f0-3f7             6     2
fd2, fd3 (floppies 3 and 4)   370-377             10    3
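Table 3-1 can be turned into a quick lookup to run before you set a card's jumpers. This Python sketch encodes the table as data and reports which standard devices already use a given IRQ or DMA channel; it covers only the devices listed in the table, not everything that might be installed in a particular machine.

```python
from typing import List, Optional, Tuple

# Table 3-1 as data: (device, I/O address, IRQ, DMA); None means "n/a".
STANDARD_DEVICES: List[Tuple[str, str, int, Optional[int]]] = [
    ("ttyS0 (COM1)", "3f8", 4, None),
    ("ttyS1 (COM2)", "2f8", 3, None),
    ("ttyS2 (COM3)", "3e8", 4, None),
    ("ttyS3 (COM4)", "2e8", 3, None),
    ("lp0 (LPT1)", "378-37f", 7, None),
    ("lp1 (LPT2)", "278-27f", 5, None),
    ("fd0, fd1 (floppies 1 and 2)", "3f0-3f7", 6, 2),
    ("fd2, fd3 (floppies 3 and 4)", "370-377", 10, 3),
]

def clashes(irq: Optional[int] = None, dma: Optional[int] = None) -> List[str]:
    """Return the standard devices from Table 3-1 that already use this IRQ or DMA channel."""
    hits = []
    for name, _io, dev_irq, dev_dma in STANDARD_DEVICES:
        if (irq is not None and irq == dev_irq) or (dma is not None and dma == dev_dma):
            hits.append(name)
    return hits

print(clashes(irq=5))         # ['lp1 (LPT2)']
print(clashes(irq=4))         # ['ttyS0 (COM1)', 'ttyS2 (COM3)']
print(clashes(irq=9, dma=3))  # ['fd2, fd3 (floppies 3 and 4)']
```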

3.3.2.2. Problems recognizing hard drive or controller

When Linux boots, you see a series of messages on your screen such as the following:

    Console: colour EGA+ 80x25, 8 virtual consoles
    Serial driver version 3.96 with no serial options enabled
    tty00 at 0x03f8 (irq = 4) is a 16450
    tty03 at 0x02e8 (irq = 3) is a 16550A
    lp_init: lp1 exists (0), using polling driver
    …

Here, the kernel is detecting the various hardware devices present on your system. At some point, you should see the line:

    Partition check:

followed by a list of recognized partitions, for example:

    Partition check:
      hda: hda1 hda2
      hdb: hdb1 hdb2 hdb3

If, for some reason, your drives or partitions are not recognized, you will not be able to access them in any way.

There are several conditions that can cause this to happen:

Hard drive or controller not supported

If you are using a hard drive controller (IDE, SCSI, or otherwise) not supported by Linux, the kernel will not recognize your partitions at boot time.

Drive or controller improperly configured

Even if your controller is supported by Linux, it may not be configured correctly. (This is a problem particularly for SCSI controllers; most non-SCSI controllers should work fine without additional configuration.)

Refer to the documentation for your hard drive and controller for information on solving these kinds of problems. In particular, many hard drives will need to have a jumper set if they are to be used as a "slave" drive (e.g., as the second hard drive). The acid test for this kind of condition is to boot up MS-DOS or some other operating system known to work with your drive and controller. If you can access the drive and controller from another operating system, then it is not a problem with your hardware configuration.

See the previous section, Section 3.3.2.1, "Isolating hardware problems," for information on resolving possible device conflicts, and the following section, Section 3.3.2.3, "Problems with SCSI controllers and devices," for information on configuring SCSI devices.

Controller properly configured, but not detected

Some BIOS-less SCSI controllers require the user to specify information about the controller at boot time. The following section, Section 3.3.2.3, "Problems with SCSI controllers and devices," describes how to force hardware detection for these controllers.

Hard drive geometry not recognized

Some systems, such as the IBM PS/ValuePoint, do not store hard-drive geometry information in the CMOS memory where Linux expects to find it. Also, certain SCSI controllers need to be told where to find drive geometry in order for Linux to recognize the layout of your drive.

Most distributions provide a boot option to specify the drive geometry. In general, when booting the installation media, you can specify the drive geometry at the LILO boot prompt with a command such as:

    boot: linux hd=cylinders,heads,sectors

where cylinders, heads, and sectors correspond to the number of cylinders, heads, and sectors per track for your hard drive.
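Before typing the hd= option, it is worth checking that the geometry multiplies out to the drive's advertised capacity: capacity in bytes = cylinders × heads × sectors per track × bytes per sector. A minimal Python sketch, assuming 512-byte sectors; the geometry used is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors

def chs_capacity(cylinders: int, heads: int, sectors: int) -> int:
    """Bytes addressable through a cylinders/heads/sectors-per-track geometry."""
    return cylinders * heads * sectors * SECTOR_BYTES

def hd_parameter(cylinders: int, heads: int, sectors: int) -> str:
    """Format the geometry the way the hd= boot option above expects."""
    return f"hd={cylinders},{heads},{sectors}"

# Hypothetical drive geometry, as printed on a drive label
c, h, s = 1024, 16, 63
print(hd_parameter(c, h, s))   # hd=1024,16,63
print(chs_capacity(c, h, s))   # 528482304 (exactly 504 MiB)
```

If the product is far from the size the drive is sold as, the geometry is probably wrong. The same formula gives 1,474,560 bytes for a standard 1.44 MB floppy (80 cylinders, 2 heads, 18 sectors per track).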

After installing the Linux software, you can install LILO, allowing you to boot from the hard drive. At that time, you can specify the drive geometry to the LILO installation procedure, making it unnecessary to enter the drive geometry each time you boot. See Section 5.2.2, "Using LILO," in Chapter 5, "Essential System Management," for more about LILO.

3.3.2.3. Problems with SCSI controllers and devices

Presented here are some of the most common problems with SCSI controllers and devices, such as CD-ROMs, hard drives, and tape drives. If you are having problems getting Linux to recognize your drive or controller, read on. Let us again emphasize that most distributions use a modularized kernel and that you might have to load a module supporting your hardware during an early phase of the installation process. This might also be done automatically for you.

The Linux SCSI HOWTO contains much useful information on SCSI devices in addition to that listed here. SCSI devices can be particularly tricky to configure at times.

Skimping on cables, for example, is a false economy, especially if you use wide SCSI. Cheap cables are a major source of problems and can cause all kinds of failures, as well as major headaches. If you use SCSI, use proper cabling.

Here are common problems and possible solutions:

A SCSI device is detected at all possible IDs.

This problem occurs when the device is strapped (jumpered) to the same address, or SCSI ID, as the controller. You need to change the jumper settings so that the drive uses a different ID from the controller itself.
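The underlying rule is that every device on the bus needs its own SCSI ID, distinct from the controller's. A small Python sketch of that check, assuming a narrow (8-bit) bus with IDs 0 through 7 and the controller at ID 7, a common default; the device names and IDs are hypothetical.

```python
from typing import Dict, List

VALID_IDS = range(8)  # assumption: narrow (8-bit) SCSI bus, IDs 0-7

def scsi_id_problems(controller_id: int, devices: Dict[str, int]) -> List[str]:
    """List devices whose SCSI ID is invalid, equals the controller's, or is reused."""
    problems: List[str] = []
    owner: Dict[int, str] = {controller_id: "the controller"}
    for name, scsi_id in devices.items():
        if scsi_id not in VALID_IDS:
            problems.append(f"{name}: ID {scsi_id} is not a valid narrow-SCSI ID")
        elif scsi_id in owner:
            problems.append(f"{name}: ID {scsi_id} is already used by {owner[scsi_id]}")
        else:
            owner[scsi_id] = name
    return problems

# Hypothetical bus: controller at ID 7, a CD-ROM jumpered to 7 by mistake
print(scsi_id_problems(7, {"disk": 0, "cd-rom": 7, "tape": 0}))
# ['cd-rom: ID 7 is already used by the controller', 'tape: ID 0 is already used by disk']
```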

Linux reports sense errors, even if the devices are known to be error-free.

This can be caused by bad cables or by bad termination. If your SCSI bus is not terminated at both ends, you may have errors accessing SCSI devices. When in doubt, always check your cables. Besides disconnected cables, poor-quality cables are a common source of trouble.

SCSI devices report timeout errors.

This is usually caused by a conflict with IRQ, DMA, or device addresses. Also, check that interrupts are enabled correctly on your controller.

SCSI controllers using BIOS are not detected.

Detection of controllers using BIOS will fail if the BIOS is disabled, or if your controller's "signature" is not recognized by the kernel. See the Linux SCSI HOWTO for more information about this.

Controllers using memory-mapped I/O do not work.

This happens when the memory-mapped I/O ports are incorrectly cached. Either mark the board's address space as uncacheable in the XCMOS settings, or disable cache altogether.

When partitioning, you get a warning "cylinders > 1024," or you are unable to boot from a partition using cylinders numbered above 1023.

The BIOS limits the number of cylinders to 1024, and any partition using cylinders numbered above 1023 won't be accessible from the BIOS. As far as Linux is concerned, this affects only booting; once the system has booted, you should be able to access the partition. Your options are either to boot Linux from a boot floppy or to boot from a partition using cylinders numbered below 1024. See Section 3.1.7, "Creating the Boot Floppy or Installing LILO," earlier in this chapter.
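To check a particular partition, work out which cylinder its last sector falls on: cylinder = LBA ÷ (heads × sectors per track), rounded down, where LBA is the sector's absolute number counted from 0. A minimal Python sketch, assuming you know the geometry the BIOS uses for the drive; the 255-head, 63-sector geometry below is a hypothetical example.

```python
BIOS_MAX_CYLINDERS = 1024  # cylinders 0..1023 are reachable through the BIOS

def cylinder_of(lba: int, heads: int, sectors: int) -> int:
    """Cylinder containing an absolute (LBA) sector number for the given geometry."""
    return lba // (heads * sectors)

def bios_can_reach(lba: int, heads: int, sectors: int) -> bool:
    """True if the sector lies on a cylinder numbered below 1024."""
    return cylinder_of(lba, heads, sectors) < BIOS_MAX_CYLINDERS

# Hypothetical BIOS geometry: 255 heads, 63 sectors per track
heads, sectors = 255, 63
print(cylinder_of(16_450_559, heads, sectors), bios_can_reach(16_450_559, heads, sectors))  # 1023 True
print(cylinder_of(16_450_560, heads, sectors), bios_can_reach(16_450_560, heads, sectors))  # 1024 False
```

With that geometry, every sector below 16,450,560 (the first 8,422,686,720 bytes, roughly 7.8 GiB) is reachable by the BIOS.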

CD-ROM drive or other removable media devices are not recognized at boot time.

Try booting with a CD-ROM (or disk) in the drive. This is necessary for some devices.

If your SCSI controller is not recognized, you may need to force hardware detection at boot time. This is particularly important for SCSI controllers without BIOS. Most distributions allow you to specify the controller IRQ and shared memory address when booting the installation media. For example, if you are using a TMC-8xx controller, you may be able to enter:

    boot: linux tmc8xx=interrupt,memory-address

at the LILO boot prompt, where interrupt is the IRQ of the controller, and memory-address is the shared memory address. Whether you can do this depends on the distribution of Linux you are using; consult your documentation for details.

3.3.3. Problems Installing the Software

Installing the Linux software should be trouble-free if you're lucky. The only problems you might experience would be related to corrupt installation media or lack of space on your Linux filesystems. Here is a list of common problems:

System reports "Read error, file not found," or other errors while attempting to install the software.

This is indicative of a problem with your installation media. If you are installing from floppy, keep in mind that floppies are quite susceptible to media errors of this type. Be sure to use brand-new, newly formatted floppies. If you have a Windows partition on your drive, many Linux distributions allow you to install the software from the hard drive. This may be faster and more reliable than using floppies.

If you are using a CD-ROM, be sure to check the disk for scratches, dust, or other problems that might cause media errors.

The cause of the problem may also be that the media is in the incorrect format. For example, if using floppies, many Linux distributions require floppies to be formatted in high-density MS-DOS format. (The boot floppy is the exception; it is not in MS-DOS format in most cases.) If all else fails, either obtain a new set of floppies, or recreate the floppies (using new ones) if you downloaded the software yourself.

System reports errors such as "tar: read error" or "gzip: not in gzip format."

This problem is usually caused by corrupt files on the installation media itself. In other words, your floppy may be error-free, but the data on the floppy is in some way corrupted. For example, if you downloaded the Linux software using text mode, rather than binary mode, your files will be corrupt and unreadable by the installation software. When using FTP, just issue the binary command to set that mode before you request a file transfer.
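The damage is easy to reproduce. In text (ASCII) mode, FTP may rewrite line endings, and a compressed archive contains arbitrary bytes, including the line-feed byte. The Python sketch below assumes a Unix-to-DOS style conversion that turns every LF into CR LF; the reverse conversion is just as destructive.

```python
import gzip
import random
import zlib

def ascii_mode_transfer(data: bytes) -> bytes:
    """Simulate a text-mode copy from Unix to DOS: each LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")

def archive_intact(received: bytes, original: bytes) -> bool:
    """True if the received gzip archive decompresses back to the original data."""
    try:
        return gzip.decompress(received) == original
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (Python 3.8+) is a subclass of OSError
        return False

# Incompressible sample data, so the compressed archive is full of arbitrary
# byte values, including 0x0A (LF), as a real .tgz file would be.
rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(10_000))
archive = gzip.compress(original)

print(archive_intact(archive, original))                       # True: binary mode
print(archive_intact(ascii_mode_transfer(archive), original))  # False: text mode
```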

System reports errors such as "device full" while installing.

This is a clear-cut sign that you have run out of space while installing the software. If the disk fills up, not all distributions can cleanly recover, so aborting the installation won't give you a working system.

The solution is usually to recreate your filesystems with the mke2fs command, which will delete the partially installed software. You can then attempt to reinstall the software, this time selecting a smaller amount of software to install. In other cases, you may need to start completely from scratch, and rethink your partition and filesystem sizes.
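When choosing a smaller selection, you can check its total size against the free space before starting again. A minimal Python sketch; the package names and sizes are hypothetical, and the 5% margin is an arbitrary safety allowance rather than anything a distribution requires.

```python
import shutil
from typing import Dict

def fits(selection: Dict[str, int], free_bytes: int, margin_percent: int = 5) -> bool:
    """True if the selected packages fit in free_bytes while leaving a margin.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    needed = sum(selection.values())
    return needed * 100 <= free_bytes * (100 - margin_percent)

def free_bytes_on(mount_point: str) -> int:
    """Free space on the filesystem holding mount_point."""
    return shutil.disk_usage(mount_point).free

# Hypothetical package sizes, in bytes, from an installer's selection list
selection = {"base": 40_000_000, "x11": 90_000_000, "tex": 60_000_000}
print(fits(selection, free_bytes=180_000_000))  # False
smaller = {name: size for name, size in selection.items() if name != "tex"}
print(fits(smaller, free_bytes=180_000_000))    # True
```

On a running system, free_bytes_on() applied to the mount point of the target filesystem supplies the free_bytes figure.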

System reports errors such as "read_intr: 0x10" while accessing the hard drive.

This is usually an indication of bad blocks on your drive. However, if you receive these errors while using mkswap or mke2fs, the system may be having trouble accessing your drive. This can either be a hardware problem (see Section 3.3.2, "Hardware Problems," earlier in this chapter), or it might be a case of poorly specified geometry. If you used the option:

    hd=cylinders,heads,sectors

at boot time to force detection of your drive geometry and incorrectly specified the geometry, you could receive this error. This can also happen if your drive geometry is incorrectly specified in the system CMOS.

System reports errors such as "file not found" or "permission denied."

This problem can occur if not all of the necessary files are present on the installation media or if there is a permissions problem with the installation software. For example, some distributions of Linux have been known to have bugs in the installation software itself; these are usually fixed rapidly and are quite infrequent. If you suspect that the distribution software contains bugs, and you're sure that you have not done anything wrong, contact the maintainer of the distribution to report the bug.

If you have other strange errors when installing Linux (especially if you downloaded the software yourself), be sure you actually obtained all of the necessary files when downloading.

For example, some people use the FTP command:

    mget *.*

when downloading the Linux software via FTP. This will download only those files that contain a "." in their filenames; if there are any files without the ".", you will miss them. The correct command to use in this case is:

    mget *
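The difference comes down to shell-style pattern matching: *.* matches only names that contain a dot, while * matches every name. This Python sketch uses the standard fnmatch module and assumes the FTP server or client applies the same shell-style rules; the directory listing is hypothetical.

```python
import fnmatch

# Hypothetical directory listing on an FTP mirror
listing = ["base.tgz", "README", "INSTALL.txt", "devel.tgz", "CHECKSUMS"]

with_dot = fnmatch.filter(listing, "*.*")
everything = fnmatch.filter(listing, "*")

print(with_dot)                                 # ['base.tgz', 'INSTALL.txt', 'devel.tgz']
print(sorted(set(everything) - set(with_dot)))  # ['CHECKSUMS', 'README'], missed by mget *.*
```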

The best advice is to retrace your steps when something goes wrong. You may think that you have done everything correctly, when in fact you forgot a small but important step somewhere along the way. In many cases, just attempting to re-download or reinstall the Linux software can solve the problem. Don't beat your head against the wall any longer than you have to!

Also, if Linux unexpectedly hangs during installation, there may be a hardware problem of some kind. See Section 3.3.2, "Hardware Problems," for hints.

3.3.4. Problems After Installing Linux

You've spent an entire afternoon installing Linux. In order to make space for it, you wiped your Windows and OS/2 partitions and tearfully deleted your copies of SimCity 2000 and Railroad Tycoon 2. You reboot the system and nothing happens. Or, even worse, something happens, but it's not what should happen. What do you do?

In Section 3.3.1, "Problems with Booting the Installation Media," earlier in this chapter, we covered the most common problems that can occur when booting the Linux installation media; many of those problems may apply here. In addition, you may fall victim to one of the following maladies.

3.3.4.2. Problems booting Linux from the hard drive

If you opted to install LILO instead of creating a boot floppy, you should be able to boot Linux from the hard drive. However, the automated LILO installation procedure used by many distributions is not always perfect. It may make incorrect assumptions about your partition layout, in which case you need to reinstall LILO to get everything right. Installing LILO is covered in Section 5.2.2, "Using LILO," in Chapter 5, "Essential System Management."

Here are some common problems:

System reports "Drive not bootable-Please insert system disk."

You will get this error message if the hard drive's master boot record is corrupt in some way. In most cases, it's harmless, and everything else on your drive is still intact. There are several ways around this:

When you boot the system from the hard drive, MS-DOS (or another operating system) starts instead of Linux.

First of all, be sure you actually installed LILO when installing the Linux software. If not, the system will still boot MS-DOS (or whatever other operating system you may have) when you attempt to boot from the hard drive. In order to boot Linux from the hard drive, you need to install LILO (see Section 5.2.2, "Using LILO," in Chapter 5, "Essential System Management").

On the other hand, if you did install LILO, and another operating system boots instead of Linux, then you have LILO configured to boot that other operating system by default. While the system is booting, hold down the Shift or Control key to get the LILO boot: prompt, and then press Tab. This should present you with a list of possible operating systems to boot; type the name of the appropriate option (usually just linux) to boot Linux.

If you wish to select Linux as the default operating system to boot, you will need to reinstall LILO.

It also may be possible that you attempted to install LILO, but the installation procedure failed in some way. See the previous item on installation.

3.3.4.4. Problems using the system

If login is successful, you should be presented with a shell prompt (such as # or $) and can happily roam around your system. The next step in this case is to try the procedures in Chapter 4, "Basic Unix Commands and Concepts." However, some initial problems with using the system sometimes crop up.

The most common initial configuration problem is incorrect file or directory permissions. This can cause the error message:

    Shell-init: permission denied

to be printed after logging in. (In fact, any time you see the message permission denied, you can be fairly certain it is a problem with file permissions.)

In many cases, it's a simple matter of using the chmod command to fix the permissions of the appropriate files or directories. For example, some distributions of Linux once used the incorrect file mode 0644 for the root directory (/). The fix was to issue the command:

    # chmod 755 /

as root. (File permissions are covered in Section 4.13, "File Ownership and Permissions," in Chapter 4, "Basic Unix Commands and Concepts.") However, in order to issue this command, you needed to boot from the installation media and mount your Linux root filesystem by hand, a hairy task for most newcomers.
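Mode 0644 breaks things because, on a directory, the execute bit means permission to search (traverse) it; without it, ordinary users cannot reach anything below /. This Python sketch decodes both modes with the standard stat module, showing them the way ls -ld would.

```python
import stat

def describe(mode: int) -> str:
    """Render a directory's permission bits the way ls -ld shows them."""
    return stat.filemode(stat.S_IFDIR | mode)

def others_can_enter(mode: int) -> bool:
    """For a directory, the execute bit for 'others' means they may traverse it."""
    return bool(mode & stat.S_IXOTH)

for mode in (0o644, 0o755):
    print(oct(mode), describe(mode), others_can_enter(mode))
# 0o644 drw-r--r-- False
# 0o755 drwxr-xr-x True
```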

As you use the system, you may run into places where file and directory permissions are incorrect, or software does not work as configured. Welcome to the world of Linux! While most distributions are quite trouble-free, you can't expect them to be perfect. We don't want to cover all of those problems here. Instead, throughout the book we help you to solve many of these configuration problems by teaching you how to find them and fix them yourself. In Chapter 1, "Introduction to Linux", we discussed this philosophy in some detail. In Chapter 5, "Essential System Management", we give hints for fixing many of these common configuration problems.




Copyright © 2001 O'Reilly & Associates. All rights reserved.






