Recoverable Errors
When LVM encounters a recoverable (correctable)
error, it internally retries the failed operation assuming that the
error will correct itself or that you can take steps to correct it.
Examples of recoverable errors are the following:
A disk that goes missing after the volume group is activated
A loose disk cable (which looks like a missing disk)
In these cases, LVM logs an error message to the
console, but it does not return an error to the application accessing
the logical volume.
If you have a current copy of the data on a separate,
functioning mirror, then LVM directs the I/O to a mirror copy, the
same as for a nonrecoverable error. Applications accessing the logical
volume do not detect any error. (To preserve data synchronization
between its mirrors, LVM retries recoverable write requests to a problematic
disk, even if a current copy exists elsewhere. However, this process
is managed by a daemon internal to LVM and has no impact on user access
to the logical volume.)
However, if the device in question holds the only
copy of the data, LVM retries the I/O request until the device responds
and the request succeeds, or until the system is rebooted. Any application
performing I/O to the logical volume might block, waiting for the
device to recover. In this case, your application or file system might
appear stalled and unresponsive.
Temporarily Unavailable Device
By default, LVM retries I/O requests with recoverable
errors until they succeed or the system is rebooted. Therefore, if
an application or file system stalls, your troubleshooting must include
checking the console log for problems with your disk drives and taking
action to restore the failing devices to service.
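As a rough aid for the log check described above, the following sketch filters a log file for lines that mention LVM. The function name lvm_log_messages is hypothetical, and the default log path is an assumption; pass the location of your own console or system log if it differs.

```shell
# Print the lines of a log file that mention LVM.
# The default path below is an assumption, not taken from this guide;
# pass the actual log location as the first argument if it differs.
lvm_log_messages() {
    logfile=${1:-/var/adm/syslog/syslog.log}
    # Return 2 if the log cannot be read, so callers can tell
    # "no matches" (grep status 1) from "no log file".
    [ -r "$logfile" ] || { echo "cannot read $logfile" >&2; return 2; }
    grep 'LVM' "$logfile"
}
```

Any lines it prints are only a starting point; which messages indicate a failing disk depends on your system and drivers.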
Permanently Unavailable Device
If retrying the I/O request never succeeds (for
example, the disk was physically removed), your application or file
system might block indefinitely. If your application is not responding,
you might need to reboot your system.
As an alternative to rebooting, you can control
how long LVM retries a recoverable error before treating it as nonrecoverable
by setting a timeout on the logical volume. If the device fails to
respond within that time, LVM returns an I/O error to the caller.
This timeout value is subject to any underlying physical volume timeout
and driver timeout, so LVM might not return the I/O error until some
seconds after the logical volume timeout expires.
The timeout value is normally zero, which is interpreted
as an infinite timeout. Thus, no I/O request returns to the caller
until it completes successfully.
View the timeout value for a logical volume using
the lvdisplay command, as follows:
# lvdisplay /dev/vg00/lvol1 | grep Timeout
IO Timeout (Seconds) default
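In a script, the same check might be wrapped in a small helper. The sketch below reads lvdisplay output on standard input and prints the timeout. It assumes the field is labeled as in the sample output above and that the value "default" corresponds to the zero (infinite) timeout. The function name lv_io_timeout is hypothetical.

```shell
# Print the I/O timeout of a logical volume, reading lvdisplay output on stdin.
# Assumption: "default" (or 0) in the IO Timeout field means no timeout is set,
# which LVM treats as infinite. Exits nonzero if the field is not found.
lv_io_timeout() {
    awk '/IO Timeout \(Seconds\)/ {
        value = $NF                      # last field on the line is the value
        if (value == "default" || value == "0") value = "infinite"
        print value
        found = 1
        exit
    }
    END { if (!found) exit 1 }'
}
```

It could be used as follows:
# lvdisplay /dev/vg00/lvol1 | lv_io_timeout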
Set the timeout value using the -t option of the lvchange command. This sets the
timeout value in seconds for a logical volume. For example, to set
the timeout for /dev/vg01/lvol1 to one minute, enter the following command:
# lvchange -t 60 /dev/vg01/lvol1
CAUTION: Setting a timeout on a logical volume increases
the likelihood of transient errors being treated as nonrecoverable
errors, so any application that reads or writes to the logical volume
can experience I/O errors. If your application is not prepared to
handle such errors, keep the default infinite logical volume timeout.
TIP: Set the logical volume timeout to an integral multiple
of any timeout assigned to the underlying physical volumes. Otherwise,
the actual duration of the I/O request can exceed the logical volume
timeout. For details on how to change the I/O timeout value on a physical
volume, see pvchange(1M).
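The arithmetic behind choosing an integral multiple can be sketched in the shell. The helper below rounds a desired logical volume timeout up to the next whole multiple of a physical volume timeout. The function name round_up_to_multiple is hypothetical, and it assumes you already know the physical volume timeout in seconds.

```shell
# Round a desired logical volume timeout up to the nearest whole multiple
# of the physical volume timeout. Both arguments are integers in seconds.
# If the physical volume timeout is zero or negative (none set), the
# desired value is printed unchanged.
round_up_to_multiple() {
    desired=$1
    pv_timeout=$2
    if [ "$pv_timeout" -le 0 ]; then
        echo "$desired"
        return 0
    fi
    # Integer division truncates, so adding (pv_timeout - 1) first rounds up.
    echo $(( (desired + pv_timeout - 1) / pv_timeout * pv_timeout ))
}
```

For example, with a hypothetical physical volume timeout of 30 seconds and a desired logical volume timeout of 50 seconds, the helper prints 60, which could then be applied as follows:
# lvchange -t "$(round_up_to_multiple 50 30)" /dev/vg01/lvol1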
Nonrecoverable Errors
Nonrecoverable errors are considered fatal; there
is no expectation that retrying the operation will work.
If you have a current copy of the data on a separate,
functioning mirror, then LVM directs reads and writes to that mirror
copy. The I/O operation for the application accessing the logical
volume completes successfully.
However, if you have no other copies of the data,
then LVM returns an error to the subsystem accessing the logical volume.
Thus, any application directly accessing a logical volume must be
prepared for I/O requests to fail. File systems such as VxFS and most
database applications are designed to recover from error situations;
for example, if VxFS encounters an I/O error, it might disable access
to a file system or a subset of the files in it.
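A script that reads a logical volume directly can prepare for such failures by checking the exit status of each I/O command. The following is a minimal sketch, not a recommended procedure: the function name probe_read is hypothetical, and it uses dd to read a single block from whatever path is passed in.

```shell
# Try to read one block from a device or file and report the result.
# Returns 0 if the read succeeds and 1 if dd reports an error.
probe_read() {
    target=$1
    # dd writes its transfer statistics to stderr; discard them.
    if dd if="$target" of=/dev/null bs=1024 count=1 2>/dev/null; then
        echo "read ok: $target"
    else
        echo "read failed: $target" >&2
        return 1
    fi
}
```

Note that if the logical volume has the default infinite timeout and the error is recoverable, a probe like this can itself block instead of returning a failure.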
LVM considers the following two situations nonrecoverable.
Missing Device When the Volume Group Was Activated
If the device associated with the I/O was not present when the volume group was activated, LVM
prints an error message to the user's terminal at activation
time. You must either locate the disk and restore it to service, or
replace it, then activate the volume group again.