RAIDframe disk driver
The raid driver provides RAID 0, 1, 4, and 5
(and more!) capabilities to OpenBSD. This document
assumes that the reader has at least some familiarity with RAID and RAID
concepts. The reader is also assumed to know how to configure disks and
pseudo-devices into kernels, how to generate kernels, and how to partition
disks.
RAIDframe provides a number of different RAID levels including:
- RAID 0: provides simple data striping across the components.
- RAID 1: provides mirroring.
- RAID 4: provides data striping across the components, with parity stored on
  a dedicated drive (in this case, the last component).
- RAID 5: provides data striping across the components, with parity
  distributed across all the components.
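The difference between the dedicated parity of RAID 4 and the distributed parity of RAID 5 can be illustrated with a small sketch. The function name and the simple one-step rotation used for RAID 5 are assumptions made for illustration; the layouts RAIDframe actually uses may differ.

```python
def parity_component(level, stripe, n_components):
    """Return the index of the component that holds parity for a stripe.

    Illustrative only: RAID 4 keeps parity on the last component, while
    this sketch rotates RAID 5 parity by one component per stripe
    (an assumed layout, not necessarily the one RAIDframe uses).
    """
    if level == 4:
        return n_components - 1
    if level == 5:
        return (n_components - 1 - stripe) % n_components
    raise ValueError("only RAID 4 and RAID 5 store parity in this sketch")


# Parity location for the first four stripes of a 4-component set.
raid4_layout = [parity_component(4, s, 4) for s in range(4)]  # [3, 3, 3, 3]
raid5_layout = [parity_component(5, s, 4) for s in range(4)]  # [3, 2, 1, 0]
```

With RAID 4 every parity write lands on the same component, whereas the rotation spreads parity writes across all of them.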
There is a wide variety of other RAID levels supported by RAIDframe, including
Even-Odd parity, RAID level 5 with rotated sparing, Chained declustering, and
Interleaved declustering. The reader is referred to the RAIDframe
documentation mentioned near the end of this document for more detail on these
various RAID configurations.
Depending on the parity level configured, the device driver can support the
failure of component drives. The number of failures allowed depends on the
parity level selected. If the driver is able to handle drive failures, and a
drive does fail, then the system is operating in "degraded mode". In
this mode, all missing data must be reconstructed from the data and parity
present on the other components. This results in much slower data accesses,
but does mean that a failure need not bring the system to a complete halt.
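Reconstruction in degraded mode can be sketched for the single-parity case, where parity is the byte-wise XOR of the data blocks in a stripe. The helper name and the sample blocks below are hypothetical, and the sketch ignores striping layout entirely.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The component holding data[1] fails. Its contents are rebuilt from the
# surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)  # b"BBBB"
```

Every read of the missing block requires reading all surviving components of the stripe, which is why accesses in degraded mode are so much slower.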
The RAID driver supports and enforces the use of ‘component
labels’. A ‘component label’ contains important
information about the component, including a user-specified serial number, the
row and column of that component in the RAID set, and whether the data (and
parity) on the component is ‘clean’. If the driver determines
that the labels are very inconsistent with respect to each other (e.g. two or
more serial numbers do not match) or that the component label is not
consistent with its assigned place in the set (e.g., the component label
claims the component should be the 3rd one of a 6-disk set, but the RAID set
has it as the 3rd component in a 5-disk set) then the device will fail to
configure. If the driver determines that exactly one component label seems to
be incorrect, and the RAID set is being configured as a set that supports a
single failure, then the RAID set will be allowed to configure, but the
incorrectly labeled component will be marked as ‘failed’, and
the RAID set will begin operation in degraded mode. If all of the components
are consistent among themselves, the RAID set will configure normally.
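The three outcomes described above can be sketched as a small decision function. The `Label` fields and the `classify` function are hypothetical simplifications (the row and the clean flag are omitted), not the driver's actual data structures or logic.

```python
from collections import Counter, namedtuple

# Hypothetical, simplified component label.
Label = namedtuple("Label", "serial column num_columns")


def classify(labels, tolerates_one_failure):
    """Return 'normal', 'degraded', or 'refuse' for labels given in set order."""
    majority_serial, _ = Counter(l.serial for l in labels).most_common(1)[0]
    bad = [
        i for i, l in enumerate(labels)
        if l.serial != majority_serial      # serial number disagrees
        or l.column != i                    # label claims another position
        or l.num_columns != len(labels)     # label claims another set size
    ]
    if not bad:
        return "normal"
    if len(bad) == 1 and tolerates_one_failure:
        return "degraded"   # the odd component would be marked as failed
    return "refuse"


good = [Label(42, i, 3) for i in range(3)]
one_bad = good[:2] + [Label(7, 2, 3)]
two_bad = [Label(42, 0, 3), Label(7, 1, 3), Label(9, 2, 3)]
```

In this sketch `classify(one_bad, True)` yields `'degraded'`, while the same labels on a set without redundancy, or `two_bad` on any set, yield `'refuse'`.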
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets. A RAID set can be flagged as
auto-configurable, in which case it will be configured automatically during
the kernel boot process. RAID filesystems which are automatically configured
are also eligible to be the root filesystem. There is currently no support for
booting a kernel directly from a RAID set. To use a RAID set as the root
filesystem, a kernel is usually obtained from a small non-RAID partition,
after which any auto-configuring RAID set can be used for the root filesystem.
See raidctl(8) for more information on auto-configuration of RAID sets.
The driver supports ‘hot spares’, disks which are on-line, but are
not actively used in an existing filesystem. Should a disk fail, the driver is
capable of reconstructing the failed disk onto a hot spare or back onto a
replacement drive. If the components are hot swappable, the failed disk can
then be removed, a new disk put in its place, and a copyback operation
performed. The copyback operation, as its name indicates, will copy the
reconstructed data from the hot spare to the previously failed (and now
replaced) disk. Hot spares can also be hot-added using raidctl(8).
If a component cannot be detected when the RAID device is configured, that
component will be simply marked as 'failed'.
The user-land utility for doing all raid configuration and other operations is
raidctl(8). Most importantly, raidctl(8) must be used with the appropriate
option to initialize all RAID sets. In particular, this initialization
includes re-building the parity data. This rebuilding of parity data is also
required when either a) a new RAID device is brought up for the first time or
b) after an unclean shutdown of a RAID device. By performing this on-demand
recomputation of all parity before a filesystem on the RAID device is checked
or created, filesystem integrity and parity integrity can be ensured. It bears
repeating that parity recomputation is required before any filesystems are
created or used on the RAID device. If the parity is not correct, then missing
data cannot be correctly recovered.
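Why stale parity is dangerous can be shown with a small sketch, again assuming single XOR parity. The helper and the sample bytes are hypothetical; the "unclean shutdown" is simulated by updating a data block without rewriting the parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_blocks(data)

# Simulated unclean shutdown: data[0] was written, the parity update was not.
data[0] = b"\xff\xff"

# Recovering data[1] with the stale parity silently yields the wrong bytes.
stale_recovery = xor_blocks([data[0], data[2], parity])

# After parity is recomputed, the same recovery is correct again.
parity = xor_blocks(data)
fresh_recovery = xor_blocks([data[0], data[2], parity])
```

Here `stale_recovery` differs from `data[1]` while `fresh_recovery` equals it, and nothing in the stale case signals that the result is wrong.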
RAID levels may be combined in a hierarchical fashion. For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
It is important that drives be hard-coded at their respective addresses (i.e.,
not left free-floating, where a drive with SCSI ID of 4 can end up under a
different device name from one boot to the next) for well-behaved functioning of
the RAID device. This is true for all types of drives, including IDE, HP-IB,
etc. For normal SCSI drives, for example, the following can be used to fix the
device addresses:
sd0 at scsibus0 target 0 # SCSI disk drives
sd1 at scsibus0 target 1 # SCSI disk drives
sd2 at scsibus0 target 2 # SCSI disk drives
sd3 at scsibus0 target 3 # SCSI disk drives
sd4 at scsibus0 target 4 # SCSI disk drives
sd5 at scsibus0 target 5 # SCSI disk drives
sd6 at scsibus0 target 6 # SCSI disk drives
See sd(4) for more information. The rationale for fixing the device addresses is as follows:
Consider a system with three SCSI drives at SCSI IDs 4, 5, and 6, which map
to components /dev/sd0e, /dev/sd1e, and /dev/sd2e of a RAID 5 set. If the
drive with SCSI ID 5 fails, and the system reboots, the old /dev/sd2e will
show up as /dev/sd1e. The RAID driver is able to
detect that component positions have changed, and will not allow normal
configuration. If the device addresses are hard coded, however, the RAID
driver would detect that the middle component is unavailable, and bring the
RAID 5 set up in degraded mode. Note that the auto-detection and
auto-configuration code does not care about where the components live. The
auto-configuration code will correctly configure a device even after any
number of the components have been re-arranged.
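The renumbering problem can be sketched as follows. The `probe` function is a hypothetical model of how unit numbers are handed out at boot, not kernel code: without fixed addresses, names are assigned in probe order; with them, each SCSI ID keeps its unit number.

```python
def probe(scsi_ids, fixed=None):
    """Map device names to the SCSI IDs of the drives that answer at boot.

    With fixed=None, names are handed out in probe order (sd0, sd1, ...).
    With a fixed mapping of SCSI ID -> unit number, each drive keeps its name.
    """
    if fixed is None:
        return {f"sd{unit}": sid for unit, sid in enumerate(sorted(scsi_ids))}
    return {f"sd{fixed[sid]}": sid for sid in sorted(scsi_ids)}


before = probe([4, 5, 6])    # {'sd0': 4, 'sd1': 5, 'sd2': 6}

# The drive at SCSI ID 5 has failed and no longer answers.
floating = probe([4, 6])     # {'sd0': 4, 'sd1': 6}
hard_coded = probe([4, 6], fixed={4: 0, 5: 1, 6: 2})  # {'sd0': 4, 'sd2': 6}
```

In the floating case the drive at ID 6 takes over the name of the failed drive, so the component positions no longer match; in the hard-coded case only the middle component is missing.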
The first step to using the raid driver is to ensure that it is suitably
configured in the kernel. This is done by adding a line similar to:
pseudo-device raid 4 # RAIDframe disk device
to the kernel configuration file. The ‘count’ argument (‘4’, in this case)
specifies the number of RAIDframe drivers to configure. Component
auto-detection and auto-configuration of RAID sets must be turned on
separately, by adding the corresponding option to the kernel configuration
file.
All component partitions must be of the type FS_BSDFFS (e.g., 4.2BSD) or
FS_RAID (e.g., RAID). The use of the latter
is strongly encouraged, and is required if auto-configuration of the RAID set
is desired. Since RAIDframe leaves room for disklabels, RAID components can be
simply raw disks, or partitions which use an entire disk. Note that some
platforms (such as SUN) do not allow using the FS_RAID partition type. On
these platforms, the raid driver can still
auto-configure from FS_BSDFFS partitions.
A more detailed treatment of actually using a raid device is found in
raidctl(8). It is highly recommended that the steps to reconstruct, copyback,
and re-compute parity are well understood by the system administrator(s)
before a component failure. Doing the wrong thing when a component fails may
result in data loss.
Additional debug information can be sent to the console by enabling the driver's debugging option in the kernel configuration file.
raid device special files.
The raid driver is a port of RAIDframe, a framework for rapid
prototyping of RAID structures developed by the folks at the Parallel Data
Laboratory at Carnegie Mellon University (CMU). RAIDframe, as originally
distributed by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device driver for
Digital UNIX. The raid driver is a
kernelized version of RAIDframe v1.1.
A more complete description of the internals and functionality of RAIDframe is
found in the paper "RAIDframe: A Rapid Prototyping Tool for RAID
Systems", by William V. Courtright II, Garth Gibson, Mark Holland, LeAnn
Neal Reilly, and Jim Zelenka, and published by the Parallel Data Laboratory of
Carnegie Mellon University. The raid driver first appeared in NetBSD 1.4,
from where it was ported to OpenBSD 2.5.
Certain RAID levels (1, 4, 5, 6, and others) can protect against some data loss
due to component failure. However, the loss of two components of a RAID 4 or 5
system, or the loss of a single component of a RAID 0 system, will result in
all filesystems on that RAID device being lost. RAID is NOT
a substitute for good backup practices.
Recomputation of parity MUST be performed
whenever there is a chance that it may have been compromised. This includes
after system crashes, or before a RAID device has been used for the first
time. Failure to keep parity correct will be catastrophic should a component
ever fail -- it is better to use RAID 0 and get the additional space and
speed, than it is to use parity, but not keep the parity correct. At least
with RAID 0 there is no perception of increased data security.