NAME
uda - UDA50 disk controller interface
SYNOPSIS
uda0 at uba? csr 0172150
uda1 at uba? csr 0160334
mscpbus* at uda?
DESCRIPTION
This is a driver for the DEC UDA50 disk controller and other compatible controllers. The UDA50 communicates with the host through a packet protocol known as the Mass Storage Control Protocol (MSCP). Consult the file ⟨vax/mscp.h⟩ for a detailed description of this protocol.
The uda driver is a typical block-device disk driver; see physio(9) for a description of block I/O. The script MAKEDEV(8) should be used to create the uda special files; should a special file need to be created by hand, consult mknod(8).
The MSCP_PARANOIA option enables runtime checking on all transfer completion responses from the controller. This increases disk I/O overhead and may be undesirable on slow machines, but is otherwise recommended.
The first sector of each disk contains both a first-stage bootstrap program and a disk label containing geometry information and partition layouts (see disklabel(5)). This sector is normally write-protected, and disk-to-disk copies should avoid copying this sector. The label may be updated with disklabel(8), which can also be used to write-enable and write-disable the sector. The next 15 sectors contain a second-stage bootstrap program.
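The layout just described can be expressed as a few constants. The following is only a sketch: the macro and function names are hypothetical (they do not come from the driver), and it assumes 512-byte sectors, the size the driver's media format diagnostics refer to.

```c
#include <stdint.h>

/* Hypothetical names; the values follow the description above. */
#define UDA_SECTOR_SIZE   512u /* assumed sector size, in bytes */
#define UDA_LABEL_SECTOR  0u   /* first-stage bootstrap and disk label */
#define UDA_BOOT2_FIRST   1u   /* first sector of the second-stage bootstrap */
#define UDA_BOOT2_SECTORS 15u  /* length of the second-stage bootstrap */

/* Byte offset of a sector from the start of the disk. */
static uint64_t
uda_sector_offset(uint32_t sector)
{
	return (uint64_t)sector * UDA_SECTOR_SIZE;
}

/* Total number of sectors occupied by the label and both bootstraps. */
static uint32_t
uda_boot_area_sectors(void)
{
	return UDA_BOOT2_FIRST + UDA_BOOT2_SECTORS;
}

/* Nonzero if the sector holds the label or bootstrap code. */
static int
uda_is_boot_sector(uint32_t sector)
{
	return sector < uda_boot_area_sectors();
}
```

Under these assumptions the label and bootstraps occupy the first 16 sectors, or 8192 bytes, of the disk.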
DISK SUPPORT
During autoconfiguration, as well as when a drive is opened after all partitions are closed, the first sector of the drive is examined for a disk label. If a label is found, the geometry of the drive and the partition tables are taken from it. If no label is found, the driver configures the type of each drive when it is first encountered. A default partition table in the driver is used for each type of disk when a pack is not labelled. The origin and size (in sectors) of the default pseudo-disks on each drive are shown below. Not all partitions begin on cylinder boundaries, as on other drives, because previous drivers used one partition table for all drive types. Variants of the partition tables are common; check the driver and the file /etc/disktab (disktab(5)) for other possibilities.
Special file names begin with ‘ra’ and ‘rra’ for the block and character files respectively. The second component of the name, a drive unit number in the range of zero to seven, is represented by a ‘?’ in the disk layouts below. The last component of the name is the file system partition, designated by a letter from ‘a’ to ‘h’, which corresponds to a minor device number set: zero to seven, eight to 15, 16 to 23 and so forth for drive zero, drive one, drive two and so on (see physio(9)). The location and size (in sectors) of the partitions:
RA60 partitions
disk | start | length | notes |
ra?a | 0 | 15884 | |
ra?b | 15884 | 33440 | |
ra?c | 0 | 400176 | |
ra?d | 49324 | 82080 | same as 4.2BSD ra?g |
ra?e | 131404 | 268772 | same as 4.2BSD ra?h |
ra?f | 49324 | 350852 | |
ra?g | 242606 | 157570 | |
ra?h | 49324 | 193282 | |
RA70 partitions
disk | start | length |
ra?a | 0 | 15884 |
ra?b | 15972 | 33440 |
ra?c | 0 | 547041 |
ra?d | 34122 | 15884 |
ra?e | 357192 | 55936 |
ra?f | 413457 | 133584 |
ra?g | 341220 | 205821 |
ra?h | 49731 | 29136 |
RA80 partitions
disk | start | length | notes |
ra?a | 0 | 15884 | |
ra?b | 15884 | 33440 | |
ra?c | 0 | 242606 | |
ra?e | 49324 | 193282 | same as old Berkeley ra?g |
ra?f | 49324 | 82080 | same as 4.2BSD ra?g |
ra?g | 49910 | 192696 | |
ra?h | 131404 | 111202 | same as 4.2BSD |
RA81 partitions
disk | start | length |
ra?a | 0 | 15884 |
ra?b | 16422 | 66880 |
ra?c | 0 | 891072 |
ra?d | 375564 | 15884 |
ra?e | 391986 | 307200 |
ra?f | 699720 | 191352 |
ra?g | 375564 | 515508 |
ra?h | 83538 | 291346 |
RA81 partitions, 4.2BSD-compatible layout
disk | start | length | notes |
ra?a | 0 | 15884 | |
ra?b | 16422 | 66880 | |
ra?c | 0 | 891072 | |
ra?d | 49324 | 82080 | same as 4.2BSD ra?g |
ra?e | 131404 | 759668 | same as 4.2BSD ra?h |
ra?f | 412490 | 478582 | same as 4.2BSD ra?f |
ra?g | 375564 | 515508 | |
ra?h | 83538 | 291346 | |
RA82 partitions
disk | start | length |
ra?a | 0 | 15884 |
ra?b | 16245 | 66880 |
ra?c | 0 | 1135554 |
ra?d | 375345 | 15884 |
ra?e | 391590 | 307200 |
ra?f | 669390 | 466164 |
ra?g | 375345 | 760209 |
ra?h | 83790 | 291346 |
The ra?a partition is normally used for the root file system, the ra?b partition as a paging area, and the ra?c partition for pack-to-pack copying (it maps the entire disk).
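Because the pseudo-disks in these tables overlap one another, it can help to check a table mechanically before relying on it. The sketch below is illustrative only: the structure and function names are hypothetical and are not taken from the driver. It encodes the RA60 table above, checks that every partition ends within the ra?c partition, and tests whether two partitions share sectors.

```c
#include <stddef.h>

/* Hypothetical representation of one row of a partition table. */
struct ra_part {
	char letter;          /* partition letter, 'a' to 'h' */
	unsigned long start;  /* first sector */
	unsigned long length; /* size in sectors */
};

/* The RA60 default table, copied from the listing above. */
static const struct ra_part ra60_parts[] = {
	{ 'a', 0, 15884 },
	{ 'b', 15884, 33440 },
	{ 'c', 0, 400176 },
	{ 'd', 49324, 82080 },
	{ 'e', 131404, 268772 },
	{ 'f', 49324, 350852 },
	{ 'g', 242606, 157570 },
	{ 'h', 49324, 193282 },
};

/* Returns 1 if every partition ends at or before the end of 'c'. */
static int
ra_table_fits(const struct ra_part *t, size_t n)
{
	unsigned long disk_end = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].letter == 'c')
			disk_end = t[i].start + t[i].length;
	if (disk_end == 0)
		return 0; /* no whole-disk partition found */
	for (i = 0; i < n; i++)
		if (t[i].start + t[i].length > disk_end)
			return 0;
	return 1;
}

/* Returns 1 if the two partitions share at least one sector. */
static int
ra_parts_overlap(const struct ra_part *p, const struct ra_part *q)
{
	return p->start < q->start + q->length &&
	    q->start < p->start + p->length;
}
```

For the RA60 table, ra?a and ra?b are adjacent and do not overlap, while ra?d and ra?f begin at the same sector, so file systems on those two cannot be used at the same time.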
FILES
- /dev/ra[0-9][a-p]
- /dev/rra[0-9][a-p]
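The special file names map onto minor device numbers as described under DISK SUPPORT: eight consecutive minor numbers per drive, one per partition letter. A minimal sketch of that mapping follows. The function and macro names are hypothetical, and the sketch follows the DISK SUPPORT text (units zero to seven, partitions ‘a’ to ‘h’); the wider ranges shown in the file names above would need different constants.

```c
#include <string.h>

#define RA_PARTS_PER_UNIT 8 /* partitions 'a' to 'h', per DISK SUPPORT */
#define RA_MAX_UNIT       7 /* highest unit number, per DISK SUPPORT */

/* Minor number for a unit and partition letter, or -1 if out of range. */
static int
ra_minor(int unit, char part)
{
	if (unit < 0 || unit > RA_MAX_UNIT)
		return -1;
	if (part < 'a' || part > 'h')
		return -1;
	return unit * RA_PARTS_PER_UNIT + (part - 'a');
}

/* Parse a name such as "ra0a" or "rra0a"; returns -1 if malformed. */
static int
ra_minor_from_name(const char *name)
{
	if (strncmp(name, "rra", 3) == 0)
		name += 3;
	else if (strncmp(name, "ra", 2) == 0)
		name += 2;
	else
		return -1;
	if (name[0] < '0' || name[0] > '7')
		return -1;
	if (name[1] == '\0' || name[2] != '\0')
		return -1;
	return ra_minor(name[0] - '0', name[1]);
}
```

Under this mapping the block and character files for the same unit and partition share a minor number; for example, both ra1c and rra1c map to 10.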
DIAGNOSTICS
- panic: udaslave
- No command packets were available while the driver was looking for disk drives. The controller is not extending enough credits to use the drives.
- uda0: no response to Get Unit Status request
- A disk drive was found, but did not respond to a status request. This is either a hardware problem or someone pulling unit number plugs very fast.
- uda0: unit N off line
- While searching for drives, the controller found one that seems to be manually disabled. It is ignored.
- uda0: unable to get unit status
- Something went wrong while trying to determine the status of a disk drive. This is followed by an error detail.
- uda0: unit N, next X
- This probably never happens, but I wanted to know if it did. I have no idea what one should do about it.
- uda0: cannot handle unit number N (max is X)
- The controller found a drive whose unit number is too large. Valid unit numbers are those in the range [0..7].
- uda0: uballoc map failed
- UNIBUS resource map allocation failed during initialization. This can only happen if you have 496 devices on a UNIBUS.
- uda0: timeout during init
- The controller did not initialize within ten seconds. A hardware problem, but it sometimes goes away if you try again.
- uda0: init failed, sa=...
- The controller refused to initialize.
- uda0: controller hung
- The controller never finished initialization. Retrying may sometimes fix it.
- uda0: still hung
- When the controller hangs, the driver occasionally tries to reinitialize it. This means it just tried, without success.
- uda0: command ring too small
- If you increase NCMDL2, you may see a performance improvement. (See /sys/arch/vax/mscp/mscpreg.h.)
- uda0: controller error, sa=0%o (...)
- The controller reported an error. The error code is printed in octal, along with a short description if the code is known (see the UDA50 Maintenance Guide, DEC part number AA-M185B-TC, pp. 18-22). If this occurs during normal operation, the driver will reset the controller and retry pending I/O. If it occurs during configuration, the controller may be ignored.
- uda0: stray intr
- The controller interrupted when it should have stayed quiet. The interrupt has been ignored.
- uda0: init step N failed, sa=...
- The controller reported an error during the named initialization step. The driver will retry initialization later.
- uda0: version X model Y
- An informational message giving the revision level of the controller.
- uda0: DMA burst size set to N
- An informational message showing the DMA burst size, in words.
- uda0: SETCTLRC failed: `detail'
- The Set Controller Characteristics command (the last part of the controller initialization sequence) failed. The detail message tells why.
- uda0: attempt to bring ra0 on line failed: `detail'
- The drive could not be brought on line. The detail message tells why.
- uda0: ra0: unknown type N
- The type index of the named drive is not known to the driver, so the drive will be ignored.
- uda0: attempt to get status for ra0 failed: `detail'
- A status request failed. The detail message should tell why.
- panic: udareplace
- The controller reported completion of a REPLACE operation. The driver never issues any REPLACEs, so something is wrong.
- panic: udabb
- The controller reported completion of bad block related I/O. The driver never issues any such, so something is wrong.
- uda0: lost interrupt
- The controller has gone out to lunch, and is being reset to try to bring it back.
- panic: mscp_go: AEB_MAX_BP too small
- You defined AVOID_EMULEX_BUG and increased NCMDL2 and Emulex has new firmware. Raise AEB_MAX_BP or turn off AVOID_EMULEX_BUG.
- uda0: unit N: unknown message type 0xXXX ignored
- The controller responded with a mysterious message type. See /sys/vax/mscp.h for a list of known message types. This is probably a controller hardware problem.
- uda0: unit N out of range
- The disk drive unit number (the unit plug) is higher than the maximum number the driver allows (currently 7).
- uda0: unit N not configured, message ignored
- The named disk drive has announced its presence to the controller, but was not, or cannot now be, configured into the running system. Message is one of `available attention' (an `I am here' message) or `stray response op 0xXXXX status 0xXXXX' (anything else).
- Emulex SC41/MS screwup: uda0, got N correct, then changed 0xXXXX to 0xYYYY
- You turned on AVOID_EMULEX_BUG, and the driver successfully avoided the bug. The number of correctly handled requests is reported, along with the expected and actual values relating to the bug being avoided.
- panic: unrecoverable Emulex screwup
- You turned on AVOID_EMULEX_BUG, but Emulex was too clever and avoided the avoidance. Try turning on MSCP_PARANOIA instead.
- uda0: bad response packet ignored
- You turned on MSCP_PARANOIA, and the driver caught the controller in a lie. The lie has been ignored, and the controller will soon be reset (after a `lost' interrupt). This is followed by a hex dump of the offending packet.
- uda0: ... error datagram
- The controller has reported some kind of error, either `hard' (unrecoverable) or `soft' (recoverable). If the controller is going on (attempting to fix the problem), this message includes the remark `(continuing)'. Emulex controllers wrongly claim that all soft errors are hard errors. This message may be followed by one of the following five messages, depending on its type, and will always be followed by a failure detail message (also listed below).
- memory addr 0x%x
- A host memory access error; this is the address that could not be read.
- unit N: level N retry N, ... N
- A typical disk error; the retry count and error recovery levels are printed, along with the block type (`lbn', or logical block; or `rbn', or replacement block) and number. If the string is something else, DEC has been clever, or your hardware has gone to Australia for vacation (unless you live there; then it might be in New Zealand, or Brazil).
- unit N: ... N
- Also a disk error, but an `SDI' error, whatever that is. (I doubt it has anything to do with Ronald Reagan.) This lists the block type (`lbn' or `rbn') and number. This is followed by a second message indicating a microprocessor error code and a front panel code. These latter codes are drive-specific, and are intended to be used by field service as an aid in locating failing hardware. The codes for RA81s can be found in the RA81 Maintenance Guide, DEC order number AA-M879A-TC, in appendices E and F.
- unit N: small disk error, cyl N
- Yet another kind of disk error, but for small disks. (``That's what it says, guv'nor. Dunnask me what it means.'')
- unit N: unknown error, format 0x%x
- A mysterious error: the given format code is not known.
The detail messages are as follows:
- success (...) (code 0, subcode N)
- Everything worked, but the controller thought it would let you know that something went wrong. No matter what subcode, this can probably be ignored.
- invalid command (...) (code 1, subcode N)
- This probably cannot occur unless the hardware is out; ... should be `invalid msg length', meaning some command was too short or too long.
- command aborted (unknown subcode) (code 2, subcode N)
- This should never occur, as the driver never aborts commands.
- unit offline (...) (code 3, subcode N)
- The drive is offline, either because it is not around (`unknown drive'), stopped (`not mounted'), out of order (`inoperative'), has the same unit number as some other drive (`duplicate'), or has been disabled for diagnostics (`in diagnosis').
- unit available (unknown subcode) (code 4, subcode N)
- The controller has decided to report a perfectly normal event as an error. (Why?)
- media format error (...) (code 5, subcode N)
- The drive cannot be used without reformatting. The Format Control Table cannot be read (`fct unread - edc'), there is a bad sector header (`invalid sector header'), the drive is not set for 512-byte sectors (`not 512 sectors'), the drive is not formatted (`not formatted'), or the FCT has an uncorrectable ECC error (`fct ecc').
- write protected (...) (code 6, subcode N)
- The drive is write protected, either by the front panel switch (`hardware') or via the driver (`software'). The driver never sets software write protect.
- compare error (unknown subcode) (code 7, subcode N)
- A compare operation showed some sort of difference. The driver never uses compare operations.
- data error (...) (code 8, subcode N)
- Something went wrong reading or writing a data sector. A `forced error' is a software-asserted error used to mark a sector that contains suspect data. Rewriting the sector will clear the forced error. This is normally set only during bad block replacement, and the driver does no bad block replacement, so these should not occur. A `header compare' error probably means the block is shot. A `sync timeout' presumably has something to do with sector synchronisation. An `uncorrectable ecc' error is an ordinary data error that cannot be fixed via ECC logic. A `N symbol ecc' error is a data error that can be (and presumably has been) corrected by the ECC logic. It might indicate a sector that is imperfect but usable, or that is starting to go bad. If any of these errors recur, the sector may need to be replaced.
- host buffer access error (...) (code N, subcode N)
- Something went wrong while trying to copy data to or from the host (VAX). The subcode is one of `odd xfer addr', `odd xfer count', `non-exist. memory', or `memory parity'. The first two could be a software glitch; the last two indicate hardware problems.
- controller error (...) (code N, subcode N)
- The controller has detected a hardware error in itself. A `serdes overrun' is a serialiser / deserialiser overrun; `edc' probably stands for `error detection code'; and `inconsistent internal data struct' is obvious.
- drive error (...) (code N, subcode N)
- Either the controller or the drive has detected a hardware error in the drive. I am not sure what an `sdi command timeout' is, but these seem to occur benignly on occasion. A `ctlr detected protocol' error means that the controller and drive do not agree on a protocol; this could be a cabling problem, or a version mismatch. A `positioner' error means the drive seek hardware is ailing; `lost rd/wr ready' means the drive read/write logic is sick; and `drive clock dropout' means that the drive clock logic is bad, or the media is hopelessly scrambled. I have no idea what `lost recvr ready' means. A `drive detected error' is a catch-all for drive hardware trouble; `ctlr detected pulse or parity' errors are often caused by cabling problems.
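The numbered detail codes above lend themselves to a small lookup table. The sketch below is illustrative and is not the driver's own table: the names are hypothetical, it covers only codes 0 through 7 (taking 7 to be `compare error'), and it omits the messages that the list above gives only as `code N'.

```c
#include <stddef.h>

/* Hypothetical table: detail message text indexed by code 0 to 7. */
static const char *const uda_detail_names[] = {
	"success",            /* code 0 */
	"invalid command",    /* code 1 */
	"command aborted",    /* code 2 */
	"unit offline",       /* code 3 */
	"unit available",     /* code 4 */
	"media format error", /* code 5 */
	"write protected",    /* code 6 */
	"compare error",      /* code 7 */
};

/* Returns the message for a code, or "unknown" if it is not in the table. */
static const char *
uda_detail_name(unsigned int code)
{
	size_t n = sizeof(uda_detail_names) / sizeof(uda_detail_names[0]);

	return code < n ? uda_detail_names[code] : "unknown";
}
```

A log-scanning tool could use such a table to check that the message text and the numeric code in a detail line agree.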
SEE ALSO
HISTORY
The uda driver appeared in 4.2BSD.