hardware timers/counters for quick crash recovery
Hardware watchdog timers are devices that reboot the machine when it hangs. The
kernel resets the watchdog clock at regular intervals. Thus, if the
kernel halts, the clock will time out and reset the machine. Watchdog timers
may be configured to be reset from userland to cause a reboot if process
scheduling fails; see
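The reset-or-reboot mechanism can be illustrated with a small simulation. This is a hypothetical Python model, not kernel code: Watchdog, tickle(), tick() and simulate() are illustrative names, time advances in one-second steps, and the period is assumed to be positive.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Toy model of a hardware watchdog (hypothetical, for illustration only)."""

    period: int                                        # timeout in seconds; assumed > 0
    remaining: int = field(init=False)                 # seconds left before reset fires
    rebooted: bool = field(default=False, init=False)  # set once the timer expires

    def __post_init__(self) -> None:
        if self.period <= 0:
            raise ValueError("this sketch assumes a positive period")
        self.remaining = self.period

    def tickle(self) -> None:
        """Reset the countdown to the full period, as the kernel does regularly."""
        self.remaining = self.period

    def tick(self) -> None:
        """Advance the hardware clock by one second; reboot when it reaches zero."""
        if self.rebooted:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.rebooted = True


def simulate(period: int, seconds: int, kernel_halts_at: int, tickle_every: int) -> Watchdog:
    """Run the model for `seconds` seconds. The kernel tickles the watchdog
    every `tickle_every` seconds until it halts at second `kernel_halts_at`."""
    wd = Watchdog(period)
    for t in range(seconds):
        if t < kernel_halts_at and t % tickle_every == 0:
            wd.tickle()
        wd.tick()
    return wd
```

For example, simulate(5, 100, 100, 2).rebooted is False, while simulate(5, 100, 10, 2).rebooted is True: once the kernel stops tickling at second 10, the countdown runs out a few seconds later.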
A number of hardware watchdogs are supported, and all are configured using
- Automatically reset (‘tickle’) the watchdog timer but
disable it at system shutdown time.
- The timeout in seconds. Setting it to zero disables the watchdog.
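The interaction of the two settings can be summarized as a small state model. This is a hypothetical Python sketch: WatchdogSettings, auto, period, kernel_tickles() and shutdown() are illustrative stand-ins for the real configuration variables, the default values are chosen for illustration only, and "disabled" is modeled as a period of zero. The shutdown rule also reflects the note below that an enabled watchdog stays enabled at shutdown when automatic mode is off.

```python
from dataclasses import dataclass


@dataclass
class WatchdogSettings:
    """Hypothetical model of the two watchdog settings described above."""

    auto: bool = True   # kernel tickles automatically and disables at shutdown
    period: int = 0     # timeout in seconds; 0 means the watchdog is disabled

    @property
    def enabled(self) -> bool:
        return self.period > 0

    def set_period(self, seconds: int) -> None:
        """Set the timeout; zero disables the watchdog."""
        if seconds < 0:
            raise ValueError("period cannot be negative")
        self.period = seconds

    def kernel_tickles(self) -> bool:
        """Whether the kernel itself resets the timer (automatic mode only)."""
        return self.enabled and self.auto

    def shutdown(self) -> None:
        """System shutdown: only automatic mode disables the watchdog."""
        if self.auto:
            self.period = 0
```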
In situations where the machine provides vital services which are not handled
completely in kernel space, e.g. mail exchange, it may be desirable to reboot
the machine if process scheduling fails. This is done by setting
to zero and running a
process which repeatedly sets
to the desired timeout
value. Then, if process scheduling fails, the process resetting the timer will
not run, so the timer expires and the machine is rebooted. Note that the kernel will
not automatically disable an enabled watchdog at system shutdown time when
is set to zero.
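The user-space scheme can be sketched as a time-stepped simulation. It is a hypothetical Python model, not a real interface: the kernel is assumed not to tickle the timer, the process is assumed to run every interval seconds while scheduling works, and each run rewrites the period, which re-arms the countdown. Requiring the interval to be shorter than the timeout is a choice made in this sketch so that the timer is always re-armed before it can expire.

```python
from typing import Optional


def simulate_userland_watchdog(period: int, interval: int, seconds: int,
                               scheduling_fails_at: Optional[int]) -> Optional[int]:
    """Return the second at which the machine reboots, or None if it never does.

    Hypothetical model: automatic mode is off, so only a user process resets
    the timer. It runs every `interval` seconds and rewrites the period, which
    re-arms the countdown. From `scheduling_fails_at` onward the process is
    never scheduled again.
    """
    if not 0 < interval < period:
        raise ValueError("the process must run more often than the timeout")
    remaining = period
    for t in range(seconds):
        process_runs = scheduling_fails_at is None or t < scheduling_fails_at
        if process_runs and t % interval == 0:
            remaining = period        # rewriting the period re-arms the timer
        remaining -= 1                # one second of hardware time passes
        if remaining <= 0:
            return t                  # timer expired: the hardware resets the machine
    return None
```

With a 10-second timeout and the process running every 5 seconds, simulate_userland_watchdog(10, 5, 100, None) returns None, while simulate_userland_watchdog(10, 5, 100, 20) returns 24: the last successful run is at second 15, and the countdown expires within one period of it.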
Watchdog timers should be used in high-availability environments where getting
machines up and running quickly after a crash is more important than
determining the cause of the crash. A watchdog timer enables a crashed machine
to autonomously attempt to recover quickly after a system failure.
Note that this also means that it is unwise to combine watchdog timers with
since the latter may stop the watchdog timeout from being reset before it
expires. This means
that the machine will be rebooted before any debugging can be done. In other
words: For mission-critical machines, disable
will give the chance to perform a crash dump and reboot. Simply setting the
watchdog will lose the debug trace of what went wrong.
For systems with multiple watchdog timers available, only a single one can be
used at a time. There is currently no way of selecting which device is used;
the first discovered by the kernel is selected.
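The selection rule amounts to "first one discovered wins". The following is a hypothetical Python sketch of that rule only; WatchdogRegistry, attach() and device names such as wdog0 are illustrative and are not the kernel's interface.

```python
from typing import List, Optional


class WatchdogRegistry:
    """Hypothetical sketch: only the first watchdog discovered is used."""

    def __init__(self) -> None:
        self.active: Optional[str] = None   # the device actually driving the watchdog
        self.ignored: List[str] = []        # devices found later, never used

    def attach(self, device: str) -> bool:
        """Called as each device is discovered; returns True if it was selected."""
        if self.active is None:
            self.active = device
            return True
        self.ignored.append(device)
        return False
```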