continuously for years without errors, and in some cases recover by
themselves if an error occurs. Therefore, the software is usually
developed and tested more carefully than software for personal
computers, and unreliable mechanical moving parts such as disk drives,
switches, or buttons are avoided.
Specific reliability issues may include:
1. The system cannot safely be shut down for repair, or it is too
inaccessible to repair. Examples include space systems, undersea
cables, navigational beacons, bore-hole systems, and automobiles.
2. The system must be kept running for safety reasons. "Limp modes"
are less tolerable. Often backups are selected by an operator.
Examples include aircraft navigation, reactor control systems,
safety-critical chemical factory controls, train signals, and engines
on single-engine aircraft.
3. The system will lose large amounts of money when shut down.
Examples include telephone switches, factory controls, bridge and
elevator controls, funds transfer and market making, and automated
sales and service.
A variety of techniques are used, sometimes in combination, to recover
from errors, both software bugs such as memory leaks and soft errors
in the hardware:
* watchdog timer that resets the computer unless the software
periodically notifies the watchdog
* subsystems with redundant spares that can be switched over to
* software "limp modes" that provide partial function
* Immunity Aware Programming
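
The watchdog technique in the list above can be illustrated with a
small simulation. This is only a sketch of the logic, not firmware: a
real watchdog is a hardware counter, and the names SimulatedWatchdog,
kick, tick, and run are hypothetical, chosen for this example. The
timeout is assumed to be at least 2 ticks, so that healthy software
that notifies the watchdog every tick never triggers a reset.

```python
class SimulatedWatchdog:
    """Software model of a watchdog timer (hypothetical, for illustration)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.reset_count = 0

    def kick(self):
        """Healthy software calls this to restart the countdown."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """Advance one time unit; return True if a reset was forced."""
        self.remaining -= 1
        if self.remaining <= 0:
            self.reset_count += 1
            self.remaining = self.timeout_ticks
            return True
        return False


def run(total_ticks, hang_at, timeout_ticks=3):
    """Simulate a main loop that hangs at tick `hang_at`.

    Returns the list of ticks at which the watchdog reset the system.
    """
    dog = SimulatedWatchdog(timeout_ticks)
    hung = False
    resets = []
    for t in range(total_ticks):
        if t == hang_at:
            hung = True       # simulated software hang: no more kicks
        if not hung:
            dog.kick()        # periodic notification from healthy software
        if dog.tick():
            resets.append(t)
            hung = False      # a reset restarts the software in a known state
    return resets
```

In this model, a loop that hangs stops calling kick, the countdown runs
out, and the simulated reset returns the software to a state where it
notifies the watchdog again. A loop that never hangs produces no
resets.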