HW4: Chapters 11 & 12
11.4 One type of fault-tolerant architecture is a Protection System, a specialized system that is associated with some other system. A protection system independently monitors its environment; if its sensors indicate a problem that the controlled system is failing to deal with, the protection system is activated and shuts down the failing process or equipment.
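A minimal Python sketch of the idea, assuming a single numeric sensor and a fixed safety threshold. The class name ProtectionSystem, the field max_safe_value, and the threshold of 100.0 are all hypothetical. The protection system reads its own sensor, independently of the control system, and latches into a shut-down state as soon as a reading goes out of range.

```python
from dataclasses import dataclass


@dataclass
class ProtectionSystem:
    """Hypothetical protection system with its own sensor input,
    separate from the control system it protects."""

    max_safe_value: float  # assumed safety threshold (hypothetical)
    tripped: bool = False  # becomes True once a shutdown is triggered

    def check(self, sensor_reading: float) -> bool:
        """Return True if the equipment may keep running."""
        if sensor_reading > self.max_safe_value:
            # The control system failed to keep the process in range:
            # shut the equipment down and stay shut down (latch).
            self.tripped = True
        return not self.tripped


if __name__ == "__main__":
    guard = ProtectionSystem(max_safe_value=100.0)
    for reading in [80.0, 95.0, 120.0, 90.0]:
        status = "running" if guard.check(reading) else "SHUT DOWN"
        print(f"reading={reading}: {status}")
```

Latching is a design choice here: once 120.0 trips the system, the later in-range reading of 90.0 still reports SHUT DOWN, so restarting would have to be a separate, deliberate action.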
There are also self-monitoring architectures, in which the same computations are carried out on separate channels and the outputs of these computations are compared. If the outputs differ, a failure is assumed and signalled. To be effective in detecting both hardware and software faults, the hardware used in each channel should be diverse (for example, processors from different manufacturers), and the software used in each channel should be diverse as well. The Airbus 340 flight control system uses such an architecture with 5 self-checking computers. In more than 15 years of operation, control of the aircraft has never been lost due to a total flight control system failure!
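The comparison step of a self-monitoring architecture can be sketched as follows. The two "channels" are simply two differently written Python functions for the same specification (squaring a number); real channels would run on diverse hardware. The names self_checking, ChannelMismatch, and the square_* functions are hypothetical, and the faulty version is seeded on purpose to show detection.

```python
import math
from typing import Callable


class ChannelMismatch(Exception):
    """Raised when the two channels produce different results."""


def self_checking(channel_a: Callable[[float], float],
                  channel_b: Callable[[float], float],
                  x: float,
                  tolerance: float = 1e-9) -> float:
    """Run the same computation on two channels and compare the outputs."""
    a = channel_a(x)
    b = channel_b(x)
    if abs(a - b) > tolerance:
        # Disagreement means at least one channel is faulty: signal failure.
        raise ChannelMismatch(f"channel outputs differ: {a} vs {b}")
    return a


# Two independently written versions of the same spec: "return x squared".
def square_v1(x: float) -> float:
    return x * x


def square_v2(x: float) -> float:
    return math.pow(x, 2)


# A deliberately faulty version, used only to show that detection works.
def square_faulty(x: float) -> float:
    return x * 2


if __name__ == "__main__":
    print(self_checking(square_v1, square_v2, 3.0))
    try:
        self_checking(square_v1, square_faulty, 3.0)
    except ChannelMismatch as err:
        print("Failure signalled:", err)
```

The sketch also shows the limit of the approach: if both channels contained the same fault, their outputs would agree and the comparison could not detect it, which is why the channels need to be diverse.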
Finally, N-version programming architectures apply to software the tried and proven concept of triple modular redundancy (TMR), which comes from hardware architectures. In TMR, the output from each of three units is passed to an output comparator, usually implemented as a voting system: if two or more outputs are the same, that value becomes the system output. If one unit fails, its output is ignored, and the unit is repaired when possible. With software, we implement three different versions, developed by different development teams but from the same specification. In the event of a single failure, at least two of the versions should still agree (in a 3-version architecture).
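The voting step of a 3-version architecture can be sketched in Python as a strict-majority voter. The three "versions" below are toy functions for the spec "return twice the input", one of them with a seeded fault; all function names are hypothetical and only illustrate the technique.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(outputs: Sequence[T]) -> Optional[T]:
    """Return the value produced by a strict majority of versions,
    or None if no majority exists (the voter cannot decide)."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def n_version(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every version on the same input and vote on the results."""
    return majority_vote([version(x) for version in versions])


# Three versions of the spec "return twice the input", written differently.
def double_a(x: int) -> int:
    return x + x


def double_b(x: int) -> int:
    return 2 * x


def double_faulty(x: int) -> int:
    return x + 1  # seeded fault


if __name__ == "__main__":
    # The faulty version's 22 is outvoted by the two correct versions.
    print(n_version([double_a, double_b, double_faulty], 21))
```

Returning None when no value has a majority lets the caller treat "no agreement" as a detected failure instead of silently picking one of the outputs.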
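What all three have in common is that they use software diversity to achieve fault tolerance. This means that if each channel has a probability of failure on demand (POFOD) of 0.001 and the channels fail independently, then the probability that all three channels of a 3-channel system fail on the same demand is 0.001 × 0.001 × 0.001 = 10^-9, a million times lower than the POFOD of a single channel. In practice, diverse versions do not fail completely independently, so the real improvement is smaller.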
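The arithmetic can be checked with a short Python sketch, under the same assumption that channel failures are independent. The first function multiplies the per-channel POFOD, as in the text. The second is an addition for comparison: if the voter needs two agreeing outputs, the system can already fail when two of the three channels fail, so it computes that (larger) probability with the binomial formula. Both function names are hypothetical.

```python
import math


def all_channels_fail(p: float, n: int = 3) -> float:
    """POFOD when the system fails only if all n independent channels fail."""
    return p ** n


def at_least_k_fail(p: float, n: int = 3, k: int = 2) -> float:
    """Probability that k or more of n independent channels fail on the
    same demand (binomial distribution)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.001
    print(f"single channel:     {p:.1e}")
    print(f"all three fail:     {all_channels_fail(p):.1e}")
    print(f"improvement factor: {p / all_channels_fail(p):.0f}")
    print(f"two or more fail:   {at_least_k_fail(p):.3e}")
```

With p = 0.001 the all-three figure is 10^-9, while two-or-more failures come out at roughly 3 × 10^-6; both rest entirely on the independence assumption.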
11.7 Using N-version programming for a radiation therapy machine seems like overkill. A radiation therapy machine delivers controlled doses of radiation to treat cancer patients (radiation therapy is a separate treatment from chemotherapy, which uses drugs). In this case, the most important thing is not to blast the patient with a dangerously high dose of radiation that could make things worse. I think a better approach would be to use a Protection System that monitors the Control System to make sure the appropriate amount of radiation is being delivered and, when a fault occurs, takes over and shuts the machine down. I could be seen as a horrible person for suggesting that a cancer patient doesn’t deserve the most reliable control software possible, but this kind of therapy isn’t always successful to begin with, and I’d reserve the resources that N-version programming demands for systems whose failure threatens more lives, such as transportation systems.
11.9 Handling all exceptions within a program makes it possible to detect and recover from input errors and unexpected external events that would otherwise lead to system failure. If every possible exception is handled, the program can keep running (or degrade gracefully) instead of crashing, which provides a high degree of fault tolerance. Fewer failures mean less downtime, and that is exactly what a system with high availability demands.
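Handling exceptions around input processing can be sketched like this, assuming readings arrive as strings and that keeping the last good value is an acceptable recovery. That recovery policy and the names parse_reading and process_stream are hypothetical; a real system would choose a recovery action suited to its domain.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def parse_reading(raw: Optional[str], last_good: float) -> float:
    """Parse one numeric reading. Malformed or missing input is handled
    instead of crashing: the error is logged and the last good value is kept."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        logger.warning("Rejected malformed reading %r; keeping %s", raw, last_good)
        return last_good


def process_stream(raws: List[Optional[str]], initial: float = 0.0) -> List[float]:
    """Process a stream of readings; one bad reading does not stop the service."""
    last_good = initial
    results = []
    for raw in raws:
        last_good = parse_reading(raw, last_good)
        results.append(last_good)
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(process_stream(["1.0", "bad", None, "2.5"]))
```

Here "bad" raises ValueError and None raises TypeError inside float(); both are caught, logged, and the stream keeps being processed, which is the availability benefit described above.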
12.5
1. The system is free to increase speed up to the maximum allowed speed limit if the signal status of the upcoming track segment is not red.
2. The system will begin applying the brakes to decrease speed if it is nearing a track segment with a signal status of red.
3. The system will release the brakes and continue normal operation if the brakes were being applied because of a red signal status that is no longer red.
4. The system will apply the emergency brakes once the speed is below 20 MPH if it is nearing a track segment with a signal status of red.
5. The system is free to decrease speed down to the minimum allowed speed limit if the signal status of the upcoming track segment is not red.
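The five requirements above can be sketched as a single decision function, mainly to check that they cover every case without contradicting each other. This is an interpretation, not a specification: the inputs approaching_red and braking_for_red, the Action names, and the reading of requirement 4 (service brakes first, emergency brakes once the train is below 20 MPH) are all assumptions.

```python
from enum import Enum, auto


class Action(Enum):
    NORMAL_OPERATION = auto()  # requirements 1 and 5
    APPLY_BRAKES = auto()      # requirement 2
    RELEASE_BRAKES = auto()    # requirement 3
    EMERGENCY_BRAKE = auto()   # requirement 4


# Threshold taken from requirement 4.
EMERGENCY_BRAKE_SPEED_MPH = 20.0


def decide(approaching_red: bool, speed_mph: float, braking_for_red: bool) -> Action:
    """Choose a braking action for the current control cycle.

    approaching_red: the upcoming track segment shows a red signal (assumed input).
    braking_for_red: brakes are currently applied because of an earlier red.
    """
    if approaching_red:
        if speed_mph < EMERGENCY_BRAKE_SPEED_MPH:
            return Action.EMERGENCY_BRAKE  # requirement 4
        return Action.APPLY_BRAKES          # requirement 2
    if braking_for_red:
        return Action.RELEASE_BRAKES        # requirement 3
    return Action.NORMAL_OPERATION          # requirements 1 and 5


def target_speed(requested_mph: float, min_limit_mph: float,
                 max_limit_mph: float) -> float:
    """Requirements 1 and 5: when not approaching red, keep the requested
    speed between the minimum and maximum allowed limits."""
    return max(min_limit_mph, min(requested_mph, max_limit_mph))
```

Checking red first means requirements 2 and 4 always take priority over the speed freedom in requirements 1 and 5, which is the safe ordering for a protection system.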