Designing for Safety - Part 4

by Wayne M. Krakau - Chicago Computer Guide, December 1999
It’s time to close out the coverage of designing safe systems. This is the fourth of the series. The goal is to inspire the creation of safer networks with a corresponding lessening of critical errors and subsequent down time.

A safety factor that is becoming more prominent over the years is power infrastructure. That is, we should consider not just the quality of the juice coming from the power company (or in this area frequently NOT coming from Com Ed), but also the quality of the power bouncing around within our buildings.

As faster and more sophisticated CPUs and support chips are created, they dump more and more garbage (electrical irregularities) onto the electrical wiring and in turn into other devices including computers and other network devices. Non-computer electric devices such as fluorescent lighting, copiers, refrigerators, etc. also pollute the electrical system within a building.

Standards violations within the wiring also contribute to electrical problems. A lot of the infractions are so obvious that even I can find them, but when you talk to power experts, their horror stories are positively mind-bending. Considering the number of (electrical) safety-related violations that they encounter on a regular basis, I’m quite surprised that we haven’t started recalling whole buildings the same way we recall cars with faulty wiring. Perhaps that will happen after enough people get zapped or barbecued.

I have found that, increasingly, the underlying electrical wiring is so bad that even a high-quality UPS (Uninterruptible Power Supply) isn’t enough to keep a file server or other network device running. A plague of mysterious glitches and gremlins is usually the first sign. The lockups and ABENDS (Abnormal End, an old term passed on from mainframes) are truly random from a computer processing point of view, though if you expand your thinking to include the possibility of power problems, you might notice some causal relationship with a power system-related event. These events could include motors cycling, air-conditioning systems starting, or my favorite, everybody switching on their PCs almost all at once the first thing in the morning.
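One way to spot that kind of relationship is to keep a simple log of when crashes happen and when power-related events happen, then look for crashes that follow an event closely. The following is only a rough sketch of that idea in Python. The function name, the two-minute window, and all of the log entries are made up for illustration; nothing here comes from a real site or a real monitoring tool.

```python
from datetime import datetime, timedelta

def events_near(crashes, power_events, window=timedelta(minutes=2)):
    """Pair each crash with the power events that began shortly before it.

    crashes      -- list of datetime objects (when a lockup or ABEND occurred)
    power_events -- list of (datetime, description) tuples
    window       -- how long after an event a crash still counts as "near" it
                    (two minutes is an arbitrary assumption)
    """
    matches = []
    for crash in crashes:
        nearby = [label for when, label in power_events
                  if timedelta(0) <= crash - when <= window]
        matches.append((crash, nearby))
    return matches

# Hypothetical log entries, for illustration only.
crashes = [
    datetime(1999, 11, 3, 8, 31),
    datetime(1999, 11, 3, 14, 5),
]
power_events = [
    (datetime(1999, 11, 3, 8, 30), "staff PCs switched on"),
    (datetime(1999, 11, 3, 11, 0), "air conditioning started"),
]

for crash, nearby in events_near(crashes, power_events):
    print(crash.strftime("%H:%M"), nearby or "no logged power event nearby")
```

A crash that lines up with an event once proves nothing. The same pairing showing up day after day is the clue worth chasing.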

Initially you usually test and/or replace various components within the server, trying to pin down the source of these non-repeatable errors. You might even swap out the UPS itself. Eventually you move on to manipulating electrical devices yourself. If you’re lucky, you will find that you can affect a computer’s reliability by turning on or adjusting things like fans, lights, or photocopiers. You might even have neighboring offices or even buildings cycle their major machinery, looking for a clue.

Rewiring is the most obvious answer, but that is often not financially feasible, or not even available as an option because the client has leased space and doesn’t have the authority to alter the electrical system.

I finally had to start adding a power conditioner to the UPS, or, if there wasn’t a UPS there in the first place, adding a combined conditioner/UPS. This gives near-perfect power to the server in question, but it does add noise to the common electrical system, so it might aggravate problems that other network devices are experiencing. In this case you might have to put power conditioners on all network devices.

If the power infrastructure is in really bad shape, or the network spreads out across multiple electrical areas within a building or even to other buildings, grounding may be an issue. Computers can only tell the ones from the zeros on the network cable if they have approximately the same idea as to what constitutes a one or a zero. The baseline is the electrical ground. If the ground is "bad" or simply different, communications can randomly or even completely fail.
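To see why a mismatched ground matters, here is a deliberately oversimplified Python sketch. It assumes a made-up signaling scheme in which a one is 5 volts and a zero is 0 volts above the sender's ground, and the receiver calls anything above 2.5 volts (measured from its own ground) a one. Those voltages, the function name, and the offsets are illustrative assumptions only. Real network hardware is more sophisticated than this, but the underlying idea of a shared reference is the same.

```python
def receive(bits, ground_offset, high=5.0, low=0.0, threshold=2.5):
    """Decode bits as seen by a receiver whose ground differs from the sender's.

    ground_offset is (sender ground - receiver ground) in volts.
    All voltage levels are illustrative, not taken from any real standard.
    """
    decoded = []
    for bit in bits:
        sent = high if bit else low      # voltage relative to the sender's ground
        seen = sent + ground_offset      # same signal relative to the receiver's ground
        decoded.append(1 if seen > threshold else 0)
    return decoded

message = [1, 0, 1, 1, 0]
print(receive(message, ground_offset=0.0))    # grounds agree: [1, 0, 1, 1, 0]
print(receive(message, ground_offset=3.0))    # sender ground 3 V higher: [1, 1, 1, 1, 1]
print(receive(message, ground_offset=-3.0))   # sender ground 3 V lower: [0, 0, 0, 0, 0]
```

With matching grounds the message arrives intact. With a large enough offset in either direction, every bit reads the same and the message is lost. A smaller or fluctuating offset would corrupt only some bits, which is the "random" failure case.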

Again, assuming that rewiring is not an option, you can use a device called a Ground Guard (PowerVar, www.powervar.com). It is available as an add-on device for an existing power conditioner or as an integrated unit built into a power conditioner. It will provide a rock-solid ground for the devices plugged into it. Its single disadvantage is that this new ground will almost certainly not match any other ground, thereby eliminating any chance of communications between protected (by a Ground Guard) and unprotected equipment. The only sure cure is to put a Ground Guard and power conditioner (combined or separate) on every device connected to the network. It isn’t cheap, but it is often a lot less expensive than either gutting the electrical wiring system and starting over, or putting up with an unstable network.

Naturally, if you are protecting the computer from electrical problems coming over the power lines, you should also protect it from power coming over the network cable. Remember that protection from both dangerous overvoltages and potentially disruptive noise operates on the "weakest link" theory. Miss one piece of the chain and you get nailed.

The final terminus of LAN safety (returning to the use of "safety" in relation to network reliability) is the workstation itself. Having recently covered some of the details of making a Windows 9x workstation more reliable, I will summarize my past suggestions.

First, keep the system patched and the software up to date.

Second, use your favorite utility software to solve and even prevent problems.

Third, think seriously about locking either some or even the whole user interface with either Policy Editor (included with Windows 9x) or Novell’s ZenWorks Starter Kit (included with NetWare). Specifically, I have found that Windows 98 and especially Windows 98SE (Second Edition) are particularly sensitive to crisscrossing changes made to the Display and to the Desktop Themes. It seems like these two programs within the Control Panel were written by different programming teams at different times, with conflicting goals. Changing your Windows environment using both programs makes Windows very unhappy. It demonstrates this by adding a new level of instability that will drive the user (and the system integrator) nuts.

I have been requesting that clients try to encourage their users to avoid using Desktop Themes until such time as Microsoft issues a fix. Display is an older program that is less likely to screw things up if it is used alone.

Again, the best way is to simply standardize on an interface and lock it in, but I have been all but threatened with severe bodily harm when I suggest this option to users, as they seem to feel that their civil rights are being violated if they can’t fiddle with the interface. Until somebody figures out a way to charge my fees for repairing a workstation (after too much "fiddling") back to a user’s salary, I don’t think the option of protecting the interface will catch on. After all, even Windows 3.x had a limited version of this problem (troubles caused by inappropriate customizing of the interface), and hardly anybody other than schools ever locked that interface.

Now, I’m waiting to see the effect that the upcoming Y2K-related made-for-TV movie will have on the public’s perception of computing. I have heard that it is incredibly inane in the way it ignores facts in favor of overdramatizing. Already at least two major motion pictures on the same subject have been cancelled, presumably because movie studios are owned by major international conglomerates who might be adversely financially affected by the panic potentially generated by these movies.

Oh well, I still haven’t been able to overcome the myths about computing that were initially spread by the 1983 movie War Games! I’m particularly offended by the stereotype of the overweight, bespectacled, face-fur-equipped guy spending all of his time on a computer, to the exclusion of any social life. Where do they get such silly ideas? Now, let me clean up these candy wrappers and set aside my glasses so I can trim my beard.

©1999, Wayne M. Krakau