Designing for Safety - Part 1

by Wayne M. Krakau - Chicago Computer Guide, September 1999
After observing a new client company unnecessarily brought to its knees by the crash of its main file server, I decided I’d better publish some information on designing a LAN so that a single point of failure will be only an inconvenience, as opposed to a catastrophe. The irony is that most of these safety-related design techniques are comparatively inexpensive, so implementing them is really a form of cheap insurance that won’t break most budgets.

Despite my company’s current NetWare-centric attitude, most of the techniques and products covered here are applicable to any network operating system (NOS). (Note that since Microsoft’s Windows 2000 Project Manager has publicly claimed, as published in the August 18 issue of Computer World Online News, www.computerworld.com, that "Windows 2000 users would experience thirty times fewer crashes than they do with NT," one simple safety technique would be to run NetWare at least until a fully patched version of Windows 2000 is available, presumably six months to one year after Windows 2000 is released. This is, of course, assuming that you have an overwhelming need to switch your servers to Windows 2000.)

The first design technique I’ll suggest is to use true dual servers. I know that this suggestion seemingly violates my "inexpensive" claim, but for those companies that lose large amounts of money any time that their servers are down, anything that improves uptime is cheap.

My favorite software for implementing true dual servers is Vinca’s StandbyServer (www.vinca.com) software. It comes in both NetWare and NT versions. Novell likes it so much that they now sell it. With StandbyServer, a secondary server is set up with special drivers that make it look like a mirrored disk drive on the first server, even though it is really an independent server running a separate copy of the NOS. Its disk drives are kept in synch with the primary server’s disk drives via a direct network link between the two machines. For added safety, the two servers can be separated so that major disasters (fires, floods, etc.) won’t affect both.

When the secondary server notices that the primary server is no longer responding via either the dedicated link or any of the general-purpose network segments that the two servers share (presumably because the primary has stopped functioning), it immediately reboots and wakes up using the identity of the primary server. Since it has the original primary server’s identity, all workstations and other network-connected devices think they are still talking to the original primary server.
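The takeover rule described above (act only when the primary is silent on every link, not just one) can be sketched in a few lines. This is a minimal illustration, not Vinca's implementation: the class name, the link names, and the 10-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FailoverMonitor:
    """Hypothetical model of the secondary server's view of the primary."""
    timeout_seconds: float        # assumed silence threshold, not a real product setting
    last_seen: Dict[str, float]   # link name -> time (in seconds) of the last reply

    def record_reply(self, link: str, now: float) -> None:
        # Remember when the primary last answered on this link.
        self.last_seen[link] = now

    def primary_is_down(self, now: float) -> bool:
        # Declare failure only if at least one link is monitored and
        # EVERY monitored link has been silent longer than the timeout.
        return bool(self.last_seen) and all(
            now - seen > self.timeout_seconds
            for seen in self.last_seen.values()
        )


# Usage: the dedicated link goes quiet, but the shared LAN segment still answers.
monitor = FailoverMonitor(timeout_seconds=10.0,
                          last_seen={"dedicated": 0.0, "lan": 0.0})
monitor.record_reply("lan", 95.0)
still_up = monitor.primary_is_down(now=100.0)    # False: the LAN replied 5 seconds ago
failed = monitor.primary_is_down(now=120.0)      # True: both links silent for over 10 seconds
```

Requiring silence on all links is what keeps a single unplugged cable from triggering a takeover while the primary is still running.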

On a system that I set up for an engineering firm using NetWare 4.11, the 22GB disk array on the former secondary (now primary) server mounted in less than 90 seconds. That’s easily fast enough for the NetWare client software on the workstations to automatically reconnect each workstation and reestablish all drive mappings. This means that users merely have to wait a little while and then click on "RETRY" to continue working. Those who don’t happen to read from or write to the server won’t even have to do that. Note that NetWare 5 should, theoretically, mount volumes even faster than NetWare 4.11.

If you can’t justify the expense of true dual servers, you can use the concept of twin servers (my name for it) that most of my clients use. With this method, you purchase one workstation at the same time as you purchase your file server. It should be a twin of the file server, with the same motherboard, cards, and non-NOS (to be explained later) disk drives.

If that machine is used on a day-to-day basis as a workstation, you only have to worry about justifying the incremental cost increase over a regular workstation needed to make this machine a twin. That might include the cost of a better overall model of computer. It could also include upgrading to a faster CPU. It would certainly include the added cost of extra RAM, and probably a pair of SCSI controllers (one for disks and one for a tape drive). It might also include either a high-performance or even a multi-port network card. It takes more time to set up and maintain twins, too, so be prepared for some additional labor expenses.

The key to twin servers is using external tape and NOS disk drives in concert with one or more (for redundancy) internal non-NOS drives. If the primary file server crashes and can’t easily be revived, just connect the external tape and disk drives and the UPS (You do have one attached to your file server, don’t you?) to the secondary file server and reboot it, selecting the predefined server mode. In a matter of minutes, you’ve gone from having a completely idle network to just being short one workstation, a survivable event for most businesses. I’ve taught this drive-switching trick to complete computer amateurs, and have observed them using it successfully. It’s not difficult to learn.

External tape drives don’t cost that much more than internal tape drives. While I’ll cover the details of disk drive options later, I will state now that external disk enclosures that can handle plain, mirrored, or duplexed drives are inexpensive. Also, most aftermarket RAID (Redundant Array of Independent Disks) systems are only available as external units. If you use a RAID system from a computer manufacturer that uses internal RAID systems, you will have to manually transfer each individual drive from one machine to the other and probably have to manually reconfigure the RAID controller, too, eliminating much of the inherent ease-of-use of the concept of twin servers.

To make twin servers work in a NetWare environment, you must have identical MS-DOS partitions on the non-NOS internal drives of the two servers. Since a NetWare server’s identity is contained on its NOS drives, moving the external drives from one machine to the other will also move the server’s ID, so that a quick switch might be fast enough for the workstations to automatically reconnect. Realistically, it takes a few minutes of testing (unless there is smoke pouring out) to determine that you should give up on the main server, so rebooting workstations is often required. Keep in mind, however, that the alternative is to sit and stare at a dead network while waiting for the primary server to be fixed. Even if you have in-house techies and a stock of spare parts, this could take some time.

Since MS-DOS is no longer a viable workstation operating system for most people, I use PowerQuest’s (www.powerquest.com) PartitionMagic to split the internal drives into one MS-DOS partition at the beginning and multiple 16-bit (for compatibility) Win 95/98 partitions. (Since I also like PowerQuest’s DriveImage, I usually sell DriveImage Pro, which contains both DriveImage and PartitionMagic and is available at an incredibly low per-seat cost when purchased for the whole network.)

PartitionMagic also contains PowerQuest’s BootMagic, which allows you to select which partition (and operating system) to boot from. I set the primary server to default to booting MS-DOS, which will then automatically start NetWare. I set the secondary server to default to booting Win 98. In each case, I use BootMagic to display a menu with a 30-second countdown that allows you to override the default operating system selection.

After initial configuration and testing, my only concern is that the non-NOS disks in each of the servers stay in synch. If I update NetWare, I may have to copy information between the MS-DOS partitions. If I update Win 98, or add programs to it, I must copy any changed information between Win 98 partitions. The easiest way to do the Win 98 copying is with DriveImage. I can create an image of one machine, copy it to the network, and "plant" it on the other computer, though I would want to manually synchronize the two MS-DOS partitions first.
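Checking that the two machines really are in synch can be automated. The sketch below is a present-day illustration in Python, not something PartitionMagic or DriveImage provides: it assumes both MS-DOS partitions (or copies of them) are reachable as ordinary directory trees, and the function names are invented for the example.

```python
import hashlib
import os
from typing import Dict, List


def hash_tree(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(full, root)] = digest
    return digests


def out_of_sync(primary_root: str, secondary_root: str) -> List[str]:
    """Return the sorted relative paths that are missing or different on either side."""
    primary = hash_tree(primary_root)
    secondary = hash_tree(secondary_root)
    return sorted(
        path
        for path in primary.keys() | secondary.keys()
        if primary.get(path) != secondary.get(path)
    )
```

An empty list from `out_of_sync` means the two trees hold the same files with the same contents. Any path it does return is a file to copy across before the twin can be trusted to stand in for the primary.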

I finally got tired of hearing clients protest the purchase of an old, theoretically dead, operating system like MS-DOS, so I stockpiled packages of MS-DOS and I now give them away with new servers. That solves the licensing problem without any misunderstandings. There are other MS-DOS-like systems out there, but for maximum compatibility, I prefer using the real thing. Note that many server manufacturers, especially (though not limited to) mail order houses, explicitly instruct you to pirate MS-DOS onto their NetWare servers, both in their documentation and via phone calls to tech support. Giving away MS-DOS also eliminates the necessity of having my company assist a client in doing that piracy, thereby keeping both organizations out of court.

Next month I’ll continue covering safety-related design issues for networks, including detail on the disk options mentioned above. Meanwhile, thinking about smoking computers made me remember an emergency phone call I made to my boss in the middle of the night while working as a mainframe computer operator years ago. When he answered the phone, still half asleep, I said, "The good news is that the fire is out!" I don’t think he ever forgave me for phrasing it that way.

©1999, Wayne M. Krakau