SFT III-PEAT, Part One

by Wayne M. Krakau - Chicago Computer Guide, August 1993

SFT III is here. The ultimate level of Novell's System Fault Tolerance (hence SFT) has been established as a practical alternative for companies with the highest uptime requirements or with demanding performance needs. This is firsthand testimony, since I have just put up my first production Netware 3.11 SFT III system at a client site.

SFT Levels I and II include such now-common features as Hot Fix (really Dynamic Bad Block Remapping), an intelligent UPS (uninterruptible power supply) interface, duplicate FATs (file allocation tables), duplicate DETs (directory entry tables), disk mirroring (two disks holding the same information, accessed via a single controller), and disk duplexing (mirroring using two separate controllers).

SFT Level III adds mirrored servers connected by a dedicated high-speed link called (naturally enough) the Mirrored Server Link or MSL. Just as the secondary disk of a mirrored pair transparently takes over the duty of supplying disk services to its server when the primary drive fails, the secondary server in an SFT III pair takes over when the primary system fails.

Unlike the mirrored disks, the mirrored servers are smart enough to switch identities dynamically as one or the other goes down, is repaired or reconfigured as necessary, and comes back up. For instance, if the primary server is taken down for maintenance or enhancement, the secondary server not only provides server resources to the users on the network, it reassigns itself as the primary server. When the former primary server is reawakened, it will recognize that a primary server already exists and will automatically configure itself as the secondary server. At that point, the primary server will recognize that the new secondary server has its disk drives out of synch with its drives and will issue an internal remirror command.

As with mirrored disks, this command causes the secondary server's disks to be updated so that synchronization is reestablished, but only after the memory within the two servers has been synchronized. The other (now primary) server can then be brought down for its own maintenance or enhancement.
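The role swap described above can be sketched as a small state model. This is only an illustrative toy, not Novell's implementation: the class names, the server labels "A" and "B", and the log messages are all hypothetical, and the model assumes the surviving server is always fit to become primary.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str = "down"     # "primary", "secondary", or "down"
    in_sync: bool = False  # True once memory and disks match the primary


class MirroredPair:
    """Toy model of the SFT III role swap (hypothetical, for illustration)."""

    def __init__(self, a, b):
        self.servers = {a: Server(a), b: Server(b)}
        self.log = []

    def _other(self, name):
        return next(s for n, s in self.servers.items() if n != name)

    def bring_up(self, name):
        server, other = self.servers[name], self._other(name)
        if other.role == "primary":
            # A primary already exists, so this machine joins as secondary.
            server.role = "secondary"
            self.log.append(f"{name} joins as secondary")
            self._remirror(server)
        else:
            server.role, server.in_sync = "primary", True
            self.log.append(f"{name} becomes primary")

    def take_down(self, name):
        server, other = self.servers[name], self._other(name)
        was_primary = server.role == "primary"
        server.role, server.in_sync = "down", False
        self.log.append(f"{name} goes down")
        if was_primary and other.role == "secondary":
            # The survivor reassigns itself as the primary server.
            other.role = "primary"
            self.log.append(f"{other.name} reassigns itself as primary")

    def _remirror(self, secondary):
        # Memory is synchronized first, then the disks are remirrored.
        self.log.append("synchronize memory")
        self.log.append("remirror disks")
        secondary.in_sync = True


pair = MirroredPair("A", "B")
pair.bring_up("A")   # A becomes primary
pair.bring_up("B")   # B joins as secondary and is remirrored
pair.take_down("A")  # B reassigns itself as primary
pair.bring_up("A")   # A rejoins as secondary and is remirrored
for line in pair.log:
    print(line)
```

After this sequence the two machines have exchanged roles, which is the point of the scheme: either box can be serviced while the other carries the users.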

A requirement for SFT III is a pair of matched servers. Though it is technically possible to have a pair that differs in architecture, memory, or disk characteristics, a brief scan of the SFT III Forum of NetWire (Novell's section within the CompuServe Information Service) will convince you to stick with absolutely identical systems. It seems that everyone in the world (CompuServe is accessible in many countries) who has tried unmatched systems is on NetWire begging for help, unable to configure a stable network.

I chose a pair of DTK 486/66DX2 EISA (Extended Industry Standard Architecture) tower systems with 1GB of disk storage (one billion bytes, where one byte is needed for each character). The disk drives communicate via Adaptec 1742 EISA busmastering 32-bit SCSI-II (Small Computer Systems Interface, Version Two, pronounced Scuzzy-Two) controllers (really host bus adapters or HBAs). I also chose an identical machine to run regular Netware 3.11 for another department. The only difference was that the mirrored servers had thirty-two megabytes of memory while the single server was equipped with only sixteen megabytes. The extra memory was to accommodate both a larger department running more sophisticated applications and the extra four megabytes of overhead required by SFT III.

The next requirement is for a pair of Mirrored Server Link (MSL) network interface cards (NICs). Only three companies make NICs approved for use as MSLs: Thomas-Conrad, Eagle, and Plaintree. They use three separate proprietary methods for passing data at 100 Mbps (megabits per second, or millions of bits per second, where eight bits are needed for each character of information before overhead is added). I chose Thomas-Conrad's TCNS (Thomas-Conrad Networking System) for my client since it has been around the longest, and we have had successful experience with that company in general and with the TCNS system in particular.
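To put the link speed in perspective, the arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch only: it uses the raw 100 megabit rate and the 1GB disk size quoted in this article, ignores all protocol and disk overhead, and the function names are hypothetical. Real remirror times would be longer.

```python
LINK_BITS_PER_SECOND = 100_000_000  # raw MSL rate, before any overhead
BITS_PER_CHARACTER = 8              # eight bits per character (byte)
DISK_BYTES = 1_000_000_000          # one 1GB drive, as quoted in the article


def raw_bytes_per_second(bits_per_second):
    """Convert a raw bit rate to characters (bytes) per second."""
    return bits_per_second / BITS_PER_CHARACTER


def best_case_seconds(num_bytes, bits_per_second):
    """Lower bound on transfer time, assuming zero overhead (an idealization)."""
    return num_bytes / raw_bytes_per_second(bits_per_second)


print(raw_bytes_per_second(LINK_BITS_PER_SECOND))           # 12500000.0
print(best_case_seconds(DISK_BYTES, LINK_BITS_PER_SECOND))  # 80.0
```

In other words, the link can carry at most 12.5 million characters per second, so copying a full 1GB drive across it could never take less than about eighty seconds.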

Another requirement is for an equal number of NICs in each server. A Netware 3.11 server can have up to sixteen NICs (if you can find a box with enough slots), and each NIC attaches to a separate network (a network, in Novell terminology, is defined by the cable plant). If a secondary server is to take over all the duties of the primary server, it needs to access all of the same wire segments in order to avoid cutting off some portion of the internetwork.
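The NIC rule above lends itself to a simple pre-installation check. The sketch below is a hypothetical planning aid, not a Novell utility: the network labels are invented for illustration, and the only facts taken from the text are the sixteen-NIC limit and the need for both servers to reach the same networks.

```python
MAX_NICS_PER_SERVER = 16  # limit quoted in the article for Netware 3.11


def nic_problems(primary_nets, secondary_nets):
    """Return a list of problems found; an empty list means the pair matches."""
    problems = []
    for label, nets in (("primary", primary_nets), ("secondary", secondary_nets)):
        if len(nets) > MAX_NICS_PER_SERVER:
            problems.append(f"{label} has {len(nets)} NICs, limit is {MAX_NICS_PER_SERVER}")
    if len(primary_nets) != len(secondary_nets):
        problems.append("servers have different numbers of NICs")
    # Every network one server touches must also be reachable from the other.
    for net in sorted(set(primary_nets) - set(secondary_nets)):
        problems.append(f"secondary cannot reach network {net}")
    for net in sorted(set(secondary_nets) - set(primary_nets)):
        problems.append(f"primary cannot reach network {net}")
    return problems


print(nic_problems(["DEPT_A", "BACKBONE"], ["DEPT_A", "BACKBONE"]))  # []
print(nic_problems(["DEPT_A", "BACKBONE"], ["DEPT_A"]))
```

The second call reports both the mismatched NIC count and the network that would be cut off if the secondary had to take over.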

In this system there are three networks, one with a regular Netware 3.11 server, one with the SFT III system, and one providing a backbone between the two servers. (Note that the mirrored pair looks like a single logical server to the rest of the network, and is referred to as such in any discussion of the internetwork as a whole.) This meant that each of the three physical servers needed two network connections. The client chose the popular 10Base-T standard (Ethernet over unshielded twisted pair wiring) for their networking method.

Since a single Ethernet connection is a waste of the capabilities of an EISA slot, I chose Standard Microsystems Corporation's (SMC) 3032TP dual-channel 32-bit busmastering EISA NIC. It has two complete Ethernet chipsets with two separate sockets, so it more fully utilizes EISA's speed without wasting an extra slot. One socket in each physical server attaches to the concentrator group associated with its own department, while the other attaches to the single concentrator assigned to the backbone between servers.
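The resulting wiring can be summarized as a small table of attachments. The sketch below assumes hypothetical labels for the three physical servers and the three networks (none of these names come from the actual installation) and simply groups the servers by the network each socket plugs into.

```python
from collections import defaultdict

# Hypothetical labels: two mirrored machines, one single server, three networks.
# Each tuple lists the two sockets of that server's dual-channel NIC.
ATTACHMENTS = {
    "MIRROR_1": ("DEPT_A", "BACKBONE"),
    "MIRROR_2": ("DEPT_A", "BACKBONE"),
    "SINGLE": ("DEPT_B", "BACKBONE"),
}


def members_by_network(attachments):
    """Group physical servers by the network each of their sockets joins."""
    members = defaultdict(list)
    for server, networks in attachments.items():
        for network in networks:
            members[network].append(server)
    return dict(members)


for network, servers in sorted(members_by_network(ATTACHMENTS).items()):
    print(network, servers)
```

The listing shows all three physical servers on the backbone, with both halves of the mirrored pair sharing one department network and the single server alone on the other.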

Next month, I'll cover the inner workings of Netware 3.11 SFT III.

                                    (c) 1993, Wayne M. Krakau