by Wayne M. Krakau - Chicago Computer Guide, July 1994 - The Law Works, October 1994 - NewsWare, November 1995

This is the second part of a series of columns covering Document and Image Management Systems (DIMS). These systems integrate the management of images of text and graphic documents scanned in from paper or film, of text and graphic files brought in through import routines, and of the text produced by using optical character recognition (OCR) to translate document images consisting of dots back into logical text. Part 1 of the series covered the politics of this particular corner of the computer industry, some samples of completed systems, and the legal implications of implementing a DIMS.

The first technical aspect of DIMS that you must worry about is developing a plan to accommodate or avoid the throughput problems that usually follow the installation of a document and image management system. I am on call for one manufacturer of document and image management systems to help them plan and troubleshoot systems. So far, most of my work for them has been coming in after a marginal network, already compromised by faulty cabling, has collapsed under the increased traffic from the new document and image management system. Many systems are already teetering on the edge of disaster. The extra traffic from a DIMS can be enough to put them over that edge. Either response time becomes excessive from the multiple retries necessary to get a message over the network, or the system breaks down to the point where individual workstations start dropping randomly off the LAN. (Refer to my April, May, and June columns for more information on the care and feeding of cable plants.)

For LANs without cabling problems, response time can still be an issue. The sheer quantity of traffic may be enough to cause untenable delays. A new DIMS is useless if nobody can get their work done.

The most obvious way to overcome throughput problems is through the brute-force method. For instance, I am currently designing a network for an engineering firm. While they don't plan on an imaging system, they do have a similar need for a high-speed LAN. I am going to use Madge Networks, Inc.'s (800-876-2343) upcoming 100-megabit unshielded twisted-pair network cards along with a SynOptics Communications, Inc. (800-776-6895) concentrator. This system will follow the soon-to-be-approved MLT-3 standard, a derivative of FDDI (Fiber Distributed Data Interface, a 100-megabit descendant of Token Ring that runs over optical fiber). It's fast but very expensive.

The second way to attack this problem is to split the network. Netware 3.x and higher can handle up to 16 network cards. I often use Standard Microsystems Corporation's (800-SMC-4YOU) Elite Ultra 32T card. It's a dual-channel Ethernet EISA card with two complete Ethernet chipsets and connectors; the server sees it as two different cards. With SMC's card, or separate cards of your own choice, you can split the network several ways.

You can simply split it 50/50 (or into smaller segments with additional network cards).

You can isolate critical workstations. In one case, the engineering department of a client required extra throughput, so we are going to give them a dedicated concentrator attached to one of the Ethernet ports. The other port serves the rest of the network.

Another way is to isolate those devices that could overburden the network. This would include any scanning stations, network printers, and fax servers. This is particularly applicable to imaging, since scanning can suck up loads of network bandwidth. The regular users could be on a separate segment from these high-bandwidth devices.
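As a toy illustration of that last approach, the sketch below sorts stations onto the two ports of a dual-channel card, putting the bandwidth-hungry devices on their own segment. The station names and device types are hypothetical, and this is just illustrative logic, not anything a real card driver does.

```python
# Illustrative sketch: splitting stations across the two Ethernet ports of
# a dual-channel card so that high-bandwidth devices (scanners, network
# printers, fax servers) don't share a wire with ordinary users.
# Station names are hypothetical.

HEAVY_TYPES = {"scanner", "printer", "fax"}

def split_network(stations):
    """Return (heavy_segment, user_segment) given (name, type) pairs."""
    heavy = [name for name, kind in stations if kind in HEAVY_TYPES]
    users = [name for name, kind in stations if kind not in HEAVY_TYPES]
    return heavy, users

stations = [
    ("SCAN1", "scanner"),
    ("LASER1", "printer"),
    ("FAXSRV", "fax"),
    ("WS-ACCT1", "workstation"),
    ("WS-ENG1", "workstation"),
]

heavy, users = split_network(stations)
print("Port A (high-bandwidth):", heavy)
print("Port B (users):", users)
```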

The third way to overcome throughput problems is my favorite. That is to use client-server technology to reduce the amount of data going over the network. Client-server technology splits a database into two parts, with the client portion running on the workstation and the server portion running on the database server (which may also be the file server). The workstation software presents the visual interface, supplies a help system, does some preliminary error checking, and formulates requests for information or update requests based on the operator’s input. The request for information is submitted over the network to the server software.

The server software acts upon the request by searching its hard disk for the appropriate information or by making a final check of the update request. For an information request, just the data needed is sent back over the LAN. For an update request, only an acknowledgment is sent back.

Without client-server technology, the individual workstation is completely responsible for searching the underlying database and processing any updates, causing potentially enormous quantities of data to clog up the network.
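The traffic difference is easy to demonstrate. The sketch below compares the two approaches for a single record lookup: the file-server style ships the whole table to the workstation, while the client-server style sends only a small request and gets back only the matching record. The database contents and record sizes are made up for illustration.

```python
# Toy comparison of file-server vs. client-server traffic for one lookup.
# The "database" and record sizes are hypothetical.
import json

DATABASE = [{"doc_id": i, "title": f"Document {i}", "body": "x" * 200}
            for i in range(1000)]

def file_server_lookup(doc_id):
    # Without client-server: the whole table crosses the wire and the
    # workstation does the searching itself.
    wire = json.dumps(DATABASE).encode()          # bytes on the LAN
    table = json.loads(wire)
    result = next(r for r in table if r["doc_id"] == doc_id)
    return result, len(wire)

def client_server_lookup(doc_id):
    # With client-server: only the request and the matching record cross
    # the wire; the server does the searching.
    request = json.dumps({"doc_id": doc_id}).encode()
    match = next(r for r in DATABASE if r["doc_id"] == doc_id)
    reply = json.dumps(match).encode()
    return json.loads(reply), len(request) + len(reply)

_, naive_bytes = file_server_lookup(42)
_, cs_bytes = client_server_lookup(42)
print(f"file-server traffic: {naive_bytes} bytes")
print(f"client-server traffic: {cs_bytes} bytes")
```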

You can use a simple client-server database to track indicative (document identification) data. Full-featured systems use OCR to create ASCII text from the files scanned into the DIMS. They also allow queuing of the OCR task for processing by multiple OCR computers. (A handy technique is to scan the documents during the day and OCR them overnight using all available idle computers.) The resulting text is the basis for a searchable full-text index, which also needs to be client-server.
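The scan-by-day, OCR-by-night pattern amounts to a shared work queue that idle machines drain overnight. Here is a minimal sketch of that idea; the machine names and the `ocr()` stub are hypothetical stand-ins, not a real OCR engine.

```python
# Sketch of the scan-by-day, OCR-by-night pattern: scanned pages go into a
# shared queue, and each idle computer takes jobs from it in turn.
from queue import Queue

def ocr(image_name):
    # Stand-in for a real OCR engine: pretend to extract text.
    return f"text of {image_name}"

def run_night_shift(scanned_images, idle_machines):
    jobs = Queue()
    for img in scanned_images:
        jobs.put(img)
    results = {}
    work_log = {m: 0 for m in idle_machines}
    while not jobs.empty():
        for machine in idle_machines:       # round-robin over idle PCs
            if jobs.empty():
                break
            img = jobs.get()
            results[img] = ocr(img)
            work_log[machine] += 1
    return results, work_log

results, log = run_night_shift(
    [f"page{i:03}.tif" for i in range(10)], ["PC1", "PC2", "PC3"])
print(log)
```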

Finally, the database tracking the images themselves absolutely must use client-server technology. Since I cover the Novell-centric end of the business, I prefer systems that use NLMs, Netware Loadable Modules, to allow client-server access. To further reduce traffic, the actual images should be stored in a compressed format and be passed over the wire still compressed. They can then be expanded at the workstation. There are two other benefits to the NLM method.
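The compressed-transfer idea can be sketched in a few lines. This uses zlib purely as a stand-in for whatever image compression a real DIMS would use, and the fake "scan" is just a run of bytes chosen to compress well.

```python
# Sketch of shipping images compressed over the wire and expanding them
# only at the workstation. zlib is a stand-in for real image compression.
import zlib

def server_send(image_bytes):
    # The server stores and ships the image still compressed.
    return zlib.compress(image_bytes, level=9)

def workstation_receive(wire_bytes):
    # The workstation expands the image only after it arrives.
    return zlib.decompress(wire_bytes)

# A fake scanned page: long runs of identical bytes compress well.
image = b"\xff" * 50000 + b"\x00" * 50000
wire = server_send(image)
restored = workstation_receive(wire)
print(f"raw: {len(image)} bytes, on the wire: {len(wire)} bytes")
```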

First, with an NLM, no actual person opens the database. You may have experienced having a workstation go down in the middle of a transaction, thereby blowing up your database. With properly implemented client-server technology, workstations merely submit requests and receive replies. Only the server software opens and manipulates the database. So, a workstation failure will not jeopardize the database. It is, of course, implied here that you have protected your server with an Uninterruptible Power Supply. The UPS, combined with the inherent reliability features of Netware such as Read-after-Write Verification, Dynamic Bad Block Remapping (Hot Fix), and the Transaction Tracking System (TTS), makes the file server the most dependable machine on the network.

Second, if the imaging database grows so that its NLM takes up too many server CPU cycles (that is, it overtaxes the server's Central Processing Unit), you can get a product called an NLM Server from a company formerly called IWI and now named CommVision (415-254-5720). It is a complete 486 computer on a single card, including RAM. It comes bundled with Run-Time Netware, so you can off-load the imaging NLM onto this card. The NLM Server is placed within the file server, so it maintains bus-speed access to the disk subsystem and the server's network cards. This is often a better solution than having a totally separate machine as a database server that then has to talk to the file server via a LAN connection. Of course, the NLM Server can run other types of NLMs, too, and you can use more than one in a single server. If you need to have your imaging server, your communication server, your fax server, your E-mail server, and your mainframe or minicomputer gateway (or even some less-complicated combination) all in one box, then check out CommVision's 20-slot monster (my description, not theirs) computer. It has triple-redundant, hot-swappable power supplies, a 100-megabit backbone, and a segmentable 20-slot backplane. That will cut down substantially on network traffic.

This series will continue with Document and Image Management Systems, Part 3, next month. It will cover storage methods and integration issues.

© 1994, Wayne M. Krakau