DOCUMENT AND IMAGE MANAGEMENT SYSTEMS - Part 3

by Wayne M. Krakau - Chicago Computer Guide, September 1994 - The Law Works, November 1994 - NewsWare, December 1995

This is the third part in a series of columns covering Document and Image Management Systems (DIMS). These systems integrate the management of three kinds of material: images of text and graphic documents scanned in from paper or film, text and graphic files brought in through import routines, and the text produced by optical character recognition (OCR), which translates dot-based document images back into machine-readable text. Part 1 of the series covered the politics of this particular corner of the computer industry, some samples of completed systems, and the legal implications of implementing a DIMS. Part 2 covered the technical aspects of planning a DIMS. This column covers storage methods and integration issues.

The conventional storage technique for a DIMS is to put the searchable database and index information on your fastest magnetic drive and put the images on optical drives, adding jukeboxes as more capacity is needed. I prefer using Micro Design, Incorporated's (407-677-8333) SCSI Express NLM (NetWare Loadable Module) to manage optical drives. Unlike most of its competitors, MDI's product taps straight into NetWare's native capabilities, such as caching and elevator seeking.

Elevator seeking is particularly valuable for jukeboxes, since it can substantially reduce the need to switch cartridges. For example, assume that three read requests come in: request number one is for cartridge ten, side A; request number two is for cartridge twenty-three, side B; and request number three is for cartridge ten, side A. Without elevator seeking, the system would mount cartridge ten, then cartridge twenty-three, and then cartridge ten again. With elevator seeking, the system would mount cartridge ten, satisfy both requests one and three, and then mount cartridge twenty-three. No matter how fast the jukebox (or the unfortunate person assigned to swap cartridges on a single drive) changes cartridges, a swap is many thousands of times slower than a simple read of the currently mounted cartridge.
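To make the cartridge-ten example concrete, here is a minimal Python sketch (not how MDI's NLM or any other product actually implements it) that counts cartridge mounts for a request queue served in arrival order versus after a simplified elevator-style pass. It assumes all requests are already queued, treats each cartridge side as a separate mount, and sweeps upward once through the cartridge numbers; a full elevator algorithm would also track the current position and sweep direction.

```python
from typing import List, Optional, Tuple

# A request names the cartridge and side it needs, e.g. (10, "A").
Request = Tuple[int, str]


def count_mounts(order: List[Request]) -> int:
    """Count mounts needed to serve requests in the given order.

    A new mount is needed whenever a request wants a different
    cartridge side than the one currently in the drive.
    """
    mounts = 0
    current: Optional[Request] = None
    for request in order:
        if request != current:
            mounts += 1
            current = request
    return mounts


def elevator_order(queue: List[Request]) -> List[Request]:
    """Simplified elevator pass: one upward sweep by cartridge, then side.

    Assumes every request is already queued. Tuples sort by cartridge
    number first, then side; sorted() is stable, so requests for the
    same cartridge side keep their arrival order.
    """
    return sorted(queue)


queue = [(10, "A"), (23, "B"), (10, "A")]
print(count_mounts(queue))                  # 3: 10A, 23B, then 10A again
print(count_mounts(elevator_order(queue)))  # 2: 10A serves both, then 23B
```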

While I am covering optical storage, I did want to mention that distribution via CD-ROMs is becoming very popular for imaging systems. A notebook computer with a CD-ROM drive makes a great mobile office. One of my clients, for instance, is planning to produce a thousand CDs every few months containing all of their sales information and product specifications, and distribute them to their worldwide sales force.

Remember, however, that CD-ROMs are designed for mass distribution. WORM (Write-Once, Read Many or Mostly) and Read-Write optical drives are the appropriate technology for primary storage in a DIMS. Judging by the number of questions I have been getting lately, these two separate technologies are commonly confused.

Some of the latest developments in optical jukeboxes include 2.4TB (a terabyte is 1,000,000,000,000 bytes or 1,000 gigabytes) Ethernet-connected jukeboxes that use 12-inch cartridges. Several companies are also using giant jukeboxes modeled after mail-sorting cabinets, with a large array of rectangular slots. Slots are accessed by an arm attached to a pair of crossbars that move like a giant "Etch-A-Sketch". (Does anybody else remember them? They had the original Graphical User Interface long before Xerox PARC, the Palo Alto Research Center, invented one, and way before Apple copied Xerox's technology.) Each slot can hold either a rack for multiple cartridges or an optical drive. Any cartridge-holding slot can later be retrofitted with another drive, thereby easing the one big drawback of jukeboxes: a drive cannot have its cartridge swapped until the current cartridge is no longer in use. Even if the cartridge-changing time is very short, everybody waits while the drive is busy.
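The waiting problem can be illustrated with a tiny queueing model. The Python below is a sketch only: the 70-second job time (10 seconds to swap a cartridge plus 60 seconds of use) is a hypothetical figure, all requests are assumed to arrive at once, and each request simply goes to whichever drive frees up first.

```python
import heapq
from typing import List


def finish_times(job_seconds: List[float], drives: int) -> List[float]:
    """Return when each queued job finishes, given `drives` identical drives.

    Jobs are served in arrival order; each goes to whichever drive frees
    up first. All jobs are assumed to be queued at time zero.
    """
    free_at = [0.0] * drives        # time at which each drive becomes free
    heapq.heapify(free_at)
    done = []
    for seconds in job_seconds:
        start = heapq.heappop(free_at)   # earliest available drive
        end = start + seconds
        done.append(end)
        heapq.heappush(free_at, end)
    return done


# Hypothetical workload: four users each need a different cartridge,
# 10 seconds to swap it in plus 60 seconds of use.
jobs = [70.0] * 4
print(finish_times(jobs, drives=1))  # [70.0, 140.0, 210.0, 280.0]
print(finish_times(jobs, drives=2))  # [70.0, 70.0, 140.0, 140.0]
```

With one drive, the fourth user waits 210 seconds before work even begins; retrofitting a second drive halves the worst-case finish time in this toy workload.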

Another storage method that has just become financially feasible is RAID. RAID stands for Redundant Array of Inexpensive Disks, as opposed to SLED, which stands for Single Large Expensive Disk.

RAID technology is broken down into several categories called levels as follows:

0 - Data striping without parity. Data is spread out over multiple disks for speed, but there is no redundancy, so losing any one disk loses the array's data.

1 - Mirrored disk array. For every data disk there is a redundant twin. Also includes duplexing, the use of dual intelligent controllers for additional speed and reliability.

2 - Interleaves data bit by bit across the array, with Hamming-code error correction on dedicated check disks, and reads only in whole sectors.

3 - Parallel disk array. Disk striping with a dedicated parity drive. Drives are synchronized for efficiency in large parallel data transfers.

4 - Independent disk array. Drives in the array are read and written independently, with a dedicated parity drive and sector-level interleave.

5 - Independent disk array. Reads and writes data and parity across all disks, with no dedicated parity drive. Uses parallel transfers. Multiple controllers are optionally used for higher speed. Only the equivalent of one drive's capacity is given up for redundancy. This level is the most popular these days because of both its speed and its cost-effectiveness.
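The parity behind levels 3 through 5 is a byte-wise exclusive-OR (XOR): the parity block is the XOR of the data blocks, so any single lost block equals the XOR of everything that survives. The Python below is a minimal in-memory sketch of a level 5 stripe, assuming equal-sized blocks and a simple parity rotation (stripe number modulo disk count); real controllers use their own layouts and work on sectors. The capacity helper is a simplification that leaves out level 2.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks: the RAID parity function."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_stripe(data: List[bytes], stripe_no: int) -> List[Optional[bytes]]:
    """Lay out one stripe of data blocks plus parity across len(data) + 1 disks.

    Parity rotates to a different disk on each stripe (simple modulo
    rotation assumed), so no drive is dedicated to parity.
    """
    disks = len(data) + 1
    stripe: List[Optional[bytes]] = list(data)
    stripe.insert(stripe_no % disks, xor_blocks(data))
    return stripe


def rebuild(stripe: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Recover the single missing block (None) by XOR-ing the survivors."""
    missing = stripe.index(None)
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing] = xor_blocks(survivors)
    return repaired


def usable_drives(level: int, drives: int) -> int:
    """Drives' worth of capacity left for data (simplified; level 2 omitted)."""
    if level == 0:
        return drives        # striping only, no redundancy
    if level == 1:
        return drives // 2   # every data disk has a mirror twin
    if level in (3, 4, 5):
        return drives - 1    # one drive's worth of parity
    raise ValueError(f"RAID level {level} is not modeled here")


data = [b"DOCS", b"IMGS", b"TEXT"]          # three data blocks, four disks
stripe = raid5_stripe(data, stripe_no=1)    # parity lands on disk 1
stripe[2] = None                            # simulate losing disk 2
print(rebuild(stripe)[2])                   # b'IMGS'
print(usable_drives(5, 4))                  # 3
```

Because the parity position moves with each stripe, parity writes are spread over every drive instead of piling up on one dedicated parity drive, which is one reason level 5 handles independent writes better than level 4.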

Micropolis (415-964-8300) has a series of RAID systems that they call Radion LT. The largest of these provides 3.52GB (gigabytes) of storage and is expandable to 54GB by stacking up proprietary modules like oversized Lego blocks! As you add more disks, you can add more SCSI adapters. If a drive fails, the system keeps on running, and you can hot-swap the failed drive without shutting the system down. As the system is expanded, you can even lose multiple drives and still keep running! With this software-based system, I have been using Adaptec's (800-934-2766) 2742AT twin EISA SCSI adapter, which puts two separate SCSI adapter chipsets on one EISA card. NetWare sees it as two separate adapters. Note that the better RAID systems allow the use of multiple controllers to increase throughput.

Pacific Micro Data (714-838-8900) has taken a different tack. They use a 7-slot tower with sliding trays to accommodate any manufacturer's disk drives. They suggest using a Mylex (510-796-6100) controller, which has three separate SCSI ports and hardware RAID on a single EISA card. This offloads the overhead of RAID from the server's processor. Note that I prefer dealing with Mylex products through third parties such as Pacific Micro Data, since Mylex seems to be set up more to support OEM agreements than direct dealer and end-user relationships.

Compatibility is the next thing to watch for. Just because a program runs under Microsoft Windows doesn't mean it's really a full Windows product. Proprietary routines are common, especially in a market like DIMS, where many of the key players are moving from selling old-style proprietary systems to supplying PC- and LAN-oriented products. I've even received complaints from dealers that some imaging products they currently carry deliberately force incompatibilities, so that you have to buy into the vendor's proprietary solutions for such things as printing, network faxing, general communications, remote control, E-mail, and even hardware such as scanners, disk drives, and jukeboxes. The dealers' clients are really mad at them for setting them up with products that have these restrictions. Some systems also require a separate imaging server, often running an operating system that would force you to seek outside support. If you are not an OS/2 shop, for instance, why should you be forced to have an OS/2 box? Keep in mind that the separate-server technique also slows things down, both because the servers are usually linked only by a relatively slow network cable and because it gives up the speed of NLMs running directly on the NetWare server.

Integration among programs is one of the reasons that people go to graphical environments like Windows in the first place. To integrate programs within Windows, a high degree of compatibility is needed. Only programs that use standard Windows routines can be easily integrated using macro programs or a language such as Microsoft’s Visual Basic.

One client I am working with needs to integrate three systems: an IBM AS/400 minicomputer application used to run the business, a smaller related LAN-based database that holds a subset of the AS/400 data, and Compulink Management Center's (310-212-LINK) LaserFiche DIMS. We are planning on using either Attachmate's (800-426-6283) EXTRA or Wall Data's (206-881-5995) Rumba terminal emulation product, along with its Visual Basic interface, to feed data to the AS/400. For the LAN data we are using Microsoft Access, taking advantage of Visual Basic's built-in ability to manipulate Access databases. The document and image management software will handle the imaging and the document-based indicative information. The user will enter the data once into an Access database; Visual Basic will then copy the appropriate fields into the AS/400 program and into the imaging system's indicative fields, while Access retains only the subset it needs to serve users who don't have access (note the small "a") to the AS/400.
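The enter-once, copy-everywhere flow can be sketched in a few lines. The client's actual system uses Visual Basic with the terminal emulators' interfaces; the Python below only illustrates the routing logic, and every field name in it is hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical field lists; the real ones come from the AS/400 screens,
# the Access tables, and the imaging system's indicative fields.
AS400_FIELDS = ["customer_no", "name", "address", "credit_limit"]
IMAGING_FIELDS = ["customer_no", "name", "doc_type"]
ACCESS_FIELDS = ["customer_no", "name"]  # the subset kept on the LAN


def pick(record: Record, fields: List[str]) -> Record:
    """Copy only the named fields; any that were not entered become ''."""
    return {field: record.get(field, "") for field in fields}


def fan_out(entry: Record) -> Dict[str, Record]:
    """Route one data-entry record to every target system."""
    return {
        "as400": pick(entry, AS400_FIELDS),
        "imaging": pick(entry, IMAGING_FIELDS),
        "access": pick(entry, ACCESS_FIELDS),
    }


entry = {
    "customer_no": "C1001",
    "name": "Acme Freight",
    "address": "100 Main St.",
    "credit_limit": "5000",
    "doc_type": "invoice",
}
targets = fan_out(entry)
print(targets["access"])   # {'customer_no': 'C1001', 'name': 'Acme Freight'}
```

Keeping the field lists in one place means a change to any target's layout touches a single list rather than the data-entry screen.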

Another client wants to use Lotus Notes, but is unimpressed with the limited imaging add-on available for Notes. We are going to tie LaserFiche and Notes together to satisfy their needs.

It is very common to tie accounting systems and contact managers to DIMS. In a typical use, a customer calls up to ask why a delivery has not arrived. You bring up the customer's information in your accounting software. Simultaneously, an automated link to your DIMS brings that program up in the bottom half of your screen. It then grabs the customer number from the accounting system and passes it to the search field in the imaging program. This brings up the documents associated with this customer. Then you can view the receipt that shows that Joe on the loading dock signed for the package at 4:00 PM yesterday. Then, while you are still speaking to the customer, you could fax the receipt to them using your network-based (of course!) fax system.
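The link boils down to reading one key field from the accounting screen and handing it to the imaging search. In this Python sketch, the `ImageIndex` class, the `on_customer_displayed` hook, and the document IDs are all hypothetical stand-ins; a real DIMS would be driven through whatever macro or programming interface it exposes.

```python
from collections import defaultdict
from typing import Dict, List


class ImageIndex:
    """Hypothetical stand-in for a DIMS index: customer number -> document IDs."""

    def __init__(self) -> None:
        self._by_customer: Dict[str, List[str]] = defaultdict(list)

    def add(self, customer_no: str, doc_id: str) -> None:
        self._by_customer[customer_no].append(doc_id)

    def search(self, customer_no: str) -> List[str]:
        # .get() avoids creating empty entries for unknown customers
        return list(self._by_customer.get(customer_no, []))


def on_customer_displayed(screen: Dict[str, str], index: ImageIndex) -> List[str]:
    """Hypothetical hook run when the accounting program shows a customer.

    Grabs the customer number from the screen and passes it to the
    imaging search, returning the matching documents.
    """
    return index.search(screen["customer_no"])


index = ImageIndex()
index.add("C1001", "invoice-5531")
index.add("C1001", "delivery-receipt-5531")
index.add("C2002", "invoice-7720")
print(on_customer_displayed({"customer_no": "C1001"}, index))
# ['invoice-5531', 'delivery-receipt-5531']
```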

The last item I want to mention is input. It is a great help to be able to batch-import text and graphics that already exist in electronic form. Even though a fuzzy search feature can help get around OCR inaccuracy, direct import of text will always be more reliable. Direct import can also retain the quality of graphic images. There can also be a great savings in labor costs. Additionally, with batch-import of graphics, even old microfiche and microfilm stock can be easily input.
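Fuzzy search generally means ranking candidate words by similarity instead of demanding an exact match. As one illustration (not the algorithm any particular DIMS uses), Python's standard `difflib.get_close_matches` can find an OCR-garbled word; the sample text and the 0.75 cutoff are assumptions chosen for the example.

```python
import difflib
import re
from typing import List


def fuzzy_find(term: str, ocr_text: str, cutoff: float = 0.75) -> List[str]:
    """Return words in OCR output that closely resemble `term`.

    difflib scores similarity from 0 to 1, so a word with a misread
    character or two can still match. The 0.75 default cutoff is an
    assumption, not a recommended setting.
    """
    words = set(re.findall(r"[a-z0-9]+", ocr_text.lower()))
    return difflib.get_close_matches(term.lower(), words, n=5, cutoff=cutoff)


ocr_text = "Receipt signed by Joe on the loadiug dock at 4:00PM"
print("loading" in ocr_text.lower())     # False: an exact search misses it
print(fuzzy_find("loading", ocr_text))   # ['loadiug']
```

The garbled "loadiug" shares six of its seven characters, in order, with "loading", giving a similarity of about 0.86, so it clears the cutoff. Direct import of the original text still avoids the problem entirely.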

Document and image management systems are a great solution to many potential problems, including storage costs, retrieval speeds, misfiling, and data analysis. It does, however, take careful planning to fully integrate all of the new technologies that make them run best.

1994, Wayne M. Krakau