Process

Batch Preparation
Batch preparation is an important first step in ensuring a well-functioning document capture process. Key manual tasks include inspecting and separating documents, grouping documents into like categories, and designating the beginning and end of documents and batches.

Scanning
Scanning is the transformation of paper documents into digital images. Alternatively, existing image files can be imported into the system. Effective scanning requires precise control over a wide variety of scanners and scanner settings, including resolution, contrast, simplex or duplex operation, and advanced thresholding options. In addition, scanning usually allows for in-line extraction of bar code information, which is used to index the documents for later retrieval.
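The settings named above can be grouped into a single profile that is validated before a batch is scanned. The following is a minimal sketch only: the class name, field names, default values, and the 0-100 contrast scale are illustrative assumptions and do not correspond to any particular scanner driver or capture product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    """Hypothetical scan profile; all names and ranges are assumptions."""
    resolution_dpi: int = 300
    contrast: int = 50            # assumed 0-100 scale
    duplex: bool = False          # False = simplex (one side per sheet)
    read_bar_codes: bool = True   # in-line bar code extraction on or off

    def __post_init__(self):
        # Reject obviously invalid values before any scanning starts.
        if self.resolution_dpi <= 0:
            raise ValueError("resolution_dpi must be positive")
        if not 0 <= self.contrast <= 100:
            raise ValueError("contrast must be between 0 and 100")

    def sides_per_sheet(self):
        return 2 if self.duplex else 1


def expected_image_count(sheets, settings):
    """Number of images a batch of `sheets` pages should produce."""
    return sheets * settings.sides_per_sheet()


# A 25-sheet batch scanned in duplex should yield 50 images.
duplex_profile = ScanSettings(duplex=True)
image_count = expected_image_count(25, duplex_profile)
```

Comparing the expected image count with the number of images actually produced is one simple way to detect double feeds or skipped pages.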

OCR and Image Cleanup
Optical character recognition is frequently used in production capture systems to extract information about a document directly from the document itself. There are two forms of OCR: zonal and full-text. Zonal OCR is typically used on forms, where only specific fields on the form are of interest. Full-text OCR is used on free-form documents, such as legal briefs, to read the entire document and then prepare a searchable, full-text index of the document.
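The searchable index produced by full-text OCR is commonly built as an inverted index, which maps each word to the documents that contain it. The sketch below assumes the OCR engine has already returned plain text for each document; the document ids, sample text, and function names are hypothetical and serve only to illustrate the idea.

```python
import re
from collections import defaultdict

WORD_PATTERN = re.compile(r"[a-z0-9]+")


def build_full_text_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in WORD_PATTERN.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output, keyed by document id.
ocr_text = {
    "brief-001": "The plaintiff filed a motion to dismiss.",
    "brief-002": "The court denied the motion.",
}
index = build_full_text_index(ocr_text)
matches = search(index, "motion dismiss")
```

A production system would also have to cope with OCR misreads, for example through fuzzy matching, which this sketch does not attempt.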

    Image cleanup
    Deskewing, despeckling, deshading, streak removal, and other basic cleanup functions
    Line removal and character reconstruction for use on forms
    Edge enhancement, which sharpens character edges to increase OCR accuracy

    In addition, enhanced thresholding options are available on some scanners (for example, Fujitsu's IPC2 and Bell+Howell's ACE). All of these techniques make the images more readable, increase the accuracy of OCR, and assist the indexing process.
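As an illustration of one cleanup function listed above, the following sketch despeckles a bitonal image by removing black pixels that have no black neighbors. It is a deliberately simple rule under stated assumptions (the image is a list of rows, 1 is ink, 0 is background); it is not the algorithm used by any particular scanner or capture product, and real despeckling filters usually remove small connected clusters rather than single pixels only.

```python
def despeckle(image):
    """Return a copy of a binary image with isolated black pixels removed.

    `image` is a list of rows; 1 is black (ink), 0 is white (background).
    A black pixel with no black pixel among its eight neighbors is
    treated as a speckle and turned white.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the up-to-eight surrounding pixels, clipped at the edges.
            has_black_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbor:
                cleaned[y][x] = 0
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a character stroke
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
```

The speck in the second row is removed, while the connected pixels of the stroke are left untouched. The input image is not modified.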

Indexing
Indexing consists of creating meaningful descriptive information for each scanned document and then writing this information into a database that will be used to retrieve the images later. In most cases, the index information is entered by a keyboard operator based on information on the image itself, an operation known as "key from image." In some cases, however, the index information is extracted automatically from the images via a recognition process, typically optical character recognition or bar code recognition. Some indexing information may also be assigned automatically to all images included in a particular batch.
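The relationship between per-document index fields and batch-level fields can be sketched with a small database table. The example uses Python's standard sqlite3 module; the table layout, field names (account_no, doc_type, batch_id, scan_date), and file paths are hypothetical and would differ in any real capture system.

```python
import sqlite3


def create_index_db(path=":memory:"):
    """Create the (hypothetical) index table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE document_index (
               image_path TEXT PRIMARY KEY,
               batch_id   TEXT NOT NULL,
               scan_date  TEXT NOT NULL,
               account_no TEXT,
               doc_type   TEXT
           )"""
    )
    return conn


def index_batch(conn, batch_fields, documents):
    """Write one index row per document.

    `batch_fields` are assigned automatically to every image in the batch;
    the per-document fields come from key-from-image entry or recognition.
    """
    rows = [
        (
            doc["image_path"],
            batch_fields["batch_id"],
            batch_fields["scan_date"],
            doc.get("account_no"),
            doc.get("doc_type"),
        )
        for doc in documents
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO document_index VALUES (?, ?, ?, ?, ?)", rows
        )


def find_images(conn, account_no):
    """Retrieve the image paths indexed under an account number."""
    cursor = conn.execute(
        "SELECT image_path FROM document_index "
        "WHERE account_no = ? ORDER BY image_path",
        (account_no,),
    )
    return [row[0] for row in cursor]


conn = create_index_db()
index_batch(
    conn,
    {"batch_id": "batch42", "scan_date": "2024-01-15"},
    [
        {"image_path": "batch42/0001.tif", "account_no": "A-1001", "doc_type": "invoice"},
        {"image_path": "batch42/0002.tif", "account_no": "A-2002", "doc_type": "invoice"},
        {"image_path": "batch42/0003.tif", "account_no": "A-1001", "doc_type": "letter"},
    ],
)
found = find_images(conn, "A-1001")
```

Keeping the batch-level values in one place and applying them to every row means the operator only keys the fields that actually vary from document to document.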

QA and Rescanning
Quality assurance entails systematic reviews and checks to ensure that the scanned images are readable and the indexes are accurate. It includes methods for flagging bad images and explaining why or how images should be rescanned, as well as correcting errors or shortcomings in indexing. The QA step can be performed either by a QA operator or by an index operator.
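One way to picture the QA step is as a review record per image that carries an optional rescan reason and any index corrections. The sketch below is illustrative only; the class, its fields, and the sample reasons are assumptions rather than part of any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ImageReview:
    """Hypothetical QA record for one scanned image."""
    image_path: str
    rescan_reason: Optional[str] = None  # None means the image passed QA
    index_corrections: Dict[str, str] = field(default_factory=dict)

    @property
    def needs_rescan(self):
        return self.rescan_reason is not None


def rescan_worklist(reviews):
    """List (image_path, reason) pairs for every image flagged as bad."""
    return [(r.image_path, r.rescan_reason) for r in reviews if r.needs_rescan]


reviews = [
    ImageReview("batch42/0001.tif"),
    ImageReview("batch42/0002.tif", rescan_reason="page skewed, text cut off"),
    ImageReview("batch42/0003.tif", index_corrections={"account_no": "A-1001"}),
]
worklist = rescan_worklist(reviews)
```

Recording the reason alongside the flag tells the scan operator how to rescan the page, not merely that it must be rescanned.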

Release
Release is the final stage of the capture process and consists of handing off batches of in-process images and index information to users of the document imaging system. Typically, this is when the document images are written to optical disk or other long-term storage, and the associated index information is merged with the document database of the larger system. In addition, the release of a document might trigger a workflow process or initiate the foldering and filing of documents.