Vol. XIX, No. 1
CSA Newsletter
Spring, 2006

Scanning Text and Drawings for Archives

Susan C. Jones


For archival purposes all past volumes of any periodical or newsletter should be accessible from a common source. This is a truism, but it is rarely the case in practice. It certainly is not for the CSA Newsletter. Volumes 1-7 (May, 1988 - February, 1995) were printed and posted by mail, with important articles later put on the website. Volumes 8-11 (May, 1995 - Winter, 2000) have both printed and web formats; starting with volume 12 (Spring, 2000), the Newsletter has only appeared as a web publication. Retrieving a past article is easy if it has been put on the web. For older articles, we have a file drawer.

We have digitized articles from our early issues relevant to the current archaeological community, but the remaining ones should also be archived in a digital format. The effort to put these out-of-date items into an HTML format, however, is not justified by the number of times that they would be referenced. Our solution is to place scans of the printed pages of volumes 1-6 along with indices of the issues on this web site. See "Digitizing Remaining Issues of the CSA Newsletter" in this issue (http://csanet.org/newsletter/spring06/nls0607.html). An immediate question arose -- which scanning protocol and file format to use. Not knowing which of several scanning protocols would be best for this purpose, I experimented by trying several and storing the resulting images in various formats. I thought that our readers might be interested in the results.

The Experiment

I used our standard office equipment, which is not the latest but is quite adequate: an HP OfficeJet R-80 All-In-One® printer/copier/fax/scanner, a PC running Windows 2000® and Adobe Photoshop version 5.0®. First, I selected several pages from early issues of the Newsletter which contained text and images. Then, I scanned them using various modes (color, grayscale, and black & white) at three resolutions (300, 150 and 75 dpi). I made no adjustments to the images either during or after scanning. Images stored in uncompressed TIFF format provided a basis for the comparison of the scanning modes. Once the decisions about scanning mode and resolution were made, various compression formats were compared for final image quality and file size (which affects the speed of loading and storage requirements).

For this application, the 150 dpi gray-scale scans provided sufficient detail. The 300 dpi scans were cumbersomely large. The images would not conveniently fit on my monitor, and the finer resolution for the color and gray-scale scans did not provide any information that could not be seen at the 150 dpi resolution. There was no discernible difference in quality between the images scanned in color mode and those in gray-scale, so I also discarded color scans because they produced TIFF files roughly 3 times as large as the gray-scale scans. I will discuss the reasons for rejecting black & white scans in greater detail below. As expected, none of the scanning modes produced acceptable details at 75 dpi. Having decided that the 150 dpi gray-scale scan produced sufficient detail at an acceptable file size, I proceeded to examine the effect of various compression formats on the 150 dpi gray-scale image.
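The size relationships described above follow from simple arithmetic on pixel counts and bits per pixel. The sketch below is only an estimate under stated assumptions: the page size (US Letter, 8.5 x 11 inches) is not given in the article, the bit depths are the usual ones for these scanning modes, and TIFF header overhead is ignored. The names `uncompressed_bytes` and `BITS_PER_PIXEL` are illustrative.

```python
# Assumed page size: US Letter (not stated in the article).
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0

# Usual bits per pixel for each scanning mode (an assumption about the scanner).
BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "black & white": 1}


def pixel_dimensions(dpi):
    """Width and height of the scanned page in pixels at the given resolution."""
    return round(PAGE_WIDTH_IN * dpi), round(PAGE_HEIGHT_IN * dpi)


def uncompressed_bytes(dpi, mode):
    """Approximate raw pixel data size in bytes, ignoring file header overhead."""
    width_px, height_px = pixel_dimensions(dpi)
    return width_px * height_px * BITS_PER_PIXEL[mode] // 8


# Print an estimate for every resolution and mode used in the experiment.
for dpi in (300, 150, 75):
    for mode in BITS_PER_PIXEL:
        size_mb = uncompressed_bytes(dpi, mode) / 1_000_000
        print(f"{dpi:>3} dpi  {mode:<13}  {size_mb:6.2f} MB")
```

Under these assumptions a 150 dpi gray-scale page is about 2.1 MB uncompressed and the color version about 6.3 MB, which matches the "roughly 3 times as large" observation. A 300 dpi page comes to 2550 x 3300 pixels, four times the pixel count of the 150 dpi scan and larger than most monitors can show at full size.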

I stored these images in four formats: a low-compression JPEG, a high-compression JPEG, a simple GIF and an interlaced GIF. All four provided acceptable quality images, so I chose the format that produced the smallest file, the high-compression JPEG, to preserve the older issues of the Newsletter on the web.

Some Surprises

When I examined the TIFF images, the results were startling. While there was little discernible difference in quality between the grayscale and color scans, both were far superior to the black & white scans. At 300 dpi, the text in the black & white scan was fuzzy and the thin lines in the drawing were barely visible. At 150 dpi, the text was fuzzier and the thin lines in the drawing had completely disappeared! I also tried converting the image mode to black & white from the gray-scale scan. The resulting black & white image seemed even worse than the image produced by scanning in black & white mode -- unacceptably bad.
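One plausible mechanism for the vanishing thin lines can be sketched in a few lines of code. This is an illustration under assumptions, not a description of what the scanner or Photoshop did: the article does not say how either one converts to black & white, so the sketch assumes a fixed 50% threshold, and the ink-coverage values are hypothetical.

```python
WHITE, BLACK = 255, 0
THRESHOLD = 128  # assumed 50% cutoff; the actual conversion method is not stated


def gray_from_coverage(coverage):
    """Gray level of a pixel whose area is `coverage` (0.0 to 1.0) covered by black ink."""
    return round(WHITE * (1.0 - coverage))


def to_black_and_white(gray_row, threshold=THRESHOLD):
    """Map each gray pixel to pure black or pure white using a fixed threshold."""
    return [BLACK if gray < threshold else WHITE for gray in gray_row]


# Hypothetical ink coverage along one row of pixels: blank paper, a thick text
# stroke filling two pixels, blank paper, a hairline covering a quarter of one
# pixel, blank paper.
coverage_row = [0.0, 1.0, 1.0, 0.0, 0.25, 0.0]

gray_row = [gray_from_coverage(c) for c in coverage_row]
bw_row = to_black_and_white(gray_row)

print("gray:", gray_row)
print("b&w: ", bw_row)
```

In this model the hairline survives in the gray-scale row as a light gray pixel (191), but it falls on the white side of the threshold and is lost in the black & white row, while the thick stroke stays black in both. The lower the resolution, the smaller the share of each pixel a thin line covers, which would be consistent with lines that are barely visible at 300 dpi and gone at 150 dpi.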

Figures 1 and 2 below show the results for the black & white versions of the page. Figure 1 illustrates the image resulting from a direct black & white scan at 150 dpi, while figure 2 shows the image from a grayscale, 150 dpi scan converted to a black & white image within Photoshop. Notice that on these figures there is no pot profile to the left of the cross section. The lines of the profile were too thin to survive the process. Figures 3, 4, 5 and 6 show the images resulting from storing the 150 dpi, gray-scale images in the four compression formats. All these images show the pot profile. Figure 3 is the low-compression JPEG, 4 the high-compression, 5 the normal GIF and 6 the interlaced GIF.

 


Figure 1 - Image using a direct black & white scan at 150 dpi, JPEG format.

 

Figure 2 - Image using gray-scale scan at 150 dpi, converted to black & white mode, JPEG format.

 

Figure 3 - 150 dpi, gray-scale image stored using a low-compression JPEG format.

 

Figure 4 - 150 dpi, gray-scale image stored using a high-compression JPEG format.

 

Figure 5 - 150 dpi, gray-scale image stored using a straight GIF format.

 

Figure 6 - 150 dpi, gray-scale image stored using an interlaced GIF format.

 

As you can see, the quality variation in these images is minimal. The two GIF formats produced the largest files. They were both approximately the same size and were 125% of the size of my low-compression JPEG file. The high-compression JPEG format produced a much smaller file. In this instance, the reduction was slightly greater than 50%.
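The relative sizes reported above can be put on a common scale with a little arithmetic. The sketch below is hedged in one important way: the article does not say what the 50% reduction is measured against, so the code assumes the baseline is the low-compression JPEG. Because the reduction was slightly more than 50%, the high-compression figure is an upper bound, and the variable names are illustrative.

```python
# Sizes expressed relative to the low-compression JPEG (defined as 1.0).
LOW_JPEG = 1.0

# "125% of the size of my low-compression JPEG file"
GIF = 1.25 * LOW_JPEG

# Assumption: the 50% reduction is relative to the low-compression JPEG.
# The reduction was slightly more than 50%, so this is an upper bound.
HIGH_JPEG_MAX = 0.5 * LOW_JPEG

# Lower bound on how much larger a GIF is than the high-compression JPEG.
gif_vs_high_jpeg_min = GIF / HIGH_JPEG_MAX

print(f"GIF is at least {gif_vs_high_jpeg_min:.1f}x the high-compression JPEG")
```

On that reading, each GIF file is at least two and a half times the size of the high-compression JPEG of the same page.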

Conclusions

I decided that the grayscale scans at 150 dpi resolution stored as high-compression JPEG files (Figure 4) best suited our requirements. The image quality is sufficient to provide readable text and drawings, while the image files are relatively small and download quickly. Given the costs and technology of the time, photographs were beyond the Newsletter budget, so there were no photographs to consider. If we had had photographs of any kind -- black & white, color, film or digital -- we might have come to different conclusions.

-- Susan C. Jones
