The CSA Archives: An Overview

Important notice: The CSA web site was re-designed in August of 2010. Some documents then available were out of date; so they were not included in the re-design and were not updated. This is one of those documents. Information about dates of posting and revision remains here, but there will be no revision of any kind after August, 2010.

Pages concerning CSA archival activities may seem to imply that CSA continues to do archival work. However, all archival projects under CSA's oversight have been terminated. Scholars interested in archiving digital files may wish to contact the Archaeology Data Service in York, England; the Archaeological Research Institute at Arizona State University; or the Digital Archaeological Record.


An archive, digital or otherwise, for disciplines such as archaeology and architectural history may contain wildly diverse materials, but such an archive is much more than a collection of objects, paper records, or computer files. It is a crucial part of the collective memory of the discipline, the more so in archaeology because the methods of archaeology often leave us with records as our only knowledge of destroyed physical realities. Thus, for anyone who wants to examine archaeological materials anew, archival storage of records from a project is as crucial as proper storage of the finds themselves. For architectural historians, some of the buildings studied may continue to stand and be subject to re-examination at a later date, but the time, effort, and expense required to document a building thoroughly are enormous and should not be wasted. [Please note: The use of the word archive in the foregoing and throughout this document is technically incorrect. The singular is, in fact, archives. However, common usage has changed such that we will use the term as it is so often used in common American English, as the singular form of a word meaning a collection of documents or objects, whether digital or not. Common usage also now includes the use of archive as a verb.]

Of course, the point of archival storage is not simply to maintain the records but to make them accessible to others now and in the future.

The CSA CAD archive and the archive of digitized lantern slides from the Bryn Mawr College Digital Media and Visual Resource Center are intended to meet the special needs for archiving CAD models and digital images. The CAD archive is one of CSA's founding purposes. The digitized lantern slides from the Bryn Mawr College Digital Media and Visual Resource Center are important archival sources for those interested in antiquity. Many of the photographs were taken before their subjects had been damaged or destroyed by the wars and pollution of the twentieth century. This project is more closely focused on providing access to the images, via the Lantern Slides of Classical Antiquity Project page, but the archival preservation of the digitized images is a critical part of the project. The Lantern Slides project is now maintained by Bryn Mawr College, and the Web address has changed to http://www.brynmawr.edu/Admins/DMVRC/lanterns/. The original (and identical) Web pages on the CSA Web site are no longer available.

[Note: The Archaeological Data Archive was a component of CSA archival work, but it ceased active operation in August of 2002. (See "The Archaeological Data Archive Project Ceases Operation," by Harrison Eiteljorg, II, CSA Newsletter, XV, 2, Fall, 2002, http://csanet.org/newsletter/fall02/nlf0201.html.) The Archaeological Data Archive Project grew out of proposals from the Committee on Electronic Data and Computer Applications of the Archaeological Institute of America (AIA). The committee refined the ideas underlying the Archaeological Data Archive Project (ADAP), but the ADAP and the archive itself were the responsibilities of CSA, not the AIA. The AIA endorsed the project, as did the American Anthropological Association. The Archaeological Data Archive Project was announced in the fall of 1993.]

Why do computer files need special care?

Ironically, computer files are much more vulnerable to loss over time than paper records. They decay faster - a problem that has been recognized and can be circumvented fairly easily. Much more important, they may become obsolete in a matter of a few years due to advances in computer technology.

Decay of computer files - the weakening of the magnetic, optical, or electronic coding on a substrate - can be overcome by simply copying the files before decay has caused noticeable degradation. Since some modern media decay very slowly, this is not a difficult or onerous process. CDs, for instance, are said now to last a century, though prudence would dictate that a decade be taken as a more realistic life span; so the need to refresh data is not a problem that introduces frequent or complex intervention.
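The refresh step described above can be illustrated with a minimal sketch. It assumes that a checksum comparison is an acceptable way to confirm that the new copy is identical to the old one; the function names and the choice of SHA-256 are illustrative assumptions, not part of any CSA procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical.

    Returns the shared checksum; raises IOError if the copy differs.
    (Hypothetical helper, named here for illustration only.)
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} does not match the original")
    return before
```

Recording the returned checksum alongside the file would allow a later refresh to detect degradation that occurred between copies.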

Obsolescence of computer files is a much more serious problem. The pace of change in the computer industry is so fast -- change in hardware, operating system software, and application software -- that it is all but impossible to keep pace, and the effects are surprisingly widespread. One of the effects of these constant changes is the regular alteration of computer file formats (the ways information is encoded on a disk) so that new formats can hold data of different kinds or are more compact or are otherwise considered better. Altering file formats makes the old formats obsolete, and, of course, obsolete files cannot be accessed by new computers (or they can only be accessed at considerable unnecessary expense). Hardware advances result in changes in media (the CDs mentioned above may last a decade or two, but the hardware to access them is already being superseded); meanwhile, changes in operating systems or applications result in new file formats. In either case, the resulting files, even if preserved in pristine condition, may become inaccessible and useless.

Preserving data files by moving them from one form to another - either one storage medium to another or one file format to another - is not particularly difficult in most cases. From medium to medium, the transfer is a simple file copy process so long as the transfer takes place before the existing media have gone completely out of use. Changing the file format, on the other hand, may be very easy -- requiring only the latest iteration of the application used to create the original file so the format can be changed automatically -- or it may require some more specialized expertise. When the format change is not automatic, moving the data to a new format (a process commonly called data migration) requires substantially more expertise and experience. Preparing data files to be accessed by application software from a new company, for instance, can cause considerable difficulty. In such cases, the problems can be caused solely by the changes in software but are more likely to be the result of both the software changes and the complexity of the data. The more specialized the discipline for which data have been collected, the more likely it is that the complexity of the data will present special problems. Therefore, both computer expertise and specialized discipline-specific knowledge will often be required to migrate data successfully. In any case, it is certain that, sooner or later, special skills will be needed to keep computer records in accessible forms. Such skills are what make the CSA CAD archive and the archive of lantern slides important. We believe that such discipline-specific archives are necessary to provide the care needed for specialized data files over time.

Some General Archival Issues

Content of the CSA CAD Archive

The CSA CAD archive is very specialized, holding only CAD models. That imposes some special problems. First, there is the question of the appropriate format(s) in which to store the CAD files. There is no fully capable, non-proprietary format into which the files can be put, no neutral format that can be used for all CAD files. In addition, there are, in the widely-used proprietary formats, many kinds of data other than the geometry of the model that may be included in the file. The extraneous data, in fact, cannot generally be removed from these proprietary formats. Taken together, these problems deny us an ideal choice for an archival data format.

The CSA CAD archive will normally contain files in AutoCAD's standard, proprietary format, called DWG. There are several reasons for choosing this format. First, it is the format of the most widely used program for desktop CAD, AutoCAD. Therefore, most files are likely to be in this format from the beginning. Second, the DWG format imposes few limits on the model, while other formats sometimes do impose limits - typically on the number of layers used in the model. Finally, the DWG format is so widely used that tools for parsing the format are available.

Although the DWG format is the standard for the archive, other formats will be accepted. In those cases, however, an additional format is required, the DXF format, which is a semi-proprietary format for the exchange of drawing information. (The format is controlled by Autodesk, the company that produces AutoCAD, but the format specifications are public.) This is neither so good nor so robust a format as to be an ideal choice, but it seems to be better than any of the currently available alternatives. Other choices may be acceptable in specific circumstances, and developments in the commercial marketplace may change matters.

Both the DWG format and the DXF format change over time, but programs can generally read older versions of the formats as well as the current ones. Therefore, the archive will contain the format in which a given model was stored, noting the version number of the format. The format will not be updated to a more current version until or unless the migration to a more current version is necessary to preserve access. (That migration would render the files useless for older versions of the software.) In such cases, the older format(s) will also be retained until they can no longer be used at all, and future migration will be based on the original files, not migrated forms, until that is no longer possible.
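Noting the version of a stored DWG file can be partly automated, because DWG files begin with a six-character ASCII version string (for example, AC1015 for the AutoCAD 2000 format). The following is a minimal sketch under that assumption; the function name and the partial lookup table are illustrative, and DXF files, which record their version differently, are not handled.

```python
from pathlib import Path

# Version strings found at the start of DWG files, mapped to AutoCAD releases.
# This is a partial table for illustration; other strings are reported as-is.
DWG_VERSIONS = {
    "AC1009": "AutoCAD R11/R12",
    "AC1012": "AutoCAD R13",
    "AC1014": "AutoCAD R14",
    "AC1015": "AutoCAD 2000",
    "AC1018": "AutoCAD 2004",
    "AC1021": "AutoCAD 2007",
    "AC1024": "AutoCAD 2010",
}


def dwg_version(path: Path) -> tuple[str, str]:
    """Return (version string, release name) for a DWG file.

    Hypothetical helper: it reads only the first six bytes and does not
    validate the rest of the file.
    """
    with path.open("rb") as handle:
        code = handle.read(6).decode("ascii", errors="replace")
    if not code.startswith("AC"):
        raise ValueError(f"{path} does not look like a DWG file")
    return code, DWG_VERSIONS.get(code, "unknown release")
```

Calling dwg_version on each deposited file would yield the version note to be stored with the original, so that any later migration can still be traced back to the format in which the model was first deposited.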

Content of the Lantern Slides Archive

The Lantern Slides of Classical Antiquity Project includes an archive of the original digitized files in TIFF format. That is the preferred format for all images in the CSA archive. Derivative images in other formats may be used on the Web, but the archival versions will remain in TIFF until a better replacement has been defined.

Adding to the CSA Archive

Data appropriate for the archive may be contributed by their owners. CSA will provide free access to the data for scholarly use, provide paid access to the data for commercial use (with the fees split between CSA as agent and the owner), and, most important, provide the archival storage appropriate for the data; in addition, if requested, CSA will take on the responsibility for asserting copyright protection. Archival storage may include migration of data formats and will certainly include off-site storage of one archival copy. For more information, please contact CSA Director Harrison Eiteljorg, II, at CSA's address (below).

Documentation of Archived Files

All files must be accompanied by adequate documentation; matters such as file types, structure, vocabulary control, and so on must be fully defined and explained. The documentation serves two purposes. First, it will be needed by users of the files, because they will need to understand a great deal about the files in order to use them well. Second, the documentation will be required for eventual data migration. The most difficult aspects of data migration are likely to be those surrounding the organization of the data, not the changes in file formats; so the importance of this documentation cannot be overstated.

Specific documentation requirements depend on file types. To the extent that such requirements have been defined for particular file types, the requirements are available via the Web. There are defined requirements for CAD models and images.

Files must also be accompanied by indexing information so that users can find appropriate information.

Potential data depositors should be familiar with one of the most vexing terms used in discussions of digital data - metadata. Roughly defined, metadata are data about data. The term, however, is used to label both the kinds of data used to supply indexing information and the kinds of data used to describe and define the content of individual files. To make matters worse, the term may be used for either or both kinds of metadata without distinction or explanation. In the opinion of CSA personnel, this usage of the term metadata has made the term worse than useless, as it is often used in such a way that it seems to be one of those jargon words that only the cognoscenti know and use. Therefore, we will try to avoid its use and rely upon the more meaningful terms indexing information and file documentation.
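The distinction between file documentation and indexing information can be made concrete with a small sketch of a deposit record. Every class and field name below is a hypothetical illustration; the actual requirements are those defined for each file type, not the ones shown here.

```python
from dataclasses import dataclass, field


@dataclass
class FileDocumentation:
    """Describes and defines the content of one file (hypothetical fields)."""
    file_type: str = ""
    structure: str = ""
    vocabulary_control: str = ""

    def missing(self) -> list[str]:
        """Return the names of documentation fields left empty."""
        return [name for name, value in vars(self).items() if not value.strip()]


@dataclass
class IndexingInformation:
    """Terms that help users find the file (hypothetical fields)."""
    site: str = ""
    keywords: list[str] = field(default_factory=list)


@dataclass
class Deposit:
    """One deposited file with its two separate kinds of descriptive data."""
    filename: str
    documentation: FileDocumentation
    indexing: IndexingInformation

    def problems(self) -> list[str]:
        """List gaps that would need attention before the file is accepted."""
        issues = [f"documentation missing: {name}"
                  for name in self.documentation.missing()]
        if not self.indexing.keywords:
            issues.append("indexing information missing: keywords")
        return issues
```

Keeping the two kinds of data in separate structures mirrors the terminology adopted above: one serves users who need to understand or migrate a file, the other serves users who need to find it.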

Recognizing that scholars may find it difficult and time-consuming to prepare data for deposition in the archive and to document the files, CSA personnel will assist with file preparation and documentation. Indeed, the documentation is so important that ADAP personnel will examine the documentation with special care to be sure that there will be no problems with data migration. CSA personnel will also assist with the preparation of indexing information on request.

The core of the CSA archive is, of course, the archive itself. The archive includes information that may be browsed on the Web and files that may be downloaded. Along with the actual data are explanatory notes to aid those wishing to use the data. The archival files are presented in two groups -- CAD files and the archive of the Lantern Slides of Classical Antiquity Project. For the lantern slides alone, the archival files are available only on request. (See the home page of the Lantern Slides of Classical Antiquity Project.) The files available on the Web are derivative images at two levels of reduction in JPEG format.

Miscellaneous Information

