Vol. XI, No. 3

Winter, 1999

Documentation for Databases: A Crucial Step

Harrison Eiteljorg, II
Susan C. Jones

There is an important distinction between data and information. Data are facts and, as such, are easily and quickly stated. Information is something more; it is the context within which data reside as well as the data themselves; information implies understanding. One sherd from the Lerna pottery catalog, for instance, is said to be from a pot of shape III, 3, but even supplying the definition, "shoulder-handled tankard with neck and 2 strap handles," does not supply much information. One needs to know that the shape is one of those used in the Greek Early Bronze Age, to see a scaled drawing, and to compare it to similar shapes. (See the figure below for a database window showing the information available about this shape in the Lerna database.)

Reduced view of a database window showing information about the morphology of Lerna pottery.

Databases are intended to turn simple data into more complex information, to provide data in context, as shown in the figure above. Without some explanation of a database, however, that transition cannot be made; the information shown in the figure can be generated only by someone who understands the database. Indeed, one may not even be able to access the data without some understanding of the organization of the database. It is the documentation of the database that allows access to the database and makes it possible to turn individual items of data into information. Documentation of a database is therefore fundamental to making databases useful. (1)
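A minimal sketch may make the step from data to information more concrete. It assumes a hypothetical two-table layout (the table names, column names, and sherd identifier are invented for illustration and are not taken from the Lerna database); only the shape code and its definition come from the example above. The point is that the bare code "III, 3" becomes meaningful only when someone knows which table holds its definition and how the two tables are linked, which is exactly what documentation must record.

```python
import sqlite3

# Hypothetical schema for illustration only; the real Lerna database
# may be organized quite differently.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shapes (
        code       TEXT PRIMARY KEY,
        definition TEXT NOT NULL
    );
    CREATE TABLE sherds (
        sherd_id   TEXT PRIMARY KEY,
        shape_code TEXT REFERENCES shapes(code)
    );
    """
)

# The shape code and definition are quoted from the article;
# the sherd identifier "example-001" is an invented placeholder.
conn.execute(
    "INSERT INTO shapes VALUES (?, ?)",
    ("III, 3", "shoulder-handled tankard with neck and 2 strap handles"),
)
conn.execute("INSERT INTO sherds VALUES (?, ?)", ("example-001", "III, 3"))


def describe_sherd(connection, sherd_id):
    """Return (sherd_id, shape_code, definition), or None if not found."""
    return connection.execute(
        """
        SELECT s.sherd_id, s.shape_code, sh.definition
        FROM sherds AS s
        JOIN shapes AS sh ON sh.code = s.shape_code
        WHERE s.sherd_id = ?
        """,
        (sherd_id,),
    ).fetchone()


print(describe_sherd(conn, "example-001"))
```

Without knowing that `shape_code` points into the `shapes` table, a user would retrieve only the code itself; the join is possible only for someone who understands the organization of the database.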

The goal for documentation is simple. Potential users must be able to use database information intelligently, so they need to understand the nature and structure of the database. Furthermore, in an archival setting, the documentation must be adequate for users many years hence who have no access to any of the original scholars. (Not all documentation need be available to the public. Some, the portion that is included in metadata for indexing, will be readily available, but the remainder will not.)

Since each individual database is unique, with its own idiosyncrasies and problems, the documentation must be different for each database. Nonetheless, there are common requirements for documentation, and it is the intent here to provide a minimum standard for documentation of any archaeological database. (The CSA/ADAP documentation of the Lerna Pottery database may be examined as an example. See Data from J. B. Rutter, The Pottery of Lerna IV, www.csanet.org/archive/adap/greece/lernpot/lernameta.html.)

The minimum documentation should start with a basic description of the aims, organization, and materials contained in the database so that users start with a comprehensive picture of the information available to them, its organization, and its general subject matter. There must also be some discussion concerning details of hardware, software, and file structure to provide basic technological information. In addition, the individuals and institution(s) responsible for all aspects of the project (the project director, researchers, computer specialists, data archivists, etc.), dates pertinent to the project, organizational principles used in it, etc. must be stated. Of course, project organization, not only database organization, must be specified, since the organization of the data will be strongly affected by the organization of the work. Some background discussion will also be necessary for users, and the depth and specificity of that background information will depend on assessments of users' sophistication. Basic documentation should also include a discussion of the original sources of data and their forms. (For example, were artifacts examined for data input, or were reports from prior examinations used? Was data entry based on forms or interpretation of free-form text?)

In list form, then, the basic documentation should include:

The basic documentation thus far defined should be sufficient to permit a potential user who is sophisticated as to both the technology and the archaeological material to use the database with some confidence. There would, however, be many assumptions that such a user would have to make. Additional documentation should remove the need for assumptions while making it possible for less sophisticated users to access the information.
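One way to picture the basic documentation described above is as a structured record whose completeness can be checked before a database is deposited. The sketch below is an illustration only: the field names are our own paraphrase of the items discussed in the text, not an official CSA/ADAP schema, and the sample values are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class BasicDocumentation:
    """Hypothetical record of the basic documentation items.

    Field names paraphrase the discussion in the text; they are
    assumptions made for this sketch, not a published standard.
    """

    aims: str = ""
    database_organization: str = ""
    materials: str = ""
    hardware: str = ""
    software: str = ""
    file_structure: str = ""
    responsible_parties: str = ""
    project_dates: str = ""
    organizational_principles: str = ""
    project_organization: str = ""
    background: str = ""
    data_sources: str = ""


def missing_items(doc):
    """List the names of documentation items that are still empty."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# Placeholder values, invented for the example.
draft = BasicDocumentation(
    aims="Make the pottery catalog of an excavation available for study",
    data_sources="Artifacts examined directly; data entered from forms",
)
print(missing_items(draft))
```

A check of this kind says nothing about the quality of what has been written, only whether each item has been addressed at all; judging adequacy still requires a reader who knows the material.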

Detailed documentation:

The two lists of information items should suffice for any user, or at least any user with enough computer experience to make use of a database.

This is a very important issue, because so many scholars will be consulting databases in the future. If the documentation is not adequate, information will be lost or misunderstood. Therefore, it seems appropriate to ask for comments on this suggested list of items and to begin the process of setting a standard for archaeological database documentation. Comments should be sent to either of the authors, Harrison Eiteljorg, II, or Susan C. Jones, at CSA. Meanwhile, the Archaeological Data Archive Project is using the foregoing as a requirement for data deposited in the archive. The most recent deposit, the data from the Occaneechi Town excavation (see review of the CD in this issue), for instance, was documented to meet these specifications by one of the editors, R. P. Stephen Davis, Jr., even before the files arrived at the archive.

-- Harrison Eiteljorg, II
-- Susan C. Jones

(1) The need for documentation precedes the completion of a database and its release to the public. Any database takes a long time to develop and complete. As a natural consequence, many people will have been involved during its life, few of them for the full length of the project, and extensive documentation is therefore required throughout the life of the database. For example, data entries will be made by many individuals over a long period of time; so documentation of data entry procedures, terms, and limits is obviously required to maintain continuity during the data recording phase. After excavation has ceased and all field data have been entered, additional data (site plans, results of laboratory analyses, etc.) will probably be added while the analyses and preparation of the material for publication are going forward. These processes will probably involve yet more people, some of whom may not have participated in the data recording that occurred in the field. Of course, individual experts publishing material from the site are unlikely to have entered all the data pertinent to their analyses. Thus, at the early stages as well, documentation must be provided that will guarantee the quality and consistency of information contained in the database. It should be equally apparent that, with the involvement of many individuals over an extended period of time, issues of responsibility for and continuity of the database arise. A few such issues are:

This article does not address these issues, but they are important considerations when setting up databases for archaeological projects.

Table of Contents for the Winter, 1999 issue of the CSA Newsletter (Vol. XI, no. 3)