Vol. XVIII, No. 3
Winter, 2006

Even the Superseded Data Are Valuable

Harrison Eiteljorg, II

At the most recent annual meeting of the Archaeological Institute of America, Bruce Hartzler gave a presentation about the efforts to convert the data from the American School of Classical Studies' Agora excavations from paper card files to a digital format. Mr. Hartzler has worked on this project for some years, and the resulting data set and access system are very impressive. His demonstration was very effective.

One aspect of the data system especially interested me. Agora file cards were often added to by scholars over the years; they were occasionally changed as well, with the changes showing on the cards -- generally a single line through an entry to strike it, with a change written in. Scholars using the data had found the various additions and changes to be of value (often relying on their own knowledge of the handwriting to identify the scholar responsible for a change), and Mr. Hartzler therefore included all the data on the cards in the system -- the original entries plus the additions and corrections. Users need see only the latest version of the data on a card, but they may also ask to see all the entries.

This struck a chord with me because I have long advocated the preservation of all stages of data entry in data sets that are created in digital form. Entries that are changed can be preserved as part of the data set in a number of ways -- my preference being shadow files that contain each altered version of a particular data row, complete with the dates of all changes and the persons responsible for each change. (Adding this feature to a project database is not burdensome.) While I have long believed that this is a true necessity for project data -- providing not simply the current view of a specific item of information but the choices that have been made throughout the history of the project -- it was heartening to hear such a clear example of scholars wanting just such information.
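
For readers who want to see what a shadow file might look like in practice, here is a minimal sketch, assuming Python and SQLite. It is an illustration only, not a description of the Agora system or of any particular project database; the table and column names (finds, finds_shadow, changed_at, changed_by) and the function names are hypothetical. Each time a row is changed, the version being replaced is first copied to the shadow table along with the date of the change and the person responsible.

```python
import sqlite3
from datetime import datetime, timezone


def open_db():
    """Create an in-memory database with a data table and its shadow table.

    All table and column names here are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE finds (
            id          INTEGER PRIMARY KEY,
            description TEXT,
            length_cm   REAL
        );
        -- One row per superseded version of a finds row.
        CREATE TABLE finds_shadow (
            find_id     INTEGER,
            description TEXT,
            length_cm   REAL,
            changed_at  TEXT,   -- when the old version was replaced
            changed_by  TEXT    -- who made the change
        );
    """)
    return conn


def update_find(conn, find_id, changed_by, description, length_cm):
    """Copy the current row to the shadow table, then apply the change."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # both statements succeed or fail together
        conn.execute(
            "INSERT INTO finds_shadow "
            "SELECT id, description, length_cm, ?, ? FROM finds WHERE id = ?",
            (now, changed_by, find_id),
        )
        conn.execute(
            "UPDATE finds SET description = ?, length_cm = ? WHERE id = ?",
            (description, length_cm, find_id),
        )


def history(conn, find_id):
    """Return the superseded versions of one row, oldest first."""
    return conn.execute(
        "SELECT description, length_cm, changed_at, changed_by "
        "FROM finds_shadow WHERE find_id = ? ORDER BY rowid",
        (find_id,),
    ).fetchall()


conn = open_db()
conn.execute("INSERT INTO finds VALUES (1, 'lamp fragment', 12.5)")
update_find(conn, 1, "A. Scholar", "lamp nozzle", 12.5)

# The data table holds only the latest version ...
print(conn.execute("SELECT * FROM finds WHERE id = 1").fetchone())
# ... while the shadow table keeps what it replaced, with date and name.
print(history(conn, 1))
```

A user who wants only the current state of the data queries finds alone; a user who wants to see the choices made along the way asks for the history as well. In this sketch the person responsible is supplied by the caller, since SQLite itself has no notion of a logged-in user.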

It is not immediately obvious which data items should be so carefully tracked. What items are interpretive and therefore in need of this kind of intellectual rigor? Does one, for instance, keep track of a change in recorded dimensions that amounts only to a correction of an obvious error? Or are only more purely interpretive data items tracked? This is partly a practical question, since each data table that needs a shadow table adds some complexity to the entire database. However, it is also a truly theoretical question, bearing directly on the discipline's view of our "raw data." At the end of the day, I would argue that there are few data categories that do not require this level of care because there are few data categories that are never changed, few that are not susceptible to error.

-- Harrison Eiteljorg, II

