A new database arrived in the CSA office this summer. This was a single-file database (often called a flat-file database) with the simplest kind of information - names and addresses. It illustrates some of the most basic problems with databases, though; so it may be helpful to consider some of the things in this database that should have been different - and that had to be changed.
First, the name category was just that, a single category for the names of the individuals. Title, first name, and last name were all entered together in that category. That may not seem to be a problem at first blush, but how does one alphabetize the list by name? Some people will be listed in order by first name, others by Mr., Mrs., Dr., Prof., or . . . . Similarly, the addresses were separated into only two or three items - a street-address category, sometimes a second one for complex addresses, and a city-state-zip category. Order the group by zip code? Not possible. By state? Also not possible. In fact, one could not even order the list by city, though the city name began each city-state-zip entry, because sometimes the city-state-zip entry was the third item in the information set and sometimes it was the fourth.
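As a rough illustration of what separate categories make possible, here is a minimal Python sketch (not the FileMaker file itself). The field names and the three sample records are invented for the example; the point is only that once title, first name, last name, city, state, and zip code each have their own field, any of them can serve as a sort key.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical field names; one field per discrete piece of information.
    title: str
    first_name: str
    last_name: str
    street: str
    city: str
    state: str
    zip_code: str  # kept as text so that leading zeros survive


# Invented sample records, for illustration only.
contacts = [
    Contact("Dr.", "Ann", "Baker", "12 Elm St.", "Bryn Mawr", "PA", "19010"),
    Contact("Mr.", "Carl", "Adams", "4 Oak Ave.", "Boston", "MA", "02108"),
    Contact("Prof.", "Beth", "Clark", "9 Pine Rd.", "Albany", "NY", "12207"),
]

# Alphabetize by last name, then first name; the title no longer interferes.
by_name = sorted(contacts, key=lambda c: (c.last_name, c.first_name))

# Order by zip code, or by state and then city within each state.
by_zip = sorted(contacts, key=lambda c: c.zip_code)
by_state = sorted(contacts, key=lambda c: (c.state, c.city))
```

None of these orderings could be produced reliably from a single combined name field or a single combined city-and-zip field without first pulling the pieces apart.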
To make certain that maximum confusion reigned, the data entry was also inconsistent. Sometimes city names were abbreviated, usually not. Sometimes state names were given the proper Post Office abbreviations; sometimes not.
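Inconsistent entries of this kind can often be repaired in bulk before the remainder is fixed by hand. The sketch below is one possible approach in Python, not the scripts actually used on this file: the function name `normalize_state` and the three-state lookup table are assumptions made for the example, and a real table would need to cover every state that appears in the data.

```python
from typing import Optional

# Illustrative subset only: full state name -> two-letter postal abbreviation.
STATE_ABBREVIATIONS = {
    "pennsylvania": "PA",
    "massachusetts": "MA",
    "new york": "NY",
}
KNOWN_CODES = set(STATE_ABBREVIATIONS.values())


def normalize_state(raw: str) -> Optional[str]:
    """Return the two-letter abbreviation, or None if the entry is unrecognized."""
    # Drop periods and surrounding spaces, and ignore capitalization.
    cleaned = raw.replace(".", "").strip().lower()
    if cleaned.upper() in KNOWN_CODES:
        return cleaned.upper()
    # Unrecognized spellings come back as None so they can be fixed by hand.
    return STATE_ABBREVIATIONS.get(cleaned)
```

Entries such as "Pa.", "N.Y.", and " pennsylvania " all resolve to a single form, while an unexpected spelling is flagged rather than silently guessed at.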
This was a simple database with fewer than 300 entries. It has been corrected with a combination of manual corrections and FileMaker scripts. Imagine such problems in a large, complex database, though. Consider, for example, an excavation database for ceramic finds. Lumping together the number of pots of a given type with the name of the type may make it impossible to group entries by type (or, better yet, by type and then ordered within type by quantity). Combining ceramic type with excavation context would have similarly unwanted results. There are, of course, many kinds of data that can be combined in ways that seem obvious and desirable, but the best choice is nearly always to separate data into the smallest, most discrete categories possible.
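The ceramic example can be sketched the same way. In the Python fragment below, the context labels, type names, and counts are invented for illustration; what matters is that context, ceramic type, and quantity are stored as three separate items, so that entries can be grouped by type and ordered by quantity within each type.

```python
from collections import defaultdict

# Invented sample finds; each piece of information has its own field.
finds = [
    {"context": "Trench A, level 2", "ceramic_type": "amphora", "quantity": 3},
    {"context": "Trench B, level 1", "ceramic_type": "kylix", "quantity": 7},
    {"context": "Trench A, level 3", "ceramic_type": "amphora", "quantity": 12},
    {"context": "Trench B, level 2", "ceramic_type": "kylix", "quantity": 2},
]

# Group by type, and within each type list the largest quantities first.
ordered = sorted(finds, key=lambda f: (f["ceramic_type"], -f["quantity"]))

# Total number of pots per type, possible only because quantity is a number
# in its own field rather than part of a text entry such as "3 amphorae".
totals = defaultdict(int)
for find in finds:
    totals[find["ceramic_type"]] += find["quantity"]
```

Had the quantity been typed into the same field as the type name, neither the ordering nor the totals could be computed without first separating the two.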
Consistency is also a problem with excavation data - just as it was with the names and addresses. If abbreviations are used (not a good idea for a variety of other reasons), searches become unreliable, because a search for one form of a term will miss every entry recorded in another form. Sensible grouping fails for the same reason unless all entries are abbreviated - and consistently abbreviated with the same abbreviations.
Even with a simple database, it is crucial to think carefully about the way the data are to be used before constructing a database or entering data. The ways the data will be stored and entered depend on the ultimate use. In a more complex database the stakes are higher, the problems greater, the fixes more time-consuming and error-prone, and the planning required both more extensive and more important for the final results.
For other Newsletter articles concerning issues surrounding the use and design of databases, consult the Subject index.
Table of Contents for the Fall, 2000 issue of the CSA Newsletter (Vol. XIII, no. 2)