CSA/ADAP Introduction to Databases
Important notice: The CSA web site was re-designed in August of 2010. Some documents then available were out of date; so they were not included in the re-design and were not updated. This is one of those documents. Information about dates of posting and revision remains here, but there will be no revision of any kind after August, 2010.
Database materials in particular have been superseded by the database chapter in Archaeological Computing.
A database may be likened to a table of information such as one you might see in a book. Tables such as the following are, in effect, simple databases. A table has vertical columns and horizontal rows. So, in the example here, a table of fibulae taken from Gordion I, Three Great Early Tumuli (pp. 168-169), each row contains information about individual fibula identified by inventory number. The columns show the inventory numbers of fibulae, the side the hinge is on, the height, the length, and comments (abbreviated here). These fibulae are all identified as belonging to a common stylistic Group, that called Group XII, 11, by the scholar Blinkenberg.
Inventory # | Hinge | Ht. | Length | Comments |
MM 311 (B 1121) | L | 0.0390 | 0.0420 | Pin missing, stud through spring |
MM 312 (B 1125) | R | 0.0400 | 0.0460 | Spring and pin missing |
MM 313 (B 1124) | R | 0.0400 | 0.0450 | Spring and pin missing |
MM 314 (B 1123) | L | 0.0390 | 0.0470 | Spring and pin missing |
MM 315 (B 1126) | L | 0.0390 | 0.0435 | Spring and pin missing |
MM 316 (B 1122) | R | 0.0410 | 0.0425 | Spring and pin missing |
MM 317 (B 1120) | L | 0.0300 | 0.0400 | Pin and catch missing - and most of spring |
In another example, this one of Pueblo Indian sites from the American Southwest on the Web site of the Archaeological Data Archive Project, the columns are site ID, name of district, name of site, state/federal no., number of rooms, number of kivas, beginning date, ending date, architectural layout, and the rows are individual sites, as identified by the site ID in the first column. Please look at some of the tables from that report before going on. Start with the table for SW Utah/Arizona and then try the one for the Mesa Verde area. Be sure to look through the file until you see the tables; you will need to scroll down through the file a bit.
Tables are effective ways to present information, but a data table (a computerized table, so to speak) is a more effective one if the data are at all complex. For example, a computerized data table can be automatically rearranged to suit one's needs. In the case of the Gordion table we could rearrange the listings so that they were ordered by increasing length of the fibulae, as follows, but that does little for us.
Inventory # | Hinge | Ht. | Length | Comments |
MM 317 (B 1120) | L | 0.0300 | 0.0400 | Pin and catch missing - and most of spring |
MM 311 (B 1121) | L | 0.0390 | 0.0420 | Pin missing, stud through spring |
MM 316 (B 1122) | R | 0.0410 | 0.0425 | Spring and pin missing |
MM 315 (B 1126) | L | 0.0390 | 0.0435 | Spring and pin missing |
MM 313 (B 1124) | R | 0.0400 | 0.0450 | Spring and pin missing |
MM 312 (B 1125) | R | 0.0400 | 0.0460 | Spring and pin missing |
MM 314 (B 1123) | L | 0.0390 | 0.0470 | Spring and pin missing |
Of course, I could have arranged them with the order reversed, or according to other criteria. We could also select only those categories of interest or only those fibulae of interest; for instance, we could have selected only the fibulae with springs on the left or displayed only the inventory number and the comments.
With a short table like this one, the advantages of the computerized version may seem minimal, as indeed they are. We gain very little by rearranging so few items. But look again at the table on the Web for Mesa Verde.
In that case, where there are many entries in the table, we cannot even see them all at once on either a piece of paper or a screen. Therefore, it might be helpful to reorganize them according to date of occupation or size (in terms of number of rooms or number of kivas). We could also ask to see all the sites for which the number of rooms exceeds 20 or all those with an earliest occupation date after 1100 A.D. - or all those with more than 20 rooms AND an earliest occupation date after 1100 A.D.. We could arrange the sites by date of occupation, site number, number of rooms, etc.
As you can see from that example, when the data set grows larger and more complex, the advantages of a computer data table become more and more important and powerful. All the tables in the Pueblo Indian sites could even be combined into one big table and sorted by area to provide the same information that is in the individual tables - but in a single one. Of course, by sorting the table in different ways, as we just discussed, other ways of looking at the information could be tried. We could group sites by size, for instance, for the entire region, not just one or another of the individual areas.
Another example may be found in the Gordion publication, since there are actually several tables of fibulae, one for those of type XII, 11, that were found on the bed of Tumulus MM (the table already reproduced above), another for the those of type XII, 14, that were found on the bed of Tumulus MM, another for those of type XII, 14, that were found on the floor of Tumulus MM, and so on. Each is a separate table in the book, but all could be combined in a single computerized data table. Why would one combine them? To have the value of each in one table - and more.
To combine them effectively, we would need two or three more columns - one to indicate findspot, and one to indicate Blinkenberg Group (perhaps two, one for the major Group and another for the sub-Group). Since the fibulae are often paired, we might also want a column that indicates the inventory number of the other member of the pair (if there is one).
Using a combined table, any user could select the fibulae from any one of the findspots with Blinkenberg Group and have the equivalent of any of the tables presented in the publication; so the single data table would provide all that is in the publication.. Such a table would, though, provide much more than the equivalent of the individual tables. A user could select all the fibulae from the entire corpus according to any of the variables - in this case, findspot, Blinkenberg type, height, length, hinge side, inventory number, and presence or absence of a mate. The paper publication of tables, on the other hand allows no direct way to gather the fibulae together and examine them as a larger group.
We could even group the fibulae according to multiple variables - perhaps asking for all the fibulae with mates and from the bed and belonging to Group XII, 14.
I believe that these are good ways to use databases. Information can be displayed more effectively and with more chance for the user - not the creator of the information but the ultimate user - to do decide what variables are important and how the data should be presented. That is an important point that is too easily overlooked. A paper table is the creator's idea of what readers need to know. A database, on the other hand, can be ordered and selections from it made by its creator of the database or the ultimate user. Such a presentation is much more useful as a result, since the user brings his/her own questions to the data and may require that the data be organized in a specific fashion in order to answer those questions.
So far, we have seen rather simple examples of data tables. They can be very helpful, but the real power of database systems has not yet been seen. We have considered only what are sometimes called flat files - individual stand-alone data tables. The more complex - and more useful - databases involve many tables that are interconnected to allow quite different tables to be used together. These are commonly called relational systems. Though the term relational as originally applied to databases has a very specific definition, it has come to be used to mean different data tables in a larger system, with each table related to one or more others.
Keeping in mind the fibulae from Gordion, let us consider a fuller presentation of the information we have about the finds there. Aside from fibulae, there were many other bronze objects found in the same places. Suppose each object type were in its own data table, as were the fibulae. We might have a data table for bowls and another for jugs. We might also have one table or more for pottery. If we could use all the tables together, we could list all the objects found in a specific context - fibulae, bowls, jugs, and so on. However, we would want different information for pottery than for fibulae; so we would not want to combine the tables. Nonetheless, we could relate all the tables to one another by making sure that the findspots for all artifacts were properly entered in all the tables. Then we could ask for all the objects from all the tables with a given findspot.
I have tried to present an introduction to databases here, and I hope a few things are clear. First, individual data tables (sometime called files) are very much like tables in a paper publication. Second, even rather simple data tables are more useful than paper tables. Third, more complex databases involve many related tables.
Before we stop, you should know that some people use a different set of terms. Tables may be called files, rows may be called records, and columns may be called fields. The terms are interchangeable. You should also know that the term database has some problems. Some people use it to mean an individual table (file); other only use database to refer to a related set of tables (files). With these thoughts in mind, you may now want to turn to the CSA/ADAP Archaeological Database Discussions or return to the CSA Information Technology Page.