Vol. XXII, No. 3
January, 2010

Using Old Data in New Ways

Harrison Eiteljorg, II
(See email contacts page for the author's email address.)


I have recently had reason to be in a hospital and to watch people as they gather information from patients. (This may seem a long way from anything having to do with archaeological computing, but please read on.) Somebody writes down a statement or an observation or a reading; either that person or another transfers that item to another piece of paper, and the piece of paper goes into a chart so that someone else may find it at a later date. Everyone in the chain uses his/her normal reading and writing skills (presumably more carefully practiced than when the typical doctor writes a prescription), and information may be copied multiple times before it makes it to the chart and becomes part of "the record." Once it is part of "the record," of course, it can be accessed by one person at a time: the person holding the chart. This process has the virtue of being time-honored, but I can think of no other virtue. It depends upon accurate reading, writing, and copying of information that is easily treated as too routine to need full attention; it places information in forms and on papers in ways that require shuffling through to find the right item, which may or may not be the most recent reading or observation; and the information is available to only one person at a time.

The absence of computerized patient records in American medicine has been noted by many in the context of the recent health-care debate. Its relevance to archaeological data, however, may seem tenuous at best. Most archaeological projects, in fact, already use digital data and digital recording techniques. Similarly, in contemporary archaeological practice, when someone wants simply to see the information, the digital data can generally be accessed quickly and easily from the basic files. The discipline is, in that area, moving forward, albeit often without a self-conscious sense of the direction to be followed.

It is in the attempt to re-use data for new purposes that archaeological practice does seem often to fall back on procedures that stray from the optimal path, much as modern medical practices do. That is, just as a medical study might require manually digging through patient records to find and copy specific information of relevance to the study, so taking certain data items from an archaeological data set for use elsewhere is a process that too often is accomplished awkwardly and with too much opportunity for the introduction of error. Whether dealing with medical or archaeological information, any time an item must be copied by hand or by keyboard there is the risk of introducing error. Indeed, if there are enough individual items copied, one can be virtually certain that there will be some errors introduced in the process.

Given the need to re-purpose data and the problems with copying information, one might expect the means for avoiding pitfalls to be widely known and practiced. The skills required to take data from one source and reformat it for use elsewhere, however, are not skills that are widely taught or understood. In addition, the process often requires some understanding of data formats and of software packages that might not be used in normal practice, understanding that is not considered important in very many quarters. For instance, in preparing Propylaea data for presentation on the web, it was necessary to find a way to put data from a spreadsheet into a form appropriate for a web page; in doing so it was important to prevent contamination of the data due to inaccurate copying. While there were several ways to accomplish the task, I chose to make a formula in the spreadsheet that combined the extant data content of the spreadsheet with HTML code required by a web page intended to contain the data in tabular format. The actual data items were simply those already in the spreadsheet where they had first been recorded, and the HTML code in the formulae could be easily checked. The tabular data for the web page was re-formatted in the spreadsheet (with a process that did not change -- or even present the possibility of accidental change to -- the original data items) and simply copied for pasting into the web page.
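The same idea can be sketched outside a spreadsheet. The article does not give the actual formula used for the Propylaea data; in a spreadsheet it might have had roughly the shape ="<tr><td>" & A2 & "</td><td>" & B2 & "</td></tr>", but that is an assumption. The short Python sketch below does the equivalent job on data exported as comma-separated text: it wraps each existing value in HTML table markup and never asks anyone to retype a value. The function name, the column names, and the sample values are all hypothetical placeholders, not Propylaea data.

```python
import csv
import html
import io


def rows_to_html_table(csv_text: str) -> str:
    """Wrap each CSV cell in HTML table markup without retyping any value.

    The first row is treated as a header row (an assumption of this sketch).
    """
    reader = csv.reader(io.StringIO(csv_text))
    lines = ["<table>"]
    for row_number, row in enumerate(reader):
        tag = "th" if row_number == 0 else "td"
        # html.escape protects characters such as < and & that HTML reserves.
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Made-up sample values, standing in for an exported spreadsheet.
sample = "block,length_m,width_m\nA1,1.25,0.62\nA2,1.31,0.60\n"
print(rows_to_html_table(sample))
```

As with the spreadsheet formula, only the markup is new and needs checking; the data items pass through untouched.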

In a more sophisticated setting -- something, for instance, like a computer center housing data from a project -- such a process would seem amateurish, and a more formal process for automating the re-formatting of data to make it suitable for a secondary purpose would be prepared. Such sophisticated systems, however, assume a regular and repeatable process for taking data from one format and putting it into another -- and therefore regular and repeatable starting and ending formats. For a one-time re-purposing of data, on the other hand, users need to be able to create one-off solutions like the one described -- or like those used when Propylaea survey data were re-formatted (see "From Field Data to CAD Model: Modeling the NW wing of the Propylaea," by Harrison Eiteljorg, II, CSA Newsletter, XVII, 1; Spring, 2004). A recent InfoWorld item speaks to the same question: "Is IT getting too easy?" by Paul Venezia, located at www.infoworld.com/t/education-and-skills/it-getting-too-easy-067?source=IFWNLE_nlt_blogs_2010-01-18, dated 01/18/10, and last accessed 01/18/10. Its first paragraph includes this: "A few recent situations have led me to believe that IT may be getting too easy. . . . too easy for those who are just entering the game." Although the article is about networking issues, it makes the same general point: experience with the truly basic issues of the work can be extremely important.

Another example of taking data from one source to use the data elsewhere is the creation of an LAS file in AutoCAD®, demonstrated in "Using AutoCAD to Construct a 4D Block-by-Block Model of the Erechtheion on the Akropolis at Athens, I: Modeling the Erechtheion in Four Dimensions," by Paul Blomerus and Alexandra Lesk, CSA Newsletter, XX, 2; Fall, 2007, at csanet.org/newsletter/fall07/nlf0701.html. Since the LAS file produced by AutoCAD is a simple text file with the names of all the layers in the CAD model (plus the characteristics of the layers at the time the file was generated), it is relatively easy to use that LAS file outside AutoCAD. For instance, since documenting an AutoCAD model requires a list of layer names for potential users, the LAS file can be used to make a text file with a list of all layer names used, perhaps with characteristics indicated by the layer-naming scheme. The LAS file is, after all, just a text file; so it can be used by a word-processing program to create a new file properly designed for a different purpose. Perhaps more important, it could also be used in a spreadsheet program or a database. (Regular readers will also remember the repeated uses of text generated in any of several other programs in order to be pasted into AutoCAD's command line.)
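A minimal sketch of the spreadsheet or database step follows. It does not parse a real LAS file, whose internal layout is not described here. Instead it assumes the layer names have already been pulled out into a plain list, one name per line, and that the naming scheme separates its parts with hyphens. Both assumptions, along with the function name and the sample layer names, are hypothetical.

```python
import csv
import io


def layer_names_to_csv(names_text: str, delimiter: str = "-") -> str:
    """Turn a one-name-per-line list into CSV: full name, then its parts.

    The hyphen delimiter is an assumed naming convention, not AutoCAD's rule.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in names_text.splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines in the list
        writer.writerow([name, *name.split(delimiter)])
    return out.getvalue()


# Invented layer names, used only to show the shape of the output.
sample_names = "WALL-NORTH-EXTANT\nWALL-NORTH-RESTORED\n\nFLOOR-CELLA-EXTANT\n"
print(layer_names_to_csv(sample_names))
```

The resulting comma-separated text opens directly in a spreadsheet or imports into a database table, with each part of the layer name in its own column and no name typed a second time.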

If using data in ways and for purposes not originally planned is to be commonly achieved, there is a need for ensuring the availability of the necessary skills and understanding. Archaeology students -- like most contemporary users of digital data -- can usually work without understanding such issues as file formats, especially file formats used only because they are simpler than the more commonly used formats of the day. However, understanding those formats is the first key to making these kinds of transformations. The discipline therefore needs a level of watchfulness on the part of those responsible for project data. It will be their responsibility to be sure that re-purposing data never requires re-entering the information, that anything entered properly into a computer is never again entered but only accessed and formatted for new uses.

-- Harrison Eiteljorg, II


An index by subject for all CSA Newsletter issues may be found at csanet.org/newsletter/nlxref.html; included there are listings for articles concerning the use of electronic media in the humanities.
