Articles in Vol. XXVI, No. 2
Archiving the Digital Files from the CSA Propylaea Project
Harrison Eiteljorg, II
(See email contacts page for the author's email address.)
The CSA Propylaea Project has been finished for some time now. While my first post-project work was to complete the website so that people could have access to all the digital information, I needed then to find ways to ensure the archival preservation of the fruits of the project for the foreseeable future.
My first attempt to archive the data was via tDAR (the Digital Archaeological Record), the archival organization that is based at Arizona State University and is part of the Digital Antiquity consortium. (See Francis P. McManamon, Keith W. Kintigh, and Adam Brin, "Digital Antiquity and the Digital Archaeological Record (tDAR): Broadening Access and Ensuring Long-Term Preservation for Digital Archaeological Data," CSA Newsletter, XXIII, 2; September, 2010; csanet.org/newsletter/fall10/nlf1002.html.) That was not fruitful for two reasons. First, tDAR will not accept CAD files. Since the CAD files are the primary expression of our results, this was not acceptable, though I will confess that I kept putting that fact out of my mind in order to try to find a way to use tDAR, the only American archival organization for archaeological materials (see note 1). Second, tDAR is very well designed for projects based in the Americas, especially the southwestern portion of the U.S., but it is not so well prepared for materials from Europe.
Setting aside the proverbial elephant in the room (the AutoCAD files) for a time, I investigated the potential of archiving data at tDAR. In fact, I deposited Archaeological Computing there as an experiment (and I am told it has been downloaded three times, presumably by people who found it via a search process). I contacted people at tDAR whom I knew and proceeded with the CSA Propylaea Project materials, still effectively ignoring the CAD files issue. I learned that the basic text materials (the HTML files on the CSA Propylaea Project website) needed to be combined into a single PDF file for proper archival treatment at tDAR, and I began to try to make a single document from the HTML files that are the pages of the website. At the same time, I tried to figure out how to use the very heavily automated tDAR system to deposit the photographs that are a major part of the project data. It seemed to me to be all but impossible to use the tDAR system to deposit the photographs in a fashion that met the needs of the project. In addition, the fitness of the archive for the Propylaea material had become somewhat more worrisome. Although I assumed that time would change matters, the tDAR archive is clearly aimed at materials from Native American sites first and foremost. Too often I found myself trying to figure out what terms to apply to the material when none of those supplied by tDAR was appropriate. I did get so far as to make what tDAR calls a project for our work and then to deposit five PDF files.
After making the early deposits last year, I finally decided that another avenue was required; the CAD files issue made that necessary. I contacted someone at tDAR to be sure that they would not accept CAD files, even without agreeing to migrate them. They would not.
I then contacted the Archaeology Data Service in York, England. Their primary focus, naturally, is British archaeological material — materials either concerning projects in Britain or produced by scholars from Britain — and I had assumed that they would not be interested in the Propylaea materials. Fortunately, they were willing to archive the Propylaea files, and the ADS has no problem with CAD files. (The fact that the language used for the CSA Propylaea Project was English was important in the decision to accept the CSA Propylaea Project files.)
The process was quite different with the ADS. For the most part, I dealt (via email) with individuals who guided me through the necessary steps, answered questions, and generally worked with me through the process. (A caveat here. Given my own background, I know the people who are in charge at both tDAR and the ADS. As a result, my experience may not be typical. I never felt that I could not or should not write directly to individuals at either organization. I am assured, though, that I received no special treatment at the ADS.)
For the ADS our web pages (HTML documents) were acceptable (with modifications so that they did not include our project-specific menus and other additions), and it was not necessary to combine them into a single PDF. I did need to extract the actual text from our own website design, but, given the database underlying the design, that was quite simple. I also needed to re-word some documents in light of the end-of-the-road usage, and I made some other changes, all relatively simple, to make the pages work in their new setting. I worked on one or two pages, making sure that I understood the requirements and checking with my contact person at the ADS to be sure that the products were acceptable; then I went forward with the other HTML pages. The other documents, by and large, simply had to be sent along to the ADS. [I comment in passing that seeing a thumb drive in a mail pouch and knowing that such a small device could and did contain all the files from the CSA Propylaea Project was humbling, to say the least.]
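As an illustration only: for the project, the text was drawn from the database underlying the website, so nothing like the following was needed. For a site with no such database, stripping menus and other site-specific additions from a page might be sketched as below. The choice of tags to skip (`nav`, `script`, `style`) is an assumption about how a site marks its menus, not a description of the CSA pages.

```python
from html.parser import HTMLParser

# Assumption: these tags hold site-specific chrome rather than article text.
SKIP_TAGS = {"nav", "script", "style"}


class TextExtractor(HTMLParser):
    """Collect text content, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.parts = []       # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.parts.append(text)


def extract_text(html):
    """Return the page text, one fragment per line, without skipped elements."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

A real page would still need checking by eye afterwards, as the pages sent to the ADS were.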
After preparing the HTML files (and the images used in them), I filled in a spreadsheet with information about all the files (HTML, JPG and other image formats, CAD, PDF, and database formats), all of which had been arranged on the thumb drive in a suitable directory structure, and filled out a prepared form concerning metadata. The process began in March and was completed in July. [I note here that the ADS staff even spent the time required to examine our website to see what the materials were, and they noted our use of Greek translations, something we had already decided not to continue.] In order to keep costs down, I had agreed to try to supply the material as the ADS wished. I believe that I could have spent far less time on this process in return for a higher fee. I should also be explicit that neither ADS personnel nor I worked on this archival project full-time. Many other things (including computer problems, illnesses, and absences from work) slowed the work. As it was, the vast majority of the work was completed in June and July.
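A file listing of the kind described above can be drafted automatically and then completed by hand. The following is a minimal sketch, not the ADS template: the column names (`path`, `format`, `bytes`, `sha256`) and the function name `build_manifest` are hypothetical, and the descriptive information an archive asks for would still have to be added by a person.

```python
import csv
import hashlib
from pathlib import Path


def build_manifest(root, out_csv):
    """Write one CSV row per file under root and return the rows."""
    root = Path(root)
    out_csv = Path(out_csv)
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself if it lives under root.
        if not path.is_file() or path.resolve() == out_csv.resolve():
            continue
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "format": path.suffix.lower().lstrip("."),
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "format", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting CSV opens in any spreadsheet program, and the checksum column gives the receiving archive a way to confirm that each file arrived intact.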
At the end of the day, I found the human contact at the ADS to be very much preferable to the more computer-based system at tDAR. Of course, that may in part be because dealing with those people at the ADS was made relatively easy by their skill, patience, and professionalism. They were extremely kind and helpful. The individuals at tDAR with whom I worked were also helpful, but their system was designed to require data deposit via the web, and that just did not seem effective to me unless the project fitted some pre-conceived mold. It did not seem to me that ours, especially the large number of photographs, did fit. [A note appended on 31 October 2013: After the files had been sent to England on a thumb drive and received there, some problems were found, and it turned out that some files had not been properly sent. A few had not been on the disk at all. While the tDAR processes might well have found the same errors, it was the professional staff at the ADS that found these problems and allowed me to sort them out.]
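Problems of the kind mentioned in the appended note (files missing from the drive, or not properly copied) can be caught before shipping by comparing the copy against the working files. This is a sketch under assumptions, not a description of what either archive does: the function names are hypothetical, and it reads each file fully into memory, which is adequate for a thumb drive but not for very large files.

```python
import hashlib
from pathlib import Path


def checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_copies(source_root, copy_root):
    """Return (missing, altered): files absent from the copy, and files whose
    contents differ between the source and the copy."""
    src = checksums(source_root)
    dst = checksums(copy_root)
    missing = sorted(set(src) - set(dst))
    altered = sorted(name for name in src.keys() & dst.keys()
                     if src[name] != dst[name])
    return missing, altered
```

Two empty lists mean every source file is present on the copy with identical contents.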
It may be useful to attempt to specify the comparative costs of tDAR and the ADS. This is not an easy task, since tDAR apparently charges on the basis of the number of files being deposited. [I should note here that tDAR was charging no deposit fee when this process began. That condition changed at the end of calendar year 2012.] The files deposited at the ADS include multiple derivatives of image files. Assuming that a single PDF had been prepared for tDAR and that only one version of each image had been deposited (as opposed to a thumbnail and a full-size JPG with a caption sent to the ADS in addition to the original image files), I calculated the total charge for archiving the CSA Propylaea Project files at tDAR at approximately $12,250; the charge for the ADS was £2,500 (about $3,900 at current exchange rates, but $4,138 by the time the transfer went through, because of the changes in dollar-to-pound value due to the American fiscal crisis; correction of dollar amount added 23 October 2013).
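The arithmetic behind these figures can be laid out explicitly. The sketch below uses only the four amounts quoted above; the exchange rates are derived from them, not taken from any published rate, and the function name `compare_costs` is hypothetical.

```python
ADS_FEE_GBP = 2500          # ADS charge in pounds
USD_WHEN_WRITTEN = 3900     # approximate dollar equivalent when first written
USD_WHEN_PAID = 4138        # dollar amount when the transfer went through
TDAR_ESTIMATE_USD = 12250   # estimated tDAR charge


def compare_costs():
    """Derive the implied exchange rates and the tDAR/ADS cost comparison."""
    return {
        # Implied dollars per pound at each date.
        "rate_when_written": USD_WHEN_WRITTEN / ADS_FEE_GBP,   # 1.56
        "rate_when_paid": USD_WHEN_PAID / ADS_FEE_GBP,         # 1.6552
        # Extra dollars paid because of the change in the rate.
        "rate_change_cost_usd": USD_WHEN_PAID - USD_WHEN_WRITTEN,
        # How the tDAR estimate compares with what the ADS actually cost.
        "difference_usd": TDAR_ESTIMATE_USD - USD_WHEN_PAID,
        "tdar_to_ads_ratio": TDAR_ESTIMATE_USD / USD_WHEN_PAID,  # about 2.96
    }


if __name__ == "__main__":
    for name, value in compare_costs().items():
        print(f"{name}: {value:,.4f}")
```

On these figures the tDAR estimate is roughly three times the ADS charge, a difference of $8,112, and the shift in the exchange rate added $238 to the dollar cost of the ADS deposit.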
I finish this discussion with something I have become accustomed to saying in the last year or so. I would argue strongly that any archaeological project should establish a relationship with an archival organization at the very beginning of the project. That relationship should permit project personnel to plan with the aid of the archival specialists so that 1) file formats are, from the beginning, as required for the archives, 2) organization of files is designed appropriately for both the project and the archives, and 3) changes instituted in the work of the project will not jeopardize archival processes. Working with an archival organization from the outset should make the final archival processes easier, quicker, less expensive, and more reliable. That is best for all.
-- Harrison Eiteljorg, II
1. Please note that Open Context may, to some, appear to be an archival repository, but I believe Open Context is an effort to preserve specific information that meets its standards and can be combined with other, similar information, not an effort to archive all the data produced by a project.
All articles in the CSA Newsletter are reviewed by the staff. All are published with no intention of future change(s) and are maintained at the CSA website. Changes (other than corrections of typos or similar errors) will rarely be made after publication. If any such change is made, it will be made so as to permit both the original text and the change to be determined.
Comments concerning articles are welcome, and comments, questions, concerns, and author responses will be published in separate commentary pages, as noted on the Newsletter home page.