Vol. XXI, No. 2
September, 2008

"Le mieux est l'ennemi du bien."

Harrison Eiteljorg, II
(See email contacts page for the author's email address.)

Usually translated as "The perfect is the enemy of the good," this remark from Voltaire's Dictionnaire Philosophique has been used again and again to express the general notion that striving for perfection can obscure or overcome the need to accept a good, if imperfect, solution that is nearer to fruition. Though the literal French could be taken to mean "The better is the enemy of the good," the sense that it may be better to do the possible than strive for something more (nearly) perfect remains, and I have come to believe that this notion applies very directly to the current state of archiving digital data in archaeology.

I take certain things to be true and basic to this subject, and I think those starting points should be made explicit and defended at the outset.

  1. It is better to archive digital data than not, regardless of how well, poorly, or indifferently the data have been prepared, documented, and entered. At present there are numerous data sets from projects that have been going on for years, in some cases decades. If the data are not preserved, the loss to the discipline is unforgivable. While it may be useful to put the data sets into forms that are more robust or more modern, the desire to do so should not be permitted to delay preservation, perhaps indefinitely. The stakes are simply too high.

  2. Most archived digital data may be usefully accessed, even if the data have not been well prepared for archival preservation. It may be very difficult to use data that have been preserved hastily. Files may be hard to deal with, obsolete as to format, and completely lacking in documentation. Each of those impediments has a deleterious effect on the utility of the data; however, none of them completely vitiates the potential utility unless the files themselves represent the sum total of the fruits of a project. That is, assuming some written (published or not) description of the project, data will almost certainly be, to one degree or another, recoverable. There may or may not be costs, and those costs — in time and/or funds — may render the data effectively useless, but the most basic of archival procedures, e.g., putting data into simple, non-proprietary formats such as tab-delimited ASCII for data tables, would go a long way to making the data useful in even the worst circumstances.

  3. Aggregating data and providing easy access may seem to be necessities for digital data, for some even the raison d'être for digitizing in the first instance, but neither is so nearly a reality as to permit those aims to delay necessary (and relatively simple) preservation. After all, data not preserved can never be aggregated with any other data or accessed easily via the web. Never. As I have argued elsewhere (see "XML, Databases, and Standardized Terminology for Archaeological Data," CSA Newsletter, Spring, 2005; XVIII, 1), I do not believe that we have a set of agreed-upon terminology, much less cross-language uniformity, that would permit us to treat data aggregation as a realistic archival aim for some time (see note 1). Therefore, this aim simply must not be permitted to encourage scholars to postpone the preservation of extant data.
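
As an illustration of how little the "most basic of archival procedures" in point 2 can require, here is a minimal sketch of writing a data table out as tab-delimited ASCII. It is not a description of any repository's actual procedure. The function name to_tab_delimited, the field names, and the sample record are all hypothetical, and the choice to write non-ASCII characters as backslash escapes (so that they remain recoverable rather than being dropped) is an assumption of this sketch.

```python
import csv
import io


def to_tab_delimited(rows, fieldnames):
    """Return the rows (a list of dicts) as tab-delimited ASCII bytes.

    The first line is a header naming each column. Hypothetical helper,
    shown only to illustrate the idea of a simple, non-proprietary format.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    # Assumption: non-ASCII characters become backslash escapes
    # instead of being lost or causing the export to fail.
    return buffer.getvalue().encode("ascii", errors="backslashreplace")


# Invented sample record; a real project would read these from its database.
sample_rows = [
    {"find_id": "F-001", "type": "amphoriskos", "height_cm": 12.5},
]
sample_fields = ["find_id", "type", "height_cm"]
archive_bytes = to_tab_delimited(sample_rows, sample_fields)
```

The resulting bytes can be written to a file opened in binary mode and read back by almost any software. A short plain-text note describing each column would supply the minimal documentation that point 2 warns is so often missing.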

The first point is the most critical. It may be that some readers will reject the basic notion that preservation of digital data is, in and of itself, a critical goal, though I hope not. All who accept that starting point, however, will probably take points two and three to be unnecessary in the sense that those issues do not change the first point. It is, however, with regard to points two and three that Voltaire's famous statement comes into play. Many have recognized the clear and obvious fact that the kind of limited, difficult access to data suggested in point 2 is far from optimal. As a result, there are now many people deeply involved in efforts to assist with making better, more robust, more universal forms of data storage for the sake of simple access. That work is critical; in the long run, it is necessary to overcome the problems created by the production of data sets that are in some sense inadequate and data sets that have not been prepared with easy access and aggregation in mind. Those problems are very real, and solutions to them are required if we are to make the transition to having good, open and unfettered access to archaeological data on the web.

My concern is that the more technically interesting and future-oriented work required to make better, more professionally approached and universally designed data sets has left the field with few people who are focused on the present needs and who are consequently interested in doing what I consider to be the necessary work of the moment — preserving existing data sets, however flawed they may be, without requiring the extra time, labor, and expense to transform the data into different forms. Furthermore, to whatever extent preservation is taken to demand better, more universal data sets to begin with, to that extent scholars with less perfect data sets may believe that they cannot preserve what they have until some more perfect state of the data has been achieved. Thus, the better may become truly the enemy of the good if we seem to be sanctioning delay.

This problem is made manifest by the paucity of archival repositories (see note 2). Readers of the CSA Newsletter may remember that my own data from my work on the older propylon were deposited in the Archaeological Research Institute at Arizona State University. For a very long time I tried to find the older propylon data through the web site. At last it is there (http://archaeology.asu.edu/digital/OPAD/index.html), and I am happy to report that the data seem to be very well presented. However, I still know of no alternate repository for simply storing archaeological data sets without reformatting the data to meet the repository's needs, and the Archaeological Research Institute cannot be expected to act as the sole repository for digital data from all American archaeological research. This is not simply unfortunate; it is a sad commentary on the ability of the discipline to put first things first and to do that which must be done: to do the good while waiting for the arrival of the better. This is no longer visionary work but relatively simple and mundane, and absolutely crucial. Where are the professional organizations and academic institutions that should be at work on this?

-- Harrison Eiteljorg, II

1. My favorite example of the problem of terminology is the term amphoriskos. It is defined (in the web resource Archaeowiki.org, last accessed 09/18/08) as "a short to very-short form of the amphora, a jar with two handles . . . " and the same resource defines short as 15-25 cm. and very short as under 15 cm. in height. The Louvre (at http://www.louvre.fr/llv/glossaire/detail_glossaire.jsp?CONTENT%3C%3Ecnt_id=10134198673228616&FOLDER%3C%3Efolder_id=9852723696500935&bmLocale=en - last accessed 09/18/08) defines amphoriskos as "A miniature amphora with two side handles, used for storing perfumed oil." I am not interested in debating the definitions here but simply in pointing out their lack of agreement and specificity; and here I have intentionally chosen a term for which problems introduced by modern languages should not complicate matters.

2. In his article in this issue of the Newsletter, "To What Extent Do Digital Technologies Solve 'Archaeology's Publication Problem'?" Charles Watkinson refers to the existence of many digital repositories as listed at http://maps.repository66.org/. Of the repositories listed there, the vast majority belong to one of five consortia: DSpace, EPrints, Fedora, BEPress, and OpenRepository. These consortia either hold materials of a specific type (e.g., texts) or discipline (e.g., medicine) or subscribe to a particular set of protocols requiring depositors to meet their internal organizational schemata. These consortia all have as their first purpose providing access to the data over the web. They illustrate well the aims mentioned in point 3 above. The remainder of the repositories, those in the "Other repository" category (between 50 and 60 in the U.S.), all seem to be institutionally based or otherwise limited as to the material they will accept. That, however, is based upon a rather quick examination of the repositories. Oddly, the one archaeological archive I know of and to which I regularly refer, the Archaeological Research Institute at Arizona State University, is not included in the list.
