CSA Newsletter, Fall '07: Reviewing Electronic Data Sets

Reviewing printed books and articles is a process with a long and distinguished history; it is crucial to the academic enterprise. The process is so well-established that it exists in two distinct forms, the review prior to publication and the review after publication. The former is often fully anonymous (both author unknown to reviewer and reviewer unknown to author) and aims to determine the suitability of the material for publication in general and for publication in the particular venue. The latter attempts to assess the final result by putting it into its scholarly context, evaluating the arguments, checking the completeness of the evidence cited, and weighing its contributions for the benefit of readers.

The pre-publication review process requires that a scholar with appropriate field-specific experience examine the manuscript. The reviewer(s) must assess the manuscript, but they also have the opportunity to offer suggestions for improvement -- additional bibliography, lines of reasoning not used, evidence missed. Ideally, reviewers do not require that the author agree with them but that the author state the issues clearly and accurately and that the author argue cogently and logically. Thus, the advance reviewers act as a part of the gate-keeping process that prevents sub-standard work from entering the scholarly corpus and wasting scarce resources. While it is sometimes argued that the results can stifle non-standard opinions, some process is surely required to separate the wheat from the chaff.

The post-publication review is an equally important part of the scholarly process but is normally reserved for book-length works. The post-publication reviewer must also be carefully selected so that expertise is matched to the topic, but the post-publication reviewer has no opportunity to impact the publication, to encourage change, or to suggest improvements to the author, though a review may prompt changes in a subsequent edition of the work.

The post-publication reviewer may be only one of many independent reviewers; there will be as many reviewers as there are venues judging the publication to be worthy of review. As a result, no single review is necessarily definitive, but any good review will help all who use a publication to determine what to trust, what to be wary of, and what to consider discarding. The reviews will point to problems and to successes, and the reviewers will disagree, thereby serving readers even more effectively. In short, they will provide a check, by people with appropriate knowledge, on the methods and selections of the author. The reviews will make it easier for anyone to use the publication effectively and appropriately.

There is nothing about publishing on paper versus doing so online that requires a significantly different review process. In the case of certain kinds of non-linear electronic publications, the process can be much more difficult because examining a non-linear resource is more taxing and more prone to error; parts are easy to miss when they are not related to one another in a simple, lock-step progression. Nevertheless, the issues to examine remain the same: completeness, accuracy, and clarity.

Things change rather dramatically when the electronic resource is a data set: a database, a CAD model, or a GIS data set. The resource is not then a coherent, more-or-less-linear argument marshaling evidence and argument for a specific conclusion or set of conclusions. Instead, the resource is the evidence itself -- it may be from a particular project such as an excavation or survey, or it may be assembled from many projects and aggregated for broader use. Since such resources may be used by many scholars to provide the raw materials for arguments and interpretations, they are at least as important as any other publication and arguably more important. They must be reviewed, though the review process needs to be quite different. There is only evidence in such resources, no argument to follow and evaluate, no interpretation to assess.¹

There are many reasons for reviewing a data set that has been published (in the sense of being made available, most likely via the Internet). For example, if the data have been aggregated from other data sets (a compilation of pottery from a region, for instance), the issue of completeness is critical, just as critical as an up-to-date bibliography. Moreover, an aggregated data set must be based upon terminology and data recording processes in all the original data sets that could be and were reconciled properly; a reviewer should examine and discuss this issue. There are many other important issues, regardless of whether the data set is a single-project one or an aggregation. For example, assuming a database, these issues are among the critical ones:

clarity of field/column names
consistency of terminology
organization of the data (specifically including questions about how the organization does or does not permit important queries)
available format(s)
specificity and clarity of measurement units, meanings of null entries, and so on.

Most of the issues listed would, in fact, not be parts of the data set but included in the documentation thereof; the reviewer would need not only to check the documentation but to verify its accuracy.
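
Some of the checks listed above can be partly automated once the documentation is available in machine-readable form. What follows is only a minimal sketch in Python, not a description of any existing tool: the field names, units, vocabulary terms, null markers, and the function check_records are all invented for illustration. It compares sample records against a small "data dictionary" and reports undocumented fields, undocumented terms, non-numeric measurements, and missing fields.

```python
# Hypothetical documentation (a "data dictionary") for a small pottery table.
# Every field name, unit, and vocabulary term here is invented for illustration.
DOCUMENTATION = {
    "find_id": {"unit": None, "vocabulary": None},
    "ware": {"unit": None, "vocabulary": {"coarse", "fine", "cooking"}},
    "rim_diameter": {"unit": "cm", "vocabulary": None},
}

# Entries the (assumed) documentation declares to mean "no value recorded".
NULL_MARKERS = {"", "n/a"}


def check_records(records, documentation, null_markers):
    """Return a list of human-readable problems found in the records."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        for field, value in record.items():
            if field not in documentation:
                problems.append(f"row {row_number}: undocumented field '{field}'")
                continue
            if value in null_markers:
                continue  # a documented null entry is acceptable
            vocabulary = documentation[field]["vocabulary"]
            if vocabulary is not None and value not in vocabulary:
                problems.append(
                    f"row {row_number}: '{value}' is not a documented term for '{field}'"
                )
            if documentation[field]["unit"] is not None:
                # A field with a stated unit is expected to hold a number.
                try:
                    float(value)
                except ValueError:
                    problems.append(
                        f"row {row_number}: '{value}' is not numeric in '{field}'"
                    )
        for field in documentation:
            if field not in record:
                problems.append(f"row {row_number}: missing field '{field}'")
    return problems


records = [
    {"find_id": "P001", "ware": "coarse", "rim_diameter": "12.5"},
    {"find_id": "P002", "ware": "Coarse ware", "rim_diameter": "n/a"},
    {"find_id": "P003", "ware": "fine", "rim_diam": "9.0"},
]

problems = check_records(records, DOCUMENTATION, NULL_MARKERS)
for problem in problems:
    print(problem)
```

A check of this kind can show only whether the data and the documentation agree with one another; whether the documented terms and fields are appropriate to the material remains a matter for a reviewer with subject-matter knowledge.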

The technical aspects of a data set are only one part of a review. A reviewer must also examine the data themselves to be sure of the coverage, the appropriateness of the information recorded, and other issues of scholarship that are unrelated to the digital nature of the resource. As with a review of a paper publication, the aim is to make clear the utility and trustworthiness of the resource. A user must know whether the data are well-organized, how they can be used, and whether they can be trusted.

It may seem that aggregated data sets require more thorough review than smaller, project-specific data sets. However, most project-specific data sets will, sooner or later, become parts of a larger data aggregation (as the Caryatid model became part of the Erechtheion model; see "Using AutoCAD to Construct a 4D Block-by-Block Model of the Erechtheion on the Akropolis at Athens, I: Modeling the Erechtheion in Four Dimensions," by Paul Blomerus and Alexandra Lesk, in this issue of the Newsletter). Combining them with similar data sets requires that the person responsible know how well each member of the full set has been constructed. Absent a review, someone seeking to assemble a larger data set from individual resources would be required to conduct the equivalent of a review for each smaller resource individually. Thus, reviews of small, discrete data sets are valuable both as aids for the use of those data sets individually and for their potential as parts of a larger whole. (There is also the salutary impact of having reviews of such data sets to guide those who may be preparing a similar resource.)
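
The reconciliation that aggregation demands can be illustrated with a short sketch, again in Python and again under stated assumptions: the project names, the local terms, the shared vocabulary, and the function aggregate are all hypothetical. Each project's terms are translated into a shared vocabulary, and any term that has no agreed equivalent is set aside and reported rather than silently merged.

```python
# Hypothetical mapping from each project's local terms to a shared vocabulary.
# Project names and terms are invented for illustration.
TERM_MAP = {
    "project_a": {"cooking pot": "cooking ware", "fine ware": "fine ware"},
    "project_b": {"chytra": "cooking ware", "table ware": "fine ware"},
}


def aggregate(datasets, term_map):
    """Merge records from several projects, translating local terms.

    Returns the merged records and a list of (source, find_id, term)
    entries for terms that could not be reconciled.
    """
    merged = []
    unreconciled = []
    for source, records in datasets.items():
        mapping = term_map.get(source, {})
        for record in records:
            local_term = record["ware"]
            if local_term in mapping:
                merged.append(
                    {
                        "source": source,
                        "find_id": record["find_id"],
                        "ware": mapping[local_term],
                    }
                )
            else:
                # Keep unmapped terms visible instead of guessing at a match.
                unreconciled.append((source, record["find_id"], local_term))
    return merged, unreconciled


datasets = {
    "project_a": [
        {"find_id": "A1", "ware": "cooking pot"},
        {"find_id": "A2", "ware": "fine ware"},
    ],
    "project_b": [
        {"find_id": "B1", "ware": "chytra"},
        {"find_id": "B2", "ware": "amphora"},
    ],
}

merged, unreconciled = aggregate(datasets, TERM_MAP)
print(len(merged), "records merged")
print("unreconciled:", unreconciled)
```

The mapping table itself is exactly the kind of decision a reviewer would want to see documented, since each line in it records a judgment that two projects meant the same thing by different words.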

The foregoing relates only to post-publication review of data sets. Since "publication" is the act of putting the data into a public forum, such a review is necessary for the benefit of the public, however that public is defined for the particular resource.

The review process just defined requires two different kinds of expertise (plus, of course, some writing skill). The reviewer must be able to examine the documentation and the digital data sets to be sure that the technical aspects are as they should be -- that the documentation shows correct procedures and that the documentation accurately reflects the resource. At the same time, the reviewer must have broad and deep subject-matter knowledge sufficient to permit a full and careful examination of the data themselves.

As pointed out in the excellent report Peer review and evaluation of digital resources for the arts and humanities (http://www.history.ac.uk/digit/peer/; lead applicant: Professor David Bates, Institute of Historical Research), some of the technical details can be examined via automated processes. Nevertheless, there will be a considerable burden on the reviewer to understand and evaluate technical issues. This is, in fact, the rub. How many archaeologists are capable of evaluating the technical aspects of an archaeological data set? (Peer review and evaluation of digital resources for the arts and humanities, 4.3.5: "A recurring theme in discussion was the concern over the degree to which scholars in the arts and humanities are equipped with the skills effectively to review and evaluate digital resources.") That is a difficult issue, and the skills required go far beyond an ability to use a given type of software. For a comparison, imagine how many skilled archaeologists could truly assess survey procedures, photographic techniques, or drafting techniques.

Something analogous to pre-publication review of data sets is also desirable, though it is less likely to be done by anyone who might be deemed an external reviewer. The most important pre-publication review of a project-based data set will have been carried out by the project personnel internally before data were ever entered -- and again and again when more and more data show the organizational strengths and weaknesses. Project personnel are unlikely to have sought outside assistance, and adjustments during use of a database will certainly have been made without outside review. In any case, the utility and reliability of the data organization that was primarily determined prior to data entry will be critical but may not have been subject to outside review. A review of data organization by outsiders would obviously be helpful in virtually every instance. Such a review is time-consuming, however, and makes it all the more desirable to start the digital planning processes for a project very early.

Properly conceived, a project-specific data set should have, as part of its preparation, extensive documentation to explain the data organization, terminology, formats, and so on. Nothing should be unexplained save the actual entries. All that work constitutes a part of the pre-publication internal review process. It also provides an excellent way to submit plans for external review.

Aggregated data sets should also have been reviewed in the preparation process, and that process will be much closer in time to the production of the resource. In addition, data entry will be much faster, possibly automatic, and offer fewer opportunities for mid-course corrections. Any scholar contemplating the creation of such an aggregated resource should subject plans to careful scrutiny at the outset, accomplishing something akin to pre-publication peer review. Since the results of the work will be used widely, that review process should involve scholars not associated with the work.

Reviews of electronic resources are as critical to the work of the scholar as reviews of paper publications -- both pre-publication reviews and reviews of the finished products. This may seem to be such an obvious point that it does not need to be stated. However, there are so few scholars prepared to undertake such reviews that even the obvious may remain unstated. Indeed, the need for review of electronic resources seems to be the elephant in the room that scholars are working very hard to ignore lest they be called upon to do something about it.

-- Harrison Eiteljorg, II

1. A data set may indeed contain concealed interpretations by virtue of organization or form of presentation. Of course, any description is necessarily interpretive. When such matters intrude, that should be made clear by any reviewer -- as should the impact thereof.

Table of Contents for the Fall, 2007 issue of the CSA Newsletter (Vol. XX, no. 2)
