Vol. XXIII, No. 1
April, 2010

ADS+ and Fedora Commons

Tony Austin

A little bit of history

The Archaeology Data Service (ADS, based at the University of York) was founded in 1996 by a consortium of British universities and the Council for British Archaeology (CBA) with a remit to support

…research, learning and teaching with high quality and dependable digital resources. It does this by preserving digital data in the long term, and by promoting and disseminating a broad range of data in archaeology. The ADS promotes good practice in the use of digital data in archaeology, it provides technical advice to the research community, and supports the deployment of digital technologies.

In its early days, the ADS made pragmatic decisions about the administration, preservation and dissemination of its data holdings. Through necessity, some metadata such as checksum values were recorded at the file level, whilst other information was documented at higher levels, such as a collection of files or a batch process applied to a group of files. The latter information is held in a custom-built Collections Management System (CMS). Clearly, generating more extensive information about individual files is both necessary for effective management and a major undertaking.

Over time it has become increasingly clear that the ADS needs better integration of the information it maintains about its resources, and more file-level metadata, both to make preservation management more efficient and to provide alternative mechanisms for resource discovery. Examples include locating all AutoCAD® DXF files within ADS collections and migrating them to a more recent version, or searching across collections for JPEG image files of burial mounds. At present users can search the ArchSearch catalogue to locate archives as a whole, but cannot search within archives. As ADS holdings continue to grow, it will become imperative that users can search across and within archives, allowing them to join up datasets from a number of projects.

The ADS holds over one million files. An internal evaluation in 2006 established that any collections-management upgrade would be a major undertaking far beyond existing ADS resources, chiefly in terms of staff time. The evaluation also established a core file-level metadata set, the generation of which could at least be partially automated:

There is of course other file-level information that could be collected, such as EXIF metadata in image files, but that must be another project. It was also clear that a large amount of data cleaning and restructuring would be involved.

A custom solution was initially envisaged. However, recent years have seen the development of robust repository management systems such as DSpace and Fedora Commons. Fedora (Flexible Extensible Digital Object Repository Architecture), not to be confused with the Linux operating system of the same name, provides a digital asset management (DAM) architecture upon which many types of digital libraries, institutional repositories and digital archives can be built. Fedora was developed jointly by Cornell University Information Science and the University of Virginia Library. As the name (not the acronym) implies, the system is both flexible and extensible, making it possible to satisfy the needs of the ADS within a well-defined schema.

The ADS has been involved to varying degrees with a number of projects and organizations investigating the potential of Fedora. These include Digital Antiquity (based at Arizona State University) and the York Digital Library (YODL, based at the University of York). After several months of discussion and investigation it became apparent that Fedora could provide a suitable management tool for the ADS. As well as recording core metadata, Fedora stores relationships between files as RDF triples of the form subject-predicate-object. Thus "File B (dissemination format) is a version of File A (original format)" is one such statement. This concept is very powerful for digital archives, because relationships between files can be queried in the same way as any other metadata.
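
The subject-predicate-object idea can be illustrated with a minimal Python sketch. It does not use Fedora or any RDF library; the file identifiers and the predicate names `isVersionOf` and `isPartOf` are hypothetical and chosen only to mirror the File A / File B example above.

```python
from collections import namedtuple

# A triple is simply (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers and predicate names, for illustration only.
triples = [
    Triple("file:B", "isVersionOf", "file:A"),       # dissemination format of A
    Triple("file:C", "isVersionOf", "file:A"),       # another derived version of A
    Triple("file:A", "isPartOf", "collection:1"),    # A belongs to a collection
]


def versions_of(original, store):
    """Return the subjects recorded as versions of `original`, in store order."""
    return [
        t.subject
        for t in store
        if t.predicate == "isVersionOf" and t.object == original
    ]


print(versions_of("file:A", triples))
```

Because every relationship has the same three-part shape, a question such as "which files derive from File A?" becomes a simple filter over the store, whatever the file types involved.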

Recent times

ADS+, or to give its full title, Enhancing and Sustaining the Archaeology Data Service Digital Repository, has now been funded by the Arts & Humanities Research Council (AHRC) under the Digital Equipment and Database Enhancement for Impact (DEDEFI) call of 5 November 2009.

The AHRC funding will facilitate the project through the provision of dedicated staff: specifically, a project management team, an applications developer to integrate Fedora with existing ADS systems, and ADS curatorial staff responsible for metadata generation, data cleaning and restructuring. It is expected that the project will move the ADS even closer to compliance with the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). The project runs from 1 March 2010 for 12 months.

A lot of hard work faces this small team in the coming year. By the end of it the ADS expects to have a custom interface allowing any user to search within its resources and locate individual files using metadata stored in Fedora, and a custom interface for staff to manage the data referenced in Fedora. However, beneath any customisation, all Fedora repositories store core metadata in Fedora Object XML (FOXML), including Dublin Core (DC) descriptive metadata. This means that all Fedora repositories have the potential to interoperate, which opens up all sorts of possibilities.
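
To give a feel for what shared Dublin Core metadata inside FOXML makes possible, the following Python sketch reads the DC fields from a small, hand-written FOXML-like fragment using only the standard library. The fragment is a simplified illustration, not a complete or validated FOXML object; the PID and field values are made up, and the namespace URIs are assumed to be those conventionally used for FOXML and OAI Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for FOXML and OAI Dublin Core.
NS = {
    "foxml": "info:fedora/fedora-system:def/foxml#",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified, hypothetical object: one inline DC datastream with two fields.
SAMPLE = """<foxml:digitalObject PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="DC">
    <foxml:datastreamVersion ID="DC1.0" MIMETYPE="text/xml">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Site plan (hypothetical)</dc:title>
          <dc:format>image/jpeg</dc:format>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>"""


def dublin_core(foxml_text):
    """Return a dict mapping DC element names to lists of their text values."""
    root = ET.fromstring(foxml_text)
    dc_root = root.find(
        "foxml:datastream[@ID='DC']/foxml:datastreamVersion"
        "/foxml:xmlContent/oai_dc:dc",
        NS,
    )
    fields = {}
    if dc_root is None:
        return fields
    for child in dc_root:
        # Tags look like "{namespace}title"; keep only the local name.
        name = child.tag.split("}", 1)[-1]
        fields.setdefault(name, []).append((child.text or "").strip())
    return fields


print(dublin_core(SAMPLE))
```

Since the same function would work on a DC datastream exported from any repository that follows this layout, a cross-repository search tool would not need to know how each archive customised its own interface.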

-- Tony Austin

