CSA Newsletter
Vol. XVII, No. 2
Fall, 2004

Image Collections on the Web -- Exciting but Still in Their Infancy


Web sites with large, significant collections of digitized photographs are becoming more common, and the arrival of the ArtStor project suggests that many more such sites will be appearing. The technologies used to present photos at those sites vary significantly, and an examination of the current state of the art seems timely. Some sites have opted for simple, inexpensive, rather limited presentations; others have chosen to supply expansive and expensive systems. Still others have chosen a middle ground. Four sites have been examined for this report on web sites with image collections, but they have been examined with an eye toward the presentation technology, not the value or interest of the collection. (Attempting to include screen shots of individual web pages to help the reader understand the way each of these sites functions might be helpful, but it would leave this article awash in images and still not provide a good sense of the way any site functions. Readers are strongly encouraged to open each site as they read about it and to follow the threads in each as they go through the discussion.)

1. The Propylaea web site (http://propylaea.org/) -- photographs of the Propylaea and the older propylon in Athens. This web site was created by CSA personnel, Harrison Eiteljorg, II, and Susan C. Jones. Adobe Illustrator® (not purchased for this work but currently priced at $500) and Allaire HomeSite® (priced at $100) were used.

Access to the photographs is via two secondary pages (one for the Propylaea and one for the older propylon). From the secondary pages, each showing plans of the relevant structure, users select a point of view (from an icon) and are presented with a group of thumbnail images of photos (up to a dozen) taken from roughly that point of view -- at 120 x 180 resolution. Each has a label with the name of the photographer and the date for the image; there is also a full description of the image.

From the thumbnails users may select either a VGA-level image (640 x 480 for horizontal format, 360 x 480 vertical) or an XGA-level image (1024 x 768 horizontal, 768 x 576 vertical). Labels on the pages with the VGA- or XGA-level images are identical to those on the thumbnail page except that information about film and scanning is added. (Images that were taken with a digital camera are so labeled, and the original resolution is stated.) The page with the group of thumbnails opens in its own new window, leaving the building plan still open, and each larger image opens in yet another window, leaving the thumbnail window open. Therefore, a user will have many separate windows open at once, whether that was requested or not; each photo requested opens in a separate window and remains on screen until explicitly closed.

There is no way to search for individual images or groups of images, though a clever user could use Google to find some images. The descriptions are not written to make that easy.

2. The Ancient City of Athens (http://www.stoa.org/athens) -- photographs of Athens (and Brauron, with more coming) taken by Professor Kevin Glowacki (Indiana University) and Dr. Nancy Klein. The web site was created by Mr. Glowacki and is maintained at Stoa.org. The site was constructed with iView™ Media software, which is available for MACs and PCs at a price of $200. (Note that the Stoa site has recently announced the creation of a gallery area where photos can be posted using Gallery image-management software. Users will be able to put their images up via the Web without purchasing any software.)

The home page lists and links to 15 areas of interest as specific as the south slope of the Acropolis, as general as Brauron. For each of these areas of interest, the opening page is a short commentary to familiarize the user. Each commentary page has a link to the relevant images. The images are presented in groups of 15 thumbnail (128 x 85 pixels) images; there may be several pages of such thumbnails.

The images have no information other than an image file name that offers no useful information about the viewpoint, aim, monument, etc. The label does make it clear whether the original photo was digital or film and, if scanned from a slide, whether it was written to the Kodak PhotoCD format.

Clicking on any thumbnail image yields a larger image (640 x 480 pixels). No other size is available. A fuller label appears with the larger image; it provides keywords and a short description. It is hard to determine why keywords are provided; there seems to be no way to search for them. An attempt to use Google to search within the site did not work.

3. Kelsey Museum of Archaeology at the University of Michigan (http://images.umdl.umich.edu/) -- This is one of many collections available online at the University of Michigan. This site and the others that are similar were prepared by the University of Michigan Digital Library Project (UMDL). Not all included images are available to the public. (Since the aim of this comparison is to comment on presentation methods, any problems caused by unavailable images will be ignored.) No information regarding software, costs, or labor was readily available, though lengthy descriptions of the UMDL Project exist and could probably be mined for such information.

The Kelsey Museum page is surprisingly Spartan. (Michigan sports fans will resent that word choice!) The basic page for accessing the collection -- reached via the page indicated above -- has no instructions or user guidance other than a link to a pop-up window, but the information provided in that window is minimal. Only via that information pop-up is there a link to the Kelsey Museum web site where information about the collection can be obtained. It is possible that users are expected to come to the page via other routes, but the other UMDL image collections are arranged in the same manner; so that seems unlikely.

The user is presented with three choices -- browse all images, search the collection, or browse sample records. However, there seemed to be no explanation of the search terms or parameters. The search page has some help in the sense that there are categories available in a pull-down menu, but there is no explanation of any categories or potential terms. The three date possibilities, for instance, may be beginning, end, and ??, but that only becomes clear after one finds an object having multiple dates attached and sees the attached data.

A browse request yields the search results or the whole collection, and a search request yields only the search results. Whether browsing or searching, the user may request only captions (50 per page), images with records (20 per page), or images with captions (20 per page). Displaying only captions yields just that, only captions, but a link takes the user to the full record -- all information about the item, including function (2 levels), accession number, material (2 levels), and many more, with only the applicable fields shown -- with a thumbnail image. From there a link leads to a larger image with neither catalog information nor caption.

Displaying the finds as images with captions yields just what that implies -- the thumbnail images with captions. The caption consists of site, function (both levels), and material (both levels). Each thumbnail image is linked to the same image page reached via the previously-described process -- the page with the full information and thumbnail image. Of course, that page is, in turn, linked to the larger image.

Displaying the finds as images with records provides thumbnails in a portion of the page and the full record for any one item in a different portion of the page. Selecting any thumbnail leads, as above, to the page with a thumbnail and the complete record. Once again, that page leads to a larger image without associated data.

There may be many pages with search results, depending on the number of items selected by the search.

The thumbnails are all from cropped images and measure about 70 pixels on the long side (about one inch on the typical screen). The larger image displayed without associated data (users can actually toggle back and forth very quickly between the page with the larger image and the page with a thumbnail and complete data) has six levels of resolution available. The page initially displays the third level of resolution (about 300 x 449 pixels -- the size of one image accessed, each finished size depending on the cropping). It seems that each succeeding level is roughly double the one before, but sizes of the largest images could not be checked because they did not fit on screen. That highest resolution should measure about 2400 x 3600. At that level of resolution considerable detail can be seen. Each level of resolution is a different downloaded image, and each is displayed alone in the same window.
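The arithmetic behind that estimate can be sketched briefly. This is only an illustration, assuming exact doubling between levels (the observation above is "roughly double") and taking the measured third level of 300 x 449 pixels as the reference; the function name `pyramid_sizes` is hypothetical and is not part of the UMDL system.

```python
def pyramid_sizes(ref_size, ref_level, levels):
    """Return {level: (width, height)}, assuming each level doubles the one before.

    ref_size  -- (width, height) measured at a known level
    ref_level -- the level at which ref_size was measured (1-based)
    levels    -- total number of levels offered
    """
    ref_w, ref_h = ref_size
    sizes = {}
    for level in range(1, levels + 1):
        # Assumption: a clean factor of two per level, up or down.
        factor = 2.0 ** (level - ref_level)
        sizes[level] = (round(ref_w * factor), round(ref_h * factor))
    return sizes


sizes = pyramid_sizes((300, 449), ref_level=3, levels=6)
for level in range(3, 7):
    width, height = sizes[level]
    print(f"level {level}: {width} x {height}")
```

Under that assumption the sixth level comes to 2400 x 3592 pixels, which agrees with the estimate of about 2400 x 3600.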

It seems noteworthy that the resolution numbers had to be determined by performing a screen capture and then examining the result with Photoshop. There was no information about the images on the site. Whether the images were made digitally in the first instance or from negatives or slides is not stated. Nor is there any information about file formats of the images that underlie the display system. Nor is it stated whether the largest image available to the user is the maximum resolution stored.

Many of the objects (nearly all in the samples examined) were photographed without scales, and there is no dimensional information in the records. No information about the photograph permits an estimate. Of course, none of the photos includes a color bar. (This is the only site for which scales and color bars would have been possible and desirable. The two sites already discussed have images of monuments and scenes, not objects. The site yet to be discussed has only images taken from old photographs.)

4. The Architecture portion of Visual Collections from Cartography Associates, David Rumsey (http://www.davidrumsey.com/collections/architecture.html) -- a collection of two sets of architectural photographs, one from Cornell University (Andrew Dickson White Collection) and the other from the University of South Florida (Sape A. Zylstra Architectural Slides). Only the Andrew Dickson White site was examined for this article.

The Cartography Associates sites use Insight® software from Luna Imaging. Although this software was originally a required part of the ArtStor system, that is no longer the case. It is very expensive, and Cartography Associates is apparently a commercial enterprise owned by David Rumsey who is, in turn, a member of the Luna Imaging Board. (No information about corporate offices or any physical location for Cartography Associates on the Web could be found, and the email contact for the company is a Luna Imaging address.)

Using this collection requires either using a specific browser (some are not supported) or Luna Imaging client software (JAVA-based) that must be downloaded for use on your computer. The site could not be accessed effectively using Safari, the MAC browser, and, surprisingly, Safari could not be used with the JAVA client software either. Mozilla worked to browse and to access images via the JAVA client. Although the FAQ page makes it clear that Internet Explorer®, Netscape®, and Mozilla® are the only browsers that work directly, it implies that the JAVA client should work on a MAC with Safari.

This is the only site of the group that has real user help. Important information is provided here -- for instance, it is made clear that the system will zoom in on an image until it reaches full magnification. However, there are missing items of importance. There seems to be nothing about the manner in which photographs were converted to digital form. Similarly, some information about the collection itself should have been available but was nowhere to be found. Such information is available from the Cornell University Web site but not here and not by a link from here.

This site is accessed rather differently with the browser and the JAVA client software; so each system will be treated separately. Both are much more sophisticated and finished-looking than the UMDL interface.

Using the browser rather than the JAVA client, a user first gets a page with 20 thumbnails and captions (of the 1239 images available in this collection, with that information showing in the page). The largest of the thumbnails is about 100 x 130 pixels. The window is not a normal browser window. Only the title across the top of the window is shown. No URL is visible, and, at least with Mozilla on a MAC, it could not be made visible with the usual option. No window controls are available.

The data about any image can be requested, and the information will show in a portion of the window removed from the thumbnails. Surprisingly, there is an alternate information selection here that shows "file data": the file format and size, in pixels. (There is also something called resolution size, but the entries were not meaningful to this user.) Searching is also provided in this window, and the search possibilities are excellent, with relatively general searches or searches using data fields. (It did take some time to realize that a request for images in Athens or Greece required a request for images for which the field entry for place contains Athens or Greece, not ones for which the field entry equals Athens or Greece.)
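The difference between a "contains" match and an "equals" match can be shown with a small sketch. The records, field names, and function names here are hypothetical, invented only for illustration; nothing is implied about how the Insight software implements its searches.

```python
# Hypothetical catalog records; the "place" values are invented examples.
records = [
    {"title": "Parthenon, west front", "place": "Athens, Greece"},
    {"title": "Temple of Olympian Zeus", "place": "Athens"},
    {"title": "Colosseum", "place": "Rome, Italy"},
]


def search_equals(records, field, term):
    """Return records whose field value is exactly the term (ignoring case)."""
    return [r for r in records if r.get(field, "").lower() == term.lower()]


def search_contains(records, field, term):
    """Return records whose field value includes the term anywhere (ignoring case)."""
    return [r for r in records if term.lower() in r.get(field, "").lower()]


equals_hits = search_equals(records, "place", "Athens")
contains_hits = search_contains(records, "place", "Athens")
print(len(equals_hits), "record(s) equal 'Athens'")
print(len(contains_hits), "record(s) contain 'Athens'")
```

With entries like these, an "equals" search misses any record whose place field holds more than the bare city name, which is the behavior that caused the confusion described above.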

Double-clicking an image brings up a larger version in its own window, again a window without any border elements or controls; this image is about four times the size of the thumbnail. Along with the image is a tool bar that permits you to zoom in or out, access the data about the image (the same good, full data, including image size and "file data," but without information about the scanning procedures), print the image, or close the window. The tool bar also includes a thumbnail view of the image. When the image becomes larger than the window, this can be used to pan by simply clicking in the thumbnail to indicate the desired center point.

Double-clicking more than one thumbnail yields multiple open windows with images.

Zooming in on an image or panning within the image yielded unpredictable results. Sometimes the image was refreshed almost instantly; at other times there was a new download, which was slower.

Now to the JAVA Client. The initial page is very similar -- though the user is no longer in the browser at all but operating via the JAVA client in a separate window. There are 50 thumbnails instead of 20, and there are some extra options provided. Along with those provided in the browser version, the user can access another collection directly from this page; the user can also quit the program from the menu presented here.
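To put the two page sizes in perspective, the number of pages needed to browse the whole collection can be worked out from the figures already given (1239 images, 20 thumbnails per page in the browser, 50 in the JAVA client). This is simple arithmetic only; the function name `page_count` is hypothetical.

```python
import math


def page_count(total_images, per_page):
    """Number of pages needed to show every image, the last page possibly partial."""
    return math.ceil(total_images / per_page)


browser_pages = page_count(1239, 20)
client_pages = page_count(1239, 50)
print("browser:", browser_pages, "pages")
print("JAVA client:", client_pages, "pages")
```

That works out to 62 pages in the browser and 25 in the JAVA client.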

The same data items are available. However, the search fields are not the same. It was especially surprising that there was no "place" field. Using the "subject" field to search for Athens, the same set of photos was found as when "place" was searched via the browser interface, but "place" seems too important a category to be omitted in a group of photographs of architectural monuments.

The window with the larger images -- again a separate window and again a JAVA client window, not a browser window -- appears about the same as the browser window. However, there are significant differences. First, all images open as sub-windows in a single, large window, not as separate windows. This makes some of the tools more necessary, especially the one that closes an image sub-window, since there is now no other way to do that. (The sub-windows have no controls.) Second, the images can be panned directly (or with the same method used in the browser). Third, there is a distance tool that will present the dimensions of the image and calculate the distance from point-to-point in the image. There is also a mechanism for the user to link images to annotations, a URL, or some other media (images or video clips). This feature was not explored. The information did not indicate where or how any such linkage would be stored.

The JAVA client has another, less obvious difference. The images are stored in the MrSID format; they are not JPEG images. (Accessing MrSID images is one of the reasons the Insight JAVA client software is required; web browsers cannot display MrSID images.) This image format allows for greater compression, and one image file can be used to present any level of resolution from the minimum to all pixels. Storage of the image is also in segments called tiles. As a result, the JAVA Insight client does not download whole images in the way that a browser does. Only the requested part of the image at the requested resolution will be downloaded, and only that part of the image file will be decompressed and served. This is more efficient for very large images, especially if users are likely to request high-resolution views of portions of the whole image rather than high-resolution views of the whole image. However, panning in an image may require an additional download if the user pans beyond the downloaded portion (tile) of the image.
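The general idea of serving only the tiles a view requires can be sketched as follows. This is a generic tiled-image calculation, not a description of the MrSID format's actual internal layout; the 256-pixel tile size, the 2400 x 3600 image size, and the function names are all assumptions made for illustration.

```python
import math


def tiles_for_viewport(full_size, level_scale, viewport, tile=256):
    """List the (column, row) indices of the tiles that cover a viewport.

    full_size   -- (width, height) of the image at full resolution
    level_scale -- fraction of full resolution requested, e.g. 0.25
    viewport    -- (left, top, width, height) in pixels at the requested level
    tile        -- edge length of a square tile in pixels (an assumed value)
    """
    level_w = math.ceil(full_size[0] * level_scale)
    level_h = math.ceil(full_size[1] * level_scale)
    left, top, width, height = viewport
    # Clip the viewport to the image bounds at this level.
    right = min(left + width, level_w)
    bottom = min(top + height, level_h)
    left, top = max(left, 0), max(top, 0)
    if right <= left or bottom <= top:
        return []
    first_col, last_col = left // tile, (right - 1) // tile
    first_row, last_row = top // tile, (bottom - 1) // tile
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def total_tiles(full_size, level_scale, tile=256):
    """Number of tiles needed to cover the whole image at the requested level."""
    level_w = math.ceil(full_size[0] * level_scale)
    level_h = math.ceil(full_size[1] * level_scale)
    return math.ceil(level_w / tile) * math.ceil(level_h / tile)


full = (2400, 3600)                      # hypothetical full-resolution image
needed = tiles_for_viewport(full, 1.0, (512, 512, 640, 480))
print(len(needed), "of", total_tiles(full, 1.0), "tiles needed")

# Panning 256 pixels to the right brings new tiles into view.
panned = tiles_for_viewport(full, 1.0, (768, 512, 640, 480))
new_tiles = sorted(set(panned) - set(needed))
print("extra tiles after panning:", new_tiles)
```

In this sketch a 640 x 480 view at full resolution touches only a handful of the tiles that make up the image, and a pan beyond the tiles already fetched requires additional tiles, which corresponds to the extra download noted above.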

The JAVA client also provides a presentation program so that a kind of slide show can be assembled.

Discussion

These four presentations of images fall naturally into two groups. The first two are small collections that do not permit sophisticated searching or other "bells and whistles." Their subject matter is limited, and access is relatively simple. The second two are based on a complex database running in the background. Consequently, access is much more flexible, permitting a user to ask for what he/she wants. However, access for general browsing is effectively impossible within these large collections.

Because the large UMDL and Cartography Associates projects use databases to operate, they can grow almost without limit. Adding new images adds virtually no new work to the display and access systems. There may be more items to be found, but the system for finding them -- and displaying the images and information -- works just the same. Smaller collections, on the other hand, involve substantially more effort to add new images.

For small collections the time and trouble required to create a web site is not great. On the other hand, using a database system to control everything saves time and trouble in the long run -- if the number of images is or will become large or if regular additions are expected. In the end, no large image collection can be handled without some form of database control and coordination unless the images are concentrated on such a narrow range of subjects that access need not be via search mechanisms.

One problem with the database-driven image collection is that of defining the collection. Neither of the larger collections examined for this article defined well the scope of their collection (though there were other places to get some of the relevant information). Neither made basic divisions that would encourage a user to browse through a portion of the whole. It was difficult to know what to expect -- and therefore what search terms, for instance, might be useful, not to mention whether or not either collection would have much of interest. (Of course, the Kelsey Museum material is, in a sense, defined. However, those who come to the site on the web may have no familiarity with the museum at all.) These are growing pains, but they are important issues. Users should not have to wade through reams of material to figure out what is available on a site or how to use it; nor should they be left with no help. The Cartography Associates site had a good deal of technical help but nothing about the collection. The UMDL site had virtually no help.

The UMDL and Cartography Associates sites have a major difference. The UMDL site was designed to operate on any Web browser, and the UMDL Project favors open source software. The Andrew Dickson White Collection site constructed by Cartography Associates could operate with only specified browsers; furthermore, many of the other Cartography Associates image collections (there are many beyond the architecture collections) require the JAVA client and will not work alone with any browser. (The browser uses JPEG files, but, as mentioned, the JAVA client uses MrSID files; so there is probably considerable extra cost in making a site available via both interfaces.) Many believe that the gains in functionality achieved with software like the Luna Insight software used for the Cartography Associates collections come at too high a cost -- proprietary systems that are expensive to upgrade and may even have to be built from scratch when the software has been superseded. (In this instance, the JAVA client version of the software even requires a non-standard image file format.) The possibilities for long-term stability are far better with the less proprietary UMDL approach.

There is also a serious convenience factor with the Insight JAVA client. Were anyone using it to teach, the software would have to be loaded on all computers the teacher might use for a class, along with the appropriate browser. In many settings, that would be impossible, since system administrators do not permit users to add software to campus computers.

In conclusion, this brief survey was disappointing. Professor Glowacki's site and CSA's Propylaea site are so small that they can be used without great difficulty. The two larger sites, on the other hand, both failed on too many levels. Neither provided good information about the scope of the contents, information about the processes of creating the digital images, or any way to assess the overall collection. (Compare, for instance, the Lantern Slides of Classical Antiquity index; something similar could be generated automatically for virtually any collection - http://www.brynmawr.edu/Admins/DMVRC/lanterns/lantindx.html. Readers should note that this page was originally structured by CSA personnel; it clearly reflects their biases.) The problems are particularly acute when trying to find a particular set of images from a collection without foreknowledge that such a set exists. It is all but impossible to know whether a search has failed because the requested images are not there or because the search routine has been poorly structured. In addition, there is nothing to entice the casual visitor, nothing that invites a return visit by suggesting the treasures to be found.
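How such an index might be generated automatically from catalog records can be sketched in a few lines. The records, field names, and the function `build_index` are hypothetical, chosen only to illustrate counting images by category; this is not how the Lantern Slides index was actually produced.

```python
from collections import Counter

# Hypothetical catalog records standing in for a real image database.
records = [
    {"id": 1, "site": "Athens", "subject": "Propylaea"},
    {"id": 2, "site": "Athens", "subject": "Propylaea"},
    {"id": 3, "site": "Athens", "subject": "Parthenon"},
    {"id": 4, "site": "Olympia", "subject": "Temple of Zeus"},
]


def build_index(records, fields):
    """Count records for each combination of the given fields, sorted by category."""
    counts = Counter(
        tuple(record.get(field, "unknown") for field in fields)
        for record in records
    )
    return sorted(counts.items())


index = build_index(records, ["site", "subject"])
for category, count in index:
    print(" / ".join(category), "-", count, "image(s)")
```

Each line of such an index could serve as a pre-defined search, giving a visitor both the scope of the collection and a ready way into it.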

This examination of image web sites has inevitably made it clear that certain features are required of such web sites. The following is a list of the things that seem critical for any web site with images:

1. Each collection should have a short but comprehensive statement about its contents. It should be possible to get some sense of how the collection evolved, what collection criteria were operative, what processes were used to make the digital images, what the fields are in the database, and which of those may be used for searching. In addition, some sample database entries should be presented to illustrate the way catalog entries were actually made.

2. Unless the collection is unusually circumscribed, there should be an index page like the one mentioned above (http://www.brynmawr.edu/Admins/DMVRC/lanterns/lantindx.html) that would help any user to get a good sense of the numbers of images in various categories and would provide ready-made groups (pre-defined searches) that make sense in terms of the collection. The categories would vary with the collection. This might be impossible in an extremely comprehensive collection; if so, the content statement should, of course, make that clear.

3. At least 3 images larger than thumbnail size should be available. A reasonable selection would include a VGA-level image, an XGA-level image, and a maximum resolution image. Additional levels of magnification seem of marginal utility. (The ability to zoom in on specific parts of the image is made unnecessary by supplying an image at maximum resolution.)

4. Images should open in separate windows so that each can remain visible until no longer desired.

5. The system should work on any browser that adheres to basic standards. Additional software, even plug-ins, should not be necessary.

6. Search processes should be simple and obvious.

7. Help should be available from the search page.

8. Anything that might lengthen the site's response time to queries or page requests should be eliminated. Speed is of the essence, especially if the site is to be used during a class.

9. Copyright should not be used to limit access unless necessary (i.e., because of the copyright claims of image suppliers), and fair use should be explicitly permitted.

10. Object photographs should include either a scale or explicit dimensions in the associated database.

11. Object photographs taken in color should include a (Kodak) color bar; black-and-white photographs should include a gray-scale bar.

As ArtStor grows and as other collections of images arrive on the web, the competition will become more and more intense, and the features noted as important in the above list will doubtless be incorporated in many sites. Other features will surely be added as well. This process is just in its infancy, and much is yet to come. The possibilities are indeed exciting.

