Vol. X, No. 2

Fall, 1997

How to Transfer Your DOS Database into a Windows 95 Database in 659 Easy Steps

Jocelyn Penny Small
Rutgers University

First, some background. For the past twenty-one years I was Director of the U.S. Center of the Lexicon Iconographicum Mythologiae Classicae (LIMC). Part of the reality of funding in the United States was the necessity to computerize.(1) As a result, I developed a database for classical iconography that is separate and independent from the LIMC. Now that the final volume (8) of the LIMC has appeared, the U.S. Center has passed to the Getty, and I am concentrating on Sibyl, as I call the database. Sibyl is not a computerized version of the LIMC. It includes only classical objects in American collections, but with a number of enhancements. Far more information is given for each object than in the LIMC, and a number of files are used to enhance retrieval of information about those objects. These files are important in and of themselves, because they include, among other things, a complete listing of all classical artists, all materials, all techniques, and a concordance of all the major abbreviations for journals (Aph, AJA, JdI, and LIMC). Currently, I am in what I hope are the final throes of preparing Sibyl for release on the Web and on CD-ROM.

Why a DOS database in the first place?

When I began building the database in 1985, there were two choices: Apples and PCs. The database programs available for Apples were not as powerful as the ones that ran on PCs. While DBase was around, it was very feeble in comparison to Revelation, which had important advantages, including repeating or multi-valued fields. A repeating field allows you to put in more than one value per item, such as for objects made of several materials. The repeating value means that each item is treated separately so that it can be sorted separately, unlike items typed together as a single entry. An object made of marble, gold, and silver could be sorted and listed with those components in proper alphabetical order rather than always appearing as the fixed unit of "marble, gold, and silver." I'll return to this issue. Oh, will I return to this issue.

DOS is dead!

In the beginning life was good. Revelation did all THEY said it would. I managed to get it up and running with a minimum of outside help. Unfortunately, the company was and is still bad at technical support and marketing. With an insufficient user base, no support existed either at my local bookstore or at my local computer center. Time passed. I upgraded from Revelation to, good grief, Advanced Revelation. Then Windoes appeared. I continued to run AREV, as it is fondly known. Then Windows 95 appeared. DOS slowly descended to inclusion on the endangered species list with sightings only on obscure islands of machines. I wanted to release my database, but it only ran in DOS. It became mightily apparent that I had to switch.

What to choose?

First I tried the easy and obvious solution. I ordered the successor program to AREV, called OpenInsight, that runs under Windows. There were many problems, and it was buggier than a four-horse chariot.

So, the search began for another program. Two requirements were paramount. First, I had to get a program that was at least as capable as the one I was already using. Second, since I am a small operation with no staff to offer extensive support to users, I decided that I had to use a broadly based program that an irate user (there are always many of those) could find out about at his or her local computer center or bookstore. This go-round I had a programmer, Sharon Girard, who had worked on Sibyl on a consulting, very part-time basis for a number of years. She did most of the work of scouting out the various possibilities. She chose Access, part of the ubiquitous Microsoft Office. It comes installed free on many computers. It is available worldwide. It is "Webbable." It is SQL compatible. That means it adheres (more or less) to an established standard for databases, which in turn implies that it can work with and share data with other database programs with comparative ease. Since it is produced by Microsoft, numerous other software programs work with it.

From AREV to Access

Getting the data out of AREV was a piece of cake for me. My programmer did all the work. I was involved only tangentially, when she called me in agony at the consequences of various sins we had committed in data entry. I might add that those errors existed despite the fact that the majority of fields are controlled, in part because AREV was very forgiving in what it would allow. Any program that you have used for a long period of time, in my case over ten years, inevitably and spontaneously produces errors due to ignorance in the early days, mistakes on the part of the software company in upgrades, and, well, plain carelessness and oversight on the part of the lowly data enterers, us.

The real fun begins with getting the data into another program. Here Access has an import module that purportedly guides you through the process, except that it is inconsistent, misleading, and buggy in parts. In fact, much of Access that deals with the nitty-gritty that the end user is never supposed to see is anything but user friendly. Apparently programmers don't count when it comes to interface design.

More fun than a barrel of monkeys . . .

Does a barrel of monkeys really have fun? And if so, for whom? I had two major problems in the transfer. First, AREV, as you might have gathered, has a database structure that is totally alien to most conventional databases, which do not have repeating fields. Second, graphical interfaces that give so much pleasure to the user are a pain in the neck to the designer. Remember that the more you can do, the more time it will take you to learn what you can do, and the more likely it is to take more time to do it even after you have learned how to do it. Let's look a little more closely at these two issues.

I had to "normalize" my data. Repeating fields had to be spun out into separate, subsidiary tables. To take one of the simpler examples, consider the authors for the Bibliography file. A particular work may have one or more authors; using AREV's repeating field function for author, I could put any number of authors into the author fields which automatically repeated any number of times until all the authors had been entered. I like this solution, because it preserves the complexity of the data, while making it easy to keep track of all in one file. Since Access does not permit repeating fields, for repetitive information, you make another table, Authors. Each author gets a separate record and a link with the Bibliography table. Instead of just one file or table for Bibliography, I now have two. Bibliography, however, is a simple example. Some of my original AREV files now take seven tables for the same data. I have gone from 21 files in AREV with approximately 50,000 total records to 55 tables in Access with over 340,000 records. The hit is simply enormous and not just in number of records, but in complexity of searching and in complexity of designing entry screens and reports. What is most remarkable is, despite the complexity-once you learn it-the searching actually is tremendously faster than in AREV. And, curiously enough, because Access has variable-length storage for its data, there was no increase in the size of the files on the hard disk.

The next turn of the screw on normalization is deciding what goes where and how much you want to normalize, for normalization is based on the idea that no bit of information should be repeated. To me, there is always a tradeoff between ease of use and speed of response, on the one hand, and faithfully following the rules of a mathematician-cum-database-theoretician, who, while brilliant, never dealt with humanistic data. Normalization is not always worth the trouble; so Sibyl is only partially normalized.

Another twist of the screw came with the question marks. Since so much information in classical archaeology is uncertain, I include a generous sprinkling of question marks wherever they are needed. To the database program, the question mark changes the datum. "Greek" and "Greek?" are two different things. In AREV I got away with this arrangement, because it allowed me to build indices which ignored "stop" signs like the question mark. No such tool is available in Access. So, my programmer came up with the excellent solution of making question marks into a separate field. She, alas, called it "Certainty," because what counted to her was whether or not a particular bit of information was really true. To me, the issue was whether or not it was a question mark. Moreover, it takes a lot less effort to catalogue question marks, which are fewer, than certitude. This little bit of miscommunication rippled throughout the import process. Not just the field names had to be changed, but the data themselves. That Access makes such changes easy does not truly mitigate the problem.

Each table presents its own set of problems in conversion. Sometimes it's better to make repeating information into a simple string. The decision depends on how you are going to search that particular field. The solution varies according to the data and what gives you, as my programmer calls it, "a warm tummy feeling." Unfortunately, what warms her tummy doesn't always warm mine and vice versa. The result was a number of major discussions and disagreements on how to slice and dice the data.

Meanwhile back at the ranch . . .

While my programmer and I were going through the hell just described, the office staff of two continued to use the old system. I was keeping two systems running-one of which had up-to-date data and the other which not only was not up to date, but was in constant flux, as we experimented with the design. I pass over that both my programmer and I were learning the program as we went. I had decided -and this is one decision that did not change and that did work-that the office would change all at once and only when both the programmer and I felt the new system was sufficiently stable. Let me emphasize "sufficiently," because changes are still being made in the design, as I write this article. These changes ranged from the simple to the complex. The field in Objects which we had called "Cult"-with a period implied-had to be changed to "Culture," because it is misleading to those who think it is, well, "Cult." Again, the hit in making the change was larger than you might think. We had to change that field name in the original table, on all the forms that have the field, on all the reports, and in all the "canned" queries.

And then there were all those details and routines to learn . . .

As soon as you change systems, you have to learn new ways to do old tricks. It wasn't just that they used a different system for backing up and restoring individual records. They changed the terms. They didn't bother to even discuss the issue. I spent days figuring out that Access doesn't overwrite records. Better that you delete the bad record and copy the good one via the clipboard to the table in question. Yes, Access has "briefcase replication," but replication is not a solution for overwriting one or a small set of bad records with older, not newer, records. Access does not come with a good solution for this problem.

Then there was the vanishing toolbar. One day the toolbar for searching a "completed" form was there; the next day it wasn't. And it was contagious. It went from form to form. In the end something like twenty forms and subforms were infected. I used one of the two free panic calls to Microsoft technical support to find out how to fix it. We never did figure out what caused the problem. The solution was for me to move immediately into Access 97, which is purportedly less buggy. While the upgrade was relatively painless, we-the Microsoft techie and I-had agreed that I had to delete all the infected forms and recreate them from scratch. Some of those forms were easy to remake, one was one of the most complex I had done up. I lost easily two weeks of solid work from this problem. And, no, it hasn't reappeared . . . yet.

The Bottom Line

I learned a number of things from this process some of which are obvious, many of which were painful, even if obvious.

  1. Only my programmer and I could have done the export/import. You have to know how your data really works in order to move it into another system.
  2. There are no shortcuts. You have to learn how the new system works to produce an effective product. This process takes gobs and gobs and gobs of time.
  3. Miscommunications between my programmer and me were costly in time and very frustrating for both of us.
  4. I mistakenly thought that, because I had a working database with a stable structure for ten years, the transfer, at least of the structure, would be easy. It wasn't. The two database programs are far, far too different.
  5. Instead of taking a couple of months, the process has taken a full year! And it's not done yet. The old saw of taking your worst estimate of the time needed to do something on the computer and doubling it was wrong. Try a factor of four or five times your original estimate. Admittedly, not having a full-time programmer is one of the reasons, but the amount of time I have spent on the interface is staggering.
  6. My programmer claims that the next time I make a major change in databases the task will be easier, because my file structure conforms to a more popular standard. I am dubious. I believe it depends on which database program I choose.
  7. The amount of time spent on designing the interface for both retrieval and entry will always be enormous, because interfaces are always idiosyncratic to each software product and therefore have to be designed from the ground up each time.
  8. Despite all of the problems and despite the fact that there are still things I prefer in AREV, Sibyl and I are much better off in a Windows 95 database. Sibyl is glitzier, faster, and, actually for the most part, more capable. She will also presumably be easier to understand for the computer cognoscenti than when she resided in AREV.
  9. No matter how it looks, this article is neither a review of Access nor a description of Sibyl. It is only a saga of one person's trials and tribulations in switching database programs.
  10. The title of this article is a lie. It doesn't take 659 easy steps; it takes more like 25,659 easy steps.

Jocelyn Penny Small

To send comments or questions to the author, please see our email contacts page.

For other Newsletter articles concerning the use of electronic media in the humanities or issues involving the use of databases, consult the Subject index.

Next Article: The Perseus Web Site: A Review

Table of Contents for the Fall, 1997 issue of the CSA Newsletter (Vol. X, no. 2)

Master Index Table of Contents for all CSA Newsletter issues on the Web

Go to CSA Home Page

(1) I am extremely grateful to Rutgers University and the National Endowment for the Humanities, among others, for their generous support over the lifetime of the project. Return to body of article.