Problems with Greek in HTML

Harrison Eiteljorg, II

The web site for the CSA Propylaea Project should present information about the project in Greek as well as English. That decision was made last spring by the CSA Board at the recommendation of Mr. Eiteljorg and Ms. Jones. The process has been somewhat slow, because the Bryn Mawr College graduate student working on the translation, Demi Andrianou, was away for the summer. Upon Ms. Andrianou's return, she translated the home page, and then the fun began. Once again, the problems with different scripts made what seemed a simple job into a rather complicated one.

Ms. Andrianou typed the Greek version of the file using her own computer, a PC running Windows and set up to permit using the ISO 8859-7 standard for typing and displaying modern Greek. (ISO 8859-7 is the encoding scheme used prior to Unicode for modern Greek, and it is the encoding system used in Word and other Windows programs running on computers in Greece. Ms. Andrianou's computer and two of CSA's computers running Windows are set up to permit working in Greek. See "The Way Your Computer Handles Text Is Changing," by Susan C. Jones, CSA Newsletter, Winter 2002, Vol. XIV, No. 3, for a thorough discussion of the International Standards Organization, ASCII, and Unicode character sets.) She used MS Word to create the file and then saved it as an ASCII file so that it could be used with the Web authoring software used in the CSA office (Home Site). Unfortunately, saving the file as an ASCII file made it useless, since the ASCII encoding system does not accommodate Greek.
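The loss described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure used at CSA: the sample word is a hypothetical stand-in for the translated text, and Python's built-in codecs stand in for Word's save options.

```python
# Hypothetical sample text (capital Greek letters); not taken from the project pages.
greek = "ΠΡΟΠΥΛΑΙΑ"

# ASCII has no Greek letters. With errors="replace", every letter becomes "?",
# which is why a file saved as ASCII is useless for Greek.
as_ascii = greek.encode("ascii", errors="replace")

# ISO 8859-7 stores each modern Greek letter in a single byte.
as_iso = greek.encode("iso8859_7")

# UTF-8, one of the Unicode encodings, uses two bytes for each Greek letter.
as_utf8 = greek.encode("utf-8")

print(as_ascii)
print(len(as_iso), len(as_utf8))
```

Both the ISO 8859-7 bytes and the UTF-8 bytes can be turned back into the original Greek; the ASCII bytes cannot, because the information was discarded when the file was saved.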

Ms. Andrianou then sent a DOC (Word format) file to CSA to try again. Susan Jones could open the file in Word without difficulty because the CSA computer is set up to use the ISO 8859-7 standard when dealing with Greek. She copied the text and pasted it into Home Site. Alas, that did not work, since Home Site apparently does not deal with Greek characters or understand the ISO 8859-7 standard. (How does one know what character set is used in a file? See the separate article, "Naming Files in a Multi-Script World," by Harrison Eiteljorg, II, in this issue.)
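Why the character set matters can be shown with a small Python sketch: the same bytes read correctly under ISO 8859-7 and turn into unrelated accented Latin letters under a Western European character set. The sample word and the choice of Latin-1 as the "wrong" character set are assumptions made for illustration; the article does not say what Home Site actually did with the pasted text.

```python
# Hypothetical sample text, stored as ISO 8859-7 bytes (one byte per letter).
raw = "ΠΡΟΠΥΛΑΙΑ".encode("iso8859_7")

# Read with the correct character set, the bytes come back as Greek.
as_greek = raw.decode("iso8859_7")

# Read as Latin-1 (assumed here as the wrong guess), the very same bytes
# are shown as accented Latin letters instead.
as_latin1 = raw.decode("latin-1")

print(as_greek)
print(as_latin1)
```

Nothing in the bytes themselves says which reading is right, which is why a file's character set has to be recorded somewhere outside the text.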

Ms. Jones then saved the original file as a Unicode text file, using Word (Word 97, not a newer version, since that is the version used at CSA). That worked, and the Greek text could be displayed as accurately when the text was encoded with Unicode as when it was encoded with the ISO 8859-7 standard. The Unicode file could, in turn, be saved by Word as an HTML file, a file appropriate for the Web. Such a file includes in the header (a part of the file not seen by the user) a statement indicating that Unicode is used for the character set. Unfortunately, not all computer systems support Unicode. Therefore, it seemed a good idea to use the more widely supported standard, ISO 8859-7 (also specified in the header of an HTML file).
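The header statement mentioned above is typically a meta tag naming the character set. The following Python sketch builds a minimal page whose declared character set matches the bytes actually written. The function name build_page, the page title, and the sample text are hypothetical, and the markup is a generic example rather than the exact header Word 97 writes.

```python
def build_page(body_text: str, charset: str) -> bytes:
    """Return a minimal HTML page encoded in the character set it declares."""
    html = (
        "<html>\n<head>\n"
        # The header statement: tells the browser how to interpret the bytes.
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "<title>Sample page</title>\n"
        "</head>\n<body>\n"
        f"<p>{body_text}</p>\n"
        "</body>\n</html>\n"
    )
    # Encode with the same character set named in the header.
    return html.encode(charset)


# Hypothetical sample text, written out once per encoding discussed in the article.
greek_page = build_page("ΠΡΟΠΥΛΑΙΑ", "iso-8859-7")
unicode_page = build_page("ΠΡΟΠΥΛΑΙΑ", "utf-8")
```

The important point is that the declaration and the encoding agree; a page that declares ISO 8859-7 but contains Unicode bytes, or the reverse, would display as gibberish.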

Although the ISO standard for Greek could be used on the CSA Windows machine, it was still not a simple matter to create an HTML document using it. In fact, Ms. Jones could find no way to create an HTML file that would include Greek characters except by using Unicode. While Ms. Jones was on vacation, Mr. Eiteljorg tried to create an HTML file using the ISO 8859-7 standard and found it to be possible by copying the text from the Word document into a blank HTML file created with Mozilla's HTML editor called Composer. (Mozilla is a browser that competes with Microsoft's Internet Explorer and descends from the original browsing program, Mosaic, via Netscape. It is available on the Internet at no cost and includes an HTML editor as a part of the program.) Composer obviously can and does accommodate Greek characters. Knowing that, Ms. Jones could accomplish the same task with the HTML editor built into Netscape.

The new Web pages will begin appearing soon, and they will use the ISO 8859-7 encoding scheme to make sure that the accessibility is as broad as possible. However, this is yet another indication of just how important the move to Unicode will be for all who must use computers with different character sets. (During some of the experimentation for this HTML work, it seemed that using the CSA Mac was even more difficult. No mechanism for dealing with modern Greek via the ISO standards could be found, though support for Unicode is excellent.)

-- Harrison Eiteljorg, II

