Ecolynx Project: Information Context for Biodiversity Conservation

Language and translation issues

Addressing language issues is no simple challenge for a project with large amounts of text data. Traditional translation is prohibitively expensive, and it is inappropriate where the text is intended to be updated frequently through user feedback. Computer-assisted translation packages lack specialist vocabulary, and they are not yet sophisticated enough in handling grammatical and syntactic complexity to achieve more than a crude rendering of English texts into other languages. For certain purposes, however, this may be enough.

The UIA has been working on these challenges for many years, exploring different computer and software approaches over the past decade. For example, it has created a multi-lingual thesaurus (over 103,000 terms), so that users of any UIA data-set can use non-English subject categories to access data available only in English. Different language interfaces have been created for users of its CD-ROM products: English, French and German interfaces are currently in use, and Spanish, Dutch and Italian interfaces have also been tested. The information is also organised so that hyperlinks in one language remain valid in another.

In the case of international organizations, the UIA holds official titles in every language in which they are available (transliterated where necessary). To date this work has produced titles in English (over 20,000), French (8,229), Spanish (2,662), German (1,861), Italian (1,027), "Nordic" (497), Dutch (445), Latin (307), Portuguese (298), Russian transliterated (174), and Esperanto (71). Users can therefore access this data using non-English keywords. In the case of species names, consideration was given to incorporating the common names of species in other European languages. This may be possible during the Implementation Phase if resources are freed from higher-priority tasks.

The data on international organizations have been extensively translated into French using a combination of traditional and semi-automatic methods (through funding supplied by francophone governments in 1995/96). As part of this work, portions of the data were also translated into Spanish and German. This work arose from sensitivity to language biases in electronic information, and it has built considerable in-house capability in the creative application of machine translation. Further development of computer-assisted translation into other languages is intended as dedicated funding becomes available. It is anticipated that this experience and awareness can be transferred, in whatever ways are practicable, to the proposed INFO2000 project to enable multi-language user access to the information.

Software packages that provide crude on-the-fly translations of web documents offer one means of rapidly giving users of a variety of languages some degree of access. These should be examined and incorporated as appropriate. Provision of "one-stop" access to online translation services, e.g. Globalink, should also be explored and enabled if appropriate. It also appears that advanced search-engine query techniques can be used to provide access to documents in selected languages.