
Example-Based MT

The basic idea of EBMT is simple (cf. Nagao, 1984). We are given an input passage S in a source language and a bilingual text archive, in which source-language passages are stored aligned with their translations into a target language. S is compared with the source-language "side" of the archive, and the "closest" matching source passage is selected. The aligned translation of this closest match is then accepted as the translation of S.
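The retrieval step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the matcher used in this experiment. The archive is a toy list of (source, translation) pairs, and closeness is approximated by a word-level `difflib.SequenceMatcher` ratio. The similarity measure, the data and the function names `similarity` and `ebmt_translate` are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical bilingual archive: each entry pairs a source-language
# passage with its aligned target-language translation (toy data).
Archive = List[Tuple[str, str]]


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; an assumed stand-in for the
    (unspecified) EBMT matching metric."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def ebmt_translate(s: str, archive: Archive) -> str:
    """Return the stored translation of the source passage closest to s."""
    _, best_tgt = max(archive, key=lambda pair: similarity(s, pair[0]))
    return best_tgt


archive: Archive = [
    ("el consejo aprobó la resolución", "the council adopted the resolution"),
    ("la asamblea general se reunió hoy", "the general assembly met today"),
]

# The closest source passage shares "el consejo aprobó" with the input,
# so its translation is returned unchanged.
print(ebmt_translate("el consejo aprobó el informe", archive))
# prints: the council adopted the resolution
```

The example also shows the weakness of the pure approach: the returned translation belongs to the closest stored passage, not to the input. Here "informe" (report) is silently rendered as "resolution".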

The basic idea of EBMT is appealing enough that it has also been proposed as the basis for other tasks. These include source language analysis (e.g., Jones, 1992; Furuse and Iida, 1992), source-to-target language transfer (e.g., Grishman and Kosaka, 1992; Furuse and Iida, 1992; Watanabe, 1992) and generation (e.g., Somers, 1992). This marks the advent of hybrid rule-based and example-based MT systems. Hybridization is pursued in the hope that the resulting systems will have fewer practical shortcomings than pure rule-based systems or pure EBMT systems. Pure rule-based systems suffer from high processing complexity and costly knowledge acquisition. Pure EBMT systems suffer from output quality that degrades sharply when matches are poor. Nirenburg and his co-workers (Nirenburg, 1993; Nirenburg and Frederking, submitted) have suggested a very different route to hybridization. In their approach, multiple diverse MT engines coexist in a single system configuration, and each contributes its best partial outputs to the overall output of the system.

Our EBMT configuration was designed as one of the engines in the Pangloss MT project. Since Spanish-English is the main language pair in that project, we selected it for our EBMT effort as well. We used a bilingual corpus of United Nations documents containing about 500 MBytes of English text and a similar quantity of Spanish text, which were assumed to be translations of one another. The tasks involved in our experiment included:

Note that a different approach to EBMT would use closest matches for complete input sentences rather than for chunks of arbitrary length. We reported an experiment of this kind earlier (Nirenburg et al., 1993). The utility and attractiveness of this sentence-level approach grow with the degree of similarity between the input text and the source-language side of the archive. In practice, this method will probably be used mainly for translation revision tasks.
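The paragraph above contrasts whole-sentence matching with matching chunks of arbitrary length. As a rough illustration of the chunk side only, the sketch below splits an input into the longest word sequences that also occur somewhere on the source side of the archive. It assumes exact word n-gram lookup, an arbitrary length cap `max_n` and a greedy left-to-right strategy. These are our simplifications, not necessarily the procedure used in this experiment. The functions `archive_ngrams` and `greedy_chunks` and the data are hypothetical.

```python
from typing import List, Set, Tuple


def archive_ngrams(sentences: List[str], max_n: int) -> Set[Tuple[str, ...]]:
    """Collect every word n-gram (1 <= n <= max_n) on the archive's source side."""
    grams: Set[Tuple[str, ...]] = set()
    for sent in sentences:
        words = sent.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                grams.add(tuple(words[i:i + n]))
    return grams


def greedy_chunks(text: str, grams: Set[Tuple[str, ...]], max_n: int) -> List[str]:
    """Scan left to right, taking the longest input chunk found in the archive.
    Words never seen in the archive come out as one-word chunks."""
    words = text.lower().split()
    chunks: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest possible span first and shrink until a match is found;
        # a single word is always accepted so the scan always advances.
        for n in range(min(max_n, len(words) - i), 0, -1):
            if n == 1 or tuple(words[i:i + n]) in grams:
                chunks.append(" ".join(words[i:i + n]))
                i += n
                break
    return chunks


source_side = [
    "el consejo aprobó la resolución",
    "la asamblea general examinó el informe",
]
grams = archive_ngrams(source_side, max_n=4)
print(greedy_chunks("el consejo examinó el informe", grams, max_n=4))
# prints: ['el consejo', 'examinó el informe']
```

No single archive sentence matches the whole input well, but each of its two chunks is covered verbatim by a different archive sentence. This is why chunk-level matching is less dependent on the overall similarity between input and archive than sentence-level matching.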

In what follows we describe each of the above tasks in turn, with the exception of the last one, which will be described elsewhere as a component of a more general candidate combination process (Nirenburg and Frederking, submitted).






Steve Beale
Tue Oct 1 12:14:38 MDT 1996