The work reported here can and will develop along a number of dimensions.
First of all, we need to move to a full-fledged EBMT environment, which means working with a real bilingual archive. The immediate problem to be solved is then the alignment of the archive. If the full-sentence comparison method is used, it is sufficient to have the archive aligned at the sentence level. If, however, the partitioning method is used, it becomes necessary to obtain alignment at the sub-sentential level. This latter task is, in fact, exactly the goal of the full-fledged statistical MT approaches. Results in text alignment have been achieved at IBM (e.g., Brown et al., 1990) and AT&T (e.g., Gale and Church, 1991). In the short run, however, sub-sentential alignment does not promise fidelity high enough to support EBMT in a stable fashion. Because of this (and unconditionally for the full-sentence comparison method), a practical EBMT environment will have to include a user interface, similar to the CMU TWS, that allows the human user to correct system output.
A second avenue of improvement is upgrading the matching and partitioning algorithms and metrics. We have immediate plans to improve our partitioning algorithm by a) optimizing the choice of the longest substring; and b) accepting discontinuous substrings as candidates for partitioning. Among the possibilities for improving the metric are: a) differentiating the treatment of open- and closed-class lexical items (a match on the latter can be considered less significant than a match on the former); b) allowing a bonus for ``clustered'' matches, where a contiguous subset of a string matches an example, as compared with a match on a similar number of discontiguous words; c) further calibrating the ratios in the metric definition by repeatedly modifying them and running the calibration test on a large number of ratio combinations, in order to choose the combination that corresponds most closely to the results obtained using the ``control'' metric; d) augmenting the set of comparison criteria, for instance, by including membership in the same semantic class (as did Sumita and Iida, 1991), though this criterion is inherently weaker than synonymy or direct hyperonymy. This last enhancement presupposes the availability of a thesaurus or another source of semantic class markers.
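Metric improvements a) and b) can be illustrated with a minimal sketch. It is not the metric used in this work: the function name `match_score`, the closed-class word list, and the three weights are all hypothetical, and a ``clustered'' match is approximated here as two adjacent input words that also occur adjacently in the example.

```python
# Hypothetical closed-class list; a real system would draw on a full lexicon.
CLOSED_CLASS = {"the", "a", "an", "of", "in", "on", "to", "and", "or", "is", "was"}


def match_score(input_words, example_words,
                open_weight=1.0, closed_weight=0.25, cluster_bonus=0.5):
    """Score the overlap between an input string and an archive example.

    The weights are illustrative assumptions, not calibrated values.
    """
    example_set = set(example_words)
    # Adjacent word pairs in the example, used to detect clustered matches.
    example_bigrams = set(zip(example_words, example_words[1:]))

    score = 0.0
    for i, word in enumerate(input_words):
        if word not in example_set:
            continue
        # A closed-class match counts for less than an open-class match.
        score += closed_weight if word in CLOSED_CLASS else open_weight
        # Bonus when this word and its predecessor are adjacent in the example too.
        if i > 0 and (input_words[i - 1], word) in example_bigrams:
            score += cluster_bonus
    return score


source = ["the", "red", "car", "stopped"]
clustered = match_score(source, ["the", "red", "car", "is", "fast"])
scattered = match_score(source, ["red", "is", "the", "car"])
```

Both examples share the same three words with the input, but only the first contains them as a contiguous run, so it receives the higher score under these assumed weights.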
Yet another major area of improvement is the creation of a complete experimental set-up that allows fast and extensive calibration of all the parameters of the EBMT environment, so that the system can be adapted to a particular set of texts and archives. This experimental testbed will also serve as an evaluation testbed for the quality of the EBMT system itself.
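The kind of calibration loop such a testbed would automate can be sketched as an exhaustive search over ratio combinations. Everything here is an assumption made for illustration: the metric is taken to be a weighted sum of per-criterion match counts, and ``correspondence'' with the control metric is measured as the fraction of example pairs that both metrics rank in the same order.

```python
from itertools import product


def pairwise_agreement(scores, control):
    """Fraction of item pairs that both score lists rank in the same order."""
    agree = total = 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            total += 1
            if (scores[i] - scores[j]) * (control[i] - control[j]) > 0:
                agree += 1
    return agree / total if total else 0.0


def calibrate(features, control, grid):
    """Try every ratio combination in the grid; return (best_ratios, agreement)."""
    best = (None, -1.0)
    for ratios in product(*grid):
        # Assumed metric form: weighted sum of the per-criterion counts.
        scores = [sum(r * f for r, f in zip(ratios, row)) for row in features]
        agreement = pairwise_agreement(scores, control)
        if agreement > best[1]:
            best = (ratios, agreement)
    return best


# Hypothetical data: one row of criterion counts per example, plus control scores.
features = [(3, 0), (1, 2), (0, 1)]
control = [3.0, 2.0, 0.5]
best_ratios, best_agreement = calibrate(features, control, [(0.0, 1.0), (0.0, 1.0)])
```

An exhaustive grid is practical only for a handful of ratios; with more parameters, a coarser grid or a different search strategy would be needed.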
Additional studies must be conducted to determine the optimum tradeoff between EBMT utility and robustness, on the one hand, and the complexity of the requisite static and dynamic knowledge sources, on the other. EBMT researchers should always remember the lesson of the field of qualitative reasoning in AI, which set out to simplify the very intricate and involved theories of physics and other natural sciences by relying on commonsense reasoning and ended up with a set of theories whose formulation was arguably even more intricate and more difficult to use for reasoning than the original ones. For EBMT to succeed, it should be shown not to rely on the extensive apparatus of linguistic and domain-oriented language analysis that forms the basis of the ``traditional'' rule-based MT that EBMT set out to supplant.