The Multilingual Environment for Advanced Translations
This page is a short introduction to the architecture and mode of operation of Meat, the Multilingual Environment for Advanced Translations. It describes the three architectural pillars on which a system is built: charts, typed feature structures, and the component architecture we use to assemble complex systems from smaller building blocks. This introduction is not a guide to constructing systems with Meat; it merely describes the basic ideas and assumptions behind the implementation. For a more thorough walk-through of the application building process, see the HowTo section of the documentation. Meat is written entirely in C++ and runs on Unix machines (we tested Solaris and Linux) and on PCs with Windows 95/98/NT.
Charts
The central data structure within Meat is a chart, which stores partial and complete results on all levels of linguistic description. A chart [Kay:80] is a directed acyclic graph of hypotheses about parts of a document: vertices correspond to points between words, and edges denote words or descriptions of sequences of words. Charts are well suited to representing results within a natural language processing system. They separate the description of what needs to be processed from the exact order in which actions are carried out, and thus allow a wide range of search and processing strategies. Moreover, they avoid redundant computation: not only complete results are stored, but also all partial results that arise during a computation, and these partial results can be reused in a larger context.
Meat uses several types of edges to distinguish between different types and levels of description. Thus, the chart is not restricted to a single purpose (say, syntactic parsing or generation) but stores all hypotheses on all levels. Internally, tags mark the module each edge belongs to. The chart of Meat is a weaker version of the layered chart used in [Amtrup:97]: it supports neither hypergraphs nor the distribution of modules for parallel processing. The following figure shows an example chart with some intermediate results represented as edges; its bottom part shows the content of one of these edges.
[Figure: example chart with intermediate results as edges; the bottom panel shows the content of one edge]
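To make the chart idea concrete, here is a minimal C++ sketch; it is not Meat's actual implementation. It assumes a chart whose vertices 0..n lie between n words and whose edges carry a tag (the producing module or level) and a string label standing in for the edge's feature structure. The names `Edge`, `Chart`, `combineAdjacent`, and the tags `"syntax"` and `"morph"` are hypothetical. `combineAdjacent` performs one bottom-up pass that joins adjacent edges of a single tag; the shorter edges stay in the chart, so later passes reuse them as partial results.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not Meat's actual classes. Vertices 0..n are the points
// between n words; an edge spans vertices [from, to] and records the module or
// level that produced it (tag) and what it hypothesizes (label).
struct Edge {
    std::size_t from;
    std::size_t to;
    std::string tag;    // e.g. "morph" or "syntax" (illustrative names)
    std::string label;  // stand-in for the edge's typed feature structure
};

class Chart {
public:
    explicit Chart(std::size_t numWords) : numVertices_(numWords + 1) {}

    // Edges always point forward (from < to), so the graph stays acyclic.
    void addEdge(std::size_t from, std::size_t to, std::string tag, std::string label) {
        assert(from < to && to < numVertices_);
        edges_.push_back({from, to, std::move(tag), std::move(label)});
    }

    // Indices of all edges with the given tag that start at vertex v.
    std::vector<std::size_t> startingAt(std::size_t v, const std::string& tag) const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < edges_.size(); ++i)
            if (edges_[i].from == v && edges_[i].tag == tag) out.push_back(i);
        return out;
    }

    bool contains(std::size_t from, std::size_t to, const std::string& tag,
                  const std::string& label) const {
        for (const Edge& e : edges_)
            if (e.from == from && e.to == to && e.tag == tag && e.label == label) return true;
        return false;
    }

    const Edge& edge(std::size_t i) const { return edges_[i]; }
    std::size_t size() const { return edges_.size(); }

private:
    std::size_t numVertices_;
    std::vector<Edge> edges_;
};

// One bottom-up pass: join every pair of adjacent edges carrying `tag` into a
// longer edge. Shorter edges are kept, so they are reused as partial results
// in later passes. Returns the number of new edges (0 means a fixpoint).
std::size_t combineAdjacent(Chart& chart, const std::string& tag) {
    const std::size_t before = chart.size();
    std::size_t added = 0;
    for (std::size_t i = 0; i < before; ++i) {
        const Edge left = chart.edge(i);  // copy: addEdge may reallocate storage
        if (left.tag != tag) continue;    // edges of other levels are ignored
        for (std::size_t j : chart.startingAt(left.to, tag)) {
            const Edge right = chart.edge(j);
            std::string label = "(" + left.label + " " + right.label + ")";
            if (chart.contains(left.from, right.to, tag, label)) continue;  // no duplicates
            chart.addEdge(left.from, right.to, tag, std::move(label));
            ++added;
        }
    }
    return added;
}
```

Starting from three word edges a, b and c, two passes produce (a b), (b c), (a (b c)) and ((a b) c), and a third pass adds nothing. Both analyses of the whole span are built from the stored partial results rather than recomputed, and an edge with a different tag, such as `"morph"`, is left untouched by the pass.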
Typed Feature Structures
The content of the edge shown in the previous figure is a typed feature structure. Feature structures represent linguistic information in a structured and theoretically sound manner. Using a type skeleton in addition to name-value pairs yields an efficient, consistent way of describing properties of words and other linguistic objects. An example from the Turkish-English translation system, describing the word economiyi (economy), looks like this:
[Figure: typed feature structure for economiyi]
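As a rough illustration of how types and features interact, the following C++ sketch implements typed unification in the spirit of [Carpenter:92] under simplifying assumptions: feature structures are trees without structure sharing, there are no appropriateness conditions on features, every type must be registered in the hierarchy, and all names (`TypeHierarchy`, `FS`, `makeFS`, `unifyFS`) are hypothetical. It does not model Meat's vector-oriented representation. Two types unify to their unique most general common subtype; two structures unify if their types do and every feature they share unifies recursively.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type hierarchy: each registered type lists its immediate supertypes.
class TypeHierarchy {
public:
    void add(const std::string& type, std::vector<std::string> supertypes) {
        parents_[type] = std::move(supertypes);
    }

    // True if `sub` equals `super` or lies (transitively) below it.
    bool subsumedBy(const std::string& sub, const std::string& super) const {
        if (sub == super) return true;
        auto it = parents_.find(sub);
        if (it == parents_.end()) return false;
        for (const std::string& p : it->second)
            if (subsumedBy(p, super)) return true;
        return false;
    }

    // Type unification: the unique most general registered type below both
    // a and b, or nothing if the types are incompatible.
    std::optional<std::string> unify(const std::string& a, const std::string& b) const {
        std::vector<std::string> common;
        for (const auto& entry : parents_)
            if (subsumedBy(entry.first, a) && subsumedBy(entry.first, b))
                common.push_back(entry.first);
        for (const std::string& c : common) {
            bool mostGeneral = true;
            for (const std::string& d : common)
                if (!subsumedBy(d, c)) { mostGeneral = false; break; }
            if (mostGeneral) return c;
        }
        return std::nullopt;
    }

private:
    std::map<std::string, std::vector<std::string>> parents_;
};

// A tree-shaped typed feature structure (no structure sharing in this sketch).
struct FS;
using FSPtr = std::shared_ptr<const FS>;
struct FS {
    std::string type;
    std::map<std::string, FSPtr> features;
};

FSPtr makeFS(std::string type, std::map<std::string, FSPtr> features = {}) {
    return std::make_shared<FS>(FS{std::move(type), std::move(features)});
}

// Unifies two feature structures; returns nullptr on failure.
FSPtr unifyFS(const TypeHierarchy& h, const FSPtr& a, const FSPtr& b) {
    assert(a && b);
    std::optional<std::string> type = h.unify(a->type, b->type);
    if (!type) return nullptr;  // type clash
    auto result = std::make_shared<FS>(FS{*type, a->features});
    for (const auto& [name, value] : b->features) {
        auto it = result->features.find(name);
        if (it == result->features.end()) {
            result->features.emplace(name, value);  // feature present only in b
        } else {
            FSPtr merged = unifyFS(h, it->second, value);
            if (!merged) return nullptr;  // clash inside a shared feature
            it->second = merged;
        }
    }
    return result;
}
```

For example, with invented types `nom` and `acc` as subtypes of `case`, unifying noun[CASE: case] with noun[CASE: acc] yields noun[CASE: acc], while noun[CASE: acc] and noun[CASE: nom] fail to unify because `nom` and `acc` have no common subtype.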
The feature structure describes the lexical properties of the word (the root and the translation) as well as the inflectional properties obtained by a morphological analyzer. All edges in the chart, be they words, syntactic structures, transfer results, or target-language surface elements, are described using this uniform formalism. The design of our feature structures follows [Carpenter:92]. Our implementation uses a vector-oriented representation for feature structures and indexing on types, which keeps it efficient even if the application itself is distributed across multiple machines (currently, we make no use of this feature).
Components
As already mentioned, Meat is not a fixed translation system but a collection of specialized components that can be composed to form an application. To realize a configurable, flexible system, we combine extreme modularization with user-defined applications.
Meat currently provides around 40 different modules. The user composes a sequence of modules to build a complete application. At runtime, the system interprets the application definition and executes the modules needed. An application definition file defines
The components can be divided into several classes:
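The composition of modules into an application, as described above, can be sketched in C++ as a registry of named module factories plus a loop that interprets a definition and runs the modules in order on a shared chart. This is a hypothetical illustration, not Meat's API: the `Module` interface, `ModuleRegistry`, `runApplication`, the `ChartStub` stand-in for the chart, and the definition format (whitespace-separated module names) are all assumptions; the actual format of Meat's application definition files is not described here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the shared chart: here just a list of textual hypotheses.
struct ChartStub {
    std::vector<std::string> hypotheses;
};

// Hypothetical module interface: each component reads from and writes to the chart.
class Module {
public:
    virtual ~Module() = default;
    virtual void run(ChartStub& chart) = 0;
};

// Maps module names, as they would appear in an application definition,
// to factories that create the corresponding module.
class ModuleRegistry {
public:
    using Factory = std::function<std::unique_ptr<Module>()>;

    void registerModule(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }

    std::unique_ptr<Module> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) throw std::runtime_error("unknown module: " + name);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Interprets a definition given as whitespace-separated module names (an
// invented format) and runs the modules in order on the shared chart.
void runApplication(const ModuleRegistry& registry, const std::string& definition,
                    ChartStub& chart) {
    std::istringstream in(definition);
    std::vector<std::unique_ptr<Module>> pipeline;
    for (std::string name; in >> name;)
        pipeline.push_back(registry.create(name));  // validate the whole definition first
    for (const auto& module : pipeline) module->run(chart);
}

// Toy module that only records its label, to make the execution order visible.
class LabelModule : public Module {
public:
    explicit LabelModule(std::string label) : label_(std::move(label)) {}
    void run(ChartStub& chart) override { chart.hypotheses.push_back(label_); }

private:
    std::string label_;
};
```

Registering factories for two modules and running the definition "morph syntax" executes them in that order. Because every module is created before any of them runs, a definition that names an unregistered module makes `runApplication` throw without touching the chart; this is one possible design choice, not necessarily Meat's behavior.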
References
[Amtrup:97] Amtrup, Jan W., 1997.
[Carpenter:92] Carpenter, Bob, 1992.
[Kay:80] Kay, Martin, 1980.