The Multilingual Environment for Advanced Translations

> Introduction

This page is a short introduction to the architecture and mode of operation of Meat, the Multilingual Environment for Advanced Translations. It gives some detail about the three architectural pillars on which a system is built: charts, typed feature structures, and the component architecture we use to assemble complex systems from smaller blocks.

This introduction is not meant to be a guideline on how to construct systems using Meat, but merely to describe the basic ideas and assumptions behind the implementation. For a more thorough walk-through of the application building process, see the HowTo section within the documentation.

Meat is written entirely in C++ and runs on Unix machines (we tested Solaris and Linux) and on PCs running Windows 95/98/NT.

> Architecture

The architecture of Meat is centered around four important properties:

  • To support multilinguality, we use Unicode to represent strings within the system.
  • A central data structure called Chart is used to store final results and intermediate stages of computation. A chart is basically a graph whose edges describe linguistic properties of parts of the input.
  • Typed Feature Structures represent the linguistic objects that are operated on. A feature structure is a complex set of feature-value pairs; the implementation we use guarantees consistency and efficiency within the system.
  • Components are used to compose systems. Meat is not a monolithic system that performs exactly one function, but rather a collection of specialized modules that can easily be tied together to form a wide range of applications. Every module can read and write the central chart, and all modules operate on feature structures. This drastically reduces interface problems.

Charts

The central data structure within Meat is a chart, which is used to store partial and completed results on all levels of linguistic description. A chart [Kay:80] is an acyclic, directed graph of hypotheses about parts of a document. Vertices correspond to points between words; edges denote words or descriptions of a sequence of words. Charts are well suited to representing results within a natural language processing system. They make it possible to separate the description of what needs to be processed from the exact order in which actions are carried out, thus allowing for a wide range of search and processing strategies. Moreover, they remove redundancy, since not only complete results are stored but also all partial results that arise during a computation. These partial results can be reused in a larger context.

Meat can use several types of edges to distinguish between different types and levels of description. Thus, the chart is not limited to a single purpose (say, syntactic parsing or generation); it stores all hypotheses on all levels. Internally, so-called tags mark each edge to indicate which module it belongs to. In fact, the chart of Meat is a weaker version of the layered chart used in [Amtrup:97], in that it supports neither hypergraphs nor the distribution of modules for parallel processing.
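The chart idea described above can be illustrated with a minimal sketch. It is not Meat's actual implementation: the names Edge, Chart, addEdge, edgesWithTag and edgesFrom are hypothetical, and a plain string stands in for the feature structure that a real edge would carry. It only shows vertices as points between words, edges that run left to right (which keeps the graph acyclic), and tags that separate levels of description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not Meat's API.
struct Edge {
    std::size_t from;     // vertex before the first covered word
    std::size_t to;       // vertex after the last covered word
    std::string tag;      // level of description, e.g. "TOKEN"
    std::string content;  // stand-in for a typed feature structure
};

class Chart {
public:
    // A chart over n words has n + 1 vertices (the points between words).
    explicit Chart(std::size_t wordCount) : vertexCount_(wordCount + 1) {}

    // Edges always run left to right, which keeps the graph acyclic.
    void addEdge(std::size_t from, std::size_t to,
                 std::string tag, std::string content) {
        assert(from < to && to < vertexCount_);
        edges_.push_back({from, to, std::move(tag), std::move(content)});
    }

    // All hypotheses stored on one level of description.
    std::vector<Edge> edgesWithTag(const std::string& tag) const {
        std::vector<Edge> result;
        for (const Edge& e : edges_)
            if (e.tag == tag) result.push_back(e);
        return result;
    }

    // All hypotheses (on any level) that start at the given vertex.
    std::vector<Edge> edgesFrom(std::size_t vertex) const {
        std::vector<Edge> result;
        for (const Edge& e : edges_)
            if (e.from == vertex) result.push_back(e);
        return result;
    }

private:
    std::size_t vertexCount_;
    std::vector<Edge> edges_;
};
```

Because nothing is ever removed, a later module can ask for the edges of an earlier level (for example, everything tagged TOKEN) and add its own edges over the same vertices without disturbing the earlier results.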

Here is an example of what a chart looks like, with some intermediate results presented as edges. At the bottom, you can see the content of one of these edges.

Typed Feature Structures

The content in the previous figure is a typed feature structure. Feature structures are a means of representing linguistic information in a structured and theoretically sound manner. Using a type skeleton in addition to name-value pairs leads to an efficient, consistent way of describing properties of words and other linguistic objects. An example from the Turkish-English translation system, describing the word ekonomiyi (economy), looks like this:

NounEntry[
  surf : "ekonomiyi", 
  lex : LexMorph[
    root : "ekonomi"], 
  infl : InflMorph[
    capitalized : False, 
    case : Acc, 
    caseName : "", 
    lemmaPos : Noun, 
    minorPos : None, 
    minorPosName : "", 
    numAgr : Sg3, 
    polarity : Pos, 
    posAgr : Non, 
    voice : Act], 
  trans : <:
    Nominal[
      exp : "economy", 
      key : "economy", 
      left : "", 
      right : ""]:>]

The feature structure describes the lexical properties of the word (the root and the translation), as well as the inflectional properties obtained by a morphological analyzer. All edges in the chart, whether they represent words, syntactic structures, transfer results, or target language surface elements, are described using this uniform formalism.

The design of the feature structures we use follows [Carpenter:92]. Our implementation utilizes a vector-oriented representation for feature structures and indexing on types, which makes it efficient even if the application itself is distributed across multiple machines (we currently make no use of this capability).
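To make the notion of a typed feature structure more concrete, here is a small sketch of typed unification, the core operation on such structures. It is a simplification under stated assumptions and neither Meat's implementation nor the full formalism of [Carpenter:92]: the type hierarchy is assumed to be a tree, structure sharing (reentrancy) and appropriateness conditions are left out, and the names TypeHierarchy, FeatureStructure and unify are hypothetical. The parallel name/value vectors are chosen for brevity only and are not meant to reproduce the vector-oriented representation mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a tree-shaped hierarchy mapping a type to its parent.
class TypeHierarchy {
public:
    void add(const std::string& type, const std::string& parent) {
        parent_[type] = parent;
    }
    // True if `general` equals `specific` or is one of its ancestors.
    bool subsumes(const std::string& general, std::string specific) const {
        while (specific != general) {
            auto it = parent_.find(specific);
            if (it == parent_.end()) return false;  // reached a root
            specific = it->second;
        }
        return true;
    }
    // In a tree, two types are compatible only if one subsumes the other;
    // the result is then the more specific of the two.
    std::optional<std::string> unifyTypes(const std::string& a,
                                          const std::string& b) const {
        if (subsumes(a, b)) return b;
        if (subsumes(b, a)) return a;
        return std::nullopt;
    }
private:
    std::map<std::string, std::string> parent_;
};

struct FeatureStructure {
    std::string type;
    std::vector<std::string> names;        // feature names
    std::vector<FeatureStructure> values;  // values[i] belongs to names[i]

    const FeatureStructure* get(const std::string& name) const {
        for (std::size_t i = 0; i < names.size(); ++i)
            if (names[i] == name) return &values[i];
        return nullptr;
    }
    void set(const std::string& name, FeatureStructure value) {
        assert(names.size() == values.size());
        for (std::size_t i = 0; i < names.size(); ++i)
            if (names[i] == name) { values[i] = std::move(value); return; }
        names.push_back(name);
        values.push_back(std::move(value));
    }
};

// Returns the combined structure, or nothing if the two are incompatible.
std::optional<FeatureStructure> unify(const TypeHierarchy& h,
                                      const FeatureStructure& a,
                                      const FeatureStructure& b) {
    std::optional<std::string> type = h.unifyTypes(a.type, b.type);
    if (!type) return std::nullopt;
    FeatureStructure result = a;
    result.type = *type;
    for (std::size_t i = 0; i < b.names.size(); ++i) {
        const FeatureStructure* mine = a.get(b.names[i]);
        if (!mine) { result.set(b.names[i], b.values[i]); continue; }
        std::optional<FeatureStructure> merged = unify(h, *mine, b.values[i]);
        if (!merged) return std::nullopt;  // conflicting feature values
        result.set(b.names[i], std::move(*merged));
    }
    return result;
}
```

With a hierarchy in which Acc and Nom are both subtypes of a hypothetical Case type, unifying a structure whose case feature is Case with one whose case is Acc yields Acc, while unifying Acc with Nom fails. That failure is what lets a grammar rule reject an incompatible analysis.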

Components

As already mentioned, Meat is a collection of specialized components that can be composed to form an application, rather than being a fixed translation system. The approach we chose in order to realize a configurable, flexible system is a combination of extreme modularization and user-defined application.

Meat currently provides around 40 different modules. The user is able to compose a sequence of modules in order to build a complete application. At runtime, the system interprets the application definition and executes the modules needed.
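The composition idea can be sketched as follows. This is an illustration under assumptions, not Meat's actual class design: Module, Registry and runApplication are hypothetical names, the chart is reduced to a list of edge tags, and the two toy modules only mimic the roles of a tokenizer and a morphological analyzer. The point is that every module sees the same chart and that the sequence of modules is data interpreted at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared chart: just a list of edge tags.
struct Chart {
    std::vector<std::string> edges;
};

// Every module reads and writes the same chart.
class Module {
public:
    virtual ~Module() = default;
    virtual void run(Chart& chart) = 0;
};

using ModuleFactory = std::function<std::unique_ptr<Module>()>;

class Registry {
public:
    void add(const std::string& className, ModuleFactory factory) {
        assert(factory != nullptr);
        factories_[className] = std::move(factory);
    }
    // Instantiates the named modules and runs them in order on one chart.
    void runApplication(const std::vector<std::string>& moduleNames,
                        Chart& chart) const {
        for (const std::string& name : moduleNames) {
            auto it = factories_.find(name);
            if (it == factories_.end())
                throw std::runtime_error("unknown module: " + name);
            it->second()->run(chart);
        }
    }
private:
    std::map<std::string, ModuleFactory> factories_;
};

// Toy module: contributes a single token edge.
class Tokenizer : public Module {
public:
    void run(Chart& chart) override { chart.edges.push_back("TOKEN"); }
};

// Toy module: adds one analysis edge per token edge already in the chart.
class MorphAnalyzer : public Module {
public:
    void run(Chart& chart) override {
        std::size_t tokens = 0;
        for (const std::string& tag : chart.edges)
            if (tag == "TOKEN") ++tokens;
        for (std::size_t i = 0; i < tokens; ++i)
            chart.edges.push_back("MATOKEN");
    }
};
```

Registering both toy modules and running the sequence Tokenizer, MorphAnalyzer leaves a TOKEN edge followed by a MATOKEN edge in the chart. Changing the application then means changing the list of names, not the code.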

An application definition file defines

  • A set of variable definitions, which can be used later on to save on typing and to group things,
  • A set of application definitions, which define which modules to execute for which application, and
  • A set of module definitions, which define the parameters for individual modules.

A small excerpt from the application definition file used for the Persian-English MT system in the Shiraz project is shown below. It exemplifies the composition of modules to form a complete application, as well as the definition of parameters, variables, and the incorporation of command-line parameters.

// Variable definitions
$RES=/home/mcm2/meat/per

// Global parameters
tangoModule = $(RES)/shiraz.mod

// An application
application lookup = Tokenizer($File=$1):PostTokenizer:MorphAnalyzer:
                     DictionaryLookup:DictionaryCompoundLookup:ChartViewer

// Sample module definitions
module Tokenizer {
  class = Tokenizer
  inputFile = /home/mcm/$File
  encoding = UTF8
}

module MorphAnalyzer {
  class = MorphAnalyzer
  grammar = $(RES)/GenMorph.samba
  rule = Morphology
  type = chart
  sourceTag = TOKEN
  targetTag = MATOKEN
}
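How such a file might be interpreted can be sketched in a few lines. The sketch rests on a reading of the excerpt above rather than on a specification of the file format: it assumes that $(NAME) refers to a previously defined variable and that colons outside parentheses separate the modules of an application. The function names are hypothetical, and forms such as $File and $1 are deliberately left untouched because their exact semantics are not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Removes leading and trailing whitespace (including line breaks).
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    assert(e >= b);  // a non-blank character exists, so e cannot precede b
    return s.substr(b, e - b + 1);
}

// Replaces every $(NAME) with its value; unknown names are left as they are.
std::string expandVariables(const std::string& text,
                            const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t start = text.find("$(", pos);
        std::size_t end = (start == std::string::npos)
                              ? std::string::npos
                              : text.find(')', start + 2);
        if (end == std::string::npos) {  // no further complete reference
            out += text.substr(pos);
            break;
        }
        out += text.substr(pos, start - pos);
        std::string name = text.substr(start + 2, end - start - 2);
        auto it = vars.find(name);
        out += (it != vars.end()) ? it->second
                                  : text.substr(start, end - start + 1);
        pos = end + 1;
    }
    return out;
}

// Splits "A(args):B:C" at colons that are not inside parentheses.
std::vector<std::string> splitModules(const std::string& sequence) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;
    for (char c : sequence) {
        if (c == '(') ++depth;
        if (c == ')' && depth > 0) --depth;
        if (c == ':' && depth == 0) {
            if (!trim(current).empty()) parts.push_back(trim(current));
            current.clear();
        } else {
            current += c;
        }
    }
    if (!trim(current).empty()) parts.push_back(trim(current));
    return parts;
}
```

Under these assumptions, expanding $(RES)/shiraz.mod with RES set to /home/mcm2/meat/per gives /home/mcm2/meat/per/shiraz.mod, and the right-hand side of the lookup application splits into six module references, the first of which still carries its parenthesized argument.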

The components can be divided into several classes:

  • Utility components (e.g. saving and loading a chart)
  • Input/Output components (e.g. tokenization of an input file, writing a result file with html markup)
  • Compilers for various grammar formalisms (e.g. syntax, transfer, generation)
  • Morphological processing (we have an analyzer and generator using a formalism developed at CRL as well as an analyzer that emulates the Xerox xfst machine)
  • Dictionary components (for dictionary compilation, lookup and compound lookup)
  • Structure-oriented linguistic modules (e.g. syntactic analyzers, transfer components and syntactic generators)
  • Visualization modules (the chart shown above is generated as part of an application of Meat)

> Building an application

> References

Amtrup, Jan W., 1997
Layered Charts for Speech Translation. In Proceedings of the Seventh International Conference on Theoretical and Methodological Issues in Machine Translation, TMI '97, Santa Fe, NM, Jul. 1997, pp. 192-199.
Carpenter, Bob, 1992
The Logic of Typed Feature Structures. Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, Cambridge, UK.
Kay, Martin, 1980
Algorithmic Schemata and Data Structures in Syntactic Processing. Technical Report CSL-80-12, Xerox Palo Alto Research Center.