New Mexico State University

Collaborative Research: Interlingual Annotation of Multilingual Text Corpora

This project is a collaboration among six research institutions: CRL at New Mexico State University, ISI at the University of Southern California, UMIACS at the University of Maryland, LTI at Carnegie Mellon University, Columbia University, and The MITRE Corporation. The research aims to provide a well-defined, motivated, and practical semantic level of representation that captures information from natural language text. We refer to this level of representation as an "interlingual representation". The novelty of the research comes not only from the interlingual representation itself, but also from an improved methodology for designing and evaluating such representations.

The research has four aspects. First, we will compile a collection of texts in six or seven non-English languages, each coupled with at least three translations into English. Second, we will develop an interlingual representation framework based on the careful study of these parallel text corpora. The framework will include a formal definition of the representation language along with coding manuals for the main components of meaning (e.g., event time, aspect, and modalities). Third, we will annotate these bilingual corpora using the agreed-upon interlingual representation. This effort will also allow for a straightforward extension of those corpora without further research required. Fourth, we will develop metrics for evaluating interlingual representations and for choosing a grain size of meaning representation that is appropriate for a given task. The metrics are based on inter-coder reliability, the growth rate of the interlingual representation, and the quality of the target language text that can be generated from the interlingua.
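
As an illustration of the inter-coder reliability component of these metrics, the sketch below computes Cohen's kappa, a chance-corrected agreement score for two annotators. The project description does not say which reliability statistic is used, so the choice of kappa is an assumption, and the function name and the tense labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders (Cohen's kappa).

    Assumption: both coders labeled the same items in the same order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each coder's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of six clauses for a "time" component.
coder_a = ["past", "past", "present", "future", "past", "present"]
coder_b = ["past", "present", "present", "future", "past", "past"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance. A coarser grain size of representation would generally be expected to raise such a score, which is one way reliability could inform the choice of grain size.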

For further information about this project, contact Dr. Steve Helmreich or Dr. David Farwell.

IL Annotation project's website