Next: An Example Up: Ontologies for NLP: Previous: Ontologies for NLP:

The Context: Mikrokosmos

Mikrokosmos (μK) is a knowledge-based machine translation (KBMT) system under development at the Computing Research Laboratory (CRL) of New Mexico State University (Onyshkevych and Nirenburg, 1994; Mahesh and Nirenburg, 1995; Beale, Nirenburg, and Mahesh, 1995). Unlike previous research in interlingual machine translation (MT), this project is building a large-scale, practical MT system. μK already has several thousand Spanish words in its lexicon as well as several thousand concepts in its ontology (or world knowledge base). By the end of the year, a lexicon of approximately 7000 Spanish words supported by an ontology of over 5000 concepts will be in place. High-quality meaning representations of up to 10 article-length Spanish texts from the domain of company mergers and acquisitions will have been produced by the μK system. In the coming years, μK will be extended to other languages such as Arabic, Japanese, Russian, and Thai.

A comprehensive study of the computational treatment of texts is a multifaceted endeavor covering a wide range of linguistic and pragmatic phenomena. Because the various facets of this knowledge are complex in their own right, study of any individual phenomenon is often conducted in relative isolation from the study of other related phenomena. However, in a KBMT application, knowledge about a large number of interrelated linguistic and language-use phenomena is required. A natural way of combining the diverse knowledge required of such a system into a unified whole is for the various phenomena to be treated by separate computational linguistic "microtheories" united through a system's control architecture and knowledge representation conventions.

In the Mikrokosmos project, a comprehensive study of a variety of microtheories central to the support of KBMT systems is being carried out with the ultimate objective of defining a methodology for representing the meaning of natural language texts in a language-neutral interlingual format called a text meaning representation (TMR). The TMR represents the result of analysis of a given input text and serves as input to the target language generator. The meaning of the input text is represented in the TMR as instantiated elements of an independently motivated model of the world (or ontology). The link between the ontology and the TMR is provided by the lexicon, where the meanings of most open-class lexical items are defined in terms of their mappings into ontological concepts and their resulting contributions to TMR structure. The ontology and the lexicon are the two main knowledge sources in the μK system. Information about the nonpropositional components of text meaning such as speech acts, speaker attitudes and intentions, relations among text units, coreferences, etc. is also derived from the lexicon with inputs from other microtheories and becomes part of the TMR. The figure below illustrates the μK architecture for analyzing input texts. The workings of this architecture are illustrated below through an example.

   
Figure: The Mikrokosmos NLP architecture.
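
The relationship among the three components described above (ontology, lexicon, and TMR) can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the actual μK design: the concept names, the two Spanish lexicon entries, the CONCEPT-n instance naming, and the function names analyze and is_a are all assumptions made for this example. It shows only the core idea that the lexicon maps open-class words to ontological concepts and that the TMR holds instantiations of those concepts.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: concept name -> parent concept (is-a link).
# These names are illustrative only, not taken from the real ontology.
ONTOLOGY = {
    "ALL": None,
    "EVENT": "ALL",
    "OBJECT": "ALL",
    "ACQUIRE": "EVENT",
    "CORPORATION": "OBJECT",
}

# Hypothetical toy lexicon: open-class Spanish word -> ontological concept.
LEXICON = {
    "adquirir": "ACQUIRE",
    "empresa": "CORPORATION",
}


@dataclass
class Instance:
    """One TMR element: a numbered instantiation of an ontological concept."""
    concept: str
    index: int

    @property
    def name(self):
        return f"{self.concept}-{self.index}"


def is_a(concept, ancestor):
    """Follow is-a links upward to test whether concept falls under ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]
    return False


def analyze(words):
    """Map each word found in the lexicon to its concept and instantiate it.

    Words missing from the toy lexicon (such as the article "la") are skipped.
    """
    counters = {}
    tmr = []
    for word in words:
        concept = LEXICON.get(word)
        if concept is None:
            continue
        counters[concept] = counters.get(concept, 0) + 1
        tmr.append(Instance(concept, counters[concept]))
    return tmr


tmr = analyze(["la", "empresa", "adquirir", "empresa"])
print([inst.name for inst in tmr])
```

Under these assumptions, each occurrence of a word yields a fresh instance of its concept, so the two occurrences of "empresa" become two distinct CORPORATION instances. A real analyzer would also fill in relations between instances and the nonpropositional information mentioned above, which this sketch leaves out.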






Kavi Mahesh
Sun Nov 12 15:30:14 MST 1995