
The Mikrokosmos Machine Translation System

Implied information, background knowledge, ellipsis, coreference, figurative speech, ambiguity: these are a few of the immense challenges a natural language semantic system faces. And yet humans process language in real time every day with very little misunderstanding. How can a computer do the same?

By constraining the problem. Fifty-six million and some odd thousands is, indeed, a large number. Two hundred and thirty-five billion is much larger. These two numbers represent the number of choices a computational semantic system faces for a medium-size problem and a slightly larger one. Come across a truly long sentence and the numbers soar far higher still. And that is only to determine basic semantic dependencies; add in ellipsis and coreference resolution possibilities and they increase even faster. Such exponential growth in the size of the problem must be constrained if serious work is to be accomplished.
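As a rough illustration of why the count grows so quickly, the number of candidate analyses can be sketched as the product of the number of senses available for each word. The sense counts below are hypothetical and chosen only for illustration; they are not the figures quoted above, and the function name is an assumption of this sketch.

```python
from math import prod


def count_sense_combinations(senses_per_word):
    """Return how many distinct sense assignments a sentence admits.

    Each word is assumed to choose its sense independently, so the
    total is the product of the per-word sense counts.
    """
    return prod(senses_per_word)


# Hypothetical sentences in which every word has three candidate senses.
short_sentence = [3] * 5
longer_sentence = [3] * 15

print(count_sense_combinations(short_sentence))   # 243
print(count_sense_combinations(longer_sentence))  # 14348907
```

Tripling the sentence length here multiplies the search space by a factor of roughly sixty thousand, which is the kind of growth that makes exhaustive enumeration impractical.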

In a ``blocks'' world, CSP techniques and solution synthesis are powerful mechanisms. Many ``real-world'' problems, however, have a more complex semantics: constraints are not ``yes'' or ``no'' but ``maybe'' and ``sometimes.'' In computational semantics, certain word-sense combinations might make sense in one context but not in another. We need a method as powerful as CSP for this more complex environment. Our proposal in presenting HG is to 1) use constraint dependency information to partition problems into appropriate sub-problems, 2) combine (gather) results from these sub-problems using a new solution synthesis technique, and 3) prune (hunt) these results using, not constraint satisfaction, but branch-and-bound techniques.
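The three steps can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the HG implementation: the words, senses, and scores are invented, all function names are hypothetical, constraints are assumed to be pairwise tables of plausibility scores between 0 and 1, and the partitioning step only separates sub-problems that share no constraints at all, so combining their results reduces to a simple merge.

```python
def partition(variables, constraints):
    """Group variables into connected components of the constraint graph."""
    neighbours = {v: set() for v in variables}
    for a, b in constraints:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in variables:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            current = stack.pop()
            group.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(group)
    return groups


def solve_subproblem(variables, domains, constraints):
    """Branch-and-bound search for the highest-scoring sense assignment."""
    best = {"score": float("-inf"), "assignment": None}
    # The most each constraint could ever contribute (optimistic estimate).
    ceiling = {pair: max(table.values()) for pair, table in constraints.items()}

    def score_and_bound(assignment):
        score = bound = 0.0
        for (a, b), table in constraints.items():
            if a in assignment and b in assignment:
                value = table.get((assignment[a], assignment[b]), 0.0)
                score += value
                bound += value
            else:
                bound += ceiling[(a, b)]
        return score, bound

    def search(index, assignment):
        score, bound = score_and_bound(assignment)
        if bound <= best["score"]:
            return  # prune: this branch cannot beat the best solution so far
        if index == len(variables):
            best["score"], best["assignment"] = score, dict(assignment)
            return
        var = variables[index]
        for value in domains[var]:
            assignment[var] = value
            search(index + 1, assignment)
            del assignment[var]

    search(0, {})
    return best["assignment"], best["score"]


def analyse(variables, domains, constraints):
    """Partition, solve each sub-problem, then merge the partial results."""
    assignment, total = {}, 0.0
    for group in partition(variables, constraints):
        local = {p: t for p, t in constraints.items() if p[0] in group}
        sub_assignment, sub_score = solve_subproblem(group, domains, local)
        assignment.update(sub_assignment)
        total += sub_score
    return assignment, total


# Hypothetical word senses and plausibility scores, for illustration only.
domains = {
    "bank": ["financial", "river"],
    "deposit": ["money", "sediment"],
    "bat": ["animal", "club"],
    "fly": ["move", "insect"],
}
constraints = {
    ("bank", "deposit"): {("financial", "money"): 0.9, ("river", "sediment"): 0.6},
    ("bat", "fly"): {("animal", "move"): 0.8, ("club", "insect"): 0.1},
}
result, score = analyse(list(domains), domains, constraints)
print(result)
```

Because scores are graded rather than "yes" or "no", a branch is discarded only when its optimistic bound cannot exceed the best complete solution already found, which is the role branch-and-bound plays here in place of constraint satisfaction.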

This section provides the background information necessary to understand how HG applies these principles to semantic analysis. We begin by summarizing the Mikrokosmos Machine Translation system. Kavi Mahesh, Evelyne Viegas and Sergei Nirenburg are collaborators in this project and have contributed to this section.

The Mikrokosmos (uK) project, under development by researchers at the Computing Research Laboratory (CRL) of New Mexico State University, is a comprehensive study of a variety of microtheories central to the support of knowledge-based machine translation (KBMT) systems. Its ultimate objective is to define a methodology for representing the meaning of natural language texts in a language-neutral interlingual format called a text meaning representation (TMR). The TMR represents the result of analyzing a given input text in any one of the languages supported by the KBMT system, and serves as input to the generation process. The meaning of the input text is represented in the TMR as elements of an independently motivated model of the world (or ontology). The link between the ontology and the TMR is provided by the lexicon, where the meanings of most open-class lexical items are defined in terms of their mappings into ontological concepts and their resulting contributions to TMR structure. Information about the nonpropositional components of text meaning, such as speech acts, speaker attitudes and intentions, relations among text units, and deictic references, is also derived from the lexicon with inputs from other microtheories, and becomes part of the TMR. Figure 19 illustrates the uK architecture for analyzing input texts.

  
Figure 19: The Mikrokosmos NLP Architecture
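The relationship among lexicon, ontology, and TMR described above can be pictured with a small sketch. Every name in it (the concepts, the lexicon entries, the function names, and the dictionary layout) is hypothetical and invented for illustration; it is not taken from the Mikrokosmos resources and it ignores disambiguation and nonpropositional meaning entirely.

```python
# Hypothetical ontology: each concept points to its parent (None at the root).
ONTOLOGY = {
    "EVENT": None,
    "OBJECT": None,
    "BUY": "EVENT",
    "COMPANY": "OBJECT",
}

# Hypothetical lexicon: open-class words map to candidate ontological concepts.
LEXICON = {
    "buy": ["BUY"],
    "company": ["COMPANY"],
}


def is_a(concept, ancestor):
    """Walk up the parent links to test whether concept descends from ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]
    return False


def to_tmr_elements(words):
    """Map each word with a lexicon entry to candidate TMR elements."""
    elements = []
    for word in words:
        # Words without an entry (for example closed-class items) are skipped.
        for concept in LEXICON.get(word, []):
            kind = "event" if is_a(concept, "EVENT") else "object"
            elements.append({"concept": concept, "kind": kind, "from_word": word})
    return elements


elements = to_tmr_elements(["the", "company", "buy"])
print(elements)
```

The point of the sketch is only the direction of the mapping: words are looked up in the lexicon, the lexicon names ontological concepts, and instances of those concepts become the building blocks of the meaning representation.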

Initially, the project is concentrating on the microtheory of lexical-semantic dependency, the core microtheory underlying our approach to a comprehensive analysis of the meaning of texts, and the one in which the basic structure of events or states and their properties is specified. Additional microtheories are being developed for aspect, time, modalities, discourse relations, reference, event ellipsis and style.








Steve Beale
Wed Mar 26 09:27:50 MST 1997