Implied information, background knowledge, ellipsis, coreference, figurative speech, ambiguity: these are a few of the immense challenges that a natural language semantic system faces. And yet humans process language in real time every day with very little misunderstanding. How can a computer do the same?
By constraining the problem. Fifty-six million and some odd
thousands is, indeed, a large number. Two hundred and thirty-five
billion is much larger. These two numbers represent the number of
choices a computational semantic system faces for a medium-sized problem and a slightly larger
one. Come across a truly long sentence and the numbers soar
past
. And that only to determine basic semantic dependencies;
add in ellipsis and coreference resolution possibilities and they
increase even faster. Such exponential growth in the size of
the problem must be constrained if serious work is to be accomplished.
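To see how such numbers arise, note that when every word is disambiguated independently, the number of candidate analyses is the product of the number of senses of each word. The following minimal sketch illustrates that growth. The sense counts are hypothetical and chosen only for illustration; they are not taken from any actual lexicon, and they do not reproduce the figures quoted above.

```python
from math import prod


def count_combinations(sense_counts):
    """Number of word-sense assignments when each word is chosen independently."""
    return prod(sense_counts)


# Hypothetical sense counts per open-class word (illustrative only).
short_sentence = [3, 4, 2, 5]
# Doubling the sentence length squares the number of combinations.
longer_sentence = short_sentence + short_sentence

print(count_combinations(short_sentence))   # 120
print(count_combinations(longer_sentence))  # 14400
```

Each additional ambiguous word multiplies the total, which is why the problem size grows exponentially with sentence length.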
In a ``blocks'' world, CSP techniques and solution synthesis are powerful mechanisms. Many ``real-world'' problems, however, have a more complex semantics: constraints are not ``yes'' or ``no'' but ``maybe'' and ``sometimes.'' In computational semantics, certain word-sense combinations might make sense in one context but not in another. We need a method as powerful as CSP for this more complex environment. Our proposal in presenting HG is to 1) use constraint dependency information to partition problems into appropriate sub-problems, 2) combine (gather) results from these sub-problems using a new solution synthesis technique, and 3) prune (hunt) these results using, not constraint satisfaction, but branch-and-bound techniques.
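The three steps just listed can be sketched in a few lines. This is a deliberately simplified illustration, not the HG implementation: it assumes graded pairwise constraints with scores between 0 and 1, it partitions the problem into fully independent groups only (it does not attempt to handle interacting sub-problems), and every function name, word, and sense label is hypothetical.

```python
def components(variables, constraints):
    """Partition variables into groups linked by at least one constraint."""
    neighbours = {v: set() for v in variables}
    for a, b in constraints:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for v in variables:
        if v in seen:
            continue
        seen.add(v)
        stack, group = [v], []
        while stack:
            cur = stack.pop()
            group.append(cur)
            for n in neighbours[cur] - seen:
                seen.add(n)
                stack.append(n)
        groups.append(sorted(group))
    return groups


def solve_group(group, domains, constraints):
    """Hunt: best-scoring assignment for one sub-problem via branch-and-bound."""
    local = {p: t for p, t in constraints.items() if p[0] in group and p[1] in group}
    # Optimistic ceiling per constraint: its best score over all value pairs.
    ceiling = {p: max(t.values(), default=0.0) for p, t in local.items()}
    best = {"score": -1.0, "assignment": None}

    def bound(assignment):
        total = 0.0
        for (a, b), table in local.items():
            if a in assignment and b in assignment:
                total += table.get((assignment[a], assignment[b]), 0.0)
            else:
                total += ceiling[(a, b)]  # undecided: assume the best case
        return total

    def search(index, assignment):
        if bound(assignment) <= best["score"]:
            return  # prune: this branch cannot beat the best solution so far
        if index == len(group):
            # All variables assigned, so the bound is the exact score.
            best["score"] = bound(assignment)
            best["assignment"] = dict(assignment)
            return
        var = group[index]
        for value in domains[var]:
            assignment[var] = value
            search(index + 1, assignment)
            del assignment[var]

    search(0, {})
    return best["assignment"], best["score"]


def hunt_and_gather(domains, constraints):
    """Gather: combine the best result of each independent sub-problem."""
    assignment, total = {}, 0.0
    for group in components(list(domains), constraints):
        part, score = solve_group(group, domains, constraints)
        assignment.update(part)
        total += score
    return assignment, total


# Hypothetical words, senses, and plausibility scores.
domains = {
    "bank": ["FINANCIAL-INSTITUTION", "RIVER-BANK"],
    "deposit": ["PUT-MONEY", "SEDIMENT"],
    "spring": ["SEASON", "COIL"],
    "bounce": ["REBOUND"],
}
constraints = {
    ("bank", "deposit"): {("FINANCIAL-INSTITUTION", "PUT-MONEY"): 0.75,
                          ("RIVER-BANK", "SEDIMENT"): 0.5},
    ("spring", "bounce"): {("COIL", "REBOUND"): 0.5, ("SEASON", "REBOUND"): 0.25},
}
print(hunt_and_gather(domains, constraints))
```

Because the scores are graded rather than yes/no, no combination is ever ruled out by a single constraint. A branch is discarded only when its optimistic bound cannot exceed the best complete solution found so far, which is what distinguishes this pruning from ordinary constraint satisfaction.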
This section provides the background information necessary to understand how HG applies these principles to semantic analysis. We begin by summarizing the Mikrokosmos Machine Translation system. Kavi Mahesh, Evelyne Viegas and Sergei Nirenburg are collaborators in this project and have contributed to this section.
The Mikrokosmos (uK) project is being developed by researchers at the
Computing Research Laboratory (CRL) of New Mexico State
University.
It is a comprehensive
study of a variety of microtheories central to the support of knowledge-based machine translation (KBMT) systems, with
the ultimate objective of defining a methodology for representing the meaning of natural language
texts in a language-neutral interlingual format called a text meaning representation (TMR). The
TMR represents the result of analysis of a given input text in any one of the languages supported by
the KBMT system, and serves as input to the generation process. The
meaning of the input text is represented in
the TMR as elements of an independently motivated model of
the world (or ontology). The link between the ontology and the TMR is provided by the lexicon,
where the meanings of most open class lexical items are defined in terms of their mappings into
ontological concepts and their resulting contributions to TMR
structure. Information about the nonpropositional components of text meaning, such
as speech acts, speaker attitudes and intentions, relations among text
units, deictic references, etc., is also derived from the lexicon, with
inputs from other microtheories, and becomes part of the TMR.
Figure 19 illustrates the uK architecture for analyzing input
texts.
Figure 19: The Mikrokosmos NLP Architecture
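The role of the lexicon as the link between ontology and TMR, as described above, can be pictured with a toy data structure. The sketch below is only an illustration under stated assumptions: the concept names, the lexicon entries, and the frame layout are all hypothetical and do not reflect the actual Mikrokosmos formats.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Concept:
    """A node in a toy ontology (all names are hypothetical)."""
    name: str
    parent: Optional[str] = None


ONTOLOGY: Dict[str, Concept] = {
    c.name: c
    for c in [
        Concept("ALL"),
        Concept("EVENT", "ALL"),
        Concept("OBJECT", "ALL"),
        Concept("ACQUIRE", "EVENT"),
        Concept("ORGANIZATION", "OBJECT"),
    ]
}

# Toy lexicon: each open-class word maps to its candidate ontological concepts.
LEXICON: Dict[str, List[str]] = {
    "acquire": ["ACQUIRE"],
    "company": ["ORGANIZATION"],
}


def candidate_frames(tokens: List[str]) -> List[dict]:
    """Build one TMR-like frame per known token, listing its candidate concepts."""
    frames = []
    for position, token in enumerate(tokens):
        concepts = LEXICON.get(token.lower())
        if concepts is None:
            continue  # closed-class or unknown word: no frame in this sketch
        # Every lexicon mapping must point at a concept the ontology defines.
        assert all(c in ONTOLOGY for c in concepts)
        frames.append({"token": token, "position": position, "candidates": concepts})
    return frames


print(candidate_frames(["The", "company", "will", "acquire", "it"]))
```

In this picture, a word with several candidate concepts is exactly where the choices discussed earlier arise: the analyzer must select one candidate per frame.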
Initially, the project is concentrating on the microtheory of lexical-semantic dependency, the core
microtheory underlying our approach to a comprehensive analysis of the meaning of texts, and the
one in which the basic structure of events or states and their
properties is specified. Additional
microtheories are being developed for aspect, time, modalities,
discourse relations, reference, event ellipsis and
style.