Figure 1: Input Semantic Representation to the Mikrokosmos Generator
In contrast to modularization by task (discourse structuring, clause structuring, lexical choice and so on), the Mikrokosmos project (http://crl.nmsu.edu/Research/Projects/mikro/index.html) attempts to modularize on the ontological and linguistic data that serve as input to the text generation process. That is, the modules are based on the types of input we expect, not on the types of processing we need to perform.
The generation lexicon in our approach is essentially the same as the
analysis lexicon, but with a different indexing scheme: it is indexed on
ontological concepts rather than on NL lexical units, as it is in analysis.
([Stede1996] is an example of another generator with a comparable lexicon
structure, although our lexicon is richer; it includes collocational
constraints, for example.) The generation lexicon contains information (for
instance, semantics-to-syntax dependency mappings) that drives the generation
process, with the help of several dedicated microtheories that deal with
issues such as focus and reference (values of which are among the elements
of our input representations).
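The change of indexing scheme can be illustrated with a minimal sketch. It is not the Mikrokosmos lexicon format: the `LexEntry` fields, the concept names and the toy entries are all assumptions made for illustration. The sketch only shows that one set of entries can be grouped by lexical unit for analysis and by ontological concept for generation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    word: str       # natural-language lexical unit
    concept: str    # ontological concept the word maps to (hypothetical names)
    syn_map: tuple  # toy semantics-to-syntax mapping: (role, syntactic slot) pairs


# Toy entries; words, concepts and mappings are illustrative only.
ENTRIES = [
    LexEntry("acquire", "ACQUIRE", (("agent", "subject"), ("theme", "object"))),
    LexEntry("buy", "ACQUIRE", (("agent", "subject"), ("theme", "object"))),
    LexEntry("company", "CORPORATION", ()),
]


def build_index(entries, key):
    """Group the same entries under whichever attribute `key` names."""
    index = defaultdict(list)
    for entry in entries:
        index[getattr(entry, key)].append(entry)
    return dict(index)


analysis_lexicon = build_index(ENTRIES, "word")       # word -> entries
generation_lexicon = build_index(ENTRIES, "concept")  # concept -> entries

# In generation, a concept in the input retrieves its candidate realizations.
candidates = [entry.word for entry in generation_lexicon["ACQUIRE"]]
print(candidates)
```

Because both indexes point at the same entries, knowledge acquired for analysis becomes available to generation without being restated.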
Lexicon entries in both analysis and generation can be thought of as "objects" or "modules" corresponding to each unit in the input. Such a module has the task of realizing the associated unit, while communicating with other objects around it, if necessary (similar to [De Smedt1990]).
Each module can take part in several tasks of the kind
listed by Wanner and Hovy. For instance, modules for specific events or
properties are used in setting up clause and sentence structures as well
as lexical choice, as will be shown below. Interactions and constraints flow
freely, with the control mechanism dynamically tracking the
connections. One outcome of this division of labor between
declarative data and the control architecture is that the bulk of knowledge
processing resides in the lexicon, indexed for both analysis and generation. This has
greatly simplified knowledge acquisition in general [Nirenburg et al. 1996] and made it
easier to adapt analysis knowledge sources to generation [Viegas and Beale1996] as
well as to convert knowledge sources acquired for one language to use with
texts in another.
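As a rough illustration of entries acting as modules, the sketch below lets an event module set up a clause and choose a verb form while it asks the modules of its role fillers to realize themselves. Everything in it is hypothetical: the class names, the `realize` interface, the concept names and the agreement feature are assumptions, and a plain recursive call stands in for the control mechanism, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One unit of a toy input representation (hypothetical format)."""
    concept: str
    roles: dict = field(default_factory=dict)  # role name -> Unit


class ObjectModule:
    """Realizes an object concept as a noun phrase and reports its number."""

    def __init__(self, noun, number="singular"):
        self.noun = noun
        self.number = number

    def realize(self, unit, lexicon):
        return {"text": f"the {self.noun}", "number": self.number}


class EventModule:
    """Sets up a clause, delegates role fillers, then picks the verb form."""

    def __init__(self, verb_forms, syn_map):
        self.verb_forms = verb_forms  # number -> inflected verb
        self.syn_map = syn_map        # semantic role -> syntactic slot

    def realize(self, unit, lexicon):
        slots = {}
        for role, filler in unit.roles.items():
            # Communicate with the neighbouring module for each role filler.
            slots[self.syn_map[role]] = lexicon[filler.concept].realize(filler, lexicon)
        # A constraint flows back from the subject: the verb agrees in number.
        verb = self.verb_forms[slots["subject"]["number"]]
        text = f'{slots["subject"]["text"]} {verb} {slots["object"]["text"]}'
        return {"text": text, "number": "singular"}


# Toy generation lexicon, indexed on (hypothetical) concepts.
LEXICON = {
    "ACQUIRE": EventModule(
        {"singular": "acquires", "plural": "acquire"},
        {"agent": "subject", "theme": "object"},
    ),
    "CORPORATION": ObjectModule("company"),
    "FACTORY": ObjectModule("factories", number="plural"),
}

tmr = Unit("ACQUIRE", {"agent": Unit("CORPORATION"), "theme": Unit("FACTORY")})
sentence = LEXICON[tmr.concept].realize(tmr, LEXICON)["text"]
print(sentence)
```

Here the event module contributes to both clause structuring and lexical choice, so the tasks are distributed over entries rather than assigned to separate processing stages.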
Below we sketch out how this organization works. We begin by describing the main types of lexicon entries with the goal of demonstrating how each performs various generation tasks. We then take a look at the different types of constraints associated with each kind of entry.