In contrast to this division of knowledge into tasks such as discourse
structuring, clause structuring and lexical choice, the Mikrokosmos
project attempts to modularize based on the ontological types and
natural linguistic
phenomena that serve as inputs to our processing.
In semantic analysis [Beale et al. 1995], the natural division
is along word types: nouns, verbs, adjectives, etc., and along
linguistically-based microtheories
such as studies of tense (to discover aspectual
components of meaning) and coreference analysis. For
generation [Viegas et al. 1997], our inputs are semantic
representations, which become the focus of our modularization. The
most important types of semantic representations are the ontological
categories of EVENTS, OBJECTS and PROPERTIES. In
addition to these we have several generation microtheories that deal
with issues such as focus and reference.
In general, then, we modularize based on the types of inputs we expect, not on the types of processing we need to perform. Each module can perform any task. For instance, EVENTS and PROPERTIES both set up clause and sentence structures as well as contribute to lexical choice, as will be shown below. Interactions and constraints flow freely among the modules and are processed by the control mechanism. One outcome of this division of labor is that the bulk of our knowledge resides in the lexicon, both for analysis (where the lexicon is indexed on words) and for generation (where the same lexicon can be indexed on concepts). This has greatly simplified knowledge acquisition in general [Nirenburg et al. 1996] and has made it easier both to adapt analysis knowledge sources to generation [Viegas and Beale 1996] and to convert knowledge sources from one language to another.
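The idea of a single lexicon serving both directions can be illustrated with a minimal sketch. This is not the Mikrokosmos implementation: the class names, field names, and the sample entries and concept labels (BUY, AUTOMOBILE, RED) are hypothetical and chosen only to show one set of entries being indexed on words for analysis and on concepts for generation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One lexicon entry (hypothetical structure, for illustration only)."""
    word: str      # surface form, the key used in analysis
    pos: str       # word type: noun, verb, adjective, ...
    concept: str   # ontological concept, the key used in generation
    category: str  # ontological category: EVENT, OBJECT or PROPERTY


class Lexicon:
    """A single set of entries exposed through two indexes."""

    def __init__(self, entries):
        self.by_word = defaultdict(list)
        self.by_concept = defaultdict(list)
        for entry in entries:
            self.by_word[entry.word].append(entry)
            self.by_concept[entry.concept].append(entry)

    def analyze(self, word):
        """Analysis direction: word -> candidate concepts."""
        return [e.concept for e in self.by_word.get(word, [])]

    def generate(self, concept):
        """Generation direction: concept -> candidate words."""
        return [e.word for e in self.by_concept.get(concept, [])]


# Sample entries are invented for this sketch.
ENTRIES = [
    LexiconEntry("buy", "verb", "BUY", "EVENT"),
    LexiconEntry("purchase", "verb", "BUY", "EVENT"),
    LexiconEntry("car", "noun", "AUTOMOBILE", "OBJECT"),
    LexiconEntry("red", "adjective", "RED", "PROPERTY"),
]

lexicon = Lexicon(ENTRIES)
print(lexicon.analyze("car"))    # concepts reachable from a word
print(lexicon.generate("BUY"))   # words that can realize a concept
```

Because both indexes are built from the same entries, knowledge acquired for analysis becomes available to generation without being re-entered, which is the property the paragraph above attributes to this organization.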
Below we sketch out some examples of how this type of organization
works. We begin by describing the main types of lexicon entries
with the goal of demonstrating how each can perform various
generation tasks. We then describe how lexicon
entries are combined to create options for the generator. We also
discuss a few of the heuristics used to choose between these options.
The following section then gives a brief overview of how the control
architecture efficiently processes these locally created plans to
obtain a globally optimal output.