The Mikrokosmos project is focusing on processing texts about mergers and acquisitions of companies. However, since the input language is unrestricted, the ontology must, in fact, cover a wide range of concepts outside this particular domain.
Figure: Top-level hierarchy of the Mikrokosmos ontology, showing the first three levels of the object, event, and property taxonomies.
All entities in the Mikrokosmos ontology are classified into free-standing entities
and properties. Free-standing entities are in turn classified into
objects and events. The figure above shows the top-level hierarchy in the ontology. Objects, events, and properties
constitute the concepts in the ontology which are represented as
frames. Each frame is a collection of slots with one or more
facets and fillers. The slots (including inherited ones) collectively
define the concept by specifying how the concept is related to other
concepts in the ontology (through relations) or to literal or
numerical constants (through attributes). Lexicon entries
represent word or phrase meanings by mapping them to concepts in the
ontology. A number of concepts in the domain of mergers and
acquisitions are located under the ORGANIZATION subtree under
SOCIAL-OBJECTs and the BUSINESS-ACTIVITY subtree under
SOCIAL-EVENTs (see Figure).
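The frame, slot, facet, and filler vocabulary above can be pictured with a small data structure. The following Python sketch is illustrative only and is not the Mikrokosmos implementation: the class names, the facet names `constraint` and `default`, the ACQUIRE frame, and its slot names are all assumptions made for the example. Only ORGANIZATION is a concept named in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A filler is either a concept name or a literal/numerical constant.
Filler = Union[str, int, float]


@dataclass
class Slot:
    """One slot of a frame: facet name -> list of fillers."""
    name: str
    kind: str  # "relation" (fillers are concepts) or "attribute" (fillers are constants)
    facets: Dict[str, List[Filler]] = field(default_factory=dict)


@dataclass
class Frame:
    """A concept represented as a named collection of slots."""
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def add(self, slot: str, kind: str, facet: str, *fillers: Filler) -> None:
        """Append fillers to a facet, creating the slot and facet on first use."""
        entry = self.slots.setdefault(slot, Slot(slot, kind))
        entry.facets.setdefault(facet, []).extend(fillers)

    def fillers(self, slot: str, facet: str) -> List[Filler]:
        """Return the fillers stored under slot/facet, or an empty list."""
        entry = self.slots.get(slot)
        return list(entry.facets.get(facet, [])) if entry else []


# Hypothetical frame: the concept name ACQUIRE and all slot names are assumptions.
acquire = Frame("ACQUIRE")
acquire.add("agent", "relation", "constraint", "ORGANIZATION")
acquire.add("theme", "relation", "constraint", "ORGANIZATION")
acquire.add("number-of-parties", "attribute", "default", 2)
```

In this sketch a relation slot points at other concepts, while an attribute slot holds a constant, mirroring the distinction drawn in the paragraph above.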
Each concept is represented by a frame that has a name
and the following slots: a definition, which is
an English string used solely for human browsing; a
time-stamp for bookkeeping; taxonomic links (is-a and
subclasses for concepts, and instances and instance-of for instances); and other slots (see Figure
for an example). The other slots can be any property defined under the property
hierarchy of the ontology. Although properties are defined as
concepts, they are not instantiated as stand-alone TMR frames; they are
present in TMRs only as slots in objects or events.
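As a rough illustration of how taxonomic links let a concept's definition include inherited slots, the sketch below walks is-a links upward and merges slots along the way. It is a toy under stated assumptions: the dictionary layout, the CORPORATION concept, the slot names and fillers, and the rule that the most specific definition overrides an inherited one are all hypothetical and are not taken from the Mikrokosmos ontology.

```python
from typing import Dict, List

# Hypothetical fragment: concept -> its is-a parents and locally defined slots.
ONTOLOGY: Dict[str, dict] = {
    "OBJECT":        {"is-a": [],                "slots": {}},
    "SOCIAL-OBJECT": {"is-a": ["OBJECT"],        "slots": {}},
    "ORGANIZATION":  {"is-a": ["SOCIAL-OBJECT"], "slots": {"has-member": "HUMAN"}},
    "CORPORATION":   {"is-a": ["ORGANIZATION"],
                      "slots": {"owned-by": "HUMAN", "has-member": "EMPLOYEE"}},
}


def ancestors(concept: str) -> List[str]:
    """Return all concepts reachable through is-a links, nearest first."""
    seen: List[str] = []
    queue = list(ONTOLOGY[concept]["is-a"])
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(ONTOLOGY[parent]["is-a"])
    return seen


def subclasses(concept: str) -> List[str]:
    """Derive the inverse of is-a: the direct subclasses of a concept."""
    return [name for name, frame in ONTOLOGY.items() if concept in frame["is-a"]]


def all_slots(concept: str) -> Dict[str, str]:
    """Merge local and inherited slots; the most specific definition wins (an assumption)."""
    merged: Dict[str, str] = {}
    # Apply the most general ancestors first so that closer definitions override them.
    for name in reversed([concept] + ancestors(concept)):
        merged.update(ONTOLOGY[name]["slots"])
    return merged
```

Here `subclasses` is computed from `is-a` rather than stored, which is one way to keep the two inverse links consistent; a real frame system may store both.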
Unlike many other classifications with a narrow focus (e.g., Casati and
Varzi, 1993; Hayes, 1985; Mars, 1993), our ontology must cover a wide
variety of concepts in the world. In particular, our ontology cannot
stop at organizing terminological nouns into a taxonomy of objects and
their properties; it must also represent a taxonomy of (possibly
complex) events and include many interconnections between objects and
events to support a variety of disambiguation tasks. As such, concepts
in the Mikrokosmos ontology are far from being atomic symbols; they
have a rich internal structure.
Just as there is no single grammar that is the "true" grammar of a
natural language, it is reasonable to argue that there is no unique
ontology for any domain. The Mikrokosmos ontology is one possible classification of concepts in its domain, constructed manually according to a well-developed set of guidelines. Its utility in NLP
can only be evaluated by the quality of the translations produced by
the overall system or through some other evaluation of the overall NLP
system (such as in an information extraction or retrieval test). This
is not to say that the ontology is randomly constructed. It is not.
Its construction has been constrained throughout by the guidelines as
well as by the requirements of lexical semantics and their acquisition.
In NLP work, the term "ontology" is sometimes also used to refer to
a different kind of knowledge base, which is essentially a strict
hierarchical organization of a set of symbols with little or no
internal structure to each node in the hierarchy (e.g., Farwell et
al., 1993; Knight and Luk, 1994). Frames in the
Mikrokosmos ontology, however, have a rich internal structure that represents various types of relationships between concepts, together with the constraints, defaults, and values on these relationships. It is from this rich structure
and connectivity that one can derive most of the power of the ontology
in producing a TMR from an input text. Mere subsumption relations
between nearly atomic symbols do not afford the variety of ways listed
above in which the Mikrokosmos ontology aids lexicon acquisition and disambiguation in language processing.
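To make the contrast with a bare subsumption hierarchy concrete, the toy sketch below shows one way a constraint on a relation can help choose among word senses. It is not the Mikrokosmos analyzer. The concept names other than ORGANIZATION and SOCIAL-OBJECT, the single-parent `IS_A` table, the `THEME_CONSTRAINT` table, and the lexicon entry for "acquire" are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical single-parent taxonomy (child -> parent).
IS_A: Dict[str, str] = {
    "CORPORATION": "ORGANIZATION",
    "ORGANIZATION": "SOCIAL-OBJECT",
    "SOCIAL-OBJECT": "OBJECT",
    "SKILL": "MENTAL-OBJECT",
    "MENTAL-OBJECT": "OBJECT",
}

# Hypothetical constraint on the theme relation of each event concept.
THEME_CONSTRAINT: Dict[str, str] = {
    "ACQUIRE-COMPANY": "ORGANIZATION",
    "LEARN": "MENTAL-OBJECT",
}

# Hypothetical lexicon entry: a word mapped to its candidate concepts.
LEXICON: Dict[str, List[str]] = {"acquire": ["ACQUIRE-COMPANY", "LEARN"]}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its is-a ancestors."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = IS_A.get(node)
    return False


def candidate_senses(word: str, theme: str) -> List[str]:
    """Keep only the senses whose theme constraint is satisfied by `theme`."""
    return [
        concept
        for concept in LEXICON.get(word, [])
        if subsumes(THEME_CONSTRAINT[concept], theme)
    ]
```

With only atomic symbols and is-a links, both senses of "acquire" would remain available; the constraint stored on the relation is what rules one of them out for a given theme.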
The above distinction between highly structured concepts and nearly
atomic concepts can be traced to a difference in the grain size
of decomposing meanings. Grain size is a scale that denotes the
extent to which a complex meaning is decomposed into more primitive
concepts and relationships between them as opposed to representing it
by a single concept with little internal structure. For example, the
meaning of "to teach" can be represented either by a single concept
named TEACH or decomposed into several subevents such as
lecturing, question answering, and evaluating, each of which involves
several participants such as the teacher, students, a classroom, a
lesson, and so on. A highly decompositional (or compositional) meaning
representation relies on a very limited set of primitives (i.e.,
concept names). As a result, the representation of many basic concepts
becomes too complex and convoluted. The other extreme is to map each
word sense in a language to an atomic concept. As a result, the
nature of the interconnections among these concepts becomes unclear, to say
nothing of the explanatory power of the system (cf. the argument
about the size of the set of conceptual primitives in Hayes,
1979). Though, presumably, any piece of world knowledge could be
useful for NLP, in Mikrokosmos we take a hybrid approach and strive to contain
the proliferation of concepts for a variety of methodological reasons,
such as tradeoffs between the parsimony of ontological representation
and that of lexical representation and the need for language
independent meaning representations. Control over proliferation of
concepts is achieved by situated development and a set of guidelines
that tell the ontology acquirer when not to introduce a new concept
(see Mahesh, 1995; Mahesh and Nirenburg, 1995). The Mikrokosmos ontology is not limited to its domain but is more developed in the chosen domain.
The Mikrokosmos ontology also makes a clear distinction between conceptual and
episodic knowledge and includes only conceptual knowledge. Instances
and episodes are acquired in a separate knowledge base called the
onomasticon. The methodology for acquiring the onomasticon includes a
significant amount of automation and is very different from ontology
acquisition, which is done manually via continual interactions with
lexicographers.
Kavi Mahesh