
The Mikrokosmos Ontology

The Mikrokosmos project is focusing on processing texts about mergers and acquisitions of companies. However, since the input language is unrestricted, the ontology must, in fact, cover a wide range of concepts outside this particular domain.

   
Figure: Top-level hierarchy of the Mikrokosmos ontology showing the first three levels of the object, event, and property taxonomies.

All entities in the Mikrokosmos ontology are classified into free-standing entities and properties. Free-standing entities are in turn classified into objects and events. The figure above shows the top-level hierarchy of the ontology. Objects, events, and properties constitute the concepts in the ontology, which are represented as frames. Each frame is a collection of slots, each with one or more facets and fillers. The slots (including inherited ones) collectively define the concept by specifying how it is related to other concepts in the ontology (through relations) or to literal or numerical constants (through attributes). Lexicon entries represent word or phrase meanings by mapping them to concepts in the ontology. A number of concepts in the domain of mergers and acquisitions are located in the ORGANIZATION subtree under SOCIAL-OBJECTs and the BUSINESS-ACTIVITY subtree under SOCIAL-EVENTs (see the figure above).

Each concept is represented by a frame that has a name and the following slots: a definition, which is an English string used solely for human browsing; a time-stamp for bookkeeping; taxonomic links (is-a and subclasses for concepts, and instances and instance-of for instances); and other slots (see the corresponding figure for an example). The other slots can be any property defined under the property hierarchy of the ontology. Although properties are defined as concepts, they are not instantiated as stand-alone TMR frames; they appear in TMRs only as slots in objects or events.
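As a rough illustration of the frame structure described above, the following Python sketch models a concept as a named frame whose slots map facets to fillers, with slot lookup falling back on is-a parents. The class name Frame, the facet name "sem", the slot HEADED-BY, and its filler HUMAN are assumptions made for this illustration and are not taken from the Mikrokosmos knowledge base; only ORGANIZATION and SOCIAL-OBJECT come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """A concept frame: a name, a definition, is-a links, and slots."""
    name: str
    definition: str = ""
    is_a: list[Frame] = field(default_factory=list)
    # slot name -> facet name -> list of fillers
    slots: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def get_slot(self, slot: str) -> dict[str, list[str]] | None:
        """Return a slot's facets, searching is-a parents if it is not local."""
        if slot in self.slots:
            return self.slots[slot]
        for parent in self.is_a:
            found = parent.get_slot(slot)
            if found is not None:
                return found
        return None


# Hypothetical slot and filler, for illustration only.
social_object = Frame(
    "SOCIAL-OBJECT",
    definition="an object defined by social convention",
    slots={"HEADED-BY": {"sem": ["HUMAN"]}},
)
organization = Frame("ORGANIZATION", is_a=[social_object])

# ORGANIZATION has no local HEADED-BY slot, so it is inherited.
inherited = organization.get_slot("HEADED-BY")
```

Storing fillers under a facet rather than directly under the slot leaves room for several kinds of information about the same relation, such as a constraint and a default.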

Unlike many other classifications with a narrow focus (e.g., Casati and Varzi, 1993; Hayes, 1985; Mars, 1993), our ontology must cover a wide variety of concepts in the world. In particular, our ontology cannot stop at organizing terminological nouns into a taxonomy of objects and their properties; it must also represent a taxonomy of (possibly complex) events and include many interconnections between objects and events to support a variety of disambiguation tasks. As such, concepts in the Mikrokosmos ontology are far from being atomic symbols; they have a rich internal structure.

Just as there is no single grammar that is the "true" grammar of a natural language, it is reasonable to argue that there is no unique ontology for any domain. The Mikrokosmos ontology is one possible classification of concepts in its domain, constructed manually according to a well-developed set of guidelines. Its utility in NLP can only be evaluated by the quality of the translations produced by the overall system or through some other evaluation of the overall NLP system (such as an information extraction or retrieval test). This is not to say that the ontology is randomly constructed. It is not. Its construction has been constrained throughout by the guidelines as well as by the requirements of lexical semantics and their acquisition.

In NLP work, the term "ontology" is sometimes also used to refer to a different kind of knowledge base, which is essentially a strict hierarchical organization of a set of symbols with little or no internal structure to each node in the hierarchy (e.g., Farwell, et al. 1993; Knight and Luk, 1994). Frames in the Mikrokosmos ontology, however, have a rich internal structure that represents various types of relationships between concepts, together with the constraints, defaults, and values on these relationships. Most of the power of the ontology in producing a TMR from an input text derives from this rich structure and connectivity. Mere subsumption relations between nearly atomic symbols do not afford the variety of ways listed above in which the Mikrokosmos ontology aids lexicon acquisition and disambiguation in language processing.

The above distinction between highly structured concepts and nearly atomic concepts can be traced to a difference in the grain size of decomposing meanings. Grain size is a scale that denotes the extent to which a complex meaning is decomposed into more primitive concepts and relationships between them, as opposed to being represented by a single concept with little internal structure. For example, the meaning of "to teach" can be represented either by a single concept named TEACH or decomposed into several subevents such as lecturing, question answering, and evaluating, each of which involves several participants such as the teacher, students, a classroom, a lesson, and so on. A highly decompositional (or compositional) meaning representation relies on a very limited set of primitives (i.e., concept names). As a result, the representation of many basic concepts becomes too complex and convoluted. The other extreme is to map each word sense in a language to an atomic concept. As a result, the nature of the interconnections among these concepts becomes unclear, to say nothing of the explanatory power of the system (cf. the argument about the size of the set of conceptual primitives in Hayes, 1979).

Though, presumably, any piece of world knowledge could be useful for NLP, in Mikrokosmos we take a hybrid approach and strive to contain the proliferation of concepts for a variety of methodological reasons, such as tradeoffs between the parsimony of ontological representation and that of lexical representation, and the need for language-independent meaning representations. Control over the proliferation of concepts is achieved by situated development and a set of guidelines that tell the ontology acquirer when not to introduce a new concept (see Mahesh 1995; Mahesh and Nirenburg, 1995). The Mikrokosmos ontology is not limited to its domain but is more developed in the chosen domain.
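The two ends of the grain-size scale in the "to teach" example can be sketched as data. This is only an illustration: the class Meaning, the function concept_names, and the subevent names LECTURE, ANSWER-QUESTION, and EVALUATE are assumed labels for the subevents mentioned above, not concepts taken from the ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Meaning:
    """A meaning representation: a head concept plus optional decomposition."""
    concept: str
    participants: list[str] = field(default_factory=list)
    subevents: list[Meaning] = field(default_factory=list)


def concept_names(meaning: Meaning) -> set[str]:
    """Collect every concept name the representation relies on."""
    names = {meaning.concept}
    for sub in meaning.subevents:
        names |= concept_names(sub)
    return names


# Coarse grain: a single concept with no internal structure.
coarse = Meaning("TEACH")

# Finer grain: the same meaning decomposed into subevents and participants.
fine = Meaning(
    "TEACH",
    participants=["teacher", "students", "classroom", "lesson"],
    subevents=[
        Meaning("LECTURE"),
        Meaning("ANSWER-QUESTION"),
        Meaning("EVALUATE"),
    ],
)

coarse_names = concept_names(coarse)
fine_names = concept_names(fine)
```

The coarse form needs one concept name and says nothing about how teaching relates to other events; the fine form exposes those relations at the cost of a larger, more complex representation.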

The Mikrokosmos ontology also makes a clear distinction between conceptual and episodic knowledge and includes only conceptual knowledge. Instances and episodes are acquired in a separate knowledge base called the onomasticon. The methodology for acquiring the onomasticon includes a significant amount of automation and is very different from ontology acquisition, which is done manually via continual interactions with lexicographers.








Kavi Mahesh
Sun Nov 12 15:30:14 MST 1995