
Ontologies for NLP: Introduction

The field of natural language processing (NLP) is concerned with the construction of computer software systems that can process texts written in English and other natural languages, with interpretation and generation capabilities much like those of human beings. Unlike text processing or word processing, which handle texts at more or less superficial levels, many NLP systems try to extract the meanings contained in a sentence or an entire text. Meanings thus extracted may be used for various tasks such as performing robot actions, retrieving information from a database, or, in the case of machine translation systems, producing an equivalent translation in a different natural language.

In order to extract meanings from a text and to use those meanings to perform various tasks, an NLP system must be able to represent meanings in a form suitable for manipulation by the computer. A first step towards representing meaning is selecting a set of symbols as the primitive elements from which more complex meaning representations are constructed. For example, if the only meaning we wanted to represent was the gender of a person, we could have selected the symbols M and F to represent it. Since natural language texts contain a wide range of complex meanings, the set of symbols selected for representing meaning tends to be much larger. A traditional dictionary describes the meanings of words using other words in the same or another language. Words, however, are a poor choice of symbols for computer processing of meanings, for a variety of reasons. For example, words in most languages are highly ambiguous, so a single word can express several distinct meanings; conversely, synonyms allow several words to express the same meaning. Words therefore do not map to meanings uniquely.
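The many-to-many mapping between words and meanings can be made concrete with a small sketch. It assumes a hypothetical toy lexicon that maps each word to a set of concept symbols; the word list and the concept names (FINANCIAL-INSTITUTION, RIVER-SHORE, AUTOMOBILE) are invented for illustration and are not taken from any real ontology. The two functions simply detect the two problems mentioned above: one word with several meanings, and several words with one meaning.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> set of concept symbols it can express.
# Concept names are illustrative only, not drawn from any real ontology.
LEXICON: dict[str, set[str]] = {
    "bank": {"FINANCIAL-INSTITUTION", "RIVER-SHORE"},
    "shore": {"RIVER-SHORE"},
    "car": {"AUTOMOBILE"},
    "automobile": {"AUTOMOBILE"},
}


def ambiguous_words(lexicon: dict[str, set[str]]) -> list[str]:
    """Words that map to more than one concept (one word, many meanings)."""
    return sorted(word for word, concepts in lexicon.items() if len(concepts) > 1)


def synonym_groups(lexicon: dict[str, set[str]]) -> dict[str, list[str]]:
    """Concepts expressed by more than one word (many words, one meaning)."""
    words_by_concept: dict[str, list[str]] = defaultdict(list)
    for word, concepts in lexicon.items():
        for concept in concepts:
            words_by_concept[concept].append(word)
    return {c: sorted(ws) for c, ws in words_by_concept.items() if len(ws) > 1}


print(ambiguous_words(LEXICON))  # ['bank']
print(synonym_groups(LEXICON))   # {'RIVER-SHORE': ['bank', 'shore'], 'AUTOMOBILE': ['automobile', 'car']}
```

Because the concept symbols, unlike the words, are unique, a meaning representation built from them avoids both problems; the mapping from words to concepts is pushed into the lexicon instead.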

We call the symbols used to represent meanings concepts, to distinguish them from words in a language. For NLP purposes, we need not only to select a set of concepts but also to tell the computer how each concept is related to some or all of the other concepts known to the system. Such knowledge of conceptual relationships is invaluable in resolving ambiguities in the meaning of a text. One part of specifying conceptual relationships involves organizing the concepts in a hierarchy, or taxonomic classification. The other part is adding "cross" links between different branches of the classification to represent relationships between concepts other than taxonomic ones. The result is a richly interconnected network of concepts in the world (or in a particular domain of focus). We call this network an ontology.
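The two kinds of links can be sketched as a small graph data structure. This is a minimal sketch under stated assumptions: the `Ontology` class, its methods, and the concept and relation names (ALL, OBJECT, EVENT, FOOD, INGEST, THEME) are all hypothetical and chosen only to show one taxonomic classification with a single cross link joining two of its branches.

```python
class Ontology:
    """Toy ontology: concepts joined by IS-A (taxonomic) edges plus labeled cross links."""

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = {}           # concept -> IS-A parents
        self.links: dict[str, dict[str, set[str]]] = {}  # concept -> relation -> targets

    def add_concept(self, name: str, *parents: str) -> None:
        """Place a new concept under zero or more already-known parents."""
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise KeyError(f"unknown parent concept(s): {missing}")
        self.parents[name] = set(parents)

    def add_link(self, source: str, relation: str, target: str) -> None:
        """Add a non-taxonomic ("cross") link, possibly between different branches."""
        for concept in (source, target):
            if concept not in self.parents:
                raise KeyError(f"unknown concept: {concept}")
        self.links.setdefault(source, {}).setdefault(relation, set()).add(target)

    def neighbors(self, name: str) -> set[tuple[str, str]]:
        """(relation, concept) pairs adjacent to `name`: IS-A parents plus cross-link targets."""
        pairs = {("IS-A", parent) for parent in self.parents[name]}
        for relation, targets in self.links.get(name, {}).items():
            pairs |= {(relation, target) for target in targets}
        return pairs


# Hypothetical fragment: two branches (OBJECT, EVENT) under a common root,
# connected by one cross link from INGEST to FOOD.
onto = Ontology()
onto.add_concept("ALL")
onto.add_concept("OBJECT", "ALL")
onto.add_concept("EVENT", "ALL")
onto.add_concept("FOOD", "OBJECT")
onto.add_concept("INGEST", "EVENT")
onto.add_link("INGEST", "THEME", "FOOD")

print(sorted(onto.neighbors("INGEST")))  # [('IS-A', 'EVENT'), ('THEME', 'FOOD')]
```

Storing the taxonomic parents separately from the labeled cross links keeps the classification queryable on its own, while `neighbors` shows how both kinds of links together form a single interconnected network.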

In the field of NLP, there is now a consensus that all NLP systems that seek to represent and manipulate the meanings of texts need an ontology as a source of semantic primitives (Bateman, 1993; Carlson and Nirenburg, 1990). An ontology for NLP purposes is a body of knowledge about the world (or a domain) that: a) is a repository of primitive symbols used in meaning representation; b) organizes these symbols, called concepts, in a tangled subsumption hierarchy, in which a concept may have more than one parent; and c) further interconnects these symbols using a rich system of semantic and pragmatic relations defined among the concepts. In order for such an ontology to become a computational resource for solving problems such as ambiguity and reference resolution, it must actually be constructed, not merely defined formally, as is the practice in the field of formal semantics. The ontology must also be put into well-defined relations with other knowledge sources in the system, such as a lexicon.








Kavi Mahesh
Sun Nov 12 15:30:14 MST 1995