We are currently in the process of a massive acquisition of objects,
events, and their properties related to the domain of company mergers
and acquisitions. Over the period of about three months, the
K ontology has acquired over 2000 concepts organized in a tangled
hierarchy with ample interconnection across the branches. The accompanying
figure shows the rate of growth of the ontology over the last eight months.
The graph covers three phases: an initial acquisition phase starting from an
older ontology developed at Carnegie Mellon University (Carlson and
Nirenburg, 1990), an intermediate cleanup phase in which we deleted hundreds
of questionable and unrelated concepts, and the current phase of massive
acquisition.
Figure: Rate of growth of the K ontology.
The ontology emphasizes depth in organizing concepts and reaches a depth of 10 or more along a number of paths. The quality of the semantic classification in the ontology is measured by several parameters, which are monitored continually with the help of computer programs. For example, the branching factor is kept below 5 at most points. Each concept has, on average, 5 to 10 slots linking it to other concepts or literal constants. The top levels of the hierarchy have proved very stable as we continue to acquire new concepts at the lower levels.
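The kind of monitoring described above can be illustrated with a minimal sketch. It assumes a toy frame representation in which each concept lists its parents and its slots; the concept names, the dictionary layout, and the function names are hypothetical and are not taken from the K ontology or its tools. The sketch computes the three parameters mentioned: maximum depth, concepts whose branching factor reaches the limit, and the mean number of slots per concept.

```python
from collections import defaultdict

# Hypothetical toy ontology (illustrative names only, not real K concepts).
# Each concept maps to its parents and its slots (slot name -> filler).
ONTOLOGY = {
    "ALL":          {"parents": [],               "slots": {}},
    "OBJECT":       {"parents": ["ALL"],          "slots": {}},
    "EVENT":        {"parents": ["ALL"],          "slots": {"AGENT": "OBJECT"}},
    "ORGANIZATION": {"parents": ["OBJECT"],       "slots": {"HEADQUARTERS": "PLACE"}},
    "PLACE":        {"parents": ["OBJECT"],       "slots": {}},
    "CORPORATION":  {"parents": ["ORGANIZATION"],
                     "slots": {"OWNED-BY": "ORGANIZATION", "NAME": "literal"}},
    "ACQUIRE":      {"parents": ["EVENT"],
                     "slots": {"AGENT": "CORPORATION", "THEME": "CORPORATION"}},
}


def children_of(ontology):
    """Invert the parent links into a concept -> list-of-children index."""
    children = defaultdict(list)
    for name, frame in ontology.items():
        for parent in frame["parents"]:
            children[parent].append(name)
    return children


def depth_of(ontology, name):
    """Length of the longest path from a concept up to a root (root = 0).

    Taking the maximum over all parents handles a tangled hierarchy,
    where a concept may have more than one parent.
    """
    parents = ontology[name]["parents"]
    if not parents:
        return 0
    return 1 + max(depth_of(ontology, p) for p in parents)


def ontology_stats(ontology, max_branching=5):
    """Report maximum depth, over-branching concepts, and mean slot count."""
    children = children_of(ontology)
    return {
        "max_depth": max(depth_of(ontology, n) for n in ontology),
        # Concepts whose number of children is not below the limit.
        "over_branching": sorted(
            n for n, kids in children.items() if len(kids) >= max_branching
        ),
        "mean_slots": sum(len(f["slots"]) for f in ontology.values())
        / len(ontology),
    }


if __name__ == "__main__":
    print(ontology_stats(ONTOLOGY))
```

Run periodically over the full concept set, a report of this sort would flag the points where the branching factor drifts past the chosen limit, so that they can be reviewed by hand.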
In parallel, we have built a Spanish lexicon of over 3000 words, each of
which maps to one or more of the over 4000 concepts in the ontological
world model. Each lexicon entry is at least as elaborate as the sample
entry shown in the corresponding figure. The concepts cover a wide variety
of categories, with particular emphasis on the domain of mergers and
acquisitions of companies. On average, each concept has links to 15 other
concepts, making the ontology a richly connected network of the kind
ideally suited to the search algorithm we employ for checking constraints.
K is able to process an unedited Spanish news article and produce
TMRs of reasonably good quality as judged by native Spanish speakers
and expert Spanish linguists. The TMRs produced by K are evaluated by
comparing them against "golden" TMRs for the same texts produced by hand
by an independent team of linguistic semanticists.
We have so far tested the system thoroughly on three texts
and produced TMRs for all the sentences in the texts. A second,
large-scale testing and TMR production effort has just started. By the
end of 1995, we expect to have acquired over 7000 entries in
the Spanish lexicon, supported by about 5000 concepts in the
K ontology, and to have completed testing the system on up to 10
article-length texts. These sizes of the lexicon and the ontology
are sufficient to support the processing of the over 400 Spanish texts on
mergers and acquisitions that we have in our corpus.
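The comparison of system TMRs against hand-built "golden" TMRs mentioned above could be scored in many ways, and the text does not say which one is used. As one illustration only, the sketch below assumes that a TMR can be flattened into a set of (frame, slot, filler) triples and measures their overlap as precision and recall. The triple representation, the instance names, and the function `tmr_overlap` are all hypothetical and are not the project's evaluation procedure.

```python
def tmr_overlap(system, gold):
    """Return (precision, recall) of system triples against gold triples.

    Both arguments are sets of (frame, slot, filler) tuples. This flat
    triple view is an assumption made for illustration only.
    """
    matched = system & gold
    precision = len(matched) / len(system) if system else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical hand-built TMR for one sentence.
GOLD = {
    ("ACQUIRE-1", "AGENT", "CORPORATION-1"),
    ("ACQUIRE-1", "THEME", "CORPORATION-2"),
    ("CORPORATION-1", "NAME", "Acme"),
}

# Hypothetical system output: the THEME filler is wrong.
SYSTEM = {
    ("ACQUIRE-1", "AGENT", "CORPORATION-1"),
    ("ACQUIRE-1", "THEME", "CORPORATION-1"),
    ("CORPORATION-1", "NAME", "Acme"),
}

if __name__ == "__main__":
    precision, recall = tmr_overlap(SYSTEM, GOLD)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

A flat overlap of this kind ignores partial credit for near-miss fillers, such as a concept that is a close ancestor of the correct one; human judgment, as described above, remains the reference.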
Kavi Mahesh