In this work, we are developing probabilistic classifiers for two
challenging and diverse NLP tasks using a common set of techniques.
One classifier will be capable of disambiguating a large vocabulary of
words with respect to a full set of sense distinctions from a
published source, such as Longman's on-line dictionary. The second
will perform a discourse processing task that involves segmentation,
reference resolution, and belief: segmenting a text into blocks that
express the beliefs and opinions of a single agent, and identifying
noun phrases that refer to that agent. Both systems will be fully
automatic.
Exploiting recent developments in applied statistics, we are using a
richer class of statistical models than previously used in most NLP
applications, along with a set of tools for (1) fitting such models to
the data, (2) estimating the parameters of the chosen model from
untagged data, and (3) resolving interdependent ambiguities.
Together, these techniques will make it computationally feasible to
automatically develop and apply probabilistic models that express a
complex set of relationships among a diverse set of variables. This
work will advance statistical NLP toward expressing and using the
types of knowledge typically thought to be necessary for high-level
NLP tasks.
This research is funded by the Office of Naval Research
under grant number N00014-95-1-0776.