Proposal: Evaluation



Next: Main Relations to Other Work

Up: Description of Proposed Work

Previous: Proposed Work (III): Discourse Application


Many have argued that natural phenomena such as language cannot be classified into finite sets of mutually exclusive classes (cf. [43],[49]). We use such classes in the interests of practicality; such classifications have proven useful in many areas of artificial intelligence (AI). A difficult issue this raises is that human consistency in assigning such classifications will not be 100%. Investigating the extent to which humans do agree, and the implications of this for evaluation, is unfortunately outside the scope of this work. To lessen the impact of the "classical category" assumption [43] on this work, we will develop the tagging instructions with care. Borderline cases will surely arise for each of the ambiguities. One task will be to specify default classes, i.e., classes into which borderline cases should be placed.

Two metrics are frequently used to evaluate the performance of a classifier, assuming that a single correct tag has been assigned to each ambiguous object: perplexity and percent correct. We will report results in terms of both. In reporting results, the performance of each classifier will be compared to the performance of a classifier that assigns the most frequently occurring tag to each ambiguous object; this is a lower bound on the performance
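
The two metrics and the most-frequent-tag baseline described above can be illustrated with a minimal Python sketch. It is not the proposal's implementation. The function names are hypothetical, and the perplexity definition used here (2 raised to the average negative log2 probability that the classifier assigns to the correct tag) is an assumption, since the text does not define the measure; it also assumes the classifier outputs a probability distribution over tags for each ambiguous object.

```python
import math
from collections import Counter


def percent_correct(gold, predicted):
    """Percentage of ambiguous objects whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * hits / len(gold)


def perplexity(gold, distributions):
    """Assumed definition: 2 ** (mean negative log2 probability of the gold tag).

    Each entry of `distributions` is a dict mapping tags to probabilities.
    Returns infinity if any gold tag receives zero probability.
    """
    if len(gold) != len(distributions):
        raise ValueError("gold and distributions must have the same length")
    total = 0.0
    for tag, dist in zip(gold, distributions):
        p = dist.get(tag, 0.0)
        if p <= 0.0:
            return math.inf
        total -= math.log2(p)
    return 2.0 ** (total / len(gold))


def most_frequent_tag_baseline(training_tags, n_objects):
    """Baseline classifier: assign the most frequent training tag to every object."""
    tag = Counter(training_tags).most_common(1)[0][0]
    return [tag] * n_objects


# Toy data, invented purely for illustration.
gold = ["A", "B", "A", "A"]
predicted = ["A", "A", "A", "B"]
baseline = most_frequent_tag_baseline(["A", "A", "B"], len(gold))

print(percent_correct(gold, predicted))  # classifier under evaluation
print(percent_correct(gold, baseline))   # most-frequent-tag baseline
```

On this toy data the classifier under evaluation scores below the baseline, which is the kind of comparison the reported results are meant to expose.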


