
Referring Expressions

Only 2 of the dialogs were marked up for all cases of referring expressions, although all 16 dialogs were marked up for pronominal anaphors. In all cases, descriptions of the referents were included. Marking up the dialogs for pronominals alone took roughly four hours per dialog. Anaphoric pronouns in the corresponding English translations of the dialogs were counted but not broken down by subcategory, nor were the translations marked up, because we did not see an immediate use for the marked-up data at CMU. The 22 Spanish texts were also marked up for pronominal anaphors, with descriptions of the referents included. Anaphoric pronouns in the corresponding English translations of the texts were counted, but they were not broken down by category, nor were the translations marked up.

The operational definition of anaphoric pronoun that we used was uncontroversial. It included all elements that behaved syntactically as nouns and were used to refer to some individual, place, event, situation, etc. that had been mentioned in prior discourse or was present in the context of the utterance. Examples included personal, clitic, interrogative, demonstrative, relative and possessive pronouns.

The subclassification was initially done on the basis of syntactic function, person and number, and later on the basis of the semantic category of the referent. It might be noted that, as in the case of ellipsis, it appeared that it might be more useful to classify anaphoric references on the basis of the type of information source that supported interpretation, such as topic, focussed information, or common knowledge. However, as in the case of ellipsis, such a classification would have required even more effort on the part of the analysts and was therefore not pursued.

In the 6 dialogs we found 426 anaphors: 104 personal pronouns, 204 clitic pronouns, 63 relative pronouns, 36 interrogative pronouns, 7 demonstrative pronouns, and 12 indefinite pronouns. In terms of semantic categories, 316 references were to humans (especially the speaker and the addressee), 46 were to other sorts of objects (concrete or abstract), 22 were to times, 18 were to locations, and 24 were to events or situations. On average, then, about .64 of the utterances contained an anaphoric pronoun.

In the 11 news articles we found 148 anaphors: 6 personal pronouns, 77 clitic pronouns, 53 relative pronouns, 2 interrogative pronouns, 4 demonstrative pronouns, and 6 indefinite pronouns. In other words, on average about .48 of the clauses contained an anaphoric pronoun. Of particular note is the extreme difference between dialog and text in the occurrences of personal and interrogative pronouns, as opposed to the consistency in the occurrences of clitic and relative pronouns.
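To make the dialog and text figures easier to compare, the raw counts can be converted into each category's share of the total number of anaphors. The following is a minimal sketch of that arithmetic, not part of the original analysis. The counts are copied from the figures reported here; the variable and function names are illustrative assumptions.

```python
# Counts of anaphoric pronouns by syntactic type, copied from the reported figures.
# All names below are hypothetical, chosen only for this illustration.
dialog_counts = {
    "personal": 104, "clitic": 204, "relative": 63,
    "interrogative": 36, "demonstrative": 7, "indefinite": 12,
}
text_counts = {
    "personal": 6, "clitic": 77, "relative": 53,
    "interrogative": 2, "demonstrative": 4, "indefinite": 6,
}


def shares(counts):
    """Return each category's fraction of the total number of anaphors."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


dialog_shares = shares(dialog_counts)
text_shares = shares(text_counts)

# Print a small side-by-side table of percentages.
print(f"{'type':<15}{'dialog':>8}{'text':>8}")
for category in dialog_counts:
    print(f"{category:<15}{dialog_shares[category]:>8.1%}{text_shares[category]:>8.1%}")
```

Shares are relative to the number of anaphors in each corpus, so they are a different measure from the per-utterance and per-clause averages quoted above.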






Computing Research Laboratory
Wed Jun 7 19:21:15 MDT 1995