
Ellipsis.

Once an operational definition was established, marking up the dialogs for ellipsis took roughly four hours per dialog. All 16 dialogs were marked up. Occurrences of ellipsis in the corresponding English translations of these dialogs were counted, but they were neither broken down by subcategory nor marked up, primarily because we did not see an immediate use for the marked-up data at CMU. The 22 Spanish texts were also marked up to indicate the sites of ellipted information, together with suggested descriptions of what was ellipted. Occurrences of ellipsis in the corresponding English translations of the texts were likewise counted, but they were not broken down by category, nor were the texts marked up.

The operational definition of ellipted information that we arrived at is "that information which the speaker intentionally expressed implicitly as part of his or her utterance but did not state explicitly". This definition is vague, and applying it raises many problems. We do assume, however, that acts of "expressing something implicitly" are evident from various types of incompleteness in the form of the uttered expression as compared with its interpretation, and that, to be interpreted, the implicitly expressed information must be recoverable from the context of the utterance.

Not surprisingly, many examples of ellipsis are associated with sentence fragments such as

Bien. Y tú?
(I've been) fine. And (how have) you (been)?

but they are also common in the case of grammatically complete expressions such as

¿Crees que me podrás ayudar?
Do you think that you'll be able to help me (fill out the forms)?

The approach to categorization we have begun with is a standard syntactic one: it classifies ellipted information by the syntactic category of the descriptive expression used to identify it. The above examples, for instance, are cases of subject-verb ellipsis, verb-predicate adjective ellipsis, and complement clause ellipsis, respectively. It should be noted, however, that this is not a very useful categorization, since the process of identifying the ellipted information is pragmatic. A more informative classification would therefore be based on the information sources that support recovery, such as topic, focussed information, or common knowledge. However, since such a classification would have required even more effort from the analysts to identify and mark up the data, it was not pursued.
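
The parenthesis convention used in the English glosses above can be read mechanically. The following is only a minimal sketch of how an ellipsis site might be recorded; the names `EllipsisSite` and `ellipted_spans` are hypothetical and are not part of the markup scheme actually used in the project. It assumes that ellipted material is written in parentheses in the gloss, as in the examples, and it pairs each site with the syntactic label given in the text.

```python
import re
from dataclasses import dataclass


def ellipted_spans(gloss: str) -> list[str]:
    """Return the parenthesized (ellipted) stretches of an English gloss."""
    return re.findall(r"\(([^)]*)\)", gloss)


@dataclass
class EllipsisSite:
    """Hypothetical record for one marked-up site; not the project's format."""
    gloss: str      # English gloss, ellipted material in parentheses
    category: str   # syntactic label taken from the text

    @property
    def ellipted(self) -> list[str]:
        return ellipted_spans(self.gloss)


# The three sites from the examples, labeled in the order given in the text.
sites = [
    EllipsisSite("(I've been) fine.", "subject-verb"),
    EllipsisSite("And (how have) you (been)?", "verb-predicate adjective"),
    EllipsisSite(
        "Do you think that you'll be able to help me (fill out the forms)?",
        "complement clause",
    ),
]

for site in sites:
    print(site.category, "->", site.ellipted)
```

A record of this kind captures only the syntactic label; a classification by information source (topic, focussed information, common knowledge) would need an additional field that the analysts did not annotate.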

In the 6 dialogs we found 375 cases of ellipsis: 135 ellipted nominals, 39 ellipted verbals, 97 ellipted prepositional or adverbial phrases, and 104 ellipted clauses of one sort or another. That is to say, on average about 0.56 of the utterances contained ellipsis. In the 11 news articles we found 51 cases of ellipsis: 43 ellipted nominals, 2 ellipted verbals, and 6 ellipted clauses. On average, then, about 0.17 of the clauses contained ellipsis. Again, ellipsis is clearly more common in dialog than in text and will have to be handled effectively and efficiently by dialog translation systems.
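
As a quick check on the figures above, the category counts can be tallied and turned into shares of each total. This is a minimal sketch: the dictionary keys and the `shares` helper are illustrative names, and only the counts themselves come from the text. The per-utterance and per-clause rates cannot be recomputed here, since the numbers of utterances and clauses are not given.

```python
# Category counts as reported in the text.
dialog_counts = {
    "nominal": 135,
    "verbal": 39,
    "prepositional/adverbial phrase": 97,
    "clause": 104,
}
news_counts = {
    "nominal": 43,
    "verbal": 2,
    "clause": 6,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of all ellipsis cases falling in each category (2 decimals)."""
    total = sum(counts.values())
    return {category: round(n / total, 2) for category, n in counts.items()}


dialog_total = sum(dialog_counts.values())   # should match the reported 375
news_total = sum(news_counts.values())       # should match the reported 51
dialog_shares = shares(dialog_counts)
news_shares = shares(news_counts)

print(dialog_total, dialog_shares)
print(news_total, news_shares)
```

The category counts do sum to the reported totals. Ellipted nominals make up roughly a third of the dialog cases but more than four fifths of the news cases, so the two genres differ in the kind of ellipsis as well as in its frequency.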








Computing Research Laboratory
Wed Jun 7 19:21:15 MDT 1995