
Fragments.

Six Spanish dialogs were marked up to indicate sentence fragment boundaries, taking roughly two hours each once an operational definition was established. The six English translations of these dialogs, the six additional Spanish dialogs, and their English translations were counted but neither broken down by subcategory nor marked up. For the six additional Spanish dialogs, this was due to a lack of budgeted resources. For the English translations, it was because we did not see an immediate use for the marked-up data at CMU, given their approach and priorities, although such data may prove useful in the future.

In the 20 Spanish texts and their translations, only one fragment was found, a headline. Since that case was not marked, none of the texts or their translations can be taken as marked up.

The operational definition of a sentence fragment that we ended up using was "an expression used or being used to communicate full propositional content parts of which must be recovered from the context of the utterance in order to be successful". This definition covers two of the three types of fragments we identified: ellipted expressions such as:

...pos depende en qué hora...
...well, it depends on what time (you can meet)...

and incomplete utterances such as:

...solamente que... bueno...
...it's just that... well...

A third subcategory of fragments we distinguished included conversational constructions such as:

...nada, aquí nada más.
...nothing (is happening), (I'm just sitting) here nothing more (than that).

While these are somewhat more problematic because some grammarians classify them as complete sentences, we have opted to treat them as fragments and have included them in the results reported below.

As mentioned above, the 6 Spanish dialogs contained 666 utterances and 360 turns. The 11 Spanish news articles contained 133 sentences with 304 clauses. In the dialogs we found 290 fragments: 142 were the result of ellipsis, 29 were incomplete utterances, and 119 were conversational constructions. That is to say, about 44% of the utterances (290 of 666) were fragments. In the 11 news articles we found 1 fragment, a headline, which was due to ellipsis. Clearly, fragments are extremely common in dialog and will have to be handled effectively and efficiently by dialog translation systems.
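
The proportions behind these counts can be checked with a short calculation. The following is only an illustrative sketch in Python, not part of the original study: the function and variable names are hypothetical, and it assumes, as the text does, that each fragment is counted as one utterance.

```python
# Counts reported in the text for the 6 Spanish dialogs.
UTTERANCES = 666
FRAGMENTS_BY_TYPE = {
    "ellipsis": 142,
    "incomplete utterance": 29,
    "conversational construction": 119,
}


def fragment_summary(counts, utterances):
    """Return the total fragment count and the derived proportions.

    Assumption (hypothetical helper): each fragment corresponds to
    exactly one utterance, so total / utterances is a valid share.
    """
    total = sum(counts.values())
    return {
        "total": total,
        "share_of_utterances": total / utterances,
        "share_of_fragments": {k: v / total for k, v in counts.items()},
    }


summary = fragment_summary(FRAGMENTS_BY_TYPE, UTTERANCES)
print("total fragments:", summary["total"])
print("share of utterances: %.3f" % summary["share_of_utterances"])
for name, share in summary["share_of_fragments"].items():
    print("%s: %.2f of fragments" % (name, share))
```

Under these assumptions the three subcategories sum to the reported total of 290, and ellipsis and conversational constructions together account for roughly nine in ten of the fragments found.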






Computing Research Laboratory
Wed Jun 7 19:21:15 MDT 1995