An Application of the Interlingua System ISS for Spanish-English Pronominal Anaphora Generation
by
Jesús Peral and Antonio Ferrández

Responses/rebuttals by the authors to critiques by Kevin Knight

Critique a) It may be hard for some readers to get a feeling for what your performance figures mean in the context of processing a single, whole newspaper article. What do you think about the idea of putting up an Internet web page that displays an article from El Pais (say April 15, 2000) with zero-pronouns automatically inserted using various colors, with the same colors used to highlight the previous textual units those zero-pronouns refer to?

Putting up an Internet web page that shows a fragment of text with the zero-pronouns automatically inserted in various colors, and with the textual units they refer to highlighted in the same colors, is a good idea. We currently have a demo of our computational system that works very much as K. Knight suggests, but it only runs locally. It remains to be adapted for remote use (over the Internet, for example). Our system works in the following way:

1) First, a grammar must be selected in order to parse the text. Either a Spanish or an English grammar can be chosen.
2) Then, the corpus to be parsed must be chosen. The texts can be in either of two languages: Spanish or English.
3) After that, we choose between partial and full parsing of the text.
4) Finally, we decide which kinds of Natural Language Processing (NLP) problems (anaphora, ellipsis, zero-pronouns, etc.) are to be solved.

The output is the solution to the selected NLP problems. We can choose how the results are presented:
a) plain ASCII text, or
b) web page.

For instance, if we want to solve anaphora and zero-pronouns, a web page is generated containing the following information: 1) the text, with anaphora and zero-pronouns highlighted, and 2) for each of these linguistic phenomena (shown by clicking on it in the web page), the antecedents before restrictions, the antecedents after restrictions (accessibility space of the anaphor's solution, morphological and syntactic restrictions), the antecedents after preferences, and the solution of the anaphor. In this respect, our system works in a very similar way to K. Knight's proposal.
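
As an illustration of the colored display discussed above, the following minimal sketch renders a tokenized text as HTML and gives each inserted zero-pronoun the same background color as its antecedent. It is not the code of our demo: the function name `highlight`, the color palette, the `&Oslash;` marker, and the input format (token indices for the zero-pronoun position and the antecedent span) are all assumptions made for the example.

```python
from html import escape
from itertools import cycle

# Hypothetical palette; any set of distinguishable colors would do.
PALETTE = ["#ffd54f", "#81d4fa", "#a5d6a7", "#f48fb1"]


def highlight(tokens, links):
    """Render tokens as HTML, coloring each zero-pronoun and its antecedent alike.

    tokens: list of words.
    links:  list of (zero_pos, start, end) tuples, where zero_pos is the index
            of the token before which the omitted subject is inserted and
            [start, end) is the token span of its antecedent.
    """
    color_of_token = {}   # token index -> color of the antecedent it belongs to
    marker_before = {}    # token index -> color of the zero-pronoun marker
    for color, (zero_pos, start, end) in zip(cycle(PALETTE), links):
        for i in range(start, end):
            color_of_token[i] = color
        marker_before[zero_pos] = color

    parts = []
    for i, token in enumerate(tokens):
        if i in marker_before:
            # Visible placeholder for the omitted subject.
            parts.append(
                f'<span style="background:{marker_before[i]}">&Oslash;</span>'
            )
        text = escape(token)
        if i in color_of_token:
            text = f'<span style="background:{color_of_token[i]}">{text}</span>'
        parts.append(text)
    return " ".join(parts)


# "Juan llegó tarde. (Ø) Perdió el tren."
tokens = ["Juan", "llegó", "tarde", ".", "Perdió", "el", "tren", "."]
page = highlight(tokens, [(4, 0, 1)])
```

Here the omitted subject of "Perdió" and its antecedent "Juan" receive the same color. A fuller page would also attach the candidate lists (before restrictions, after restrictions, after preferences) to each marker.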

Currently, the system does not use semantic information. However, some extensions using WordNet are being developed.

If possible, a demo of the computational system will be presented at the workshop.

Critique b) The word "interlingua" often refers to a representation that abstracts away from the syntax and overt discourse markers employed by particular languages. When you say "interlingua", what do you mean?

In this paper, we have used an interlingua representation of the text that makes Spanish-English pronominal anaphora generation possible. By interlingua we mean a representation that abstracts away from the syntax employed by particular languages. The interlingua representation is based on a feature structure for each clause. This feature structure contains semantic roles and is language independent in both syntax and structure. However, because our computational system does not yet use semantic information, the semantic roles are identified with heuristics. These heuristics stay very close to the syntactic structure of a clause (subject, verb, object and modifiers). For example: "If the verb is in passive and there is a prepositional phrase that begins with 'by', the noun phrase included in the prepositional phrase will be the AGENT (semantic role) of the clause". Furthermore, we are currently working on incorporating a semantic resource into the system so that the semantic roles of a clause can be detected correctly.
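
The AGENT heuristic quoted above can be sketched as follows. This is only an illustration under simplifying assumptions: the `Clause` class, its fields, and the function `find_agent` are hypothetical names, and the clause is assumed to have been parsed already into a verb, a voice flag, and a list of prepositional phrases.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """Hypothetical, simplified view of a parsed clause."""
    verb: str
    passive: bool
    # Each prepositional phrase as (preposition, noun phrase).
    prep_phrases: List[Tuple[str, str]] = field(default_factory=list)


def find_agent(clause: Clause) -> Optional[str]:
    """Return the noun phrase taken as AGENT, or None if the heuristic does not fire.

    Heuristic: if the verb is passive and a prepositional phrase begins
    with 'by', the noun phrase inside that phrase is the AGENT.
    """
    if not clause.passive:
        return None
    for preposition, noun_phrase in clause.prep_phrases:
        if preposition.lower() == "by":
            return noun_phrase
    return None


# "The report was written in 2000 by the committee."
passive_clause = Clause("written", True, [("in", "2000"), ("by", "the committee")])
# "She walked by the river." (active, so the by-phrase is not an AGENT)
active_clause = Clause("walked", False, [("by", "the river")])
```

In the second clause the heuristic correctly stays silent, but a purely syntactic rule of this kind would still mislabel a passive clause such as "The house was built by the river", which is one motivation for adding a semantic resource.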

Critique c) What do you think about the idea of using a parallel Spanish-English text (say, EU documents) to help you obtain the "correct" translation of any zero-pronoun into English? I.e., instead of using human judges, look at actual human translation output.

It is a good idea to use parallel Spanish-English documents to evaluate automatically whether Spanish zero-pronouns are generated correctly in English. One problem we could face in carrying out this automatic evaluation is how to match each Spanish zero-pronoun with its exact location in the English document. We should point out that the structure of Spanish sentences is more flexible than that of English ones, and some constituents of a Spanish sentence can appear in any position of the sentence. For this reason, it is sometimes really difficult to match a Spanish zero-pronoun with its corresponding English pronoun.

Critique d) What is the best 15-line algorithm you can imagine for zero-pronoun translation? (The best 1-line algorithm might be Print "he"). How well do you think it would work?

One possible 1-line algorithm for zero-pronoun translation could be "Print HE". We do not think that this is a good algorithm, for the following reason:

The generation of Spanish zero-pronouns into English depends very much on the genre of the text we are working with. In narrative texts, zero-pronouns can appear in all their varieties: singular, plural, masculine and feminine. They will then be translated into the English pronouns "he", "she", "it" and "they". In technical manuals, we find mainly zero-pronouns that refer to things and few (or none) that refer to persons. The correct translation of these pronouns will be the English pronoun "it" (if the pronoun is singular) or the English pronoun "they" (if the pronoun is plural).
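
To make the contrast with "Print HE" concrete, here is a minimal sketch of a selection rule that follows the distinctions drawn above. It is not the algorithm of our system: the function name `translate_zero_pronoun` and its parameters are hypothetical, it covers only third-person subjects, it assumes the number, gender and person/thing status of the antecedent are already known, and falling back to "he" when the gender of a person is unknown is an arbitrary assumption.

```python
def translate_zero_pronoun(number, gender=None, refers_to_person=False):
    """Pick an English subject pronoun for a third-person Spanish zero-pronoun.

    number:            "singular" or "plural"
    gender:            "masculine", "feminine" or None if unknown
    refers_to_person:  True if the antecedent is a person, False if a thing
    """
    if number == "plural":
        return "they"   # persons and things alike
    if not refers_to_person:
        return "it"     # singular thing, the usual case in technical manuals
    # Singular person: use the gender; "he" is an assumed fallback.
    return "she" if gender == "feminine" else "he"
```

For example, `translate_zero_pronoun("singular")` covers the typical technical-manual case, while a narrative text would exercise every branch.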


Last Updated: April 19, 2000

Copyright 2000 Computing Research Lab.