Peral and Ferrandez describe in their paper a very interesting program for translating pronouns (and missing pronouns) between Spanish and English. It is great to see an evaluation that buttresses their claims about which heuristics are useful in this area. Their evaluation addresses three main problems:
1) how to detect verbs that are preceded by zero-pronouns in Spanish (they achieve 88% correct)
2) how to figure out what those zero-pronouns refer to in previous text (they achieve 75% correct when the subsequent verb is third person)
3) how to translate zero-pronouns into English (they achieve 75% correct when the subsequent verb is third person)
In achieving this performance, the authors' program performs substantial analysis on incoming Spanish text, which is not restricted to any particular domain or genre. This analysis is primarily syntactic. While there are some semantic keywords (e.g., AGENT, ACTION), these look to be mostly syntactic in reality (e.g., SUBJECT, VERB). On top of this syntactic analysis, the authors' program performs some chaining of lexical entries, i.e., lightweight co-reference. Various heuristics are used to perform this analysis, and various other heuristics turn the analysis into answers for the three problems listed above.
This all seems eminently reasonable. I can only pose a few broad questions in the hope that the authors can suggest some answers:
a) It may be hard for some readers to get a feeling for what your performance figures mean in the context of processing a single, whole newspaper article. What do you think about the idea of putting up a web page that displays an article from El Pais (say April 15, 2000) with zero-pronouns automatically inserted using various colors, with the same colors used to highlight the previous textual units those zero-pronouns refer to?
b) The word "interlingua" often refers to a representation that abstracts away from the syntax and overt discourse markers employed by particular languages. When you say "interlingua", what do you mean?
c) What do you think about the idea of using a parallel Spanish-English text (say, EU documents) to help you obtain the "correct" translation of any zero-pronoun into English? I.e., instead of using human judges, look at actual human translation output.
d) What is the best 15-line algorithm you can imagine for zero-pronoun translation? (The best 1-line algorithm might be Print "he"). How well do you think it would work?
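To make question d) concrete, here is one guess at what a roughly 15-line baseline could look like. It is only a sketch and is not taken from the paper: the function name, the person/number/gender feature labels, and the fallback to "he" are all assumptions made for illustration, and the antecedent's gender is assumed to be supplied by some earlier resolution step.

```python
# Hypothetical baseline sketch; the feature names and defaults below are
# assumptions for illustration, not the method of the paper under review.
PRONOUNS = {
    (1, "sg"): "I", (2, "sg"): "you", (1, "pl"): "we",
    (2, "pl"): "you", (3, "pl"): "they",
}
THIRD_SG = {"masc": "he", "fem": "she", "neut": "it"}


def translate_zero_pronoun(person, number, antecedent_gender=None):
    """Pick an English subject pronoun for a Spanish zero-pronoun."""
    # First and second person, and third person plural, are determined
    # by the verb's person and number alone.
    if (person, number) in PRONOUNS:
        return PRONOUNS[(person, number)]
    # Third person singular: use the antecedent's gender if an upstream
    # resolver supplied one; otherwise fall back to the 1-line answer.
    return THIRD_SG.get(antecedent_gender, "he")
```

For example, translate_zero_pronoun(3, "sg", "fem") returns "she", and with no antecedent information it degrades to "he". A sketch this simple ignores animacy, so a grammatically feminine inanimate noun would wrongly come out as "she" rather than "it"; how much that costs in practice is part of the question.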
Copyright 2000 Computing Research Lab.