We would like to thank the reviewers and the commentators for their insightful comments.
While studying the characteristics needed for the animation of actions, we recognized that the representation of an action should be language independent and should be, as much as possible, consistent across languages. In this context, it is true that the paper is a tentative investigation of the potential of our action representation as an interlingua. We do not have a system with the same capabilities as other MT systems; we are only exploring the possibilities of the representation itself. Moreover, we do not claim to have implemented a full-coverage interlingua system, but rather that the PARs we have implemented (along with those we continue to implement) can be considered a form of interlingua. PAR is not a new formalism; it is a set of templates containing the information necessary for animation. One advantage of looking at language-specific phenomena when constructing these representations is that, if the phenomena are consistent across other languages in the same language family, the same treatment can be applied to them.
The system is implemented and runs for a few dozen actions. There is a checkpoint animation scenario that illustrates the use of standing orders, in which the animation is driven primarily by PARs, some of which originate from the natural language input. We are in the process of integrating a planner and adding a hierarchical organization of PAR schemas based on verb classes. At present the animation PARs and the natural language PAR schemas are almost identical, except for a couple of fields such as the activity field in the PAR schemas. The synchronous parser has the full coverage of the XTAG grammar, about 4000 verbs, and the animation PARs are implemented according to the scenarios.
With respect to the specific linguistic issues raised by Teruko Mitamura: yes, these are for the most part issues that we did not discuss and that are relevant to the overall task. However, some of them require contributions from the natural language analyzer and from the planner, which we thought were beyond the scope of our paper, which dealt primarily with the action representations.
For instance, the order in which the actions will be executed is not solely a representational problem. Each PAR schema corresponds to only one action. The natural language interface outputs situation calculus expressions containing one or more PAR schemas, and the planner decides how to execute them. The details of the execution are often left to be determined by the animation system. Each object, when instantiated, is set up to have attributes which include its location in space. Our CONTACT PAR has a "reachable" applicability condition which assumes that the location of the object which has to be "reachable" is available in this attribute. There are two HIT PARs, one for a simple contact and one for a contact that results in a change of location of the object. In order to animate the second one, where the object moves, the animation system will make reference to a pre-stored hitting animation. This is parameterized, so if the force in the current animation is greater, the object will go farther, proportionally to the increase in force, but the basic trajectory will remain the same.
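As a minimal sketch of the two mechanisms just described, the following Python fragment checks a "reachable" applicability condition against an object's location attribute and scales a pre-stored hitting animation by the applied force. All names here (`SceneObject`, `StoredHit`, `reachable`, `hit_displacement`, the `arm_reach` threshold) are hypothetical illustrations and not the fields of the implemented PARs; the linear scaling is only one reading of "proportionally to the increase in force".

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float, float]


@dataclass
class SceneObject:
    """An instantiated object; its location attribute is set at creation."""
    name: str
    location: Vec


def reachable(agent: SceneObject, obj: SceneObject, arm_reach: float = 1.0) -> bool:
    """Hypothetical 'reachable' applicability condition for a CONTACT action.

    Assumes reachability is a plain distance test on the location attributes.
    """
    return math.dist(agent.location, obj.location) <= arm_reach


@dataclass
class StoredHit:
    """A pre-stored hitting animation (assumed shape, for illustration only)."""
    base_force: float        # force used when the animation was recorded
    base_displacement: Vec   # where the object ends up, relative to its start


def hit_displacement(stored: StoredHit, force: float) -> Vec:
    """Scale the stored displacement by the force ratio.

    The direction (basic trajectory) is unchanged; only the distance grows.
    """
    scale = force / stored.base_force
    x, y, z = stored.base_displacement
    return (scale * x, scale * y, scale * z)


john = SceneObject("John", (0.0, 0.0, 0.0))
ball = SceneObject("ball", (0.5, 0.0, 0.0))
stored = StoredHit(base_force=10.0, base_displacement=(2.0, 0.0, 1.0))

if reachable(john, ball):
    # Doubling the force doubles the distance travelled along the same direction.
    print(hit_displacement(stored, force=20.0))
```

In this sketch the first HIT PAR (simple contact) would need only the `reachable` check, while the second (contact with a change of location) would also consult the stored animation.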
Mitamura's first example, "John hit the ball with his bat and hammered it", makes reference to the first HIT PAR, which just involves contact, whereas the second example, "John hit the ball with his bat and caught it", involves the second HIT PAR.
In the same way, the resolution of ambiguities such as "He was waving and drove away," which could be animated as either sequential or parallel actions, would also be the domain of the planner rather than a representation issue. This kind of ambiguity, which MT applications do not have to resolve, can also be left unresolved by us until the planning stage.
(i) Watashi ga rousoku wo fuki-keshita.
I NOM candle ACC blow-extinguish-PAST
(ii) Watashi ga hi wo keshita.
I NOM flame ACC extinguish-PAST
(iii) Watashi ga rampu wo keshita.
I NOM lamp ACC turn-off-PAST
In the Japanese examples above, "kesu" would consistently be mapped to termination conditions and "fuku" (for the candle) would be mapped to the activity field. For the lamp and the flame, when the activity field is not specified, we can use object specific reasoning, or an object specific database, where the method for being "extinguished" or "turned-off" is associated with the object. (Libby Levison, "Connecting Planning and Acting Via Object-Specific Reasoning," PhD Dissertation, University of Pennsylvania, 1996.)
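The mapping for examples (i) to (iii) could be sketched roughly as follows. This is an illustration under stated assumptions, not the implemented schema: the class `ParSchema`, the tables `TERMINATION` and `OBJECT_METHODS`, and the method names `"smother"` and `"flip_switch"` are hypothetical placeholders standing in for the object-specific database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParSchema:
    """A toy PAR schema with only the fields discussed in the text."""
    obj: str
    termination: str                 # contributed by "kesu"
    activity: Optional[str] = None   # contributed by "fuku", if present


# Termination condition that "kesu" denotes for each object (illustrative).
TERMINATION = {
    "candle": "extinguished",
    "flame": "extinguished",
    "lamp": "turned-off",
}

# Hypothetical object-specific database: the default method by which each
# object reaches its termination condition when no activity is given.
OBJECT_METHODS = {
    "flame": "smother",
    "lamp": "flip_switch",
}


def schema_for(verb_parts: List[str], obj: str) -> ParSchema:
    """Build a schema from the verb components and the direct object."""
    if "kesu" not in verb_parts:
        raise ValueError("this sketch only handles verbs containing 'kesu'")
    activity = "blow" if "fuku" in verb_parts else None
    return ParSchema(obj=obj, termination=TERMINATION[obj], activity=activity)


def resolve_activity(schema: ParSchema) -> str:
    """Use the stated activity, else fall back to object-specific knowledge."""
    if schema.activity is not None:
        return schema.activity
    return OBJECT_METHODS[schema.obj]


candle = schema_for(["fuku", "kesu"], "candle")   # (i)  fuki-keshita
flame = schema_for(["kesu"], "flame")             # (ii) keshita
lamp = schema_for(["kesu"], "lamp")               # (iii) keshita

for s in (candle, flame, lamp):
    print(s.obj, s.termination, resolve_activity(s))
```

The point of the sketch is that the schema itself stays the same across the three sentences; only the source of the activity value differs.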
The examples for "bolt," "tape," and "staple" would have the same representation as "spoon." The instrument or verb (depending on the language) could perhaps be added to the activity field. (Our MANNER field is now used by the animation group exclusively for adverbial modification, which is why we have recently added the activity field.)
Action representations for different languages should generate the same animations, so creating the mappings from these languages to PAR should not be a problem. Rather, because we derive our PAR hierarchy from verb classes, the challenge is determining to what extent the classes hold cross-linguistically.
Copyright 2000 Computing Research Lab.