Representation of Actions as an Interlingua
by
Karin Kipper and Martha Palmer

Critique by Teruko Mitamura

A Parameterized Action Representation (PAR) as a form of interlingua is presented and used to animate virtual human agents in a simulated 3D environment.

1. Underspecified Actions

The difference between an interlingua for MT and the PAR schema for animation instruction is that the PAR includes many details, such as start and end state information, which may be underspecified in the language. For example, the authors show that in a PAR schema for actions of contact, such as "John hit the ball with his bat", applicability conditions and preparatory actions have to be satisfied before the action can be executed, and termination conditions and post assertions need to be represented, so that the PAR schema can provide enough information to produce the animation.
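To make the contrast concrete, here is a minimal sketch of a PAR-like record. The slot names follow the terms used in this critique. The class, its types, and the helper unfilled_slots are hypothetical and are not taken from the authors' system. The sketch shows which slots remain empty when only the content of the sentence itself is available.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PAR:
    """Hypothetical, simplified PAR-like record (not the authors' schema)."""
    activity: Optional[str]
    agent: str
    objects: List[str]
    instrument: Optional[str] = None
    applicability_conditions: List[str] = field(default_factory=list)
    preparatory_actions: List[str] = field(default_factory=list)
    termination_conditions: List[str] = field(default_factory=list)
    post_assertions: List[str] = field(default_factory=list)


def unfilled_slots(par: PAR) -> List[str]:
    """Return the names of slots that are None or an empty list."""
    empty = []
    for f in fields(par):
        value = getattr(par, f.name)
        if value is None or value == []:
            empty.append(f.name)
    return empty


# Only what "John hit the ball with his bat" states explicitly.
from_sentence = PAR(activity="hit", agent="John", objects=["ball"], instrument="bat")
print(unfilled_slots(from_sentence))
```

For this input the helper lists the four condition and assertion slots, which is the information an MT interlingua can leave out but an animation system has to supply.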

The paper does not mention the challenges that a PAR schema may face when dealing with certain linguistic phenomena. In particular, when a series of actions occurs, continuity from each action to the next is essential for correct animation instruction.

Consider the following sentence:

 (1) John hit the ball with his bat and hammered it.

We assume the reading of the verb "hammer" which implies a downward striking motion using an instrument (a hammer). The most probable interpretation of sentence (1) would be that John hits the ball, where the ball is probably on the ground, and then John either drops the bat or shifts the bat to his left hand while picking up a hammer to hammer the ball. The location of the ball is underspecified, and what John would do with his bat when he moves to the next action is also not specified. This underspecification would not be a problem for MT, since it is not necessary to explicitly generate implied actions, but for animation instruction these implications must be resolved properly so that a continuous, correct animation can be achieved.

The location of the ball would be different when the secondary action differs in meaning:

 (2) John hit the ball with his bat and caught it.

In this action, John probably hit the ball high up in the air, and the ball fell down to him as he caught it. In order to catch the ball with his hands, he probably needs to drop the bat first, or at least free one hand. The first action is very different from the one in (1) and it cannot be determined until the second action is described.

Consider also the following example:

 (3) John hit the ball with his bat and kicked it.

In this case, John doesn't have to do anything about the bat in order to kick the ball. Whether the ball is in the air or on the ground when the hitting occurs is ambiguous.

As these examples illustrate, this type of underspecification must be resolved in order to provide complete animation instruction. It is not clear how the PAR could handle these underspecified and/or ambiguous actions, and how the planner could plan these actions for animation.
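The continuity problem in sentences (1) to (3) can be sketched as a small planning step that inserts the implied actions between the stated ones. Everything here is an assumption made for illustration: the REQUIREMENTS table, the single "working hand" model, and the function implied_steps are hypothetical and do not describe the authors' planner. The sketch covers only what happens to the bat. It does not attempt to resolve the location of the ball.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical requirements per verb:
# (instrument that must be in the working hand, whether a free hand is needed)
REQUIREMENTS: Dict[str, Tuple[Optional[str], bool]] = {
    "hit": ("bat", False),
    "hammer": ("hammer", False),
    "catch": (None, True),
    "kick": (None, False),
}


def implied_steps(verbs: List[str], in_hand: Optional[str] = None) -> List[str]:
    """Expand a verb sequence with the implied release / pick-up actions.

    "release X" stands for either dropping X or shifting it to the other hand.
    """
    steps: List[str] = []
    for verb in verbs:
        instrument, needs_free_hand = REQUIREMENTS[verb]
        if instrument is not None and in_hand != instrument:
            if in_hand is not None:
                steps.append(f"release {in_hand}")
            steps.append(f"pick up {instrument}")
            in_hand = instrument
        elif needs_free_hand and in_hand is not None:
            steps.append(f"release {in_hand}")
            in_hand = None
        steps.append(verb)
    return steps


for second in ("hammer", "catch", "kick"):
    print(second, implied_steps(["hit", second], in_hand="bat"))
```

Under these assumptions, "hammer" forces the bat to be released and a hammer picked up, "catch" forces only the release, and "kick" requires no intermediate step, mirroring the three readings discussed above.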

2. Verb-framed and Satellite-framed Languages

The authors claim that the PAR schemas do not distinguish between verb-framed and satellite-framed languages, since the PAR is independent of the source language. The filler of the activity field is ACTION, which determines how the action is performed. The manner of motion, such as "float", is represented in the activity field in the PAR in Figure 6, whereas for an action with an instrument, such as "spoon", the activity field is unspecified in Figure 7. It is not clear why the manner of motion fills the activity field while the action with an instrument does not.

In the example below, "blow out" is expressed by compounding the verbs in Japanese.

(4a) I blew out the candle.

   (4b) Watashi ga  rousoku wo  fuki-keshita.
        I       NOM candle  ACC blow-extinguish-PAST

As in Spanish, "kesu" (extinguish) is the main verb, and "fuku" (blow) expresses the manner in which the candle is extinguished. How does the PAR represent (4b)? If "fuku" fills the activity field and "kesu" maps onto the termination condition, how would the following sentences be represented?

   (5)  Watashi ga  hi      wo  keshita.
          I     NOM flame  ACC  extinguish-PAST

   (6)  Watashi ga  rampu wo  keshita.
        I       NOM lamp  ACC turn-off-PAST

In Japanese, the same verb "kesu" is used for both "extinguishing a flame" and "turning off a lamp", since "kesu" means making something disappear. How would "kesu" be represented in the PAR?

In some cases of conflation in English, Japanese expresses the same action with a main verb and an adjunct to the main verb phrase. Examine the verbs of attaching below.

   (7a) He taped the split.
   (7b) Kare ga  wareme wo  teepu de   tometa.
        he   NOM split  ACC tape  with attached

   (8a) He bolted the handle.
   (8b) Kare ga  handoru wo  boruto de   tometa.
        he   NOM handle  ACC bolt   with attached

   (9a) He stapled three sheets of paper.
   (9b) Kare ga  san-mai      no  kami  wo  hocchikisu de   tometa.
        he   NOM three-sheets GEN paper ACC staple     with attached

In English, "tape", "bolt", and "staple" conflate the action with its instrument. In Japanese, the main verb is "tomeru" (attach), and the instrument is expressed as an adjunct to the main verb. How would the PAR represent these examples? As in the case of "spoon", is the activity field unspecified? If so, why? In cases of conflation, would it make more sense to decompose the verb into action + instrument?
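The decomposition raised in the last question can be sketched as follows. This is only an illustration of the action + instrument alternative, not a description of how the PAR actually treats these verbs. The ActionFrame type, the lookup tables, and both functions are hypothetical, and the instrument labels simply follow the glosses in (7) to (9).

```python
from typing import Dict, NamedTuple, Optional


class ActionFrame(NamedTuple):
    """Hypothetical language-neutral frame: a core action plus an instrument."""
    action: str
    instrument: Optional[str]


# English verbs that conflate the action with its instrument.
ENGLISH_CONFLATED: Dict[str, ActionFrame] = {
    "tape": ActionFrame("attach", "tape"),
    "bolt": ActionFrame("attach", "bolt"),
    "staple": ActionFrame("attach", "staple"),
}

# Japanese main verb and instrument nouns (marked with "de" in the examples).
JAPANESE_VERBS: Dict[str, str] = {"tomeru": "attach"}
JAPANESE_INSTRUMENTS: Dict[str, str] = {
    "teepu": "tape",
    "boruto": "bolt",
    "hocchikisu": "staple",
}


def from_english(verb: str) -> ActionFrame:
    """Decompose a conflated English verb into action + instrument."""
    return ENGLISH_CONFLATED[verb]


def from_japanese(verb: str, instrument_noun: Optional[str] = None) -> ActionFrame:
    """Combine the main verb with the instrument adjunct, if there is one."""
    instrument = JAPANESE_INSTRUMENTS[instrument_noun] if instrument_noun else None
    return ActionFrame(JAPANESE_VERBS[verb], instrument)


print(from_english("tape") == from_japanese("tomeru", "teepu"))
```

With this decomposition, (7a) and (7b) map to the same frame, and a bare "tomeru" simply leaves the instrument empty rather than leaving the action itself unspecified.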

3. Ambiguity Resolution

The authors do not discuss how the system handles ambiguity. When a sentence describing an action has more than one semantic interpretation, how is the ambiguity resolved so that animation instructions can be produced?

For example, the following actions could occur sequentially or simultaneously.

   (10a) He was waving and drove away.
   (10b) He waved and drove away.
   (11)  He was eating and walked away.

Without a context, the ambiguity can't be resolved, and this is a general issue for machine translation. However, as the authors point out, machine translation can often preserve ambiguities, whereas in their application, it is essential to resolve ambiguity.
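The point that an animation system must commit to one reading can be sketched as below. The interval representation, the unit duration, and the function names candidate_schedules and resolve are assumptions made for this illustration only. The sketch says nothing about how context would actually be obtained.

```python
from typing import Dict, List, Optional, Tuple

# (action, start time, end time) in arbitrary units
Interval = Tuple[str, int, int]


def candidate_schedules(first: str, second: str, duration: int = 1) -> Dict[str, List[Interval]]:
    """Return both temporal readings of 'first and second'."""
    return {
        "sequential": [(first, 0, duration), (second, duration, 2 * duration)],
        "simultaneous": [(first, 0, duration), (second, 0, duration)],
    }


def resolve(first: str, second: str, context_hint: Optional[str] = None) -> List[Interval]:
    """Pick one reading; refuse to animate when no usable context is given."""
    schedules = candidate_schedules(first, second)
    if context_hint not in schedules:
        raise ValueError(f"ambiguous: '{first}' and '{second}' need a temporal reading")
    return schedules[context_hint]


print(resolve("wave", "drive away", context_hint="simultaneous"))
```

A translation system could pass both readings through unchanged, whereas here the call fails unless a hint selects one of them.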

4. Current Status of the System and Future Work

The authors do not provide details about the current status of the system. For example, how many action verbs are represented in the current system?

The authors' future work includes investigation of building action representations from a class-based verb lexicon. When building action representations based on different types of languages (e.g. verb-framed and satellite-framed), what would be the biggest challenges that need to be addressed?

Last Updated: March 31, 2000

Copyright 2000 Computing Research Lab.