As mentioned above, the parser/semantic interpreter should include, in the ILT for an utterance, a list of all speech acts possible for that utterance. The discourse module should choose the most appropriate one, given the context. If there are no speech acts in an ILT, the discourse module should attempt to fill one in. Thus, an augmented ILT should include exactly one speech act in its speech-act slot. The arguments to the speech act should be straightforwardly recoverable from the augmented ILT; for an utterance performing a suggest speech act, for example, these are the actor and the thing being suggested. If it turns out in practice that ambiguity does arise as to which slots contain the arguments, the discourse module must explicitly mark the speech-act arguments as such in the augmented ILT.
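The selection step described above can be illustrated with a minimal sketch. It is not the discourse module's actual algorithm: the `ILT` class, the `augment` function, and the two callbacks (`score`, standing in for whatever contextual preference is used, and `infer_default`, standing in for the fill-in step) are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ILT:
    """Hypothetical, simplified ILT holding only the slots this sketch needs."""
    speech_acts: List[str] = field(default_factory=list)  # candidates from the parser
    slots: Dict[str, str] = field(default_factory=dict)   # remaining ILT content


def augment(ilt: ILT,
            score: Callable[[str], float],
            infer_default: Callable[[ILT], str]) -> ILT:
    """Return an augmented ILT with exactly one speech act.

    `score` is an assumed stand-in for context-dependent preference;
    `infer_default` is an assumed stand-in for filling in a speech act
    when the parser supplied none.
    """
    candidates = ilt.speech_acts or [infer_default(ilt)]
    best = max(candidates, key=score)
    return ILT(speech_acts=[best], slots=dict(ilt.slots))


# Illustrative context in which a suggestion is the most plausible reading.
context_scores = {"suggest": 0.7, "accept": 0.2}
parsed = ILT(speech_acts=["accept", "suggest"], slots={"time": "tuesday"})
augmented = augment(parsed,
                    score=lambda act: context_scores.get(act, 0.0),
                    infer_default=lambda _ilt: "request-suggestion")
```

Whatever the scoring method turns out to be, the invariant worth preserving is that the speech-act slot of the result always holds exactly one entry.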
In addition to the usage-oriented specification of the speech acts prepared by researchers at CMU, it would be useful to consider the basic semantics of the more common speech acts in the scheduling dialogs. The following are not general speech acts, but are applicable only when the participants have decided that they need to meet about something: confirm-time (confirm), request-suggestion, state-constraint, reject, accept, and suggest-time (suggest). All concern one or more parameters of this meeting. The most frequently occurring parameter is time; other parameters that may be involved are the location, the participants, and the purpose of the meeting. The non-optional arguments are the agent (the speaker) and the parameter(s) of the meeting being addressed.
Theoretically, it can be argued that a single utterance can be used to perform multiple speech acts. It can also be argued that multiple utterances together can be used to perform a single speech act. We make the perhaps simplifying assumptions that each utterance performs exactly one speech act, and that each speech act is performed by a single utterance. However, the discourse module will, among other things, perform plan-inference. The speech act in the ILT for an utterance will be a leaf in the plan-inference derivation tree. The actions along the path from the root to the leaf can be thought of as also being assigned to the utterance. Further, an action in the middle of the tree can be thought of as being assigned to all of the utterances covered by the subtree of which that action is the root. Thus, to the extent that the complex cases can be accounted for in a hierarchical model, they are not eliminated by our simplifying assumptions.
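The hierarchical reading described above can be sketched as follows. This is an illustration under stated assumptions, not the plan-inference method itself: the `PlanNode` class, the two helper functions, and the higher-level action names `schedule-meeting` and `negotiate-time` are hypothetical and do not come from the speech-act specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """A node in a (hypothetical) plan-inference derivation tree."""
    action: str
    children: List["PlanNode"] = field(default_factory=list)
    utterance: Optional[int] = None  # set only on leaves: index of the utterance


def actions_for(node: PlanNode, utterance: int) -> List[str]:
    """Actions on the path from `node` down to the leaf for `utterance`."""
    if node.utterance == utterance:
        return [node.action]
    for child in node.children:
        path = actions_for(child, utterance)
        if path:
            return [node.action] + path
    return []  # the utterance is not covered by this subtree


def covered_utterances(node: PlanNode) -> List[int]:
    """All utterances covered by the subtree rooted at `node`."""
    if node.utterance is not None:
        return [node.utterance]
    covered: List[int] = []
    for child in node.children:
        covered.extend(covered_utterances(child))
    return covered


# Illustrative dialog: four utterances negotiate a time, a fifth confirms it.
negotiate = PlanNode("negotiate-time", [
    PlanNode("suggest", utterance=0),
    PlanNode("reject", utterance=1),
    PlanNode("suggest", utterance=2),
    PlanNode("accept", utterance=3),
])
root = PlanNode("schedule-meeting", [negotiate, PlanNode("confirm", utterance=4)])
```

Here `actions_for(root, 1)` collects every action that can be thought of as assigned to the second utterance, while `covered_utterances(negotiate)` lists the utterances that jointly realize the intermediate action, which is how a multi-utterance act can be represented despite the one-act-per-utterance assumption.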
Recognizing intentional information (i.e., identifying speech acts) is an attempt to understand something of the deeper meaning of an utterance. Similar intentions might be expressed very differently in different languages. The speech act can be used as a guide to expressing the utterance appropriately in the target language. Further, Grosz & Sidner (1986) and others propose that focus information (for the purpose of ellipsis and anaphora resolution) and discourse intentions are interrelated. Recognizing speech acts might therefore aid the system in performing ellipsis and anaphora resolution. Part of the method proposed in section 3.9 is intended to exploit such constraints.