Mismatches have been largely ignored in comparison with
divergences.
It may well be that divergences, being mainly a syntactic
phenomenon, can be detected and resolved more easily than mismatches,
which necessarily involve semantic treatment and have practically no
syntactic trigger.
Briefly, divergences are differences in construction, whereas mismatches are differences between meanings that are similar, but not identical, from one language to another.
For instance, consider the following utterances:
Clearly, French and English lexicalize the concepts bake (i.e. inside an oven) and cook (i.e. on top of the stove) differently. English has two words for these two concepts while French has only one. In the context of these sentences, however, there is no ambiguity in the meaning.
One could choose to give cuire two meanings, each with its own selectional restrictions, matching those of bake and cook respectively. However, even if cuire seems ambiguous from an English perspective (because of the different lexicalisations), it is by no means ambiguous in French. Encoding two entries, cuire-V1 and cuire-V2, therefore goes against our French monolingual intuition.
It has been argued (Pustejovsky, 1995, among others) that a sense
enumeration approach fails to account for the creative use of words
in novel contexts.
It is indeed impossible to list exhaustively the meanings of every
single word or complex expression (consider, for instance, the
treatment of metonymy or of the whole range of metaphors).
Nevertheless, we claim that by exploiting most of the information
listed in lexicons (such as those described in Viegas and Nirenburg,
1995a, and Viegas et al., 1996) and embodying it in processing
methods, we can produce on the fly new meanings that are not listed in
any lexicon entry.
To do so, we focus below on the minimal lexical semantic information that must be encoded in the lexicon to allow the system to make the best lexical choice.
The information contained in a lexeme is divided into at least 10 zones corresponding to various levels of lexical information: phonology, orthography, morphology, syntax-semantics linking, stylistics, and paradigmatic and syntagmatic information, along with sub-zones containing triggers for analysis and generation (see Meyer et al., 1990).
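As an illustration only, the zone structure can be pictured as a record with one slot per zone. The sketch below is a hypothetical Python rendering, not the representation of Meyer et al. (1990): it names only the seven zones listed above (the remaining ones are not enumerated here), and the `Lexeme` class and its `set_zone` method are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical zone names, taken from the zones listed in the text; the
# full inventory of at least 10 zones is not reproduced here.
ZONES = (
    "phonology",
    "orthography",
    "morphology",
    "syn_sem_linking",
    "stylistics",
    "paradigmatic",
    "syntagmatic",
)


@dataclass
class Lexeme:
    lemma: str
    # One dictionary per zone, all initially empty.
    zones: Dict[str, dict] = field(default_factory=lambda: {z: {} for z in ZONES})
    # Trigger sub-zones, kept separate for analysis and for generation.
    analysis_triggers: List[str] = field(default_factory=list)
    generation_triggers: List[str] = field(default_factory=list)

    def set_zone(self, zone: str, **info) -> None:
        """Add information to an existing zone; unknown zones are rejected."""
        if zone not in self.zones:
            raise KeyError(f"unknown zone: {zone}")
        self.zones[zone].update(info)


cuire = Lexeme("cuire")
cuire.set_zone("orthography", form="cuire")
cuire.set_zone("morphology", pos="V")
```

Keeping the zones as a closed set makes a misspelled zone name fail loudly instead of silently creating a new zone.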
Let us now consider the partial analysis-lexicon entry for the French
verb cuire, whose meaning is the concept COOK, shown in
Figure 1.
Figure 1: Partial Sense Entry for the French lexical item
cuire.
The entry for cuire restricts its agent to HUMAN and its theme
to FOOD.
In fact, some of these constraints may already be part of the
conceptual frame for COOK, in which case they cost the lexicographer
no extra acquisition effort.
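The point about constraints living in the conceptual frame can be made concrete with a toy check. In this sketch the is-a links, the frame for COOK, and the helper names (`is_a`, `restrictions`, `satisfies`) are all hypothetical; the entry for cuire stores no restrictions of its own and inherits HUMAN and FOOD from the frame.

```python
# Hypothetical toy ontology: each concept maps to its parent (is-a link).
ISA = {
    "HUMAN": "ANIMAL",
    "ANIMAL": "OBJECT",
    "FOOD": "OBJECT",
    "BREAD": "FOOD",
    "CHEF": "HUMAN",
    "STONE": "OBJECT",
}

# Assumed conceptual frame for COOK in the ontology.
FRAMES = {"COOK": {"agent": "HUMAN", "theme": "FOOD"}}

# Lexicon entry: only entry-specific restrictions are stored here; the
# rest are inherited from the frame, so nothing is re-entered by hand.
CUIRE = {"sem": "COOK", "restrictions": {}}


def is_a(concept, ancestor):
    """Walk up the is-a chain; a missing filler (None) never matches."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


def restrictions(entry):
    """Frame restrictions, overridden by any entry-specific ones."""
    merged = dict(FRAMES.get(entry["sem"], {}))
    merged.update(entry["restrictions"])
    return merged


def satisfies(entry, fillers):
    """True if every restricted role has a filler of the right type."""
    return all(is_a(fillers.get(role), c) for role, c in restrictions(entry).items())


print(satisfies(CUIRE, {"agent": "CHEF", "theme": "BREAD"}))
```

Merging frame and entry restrictions in one place means an entry only needs to state what differs from its concept.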
Conversely, our generation lexicons are indexed by concepts from an ontology (world model), as described by Mahesh and Nirenburg (1995), and also by interlingua structures (such as attitudes and relations). The major advantages of using an ontology are that it enables knowledge sharing across natural languages, thus supporting multilinguality, and that it minimizes problems linked to mismatches.
Figure 2 gives some relevant fragments of the entry for COOK in the
French generation lexicon. We focus on the syntax-semantics interface,
namely SYN (subcategorisation information) and SEM (semantic
information with its associated selectional restrictions).
Figure 2: Partial Entry in the French generation lexicon
for the concept COOK.
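Since the contents of Figure 2 are not reproduced here, the following is only a hypothetical rendering of how a SYN/SEM pair can be linked through shared variables. The `$var1`/`$var2` notation, the dictionary layout, and the `realise` function are assumptions; the sketch maps a meaning frame onto syntactic positions and does not inflect the verb.

```python
# Hypothetical generation-lexicon entry: SYN and SEM share variables
# ($var1, $var2) that link syntactic positions to semantic roles.
ENTRY_COOK_CUIRE = {
    "lexeme": "cuire",
    "cat": "V",
    "syn": {"subj": "$var1", "obj": "$var2"},
    "sem": {"concept": "COOK", "agent": "$var1", "theme": "$var2"},
}


def realise(entry, meaning):
    """Map a meaning frame onto the entry's syntactic positions.

    Returns None if the entry does not express the frame's concept.
    """
    sem = entry["sem"]
    if meaning.get("concept") != sem["concept"]:
        return None
    # Invert SEM so each shared variable points to its semantic role.
    var_to_role = {var: role for role, var in sem.items() if role != "concept"}
    syntax = {}
    for position, var in entry["syn"].items():
        syntax[position] = meaning.get(var_to_role[var])
    return {"head": entry["lexeme"], **syntax}


print(realise(ENTRY_COOK_CUIRE, {"concept": "COOK", "agent": "Jean", "theme": "le poulet"}))
```

Going through the shared variables, rather than hard-wiring "agent is subject", lets other entries for the same concept link the roles differently.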
Our transcategorial approach to sense discrimination provides a good basis for paraphrasing: the concept COOK from the ontology can be lexicalised in our French lexicon as, at least, cuire, dorer, laisser mijoter... (verbs), cuisson, dorure (nouns), and cuit (adjective). Moreover, this approach renders problems linked to divergences vacuous (Viegas and Nirenburg, 1995b).
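A minimal sketch of what indexing by concept buys for paraphrasing, assuming a flat table of (lexeme, category) pairs: the entries are the ones listed above (the source list is explicitly non-exhaustive), while `FRENCH_LEXICON` and `lexicalisations` are hypothetical names. A generator that must produce a noun rather than a verb can still express COOK.

```python
# Hypothetical concept-indexed table built from the lexicalisations listed
# in the text; the list in the source is explicitly not exhaustive.
FRENCH_LEXICON = {
    "COOK": [
        ("cuire", "V"),
        ("dorer", "V"),
        ("laisser mijoter", "V"),
        ("cuisson", "N"),
        ("dorure", "N"),
        ("cuit", "Adj"),
    ],
}


def lexicalisations(concept, category=None):
    """All lexemes expressing a concept, optionally filtered by category."""
    options = FRENCH_LEXICON.get(concept, [])
    return [lex for lex, cat in options if category is None or cat == category]


print(lexicalisations("COOK", "N"))
```

Because the category is a filter applied at choice time rather than part of the index, a change of category (the typical divergence) is just another paraphrase option.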
As far as mismatches are concerned, we extended our processing
mechanisms to search the ontology at run time for the best match,
using generalization and specialization mechanisms that try to fit
the input to the most appropriate level of generality of the concepts
involved.
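The generalization/specialization search can be sketched on a toy ontology. Everything below is assumed for illustration: the concepts BAKE and FRY as children of COOK, their defining instrument features, the two tiny lexicons, and the function names. Specialization descends to the most specific lexicalised sub-concept whose defining features the input satisfies; generalization climbs until a lexicalised concept is found.

```python
# Hypothetical toy ontology and lexicons, for illustration only.
PARENT = {"BAKE": "COOK", "FRY": "COOK"}
FEATURES = {"BAKE": {"instrument": "OVEN"}, "FRY": {"instrument": "PAN"}}
EN = {"COOK": "cook", "BAKE": "bake", "FRY": "fry"}
FR = {"COOK": "cuire"}


def children(concept):
    return [c for c, p in PARENT.items() if p == concept]


def specialise(concept, props, lexicon):
    """Descend to the most specific lexicalised sub-concept whose
    (non-empty) defining features all appear in the input properties."""
    for child in children(concept):
        feats = FEATURES.get(child, {})
        if feats and all(props.get(k) == v for k, v in feats.items()):
            deeper = specialise(child, props, lexicon)
            if deeper in lexicon:
                return deeper
            if child in lexicon:
                return child
    return concept


def generalise(concept, lexicon):
    """Climb the is-a chain until a lexicalised concept is found (or None)."""
    while concept is not None and concept not in lexicon:
        concept = PARENT.get(concept)
    return concept


def best_match(concept, props, lexicon):
    target = specialise(concept, props, lexicon)
    if target not in lexicon:
        target = generalise(target, lexicon)
    return None if target is None else lexicon[target]


print(best_match("COOK", {"instrument": "OVEN"}, EN))
print(best_match("BAKE", {}, FR))
```

The same input thus yields bake for English but falls back to cuire for French, which has no lexeme specific to BAKE.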
On the other hand, if the concepts BAKE and COOK must both appear in the same sentence, as in:
then only planning techniques can help us generate additional arguments, such as cuire au four ("cook in the oven"), since we do not want to generate:
This last example emphasizes that choosing between cuire and cuire au four is a matter of lexical choice that cannot be made out of context, as we detail in the next section.
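To make the context dependence concrete, here is a deliberately simple collision check. It is not a planner and is not meant to stand in for the planning techniques mentioned above. It lexicalises each concept by generalization and, when two different concepts in the same sentence land on the same lexeme, adds a distinguishing argument to the more specific one. The `DISTINGUISHER` table and all names are hypothetical, and concepts absent from the toy ontology are not handled.

```python
# Hypothetical toy data: BAKE generalises to COOK, which French lexicalises
# as cuire; "au four" is the phrase assumed to express BAKE's oven feature.
PARENT = {"BAKE": "COOK"}
FR = {"COOK": "cuire"}
DISTINGUISHER = {"BAKE": "au four"}


def generalise(concept):
    while concept is not None and concept not in FR:
        concept = PARENT.get(concept)
    return concept


def lexicalise_all(concepts):
    """Lexicalise each concept; when two different concepts share a lexeme,
    add a distinguishing argument to those that have one."""
    words = [FR[generalise(c)] for c in concepts]
    result = []
    for c, w in zip(concepts, words):
        clash = any(w == w2 and c != c2 for c2, w2 in zip(concepts, words))
        if clash and c in DISTINGUISHER:
            w = f"{w} {DISTINGUISHER[c]}"
        result.append(w)
    return result


print(lexicalise_all(["BAKE"]))
print(lexicalise_all(["BAKE", "COOK"]))
```

In isolation BAKE is rendered simply as cuire; only when COOK competes for the same lexeme does au four get added, which is the context sensitivity the text describes.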