The format for lexeme names is the character "+", followed by the dictionary form of the lexeme, followed by "-" and a sense indicator: an abbreviation of the syntactic category (e.g., v, n, adj) and a number indicating which sense within that syntactic category is meant. For example, +eat-v2 is the lexeme label for the second sense of the verb eat.
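The naming convention can be illustrated with a small parser. This is only a sketch in Python, not part of the system described here: the names LexemeLabel and parse_lexeme_label are hypothetical, and the regular expression assumes that the category is written with letters only and the sense number with digits only.

```python
import re
from typing import NamedTuple


class LexemeLabel(NamedTuple):
    form: str      # dictionary form, e.g. "eat"
    category: str  # syntactic category abbreviation, e.g. "v"
    sense: int     # sense number within that category


# "+", dictionary form, "-", category letters, sense digits.
# The greedy (.+) lets the dictionary form itself contain hyphens.
# Assumption: categories are alphabetic and sense numbers are decimal.
_LABEL = re.compile(r"^\+(.+)-([A-Za-z]+)(\d+)$")


def parse_lexeme_label(label: str) -> LexemeLabel:
    """Split a label such as +eat-v2 into its three parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a lexeme label: {label!r}")
    form, category, sense = match.groups()
    return LexemeLabel(form, category, int(sense))


print(parse_lexeme_label("+eat-v2"))
print(parse_lexeme_label("+Paris-n1"))
```

The parser keeps the case of the label as written, so +EAT-V1 and +eat-v1 yield different strings; whether labels should be compared case-insensitively is left open here.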
A proper name entered into the lexicon references an Onomasticon entry in its lexical semantics (defined below), but otherwise appears just like any other lexicon entry. For example, +Paris-n1 might be the label for the lexical item Paris which names the city Paris, France. This arrangement allows language-independent world knowledge to be maintained independently of language-specific nomenclature (which, in turn, affects its phonology, morphology, syntactic behavior, etc.).
The CAT, MORPH, SYN, and SYN-STRUC zones are all referenced during syntactic parsing (including segmentation, tokenization, and morphological analysis), which, in practice, precedes the local invocation of semantic analysis in the processing model described below. The SEM zone is a focus of interest because it is the locus of interaction with the ontology (and onomasticon) knowledge base, and thus the source of many of the building blocks of the eventual meaning representation in TMR. Other sections in this document, as well as some of the papers, discuss the formalism used for lexical semantic specification in the SEM zone; the use of that specification is discussed in subsequent sections.
The syntactic parse structure used in the system described here is a modification of a Lexical-Functional Grammar (LFG) f-structure level parse representation (although it is referred to here simply as an f-structure). In the syntactic parses, the traditional LFG f-structure is augmented by a ROOT identifier (akin to the labelling of a node in a tree structure); at each level of the structure, the ROOT identifier is followed by the wordsense identifier for the relevant word in the parse. The representation can be thought of as a list representation of a (possibly recursive) feature structure, where each attribute name is followed by either a symbol value or another (embedded) f-structure. For example, the f-structure below is the preferred parse of the sentence "The old man ate a doughnut in the shop."
((ROOT +EAT-V1)
 (MOOD DECL) (VOICE ACTIVE) (NUMBER S3)
 (CAT V) (TENSE PAST) (FORM FINITE)
 (SUBJ ((ROOT +MAN-N1)
        (NUMBER S3) (CAT N)
        (PROPER -) (COUNT +) (CASE NOM)
        (DET ((ROOT +THE-DET1) (CAT DET)))
        (MODS ((ROOT +OLD-ADJ1) (CAT ADJ)
               (ATTRIBUTIVE + -)))))
 (OBJ ((ROOT +DONUT-N1)
       (NUMBER S3) (CAT N) (PROPER -) (COUNT +)
       (DET ((ROOT +A-DET1) (CAT DET)))))
 (PP-ADJUNCT ((ROOT +IN-PREP1)
              (CAT PREP)
              (OBJ ((ROOT +SHOP-N1)
                    (NUMBER S3) (CAT N)
                    (PROPER -) (COUNT +)
                    (DET ((ROOT +THE-DET1)
                          (CAT DET))))))))
The same structure may also be viewed in the (perhaps more familiar) typed feature structure matrix shown in Figure 4A. [Figure 4A not available.]
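Because the list representation is fully parenthesised, it can be read mechanically into a nested data structure. The following Python sketch shows one way to do so; the function names (tokenize, read, to_fs, parse_fs) are hypothetical, and the choice of nested dictionaries, with a tuple for an attribute that carries several symbols such as (ATTRIBUTIVE + -), is an assumption made for illustration rather than the representation used by the system.

```python
from collections import deque


def tokenize(text):
    """Split a parenthesised string into '(', ')' and symbol tokens."""
    return deque(text.replace("(", " ( ").replace(")", " ) ").split())


def read(tokens):
    """Read one expression from the token queue into nested Python lists."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.popleft()
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token  # a bare symbol such as ROOT or +MAN-N1
    items = []
    while tokens and tokens[0] != ")":
        items.append(read(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.popleft()  # discard the closing ')'
    return items


def to_fs(pairs):
    """Convert a list of [ATTR, value, ...] lists into a nested dict."""
    fs = {}
    for attr, *values in pairs:
        if len(values) == 1 and isinstance(values[0], list):
            fs[attr] = to_fs(values[0])  # embedded f-structure
        elif len(values) == 1:
            fs[attr] = values[0]         # single symbol value
        else:
            fs[attr] = tuple(values)     # several symbols, e.g. (ATTRIBUTIVE + -)
    return fs


def parse_fs(text):
    """Parse one f-structure written in list notation."""
    tokens = tokenize(text)
    fs = to_fs(read(tokens))
    if tokens:
        raise ValueError("trailing input after f-structure")
    return fs


# A fragment of the parse: the subject, without its number and case features.
TEXT = """
((ROOT +MAN-N1) (CAT N)
 (DET ((ROOT +THE-DET1) (CAT DET)))
 (MODS ((ROOT +OLD-ADJ1) (CAT ADJ) (ATTRIBUTIVE + -))))
"""
man = parse_fs(TEXT)
print(man["DET"]["ROOT"])
```

A reader of this kind also reports unbalanced parentheses, which is a cheap sanity check on hand-edited f-structures.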
Since f-structures do not indicate linear order, the fs-pattern merely indicates a piece of f-structure large enough to establish all necessary dependencies. Thus, in the simple case, the fs-pattern for a verb will indicate the arguments for which the verb subcategorizes. In LFG f-structures, all arguments (including subjects) are immediate children of the verb node, so the selection in the fs-pattern is for elements which are descendants of the current lexeme in the f-structure tree. However, we also use the same mechanism for syntactic relationships other than arguments. So adjectives and prepositions, for example, select (in their respective fs-patterns) for the syntactic head which they modify (in addition, prepositions select for their arguments).
In the fs-patterns, we place variables at the ROOT positions selected for by the lexeme in question, which is identified by the variable $var0; this allows the fs-patterns to be inherited (using the CLASS mechanism described above). Subsequently numbered variables ($var1, $var2, ...) identify other nodes in the f-structure with which the current lexeme has syntactic or semantic dependencies. For example, the fs-pattern below is appropriate for any regular monotransitive verb:
((root $var0)
 (subj ((root $var1) (cat n)))
 (obj ((root $var2) (cat n))))
Or, viewed as a feature structure:
[figure missing]
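To make the role of the variables concrete, the following Python sketch matches an fs-pattern against an f-structure and collects the variable bindings. It is an illustration only: the function name match_pattern and the nested-dictionary encoding are hypothetical, symbols are written in upper case on the assumption that the notation is case-insensitive, and the sketch only matches downward from the current lexeme, so it does not cover patterns (such as those of adjectives) that select for a modified head.

```python
def match_pattern(pattern, fs, bindings=None):
    """Match an fs-pattern against an f-structure (both nested dicts).

    Returns a dict mapping $var names to ROOT values, or None if some
    attribute required by the pattern is missing or has a different value.
    Attributes present in the f-structure but absent from the pattern
    are ignored.
    """
    bindings = dict(bindings or {})
    for attr, want in pattern.items():
        if attr not in fs:
            return None
        have = fs[attr]
        if isinstance(want, dict):
            if not isinstance(have, dict):
                return None
            bindings = match_pattern(want, have, bindings)
            if bindings is None:
                return None
        elif isinstance(want, str) and want.startswith("$var"):
            # A variable matches any value, but must bind consistently.
            if bindings.get(want, have) != have:
                return None
            bindings[want] = have
        elif want != have:
            return None
    return bindings


# The monotransitive pattern, written in upper case.
MONOTRANSITIVE = {
    "ROOT": "$var0",
    "SUBJ": {"ROOT": "$var1", "CAT": "N"},
    "OBJ": {"ROOT": "$var2", "CAT": "N"},
}

# A reduced version of the parse of "The old man ate a doughnut in the shop."
PARSE = {
    "ROOT": "+EAT-V1", "CAT": "V", "TENSE": "PAST",
    "SUBJ": {"ROOT": "+MAN-N1", "CAT": "N", "CASE": "NOM"},
    "OBJ": {"ROOT": "+DONUT-N1", "CAT": "N"},
}

print(match_pattern(MONOTRANSITIVE, PARSE))
```

With these inputs, $var0 is bound to +EAT-V1, $var1 to +MAN-N1, and $var2 to +DONUT-N1; an f-structure lacking an OBJ would fail to match.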
The exact syntactic relationship of words in a sentence may be altered by syntactic transformations, valency changes, or movement rules; for this reason, we introduce this level of indirection (the variables) in the fs-patterns. Additional advantages of this mechanism include the ability to inherit fs-patterns from a hierarchy and a reduction in the work of assigning lexical-function <==> case role correspondences.
In lexicon entries for idioms, verbs with particles, non-compositional collocations, etc., the ROOT attribute in an fs-pattern may be followed by a specific lexeme rather than a variable. For example, the special sense of kick which defines the idiom kick the bucket will select for an OBJect with ROOT +bucket-n1, where +bucket-n1 is the lexeme identifier for a standard sense of the word bucket. Additionally, in the fs-pattern, the attribute-value pair is followed by the symbol null-sem, as follows: (ROOT +bucket-n1 null-sem), to indicate that this sense of bucket does not contribute to the semantics of the idiom. For idioms with internal semantic structure, such as spill the beans, spill will select for an OBJect which specifies (ROOT +beans-n3), meaning that this special sense of beans (meaning information) does contribute its meaning, as an idiom chunk, to the entire idiom. In both cases the specified root is obligatory, so in analysis the special sense in question will fail the syntactic parse if the selected-for root does not appear in the utterance. In generation, this special sense will be selected in the lexical selection process only if the meaning is appropriate.
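The difference between the two idiom types can be sketched in Python as follows. This is a hypothetical encoding, not the system's own: each ROOT value in a pattern is written as a tuple so that it can carry the null-sem marker, the function name fixed_root_contributors is invented for the example, and the verb labels +kick-v1 and +spill-v1 are placeholders, since the sense numbers of the idiomatic verb senses are not given here.

```python
NULL_SEM = "null-sem"


def fixed_root_contributors(pattern, fs):
    """Check the obligatory fixed ROOTs of an idiom pattern against a parse.

    Returns the list of fixed lexemes that contribute meaning (those not
    marked null-sem), or None if a selected-for root is absent, in which
    case this special sense fails the syntactic parse.
    """
    contributors = []
    for attr, want in pattern.items():
        have = fs.get(attr)
        if isinstance(want, dict):
            if not isinstance(have, dict):
                return None
            inner = fixed_root_contributors(want, have)
            if inner is None:
                return None
            contributors.extend(inner)
        elif attr == "ROOT":
            lexeme, *flags = want
            if lexeme.startswith("$var"):
                continue  # a variable accepts any root; bindings not tracked here
            if have != lexeme:
                return None  # the obligatory root is missing
            if NULL_SEM not in flags:
                contributors.append(lexeme)
        elif have != want:
            return None
    return contributors


KICK_THE_BUCKET = {"ROOT": ("$var0",),
                   "OBJ": {"ROOT": ("+bucket-n1", NULL_SEM)}}
SPILL_THE_BEANS = {"ROOT": ("$var0",),
                   "OBJ": {"ROOT": ("+beans-n3",)}}

# Verb labels below are placeholders for whatever the parser supplies.
kick_bucket = {"ROOT": "+kick-v1", "OBJ": {"ROOT": "+bucket-n1"}}
kick_ball = {"ROOT": "+kick-v1", "OBJ": {"ROOT": "+ball-n1"}}
spill_beans = {"ROOT": "+spill-v1", "OBJ": {"ROOT": "+beans-n3"}}

print(fixed_root_contributors(KICK_THE_BUCKET, kick_bucket))
print(fixed_root_contributors(SPILL_THE_BEANS, spill_beans))
print(fixed_root_contributors(KICK_THE_BUCKET, kick_ball))
```

An empty list means the pattern matched but none of its fixed lexemes contributes meaning, which is distinct from None, the failure case.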
The SYN-STRUC zone has two facets. If the word is syntactically regular (non-idiomatic, having no particles, etc.), then the CLASS facet is used to indicate which fs-pattern to inherit from the class hierarchy. If none of the class fs-patterns is appropriate for the lexeme in question, an fs-pattern may be specified locally in the LOCAL facet; in fact, both a class and a local fs-pattern may be specified, in which case the two fs-patterns are unified.
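Unification of a class fs-pattern with a local one can be sketched in Python as below. This is a simplified illustration under stated assumptions: fs-patterns are encoded as nested dictionaries, the function name unify is hypothetical, and variables are treated as ordinary atoms, so a variable unifies only with an identical variable rather than being bound to a lexeme as a full unifier would do. The (CASE NOM) constraint in the local pattern is chosen only for the example.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns a new dict containing the information from both, or None if
    the two assign conflicting atomic values to the same attribute.
    """
    result = dict(a)
    for attr, b_val in b.items():
        if attr not in result:
            result[attr] = b_val  # information only in b is added
            continue
        a_val = result[attr]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            merged = unify(a_val, b_val)
            if merged is None:
                return None
            result[attr] = merged
        elif a_val != b_val:
            return None  # clash between atoms, or between atom and structure
    return result


# Pattern inherited through the CLASS facet (the monotransitive pattern).
CLASS_PATTERN = {
    "ROOT": "$var0",
    "SUBJ": {"ROOT": "$var1", "CAT": "N"},
    "OBJ": {"ROOT": "$var2", "CAT": "N"},
}

# Hypothetical LOCAL facet adding a constraint on the subject.
LOCAL_PATTERN = {"SUBJ": {"CASE": "NOM"}}

print(unify(CLASS_PATTERN, LOCAL_PATTERN))
print(unify(CLASS_PATTERN, {"SUBJ": {"CAT": "V"}}))
```

In the first call the subject ends up with ROOT, CAT, and CASE; in the second the conflicting CAT values make unification fail, so the result is None.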
In addition to specifying syntactic dependency structure, the fs-pattern also indicates an interaction with the meaning pattern from the SEM zone, in that certain portions of the meaning pattern for a phrase or clause are regularly and compositionally determined by the semantics of the components (Principle of Compositionality). The structure of the resulting meaning pattern is determined not only by the meaning patterns of each of the components, but also by their syntactic relationship in the f-structure.