Features on Nodes
· Position (wpos) The linear position of the word in the sentence. This should not be modified or annotated, except for new empty nodes created by the annotator, which should always be given the wpos 100.
· Word (lex.) This is the inflected word form associated with the node. It is almost always correctly displayed already. Example: "anunció" (he/she announced)
· Part-of-Speech (POS). This is the lexical class, taken from a short list. Specific options:
o V -- verbs
o N -- common nouns
o PN -- proper nouns
o Adj -- adjectives
o Adv -- adverbs
o P -- prepositions and subordinating conjunctions
o Pron -- pronouns
o Num -- numerals
o G -- ?
o Conj -- coordinating conjunctions, but not subordinating conjunctions; also includes the comma used in enumerations instead of repeated "y"
o Det -- determiners
o Pun -- punctuation marks, but not the comma used in conjunctions
o Sym -- various symbols (%, =, and the like)
o Uh -- speech-specific sounds, even if meaningful (such as /UH HUH/)
o Misc -- everything else, including greetings (Hola) and interjections (Okay)
· Citation form. This is the citation form (lexeme) of the inflected form. A first "guess" will be included, which needs to be checked and corrected. Example: “demostró” (he/she showed) <demostrar> (show)
· Morphological Features (Morph). A complete specification of the morphological features needed to derive the inflected word form from the base form. The options are grouped by part-of-speech; in the menu of the GRAPH tool, all options are displayed at once and the GRAPH tool does not enforce a proper choice of features given the part-of-speech. Possibilities are:
o NOUNS (including proper nouns and determiners)
o num – number: singular or plural
o gen -- gender: masculine, feminine or common (“<c>”). Common gender is when a word ends in “–e”, such as “presente” (present) or “estudiante” (student).
o det -- definite: the definite article ("el", "la") always precedes nouns in Spanish.
o VERBS (including auxiliaries: “ser” (be) for the passive voice; “estar” (be) for progressive tenses, and “haber” (have) for perfect tenses)
o tense -- present, preterit, conditional, future: "canto" (I sing), "canté" (I sang)
o aspect -- progressive: “realizando” (carrying out); or perfect: “ha dejado” (has left)
o voice -- active or passive: "fueron alquilados" (they were rented)
o mood -- indicative, subjunctive, conditional or imperative
o num -- singular or plural
o per -- person. In Spanish the subject is often omitted because the ending of the verb designates the person. So, in "Igualmente señaló" (He/She also pointed out), we can assume that the subject is a third person
o pcp -- past participle "retenido" (retained)
o ger -- gerund "pensando" (thinking)
o inf -- base form when used in infinitive "reducir" (reduce)
o refle -- reflexive form of the verb "se instalaron" (they installed themselves)
o imper -- impersonal structure of the verb "se ve" (it is seen)
o ADJECTIVES and ADVERBS
o comp -- morphological comparative. In Spanish, there are four words that have a morphological comparative form (See “Adjectives and Adverbs” sections for more details).
o sup – morphological superlative. In Spanish, there are four words that have a morphological superlative form (See “Adjectives and Adverbs” sections for more details).
o In comparatives and superlatives formed with más (more) and menos (less), leave the adjective unlabelled.
· Deep role (DRole). This is the role of the node with respect to its mother, in some deeper representation. This is a little murky. We will use strictly syntactic criteria. Specifically, DRole is different from SRole only if there is a form of the verb in which it is realized with more arguments. DRole reflects the argument patterns of the verb if it were in its active, non-ergative form.
o Root. This is the main word (usually the verb) in a sentence.
o Subj -- deep subject. The surface subject, except for passives or ergative verbs (fueron inscritas), in which case there is no deep subject or it is expressed (for passives) by the by phrase. The deep subject may, but need not, agree in person and number with the tensed verb; empty surface subjects are like overt surface subjects in that they may or may not be deep subjects.
o Obj -- deep object. This will most often not have a preposition associated with it in the underlying form (on the surface it might). The deep object is in surface subject position for passives and ergatives. Obj is also the deep role of a complement of a preposition. There is an exception to the no-preposition tendency in Spanish: the “personal” a. In Spanish, we use “a” before a noun object when the object is a definite person or a personified thing. Examples: ¿A quién ve Vd.? (Whom do you see?); Veo a Juan (I see John).
o Obj2 -- deep indirect object.
o PObj -- deep prepositional object. An object that is always dominated by a preposition. Example: poner el libro (Obj) sobre (PObj) la mesa (Obj) (put the book on the table) but dar el libro (Obj) a Antonio (Obj2) (give the book to Anthony).
o Mod -- all adjuncts, including modifiers, auxiliaries, appositions, and the like.
· Done. This feature is only a check to make sure that the default values have been checked. Set it to "Y" when you are done with the features for one node.
If DRole is omitted, it is assumed to be the same as SRole (which is frequently the case). There is a certain redundancy among these features, which is intended (for error checking).
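The feature inventory above can be summarised as a small data structure. The following is a minimal sketch in Python and is not part of the GRAPH tool: the names Node, MORPH_BY_POS, check_morph and effective_drole are hypothetical, and the SRole value used in the example is an assumption. It illustrates two points from the list: a check that the GRAPH tool does not enforce (the morphological features must fit the part-of-speech), and the rule that an omitted DRole is taken to be the same as SRole.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Morphological feature names allowed per POS, following the lists above.
NOUN_FEATS = {"num", "gen", "det"}
VERB_FEATS = {"tense", "aspect", "voice", "mood", "num", "per",
              "pcp", "ger", "inf", "refle", "imper"}
DEGREE_FEATS = {"comp", "sup"}

MORPH_BY_POS: Dict[str, Set[str]] = {
    "N": NOUN_FEATS, "PN": NOUN_FEATS, "Det": NOUN_FEATS,
    "V": VERB_FEATS,
    "Adj": DEGREE_FEATS, "Adv": DEGREE_FEATS,
}


@dataclass
class Node:
    wpos: int                      # linear position in the sentence
    word: str                      # inflected word form
    pos: str                       # part-of-speech tag
    lexeme: str                    # citation form
    morph: Dict[str, str] = field(default_factory=dict)
    srole: Optional[str] = None
    drole: Optional[str] = None    # None means "not annotated"
    done: str = "N"


def check_morph(node: Node) -> List[str]:
    """Return the morph feature names that do not fit the node's POS.

    A POS with no entry in MORPH_BY_POS is assumed to take no features.
    """
    allowed = MORPH_BY_POS.get(node.pos, set())
    return sorted(name for name in node.morph if name not in allowed)


def effective_drole(node: Node) -> Optional[str]:
    """DRole defaults to SRole when it has not been annotated."""
    return node.drole if node.drole is not None else node.srole


# Example: a verb wrongly given a gender feature.
anuncio = Node(wpos=3, word="anunció", pos="V", lexeme="anunciar",
               morph={"tense": "preterit", "gen": "masculine"}, srole="Subj")
print(check_morph(anuncio))      # ['gen']
print(effective_drole(anuncio))  # Subj
```

A check of this kind would be run before setting Done to "Y" for a node.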
Verbs are heads of sentences and clauses.
· Choosing a head
The head of any complete clausal utterance is the main verb.
When the main verb is a form of the copula, the head of the clause will vary depending on the type of copular sentence. Predicative copular constructions will have the predicate as their head. Equative copular constructions will have the copula as their head. There are two verbs in Spanish that correspond to the English copula in predicative copular constructions: ser and estar. So, the copula used will be listed under the features of the headword. For equative copular constructions, only “ser” can be used.
· Grammatical Relations
The role of each argument (subject, object, indirect object) must be annotated as a feature of its node. The deep grammatical relations should be annotated particularly when there is a functional role reversal, i.e. a mismatch between surface subject and deep subject. There are three possible cases:
A) In an impersonal construction (e.g. "se están realizando proyectos", projects are being carried out), the surface subject should be annotated as the deep object. A created node, "<pro>", will become the deep subject. We delete the node for the pronoun "se", and include "imp" as a feature of the verb.
B) In a reflexive construction, we should distinguish between two different kinds of reflexive verbs: "real" reflexive verbs, in which the subject and the object are the same person, such as "matarse" (kill oneself), and "inherent" reflexives. In "real" reflexives, the surface subject should be annotated as both deep subject and deep object. "Inherent" reflexives are verbs that are conjugated like a reflexive verb (e.g. "enfrentarse a", to face), but that have an object other than the subject: "[...] sus grandiosos proyectos fracasaron tras enfrentarse a interlocutores cambiantes y a menudo adversarios..." (their grandiose projects failed upon encountering changing and frequently adversary interlocutors). For "inherent" reflexives, we should not include an empty node for the pronoun "se", but list it next to the verb: "<enfrentar+se>" and annotate the verb with the feature "refle".
C) In a passive construction (e.g. "fueron alquilados", they were rented), the surface subject should be annotated as the deep object. The grammatical subject (usually the patient) should not be annotated as the deep subject; instead, an empty "<pro>" node should be included.
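Cases A to C can be sketched as a single function. This is only an illustration under stated assumptions: nodes are plain dictionaries, the key names ("droles", "morph") and the construction labels passed as kind are invented for the sketch, and deleting the "se" node in case A is left to the caller. The feature name "imp" follows case A as written; the morphological feature list spells the impersonal feature "imper".

```python
from typing import Any, Dict, List

PRO = "<pro>"


def make_pro(drole: str) -> Dict[str, Any]:
    """Create an empty <pro> node carrying the given deep role."""
    return {"word": PRO, "lexeme": PRO, "droles": [drole]}


def annotate_role_reversal(kind: str, verb: Dict[str, Any],
                           subject: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Annotate deep roles for one verb and its surface subject.

    kind is "impersonal", "passive", "reflexive" or "inherent_reflexive"
    (hypothetical labels). Returns the empty nodes that were created.
    """
    morph = verb.setdefault("morph", [])
    if kind == "impersonal":
        # Case A: surface subject is the deep object, <pro> is the deep subject.
        subject["droles"] = ["Obj"]
        morph.append("imp")
        return [make_pro("Subj")]
    if kind == "passive":
        # Case C: same role reversal, no extra verb feature added here.
        subject["droles"] = ["Obj"]
        return [make_pro("Subj")]
    if kind == "reflexive":
        # Case B, "real" reflexive: subject is both deep subject and deep object.
        subject["droles"] = ["Subj", "Obj"]
        return []
    if kind == "inherent_reflexive":
        # Case B, "inherent" reflexive: no empty node; "se" joins the lexeme.
        verb["lexeme"] = "<" + str(verb["lexeme"]) + "+se>"
        morph.append("refle")
        return []
    raise ValueError("unknown construction: " + kind)


verb = {"word": "enfrentarse", "lexeme": "enfrentar"}
subject = {"word": "proyectos", "lexeme": "proyecto"}
annotate_role_reversal("inherent_reflexive", verb, subject)
print(verb["lexeme"], verb["morph"])  # <enfrentar+se> ['refle']
```

In the inherent reflexive case the subject's deep role is left unset, so it falls back to its surface role.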
· Non-finite clauses
Verb phrases without subjects
Non-finite verb phrases (gerunds, i.e. present participles, and infinitives) can appear with or without subjects (1 and 2). Past participles (3) can only appear without subjects.
1. [...] liberar los mercados permitiría un aumento en el nivel de vida... ([…] freeing up the markets would allow for an increase in the standard of living…)
2. [...] el acuerdo suscrito a nivel mundial por los productores desde octubre pasado para defender los precios. ([…] the agreement signed at the world level by producers last October to defend prices.)
3. Según un estudio realizado por la empresa "Russian Real Estate", especializada en la inmobiliaria de oficinas... (According to a study performed by the "Russian Real Estate" business, specializing in office space real estate…)
When they appear without subjects, an empty "<pro>" node should be included as a dependent of the verb. In general, non-finite clauses will be dependents of main verbs.
· There-insertion
In Spanish, the verb "haber" (there is, there are) is conjugated in the 3rd person (singular and plural) to indicate existence.
1. [...] hubo la convicción... ([…] there was always the conviction…)
2. No habrá liberación masivas del Café retenido… (There Will Not Be A Massive Release of Stockpiled Coffee)
In this type of structure, the form of the verb “haber” is the head, and the surface subject is also the deep subject.
· Relative Clauses
A relative clause will be the dependent of whatever it modifies, in most cases a noun. As with other clauses, its main verb will be its own head. The relativizer will be a dependent of the main verb like any other argument.
1. [...] América Latina es una de las regiones del mundo con mayor grado de desigualdad, lo cual es fácilmente verificable... ([…] Latin America is one of the regions of the world with the greatest degree of inequality, which is easily verifiable…)
2. [...] una política que durante muchos años tuvo un "sesgo concentrador"... ([…] a policy which for many years had a "bias toward concentration"…)
3. La nueva Organización Común del Mercado de la Banana (OCMB), que entró en vigor en julio de 1993,... (The new Common Organization for Banana Marketing and Distribution (OCMB) which became operational in July 1993,…)
4. [...] Bélgica, Holanda y Alemania--cuyos puertos sirven de eje distribuidor... ([…]-Belgium, Holland and Germany-whose ports serve as the distribution center…)
· Passives
The surface vs. deep subject of a passive construction can be indicated through the use of the features. The grammatical (surface) subject, usually the patient, will be annotated as the deep object. We will create an empty "<pro>" node for the deep subject.
1. [...] los 300 metros cuadrados del tercer piso... fueron alquilados esta mañana... ([…] the 300 square meters on the third floor […] were rented this morning…)
2. Las mismas fueron inscritas en las ofertas de la Unión Europea en la Ronda Uruguay. (The same agreements were recorded in the European Union's proposals at the Uruguay Round.)
3. El presupuesto nacional de Bolivia ... fue promulgado recién este viernes por el presidente Gonzalo Sanchez de Lozada. (The Bolivian national budget… was made public this Friday by President Gonzalo Sanchez de Lozada)
· Nominal modifiers
The head of a noun phrase is the head noun. A definite determiner (el, la, los, las) will be included in the features of the head noun; any other determiner is a dependent of the head noun. Adjectives are separate dependents from determiners. If there are multiple adjectives, the default structure will simply have each adjective as a direct dependent of the noun. This is the case for multiple determiners also.
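The flat default structure for noun phrases can be sketched as follows. This is an illustration only: the function name build_np and the dictionary layout are hypothetical, and the feature value "definite" follows the det feature in the morphological feature list.

```python
from typing import Dict, List

DEFINITE_ARTICLES = {"el", "la", "los", "las"}


def build_np(head: str, determiners: List[str],
             adjectives: List[str]) -> Dict[str, object]:
    """Build a flat noun phrase: every kept modifier depends on the head noun."""
    morph: Dict[str, str] = {}
    dependents: List[str] = []
    for det in determiners:
        if det.lower() in DEFINITE_ARTICLES:
            # A definite determiner becomes a feature of the head, not a node.
            morph["det"] = "definite"
        else:
            dependents.append(det)
    # Each adjective is a direct dependent of the noun, separate from determiners.
    dependents.extend(adjectives)
    return {"head": head, "morph": morph, "dependents": dependents}


print(build_np("proyectos", ["sus"], ["grandiosos"]))
# {'head': 'proyectos', 'morph': {}, 'dependents': ['sus', 'grandiosos']}
print(build_np("mercados", ["los"], []))
# {'head': 'mercados', 'morph': {'det': 'definite'}, 'dependents': []}
```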
· Proper Nouns
Proper nouns should have the value PN for feature POS. They are treated largely like nouns, except that compound proper nouns are not analyzed syntactically as if they were common nouns, but rather given right-branching structures. So in América latina, América is the head, has POS PN, and carries the other features of this proper noun. Latina is a dependent of América (with SRole Adj), and also has POS PN.
· Quantifier headed NPs
In a noun phrase consisting of only a quantifier, the quantifier should be the head of the NP. Any modifying phrases are directly dependent on it.
Los Doce acordaron una tregua... (The Twelve agreed to a truce…)
Adjectives and adverbs will be coded in much the same way that nouns and verbs are coded. The same procedure is followed.
· General
Adverbs and adjectives point to modifying concepts -- adjectives for nouns, adverbs for verbs. For example, in the phrase "sus grandiosos proyectos" (their grandiose projects), the adjective "grandiosos" modifies the concept "proyectos" by identifying the scope of the projects. In "No se puede seguir aplicando un neoliberalismo a ultranza" (Neoliberalism at Any Price Is No Longer Applicable), the adverbial phrase "a ultranza" (at any price) modifies the verb by specifying the manner in which the action was performed.
· Degree
The degree of the modification can be specified by other modifiers such as "muy" (very) or "bien" (well), as in "muy demandados" (very demanded) or "bien presentes" (well established). These degree modifiers are also adverbs.
In addition, there are two kinds of degree specification that have a special form for irregular adjectives. They are known as the comparative and superlative forms. In the first, the degree of modification is specified by comparing the case in question to one other case: [...] las economías latinoamericanas comienzan a plantearse modelos con mayor pragmatismo,... ([…] Latin American economies are beginning to utilize more pragmatic models…) In the second, the degree of modification is specified by comparing to all other cases: "lo más importante" (the most important one).
In Spanish the comparative and superlative degrees are represented as follows. For the comparative form, the modifiers "más" (more) and "menos" (less) precede the adjective in its positive form: "soy más alto que Juan" (I’m taller than Juan), "soy menos inteligente que tú" (I’m less intelligent than you). In the superlative structure, "el/la/los/las" (the) and "más" or "menos" precede the adjective: "el más nuevo" (the newest one). (This method of representation is called "compositional" because the meaning is expressed through the "composition" of several words.)
In order to simplify the lookup procedure in Omega, and to allow for a common interlingual representation of degree, adjectives and adverbs will be shown in their base form (called their "positive degree"). If they are in the text as comparatives or superlatives, that will be indicated as a feature of their node.
There are a few words in Spanish that have irregular forms of comparison. These will also be represented in the parse tree in their base form. Below is a short list, with the positive form in capitals, followed by the irregular comparative and superlative forms.
BUENO/BUENA/BUENOS/BUENAS/BIEN (good/well) -- mejor (better) -- el/la mejor; los/las mejores (the best)
MALO/MALA/MALOS/MALAS/MAL (bad/badly) -- peor (worse) -- el/la peor; los/las peores (the worst)
JOVEN/JOVENES (young) -- menor (younger) -- el/la menor; los/las menores (the youngest)
VIEJO/VIEJA/VIEJOS/VIEJAS (old) -- mayor (older) -- el/la mayor; los/las mayores (the oldest)
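The reduction of irregular forms to their base form can be sketched as a lookup. This is a rough illustration, not the Omega lookup itself: the names IRREGULAR_DEGREE and normalize_degree are hypothetical, the masculine singular is used as the base form by assumption, and the rule "preceded by a definite article means superlative" is a simplification of the list above. The annotator still has to decide, for example, whether "mejor" goes back to "bueno" or to the adverb "bien". Adjectives compared with "más" or "menos" are returned unlabelled, as required above.

```python
from typing import Dict, Optional, Tuple

# Irregular comparative/superlative forms from the list above -> base form.
# Assumption: the masculine singular adjective stands in for the whole set.
IRREGULAR_DEGREE: Dict[str, str] = {
    "mejor": "bueno", "mejores": "bueno",
    "peor": "malo", "peores": "malo",
    "menor": "joven", "menores": "joven",
    "mayor": "viejo", "mayores": "viejo",
}
DEFINITE_ARTICLES = {"el", "la", "los", "las"}


def normalize_degree(word: str,
                     previous: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (base form, degree feature), where degree is "comp", "sup" or None.

    previous is the word immediately before the adjective, if any.
    """
    key = word.lower()
    if key not in IRREGULAR_DEGREE:
        # Regular adjectives (including those after "más"/"menos") stay unlabelled.
        return word, None
    after_article = previous is not None and previous.lower() in DEFINITE_ARTICLES
    return IRREGULAR_DEGREE[key], "sup" if after_article else "comp"


print(normalize_degree("mejor"))            # ('bueno', 'comp')
print(normalize_degree("peores", "los"))    # ('malo', 'sup')
print(normalize_degree("alto", "más"))      # ('alto', None)
```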
· Participial adjectives
Quite often participial forms of a verb will show up in syntactic positions also occupied by adjectives. Some adjectives also have the form of past participles, e.g., “alquilados” (rented), “dominado” (dominated). These participles and participial adjectives can appear
· (a) in post-nominal position:
* una tienda cerrada (a closed store)
* interlocutores cambiantes (changing interlocutors)
· (b) in copulative position:
* Los resultados fueron inesperados. (The results were unexpected.)
* Las cortinas están descoloridas. (The curtains are faded.)
The semantic distinction between participles and adjectives is that participles refer directly to the event denoted by the verb and cast the referent of the modified noun into one of the roles of that event. Adjectives, on the other hand, refer to a state that characterizes the referent of the modified noun.
It is not always easy to tell the difference. Here are some clues / tests to tell the difference:
(1) If you can add the adverb "muy" (very) in front of the participial form, then it is probably an adjective. For this test to work, however, the adjective must be scalar or gradable.
(2) If there are dependents on the participial form (a direct object, or an agent), then it is likely that it is a verb. Thus most postnominal modifiers will be verbs, since their position almost guarantees the presence of additional dependents.
(3) If the word is not listed in an on-line dictionary like Acontecer as an adjective, it is likely to be a verb.
(4) When in doubt, make your best guess and discuss the issue with Owen.
Participles, which are sometimes coded as adjectives, are generally coded here as verbs. Participles are the "-ando", "-iendo" and "-ado", "-ido" forms of the verb and are not main verbs. For example, in the sentence "Los interlocutores están cambiando" (The interlocutors are changing), "cambiando" is a present participle and modifies "interlocutores". Similarly, in the sentence "El presupuesto nacional de Bolivia... fue promulgado...tras un dilatado debate parlamentario ..." (The Bolivian national budget… was made public… after a prolonged parliamentary debate…), "dilatado" is a past participle and modifies "debate". Since these are coded as verbs, they will assign semantic roles.
· Copular Adjectives
See the manual section on "copular constructions" for how to handle such sentences as "En todo Moscú se están realizando proyectos, pero están lejos de ser satisfactorios..." (All over Moscow, projects are being carried out, but they are far from satisfying…)
Copular constructions
When the main verb is a form of the copula, the head of the clause will vary depending on the type of copular sentence. Predicative copular constructions will have the predicate as their head; e.g. “Juan es alto” (John is tall). Equative copular constructions will have the copula as their head; e.g. “Clark Kent es Superman” (Clark Kent is Superman). There are two verbs in Spanish that correspond to the English copula in Predicative copular constructions: ser and estar. So, the copula used will be listed under the features of the headword. For equative copular constructions, only “ser” can be used.
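The head choice for copular clauses can be written down as a small rule. This is a minimal sketch: the function name copular_head, the labels "predicative" and "equative" as arguments, and the feature key "copula" are hypothetical; deciding which type a sentence belongs to remains the annotator's job.

```python
from typing import Dict


def copular_head(copula: str, predicate: str, kind: str) -> Dict[str, object]:
    """Pick the head of a copular clause and record the copula where needed.

    copula is the lexeme "ser" or "estar"; kind is "predicative" or "equative".
    """
    if copula not in ("ser", "estar"):
        raise ValueError("not a copula: " + copula)
    if kind == "predicative":
        # The predicate heads the clause; the copula is kept as a feature.
        return {"head": predicate, "features": {"copula": copula}}
    if kind == "equative":
        if copula != "ser":
            raise ValueError("equative constructions only use 'ser'")
        # The copula itself heads the clause.
        return {"head": copula, "features": {}}
    raise ValueError("unknown copular type: " + kind)


print(copular_head("ser", "alto", "predicative"))
# {'head': 'alto', 'features': {'copula': 'ser'}}
print(copular_head("ser", "Superman", "equative"))
# {'head': 'ser', 'features': {}}
```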
Conjunction has its own part-of-speech (Conj). The conjunction (y, o, pero, etc.) (and, or, but, etc.) is placed as a dependent of the first conjunct with role Mod, and the second conjunct is a dependent of the conjunction with role Obj.
If a comma acts as a conjunction, it is treated as such (given part-of-speech Conj and analyzed as in the above paragraph): "[...] la región comienza a mostrar avances significativos, y prueba de ello es que en los últimos tres años no se habla ya de crisis, la inflación está cediendo..." (the region is beginning to show significant advances, and proof of that is that in the last three years there has no longer been any talk of a crisis, inflation is yielding…)
However, note that in ", prueba de ello es que..." (, proof of that is…) the comma does not serve as a conjunction (since there is an explicit "y", and), and it is removed at IL0. The last comma does serve as a conjunction.
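The attachment of a conjunction can be sketched as two dependency edges. This is an illustration only, covering a single pair of conjuncts: the function name coordinate and the (head, dependent, role) tuple layout are hypothetical. A comma that acts as a conjunction would be passed in the same way as "y".

```python
from typing import List, Tuple

# (head, dependent, role of the dependent with respect to the head)
Edge = Tuple[str, str, str]


def coordinate(first: str, conjunction: str, second: str) -> List[Edge]:
    """Attach a conjunction (POS Conj) and the second conjunct.

    The conjunction depends on the first conjunct with role Mod;
    the second conjunct depends on the conjunction with role Obj.
    """
    return [
        (first, conjunction, "Mod"),
        (conjunction, second, "Obj"),
    ]


print(coordinate("Holanda", "y", "Alemania"))
# [('Holanda', 'y', 'Mod'), ('y', 'Alemania', 'Obj')]
```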
Empty Nodes
An empty node is a node that does not correspond to a word (or other graphical manifestation such as a punctuation mark) in the input string.
In all cases, when you create an empty node, give it a wpos feature so that it ends up in a position that roughly corresponds to its grammatical function (i.e., if it is a subject, to the left of its governing verb, and so on).
There are (at least) two types of empty nodes.
Empty nominal nodes: big-PRO, and related cases
These are cases of empty nodes where the meaning can be derived from the syntactic context:
· Big-PRO is the missing subject in embedded infinitivals. For example, in "[...] un joint venture ruso-turco que acaba de terminar la construcción de un edificio...” ([…] a Russian-Turkish joint venture that has just finished construction on a small executive office building…), the implicit subject of "terminar" is (co-referential with) "un joint venture ruso-turco".
In these cases, we introduce an empty node and identify the node with which it is co-referential. We then copy the co-referential node's word and lexeme values to the empty node, but add brackets around the value: "<venture>".
Empty nominal nodes: little-pro, missing argument in passive, and related cases
· The missing argument in agentless passives. For example, in "[...] los 300 metros cuadrados... fueron alquilados esta mañana..." ([…] the three hundred meters… were rented this morning…), we know from syntax that there is another argument role which is not explicitly filled, namely the deep subject. This role is added at IL0. We cannot tell syntactically what this is, only pragmatically.
· Arbitrary empty subjects, usually in adjunct clauses. For example, in "[...]el liberar los mercados permitiría un aumento en el nivel de vida..." ([…] freeing up the markets would allow for an increase in the standard of living…), the subject of "liberar" (freeing up) is not specified.
· Missing subject implied in the verb. In the structure "Igualmente señaló que en mayo próximo..." (Likewise, it indicated that next May…), there is no explicit subject; however, we know that the subject is a third person singular. We introduce an empty node "<pro>" and we add to it the features of the verb.
In these cases, we label both the lexeme and the word feature of the new node "<pro>". In case of doubt ("<pro>" or "<venture>"), ask yourself: can I tell from syntax alone what this node means? If no, "<pro>". If yes, fill in the lexeme.
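The two kinds of empty nominal node can be sketched with one helper. This is a hypothetical illustration: the function name make_empty_node, the dictionary keys and the "empty" flag are invented for the sketch, and the wpos value is left to the caller.

```python
from typing import Dict, Optional


def make_empty_node(wpos: int,
                    antecedent: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Create an empty nominal node.

    If syntax alone identifies a co-referential node (big-PRO), pass it as
    antecedent: its word and lexeme are copied in angle brackets.
    Otherwise both word and lexeme are "<pro>".
    """
    if antecedent is None:
        word = lexeme = "<pro>"
    else:
        word = "<" + antecedent["word"] + ">"
        lexeme = "<" + antecedent["lexeme"] + ">"
    return {"wpos": wpos, "word": word, "lexeme": lexeme, "empty": True}


# Missing deep subject of an agentless passive: not recoverable from syntax.
print(make_empty_node(100))
# {'wpos': 100, 'word': '<pro>', 'lexeme': '<pro>', 'empty': True}

# Subject of "terminar", co-referential with "un joint venture ruso-turco".
print(make_empty_node(100, {"word": "venture", "lexeme": "venture"}))
# {'wpos': 100, 'word': '<venture>', 'lexeme': '<venture>', 'empty': True}
```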
Remove all punctuation, except meaningful punctuation. Examples:
· Quotes -- leave them (open and closed) attached to the constituent that is quoted.
· Commas that act as conjuncts (see Conjunction)
Do remove:
· All non-conjunction commas.
· All sentence-final punctuation.
· All dashes and so on.
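The punctuation rules can be sketched as a filter over tagged tokens. This is a minimal illustration under stated assumptions: tokens are dictionaries with "word" and "pos" keys, the set of quote characters is a guess at what occurs in the texts, and commas acting as conjunctions are assumed to have already been given POS Conj by the annotator, so they are never seen as Pun here.

```python
from typing import Dict, List

# Assumed inventory of quotation marks; extend as the corpus requires.
QUOTES = {'"', "“", "”", "«", "»"}


def filter_punctuation(tokens: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop all Pun tokens except quotation marks.

    Conjunction commas carry POS Conj and therefore pass through untouched.
    """
    kept = []
    for tok in tokens:
        if tok["pos"] != "Pun" or tok["word"] in QUOTES:
            kept.append(tok)
    return kept


tokens = [
    {"word": "“", "pos": "Pun"},
    {"word": "crisis", "pos": "N"},
    {"word": ",", "pos": "Conj"},   # comma acting as a conjunction
    {"word": ",", "pos": "Pun"},    # ordinary comma
    {"word": ".", "pos": "Pun"},    # sentence-final punctuation
]
print([t["word"] for t in filter_punctuation(tokens)])
# ['“', 'crisis', ',']
```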