Each node in the dependency tree can be thought of as an
attribute-value matrix, i.e., a bundle of features with values. All
values must be set for each node in the tree. This will require
checking each node before finishing the analysis. Here is a list of
features:
Position (wpos). The linear position of the word in the sentence. This should not be modified or annotated, except for new empty nodes created by the annotator, which should always be given a wpos feature which inserts the new node in the place where it would belong if it were not empty.
Word (lex). This is the inflected word form associated with the node. It is almost always correctly displayed already. Example: "anunció" (he/she announced)
Part-of-Speech (POS). This is the lexical class, taken from a short list. Example: verb. Specific options:
o V -- verbs, but not auxiliary verbs (=Aux)
o N -- common nouns
o PN -- proper nouns
o Adj -- adjectives
o Adv -- adverbs
o P -- prepositions and subordinating conjunctions
o Pron -- pronouns
o Conj -- coordinating conjunctions, but not subordinating conjunctions; also includes the comma used in enumerations as a substitute for "y" (and) or "o" (or)
o Det -- determiners
o Aux -- auxiliary verbs
o Pun -- punctuation marks, but not commas used as conjunctions
o Sym -- various symbols (%, =, and the like)
o Uh -- speech-specific sounds, even if meaningful (such as /ui/)
o Misc -- everything else, including greetings (hola, -Hello-) and interjections (vale -O.K.-)
Citation form. This is the base form (lexeme) of the inflected form. A first "guess" will be provided, which needs to be checked and possibly corrected. Example: for the inflected form “demostró” (he/she showed), the citation form is <demostrar> (show).
num -- number: singular or plural.
gen -- gender: masculine (m), feminine (f), or common (c). Common gender is for nouns which end in “–e”, such as estudiante (student), which may be either masculine or feminine.
det -- determiner: definite or indefinite. The definite article, el|la|los|las, always precedes the noun in Spanish. Sometimes, the definite article may also precede an infinitive: e.g. el liberar (freeing). The indefinite article un|una|unos|unas can also precede the noun in Spanish.
tense -- present (pres): demuestro (I prove); preterit (pret): demostró (he proved); future (fut): demostrará (he will prove).
aspect -- progressive (prog): estar realizando (be carrying out); or perfect (perf): haber dejado (have left).
voice -- active: alquilo (I rent); or passive (pas): fueron alquilados (they were rented).
mood -- indicative (ind), subjunctive (sub), conditional (cond) or imperative (imp).
num -- number: singular (sg) or plural (pl).
per -- person: first (1), second (2) or third (3). The subject is often omitted in Spanish but the ending of the verb indicates the person and number. So, in Igualmente señaló (He/She also pointed out), we can assume that the subject is a third person singular.
DOclitic -- direct object clitic pronoun: lo|los|la|las (him|them|her|them). In Spanish, the direct object is sometimes repeated. For example, in the clause "Esta inversión ... el gobierno la considera ..." ("This investment... the government considers it..."), both esta inversión (this investment) and la (it) fulfill the function of direct object (Obj). In such cases, we remove the node for lo (him), la (her), los|las (them) and will add "DOclitic: lo|la|los|las" to the features of the verb. If the direct object is not doubled, then the clitic stays as a separate argument of the verb.
IOclitic -- indirect object clitic pronoun: le or les. In Spanish, the indirect object is often repeated, usually for emphasis. For example, in the clause "[...] a los Arabigos todavía les queda un camino por recorrer" (the Arabic ones still have a way to go), both a los Arabigos (to the Arabic ones) and les (them) fulfill the function of indirect object (Obj2). In such cases, we remove the node for le (him/her/it) or les (them) and will add "IOclitic: le|les" to the features of the verb. If the indirect object is not doubled, then the clitic stays as a separate argument of the verb.
refle -- reflexive: se instalaron (they installed themselves)
none -- use "none" when the word is not inflected, other than infinitive verbs (e.g. adjectives in the base form)
Deep Syntactic Role (DSyntRole). The DRole reflects the argument patterns of a verb in its default active form. Thus, it indicates the role of a daughter argument node with respect to its mother predicate node in a somewhat abstract representation.
o Root. This is the main word (usually the verb) in a sentence.
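The feature bundle described above can be sketched as a small data structure with a completeness check run before an analysis is finished. This is only an illustration: the class name, the attribute names lemma, dsynt_role and feats, and the helper functions are assumptions, not part of the annotation tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# POS tags copied from the list above.
POS_TAGS = {"V", "N", "PN", "Adj", "Adv", "P", "Pron", "Conj",
            "Det", "Aux", "Pun", "Sym", "Uh", "Misc"}

@dataclass
class Node:
    """One dependency-tree node viewed as an attribute-value matrix."""
    wpos: Optional[float] = None      # linear position in the sentence
    lex: Optional[str] = None         # inflected word form
    pos: Optional[str] = None         # one of POS_TAGS
    lemma: Optional[str] = None       # citation form (hypothetical field name)
    dsynt_role: Optional[str] = None  # e.g. "Root", "Obj", "Mod"
    feats: Dict[str, str] = field(default_factory=dict)  # num, gen, tense, ...
    children: List["Node"] = field(default_factory=list)

def problems(node: Node) -> List[str]:
    """Return the names of core attributes that are unset or invalid."""
    issues = [name for name in ("wpos", "lex", "pos", "lemma", "dsynt_role")
              if getattr(node, name) is None]
    if node.pos is not None and node.pos not in POS_TAGS:
        issues.append("pos")
    return issues

def check_tree(root: Node) -> Dict[str, List[str]]:
    """Walk the tree and map each incomplete node's word form to its issues."""
    report = {}
    stack = [root]
    while stack:
        node = stack.pop()
        found = problems(node)
        if found:
            report[node.lex or "<unnamed>"] = found
        stack.extend(node.children)
    return report
```

Running check_tree on the root before closing a sentence gives a list of nodes that still need attention.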
Verbs are heads of sentences and clauses.
The head of any complete clausal utterance is the main verb. Incomplete utterances (NPs, PPs, Greetings) should have as their head the usual head for that type of phrase.
Auxiliary verbs (ser/estar -be- and haber -have-) are deleted. Their meaning is represented as features on the main verb (for example, aspect: progressive). Modals (poder -can-, deber -should-, tener que -have to, must-) are syntactically very much like auxiliaries, but they are included in IL0 for semantic reasons as dependents on the main verb, which is always in the infinitive form in Spanish. In all cases, when the main verb is missing, as in VP ellipsis, an empty verb node should be created and used as the head of the entire clause.
Sequences of auxiliary verbs (había sido alquilado -it had been rented-, podría haber estado lloviendo -it could have been raining-) should be annotated with the main verb as the head, all auxiliaries removed, and modals represented as dependents of the head.
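A minimal sketch of how a verb sequence collapses onto its main verb, assuming the input is already a list of citation forms with the main verb last. The function name is hypothetical, and storing aspect as a list (so that perfect and progressive can co-occur) is an assumption; the guidelines do not say how to combine them. The sketch also treats every estar as a progressive auxiliary, which is not true of copular estar.

```python
MODALS = {"poder", "deber", "tener que"}

def fold_verb_chain(lemmas):
    """Collapse a verb sequence into (main verb, features, modal dependents).

    `lemmas` lists citation forms in surface order, main verb last.
    """
    *leading, main = lemmas
    feats = {}
    modals = []
    for lemma in leading:
        if lemma == "haber":
            feats.setdefault("aspect", []).append("perf")
        elif lemma == "estar":
            feats.setdefault("aspect", []).append("prog")
        elif lemma == "ser":
            feats["voice"] = "pas"
        elif lemma in MODALS:
            modals.append(lemma)   # modals stay in the tree as dependents
        else:
            raise ValueError(f"not an auxiliary or modal: {lemma}")
    return main, feats, modals

# había sido alquilado
print(fold_verb_chain(["haber", "ser", "alquilar"]))
# podría haber estado lloviendo
print(fold_verb_chain(["poder", "haber", "estar", "llover"]))
```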
When the main verb is a form of the copula, the head of the clause will be the predicate. There are only two kinds of copular structures in Spanish: equative and predicative. There are two verbs in Spanish that correspond to the English copula in Predicative copular constructions: ser and estar. So, the copula used will be listed under the features of the headword. For equative copular constructions, only “ser” can be used.
Argument vs. Adjuncts
In distinguishing between arguments and adjuncts, consistency is the most important thing. This distinction will matter most for annotating empty categories. In addition, each argument will be annotated with a feature encoding its grammatical role. All non-arguments will be annotated as modifiers, including function words.
The only NPs that will be considered arguments for annotation purposes are:
NPs that never appear with a preposition;
NPs that appear with the preposition "a" (personal "a"). In Spanish, the preposition "a" precedes an animate direct object with many verbs: prevenir a (warn), enfrentar a (face), etc.;
NPs that appear as part of an obligatory prepositional complement (e.g. 552.4 millones de dólares serán destinados a proyectos de inversión... -52.4 million dollars will be allocated to investment projects-; or Se está pensando en un proceso gradual... -We are thinking about a gradual process...-).
The role of each argument (subject, object, indirect object) must be annotated as a feature of its node. The deep grammatical relations should be annotated particularly when there is a functional role reversal, i.e. a mismatch between surface subject and deep subject. There are three possible cases:
A) In an impersonal construction, (e.g. se están realizando proyectos/ projects are being carried out), the surface subject should be annotated as the deep object and a created node, "<pro>", will become the deep subject. We delete the node for the pronoun "se", and include "imper" as a feature of the verb.
B) For a reflexive construction, we need to distinguish between two different cases: "real" reflexive verbs (the subject and the object are one and the same person), such as matarse (kill oneself), and "inherent" reflexives. In the former, the surface subject should be annotated as both deep subject and deep object. Inherent reflexives are verbs that are conjugated like reflexive verbs but have a different subject and object (e.g. enfrentarse a –to face- in "[...] sus grandiosos proyectos fracasaron tras enfrentarse a interlocutores cambiantes y a menudo adversarios..." -[...] their grandiose projects failed upon encountering changing and frequently adversary interlocutors-). Here, we should not include a separate node for the pronoun se, but rather list it next to the verb, e.g., <enfrentar+se>, and annotate the verb with the feature "refle."
C) In a passive construction (e.g. fueron alquilados –they were rented-), the surface subject should be annotated as the deep object. If the logical subject in the form of a por phrase (e.g. "por el presidente Gonzalo Sanchez de Lozada" -by the President Gonzalo Sanchez de Lozada-) is present, it should be annotated as the deep subject. If it is absent, an empty <pro> node should be created for the deep subject.
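The three role-reversal cases can be summarized as a mapping from surface arguments to deep roles. The sketch below is illustrative only: the function name, the construction labels and the dictionary keys "subj" and "obj" are assumptions, and arguments are represented as plain strings rather than tree nodes.

```python
def deep_roles(construction, surface_subject, por_agent=None):
    """Map surface arguments to deep roles for cases A), B) and C)."""
    if construction == "impersonal":
        # A) surface subject becomes deep object; a <pro> node is created
        return {"subj": "<pro>", "obj": surface_subject}
    if construction == "real_reflexive":
        # B) surface subject is both deep subject and deep object
        return {"subj": surface_subject, "obj": surface_subject}
    if construction == "passive":
        # C) the por phrase, if present, is the deep subject; else <pro>
        return {"subj": por_agent or "<pro>", "obj": surface_subject}
    # plain active clause: no reversal
    return {"subj": surface_subject}

print(deep_roles("impersonal", "proyectos"))
print(deep_roles("passive", "presupuesto", por_agent="presidente"))
print(deep_roles("passive", "metros"))
```

Inherent reflexives are left out on purpose: they keep their surface roles, and only the lemma (<enfrentar+se>) and the "refle" feature change.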
See the general discussion under Empty Nodes.
Raising verbs will not have a missing category. Instead, annotate them with the surface subject as the direct dependent of the lower verb. In other words, in a raising construction, it is really the lower verb that is imposing the selectional restrictions on the subject of the whole clause.
Verbs (and adjectives) that will be regarded as raising predicates here include parecer hacer algo (seem to do something), necesitar hacer algo (need to do something), soler hacer algo (tend to do something), empezar a/ comenzar a hacer algo (start to do something), resultar estar/ser/tener (turn out to be/to have), ir a hacer algo (be going to do something (gonna)), continuar haciendo algo (continue doing something), estar seguro/-a de ser/estar/tener (be certain about being/having), ser probable (be likely), acabar de hacer algo (finish doing something), venir haciendo algo (be doing something).
1. La compra parece <estar> excluida de momento... (Purchase seems to be ruled out at the moment...)
2. [...] las economías latinoamericanas comienzan a plantearse modelos con mayor pragmatismo. ([...] the Latin American economies are beginning to show models with increased pragmatism.)
Control structures should have an empty node included as the subject of the lower verb.
Subject control structures, such as those having intentar hacer algo (try to do something) as their head, are easy to confuse with raising structures (e.g. headed by parecer hacer algo -seem to do something-) because they appear to be the same on the surface.
...Sánchez de Lozada intentará mejorar la distribución de recursos... (Sanchez de Lozada will try to improve the distribution of resources...)
Juan parece desatender sus obligaciones. (John seems to neglect his duties).
Some common subject control verbs/adjectives are intentar hacer algo (try to do something), esperar hacer algo (hope/expect to do something), querer hacer algo (want/wanna do something), estar deseando hacer algo (be keen to do something), estar ansioso/-a por hacer algo (be eager to do something), desear hacer algo (wish to do something), decidir hacer algo (decide to do something), ser tonto por hacer algo (be silly to do something), ser dichoso/-a por hacer algo (be lucky to do something).
Object control verbs include: persuadir (persuade), forzar (force). An empty node must be included as the dependent of the lower verb.
Note that although querer (want) is a subject control verb when the subject of querer is the same as the subject of the embedded clause, when the subject of the lower clause is different, it is not.
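The contrast between raising and control can be sketched as a lookup that decides where the surface subject attaches and whether an empty subject is needed on the lower verb. The verb sets below are partial selections from the lists above, and the function name is hypothetical. Writing the empty node as a bracketed copy of its antecedent follows the Big-PRO convention from the Empty Nodes discussion; that choice is an assumption here.

```python
# Partial verb lists taken from the text above.
RAISING = {"parecer", "necesitar", "soler", "resultar"}
SUBJECT_CONTROL = {"intentar", "esperar", "querer", "desear", "decidir"}
OBJECT_CONTROL = {"persuadir", "forzar"}

def lower_subject(upper_verb, subject, obj=None):
    """Return (subject of the upper verb, subject of the lower verb)."""
    if upper_verb in RAISING:
        # no empty node: the surface subject hangs off the lower verb
        return None, subject
    if upper_verb in SUBJECT_CONTROL:
        # empty node co-referential with the upper subject
        return subject, f"<{subject}>"
    if upper_verb in OBJECT_CONTROL:
        if obj is None:
            raise ValueError("object control needs an object")
        # empty node co-referential with the upper object
        return subject, f"<{obj}>"
    raise KeyError(upper_verb)

print(lower_subject("parecer", "Juan"))
print(lower_subject("intentar", "Sánchez de Lozada"))
```

For querer the subject-control branch only applies when both clauses share a subject, so the annotator still has to check that by hand.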
Non-finite clauses (present participial, past participial or infinitival) can appear with (1) or without subjects (2 and 3).
1. Cediendo la inflación, en los últimos tres años ya no se habla de crisis. (With inflation lowering, there is no more talk about crisis in the last three years).
3. Fundada la empresa "Russian Real Estate" en 1989, recientemente realizó un estudio ... (Founded in 1989, the firm "Russian Real Estate" recently carried out a study...)
When they appear without subjects, an empty "<pro>" node should be included as a dependent of the verb. In general, non-finite clauses will be dependents of main verbs.
Small clause complements will be analyzed with the predication as the head of the small clause and dependent on the head verb. The predication may be nominal, prepositional, or adjectival. In the following, the small clause is bracketed:
1. Esta inversión, según Cossio, aun cuando no es significativa el gobierno la considera [la única manera de asegurar crecimiento]... (This investment, according to Cossio, although not significant, the government considers it [the only way to ensure growth]...)
2. ... y la duración de los trabajos que el informe estima [de tres a cinco veces superior]... (...and the duration of projects, which the report estimates ["three to five times greater"])
The analysis of small clauses is identical to predicative copular constructions, since the overt copula is omitted anyway at IL0. In the case of a past participle-headed predication, like the following, the participle should be tagged as a verb as well. The missing argument (the deep subject) needs to be added.
In Spanish, the verb "haber" (there is, there are) is conjugated in the 3rd person singular to indicate existence.
1. [...] hubo la convicción... ([…]there was always the conviction…)
2. No habrá liberación masivas del Café retenido… (There Will Not Be A Massive Release of Stockpiled Coffee)
In this type of structure, the form of the verb “haber” is the head, and the surface subject is also the deep subject.
As with declarative clauses, the head of a question will be its main/lexical verb. The interrogative pronoun will be a dependent of the main verb like any other argument.
When the interrogative pronoun is part of a long-distance dependency, it will not be a dependent of the highest main verb, but rather on the embedded main verb heading the clause in which the interrogative pronoun originated. The linear order will allow a reconstruction of the pronoun's surface position. In cases of long-distance dependencies, there may be "crossing arcs". This is ok.
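Since crossing arcs are expected with long-distance dependencies, it can help to list them so that each one can be confirmed as intentional. The sketch below assumes a tree is given as a mapping from each word position to the position of its head, with 0 standing for an artificial root; this encoding and the function name are assumptions, not the tool's format. The example positions come from a constructed question (¿Qué creyó el ministro que el parlamento había aprobado?) with function words left out.

```python
def crossing_arcs(heads):
    """Return the pairs of arcs that cross.

    `heads` maps each word position to the position of its head (0 = root).
    """
    arcs = sorted((min(dep, head), max(dep, head)) for dep, head in heads.items())
    crossings = []
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # arcs are sorted by left end, so a <= c; they cross when the
            # second arc starts strictly inside the first and ends outside it
            if a < c < b < d:
                crossings.append(((a, b), (c, d)))
    return crossings

# 1 Qué -> 9 aprobado, 2 creyó -> root, 4 ministro -> 2,
# 7 parlamento -> 9, 9 aprobado -> 2
heads = {1: 9, 2: 0, 4: 2, 7: 9, 9: 2}
print(crossing_arcs(heads))
```

Here the arc from Qué to the embedded verb crosses the arc from the root to creyó, which is the expected pattern.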
If an overt subject is not present, as in (1), include an empty noun; otherwise an imperative will have the same analysis as a declarative sentence.
A relative clause will be a dependent of whatever it modifies, in most cases a noun. As with other clauses, its main verb will be its own head. The relativizer will be a dependent of the main verb like any other argument or adjunct.
In long-distance dependencies (e.g., Éste es el presupuesto que el ministro creyó que el parlamento había aprobado. -This is the budget that the Minister thought the Parliament had approved-), the relativizer will not be a dependent of the highest main verb, but of the embedded main verb heading the clause in which it originated. The linear order will allow a reconstruction of its surface position.
Reduced relative clauses (e.g., Según un estudio realizado recientemente por ...; According to a study recently performed by...) are analyzed like regular relative clauses without overt relative pronoun. They have only an object node inserted, but not an empty complementizer, nor an empty auxiliary.
Reduced relative clauses appear similar to non-finite past or present participial clauses and may be difficult to distinguish from these. However, they will always depend on a nominal rather than a verbal head. Two tests to use to decide whether the clause is modifying the verb or a noun:
Can you insert mientras (while) or siendo (being) at the beginning without changing the meaning? If yes, it should modify a VP; otherwise, it's a dependent of the NP.
The surface vs. deep subject of a passive construction can be indicated through the use of the features. The grammatical subject (usually the patient) will be indicated as the deep object.
The underlying subject (usually the agent), if expressed, will be annotated as the deep subject. If it is not expressed, an empty node should be included.
The node for the auxiliary ser (be) will be deleted, but we will include it under the features of the participle.
1. [...] los 300 metros cuadrados del tercer piso... fueron alquilados esta mañana... ([…] the 300 square meters on the third floor […] were rented this morning…)
2. Las mismas fueron inscritas en las ofertas de la Unión Europea en la Ronda Uruguay. (The same agreements were recorded in the European Union's proposals at the Uruguay Round.)
3. El presupuesto nacional de Bolivia... fue promulgado recién este viernes por el presidente Gonzalo Sanchez de Lozada. (The Bolivian national budget… was made public this Friday by President Gonzalo Sanchez de Lozada)
VP-ellipsis should be annotated with an empty verbal head as the root node. Any auxiliaries and the subject will be dependents of this node. No missing arguments should be added. Also see the section on empty nodes.
The head of a noun phrase is the head noun. A definite determiner (el| la| los| las), or an indefinite one (un| una| unos| unas) will be included in the features of the head noun; any other determiner is a dependent of the head noun. Adjectives are separate dependents from determiners. If there are multiple adjectives, the default structure will simply have each adjective as a direct dependent of the noun. This is the case for multiple determiners also.
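The determiner rule can be sketched as follows: articles disappear into the det feature of the head noun, while every other determiner remains a dependent. The function name and the feature values "definite" and "indefinite" are assumptions, since the guidelines do not give abbreviations for them.

```python
DEFINITE = {"el", "la", "los", "las"}
INDEFINITE = {"un", "una", "unos", "unas"}

def attach_determiner(noun_feats, determiner):
    """Return (updated noun features, determiner kept as a dependent or None)."""
    feats = dict(noun_feats)          # do not modify the caller's dictionary
    word = determiner.lower()
    if word in DEFINITE:
        feats["det"] = "definite"
        return feats, None            # node removed, recorded as a feature
    if word in INDEFINITE:
        feats["det"] = "indefinite"
        return feats, None
    return feats, determiner          # e.g. esta, sus: stays in the tree

print(attach_determiner({"num": "pl", "gen": "m"}, "Los"))
print(attach_determiner({"num": "sg", "gen": "f"}, "esta"))
```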
Proper nouns should have the value PN for feature POS. They are treated largely like nouns, except that compound proper nouns are not analyzed syntactically as if they were common nouns. So in América Latina, América is the head, has POS PN, and carries the other features of this proper noun. Latina is a dependent on América (with SRole Mod), and also has POS PN.
In a noun phrase consisting of only a quantifier, the quantifier should be the head of the NP. Any modifying phrases are directly dependent on it.
1. Los Doce acordaron una tregua... (The Twelve agreed to a truce…)
Adjectives and adverbs will be coded in much the same way that nouns and verbs are coded. The same procedure is followed.
General
Adverbs and adjectives depend on the lexemes they modify -- adjectives for nouns, adverbs for verbs. For example, in the phrase sus grandiosos proyectos (their grandiose projects), the adjective grandiosos modifies the noun proyectos by identifying the scope of the projects. In "distribuir más equitativamente", the adverb "equitativamente" modifies the verb by specifying the manner in which the action was performed.
Degree
The degree of modification can be specified by modifying adverbs such as muy (very), as in muy demandados (very demanded), or bien (well), as in bien presentes (well established).
In addition, degree can be expressed by way of comparative and superlative constructions. For the comparative, the modifiers más (more) and menos (less) precede the adjective in its positive form: soy más alto que Juan (I’m taller than Juan); soy menos inteligente que tú (I’m less intelligent than you). In the superlative, el|la|los|las (the) and más or menos precede the adjective: el más nuevo (the newest one).
In order to simplify the lookup procedure in Omega, and to allow for a common interlingual representation of degree, adjectives and adverbs will be shown in their base form (called their "positive degree"). If they are in the text as comparatives or superlatives, that will be indicated as a feature of their node.
In Spanish, there are a few irregular comparative forms. These will also be represented in the parse tree in their base form. Below is a short list, with the positive form in capitals, followed by the irregular comparative and superlative forms.
BUENO/BUENA/BUENOS/BUENAS/BIEN (good/well) -- mejor (better) -- el/la mejor; los/las mejores (the best)
MALO/MALA/MALOS/MALAS/MAL (bad/badly) -- peor (worse) -- el/la peor; los/las peores (the worst)
JOVEN/JOVENES (young) -- menor (younger) -- el/la menor; los/las menores (the youngest)
VIEJO/VIEJA/VIEJOS/VIEJAS (old) -- mayor (older) -- el/la mayor; los/las mayores (the oldest)
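The reduction to base form plus a degree feature might look like the sketch below. The irregular table is taken from the list above; the function name and the degree labels "positive", "comparative" and "superlative" are assumptions, and regular adjectives are returned as they appear, without lemmatizing gender or number.

```python
ARTICLES = {"el", "la", "los", "las"}
IRREGULAR = {
    "mejor": {"Adj": "bueno", "Adv": "bien"},
    "peor": {"Adj": "malo", "Adv": "mal"},
    "menor": {"Adj": "joven"},
    "mayor": {"Adj": "viejo"},
}

def base_and_degree(tokens, pos="Adj"):
    """Return (base form, degree) for a degree expression given as words."""
    words = [t.lower() for t in tokens]
    has_article = words[0] in ARTICLES
    if has_article:
        words = words[1:]
    if words[0] in ("más", "menos"):
        base, marked = words[1], True
    else:
        form = words[0]
        if form.endswith("es") and form[:-2] in IRREGULAR:
            form = form[:-2]              # mejores -> mejor
        marked = form in IRREGULAR
        base = IRREGULAR[form][pos] if marked else form
    if not marked:
        return base, "positive"
    return base, "superlative" if has_article else "comparative"

print(base_and_degree(["más", "alto"]))
print(base_and_degree(["los", "mejores"]))
print(base_and_degree(["peor"], pos="Adv"))
```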
Participial adjectives
Participial forms of verbs, i.e., those ending in "-ando" or "-iendo" (present participle) or in "-ado" or "-ido" (past participle), often show up in the same syntactic positions as adjectives. In addition, some adjectives have the form of past participles, e.g., cerrada (closed), inesperados (unexpected).
These participles and participial adjectives can appear in:
(a) post-nominal position:
* interlocutores cambiantes (changing interlocutors)
* una tienda cerrada (a closed store)
(b) copulative position:
* Los resultados fueron acumulando. (The results were accumulating.)
* Las cortinas están descoloridas. (The curtains are faded.)
The semantic distinction between participles and adjectives is that participles refer directly to the event denoted by the verb and cast the referent of the modified noun into one of the roles of that event. Adjectives, on the other hand, refer to a state that characterizes the referent of the modified noun. It is not always easy to tell the difference, but here are some tests to help tell the difference:
For now, there will be separate nodes for V and Prep. Annotators will annotate each with the correct concept, and if that concept conflates the meaning of the preposition in the verb, e.g. enfrentarse a (be faced with), visitar a (visit with), acabar de (to have just finished), then mark the preposition as "EMPTY". At IL2, the preposition will disappear.
There are two verbs in Spanish that correspond to the English copula: ser and estar. So, the copula used will be listed under the features of the headword. Sentences whose main verb is a copula fall into two types: equative, and predicative. Equative use of to be equates two entities ([...] lo más importante es mantener un equilibrio fiscal...; [...] the most important thing is to maintain a fiscal balance...) while the predicative use asserts that the post-verbal predicate holds of the deep subject (Juan es médico; John is a doctor). For equative copular constructions in Spanish, only “ser” can be used.
Both equative and predicative copular constructions will have the predicate (noun, adjective, or preposition) as their head. Note that the grammatical role of the predicate reflects the role of the predicative construction in the sentence. In [...] pero <los proyectos> están lejos de ser satisfactorios.. (but <the works> are far from being satisfactory...), ser (be) is a modifier because it depends on lejos (far).
Conjunction has its own part-of-speech (Conj). The conjunction (y, o, pero, etc.) (and, or, but, etc.) is placed as a dependent of the first conjunct with role Mod, and the second conjunct is a dependent of the conjunction with role Obj.
If a comma acts as a conjunction, it is treated as such (given part-of-speech Conj and analyzed as in the above paragraph): "[...] la región comienza a mostrar avances significativos, y prueba de ello es que en los últimos tres años no se habla ya de crisis, la inflación está cediendo..." (the region is beginning to show significant advances, and proof of that is in the last three years there has no longer been any talk of a crisis, inflation is yielding…). However, note that in "[...], y prueba de ello es que..." ([...], and proof of that is…) the comma does not serve as a conjunction (since there is an explicit "y" –and-), and it is removed at IL0. The last comma does serve as a conjunction.
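The coordination structure and the comma rule can be sketched as two small helpers. Both function names are hypothetical, and dependencies are written as (head, role, dependent) triples purely for illustration. The comma check only covers the one case spelled out above, a comma directly followed by an explicit conjunction; whether any other comma really acts as a conjunction remains the annotator's decision.

```python
EXPLICIT_CONJUNCTIONS = {"y", "o", "pero"}

def coordinate(first, conj, second):
    """Return the two dependencies that encode a coordination."""
    return [(first, "Mod", conj),     # conjunction hangs off the first conjunct
            (conj, "Obj", second)]    # second conjunct hangs off the conjunction

def comma_can_be_conjunction(tokens, i):
    """False when the comma at index i is followed by an explicit conjunction."""
    if tokens[i] != ",":
        raise ValueError("token at index i is not a comma")
    following = tokens[i + 1] if i + 1 < len(tokens) else None
    return following not in EXPLICIT_CONJUNCTIONS

tokens = ["significativos", ",", "y", "prueba"]
print(comma_can_be_conjunction(tokens, 1))   # comma before "y"
print(coordinate("habla", ",", "cediendo"))
```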
An empty node is a node that does not correspond to a word (or other graphical manifestation such as a punctuation mark) in the input string. In all cases, when you create an empty node, give it a wpos feature so that it ends up in a position that roughly corresponds to its grammatical function (i.e., if it is a subject, to the left of its governing verb, and so on).
There are (at least) two types of empty nodes.
Big-PRO, and related cases
These are cases of empty nodes where the meaning can be derived from the syntactic context:
Big-PRO is the missing subject in embedded infinitivals. For example, in "[...] un joint venture ruso-turco acaba de terminar la construcción de un edificio...” ([…] a Russian-Turkish joint venture has just finished construction on a small executive office building…), the implicit subject of "terminar" is (co-referential with) "el joint venture ruso-turco".
VP ellipsis is the term for cases in which the main verb is replaced by the adverb también (too), as in 1, or by a modal, as in 2:
VP ellipsis requires an empty verbal head; the adverb or the modal are deleted in the usual manner and replaced as needed by features. In addition, add all missing arguments (but not adjuncts), as described above. In these cases, we introduce an empty node and identify the node with which it is co-referential. We then copy the co-referential node's word and lexeme values to the empty node, but add brackets around the value: "<venture>".
Gapping. In gapping, a verb is deleted in a conjunction (Francisca comió un melocotón y Elisa, un albaricoque. Francis ate a peach, and Elise, an apricot.). In the second conjunct, the verb must be restored as an empty node, as with VP ellipsis.
Little-pro, and related cases
These include:
missing por phrases in passives. For example, in "[...] los 300 metros cuadrados... fueron alquilados esta mañana..." ([…] the three hundred meters… were rented this morning…), we know that there is another argument role which is not explicitly mentioned, namely the person who rented the space.
arbitrary empty subjects in adjunct clauses. For example, in "[...]el liberar los mercados permitiría un aumento en el nivel de vida..." ([…] freeing up the markets would allow for an increase in the standard of living…), the subject of "liberar" (freeing up) is not specified.
In these cases, we cannot tell what the understood missing item is syntactically, but rather only pragmatically. We introduce an empty node and we label both the lexeme and the word feature of the new node "<pro>". In case of doubt (i.e. whether it is big-PRO "<venture>" or little-pro "<pro>"), ask yourself: can I tell from syntax alone what this node means? If no, "<pro>". If yes, fill in a copy of the co-referent lexeme.
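The decision between a bracketed copy and "<pro>" can be sketched as a single constructor. Nodes are plain dictionaries here, and the key names lex and lemma, the function name, and the fractional wpos value in the example are all assumptions made for illustration.

```python
def empty_node(wpos, coreferent=None):
    """Build an empty node.

    With a syntactically identifiable co-referent, copy its word and lexeme
    in angle brackets; otherwise fall back to "<pro>".
    """
    if coreferent is None:
        return {"wpos": wpos, "lex": "<pro>", "lemma": "<pro>"}
    return {
        "wpos": wpos,
        "lex": f"<{coreferent['lex']}>",
        "lemma": f"<{coreferent['lemma']}>",
    }

venture = {"wpos": 3, "lex": "venture", "lemma": "venture"}
print(empty_node(7.5, venture))   # missing subject of "terminar"
print(empty_node(4.5))            # missing agent of a passive
```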
Remove all punctuation, except meaningful punctuation. Examples:
Quotes -- leave them (open and closed) attached to the constituent that is quoted.
Commas that act as conjunctions (see Conjunction).
Do remove:
All non-conjunction commas.
All sentence-final punctuation.
All dashes and so on.
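The punctuation rules amount to a simple filter once every token carries its POS tag. The sketch assumes that commas acting as conjunctions have already been tagged Conj, so that only true punctuation carries the tag Pun; the set of quote characters and the function name are assumptions.

```python
QUOTES = {'"', "'", "“", "”", "«", "»"}

def strip_punctuation(tokens):
    """Keep everything except non-quote punctuation.

    `tokens` is a list of (form, pos) pairs; commas tagged Conj survive
    because only tokens tagged Pun are candidates for removal.
    """
    kept = []
    for form, pos in tokens:
        if pos == "Pun" and form not in QUOTES:
            continue
        kept.append((form, pos))
    return kept

sentence = [("“", "Pun"), ("crisis", "N"), ("”", "Pun"), (",", "Conj"),
            ("inflación", "N"), (",", "Pun"), ("-", "Pun"), (".", "Pun")]
print(strip_punctuation(sentence))
```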