Features on All Nodes

·              Position (wpos) The linear position of the word in the sentence. This should not be modified or annotated, except for new empty nodes created by the annotator, which should always be given the wpos 100.

·              Word (lex.) This is the inflected word form associated with the node. It is almost always correctly displayed already. Example: "anunció" (he/she announced)

·              Part-of-Speech (POS). This is the lexical class, taken from a short list.  Specific options:

              o V -- verbs

              o N -- common nouns

              o PN -- proper nouns

              o Adj -- adjectives

              o Adv -- adverbs

              o P -- prepositions and subordinating conjunctions

              o Pron -- pronouns

              o Num -- numerals

              o G -- ?

              o Conj -- coordinating conjunctions, but not subordinating conjunctions; also includes the comma used in enumerations instead of a repeated "y" (and)

              o Det -- determiners

              o Pun -- punctuation marks, but not the comma used in conjunctions

              o Sym -- various symbols (%, =,  and the like)

              o Uh -- speech-specific sounds, even if meaningful (such as /UH HUH/)

              o Misc -- everything else, including greetings (Hola) and interjections (Okay)

·              Citation form. This is the citation form (lexeme) of the inflected form. A first "guess" will be included, which needs to be checked and corrected. Example: “demostró” (he/she showed) <demostrar> (show)

·              Morphological Features (Morph). A complete specification of the morphological features needed to derive the inflected word form from the base form. The options are grouped by part-of-speech; in the menu of the GRAPH tool, all options are displayed at once and the GRAPH tool does not enforce a proper choice of features given the part-of-speech. Possibilities are:

           o NOUNS (including proper nouns and determiners)

               o  num -- number: singular or plural
               o  gen -- gender: masculine, feminine or common (“<c>”).  Common gender applies to words whose form is the same for both genders, typically words ending in “-e”, such as “presente” (present) or “estudiante” (student).
               o  det -- definite: set when the noun is preceded by the definite article ("el", "la", "los", "las").

           o VERBS (including auxiliaries: “ser” (be) for the passive voice, “estar” (be) for progressive tenses, and “haber” (have) for perfect tenses)

               o  tense -- present, preterit, conditional or future: "canto" (I sing), "canté" (I sang)

               o  aspect -- progressive: “realizando” (carrying out); or perfect: “ha dejado” (has left)

               o  voice -- active or passive: "fueron alquilados" (they were rented)

               o  mood -- indicative, subjunctive, conditional or imperative

               o  num -- singular or plural

               o  per -- person: first, second or third. In Spanish the subject is often omitted because the ending of the verb designates the person. So, in "Igualmente señaló" (He/She also pointed out), we can assume that the subject is third person.

               o  pcp -- past participle: "retenido" (retained)

               o  ger -- gerund: "pensando" (thinking)

               o  inf -- base form when used as an infinitive: "reducir" (reduce)

               o  refle -- reflexive form of the verb: "se instalaron" (they installed themselves)

               o  imper -- impersonal structure of the verb: "se ve" (it is seen)

           o ADJECTIVES and ADVERBS

               o  comp -- morphological comparative.  In Spanish, there are four words that have a morphological comparative form (see the “Adjectives and Adverbs” section for more details).

               o  sup -- morphological superlative.  In Spanish, there are four words that have a morphological superlative form (see the “Adjectives and Adverbs” section for more details).

               o  In comparatives and superlatives formed with "más" (more) and "menos" (less), leave the adjective unlabelled.

·              Deep role (DRole). This is the role of the node with respect to its mother in a deeper representation. Because the notion of a deeper representation is not sharply defined, we use strictly syntactic criteria. Specifically, DRole is different from SRole only if there is a form of the verb in which it is realized with more arguments. DRole reflects the argument patterns of the verb if it were in its active, non-ergative form.

 

o Root. This is the main word (usually the verb) in a sentence. 

o Subj -- deep subject. This is the surface subject, except with passives or ergative verbs (fueron inscritas), in which case there is either no deep subject or (for passives) it is expressed by the agent phrase (the "por" phrase, corresponding to the English "by" phrase). The deep subject may, but need not, agree in person and number with the tensed verb; empty surface subjects are like overt surface subjects in that they may or may not be deep subjects.

o Obj -- deep object. This will most often not have a preposition associated with it in the underlying form (on the surface it might).  There is an exception in Spanish: the “personal” a. In Spanish, we use “a” before a noun object when the object is a definite person or a personified thing.  Examples:  ¿A quién ve Vd.? (Whom do you see?); Veo a Juan (I see John).  The deep object is in surface subject position for passives and ergatives. Obj is also the deep role of the complement of a preposition.

o Obj2 -- deep indirect object.

o PObj -- deep prepositional object. An object that is always dominated by a preposition. Example: poner el libro (Obj) sobre (PObj) la mesa (Obj) (put the book on the table) but dar el libro (Obj) a Antonio (Obj2) (give the book to Anthony).

o Mod -- all adjuncts, including modifiers, auxiliaries, appositions, and the like.

·              Done. This feature is only a check to make sure that the default values have been checked. Set it to "Y" when you are done with the features for one node.

 

If DRole is omitted, it is assumed to be the same as SRole (which is frequently the case). There is a certain redundancy among these features, which is intended (for error checking).
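
Because the GRAPH tool does not enforce the feature choices described above, a small consistency check can be useful. The following is only a sketch under stated assumptions: the `Node` class, the function names, and the feature values used in the usage note (such as "preterit" or "sg") are hypothetical and are not part of the GRAPH tool. The feature groupings are copied from the lists above, and the fallback from DRole to SRole follows the rule just stated.

```python
from dataclasses import dataclass, field
from typing import Optional

# POS tags listed in this manual ("G" appears in the list without a gloss).
POS_TAGS = {"V", "N", "PN", "Adj", "Adv", "P", "Pron", "Num", "G",
            "Conj", "Det", "Pun", "Sym", "Uh", "Misc"}

# Morph feature names grouped by part-of-speech, following the lists above.
NOMINAL = {"num", "gen", "det"}
VERBAL = {"tense", "aspect", "voice", "mood", "num", "per",
          "pcp", "ger", "inf", "refle", "imper"}
DEGREE = {"comp", "sup"}
MORPH_BY_POS = {"N": NOMINAL, "PN": NOMINAL, "Det": NOMINAL,
                "V": VERBAL, "Adj": DEGREE, "Adv": DEGREE}


@dataclass
class Node:
    """Hypothetical in-memory form of one annotated node."""
    wpos: int
    word: str
    pos: str
    lexeme: str
    morph: dict = field(default_factory=dict)
    srole: Optional[str] = None
    drole: Optional[str] = None  # None means "same as SRole"
    done: str = "N"


def effective_drole(node):
    """DRole falls back to SRole when it has been omitted."""
    return node.drole if node.drole is not None else node.srole


def check_node(node):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if node.pos not in POS_TAGS:
        problems.append(f"unknown POS: {node.pos}")
    allowed = MORPH_BY_POS.get(node.pos, set())
    for name in node.morph:
        if name not in allowed:
            problems.append(
                f"Morph feature '{name}' not expected for POS {node.pos}")
    return problems
```

For instance, a node for "anunció" with POS V and Morph features tense, per and num passes the check, while a noun carrying a tense feature is reported.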

Verbs

Verbs are heads of sentences and clauses.

 

·              Choosing a head

 

The head of any complete clausal utterance is the main verb. 

When the main verb is a form of the copula, the head of the clause will vary depending on the type of copular sentence. Predicative copular constructions will have the predicate as their head.  Equative copular constructions will have the copula as their head. There are two verbs in Spanish that correspond to the English copula in predicative copular constructions: ser and estar.  The copula used is therefore listed under the features of the headword. For equative copular constructions, only “ser” can be used.

·              Grammatical Relations

 

The role of each argument (subject, object, indirect object) must be annotated as a feature of its node.

The deep grammatical relations should be annotated particularly when there is a functional role reversal, i.e. a mismatch between surface subject and deep subject.  There are three possible cases:

A) In an impersonal construction (e.g. se están realizando proyectos / projects are being carried out), the surface subject should be annotated as the deep object.  A newly created node, "<pro>", will become the deep subject.  We delete the node for the pronoun "se" and include "imper" as a feature of the verb.

B) In a reflexive construction, we should distinguish between two different kinds of reflexive verbs: "real" reflexive verbs (the subject and the object are the same person), such as "matarse" (kill oneself), and "inherent" reflexives.  In "real" reflexives, the surface subject should be annotated as both deep subject and deep object.  "Inherent" reflexives are verbs that are conjugated like a reflexive verb (e.g. "enfrentarse a" -- to face) but that have an object other than the subject: " [...] sus grandiosos proyectos fracasaron tras enfrentarse a interlocutores cambiantes y a menudo adversarios..." (their grandiose projects failed upon encountering changing and frequently adversary interlocutors).  For inherent reflexives, we should not include an empty node for the pronoun "se", but list it with the verb in the citation form, "<enfrentar+se>", and annotate the verb with the feature "refle".

C) In a passive construction (e.g. fueron alquilados -- they were rented), the surface subject should be annotated as the deep object.  The grammatical subject (usually the patient) should not be annotated as the deep subject; instead, an empty "<pro>" node should be included as the deep subject.
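
Cases A and C above can be sketched as a single transformation on a verb and its dependents. This is a minimal illustration, not the project's tooling: the dictionary layout and the names `make_pro` and `apply_role_reversal` are hypothetical, the feature name "imper" is taken from the Morph feature list, the default wpos of 100 follows the rule for new empty nodes, and only agentless passives are covered.

```python
def make_pro(wpos):
    """Empty node standing for an unexpressed deep subject."""
    return {"wpos": wpos, "word": "<pro>", "lexeme": "<pro>",
            "srole": None, "drole": "Subj"}


def apply_role_reversal(verb, dependents, construction, pro_wpos=100):
    """Apply case A ("impersonal") or case C ("passive") to one clause.

    Returns a new (verb, dependents) pair; the inputs are left untouched.
    """
    if construction not in ("impersonal", "passive"):
        raise ValueError(f"unsupported construction: {construction}")

    result = []
    for dep in dependents:
        # Case A: the node for the pronoun "se" is deleted.
        if construction == "impersonal" and dep["word"].lower() == "se":
            continue
        dep = dict(dep)  # shallow copy so the caller's data is not modified
        if dep.get("srole") == "Subj":
            dep["drole"] = "Obj"  # surface subject is the deep object
        result.append(dep)

    verb = dict(verb)
    morph = dict(verb.get("morph", {}))
    if construction == "impersonal":
        morph["imper"] = True
    else:
        morph["voice"] = "passive"
    verb["morph"] = morph

    # The deep subject is an empty <pro> node in both cases.
    result.append(make_pro(pro_wpos))
    return verb, result
```

Applied to "se están realizando proyectos", the sketch drops "se", marks "proyectos" with DRole Obj, and adds a "<pro>" node with DRole Subj.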

·                Non-finite clauses

 

Verb phrases without subjects

Non-finite verb phrases headed by a gerund (present participle) or an infinitive can appear with or without subjects (1 and 2). Past participles (3) can only appear without subjects.

   1. [...] liberar los mercados permitiría un aumento en el nivel de vida... ([…] freeing up the markets would allow for an increase in the standard of living…)
   2. [...] el acuerdo suscrito a nivel mundial por los productores desde octubre pasado para defender los precios. ([…] the agreement signed at the world level by producers last October to defend prices.)
   3. Según un estudio realizado por la empresa "Russian Real Estate", especializada en la inmobiliaria de oficinas... (According to a study recently performed by the "Russian Real Estate" business, specializing in office space real estate…)

When they appear without subjects, an empty "<pro>" node should be included as a dependent of the verb.  In general, non-finite clauses will be dependents of main verbs.

·              There-insertion

 

In Spanish, the verb "haber" (there is, there are) is conjugated in the 3rd person (singular and plural) to indicate existence.

   1. [...] hubo la convicción... ([…]there was always the conviction…)
   2. No habrá liberación masivas del Café retenido (There Will Not Be A Massive Release of Stockpiled Coffee)

In this type of structure, the form of the verb “haber” is the head, and the surface subject is also the deep subject.

·              Relative Clauses

 

A relative clause will be the dependent of whatever it modifies, in most cases a noun. As with other clauses, its main verb will be its own head. The relativizer will be a dependent of the main verb like any other argument.

1.  [...] América Latina es una de las regiones del mundo con mayor grado de desigualdad, lo cual es fácilmente verificable... ([…] Latin America is one of the regions of the world with the greatest degree of inequality, which is easily verifiable…)

2.  [...] una política que durante muchos años tuvo un "sesgo concentrador"... ([…] a policy which for many years had a "bias toward concentration"…)

3. La nueva Organización Común del Mercado de la Banana (OCMB), que entró en vigor en julio de 1993,... (The new Common Organization for Banana Marketing and Distribution (OCMB) which became operational in July 1993,…)

4. [...] Bélgica, Holanda y Alemania--cuyos puertos sirven de eje distribuidor... ([…] Belgium, Holland and Germany--whose ports serve as the distribution center…)

·              Passives

         The surface vs. deep subject of a passive construction can be indicated through the use of the SRole and DRole features. The grammatical (surface) subject, usually the patient, will be annotated as the deep object.  We will create an empty "<pro>" node for the deep subject.

1. [...] los 300 metros cuadrados del tercer piso... fueron alquilados esta mañana... ([…] the 300 square meters on the third floor […]were rented this morning…)

2. Las mismas fueron inscritas en las ofertas de la Unión Europea en la Ronda Uruguay. (The same agreements were recorded in the European Union's proposals at the Uruguay Round.)

3. El presupuesto nacional de Bolivia ... fue promulgado recién este viernes por el presidente Gonzalo Sanchez de Lozada. (The Bolivian national budget… was made public this Friday by President Gonzalo Sanchez de Lozada)

Nouns and Proper Nouns

·             Nominal modifiers
The head of a noun phrase is the head noun. A definite determiner (el, la, los, las) will be included in the features of the head noun; any other determiner is a dependent of the head noun. Adjectives are separate dependents from determiners. If there are multiple adjectives, the default structure will simply have each adjective as a direct dependent of the noun. This is the case for multiple determiners also.

·             Proper Nouns
Proper nouns should have the value PN for feature POS. They are treated largely like nouns, except that compound proper nouns are not analyzed syntactically as if they were common nouns, but rather given right-branching structures.  So in "América latina", "América" is the head, has POS PN, and carries the other features of this proper noun.  "latina" is a dependent of "América" (with SRole Adj), and also has POS PN.

·             Quantifier headed NPs

In a noun phrase consisting of only a quantifier, the quantifier should be the head of the NP. Any modifying phrases are directly dependent on it.

Los Doce acordaron una tregua... (The Twelve agreed to a truce…)
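
The nominal-modifier rule in this section (a definite determiner is folded into the features of the head noun; every other determiner and every adjective is a direct dependent of the noun) might be sketched as follows. The function name and the tuple layout are hypothetical, and the feature value "definite" is an assumed spelling of the "det" feature.

```python
DEFINITE = {"el", "la", "los", "las"}


def attach_modifiers(head, modifiers):
    """Split the modifiers of a head noun into features and dependents.

    modifiers: list of (word, pos) pairs.
    Returns (head_features, dependents), where each dependent is a
    (word, pos, head) triple.
    """
    features = {}
    dependents = []
    for word, pos in modifiers:
        if pos == "Det" and word.lower() in DEFINITE:
            features["det"] = "definite"  # recorded on the head noun
        else:
            dependents.append((word, pos, head))  # direct dependent of the noun
    return features, dependents
```

With "los grandiosos proyectos", the article becomes a feature of "proyectos" and only "grandiosos" remains as a dependent; with "sus grandiosos proyectos", both "sus" and "grandiosos" are dependents.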

Adjectives and Adverbs

Adjectives and adverbs are coded following the same procedure as nouns and verbs.

·               General

Adverbs and adjectives point to modifying concepts -- adjectives for nouns, adverbs for verbs.  For example, in the phrase "sus grandiosos proyectos" (their grandiose projects), the adjective "grandiosos" modifies the concept "proyectos" by identifying the scope of the projects.  In "No se puede seguir aplicando un neoliberalismo a ultranza" (Neoliberalism at Any Price Is No Longer Applicable), the adverbial phrase "a ultranza" (at any price) modifies the verb by specifying the manner in which the action was performed.

·              Degree

The degree of the modification can be specified by other modifiers such as "muy" (very) or "bien" (well), as in "muy demandados" (very demanded) or "bien presentes" (well established).  These degree modifiers are also adverbs.

In addition, there are two kinds of degree specification that have a special form for irregular adjectives.  They are known as the comparative and superlative forms.  In the first, the degree of modification is specified by comparing the case in question to one other case: [...] las economías latinoamericanas comienzan a plantearse modelos con mayor pragmatismo,... ([…] Latin American economies are beginning to utilize more pragmatic models…)  In the second, the degree of modification is specified by comparing to all other cases:  "lo más importante" (the most important one).

In Spanish the comparative and superlative degrees are represented as follows.   For the comparative form, the modifiers "más" (more) and "menos" (less) precede the adjective in its positive form: "soy más alto que Juan" (I’m taller than Juan),  "soy menos inteligente que tú" (I’m less intelligent than you).  In the superlative structure,  "el/la/los/las" (the) and "más" or "menos" precede the adjective: "el más nuevo" (the newest one).  (This method of representation is called "compositional" because the meaning is expressed through the "composition" of several words.)

In order to simplify the lookup procedure in Omega, and to allow for a common interlingual representation of degree, adjectives and adverbs will be shown in their base form (called their "positive degree"). If they are in the text as comparatives or superlatives, that will be indicated as a feature of their node.

There are a few words in Spanish that have irregular forms of comparison. These will also be represented in the parse tree in their base form. Below is a short list, with the positive form in capitals, followed by the irregular comparative and superlative forms.

BUENO/BUENA/BUENOS/BUENAS/BIEN (good/well) -- mejor (better) -- el/la mejor; los/las mejores (the best)
MALO/MALA/MALOS/MALAS/MAL (bad/badly) -- peor (worse) -- el/la peor; los/las peores (the worst)
JOVEN/JÓVENES (young) -- menor (younger) -- el/la menor; los/las menores (the youngest)
VIEJO/VIEJA/VIEJOS/VIEJAS (old) -- mayor (older) -- el/la mayor; los/las mayores (the oldest)
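
The list above can be turned into a small lookup that returns the base form and the degree feature for an irregular form. This is only a sketch: the table repeats the four entries of the list, the function name is hypothetical, and treating an immediately preceding definite article as the sign of a superlative is a simplifying assumption.

```python
# Irregular comparison forms from the list above, keyed by inflected form.
# Each entry maps a POS tag to the base (positive) lexeme.
IRREGULAR = {
    "mejor": {"Adj": "bueno", "Adv": "bien"},
    "mejores": {"Adj": "bueno"},
    "peor": {"Adj": "malo", "Adv": "mal"},
    "peores": {"Adj": "malo"},
    "menor": {"Adj": "joven"},
    "menores": {"Adj": "joven"},
    "mayor": {"Adj": "viejo"},
    "mayores": {"Adj": "viejo"},
}
DEFINITE_ARTICLES = {"el", "la", "los", "las"}


def base_form_and_degree(word, pos, previous_word=None):
    """Return (lexeme, degree) for an irregular form, or None otherwise.

    degree is "sup" when a definite article immediately precedes the form,
    and "comp" in all other cases.
    """
    bases = IRREGULAR.get(word.lower())
    if bases is None or pos not in bases:
        return None
    is_sup = (previous_word is not None
              and previous_word.lower() in DEFINITE_ARTICLES)
    return (bases[pos], "sup" if is_sup else "comp")
```

A regular adjective such as "alto" returns None, which signals that no degree feature is added and that compositional forms with "más" or "menos" are left unlabelled.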

·              Participial adjectives

Quite often participial forms of a verb will show up in syntactic positions also occupied by adjectives.  Some adjectives also have the form of past participles, e.g., “alquilados” (rented), “dominado” (dominated).

These participles and participial adjectives can appear

·              (a) in post-nominal position:

            una tienda cerrada (a closed store)
            interlocutores cambiantes (changing interlocutors)

·              (b) in copulative position:

            Los resultados fueron inesperados. (The results were unexpected.)
            Las cortinas están descoloridas. (The curtains are faded.)

 

The semantic distinction between participles and adjectives is that participles refer directly to the event denoted by the verb and cast the referent of the modified noun into one of the roles of that event. Adjectives, on the other hand, refer to a state that characterizes the referent of the modified noun.

It is not always easy to tell the difference. Here are some clues / tests to tell the difference:

(1) If you can add the adverb "muy" (very) in front of the participial form, then it is probably an adjective. For this test to work, however, the adjective must be scalar or gradable. 

(2) If there are dependents on the participial form (a direct object, or an agent), then it is likely that it is a verb. Thus most postnominal modifiers will be verbs, since their position almost guarantees the presence of additional dependents.

(3) If the word is not listed in an on-line dictionary like Acontecer as an adjective, it is likely to be a verb.

(4) When in doubt, make your best guess and discuss the issue with Owen.

Participles, which are sometimes coded as adjectives, are generally coded here as verbs.  Participles are the "-ando", "-iendo" and "-ado", "-ido" forms of the verb and are not main verbs.  For example, in the sentence "Los interlocutores están cambiando" (The interlocutors are changing), "cambiando" is a present participle and modifies "interlocutores".  Similarly, in the sentence "El presupuesto nacional de Bolivia... fue promulgado... tras un dilatado debate parlamentario..." (The Bolivian national budget… was made public… after a lengthy parliamentary debate…), "dilatado" is a past participle and modifies "debate".  Since these are coded as verbs, they will assign semantic roles.

·              Copular Adjectives

See the manual section on "copular constructions" for how to handle such sentences as "En todo Moscú se están realizando proyectos, pero están lejos de ser satisfactorios..." (All over Moscow, projects are being carried out, but they are far from satisfying…)

 

        Copular Constructions

When the main verb is a form of the copula, the head of the clause will vary depending on the type of copular sentence. Predicative copular constructions will have the predicate as their head; e.g. “Juan es alto” (John is tall). Equative copular constructions will have the copula as their head; e.g. “Clark Kent es Superman” (Clark Kent is Superman). There are two verbs in Spanish that correspond to the English copula in predicative copular constructions: ser and estar.  The copula used is therefore listed under the features of the headword. For equative copular constructions, only “ser” can be used.
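
The head-selection rule for copular clauses can be written as a short decision function. This is a sketch only: the function name and the feature name "copula" are hypothetical, and deciding whether a clause is predicative or equative is left to the annotator.

```python
def copular_head(copula_lexeme, predicate, kind):
    """Return (head, head_features) for a copular clause.

    kind: "predicative" or "equative", as judged by the annotator.
    """
    if copula_lexeme not in ("ser", "estar"):
        raise ValueError(f"not a copula: {copula_lexeme}")
    if kind == "predicative":
        # The predicate heads the clause and records which copula was used.
        return predicate, {"copula": copula_lexeme}
    if kind == "equative":
        if copula_lexeme != "ser":
            raise ValueError('equative constructions only allow "ser"')
        # The copula itself heads the clause.
        return copula_lexeme, {}
    raise ValueError(f"unknown construction type: {kind}")
```

For "Juan es alto" the head is "alto" with the copula "ser" recorded as a feature; for "Clark Kent es Superman" the head is "ser".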

 

        Conjunctions

Conjunction has its own part-of-speech (Conj). The conjunction (y, o, pero, etc.) (and, or, but, etc.) is placed as a dependent of the first conjunct with role Mod, and the second conjunct is a dependent of the conjunction with role Obj.

If a comma acts as a conjunction, it is treated as such (given part-of-speech Conj and analyzed as in the above paragraph): "[...] la región comienza a mostrar avances significativos, y prueba de ello es que en los últimos tres años no se habla ya de crisis, la inflación está cediendo..." (the region is beginning to show significant advances, and proof of that is that in the last three years there has no longer been any talk of a crisis, inflation is yielding…).  However, note that in ", y prueba de ello es que..." (, and proof of that is that…) the comma does not serve as a conjunction (since there is an explicit "y", and), and it is removed at IL0. The last comma does serve as a conjunction.
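
The attachment rule for two conjuncts can be sketched as a function that returns dependency edges. The function name and the (dependent, head, role) triple layout are hypothetical; a comma acting as a conjunction would be passed in as the conjunction word in the same way as "y".

```python
def attach_conjunction(first, conj_word, second):
    """Build the dependency edges for "first CONJ second".

    Returns a list of (dependent, head, role) triples.
    """
    return [
        (conj_word, first, "Mod"),   # conjunction depends on the first conjunct
        (second, conj_word, "Obj"),  # second conjunct depends on the conjunction
    ]
```

The first conjunct therefore remains the head of the whole coordination, and the conjunction node (POS Conj) sits between the two conjuncts in the tree.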


Empty Nodes

An empty node is a node that does not correspond to a word (or other graphical manifestation such as a punctuation mark) in the input string.

In all cases, when you create an empty node, give it a wpos feature so that it ends up in a position that roughly corresponds to its grammatical function (i.e., if it is a subject, to the left of its governing verb, and so on).

There are (at least) two types of empty nodes.

Empty nominal nodes: big-PRO, and related cases

These are cases of empty nodes where the meaning can be derived from the syntactic context:

·              Big-PRO is the missing subject in embedded infinitivals. For example, in "[...] un joint venture ruso-turco que acaba de terminar la construcción de un edificio...” ([…] a Russian-Turkish joint venture that has just finished construction on a small executive office building…), the implicit subject of "terminar" is (co-referential with) "un joint venture ruso-turco".

In these cases, we introduce an empty node and identify the node with which it is co-referential. We then copy the co-referential node's word and lexeme values to the empty node, but add brackets around the value: "<venture>".

Empty nominal nodes: little-pro, missing argument in passive, and related cases

·              The missing argument in agentless passives. For example, in "[...] los 300 metros cuadrados... fueron alquilados esta mañana..." ([…] the three hundred meters… were rented this morning…), we know from syntax that there is another argument role which is not explicitly filled, namely the deep subject. This role is added at IL0. We cannot tell syntactically what this is, only pragmatically.

·              Arbitrary empty subjects, usually in adjunct clauses. For example, in "[...]el liberar los mercados permitiría un aumento en el nivel de vida..." ([…] freeing up the markets would allow for an increase in the standard of living…), the subject of "liberar" (freeing up) is not specified.

·              Missing subject implied in the verb.  In the structure "Igualmente señaló que en mayo próximo..." (Likewise, it indicated that next May…), there is no explicit subject; however, we know that the subject is a third person singular.  We introduce an empty node "<pro>" and we add to it the features of the verb.

In these cases, we label both the lexeme and the word feature of the new node "<pro>". In case of doubt ("<pro>" or "<venture>"), ask yourself: can I tell from syntax alone what this node means? If no, "<pro>". If yes, fill in the lexeme.
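
The choice between the two kinds of empty nominal node can be sketched as follows. The function name and the dictionary layout are hypothetical; the caller supplies the wpos value, and passing an antecedent corresponds to answering "yes" to the question of whether syntax alone identifies the node.

```python
def make_empty_node(wpos, antecedent=None):
    """Create an empty nominal node.

    antecedent: the co-referential node (a dict with "word" and "lexeme")
    when syntax alone identifies it (big-PRO); None for little-pro cases.
    """
    if antecedent is None:
        # Meaning not recoverable from syntax: both values are "<pro>".
        return {"wpos": wpos, "word": "<pro>", "lexeme": "<pro>"}
    # Copy the antecedent's values, wrapped in angle brackets.
    return {
        "wpos": wpos,
        "word": f"<{antecedent['word']}>",
        "lexeme": f"<{antecedent['lexeme']}>",
    }
```

For the subject of "terminar" in the joint-venture example, the antecedent is the node for "venture", which gives "<venture>"; for an agentless passive, no antecedent is passed and the result is "<pro>".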

Punctuation

          Remove all punctuation, except meaningful punctuation. Keep:

·              Quotes -- leave them (open and closed) attached to the constituent that is quoted.

·              Commas that act as conjunctions (see Conjunctions)

Do remove:

·              All non-conjunction commas.

·              All sentence-final punctuation.

·              All dashes and so on.
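
The punctuation rules above amount to a simple filter over tagged tokens. This is a minimal sketch with hypothetical names; it assumes that a comma acting as a conjunction has already been given POS Conj, and the set of quote characters is an assumption that may need extending for a given corpus.

```python
# Quote characters to keep; this particular set is an assumption.
QUOTES = {'"', "“", "”", "«", "»"}


def keep_token(token):
    """Decide whether a token survives punctuation removal.

    token: dict with "word" and "pos".
    """
    if token["pos"] != "Pun":
        return True  # words, and commas tagged Conj, are kept
    return token["word"] in QUOTES  # among Pun tokens, only quotes are kept


def strip_punctuation(tokens):
    """Return the tokens that remain after punctuation removal."""
    return [t for t in tokens if keep_token(t)]
```

Sentence-final periods, dashes and non-conjunction commas all carry POS Pun and are therefore dropped by this filter.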