Spanish-English in Temple

Spanish Morphology

Spanish morphological analysis is performed by SPOST, the CRL's Spanish part-of-speech tagger, developed as part of the PANGLOSS Machine Translation System. SPOST uses a stemming algorithm to produce a citation form, which is then looked up in the dictionary. Each matching dictionary entry supplies the probable morphological features that are assigned to the word. SPOST then applies context-sensitive "fix-up" rules to correct the assigned features and to disambiguate among them.
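The stem-then-look-up step can be sketched roughly as follows. This is only an illustration, not SPOST's actual algorithm: the toy lexicon, the suffix rules, and the function name analyze are all assumptions made for the example.

```python
# Hypothetical toy lexicon: citation form -> (tag, inherent features).
LEXICON = {
    "avion": ("noun", ("masculine",)),
    "momento": ("noun", ("masculine",)),
    "contar": ("verb", ()),
}

# Hypothetical suffix rules: (surface suffix, replacement, features added).
# Rules are tried in order; the empty suffix is the "no change" fallback.
SUFFIX_RULES = [
    ("es", "", ("plural",)),
    ("s", "", ("plural",)),
    ("aba", "ar", ("impf_ind", "sgl")),
    ("", "", ("sgl",)),
]


def analyze(word):
    """Return (citation form, tag, features) for the first rule whose
    candidate citation form is found in the lexicon, else None."""
    surface = word.lower()
    for suffix, replacement, extra in SUFFIX_RULES:
        if not surface.endswith(suffix):
            continue
        # Strip the suffix and append the replacement to get a candidate.
        citation = surface[: len(surface) - len(suffix)] + replacement
        entry = LEXICON.get(citation)
        if entry is not None:
            tag, features = entry
            return citation, tag, features + extra
    return None


print(analyze("aviones"))  # ('avion', 'noun', ('masculine', 'plural'))
print(analyze("contaba"))  # ('contar', 'verb', ('impf_ind', 'sgl'))
```

A real stemmer would return every candidate analysis rather than the first hit, leaving the choice among them to later disambiguation.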

To illustrate, consider the following sentence: "Al momento de su venta a Iberia, VIASA contaba con ocho aviones." SPOST's first pass produces the analysis:

Al/preposition
momento/noun(masculine, sgl)
de/preposition
su/adjective(neuter, sgl)
venta/verb(pres_ind, sgl, first)
a/preposition
Iberia/proper_noun
,/punctuation
VIASA/proper_noun
contaba con/verb(impf_ind, sgl, first)
ocho/noun(masculine, plural)
aviones/avion/noun(masculine, plural).

SPOST's context-sensitive fix-up rules are subsequently applied, and an existing rule:

[su/adj word/verb] ==> [su/adj word/noun]

changes the invalid verbal analysis of 'venta' to venta/noun(feminine, sgl).
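A minimal sketch of how such a fix-up rule might be applied to the first-pass output is given here. The Token class, the NOUN_READINGS table (standing in for a second lexicon lookup that supplies the noun features), and the function name apply_su_rule are assumptions made for illustration; the source does not describe how SPOST encodes its rules internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    word: str
    tag: str
    features: tuple = ()


# Hypothetical noun readings; a real system would consult the lexicon.
NOUN_READINGS = {"venta": ("feminine", "sgl")}


def apply_su_rule(tokens):
    """Apply [su/adj word/verb] ==> [su/adj word/noun] left to right."""
    out = list(tokens)
    for i in range(1, len(out)):
        prev, cur = out[i - 1], out[i]
        if (prev.word.lower() == "su" and prev.tag == "adjective"
                and cur.tag == "verb"):
            # Retag the verb as a noun and swap in its noun features.
            noun_features = NOUN_READINGS.get(cur.word.lower(), ())
            out[i] = replace(cur, tag="noun", features=noun_features)
    return out


first_pass = [
    Token("de", "preposition"),
    Token("su", "adjective", ("neuter", "sgl")),
    Token("venta", "verb", ("pres_ind", "sgl", "first")),
    Token("a", "preposition"),
]
fixed = apply_su_rule(first_pass)
print(fixed[2])
```

Only the token following su is changed; all other tokens pass through untouched.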

Lexicon Features

The Temple Spanish Lexicon was partially derived from the Collins Spanish-English Bilingual Dictionary. It contains approximately 60,000 entries, each of which has the following form:

  1. citation form
  2. morphological category (tag and features)
  3. optional Collins domain category
  4. corresponding English translation(s)

A sample lexical entry follows:

  1. hermosura
  2. noun(feminine)
  3. gen
  4. beauty; loveliness; splendour; lavishness; handsomeness
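One way to hold such an entry in memory is sketched here. The LexicalEntry class and the parse_entry helper are hypothetical; they assume that features appear in parentheses after the tag and that translations are separated by semicolons, as in the sample entry.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexicalEntry:
    citation: str
    tag: str
    features: tuple
    domain: Optional[str]  # the Collins domain category is optional
    translations: tuple


def parse_entry(citation, category, domain, translations):
    """Build a LexicalEntry from the four fields of a lexicon entry."""
    m = re.fullmatch(r"(\w+)(?:\(([^)]*)\))?", category.strip())
    if m is None:
        raise ValueError(f"unrecognized category: {category!r}")
    tag, raw_features = m.group(1), m.group(2)
    features = tuple(f.strip() for f in raw_features.split(",")) if raw_features else ()
    glosses = tuple(t.strip() for t in translations.split(";") if t.strip())
    return LexicalEntry(
        citation=citation.strip(),
        tag=tag,
        features=features,
        domain=(domain.strip() or None) if domain else None,
        translations=glosses,
    )


entry = parse_entry(
    "hermosura",
    "noun(feminine)",
    "gen",
    "beauty; loveliness; splendour; lavishness; handsomeness",
)
print(entry.tag, entry.features, len(entry.translations))
```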

Bilingual Glossary

A bilingual glossary of approximately 20,500 phrases was constructed from an existing PANGLOSS glossary. Each glossary entry consists of a source-language phrase and one or more corresponding target-language glosses. The entries are written using word citation forms, and they use variables to represent certain closed-class words as well as morphological transfer and agreement information.

A sample glossary entry follows:

SOURCE: que si [dop:1] tener[:2]
TARGET: of course [pronoun:1] have[:2]
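The variable notation in this entry can be read mechanically, as the following sketch suggests. The interpretation is an assumption based only on the sample above: a bracket of the form [class:index] stands for a variable, a bare word is a citation form, and a shared index links a source element to its target counterpart. The names Slot, parse_phrase, and align are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    word: Optional[str]       # citation form, or None for a pure variable
    var_class: Optional[str]  # label before the colon, e.g. "dop"
    index: Optional[int]      # number linking source and target slots


# An optional word followed by an optional [class:index] bracket.
_TOKEN = re.compile(r"^(?P<word>[^\[\]]*)(?:\[(?P<cls>[^:\]]*):(?P<idx>\d+)\])?$")


def parse_phrase(phrase):
    """Split a glossary phrase into slots."""
    slots = []
    for tok in phrase.split():
        m = _TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unrecognized token: {tok!r}")
        idx = m.group("idx")
        slots.append(Slot(m.group("word") or None,
                          m.group("cls") or None,
                          int(idx) if idx else None))
    return slots


def align(source, target):
    """Pair source and target slots that carry the same index."""
    by_index = {t.index: t for t in target if t.index is not None}
    return {s.index: (s, by_index[s.index])
            for s in source if s.index in by_index}


source = parse_phrase("que si [dop:1] tener[:2]")
target = parse_phrase("of course [pronoun:1] have[:2]")
for index, (src, tgt) in sorted(align(source, target).items()):
    print(index, src, tgt)
```

Under this reading, index 1 ties the source variable to the target pronoun, and index 2 ties tener to have so that morphological information can be carried across.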