Spanish morphological analysis is performed by SPOST, the CRL's
Spanish part-of-speech tagger, developed as part of the PANGLOSS Machine
Translation System. SPOST utilizes a stemming algorithm to produce a
citation form upon which dictionary lookup occurs. Matching dictionary
entries contain the probable morphological features that are assigned
to a word. SPOST then applies context-sensitive "fix-up" rules to
correct and/or disambiguate among the assigned features.
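The stemming and lookup steps described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not SPOST's implementation: the suffix rules, the dictionary contents, and the names `SUFFIX_RULES`, `DICTIONARY`, `candidate_citation_forms`, and `lookup` are all hypothetical.

```python
# Hypothetical toy data: these are NOT SPOST's real suffix rules or dictionary.
SUFFIX_RULES = [
    ("iones", "ion"),  # aviones -> avion
    ("aba", "ar"),     # contaba -> contar
    ("os", "o"),       # momentos -> momento
    ("s", ""),
]

DICTIONARY = {
    "avion": ["noun(masculine)"],
    "contar": ["verb"],
    "momento": ["noun(masculine)"],
}


def candidate_citation_forms(word):
    """Yield the lowercased word, then forms made by rewriting known suffixes."""
    word = word.lower()
    yield word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            yield word[: -len(suffix)] + replacement


def lookup(word):
    """Return (citation_form, features) for the first candidate found."""
    for form in candidate_citation_forms(word):
        if form in DICTIONARY:
            return form, DICTIONARY[form]
    return None, []


print(lookup("aviones"))
print(lookup("contaba"))
```

In this sketch the dictionary entry supplies the candidate features, and a word with no matching entry comes back with an empty feature list.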
To illustrate, consider the following sentence:
"Al momento de su venta a Iberia, VIASA contaba con ocho aviones."
SPOST's first pass produces the analysis:
Al/preposition
momento/noun(masculine, sgl)
de/preposition
su/adjective(neuter, sgl)
venta/verb(pres_ind, sgl, first)
a/preposition
Iberia/proper_noun
,/punctuation
VIASA/proper_noun
contaba con/verb(impf_ind, sgl, first)
ocho/noun(masculine, plural)
aviones/avion/noun(masculine, plural).
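One way to picture a context-sensitive fix-up pass over a first-pass analysis like this one is as a rewrite over adjacent (word, tag) pairs. The Python sketch below is a hypothetical illustration, not SPOST's code: it encodes a single rule of the form [su/adj word/verb] ==> [su/adj word/noun], and the function name `apply_su_rule` and the tuple representation are assumptions. The features of the corrected noun are not given here, so the sketch assigns a bare `noun` tag.

```python
def apply_su_rule(tagged):
    """Re-tag a verb as a noun when it directly follows su/adjective.

    `tagged` is a list of (word, tag) pairs; a new list is returned.
    """
    fixed = list(tagged)
    for i in range(1, len(fixed)):
        prev_word, prev_tag = fixed[i - 1]
        word, tag = fixed[i]
        if (
            prev_word.lower() == "su"
            and prev_tag.startswith("adjective")
            and tag.startswith("verb")
        ):
            # Assumption: the rule only changes the part of speech.
            fixed[i] = (word, "noun")
    return fixed


# A fragment of the first-pass analysis shown above.
first_pass = [
    ("de", "preposition"),
    ("su", "adjective(neuter, sgl)"),
    ("venta", "verb(pres_ind, sgl, first)"),
    ("a", "preposition"),
]

print(apply_su_rule(first_pass))
```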
SPOST's context-sensitive fix-up rules are subsequently applied, and an
existing rule:
[su/adj word/verb] ==> [su/adj word/noun]
re-tags "venta" as a noun.
The Temple Spanish Lexicon was partially derived from the Collins
Spanish-English Bilingual Dictionary. It contains approximately 60,000
entries, each of which has the following form:
A sample lexical entry follows:
A bilingual glossary of approximately 20,500 phrases was constructed from an existing PANGLOSS glossary. Each glossary entry consists of a source-language phrase and one or more corresponding target-language glosses. The entries are written using word citation forms and use variables to represent certain closed-class words as well as morphological transfer and agreement information.
A sample glossary entry follows:
SOURCE: que si [dop:1] tener[:2]
TARGET: of course [pronoun:1] have[:2]
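To make the variable notation concrete, the Python sketch below parses the two sides of the sample entry and links source and target items that share an index. It rests on a reading of the notation that is an assumption, not something stated here: `[class:n]` is taken to be a variable over a closed word class, and `word[:n]` a citation form whose morphological features are carried across under index `n`. The names `parse_side` and `link_by_index` are hypothetical.

```python
import re

# Matches "[dop:1]" (empty head) and "tener[:2]" (empty class).
TOKEN = re.compile(r"^(?P<head>[^\[\]]*)\[(?P<cls>[^:\]]*):(?P<idx>\d+)\]$")


def parse_side(text):
    """Split one side of a glossary entry into literal and indexed tokens."""
    tokens = []
    for raw in text.split():
        m = TOKEN.match(raw)
        if m is None:
            tokens.append({"kind": "literal", "text": raw})
        elif m.group("head"):
            tokens.append({"kind": "linked_word", "text": m.group("head"),
                           "index": int(m.group("idx"))})
        else:
            tokens.append({"kind": "variable", "class": m.group("cls"),
                           "index": int(m.group("idx"))})
    return tokens


def link_by_index(source, target):
    """Pair source and target tokens that carry the same index."""
    src = {t["index"]: t for t in parse_side(source) if "index" in t}
    tgt = {t["index"]: t for t in parse_side(target) if "index" in t}
    return {i: (src[i], tgt[i]) for i in sorted(src) if i in tgt}


links = link_by_index("que si [dop:1] tener[:2]",
                      "of course [pronoun:1] have[:2]")
for index, (s, t) in links.items():
    print(index, s, t)
```

Under this reading, index 1 ties the source variable `dop` to the target variable `pronoun`, and index 2 ties `tener` to `have`.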