

The goal of CRL's SAY project is to develop resources for lesser-studied languages. The languages currently being studied are the following:
  • Amharic: One of the languages spoken in Ethiopia.
  • Burmese: The official language of Myanmar.
  • Chechen: The language spoken by the people of Chechnya.
  • Guarani: A language spoken in Paraguay and Argentina.
  • Maguindanao: One of the languages spoken in the Philippines.
  • Uighur: A language spoken in Xinjiang, China.


The SAY team consists of CRL's computational linguists, software developers, and native speakers of or experts in Amharic, Burmese, Chechen, Guarani, Maguindanao, and Uighur.


The current tasks are training the acquirers, customizing CRL's tools to support the acquisition task, developing new user interfaces to make the tools easier to use, and developing training materials for all acquisition tools.


For each language, CRL is developing a monolingual text corpus of 250,000 words, a parallel bilingual text corpus of 250,000 words, and a bilingual lexicon of 10,000 headwords. All of these resources are free to download.


SAY project acquirers need to register. Once registration is complete, they should log in to work on their tasks, such as translation, lexicon acquisition, part-of-speech tagging, and named-entity tagging.
Acknowledgements [en español].
Translating the Sentence Corpus [en español] (only for the special elicitation corpus of English sentences).
Language Acquisition Userguide Version 3.3 [en español].
Simple Named Entity Guidelines [en español].
Time Annotation Guidelines [en español].