New Mexico State University

Arabic CRL Resources
Arabic is the principal language of Algeria, Bahrain, Egypt, Iraq, Israel (as one of the official languages), Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Qatar, Saudi Arabia, Syria, Sudan, Tunisia, the United Arab Emirates and Yemen. It is spoken by around 250 million people, and among Muslims around the world it is understood by up to four times as many. Arabic is also central to other languages of the Muslim world as a major source of loanwords and expressions, and the Arabic script is used to write other languages such as Persian and Urdu.

Arabic is divided into three distinct varieties: Classical written Arabic, written Modern Standard Arabic, and spoken Arabic. Classical written Arabic is principally defined as the Arabic of the Holy Koran and of the earliest literature from the Arabian Peninsula, but it has remained the core of much literature up to the present day. Modern Standard Arabic modernizes the structures of Classical Arabic; it includes words for modern phenomena as well as many additions from the regional dialects across the Arab world. Spoken Arabic is a mixed form with many variations, often strongly influenced by the local languages spoken before Arabic was introduced. The differences between spoken variants can be large enough to make them mutually incomprehensible, so it can be justified to treat them as separate languages named after their regions, such as Moroccan Arabic, Cairo Arabic and North Syrian Arabic.

Source: Encyclopaedia of the Orient.

Sample Arabic text

For further information about Arabic, please visit the Ethnologue website.
The Arabic-English dictionary:
The Arabic-English dictionary is essentially a morphological dictionary with English translations. It contains neither the usual part-of-speech information nor proper citation forms. Instead, each entry key is a morphological stem, typically a substring of an inflected word, and all stem variants of the same word are listed. Each entry also contains a morphological category field, giving the number of the inflectional paradigm for that stem. The following table shows the common stem categories:

Category    Stem type
001..025    Common noun stems without orthographic change
026..055    Common noun stems with orthographic change
056..057    Function words
058..059    Proper noun stems
060..081    Perfect verb stems
082..088    Perfect/Imperfect verb stems
089..129    Imperfect verb stems
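As an illustration of how a category number maps to a stem type, the Python sketch below encodes the ranges from the table above. `describe_category` is a hypothetical helper, not part of the CRL distribution, and treating the `R` prefix on codes such as `Stem=R018` as optional is an assumption based on the analyzer output shown at the end of this page.

```python
# Stem category ranges, taken from the table above.
# Each tuple is (first category number, last category number, stem type).
STEM_CATEGORIES = [
    (1, 25, "Common noun stems without orthographic change"),
    (26, 55, "Common noun stems with orthographic change"),
    (56, 57, "Function words"),
    (58, 59, "Proper noun stems"),
    (60, 81, "Perfect verb stems"),
    (82, 88, "Perfect/Imperfect verb stems"),
    (89, 129, "Imperfect verb stems"),
]


def describe_category(code: str) -> str:
    """Map a category code such as 'R018' or '018' to its stem type.

    Assumption: an optional leading 'R' (as in the analyzer's Stem=R018)
    is stripped before the number is read.
    """
    number = int(code.lstrip("Rr"))
    for low, high, description in STEM_CATEGORIES:
        if low <= number <= high:
            return description
    raise ValueError(f"unknown stem category: {code!r}")


if __name__ == "__main__":
    for code in ("R018", "R082", "120"):
        print(code, "->", describe_category(code))
```

With these ranges, `describe_category("R018")` returns "Common noun stems without orthographic change" and `describe_category("R082")` returns "Perfect/Imperfect verb stems", which agrees with the `Desc` fields in the example analyzer output.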

For more information, please see the file here.

The Arabic-English dictionary file available for download is in XML format and contains 122,920 entries, including Arabic proper names.
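The layout of the XML file is not described on this page. The sketch below assumes a hypothetical structure with one `<entry>` element per stem holding `<stem>`, `<category>` and `<gloss>` children; these element names and the two sample entries are invented for illustration. It loads entries with Python's standard `xml.etree.ElementTree` module and, because entry keys are stems that typically occur as substrings of inflected words, finds candidate entries for a word by substring matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names and entry values are assumptions,
# not the actual schema or content of the CRL dictionary file.
SAMPLE_XML = """<dictionary>
  <entry><stem>كتاب</stem><category>R001</category><gloss>book</gloss></entry>
  <entry><stem>كتب</stem><category>R026</category><gloss>books</gloss></entry>
</dictionary>"""


def load_entries(xml_text):
    """Parse every <entry> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "stem": entry.findtext("stem", default=""),
            "category": entry.findtext("category", default=""),
            "gloss": entry.findtext("gloss", default=""),
        }
        for entry in root.iter("entry")
    ]


def candidate_entries(word, entries):
    """Return the entries whose stem occurs as a substring of an inflected word."""
    return [e for e in entries if e["stem"] and e["stem"] in word]


if __name__ == "__main__":
    entries = load_entries(SAMPLE_XML)
    for word in ("الكتاب", "الكتب"):  # "the book", "the books"
        print(word, [e["gloss"] for e in candidate_entries(word, entries)])
```

For the real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring`, once the actual element names are known.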



Arabic Morphological Analyzer:
CRL's Morphological Analyzer generates analyses for texts in Arabic, Persian and Urdu. The analyzer is written in ANSI C and has been tested on Unix, Windows and Linux.

The analyzer uses a validation table to generate the valid morphemes. Three validation rules are implemented: prefix-suffix, prefix-root and root-suffix. Together, these rules determine which concatenations of prefix, root and suffix are valid.

The analyzer outputs all valid combinations, each with the appropriate part of speech (POS) and other features of the prefix and the suffix. For efficiency, the rules are hard-coded in the analyzer, since they are fixed and can be enumerated.
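The validation scheme described above can be sketched in Python as follows. The sketch splits a word at every possible point into prefix + stem + suffix, keeps only splits whose parts appear in small hard-coded inventories, and accepts a split only if all three pairwise rules (prefix-root, root-suffix and prefix-suffix) hold. The inventories, the POS labels and the tags `no_prefix` and `plural_3m` are hypothetical; `det`, `null_suffix` and `feminine_singular` are borrowed from the example output below. The real analyzer uses its own ValP/ValS tables and stem categories.

```python
# Hypothetical, hard-coded inventories (the real CRL tables are far larger).
PREFIXES = {"": "no_prefix", "ال": "det"}
SUFFIXES = {"": "null_suffix", "ة": "feminine_singular", "وا": "plural_3m"}
STEMS = {"مكتب": "noun", "كتب": "verb"}

# The three validation rules, each expressed as a set of allowed pairs.
PREFIX_ROOT = {("no_prefix", "noun"), ("no_prefix", "verb"), ("det", "noun")}
ROOT_SUFFIX = {
    ("noun", "null_suffix"), ("noun", "feminine_singular"),
    ("verb", "null_suffix"), ("verb", "plural_3m"),
}
PREFIX_SUFFIX = {
    ("no_prefix", "null_suffix"), ("no_prefix", "feminine_singular"),
    ("no_prefix", "plural_3m"), ("det", "null_suffix"),
    ("det", "feminine_singular"),
}


def analyze(word):
    """Return every valid prefix + stem + suffix analysis of word."""
    analyses = []
    for i in range(len(word) + 1):
        for j in range(i + 1, len(word) + 1):  # the stem must be non-empty
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix not in PREFIXES or stem not in STEMS or suffix not in SUFFIXES:
                continue
            p_tag, pos, s_tag = PREFIXES[prefix], STEMS[stem], SUFFIXES[suffix]
            if ((p_tag, pos) in PREFIX_ROOT
                    and (pos, s_tag) in ROOT_SUFFIX
                    and (p_tag, s_tag) in PREFIX_SUFFIX):
                analyses.append(
                    {"prefix": p_tag, "stem": stem, "pos": pos, "suffix": s_tag}
                )
    return analyses


if __name__ == "__main__":
    for word in ("المكتبة", "كتبوا", "الكتبوا"):
        print(word, analyze(word))
```

Run on these three strings, the sketch finds one analysis for المكتبة ("the library": determiner + noun stem + feminine singular suffix), one for كتبوا ("they wrote"), and none for the ill-formed الكتبوا, which fails the prefix-root rule because the determiner cannot attach to a verb stem. Storing each rule as a fixed set of allowed pairs mirrors the idea that the rules are enumerable and can be hard-coded.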

The package contains the source code for the analyzer, test examples in Arabic, Persian and Urdu and their outputs. A readme file contains details about compiling and running the analyzer on the different platforms.

A description of the analyzer and tables of features recognized for each language can be found here.

An example output for the Arabic word :
((3,14), root='..', PreffTag=ValP005,pref=det,det=the, Stem=R018, Desc="Common noun stems without orthographic change", SuffTag=ValS006,SuffType=feminine_singular,text_form='')
((3,14), root='..', PreffTag=ValP005,pref=det,det=the, Stem=R019, Desc="Common noun stems without orthographic change", SuffTag=ValS006,SuffType=feminine_singular,text_form='')
((3,14), root='..', PreffTag=ValP005,pref=det,det=the, Stem=R020, Desc="Common noun stems without orthographic change", SuffTag=ValS006,SuffType=feminine_singular,text_form='')
((3,14), root='...', PreffTag=ValP012,pref=prefval11,prefval11, Stem=R082, Desc="Perfect/Imperfect verb stems", SuffTag=ValS001,SuffType=null_suffix,text_form='')
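Because each analysis is printed as a parenthesized list of key=value fields, the output can be post-processed mechanically. The Python sketch below is one possible parser, assuming only the format seen in this example: analyses are top-level `((a,b), ...)` groups, values may be quoted with `'` or `"`, and a bare token without `=` (such as the second `prefval11`) is kept as a flag. The meaning of the leading number pair is not documented here, so the parser stores it as raw integers under a hypothetical `pair` key.

```python
import re

# Leading "(a,b)" pair, then the comma-separated fields.
HEAD_RE = re.compile(r"\s*\((\d+),(\d+)\)\s*,?(.*)", re.S)

SAMPLE = (
    "((3,14), root='..', PreffTag=ValP005,pref=det,det=the, Stem=R018, "
    'Desc="Common noun stems without orthographic change", '
    "SuffTag=ValS006,SuffType=feminine_singular,text_form='') "
    "((3,14), root='...', PreffTag=ValP012,pref=prefval11,prefval11, "
    'Stem=R082, Desc="Perfect/Imperfect verb stems", '
    "SuffTag=ValS001,SuffType=null_suffix,text_form='')"
)


def split_analyses(line):
    """Split analyzer output into the bodies of its top-level (...) groups."""
    groups, depth, start, quote = [], 0, 0, None
    for i, ch in enumerate(line):
        if quote:  # inside a quoted value: only look for the closing quote
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                groups.append(line[start + 1:i])
    return groups


def split_fields(body):
    """Split a group body on commas that are not inside quotes."""
    fields, current, quote = [], [], None
    for ch in body:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == ",":
            fields.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    fields.append("".join(current).strip())
    return [f for f in fields if f]


def parse_analysis(group):
    """Turn one group body into a dictionary of fields."""
    match = HEAD_RE.match(group)
    if match is None:
        raise ValueError(f"unexpected analysis: {group!r}")
    record = {"pair": (int(match.group(1)), int(match.group(2))), "flags": []}
    for field in split_fields(match.group(3)):
        if "=" not in field:
            record["flags"].append(field)  # e.g. the bare second "prefval11"
            continue
        key, value = (part.strip() for part in field.split("=", 1))
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop the surrounding quotes
        record[key] = value
    return record


if __name__ == "__main__":
    for record in map(parse_analysis, split_analyses(SAMPLE)):
        print(record["Stem"], record["SuffType"], record["Desc"])
```

Since whitespace between groups is ignored, the same parser handles the analyses whether they arrive on one line or one per line.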