New Mexico State University

Urdu Resources
Urdu is the native language of about 60 million people and is spoken by as many as 140 million. It is the official language of Pakistan and the major language spoken by Muslims in Jammu and Kashmir in India. It is also spoken in other countries, such as Afghanistan, Bahrain, Bangladesh, Saudi Arabia, and the United Kingdom.

Urdu is an Indo-Iranian (a subfamily of Indo-European) language and is mutually intelligible with Hindi. The major differences between the two languages are that Urdu is written in a Perso-Arabic script rather than Devanagari, and that it has borrowed a good deal of vocabulary from Persian and Arabic.

The language was traditionally known as Hindustani, and its first dictionaries and grammars were compiled in the 18th century by European missionaries and scholars. More recent work by native writers has focused on dialect studies and the development of Urdu literature. The Anjuman Tarraqi-e Urdu, an Urdu language academy, was established in 1903 and has played an important role in changing the status of the language in both India and Pakistan. In 1979, the government of Pakistan founded the Muqtadara (the National Language Authority) to coordinate the work of national and provincial organizations in developing Urdu as an official language. The Muqtadara has published numerous volumes, including terminologies, glossaries, and language instruction materials. In India, an Urdu encyclopedia, an Urdu dictionary, and several glossaries and terminologies have also been published, and the Indian government has set up the Tarraqi-e Urdu Bureau to coordinate provincial Urdu academies and other organizations.

Sample Urdu text

For further information about Urdu, please visit the Ethnologue website.
Urdu-English Dictionary
The Urdu-English dictionary contains 8,831 entries, including proper names, and uses Urdu characters.


Urdu Resource Package
Our Urdu Resource Package includes a description of Urdu morphology, a prototype Urdu-to-English machine translation system, and a preprocessor that recognizes dates and performs morphological analysis.

Urdu Morphological Analyzer:
CRL's Morphological Analyzer generates analyses for texts in Arabic, Persian, and Urdu. The analyzer is written in ANSI C and has been tested on Unix, Windows, and Linux.

The analyzer uses a validation table to determine which morpheme combinations are valid. Three validation rules are implemented: prefix-suffix, prefix-root, and root-suffix. These rules are used to validate the possible prefix-root and root-suffix concatenations, as well as the compatibility of the prefix with the suffix.

The analyzer outputs all possible valid combinations, along with the appropriate part-of-speech (POS) tag and other features of the prefix and the suffix. For efficiency, the rules are hard-coded in the analyzer, since they are fixed and can be enumerated.

The package contains the source code for the analyzer, along with test examples in Arabic, Persian, and Urdu and their outputs. A README file explains how to compile and run the analyzer on each supported platform.

A description of the analyzer and tables of features recognized for each language can be found here.