| Name | Language | Category | Location | Size |
| --- | --- | --- | --- | --- |
| Arabic Corpora | Arabic | Corpora | home/tide/languages/arabic/corpora/raw/<br>home/tide/languages/arabic/corpora/src/<br>home/tide/languages/arabic/glossaries/raw/corpus.arabic.1.gz | 60 Gzip files<br>140 files<br># |
| Chinese Corpora | Chinese | Corpora | home/ursa/chinese-trec/data/xinhua<br>home/ursa/data/corpora/chinese_trec/xinhua<br>home/mikro/chinese/texts/corpora/200<br>home/ursa/chinese-trec/data/peoples-daily<br>home/corpora/LDC/chinese_treebank/data<br>home/norm/Text/CM<br>home/norm1/norm/PH/Hanzi.PH.gz<br>home/ursa/chinese-trec/topics<br>home/topics.CH29-CH54.chinese.english<br>home/mikro/chinese/texts/XINHUA<br>home/mikro/steve/sem-anal/v6/chinese | 38 MB files<br>47.6MB<br>370K<br>132MB<br>3.2MB<br>23.6MB files<br>4.1MB<br>23K<br>23K<br>44K<br>142K |
| Serbo-Croatian Corpora | Serbo-Croatian | Corpora | home/mcm2/corelli/scr/corpora/raw/sipka<br>home/tide/languages/croatian/corpora<br>home/mcm2/corelli/scr/corpora/raw/sipka<br>home/mcm2/corelli/scr/corpora/src<br>home/tide/languages/croatian/corpora/old/raw/eci<br>home/tide/languages/croatian/corpora/old/raw/nbsc<br>home/wanying/tide/corpora/nbsc<br>home/corpora/literature/Yugoslav.Corpus<br>home/corpora/Serbo-Croatian/corpora<br>home/corpora/Serbo-Croatian/corpora/parallel | 12MB<br>83KB<br>*<br>83KB<br>70MB 19 files<br>4.2MB 34 files<br>4.2MB<br>4.2MB<br>7.8MB<br>1.4MB |
| English Corpora | English | Corpora | home/crest/hshin/olympic/glossary<br>home/norm/PAHO<br>home/norm/PAHO<br>home/norm3/norm/uconcord/english<br>home/tide/languages/japanese/corpora/bi/english<br>home/corpora/literature/ota/english<br>home/corpora/english/engwork/mosceng<br>home/ursa/data/corpora/english/ap_88<br>home/ursa/data/corpora/english/ap_89<br>home/ursa/data/corpora/english/ap_90<br>home/ursa/data/corpora/english/CL.topics.026-053.english.ucs2<br>home/corpora/spider/Ann/la1<br>home/corpora/spider/ICA/la1<br>home/corpora/spider/Johnny<br>home/corpora/spider/fbis-4<br>home/corpora/spider/la2<br>home/corpora/spider/LAT<br>home/corpora/spider/stats/anthm10.txt<br>home/corpora/spider/stats/baskerville.txt<br>home/corpora/spider/stats/lwmen10.txt<br>home/corpora/spider/stats/tarzan.txt | 277KB<br>182 @ 4.1MB<br>179 @ 4.1MB<br>#<br>802KB<br>18.2MB<br>13MB<br>322 @ 497MB<br>364 @ 533MB<br>364 @ 498MB<br>30KB<br>21MB<br>21MB<br>21MB<br>29 @ 106MB<br>56 @ 149MB<br>45KB<br>101KB<br>12KB<br>#<br>139KB |
| French Corpora | French | Corpora | home/ursa/data/corpora/french/1988<br>home/ursa/data/corpora/french/1988/1989<br>home/ursa/data/corpora/french/1988/1990<br>home/ursa/data/corpora/french/CL.topics.026-053.french.ucs2<br>home/corpora/literature/ota/french/domjuan.1292<br>home/corpora/literature/ota/french/exercises.192 | 366 @ 173MB<br>365 @ 180MB<br>365 @ 149MB<br>32.8KB<br>129KB<br>90KB |
| Italian Corpora | Italian | Corpora | home/corpora/literature/ota/italian/verga.1917 | 80KB |
| Japanese Corpora | Japanese | Corpora | home/tide/languages/japanese/corpora/bi/japanese<br>home/tide/languages/japanese/corpora/mono/raw | 700KB file<br>6.2MB |
| Korean Corpora | Korean | Corpora | home/rzajac/mcm/src/CRL/lang/kor/test<br>home/hshin/hshin2/corpus | #<br># |
| Persian Corpora | Persian | Corpora | home/mcm/shiraz/lang_resources/corpora/raw/all_hamshahri/webfiles<br>home/mcm/shiraz/lang_resources/corpora/raw/hamshahri<br>home/mcm/shiraz/lang_resources/corpora/raw/hamshahri99<br>home/mcm/shiraz/lang_resources/corpora/raw/utf8files<br>home/mcm/shiraz/lang_resources/corpora/raw/newutf8s<br>home/mcm/shiraz/lang_resources/corpora/raw/j-eslami<br>home/mcm/shiraz/lang_resources/corpora/src/120sentences/Corpus120.txt<br>home/mcm/shiraz/lang_resources/corpora/src<br>home/mcm/shiraz/lang_resources/corpora/src<br>home/mcm/shiraz/lang_resources/corpora/src/bilingual_corpus<br>home/mcm/shiraz/lang_resources/corpora/src/bilingual_corpus | #<br>2.36MB<br>1.4MB<br>#<br>1.4MB<br>358KB<br>#<br>#<br>#<br>1.1MB<br>7.6MB |
| Russian Corpora | Russian | Corpora | home/tide/languages/russian/corpora/mono/raw/cmpwrld/cmpwrld.txt.gz<br>home/tide/languages/russian/corpora/mono/raw<br>home/tide/languages/russian/corpora/mono/raw/moscnews<br>home/tide/languages/russian/corpora/mono/raw/news<br>home/tide/languages/russian/corpora/mono/raw/boris1.gz<br>home/tide/languages/russian/corpora/mono/raw/palms.gz<br>home/tide/languages/russian/corpora/mono/raw/relcom.gz<br>home/tide/languages/russian/corpora/mono/raw/src<br>home/tide/languages/russian/corpora/mono/raw/runtime<br>home/mcm/corelli/rus/corpora<br>home/wanying/tide/build/russian/morphology/test/data | 22KB<br>3494 files<br>1.4MB gz<br>420KB<br>1.1k<br>39KB<br>5KB<br>589KB<br>589KB<br>195KB<br>364KB |
| Spanish Corpora | Spanish | Corpora | home/tide/crlapps/sp_disambiguation<br>home/tide/languages/spanish/corpora/raw/sp.docs.jl<br>home/tide/languages/spanish/corpora/raw<br>home/tide/languages/spanish/corpora/raw<br>home/norm/PAHO<br>home/tide/languages/spanish/corpora/raw<br>home/corpora/spider/af960104 | 108KB<br>859KB<br>9.6KB<br>208KB<br>179 @ 4.1MB<br>181 @ 4.1MB<br>679KB |
| Turkish Corpora | Turkish | Corpora | home/corpora/spider/stats/turk1.txt<br>home/mcm2/expedition/lang_resources/turkish/corpora/alltexts<br>home/mcm2/expedition/lang_resources/turkish | 1.1KB<br>2.5MB<br>50KB |
| ATR | | | | 0 K |
| Arabic Glossary | Arabic | Glossary, Corpus | | 17188 K |
| JArticles | | | | 0 K |
| LDC/dso | English | Corpus | | 37151 K |
| LDC/treebank | | | | 655904 K |
| LDC-97 | Arabic, Chinese | text resource | | 185473 K |
| LDC-98 | Arabic, Chinese | Corpus | | 214623 K |
| LDC-00 | Korean | | | 305514 K |
| LDC-01 | Arabic | | | 1106841 K |
| MUC5 | | | | 30169 K |
| MUC7 | | | | 12629 K |
| Serbo-Croatian | Serbo-Croatian | | | 15582 K |
| Spanish | Spanish | | | 1233 K |
| Sun | Dutch, English, French | | 294A | 11155 K |
| UN | | | | 27550 K |
| acronyms | | | | 211 K |
| celex_1_0 | | Dictionary | | 59471 K |
| eci | | | | 185909 K |
| eci2 | | | 294A | |
| efe_archive | | | | 594288 K |
| efetoday | | | | 1331 K |
| english | English | | | 7859 K |
| hott | | | | 79 K |
| iata-codes | | | | 341 K |
| ilo_sample | | | | 38860 K |
| irs | | | | 2458 K |
| japanese | Japanese | | 294A | 0 K |
| juris | | | | 899224 K |
| literature | | | | 23211 K |
| misc | English | | | 5229 K |
| paho.tmp | | | | 70877 K |
| reuters | | | | 20897 K |
| russian | Russian | | | 88258 K |
| spider | English, Chinese, Persian, Russian | | | 3661339 K |
| wordnet-semcor | | | | 19367 K |
| wsj | | | | 354216 K |
| LDC Multi-Lingual | | Multi-Language | 294A | |
| LDC:OPA | English, German | Multi-Language | Room 294A | |
| 12 Vestia | | | | |
| KPA Korean Text | Korean | Text | 294A | |
| ILO Sample Data | English | Text | 294A | 35MB |
| Aligned Spanish/English sentence from UN corpus | English, Spanish | Text | 294A | |
| Russian text | Russian | Text | home/corpora/russian | # |
| HNC Software | English | Text | Room 294A | 380 MB |
| HNC English Collection Index | English | Word list | 294A | 290 MB |
| LDC-97 Chinese | Chinese | texts | 294A | |
| LDC-97 Thai/Arabic | Thai, Arabic | texts | 294A | |
| HNC English Collection Index | English | text | 294A | (4543 docs) 100MB |
| Croatian-English I & English-Croatian I | English, Croatian | Dictionary | 294A | |
| Serbo-Croatian Dictionary | Serbo-Croatian | Dictionary | 294A | |
| Harper-Collins Electronic Reference | French, English | text | 294A | |
| Fonts | English | Text | 294A | |
| screng8.txt | English | Text | 294A | |
| SEC | English | Text | 294A | |
| Celex Lexical Database | English, Dutch, German | Text | 294A | |
| BRS/Search for UNIX | English | Text | 294A | |
| Biotechnology Citation Index | English | Bibliography | 294A | |
| Amaryllis | French | Text | 294A | |
| Air Travel Information System | English | Text | 294A | |
| LDC Speech Recognition Corpus Disc | English | Speech texts | 294A | |
| Korean Newswire | Korean | Foreign text | 294A | |
| IPAL Japanese Verb | Japanese | Dictionary | 294A | |
| Stored UN Data | Spanish, English | Tar file | 294A | 1.5 GB |
| Spanish-English Patto Disks | Spanish, English | Language texts | 294A | |
| Japanese OpenWindows Developer's Guide | Japanese | Texts | 294A | |
| Collins Bilingual Dictionaries | English, Spanish | Dictionary | 294A | |
| Texas Tech Biomechanics Lab | English | Texts | 294A | |
| Collins Bilingual Dictionaries Large Spanish/English typeset | English, Spanish | dictionaries | 294A | 123 files in directory |