CRL README Inventory list


Links to individual language resources:
Arabic Resources
Chinese Resources
English Resources
French Resources
Italian Resources
Japanese Resources
Korean Resources
Persian Resources
Russian Resources
Serbo-Croatian Resources
Spanish Resources
Turkish Resources
Name  Language  Category  Location  Size
  Arabic Corpora  Arabic  Corpora
    home/tide/languages/arabic/corpora/raw/                       60 Gzip files
    home/tide/languages/arabic/corpora/src/                       140 files
    home/tide/languages/arabic/glossaries/raw/corpus.arabic.1.gz  #
  Chinese Corpora  Chinese  Corpora
    home/ursa/chinese-trec/data/xinhua          38 MB files
    home/ursa/data/corpora/chinese_trec/xinhua  47.6MB
    home/mikro/chinese/texts/corpora/200        370K
    home/ursa/chinese-trec/data/peoples-daily   132MB
    home/corpora/LDC/chinese_treebank/data      3.2MB
    home/norm/Text/CM                           23.6MB files
    home/norm1/norm/PH/Hanzi.PH.gz              4.1MB
    home/ursa/chinese-trec/topics               23K
    home/topics.CH29-CH54.chinese.english       23K
    home/mikro/chinese/texts/XINHUA             44K
    home/mikro/steve/sem-anal/v6/chinese        142K
  Serbo-Croatian Corpora  Serbo-Croatian  Corpora
    home/mcm2/corelli/scr/corpora/raw/sipka            12MB
    home/tide/languages/croatian/corpora               83KB
    home/mcm2/corelli/scr/corpora/raw/sipka            *
    home/mcm2/corelli/scr/corpora/src                  83KB
    home/tide/languages/croatian/corpora/old/raw/eci   70MB, 19 files
    home/tide/languages/croatian/corpora/old/raw/nbsc  4.2MB, 34 files
    home/wanying/tide/corpora/nbsc                     4.2MB
    home/corpora/literature/Yugoslav.Corpus            4.2MB
    home/corpora/Serbo-Croatian/corpora                7.8MB
    home/corpora/Serbo-Croatian/corpora/parallel       1.4MB
  English Corpora  English  Corpora
    home/crest/hshin/olympic/glossary                              277KB
    home/norm/PAHO                                                 182 @ 4.1MB
    home/norm/PAHO                                                 179 @ 4.1MB
    home/norm3/norm/uconcord/english                               #
    home/tide/languages/japanese/corpora/bi/english                802KB
    home/corpora/literature/ota/english                            18.2MB
    home/corpora/english/engwork/mosceng                           13MB
    home/ursa/data/corpora/english/ap_88                           322 @ 497MB
    home/ursa/data/corpora/english/ap_89                           364 @ 533MB
    home/ursa/data/corpora/english/ap_90                           364 @ 498MB
    home/ursa/data/corpora/english/CL.topics.026-053.english.ucs2  30KB
    home/corpora/spider/Ann/la1                                    21MB
    home/corpora/spider/ICA/la1                                    21MB
    home/corpora/spider/Johnny                                     21MB
    home/corpora/spider/fbis-4                                     29 @ 106MB
    home/corpora/spider/la2                                        56 @ 149MB
    home/corpora/spider/LAT                                        45KB
    home/corpora/spider/stats/anthm10.txt                          101KB
    home/corpora/spider/stats/baskerville.txt                      12KB
    home/corpora/spider/stats/lwmen10.txt                          #
    home/corpora/spider/stats/tarzan.txt                           139KB
  French Corpora  French  Corpora
    home/ursa/data/corpora/french/1988                           366 @ 173MB
    home/ursa/data/corpora/french/1988/1989                      365 @ 180MB
    home/ursa/data/corpora/french/1988/1990                      365 @ 149MB
    home/ursa/data/corpora/french/CL.topics.026-053.french.ucs2  32.8KB
    home/corpora/literature/ota/french/domjuan.1292              129KB
    home/corpora/literature/ota/french/exercises.192             90KB
  Italian Corpora  Italian  Corpora
    home/corpora/literature/ota/italian/verga.1917  80KB
  Japanese Corpora  Japanese  Corpora
    home/tide/languages/japanese/corpora/bi/japanese  700KB file
    home/tide/languages/japanese/corpora/mono/raw     6.2MB
  Korean Corpora  Korean  Corpora
    home/rzajac/mcm/src/CRL/lang/kor/test  #
    home/hshin/hshin2/corpus               #
  Persian Corpora  Persian  Corpora
    home/mcm/shiraz/lang_resources/corpora/raw/all_hamshahri/webfiles      #
    home/mcm/shiraz/lang_resources/corpora/raw/hamshahri                   2.36MB
    home/mcm/shiraz/lang_resources/corpora/raw/hamshahri99                 1.4MB
    home/mcm/shiraz/lang_resources/corpora/raw/utf8files                   #
    home/mcm/shiraz/lang_resources/corpora/raw/newutf8s                    1.4MB
    home/mcm/shiraz/lang_resources/corpora/raw/j-eslami                    358KB
    home/mcm/shiraz/lang_resources/corpora/src/120sentences/Corpus120.txt  #
    home/mcm/shiraz/lang_resources/corpora/src                             #
    home/mcm/shiraz/lang_resources/corpora/src                             #
    home/mcm/shiraz/lang_resources/corpora/src/bilingual_corpus            1.1MB
    home/mcm/shiraz/lang_resources/corpora/src/bilingual_corpus            7.6MB
  Russian Corpora  Russian  Corpora
    home/tide/languages/russian/corpora/mono/raw/cmpwrld/cmpwrld.txt.gz  22KB
    home/tide/languages/russian/corpora/mono/raw                         3494 files
    home/tide/languages/russian/corpora/mono/raw/moscnews                1.4MB gz
    home/tide/languages/russian/corpora/mono/raw/news                    420KB
    home/tide/languages/russian/corpora/mono/raw/boris1.gz               1.1KB
    home/tide/languages/russian/corpora/mono/raw/palms.gz                39KB
    home/tide/languages/russian/corpora/mono/raw/relcom.gz               5KB
    home/tide/languages/russian/corpora/mono/raw/src                     589KB
    home/tide/languages/russian/corpora/mono/raw/runtime                 589KB
    home/mcm/corelli/rus/corpora                                         195KB
    home/wanying/tide/build/russian/morphology/test/data                 364KB
  Spanish Corpora  Spanish  Corpora
    home/tide/crlapps/sp_disambiguation                 108KB
    home/tide/languages/spanish/corpora/raw/sp.docs.jl  859KB
    home/tide/languages/spanish/corpora/raw             9.6KB
    home/tide/languages/spanish/corpora/raw             208KB
    home/norm/PAHO                                      179 @ 4.1MB
    home/tide/languages/spanish/corpora/raw             181 @ 4.1MB
    home/corpora/spider/af960104                        679KB
  Turkish Corpora  Turkish  Corpora
    home/corpora/spider/stats/turk1.txt                           1.1KB
    home/mcm2/expedition/lang_resources/turkish/corpora/alltexts  2.5MB
    home/mcm2/expedition/lang_resources/turkish                   50KB
  Name             Language                            Category          Location  Size
  ATR                                                                              0 K
  Arabic Glossary  Arabic                              Glossary, Corpus            17188 K
  JArticles                                                                        0 K
  LDC/dso          English                             Corpus                      37151 K
  LDC/treebank                                                                     655904 K
  LDC-97           Arabic, Chinese                     text resource               185473 K
  LDC-98           Arabic, Chinese                     Corpus                      214623 K
  LDC-00           Korean                                                          305514 K
  LDC-01           Arabic                                                          1106841 K
  MUC5                                                                             30169 K
  MUC7                                                                             12629 K
  Serbo-Croatian   Serbo-Croatian                                                  15582 K
  Spanish          Spanish                                                         1233 K
  Sun              Dutch, English, French                                294A      11155 K
  UN                                                                               27550 K
  acronyms                                                                         211 K
  celex_1_0                                            Dictionary                  59471 K
  eci                                                                              185909 K
  eci2                                                                   294A
  efe_archive                                                                      594288 K
  efetoday                                                                         1331 K
  english          English                                                         7859 K
  hott                                                                             79 K
  iata-codes                                                                       341 K
  ilo_sample                                                                       38860 K
  irs                                                                              2458 K
  japanese         Japanese                                              294A      0 K
  juris                                                                            899224 K
  literature                                                                       23211 K
  misc             English                                                         5229 K
  paho.tmp                                                                         70877 K
  reuters                                                                          20897 K
  russian          Russian                                                         88258 K
  spider           English, Chinese, Persian, Russian                              3661339 K
  wordnet-semcor                                                                   19367 K
  wsj                                                                              354216 K
  LDC Multi-Lingual    Multi-Language  294A
  LDC:OPA  English, German  Multi-Language  Room 294A
  12 Vestia       
  KPA Korean Text  Korean  Text  294A 
  ILO Sample Data  English  Text  294A  35MB
  Aligned Spanish/English sentence from UN corpus  English, Spanish  Text  294A 
  Russian text  Russian  Text  home/corpora/russian  #
  HNC Software  English  Text  Room 294A  380 MB
  HNC English Collection Index  English  Word list  294A  290 MB
  LCd 97 Chinese  Chinese  texts  294A 
  LDC-97 Thai/Arabic  Thai, Arabic  texts  294A 
  HNC English Collection Index  English  text  294A  (4543 docs) 100MB
  Croatian-English I & English-Croatian I  English, Croatian  Dictionary  294A 
  Serbo-Croatian Dictionary  Serbo-Croatian  Dictionary  294A 
  Harper-Collins Electronic Reference  French, English  text  294A 
  Fonts  English  Text  294A 
  screng8.txt  English  Text  294A 
  SEC  English  Text  294A 
  Celex Lexical Database  English, Dutch, German  Text  294A 
  BRS/Search for UNIX  English  Text  294A 
  Biotechnology Citation Index  English  Bibliography  294A 
  Amaryllis  French  Text  294A 
  Air Travel Information System  English  Text  294A 
  LDC Speech Recognition Corpus Disc  English  Speech texts  294A 
  Korean Newswire  Korean  Foreign text  294A 
  IPAL Japanese Verb  Japanese  Dictionary  294A 
  Stored UN Data  Spanish, English  Tar file  294A  1.5 GB
  Spanish-English Patto Disks  Spanish, English  Language texts  294A 
  Japanese OpenWindows Developer's Guide  Japanese  Texts  294A 
  Collins Bilingual Dictionaries  English, Spanish  Dictionary  294A 
  Texas Tech Biomechanics Lab  English  Texts  294A 
  Collins Bilingual Dictionaries Large Spanish/English typeset  English, Spanish  dictionaries  294A  123 files in directory