=== PROJECT SUMMARY ===
ORGANIZATION: The Regents of New Mexico State University
SUBCONTRACTORS: None
PRINCIPAL INVESTIGATOR: Louise Guthrie, louise@nmsu.edu, 505-646-5466
TITLE OF EFFORT: The Consortium for Lexical Research (CLR)
OBJECTIVE: The purpose of the Consortium for Lexical Research (CLR) is to establish a clearinghouse and repository for sharable natural language processing resources. These materials accelerate the development of natural language understanding systems in two ways: by making essential base resources more widely available, and by promoting the reusability of resources such as lexicons, part-of-speech taggers, parsers, and concordances. A principal objective has been to negotiate with publishers to make machine-readable dictionaries (MRDs) more widely available to researchers.
APPROACH: First, advance the idea of reusability by promoting a central repository that can legally and securely house sharable resources. Second, forge relationships with publishers to help incorporate their lexicography into human language technology (HLT) and natural language processing (NLP) research. Publicity networks, including a widely read newsletter, were designed to attract and inform donors and members. Legal Membership and Provider (Donor) Agreements were developed by NMSU lawyers; customized contracts are now also available. Computer networking facilities have been arranged with several levels of security so that access is restricted to users with appropriate credentials. Additionally, a WWW Mosaic site was created that contains an on-line catalog of holdings, sample files of electronic dictionaries, and ftp access to files.
PROGRESS: CLR's focus in FY94 has been on the acquisition of new materials, membership renewal, and the recruitment of new members. Acquisitions have been our main emphasis, and in the last year the members-only archive holdings more than tripled.
An especially worthwhile effort was the collection of software from around the world that is freely available but not easy to track down. Centralizing these resources and creating a "one-stop" source for current NLP tools has proven to be a valuable service for our members. More recently, progress has been made toward acquiring lexicons: we have negotiated for Italian, Indonesian, and Korean lexicons to be placed in the archive within the next 6 months, and for a Spanish and an Arabic lexicon within a year. We completed negotiations with Harper Collins Publishers and Longman Ltd. that enable members to license MRDs for research purposes. These contracts give members the opportunity to acquire both monolingual and bilingual MRDs in a dozen languages, at specially reduced prices for academic researchers. More importantly, there is now a standardized application format, fee structure, license agreement, and timetable by which NLP researchers can procure an MRD. Our excellent relations with Longman will lead to the release of six new resources this fall, which CLR will coordinate and distribute. Membership in CLR grew another 30% last year. Current membership totals 66: 38 universities, 22 companies (including Apple, Xerox, and Siemens), and 6 U.S. government agencies. These figures include all 24 participants of the Fifth Message Understanding Conference (MUC-5). This past fiscal year we invoiced $26,000 in membership dues and dictionary brokerage fees.
FY94 ACCOMPLISHMENTS:
1) In 1994 CLR served as the distribution center for data required by participants in MUC-5. CLR's facilities provided a secure and monitored means of distributing the large volumes of data (such as gazetteers, rules, and training texts) required to build information extraction systems.
2) CLR raised its visibility by establishing a WWW Mosaic site and by increasing its newsletter circulation to over 1400 email addresses.
Last year, our ftp site accesses averaged about 8,000 per month.
3) A CLR workshop entitled "Large Scale Multilingual Lexical Knowledge Acquisition" is being held in July 1994 in Pisa, Italy. The workshop is sponsored jointly by CLR and the Istituto di Linguistica Computazionale in Pisa. Its two-part goal is to compare and contrast ongoing efforts in large-scale semantic knowledge acquisition, and to discuss concrete transatlantic research cooperation and resource evaluation. Approximately 20 invited researchers will participate.
4) CLR will also open the Sharable Natural Language Resources Conference, sponsored by Dr. Yuji Matsumoto in Nara, Japan, with a paper that discusses obstacles to sharability and reusability and proposes solutions. This conference offers CLR a significant opportunity to form acquisition alliances with like-minded researchers in Asia.
FY95 PLANS:
1) Locate funding support that permits CLR to protect the existing ARPA investment in resources and to capitalize on the rapid growth we have experienced in the past year.
2) Lobby ARPA to designate CLR as the official ARPA and/or federal agency repository for HLT and NLP tools and resources.
3) Develop contracts between funding agencies and grant or contract recipients that would require specified data or software results to be made available through CLR.
4) Establish joint-membership privileges for members of the Linguistic Data Consortium (LDC), so that CLR access would be an included benefit of LDC membership dues.
5) Finalize negotiations and contracts with Webster's Dictionary that will allow us to distribute their complete line of dictionaries and lexical resources through CLR. This should be done by late fall 1994.
6) Develop strategic alliances in Asia and Europe with the goal of adding two new CLR ftp sites. All three sites (New Mexico, Asia, Europe) would then make acquisitions and recruit members, and CLR holdings would be mirrored at each site.
TECHNOLOGY TRANSITION: CLR is designed to transfer technology. Below is a sampling of current commitments to make resources built by or under contract to federal agencies available through our archives.
Melissa Holland, Army Research Institute: Spanish lexicon, Spanish linguistic analysis tools, German-English military dictionary, Arabic lexicon.
Eduard Hovy, USC Information Sciences Institute: Japanese lexicons (those not protected by copyright) and the full suite of access tools built for the Pangloss MT Project.
John White and Teri O'Connell, PRC, for the ARPA MT Evaluation Project: parallel texts in Japanese, French, and Spanish, and all evaluation forms and data used in the MT project.
DATE PREPARED: 7/1/94
==========================================================================
FTP AND WORLD WIDE WEB ACCESS
FTP address: clr.nmsu.edu
Path CLR/: the files "catalog" and "catalog.ps" are our descriptive catalogs of materials.
Path newsletter/: contains all newsletters.
WWW Mosaic site: http://crl.nmsu.edu/clr.html
The catalog can be browsed or downloaded, and the two most recent newsletters are online. This file is available by following the link "CLR ARPA Proposal".
=====================