CUSTOM DATABASES


Special text resources can be incorporated into applications such as Cibola or Oleada using existing text management tools. For example, Oleada has been extended for the US Department of Defense to include a Chiefs-of-State reference text and a Gazetteer. The interface for each new resource is developed to fit the special needs of its data, but the existing text processing and underlying information retrieval engine can handle most text requirements.

New text resources are typically indexed using one of two information retrieval engines designed specifically for the Oleada environment. The first is suited to unchanging data, such as a fixed dictionary or reference work. The second is optimized for dynamically changing data, such as custom glossaries and Translation Memory databases, where rapid update and text modification are required. The newly created resources are then available through the Norm Data Server (NDS) in a full client-server architecture. Oleada client interfaces communicate with the server in a custom language designed for flexible access to a diverse range of text resources, including dictionaries, thesauri, glossaries and others.
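The kind of in-place update that the second engine is optimized for can be illustrated with a toy in-memory inverted index. This is a minimal sketch under assumed names (`DynamicIndex`, `put`, `delete`, `search` are hypothetical); it does not reproduce the Oleada engines or the NDS protocol.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"\w+", text.lower())


class DynamicIndex:
    """Toy inverted index with in-place update (hypothetical, for illustration)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # doc id -> its terms, kept for fast deletion

    def put(self, doc_id, text):
        """Add a document, or replace it if the id already exists."""
        self.delete(doc_id)
        terms = set(tokenize(text))
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and every posting that points at it."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return the sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set(self.doc_terms)
        for term in terms:
            hits &= self.postings.get(term, set())
        return sorted(hits)


index = DynamicIndex()
index.put("tm-1", "The server is running")
index.put("tm-2", "The server has stopped")
index.put("tm-1", "The client is running")  # replacement is searchable at once
print(index.search("server"))               # ['tm-2']
```

Keeping a per-document term set is what makes deletion cheap here: only the postings that mention the document are touched. A static reference work never needs that bookkeeping, which is one plausible reason to index it with a separate engine.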

Operation

Custom interfaces for new Oleada text resources can be designed and built quickly thanks to the rich communication architecture between the NDS and the Oleada client. Important features of this arrangement include:

  • … languages, proximity and Boolean matching operations, and IDF-weighted retrieval.
  • A document-signature information retrieval engine that supports fuzzy-term expansion, glob-style wildcard matching, phrase weighting, and rapid document deletion and replacement, with near-real-time indexing of new and modified documents.
  • 8-bit clean indexing and querying.
  • Configurable server that can accommodate hundreds of resources for hundreds of clients.
  • Rapid prototyping of new resource indexes and client-server communication using the TCL/DP interfacing language.
  • Advanced tokenization tools for handling complex SGML markup of customer texts.
  • User-centered design of client interfaces for new resources.
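
As a rough illustration of the document-signature approach listed above, the sketch below builds a fixed-width bit signature per document (superimposed coding) and expands glob-style wildcards against the indexed vocabulary. All names and parameters (`SignatureIndex`, `SIGNATURE_BITS`, `HASHES_PER_TERM`) are assumptions made for the example, and fuzzy-term expansion and phrase weighting are not shown. The stored term set stands in for re-reading the document to rule out signature false positives.

```python
import fnmatch
import hashlib
import re

SIGNATURE_BITS = 256  # assumed signature width
HASHES_PER_TERM = 3   # assumed number of bits set per term


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"\w+", text.lower())


def term_mask(term):
    """Return a bit mask with up to HASHES_PER_TERM positions set for a term."""
    mask = 0
    for seed in range(HASHES_PER_TERM):
        digest = hashlib.md5(f"{seed}:{term}".encode("utf-8")).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIGNATURE_BITS)
    return mask


class SignatureIndex:
    """Toy signature file (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}  # doc id -> (signature, set of terms)

    def put(self, doc_id, text):
        """Add or replace a document; only its own record is rewritten."""
        terms = set(tokenize(text))
        signature = 0
        for term in terms:
            signature |= term_mask(term)
        self.docs[doc_id] = (signature, terms)

    def delete(self, doc_id):
        """Drop a document's record."""
        self.docs.pop(doc_id, None)

    def expand(self, pattern):
        """Expand a glob-style pattern into the matching indexed terms."""
        vocabulary = set()
        for _, terms in self.docs.values():
            vocabulary |= terms
        pattern = pattern.lower()
        return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t, pattern))

    def search(self, pattern):
        """Return sorted ids of documents containing any term matching pattern."""
        hits = set()
        for term in self.expand(pattern):
            mask = term_mask(term)
            for doc_id, (signature, terms) in self.docs.items():
                # The signature test is a cheap filter that may give false
                # positives, so each candidate is confirmed against its terms.
                if (signature & mask) == mask and term in terms:
                    hits.add(doc_id)
        return sorted(hits)
```

Because each document owns a single signature record, deleting or replacing a document touches one entry rather than many posting lists, which fits the rapid deletion and replacement the list describes.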

Status

For the Department of Defense, CRL has developed Chiefs-of-State and Gazetteer resources that use the vector-based retrieval engine to supply breakdowns of country-specific information on demand to custom interfaces in Oleada.

In the Chiefs-of-State resource, queries can be made by country, person or office, and fuzzy matching strategies automatically return near misses if an initial search fails. Using the Gazetteer resource, place names can be rapidly associated with their country or region, and map coordinates can also be searched or retrieved.
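
The exact-then-fuzzy behavior described here can be sketched with Python's standard `difflib` module. The function name `lookup`, the similarity cutoff, and the placeholder records are all assumptions for illustration; the matching strategy actually used by the Chiefs-of-State resource is not documented here.

```python
import difflib


def lookup(query, records, max_suggestions=3, cutoff=0.6):
    """Try an exact lookup first; fall back to near misses if it fails.

    records maps a lowercase key (country, person, or office) to a value.
    Returns (matches, was_fuzzy), where matches is a list of (key, value).
    """
    key = query.strip().lower()
    if key in records:
        return [(key, records[key])], False
    near = difflib.get_close_matches(
        key, list(records), n=max_suggestions, cutoff=cutoff
    )
    return [(k, records[k]) for k in near], True


# Placeholder records with fictional country names, not real data.
records = {
    "ruritania": "placeholder entry A",
    "freedonia": "placeholder entry B",
    "genovia": "placeholder entry C",
}

matches, was_fuzzy = lookup("Ruritanya", records)
print(matches, was_fuzzy)
```

Returning the `was_fuzzy` flag lets a client interface tell the user that the results are near misses rather than exact hits.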

In each case, the full-text indexing strategies allow more flexible searching than a traditional database scheme could, while offloading the processing and memory overhead of the full-text indexes to a single central server.
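
One way full-text retrieval differs from an exact-match database lookup is that it can rank partial matches. The sketch below scores documents by the summed inverse document frequency (IDF) of the query terms they contain. The function name `idf_rank`, the unsmoothed `log(N / df)` formula, and the sample texts are assumptions for illustration and are not taken from the Oleada engines.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"\w+", text.lower())


def idf_rank(query, documents):
    """Rank documents by the summed IDF of the query terms they contain.

    documents maps doc id -> text. Returns [(doc_id, score), ...] with the
    best score first; documents matching no query term are omitted.
    """
    doc_terms = {doc_id: set(tokenize(text)) for doc_id, text in documents.items()}
    doc_freq = Counter(term for terms in doc_terms.values() for term in terms)
    n_docs = len(doc_terms)
    query_terms = set(tokenize(query))

    scores = {}
    for doc_id, terms in doc_terms.items():
        matched = query_terms & terms
        if matched:
            # Rare terms contribute more than terms found in most documents.
            scores[doc_id] = sum(math.log(n_docs / doc_freq[t]) for t in matched)
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


documents = {
    "fr": "Paris is the capital of France",
    "tx": "Paris is a city in Texas",
    "mx": "Mexico City is the capital of Mexico",
}
for doc_id, score in idf_rank("paris capital", documents):
    print(doc_id, round(score, 3))
```

A document that contains both query terms outranks those containing only one, while a term that appears in every document contributes nothing to the score.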


Last Modified: 10:38am MDT, July 16, 1996