MULTILINGUAL TEXT


CRL has two toolkits for producing multilingual text: the Multilingual Unicode Text Toolkit (MUTT) and the TIPSTER User Interface Toolkit, or TUIT.

MUTT

MUTT is a set of components that allows multilingual text processing at various levels, with the text represented in Unicode form.
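As a rough illustration of what a Unicode representation provides, a single string can hold characters from several scripts at once, and each character's standard Unicode name identifies its script. The sketch below uses Python's standard unicodedata module, not MUTT itself; the helper name script_of is hypothetical and introduced only for illustration.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Return the first word of a character's Unicode name as a rough script label."""
    # unicodedata.name raises ValueError for unnamed characters unless a default is given.
    name = unicodedata.name(ch, "UNKNOWN")
    return name.split()[0]

# One string mixing Devanagari KA, Tamil TA, and Bengali KA.
mixed = "\u0915\u0ba4\u0995"
labels = [script_of(ch) for ch in mixed]
print(labels)
```

Taking the first word of the character name is only a convenient approximation of script identity; it is enough to show that mixed-script text needs no per-language encoding switch.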

MUTT has been developed for Unix platforms using the X11 Window System and the OSF Motif Toolkit, and was built almost entirely from freely available fonts and information found on the Internet.

Supported Languages in MUTT

Highlights

  • Bengali
  • Burmese
  • Devanagari
  • Gujarati
  • Gurmukhi
  • Kannada
  • Khmer
  • Khutsuri
  • Malayalam
  • Mongolian
  • Oriya
  • Sinhalese
  • Tamil
  • Telugu
  • Tibetan
  • Tifinagh
  • Urdu

TUIT

The TIPSTER Architecture has been designed to enable a variety of different text applications to use a set of common text processing modules. Since user interfaces work best when customized for particular applications, it is appropriate that no particular user interface styles or conventions are described in the TIPSTER Architecture specification. However, the Computing Research Laboratory (CRL) has constructed several TIPSTER applications that use a common set of configurable Graphical User Interface (GUI) functions. These GUIs were constructed using CRL's TIPSTER User Interface Toolkit (TUIT). TUIT is a software library that can be used to construct multilingual TIPSTER user interfaces for a set of common user tasks. CRL developed TUIT to support its work integrating TIPSTER modules for the 6- and 12-month TIPSTER II demonstrations, as well as its Oleada and Temple demonstration projects.

Document Editing and Browsing

The TUIT Application Programming Interface (API) and software library support document editing and browsing. The TUIT Editor (TED) is a GUI that can be used to view and edit multilingual texts. TED takes advantage of CRL's X-multi-attributed-text (Xmat) widget. The GUI is unique in that it provides methods for input, editing, and display of text in multiple languages. TED is being used in several government-sponsored projects at CRL, and is appropriate for other projects that require multilingual text display and editing capabilities. Before TUIT, developers of TIPSTER applications that needed multilingual text display and editing had to use the Motif and Xmat APIs directly and write all TIPSTER document browsing functions themselves using the Motif and Xmat libraries. The TUIT library, with its own API as shown in Figure 1, makes it significantly easier to incorporate and configure new applications. Applications can still call Xmat library functions on the widgets that TUIT creates.

The TUIT API supports the creation of windows, menus, and dialogs. This functionality includes:

An application can include all of this functionality with a single TUIT API function call. A document browser window such as that shown in Figure 2 would be created with this single function call.

Attribute and Annotation Support

In applications that incorporate a TIPSTER-compliant document manager, the TUIT API also supports browsing and editing of TIPSTER document attributes and annotations.

Annotation and attribute browsing and editing allow users to show, create, or delete document attributes, annotations, and annotation attributes. There are also interfaces for grouping annotations by type or attribute values and for hiding or showing these annotation groups. Annotated text can be displayed with color highlighting or with different font styles. Users can also create their own text annotations to be stored with documents.
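The grouping and hide/show behavior described above can be pictured with a small data model. This is a minimal sketch in Python, not the TUIT or TIPSTER API: the Annotation class, its field names, the helper functions, and the sample text are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation: a type, a text span, and attributes."""
    ann_type: str
    start: int
    end: int
    attributes: dict = field(default_factory=dict)

def group_by_type(annotations):
    """Collect annotations into lists keyed by their type."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann.ann_type].append(ann)
    return dict(groups)

def visible_annotations(annotations, hidden_types):
    """Return only the annotations whose type has not been hidden."""
    return [a for a in annotations if a.ann_type not in hidden_types]

# Made-up sample document and annotations.
text = "CRL is in Las Cruces."
annotations = [
    Annotation("organization", 0, 3),
    Annotation("location", 10, 20, {"kind": "city"}),
    Annotation("organization", 0, 3, {"source": "user"}),
]

groups = group_by_type(annotations)
shown = visible_annotations(annotations, hidden_types={"organization"})
# The spans a display layer would highlight once "organization" is hidden.
highlighted = [text[a.start:a.end] for a in shown]
print(highlighted)
```

Keeping visibility as a filter over annotation types, rather than a flag stored on each annotation, means hiding a group never changes what is saved with the document.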

Document Manager GUI

Another TUIT API function creates windows, dialogs, and menus for managing TIPSTER collections and documents. It provides an interface that enables users to:

Extraction and Detection Support

TIPSTER-compliant extraction modules can be easily integrated with the TUIT GUI. For example, interactive segmentation of Japanese and Chinese documents is possible in the current system using CRL's Chinese segmentation system and a TIPSTER front end to JUMAN. The segmentation is preserved as document annotations.
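One way to picture segmentation preserved as annotations is to record each segment as a span over the unsegmented text, so the text itself is never altered. The sketch below is illustrative only: the function name, the record layout, and the use of character offsets are assumptions, and the segment list is hand-written rather than output from JUMAN or CRL's segmenter.

```python
def segments_to_annotations(segments):
    """Turn a list of segment strings into span records over their concatenation."""
    records = []
    offset = 0
    for seg in segments:
        # Each record covers the half-open range [start, end) in the joined text.
        records.append({"type": "segment", "start": offset, "end": offset + len(seg)})
        offset += len(seg)
    return records

# Hand-segmented sample: "日本語" / "の" / "文書", written as escapes.
segments = ["\u65e5\u672c\u8a9e", "\u306e", "\u6587\u66f8"]
text = "".join(segments)

records = segments_to_annotations(segments)
# The original segmentation can be recovered from the spans alone.
recovered = [text[r["start"]:r["end"]] for r in records]
print(records)
```

Because the spans only point into the text, a user could correct a segment boundary interactively by editing two offsets without touching the document contents.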

Configuration Support

TUIT is configurable at run time on a number of dimensions through a standard configuration format using TCL-style syntax.

Status

TUIT has been fully tested in the Oleada and Temple demonstration projects for SunOS 4.x and 5.x (Solaris).


Last Modified: 11:26am MDT, July 19, 1996