WORD FREQUENCY


CRL's Word Frequency tool gives users a simple interface for viewing word statistics for individual documents or large collections of documents. In addition, word frequencies in an individual document or a smaller sub-collection can be automatically compared against a larger collection to identify "distinctive" words: words that are significant in the document relative to the larger collection. This feature can be used to identify important "domain-specific" words. By looking at these frequency lists, a language analyst or instructor can improve their coverage and avoid missing prominent words.
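The comparison described above can be illustrated with a minimal sketch. It is not the tool's actual method, which this page does not specify. It assumes whitespace tokenization and scores each word by the log ratio of its add-one smoothed relative frequency in the document to that in the larger collection. The function names `word_counts` and `distinctive_words` are hypothetical.

```python
from collections import Counter
import math


def word_counts(text):
    """Count lowercase whitespace-delimited tokens, stripped of surrounding punctuation."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return Counter(t for t in tokens if t)


def distinctive_words(doc_counts, coll_counts, top_n=10):
    """Rank document words by how much more frequent they are than in the collection.

    Assumption: the score is log(p_doc / p_coll) with add-one smoothing, so
    words absent from the collection still get a finite score.
    """
    doc_total = sum(doc_counts.values())
    coll_total = sum(coll_counts.values())
    vocab = len(set(doc_counts) | set(coll_counts))
    scores = {}
    for word, n in doc_counts.items():
        p_doc = (n + 1) / (doc_total + vocab)
        p_coll = (coll_counts.get(word, 0) + 1) / (coll_total + vocab)
        scores[word] = math.log(p_doc / p_coll)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

With a short document about enzymes compared against a general collection, a word such as "enzyme" ranks near the top, while a common word such as "the" ranks near the bottom because it is just as frequent in the collection.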

The Word Frequency tool also works with TIPSTER documents and collections, and takes advantage of word segmentation annotations to count Chinese and Japanese words. Documents and collections are processed quickly, and results can be re-accessed through collection attributes.
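Chinese and Japanese text has no spaces between words, so counting depends on segmentation information stored alongside the text. The sketch below shows the general idea only. It assumes each segmentation annotation is a plain `(start, end)` character-offset pair, which is a simplification and not the TIPSTER annotation format. The function name `count_segmented_words` is hypothetical.

```python
from collections import Counter


def count_segmented_words(text, spans):
    """Count words given (start, end) character offsets that mark each word.

    Assumption: spans are half-open offsets into `text`, as in Python slicing.
    """
    return Counter(text[start:end] for start, end in spans)


# Hypothetical example: a Japanese sentence with hand-written word spans.
text = "学生は学生です"
spans = [(0, 2), (2, 3), (3, 5), (5, 7)]
counts = count_segmented_words(text, spans)
```

Because the word boundaries come from the annotations and not from whitespace, the same counting code works for any language once segmentation spans are available.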

Last Modified: 01:05pm MDT, July 25, 1996