
CRL's Word Frequency tool provides
users with a simple interface for viewing word statistics for
individual documents or large collections of documents. In addition,
word frequencies in individual documents or smaller sub-collections
can be automatically compared to larger collections to identify
``distinctive'' words in the document that are significant with
respect to the larger collection. This feature can be used to identify
important ``domain specific'' words. By looking at these frequency
lists, a language analyst or instructor can improve their vocabulary coverage and
avoid missing prominent words.
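
The statistic the tool uses to decide which words are "distinctive" is not described here. The sketch below assumes Dunning's log-likelihood ratio (G²), a common way to compare a document's word counts against a reference collection, together with a simple lowercase regex tokenizer. The function names `word_counts`, `log_likelihood`, and `distinctive_words` are hypothetical and are not CRL's interface.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase alphabetic tokens (simple regex tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G^2 for a word seen a times among c target tokens
    and b times among d reference tokens."""
    total = a + b
    e1 = c * total / (c + d)  # expected count in the target
    e2 = d * total / (c + d)  # expected count in the reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def distinctive_words(target: Counter, reference: Counter, top_n: int = 10):
    """Return (word, G^2) pairs for words over-represented in target,
    highest score first; ties are broken alphabetically."""
    c = sum(target.values())
    d = sum(reference.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # skip words that are relatively rarer in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    doc = word_counts("the reactor core temperature rose and the reactor shut down")
    ref = word_counts("the cat sat on the mat and the dog ran down the road " * 5)
    for word, score in distinctive_words(doc, ref, top_n=3):
        print(f"{word}\t{score:.2f}")
```

In this toy example `reactor` ranks first, followed by `core` and `rose` (ties among the single-occurrence words are broken alphabetically), while `the` is dropped because it is relatively more common in the reference text. The over-representation filter matters because G² by itself is two-sided and would also flag words that are unusually rare in the document.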
The word frequency tool also works with TIPSTER documents and collections, and takes advantage of word segmentation
annotations to count Chinese and Japanese words. Documents and
collections are processed quickly, and results can be re-accessed later
through collection attributes.
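
Segmentation annotations are what make counting possible for scripts written without spaces between words. As a minimal sketch, assume each segmentation annotation can be reduced to a type name plus a character span into the document text; the `Annotation` class, the `"token"` and `"punct"` type names, and the function names below are hypothetical and do not reproduce the TIPSTER API.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical stand-in for a TIPSTER annotation: a type name and a
    [start, end) character span into the document text."""
    type: str
    start: int
    end: int


def count_segmented_words(text: str, annotations: Iterable[Annotation],
                          word_type: str = "token") -> Counter:
    """Count the substrings covered by annotations of word_type.
    The segmenter has already marked word boundaries, so no whitespace
    splitting is needed (important for Chinese and Japanese text)."""
    counts = Counter()
    for ann in annotations:
        if ann.type == word_type:
            counts[text[ann.start:ann.end]] += 1
    return counts


def collection_counts(docs: Iterable[Tuple[str, List[Annotation]]]) -> Counter:
    """Sum per-document counts over a collection of (text, annotations) pairs."""
    total = Counter()
    for text, annotations in docs:
        total.update(count_segmented_words(text, annotations))
    return total


if __name__ == "__main__":
    text = "我喜欢学习中文。我喜欢中文。"
    spans = [(0, 1), (1, 3), (3, 5), (5, 7), (8, 9), (9, 11), (11, 13)]
    anns = [Annotation("token", s, e) for s, e in spans]
    anns += [Annotation("punct", 7, 8), Annotation("punct", 13, 14)]
    print(count_segmented_words(text, anns).most_common())
```

Because counting only reads annotation spans, the same code works for any language the segmenter handles, and filtering on the annotation type keeps punctuation spans out of the statistics. Summing per-document counts, as `collection_counts` does, is one simple way to obtain collection-level frequencies.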