The UCData Unicode Character Properties and Bidi Algorithm Package

This library is freeware.

Download source [Version: 2.9 Date: 25 March 2005]. (.tar.bz2) (.tar.gz) (.zip)

The UCData library provides code to generate and use a compact set of small databases of Unicode character properties.

A Java class with most of the same API is also provided. It uses the same data files as the C library and loads them from a base URL.
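The layout of the UCData data files is not described here. One common way to keep a character property database compact is to store sorted code point ranges and look them up by binary search. The following is a minimal sketch under that assumption; the struct, table, and function names are hypothetical and are not part of the UCData API, and the table holds only a handful of uppercase ranges for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range record: an inclusive span of code points. */
struct cp_range {
    uint32_t first;
    uint32_t last;
};

/*
 * Tiny illustrative table of uppercase letters, sorted by code point.
 * This is NOT UCData's data, just enough entries to show the idea.
 */
static const struct cp_range upper_ranges[] = {
    { 0x0041, 0x005A },  /* Latin capital A..Z */
    { 0x00C0, 0x00D6 },  /* Latin-1 capitals before the multiplication sign */
    { 0x00D8, 0x00DE },  /* Latin-1 capitals after the multiplication sign */
    { 0x0391, 0x03A1 },  /* Greek capital Alpha..Rho */
    { 0x03A3, 0x03AB },  /* Greek capital Sigma..Upsilon with dialytika */
};

/* Binary search: returns 1 if cp falls inside any range of the table. */
int in_ranges(const struct cp_range *tab, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < tab[mid].first)
            hi = mid;            /* look in the lower half */
        else if (cp > tab[mid].last)
            lo = mid + 1;        /* look in the upper half */
        else
            return 1;            /* first <= cp <= last */
    }
    return 0;
}
```

A caller would pass the table and its element count, for example `in_ranges(upper_ranges, sizeof(upper_ranges) / sizeof(upper_ranges[0]), cp)`. A table of ranges costs two integers per run of characters rather than one entry per character, which is where the compactness comes from.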

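UCData includes two other items that may be of interest, both described below: the Pretty Good Bidi Algorithm and a Tuned Boyer-Moore implementation for UTF-8 text.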

UCData API Documentation.


Pretty Good Bidi Algorithm

The Pretty Good Bidi Algorithm is a small, fairly simple, reasonably fast implicit bidirectional reordering algorithm that works pretty good. The purpose of this implementation is to demonstrate that approaches other than those adhering strictly to the Unicode reference algorithm are possible.

As far as it has been tested, this implementation produces the same results as the reference bidi reordering implementation provided at http://www.unicode.org/reports/tr9/BidiReferenceJava/. More involved testing will be done prior to the release of version 3.0.

The PGBA currently handles only implicit reordering of Unicode text. It does not yet handle the explicit bidi codes such as LRE, RLE, LRO, RLO, and PDF.
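Because the explicit codes are not handled, a caller may want to check for them before relying on implicit reordering alone. The sketch below is one way to do that on text already decoded to UTF-32. The function names are hypothetical and are not part of the PGBA API; the code points are the standard Unicode values for the five codes named above.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if cp is one of the explicit bidi formatting codes. */
int is_explicit_bidi_code(uint32_t cp)
{
    switch (cp) {
    case 0x202A: /* LRE, LEFT-TO-RIGHT EMBEDDING */
    case 0x202B: /* RLE, RIGHT-TO-LEFT EMBEDDING */
    case 0x202C: /* PDF, POP DIRECTIONAL FORMATTING */
    case 0x202D: /* LRO, LEFT-TO-RIGHT OVERRIDE */
    case 0x202E: /* RLO, RIGHT-TO-LEFT OVERRIDE */
        return 1;
    default:
        return 0;
    }
}

/*
 * Hypothetical pre-check: returns 1 if the UTF-32 text contains any
 * explicit bidi code, 0 if implicit reordering alone applies.
 */
int has_explicit_bidi_codes(const uint32_t *text, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (is_explicit_bidi_code(text[i]))
            return 1;
    }
    return 0;
}
```

What to do when the check returns 1 (strip the codes, reject the text, or fall back to another implementation) is left to the caller.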

Some things about the PGBA:

PGBA API Documentation.


Tuned Boyer-Moore implementation for UTF-8 text

This implementation of a Tuned Boyer-Moore routine for UTF-8 text, capable of case-insensitive matching, was written in response to a complaint that no such routine was readily available for developers to use as a reference or simply incorporate into their projects. Although this implementation depends on the underlying UCData library, it can easily be retrofitted to use another character property lookup library. It also happens to be reasonably fast, though its speed could probably be improved.

I will leave it to other developers to make improvements and share them, just as the basic routine has been shared with them.

No documentation of the API is available yet, but there is an example at the end of the utf8bm.c file.
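For readers unfamiliar with the general technique, the following is a minimal sketch of the simpler Boyer-Moore-Horspool variant (bad-character shifts only) applied to raw UTF-8 bytes. It is not the code in utf8bm.c and not the tuned variant: the function names are hypothetical, and case folding is limited to ASCII letters as a stand-in for a real Unicode case mapping from a library such as UCData.

```c
#include <stddef.h>

/* ASCII-only case fold; a stand-in for a real Unicode case mapping. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Hypothetical helper: Horspool search over UTF-8 bytes.
 * Returns the byte offset of the first match, or -1 if none.
 */
long bmh_search_utf8(const char *text, size_t n, const char *pat, size_t m)
{
    size_t skip[256];
    size_t i, pos;

    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* Default shift is the full pattern length. */
    for (i = 0; i < 256; i++)
        skip[i] = m;
    /* Bytes in the pattern (except the last) allow shorter shifts. */
    for (i = 0; i + 1 < m; i++)
        skip[fold_ascii((unsigned char)pat[i])] = m - 1 - i;

    pos = 0;
    while (pos <= n - m) {
        size_t j = m;

        /* Compare right to left under ASCII case folding. */
        while (j > 0 &&
               fold_ascii((unsigned char)text[pos + j - 1]) ==
               fold_ascii((unsigned char)pat[j - 1]))
            j--;
        if (j == 0)
            return (long)pos;

        /* Shift by the entry for the byte under the pattern's last position. */
        pos += skip[fold_ascii((unsigned char)text[pos + m - 1])];
    }
    return -1;
}
```

Searching bytes rather than decoded characters is safe for valid UTF-8 because lead bytes and continuation bytes never overlap, so a pattern that starts on a character boundary can only match on a character boundary. Full case-insensitive matching of non-ASCII text would need per-character case mappings, which is where a property library comes in.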