Tools for acquiring Burmese text


Burmese for the SAY Project

The current proposal is to use the RFA (Radio Free Asia) encoding of Burmese. It is a simple variant that almost conforms to Unicode 5.0 and won't be difficult to convert to the next generation of Unicode Burmese when it is finalized.

It is currently expected that Burmese text will arrive in the custom Pa Oh visual Burmese encoding and will be converted locally to other encodings for display and processing. The tools used internally to convert between encodings are described in the next section.

Converting from Pa Oh to RFA encoding for display

The paoh2utf8 tool will convert this custom visual encoding (used with a font called Pa Oh) to the RFA encoding. It is available as:

Source: http://crl.nmsu.edu/say/tools/paoh2utf8.c
Windows Binary: Not Yet Available

Burmese font for RFA

To support RFA encoded text, the font available at the RFA Burmese site will need to be installed.

Unicode Burmese


Burmese Encodings

There are plenty of Burmese encodings in use on the Internet. Almost all of them are visual encodings based on specific fonts. For various reasons, Unicode Burmese is not supported very well at the moment. Additions to the Unicode Burmese block, along with new information about how to encode specific character shapes, are in the process of being added to Unicode (as of January 2007).
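To illustrate what separates a visual encoding from Unicode, here is a minimal Python sketch of one reordering step a converter typically needs. In a visual encoding the vowel sign E is stored where it is drawn, to the left of its consonant, while Unicode stores it (U+1031) after the consonant. The function name visual_to_logical is hypothetical, the medial code points U+103B to U+103E assume the newer (Unicode 5.1) model, and the input is assumed to be already mapped to Unicode code points but still in visual order. Real converters, such as the tools listed on this page, handle many more cases than this.

```python
# Sketch only: move a visually ordered vowel sign E behind its consonant.
VOWEL_E = "\u1031"  # MYANMAR VOWEL SIGN E, drawn left of the consonant


def is_consonant(ch):
    # U+1000..U+1021: Myanmar letters KA through A
    return "\u1000" <= ch <= "\u1021"


def is_medial(ch):
    # U+103B..U+103E: medial YA, RA, WA, HA (assumes the Unicode 5.1 model)
    return "\u103b" <= ch <= "\u103e"


def visual_to_logical(text):
    """Return text with each leading vowel sign E moved after its consonant
    and any medials that directly follow that consonant."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == VOWEL_E and i + 1 < n and is_consonant(text[i + 1]):
            j = i + 1
            cluster = [text[j]]
            j += 1
            while j < n and is_medial(text[j]):
                cluster.append(text[j])
                j += 1
            out.extend(cluster)
            out.append(VOWEL_E)
            i = j
        else:
            # Anything else is copied through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Visual order: E sign, then MA. Logical order: MA, then E sign.
    result = visual_to_logical("\u1031\u1019")
    print([hex(ord(c)) for c in result])  # ['0x1019', '0x1031']
```

Text that is already in logical order passes through unchanged, because the function only acts on a vowel sign E that comes directly before a consonant.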

Two of the more popular non-Unicode Burmese encodings are WinMyanmar (formerly known as Winnwa) and Academy.

Converting Burmese to Unicode

Converting WinMyanmar and Academy text to Unicode can be done with TECkit, a tool from SIL (Summer Institute of Linguistics). To use TECkit, first download the source or executables from:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=TECkitDownloads

Then you need to get the appropriate Burmese mapping tables from:

http://crl.nmsu.edu/say/tools/myanmarteckit.zip

The original source for the TECkit Burmese mapping tables is at:

http://www.thanlwinsoft.org/ThanLwinSoft/Downloads/Converters/MyanmarTECkit20050522.tar.gz

The docs/ subdirectory of the TECkit distribution contains an MS Word document called TECKit_version_2.1.doc (or something similar), which has instructions on how to run the executable files.
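As a rough illustration of a conversion run, the Python sketch below builds a command line for the txtconv program that ships with TECkit and runs it if the program is on the PATH. It assumes txtconv's -i, -o, -t, -if and -of options, so check them against the documentation bundled with your TECkit version. The file names winmyanmar.tec, input.txt and output.txt are hypothetical placeholders. If the mapping archive contains only uncompiled .map files, they would first need to be compiled with TECkit's mapping compiler.

```python
import shutil
import subprocess


def build_txtconv_command(table, src, dst, exe="txtconv"):
    """Return the argument list for converting a legacy byte-encoded
    file to UTF-8 using a compiled TECkit mapping table."""
    return [exe, "-i", src, "-o", dst, "-t", table, "-if", "bytes", "-of", "utf8"]


def convert(table, src, dst, exe="txtconv"):
    """Run the conversion; raises if txtconv is missing or fails."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    subprocess.run(build_txtconv_command(table, src, dst, exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names; substitute the real table and text files.
    print(" ".join(build_txtconv_command("winmyanmar.tec", "input.txt", "output.txt")))
```

Keeping the command construction separate from the call that runs it makes it easy to print and inspect the exact command before converting any files.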

Typing Burmese text on Windows 2000/XP

To support input of the next generation of Unicode Burmese, the following Keyman keyboard is suggested.

http://www.thanlwinsoft.org/ThanLwinSoft/Downloads/Keyboards/myWinE.exe

Burmese Fonts

An OpenType font that displays the next generation of Unicode Burmese is available at:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Padauk

This font renders Burmese quite well, along with some other languages that use the Burmese script.

The font relies on the SIL Graphite rendering system for its advanced capabilities. Graphite support is being added to many different applications.