Tabula Rasa
Turning Text into Data
Tabula Rasa is an attempt to reduce two of
the major bottlenecks of information extraction: defining text extraction
tasks and developing tools to aid in producing structured data or templates.
Tabula Rasa is a 'meta-tool' that analysts can use to build tools
that help with template filling tasks. One Tabula Rasa component,
tredit, enables designers of information extraction tasks, like
those used in ARPA's Message Understanding Conferences (MUC), to create
and edit template definitions. Another Tabula Rasa component, the
runtime tool-builder, uses these definitions to automatically generate
Graphical User Interface (GUI) tools that analysts can use to create filled
templates.
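
The split between a template definition and a tool generated from it can be illustrated with a minimal Python sketch. This is not Tabula Rasa's actual definition format or code: the names SlotDef, TemplateDef, build_form and fill, and the example INCIDENT template, are hypothetical and chosen only for illustration. The sketch assumes a template is a named list of slots, each either free text or a choice from a fixed set, and it produces a description of form widgets rather than a real GUI.

```python
from dataclasses import dataclass


@dataclass
class SlotDef:
    """One slot of a template definition (hypothetical structure)."""
    name: str
    kind: str = "string"   # "string" = free text, "set" = one of `choices`
    choices: tuple = ()


@dataclass
class TemplateDef:
    """A named template made of slots (hypothetical structure)."""
    name: str
    slots: list


def build_form(template):
    """Derive one widget description per slot from the definition."""
    widgets = []
    for slot in template.slots:
        if slot.kind == "set":
            widgets.append({"label": slot.name, "widget": "menu",
                            "options": list(slot.choices)})
        else:
            widgets.append({"label": slot.name, "widget": "text_field"})
    return widgets


def fill(template, values):
    """Check analyst-entered values against the definition.

    Returns a filled template as a dict; slots left empty map to None.
    """
    known = {slot.name for slot in template.slots}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    filled = {}
    for slot in template.slots:
        value = values.get(slot.name)
        if value is not None and slot.kind == "set" and value not in slot.choices:
            raise ValueError(f"{value!r} is not a valid {slot.name}")
        filled[slot.name] = value
    return filled


# Illustrative definition only; not taken from any real MUC task.
incident = TemplateDef("INCIDENT", [
    SlotDef("LOCATION"),
    SlotDef("TYPE", kind="set", choices=("BOMBING", "KIDNAPPING")),
])

form = build_form(incident)
key = fill(incident, {"LOCATION": "Lima", "TYPE": "BOMBING"})
```

In this sketch, changing the definition changes both the generated form and the validation, with no tool code rewritten. That is the general idea behind driving a tool-builder from template definitions.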
Current automatic information extraction
systems are usually inaccurate and have long development times. Even when
the accuracy of the technology is adequate, there is still a need for completed
keys (filled templates), both to train automatic systems and to test system
performance objectively. To produce these keys, a human analyst
must first carry out the template filling task. Tabula Rasa facilitates
the production of keys for new domains by helping anal
To get more information about Tabula Rasa you can read the on-line Users Manual.
You can also download a Solaris or SunOS version of Tabula Rasa.
New Mexico State University. All rights reserved.