MIKROKOSMOS



Definition of a TMR

ABSTRACT:


Introduction

Motivation
The TMR is critical to a knowledge-based machine translation system: the TMR produced by analysis of the input text serves as the input for generating text in one or more natural languages.

Defining a TMR

What is a TMR?

A text meaning representation (TMR) is a language-neutral description of the linguistic information conveyed in a natural language text, derived by semantic and syntactic analysis of the text. The TMR captures not only the meaning of individual elements in the text but also the relations between those elements. In addition to the lexical-semantic dependencies in the text, it represents stylistic factors, discourse relations, and other pragmatic factors present in the discourse structure of the text.

Connection to Lexicon and Ontology


Components of a TMR

TMR Structure

The TMR is divided into seven sections, which together convey the overall meaning of the original text.

The first section of a TMR is a "table of contents", which in practice is the last section to be filled in. It summarizes the predicates, relations, and stylistic factors found in the text. It is followed by a "statement" section, which gives the scope of the text, the speaker/writer, the hearer/reader, the time of speaking/writing, etc.

Next comes the "TMR body", where the sentences of a natural language text are represented in an interlingua, a language-neutral format. The text is translated, generally clause by clause, from the original natural language into the interlingua. A clause typically corresponds to an interlingual head, which can be an event, a property, an attribute, or an object concept. Most heads have agents (subjects) and themes (objects), although neither is required; all heads must have a time (suffixed by a one-up number, which provides a mechanism for later relating the heads temporally), an aspect, and a polarity (an indication of whether the clause is affirmative or negative). Information about the head is given in a slot-filler format. Fillers are suffixed by an instance number, so that in a given text each occurrence of a concept has a unique number. The head below represents the clause "Ajinomoto decided to underwrite...":


%decide_1
    agent       %company_1        ;Ajinomoto
    theme       %underwrite_1
    time        %time_1
    aspect      %aspect_1
    polarity    positive

Heads can have other slots (e.g. COTHEME, ACCOMPANIER, BENEFICIARY, PURPOSE, MANNER, ATTITUDE, LOCATION, FOCUS, etc.), as needed to convey the meaning of the original text.
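The instance-numbering convention and the slot-filler layout can be sketched in Python. This is a minimal illustration, not Mikrokosmos code: `InstanceCounter` and `render_frame` are hypothetical helpers, and the column widths simply mimic the %decide_1 example above.

```python
from __future__ import annotations

from collections import defaultdict


class InstanceCounter:
    """Hands out one-up instance numbers per concept: %company_1, %company_2, ..."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def new(self, concept: str) -> str:
        self._counts[concept] += 1
        return f"%{concept}_{self._counts[concept]}"


def render_frame(name: str, slots: list[tuple[str, str, str]]) -> str:
    """Lay out a head as slot-filler lines; a non-empty third item becomes a ;comment."""
    lines = [name]
    for slot, filler, comment in slots:
        line = f"    {slot:<11} {filler}"
        if comment:
            line = f"{line:<34};{comment}"
        lines.append(line)
    return "\n".join(lines)


ids = InstanceCounter()
decide = ids.new("decide")  # "%decide_1"
print(render_frame(decide, [
    ("agent", ids.new("company"), "Ajinomoto"),
    ("theme", ids.new("underwrite"), ""),
    ("time", ids.new("time"), ""),
    ("aspect", ids.new("aspect"), ""),
    ("polarity", "positive", ""),
]))
```

A second company mentioned later in the same text would receive %company_2 from the same counter, which is what keeps every occurrence of a concept distinct.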

Once the TMR body is complete (i.e., the meaning of the text has been conveyed in the interlingua), the "attitude" section of the TMR is filled in. Although attitudes are fillers under the heads, they are not broken out until the TMR body is complete (unless the attitude itself is a head). An example of an attitude is given in Section 3.2.2 below.

Next, the "temporal relations" section records relations of a temporal nature between clauses. This is followed by the "domain relations" section, which records relations between syntactic elements. (See examples of both types of relations in Section 3.2.3 below.)

The final section of the TMR is the "coreference" section, where separate references in the TMR body to the same object or event are matched. For example, if %company_1 and %company_3 in the TMR refer to the same company, they are coreferenced.
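The section order described above, and the matching step of the coreference section, can be sketched in Python. The section labels are our own shorthand, and merging pairwise "same referent" judgements with a union-find structure is an assumption made for illustration, not the Mikrokosmos procedure.

```python
from __future__ import annotations

# The seven sections in the order given above (labels are shorthand, not official names).
TMR_SECTIONS = (
    "table of contents",
    "statement",
    "TMR body",
    "attitudes",
    "temporal relations",
    "domain relations",
    "coreference",
)


def coreference_sets(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Merge pairwise 'same referent' links into coreference sets (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=sorted)


print(coreference_sets([("%company_1", "%company_3")]))  # one set: %company_1, %company_3
```

Grouping transitively matters once a text mentions the same entity three or more times: linking %company_1 to %company_3 and %company_3 to %company_5 yields a single set of all three.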

TMR Notation

Notations have been devised to facilitate representations in the TMR of concepts from the natural language, and of the relations between these concepts. Some of these notations are simple symbols, such as:

    %       instantiated ontological concept (%company)
    $       named instance ($"Ajinomoto Dannon", $Japan)
    &       symbolic constant (&red, &blue)
    *       concept in the ontology (*company)
    *x*     special variable (*author*, *unknown*)
    ~       "approximately", used with numbers (~200 machines)

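Because each notation is marked by its leading symbol, the kind of a filler can be read off its first character, provided *x* is checked before the plain * it also starts with. A toy Python classifier, assuming tokens arrive already split; `notation_kind` and the "literal" label for unprefixed fillers such as positive are our own inventions.

```python
def notation_kind(token: str) -> str:
    """Classify a TMR token by the notation symbols listed above."""
    # *x* must be checked before the bare * prefix it also starts with.
    if len(token) > 2 and token.startswith("*") and token.endswith("*"):
        return "special variable"
    prefixes = {
        "%": "instantiated ontological concept",
        "$": "named instance",
        "&": "symbolic constant",
        "*": "concept in the ontology",
        "~": "approximate number",
    }
    # Fillers such as "positive" or "1.0" carry no prefix; "literal" is our own label.
    return prefixes.get(token[:1], "literal")


for tok in ["%company_1", '$"Ajinomoto Dannon"', "&red", "*company", "*author*", "~200", "positive"]:
    print(f"{tok:22} {notation_kind(tok)}")
```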
Other notations involve more complex structures. Attitudes, for example, are represented with several slots and fillers. An attitude of potential, such as the one in "A stadium can be constructed", would be represented as:

%attitude_1
    type           potential
    attributed-to  *author*
    scope          %construct_4
    time           %time_10
    value          1.0
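Attitude frames have a fixed set of slots, so they map naturally onto a record type. A minimal Python sketch: the `Attitude` class is hypothetical, and the check that VALUE lies between 0.0 and 1.0 is an assumption about the scale, not something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attitude:
    """An attitude frame with the slots of %attitude_1 above."""
    name: str           # e.g. "%attitude_1"
    kind: str           # TYPE slot, e.g. "potential"
    attributed_to: str  # ATTRIBUTED-TO slot, e.g. "*author*"
    scope: str          # SCOPE slot: the head the attitude applies to
    time: str           # TIME slot
    value: float        # VALUE slot; assumed here to be graded from 0.0 to 1.0

    def __post_init__(self) -> None:
        # Assumption: attitude values lie on a 0.0-1.0 scale.
        if not 0.0 <= self.value <= 1.0:
            raise ValueError(f"attitude value {self.value} is outside [0.0, 1.0]")

    def render(self) -> str:
        slots = [
            ("type", self.kind),
            ("attributed-to", self.attributed_to),
            ("scope", self.scope),
            ("time", self.time),
            ("value", str(self.value)),
        ]
        return "\n".join([self.name] + [f"    {s:<15}{f}" for s, f in slots])


print(Attitude("%attitude_1", "potential", "*author*", "%construct_4", "%time_10", 1.0).render())
```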

Relations require that a connection be made between two textual elements. For example, the following is a TMR representation for the temporal relation "The text was written (%time_0) after Fujitsu announced its tie-up with Telecom Australia (%time_1)":


%temporal-relation_1
    type   after
    arg_1  %time_0
    arg_2  %time_1

Domain relations relate the content of textual elements. For example, the following is a representation of the relation triggered by "also" in "This month Tokyo Kaijo Kasai Hoken has joined with Daiwa Shoken (%create_1)... Also, both Nisshin Kasai Kaijo Hoken and Dowa Kasai Kaijo Hoken have tied up with Yamaichi Shoken (%create_2)...":


%domain-relation_1
    type   addition
    arg_1  %create_2
    arg_2  %create_1

A third type of relation, the quantifier relation, captures relations between quantities. The notations used in quantifier relations are:

    =                  equality
    <                  less than
    =<                 less than or equal to
    >                  greater than
    >=                 greater than or equal to
    integer-integer    range
    mult               multiply
    sum                sum

For example, "35 percent of 500-600 million yen" would be:

%quantifier-relation_1
    type   mult
    arg_1  0.35
    arg_2  %amount_1

%amount_1
    unit      JPY                        ;Japanese yen
    quantity  500,000,000-600,000,000
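The MULT relation above is arithmetic over a range: 0.35 × 500,000,000 = 175,000,000 and 0.35 × 600,000,000 = 210,000,000, so the phrase denotes 175-210 million yen. A minimal Python sketch of that evaluation, where `mult_range` is a hypothetical helper and the endpoints are taken from %amount_1; exact fractions are used so that 0.35 suffers no binary rounding.

```python
from __future__ import annotations

from fractions import Fraction


def mult_range(factor: str, low: int, high: int) -> tuple[Fraction, Fraction]:
    """Apply a MULT quantifier relation to an inclusive quantity range.

    Fraction keeps 0.35 exact, avoiding binary floating-point rounding.
    """
    f = Fraction(factor)
    if f < 0:
        raise ValueError("a negative factor would reverse the range")
    return f * low, f * high


low, high = mult_range("0.35", 500_000_000, 600_000_000)
print(f"{int(low):,}-{int(high):,} JPY")  # 175,000,000-210,000,000 JPY
```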

Conventions have also been developed for representing such things as time, rates, and sets. The notations for time are (YY = year, MM = month, DD = day):

    YYMMDD      at/on YYMMDD
    >=YYMMDD    on or after YYMMDD
    =<YYMMDD    on or before YYMMDD
    >YYMMDD     after YYMMDD
    <YYMMDD     before YYMMDD

Time can have AT, START, END, DURATION, and UNIT slots. For example, an event that started after 8 January 1992 and lasted for 5 years would be shown as:


%time_2
    start     >920108
    duration  5
    unit      *year
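The comparator prefixes and the START/DURATION/UNIT slots can be combined to reason about %time_2: if the event started after 920108 and lasted exactly five years, it ended after 970108. A minimal Python sketch, assuming two-digit years resolve as Python's `%y` directive does and that DURATION is exact; `parse_time`, `satisfies` and `end_bound` are hypothetical helpers.

```python
from __future__ import annotations

import operator
import re
from datetime import date, datetime

# Two-character comparators are listed first in the alternation.
_TIME_SPEC = re.compile(r"(>=|=<|>|<)?(\d{6})")
_COMPARE = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
            ">=": operator.ge, "=<": operator.le}


def parse_time(spec: str) -> tuple[str, date]:
    """Split '>920108' into ('>', date(1992, 1, 8)); a bare YYMMDD means at/on."""
    m = _TIME_SPEC.fullmatch(spec.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a YYMMDD time spec: {spec!r}")
    # strptime's %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    return m.group(1) or "=", datetime.strptime(m.group(2), "%y%m%d").date()


def satisfies(spec: str, day: date) -> bool:
    """True if a calendar day meets the time constraint."""
    op, bound = parse_time(spec)
    return _COMPARE[op](day, bound)


def end_bound(start_spec: str, years: int) -> tuple[str, date]:
    """Shift a START constraint by an exact DURATION in years (UNIT *year).

    A 29 February start would need extra handling; date.replace raises on it.
    """
    op, start = parse_time(start_spec)
    return op, start.replace(year=start.year + years)


print(end_bound(">920108", 5))  # ('>', datetime.date(1997, 1, 8))
```

Like the start, the derived end is only bounded, not fixed: the sketch says the event ended some time after 8 January 1997.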

Rate is represented by UNIT, INTERVAL and QUANTITY. For example, a rate of 100,000 tons per year would be shown as follows:


%rate_1
  unit      *ton
  interval  *year
  quantity  100,000
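Keeping UNIT, INTERVAL and QUANTITY in separate slots means a rate can be re-expressed over a different interval without touching its unit. The sketch below is illustrative only: `Rate`, `per`, and the `PER_YEAR` conversion table are hypothetical, and only whole fractions of a year are covered to avoid calendar approximations.

```python
from __future__ import annotations

from dataclasses import dataclass
from fractions import Fraction

# Hypothetical table: how many of each interval make up one year.
PER_YEAR = {"*year": 1, "*quarter": 4, "*month": 12}


@dataclass(frozen=True)
class Rate:
    unit: str           # UNIT slot, e.g. "*ton"
    interval: str       # INTERVAL slot, e.g. "*year"
    quantity: Fraction  # QUANTITY slot, kept exact

    def per(self, interval: str) -> Rate:
        """Express the same rate over another interval; UNIT is unchanged."""
        per_year = self.quantity * PER_YEAR[self.interval]
        return Rate(self.unit, interval, per_year / PER_YEAR[interval])


yearly = Rate("*ton", "*year", Fraction(100_000))
monthly = yearly.per("*month")
print(monthly.quantity)  # 25000/3
```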

A set is used to accommodate multiple fillers for a slot. For example, "Nihon Gosei Gomu (%company_1) and Kurare Isoprene Chemicals (%company_2) announced" would be represented as:


%announce_1
    agent  %set_1

    ...

%set_1
    cardinality   2
    members
       %company_1
       %company_2

Sets are also used to convey a variety of constructions, one of which is a listing of objects whose member type is known, but whose individual members are not all known. "Chilled foods, such as raw noodles" would be put in a set as follows:


%set_2
    member-type   %food_2
    cardinality   > 1
    members
        %noodle_2
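The two uses of sets above differ in whether the listed members make up the whole set. A minimal Python sketch, assuming CARDINALITY is either an exact integer or a lower bound written as a (">", n) pair; `TmrSet` and `status` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TmrSet:
    """A set filler. Cardinality is an exact int or a lower bound such as ('>', 1)."""
    cardinality: int | tuple[str, int]
    members: list[str] = field(default_factory=list)
    member_type: str | None = None  # MEMBER-TYPE slot, if the kind of member is known

    def status(self) -> str:
        """'complete' if the listed members are the whole set, else 'partial'."""
        n = len(self.members)
        if isinstance(self.cardinality, int):
            if n > self.cardinality:
                raise ValueError("more members listed than the cardinality allows")
            return "complete" if n == self.cardinality else "partial"
        op, _ = self.cardinality
        if op != ">":
            raise NotImplementedError("only '>' bounds are sketched here")
        # A lower bound never fixes the size, so listed members are only examples.
        return "partial"


announcers = TmrSet(2, ["%company_1", "%company_2"])
chilled = TmrSet((">", 1), ["%noodle_2"], member_type="%food_2")
print(announcers.status(), chilled.status())  # complete partial
```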

TMR Experience and Development Methodology
