(1) Speech Acts
Although there are various types of speech acts (questions, requests,
statements, etc.), the statement speech act is the type seen most
often in the joint venture corpus. We use "statement_1" as the
default speech act covering the whole article. This represents the
author's narration of the facts. Additional speech acts, such as
quotations, may also be present in an article. Note, however, that we
have consciously chosen not to characterize the usual "announce" events in newspaper articles as speech acts.
(2) Proposition
The proposition is the fundamental organizing element in the body of the TMR.
The information contained in the proposition is given in a slot-filler
format. The slots HEAD, TIME, ASPECT, and POLARITY are required, but
those of ATTITUDE, MODALITY, and RELATION are added as necessary.
The filler of a slot is an ontological concept name suffixed by an
instance number, so that in any given text each occurrence of a
concept has a unique number. The proposition-level representation for
the clause "Ajinomoto decided to underwrite..." would be:
%proposition_1
head %decide_1
time %time_1
aspect %aspect_1
polarity &positive
%decide_1
agent %company_1
theme %underwrite_1
%company_1
name $Ajinomoto
When the head is an object, a state is implied. Thus, the sentence "There is a big, black, wooden house." can be represented as:
%proposition_1
head %house_1
time %time_1
aspect %aspect_1
polarity positive
%house_1
size .8
color &black
material %%wood
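The slot-filler layout above can be read mechanically. The following is a minimal Python sketch, not part of the TMR tooling: the function names `parse_frames` and `missing_slots` are hypothetical, and the parsing rules (a line consisting of a single %-prefixed token opens a frame, every other line is a slot followed by its filler, and text after ";" is a comment) are assumptions inferred from the examples in this document.

```python
# Required proposition slots, as listed in section (2).
REQUIRED_PROPOSITION_SLOTS = ("head", "time", "aspect", "polarity")


def parse_frames(text):
    """Parse 'slot filler' lines grouped under '%name_n' frame lines (assumed layout)."""
    frames = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 1 and line.startswith("%"):
            # A lone %-prefixed token such as %house_1 opens a new frame.
            current = frames.setdefault(line, {})
        elif current is not None:
            # Anything else is a slot; a slot may be left blank.
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return frames


def missing_slots(frame):
    """Return the required proposition slots that the frame lacks."""
    return [s for s in REQUIRED_PROPOSITION_SLOTS if s not in frame]


example = """\
%proposition_1
head %house_1
time %time_1
aspect %aspect_1
polarity positive
%house_1
size .8
color &black
material %%wood
"""

frames = parse_frames(example)
print(frames["%proposition_1"]["head"])         # %house_1
print(missing_slots(frames["%proposition_1"]))  # []
```

Fillers are kept as plain strings, prefixes included, so the sketch makes no claim about what the individual symbols mean.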
(4) Stylistic Factors
The reporting style of the joint ventures texts typically is neutral, so
stylistic factors such as formality, respect, etc. are set at a default of
0.5 (neither formal nor informal, neither high nor low level of respect,
etc.). Values can range from 0 to 1.
Symbols
The following general symbols are used in TMRs (see also Time and Quantifier Relations):
(6) Attitudes
Attitudes are used to reflect the way elements in the text are
conveyed by an intelligent agent (typically the speaker/writer of the
text). Definitions and examples of the attitudes are given in
"Attitude - Definitions and Examples".
Attitudes that are not heads are broken out at the end of the TMR,
with a pointer included in the TMR body.
Attitudes have the following required slots: TYPE, ATTRIBUTED-TO, SCOPE, TIME, and VALUE. The TYPE slot is filled with either EVALUATIVE or SALIENCE. ATTRIBUTED-TO is filled with the agent or entity who possesses the attitude. SCOPE identifies the segments of the TMR (and corresponding text) covered by the attitude, and TIME is the time at which the attitude holds.
VALUE is assigned on a scale of 0 to 1.0, with 1.0 being the positive end of the scale. For example, for the evaluative attitude "It is the best", the value would be 1.0, whereas "It is the worst" would have a value of 0. Between these values would fall "It is all right" (value somewhere between .5 and .8) and "It is not good" (value less than .3). For the present, the intermediate values are set somewhat arbitrarily; ranges, including "less than" and "greater than", allow approximate values to be assigned.
"GM has a high regard for Toyota business practices." would be represented as follows:
%attitude_1
type evaluative
attributed-to %company_1 ;GM
scope %practice_1 ;Toyota business practices
time %time_1
value >0.8
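Because VALUE fillers may be exact ("1.0") or approximate (">0.8", "<0.3"), it can help to read them as bounds on the 0 to 1 scale. The sketch below is one possible reading in Python; the function name `value_interval` is hypothetical, and treating ">" and "<" as inclusive bounds is a simplifying assumption, not something the guidelines specify.

```python
def value_interval(expr):
    """Turn a VALUE filler such as '>0.8', '<0.3' or '1.0' into (low, high) bounds.

    Assumption: strictness of '<' and '>' is not tracked; the bound itself
    is simply used as the end of the interval.
    """
    expr = expr.replace(" ", "")
    if expr.startswith(">"):
        low, high = float(expr[1:]), 1.0
    elif expr.startswith("<"):
        low, high = 0.0, float(expr[1:])
    else:
        low = high = float(expr)
    if not (0.0 <= low <= high <= 1.0):
        raise ValueError(f"value outside the 0-1 scale: {expr!r}")
    return (low, high)


print(value_interval(">0.8"))  # (0.8, 1.0)
print(value_interval("<.3"))   # (0.0, 0.3)
print(value_interval("1.0"))   # (1.0, 1.0)
```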
(7) Modalities
Modalities are represented in the same way as attitudes. They have
the same slots, with the TYPE slot containing one of the following
values: EPISTEMIC, DEONTIC, VOLITIVE, or POTENTIAL. With the
exception of the modality on the %statement_1 (which is broken out
within the statement section), modalities, like attitudes, are broken
out at the end of the TMR, with a pointer in the TMR body. (see
examples in JJV0002 %statement_1, %modality_1; %modality_5; %modality_6).
"A company announced..." would be represented as:
%modality_1
type epistemic
attributed-to %company_1
scope %announce_1
time %time_1
value 1.0
Attitudes and modalities can be combined to capture a particular meaning in a text. For example, in "There is also concern that ... licensing and know-how disputes will occur", an epistemic modality reflects the belief that the situation may occur, while an evaluative attitude captures the less than positive feeling about the event taking place.
%proposition_4
head %occur_1
time %time_4
aspect %aspect_4
polarity positive
modality %modality_4
attitude %attitude_1
%modality_4
type *epistemic
attributed-to *author*
scope %occur_1
time %time_16
value >0.5
%attitude_1
type *evaluative
attributed-to *author*
scope %occur_1
time %time_16
value <0.4
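The pointer arrangement in this example (a proposition naming %modality_4 and %attitude_1, which are broken out at the end of the TMR) can be sketched as a lookup in a table of frames. This is an illustrative Python sketch only: `check_frame` and `resolve` are hypothetical helpers, the TMR is held as a plain dictionary, and stripping a leading "*" from the TYPE filler before comparison is an assumption made because the examples write the type both with and without it.

```python
ATTITUDE_TYPES = {"evaluative", "salience"}
MODALITY_TYPES = {"epistemic", "deontic", "volitive", "potential"}
REQUIRED_SLOTS = ("type", "attributed-to", "scope", "time", "value")


def check_frame(name, frame):
    """Return a list of problems found in one attitude or modality frame."""
    allowed = MODALITY_TYPES if name.startswith("%modality_") else ATTITUDE_TYPES
    problems = [f"missing slot {s}" for s in REQUIRED_SLOTS if s not in frame]
    kind = frame.get("type", "").lstrip("*").lower()  # assumption: '*' is optional
    if kind and kind not in allowed:
        problems.append(f"unexpected type {kind}")
    return problems


def resolve(tmr, proposition_name, slot):
    """Follow a pointer slot of a proposition to the frame broken out elsewhere."""
    return tmr[tmr[proposition_name][slot]]


tmr = {
    "%proposition_4": {
        "head": "%occur_1", "time": "%time_4", "aspect": "%aspect_4",
        "polarity": "positive",
        "modality": "%modality_4", "attitude": "%attitude_1",
    },
    "%modality_4": {
        "type": "*epistemic", "attributed-to": "*author*",
        "scope": "%occur_1", "time": "%time_16", "value": ">0.5",
    },
    "%attitude_1": {
        "type": "*evaluative", "attributed-to": "*author*",
        "scope": "%occur_1", "time": "%time_16", "value": "<0.4",
    },
}

print(resolve(tmr, "%proposition_4", "modality")["value"])  # >0.5
print(check_frame("%attitude_1", tmr["%attitude_1"]))       # []
```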
(8) Temporal Relations
Temporal relations are given at the end of the TMR, following the
attitudes section; there is no reference to temporal relations within
the TMR body. Temporal relations indicate the relative timing of one
event in the text in relation to another.
Temporal relations must have the following slots: TYPE, ARG_1, and ARG_2, where fillers for ARG_1 and ARG_2 are times (e.g. %time_1), and TYPE indicates the relation between the two times, filled by one of the values AT, AFTER, or DURING. For example, "the event whose time is %time_2 occurred after the event whose time is %time_3" would be represented as:
%temp-rel_1
type after
arg_1 %time_2
arg_2 %time_3
A temporal relation also may have a VALUE slot, to indicate the
relative distance between two times. If %time_2 occurred just after
%time_1, the temporal relation would look like this:
%temp-rel_2
type after
arg_1 %time_2
arg_2 %time_1
value > 0.2
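As an illustration of how the two temporal relations above constrain the order of times, here is a short Python sketch. The `TempRel` class and `chronological` function are hypothetical names; only AFTER relations are used for ordering, and AT and DURING are accepted but ignored, which is a simplification. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Optional

TEMPORAL_TYPES = {"at", "after", "during"}


@dataclass
class TempRel:
    type: str
    arg_1: str
    arg_2: str
    value: Optional[str] = None  # optional relative distance, kept as written

    def __post_init__(self):
        if self.type not in TEMPORAL_TYPES:
            raise ValueError(f"unknown temporal relation type: {self.type!r}")


def chronological(relations):
    """Return the times in an order consistent with every AFTER relation."""
    sorter = TopologicalSorter()
    for rel in relations:
        if rel.type == "after":
            # arg_1 is after arg_2, so arg_2 must come first.
            sorter.add(rel.arg_1, rel.arg_2)
    return list(sorter.static_order())


relations = [
    TempRel("after", "%time_2", "%time_3"),
    TempRel("after", "%time_2", "%time_1", value="> 0.2"),
]
order = chronological(relations)
print(order[-1])  # %time_2
```

The relative order of %time_1 and %time_3 is left open by these two relations, so the sketch only guarantees that %time_2 comes last.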
(9) Focus
At present, focus is used in three ways:
(10) Coreferences
All instances of an object or an event are given a one-up number
(%company_1, %company_2, %company_3, etc.), and those instances that
refer to the same object or event are coreferenced at the end of the
TMR (e.g. if %company_1 and %company_3 both refer to Tokyo Bank, they
would be coreferenced at the end of the TMR). Coreferencing is
facilitated if margin notes are inserted wherever a coreference
occurs. (see example in JJV0375: %coreference_3).
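Collecting instances that refer to the same object amounts to grouping linked items. A minimal Python sketch follows, assuming the margin notes have already been turned into pairs of instance names; the pair format and the function name `coreference_groups` are hypothetical, since the layout of the coreference entries is not shown here, and the extra instances beyond %company_1 and %company_3 are invented for illustration.

```python
def coreference_groups(pairs):
    """Group instance names that are linked, directly or indirectly, by pairs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical margin notes: company_1, company_3 and company_5 are the same bank.
notes = [
    ("%company_1", "%company_3"),
    ("%company_3", "%company_5"),
    ("%company_2", "%company_4"),
]
print(coreference_groups(notes))
# [['%company_1', '%company_3', '%company_5'], ['%company_2', '%company_4']]
```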
(11) Domain Relations
Domain relations represent connections between events, states or
objects in the text. These connections can be quite general, scoping
over large portions of text, or more specific, and limited in scope
(e.g. linking consecutive heads). Definitions and examples of domain
relations are given in Appendix IV. Domain relations are listed at
the end of the TMR, following the temporal relations.
Domain relations have the following slots: TYPE, ARG_1, and ARG_2, where TYPE is filled with the appropriate domain relation from the "Definitions of Domain Relations" paper (Appendix IV), and ARG_1 and ARG_2 are filled with the TMR elements between which the relation exists. For example, "Companies A and B are going to create a tie-up (%create_1). In addition, companies C and D are going to create a tie-up (%create_2)." would be represented as:
%domain-rel_1
type addition
arg_1 %create_1
arg_2 %create_2
A domain relation may also have a value slot. For example, a value slot on a comparison domain relation would indicate degree of similarity. "The (video-recorded) image is similar to the photograph." would be represented as:
%domain-rel_1
type comparison
arg_1 %image_1
arg_2 %photograph_1
value 0.7
(12) Textual Relations
No textual relations were found in the first 10 JJV TMRs, so we have
not refined a method of representing these.
(13) Time
The time slot under each head is given a one-up number (e.g. %time_0,
%time_1, etc.). If more information is known, such as the actual
date, the TIME slot can be further broken out (see JJV0029:
%statement_1, %time_0). It is also possible to represent a
period of time (see JJV0108: %issue_1, %time_3). The possible slots for
TIME are as follows: AT, START, END, DURATION, and UNIT. Possible fillers
for these slots would be:
(14) Aspect
While verb tense can give clues to ASPECT, tense will be handled
largely by the TIME slot. Aspect concerns only the nature of the
verb. Aspect can have any or all of the following slots: PHASE,
ITERATION, DURATION, TELICITY. Leave a slot blank if you are unable
to determine its proper filler. These are the slots with their possible
fillers:
phase end
iteration single
duration momentary
telicity true
An accomplishment (build, design) would be:
phase end
iteration
duration prolonged
telicity true
Aspect for a process (manufacture, produce) would be represented as:
phase continue
iteration multiple
duration prolonged
telicity false
(15) Polarity
Polarity is 'positive' for a clause stated in the affirmative, and
'negative' for a clause stated in the negative.
Case Roles and Circumstantial Roles
Case roles are the arguments or typical roles that a predicate can
take; circumstantial roles relate events to more circumstantial pieces
of information that describe them, such as location, time, etc. Both
case roles and circumstantial roles will be defined as properties (i.e.
as relations) in the ontology, and will appear as properties of events
in the TMR. Typically these roles appear as slots under a TMR head (see
Case Roles and Circumstantial Roles for Predicates).
(16) Agent - In Japanese, the agent frequently is not repeated across related sentences; the reader generally keeps track of the agent. In TMRs, the agent (or other filler) is migrated within a sentence, but not across sentences. For example, if there are multiple heads in a sentence, and the agent is the same for more than one head, the agent is instantiated for all the relevant heads, even if it is not specifically stated; if the agent (or other filler) is not restated in a new sentence, it is represented as *unknown*.
(17) Cotheme
It was decided that theme/co-theme was a reasonable way of
representing verbs such as "designate", "name", "elect", "sell", etc.
For example, "Company 1 will sell structures to the Japanese market as sports and event facilities" would be represented as follows:
%sell_1
agent %company_1
theme %structure_1
co-theme %facility_1
recipient %market_1
time %time_8
aspect %aspect_8
polarity positive
Agent
In Japanese and Spanish, the agent frequently is not repeated across
related sentences; the reader generally keeps track of the agent. In
TMRs, the agent (or other filler) is migrated within a sentence, and
across sentences, where appropriate. For example, if there are
multiple heads in a sentence, and the agent is the same for more than
one head, the agent is instantiated for all the relevant heads, even
if it is not specifically stated. If the same agent (or other filler)
is left unstated in a subsequent sentence, it is represented by the
next instance of the same agent/filler, and is followed by a comment
";inferred". This instance would then be coreferenced with the
previous instance(s) of the same agent. If the agent is ambiguous,
the AGENT slot should be filled with "*unknown*".
%rate_1
unit *ton
interval *year
quantity 100,000
(19) Company - 'Company' is often a filler for case roles of an event in the joint venture domain (e.g. slots such as AGENT, ACCOMPANIER, BENEFICIARY, EXPERIENCER, SOURCE, and DESTINATION).
A company can have both a NAME and an ALIAS slot. If a company name is mentioned, and then a shortened version is given within the same sentence, fill in both the NAME and ALIAS slots; if a shortened name appears in a different sentence, instantiate a new company, fill in the shortened form in the NAME slot, and coreference at the end of the TMR.
(20) Country - 'Country' is a valid filler for the slots listed under (19) Company, above, as well as LOCATION.
(21) Sets - Sets are used to represent a broad range of phenomena such as: definite and indefinite sets, partially enumerated sets, relations between sets and subsets, ordinals, superlatives, and existentials.
(22) Quantifier Relations - Quantifier relations are relations between numerical quantities. The operators for quantifier relations are as follows:
For example:
a. "Mitsubishi made the same number of TVs [%amount_1] this year as last year [%amount_2]"
%quant-rel_1
type =
arg_1 %amount_1
arg_2 %amount_2
b. "35 percent of 500-600 million yen [%amount_1]"
%quant-rel_2
type mult
arg_1 0.35
arg_2 %amount_1
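To make the two examples concrete, the following Python sketch evaluates a quantifier relation against a table of known amounts. It handles only the two operators that appear in the examples ("=" and "mult"); the `evaluate` function, the representation of a range such as "500-600 million" as a (low, high) tuple, and the TV count of 1000 are all assumptions made for illustration. `fractions.Fraction` is used so that 0.35 is handled exactly.

```python
from fractions import Fraction


def as_range(x):
    """Treat a single number as a degenerate (low, high) range."""
    return x if isinstance(x, tuple) else (x, x)


def evaluate(rel, amounts):
    """Evaluate one quantifier relation; amounts maps %amount_n to a number or range."""
    def lookup(arg):
        if isinstance(arg, str) and arg.startswith("%"):
            return as_range(amounts[arg])
        return as_range(Fraction(str(arg)))  # a literal such as 0.35

    low_1, high_1 = lookup(rel["arg_1"])
    low_2, high_2 = lookup(rel["arg_2"])
    if rel["type"] == "=":
        return (low_1, high_1) == (low_2, high_2)
    if rel["type"] == "mult":
        # Assumes non-negative quantities, so bounds multiply pairwise.
        return (low_1 * low_2, high_1 * high_2)
    raise ValueError(f"operator not covered by this sketch: {rel['type']!r}")


# a. same number of TVs this year as last year (the count 1000 is invented)
same = evaluate(
    {"type": "=", "arg_1": "%amount_1", "arg_2": "%amount_2"},
    {"%amount_1": 1000, "%amount_2": 1000},
)
print(same)  # True

# b. 35 percent of 500-600 million yen
share = evaluate(
    {"type": "mult", "arg_1": "0.35", "arg_2": "%amount_1"},
    {"%amount_1": (500_000_000, 600_000_000)},
)
print(tuple(int(v) for v in share))  # (175000000, 210000000)
```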
(23) Amount - Amounts are expressed in terms of UNIT and QUANTITY. For example, 14 billion Japanese yen would be represented as:
%money_1 quantity 14 billion unit JPY ;Japanese yen