SIXTH WORKSHOP ON INTERLINGUAS
to be held
Tuesday, September 23, 2003
in conjunction with
MT SUMMIT IX
New Orleans, Louisiana, USA


CALL FOR PARTICIPATION/PAPERS

Background

At the Fifth Workshop, held in October 2002, the focus was on inter-coder reliability in coding thematic roles. Participants were provided with a dependency structure for each of 11 sentences. Each word was then to be assigned a thematic role from a list of thematic roles previously provided and defined by the workshop organizers.

As a result of that workshop, a multi-site proposal was prepared for the development of a large-scale, multi-lingual, IL-tagged resource. The tagging of predicate-argument structure and thematic roles was suggested as the initial task.

This workshop is intended to be a first step towards that resource. Before a set of thematic roles can be selected and guidelines for coding developed, there needs to be general agreement at a more basic level of determining the events, objects, and states that are represented in the text. Only if there is general agreement among coders and across languages will it be possible to begin looking at what roles these events have and how objects or other events fill those roles.

Individuals and groups are invited to participate in this workshop.

The Task

Although participants will write a short paper for the workshop, the primary aim of the workshop is to see if there is general agreement on identifying and categorizing events and objects in a multi-lingual text.

The multi-lingual corpus to be annotated consists of an English article from the UNESCO Courier and its translations into fifteen other languages: French, Spanish, Chinese, Arabic, German, Russian, Persian, Italian, Catalan, Vietnamese, Malay, Greek, Bulgarian, Tamil, and Portuguese.

Participants will select one non-English version and will be responsible for providing an independent translation of that text into English. Participants may choose a language other than those listed above, but, if so, they will have to provide a translation of the English text into that language. The English translation (and the additional non-English translation) should be made available to other workshop participants.

The task is to mark up this non-English version and the two English versions of the text. Texts will be marked up to identify objects, events, and states. Objects are the concrete (or abstract) entities that the text is about. Events are the actual activities or actions in which these objects take part. States of affairs describe characteristics of objects. Together, objects, events, and states form the story told by the text.

Portions of the text will refer to these events, states, and objects. It is the coder's task to identify these portions. Not every lexical item in the text will be marked up. (In particular, relations and manners will not be marked.) Coders will be responsible both for partitioning and for marking up their texts.

Each identified object and event will receive a concept-type label and these labels should be coindexed if the same event or object is referred to in more than one location in the text (see example below). In addition, if time and interest permit, participants may indicate which other objects or events participate in events. If there are unspecified participants, these may also be indicated by an appropriate role label.

Finally, after the text has been marked up, the participants will gather the events, objects, and states into three lists for comparison purposes.

Participants will provide a (joint) written report for the workshop on the process and results of their markup/tagging. These reports will be presented during the morning session of the workshop. The afternoon will be devoted to a general discussion of the task and an examination of inter-participant reliability, cross-linguistic variation, and variation across multiple English versions of the same text.
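
As a rough illustration of how lists from two coders might be compared, here is a minimal sketch in Python. It rests on assumptions not made in this call: that coders are compared on the text portions they marked (their labels may differ), and that overlap is scored with the Jaccard measure. The workshop does not prescribe a reliability measure, and the two coders' lists below are invented for illustration only.

```python
CATEGORIES = ("OBJECT", "EVENT", "STATE")


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as full agreement
    return len(a & b) / len(a | b)


def agreement_by_category(coder_a, coder_b):
    """Return one Jaccard score per category for two coders' lists."""
    return {
        cat: jaccard(coder_a.get(cat, []), coder_b.get(cat, []))
        for cat in CATEGORIES
    }


# Hypothetical lists of marked text portions (NOT real workshop data).
coder_a = {
    "OBJECT": ["Chile", "tidal wave"],
    "EVENT": ["Hoarding", "Predictions", "Earthquake"],
    "STATE": ["accompanied"],
}
coder_b = {
    "OBJECT": ["Chile", "tidal wave", "Earthquake"],
    "EVENT": ["Hoarding", "Predictions"],
    "STATE": ["accompanied"],
}

scores = agreement_by_category(coder_a, coder_b)
for cat in CATEGORIES:
    print(f"{cat}: {scores[cat]:.2f}")
```

In this invented case the two coders disagree only on whether "Earthquake" is an object or an event, which lowers the OBJECT and EVENT scores and leaves STATE at full agreement.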

Important Dates:
23 May, 2003 -- Initial notification of intent to participate
23 June, 2003 -- Final notification of intent to participate in the workshop and provision of English translation and target language translation (if needed)
23 July, 2003 -- Submission date for event/object/state coding
23 August, 2003 -- Submission of site report

As an example, here are the first two paragraphs of an English text.

Hoarding Caused by Earthquake Predictions in Chile

The population of the Chilean port city of Antofagasta increased sharply its purchases of provisions and essential articles, alarmed by announcements regarding an earthquake accompanied by tidal wave which may affect northern Chile and the south of neighboring Peru at any moment, admitted the government Office of Emergencies (ONEMI).

Here is how we envision the markup to look:

Hoarding / HOARDING1-EVENT (agent / theme)
Caused by / CAUSE1-EVENT (PREDICTING1 / HOARDING1)
Earthquake / EARTHQUAKE1-EVENT (experiencer)
Predictions / PREDICTING1-EVENT (agent / EARTHQUAKE1)
in
Chile / CHILE1-OBJECT
The population / POPULATION1-OBJECT
of
the Chilean port city of Antofagasta / ANTOFAGASTA1-OBJECT
increased sharply / INCREASING1-EVENT (POPULATION1 / PURCHASING1)
its / ANTOFAGASTA1-OBJECT
purchases / PURCHASING1-EVENT (POPULATION1 / PROVISIONS1 & ARTICLES1)
of
provisions / PROVISIONS1-OBJECT
and
essential articles, / ARTICLES1-OBJECT
alarmed / ALARMING1-EVENT (ANNOUNCING1 / POPULATION1)
by
announcements / ANNOUNCING1-EVENT (agent / EARTHQUAKE1 & TIDALWAVE1)
regarding
an earthquake / EARTHQUAKE1-EVENT (experiencer)
accompanied / ACCOMPANY1-STATE (EARTHQUAKE1 / TIDALWAVE1)
by
tidal wave / TIDALWAVE1-OBJECT
which
may affect / AFFECTING1-EVENT (EARTHQUAKE1 & TIDALWAVE1 / CHILE1 & PERU1)
northern Chile / CHILE1-OBJECT
and
the south of neighboring Peru / PERU1-OBJECT
at any moment, / MOMENT1-OBJECT
admitted / ADMITTING1-EVENT (OFFICE_OF_EMERGENCIES1 / INCREASING1)
the government Office of Emergencies / OFFICE_OF_EMERGENCIES1-OBJECT
(ONEMI). / OFFICE_OF_EMERGENCIES1-OBJECT

Objects: {CHILE1, POPULATION1, ANTOFAGASTA1, PROVISIONS1, ARTICLES1, TIDALWAVE1, PERU1, MOMENT1, OFFICE_OF_EMERGENCIES1}

Events: {HOARDING1, CAUSE1, EARTHQUAKE1, PREDICTING1, INCREASING1, PURCHASING1, ALARMING1, ANNOUNCING1, AFFECTING1, ADMITTING1}

States: {ACCOMPANY1}
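
For participants who keep their markup in a plain-text file, the three lists can be gathered mechanically. The following Python sketch shows one way this might be done. It assumes one marked portion per line in the shape "text / LABEL1-CATEGORY (participants)", as in the example above, and it treats lines without a label (such as "in" or "by") as unmarked. The names LINE_RE and gather are illustrative and are not part of any workshop tool.

```python
import re

# Assumed line shape, taken from the example markup:
#   <text portion> / <LABEL><index>-<OBJECT|EVENT|STATE> [(participants)]
LINE_RE = re.compile(
    r"^(?P<span>.+?)\s+/\s+"
    r"(?P<label>[A-Z_]+\d+)-(?P<category>OBJECT|EVENT|STATE)"
    r"(?:\s+\((?P<participants>[^)]*)\))?\s*$"
)


def gather(markup_lines):
    """Collect labels into three lists, keeping the order of first mention.

    Coindexed labels (the same label used at several places in the text)
    are listed only once.
    """
    lists = {"OBJECT": [], "EVENT": [], "STATE": []}
    for line in markup_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # unmarked material
        label = match.group("label")
        category = match.group("category")
        if label not in lists[category]:
            lists[category].append(label)
    return lists


# A few lines copied from the example markup.
sample = [
    "Hoarding / HOARDING1-EVENT (agent / theme)",
    "Caused by / CAUSE1-EVENT (PREDICTING1 / HOARDING1)",
    "Earthquake / EARTHQUAKE1-EVENT (experiencer)",
    "Predictions / PREDICTING1-EVENT (agent / EARTHQUAKE1)",
    "in",
    "Chile / CHILE1-OBJECT",
    "an earthquake / EARTHQUAKE1-EVENT (experiencer)",
    "accompanied / ACCOMPANY1-STATE (EARTHQUAKE1 / TIDALWAVE1)",
    "by",
    "tidal wave / TIDALWAVE1-OBJECT",
]

result = gather(sample)
for category in ("OBJECT", "EVENT", "STATE"):
    print(f"{category}s: {{{', '.join(result[category])}}}")
```

Because EARTHQUAKE1 is coindexed, it appears once in the event list even though two portions of the text refer to it.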

Contact:

Steve Helmreich
Computing Research Laboratory
New Mexico State University
Box 30001/3CRL
Las Cruces, New Mexico
USA
Tel: 505 646 2141
Fax: 505 646 6218
e-mail: shelmrei@crl.nmsu.edu

Comments/questions may be mailed to Steve Helmreich at: shelmrei@crl.nmsu.edu



Last Updated: May 6, 2003

Copyright 2003 Computing Research Lab.