Copyright Notice: The following set of slides are copyrighted by
Kavi Mahesh, the Computing Research Lab at New Mexico State University,
and its affiliated members of the Mikrokosmos project. No part of this
may be used for commercial purposes (including consulting services).
Any use of these slides must acknowledge the authors and CRL, NMSU.
If you use the material in these slides in your own writings or
presentations, we would appreciate it if you could send us a copy of such
articles.
Thank you!
Kavi Mahesh,
CRL, NMSU
A Situated Ontology for Practical NLP
Kavi Mahesh
What is an Ontology?
The branch of metaphysics that studies the nature of existence.
No........
An ontology is a database with information about
-
what categories (or concepts) exist in the world/domain,
-
what their properties are,
-
and how they relate to each other.
The Mikrokosmos View of an Ontology
-
Unique? No; ``Natural''? No; Provably correct? No.
In practical NLP, if you are able to prove theorems, you are doing the
wrong thing! (Sergei Nirenburg).
-
A sharable computational resource.
-
An empirically constructed artifact.
-
Created in situation, not discovered.
-
Situated in the framework of multilingual NLP.
-
Domain: Mergers and Acquisitions of Companies.
Ontology Development: Desiderata
-
Utility: What you want is what we acquire; not what we
acquire is what you get!
-
Limited proliferation of concepts.
# concepts < # words.
-
Situated development. Increase the ratio:
-
Language independence. Ideal situation: develop 2 lexica + 1 ontology
concurrently.
-
Comprehensibility and simplicity. E.g., no disjunctive inheritance.
-
Separation of episodic knowledge: separate onomasticon.
-
Well-formed and internally consistent.
-
Compatible with the lexicon.
Terminology
-
Concept, frame, and node.
-
Instance.
-
Slot, facet, and filler.
-
Parent, child, ancestor, descendant, sibling, and orphan.
-
Free-standing entity: Object and event.
-
Property, relation, and attribute.
-
Inheritance and subsumption.
-
Value, default, and constraint (or sem).
-
Inverse.
-
Domain and range.
-
Literal.
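To make the terms above concrete, here is a minimal Python sketch of a concept frame with slots, facets, and fillers. The class name `Frame`, the method `filler`, and the sample concept are assumptions made for illustration; this is not the Framepac or Framekit representation.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A concept (frame, node): a name, its parents, and its slots.

    Each slot maps facet names ("value", "default", "sem") to fillers.
    """
    name: str
    parents: list = field(default_factory=list)  # names of parent concepts
    slots: dict = field(default_factory=dict)    # slot -> {facet: filler}

    def filler(self, slot, facet):
        """Return the locally stored filler for slot/facet, or None."""
        return self.slots.get(slot, {}).get(facet)


# Hypothetical frame; the slots and fillers are illustrative only.
acquire = Frame(
    name="ACQUIRE",
    parents=["EVENT"],
    slots={
        "AGENT": {"sem": "INTENTIONAL-AGENT"},
        "THEME": {"sem": "OBJECT", "default": "CORPORATION"},
    },
)

print(acquire.filler("THEME", "default"))
```

Here `sem` holds a constraint on what may fill the slot, while `default` holds the filler assumed when the text says nothing more specific.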
Types of Ontologies
-
Formal, philosophical, and computational ontologies.
-
Narrow and broad coverage ontologies. E.g., for holes on surfaces or
for the naive physics of liquids.
-
Episodic and encyclopedic ones. E.g., CyC.
-
Word sense taxonomies (sometimes with nearly atomic ``concepts'').
E.g., PENMAN, SENSUS, and WORDNET.
-
Maximally decompositional ontologies. E.g., Schank's Conceptual Dependencies.
-
Implicit and undefined ontologies.
Why do we need an ontology?
What has an ontology got to do with NLP?
-
Source of selectional preferences for resolving ambiguities.
E.g., ``El grupo Roche adquirió Docteur Andreu.''
Did Roche acquire or learn Docteur Andreu?
-
Inference from content: resolve ambiguities, fill gaps, etc.
-
Inference from topology of graph (Onto-Search): metonymy, metaphor,
and compound noun processing.
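The Roche example can be sketched as a check of selectional constraints against a small taxonomy. Everything in the sketch is an assumption made for illustration: the concept names, the IS-A links, and the constraint attached to each sense of ``adquirir.'' It only shows the idea of filtering senses by subsumption, not the Mikrokosmos analyzer.

```python
# Toy IS-A links (child -> parent); illustrative, not the real ontology.
IS_A = {
    "CORPORATION": "ORGANIZATION",
    "ORGANIZATION": "OBJECT",
    "INFORMATION": "ABSTRACT-OBJECT",
    "OBJECT": "ALL",
    "ABSTRACT-OBJECT": "ALL",
}

# Hypothetical constraint on the THEME of each sense of "adquirir".
SENSES = {"ACQUIRE": "OBJECT", "LEARN": "INFORMATION"}


def subsumes(ancestor, concept):
    """True if `ancestor` equals `concept` or lies above it in IS_A."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def admissible_senses(theme_concept):
    """Senses whose THEME constraint is satisfied by the given theme."""
    return [s for s, c in SENSES.items() if subsumes(c, theme_concept)]


# "Docteur Andreu" is assumed here to map to CORPORATION.
print(admissible_senses("CORPORATION"))
```

Under these assumptions only the ACQUIRE sense survives, because a corporation is not a kind of information.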
Why do we need an ontology?
What has an ontology got to do with Lexical Semantics?
-
Provides substrate for grounding word meanings.
-
Helps partition multilingual MT work.
-
Makes lexical representation parsimonious.
The ``dejar'' example: 54 entries in Collins; 7 in our
lexicon (Nirenburg, Raskin, & Onyshkevych, 1995).
-
Vague and incomplete lexical representations. E.g., Agent of event.
-
Variable-depth semantics. E.g., ``Marketing'' = market.
-
Reduce manual effort: suggest concepts for derived words.
Topology
-
Directed graph (with a few cycles).
-
Only two patterns in subgraphs: Relation and Attribute.
-
Multiple inheritance: ``plex'' structure or tangled hierarchy with
numerous cross-links.
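A tangled hierarchy can be stored as a map from each concept to its list of parents. The sketch below collects all ancestors of a concept under multiple inheritance; the concept names and the `PARENTS` table are hypothetical.

```python
# Hypothetical tangled hierarchy: child -> list of parents.
PARENTS = {
    "PHARMACEUTICAL-COMPANY": ["CORPORATION", "MANUFACTURER"],
    "CORPORATION": ["ORGANIZATION"],
    "MANUFACTURER": ["ORGANIZATION"],
    "ORGANIZATION": ["OBJECT"],
    "OBJECT": [],
}


def ancestors(concept):
    """All concepts reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # guards against shared parents and cycles
            seen.add(node)
            stack.extend(PARENTS.get(node, []))
    return seen


print(sorted(ancestors("PHARMACEUTICAL-COMPANY")))
```

The `seen` set matters because the graph may contain a few cycles and because two parents often share an ancestor.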
Acquisition Methodology
-
Situated development: immediate interaction and feedback.
-
Driven but not dictated by lexicon acquisition.
-
``Request - negotiation - response'' cycle.
-
Pseudo algorithm for acquisition: Given a word sense,
-
Is there a concept already?
-
Does it need a new concept?
-
To add a concept: Discriminate from the root.
-
Decide on a name.
-
Add a definition string.
-
Add slots, facets, fillers, inverses, property definitions.
-
Various guidelines.
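The first two questions of the pseudo algorithm can be sketched as a small decision function. The two predicates are left as parameters because the slides do not define them; the function name, the decision labels, and the "express-in-lexicon" outcome are assumptions made for illustration.

```python
def handle_word_sense(sense, ontology, is_close, needs_new_concept):
    """Decide what to do with a word sense requested by a lexicographer.

    ontology: dict mapping concept name -> definition string.
    is_close(sense, concept): hypothetical predicate, True if an
        existing concept already covers the sense.
    needs_new_concept(sense): hypothetical predicate for the second test.
    Returns a (decision, concept_name_or_None) pair.
    """
    for concept in ontology:
        if is_close(sense, concept):
            return ("use-existing", concept)
    if not needs_new_concept(sense):
        # Assumed outcome: represent the sense in the lexicon entry instead.
        return ("express-in-lexicon", None)
    # Naming, definition string, slots, etc. would follow from here.
    return ("add-concept", None)


ontology = {"SWIM": "to move through water by moving the body"}
decision = handle_word_sense(
    "nadar", ontology,
    is_close=lambda sense, concept: concept == "SWIM",
    needs_new_concept=lambda sense: True,
)
print(decision)
```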
Technology
Tools for acquisition:
-
Browsing and graphical editing: Mikrokarat toolkit (Ralf Brown, CMU).
-
Need good searching tools.
Tools for lexicon interactions:
-
Onto-Request interface.
-
Lex-Onto discrepancy report generator.
Tools for maintenance and quality control:
-
Translators between Framepac (C++ objects) and Framekit (Lisp-like
ASCII) formats.
-
Several consistency checking programs.
-
Programs to verify conformance with guidelines.
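One kind of consistency check can be sketched as follows: every relation that declares an inverse should be declared as the inverse of that inverse. The table and function name are hypothetical; the slides do not say which checks the actual programs perform.

```python
# Hypothetical relation definitions: relation -> its declared inverse.
INVERSE = {
    "EMPLOYED-BY": "EMPLOYER-OF",
    "EMPLOYER-OF": "EMPLOYED-BY",
    "OWNED-BY": "OWNER-OF",  # OWNER-OF is not defined: an inconsistency
}


def inverse_problems(inverse_table):
    """Relations whose declared inverse is missing or does not point back."""
    problems = []
    for relation, inverse in sorted(inverse_table.items()):
        if inverse_table.get(inverse) != relation:
            problems.append(relation)
    return problems


print(inverse_problems(INVERSE))
```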
Guidelines: What Not to Add
-
Do not add instances.
-
Do not decompose unnecessarily.
-
Do not add a concept if there is already one ``close'' to it. E.g., ``suggest''
= ``urge.''
-
Do not add collections; use the set notation.
-
Do not add language-specific stuff.
-
Do not add specialized events with particular arguments.
Guidelines: Naming a Concept
-
Use ``scientific'' rather than lay terms.
-
Use the English words.
-
Use only alphabetic characters and `-'.
-
Do not use plurals in concept names.
-
Consistency across concepts is more important than conformance
with a dictionary. E.g., for-profit and non-profit; not for-profit and nonprofit.
-
Do not use names longer than three words.
-
Avoid compound nouns. E.g., do not use time-unit; use unit-of-time instead.
-
Use shorter names for more common word senses. E.g., bank and
bank-river.
-
For relation names, append typical prepositions. E.g., employed-by and employer-of.
-
Use a word in only one sense. E.g., If grocery-store, then no
store-medicine; use preserve-medicine instead; but then no
peach-preserve.
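Two of the naming guidelines are mechanical enough to check automatically: the character set and the three-word limit. The sketch below covers only those two; the function name and messages are assumptions, and the remaining guidelines (plurals, compound nouns, one sense per word) need human judgment.

```python
import re

# Alphabetic words joined by single hyphens, e.g. "unit-of-time".
NAME_PATTERN = re.compile(r"[a-z]+(-[a-z]+)*")


def name_problems(name):
    """Return the guideline violations found in a candidate concept name."""
    problems = []
    if not NAME_PATTERN.fullmatch(name.lower()):
        problems.append("use only alphabetic characters and '-'")
    if len(name.split("-")) > 3:
        problems.append("longer than three words")
    return problems


for candidate in ["unit-of-time", "time_unit", "unit-of-time-and-space"]:
    print(candidate, name_problems(candidate))
```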
Pathological Problems
-
Complex events and ontological instances.
-
Properties are not free-standing; cannot be fillers.
-
Multiple inheritance is conjunctive and ambiguous.
-
All relations are binary.
-
All scoping is over propositions.
-
Need to block inheritance: use of *nothing*.
-
Inherent dualities between states and properties, between events and
objects, and between objects and properties.
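Blocking inheritance with *nothing* can be sketched as a slot lookup that stops when it meets the marker. The frames below are hypothetical and use single inheritance to keep the sketch short; the real ontology has multiple inheritance, which is where the ambiguity noted above arises.

```python
NOTHING = "*nothing*"

# Hypothetical frames: concept -> (parent, local slots).
FRAMES = {
    "ORGANIZATION": (None, {"HEADQUARTERS": "PLACE"}),
    "CORPORATION": ("ORGANIZATION", {}),
    "INFORMAL-GROUP": ("ORGANIZATION", {"HEADQUARTERS": NOTHING}),
}


def inherited_filler(concept, slot):
    """Walk up the parent links; a *nothing* filler blocks inheritance."""
    while concept is not None:
        parent, slots = FRAMES[concept]
        if slot in slots:
            filler = slots[slot]
            return None if filler == NOTHING else filler
        concept = parent
    return None


print(inherited_filler("CORPORATION", "HEADQUARTERS"))
print(inherited_filler("INFORMAL-GROUP", "HEADQUARTERS"))
```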
MIKROKOSMOS: The Task
Automatic Natural-Language Interpretation
-
Input: Real text
-
Languages: Spanish (and also Russian, Japanese, and Arabic).
-
Type: News articles.
-
Domain: Mergers and acquisitions of companies.
-
Output: a comprehensive Text Meaning Representation (TMR).
Applications:
-
Multilingual Machine Translation
-
Information Extraction
-
Information Retrieval
-
Question Answering
MIKROKOSMOS: Research
-
Representation: TMR, lexicon, ontology
-
Knowledge Acquisition
-
Robust processing: failure recovery
-
Develop mikro theories
-
Integrate various mikro theories
-
Utility: Show that TMRs are useful for translation, IE, and IR.
Mikrokosmos: Engineering
-
Developing tools:
-
Browsers
-
Editors
-
Corpus search tools
-
Dictionary look-up tools
-
Interfaces for inter-group communication
-
Output presentation
-
Large-scale knowledge acquisition
-
Programming
Sample Input
Roche Compra Docteur Andreu
El grupo Roche, a través de su compañía en España, adquirió el
laboratorio farmacéutico Doctor Andreu, se informó hoy aquí.
La comunicación oficial no precisó el monto de la operación
realizada entre Productos Roche SA y Unión Explosivos Río Tinto
SA, hasta ahora mayoritaria en el accionariado.
Fuentes financieras consultadas cifraron la operación en unos
10.000 millones de pesetas. Según el acuerdo firmado hoy en
Madrid, los productos del Doctor Andreu continuarán siendo
producidos y comercializados con el mismo nombre. Doctor Andreu,
cuya fama la obtuvo a partir de las "pastillitas" para la tos,
está bien introducido en las áreas de cardiología, reumatología y
especialidades publicitarias.
Las actividades del grupo Roche, con sede central en Basilea
(Suiza), incluyen el desarrollo, la producción y la
comercialización de medicamentos, productos para el diagnóstico,
así como de vitaminas y productos químicos.
A nivel mundial, cuenta con compañías en más de 50 países con
casi 50.000 empleados. Doctor Andreu es una compañía farmacéutica
dedicada a la producción y comercialización de fármacos y
productos veterinarios. Con sede en Barcelona, cuenta con más de
400 empleados.
En el ejercicio pasado facturó unos 3.490 millones de pesetas.
En 1988, el Grupo Roche alcanzó unas ventas totales de 8.690
millones de francos suizos, de las que aproximadamente un 41 por
ciento correspondieron a su división farmacéutica. El beneficio
neto -el mejor de su historia- se elevó a 641,5 millones de
francos suizos y la rentabilidad sobre las ventas aumentó del 6,3
al 7,4 por ciento.
El "cash flow" se incrementó en un 21 por ciento, alcanzando
1.179 millones de francos o el 14 por ciento de las ventas del
grupo.
Las inversiones en investigación y desarrollo (I+D) fueron de
1.210 millones de francos suizos, el 14 por ciento del total de
sus ventas.
Productos Roche cuenta con una plantilla de 600 personas y
alcanzó unas ventas totales de 9.747 millones de pesetas, un 12,5
por ciento superiores al año 1987.
Sus beneficios fueron de 218 millones y el "cash flow" de 356
millones. Las inversiones realizadas totalizaron 223 millones de
pesetas.
Sample Output: TMR
Contents of a TMR
-
Speech act
-
Propositions
-
Stylistic factors
-
Attitudes
-
Focus
-
Temporal relations
-
Quantitative relations
-
Coreference relations
-
Domain relations
-
Textual relations
Each proposition has:
-
A head
-
Time
-
Aspect
-
Attitude
-
Modality
-
Relations
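The two lists above can be read as a record layout. The following Python sketch mirrors them field by field; the types, the sample speech act, and the sample proposition are assumptions, since the slides list the components but not their internal format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposition:
    head: str                       # an ontological concept, e.g. "ACQUIRE"
    time: Optional[str] = None
    aspect: Optional[str] = None
    attitude: Optional[str] = None
    modality: Optional[str] = None
    relations: dict = field(default_factory=dict)  # e.g. {"AGENT": ...}


@dataclass
class TMR:
    speech_act: str
    propositions: list = field(default_factory=list)
    # The remaining components are kept as free-form lists in this sketch.
    stylistic_factors: list = field(default_factory=list)
    attitudes: list = field(default_factory=list)
    focus: list = field(default_factory=list)
    temporal_relations: list = field(default_factory=list)
    quantitative_relations: list = field(default_factory=list)
    coreference_relations: list = field(default_factory=list)
    domain_relations: list = field(default_factory=list)
    textual_relations: list = field(default_factory=list)


# Hypothetical content for the first sentence of the sample input.
tmr = TMR(speech_act="INFORM")
tmr.propositions.append(
    Proposition(head="ACQUIRE",
                relations={"AGENT": "Roche", "THEME": "Doctor Andreu"})
)
print(tmr.propositions[0].head)
```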
Knowledge Acquisition for Large-Scale NLP
Lexicon acquisition:
-
For each language
-
Source: Native experts
-
Technology: Interactive tools
-
Automation: generate more entries from manually acquired ones.
-
Lexical rule engine
-
Derivational morphology
Knowledge Acquisition for Large-Scale NLP
What is an Ontology?
-
A symbol system for representing meaning
-
Classification of all things in the domain
-
Relationships between symbols
-
Richly connected network of such symbols (called concepts).
Why do we need it?
-
Meaning representation grounded in a well-defined and structured symbol
system.
-
Provides constraints to help resolve ambiguities.
-
Makes the lexicon less verbose.
-
Allows the lexicon to specify incomplete information.
Knowledge Acquisition for Large-Scale NLP
How do we acquire an ontology?
-
Language independent
-
Sources:
-
Previous ontologies
-
Needs of lexicographers
-
Domain experts
-
Texts and corpora
-
Intuition
-
Technology: Interactive graphical browsers and editors
-
Automation: Consistency checking and quality improvement.
An Ontology for Multilingual NLP
The Mikrokosmos Ontology
-
Language independent
-
Situated in multilingual NLP
-
A computational knowledge source: world knowledge
-
Purpose: To produce Text Meaning
Representations (TMRs) from texts and generate texts from TMRs.
-
Provides substrate for representing meaning in lexica and in TMRs.
Ontology and Lexical Semantics
-
Ontology: A tangled subsumption hierarchy of concepts with rich
internal structure.
-
Concept: A primitive symbol for meaning representation, its
attributes, and relationships with other concepts.
-
Lexical Semantics:
-
Map word senses to concepts in the ontology.
-
Combine, constrain, or relax mappings to concepts.
-
Add properties to concepts: e.g., time, aspect, ...
-
Lexical semantics using ontologies:
-
No uninterpreted symbols.
-
Parsimonious lexical representations.
-
Allow incomplete lexical representations.
-
Variable depth semantics.
Example: Swim and Float
English: Swim -> SWIM; Float -> FLOAT.
Spanish: Nadar -> SWIM; Flotar -> FLOAT.
Russian: Plyt' -> SWIM or FLOAT; Plavat' -> SWIM or FLOAT.
What concepts do we need to represent these verb meanings?
Example: Shovel and Clear
English: Shovel -> SHIFT-MATERIAL; Clear -> SHIFT-MATERIAL.
Spanish: Limpiar (Clear) -> SHIFT-MATERIAL; ``Quitar con pala'' (Shovel) -> SHIFT-MATERIAL.
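The two examples can be written down as a small multilingual lexicon that maps word senses to concepts. The mappings are the ones given in the examples; the table layout and the function `words_for` are assumptions made for illustration.

```python
# (language, word) -> set of concepts the word can map to.
LEXICON = {
    ("English", "swim"): {"SWIM"},
    ("English", "float"): {"FLOAT"},
    ("Spanish", "nadar"): {"SWIM"},
    ("Spanish", "flotar"): {"FLOAT"},
    ("Russian", "plyt'"): {"SWIM", "FLOAT"},
    ("Russian", "plavat'"): {"SWIM", "FLOAT"},
    ("English", "shovel"): {"SHIFT-MATERIAL"},
    ("English", "clear"): {"SHIFT-MATERIAL"},
    ("Spanish", "limpiar"): {"SHIFT-MATERIAL"},
    ("Spanish", "quitar con pala"): {"SHIFT-MATERIAL"},
}


def words_for(concept, language):
    """Words of one language whose senses include the given concept."""
    return sorted(
        word
        for (lang, word), concepts in LEXICON.items()
        if lang == language and concept in concepts
    )


print(words_for("SWIM", "Russian"))
print(words_for("SHIFT-MATERIAL", "English"))
```

The table shows both directions of the mismatch: one word mapping to several concepts (the Russian verbs) and several words sharing one concept (SHIFT-MATERIAL).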
Ontology Acquisition
-
Situated development: immediate interaction and feedback.
-
Driven but not dictated by lexicon acquisition.
-
``Request - negotiation - response'' cycle.
-
Various guidelines.
What concepts do we need:
-
Multilingual data.
-
Limited proliferation of concepts.
-
Concept for each word: e.g., SWIM and FLOAT
-
One concept for many words: e.g., SHIFT-MATERIAL
-
Identify language independent dimensions of meaning variation
-
Populate the ontology in the ``microcosm'': e.g., add DIVE and SINK.
Conceptual Dimensions: Swim, Float, ...
Why are we building an ontology?
-
For practical natural language processing
-
For Mikrokosmos: Multilingual, knowledge-based machine translation.
-
For interlingual meaning representation:
grounded in ontology.
-
For ambiguity resolution: representing and checking selectional
constraints.
What do we mean by an ontology?
An ontology for NLP
purposes is a knowledge base that:
-
is a taxonomic organization of concepts for meaning representation
-
interconnected richly through relations among the concepts.
Nature of the ontology:
-
Unique? No; ``Natural''? No; Provably correct? No.
-
An empirically constructed artifact.
-
Created in situation, not discovered.
-
A sharable computational resource.
-
Situated in the framework of multilingual NLP.
What have we built so far?
The Mikrokosmos ontology:
-
4200 concepts in a taxonomy.
-
Spanish lexicon (about 7000 words) linked to the ontology.
-
Depth > 10; average fan-out < 5.
-
Acquisition rate: Up to 40 concepts per person-day.
-
Connectivity per concept: 14!
-
Accessibility: Through C++ objects, Lisp expressions, plain text, keyword
search, and graphical browsing.
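Figures such as depth and average fan-out can be computed directly from the taxonomy. The sketch below uses a hypothetical toy taxonomy and assumes two definitions the slides do not spell out: depth counts the nodes on the longest root-to-leaf path, and fan-out is averaged over concepts that have at least one child.

```python
# Hypothetical toy taxonomy: parent -> children.
CHILDREN = {
    "ALL": ["OBJECT", "EVENT", "PROPERTY"],
    "OBJECT": ["ORGANIZATION"],
    "ORGANIZATION": ["CORPORATION"],
    "EVENT": ["ACQUIRE"],
}


def depth(node):
    """Number of nodes on the longest path from `node` down to a leaf."""
    kids = CHILDREN.get(node, [])
    return 1 + max((depth(kid) for kid in kids), default=0)


def average_fan_out():
    """Mean number of children over concepts that have any children."""
    return sum(len(kids) for kids in CHILDREN.values()) / len(CHILDREN)


print(depth("ALL"), average_fan_out())
```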
What do we care about in our ontology?
The 5 commitments:
1. Broad coverage
Why: Input texts are real-world, unedited, and unrestricted.
2. Rich properties and interconnections
Why: We need to check how well constraints are satisfied.
3. Ease of understanding, searching, and browsing
Why: Non-expert lexicographers need to find
concepts given only a rough word "sense."
4. NLP-oriented: developed for machine translation
Why: Purpose is (literal) meaning representation.
5. Economy/cost-effectiveness/tractability
Why: We don't have several person-centuries; we only
have several person-months.
How to conform to the commitments?
-
Situated development:
-
Makes acquisition tractable.
-
Increases connectivity: lexicographers suggest more properties.
-
Enhances ease of understanding and searching.
-
Increase the ratio:
-
Guidelines for:
-
What to add as a concept; what not to add.
-
Where to place a concept;
-
What to name a concept;
-
How to write a definition string; and
-
How to rearrange a part of the ontology.
-
Possible actions for a lexicon request.
Ontology for MT ≠ Encyclopedia
Types of knowledge we don't need:
-
Complex events
-
Unnecessary distinctions: Walk is WALK; Run is
RUN.
-
Typical, episodic, and procedural knowledge.
-
Scientific knowledge: Newton's second law.
Kavi Mahesh
Sun Nov 12 15:14:34 MST 1995