LIP6 1998/048: Doctoral thesis, Université Paris 6
286 pages - October 1998
French document.
PostScript: 3120 KB
Contact: by e-mail
Team: Apprentissage et Acquisition de Connaissances (Learning and Knowledge Acquisition)
French title: Construction d'ontologies à partir de textes techniques - application aux systèmes documentaires
English title: Ontology construction from technical texts - application to documentary systems
Abstract : Our thesis deals with the problem of domain ontology acquisition from technical texts. We define the "annotated regional ontology": it consists of a conceptual network describing a particular domain. In this network, concepts are connected to linguistic expressions and to the corpus from which they were built. We propose a methodology and tools for the construction of regional ontologies from technical documentation. Our proposal is based on principles from the differential semantics theory of F. Rastier.
Our methodology, called "Interactive Conceptual Analysis" (ICA), places the technical documentation at the core of the knowledge acquisition process and relies on text analysis tools. The ICA takes place in two stages: a preliminary elicitation stage, called "macroscopic analysis", and an iterative refinement stage, called "microscopic analysis". The ICA effectively takes into account the human factor, represented by the expert / knowledge engineer team. Our methodology is fully corpus-based: it does not need any external conceptual resource.
We developed support tools for the ICA: (1) lexiclass, which automatically clusters linguistic expressions according to the syntactic relations they hold in the text; (2) the "conceptual structures generation" tools, which use both the results of the preliminary morpho-syntactic analysis and the current version of the ontology to propose new candidate conceptual structures to be added to the ontology.
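As an illustration of the general idea of clustering expressions by the syntactic relations they share, here is a minimal Python sketch. It is not the lexiclass algorithm, whose details are not given in this abstract: the Jaccard similarity, the single-link grouping, the 0.5 threshold, the function names and the sample data are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_expressions(contexts, threshold=0.5):
    """Group expressions whose syntactic-context sets overlap enough.

    contexts maps each expression to the set of (relation, word) pairs
    it occurs with. Grouping is single-link, implemented with union-find.
    The similarity measure and threshold are illustrative assumptions.
    """
    parent = {expr: expr for expr in contexts}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of expressions that is similar enough.
    for a, b in combinations(contexts, 2):
        if jaccard(contexts[a], contexts[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect the members of each group under its representative.
    groups = {}
    for expr in contexts:
        groups.setdefault(find(expr), set()).add(expr)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data, invented for this sketch.
EXAMPLE_CONTEXTS = {
    "overhead line": {
        ("object_of", "install"),
        ("object_of", "maintain"),
        ("modified_by", "high-voltage"),
    },
    "underground cable": {
        ("object_of", "install"),
        ("object_of", "maintain"),
    },
    "planning study": {("object_of", "carry out")},
}

if __name__ == "__main__":
    for cluster in cluster_expressions(EXAMPLE_CONTEXTS):
        print(cluster)
```

In this sample, "overhead line" and "underground cable" share two of three contexts and fall into one cluster, while "planning study" stays alone. In an interactive setting such clusters would only be candidates for the expert / knowledge engineer team to validate.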
Our thesis was carried out at the Research and Development Division of Electricité de France, within a project dealing with "Technical Documentation Consultation Systems" (TDCS). A TDCS is presented as a hypertext allowing context-based access to the technical documentation of a given domain, via two structured indexes, one representing the domain and the other the tasks. A preliminary knowledge engineering process is needed to build the conceptual models on which the indexes are based. Our methodology and tools have been used within a project to build a TDCS in the domain of electrical network planning.
Keywords: knowledge engineering, natural language processing, knowledge representation, hypertext, semantics