
Publication

Constructing Ontological-underpinned Terminological Resources: a Categorisation Framework API

Book contribution - Book chapter, conference contribution

In this article, we describe our reasons for preferring an application programming interface (API) over a relational or XML database for constructing terminological and lexicographical resources. We explain how our research, aimed at developing terminological and lexicographical databases that can be used and supported by a broad range of specialised software tools, led us to this view. This research spanned several projects in which we developed various multilingual, ontologically underpinned lexical resources, along with specialised software tools to support their development.
Because various applications should be able to use the resulting lexical resources, we wanted to structure the resources by means of an application ontology. Application ontologies can be interpreted by different applications and may thus facilitate the integration of the software tools. At first, we tried to use the Protégé ontology editor API [2] to store the required lexical information. This API facilitated the development of an ontological structure, and we could easily develop software tools based on it. However, it proved difficult (if not impossible) to store all the lexical information using this ontological structure alone. We therefore expanded the ontological structure to include the required lexical information. We call the resulting structure a Categorisation Framework (CF) and use it to categorise lexical information. We shall explain how the CF can be used to structure and store all kinds of lexical information.
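To make the idea of an ontological structure extended with lexical information more concrete, the following is a minimal sketch in Java. It is not the actual CF data model: the class names `CfCategory` and `CfTerm`, their fields, and the example category names are assumptions introduced purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; these names are NOT taken from the real CF.
final class CfTerm {
    final String language; // assumed to be a language code such as "en" or "nl"
    final String text;

    CfTerm(String language, String text) {
        this.language = language;
        this.text = text;
    }
}

// A node of the ontological structure that also carries lexical information.
final class CfCategory {
    final String name;
    final CfCategory parent; // null for the root category
    final List<CfCategory> children = new ArrayList<>();
    final List<CfTerm> terms = new ArrayList<>();

    CfCategory(String name, CfCategory parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // Attaches a term in the given language to this category; returns this for chaining.
    CfCategory addTerm(String language, String text) {
        terms.add(new CfTerm(language, text));
        return this;
    }

    // Returns the texts of all terms of this category in one language.
    List<String> termsIn(String language) {
        List<String> result = new ArrayList<>();
        for (CfTerm term : terms) {
            if (term.language.equals(language)) {
                result.add(term.text);
            }
        }
        return result;
    }

    // Returns the slash-separated path from the root category to this category.
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }
}
```

In this sketch, the hierarchy of categories plays the role of the ontological structure, while the terms attached to each category stand in for the multilingual lexical information.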
Due to the multilingual and specialised nature of the resources, different domain experts had to collaborate in constructing the domain ontologies and gathering the lexical information. By implementing an XML format to represent the CF, we ensured that the resources could be developed and exchanged in a modular way. The XML format also made it possible to include existing structured information, e.g. databases, by converting it into this XML format.
Although the XML format proved to be extremely useful during the development of the lexical resources, it became clear that the size and complexity of the total resources required a more efficient database format. We therefore also implemented the CF as a relational database using JavaDB.
To handle the CF and use the information in our software tools, we developed a Java API. Our software tools for corpus compilation, linguistic ontology development and terminology management all use this API. The CF API makes it easy to manage the CF and to store the information in both XML and relational database format. Its main advantage is that it facilitates the development of specialised software tools for lexicography, terminography, and linguistic ontology engineering. Using the CF API, different software tools can process the appropriate CF information. New projects may simply reuse information from previous projects, while the flexible and customisable nature of the CF enables the addition of extra lexical information.
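The design choice described above, in which tools talk to an API rather than directly to an XML file or a relational database, can be sketched roughly as follows. This is an illustrative assumption, not the actual CF API: `LexicalStore`, `InMemoryStore` and `TermManager` are hypothetical names, and the in-memory store merely stands in for an XML-file or JDBC-based (e.g. JavaDB) backend that would implement the same interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical storage abstraction; NOT the real CF API.
interface LexicalStore {
    void put(String category, String language, String term);
    List<String> get(String category, String language);
}

// In-memory stand-in for a storage backend. An XML or relational backend
// would implement the same interface without changing the calling tools.
final class InMemoryStore implements LexicalStore {
    // category -> language -> terms
    private final Map<String, Map<String, List<String>>> data = new HashMap<>();

    @Override
    public void put(String category, String language, String term) {
        data.computeIfAbsent(category, c -> new HashMap<>())
            .computeIfAbsent(language, l -> new ArrayList<>())
            .add(term);
    }

    @Override
    public List<String> get(String category, String language) {
        Map<String, List<String>> byLanguage = data.get(category);
        if (byLanguage == null || !byLanguage.containsKey(language)) {
            return new ArrayList<>();
        }
        // Defensive copy so callers cannot modify the stored list.
        return new ArrayList<>(byLanguage.get(language));
    }
}

// Tool-facing facade: tools depend on this class, never on a concrete backend.
final class TermManager {
    private final LexicalStore store;

    TermManager(LexicalStore store) {
        this.store = store;
    }

    void addTerm(String category, String language, String term) {
        store.put(category, language, term);
    }

    List<String> lookup(String category, String language) {
        return store.get(category, language);
    }

    // Copies the terms of one category and language into another backend,
    // e.g. to reuse information from a previous project.
    void exportTo(LexicalStore target, String category, String language) {
        for (String term : store.get(category, language)) {
            target.put(category, language, term);
        }
    }
}
```

Under these assumptions, a corpus tool and a terminology management tool could share one `TermManager`, and switching from one storage format to another would only require passing a different `LexicalStore` implementation to the constructor.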
In chapter 1, we will describe the Categorisation Framework and how it can be used to structure and store all kinds of lexical and ontological information. In chapter 2, we will discuss the current use and advantages of the CF API.
Book: Terminology and Knowledge Engineering
Series: Terminology and Knowledge Engineering
Year of publication: 2008
Keywords: API, categorisation framework, ontology, terminology, Termontography