Construction of Thematic Representations of Texts Based on Domain-Specific Thesaurus
Abstract
The paper considers the interrelation between lexical cohesion and the thematic structure of a text. It describes a technique for automatically constructing the thematic representation of a text. The technique draws on knowledge from the Sociopolitical Thesaurus, which was developed specifically as a tool for automatic text processing.
Similar resources
Conceptual Business Process Structuring by Extracting Knowledge from Natural Language Texts
This article discusses methods of constructing a formalized structure of a subject domain based on analysis of natural language texts, including discovering objects, their properties and related actions, followed by discovering business processes specific to the subject domain and the formation of thesaurus and business processes of the subject domain. At the same time the thesaurus can be chan...
How to Thematically Segment Texts by Using Lexical Cohesion?
This article outlines a quantitative method for segmenting texts into thematically coherent units. The method relies on a network of lexical collocations to compute the thematic coherence of the different parts of a text from the lexical cohesiveness of their words. We also present the results of an experiment on locating boundaries between a series of concatenated texts. ...
Ontologies, Taxonomies, Thesauri: Learning from Texts
The use of ontologies as representations of knowledge is widespread but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three major steps in...
Discovering and visualizing narrative themes
This paper presents a framework for indexing and browsing databases of stories, in particular characterizing and visually exploring each narrative’s thematic content. We introduce a method for discovering thematic content in texts via lexical dissimilarity statistics. A maximum-likelihood algorithm clusters words into pools of similar meaning, using a thesaurus for rough estimates of word sense ...
Automatic Ontology Extraction from Unstructured Texts
Construction of the ontology of a specific domain currently relies on the intuition of a knowledge engineer, and the typical output is a thesaurus of terms, each of which is expected to denote a concept. Ontological ‘engineers’ tend to hand-craft these thesauri on an ad hoc basis and on a relatively small scale. Workers in the specific domain create their own special language, and one device for...