Approach for Transforming Monolingual Text Corpus into XML Corpus
Similar resources
Corpus based coreference resolution for Farsi text
Coreference resolution, that is, finding all expressions in a text that refer to the same entity, is an important requirement in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. The main task of a coreference resolution system is therefore to identify terms that refer to the same entity. A coreference resolution tool could be...
Corpus Portal for Search in Monolingual Corpora
A simple and flexible schema for storing and presenting monolingual language resources is proposed. In this format, data for 18 different languages is already available in various sizes. The data is provided free of charge for online use and download. The main target is to ease the application of algorithms for monolingual and interlingual studies.
Corpus Based Method of Transforming Nominalized Phrases into Clauses for Text Mining Application
Nominalization is a linguistic phenomenon in which events usually described in terms of clauses are expressed in the form of noun phrases. Extracting event structures is an important task in text mining applications. To achieve this goal, clauses are parsed and the argument structures of main verbs are extracted from the parsed results. This kind of preprocessing has been commonly done in the pa...
A Corpus-Based Approach to Text Partition
A text partition model is proposed to determine the boundaries of discourse structures. It is based on association of noun-noun relations and noun-verb relations defined on discourse level and sentence level. Three factors are considered: 1) repetition of words, 2) importance of words, and 3) collocational semantics. Ten texts serve as experimental objects. The applications of the results to se...
Anatomy of an XML-based Text Corpus Server
This document describes an XML-based data model for annotated, modular text corpora along with a WWW interface for browsing such corpora, reading the texts, searching for examples, and extracting information on word usage. The interface is based solely on programs and techniques belonging to the XML family. The corpus model is designed in such a way that new parts (texts, sub-corpora) can be e...
Journal
Journal title: International Journal of Applied Information Systems
Year: 2012
ISSN: 2249-0868
DOI: 10.5120/ijais12-450225