Applying corpus methods to written academic texts: Explorations of MICUSP
Related resources
Coreference in Spoken vs. Written Texts: a Corpus-based Analysis
This paper describes an empirical study of coreference in spoken vs. written text. We focus on the comparison of two particular text types, interviews and popular science texts, as instances of spoken and written texts, since they display quite different discourse structures. We believe, in fact, that the correlation of difficulties in coreference resolution and varying discourse structures requi...
Metadiscourse Markers Revisited in EFL Context: The Case of Iranian Academic Learners’ Perception of Written Texts
Moving in line with the postulation that metadiscourse (MD) markers help transform a dry and tortuous piece of text into a coherent and reader-friendly one, the researchers in the current study attempted to investigate the effect different metadiscourse markers might have on Iranian EFL learners’ perception of written texts. To this end, 120 undergraduate English students were given three diffe...
Extracting salient sublexical units from written texts: “Emophon,” a corpus-based approach to phonological iconicity
A growing body of literature in psychology, linguistics, and the neurosciences has paid increasing attention to the understanding of the relationships between phonological representations of words and their meaning: a phenomenon also known as phonological iconicity. In this article, we investigate how a text's intended emotional meaning, particularly in literature and poetry, may be reflected a...
Automatic Structuring of Written Texts
This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure tagging algorithm and heuristic rules that are used for automatic or semiautomatic labelling. Inside the detected sentence the algorithm performs a decomposition to clauses and then marks the parts of text which do not form a sentence, i.e. headings, signa...
BasiLex: an 11.5 million words corpus of Dutch texts written for children
This article discusses BasiLex, a recently finalized corpus of 13.5 million tokens (11.5 million Dutch words) of written language offered to children of elementary school age. The corpus is automatically analyzed at the levels of part-of-speech tagging and lemmatization, and a limited number of polysemous words have been partly automatically disambiguated. Also, a lemma-based lexicon ...
Journal
Journal title: Journal of Writing Research
Year: 2010
ISSN: 2030-1006, 2294-3307
DOI: 10.17239/jowr-2010.02.02.2