Search results for: corpus linguistics
Number of results: 98006
As members of the larger family of formulaic sequences, lexical bundles serve different discourse functions in written research articles. This study investigated the use of four-word lexical bundles in published research articles in medicine via natural language processing by computational linguistics. A corpus of 2,420,914 words was extracted from 790 research articles in 33 medical disciplines. f...
This article surveys linguistic annotation in corpora and corpus linguistics. We first define the concept of 'corpus' as a radial category and then, in Section 2, discuss a variety of kinds of information for which corpora are annotated and that are exploited in contemporary corpus linguistics. Section 3 then exemplifies many current formats of annotation with an eye to highlighting both the di...
Arabic is not just one language, but rather a collection of dialects in addition to Modern Standard Arabic (MSA). While MSA is used in formal situations, dialects are the language of everyday life. Until recently, there was very little dialectal Arabic in written form. With the advent of social media, however, the landscape has changed. We provide the first romanized code-switched Algerian Ara...
Corpus linguistics grew up in the domain of written (and literary) varieties, while its recent methodological revolution is due to the computer-assisted capacity for elaborating massive amounts of text data. On the other hand, so-called ‘low-density varieties’, including spoken varieties as well as minority communities, have been confined to a rather marginal role. Among others, this technical problems connected sc...
Corpora have played an important role in modern linguistics. I review some of the ways in which corpora have been relied upon in linguistics and how they have become increasingly common as sources of data in linguistic research. I then illustrate how corpora allow linguists to explore low-level patterns of co-occurrence associated with the verb in English. The corpus-based research reported her...
This paper introduces ACL RD-TEC: a dataset for evaluating the extraction and classification of terms from literature in the domain of computational linguistics. The dataset is derived from the Association for Computational Linguistics anthology reference corpus (ACL ARC). In its first release, the ACL RD-TEC consists of automatically segmented, part-of-speech-tagged ACL ARC documents, three li...
There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long-standing practice has been to work on raw, i.e., unannotated text. While raw...
Corpora have been put to many different uses in fields as varied as natural language processing, critical discourse analysis and applied linguistics, to mention just a few. As is to be expected, within each of those areas corpora fulfil different roles, from providing data to build statistical machine translation systems to revealing ideological stance in politically sensitive texts. ‘Corpus lin...