Search results for: corpus linguistic

Number of results: 113027

2017
Eszter Simon, Nikolett Mus

For most of the Uralic languages, there is a lack of systematically collected, consistently transcribed and morphologically annotated text corpora. This paper sums up the steps, the preliminary results and the future directions of building a linguistic corpus of some Uralic languages, namely Tundra Nenets, Udmurt, Synya Khanty, and Surgut Khanty. The experiences of building a corpus containing ...

2017
Hans van Halteren

A new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. The technique proves to be quite effective for authorship verification and recognition. The best parameter settings yield a False Accept Rate of 8.1% at a False Reject Rate equal to zero f...
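
The general idea in this abstract (feature counts as a text profile, compared against an average profile for a group of texts) can be illustrated with a minimal sketch. This is not the paper's method: the feature set (character trigrams), the Euclidean distance, and the function names `profile`, `average_profile`, and `distance` are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed stand-in feature set)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1  # avoid division by zero on very short texts
    return {g: c / total for g, c in counts.items()}


def average_profile(profiles):
    """Mean relative frequency of each feature over a group of text profiles."""
    keys = set().union(*profiles)
    return {k: sum(p.get(k, 0.0) for p in profiles) / len(profiles) for k in keys}


def distance(p, q):
    """Euclidean distance between two profiles over the union of their features."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


# Toy data (hypothetical): two groups of texts and one sample to compare.
group_a = ["the cat sat on the mat", "the cat ate the rat"]
group_b = ["zzz qqq xxx zzz", "qqq xxx zzz qqq"]
avg_a = average_profile([profile(t) for t in group_a])
avg_b = average_profile([profile(t) for t in group_b])

sample = profile("the rat sat on the cat")
closer_to_a = distance(sample, avg_a) < distance(sample, avg_b)
```

In a verification setting, the distance to a group's average profile would be compared against a threshold; how that threshold is chosen is what trades off false accepts against false rejects.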

2015
Stefan Th. Gries, Nick C. Ellis

The advent of usage-/exemplar-based approaches has resulted in a major change in the theoretical landscape of linguistics, but also in the range of methodologies that are brought to bear on the study of language acquisition/learning, structure, and use. In particular, methods from corpus linguistics are now frequently used to study distributional characteristics of linguistic units and what th...

2007
Anil Kumar Singh, Harshit Surana

Quantitative measurement of inter-language distance is a useful technique for studying diachronic and synchronic relations between languages. Such measures have been used successfully for purposes like deriving language taxonomies and language reconstruction, but they have mostly been applied to handcrafted word lists. Can we instead use corpus-based measures for comparative study of languages?...

2010
Fabienne Fritzinger, Alexander Fraser

Compound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting produce a correct splitting more often, but corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We ...

2016
Yong-Hun Lee, Ki-Suk Jun

In this paper, the use of two modals (can and may) in four varieties of English (British, Indian, Philippine, and American) was compared and the characteristics of each variety were statistically analyzed. After all the sample sentences were extracted from each component of the ICE corpus, a total of twenty linguistic factors were encoded. Then, the collected data were statistically analyzed with R....

2005
Niraj Aswani, Valentin Tablan, Hamish Cunningham

The need for efficient corpus indexing and querying arises frequently both in machine learning-based and human-engineered natural language processing systems. This paper presents the ANNIC system, which can index documents not only by content, but also by their linguistic annotations and features. It also enables users to formulate versatile queries mixing keywords and linguistic information...

2008
Nicolás Morales, Javier Tejedor, Javier Garrido Salas, José Colás Pasamontes, Doroteo Torre Toledano

This paper describes a new speech corpus, STC-TIMIT, and discusses its design, development, and distribution through LDC. The STC-TIMIT corpus is derived from the widely used TIMIT corpus by sending it through a single real telephone channel. TIMIT is phonetically balanced, covers the dialectal diversity in continental USA and has been extensively used as a benchmark for speec...

2000
Christopher Cieri

Responding to demands for very large, easily accessible, reusable news corpora to support research in the topic detection and tracking paradigm, the Linguistic Data Consortium created the TDT corpora. In addition to supporting research in the Topic Detection and Tracking program, the TDT corpora were collected and annotated with an eye toward reuse and re-annotation. Their value is confirmed in...

2017
Fabian Barteld, Johanna Flick

In this paper we present LEA (Linguistic Exercises with Annotation tools). LEA is a new didactic concept helping students to become familiar with corpus linguistic methods and annotation tools. The main idea behind LEA is that classical linguistic exercises are being solved with annotation tools. We will present the advantages of this method (e.g. didactic benefits, automatic correction) and de...
