Search results for: corpora creation

Number of results: 147847

1999
Marion Klein

This paper describes the state of the art of coding schemes for dialogue acts and the efforts to establish a standard in this field. We present a review and comparison of currently available schemes and outline the comparison problems we had due to domain, task, and language dependencies of schemes. We discuss solution strategies which have in mind the reusability of corpora. Reusability is a ...

Journal: Studies in Second Language Learning and Teaching 2023

This paper reports on the creation of specialized word lists in traditional Chinese medicine (TCM), which is a discipline using vocabulary across languages (i.e., and English) involves learners with different L1 backgrounds. First, TCM Word List 2,778 words was established from corpora textbooks journal articles. Selection criteria included meaning, keyness corpus general written English compar...

2002
Kiril Simov

Grammar learning and refinement on the basis of language resources is very appealing in comparison with manual development of formal grammar. But in order to learn a complex grammar a complex resource is needed. Thus the creation of language resources and learning of grammars from them have to be aware of each other. In this paper we define a formal basis for annotation of corpora with respect ...

2003
Anna Pappa

This article presents a three-step algorithm for morphological disambiguation between the definite article and the personal pronoun in French. Tested accuracy on large untagged corpora exceeds 98% with less than 1% of error. Our method has also been tested on unlabeled Greek corpora, and the results prove the system’s portability to other languages with similar structure. Not a...

Journal: Speech Communication 2014
Laurent Besacier Etienne Barnard Alexey Karpov Tanja Schultz

The creation of language and acoustic resources, for any given spoken language, is typically a costly task. For example, a large amount of time and money is required to properly create annotated speech corpora for automatic speech recognition (ASR), domain-specific text corpora for language modeling (LM), etc. The development of speech technologies (ASR, Text-to-Speech) for the already highreso...

Journal: Procesamiento del Lenguaje Natural 2002
José Luis Aguirre Moreno Alberto Álvarez Lugrís Xavier Gómez Guinovart

In this article we present a complete and normalized morphosyntactic tagset for the annotation of linguistic corpora in Galician. The elaboration of this tagset, designed by the Computational Linguistics Group (SLI) of the University of Vigo, following strictly the EAGLES recommendations (Leech and Wilson, 1996), includes the creation of an intermediate tagset that allows us to establish a corr...

2010
Milos Jakubícek Adam Kilgarriff Diana McCarthy Pavel Rychlý

For many linguistic investigations, the first step is to find examples. In the 21st century, they should all be found, not invented. Thus linguists need flexible tools for finding even quite rare phenomena. To support linguists well, they need to be fast even where corpora are very large and queries are complex. We present extensions to the CQL (Corpus Query Language) for intuitive creation of ...

2011
Tommaso Caselli Valentina Bartalesi Lenzi Rachele Sprugnoli Emanuele Pianta Irina Prodanof

This paper presents the annotation guidelines and specifications which have been developed for the creation of the Italian TimeBank, a language resource composed of two corpora manually annotated with temporal and event information. In particular, the adaptation of the TimeML scheme to Italian is described, and special attention is given to the methodology used for the realization of the anno...

2004
Bozo Bekavac Petya Osenova Kiril Ivanov Simov Marko Tadic

This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. It is based on two newspaper subcorpora drawn from larger reference corpora of Bulgarian and Croatian. Initially, we rely on more extralinguistically oriented but methodologically cleaner parameters of similarity, such as specific topics, a pre-defined time span, and data size. The idea of ‘light’ and ...
