Search results for: corpora creation

Number of results: 147847

Journal: JLCL 2011
Dain Kaplan Ryu Iida Kikuko Nishina Takenobu Tokunaga

Recent research trends of the last five years show that richly annotated corpora inspire novel research. These richly annotated corpora are indispensable for progressing research, but also more difficult to manage and maintain due to increasing complexity – what is needed is a way to manage the annotation project in its entirety. However, annotation project management has received little attent...

2016
Vladimír Benko

The Aranea Project is targeted at the creation of a family of Gigaword web-corpora for a dozen languages that could be used for teaching language- and linguistics-related subjects at Slovak universities, as well as for research purposes in various areas of linguistics. All corpora are being built according to a standard methodology and using the same set of tools for processing and annotation, whi...

2013
Alexander Bazo Manuel Burghardt Christian Wolff

In this paper we present Tworpus, an easy-to-use tool for the creation of tailored Twitter corpora. Tworpus allows scholars to create corpora without having to know about the Twitter Application Programming Interface (API) and related technical aspects. At the same time our tool complies with Twitter’s “rules of the road” on how to use tweet data. Corpora may be composed in various sizes and fo...

2004

This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...

Journal: CoRR 2017
Baiyang Wang Diego Klabjan

Generative adversarial nets (GANs) have been successfully applied to the artificial generation of image data. In terms of text data, much has been done on the artificial generation of natural language from a single corpus. We consider multiple text corpora as the input data, for which there can be two applications of GANs: (1) the creation of consistent cross-corpus word embeddings given differ...

2013
Hui-Chuan Lu Yu-Hsin Chu

In the development of corpus linguistics, the creation of corpora has had a critical role in corpus-based studies. The majority of created corpora have been associated with English and native languages, while other languages and types of corpora have received relatively less attention. Because an increasing number of corpora have been constructed, and each corpus is constructed for a definite p...

2015
Christos Christodoulopoulos Mark Steedman

We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other En...

2007
Victoria Arranz

This paper describes the creation of linguistically enriched aligned corpora for Catalan, Spanish and US-English for Speech-to-Speech Translation. These corpora are obtained from two different sources: US-English transcribed speech data and transcriptions of conversations recorded in Catalan and Spanish. After human translation, a large trilingual spontaneous speech corpus has been obtained. Thi...

2004
Ulrike Gut Jan-Torsten Milde Holger Voormann Ulrich Heid

This paper is concerned with querying annotated speech corpora. A growing number of such corpora is currently being created worldwide; however, their usefulness for a wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions for these problems are presented here: the XML-based data format TASX for corpus crea...

2015
Chiragkumar Patel Sunil Kumar Kopparapu

A speech corpus is an important and primary requirement for several speech tasks. Building a speech corpus is a lengthy, time-consuming and expensive process; it typically involves collection of a large set of textual utterances and then selective distribution of these text utterances among a set of speakers, called speaker sheets. These speaker sheets are articulated by speakers to generate the...
