Search results for: corpora creation

Number of results: 147847

Journal: Procesamiento del Lenguaje Natural, 2017
Arantxa Otegi Oier Imaz Arantza Díaz de Ilarraza Mikel Iruskieta Larraitz Uria

The reduced size of corpora in some areas of research is due to the lack of tools to process massively and easily the language under study. In this article, we present ANALHITZA, a tool which is being developed within the Clarink project, whose aim is the creation of linguistic technologies that are useful for research on Social Sciences and Humanities. ANALHITZA has been designed to extract li...

2001
Toomas Altosaar Matti Karjalainen Martti Vainio

Collections of annotated spoken language have formed an important basis for the development of speech technology. Their existence has promoted speech analysis research as well as enabled robust synthesis and recognition methods to be developed. However, many complex relationships remain unspecified within a corpus due to a lack of meta-data that describes the raw information in sufficient detai...

2008
Christophe Veaux Grégory Beller Xavier Rodet

Corpus based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These usages range from unit selection speech synthesis to statistical modeling of speech phenomena like prosody or expressivity. In all cases, these usages require a wide range of tools for corpus creation, labeling, symbolic and acoustic ...

2023

Nowadays the methodology of teaching foreign languages and translation practice involves using development modern computer applications called parallel corpora texts various genres. Such developments haven't been developed in Kazakhstan yet, though or so-called bitexts were used for comparative analysis applied linguistics long before. In practice, can be getting referential information, samples...

2003
Emanuele Pianta Luisa Bentivogli

In this paper we illustrate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the key notion that translating a text can be seen as a linguistic annotation task which is easier than manual annotation with formal schemes. After translation, formal annotations can be automatically derived...

2004
Eckhard Bick Heli Uibo Kaili Müürisep

Treebank creation is a very labor-consuming task, especially if the applications intended include machine learning, gold standard parser evaluation or teaching, since only a manually checked syntactically annotated corpus can provide optimal support for these purposes. There are, however, possibilities to make the annotation process (partly) automatic, saving (manual) annotation time and/or all...

Journal: Journal of Biomedical Informatics, 2009
Carlos Cano Thomas Monaghan Armando Blanco Dennis P. Wall Leonid Peshkin

Agglomerating results from studies of individual biological components has shown the potential to produce biomedical discovery and the promise of therapeutic development. Such knowledge integration could be tremendously facilitated by automated text mining for relation extraction in the biomedical literature. Relation extraction systems cannot be developed without substantial datasets annotated...

2016
Alex Becker Fabio Kepler Sara Candeias

In this paper we describe our work in building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and...

2002
Martin Wynne

What will an archive of language resources look like in the future? It is to be expected that developments in computer technology will have an impact on the nature of language resources which will be created in the future. A projection of current trends into the future helps us to see that there will be more multimedia and multilingual resources. It is also likely that increasing internet bandwidt...
