corpora creation

Search results for: corpora creation

Number of results: 147847

ANALHITZA: a tool to extract linguistic information from large corpora in Humanities research

Journal: Procesamiento del Lenguaje Natural, 2017

Arantxa Otegi, Oier Imaz, Arantza Díaz de Ilarraza, Mikel Iruskieta, Larraitz Uria

The reduced size of corpora in some areas of research is due to the lack of tools to process massively and easily the language under study. In this article, we present ANALHITZA, a tool which is being developed within the Clarink project, whose aim is the creation of linguistic technologies that are useful for research on Social Sciences and Humanities. ANALHITZA has been designed to extract li...

Three-dimensional modelling of speech corpora: added value through visualisation

2001

Toomas Altosaar, Matti Karjalainen, Martti Vainio

Collections of annotated spoken language have formed an important basis for the development of speech technology. Their existence has promoted speech analysis research as well as enabled robust synthesis and recognition methods to be developed. However, many complex relationships remain unspecified within a corpus due to a lack of meta-data that describes the raw information in sufficient detai...

IrcamCorpusTools: an Extensible Platform for Spoken Corpora Exploitation

2008

Christophe Veaux, Grégory Beller, Xavier Rodet

Corpus based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These usages range from unit selection speech synthesis to statistical modeling of speech phenomena like prosody or expressivity. In all cases, these usages require a wide range of tools for corpus creation, labeling, symbolic and acoustic ...

Corpora and Translation. Are Corpora Still an Academic Luxury?

Journal: Vertimo studijos, 2019

THE USE OF PARALLEL CORPORA IN TEACHING LANGUAGES AND TRANSLATION PRACTICE

2023

Nowadays the methodology of teaching foreign languages and translation practice involves the use of modern computer applications called parallel corpora, which contain texts of various genres. Such resources have not yet been developed in Kazakhstan, although parallel texts, or so-called bitexts, were used for comparative analysis in applied linguistics long before. In practice, parallel corpora can be used for getting referential information, samples...

Translation as Annotation

2003

Emanuele Pianta, Luisa Bentivogli

In this paper we illustrate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the key notion that translating a text can be seen as a linguistic annotation task which is easier than manual annotation with formal schemes. After translation, formal annotations can be automatically derived...

Arborest – a VISL-Style Treebank Derived from an Estonian Constraint Grammar Corpus

2004

Eckhard Bick, Heli Uibo, Kaili Müürisep

Treebank creation is a very labor-consuming task, especially if the applications intended include machine learning, gold standard parser evaluation or teaching, since only a manually checked syntactically annotated corpus can provide optimal support for these purposes. There are, however, possibilities to make the annotation process (partly) automatic, saving (manual) annotation time and/or all...

Collaborative text-annotation resource for disease-centered relation extraction from biomedical text

Journal: Journal of Biomedical Informatics, 2009

Carlos Cano, Thomas Monaghan, Armando Blanco, Dennis P. Wall, Leonid Peshkin

Agglomerating results from studies of individual biological components has shown the potential to produce biomedical discovery and the promise of therapeutic development. Such knowledge integration could be tremendously facilitated by automated text mining for relation extraction in the biomedical literature. Relation extraction systems cannot be developed without substantial datasets annotated...

A Web Tool for Building Parallel Corpora of Spoken and Sign Languages

2016

Alex Becker, Fabio Kepler, Sara Candeias

In this paper we describe our work in building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and...

The Language Resource Archive of the 21st Century

2002

Martin Wynne

What will an archive of language resources look like in the future? It is to be expected that developments in computer technology will have an impact on the nature of language resources which will be created in the future. A projection of current trends into the future helps us to see that there will be more multimedia and multilingual resources. It is also likely that increasing internet bandwidt...
