Search results for: corpora creation

Number of results: 147847

Journal: Journal of Teaching Language Skills 2012
Majid Hayati Hossein Shokouhi Fahimeh Hadadi

The present study aimed to analyze reprint request e-mail messages written by postgraduates (MA students) in two fields of study, namely physics and EFL, to identify the differences and similarities between the two e-mail types. To this end, a sample of 100 e-mail messages, 50 physics and 50 EFL, was analyzed according to Swales’ (1990) model for reprint requests and ...

2009
A. Kumaran K. Saravanan Naren Datha B. Ashok Vikram Dendi

In this demo, we present a wiki-style platform, WikiBABEL, that enables easy collaborative creation of multilingual content in many non-English Wikipedias by leveraging the relatively larger and more stable content in the English Wikipedia. The platform provides an intuitive user interface that maintains the user focus on the multilingual Wikipedia content creation, by engaging search tools f...

2011
Narayan Choudhary Girish Nath Jha

This paper presents a description of the parallel corpora being created simultaneously in 12 major Indian languages including English under a nationally funded project named Indian Languages Corpora Initiative (ILCI), run through a consortium of institutions across India. The project runs in two phases. The first phase of the project has two distinct goals: creating parallel sentence-aligned corp...

2006
Julia S. Trushkina

This paper describes the design and creation of a multilingual parallel corpus for South African languages. One of the applications of the corpus, namely the induction of a part-of-speech tagger for Afrikaans from the data, is presented in the paper. Development of the Afrikaans part-of-speech tagger is based on a modified method for induction of linguistic tools from parallel corpora originally p...

2006
Nick Campbell

This paper presents a summary of some expressive speech data collected over a period of several years and suggests that its variation is not best described by the term “emotion”. It further suggests that the term may be misleading when used as a descriptor for the creation of expressive speech corpora. The paper proposes that we might benefit from first considering what other dimensions of speech variatio...

2007
Vít Novácek Maciej Dabrowski Sebastian Ryszard Kruk Siegfried Handschuh

In this paper we propose an ontology (formal knowledge base) creation methodology based on integrating external ontologies into the one developed by a community of domain experts. We present the MarcOntX agent, a service that automates the process of generating suggestions of changes to the ontology. The suggestions are inferred from the external sources, such as large corpora of...

2000
Matej Rojc Zdravko Kacic

Statistical approaches in speech technology, whether based on statistical language models, trees, hidden Markov models or neural networks, are the driving forces for the creation of language resources (LR), e.g. text corpora, pronunciation lexica and speech databases. This paper presents the system architecture for rapid construction of morphologic and phonetic lexica for the Slovenian language....

2012
Stephanie Gokhman Jeff Hancock Poornima Prabhu Myle Ott Claire Cardie

In this study, we explore several popular techniques for obtaining corpora for deception research. Through a survey of traditional as well as non-gold standard creation approaches, we identify advantages and limitations of these techniques for web-based deception detection and offer crowdsourcing as a novel avenue toward achieving a gold standard corpus. Through an in-depth case study of online h...

2004
Alejandro Renato José A. Alvarez

The present article describes the creation, labelling and main characteristics of a corpus of spoken Latin American Spanish. The corpus was collected with several objectives in mind: a) to fulfill our own research needs in the study of Latin American Spanish prosodic phenomena, where the absence of available corpora has already been noticed [1][6], b) to be able to experiment with prosodic mode...

2014
Mikaël Morardo Éric Villemonte de la Clergerie

We present the components of a processing chain for the creation, visualization, and validation of lexical resources (formed of terms and relations between terms). The core of the chain is a component for building lexical networks relying on Harris’ distributional hypothesis applied on the syntactic dependencies produced by the French parser FRMG on large corpora. Another important aspect conce...
