Search results for: corpora creation
Number of results: 147847
We discuss building digital language resources (such as annotated corpora, lexicons, ontologies, terminologies, tools), which are the main prerequisite for successful communication and information management in the e-society of the 21st century. We give an overview of the main requirements and best practices, and point to necessary steps for creation and maintenance of standards-based and reusable...
This paper introduces the first project of its kind within the Southern African language engineering context. It focuses on the role of idiosyncratic linguistic and pragmatic features of the different languages concerned and how these features are to be accommodated within (a) the creation of applicable speech corpora and (b) the design of the system at large. An introduction to the multilingua...
Parallel corpora are one of the key resources in natural language processing. In spite of their importance in many multilingual applications, no large-scale English-Persian corpus has been made available so far, given the difficulties in its creation and the intensive labor required. In this paper, the construction process of the Tehran English-Persian parallel corpus (TEP) using movie subtitles,...
The creation of a text corpus requires a sequence of processing steps to constitute and normalize it, and then to exploit it directly in a given application. This paper presents a generic approach for text normalization and concentrates on the aspects of methodology and linguistic engineering, which serve to develop a multipurpose multilingual text corpus. This approach was applied to French,...
The interaction of natural language processing and the Semantic Web has led to the creation of a new paradigm known as Linguistic Linked Open Data (LLOD), whereby traditional language resources are made available as linked data. Conversely, the publication of corpora and machine-readable dictionaries as linked data has opened new resources to Semantic Web researchers and allowed new tools to be ...
In this paper, we describe the design and collection of corpora for diphone synthesis, the voice building process, and our experience in the creation of a new, publicly available database of ten diphone sets of one American English speaker for the Festival Speech Synthesis System [3], using the FestVox document and tools [1]. In support of our goal to make the tools and techniques available f...
To achieve widespread acceptance, speech understanding technology needs to be domain independent. Deep understanding, however, appears to require knowledge that is domain specific. Speech understanding technology, therefore, must be partitioned into domain-independent and domain-specific components. Development of domain-independent components could be promoted by creation of semantically annot...
The creation of richly annotated, extendable and reusable corpora of multimodal interactions is an expensive and time-consuming task. Support from tools to create annotations is indispensable. This paper argues that annotation tools should be focused on specific classes of annotation problems to make the annotation process more efficient. The central part of the paper discusses how the properti...
In this work, the creation of a large-scale Arabic to French statistical machine translation system is presented. We introduce all necessary steps, from corpus acquisition and data preprocessing to training and optimizing the system and its eventual evaluation. Since no corpora existed previously, we collected large amounts of data from the web. Arabic word segmentation was crucial to reduce the ove...
Spoken language and interaction lie at the core of human experience. The primary medium of communication is speech, with some estimating the ratio of spoken-written language to be as high as 90%-10% (Cermák, 2009, p. 115). Yet they have remained poor cousins in the building of corpora to date. Not only are spoken corpora much smaller than written corpora (Xiao, 2008), the overwhelming focus in ...