ArmSpeech: Armenian Spoken Language Corpus
Abstract
The Armenian language is an independent branch of the Indo-European language family and the official language of the Republic of Armenia and the Republic of Artsakh. According to various reliable sources, on average 3 million people in Armenia and 10-12 million people in the Armenian Diaspora use Armenian as their native language. The largest Armenian communities outside Armenia are in the United States of America, Canada, the Russian Federation, the Islamic Republic of Iran, the French Republic, the Syrian Arab Republic, and the Lebanese Republic. This paper presents the ArmSpeech Armenian speech corpus. ArmSpeech is a collection of annotated speech intended for research and development of natural language processing (NLP) technologies. The corpus is designed for speech-to-text and text-to-speech purposes but can also be used in other domains (e.g. language identification). The corpus contains 6206 high-quality audio samples: 11 hours 46 minutes 26 seconds (11.77 hours) of speech from multiple speakers of any age, gender, and accent. According to the results, this is the most extensive Armenian speech corpus in the public domain for speech recognition, speech synthesis, and spoken language identification systems.
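The abstract gives the total duration both as 11 hours 46 minutes 26 seconds and as 11.77 hours, alongside a count of 6206 samples. The short sketch below checks that the two duration figures agree and derives a rough mean clip length. The mean is an illustrative calculation from the quoted totals, not a statistic reported by the authors, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
HOURS, MINUTES, SECONDS = 11, 46, 26
NUM_SAMPLES = 6206


def total_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


def decimal_hours(hours: int, minutes: int, seconds: int) -> float:
    """Convert an h/m/s duration to fractional hours."""
    return total_seconds(hours, minutes, seconds) / 3600


total = total_seconds(HOURS, MINUTES, SECONDS)
hours = decimal_hours(HOURS, MINUTES, SECONDS)

# Derived for illustration only: the abstract does not report a mean clip length.
mean_clip = total / NUM_SAMPLES

print(f"total duration: {total} s = {hours:.2f} h")
print(f"mean clip length (derived): {mean_clip:.2f} s")
```

Under these figures the decimal form rounds to the 11.77 hours stated in the abstract, and the derived mean comes out at a little under 7 seconds per sample.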
Similar Resources
Corpus of Spoken Slovak Language
In this paper a short description of activities towards building a general speech corpus of spoken Slovak language is given. Different rôles and specific features of text corpus and speech corpus are investigated as well as the most frequent mistakes and misunderstandings of the concept of a speech corpus are mentioned. The concept of a big representative corpus of spoken language and its desir...
Spoken language corpus for machine interpretation research
This paper describes a database consisting of speech and language, which we are currently constructing for the purpose of the research on machine interpretation. The database contains bilingual data of lectures and dialogues. We have collected the speech of about 72 hours in total and transcribed it into the text manually. We have investigated the database in order to acquire empirical knowledg...
Spoken language identification using the speechdat corpus
Current language identification systems vary significantly in their complexity. The systems that use higher level linguistic information have the best performance. Nevertheless, that information is hard to collect for each new language. The system presented in this paper is easily extendable to new languages because it uses very little linguistic information. In fact, the presented system needs...
The ATIS Spoken Language Systems Pilot Corpus
Speech research has made tremendous progress in the past using the following paradigm: define the research problem, collect a corpus to objectively measure progress, and solve the research problem. Natural language research, on the other hand, has typically progressed without the benefit of any corpus of data with which to test research hypotheses. We describe the Air Travel Information System (A...
A corpus-centered approach to spoken language translation
This paper reports the latest performance of components and features of a project named Corpus-Centered Computation (C3), which targets a translation technology suitable for spoken language translation. C3 places corpora at the center of the technology. Translation knowledge is extracted from corpora by both EBMT and SMT methods, translation quality is gauged by referring to corpora, the best t...
Journal
Journal title: International journal of scientific advances
Year: 2022
ISSN: 2708-7972
DOI: https://doi.org/10.51542/ijscia.v3i3.25