Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish
Abstract
Developing an automatic index system for broadcast news requires appropriate Video and Language Resources (LR) to design all of the system's components. Large, well-defined resources already exist for the most widely used languages (e.g., Informedia), but much work remains for minority languages. The main goal of this work is to design resources in Basque and Spanish for the transcription of broadcast news. These two languages were chosen because both are official in the Basque Autonomous Community and both are used by the Basque public radio and television broadcaster, EITB.
Similar resources
Language Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish
Automatic Indexing of Broadcast News is a developing research area of great recent interest [1]. This paper describes the development steps for designing an automatic index system of broadcast news for both Basque and Spanish. This application requires appropriate Language Resources to design all the components of the system. Nowadays, large and well-defined resources can be found in most wi...
A Spoken Document Retrieval System for TV Broadcast News in Spanish and Basque
This paper presents a spoken document retrieval system (Hearch) that looks like a conventional search tool and retrieves audio/video segments based on the automatic transcription of speech contents. The system consists of a back-end that captures, processes and indexes audio/video resources, and a front-end that allows users to search contents, configure various modules and display performance statist...
The need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
New bilingual speech databases for audio diarization
This paper describes the process of collecting and recording two new bilingual speech databases in Spanish and Basque. They are designed primarily for speaker diarization in two different application domains: broadcast news audio and recorded meetings. First, both databases have been manually segmented. Next, several diarization experiments have been carried out in order to evaluate them. Our b...
Transcrigal: A Bilingual System for Automatic Indexing of Broadcast News
This paper describes a Broadcast News (BN) database called Transcrigal-DB. The news shows are mainly in Galician, although around 11% of the data is in Spanish. This database has been constructed for automatic speech recognition (ASR) purposes. A BN-ASR reference system is also described and evaluated on the test partition of Transcrigal-DB. The reference system has been designed having in...