Transcription of Multi-variety Portuguese Media Contents
Authors
Abstract
Automatic transcription of media content not only makes it possible to generate subtitles, but also enables search and retrieval over multimedia streams. One of the most important challenges that transcription systems face is speaker accent variability. In this work we study the importance of accent variability for three broad varieties of Portuguese: African Portuguese, Brazilian Portuguese and European Portuguese. We then propose a multi-variety transcription system in which variety identification is followed by variety-dependent transcription systems.
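The two-stage design described in the abstract (identify the variety, then hand the audio to the matching transcription system) can be sketched as a simple dispatch step. This is only an illustrative sketch, not the authors' implementation: the labels `AP`, `BP`, `EP`, the function `transcribe_multi_variety`, and the toy identifier and transcribers are all hypothetical stand-ins for trained models.

```python
from typing import Callable, Dict, Tuple

# Hypothetical labels for the three varieties named in the abstract.
AFRICAN, BRAZILIAN, EUROPEAN = "AP", "BP", "EP"

Identifier = Callable[[bytes], str]
Transcriber = Callable[[bytes], str]


def transcribe_multi_variety(
    audio: bytes,
    identify_variety: Identifier,
    transcribers: Dict[str, Transcriber],
) -> Tuple[str, str]:
    """Identify the variety first, then run the matching transcriber."""
    variety = identify_variety(audio)
    if variety not in transcribers:
        raise ValueError(f"no transcriber registered for variety {variety!r}")
    return variety, transcribers[variety](audio)


# Stand-in components so the sketch runs; real ones would be trained models.
def toy_identifier(audio: bytes) -> str:
    # Placeholder rule (assumption): route on the first byte, not on acoustics.
    return (AFRICAN, BRAZILIAN, EUROPEAN)[audio[0] % 3]


toy_transcribers: Dict[str, Transcriber] = {
    label: (lambda audio, label=label: f"<{label} transcript, {len(audio)} bytes>")
    for label in (AFRICAN, BRAZILIAN, EUROPEAN)
}

print(transcribe_multi_variety(b"\x01\x02\x03", toy_identifier, toy_transcribers))
```

Keeping the identifier and the per-variety transcribers as separate callables mirrors the modular structure the abstract describes: either stage could be replaced without touching the other.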
Similar resources
The Presence and Influence of English in the Portuguese Financial Media
As the lingua franca of the 21st century, English has become the main language for intercultural communication for those wanting to embrace globalization. In Portugal, it is the second language of most public and private domains influencing its culture and discourses. Language contact situations transform languages by the incorporations they make from other languages and Portugal has...
An Updated Portrait of the Portuguese Web
This study presents an updated characterization of the Portuguese Web derived from a crawl of 48 million contents belonging to all media types (2.5 TB of data), performed in March, 2008. The resulting data was analyzed to characterize contents, sites and domains. This study was performed within the scope of the Portuguese Web Archive.
Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription
This paper presents a Variety IDentification (VID) approach and its application to broadcast news transcription for Portuguese. The phonotactic VID system, based on Phone Recognition and Language Modelling, focuses on a single tokenizer that combines distinctive knowledge about differences between the target varieties. This knowledge is introduced into a Multi-Layer Perceptron phone recognizer ...
Pronunciation Rules in Portuguese Regional Speech (PORT REG) for Coarticulation Process
This paper describes one aspect of an ongoing work to incorporate pronunciation variability in the Portuguese (PORT) speech system. This work focuses on the linguistic rules to improve the grapheme-(multi)phone transcription algorithm that will be implemented. Portuguese ‘Beira Interior’ regional speech (PORT-BI REG) is considered to be in the realm of coarticulation (post-lexical) phenomena. A...
Automatic Speech Recognition and Identification of African Portuguese
This document deals with speech recognition of different Portuguese varieties and summarizes results from the author’s diploma thesis [9]. The performance of a hybrid large vocabulary continuous speech recognizer, which combines multi-layer perceptrons and Hidden Markov Models, degrades heavily in the presence of African Portuguese varieties in broadcast news. Variety-specific acoustic and languag...