Informatized Caption Enhancement Based on IBM Watson API and Speaker Pronunciation Time-DB
Authors
Abstract
This paper aims to reduce the errors that existing informatized captions contain in noisy environments by using additional caption information. The IBM Watson API can automatically generate an informatized caption, including timing information and speaker ID information, from voice input. When the voice signal contains noise, however, recognition quality degrades and the informatized caption contains errors. Such errors are especially common in movies, where background music and special sound effects are present. To reduce caption errors, the original caption text and the voice information are entered at the same time, and the informatized caption that the IBM Watson API produces from the voice information is compared with the original text to automatically detect and correct the erroneous parts. In this process, words are converted into informatized-caption form based on a database containing the average pronunciation time of each word for each speaker. In this way, more precise informatized captions can be generated based on the IBM Watson API.
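The compare-and-correct step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the input format (a list of `(word, start, end, speaker)` tuples standing in for recognizer output), the `PRONUNCIATION_DB` contents, the `DEFAULT_DURATION` fallback, and the function name `correct_caption` are all assumptions made for illustration, and the alignment uses Python's standard `difflib.SequenceMatcher` rather than whatever method the paper uses.

```python
from difflib import SequenceMatcher

# Hypothetical per-speaker database: average pronunciation time per word (seconds).
PRONUNCIATION_DB = {
    0: {"hello": 0.4, "there": 0.3, "world": 0.5},
}
DEFAULT_DURATION = 0.35  # assumed fallback for words missing from the database


def correct_caption(recognized, reference_words, db=PRONUNCIATION_DB):
    """Align recognized words with the original text and repair mismatches.

    recognized: list of (word, start, end, speaker) tuples (assumed format).
    reference_words: list of words from the original caption text.
    Returns a corrected list of (word, start, end, speaker) tuples.
    """
    rec = [word.lower() for word, _, _, _ in recognized]
    ref = [word.lower() for word in reference_words]
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)

    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Recognition agrees with the original text: keep timing as-is.
            corrected.extend(recognized[i1:i2])
            continue
        if j1 == j2:
            # Extra recognized words with no counterpart in the text: drop them.
            continue
        # Choose a start time and speaker for the words taken from the text.
        if i1 < i2:
            start, speaker = recognized[i1][1], recognized[i1][3]
        elif corrected:
            start, speaker = corrected[-1][2], corrected[-1][3]
        else:
            start = 0.0
            speaker = recognized[0][3] if recognized else 0
        # Rebuild timing from the speaker's average pronunciation times.
        for word in reference_words[j1:j2]:
            duration = db.get(speaker, {}).get(word.lower(), DEFAULT_DURATION)
            corrected.append((word, start, start + duration, speaker))
            start += duration
    return corrected
```

Calling `correct_caption(recognized, original_text.split())` returns the caption with misrecognized words replaced by the original wording, each carrying a start and end time estimated from the speaker's database entries. A word-level alignment was chosen here because it keeps the recognizer's timing wherever the two texts already agree.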
Similar resources
CAPT and its Effect on English Language Pronunciation Enhancement: Evidence from Bilinguals and Monolinguals
Nowadays there are several challenges for English teachers as well as researchers regarding how to teach foreign language pronunciation more effectively. The current study aimed to explore the effect of computer-assisted pronunciation teaching (CAPT) on Persian monolinguals and Turkmen-Persian and also Baloch-Persian bilinguals’ pronunciation considering production and perception. A sample of...
Game-based Teaching of Stress Placement on Multi-syllabic English Words
Accurate pronunciation is an important component of language ability and the main outward linguistic sign of whether someone is a native speaker of a language or not. An area of particular difficulty for Persian-speaking learners of English, which may cause 'foreign accent' or misunderstanding in speaking, is placement of stress on multi-syllable words. Game-based pronunciation teaching can be ...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal degrades significantly in the presence of environmental noise, which leads to imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel speech enhancement of signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
Articulatory feature-based conditional pronunciation modeling for speaker verification
Because of differences in educational background, accent, etc., different people have their own unique ways of pronunciation. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed technique aims to establish a link between articulatory properties (e.g., manners and places of a...
Improving pronunciation modeling for non-native speech recognition
In this paper, three different approaches to pronunciation modeling are investigated. Two existing pronunciation modeling approaches, namely the pronunciation dictionary and the n-best rescoring approach, are modified to work with a small amount of non-native speech. We also propose a speaker clustering approach, which is capable of grouping the speakers based on their pronunciation habits. Given some s...