Decision Combination in Speech Metadata Extraction
Abstract
Speech metadata extraction can both improve speech recognition and enable novel Interactive Voice Response applications. Unlike previous research, which concentrates on frame-level signal processing and pattern classification, this paper systematically studies the behavior of decision combination at the utterance level. We analyze the asymptotic characteristics and the factors affecting frame-level classification. In addition, we introduce new methods to combine frame-level decisions more accurately and efficiently, including phoneme/power-based weighting and smart sampling. Experimental results in gender classification are presented.
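To make the idea of utterance-level decision combination concrete, here is a minimal Python sketch. It is not the paper's method: the function names, the score convention (a positive frame score favors class 1), the tie-breaking rule, and the independence assumption behind `majority_vote_accuracy` are all assumptions chosen for illustration. It shows a plain majority vote over frame decisions, a power-weighted combination of frame scores, and the idealized accuracy of a majority vote as the number of frames grows.

```python
import math
from typing import Sequence


def majority_vote(frame_labels: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the frames voted 1, else 0."""
    return 1 if 2 * sum(frame_labels) > len(frame_labels) else 0


def power_weighted_decision(frame_scores: Sequence[float],
                            frame_powers: Sequence[float]) -> int:
    """Combine per-frame scores using frame power as the weight.

    Assumed convention: a positive score favors class 1, a negative
    score favors class 0. High-power frames count for more.
    """
    if len(frame_scores) != len(frame_powers):
        raise ValueError("scores and powers must have the same length")
    total_power = sum(frame_powers)
    if total_power <= 0:
        raise ValueError("total frame power must be positive")
    combined = sum(s * p for s, p in zip(frame_scores, frame_powers)) / total_power
    return 1 if combined > 0 else 0


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n frames is correct.

    Idealized: assumes each frame is correct independently with
    probability p. Real speech frames are correlated, so this is
    only an upper-level intuition, not a prediction.
    """
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    # One loud frame against class 1 outweighs one quiet frame for it.
    print(power_weighted_decision([-1.0, 2.0], [10.0, 1.0]))
    # Under the independence assumption, accuracy rises with frame count.
    for n in (1, 11, 101):
        print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under the independence assumption, any frame-level accuracy above 0.5 pushes the combined accuracy toward 1 as frames are added. Weighting by power is one simple way to let more reliable frames dominate the combined decision.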
Similar resources
Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space
Designing new feature extraction methods for the speech signal and combining the information they yield is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
Archival standards in open-access software and a proposal of suitable software for domestic archival centers
The purpose of this study is to examine descriptive metadata standards in open source archival software, in order to determine the most appropriate descriptive metadata standard(s) and the encoder software support for these standards. The study takes a combined approach, using library research, the Delphi method, and a descriptive survey. Data are gathered on fiches in the library study; in the Delphi method ...
Examining the Contributions of Automatic Speech Transcriptions and Metadata Sources for Searching Spontaneous Conversational Speech
Searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what can be expected from searching such transcriptions and different sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech te...
Reference metadata extraction using a hierarchical knowledge representation framework
The integration of bibliographical information on scholarly publications available on the Internet is an important task in the academic community. Accurate reference metadata extraction from such publications is essential for the integration of metadata from heterogeneous reference sources. In this paper, we propose a hierarchical template-based reference metadata extraction method for scholarl...
DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification
In this study we deal with the three sub-challenges of the Interspeech ComParE Challenge 2017, where the goal is to identify child-directed speech, speakers having a cold, and different types of snoring sounds. For the first two sub-challenges we propose a simple, two-step feature extraction and classification scheme: first we perform frame-level classification via Deep Neural Networks (DNNs), ...