Search results for: speech learning model
Number of results: 2641683
Major speech production models from speech science literature and a number of popular statistical “generative” models of speech used in speech technology are surveyed. Strengths and weaknesses of these two styles of speech models are analyzed, pointing to the need to integrate the respective strengths while eliminating the respective weaknesses. As an example, a statistical task-dynamic model o...
In this paper, we propose a semi-supervised learning of acoustically driven phrase breaks and its usefulness for text-to-speech systems. In this work, we derive a set of initial hypotheses of phrase breaks in a speech signal using pause as an acoustic cue. As these initial estimates are obtained based on knowledge of speech production and speech signal processing, one could treat the hypothesized p...
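The idea of deriving initial phrase-break hypotheses from pauses can be illustrated with a short sketch. It assumes a simple short-time-energy silence detector. The function name `hypothesize_phrase_breaks`, the 10 ms frame size, the energy threshold, and the 150 ms minimum pause are all hypothetical choices for illustration, not the authors' parameters or method. Only pauses strictly inside the utterance are kept, since leading and trailing silence does not separate phrases.

```python
def hypothesize_phrase_breaks(samples, sample_rate, frame_ms=10.0,
                              energy_threshold=1e-4, min_pause_ms=150.0):
    """Return centre times (seconds) of internal pauses long enough to be
    phrase-break candidates. All thresholds are illustrative assumptions."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    # Mark each non-overlapping frame as silent if its mean-square energy is low.
    silent = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        silent.append(energy < energy_threshold)
    min_frames = max(1, int(round(min_pause_ms / frame_ms)))
    breaks, start = [], None
    # A False sentinel closes any silent run that reaches the end of the signal.
    for i, is_silent in enumerate(silent + [False]):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            # Keep only long pauses that do not touch the start or end of the signal.
            if i - start >= min_frames and start > 0 and i < n_frames:
                breaks.append((start + i) / 2 * frame_len / sample_rate)
            start = None
    return breaks
```

In a semi-supervised setup, candidates like these would only seed the process; a learned model would then refine or reject them.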
Arabic speech recognition suffers from the scarcity of properly labeled data. In this project, we introduce a pipeline that performs semi-supervised segmentation of audio then, after hand-labeling a small dataset, feeds labeled segments to a supervised learning framework to select, through many rounds of hyperparameter optimization, an ensemble of models to infer labels for a larger dataset; usi...
A Mobile Virtual Assistant (MVA) is a communication agent that recognizes and understands free speech, and performs actions such as retrieving information and completing transactions. One essential characteristic of MVAs is their ability to learn and adapt without supervision. This paper describes our ongoing research in developing more intelligent MVAs that recognize and understand very large ...
This paper presents our investigations into emotional state categorization from speech signals using a psychologically inspired computational model, compared against human performance under the same experimental setup. Based on psychological studies, we propose a multistage categorization strategy which allows establishing an automatic categorization model flexibly for a given emotional speech categorizatio...
Recently, Deep Neural Network (DNN) based bottleneck features have proved to be very effective in i-vector based speaker recognition. However, bottleneck feature extraction is usually fully optimized for the speech recognition task rather than the speaker recognition task. In this paper, we explore whether DNNs suboptimal for speech recognition can provide better bottleneck features for speaker recognition. We experimen...
Research in automatic Part of Speech (POS) tagging has been dominated by Markov Model (MM) taggers. Brill [1, 3, 6] has recently described a transformation-based system with comparable accuracy and simpler algorithms and representation than MM taggers. We present a set-based formal model of natural language ambiguity and semantic tagging that forms a basis for the generalisation of the transf...
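To make the transformation-based approach concrete, here is a minimal sketch of how an ordered list of learned rules rewrites an initial tagging. It covers only one rule template ("change tag A to B when the previous tag is C") and reads each rule's context from the tags as they stood before that rule's pass. The `Rule` type and `apply_rules` function are hypothetical names. This is a sketch of Brill-style rule application in general, not the set-based generalisation the abstract proposes.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Change `from_tag` to `to_tag` when the preceding word carries `prev_tag`."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply transformation rules, in learned order, to an initial tagging
    (e.g. each word's most frequent tag)."""
    tags = list(tags)
    for rule in rules:
        # Context is read from a snapshot, so one rule's changes do not
        # trigger that same rule again further along the sentence.
        snapshot = list(tags)
        for i in range(1, len(tags)):
            if snapshot[i] == rule.from_tag and snapshot[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags
```

For example, in "to race" an initial tagging `["TO", "NN"]` becomes `["TO", "VB"]` under the rule `Rule("NN", "VB", "TO")`. A Markov model tagger would instead need transition and emission probabilities for this decision.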
Error-corrective post-processing (ECPP) has great potential to reduce speech recognition errors beyond those obtained by improving the speech model. ECPP approaches aim to learn error-corrective rules to directly reduce speech recognition errors. This paper presents our investigation into one such approach, incremental learning of maximum a posteriori (MAP) context-dependent edit operations. Limit...
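Learning edit operations presupposes some way of extracting them from recognizer output. One common way is a minimum-edit-distance alignment between a hypothesis and its reference transcript. The sketch below shows only that alignment step. The function name `edit_operations` and the `(op, hyp_word, ref_word)` tuple format are hypothetical. It does not implement the MAP estimation or the context-dependent modelling the abstract describes.

```python
def edit_operations(hyp, ref):
    """Align a recognized word sequence `hyp` to reference `ref` by
    Levenshtein distance and return (op, hyp_word, ref_word) tuples,
    where op is "match", "sub", "del", or "ins"."""
    n, m = len(hyp), len(ref)
    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete a hyp word
                             dist[i][j - 1] + 1,         # insert a ref word
                             dist[i - 1][j - 1] + cost)  # match or substitute
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = hyp[i - 1] == ref[j - 1]
            if dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
                ops.append(("match" if same else "sub", hyp[i - 1], ref[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", hyp[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, ref[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

Counting the non-match tuples over many hypothesis/reference pairs, optionally keyed on neighbouring words, would give the raw statistics from which corrective rules could then be estimated.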
Modeling concepts using supervised or unsupervised machine learning approaches is becoming more and more important for video semantic indexing, retrieval and filtering applications. Naturally, videos include multimodal audio, speech, visual and text data, which are combined to infer the overall semantic concepts therein. However, in the literature, most research was conducted within...