Nasal Speech Sounds Detection Using Connectionist Temporal Classification
نویسندگان
چکیده
Phone attributes, known also as distinctive or phonological features, belong to important classification of the speech sounds used in automatic speech processing. Training of conventional phone attribute detectors (classifiers), either based on acoustic measurements or deep learning approaches, requires decent phone boundary segmentation. This paper proposes a solution to train a phone attribute detector without phone alignment using an end-to-end phone attribute modeling based on the connectionist temporal classification. Experiments, performed for the nasal phone attribute on the LibriSpeech database, confirm that the proposed system outperforms conventional deep neural network detector, trained even on the same training data. Further improvements are observed with more training data. Conventional complex system that consists of feature extraction, phone forcealignment and deep neural network training is replaced by a more simpler Python package based on PyTorch, released as open-source.
منابع مشابه
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...
متن کاملDetermining the effective features in classification of heart sounds using trained intelligent network and genetic algorithm
Heart diseases are among the most important causes of mortality in the world, especially in industrial countries. Using heart sounds and the features extracted from them are among the non-aggressive diagnosis and prognosis methods for heart diseases. In this study, the time-scale, Cepstral, frequency, temporal and turbulence features are saved and extracted from the heart sounds, and then they ...
متن کاملAdvancing Connectionist Temporal Classification With Attention Modeling
In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network r...
متن کاملAcoustic parameters for automatic detection of nasal manner
Of all the sounds in any language, nasals are the only class of sounds with dominant speech output from the nasal cavity as opposed to the oral cavity. This gives nasals some special properties including presence of zeros in the spectrum, concentration of energy at lower frequencies, higher formant density, higher losses, and stability. In this paper we propose acoustic correlates for the lingu...
متن کاملSpoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017