Effects of frequency shifts on perceived naturalness and gender information in speech
Abstract
In natural speech, fundamental frequency (F0) and formant frequencies are moderately correlated across talkers. The present study used a high-quality vocoder to manipulate these properties and determine their contribution to perceived naturalness and voice gender. The stimuli were re-synthesized sentences spoken by two adult males and two adult females. Scale factors were chosen for each sentence and each talker to produce frequency-shifted versions with a specified mean F0 ranging from 60 Hz to 450 Hz in 10 steps, paired with 10 steps in geometric mean formant frequency ranging from 850 Hz to 2500 Hz. Listeners judged frequency-shifted sentences as more natural when F0 and formant frequencies followed the co-variation found in natural voices. Sentences with low F0 and low formant frequencies were perceived as masculine, while sentences with high F0 and high formant frequencies were assigned high ratings of femininity. Sentences with “mismatched” F0 and formant frequencies were assigned ratings near the midpoint of the range, indicating gender ambiguity. Frequency-shifted sentences derived from male talkers received consistently higher ratings of masculinity than those derived from female talkers, while sentences from female talkers received higher ratings of femininity, even when assigned scale factors appropriate for the opposite gender. This indicates that factors other than F0 and mean formant frequencies contribute to perceived gender.
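The scale-factor arithmetic described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract gives the endpoints and the number of steps but not the step spacing, so logarithmic spacing is assumed here, the full crossing of F0 and formant steps is assumed, and the measured values for the example sentence are hypothetical.

```python
import math


def log_spaced_targets(low_hz, high_hz, steps):
    """Return `steps` frequencies from low_hz to high_hz.

    Equal spacing on a log scale is an ASSUMPTION; the abstract
    states only the endpoints and the number of steps.
    """
    ratio = (high_hz / low_hz) ** (1.0 / (steps - 1))
    return [low_hz * ratio ** i for i in range(steps)]


def geometric_mean(values):
    """Geometric mean of a list of positive frequencies (Hz)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def scale_factor(target_hz, measured_hz):
    """Multiplicative factor that moves a measured mean onto a target."""
    return target_hz / measured_hz


# Target grids taken from the ranges given in the abstract.
f0_targets = log_spaced_targets(60.0, 450.0, 10)
formant_targets = log_spaced_targets(850.0, 2500.0, 10)

# HYPOTHETICAL measurements for one sentence from one talker.
measured_mean_f0 = 120.0
measured_formants = [500.0, 1500.0, 2500.0]
measured_gm_formant = geometric_mean(measured_formants)

# One (F0 factor, formant factor) pair per cell, assuming a full crossing.
grid = [
    (scale_factor(f0, measured_mean_f0),
     scale_factor(fm, measured_gm_formant))
    for f0 in f0_targets
    for fm in formant_targets
]

for f0_factor, formant_factor in grid[:3]:
    print(f"F0 x{f0_factor:.3f}, formants x{formant_factor:.3f}")
```

Because each factor is computed from that sentence's own measured means, every talker's sentences land on the same target values, which is what allows scale factors "appropriate for the opposite gender" to be applied to any talker.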
Similar resources
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants computers the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
A Study on the Frequency of Occurrence and Usage of Anglicism in Speech of Young Iranian Telegram Users
This paper investigates the frequency of occurrence of English borrowed words with respect to three variables: age, gender, and educational status. To do so, a corpus comprising the extant files of participants in a target group on the Telegram social network was selected and analyzed. The quantitative study of the data shows that the occurrence of the loanwords is much more frequent in the speech ...
Phonological Reduction in Swedish
In this paper, the importance of pronunciation variation modelling is discussed. As a first step in developing a model of Swedish pronunciation variation due to speaking style and speech rate, a tentative reduction rule system has been developed. An assessment experiment testing the impact of phonological reduction, as defined by this system, on the perceived naturalness of speech synthesis was...
The Effects of Culture and Gender on the Recognition of Emotional Speech: Evidence from Persian Speakers Living in a Collectivist Society
This paper reports on a behavioral study that explores the role of culture and gender in the recognition of emotional speech in an under-investigated cultural context (a collectivist society, i.e., Iran). Participants were asked to recognize the emotional prosody of a set of validated emotional vocal portrayals (including the five basic emotions). Findings of the experiment were then comp...
On-line experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference
Three experiments are reported that use new experimental methods for the evaluation of text-to-speech (TTS) synthesis from the user’s perspective. Experiment 1, using sentence stimuli, and Experiment 2, using discrete “call centre” word stimuli, investigated the effect of voice gender and signal quality on the intelligibility of three concatenative TTS synthesis systems. Accuracy and search t...