Search results for: speech problem

Number of results: 984,447

1996
Fu-Chiang Chou, Chiu-yu Tseng, Lin-Shan Lee

A key problem for today's speech synthesis technology is to automatically generate an appropriate hierarchical prosodic structure for the input text and incorporate it into the synthesized speech [1][2]. This paper presents a method for this problem in Mandarin Chinese. The method uses a speech database to train a statistical model that generates the prosodic structure and determines prosodic ...
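
The snippet above only hints at the approach, so the following is a minimal, hypothetical sketch of the general idea (a statistical model estimated from a labelled corpus that predicts prosodic boundaries for new text), not the paper's actual model; the toy corpus, the POS-bigram feature, and the `break_probability` helper are all placeholders.

```python
from collections import Counter, defaultdict

# Placeholder training data: (POS bigram at a word boundary, 1 if a prosodic
# break was labelled there in the speech database, else 0).
training = [
    (("N", "V"), 0),
    (("V", "N"), 0),
    (("N", "N"), 1),
    (("N", "N"), 1),
    (("V", "P"), 0),
]

counts = defaultdict(Counter)
for context, brk in training:
    counts[context][brk] += 1

def break_probability(context):
    """P(prosodic break | POS bigram), estimated from corpus counts."""
    c = counts[context]
    total = sum(c.values())
    return c[1] / total if total else 0.5  # back off to 0.5 for unseen contexts

print(break_probability(("N", "N")))  # 1.0 in this toy corpus
```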

2015
Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, Narayan Bhamidipati

We address the problem of hate speech detection in online user comments. Hate speech, defined as an “abusive speech targeting specific group characteristics, such as ethnicity, religion, or gender”, is an important problem plaguing websites that allow users to leave feedback, having a negative impact on their online business and overall user experience. We propose to learn distributed low-dimen...
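
The truncated sentence points at the core recipe (learning distributed low-dimensional representations of comments and feeding them to a classifier), so here is a minimal sketch of that general recipe, assuming a paragraph-vector style embedder (gensim Doc2Vec) and a logistic-regression classifier; it is not the authors' exact pipeline, and the toy comments, labels, and hyperparameters are placeholders.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# Placeholder corpus of user comments and binary labels (1 = hate speech).
comments = ["offensive comment about a group", "great article, thanks", "nice write-up"]
labels = [1, 0, 0]

# Learn a low-dimensional distributed representation for each comment.
tagged = [TaggedDocument(words=c.lower().split(), tags=[i]) for i, c in enumerate(comments)]
embedder = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

# Train a simple linear classifier on the comment embeddings.
X = [embedder.dv[i] for i in range(len(comments))]
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score a new comment by inferring its embedding first.
new_vec = embedder.infer_vector("some unseen comment".split())
print(clf.predict_proba([new_vec]))
```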

2016
Anurag Kumar, Dinei A. F. Florêncio

In this paper we consider the problem of speech enhancement in real-world-like conditions where multiple noises can simultaneously corrupt speech. Most of the current literature on speech enhancement focuses primarily on the presence of a single noise in corrupted speech, which is far from real-world environments. Specifically, we deal with improving speech quality in an office environment where multiple s...

1998
Juergen Luettin

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal mo...
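
As a rough illustration of what an appearance-based model learned from example images can look like, here is a generic PCA sketch (mean appearance plus principal modes of variation), not Luettin's actual system; the random training images and the `encode`/`reconstruct` helpers are placeholders.

```python
import numpy as np

# Placeholder training set: lip-region images flattened to vectors.
rng = np.random.default_rng(0)
examples = rng.random((100, 32 * 32))

# Learn the model: mean appearance plus the strongest modes of variation (PCA via SVD).
mean = examples.mean(axis=0)
centered = examples - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
modes = vt[:10]                       # keep the 10 leading modes

def encode(image_vec):
    """Project a lip-region image onto the learned modes (visual speech features)."""
    return modes @ (image_vec - mean)

def reconstruct(params):
    """Synthesize the appearance described by a parameter vector."""
    return mean + modes.T @ params

features = encode(examples[0])
print(features.shape)                 # (10,): a 10-dimensional visual feature vector
```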

1994
Calvin Olano

DOD support for ARPA/HLT speech research stems from the belief that large vocabulary continuous speech recognition and speech understanding will have advantages in portability, and in the variety of applications sustainable from a single, basic speech system. With some imagination one can envision applications in such remotely related areas as speaker identification and language identification....

1998
Juergen Luettin, Stéphane Dupont

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audio-visual speech recognition applications. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal mode...

Journal: IEEE Trans. Speech and Audio Processing, 2001
André J. W. van der Kouwe, DeLiang Wang, Guy J. Brown

A fundamental problem in auditory and speech processing is the segregation of speech from concurrent sounds. This problem has been a focus of study in computational auditory scene analysis (CASA), and it has also been recently investigated from the perspective of blind source separation. Using a standard corpus of voiced speech mixed with interfering sounds, we report a comparison between CASA ...

Journal: JCP, 2008
Mahmoud A. Ismail

The problem of estimating Vocal Tract Area Functions (VTAF) from speech signals arises in many applications. However, the existing methods for solving this problem were designed for a small subset of speech sounds. This research presents a novel method for estimating the VTAF from the speech signal based on Particle Swarm Optimization (PSO). The method is general and can be applied to large classes ...
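
The snippet names the optimizer but not its formulation, so here is a generic PSO sketch, assuming particles are candidate area-function vectors and the fitness is some mismatch between a tube-model spectrum and the observed speech spectrum; `spectral_mismatch` is a hypothetical placeholder, not the paper's objective.

```python
import numpy as np

def spectral_mismatch(area):
    # Placeholder objective: a real implementation would synthesize the spectrum
    # of the tube model defined by `area` and compare it to the spectrum of the
    # observed speech frame. Here a fixed target profile stands in for that.
    target = np.linspace(1.0, 3.0, area.size)
    return np.sum((area - target) ** 2)

def pso(objective, dim=20, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    pos = rng.uniform(0.1, 5.0, (n_particles, dim))   # candidate area functions (cm^2)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.05, 10.0)          # keep areas physically plausible
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

estimated_area = pso(spectral_mismatch)
print(estimated_area[:5])
```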

2001
P. Beyerlein, X. Aubert, M. Harris, C. Meyer, H. Schramm

Automatic speech recognition of real-life conversational speech is a precondition for building natural, human-centered man-machine interfaces. Although we can extract speech utterances from real-life broadcast news audio streams and transcribe them with an overall word accuracy of 83%, we are still faced with the problem of transcribing true conversational speech in real-life (i.e. bad) backgroun...

[Chart: number of search results per year]