Sentence boundary detection in Arabic speech
Abstract
This paper presents an automatic system to detect sentence boundaries in speech recognition transcripts. Two systems were developed that use independent sources of information. One is a linguistic system that uses linguistic features in a statistical language model while the other is an acoustic system that uses prosodic features in a feed-forward neural network model. A third system was developed that combines the scores from the acoustic and the linguistic systems in a Maximum-Likelihood framework. All systems outlined in this paper are essentially language-independent but all our experiments were conducted on the Arabic Broadcast News speech recognition transcripts. Our experiments show that while the acoustic system outperforms the linguistic system, the combined system achieves the best performance at detecting sentence boundaries.
Similar resources
Dependency structure analysis and sentence boundary detection in spontaneous Japanese
This paper addresses automatic detection of dependencies between Japanese phrasal units called bunsetsus, and sentence boundaries in a spontaneous speech corpus. In spontaneous speech, the biggest problem with dependency structure analysis is that sentence boundaries are ambiguous. In this paper, we propose two methods for improving the accuracy of sentence boundary detection in spontaneous Jap...
Improving Automatic Sentence Boundary Detection with Confusion Networks
We extend existing methods for automatic sentence boundary detection by leveraging multiple recognizer hypotheses in order to provide robustness to speech recognition errors. For each hypothesized word sequence, an HMM is used to estimate the posterior probability of a sentence boundary at each word boundary. The hypotheses are combined using confusion networks to determine the overall most lik...
Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts
This paper is motivated by the automation of neuropsychological tests involving discourse analysis in the retellings of narratives by patients with potential cognitive impairment. In this scenario the task of sentence boundary detection in speech transcripts is important as discourse analysis involves the application of Natural Language Processing tools, such as taggers and parsers, which depen...
Using Conditional Random Fields for Sentence Boundary Detection in Speech
Sentence boundary detection in speech is important for enriching speech recognition output, making it easier for humans to read and downstream modules to process. In previous work, we have developed hidden Markov model (HMM) and maximum entropy (Maxent) classifiers that integrate textual and prosodic knowledge sources for detecting sentence boundaries. In this paper, we evaluate the use of a co...
A Segment Based Approach for Prosodic Boundary
Successful detection of the position of prosodic phrase boundaries is useful for the rescoring of the sentence hypotheses in a speech recognition system. In addition, knowledge about prosodic boundaries may be used in a speech understanding system for disambiguation. In this paper, a segment oriented approach to prosodic boundary detection is presented. In contrast to word oriented methods (e.g...