Automatic linguistic segmentation of conversational speech

نویسندگان

  • Andreas Stolcke
  • Elizabeth Shriberg
چکیده

As speech recognition moves toward more unconstrained domains such as conversational speech, we encounter a need to be able to segment (or resegment) waveforms and recognizer output into linguistically meaningful units, such a sentences. Toward this end, we present a simple automatic segmenter of transcripts based on N-gram language modeling. We also study the relevance of several word-level features for segmentation performance. Using only word-level information, we achieve 85% recall and 70% precision on linguistic boundary detection.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ethnomethodology and Conversational Analysis

In a speech community, people utilize their communicative competence which they have acquired from their society as part of their distinctive sociolinguistic identity. They negotiate and share meanings, because they have commonsense knowledge about the world, and have universal practical reasoning. Their commonsense knowledge is embodied in their language. Thus, not only does social life depend...

متن کامل

"blind" Speech Segmentation: Automatic Segmentation of Speech without Linguistic Knowledge

A new automatic speech segmentation procedure, called the \Blind" speech segmentation, is presented. This procedure allows a speech sample to be segmented into sub-word units without the knowledge of any linguistic information (such as, orthographic or phonetic transcription). Hence, this procedure involves nding the optimal number of sub-word segments in the given speech sample, before locatin...

متن کامل

Participant Subjectivity and Involvement as a Basis for Discourse Segmentation

We propose a framework for analyzing episodic conversational activities in terms of expressed relationships between the participants and utterance content. We test the hypothesis that linguistic features which express such properties, e.g. tense, aspect, and person deixis, are a useful basis for automatic intentional discourse segmentation. We present a novel algorithm and test our hypothesis o...

متن کامل

Incorporating linguistic knowledge into automatic dialect identification of Spanish

Automatic dialect identification, like automatic language identification , has often been approached through the use of phonetic frequencies and phonetic sequence modeling. While such statistical systems perform well on language identification problems, they are less adept at the more difficult problem of automatic dialect identification, particularly on short segments of speech. In this paper ...

متن کامل

A prosodically labeled database of spontaneous speech

This paper describes a prosodically labeled database of conversational speech, representing a subset of the Switchboard and Callhome corpora. The prosodic transcription system is a simplification of the ToBI system aimed at phenomena that would be most useful for automatic transcription and linguistic analysis of conversational speech. The transcription method and a distributional analysis of t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996