Multi-speaker meeting audio segmentation

نویسندگان

  • Tin Lay Nwe
  • Minghui Dong
  • Swe Zin Kalayar Khine
  • Haizhou Li
چکیده

This paper presents segmentation of multi-speaker meeting audio into four different classes: local speech, crosstalk, overlapped speech and non-speech sounds. Firstly, Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting according to speaker changing points. Then, harmonicity information is integrated into acoustic features to differentiate speech from non-speech audio segments. We use cascaded subband filters spread in pitch and harmonic frequency scales to characterize the harmonicity information. Finally, total energy and multi-pitch tracking algorithm are used to classify speech segments into local speech, overlapped speech and crosstalk audio types. Experiments conducted on subset of ICSI meeting corpus shown promising results in classifying four audio types.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker segmentation and clustering in meetings

This paper describes the issue of automatic speaker segmentation and clustering for natural, multi-speaker meeting conversations. Two systems were developed and evaluated in the NIST RT-04S Meeting Recognition Evaluation, the Multiple Distant Microphone (MDM) system and the Individual Headset Microphone (IHM) system. The MDM system achieved a speaker diarization performance of 28.17%. This syst...

متن کامل

The Nist 2004 Spring Rich Transcription Evaluation: Two-axis Merging Strategy in the Context of Multiple Distant Microphone Based Meeting Speaker Segmentation

This paper presents the ELISA speaker segmentation approach applied on multiple audio channel meeting recordings in the framework of NIST RT’04s meeting (spring) evaluation campaign. As done for BN data speaker segmentation, the ELISA “meeting” system involves two speaker segmentation systems developed individually by the CLIPS and LIA laboratories. The main originality consists in a “two-axis”...

متن کامل

Towards Robust Speech Acquisition using Sensor Arrays

An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array processing techniques have presented a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. These techniques relied on accurate sp...

متن کامل

Varying Microphone Patterns for Meeting Speech Segmentation Using Spatial Audio Cues

Meetings, common to many business environments, generally involve stationary participants. Thus, participant location information can be used to segment meeting speech recordings into each speaker’s ‘turn’. The authors’ previous work proposed the use of spatial audio cues to represent the speaker locations. This paper studies the validity of using spatial audio cues for meeting speech segmentat...

متن کامل

Robust Unsupervised Speaker Segmentation for Audio Diarization

Audio diarization Reynolds & Carrasquillo (2005) is the process of partitioning an input audio stream into homogeneous regions according to their specific audio sources. These sources can include audio type (speech, music, background noise, ect.), speaker identity and channel characteristics. With the continually increasing number of larges volumes of spoken documents including broadcasts, voic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008