Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval
Authors
Abstract
This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation indicates a large discrepancy in recognition performance: syllable accuracy drops from 59% to 39% (with a corresponding drop in the reliability of audio indexing) as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. We therefore present several automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information, using a fuzzy c-means algorithm; (ii) extraction based only on audio information, using Gaussian Mixture Models; and (iii) a fusion strategy that combines video- and audio-based extraction. This paper presents the performance of the various extraction techniques and the resulting retrieval performance in a known-item spoken document retrieval task.
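The abstract names fuzzy c-means as the video-only extraction method and a fusion of video and audio decisions, but gives no further details. The sketch below is a minimal pure-Python version of standard fuzzy c-means, assuming each video frame (or shot) has already been reduced to a small numeric feature vector; the feature choice, the two-cluster setup, and the decision of which cluster counts as "anchor" are assumptions, not taken from the paper. The `fuse` helper is a hypothetical late-fusion rule (a weighted average of a video-based anchor membership and an audio-based studio-speech posterior, such as one produced by a GMM classifier); the paper's actual fusion strategy is not described in this abstract.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, c=2, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means clustering.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k; each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [[0.0] * dim for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership^m.
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers[k] = [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                          for d in range(dim)]
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [_dist(points[i], centers[k]) for k in range(c)]
            zero = [k for k in range(c) if dists[k] == 0.0]
            if zero:
                # Point coincides with a center: full membership in that cluster.
                new_row = [1.0 if k == zero[0] else 0.0 for k in range(c)]
            else:
                new_row = [1.0 / sum((dists[k] / dists[j]) ** exponent
                                     for j in range(c))
                           for k in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


def fuse(video_score, audio_score, weight=0.5, threshold=0.5):
    """HYPOTHETICAL late fusion: weighted average of a video-based anchor
    membership and an audio-based studio-speech posterior, thresholded."""
    return weight * video_score + (1.0 - weight) * audio_score >= threshold


if __name__ == "__main__":
    # Toy 2-D "frame features" (hypothetical): two visually distinct groups.
    frames = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.80], [0.88, 0.85]]
    _, memberships = fuzzy_c_means(frames, c=2)
    labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
    print(labels)
```

In practice one would still need a rule for deciding which fuzzy cluster corresponds to studio/anchor shots (for example, the cluster whose frames are most visually consistent); that rule is likewise an assumption here.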
Similar resources
Spoken Document Retrieval and Summarization
Huge, continually increasing quantities of multimedia content including speech information are filling up our computers, networks and lives. It is obvious that speech is one of the most important sources of information for multimedia content, as it is the speech of the content that tells us of the subjects, topics and concepts. As a result, the associated spoken documents of the multimedia cont...
Audio Hot Spotting And Retrieval Using Multiple Features
This paper reports our on-going efforts to exploit multiple features derived from an audio stream using source material such as broadcast news, teleconferences, and meetings. These features are derived from algorithms including automatic speech recognition, automatic speech indexing, speaker identification, prosodic and audio feature extraction. We describe our research prototype – the Audio Ho...
A Spoken Document Retrieval Application in the Oral History Domain
The application of automatic speech recognition in the broadcast news domain is well studied. Recognition performance is generally high and accordingly, spoken document retrieval can successfully be applied in this domain, as demonstrated by a number of commercial systems. In other domains, a similar recognition performance is hard to obtain, or even far out of reach, for example due to lack of...
A robust fusion method for multilingual spoken document retrieval systems employing tiered resources
In this study, we present two novel fusion approaches to merge subword and word based retrieval methods within a multilingual spoken document retrieval (SDR) system. Considering the fact that more than 6000 languages are spoken in the world today, resources (e.g., text and audio data, pronunciation lexicon) needed to develop Automatic Speech Recognition (ASR) systems for such a range of languag...
Automatic Spoken Document Processing for Retrieval and Browsing
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in overwhelming amounts of multimedia data being produced, exchanged, and stored. One key application area in this realm is the search and retrieval of spoken audio documents. As storage becomes cheaper, the availability and usefulness of large collections of spoken documents is limited s...