Same news is good news: automatically collecting reoccurring radio news stories

Authors

  • Stefan Rapp
  • Grzegorz Dogil
Abstract

We present methods for finding identical or nearly identical news stories in hourly radio news broadcasts. Our procedures detect reoccurring news stories in subsequent news broadcasts, spoken by the same or different announcers, from the speech signal alone. They make it possible to build a large database of repeated, professionally read speech at low cost, which is especially interesting for prosody research but also, e.g., for concept-to-speech and sociolinguistic studies. An automatically recorded complete radio news broadcast is first segmented into individual news stories using HMM recognition. Then the word sequence estimates of the stories are either compared directly (naive method) or realigned with the signal of other stories (realignment method) in order to determine which stories have been read before and which have not. Both methods can be further improved by computing “meta distances” that also take into account distances to other stories. We evaluate and compare the usefulness of the proposed methods on real-life data. We find that the realignment method combined with meta distances is the most reliable of the methods and that it is well suited to the task.
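The abstract does not say how word sequence estimates are compared or how the meta distances are computed. As a minimal sketch, assuming a length-normalised Levenshtein distance over recognised word tokens for the naive method, and assuming a meta distance defined as the Euclidean distance between two stories' distance profiles to all other stories, the idea might look like this in Python. All function names and the toy transcripts are hypothetical; the HMM segmentation and the realignment method are not covered.

```python
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def naive_distance(a, b):
    """Edit distance normalised by the longer sequence (0.0 = identical).

    Assumption: the normalisation is illustrative, not taken from the paper.
    """
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(stories):
    """Pairwise naive distances between all stories."""
    return [[naive_distance(s, t) for t in stories] for s in stories]


def meta_distance(matrix, i, j):
    """Assumed meta distance: compare how stories i and j relate to all
    *other* stories, using the Euclidean distance between their profiles."""
    others = [k for k in range(len(matrix)) if k not in (i, j)]
    return sqrt(sum((matrix[i][k] - matrix[j][k]) ** 2 for k in others))


# Hypothetical recogniser output for four stories; stories 0/1 and 2/3
# are repeats that differ only by recognition errors.
stories = [
    "the chancellor visited paris today".split(),
    "the chancellor visited paris to day".split(),
    "heavy rain is expected in the south".split(),
    "heavy rains expected in the south".split(),
]
matrix = distance_matrix(stories)
```

In this toy data, the repeated pair (0, 1) has both a smaller naive distance and a smaller meta distance than the unrelated pair (0, 2), which is the behaviour a repeat detector would threshold on. The threshold itself would have to be tuned on real data.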

Similar resources

Radio : Content Filtering and Delivery for Broadcast Audio

Synthetic News Radio uses automatic speech recognition and clustered text news stories to automatically find story boundaries in an audio news broadcast, and it creates semantic representations that can match stories of similar content through audio-based queries. Current speech recognition technology cannot by itself produce enough information to accurately characterize news audio; therefore, ...

The Físchlár-News-Stories System: Personalised Access to an Archive of TV News

The “Físchlár” systems are a family of tools for capturing, analysis, indexing, browsing, searching and summarisation of digital video information. Físchlár-News-Stories, described in this paper, is one of those systems, and provides access to a growing archive of broadcast TV news. Físchlár-News-Stories has several notable features including the fact that it automatically records TV news and s...

NewsViz: Emotional Visualization of News Stories

The NewsViz system aims to enhance news reading experiences by integrating 30 seconds long Flash-animations into news article web pages depicting their content and emotional aspects. NewsViz interprets football match news texts automatically and creates abstract 2D visualizations. The user interface enables animators to further refine the animations. Here, we focus on the emotion extraction com...

Implications of News Segments and Movies for Enhancing Listening Comprehension of Language Learners

Abstract Armed with technological development, the present study aimed at gauging the effectiveness of exposure to news and movies as two types of audiovisual programs in improving language learners’ listening comprehension at the intermediate level. To this end, a listening comprehension test was administered to 108 language learners and finally 60 language learners were selected as intermedia...

Browsing System

In this demo, we present a system we have developed for automatic broadcast-quality video indexing that successfully combines results from the fields of speaker verification, acoustic analysis, very large vocabulary caption character recognition, content based sampling of video, information retrieval, dialogue systems, and ASF media delivery over IP. The prototype system of this demo is availab...

Publication date: 1998