The THISL Spoken Document Retrieval System
Abstract
THISL is an ESPRIT Long Term Research Project focused on the development and construction of a system to retrieve items from an archive of television and radio news broadcasts. In this paper we outline our spoken document retrieval system, which is based on the ABBOT speech recognizer and a text retrieval system using Okapi term weighting. The system has been evaluated as part of the TREC-6 and TREC-7 spoken document retrieval evaluations, and we report the results of the TREC-7 evaluation, which used a document collection of 100 hours of North American broadcast news.
Similar resources
The THISL SDR System at TREC-9
This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a realtime version of a hybrid connectionist/HMM large vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. ...
Recognition, indexing and retrieval of British broadcast news with the THISL system
This paper describes the THISL spoken document retrieval system for British and North American Broadcast News. The system is based on the ABBOT large vocabulary speech recognizer and a probabilistic text retrieval system. We discuss the development of a realtime British English Broadcast News system, and its integration into a spoken document retrieval system. Detailed evaluation is performed u...
Retrieval Of Broadcast News Documents With the THISL System
This paper describes the THISL system that participated in the TREC-7 evaluation, Spoken Document Retrieval (SDR) Track, and presents the results obtained, together with some analysis. The THISL system is based on the ABBOT speech recognition system and the thislIR text retrieval system. In this evaluation we were concerned with investigating the suitability for SDR of a recognizer running at l...
A Hybrid Approach to Spoken Query Processing in Document Retrieval System
In the context of the THISL spoken document retrieval system, we present a hybrid approach to spoken query processing, which makes it possible to increase recognition rates and to extract relevant information for the application. The query processing is distributed between grammar and language model, based on the assumption that a query can be decomposed into two relatively independent parts; the addressin...
Thematic indexing of spoken documents by using self-organizing maps
A method is presented to provide a useful searchable index for spoken audio documents. The task differs from the traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with th...