Confidence measure based language identification
نویسندگان
چکیده
In this paper we present a new application for confidence measures in spoken language processing. In today’s computerized dialogue systems, language identification (LID) is typically achieved via dedicated modules. In our approach, LID is integrated into the speech recognizer, therefore profiting from high-level linguistic knowledge at very little extra cost. Our new approach is based on a word lattice based confidence measure [3], which was originally devised for unsupervised training. In this work, we show that the confidence based language identification algorithm outperforms conventional score based methods. Also, this method is less dependent on the acoustic characteristics of the transmission channel than score based methods. By introducing additional parameters, unknown languages can be rejected. The proposed method is compared to a score based approach on the Verbmobil database, a three language task.
منابع مشابه
Language Identification With Confidence Limits
A statistical classification algorithm and its application to language identification from noisy input are described. The main innovation is to compute confidence limits on the classification, so that the algorithm terminates when enough evidence to make a clear decision has been made, and so avoiding problems with categories that have similar characteristics. A second application, to genre ide...
متن کاملOffline Language-free Writer Identification based on Speeded-up Robust Features
This article proposes offline language-free writer identification based on speeded-up robust features (SURF), goes through training, enrollment, and identification stages. In all stages, an isotropic Box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of word region and the corresponding scales and orientations (SOs) are extr...
متن کاملIntegrating Complementary Features with a Confidence Measure for Speaker Identification
This paper investigates the effectiveness of integrating complementary acoustic features for improved speaker identification performance. The complementary contributions of two acoustic features, i.e. the conventional vocal tract related features MFCC and the recently proposed vocal source related features WOCOR, for speaker identification are studied. An integrating system, which performs a sc...
متن کاملمقایسه روش های طیفی برای شناسایی زبان گفتاری
Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
متن کاملNew Confidence Measures for Statistical Machine Translation
A confidence measure is able to estimate the reliability of an hypothesis provided by a machine translation system. The problem of confidence measure can be seen as a process of testing : we want to decide whether the most probable sequence of words provided by the machine translation system is correct or not. In the following we describe several original word-level confidence measures for mach...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000