A Segment-Based Automatic Language Identification System
Abstract
We have developed a four-language automatic language identification system for high-quality speech. The system uses a neural network-based segmentation algorithm to segment speech into seven broad phonetic categories. Phonetic and prosodic features computed on these categories are then input to a second network that performs the language classification. The system was trained and tested on separate sets of speakers of American English, Japanese, Mandarin Chinese, and Tamil. It currently achieves an accuracy of 89.5% on the utterances of the test set.
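The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the seven categories, the features, or the network architecture, so the category IDs 0 to 6, the two feature types (segment share and mean duration), the helper names, and the single linear layer standing in for the second network are all assumptions made for illustration.

```python
from itertools import groupby

NUM_CATEGORIES = 7  # seven broad phonetic categories; IDs 0-6 are placeholders
LANGUAGES = ["American English", "Japanese", "Mandarin Chinese", "Tamil"]


def frames_to_segments(frame_labels):
    """Collapse per-frame category labels into (category, length_in_frames) runs.

    The frame labels are assumed to come from the first (segmentation) network.
    """
    return [(cat, len(list(run))) for cat, run in groupby(frame_labels)]


def segment_features(segments):
    """Hypothetical features: per-category share of segments, then mean duration."""
    counts = [0] * NUM_CATEGORIES
    durations = [0] * NUM_CATEGORIES
    for cat, length in segments:
        counts[cat] += 1
        durations[cat] += length
    total = len(segments) or 1  # avoid division by zero on empty input
    share = [c / total for c in counts]
    mean_dur = [d / c if c else 0.0 for d, c in zip(durations, counts)]
    return share + mean_dur


def classify(features, weights, biases):
    """Stand-in for the second network: one linear layer followed by argmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return LANGUAGES[scores.index(max(scores))]


# Toy frame-level output of the segmenter for one short utterance.
frame_labels = [0, 0, 0, 3, 3, 1, 1, 1, 1, 0, 0]
segments = frame_to_segments = frames_to_segments(frame_labels)
features = segment_features(segments)
```

In a real system the weights passed to `classify` would be learned from training speakers kept separate from the test speakers, as the abstract describes.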
Similar resources
Automatic Language Identification Using a Segment-Based Approach
A segment-based Automatic Language Identification (ALI) system has been developed. The system was designed around a formal probabilistic framework. This framework forms the basis for investigating the ALI approach proposed by House and Neuburg which utilizes phonotactic constraints of languages. The system incorporates different components which model the phonotactic, prosodic, and acoustic prope...
Automatic language identification using a segment-based approach
Automatic Language Identification (ALI) is the problem of automatically identifying the language of an utterance through the use of a computer. In 1977, House and Neuburg proposed an approach to ALI which focused on the phonotactic constraints of different languages. Their work suggested that simple language models could be used effectively for language identification if an accurate phonetic re...
Offline Language-free Writer Identification based on Speeded-up Robust Features
This article proposes offline language-free writer identification based on speeded-up robust features (SURF), which proceeds through training, enrollment, and identification stages. In all stages, an isotropic Box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of word region and the corresponding scales and orientations (SOs) are extr...
Recent improvements in an approach to segment-based automatic language identification
In 1993, a segment-based system for Automatic Language Identification (ALI) was developed and introduced. The system incorporates phonetic, acoustic, and prosodic information within a probabilistic framework. The original system was trained and tested using the OGI MultiLanguage Telephone Speech Corpus and achieved an accuracy of 57.3% in identifying the language of test utterances from the OGI ...
Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees
Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...