Building a Turkish ASR system with minimal resources

Authors

  • Arianna Bisazza
  • Roberto Gretter

Abstract

We present an open-vocabulary Turkish news transcription system built with almost no language-specific resources. Our acoustic models are bootstrapped from those of a well-trained source language (Italian), without using any transcribed Turkish data. For language modeling, we apply unsupervised word segmentation induced with a state-of-the-art technique (Creutz and Lagus, 2005), and we introduce a novel method to lexicalize suffixes and to recover their surface form in context without the need for a morphological analyzer. Encouraging results obtained on a small test set are presented and discussed.
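The abstract does not spell out how suffixes are lexicalized or how their surface form is recovered. Purely as an illustration, and not as the authors' method, the sketch below assumes a recognizer that outputs morph tokens in which suffixes carry a leading `+` and an underspecified vowel (`A` for a/e, `I` for ı/i/u/ü). The hypothetical `rejoin` function glues each suffix onto the preceding word and picks its vowel by Turkish vowel harmony, using only the word's own vowels rather than a morphological analyzer.

```python
# Hypothetical sketch, NOT the method of Bisazza and Gretter. Assumptions:
# suffix tokens start with "+" and use the archiphonemes "A" (a/e) and
# "I" (ı/i/u/ü). Consonant alternations (e.g. d/t, p/b) and loanword
# exceptions to vowel harmony are deliberately ignored.
from typing import List

VOWELS = "aeıioöuü"
BACK_VOWELS = set("aıou")
ROUNDED_VOWELS = set("oöuü")


def last_vowel(text: str) -> str:
    """Return the last Turkish vowel in text, or "" if it has none."""
    for ch in reversed(text):
        if ch in VOWELS:
            return ch
    return ""


def attach(word: str, suffix: str) -> str:
    """Append suffix to word, resolving each archiphoneme from the nearest preceding vowel."""
    for ch in suffix:
        if ch in "AI":
            vowel = last_vowel(word) or "e"  # default for vowel-less input (assumption)
            back = vowel in BACK_VOWELS
            if ch == "A":  # two-way harmony: backness only
                ch = "a" if back else "e"
            else:  # four-way harmony: backness and rounding
                rounded = vowel in ROUNDED_VOWELS
                ch = {
                    (True, False): "ı",
                    (False, False): "i",
                    (True, True): "u",
                    (False, True): "ü",
                }[(back, rounded)]
        word += ch
    return word


def rejoin(tokens: List[str]) -> List[str]:
    """Glue '+'-prefixed suffix tokens onto the preceding word."""
    words: List[str] = []
    for tok in tokens:
        if tok.startswith("+") and words:
            words[-1] = attach(words[-1], tok[1:])
        else:
            # A stem, or a suffix with no preceding word: start a new word.
            words.append(attach("", tok.lstrip("+")))
    return words


print(rejoin(["kitap", "+lAr", "+dA", "ev", "+lAr", "+dA"]))
# → ['kitaplarda', 'evlerde']
```

Because consonant alternations are ignored, this sketch would produce the incorrect "kitapda" for "kitap" + "+dA" instead of "kitapta"; a real system would need to handle such cases as well.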


Similar resources

Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model

One of the key challenges involved in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units or “lexical units” and acoustic feature observations. To model this relationship, two types of resources are needed, namely, acoustic resources, i.e., speech data with word-level transcriptions, and lexical resources where each word is transcribed...


Towards Turkish ASR: Anatomy of a rule-based Turkish g2p

This paper describes the architecture and implementation of a rule-based grapheme-to-phoneme converter for Turkish. The system accepts a surface form as input and outputs the SAMPA mapping of all parallel pronunciations according to the morphological analysis, together with stress positions. The system has been implemented in Python.


Using resources from a closely-related language to develop ASR for a very under-resourced language: a case study for Iban

This paper presents our strategies for developing an automatic speech recognition system for Iban, an under-resourced language. We faced several challenges such as no pronunciation dictionary and lack of training material for building acoustic models. To overcome these problems, we proposed approaches which exploit resources from a closely-related language (Malay). We developed a semi-supervise...


Incidence of cancer in the Turkish Republic of Northern Cyprus.

BACKGROUND/AIM This study analyzed the incidence, trends, and common types of cancer in the Turkish Republic of Northern Cyprus (TRNC). MATERIALS AND METHODS This study is based on data collected from the office of the North Cyprus Cancer Registry, Ministry of Health, for 2007-2012. Data were arranged on the basis of age group, sex, and cancer site. Age standardized incidence rates (ASRs) wer...


The Green Future: Architecture + Sustainability; Green Architecture and Impacts of it on Urban Planning and Urban Design

Green architecture, or green design, is an approach to building that minimizes harmful effects on human health and the environment. The “green” architect or designer attempts to safeguard air, water, and earth by choosing eco-friendly building materials and construction practices. Green architecture is thus a building and structure design philosophy that aims at minimal use of non-renewable and/or...


Publication date: 2012