Interlingual map task corpus collection

نویسندگان

  • Hayakawa Akira
  • Nick Campbell
  • Saturnino Luz
چکیده

We present a prototype interlingual communication system that is being used to collect a corpus of task based dialogues between speakers of different languages. This corpus will be used to assess human reactions to an automated speech-to-speech translation system. In this demonstration we show how the HCRC Map Task can be adapted to support data collection in this interlingual environment, and how we used easily accessible speech and language technology for the rapid prototyping of the system used for data collection. An explanation of the nature and purpose of the data we are collecting is also presented.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The ILMT-s2s Corpus ― A Multimodal Interlingual Map Task Corpus

This paper presents the multimodal Interlingual Map Task Corpus (ILMT-s2s corpus) collected at Trinity College Dublin, and discuss some of the issues related to the collection and analysis of the data. The corpus design is inspired by the HCRC Map Task Corpus which was initially designed to support the investigation of linguistic phenomena, and has been the focus of a variety of studies of comm...

متن کامل

Strategies Used in the Translation of Interlingual Subtitling

This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency, as well. Contrary to many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...

متن کامل

Interlingual Annotation for MT Development

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation ...

متن کامل

Exploiting the Leipzig Corpora Collection

In this paper the Leipzig Corpora Collection is introduced as a contribution to the idea that there is need for standardization of multilingual language resources. We explain the steps of building, processing and presenting corpora of comparable sizes and in a uniform format. Results from intraand interlingual comparisons of corpora are given and methods that can build upon these corpora

متن کامل

Using Wikipedia for Named-Entity Translation

In this paper we present a system for translating named-entities from Basque to English using Wikipedia’s knowledge. We can exploit interlingual links from Wikipedia (WIL) to get named-entity translation, but entities without interlingual links can be translated using the Wikipedia as a corpus, suggesting new interlingual links. In this second case the interlingual links can be used as a test c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014