Towards speech-to-text translation without speech recognition
Authors
Abstract
We explore the problem of translating speech to text in low-resource scenarios where neither automatic speech recognition (ASR) nor machine translation (MT) is available, but where training data exists in the form of audio paired with text translations. We present the first system for this problem applied to a realistic multi-speaker dataset, the CALLHOME Spanish-English speech translation corpus. Our approach uses unsupervised term discovery (UTD) to cluster repeated patterns in the audio, creating a pseudotext, which we pair with translations to create a parallel text and train a simple bag-of-words MT model. We identify the challenges faced by the system, finding that the difficulty of cross-speaker UTD results in low recall, but that our system is still able to correctly translate some content words in test data.
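The pseudotext-to-translation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a plain co-occurrence count between pseudoterms and target words, and the cluster IDs (`c3`, `c7`, `c9`), the toy sentences, and the function names `train_bow_model` and `translate` are all hypothetical.

```python
from collections import Counter, defaultdict


def train_bow_model(pairs):
    """Count how often each pseudoterm co-occurs with each target word.

    pairs: iterable of (pseudoterms, target_words), both lists of strings.
    Assumption: raw co-occurrence counts stand in for a trained MT model.
    """
    cooc = defaultdict(Counter)
    for pseudoterms, words in pairs:
        for term in set(pseudoterms):
            cooc[term].update(set(words))
    return cooc


def translate(cooc, pseudoterms, top_k=1):
    """Predict an unordered bag of target words for one utterance."""
    predicted = set()
    for term in pseudoterms:
        # Pseudoterms never seen in training contribute nothing.
        for word, _count in cooc.get(term, Counter()).most_common(top_k):
            predicted.add(word)
    return predicted


# Hypothetical training data: UTD cluster IDs paired with English words.
toy_pairs = [
    (["c7"], ["house"]),
    (["c7", "c3"], ["big", "house"]),
    (["c3"], ["big", "dog"]),
    (["c3"], ["big"]),
]
model = train_bow_model(toy_pairs)
print(translate(model, ["c7", "c3", "c9"]))
```

In this toy setup the unseen cluster `c9` yields no output word, which loosely mirrors the low-recall behaviour the abstract reports: only audio segments that UTD manages to cluster can be translated at all.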
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
A New Approach to Speech-Input Statistical Translation
Statistical pattern recognition is a promising framework for text-to-text translation. However, a natural extension to speech-input translation is not straightforward. In this paper, we present a method to deal with the speech-input statistical translation problem that could be considered a step towards a fully integrated recognition-translation procedure. In this version a word graph wa...
Towards real-time multilingual multimodal speech-to-speech translation
Speech-to-speech translation technology enables natural oral communication between people who speak different languages. Many research projects have addressed speech-to-speech translation (S2ST) technology, such as ATR [1], VERBMOBIL [2], C-STAR [3], NESPOLE! [4], BABYLON [5], GALE [6], and EU-bridge [7]. The speech-to-speech translation system is normally composed of automatic speech recognition ...
The Effect of Private Speech and Self-Regulation on Translation Quality among Iranian Translation Students: A Mixed-Methods Study
The current study presents findings from a mixed-methods study investigating the self-regulatory role of private speech (self-talk) in students’ translation quality. The aim of the study was to validate the adapted version of a self-verbalization questionnaire. The construct validity and reliability of the scale were supported by the CFA which revealed that all items reached the acceptable f...
Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation
Current speech translation systems integrate (loosely or closely) two main modules: source language speech recognition (ASR) and source-to-target text translation (MT). In these approaches, a source-language text transcript (as a sequence or as a graph) appears mandatory to produce a text hypothesis in the target language. In the meantime, deep neural networks have yielded breakthroughs in dif...