Visual Text Correction

Authors

  • Amir Mazaheri
  • Mubarak Shah
Abstract

Videos, images, and sentences are media that can express the same semantics. One can imagine a picture by reading a sentence, or describe a scene in a few words. However, even a small change in a sentence can cause a significant semantic inconsistency with the corresponding video or image. For example, changing the verb of a sentence may drastically change its meaning. There have been many efforts to encode a video or sentence and decode it as a sentence or video. In this research, we study a new scenario in which both the sentence and the video are given, but the sentence is inaccurate. A semantic inconsistency between the sentence and the video, or between the words of a sentence, can result in an inaccurate description. This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacing an inaccurate word in the textual description of a video. We propose a deep network that can simultaneously detect an inaccuracy in a sentence and fix it by replacing the inaccurate word(s). Our method leverages the semantic interdependence of videos and words, as well as the short-term and long-term relations of the words in a sentence. In our formulation, part of a visual feature vector for every single word is dynamically selected through a gating process. Furthermore, to train and evaluate our model, we propose an approach to automatically construct a large dataset for the VTC problem. Our experiments and performance analysis demonstrate that the proposed method provides very good results and also highlight the general challenges in solving the VTC problem. To the best of our knowledge, this work is the first of its kind for the Visual Text Correction task.
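The gating idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid gate, the linear projection, the toy dimensions, and every name (gate_visual_feature, weights, bias) are assumptions made for illustration. Each word vector produces a gate with values between 0 and 1, which scales the shared visual feature vector elementwise, so different words select different parts of the same video feature.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_visual_feature(word_vec, visual_vec, weights, bias):
    """Scale each visual dimension by a gate computed from the word vector.

    Hypothetical helper: weights has one row per visual dimension,
    and each row has one entry per word-vector dimension.
    """
    gated = []
    for row, b, v in zip(weights, bias, visual_vec):
        gate = sigmoid(sum(w * x for w, x in zip(row, word_vec)) + b)
        gated.append(gate * v)
    return gated


# Toy setup (assumed sizes): 4-dimensional words, 6-dimensional video feature.
random.seed(0)
word_dim, visual_dim = 4, 6
weights = [[random.uniform(-1, 1) for _ in range(word_dim)]
           for _ in range(visual_dim)]
bias = [0.0] * visual_dim
visual_vec = [1.0] * visual_dim

# Two different words gate the same visual vector differently.
word_a = [1.0, 0.0, 0.0, 0.0]
word_b = [0.0, 1.0, 0.0, 0.0]
gated_a = gate_visual_feature(word_a, visual_vec, weights, bias)
gated_b = gate_visual_feature(word_b, visual_vec, weights, bias)
print(gated_a)
print(gated_b)
```

In a trained model the projection would be learned jointly with the rest of the network; here the weights are random and only show the mechanics of word-dependent selection.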

Similar resources

Cipher text only attack on speech time scrambling systems using correction of audio spectrogram

Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...

Chording with Spatial Mnemonics: Automatic Error Correction for Eyes-Free Text Entry

Chording is a technique that allows users to enter text without visual feedback. Traditional chording strategies are hampered by the substantial training effort required as users need to memorize chords. In this study an alternative chording scheme that uses spatial mnemonics to accelerate learning is proposed. Users mentally visualize the appearance of each character as a 3 × 3 pixel grid. The...

Comparative Approach to the Relationship Between Text and Hand Visual Language in Tahmasebi’s Shahnameh Pictures

The painters of Tahmasbi Shahnameh, in order to depict the text full of the story of Shahnameh, tried to convey emotions and excitement to the audience by using the visual language of the hand. Due to the multiplicity of applications of this type of nonverbal communication in different situations, the painter may have undergone changes in parts of her painting under the influence of various fac...

Semiotic Analysis of Written Signs in the Road Sign Systems of Tehran City

Introduction: As a component of the urban landscape, road sign systems are among the most critical elements of urban environments. Generally speaking, the written signs dominate the design of these systems. These signs can also foster aesthetic and visual pleasure compellingly and innovatively. Furthermore, they perpetuate a specific image in the minds of their observers. This research seeks to...

Visual Error Resolution Strategy for highly-structured text entry using Speech Recognition in FP6-ALLADIN project

Man-machine interaction using only speech input is not well received by users, even for high-performance recognizers (WER of about 2%). In most free-text dictation applications, capturing the user's intention is more important than the performance of specific speech tools, and a low transaction success rate leads users to reject speech interfaces [6]. For highly-structured text entry, users will bett...

Journal:
  • CoRR

Volume: abs/1801.01967

Publication date: 2018