Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes


Abstract

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure in which both detection and recognition are achieved directly in two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance performance and universality. Benefiting from the two-dimensional representation used for both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we investigate the recognition module of our method separately and find that it significantly outperforms state-of-the-art methods on both regular and irregular scene text recognition datasets.
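The abstract states only that detection and recognition are both carried out in two-dimensional space via semantic segmentation; it does not describe how a word is read off the segmentation output. As a rough illustration of that idea, here is a minimal Python sketch under stated assumptions: the input is a per-pixel class-probability map in which class 0 is background, `ALPHABET` and `decode_char_segmentation` are hypothetical names, characters are separated into 4-connected foreground regions, each region is labeled by summing class probabilities over its pixels (a simple pixel vote), and regions are read left to right. This is a generic sketch, not the paper's implementation, and it ignores the spatial attention module entirely.

```python
from collections import deque
from typing import Sequence

# Hypothetical alphabet: class 0 is background, class k >= 1 maps to ALPHABET[k - 1].
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def decode_char_segmentation(
    probs: Sequence[Sequence[Sequence[float]]],
    alphabet: str = ALPHABET,
    bg_threshold: float = 0.5,
) -> str:
    """Read a word from a per-pixel character-class probability map.

    probs[y][x][c] is the (assumed) probability that pixel (y, x) belongs to
    class c, where c == 0 is background and c >= 1 indexes into `alphabet`.
    """
    height, width = len(probs), len(probs[0])
    # Foreground pixels: background probability below the threshold.
    fg = [[probs[y][x][0] < bg_threshold for x in range(width)] for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    chars = []  # (mean x-coordinate of the region, predicted character)

    for y0 in range(height):
        for x0 in range(width):
            if not fg[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over one 4-connected foreground region.
            region = []
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Pixel voting: sum each character class's probability over the region
            # (the argmax of the sum equals the argmax of the mean).
            num_classes = len(probs[y0][x0])
            scores = [0.0] * num_classes
            for y, x in region:
                for c in range(1, num_classes):
                    scores[c] += probs[y][x][c]
            best = max(range(1, num_classes), key=lambda c: scores[c])
            mean_x = sum(x for _, x in region) / len(region)
            chars.append((mean_x, alphabet[best - 1]))

    # Order regions left to right (assumes horizontal or gently curved text).
    chars.sort(key=lambda item: item[0])
    return "".join(ch for _, ch in chars)


if __name__ == "__main__":
    num_classes = len(ALPHABET) + 1

    def onehot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    h_cls = ALPHABET.index("h") + 1
    i_cls = ALPHABET.index("i") + 1
    # Two rows: an "h" blob, a background gap, then an "i" blob.
    row = [onehot(h_cls), onehot(h_cls), onehot(0), onehot(i_cls), onehot(i_cls)]
    print(decode_char_segmentation([row, row]))  # hi
```

Reading regions in order of their mean x-coordinate is the simplest possible ordering rule; it suits horizontal and mildly curved words but would need replacing for vertical or strongly curved text.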


Similar resources

An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog

We present a novel end-to-end trainable neural network model for task-oriented dialog systems. The model is able to track dialog state, issue API calls to knowledge base (KB), and incorporate structured KB query results into system responses to successfully complete task-oriented dialogs. The proposed model produces well-structured system responses by jointly learning belief tracking and KB res...


Fast End-to-End Trainable Guided Filter

Image processing and pixel-wise dense prediction have been advanced by harnessing the capabilities of deep learning. One central issue of deep learning is the limited capacity to handle joint upsampling. We present a deep learning building block for joint upsampling, namely guided filtering layer. This layer aims at efficiently generating the high-resolution output given the corresponding low-re...


End-to-End Trainable Attentive Decoder for Hierarchical Entity Classification

We address fine-grained entity classification and propose a novel attention-based recurrent neural network (RNN) encoder-decoder that generates paths in the type hierarchy and can be trained end-to-end. We show that our model performs better on fine-grained entity classification than prior work that relies on flat or local classifiers that do not directly model hierarchical structure.


Supervised Hashing with End-to-End Binary Deep Neural Network

Image hashing is a popular technique applied to large scale content-based visual retrieval due to its compact and efficient binary codes. Our work proposes a new end-to-end deep network architecture for supervised hashing which directly learns binary codes from input images and maintains good properties over binary codes such as similarity preservation, independence, and balancing. Furthermore,...


End-to-End Deep Neural Network for Automatic Speech Recognition

We investigate the efficacy of deep neural networks on speech recognition. Specifically, we implement an end-to-end deep learning system that utilizes mel-filter bank features to directly output spoken phonemes without the need of a traditional Hidden Markov Model for decoding. The system will comprise two variants of neural networks for phoneme recognition. In particular, we utilize conv...



Journal

Journal title: IEEE Transactions on Pattern Analysis and Machine Intelligence

Year: 2021

ISSN: 1939-3539, 2160-9292, 0162-8828

DOI: https://doi.org/10.1109/tpami.2019.2937086