captioning order

Social Image Captioning: Exploring Visual Attention and User Attention

2018

Leiquan Wang Xiaoliang Chu Weishan Zhang Yiwei Wei Weichen Sun Chunlei Wu

Image captioning with a natural language has been an emerging trend. However, the social image, associated with a set of user-contributed tags, has been rarely investigated for a similar task. The user-contributed tags, which could reflect the user attention, have been neglected in conventional image captioning. Most existing image captioning models cannot be applied directly to social image ca...

متن کامل

Changes on the Horizon for the Multimedia Community

Journal: :IEEE MultiMedia 2017

Yong Rui

The Impact of Deep Learning The development of AI algorithms, represented by deep learning, has bolstered multimedia research. In particular, deep learning has led to a multimodality-based algorithm framework, enabling the effective fusion and use of cross-domain data. Take image and video captioning, for example. A couple of years ago, tagging was the only way to describe images and videos. Bu...

متن کامل

Speech recognition with a seamlessly updated language model for real-time closed-captioning

2010

Toru Imai Shinichi Homma Akio Kobayashi Takahiro Oku Shoei Sato

It is desirable to consistently and seamlessly update a language model of speech recognition without stopping it for online applications such as real-time closed-captioning. This paper proposes a novel speech recognition system that enables the model to be updated at any time even while it is running. It can run the second decoder with the latest model in parallel, and their priority that must ...

متن کامل

End-to-End Video Captioning with Multitask Reinforcement Learning

2018

Lijun Li Boqing Gong

Although end-to-end (E2E) learning has led to promising performance on a variety of tasks, it is often impeded by hardware constraints (e.g., GPU memories) and is prone to overfitting. When it comes to video captioning, one of the most challenging benchmark tasks in computer vision and machine learning, those limitations of E2E learning are especially amplified by the fact that both the input v...

متن کامل

Automatic Video Captioning using Deep Neural Network

2017

Thang Huy Nguyen Raymond Ptucha

Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into our everyday lives. Video captioning offers a simple way to summarize, index, and search the data. Most video captioning models utilize a video encoder and captioning decoder framework. Hierarchical encoders can abstractly capture clip level temporal features to represen...

متن کامل

End-to-End Dense Video Captioning with Masked Transformer

2018

Luowei Zhou Yingbo Zhou Jason J. Corso Richard Socher Caiming Xiong

Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevent...

متن کامل

Captioning of Live TV Commentaries from the Olympic Games in Sochi: Some Interesting Insights

2014

Josef V. Psutka Ales Prazák Josef Psutka Vlasta Radová

In this paper, we describe our effort and some interesting insights obtained during captioning more than 70 hours of live TV broadcasts from the Olympic Games in Sochi. The closed captioning was prepared for ČT Sport, the sport channel of the public service broadcaster in the Czech Republic. We will briefly discuss our solution for distributed captioning architecture on live TV programs using r...

متن کامل

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Journal: :CoRR 2017

Marcella Cornia Lorenzo Baraldi Giuseppe Serra Rita Cucchiara

Image captioning has been recently gaining a lot of attention thanks to the impressive achievements shown by deep captioning architectures, which combine Convolutional Neural Networks to extract image representations, and Recurrent Neural Networks to generate the corresponding captions. At the same time, a significant research effort has been dedicated to the development of saliency prediction ...

متن کامل

Online TV Captioning of Czech Parliamentary Sessions

2010

Jan Trmal Ales Prazák Zdenek Loose Josef Psutka

In the paper we introduce the on-line captioning system developed by our teams and used by the Czech Television (CTV), the public service broadcaster in the Czech Republic. The research project is targeted at incorporation of speech technologies into the CTV environment. One of the key missions is the development of captioning system supporting captioning of a “live” acoustic track. It can be e...

متن کامل

Guided Open Vocabulary Image Captioning with Constrained Beam Search

2017

Peter Anderson Basura Fernando Mark Johnson Stephen Gould

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-tr...

متن کامل