Search results for: captioning order
Number of results: 908,879. Filter results by year:
Recently, attention-based image captioning models, which are expected to ground the correct regions for proper word generation, have achieved remarkable performance. However, some researchers have pointed out a "deviated focus" problem in existing attention mechanisms when determining the effective and influential features. In this paper, we present A2 - an attention-aligned Transformer for image captioning, which guides attention to learn...
Recent years have witnessed the rapid progress of image captioning. However, demands for large memory storage and a heavy computational burden prevent these captioning models from being deployed on mobile devices. The main obstacles lie in heavyweight visual feature extractors (i.e., object detectors) and complicated cross-modal fusion networks. To this end, we propose LightCap, a lightweight caption...
In this paper, we introduce the CapERA dataset, which upgrades the Event Recognition in Aerial Videos (ERA) dataset to aerial video captioning. The newly proposed dataset aims to advance visual–language-understanding tasks for UAV videos by providing each video with diverse textual descriptions. To build the dataset, 2,864 videos are manually annotated with a caption that includes information such as the main event, object, place, action, number...
The Transformer-based approach represents the state-of-the-art in image captioning. However, existing studies have shown that the Transformer has a problem in which irrelevant tokens with overlapping neighbors incorrectly attend to each other with relatively large attention scores. We believe this limitation is due to the incompleteness of the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). To solve this problem, we presen...
It is encouraging to see the progress that has been made in bridging videos and natural language. However, mainstream video captioning methods suffer from slow inference speed due to the sequential nature of autoregressive decoding, and they prefer generating generic descriptions due to insufficient training of visual words (e.g., nouns and verbs) and an inadequate decoding paradigm. In this paper, we propose a non-autoregressive based ...
• A diverse captioning model of fully convolutional design is proposed. We develop a new evaluation metric to assess sentence diversity. Our method achieves superior performance compared with state-of-the-art benchmarks. Automatically describing video content with a text description is a challenging but important task, which has been attracting a lot of attention in the computer vision community. Previous works ma...
Image paragraph captioning aims to automatically generate a paragraph from a given image. It is an extension of image captioning in terms of generating multiple sentences instead of a single one, and it is more challenging because paragraphs are longer, more informative, and linguistically more complicated. Because a paragraph consists of several sentences, an effective method should generate consistent sentences rather than contradictory ones. It is still an open question how to achieve t...
As cross-domain research combining computer vision and natural language processing, current image captioning mainly considers how to improve visual features; less attention has been paid to utilizing the inherent properties of language to boost performance. Facing this challenge, we propose a textual attention mechanism, which can obtain the semantic relevance between words by scanning all generated words. The retrospect ...