Collective Generation of Natural Image Descriptions
Abstract
We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.
Similar Resources
Improvement of generative adversarial networks for automatic text-to-image generation
This research concerns the use of deep learning tools and image processing technology for the automatic generation of images from text. Previous research has used a single sentence to produce an image. This research presents a memory-based hierarchical model that uses three different descriptions, each given as a sentence, to produce and improve the image. The proposed ...
Generating Image Descriptions with Gold Standard Visual Inputs: Motivation, Evaluation and Baselines
In this paper, we present the task of generating image descriptions with gold standard visual detections as input, rather than directly from an image. This allows the Natural Language Generation community to focus on the text generation process, rather than dealing with the noise and complications arising from the visual detection process. We propose a fine-grained evaluation metric specificall...
Genetic Structure of Wheat (Triticum aestivum L.) Grain Characteristics by Using Image Processing and Generation Mean Analysis Techniques
Wheat (Triticum aestivum L.) is known to be the world-leading cereal grain and the most important food in the world of agriculture. Wheat offers a great wealth of material for genetic studies due to its wide ecological distribution and host of variation for various morphological and physiological characters. To evaluate the genetic control of physical traits of grain in two crosses of winter ...
Composing Simple Image Descriptions using Web-scale N-grams
Studying natural language, and especially how people describe the world around them, can help us better understand the visual world. In turn, it can also help us in the quest to generate natural language that describes this world in a human manner. We present a simple yet effective approach to automatically compose image descriptions given computer vision based inputs and using web-scale n-grams...
Midge: Generating Image Descriptions From Computer Vision Detections
This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation syst...