Combining Geometric, Textual and Visual Features for Predicting Prepositions in Image Descriptions: Supplementary Material
Authors
Arnau Ramisa*, Josiah Wang*, Ying Lu, Emmanuel Dellandrea, Francesc Moreno-Noguer, Robert Gaizauskas
1 Institut de Robòtica i Informàtica Industrial (UPC-CSIC), Barcelona, Spain
2 Department of Computer Science, University of Sheffield, UK
3 LIRIS, École Centrale de Lyon, France
{aramisa, fmoreno}@iri.upc.edu; {j.k.wang, r.gaizauskas}@sheffield.ac.uk; {ying.lu, emmanuel.dellandrea}@ec-lyon.fr
Similar resources
Combining Geometric, Textual and Visual Features for Predicting Prepositions in Image Descriptions
We investigate the role that geometric, textual and visual features play in the task of predicting a preposition that links two visual entities depicted in an image. The task is an important part of the subsequent process of generating image descriptions. We explore the prediction of prepositions for a pair of entities, both in the case when the labels of such entities are known and unknown. In...
Natural Language Descriptions for Human Activities in Video Streams
There has been continuous growth in the volume and ubiquity of video material. It has become essential to define video semantics in order to aid the searchability and retrieval of this data. We present a framework that produces textual descriptions of video, based on the visual semantic content. Detected action classes rendered as verbs, participant objects converted to noun phrases, visual pro...
Generating Descriptions of Spatial Relations between Objects in Images
We investigate the task of predicting prepositions that can be used to describe the spatial relationships between pairs of objects depicted in images. We explore the extent to which such spatial prepositions can be predicted from (a) language information, (b) visual information, and (c) combinations of the two. In this paper we describe the dataset of object pairs and prepositions we have creat...
I Can Has Cheezburger? A Nonparanormal Approach to Combining Textual and Visual Information for Predicting and Generating Popular Meme Descriptions
The advent of social media has brought Internet memes, a unique social phenomenon, to the front stage of the Web. Although memes are embodied in the form of images with text descriptions, little is known about the “language of memes”. In this paper, we statistically study the correlations among popular memes and their wordings, and generate meme descriptions from raw images. To do this, we take a multimodal app...
Tags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content of a query image. Among the challenges in this field, inappropriate user-generated tags and images without any tags have a negative effect on query results. In this paper, a new method is presented for automatic image...