Search results for: text similarity

Number of results: 268086

2011
Chee Wee Leong, Rada Mihalcea

Measures of similarity have traditionally focused on computing the semantic relatedness between pairs of words and texts. In this paper, we construct an evaluation framework to quantify cross-modal semantic relationships that exist between arbitrary pairs of words and images. We study the effectiveness of a corpus-based approach to automatically derive the semantic relatedness between words and...

2016
Vladislav Kubon, Markéta Lopatková, Tomás Hercig

This paper compares various means of measuring word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project; the word order statistics are relative frequencies of all word order combinations of subject, predicate and object, both in main and subordinate clauses. The measures include Euclidean distance, max-min distance,...

2014
Peter Jansen, Mihai Surdeanu, Peter Clark

We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and ...

2003
Sofie Van Gijsel, Carl Vogel

Techniques from corpus linguistics are applied to the analysis of a number of European right-wing parties in an effort to extend methods for ranking parties on a left-right spectrum within and across countries and languages. Focus is placed on parties not in government, and the analysis is based on corpora built from election manifestos published by those parties. The techniques applied are o...

2017
Amrith Krishna, Pavankumar Satuluri, Harshavardhan Ponnada, Muneeb Ahmed, Gulab Arora, Kaustubh Hiware, Pawan Goyal

Derivational nouns are widely used in Sanskrit corpora and are a prevalent means of productivity in the language. Currently there exists no analyser that identifies derivational nouns. We propose a semi-supervised approach for the identification of derivational nouns in Sanskrit. We not only identify the derivational words, but also link them to their corresponding source words. The novelty of o...

2004
Nuno Seco, Tony Veale, Jer Hayes

Information Content (IC) is an important dimension of word knowledge when assessing the similarity of two terms or word senses. The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus. In this paper we present a wholly intrinsic measu...

Journal: CoRR 2017
Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki, Muhammad Zeshan Afzal

In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Former approaches have tried to condition the generative process on the textual data, but allying it to the usage of class information, known to diversify the generated samples and ...

2010
Geetu Ambwani, Anthony Davis

We present a representation of documents as directed, weighted graphs, modeling the range of influence of terms within the document as well as contextually determined semantic relatedness among terms. We then show the usefulness of this kind of representation in topic segmentation. Our boundary detection algorithm uses this graph to determine topical coherence and potential topic shifts, and do...

2011
Silvia Necsulescu

The present work constitutes a PhD project that aims to overcome the problem caused by data sparsity in the task of acquiring lexical resources. In any corpus of any length, many words are infrequent and thus co-occur with only a small set of words. Nevertheless, they can co-occur with many other words. Our goal is to discover further possible co-occurring words for low-frequency words relyi...

2013
Abdellah Fourtassi, Emmanuel Dupoux

Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We sho...
