text similarity

Measuring the semantic relatedness between words and images

2011

Chee Wee Leong Rada Mihalcea

Measures of similarity have traditionally focused on computing the semantic relatedness between pairs of words and texts. In this paper, we construct an evaluation framework to quantify cross-modal semantic relationships that exist between arbitrary pairs of words and images. We study the effectiveness of a corpus-based approach to automatically derive the semantic relatedness between words and...

Searching for a Measure of Word Order Freedom

2016

Vladislav Kubon Markéta Lopatková Tomás Hercig

This paper compares various means of measuring word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project; the word order statistics are relative frequencies of all word order combinations of subject, predicate and object in both main and subordinate clauses. The measures include Euclidean distance, max-min distance,...
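As a rough illustration of the two measures named in this abstract, the sketch below computes them over relative frequencies of the six subject/verb/object orders. It is an assumption, not the paper's method: the abstract does not say what the distances are measured against, so the uniform distribution is used here as a hypothetical reference point for "maximally free" order. The function names and the sample data are likewise hypothetical.

```python
import math

# The six possible orders of subject (S), verb/predicate (V) and object (O).
ORDERS = ("SVO", "SOV", "VSO", "VOS", "OSV", "OVS")


def euclidean_from_uniform(freqs):
    """Euclidean distance between a word order distribution and the uniform one.

    Assumption: the uniform distribution stands in for completely free order,
    so 0.0 means every order is equally frequent.
    """
    uniform = 1.0 / len(ORDERS)
    return math.sqrt(sum((freqs.get(o, 0.0) - uniform) ** 2 for o in ORDERS))


def max_min_distance(freqs):
    """Difference between the most and least frequent word order."""
    values = [freqs.get(o, 0.0) for o in ORDERS]
    return max(values) - min(values)


# Hypothetical data, not taken from HamleDT.
rigid = {"SVO": 1.0}                       # only one order ever occurs
free = {o: 1.0 / len(ORDERS) for o in ORDERS}  # all orders equally frequent

print(euclidean_from_uniform(rigid), max_min_distance(rigid))
print(euclidean_from_uniform(free), max_min_distance(free))
```

Under these assumptions, both values grow as a language concentrates on fewer orders, so lower scores would correspond to freer word order.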

Discourse Complements Lexical Semantics for Non-factoid Answer Reranking

2014

Peter Jansen Mihai Surdeanu Peter Clark

We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and ...

Inducing a cline from corpora of political manifestos

2003

Sofie Van Gijsel Carl Vogel

Techniques from corpus linguistics are applied to the analysis of a number of European right-wing parties in an effort to extend methods for ranking parties on a left-right spectrum within and across countries and languages. Focus is placed on parties not in government, and the analysis is based on corpora derived from election manifestos published by those parties. The techniques applied are o...

A Graph Based Semi-Supervised Approach for Analysis of Derivational Nouns in Sanskrit

2017

Amrith Krishna Pavankumar Satuluri Harshavardhan Ponnada Muneeb Ahmed Gulab Arora Kaustubh Hiware Pawan Goyal

Derivational nouns are widely used in Sanskrit corpora and are a prevalent means of productivity in the language. Currently, no analyser exists that identifies derivational nouns. We propose a semi-supervised approach for the identification of derivational nouns in Sanskrit. We not only identify the derivational words, but also link them to their corresponding source words. The novelty of o...

An Intrinsic Information Content Metric for Semantic Similarity in WordNet

2004

Nuno Seco Tony Veale Jer Hayes

Information Content (IC) is an important dimension of word knowledge when assessing the similarity of two terms or word senses. The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus. In this paper we present a wholly intrinsic measu...

TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network

Journal: CoRR 2017

Ayushman Dash John Cristian Borges Gamboa Sheraz Ahmed Marcus Liwicki Muhammad Zeshan Afzal

In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Former approaches have tried to condition the generative process on the textual data; but allying it to the usage of class information, known to diversify the generated samples and ...

Contextually-Mediated Semantic Similarity Graphs for Topic Segmentation

2010

Geetu Ambwani Anthony Davis

We present a representation of documents as directed, weighted graphs, modeling the range of influence of terms within the document as well as contextually determined semantic relatedness among terms. We then show the usefulness of this kind of representation in topic segmentation. Our boundary detection algorithm uses this graph to determine topical coherence and potential topic shifts, and do...

Automatic Acquisition of Possible Contexts for Low-Frequent Words

2011

Silvia Necsulescu

The present work constitutes a PhD project that aims to overcome the problem caused by data sparsity in the task of acquiring lexical resources. In any corpus of any length, many words are infrequent and thus co-occur with only a small set of words. Nevertheless, they can co-occur with many other words. Our goal is to discover more possible co-occurring words for low-frequent words relyi...

A corpus-based evaluation method for Distributional Semantic Models

2013

Abdellah Fourtassi Emmanuel Dupoux

Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We sho...
