text similarity

Measuring The Semantic Similarity Of Texts

2005

Courtney D. Corley, Rada Mihalcea

This paper presents a knowledge-based method for measuring the semantic similarity of texts. While there is a large body of previous work focused on finding the semantic similarity of concepts and words, the application of these word-oriented methods to text similarity has not yet been explored. In this paper, we introduce a method that combines word-to-word similarity metrics into a text-to-text m...

English-Persian Plagiarism Detection based on a Semantic Approach

Journal: Journal of Artificial Intelligence and Data Mining 2017

F. Safi-Esfahani, M.H. Nadimi-Shahraki, Sh. Rakian

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

Analysis of the Effect of Distance Metric across Languages on Verse Similarity in the Qur'an

2016

Pan Huang, Amna Basharat, Khaled Rasheed

Text similarity measures have been widely studied and used in machine learning and information retrieval for many years. However, few applications of text similarity have dealt with multi-lingual translations of a specific document. Additionally, the growing number of texts with more translations being generated increases the challenge of distinguishing or identifying the similarity and differe...

Kohonen Networks with Graph-based Augmented Metrics

2005

Peter Andras, Olusola Idowu

Correct and efficient text classification is a major challenge in today’s world of rapidly increasing amount of accessible electronic text data. Kohonen networks have been applied to document classification with comparable success to other document clustering methods. An important challenge is to devise text similarity metrics that can improve the performance of text classification Kohonen netw...

Document dissimilarity within and across languages: A benchmarking study

Journal: LLC 2014

Richard S. Forsyth, Serge Sharoff

Quantifying the similarity or dissimilarity between documents is an important task in authorship attribution, information retrieval, plagiarism detection, text mining, and many other areas of linguistic computing. Numerous similarity indices have been devised and used, but relatively little attention has been paid to calibrating such indices against externally imposed standards, mainly because ...

DKPro Similarity: An Open Source Framework for Text Similarity

2013

Daniel Bär, Torsten Zesch, Iryna Gurevych

We present DKPro Similarity, an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and p...

Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads

2013

Emily Jamison, Iryna Gurevych

Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balan...

Content Recommendation in APOSDLE using the Associative Network

Journal: J. UCS 2010

Hermann Stern, Rene Kaiser, Philip Hofmair, Peter Kraker, Stefanie N. Lindstaedt

One of the success factors of Work Integrated Learning (WIL) is to provide the appropriate content to the users, both suitable for the topics they are currently working on, and their experience level in these topics. Our main contributions in this paper are (i) overcoming the problem of sparse content annotation by using a network based recommendation approach called Associative Network, which ...

A Study on the Role of Similarity Measures in Visual Text Analytics

2013

F. San Roman, S. R. D. de Pinho, Rosane Minghim, Maria Cristina Ferreira de Oliveira

Text Analytics is essential for a large number of applications and good approaches to obtain visual mappings of text are paramount. Many visualization techniques, such as similarity based point placement layouts, have proved useful to support visual analysis of documents. However, they are sensitive to data quality, which, in turn, relies on a critical preprocessing step that involves text ‘cle...

Identification Semi-Automatique de Mots-Germes pour l'Analyse de Sentiments et son Intensité

2017

Amal Htait, Sébastien Fournier, Patrice Bellot

For the purpose of opinion exploring in tweets, this article presents a sentiment classification of tweets content. First, we present a method to identify new sentiment similarity seed words. These seed words are used for predicting sentiment intensity of other words and short phrases in co-occurrence. Then, for testing sentiment similarity, we use: Similarity Measures methods between words and...
