jaccard similarity coefficient

Search results for: jaccard similarity coefficient

Number of results: 274076

Set-Theoretic Alignment for Comparable Corpora

2016

Thierry Etchegoyhen Andoni Azpeitia

We describe and evaluate a simple method to extract parallel sentences from comparable corpora. The approach, termed STACC, is based on expanded lexical sets and the Jaccard similarity coefficient. We evaluate our system against state-of-the-art methods on a large range of datasets in different domains, for ten language pairs, showing that it either matches or outperforms current methods across ...
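For readers unfamiliar with the measure named in this abstract, the following is a minimal sketch of the plain Jaccard similarity coefficient, |A ∩ B| / |A ∪ B|, applied to two token sets. It is not the STACC method itself, which the abstract says also relies on expanded lexical sets; the function name `jaccard`, the example sentences, and the convention for two empty sets are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two iterables treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical example: whitespace tokens of two short sentences.
source_tokens = "the cat sat on the mat".split()
target_tokens = "the cat lay on the rug".split()

# Shared tokens: {the, cat, on}; union has 7 distinct tokens, so the score is 3/7.
score = jaccard(source_tokens, target_tokens)
print(round(score, 4))
```

A score near 1 means the two sets share most of their elements; a score of 0 means they share none.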

Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L)

2004

Andréia da Silva Meyer Antonio Augusto Franco Garcia Anete Pereira de Souza Cláudio Lopes de Souza

The objective of this study was to evaluate whether different similarity coefficients used with dominant markers can influence the results of cluster analysis, using eighteen inbred lines of maize from two different populations, BR-105 and BR-106. These were analyzed by AFLP and RAPD markers and eight similarity coefficients were calculated: Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple-mat...

An Investigation on the Influence of Frequency on the Lexical Organization of Verbs

2010

Daniel German Aline Villavicencio Maity Siqueira

This work extends the study of Germann et al. (2010) in investigating the lexical organization of verbs. Particularly, we look at the influence of frequency on the process of lexical acquisition and use. We examine data obtained from psycholinguistic action naming tasks performed by children and adults (speakers of Brazilian Portuguese), and analyze some characteristics of the verbs used by ea...

Metric Learning for Synonym Acquisition

2008

Nobuyuki Shimizu Masato Hagiwara Yasuhiro Ogawa Katsuhiko Toyama Hiroshi Nakagawa

The distance or similarity metric plays an important role in many natural language processing (NLP) tasks. Previous studies have demonstrated the effectiveness of a number of metrics such as the Jaccard coefficient, especially in synonym acquisition. While the existing metrics perform quite well, to further improve performance, we propose the use of a supervised machine learning algorithm that ...

Complex Neutrosophic Similarity Measures in Medical Diagnosis

2016

Kalyan Mondal Mumtaz Ali Surapati Pramanik Florentin Smarandache

This paper presents some similarity measures between complex neutrosophic sets. A complex neutrosophic set is a generalization of a neutrosophic set whose complex-valued truth membership function, complex-valued indeterminacy membership function, and complex-valued falsity membership function are the combinations of real-valued truth amplitude term in association with phase term, real-valued inde...

Practical Applications of Locality Sensitive Hashing for Unstructured Data

2014

Working with large amounts of unstructured data (e.g., text documents) has become important for many business, engineering and scientific applications. The purpose of this article is to demonstrate how the practical Data Scientist can implement a Locality Sensitive Hashing system from start to finish in order to drastically reduce the time required to perform a similarity search in high dimensi...
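The truncated abstract does not say which hash family the article uses. As an illustration only, MinHash is one widely used locality-sensitive scheme for Jaccard similarity: the fraction of positions at which two MinHash signatures agree estimates the Jaccard coefficient of the underlying sets. The sketch below covers only the signature and estimation step, not the banding of signatures into buckets that a full LSH index would add. The helper names (`_hash`, `minhash_signature`, `estimate_jaccard`), the use of seeded BLAKE2b hashes, and the default of 128 hash functions are all assumptions.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed (assumed scheme)."""
    data = f"{seed}:{item}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=128):
    """Return, for each seeded hash function, the minimum hash value over the set."""
    unique_items = set(items)
    if not unique_items:
        raise ValueError("cannot build a signature for an empty set")
    return [
        min(_hash(seed, item) for item in unique_items)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical documents represented as sets of word tokens.
doc_a = set("locality sensitive hashing for similarity search".split())
doc_b = set("locality sensitive hashing for text documents".split())

estimate = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(estimate)
```

The estimate becomes more accurate as `num_hashes` grows, at the cost of longer signatures.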

Simple and Efficient Algorithm for Approximate Dictionary Matching

2010

Naoaki Okazaki Jun'ichi Tsujii

This paper presents a simple and efficient algorithm for approximate dictionary matching designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. We propose this algorithm, called CPMerge, for the τ-overlap join of inverted lists. First we show that this task is solvable exactly by a τ-overlap join. Given inverted lists retrieved for a query, the algorithm coll...
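The four coefficients named in this abstract can all be written in terms of the size of the intersection of two feature sets. The sketch below uses the standard set-based definitions of each measure; it is not the CPMerge algorithm, and the function name `set_similarities` and the example sets are assumptions for illustration.

```python
import math


def set_similarities(x, y):
    """Compute cosine, Dice, Jaccard, and overlap coefficients of two non-empty sets."""
    a, b = set(x), set(y)
    if not a or not b:
        raise ValueError("both feature sets must be non-empty")
    common = len(a & b)  # size of the intersection, shared by all four measures
    return {
        "cosine": common / math.sqrt(len(a) * len(b)),
        "dice": 2 * common / (len(a) + len(b)),
        "jaccard": common / len(a | b),
        "overlap": common / min(len(a), len(b)),
    }


# Hypothetical feature sets with two shared elements out of four each.
scores = set_similarities({1, 2, 3, 4}, {3, 4, 5, 6})
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Because every measure depends on the sets only through their sizes and the intersection size, a similarity threshold can be turned into a minimum required overlap, which is the quantity the abstract's τ-overlap join operates on.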

Using structural information and citation evidence to detect significant plagiarism cases in scientific publications

Journal: JASIST 2012

Salha Alzahrani Vasile Palade Naomie Salim Ajith Abraham

In plagiarism detection (PD) systems, two important problems should be considered: the problem of retrieving candidate documents that are globally similar to a document q under investigation, and the problem of side-by-side comparison of q and its candidates to pinpoint plagiarized fragments in detail. In this article, the authors investigate the usage of structural information of scientific pu...

Cluster-wise assessment of cluster stability

Journal: Computational Statistics & Data Analysis 2007

Christian Hennig

Stability in cluster analysis is strongly dependent on the data set, especially on how well separated and how homogeneous the clusters are. In the same clustering, some clusters may be very stable and others may be extremely unstable. The Jaccard coefficient, a similarity measure between sets, is used as a clusterwise measure of cluster stability, which is assessed by the bootstrap distribution...

Author Name Disambiguation Using a New Categorical Distribution Similarity

2012

Shaohua Li Gao Cong Chunyan Miao

Author name ambiguity has been a long-standing problem which impairs the accuracy of publication retrieval and bibliometric methods. Most of the existing disambiguation methods are built on similarity measures, e.g., “Jaccard Coefficient”, between two sets of papers to be disambiguated, each set represented by a set of categorical features, e.g., coauthors and published venues. Such measures pe...
