Search results for: jaccard similarity coefficient

Number of results: 274076

2016
Thierry Etchegoyhen Andoni Azpeitia

We describe and evaluate a simple method to extract parallel sentences from comparable corpora. The approach, termed STACC, is based on expanded lexical sets and the Jaccard similarity coefficient. We evaluate our system against state-of-the-art methods on a large range of datasets in different domains, for ten language pairs, showing that it either matches or outperforms current methods across ...

2004
Andréia da Silva Meyer Antonio Augusto Franco Garcia Anete Pereira de Souza Cláudio Lopes de Souza

The objective of this study was to evaluate whether different similarity coefficients used with dominant markers can influence the results of cluster analysis, using eighteen inbred lines of maize from two different populations, BR-105 and BR-106. These were analyzed by AFLP and RAPD markers and eight similarity coefficients were calculated: Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple-mat...

2010
Daniel German Aline Villavicencio Maity Siqueira

This work extends the study of Germann et al. (2010) in investigating the lexical organization of verbs. Particularly, we look at the influence of frequency on the process of lexical acquisition and use. We examine data obtained from psycholinguistic action naming tasks performed by children and adults (speakers of Brazilian Portuguese), and analyze some characteristics of the verbs used by ea...

2008
Nobuyuki Shimizu Masato Hagiwara Yasuhiro Ogawa Katsuhiko Toyama Hiroshi Nakagawa

The distance or similarity metric plays an important role in many natural language processing (NLP) tasks. Previous studies have demonstrated the effectiveness of a number of metrics such as the Jaccard coefficient, especially in synonym acquisition. While the existing metrics perform quite well, to further improve performance, we propose the use of a supervised machine learning algorithm that ...

2016
Kalyan Mondal Mumtaz Ali Surapati Pramanik Florentin Smarandache

This paper presents some similarity measures between complex neutrosophic sets. A complex neutrosophic set is a generalization of the neutrosophic set whose complex-valued truth membership function, complex-valued indeterminacy membership function, and complex-valued falsity membership function are the combinations of real-valued truth amplitude term in association with phase term, real-valued inde...

2014

Working with large amounts of unstructured data (e.g., text documents) has become important for many business, engineering and scientific applications. The purpose of this article is to demonstrate how the practical Data Scientist can implement a Locality Sensitive Hashing system from start to finish in order to drastically reduce the time required to perform a similarity search in high dimensi...

2010
Naoaki Okazaki Jun'ichi Tsujii

This paper presents a simple and efficient algorithm for approximate dictionary matching designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. We propose this algorithm, called CPMerge, for the τ-overlap join of inverted lists. First we show that this task is solvable exactly by a τ-overlap join. Given inverted lists retrieved for a query, the algorithm coll...
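The abstract is cut off before it describes CPMerge itself, so the following is only a naive counting baseline for the τ-overlap join it mentions, not the CPMerge algorithm. Given the inverted lists retrieved for a query, it returns the entries that occur in at least τ of them. The function name and the toy data are hypothetical.

```python
from collections import Counter


def tau_overlap_join(inverted_lists, tau):
    """Return the IDs that appear in at least `tau` of the inverted lists.

    Naive baseline (an assumption for illustration, not CPMerge):
    count every occurrence, then filter by the threshold.
    """
    counts = Counter()
    for postings in inverted_lists:
        # Use a set so a duplicate ID within one list is counted once.
        counts.update(set(postings))
    return sorted(item for item, n in counts.items() if n >= tau)
```

For example, `tau_overlap_join([[1, 2, 3], [2, 3, 4], [3, 5]], 2)` returns `[2, 3]`, since only entries 2 and 3 occur in two or more lists.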

Journal: JASIST 2012
Salha Alzahrani Vasile Palade Naomie Salim Ajith Abraham

In plagiarism detection (PD) systems, two important problems should be considered: the problem of retrieving candidate documents that are globally similar to a document q under investigation, and the problem of side-by-side comparison of q and its candidates to pinpoint plagiarized fragments in detail. In this article, the authors investigate the usage of structural information of scientific pu...

Journal: Computational Statistics & Data Analysis 2007
Christian Hennig

Stability in cluster analysis is strongly dependent on the data set, especially on how well separated and how homogeneous the clusters are. In the same clustering, some clusters may be very stable and others may be extremely unstable. The Jaccard coefficient, a similarity measure between sets, is used as a clusterwise measure of cluster stability, which is assessed by the bootstrap distribution...
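As a rough illustration of the idea in this abstract: the Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|, and one way to score a cluster against a single resampled clustering is to take its best Jaccard match among the new clusters. The abstract is truncated, so the matching rule, the handling of empty sets, and the function names below are assumptions rather than the author's exact procedure.

```python
def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)


def best_match_stability(cluster, resampled_clusters):
    """Highest Jaccard coefficient between `cluster` and any resampled cluster.

    Returns 0.0 when there are no resampled clusters to compare against.
    """
    return max((jaccard(cluster, other) for other in resampled_clusters),
               default=0.0)
```

For example, `best_match_stability({1, 2, 3, 4}, [{1, 2, 3}, {4, 5, 6}])` returns `0.75`, the Jaccard coefficient with the first resampled cluster. Averaging this value over many bootstrap resamples would give one stability figure per cluster.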

2012
Shaohua Li Gao Cong Chunyan Miao

Author name ambiguity has been a long-standing problem which impairs the accuracy of publication retrieval and bibliometric methods. Most of the existing disambiguation methods are built on similarity measures, e.g., “Jaccard Coefficient”, between two sets of papers to be disambiguated, each set represented by a set of categorical features, e.g., coauthors and published venues. Such measures pe...
