similarity algorithms

Comparison of Sequence Similarity Measures for Distant Evolutionary Relationships

2012

Abhishek Majumdar Peter Z. Revesz

Sequence similarity algorithms are used to reconstruct increasingly large evolutionary trees involving increasingly distant evolutionary relationships. This paper proposes two sequence similarity algorithms, called the Greedy Tiling and the Random Tiling algorithms, that are both based on the idea of tiling one sequence by parts of another sequence. Experimental comparisons show that the new algo...


Learning Feature Weights for Similarity Measures

1998

Yong Wang

When employing a similarity function to measure the similarity between two cases, one major problem is how to determine the feature weights. This paper presents a new method for learning feature weights in a similarity function from the given similarity information. The similarity information can be divided into two kinds: one is called qualitative similarity information, which represents the si...
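
To make the role of feature weights concrete, here is a minimal sketch of a weighted similarity function. It is not the method of the paper, whose details are cut off above. The weighted-average form, the local similarity `1 - |a - b|`, the assumption that features are scaled to [0, 1], and the name `weighted_similarity` are all illustrative assumptions.

```python
def weighted_similarity(case_a, case_b, weights):
    """Weighted average of per-feature similarities (illustrative form only).

    Assumes every feature value is already scaled to [0, 1], so that
    1 - |a - b| is a per-feature similarity in [0, 1].
    """
    if not (len(case_a) == len(case_b) == len(weights)):
        raise ValueError("cases and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        w * (1.0 - abs(a - b)) for a, b, w in zip(case_a, case_b, weights)
    )
    return weighted_sum / total_weight


# Two hypothetical cases that agree on feature 0 and disagree on feature 1.
case_a = [0.2, 0.9]
case_b = [0.4, 0.1]

sim_emphasize_first = weighted_similarity(case_a, case_b, [3.0, 1.0])
sim_emphasize_second = weighted_similarity(case_a, case_b, [1.0, 3.0])
```

The same pair of cases scores higher when the weight sits on the feature they agree on, which is why the choice of weights matters and why one might want to learn them from known similarity judgments.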


Learning Similarity Measures in S

2004

Ning Liu Benyu Zhang Jun Yan Qiang Yang Shuicheng Yan Zheng Chen Fengshan Bai

Many machine learning and data mining algorithms rely on similarity metrics. The Cosine similarity, which is the inner product of two normalized feature vectors, is one of the most commonly used similarity measures. However, in practical tasks such as text categorization and clustering, the Cosine similarity is calculated under the assumption that the input space is an orthogonal space, which usually could not be satisfied due to synonymy and [...] Va...
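
As a small illustration of the definition above, the following sketch computes the Cosine similarity as the inner product of two vectors after normalizing each to unit length. The three-word vocabulary and the term counts are hypothetical, chosen to show how treating synonyms as orthogonal dimensions lowers the score.

```python
import math


def cosine_similarity(u, v):
    """Inner product of u and v after scaling each to unit length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Hypothetical term counts over the vocabulary ["car", "automobile", "engine"].
# "car" and "automobile" are synonyms but occupy separate, orthogonal axes.
doc_a = [2, 0, 1]
doc_b = [0, 2, 1]

score = cosine_similarity(doc_a, doc_b)
```

Here the two documents share only the "engine" dimension, so the score is low even though both are about the same topic. That is the orthogonality problem the abstract points to.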


Representing Stimulus Similarity

2002

Daniel J. Navarro

Table of contents (excerpt): Declaration; Acknowledgements; 1 Prelude; The Very Idea of Representation; Types of Similarity; ...


Optimal Dimension Order: A Generic Technique for the Similarity Join

2002

Christian Böhm Florian Krebs Hans-Peter Kriegel

The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis, and data mining. The similarity join combines two point sets of a multidimensional vector space such that the result contains all point pairs where the distance does not exceed a given parameter ε. Although the similarity join is clearly CP...
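
The definition of the similarity join can be sketched as a naive nested-loop baseline. This is only the specification of the result set, not the optimized technique of the paper. Euclidean distance is an assumption here (the abstract says only "the distance"), and the function name and sample points are hypothetical.

```python
import math
from itertools import product


def similarity_join(points_r, points_s, eps):
    """Return all pairs (p, q), p from R and q from S, with distance <= eps.

    Naive baseline that compares every pair; Euclidean distance is assumed.
    """
    return [
        (p, q)
        for p, q in product(points_r, points_s)
        if math.dist(p, q) <= eps  # math.dist requires Python 3.8+
    ]


# Hypothetical two-dimensional point sets.
points_r = [(0.0, 0.0), (5.0, 5.0)]
points_s = [(0.0, 1.0), (5.0, 7.0), (9.0, 9.0)]

pairs_tight = similarity_join(points_r, points_s, eps=1.5)
pairs_loose = similarity_join(points_r, points_s, eps=2.0)
```

The baseline evaluates every one of the |R| x |S| distance computations, which is the cost that join algorithms of this kind aim to reduce.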


Effective Early Termination Techniques for Text Similarity Join Operator

2005

Selma Ayse Özalp Özgür Ulusoy

The text similarity join operator joins two relations if their join attributes are textually similar to each other, and it has a variety of application domains, including integration and querying of data from heterogeneous resources, cleansing of data, and mining of data. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity comp...


Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks

2013

Takeshi Hase Samik Ghosh Ryota Yamanaka Hiroaki Kitano

Elucidating gene regulatory networks (GRNs) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorit...


Entity resolution for probabilistic data

Journal: Inf. Sci. 2014

Naser Ayat Reza Akbarinia Hamideh Afsarmanesh Patrick Valduriez

Entity resolution is the problem of identifying the tuples that represent the same real-world entity. In this paper, we address the problem of entity resolution over probabilistic data (ERPD), which arises in many applications that have to deal with probabilistic data. To deal with the ERPD problem, we distinguish between two classes of similarity functions, i.e., context-free and context-sensit...


gSSJoin: a GPU-based Set Similarity Join Algorithm

2016

Sidney Ribeiro-Júnior Rafael David Quirino Leonardo Ribeiro Wellington Santos Martins

Set similarity join is a core operation for text data integration, cleaning, and mining. Previous research on improving the performance of set similarity joins has mostly focused on sequential, CPU-based algorithms. The main optimizations of such algorithms exploit high threshold values and the underlying data characteristics to derive efficient filters. In this paper, we investigate strategies to...
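
For orientation, a set similarity self-join can be sketched as follows. This is a sequential, unfiltered baseline and not the GPU algorithm of the paper. The choice of Jaccard similarity, the threshold value, the record identifiers, and the token sets are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def set_similarity_self_join(records, threshold):
    """Return (id1, id2, similarity) for every pair meeting the threshold.

    Compares all pairs with no filtering; `records` maps an id to a token set.
    """
    results = []
    for id1, id2 in combinations(sorted(records), 2):
        sim = jaccard(records[id1], records[id2])
        if sim >= threshold:
            results.append((id1, id2, sim))
    return results


# Hypothetical records, each tokenized into a set of words.
records = {
    "r1": {"data", "cleaning", "tools"},
    "r2": {"data", "cleaning", "methods"},
    "r3": {"gpu", "kernels"},
}

matches = set_similarity_self_join(records, threshold=0.5)
```

Because every pair is compared, the cost grows quadratically with the number of records, which is what the filters and parallel strategies mentioned in the abstract are meant to address.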


Enhancing Document Clustering Using Hybrid Models for Semantic Similarity

2010

Ahmed K. Farahat Mohamed S. Kamel

Different document representation models have been proposed to measure semantic similarity between documents using corpus statistics. Some of these models explicitly estimate semantic similarity based on measures of correlations between terms, while others apply dimension reduction techniques to obtain a latent representation of concepts. This paper proposes new hybrid models that combine explici...
