similarity algorithms

NLPCC 2016 Shared Task Chinese Words Similarity Measure via Ensemble Learning Based on Multiple Resources

2016

Shutian Ma Xiaoyong Zhang Chengzhi Zhang

Many algorithms for measuring Chinese word similarity have been introduced, since it is a fundamental issue in various natural language processing tasks. Previous work focused mainly on using existing semantic knowledge bases or large-scale corpora. However, knowledge bases and corpora are limited in coverage and in how readily their data can be updated. Thus, ensemble learning is used to improve performance by ...

Discriminative Topic Segmentation of Text and Speech

2010

Mehryar Mohri Pedro J. Moreno Eugene Weinstein

We explore automated discovery of topically coherent segments in speech or text sequences. We give two new discriminative topic segmentation algorithms which employ a new measure of text similarity based on word co-occurrence. Both algorithms function by finding extrema in the similarity signal over the text, with the latter algorithm using a compact support-vector based description of a window ...

Expected Sequence Similarity Maximization

2010

Cyril Allauzen Shankar Kumar Wolfgang Macherey Mehryar Mohri Michael Riley

This paper presents efficient algorithms for expected similarity maximization, which coincides with minimum Bayes decoding for a similarity-based loss function. Our algorithms are designed for similarity functions that are sequence kernels in a general class of positive definite symmetric kernels. We discuss both a general algorithm and a more efficient algorithm applicable in a common unambigu...

Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing

2007

Günes Erkan Arzucan Özgür Dragomir R. Radev

We introduce a relation extraction method to identify the sentences in biomedical text that indicate an interaction among the protein names mentioned. Our approach is based on the analysis of the paths between two protein names in the dependency parse trees of the sentences. Given two dependency trees, we define two separate similarity functions (kernels) based on cosine similarity and edit dis...
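As a rough illustration of the two kinds of similarity function named in this abstract, the sketch below compares two dependency paths represented as token lists. The path representation, the function names, and the length-normalized edit similarity are assumptions made for this example; they are not taken from the paper, whose exact kernel definitions are cut off above.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(path_a, path_b):
    """Cosine similarity between two paths treated as bags of tokens."""
    counts_a, counts_b = Counter(path_a), Counter(path_b)
    dot = sum(counts_a[tok] * counts_b[tok] for tok in counts_a)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def edit_distance(path_a, path_b):
    """Levenshtein distance over token sequences (unit costs)."""
    prev = list(range(len(path_b) + 1))
    for i, tok_a in enumerate(path_a, 1):
        curr = [i]
        for j, tok_b in enumerate(path_b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(path_a, path_b):
    """Assumed normalization: 1 minus distance over the longer path length."""
    longest = max(len(path_a), len(path_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(path_a, path_b) / longest
```

Cosine similarity ignores token order, while the edit-based score is sensitive to it, which is one reason to keep the two measures separate.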

The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries

2000

Christian Böhm Bernhard Braunmüller Hans-Peter Kriegel

Numerous data mining algorithms rely heavily on similarity queries. Although many or even all of the performed queries do not depend on each other, the algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced. It was shown that multiple similarity queries substantially speed up query...

Comparing Human and Algorithm Performance on Estimating Word-Based Semantic Similarity

2014

Nils Batram Markus Krause Paul-Olivier Dehaye

Understanding natural language is an inherently complex task for computer algorithms. Crowdsourcing natural language tasks such as semantic similarity is therefore a promising approach. In this paper, we investigate the performance of crowdworkers and compare them to offline contributors as well as to state-of-the-art algorithms. We will illustrate that algorithms do outperform single human con...

A Niche Based Genetic Algorithm for Image Registration

2007

Giuseppe Pascale Luigi Troiano

Image registration aims to find the unknown set of transformations able to reduce two or more images to a common reference frame. Image registration can be regarded as an optimization problem, where the goal is to maximize a measure of image similarity. Measuring similarity over the whole image can be computationally expensive, which leads to measuring the similarity of smaller subimages instead. Howeve...

Clustering Algorithms: Study and Performance Evaluation Using Weka Tool

2013

Bhoj Raj Sharma

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Clustering is a procedure for organizing objects into groups, based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. The various clustering algorithms are analyzed and the performance of cluster...

Similarity Index based Link Prediction Algorithms in Social Networks: A Survey

2016

Pulipati Srilatha Ramakrishnan Manjula

Social networking sites have gained much popularity in recent years. Millions of virtually connected people generate loads of data to be analyzed to infer meaningful associations among links. Link prediction is one such problem, wherein existing nodes, links and their attributes are analyzed to predict the possibility of potential links, which are likely to happen over a peri...
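A minimal sketch of similarity-index link prediction, assuming an undirected graph stored as a dict of neighbor sets. The two indices shown (common neighbors and the Jaccard coefficient) are widely used examples; the abstract is cut off before it names the indices the survey covers, so their selection here is an assumption, as are the function names and the toy graph.

```python
from itertools import combinations


def common_neighbors(graph, u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = graph[u] | graph[v]
    if not union:
        return 0.0
    return len(graph[u] & graph[v]) / len(union)


def rank_candidate_links(graph, score):
    """Rank currently unlinked node pairs by a similarity index, best first."""
    candidates = [
        (u, v)
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    return sorted(candidates, key=lambda pair: score(graph, *pair), reverse=True)


# Hypothetical toy network: node -> set of neighbors (undirected).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranked = rank_candidate_links(toy_graph, common_neighbors)
```

Passing `jaccard` instead of `common_neighbors` changes only the scoring function, so different indices can be compared on the same set of candidate pairs.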

Similarity measure and domain adaptation in multiple mixture model clustering: An application to image processing

2017

Siow Hoo Leong Seng Huat Ong

This paper considers three crucial issues in processing scaled-down images: the representation of partial images, similarity measures and domain adaptation. Two Gaussian mixture model based algorithms are proposed to effectively preserve image details and avoid image degradation. Multiple partial images are clustered separately through Gaussian mixture model clustering with a scan and select proc...
